I decided to share something that I’ve been working on in the field of dimensionality reduction.
In 1987, Peña and Box published a simple dimensionality reduction technique that, in my opinion, has been vastly underrated. Assuming some minimal knowledge of time series analysis, I'll present the beautiful results of this antique.
$z_t \in \mathbb{R}^m$ is the observable data; $f_t \in \mathbb{R}^r$ is a vector of underlying factors of lower dimension than $z_t$. We believe that the observed data comes from a linear combination of the components of the factor plus noise, $z_t = P f_t + e_t$, where $P$ is an $m \times r$ loading matrix and $e_t$ is white noise. The goal is to recover the underlying factors.
We assume $r < m$, and that $f_t$ follows a stationary ARMA model $\Phi(B) f_t = \Theta(B) a_t$, where $B$ is the backshift operator and $a_t$ is white noise.
We assume the components of the factor are independent of one another. An important corollary is that the $\Phi$ and $\Theta$ matrices of the ARMA model are all diagonal.
We assume $P'P = I_r$ to eliminate indeterminacy. This does not alter the time series structure of the problem and is simply a rescaling of the $f_t$. The proof follows from the SVD.
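To make the setup concrete, here is a minimal numpy sketch of the model $z_t = P f_t + e_t$ with diagonal AR(1) factor dynamics. The dimensions, AR coefficients, and noise scale are illustrative choices of mine, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 5, 2, 2000            # observed dim, factor dim, sample size (illustrative)

# Diagonal AR(1) factor dynamics: independent components, so Phi is diagonal
phi = np.array([0.9, -0.5])     # one AR coefficient per factor

f = np.zeros((T, r))
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.normal(size=r)

# Loading matrix with orthonormal columns (P'P = I_r), via QR
P, _ = np.linalg.qr(rng.normal(size=(m, r)))

e = rng.normal(scale=0.3, size=(T, m))   # white observation noise
z = f @ P.T + e                          # rows are z_t' = f_t' P' + e_t'
```

The QR step is just a convenient way to manufacture a $P$ satisfying the normalization $P'P = I_r$.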
Theorem 1: Representation of $z_t$
The observed series $z_t$ itself admits a (vector) ARMA representation.
Theorem 2: Autocovariance of $z_t$
Since $e_t$ is white noise, for $k \ge 1$ the lag-$k$ autocovariance matrix of $z_t$ satisfies $\Gamma_z(k) = P\,\Gamma_f(k)\,P'$, so it has rank $r$.
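This rank property is easy to check numerically: on a simulated series from the model (all parameters below are illustrative choices of mine), only $r$ singular values of the sample lag-1 autocovariance should be far from zero.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 5, 2, 20000
phi = np.array([0.9, -0.5])
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.normal(size=r)
P, _ = np.linalg.qr(rng.normal(size=(m, r)))
z = f @ P.T + rng.normal(scale=0.3, size=(T, m))
z -= z.mean(axis=0)

# Sample lag-k autocovariance Gamma_z(k) ~= E[z_t z_{t+k}']
def autocov(z, k):
    return z[:-k].T @ z[k:] / (len(z) - k)

sv = np.linalg.svd(autocov(z, 1), compute_uv=False)
print(np.round(sv, 3))   # only the first r values are far from zero
```

The remaining $m - r$ singular values are not exactly zero in finite samples, but they shrink toward zero as $T$ grows.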
Theorem 3: Canonical Transformation
Suppose we are given the loading matrix $P$. Then we can transform $z_t$ into $f_t$ (up to noise) by the following procedure.
Define the transformation matrix $H = \begin{pmatrix} P' \\ Q' \end{pmatrix}$, where the columns of $Q$ form an orthonormal basis of the orthogonal complement of the column space of $P$, so that $Q'P = 0$ and $Q'Q = I_{m-r}$.
Then applying $H$ to $z_t = P f_t + e_t$, and using $P'P = I_r$, we have $H z_t = \begin{pmatrix} P' z_t \\ Q' z_t \end{pmatrix} = \begin{pmatrix} f_t + P' e_t \\ Q' e_t \end{pmatrix}$.
$P' z_t = f_t + P' e_t$ is our holy grail – the underlying factors plus some noise. $Q' z_t = Q' e_t$ is the non-informational dimension of $z_t$ and should be discarded.
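A sketch of this transformation on simulated data: $Q$ is built from the full SVD of $P$, and the recovered series $P'z_t$ should track the true factors almost perfectly when the noise is small. Again, all parameter values are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 5, 2, 2000
phi = np.array([0.9, -0.5])
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.normal(size=r)
P, _ = np.linalg.qr(rng.normal(size=(m, r)))
z = f @ P.T + rng.normal(scale=0.1, size=(T, m))

# Q: orthonormal basis of the orthogonal complement of col(P),
# taken from the full SVD of P, so that Q'P = 0 and Q'Q = I
U, _, _ = np.linalg.svd(P, full_matrices=True)
Q = U[:, r:]

x = z @ P    # rows are (P'z_t)' = (f_t + P'e_t)' : factors plus a little noise
y = z @ Q    # rows are (Q'z_t)' = (Q'e_t)'       : pure noise, discard

for j in range(r):
    print(np.corrcoef(x[:, j], f[:, j])[0, 1])   # close to 1
```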
Theorem 4: Non-Uniqueness of the Representation of $z_t$
The representation of $z_t$ is not unique.
For any orthogonal $r \times r$ matrix $A$, replacing $P$ with $P^* = PA$ and $f_t$ with $f_t^* = A' f_t$ preserves the time series structure, since $P^* f_t^* = P A A' f_t = P f_t$. The ARMA matrices of the rotated factors are $\Phi^* = A' \Phi A$ and $\Theta^* = A' \Theta A$, which are no longer diagonal in general.
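A quick numerical illustration of this non-uniqueness: rotating $P$ by an orthogonal $A$ produces non-diagonal ARMA matrices for the rotated factors, yet the implied autocovariances of $z_t$ are unchanged. The rotation angle and AR coefficients are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 5, 2
P, _ = np.linalg.qr(rng.normal(size=(m, r)))

# An orthogonal A of order r: a plane rotation (angle is arbitrary)
t = 0.7
A = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

P2 = P @ A                     # rotated loadings, still P2'P2 = I
Phi  = np.diag([0.9, -0.5])    # diagonal AR matrix of the original factors
Phi2 = A.T @ Phi @ A           # AR matrix of A'f_t: no longer diagonal

# Gamma_f(1) for AR(1) factors with unit innovation variance:
# gamma(1) = phi / (1 - phi^2)
Gf  = np.diag([0.9 / (1 - 0.81), -0.5 / (1 - 0.25)])
Gf2 = A.T @ Gf @ A             # lag-1 autocovariance of the rotated factors

# Both representations imply the same lag-1 autocovariance of z_t
same = np.allclose(P @ Gf @ P.T, P2 @ Gf2 @ P2.T)
print(same)   # True
```

This is exactly why the diagonality (independence) assumption matters: it pins the representation down up to trivial rotations.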