Vincent Li

I decided to share something that I’ve been working on in the field of dimensionality reduction.

In 1987, Peña and Box published a simple dimensionality reduction technique that, in my opinion, has been vastly underrated. Assuming some minimal knowledge of time series analysis, I'll present the beautiful results of this antique.

Let $z_t$ be the observable data, an $m$-dimensional time series, and let $f_t$ be the vector of underlying factors, of dimension $r < m$. We believe that the observed data comes from a linear combination of the components of the factor, plus noise:

$$z_t = P f_t + e_t,$$

where $P$ is the $m \times r$ loading matrix and $e_t$ is white noise. The goal is to recover the underlying factors.

We assume the factor follows a vector ARMA(p, q) process, $\Phi(B) f_t = \Theta(B) a_t$, where $B$ is the backshift operator and $a_t$ is white noise.

We assume the components of the factor are mutually independent. An important corollary is that the $\Phi$ and $\Theta$ matrices of the ARMA model are all diagonal.

We assume $P'P = I_r$ to eliminate indeterminacy. This does not alter the time series structure of the problem and is simply a rescaling of the factors. The proof follows from the SVD.
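To make the setup concrete, here is a minimal simulation of the Peña–Box model (my own sketch, not from the paper): $m = 5$ observed series generated as $z_t = P f_t + e_t$ from $r = 2$ independent AR(1) factors, with an orthonormal loading matrix satisfying $P'P = I$. All dimensions and parameter values are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 5, 2, 2000           # observed dim, factor dim, sample size

# Orthonormal loading matrix P (m x r) with P'P = I, via QR
P, _ = np.linalg.qr(rng.standard_normal((m, r)))

# Independent AR(1) factors: f_t = phi * f_{t-1} + a_t (diagonal Phi)
phi = np.array([0.9, -0.6])
f = np.zeros((T, r))
a = rng.standard_normal((T, r))
for t in range(1, T):
    f[t] = phi * f[t - 1] + a[t]

# Observed data: z_t = P f_t + e_t, with white noise e_t
e = 0.5 * rng.standard_normal((T, m))
z = f @ P.T + e

print(P.T @ P)                 # ~ identity: the P'P = I normalization
```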

Theorem 1: Representation of $z_t$

The observed series $z_t$ itself follows a vector ARMA(p, max(p, q)) model: multiplying $z_t = P f_t + e_t$ by $\Phi_z(B) = P\,\Phi(B)\,P' + (I - PP')$ gives $\Phi_z(B)\, z_t = P\,\Theta(B)\, a_t + \Phi_z(B)\, e_t$, whose right-hand side is a moving average of order max(p, q).

Theorem 2: Autocovariance of $z_t$

For lags $k \geq 1$, the autocovariance matrix of $z_t$ satisfies $\Gamma_z(k) = P\,\Gamma_f(k)\,P'$ and therefore has rank $r$; at lag zero, $\Gamma_z(0) = P\,\Gamma_f(0)\,P' + \Sigma_e$.
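This rank property is easy to check numerically. The following sketch (my own illustration: 5 simulated series driven by 2 AR(1) factors) computes the sample lag-1 autocovariance of $z_t$ and shows that only $r$ of its singular values are far from zero, so the factor dimension can be read off the data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 5, 2, 20000
P, _ = np.linalg.qr(rng.standard_normal((m, r)))
phi = np.array([0.9, -0.6])
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.standard_normal(r)
z = f @ P.T + 0.5 * rng.standard_normal((T, m))

# Sample lag-1 autocovariance: Gamma_z(1) = Cov(z_t, z_{t-1})
zc = z - z.mean(axis=0)
gamma1 = zc[1:].T @ zc[:-1] / (T - 1)

# Theorem 2: Gamma_z(1) = P Gamma_f(1) P', so its rank is r
s = np.linalg.svd(gamma1, compute_uv=False)
print(np.round(s, 3))          # only the first r values are far from zero
```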

Theorem 3: Canonical Transformation

Suppose we are given the loading matrix $P$. Then we can transform $z_t$ into the factors (up to noise) by the following procedure.

Define the transformation matrix

$$M = \begin{pmatrix} P' \\ Q' \end{pmatrix},$$

where $Q$ is an $m \times (m - r)$ matrix whose columns form an orthonormal basis of the orthogonal complement of the columns of $P$, so $Q'P = 0$ and $Q'Q = I$. Then, applying $M$ to $z_t = P f_t + e_t$, we have

$$M z_t = \begin{pmatrix} P' z_t \\ Q' z_t \end{pmatrix} = \begin{pmatrix} f_t + P' e_t \\ Q' e_t \end{pmatrix}.$$

$P' z_t = f_t + P' e_t$ is our holy grail – the underlying factors plus some noise. $Q' z_t = Q' e_t$ is the non-informational dimension of $z_t$ – pure white noise – and should be discarded.
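The canonical transformation can be sketched in a few lines (again on my own simulated example, with $Q$ built from the SVD of $P$): the first $r$ components of $M z_t$ carry the serial dependence of the factors, while the remaining $m - r$ components behave like white noise.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 5, 2, 20000
P, _ = np.linalg.qr(rng.standard_normal((m, r)))
phi = np.array([0.9, -0.6])
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = phi * f[t - 1] + rng.standard_normal(r)
e = 0.5 * rng.standard_normal((T, m))
z = f @ P.T + e

# Q: m x (m - r) orthonormal basis of the orthogonal complement of col(P)
U, _, _ = np.linalg.svd(P, full_matrices=True)
Q = U[:, r:]                                  # Q'P = 0, Q'Q = I

M = np.vstack([P.T, Q.T])                     # the canonical transformation
y = z @ M.T                                   # y_t = M z_t

recovered = y[:, :r]                          # f_t + P'e_t: factors plus noise
noise_part = y[:, r:]                         # Q'e_t: pure white noise

# Lag-1 autocorrelation: strong for the recovered factors, ~0 for the rest
def lag1_corr(x):
    xc = x - x.mean()
    return (xc[1:] * xc[:-1]).sum() / (xc * xc).sum()

print([round(lag1_corr(recovered[:, j]), 2) for j in range(r)])
print([round(lag1_corr(noise_part[:, j]), 2) for j in range(m - r)])
```

Note that the recovered autocorrelations are attenuated relative to the true AR coefficients because of the additive $P' e_t$ noise.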

Theorem 4: Non-Uniqueness of the Representation of $z_t$

The representation of $z_t$ is not unique: for any nonsingular $r \times r$ matrix $A$,

$$z_t = P f_t + e_t = (P A^{-1})(A f_t) + e_t,$$

and the rotated factor $A f_t$ preserves the time series structure. Its ARMA matrices are

$$\Phi^*(B) = A\,\Phi(B)\,A^{-1}, \qquad \Theta^*(B) = A\,\Theta(B)\,A^{-1},$$

which are in general no longer diagonal.
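A quick numerical check of this non-uniqueness (my own sketch; the particular $A$ is an arbitrary choice): rotating the factors by any nonsingular $A$ and the loadings by $A^{-1}$ reproduces exactly the same observations, while the rotated factor follows an AR(1) with matrix $A \Phi A^{-1}$, which is no longer diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, T = 5, 2, 500
P, _ = np.linalg.qr(rng.standard_normal((m, r)))
Phi = np.diag([0.9, -0.6])                    # diagonal AR(1) matrix
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = f[t - 1] @ Phi.T + rng.standard_normal(r)
e = 0.5 * rng.standard_normal((T, m))
z = f @ P.T + e

# Rotate by any nonsingular A: same observations, different "factors"
A = np.array([[2.0, 1.0], [0.0, 1.0]])
P_star = P @ np.linalg.inv(A)                 # new loadings P A^{-1}
f_star = f @ A.T                              # new factor A f_t

z_star = f_star @ P_star.T + e
print(np.allclose(z, z_star))                 # True: identical representation

# The rotated factor follows an AR(1) with matrix A Phi A^{-1} (not diagonal)
Phi_star = A @ Phi @ np.linalg.inv(A)
print(np.round(Phi_star, 2))
```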