Now suppose you multiply a vector p by A and the resulting vector q (= Ap) points in the same direction as the original vector. If you picture this in 2-d, what that means is q is a scalar multiple of p. It's just stretched or shrunk along the same direction. So you didn't even need the matrix A, just the scalar multiple, which we'll call z. So q = zp, where z is just a number. Then we have two equations, q = Ap and q = zp, and together they imply Ap = zp. With some algebraic rearranging that gives us (A - zI)p = 0, where I is the identity matrix with k rows/columns, all 0's except 1's down the diagonal.
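Here is a minimal numpy sketch of that situation (the matrix and vector are hypothetical, chosen so the stretch is easy to see by eye):

```python
import numpy as np

# A hypothetical 2x2 matrix A and a vector p that A merely stretches.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
p = np.array([1.0, 1.0])

q = A @ p          # q = Ap = (3, 3), which is 3 * p
z = 3.0

# q = zp, and equivalently (A - zI)p = 0.
print(np.allclose(q, z * p))                        # True
print(np.allclose((A - z * np.eye(2)) @ p, 0.0))    # True
```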

Now suppose you try out every possible vector in k-space and note down all the pairs of vectors p and scalars z for which (A - zI)p = 0. There will be at most k distinct values of z, and they can be found with a standard algorithm. These are the "eigenpairs": eigenvalues (the z's) and eigenvectors (the p's). My German secretary told me that "eigen" means "essence" or "of itself". The eigenvectors are the directions in which a matrix A transforms a vector "eigenly": it doesn't change the vector's direction, it only dilates it (by the scalar amount z). In other words it's the simplest transformation associated with the matrix A.
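The "standard algorithm" is packaged in every linear-algebra library; numpy's `eig` is one stand-in for whatever tool you use, sketched here on the same hypothetical 2x2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues come back in zs; the eigenvectors are the columns of ps.
zs, ps = np.linalg.eig(A)

# Each eigenpair satisfies A p = z p.
for z, pvec in zip(zs, ps.T):
    assert np.allclose(A @ pvec, z * pvec)

print(sorted(zs))   # for this A: eigenvalues 1 and 3
```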

Principal component analysis starts out by describing a different problem. You have a data matrix M and you want to find the single direction w that best approximates it. If you set it up as a sum-of-squares minimizing problem you end up with an algebraic expression that involves computing the eigenvectors/values for the matrix M'M. If the columns of M are centered to a zero mean, M'M (up to a divisor of n - 1) is the covariance matrix of M.
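A numpy sketch of that connection, on a hypothetical random data matrix M: center the columns, form M'M, and note it matches the covariance matrix whose eigenvectors PCA extracts.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 3))          # hypothetical 100 x 3 data matrix
M = M - M.mean(axis=0)                 # center columns to zero mean

cov = (M.T @ M) / (M.shape[0] - 1)     # M'M, scaled by 1/(n - 1)
assert np.allclose(cov, np.cov(M, rowvar=False))

# The eigenvectors/values of this covariance matrix are what PCA computes.
evals, evecs = np.linalg.eigh(cov)
print(evals)   # variances along each principal direction
```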

H = U * D * t(V), where U is orthogonal (n,n), D is diagonal with decreasing values from top left to bottom right, and V is orthogonal (m,m). The values in D are called singular values and the columns of U and V singular vectors; the operation is usually called singular value decomposition (svd). For a symmetric positive semi-definite matrix, such as a covariance matrix, the singular values coincide with the eigenvalues and the columns of V are the eigenvectors, which is why the two vocabularies get used interchangeably here.
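In numpy the decomposition looks like this (note that `np.linalg.svd` returns t(V) directly; H is a hypothetical random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 3))            # hypothetical n x m data matrix

U, d, Vt = np.linalg.svd(H, full_matrices=False)

# Singular values come back in decreasing order...
assert np.all(np.diff(d) <= 0)

# ...and U * diag(d) * t(V) reconstructs H exactly.
assert np.allclose(U @ np.diag(d) @ Vt, H)
```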

In principal components, you do a svd on the covariance matrix or correlation matrix of H. If H is centered on its columns (as required by Preisendorfer for it to be an analysis of *variance*), then svd on the centered H and svd on the covariance matrix yield equivalent results. Svd on a centered and scaled version of H yields results equivalent to svd on the correlation matrix.
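The equivalence can be checked directly in numpy on a hypothetical data matrix: the squared singular values of the centered data, divided by n - 1, equal the eigenvalues of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.normal(size=(50, 4))           # hypothetical data matrix
Hc = H - H.mean(axis=0)                # center the columns
n = Hc.shape[0]

Q = np.cov(Hc, rowvar=False)           # covariance matrix, divisor n - 1

# svd of the centered data vs eigendecomposition of the covariance:
_, d, Vt = np.linalg.svd(Hc, full_matrices=False)
evals, evecs = np.linalg.eigh(Q)

# Same spectrum, once singular values are squared and divided by n - 1.
assert np.allclose(sorted(d**2 / (n - 1)), sorted(evals))
```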

If the covariance matrix is Q, then svd on it yields:

Q = U * D * t(V) (not necessarily the same U and V as for H unless H is centered). Because Q is symmetric, U = V here, and since V is orthogonal, t(V) = V^-1. If you project the original data matrix H on the eigenvector matrix V, you get the principal components F:

F= H*V,

Thus, the original data matrix is the product of the principal components and the transposed eigenvectors, H = F * t(V), since

F * t(V) = H * V * t(V) = H.
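A short numpy check of that round trip, on a hypothetical centered data matrix (since V is orthogonal, t(V) undoes the projection):

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(20, 3))
H = H - H.mean(axis=0)                 # centered data matrix

Q = np.cov(H, rowvar=False)
evals, V = np.linalg.eigh(Q)           # columns of V are the eigenvectors

F = H @ V                              # principal components (the scores)

# V is orthogonal, so F * t(V) = H * V * t(V) = H.
assert np.allclose(F @ V.T, H)
```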

In this decomposition, the eigenvalues are related to the standard deviations of the columns of the left matrix (the principal component scores). They are reported a little differently in princomp and prcomp, with one dividing by n and the other by (n - 1). It took me a while to figure this out.
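The two conventions can be reproduced in numpy: the singular values divided by sqrt(n - 1) give prcomp-style sdev, and divided by sqrt(n) give princomp-style sdev.

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.normal(size=(30, 3))
H = H - H.mean(axis=0)                 # centered data
n = H.shape[0]

_, d, Vt = np.linalg.svd(H, full_matrices=False)
F = H @ Vt.T                           # principal component scores

# prcomp-style sdev divides by n - 1; princomp-style divides by n.
sdev_prcomp = d / np.sqrt(n - 1)
sdev_princomp = d / np.sqrt(n)

assert np.allclose(F.std(axis=0, ddof=1), sdev_prcomp)
assert np.allclose(F.std(axis=0, ddof=0), sdev_princomp)
```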

The principal component programs in R (and S) have slightly different terminologies. The left matrix is called "scores" in princomp and "x" in prcomp; the right matrix is called "loadings" in princomp and "rotation" in prcomp; in both, the standard deviations are called "sdev".

I first used princomp, but I'm using prcomp now, since it ties directly to the svd decomposition, while princomp needs a slight weight adjustment to recover the corresponding eigenvalues.

The fun bit is thinking through what the space is representing in any particular case.

Don’t be scared of eigens, they’re really pretty.