but I’m up for another one. Here goes:

Dear Blog:

First: can you repeat where the MBH raw data is archived?

Second:

I have a fair grasp of linear algebra but very limited knowledge of PCA, and through that lens I have this idealized, oversimplified overview of the reconstruction problem. Help me with any misunderstandings.

Suppose we have three proxies (for simplicity): P1, P2, P3.

Suppose the raw data is a table with 5 column headings, with T being temperature.

[date, P1, P2, P3, T]

The different proxy columns may have different units; tree-ring width might be one example.

Sorting the table by date, most recent first, the temperatures are missing for rows $r > r_v$ (the reconstruction period).

Take the first m rows, $1 \le r \le m$, as the calibration period.

Assume we have temperature values for rows in the calibration period.

Rows between m and $r_v$, that is $m < r \le r_v$, form the validation period, which also has temperature values.

Let $p = (p_1, p_2, p_3)$ be the values of the three proxies for a particular row.

I would think the first job is to find a function f to take an arbitrary $p$ to a temperature,

$f(p_1, p_2, p_3) = T$.

I would form an m by n matrix A, with n = 3, with columns P1, P2, P3. Then form an m by 1 column vector T from the corresponding m known temperatures

in the raw data table,

and solve $Ax = T$ in the least-squares sense. This means finding the weight vector $x$ such that $Ax = \hat{T}$,

where $\hat{T}$ is the projection of T onto the column space of A.
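In numpy this calibration step would look something like the sketch below. The proxy values and temperatures are made-up numbers just to fix ideas, not anything from the MBH data:

```python
import numpy as np

# Made-up illustration data: m = 5 calibration rows, n = 3 proxies.
A = np.array([[1.2, 0.7, 3.1],
              [0.9, 1.1, 2.8],
              [1.5, 0.6, 3.3],
              [1.1, 0.9, 2.9],
              [1.3, 0.8, 3.0]])
T = np.array([14.2, 13.9, 14.5, 14.0, 14.3])

# Least-squares weight vector x: minimizes ||A x - T||.
x, *_ = np.linalg.lstsq(A, T, rcond=None)

# T_hat = A x is the projection of T onto the column space of A.
T_hat = A @ x

# Least-squares property: the residual T - T_hat is orthogonal
# to every column of A.
print(np.allclose(A.T @ (T - T_hat), 0))
```

The function f is then just the dot product $f(p) = p \cdot x$.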

With this function f, generated using rows with dates in the calibration period, we can examine the length of the error vector $\|T - \hat{T}\|$

based on rows in the validation date range, and compare it to that based on rows in the calibration date range.

Or we can form the correlation coefficient

over the validation date range to check that $T$ and $\hat{T}$ remain correlated when we move outside the calibration period

to include validation-period dates as well.
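A sketch of that check, using a hypothetical calibration/validation split on synthetic data (again, this is only my guess at the procedure, not MBH's actual one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 rows of proxies that really are linear in
# temperature, plus a little measurement noise.
T_true = 14.0 + rng.normal(0, 0.5, size=40)
A_all = (np.column_stack([0.8 * T_true, -0.3 * T_true, 1.1 * T_true])
         + rng.normal(0, 0.05, size=(40, 3)))

cal, val = slice(0, 25), slice(25, 40)   # calibration / validation rows

# Fit weights on the calibration rows only.
x, *_ = np.linalg.lstsq(A_all[cal], T_true[cal], rcond=None)

# Predict temperatures over the validation rows.
T_hat_val = A_all[val] @ x

err_len = np.linalg.norm(T_true[val] - T_hat_val)   # length of error vector
r = np.corrcoef(T_true[val], T_hat_val)[0, 1]       # correlation coefficient
print(r > 0.9)
```

If the error stays small and the correlation stays high outside the calibration window, the fitted f has some out-of-sample skill.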

Using function f, we can get projected temperatures for any date in the raw table.

From here, draw the graph of date versus $f(p_1, p_2, p_3) = T$ for dates into the reconstruction period as well.

I guess one can use moving averages etc. when graphing, but that’s a detail.

Reconstruction completed.

To get x, one can either use the SVD factorization of A or use the older method of the normal equations, $A^T A x = A^T T$,

which factors the positive definite $A^T A$ as $Q \Lambda Q^T$ to get

$(A^T A)^{-1} = Q \Lambda^{-1} Q^T$. Multiply both sides of $A^T A x = A^T T$ by this to get the weight vector x.

With the eigenvectors in the columns of Q and the eigenvalues in the diagonal $\Lambda$,

one might chop some of the smaller eigenvalues in $\Lambda$ to zero (a pseudoinverse) before forming $Q \Lambda^{-1} Q^T$, for more smoothness,

but that’s the extent of manual intervention (fiddling) with this approach.
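The normal-equations route, with the optional eigenvalue truncation, might look like this in numpy (same made-up numbers as before; the truncation threshold is arbitrary):

```python
import numpy as np

A = np.array([[1.2, 0.7, 3.1],
              [0.9, 1.1, 2.8],
              [1.5, 0.6, 3.3],
              [1.1, 0.9, 2.9],
              [1.3, 0.8, 3.0]])
T = np.array([14.2, 13.9, 14.5, 14.0, 14.3])

# A^T A is symmetric positive definite, so it factors as
# Q @ diag(lam) @ Q.T with orthonormal eigenvectors in Q's columns.
lam, Q = np.linalg.eigh(A.T @ A)

# Plain inverse: (A^T A)^{-1} = Q diag(1/lam) Q^T, applied to A^T T.
x = Q @ np.diag(1.0 / lam) @ Q.T @ (A.T @ T)

# Optional smoothing: zero the reciprocals of eigenvalues below a
# (hypothetical) cutoff -- a truncated pseudoinverse -- before forming x.
inv_lam = np.where(lam > 1e-6 * lam.max(), 1.0 / lam, 0.0)
x_smooth = Q @ np.diag(inv_lam) @ Q.T @ (A.T @ T)

# Sanity check: matches the direct least-squares solution.
print(np.allclose(x, np.linalg.lstsq(A, T, rcond=None)[0]))
```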

My question is: where does principal component analysis fit in at all? It seems like it’s beside the point. How does this help to get the function f

that takes us from a proxy row to a temperature?

One could process A by taking each proxy column, subtracting its column mean from each element, and dividing by the column standard deviation,

to make the scaled cross-product of A into a correlation matrix R, and try to find a factor matrix F such that $R = F F^T + D$, with D a diagonal matrix

for unassigned variance.

But we introduce the

human element into the selection of how such a factorization is done to account for all the correlations in R. In recasting R

(F might be a 3 by 2 matrix in the baby example, cooking the number of proxies down to 2), we account for as much of R as we can with a

reduced number of synthesized factors F. But to what end?
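For concreteness, here is the standardization and a plain PCA-style reduction of R in numpy, again with the made-up numbers (this is my guess at the step, not a claim about how MBH actually did it):

```python
import numpy as np

A = np.array([[1.2, 0.7, 3.1],
              [0.9, 1.1, 2.8],
              [1.5, 0.6, 3.3],
              [1.1, 0.9, 2.9],
              [1.3, 0.8, 3.0]])
m = A.shape[0]

# Standardize each proxy column: subtract the column mean, divide by
# the column standard deviation, so the different units drop out.
Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# R is then the correlation matrix of the proxies (ones on the diagonal).
R = (Z.T @ Z) / (m - 1)

# PCA: eigendecomposition of R, keeping the 2 largest-eigenvalue
# directions, as in the 3-proxies-down-to-2 baby example.
lam, Q = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]
F = Q[:, :2] * np.sqrt(lam[:2])   # 3-by-2 loading matrix

# F @ F.T reproduces as much of R as two factors can; the leftover
# diagonal part plays the role of D.
print(np.allclose(np.diag(R), 1.0))
```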

Please don’t tell me that they take the factor scores instead of A, and use the usual least-squares method to construct the weight vector x.

This can’t be what is done by MBH, is it?

Lastly, please explain what the 100 and 200 … year windows are.

I would like to get the actual data and fiddle around with it myself.

~


