Yesterday, we discussed the remarkable decomposition of the AVHRR reconstruction into 3 PCs, initially observed by Roman and taken up by Jeff Id. I thought that it would be interesting to see what happens with the PC3 in the AWS reconstruction (and, in the process, to do some comparisons of the RegEM emulation, which is file-compatible between Jeff Id and myself, against the archived Steig AWS reconstruction). Again, some interesting results.

First, we again see a loading onto the first three PCs (high values for the first three eigenvalues), but the loading is not nearly as concentrated as in the AVHRR case. Right now I presume that this is due to the huge number of AVHRR columns (5509, versus 63 AWS columns) relative to the 42 surface columns.

svd.steig=svd(recon_aws)

par(mar=c(3,3,2,1))

barplot(svd.steig$d,col="grey80",ylim=c(0,360));box()

title("Recon AWS Eigenvalues")

I then plotted the first 3 PCs, fully expecting to see something like the AVHRR PC3, in which the pre-1980 amplitude was negligible (which I believe to be due to the decision to infill with zeros, though I'm not 100% sure of this). To my surprise, there was no such pattern, as shown below.

plot.ts(ts(svd.steig$u[,1:3],start=c(1957,1),freq=12),main="Recon AWS PC1-3")

Being cautious about these things, I plotted up the PC4-6 for good measure. (After all, didn’t Mann rely on a PC4 somewhere?) Here they are:

plot.ts(ts(svd.steig$u[,4:6],start=c(1957,1),freq=12),main="Recon AWS PC4-6")

As with the AVHRR reconstruction PC3, the AWS PC4-6 (each of which has a noticeable eigenvalue) show the same pattern of extreme attenuation in the portion before measurements actually started. In my opinion, these patterns must surely relate to the initial decision to infill missing values with zero.
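This attenuation is easy to quantify. Here's a sketch against the archived matrix (I'm assuming `recon_aws` is the 1957-2006 monthly matrix loaded above, with January 1980 at row 277): compare the RMS amplitude of each left singular vector before and after January 1980.

```r
sv <- svd(recon_aws)
pre  <- 1:276                       # Jan 1957 - Dec 1979
post <- 277:nrow(recon_aws)         # Jan 1980 onward
rms  <- function(x) sqrt(mean(x^2))
ratio <- apply(sv$u, 2, function(u) rms(u[pre]) / rms(u[post]))
round(ratio[1:6], 3)                # near-zero entries flag PCs that vanish pre-1980
```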

Schneider said in a comment on another thread that infilling with zero wouldn't matter, because of the properties of the EM algorithm. Perhaps so, but based on what we're seeing, it doesn't look like this is necessarily the case for the RegEM PTTLS version used by Steig. So I'd definitely like to understand the basis of Schneider's assertion better than I do right now, and to see whether there's something in the Steig PTTLS variation that upsets this property. We can't directly check the Steig variation, as he's refused to disclose his code as used, but I can check with Schneider's methodology, and I'll try to get to that in a day or two.
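As an aside, Schneider's claim can at least be illustrated on a toy problem. The sketch below is my own construction, not Steig's code: a naive regression-EM on a two-column data set with one missing value, started from an initial fill of 0 and from the column mean. The iterations converge to the same imputed value either way, which is the insensitivity-to-initialization property Schneider presumably has in mind.

```r
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100, sd = 0.1)
y.obs <- y; y.obs[100] <- NA             # knock out one value

em.fill <- function(y0, n.iter = 50) {
  yy <- y.obs; yy[100] <- y0             # initial fill
  for (i in 1:n.iter) {
    fit <- lm(yy ~ x)                    # refit using current imputed value
    yy[100] <- predict(fit, data.frame(x = x[100]))  # re-impute
  }
  yy[100]
}
c(from.zero = em.fill(0), from.mean = em.fill(mean(y.obs, na.rm = TRUE)))
# both starting points converge to essentially the same imputed value
```

Whether the truncated-TLS step in Steig's variant preserves this property is exactly the open question.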


## 53 Comments

Steve:

When I use PCA for data exploration/reduction, I usually look at the variables that weight most heavily on the factors after they have been rotated, particularly where the eigenvalues are close to 1.0, to make sure I am not ignoring some important issue. Given the sharply contrasting pattern in PC4, 5 and 6, are they loading on a particular set of readings or stations? I am assuming this is how BCPs became so visible? Am I being too simplistic?

No, it is. Just take the "infilling part" of the data (up to 1980). Using Jeff's data, I get the six largest eigenvalues (in order): 197.3981, 32.8899, 16.3660, 0.0000, 0.0000, 0.0000. I hate to repeat myself, but this phenomenon has nothing to do with initialization with zeros; it is a direct consequence of the truncated TLS method (with regpar=3) employed. In order to diagnose this in action, add the following to your RegEM R-code:

Re: Jean S (#2),

Of course. Arrgh. I had a brain cramp and forgot that he infilled into AWS but didn't infill into the AVHRR.

Re: Jean S (#2),

That's a good point about the infilling pre-1980. I had a brain cramp and didn't take into account the non-infilling in AVHRR. But I'm still puzzled by something here. If we do an SVD on the first part of Jeff's Xafter data, that should go down to 3 PCs, but the 4th eigenvalue has a non-negligible value. What am I missing here?

Re: Steve McIntyre (#8),

Took me a while to see what's wrong (I'm mainly doing this in Matlab as my R skills are still rather poor): it seems to me that you are performing a Mannian PCA 😉 (SVD is good for PCA only if the vectors are zero-mean).
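Jean S's point is easy to see numerically. The sketch below (my own, not from the thread's code) builds a rank-1 data set around a nonzero mean: an SVD of the uncentered matrix mixes the column means into the leading singular vectors and inflates the apparent rank, while centering first recovers the true rank (this centering is what `prcomp` does by default).

```r
set.seed(1)
X <- outer(rnorm(200), c(1, 2, 3)) + 5    # rank-1 variation around a nonzero mean
round(svd(X)$d, 2)                         # uncentered: the mean adds a spurious component
Xc <- scale(X, center = TRUE, scale = FALSE)
round(svd(Xc)$d, 2)                        # centered: only one non-negligible singular value
```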

Re: Jean S (#12),

Interestingly, I’m noticing that right now for a completely unrelated application. Hmmm…

Mark

I suspect that the explanation here may be more mundane than the initial infill value choice.

You can view the entire AWS reconstruction as the values generated by the three PCs plus the difference between the observed data and the values as predicted by the 3 PC reconstruction.

Think of all of the PCs as raw material from which to rebuild the data set (since that is what you can do by using the complete set of PCs). The part prior to 1980 can be expressed exactly as a linear combination of the 3 originals (which are pretty much PC1 to PC3 above, and which explain the major part of the variability of the data set). What is left to explain is the difference at those places where observations exist (all after 1980). It would be quite reasonable that the remaining PCs would have very little contribution to make to the earlier period and would thus be virtually zero in that time period. Note that some of the strange behaviour is exhibited at the end of the sequences as well as at the beginning. My conclusion would be that this is an artefact of splicing the temperatures onto the original construction.

Having gone through that thought process, one wonders if PC3 in the other reconstruction is also some sort of artefact of the AVHRR data set and why the authors chose not to leave the original data in that reconstruction.

Here’s a plot of PC4 with the actual monthly temperature frequencies – i.e., the # of actual monthly observations available for that point (divided by 500 for scaling) – overlaid on top:


I’ve had a play with the other ones and they look to have geographical significance.


My PC4 looks a little different than yours . . . are you using the original or corrected recon?

Isn’t it to be expected that all PCs above 3 will have almost zero weight before 1980, since pre 1980 there is no actual data and we know the reconstruction is exactly described by just 3 PCs?

It looks as if AWS PC4 relates mainly to Harry, which has a far higher loading on PC4 than any other AWS. Likewise, PC63 has a much higher loading on Casey Airstrip than on any other AWS.

There have been some requests to help "layman readers" understand what's going on; here's my attempt:

Suppose you have 2 stations (your input) and your output is 3-dimensional (the AWS/AVHRR "infilling part"). For the sake of illustration, assume that our TTLS rank parameter is one. Now fix a time instant. At every time instant RegEM obtains the imputed values by estimating a matrix B (3×2 in our case; how, it is immaterial here) of rank 1. As an example, consider the matrix

B=[1 1; 2 2; 3 3] (the columns and rows are linear scalings of each other; hence it is rank one). Now your imputed values are (x and y denote the station data at a given time instant) B*[x;y]=[x+y; 2(x+y); 3(x+y)], that is, just a linear scaling of a single value (x+y). The same happens for each time instant, and therefore, when you "summarize" your data (PCA), it is enough to get the value (x+y) for each time instant (your PC1) in order to obtain the full output (just scale your PC1 by 1, 2, or 3).
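Jean S's toy example is easy to verify numerically (my own sketch): the imputed block has exactly one nonzero singular value, no matter what the station data are.

```r
set.seed(3)
B <- matrix(c(1, 2, 3, 1, 2, 3), nrow = 3)   # the rank-1 matrix [1 1; 2 2; 3 3]
X <- matrix(rnorm(2 * 100), nrow = 2)        # 2 stations, 100 time instants
Y <- B %*% X                                 # imputed 3-dimensional output
round(svd(Y)$d, 4)                           # only the first singular value is nonzero
all.equal(Y[2, ], 2 * (X[1, ] + X[2, ]))     # row 2 is just 2*(x+y), as claimed
```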

Re: Jean S (#7),

TTLS? I imagine it’s dah dah Least Squares?

I’ll add it to our Glossary:

http://climateaudit101.wikispot.org/Glossary_of_Acronyms

TIA, and thanks, Jean, for the tutorial.

Cheers — Pete Tillman

Re: Peter D. Tillman (#11),

TTLS = truncated total least squares (for an explanation, see here).

This thread already demonstrates something about CA that I appreciate. When Steve encounters something he doesn't expect he says so, and he is willing to be corrected on other matters rather than just digging in his heels and yelling "Is so!" in a more and more shrill manner.

This is what is so good about this site. A collaborative exercise with people being helpful; nobody worried about making a mistake or being called an idiot. This is how real and constructive progress is made; and others being educated at the same time.

What Phillip Bratby said.

I understand enough of the math to barely make up for the fact that I'm climatically challenged, and find the collaborative energy (and good manners!) here extraordinarily refreshing. I really can't say that often enough.

#13. Touché. That's pretty funny. Me of all people on this issue. I've definitely fallen into bad habits from spending too much time on Team algorithms.

I’ll have to re-do this.

If some readers prefer Mathematica over R – Mathematica is much nicer – a related discussion of the principal components and a notebook is available at

http://motls.blogspot.com/2009/02/eigenvalues-in-antarctica.html

Dear Steve, please correct me if I am wrong, but when I look at your maps of PC4-PC6 – I shouldn’t have discarded them in my Mathematica notebook from the beginning – things start to make sense to me.

Right after 1957, there was arguably no satellite-based continuous mapping of the temperatures, was there? So they only had a few terrestrial stations, namely 3 or 4. They only measured the temperatures at 3 or 4 places.

By looking at your PC4-6, they added three new stations in 1980, 1984, 1985, or something like that (you see that the zeroes in the time series in PC6,5,4 are slowly turned on into nonzero values). This whole dataset comes from a few time series only, and they used other data to extrapolate these isolated stations to all of Antarctica.

At any rate, all of this sounds confusing to me. I thought that the bulk of this dataset were actually extracted from some continuous maps obtained by satellites but my argument suggests it’s not the case. Do I misunderstand something?

Steve: Lubos, there are two recons, one using the AVHRR satellite and one using AWS stations. The AVHRR data is presently unavailable other than the fitted model, so there's not much that can be said about it other than the decomposition into 3 PCs. The AWS data is available. I need to re-do these PCs for pre-1980 only, to deal with Jean S's observations. See Message #1.

I am puzzled by the setting to zero idea. But then again, I am very much a layman at this kind of analysis, so please reply as if to a complete beginner. Imagine a data set spanning 40 years, which in reality decreases linearly. Now suppose we have only the last 20 years. Surely, if we set the first twenty years to the mean of the last twenty, we will completely destroy the negative slope of the curve? Is that the sort of thing Steig is doing, or am I completely going down the wrong track?

Cheers, Ron.

Steve:

This may get me banned:

You wrote in the text above:

I believe the correct way to express what you intended is:

I believe a grammar professor would take the first sentence to mean the pattern shown below does not exist. Hmmmmm. Adding the ‘is’ indicates the pattern below to be the calculated pattern which does not match a previous reference.

Just enough change to matter.

Thanks,

Geo. M

@RonH,

I’m confused too, so I’ll take a stab at it.

I believe you are conflating two issues together. The first issue is that most of the data was collected in the last few years, with only a few stations going back further in time. The second is the infilling of missing data into the records for the stations.

We are not talking about taking a 40 year period with only 20 years of data and splicing a mean on the front. Steig et al took existing data from several datasets of different lengths. They then determined a trend for a bunch of temperature stations and mashed them together statistically. As I understand it, they then compared those trends with a select few stations which had a much longer period of time of data, and used that to produce what they thought would be a prediction of what all of the stations would have done if they had been there going back in time.

As part of this, I think they used the satellite AVHRR data to increase the density of data. That is, since the stations were so far apart, they filled in between them with the satellite data prior to predicting the earlier temperatures.

But they ran into the problem of missing data within the station measurement period.

Imagine twenty years of temperature data, with a single average temp for each date. If you averaged the temp over the 20-year period for each date, you would get an average with a standard deviation (example: 6 +- 0.8 degrees C).

Now let’s say you have no readings for some dates, and the missing data are randomly distributed. You might say, “OK, average all the temperatures I have for that date, and substitute that for the missing data.” However, by definition, the standard deviation for that date would be 0. That might give the appearance of more confidence in the date with the infilled (missing) data than the ones with actual readings.
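That deflation effect is easy to demonstrate with a toy sketch (my own illustration, not Steig's procedure): infilling missing values with the sample mean shrinks the standard deviation of the series relative to the complete data, overstating how much we know.

```r
set.seed(7)
temps <- rnorm(240, mean = 6, sd = 0.8)      # 20 years of monthly readings
obs <- temps
obs[sample(240, 60)] <- NA                   # knock out 25% at random
filled <- ifelse(is.na(obs), mean(obs, na.rm = TRUE), obs)
c(sd.complete = sd(temps), sd.infilled = sd(filled))
# the infilled series has a noticeably smaller sd than the complete one
```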

What we are seeing in these discussions is an attempt to recreate what Steig et al did, then see how that affects the confidence in their results. The jury is still out on that, but some of the problems already revealed are not good.

There are problems with the late data as well, 2002-6 or so. Would its analysis add to the diagnosis?

From what I see, Steve is trying first to re-create Steig et al (in R), using the same data they used. Once that is understood, then another analysis could be done fixing the errant stations (Harry, for example) and bringing the data up to date. If the original Steig et al statistical techniques turn out to be invalid, then it might be useful to try and fix them. I think the work that the two Jeffs, RomanM, Lucia, and Steve are doing may produce something with better validity, but it is also possible that no amount of massaging of the data will produce a high-confidence result.

I especially liked the way the two Jeffs broke the data down by gridcells before doing statistics. I was horrified when I saw that RegEM doesn't know anything about location. It is obvious with just a glance at the temperature data that it is strongly dependent on location.

Re: Scott Gibson (#24),

I like your attempt to look beyond the mathematical details – which of course are fascinating and also important. As I understand it, the problem for the climate models is that Antarctica seems to be out of phase with the rest of the world. There was warming in the period before 1970, when the global temperature decreased, and slight cooling after 1970 when there was global warming. This is not a pattern expected for the CO2 greenhouse effect. Steig et al were confusing this issue by calculating a trend from “the 1957 Geophysical Year”. So I wish more attention would be paid to linear trends after 1970.

Dear Steve, I am not sure whether you answered my concerns. AVHRR has been flying since 1978, like most satellites:

http://noaasis.noaa.gov/NOAASIS/ml/avhrr.html

so it should be obvious that all data before 1978 – like those right after 1957 – are from extrapolation of terrestrial stations, and there were only a few. So the analysis with 3 nonzero PCs (plus one additive shift, absolute term in the fit) indicates that there were 4 stations that were used to extract the whole maps, and one can never do better if there were only 4 functions of time available between 1957 and 1978, can he?

Do you agree with what I write or not?

I am saying that what you call “filling zeros” for PCs other than the 3-4 first PCs was necessary before 1978 simply because there were no satellites, and a few stations translates to a few PCs for whatever linear prescription you use for the temperatures in all of Antarctica. Does it make sense?

If you want to use satellites with “all PCs” accuracy, you can only work since 1978.

Re: Luboš Motl (#25),

No and yes. There were more than 4 stations used for extrapolation, but all the information is contained in three PCs (if you consider anomalies only). To repeat, the "extrapolated values" are obtained by y(t)=B(t)*x(t), where y(t) is the k-dimensional vector of extrapolated values at time instant t, B(t) is a k×l matrix of rank 3, and x(t) is the l-dimensional vector of terrestrial station temperatures available at time t.

I reconstructed the original satellite data from the post-1982 half of the satellite PCs and compared it to the original. After that I regridded the data according to Jeff C's method and recomputed the trend and the Antarctic distribution of the trend.

http://noconsensus.wordpress.com/2009/02/20/satellite-temperature-trend-regridded/#more-2420

Dear Jean S, I understand the linear algebra but could you please direct me to some links that show which (more than four) stations were available right after 1957? It’s plausible but I would find it really surprising if a small number of stations were truncated into an even smaller number.

Re: Luboš Motl (#28),

Lubos, see e.g. the GISS database:

http://data.giss.nasa.gov/gistemp/station_data/

Click close to South Pole, or the Antarctica Peninsula, and you’ll find the station names at least. I didn’t check how good the individual station records are there.

Re: Luboš Motl (#28),

You should check Steve’s collection of the raw data. Here’s the count of available surface stations as used in Steig et al:

Thanks, Jean, that’s exactly what I wanted, in the best form. Well, that’s really bizarre to truncate to 3/4 PCs in this case.

Lubos and others —

I found Steve’s discussion of Toeplitz matrices last March to be very helpful in understanding PCs, eigenvectors, etc, in a climate context. These began on 3/22/08 with thread #2898, “Toeplitz Matrices and the Stahle Tree Ring Network”, and continued in the following days with threads 2902, 2924, and 2952. The last one has spectacular graphs. (There may be some later ones I’m overlooking).

In that case (Stahle’s Tex-Mex treering network), the PC’s were derived by assuming that correlations between “stations” were driven by a declining function of their pairwise distance (hence, the “Toeplitz” part), and the resulting eigenvectors have spatial interpretations not unlike (bivariate) polynomial curvefitting that would allow unobserved points to be “infilled” by a plausible interpolation. Ending up with 3 eigenvectors would just mean that location does not have much explanatory power beyond common level, plus more or less linear tilt in two orthogonal planes. Approximately quadratic terms creating curvature would require including eigenvectors 4-6.

Somehow, Steig and RegEM do something similar, but don't take distance into account in computing covariances. Instead, they just use the empirical covariances, no matter what the distances involved are. One would expect that close or otherwise similar (e.g. coastal) stations would be relatively correlated, so that the resulting eigenvectors would indicate something like general level, and then regional factors like peninsula, other coastal, W. Ant, and E interior. If only 3 or 4 eigenvectors are used, that does not mean that only 3 or 4 stations were used, but just that all 42 or whatever stations were condensed into 3 or 4 summary variables that capture the great majority of the variation.

One potential danger I see in using unconstrained pairwise covariances as in Steig and RegEM is that just by chance, unrelated stations may have a positive or negative covariance. A distance-driven covariance matrix gets around this.
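A minimal sketch of the distance-driven alternative Hu describes (illustrative only; the station locations and the exponential decay length are arbitrary assumptions of mine): build a covariance matrix purely from pairwise distances and look at its eigenvectors, which come out as smooth level-and-tilt spatial patterns rather than whatever the noisy empirical covariances happen to give.

```r
set.seed(9)
n <- 42
coords <- cbind(runif(n, -1, 1), runif(n, -1, 1))   # stand-in station locations
D <- as.matrix(dist(coords))                        # pairwise distances
C <- exp(-D / 0.5)                                  # assumed exponential distance decay
e <- eigen(C, symmetric = TRUE)
round(e$values[1:6] / sum(e$values), 3)             # a few leading patterns dominate
# e$vectors[, 1:3] give smooth level/tilt patterns over the region
```

By construction, chance covariances between unrelated distant stations cannot enter C, which is the point of the constraint.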

We know that Steig’s AWS recon starts with 3 PC’s constructed from the manned station data and evaluated at the AWS locations, and then substitutes actual AWS data in where available. So my guess would be that the first 3 derived PC’s reflect the manned station information, while Steve’s graph above of AWS PC’s 4-6 are picking up various composites of the AWS data, in declining order of importance.

I think, anyway. 😉

One thing I find really strange about Steig’s AWS recon is that in the end he averages it only over the infilled AWS sites in each region, and not over all the stations in each region. The AWS sites should be supplementing and in-filling the manned stations, not replacing them. And even then, there should be some kind of geographic weighting, so that regions like the Peninsula and “Eastern” Ross Ice shelf with high concentrations of both types of stations shouldn’t be stealing the show.

(By “Eastern”, I mean toward the right and therefore the Eastern hemisphere when Greenwich is oriented toward the top of the map. In fact, this is locally westward.)

Re: Hu McCulloch (#33),

I had to think about this for a bit, but I agree. The farther apart two stations are, the lower their cross-correlations should clearly be. The question, however, is how to scale this. Also, would it make sense to do it proportional to the inverse of the distance squared, as with propagation losses?

Mark

Re: Mark T (#36), Probably better than not having anything, but not without difficulty, either. Two stations that are geographically close may show less coupling than stations that are geographically far apart.

West Antarctica and the bay area are a good example. A station just to the west of the ridge seems to be more tightly coupled to other West Antarctic stations than it is to one just on the east side. So topology matters.

Re: Ryan O (#37), Yes, but the point would be that you weight the covariance (sorry, I'm a signal processing guy so I use "correlation" more often) between geographically distant stations lower because they are physically disjoint, even if their general correlation may be high (which could be misleading). You'd end up with a Toeplitz weighting for the covariance matrix (there, I used it), with descending values as you move away from the diagonal.

I don’t disagree.

Mark

Re: Mark T (#39), I understand what you're saying, but I don't see how that would get around the issue where there are many cases of stations that are geographically far having more real, physical coupling than stations that are geographically closer. You end up weighting the more physical correlation lower and the less physical correlation higher.

I’d think you might want to do it both ways – with and without distance weighting – and analyze the differences for physical causes.

Re: Hu McCulloch (#33),

My understanding is that RegEM treats all data series the same, so that the 3 PCs are constructed from the AWS data as well as from the manned station data. I know that this conflicts with Steig’s statement in the Methods section of his paper that the [manned] weather stations provide predictor data and that the AWS and satellites provide predictand data, but unless he has modified RegEM in some fundamental way I don’t see how this statement can be accurate.

I also had been thinking it was odd that Steig's AWS reconstruction did not include reconstructed temperature anomaly records for the manned stations, for which infilled data would have been generated by RegEM. Perhaps it is related to his stance that the manned stations' role is to provide predictor data. There is also less infilled reconstruction for the manned stations, which in most cases have much less missing data than the AWS.

Hu, I’ve experimented a little with my Toeplitz scripts on this. Unfortunately, my memory doesn’t seem to be as sharp as it once was and I quickly forget things. Nice to have this online notebook.

The Antarctic is interesting because it's sort of circular, but not symmetric. The Chladni functions would be somewhere between circular and square. It's sure not obvious that the first two Steig eigenvectors are physical, as opposed to what one would expect from spatially autocorrelated data with the usual decorrelation. The wave-3 pattern is interesting mathematically because you don't see it in the standard Chladni circular patterns, though I've seen a figure with an asymmetric mode. But the Antarctic isn't a Chladni plate either. Maybe wave-3 can arise as a Chladni pattern by specifying the situation a little differently.

#28 Lubos

In a similar thread at RC, I saw the manned and AWS station data for Steig’s study referenced at the site below. Steve or others here may have knowledge of where the data may reside in a more usable form.

thanks

Edward

http://www.antarctica.ac.uk/met/READER/data.html

Btw, just by inspection, PCs 4-6 in this case exhibit an obvious stationarity issue. If the PCs are expected to explain some truly physical reality, these three need some explaining, as they will definitely change the results in the latter half of the reconstruction. I would imagine, however, that if the decomposition only went back to 1980, their weighting would be different.

This is partially in reply to Ryan O’s comments and questions in the other thread, btw. The plots just happen to be here so I commented here.

Mark

That might be a way around it, and as I noted, I don't disagree that topology matters. The problem with distant locations that do have a physical coupling is that said coupling will necessarily spill over into close locations that don't have a physical coupling. How do you address this? Unfortunately, I mostly deal with situations in which the physical relationships are known to obey certain rules, often based on distance (well, the square of the distance). There are relevant papers in the radar/communication world in which various atmospheric phenomena, such as ducting, actually change this, however. Maybe there's insight from such work?

Mark

Re: Mark T (#42) and Ryan, I am not sure I understand where the spatial thing comes in for the current data set and resulting PCs. If, for example, the Antarctic Peninsula had a discrete climate trend pattern, then shouldn't that show up as all or a significant feature of one of the PCs? Even if a few of the stations were odd-balls, inspection of a rotated factor would show that this bunch of stations all weighted heavily on one of the factors. Don't you have to infer the relationship with some other non-measured variable, at least initially, by inspection? I am probably approaching this too simplistically.

Re: bernie (#45), You’d think so, that is, that there’d be some physical indication in the PCs. I don’t know how to specifically address these issues, btw. I haven’t given them much thought other than the cursory stuff mentioned here.

Mark

Re: bernie (#45), The problem is that as data sets get sparser or shorter (i.e., more missing information), the chances of spurious correlations increase. The idea would be to provide an externally imposed weighting system to discourage spurious correlations between stations that are unlikely to be causally related.

Re: Ryan O (#50), In that case aren’t there rules to determine whether you can actually even use PCA? Normally I would never trust a PCA if the number of cases were not at least twice the number of variables. In this instance if there are 600 monthly records there should be around 1200 real cases — I am not sure of the impact of infilling of values. But again you guys are the pros on this. I am waiting to retire so that I can catch up on all of this again.

The article justifies the truncation into 3 PCs like this:

The monthly anomalies are efficiently characterized by a small number of spatial weighting patterns and corresponding time series (principal components) that describe the varying contribution of each pattern. The results are reproducible using single-season, annual average, split-decade length subsets of the data. The first three principal components are statistically separable and can be meaningfully related to important dynamical features of high-latitude Southern Hemisphere atmospheric circulation, as defined independently by extrapolar instrumental data. The first principal component is significantly correlated with the SAM index (the first principal component of sea-level-pressure or 500-hPa geopotential heights for 20° S–90° S), and the second principal component reflects the zonal wave-3 pattern, which contributes to the Antarctic dipole pattern of sea-ice anomalies in the Ross Sea and Weddell Sea sectors [4,8]. The first two principal components of TIR alone explain [more than] 50% of the monthly and annual temperature variabilities.

…

I am unimpressed by these things. There's no reason for such truncations, and it is actually very likely that the neglected PCs will be relevant at long timescales. Only in the short run, when the climate is in a certain circulation mode etc., are the temperatures linked to the 3 spatial patterns in the way they suggest.
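One quick way to quantify what the truncation discards (a sketch using the eigenvalues computed at the top of the post, assuming `recon_aws` is loaded): compute the share of total variance carried by the PCs beyond the first three.

```r
sv <- svd(recon_aws)
var.share <- sv$d^2 / sum(sv$d^2)   # variance fraction per PC
sum(var.share[1:3])                 # fraction retained by the 3-PC truncation
sum(var.share[-(1:3)])              # fraction discarded
```

Of course, for the archived reconstruction itself this discarded fraction is small by construction; the interesting question is what it looks like for the raw data.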

Re: Luboš Motl (#43),

They have allocated 60 stations for the whole Earth, 2 should be enough for Antarctica.

Re: Luboš Motl (#43),

Luboš, the article *says* these things but provides no evidence supporting any of these claims. The PC3, as others have observed, is astonishingly unphysical and totally at odds with the claim in Nature.

Mark:

Are you thinking about signal processing for backscatter, for instance?

Re: David Jay (#44), Which comment are you replying to? The atmospheric one? If so, not particularly, though that is a relevant atmospheric phenomenon, but a bit different from what I was getting at. I was thinking in particular of the ducting phenomenon, in which signals bend around in the atmosphere (which is different from ionospheric skipping). The inverse R^2 law doesn't directly apply in such cases when calculating the path loss from one point to another.

Mark

It probably affects confidence intervals. I'm pretty sure this is a sticky wicket with many around here, too, hehe.

Mark

Is it possible to do a sensitivity analysis by artificially changing what you think are the major influences on the PCs, to see if the PCs change in predictable response?

Also, re the use of one site to estimate its effect on another, there are textbooks on methods for handling grades in ore deposits. The enclosure for points to be used in a calculation need not be assumed circular and the weighting need not be assumed inverse square. Assumption can sometimes be improved by derivation. Textural features like the dominant direction of terrain features can have an effect, so a search ellipse can be more appropriate than a circle. That old saw that correlation does not have to be causation enters the arguments.