The Climate Audit system upgrade – help needed

Yesterday while at ICCC, I had a chance to talk with Steve McIntyre at length about the state of the Climate Audit server. As many of you know, Climate Audit went down a couple of weeks ago for almost three days due to a hard drive failure; even RAID1 didn’t prevent the downtime. Since then I’ve been thinking about making an upgrade, and with Steve’s blessing yesterday, I’m going to go ahead with the project while he’s traveling. Some of the points we discussed are:

  • Climate Audit has become an important repository of data, with links to its posts and content from thousands of websites worldwide; to lose it would be a tragedy.
  • Due to the complex nature of the CA setup (subfolders, BBS, LaTeX, etc.), we are unable to move to wordpress.com hosting (where WUWT is hosted) without significant work and the likely loss of some features.
  • The current CA configuration goes down about every 2-5 days due to Apache failing and needing a restart.
  • CA in its current form has a lot of legacy code in place that needs to be updated. Doing so on an operational server is always risky.
  • The current CA server configuration was done under quite a bit of duress and pressure. Readers may recall the downtime in 2007 due to the traffic from the Y2K announcement as well as the flooding in the UK which affected the CoLo there. There’s never really been a chance to fine-tune the setup and to test offline to ensure stability. It has always been a rush job to get it back online.

So with those things in mind, Steve and I have decided to upgrade the hardware platform.

I want to establish RAID5 for added security, plus an automated offsite backup, a dual processor, and hot-swap drives. The current platform has fixed drives (requiring de-racking to fix drive issues), a single-core CPU, and RAID1.
Here is the Intel server platform, the SR1530HAHLX (PDF spec sheet), that I want to acquire for the new home of CA. 1U is required because CoLo costs double if we put in a 2U rack unit. It will have three main drives running RAID5, plus a small 2.5″ fixed drive used for an automated local backup. We also want to establish an automated off-site backup.

This platform will tolerate drive failure better and allow for easier servicing once a drive does fail. And since I’ll be able to build it in parallel to the existing operational CA server, install the latest code, fine-tune it, move the content, and finally do a swap, downtime will be minimal.

I hope to accomplish this before Steve returns from his trip.

Here is where you come in. Steve has authorized me to solicit donations to make this happen. The server, drives, etc. will, I estimate, take about $1,800–$2,000 to put together.

About 50 people donating $40 each would be all it would take. Or 20 donating $100 each. Steve and I think this is achievable.

So if you feel so inclined, here is the donate button. I’ll be building the server, which is why my donate button is present below instead of Steve’s; his being out of the country would complicate matters of funds transfer.
To help build a new Climate Audit

Thank you all for your consideration.

– Anthony Watts

UPDATE: We have reached the GOAL! Thanks to everyone who helped with the many donations! Thanks to everyone’s generosity, I will be immediately able to purchase the Intel Server and begin setting up a parallel system. I spoke with Steve McIntyre today at the end of our joint session at ICCC and he also sends his thanks. – Sincerest thanks, Anthony Watts

PS I will provide updates as they occur.

UPDATE2: We have enough to complete the job with a quality server. It has been pointed out to me that I should leave the donations “open” so that we may acquire a “rainy day fund” for emergencies such as drive failure or the costs of software upgrades. Thus if anyone wishes to add to the goal, we certainly won’t turn it away, and your donation will ensure continued operation even in times of faults. For example, with extra cash I will be able to buy some hot-swap drive spares to have at the ready. But please don’t feel obligated in any way. – Anthony

UPDATE3: 3/13 The new server has been ordered, preconfigured with Linux OS and RAID5. ETA is Wednesday 3/18. I have also ordered some spare drives, thanks to everyone’s continued generosity. There will be a fourth internal drive to use as a backup drive, plus we’ll work on getting automated offsite backup as well. So with RAID5, local second drive backup, and offsite backup, the continuance of the CA “lifeforce” has a high probability. – Anthony

Travel Plans

I’m going to be away for most of the next two weeks – I’ll be in New York for a couple of days and then I’m going to Thailand with my wife and daughter to visit one of my sons. I’ll be spotty in internet connection and posting. I’ve asked a couple of regulars to contribute a few threads and to monitor posting while I’m intermittent. Please be especially diligent in not getting involved in food fights or rising to every barb or bait. If a comment breaches blog policies, point it out and ask that an editor snip, but please do not debate it.

I’ll be in New York to present a paper at the Heartland conference. I’d be just as happy to go to a Pew Center conference if I were invited. By presenting a paper at a conference, I do not endorse the views of other contributors nor the sponsor. I will present my own views. And any tailoring of the presentation will be in the direction of challenging the audience rather than trying to reinforce their preconceptions.

I don’t know about this particular conference, but it’s my understanding that the Gavin Schmidts of the world have refused to attend such venues in the past. I don’t understand the purpose of such refusals. I don’t understand what harm could possibly be done by preaching to the heathen. Maybe some of them would be convinced by Gavin.

In part, I’m attending this conference, because, quite frankly, I don’t get many invitations to speak. I’ve only received one invitation to speak to climate scientists at a university (from Judy Curry and Julien Emile-Geay at Georgia Tech) and they were pretty severely criticized for this. (I was invited by an engineering seminar at Ohio State.) Anyway, I’ll guess that henceforth speaking at this conference will feature prominently in all future profiles of me on the internet, but, as I said above, I’d be delighted to speak to the Pew Center or the Sierra Club or the World Wildlife Fund.

After that, I’m going to Thailand for a couple of weeks. The last time I was in Thailand was in 1968, so it’s been a while. On that trip, we went to Cambodia as well – this was while the Viet Nam war was raging next door. I can’t imagine how worried we would be if one of our children were doing something equivalent today. But we were 20, young and foolhardy. We were about the only people at Angkor Wat when we went there; I think that there was one couple from France there as well. The idea that the Killing Fields would take place was, even in retrospect, inconceivable. Anyway, it should be a lot different now.

I’ll try to write a brief comment from New York. I’ll be in touch a bit from Bangkok, but will probably be tired and not too interested in writing.

A Peek behind the Curtain

On Feb 26, Garth Paltridge, Albert Arking and Michael Pook’s report on a re-examination of NCEP reanalysis data on upper tropospheric humidity was published online by Theoretical and Applied Climatology. Upper tropospheric humidity is a critical topic in assessing the strength of water vapor feedbacks – knowledge that is essential to understand just how much temperature increase can be expected from doubled CO2. Paltridge and Arking are both senior climate scientists with lengthy and distinguished publication records. They reported:

The National Centers for Environmental Prediction (NCEP) reanalysis data on tropospheric humidity are examined for the period 1973 to 2007. It is accepted that radiosonde-derived humidity data must be treated with great caution, particularly at altitudes above the 500 hPa pressure level. With that caveat, the face-value 35-year trend in zonal-average annual-average specific humidity q is significantly negative at all altitudes above 850 hPa (roughly the top of the convective boundary layer) in the tropics and southern midlatitudes and at altitudes above 600 hPa in the northern midlatitudes. It is significantly positive below 850 hPa in all three zones, as might be expected in a mixed layer with rising temperatures over a moist surface. The results are qualitatively consistent with trends in NCEP atmospheric temperatures (which must also be treated with great caution) that show an increase in the stability of the convective boundary layer as the global temperature has risen over the period. The upper-level negative trends in q are inconsistent with climate-model calculations and are largely (but not completely) inconsistent with satellite data. Water vapor feedback in climate models is positive mainly because of their roughly constant relative humidity (i.e., increasing q) in the mid-to-upper troposphere as the planet warms. Negative trends in q as found in the NCEP data would imply that long-term water vapor feedback is negative—that it would reduce rather than amplify the response of the climate system to external forcing such as that from increasing atmospheric CO2. In this context, it is important to establish what (if any) aspects of the observed trends survive detailed examination of the impact of past changes of radiosonde instrumentation and protocol within the various international networks.

A few days earlier, on Feb 20, Dessler and Sherwood published a review article in Science on upper tropospheric humidity. This was accompanied by a podcast and a blog article at Grist here. They reported:

Interestingly, it seems that just about everybody now agrees water vapor provides a robustly strong and positive feedback

They made no mention of the pending Paltridge et al results.

OK, climate scientists disagree. What else is new? However, today you get a little peek behind the curtains, courtesy of Garth Paltridge, who sends in the following account of the handling (and rejection) of their article at Journal of Climate.

Gavin and the PC Stories

How many principal components to retain? Recent readers of Climate Audit may not realize that this was an absolute battleground issue of MBH and Wahl and Ammann. In one sense, it was never resolved with MBH back in 2003-2005, but that was before the existence of blogs made it possible to focus attention on problems. So I’m quite fascinated to see this play out under slightly different circumstances.

In the Steig et al case, as shown in the figure below, if one PC is used, there is a negative trend, and as more PCs are added, the trend appears to converge to zero. I haven’t explored the nuances of this calculation and merely present the graphic. Retaining 3 PCs, by a fortunate coincidence, happens to maximize the trend. [Note: These were done using a technique similar to, but not identical to, RegEM TTLS – a sort of regularized truncated SVD, which converges a lot faster. Jeff C has now run RegEM with higher regpar, though not as high as shown here, and has got (these are 10 times my figures, due to decades rather than years): 1: -0.07; 2: +0.15; 3: +0.15; 4: +0.16; 5: +0.12; 6: +0.10; 7: +0.11; 8: +0.11; 9: -0.08; 10: +0.05; 11: -0.02; 12: -0.05. So the max is not precisely at regpar=3, and there’s no need to raise eyebrows at 3 rather than 4. However, the trend does not “converge” to 0.15, but goes negative at higher regpars.]


Fig 1. Trends by retained PCs. (Using truncated SVD to approximate RegEM – this will be discussed some time.)
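The shape of this kind of curve is easy to reproduce in miniature. Here is a minimal, self-contained sketch – synthetic data of my own invention, not Steig’s data or actual RegEM – showing how the trend of a truncated-SVD reconstruction can depend on the number of retained PCs:

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_s = 600, 50                     # time steps x grid cells (synthetic)
t = np.arange(n_t)

# synthetic field: a weak trend confined to a few cells, buried in noise
field = rng.normal(size=(n_t, n_s))
field[:, :5] += 0.004 * t[:, None]

anom = field - field.mean(axis=0)
U, s, Vt = np.linalg.svd(anom, full_matrices=False)

def mean_trend(k):
    """Trend (units per step) of the spatial mean of a rank-k
    truncated-SVD reconstruction of the anomaly field."""
    recon = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.polyfit(t, recon.mean(axis=1), 1)[0]

trends = {k: mean_trend(k) for k in (1, 3, 10, n_s)}
# at full rank the reconstruction returns the data, so trends[n_s] is just
# the trend of the raw spatial mean; lower ranks can differ substantially
```

The point is not the particular numbers but that “trend” is a function of the truncation parameter, which is exactly why the retention choice matters.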

As reported elsewhere, the eigenvectors corresponding to these PCs match very closely what one would expect from spatially autocorrelated series on an Antarctic-shaped disk – a phenomenon that was known in the literature a generation ago. The “physical” explanations provided by Steig et al appear to be flights of fancy – “castles in the clouds” was a phrase used by Buell in the 1970s to describe attempts at that time to attribute meaning to eigenvector patterns generated merely by the geometry. But that’s a different issue than PC retention.

Now let’s turn back the clock a little. As many readers know, some time ago coauthor Bradley credited Mann with “originating new mathematical approaches that were crucial to identifying strong trends”. One of the “new mathematical approaches” was Mannian principal components (though it wasn’t really “principal components”). It had the unique ability to extract hockey-stick-shaped series even from random data – it was that effective at identifying hockey sticks. Mannian principal components were criticized by Wegman and even by the NAS panel. However, in what I suppose is a show of solidarity against such intermeddling, third-party paleoclimate use of Mann’s PC1 has actually increased since these reports (Hegerl, Juckes, Osborn and Briffa); Mann’s PC1 even occurs in one of the IPCC AR4 spaghetti graphs.

In the case of the critical North American tree ring network, Mann’s powerful data mining method worked differently than with random red noise. It found Graybill bristlecone data and formed the PC1 from this data. In the red noise case, the method aligned and inverted series to get to a HS-shaped PC1; in the NOAMER tree ring case, all it had to do was line up the Graybill bristlecones.
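The mining effect is easy to demonstrate with toy data. Below is a minimal sketch – all numbers and series are my own illustration, not MBH code or proxies: fifty white-noise “proxies” plus one series given an artificial late-period ramp. PCA centered on the short late segment (the Mannian shortcut) promotes the ramped series to PC1; conventional full-period centering does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n_t, n_p, cal = 581, 50, 79            # years, proxies, "calibration" length

X = rng.normal(size=(n_t, n_p))        # pure-noise proxies
X[-cal:, 0] += np.linspace(0, 6, cal)  # one series with a late-period ramp

def pc1_loadings(X, center):
    """Loadings of the first principal component after subtracting `center`
    (full-period mean for conventional PCA, short-segment mean for the
    decentered variant)."""
    _, _, Vt = np.linalg.svd(X - center, full_matrices=False)
    return Vt[0]

v_full = pc1_loadings(X, X.mean(axis=0))          # conventional centering
v_short = pc1_loadings(X, X[-cal:].mean(axis=0))  # short-segment centering

# under short-segment centering the ramped series dominates PC1;
# under full centering it is just one series among many
```

The decentering rewards any series whose calibration-period mean departs from its long-term mean, which is exactly how the bristlecones get “found”.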

In MBH98, Mann retained 2 PCs for the AD1400 North American network in dispute. In 2003, no one even knew how many PCs Mann retained for the various network/step combinations and Mann refused to say. Trying to guess was complicated by untrue information (e.g. Mann’s “159 series”). In our 2003 Materials Complaint, we asked for a listing of retained PCs and this was provided in the July 2004 Corrigendum Supplementary Information. The only published principle for retaining tree ring PCs was this:

Certain densely sampled regional dendroclimatic data sets have been represented in the network by a smaller number of leading principal components (typically 3–11 depending on the spatial extent and size of the data set). This form of representation ensures a reasonably homogeneous spatial sampling in the multiproxy network (112 indicators back to 1820).

This makes no mention of Preisendorfer’s Rule N – a rule mentioned in connection with temperature principal components. Code archived for the House Committee in 2005 evidences use of Preisendorfer’s Rule N in connection with temperature PCs – but no code evidencing its use in connection with tree ring PCs was provided then. Nor, given the observed retentions (to be discussed below) does it seem possible that this rule was actually used.

In the absence of any reported principle for deciding how many tree ring PCs to retain, for our emulation of MBH98, prior to the Corrigendum SI, we guessed as best we could, using strands of information from here and there, and after July 2004, used the retentions in the Corrigendum SI. For the AD1400 NOAMER network, we had used 2 PCs (the number used in MBH98 for this network/step combination) right from the outset. However, if you used the default settings of a standard principal components algorithm (covariance matrix), the bristlecones got demoted to the PC4. Instead of contributing 38% of the variance, they yielded less than 8% of the variance. (Using a correlation matrix, they only got demoted to the PC2 – something that we reported in passing in MM2005 EE, but which others paid a lot of attention to later in the piece.)
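The sensitivity to covariance- vs correlation-matrix PCA is a general phenomenon, easy to see with toy data (again my own illustration, nothing to do with the actual tree ring networks): a single high-variance series owns the covariance PC1, but is demoted once the series are standardized.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
common = rng.normal(size=n)            # signal shared by six series

X = np.column_stack(
    [common + 0.3 * rng.normal(size=n) for _ in range(6)]
    + [10.0 * rng.normal(size=n)]      # one independent high-variance series
)

def pc1_loadings(X, standardize):
    """Leading PC loadings from covariance (False) or correlation (True) PCA."""
    A = X - X.mean(axis=0)
    if standardize:
        A = A / A.std(axis=0)          # unit variance -> correlation matrix
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[0]

v_cov = pc1_loadings(X, standardize=False)
v_cor = pc1_loadings(X, standardize=True)
# covariance PCA: the high-variance series owns PC1;
# correlation PCA: the shared signal owns PC1 and the outlier is demoted
```

Which series end up in which PC is thus partly an artifact of the scaling convention, not a property of the data alone.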

Using the retention schedule of the Corrigendum SI (2 PCs), this meant that two NOAMER covariance PCs were retained – neither of which was imprinted by the bristlecones. So in the subsequent regression phase, there wasn’t a HS to grab, and the results were very different from MBH. Mann quickly noticed that the bristlecones were in the PC4 and mentioned this in his Nature reply and in his December 2004 realclimate post trying to preempt our still unpublished 2005 articles (where we specifically report this phenomenon). We cited a realclimate post in MM2005-EE, by the way.

In the regression phase as carried out in MBH98, it didn’t matter whether the bristlecones got in through the PC4 or the PC1, as long as they got in. In his 2nd Nature reply, Mann argued that application of Preisendorfer’s Rule N to the NOAMER AD1400 network entitled him to revise the number of retained PCs – the “right” number of PCs to retain was now said to be 5. The argument originally presented in his 2nd Nature reply became a realclimate post in late 2004.

In a couple of the earliest CA posts, Was Preisendorfer’s Rule N Used in MBH98 Tree Ring Networks? (see also here), I replicated Mann’s calculation for the North American AD1400 network and then tested other network/calculation-step combinations to see if the observed PC retention counts could be generated by Rule N applied to each network. It was impossible to replicate the observed counts using this alleged rule. Some differences were extreme – it was impossible to see how Rule N could result in 9 retained PCs for the AD1750 Stahle/SWM network and 1 retained PC for the AD1450 Vaganov network.
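For readers unfamiliar with it, Preisendorfer’s Rule N is just a Monte Carlo benchmark: retain only those eigenvalues that exceed the corresponding quantile of eigenvalues from random matrices of the same size. A minimal sketch with toy data of my own (not any MBH network):

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_s = 200, 20

# toy field: two genuine modes buried in unit noise
X = (0.9 * np.outer(rng.normal(size=n_t), rng.normal(size=n_s))
     + 0.7 * np.outer(rng.normal(size=n_t), rng.normal(size=n_s))
     + rng.normal(size=(n_t, n_s)))

def eig_fracs(M):
    """Normalized eigenvalue spectrum of the centered data matrix."""
    s = np.linalg.svd(M - M.mean(axis=0), compute_uv=False)
    lam = s ** 2
    return lam / lam.sum()

obs = eig_fracs(X)

# Rule N: compare each observed eigenvalue with the 95th percentile of the
# same-rank eigenvalue from random matrices of identical shape
sims = np.array([eig_fracs(rng.normal(size=(n_t, n_s))) for _ in range(100)])
n_retain = int((obs > np.percentile(sims, 95, axis=0)).sum())
# with this toy field, Rule N retains the two genuine modes and no more
```

Given an actual network and a claimed rule, the test in the posts above was exactly this kind of check: does the rule, applied to the data, reproduce the retention counts actually reported? It didn’t.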

Not that the “community” had the faintest interest in whether any descriptions of MBH methodology were true or not. However, my guess is that some present readers who are scratching their heads at Gavin’s “explanations” of retained PCs in Steig et al will be able to appreciate the absurdity of Mann’s claims to have made a retention schedule using Rule N. I have no idea how it was actually made – but however it was done, it wasn’t done using the procedure that supposedly rationalized going down to the PC4 in the NOAMER network.

Some of Gavin’s new comments seem to be only loosely connected with the actual record. He said:

Schneider et al (2004) looked much more closely at how many eigenmodes can be usefully extracted from the data and how much of the variance they explain. Their answer was 3 or possibly 4. That’s just how it works out.

Does Gavin think that nobody’s going to check? Schneider et al 2004 is online here. Schneider et al reports on a PC analysis of the T_IR data, stating:

Applying PCA to the covariance matrix of monthly TIR anomalies covering the Antarctic continent results in two modes with distinct eigenvalues that meet the separation criteria of North et al. (1982). The leading mode explains 52% of the variance in TIR, while the second mode accounts for 9% of the variance. The first EOF, shown in Fig. 1a as a regression of TIR anomaly data onto the first normalized principal component (TIR -PC1, Fig. 1b) is associated most strongly with the high plateau of East Antarctica. Locally, high correlations in East Antarctica indicate that up to 80% of the variance in TIR can be explained by this first mode, as determined by r2 values. More moderate correlation of the same sign occurs over West Antarctica.

The second EOF (Fig. 1c) is centered on the Ross Ice Shelf and on the Marie Byrd Land region of the continent, where 40-60% of the TIR variance is explained. Most of West Antarctica is of the same sign, but the pattern changes sign over the Ronne-Filchner ice shelf (at 60°W) and most of East Antarctica. Some coastal areas near 120°E have the same sign as West Antarctica. Only a small fraction of the variance in East Antarctic temperatures can be explained by mode 2.
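The North et al. (1982) separation criterion that Schneider et al invoke is worth spelling out, since it is the pivot of the disagreement: an eigenvalue λ has a sampling error of roughly λ·sqrt(2/N), and neighbouring modes count as “distinct” only if the gap between their eigenvalues exceeds that error. A sketch, using eigenvalue fractions loosely patterned on Schneider et al’s 52% and 9% (the tail values and the effective sample size N are my own illustrative assumptions):

```python
import numpy as np

def north_distinct(eigvals, n_eff):
    """North et al. (1982) rule of thumb: eigenvalue lambda_k has sampling
    error ~ lambda_k * sqrt(2 / n_eff); mode k is separated from mode k+1
    only if the gap between their eigenvalues exceeds that error."""
    lam = np.asarray(eigvals, float)
    err = lam * np.sqrt(2.0 / n_eff)
    return lam[:-1] - lam[1:] > err[:-1]

# fractions patterned on Schneider et al (52%, 9%, then a close tail);
# n_eff = 100 is an assumed effective sample size
sep = north_distinct([0.52, 0.09, 0.05, 0.045], n_eff=100)
# -> modes 1 and 2 are separated from their neighbours; mode 3 is not
```

On this criterion, a large gap after the second eigenvalue yields exactly “two modes with distinct eigenvalues” – not three or four.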

The two EOFs are illustrated in Schneider et al Figure 1 and look virtually identical to the eigenvectors that I plotted (from the PC analysis of the AVHRR data), as shown below (you need to mentally change blue to red for the PC1 – the sign doesn’t “matter”).


From Schneider et al 2004 Figure 1.

“Two modes with distinct eigenvalues that meet the separation criteria of North et al. (1982)” doesn’t mean the same thing to me as

3 or possibly 4. That’s just how it works out

I guess it takes a certain sort of expertise to understand this equivalence. Gavin also says, in reply to Ryan O:

Since we are interested in the robust features of the spatial correlation, you don’t want to include too many PCs or eigenmodes (each with ever more localised structures) since you will be including features that are very dependent on individual (and possibly suspect) records

Unless, of course, they are bristlecones.

Lest we forget, Wahl and Ammann had their own rationalization for the bristlecones. They argued that if you keep adding PCs until you include the bristlecones, the results “converge” – an interesting argument to keep in mind, given the apparent “convergence” of Steig results to no trend as more PCs are added. Wahl and Ammann:

When two or three PCs are used, the resulting reconstructions (represented by scenario 5d, the pink (1400–1449) and green (1450–1499) curve in Figure 3) are highly similar (supplemental information). As reported below, these reconstructions are functionally equivalent to reconstructions in which the bristlecone/foxtail pine records are directly excluded (cf. pink/blue curve for scenarios 6a/b in Figure 4). When four or five PCs are used, the resulting reconstructions (represented by scenario 5c, within the thick blue range in Figure 3) are virtually indistinguishable (supplemental information) and are very similar to scenario 5b. The convergence of results obtained using four or five PCs, coupled with the closeness of 5c to 5b, indicates that information relevant to the global eigenvector patterns being reconstructed is no longer added by higher-order PCs beyond the level necessary to capture the temporal information structure of the data (four PCs using unstandardized data, or two PCs using standardized data).

The Wahl and Ammann strategy was condemned by Wegman as “having no statistical integrity”. Wegman:

Wahl and Ammann [argue] that if one adds enough principal components back into the proxy, one obtains the hockey stick shape again. This is precisely the point of contention…

A cardinal rule of statistical inference is that the method of analysis must be decided before looking at the data. The rules and strategy of analysis cannot be changed in order to obtain the desired result. Such a strategy carries no statistical integrity and cannot be used as a basis for drawing sound inferential conclusions.

Again this proved not to be an incidental issue. Wahl and Ammann’s summary of the procedure – the one said by Wegman to have “no statistical integrity” – said:

“when the full information in the proxy data is represented by the PC series [i.e. enough to get the bristlecones in], the impact of PC calculation methods on climate reconstruction in the MBH method is extremely small… a slight modification to the original Mann et al. reconstruction is justifiable for the first half of the 15th century (∼+0.05–0.10◦), which leaves entirely unaltered the primary conclusion of Mann et al.”…

It was this conclusion that was adopted by the IPCC as the last word on the entire episode:

The McIntyre and McKitrick 2005a,b criticism [relating to the extraction of the dominant modes of variability present in a network of western North American tree ring chronologies, using Principal Components Analysis] may have some theoretical foundation, but Wahl and Amman (2006) also show that the impact on the amplitude of the final reconstruction is very small (~0.05°C).

So however trivial these matters may seem, there’s lots of fairly intricate debate in the background. I, for one, welcome the entry of another data set.

The real answer is, of course, that you don’t decide whether or not to use bristlecones by Preisendorfer’s Rule N. (See http://www.climateaudit.org/?p=296 or http://www.climateaudit.org/?p=2844.) But it’s sort of fun seeing if they can keep their stories straight.

When, after the agreeable fatigues of solicitation, Mrs Millamant …

While I’m often described as a “statistician”, as that’s a word that people understand (or think that they understand), I think of what I do more as “data analysis”. Academic statisticians are interested in different sorts of things than interest me. I have some styles, habits and practices for approaching new data sets, but they probably derive more from osmosis from geologists looking for anomalies on geophysical maps than anything that you’d learn in a statistics course (Hans Erren understands this exactly).

I ran across two articles by Jan de Leeuw (1998, 1994), a prominent applied statistician in the social sciences, written towards the end of his career and philosophizing about what applied statisticians actually did (or could do), and I was struck by the aptness of many of his observations for present-day climate science.

de Leeuw has a nice turn of phrase, so the articles are fairly lively. The abstract to de Leeuw 1994 reads:

When, after the agreeable fatigues of solicitation, Mrs Millamant set out a long bill of conditions subject to which she might by degrees dwindle into a wife, Mirabell offered in return the condition that he might not thereby be beyond measure enlarged into a husband. With age and experience in research come the twin dangers of dwindling into a philosopher of science while being enlarged into a dotard.

As someone who’s past his best-before date, it was impossible for me to resist reading the article. Here’s another zinging epithet from de Leeuw:

Science is presumably cumulative. This means that we all stand, to use Newton’s beautiful phrase, “on the shoulders of giants”. It also means that we… stand on top of a lot of miscellaneous stuff put together by thousands of midgets.

Another nice epithet:

It is a truism that statistics cannot establish causality of relationship. It is quite incredible, by the way, that most people who quote this result are engaged on the very same page in trying to accomplish what they have just declared to be impossible.

de Leeuw’s reflections tie statistics to “data analysis” and de Leeuw 1988 ends:

Statistics is data analysis. This does not mean that we want to replace the academic discipline “statistics” by the academic discipline “data analysis”, it merely means that statistics has always been data analysis.

Substantively, de Leeuw takes special note of the following form of data analysis:

We usually do not want a small and uninteresting perturbation of our data to have a large effect on the results of our technique.

and

Classical statistics has always studied stability by using standard errors or confidence intervals. Gifi thinks this is much too narrow and other forms of stability are important as well.

This latter point is highly relevant to the sort of analysis that we’ve explored here. This is (of course) a staple CA methodology, though I look for such perturbations quite differently than the Team does. My typical observation is that a typical Team reconstruction is not “robust” to bristlecones, to Yamal vs the Polar Urals Update, etc. These issues seem so humdrum that it’s hard to imagine that intelligent people cannot instantly grasp the point. However, in reply, we see longwinded expositions that their results are “robust” to something that’s always somewhat different from the issue in question – that they are “robust” to whether MBH proxies are weighted uniformly or with MBH98 weights (who cares?). Or that, if you have both bristlecones and Gaspé, they are “robust” to take-one-away sensitivity. Or that they are robust to Tiljander (if bristlecones are in) or robust to dendro (if upside-down Tiljander is in).
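The kind of perturbation test at issue is simple to sketch (toy data of my own, nothing to do with any actual proxy network): average twenty noise “proxies”, one of which carries a strong modern ramp, and see how the composite trend responds to deleting each series in turn.

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_p = 400, 20
t = np.arange(n_t)

proxies = rng.normal(size=(n_t, n_p))          # noise "proxies"
proxies[-100:, 0] += np.linspace(0, 8, 100)    # one series with a modern ramp

def trend(y):
    """OLS trend (units per step)."""
    return np.polyfit(t, y, 1)[0]

full = trend(proxies.mean(axis=1))
impact = [abs(full - trend(np.delete(proxies, j, axis=1).mean(axis=1)))
          for j in range(n_p)]
# the composite is "robust" to deleting any series except the ramped one,
# whose removal shifts the trend several times more than any other deletion
```

A claim of “robustness” that quietly excludes the one influential series from the sensitivity test is exactly the pea-and-thimble move described above.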

I’ve paid negligible attention here to the calculation of “standard errors or confidence intervals”, other than to deconstruct and demystify wildly over-confident Team claims. Some CA readers (though not me) routinely demand error bars for calculations where no one really knows how to calculate them. If you don’t know how to calculate error bars, what’s the point? (The flip side: if you only know how to calculate a “minimum” error bar, and the actual error bar is much greater, should you be graphically presenting this “minimum” error bar, given the risk that readers may interpret it as a usual error estimate?)
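One concrete way a “minimum” error bar arises: the textbook OLS trend standard error assumes independent residuals, and autocorrelated series violate that badly. A quick Monte Carlo sketch (the parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, phi, n_sim = 200, 0.8, 500
t = np.arange(n)
sxx = ((t - t.mean()) ** 2).sum()

def ar1(n):
    """AR(1) noise with autocorrelation phi (no true trend)."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + e[i]
    return x

def slope_and_naive_se(y):
    """OLS trend and the iid-residual ("minimum") standard error of the slope."""
    b, a = np.polyfit(t, y, 1)
    resid = y - (a + b * t)
    s2 = (resid @ resid) / (n - 2)
    return b, np.sqrt(s2 / sxx)

slopes, ses = np.array([slope_and_naive_se(ar1(n)) for _ in range(n_sim)]).T

# the true sampling spread of the trend is roughly
# sqrt((1 + phi) / (1 - phi)) = 3 times the average naive standard error
```

With phi = 0.8, the iid formula reports error bars roughly a third of the actual sampling spread – a “minimum” error bar in exactly the sense above.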

I won’t bother summarizing the articles because they are easy reads (See links in refs below), but here are a few quotes from de Leeuw that caught my eye.

It is not true that people first formulate a model, then collect data, and then perform statistics… The model gets adapted in the process, various modifications are tried and rejected, new parameters are introduced, and so on… the decisions made by the scientist cannot be formalized before the data are collected…

as we have seen many times in the non-scientific world, rules and laws that continue to exist although nobody obeys them and takes them seriously merely lead to hypocrisy…

Another problem, which is related to the first one, is that the models typically proposed by statistics are not very realistic. Especially in multivariate situations, and especially in the social and behavioral sciences, the assumptions typically made in the standard statistical models do not make much sense. Data are usually not even approximately normally distributed, replications are often not independent, regressions are not linear, items are not Rasch, and so on…

the prescriptions of classical statistics easily leads to hypocrisy. The confidence intervals and tests of hypotheses of statistics are valid only if the model is true. Because we know that the model is never true, not even for an idealized population, it is not clear what we must do with this statistical information. This does not mean, by the way, that the models and corresponding techniques are useless. On the contrary, most of the established statistical techniques are also very useful data analysis techniques. Otherwise they would not have survived. We merely must interpret our use of them in a different way than we are used to…

Statistical statements are not about the data that we have observed, but they are about a hypothetical series of replications under exactly identical conditions. It seems to me that such statements are not interesting for many social and behavioral science situations, because the idea of independent replications is irrelevant. Different individuals or societies or historical periods are not replications from some sampling universe, they are essentially unique. There is no need to generalize to a hypothetical population. All we can require in situations like these is an appropriate description or summarization of the data which illustrates the points the scientist wants to make and which documents the choices that have been made…

There used to be a time when statisticians and their cronies, the methodologists, always complained that they were consulted too late. Scientists only arrived at their offices after the data had been collected, i.e. after the damage was done. The implication was that a much better study would have resulted if the statistician had been consulted earlier…

The statistical priesthood on the other hand has two counter arguments. The first one is that you must assume something, otherwise you can do nothing. The appropriate answer here is that this is nonsense. I can compute a mean value and I can draw a straight line through a cloud of points without assuming anything…

Originally, of course, statistics was descriptive… the emphasis on inference and cookbook forms of instant rationalism becomes prominent with the Neyman-Pearson, decision theoretical, and Bayesian schools…

In order to prevent possible misunderstandings, we emphasize that the information that z = 1.96 is a useful descriptive statement, often more useful than the statement that the difference in means is 4.89…

In fact it seems to be the case that the social sciences clearly illustrate that there is nothing inherently cumulative and self-correcting about the development of any one particular science…

Our conclusions so far, on the basis of the above, can be summarized quite briefly. The task of statistics is to describe the results of empirical investigations and experiments in such a way that the investigator can more easily make his predictions and generalizations. Thus it is not the task of statistics to make generalizations. Statistical inference, whatever it is, is not useful for empirical science. Many statistical procedures are very useful, many statistical measures of fit provide convenient scales of comparison, and many statistical models provide interesting theoretical illustrations and gauges with which we can compare our actual data. But generalization, prediction, and control are outside of statistics, and inside the various sciences. Statistics has given us many useful tools and scales, but it has not given us a methodology to make the appropriate inductions…

Suppose we compare correlation matrices in terms of some numerical criterion, which can be the determinant, the eigenvalues, multiple or canonical correlations, or whatever…

Basing techniques on statistical models is an extremely useful heuristic device. Many other useful heuristic devices exist, for example those based on graphs and pictures. The statistical methodology “behind the techniques” that is usually taught to harmless and unsuspecting scientists is a confusing and quite nonsensical collection of rituals. Many of the techniques work, quite beautifully, but this is despite of and certainly independent of this peculiar philosophy of statistics…

This is supposedly “sticking out one’s neck”, which is presumably the macho Popper thing to do. There are various things problematic with the prescription. They are by now tedious to repeat, but here we go anyway. In the first place, if you follow the prescription and your data are any good, your head gets chopped off. In the second place, because people know their head will get chopped off, nobody follows the prescription. They collect data, modify their model, look again, stick their neck out a tiny bit, modify their model again and finally walk around with a proud look on their face and a non-rejected model in their hands, pretending to have followed the Popperian prescription.

PS: I confess that I “adapted” the first epithet a little. de Leeuw’s own phrase had, perhaps, a different nuance:

Science is presumably cumulative. This means that we all stand, to use Newton’s beautiful phrase, “on the shoulders of giants”. It also means, fortunately, that we stand on top of a lot of miscellaneous stuff put together by thousands of midgets.

References:
de Leeuw, J. 1994. Statistics and the sciences. UCLA Statistics Preprints: 1–20. http://repositories.cdlib.org/cgi/viewcontent.cgi?article=1058&context=uclastat
de Leeuw, J. 1988. Models and techniques. Statistica Neerlandica 42: 91–98. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.3338

Steig 2009’s Non-Correction for Serial Correlation

https://i0.wp.com/faculty.washington.edu/steig/nature09data/cover_nature.jpg

In a story featured on the cover of Nature, Eric J. Steig, David P. Schneider, Scott D. Rutherford, Michael E. Mann, Josefino C. Comiso and Drew T. Shindell report to have found “significant warming” that “extends well beyond the Antarctic Peninsula to cover most of West Antarctica, an area of warming much larger than previously reported.” (“Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year”, Nature, Jan 22, 2009).

Specifically, they state that

We find that West Antarctica warmed between 1957 and 2006 at a rate of 0.17 ± 0.06°C per decade (95% confidence interval). Thus, the area of warming is much larger than the region of the Antarctic Peninsula. The peninsula warming averages 0.11 ± 0.04°C per decade. We also find significant warming in East Antarctica at 0.10 ± 0.07°C (1957-2006). The continent-wide trend is 0.12 ± 0.07°C per decade. (p. 460)

However, in another recent paper, Santer et al. (2008) point out that “In the case of most atmospheric temperature series, the regression residuals … are not statistically independent…. This persistence reduces the number of statistically independent time samples.” Such a reduction of the effective sample size can cause Ordinary Least Squares (OLS) standard errors and confidence intervals to be too small, and the significance of coefficients to be overstated.

Steig commendably provides a detailed table of the paper’s Thermal Infrared (TIR)-based temperature reconstruction on his University of Washington webpage, which allows these trends to be re-estimated and checked for serial correlation. In fact, there is substantial AR(1) serial correlation, but the authors have made no correction for it, despite their claim to the contrary in their online Supplementary Information (SI).

When the standard errors reported by Steig et al. are corrected for serial correlation, using either the standard method or a simplified method used by Santer et al., the reported trends remain statistically significant, though not at as high a level as reported in the paper.
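The effective-sample-size adjustment described by Santer et al. is straightforward to apply to any trend. Here is a minimal sketch on synthetic AR(1) data (the series, the 0.6 persistence parameter, and the trend here are illustrative, not the actual Steig reconstruction):

```python
import numpy as np

def trend_with_ar1_correction(y):
    """OLS trend with the slope standard error inflated for AR(1)
    residuals, via the effective sample size n_eff = n(1-r1)/(1+r1)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones(n), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # naive OLS standard error of the slope
    s2 = resid @ resid / (n - 2)
    se_ols = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    # lag-1 autocorrelation of the residuals
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # persistence shrinks the number of independent samples
    n_eff = n * (1 - r1) / (1 + r1)
    se_adj = se_ols * np.sqrt((n - 2) / max(n_eff - 2, 1.0))
    return beta[1], se_ols, se_adj, r1, n_eff

# synthetic series: a small trend plus AR(1) noise
rng = np.random.default_rng(0)
n = 600  # e.g. 50 years of monthly data
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.6 * eps[i - 1] + rng.normal()
y = 0.001 * np.arange(n) + eps

slope, se_ols, se_adj, r1, n_eff = trend_with_ar1_correction(y)
print(f"slope={slope:.5f}  OLS se={se_ols:.5f}  adjusted se={se_adj:.5f}  r1={r1:.2f}")
```

When the residual lag-1 autocorrelation is positive, the adjusted standard error is wider than the naive OLS one, which is exactly why a trend can remain significant but at a lower level than reported.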

Continue reading

Buell: ‘Castles in the Clouds’

CA reader hfl, who cited Buell’s documentation of the dependence of principal component patterns on domain shapes, has sent me a scanned pdf version, now available here. It concludes by observing that analyses that fail to consider this phenomenon (and there is ample evidence that Steig et al falls into this category) “may well be on a scientific level with the observations of children who see castles in the clouds”. [Update: Here is a rendering by a CA reader (now using Lucy Skywalker’s hockeystick version)]

Here is hfl’s summary:

Forgive me if this has been discussed in past threads on PCA (although I couldn’t find it in a quick search of the site), but it’s worth noting that the issue of principal component pattern dependence on domain shape is (or was) well known within the atmospheric sciences community. It was first documented by C. Eugene Buell in a 1975 paper published in the Proceedings of the Fourth Conference on Probability and Statistics in Atmospheric Sciences. Buell’s work focused on square/rectangular domains, particularly because the latter approximated the shape of the conterminous U.S. This was followed by a second paper at the Sixth P&S Conference in 1979 which states: “When a region with well defined boundaries is concerned, the EOF’s computed over this region are expected to be very strongly influenced by the geometrical shape of the region and to a large extent independent of where the region is located. As a consequence, the interpretation of the topography of the EOF’s in terms of geographical area and associated meteorological phenomena should be looked on with suspicion unless the influence of the effect of the shape of the region has been completely accounted for. Otherwise, such interpretations may well be on a scientific level with the observations of children who see castles in the clouds.”

Buell’s work generated considerable discussion within the atmospheric sciences literature because PCA (or, as they referred to it, EOF [empirical orthogonal function]) analysis was in widespread use. Mike Richman, in a paper published in the Journal of Climatology (1986) entitled “Rotation of Principal Components”, made the case that component rotation appears to eliminate the problem of domain shape dependence. Richman’s paper is worth reading in that it reviews and considers nearly all of the important work that had been done using PCA on meteorological and climatological data up to that time, including work by Gerry North and others familiar to CA readers. So these problems have been well documented and understood for a long time. Like so many other elements of meteorology and climatology, though, modern climate science appears to have forgotten some important statistical insights produced by its own practitioners . . . indeed, some of the practitioners themselves appear to have forgotten what they wrote.
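Buell’s observation that the EOFs are “to a large extent independent of where the region is located” can be checked directly: for an isotropic correlation model, translating the grid leaves the eigenstructure unchanged, so the spectrum is a property of the domain’s shape alone. A minimal sketch (Python rather than the R used elsewhere on this blog; the decorrelation scale is an arbitrary choice):

```python
import numpy as np

def eof_eigenvalues(points, scale=5.0):
    """Eigenvalues of an isotropic exponential correlation matrix
    over a set of grid points (a stand-in for an EOF analysis)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    corr = np.exp(-d / scale)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

# a 10 x 6 rectangular grid ...
x, y = np.meshgrid(np.arange(10.0), np.arange(6.0))
grid = np.column_stack([x.ravel(), y.ravel()])

# ... and the identical grid shifted far away
shifted = grid + np.array([1000.0, -500.0])

ev1 = eof_eigenvalues(grid)
ev2 = eof_eigenvalues(shifted)
print(np.allclose(ev1, ev2))  # True: the spectrum depends only on the shape
```

The pairwise distances are unchanged by the translation, so the correlation matrix, and hence every eigenvalue and eigenvector pattern, is identical, which is the shape-dependence Buell warned about.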

Nierenberg re Schmidt re McKitrick and Michaels

Nicolas Nierenberg has taken a look here at Gavin Schmidt’s auditing of McKitrick and Michaels. He previously reported here on the analysis:

Anyway I have written an analysis of spatial autocorrelation as it relates to S09 and MM07. My conclusion is that the primary result in MM07 was not affected by spatial autocorrelation, which is in line with Dr. McKitrick’s follow up paper on the subject. In addition I am able to explain the spurious results found in S09 using Model E data by showing that it is caused by spatial autocorrelation. This was Dr. Schmidt’s theory in that paper. Through this process I show that the results of S09 while interesting don’t contradict the findings of MM07.

I have not personally examined any of the papers in question (McK-Mic or Schmidt 2009) and have not parsed Nicolas’ analysis either. At this point I’m providing a pointer to Nicolas’ article and suggest that interested readers comment at Nicolas’ blog rather than here.

Upside Down Tiljander in Japan

Some Japanese articles have been in the news recently. CA readers will be interested in the fact that CA was cited (thanks to a CA reader for the heads up). Here’s a graphic from their SI showing differences between Gaspé versions. As CA readers know, similar discrepancies occur for bristlecones between Ababneh and Graybill or between the Polar Urals update and Yamal – but you can predict the version used in Team reconstructions with almost total accuracy through a very simple algorithm. 🙂 Anyway, I sort of like the look of the citation in Japanese and thought I’d share it with you.

They also circle the uptick in the upside-down Tiljander series, which we discussed here, and again it looks kinda cool in Japanese.

In this context, I thought that I’d briefly review the PNAS exchange on this topic. I reported that the Mann 2008 graphic was upside down from the orientation in the original study, so that the HS goes down in the 20th century. The original authors (Tiljander et al) discounted the 20th century portion as compromised by agriculture, ditches and bridges, and so the increased varve thicknesses were not considered to be evidence of global cooling.

Upside down proxies are obviously a bad thing in CPS reconstructions (one of two legs in Mann 2008); and non-climatic contamination is a bad thing for correlation based reconstructions.

We referred to this in our PNAS comment as follows:

Their non-dendro network uses some data with the axes upside down, e.g. Korttajarvi sediments, which are also compromised by agricultural impact (Tiljander, pers. comm.)

To which Mann replied:

The claim that ‘‘upside down’ data were used is bizarre. Multivariate regression methods are insensitive to the sign of predictors. Screening, when used, employed one-sided tests only when a definite sign could be a priori reasoned on physical grounds. Potential nonclimatic influences on the Tiljander and other proxies were discussed in the SI, which showed that none of our central conclusions relied on their use.

I think that even Mann sympathizers should not accept this response. The claim that “upside down” data was used may be “bizarre”, but it’s also true. You can see that the data was used upside down by comparing Mann’s own graph with the orientation of the original article, as we did last year. In the case of the Tiljander proxies, Tiljander asserted that “a definite sign could be a priori reasoned on physical grounds” – the only problem is that their sign was opposite to the one used by Mann.

Mann says that multivariate regression methods don’t care about the orientation of the proxy. But that doesn’t solve the problem for Mann. There are two methods – CPS and EIV. CPS methods directly depend on the orientation, and the upside-down data are used directly in the CPS recons. In the regression (EIV) method, the data is also used upside down: the meatgrinder picks up a spurious correlation between agricultural ditches and the proxy and assigns the wrong orientation to the series in the EIV reconstruction as well. All one needs to do is follow the series through.
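The distinction is easy to demonstrate on toy data: in a least-squares regression, flipping a predictor’s sign merely flips its coefficient and leaves the fitted values unchanged, whereas a CPS-style composite (standardize and average) changes outright. A sketch with synthetic series, not the actual proxies:

```python
import numpy as np

rng = np.random.default_rng(1)
target = rng.normal(size=50)
proxy = 0.8 * target + 0.3 * rng.normal(size=50)
other = 0.5 * target + 0.5 * rng.normal(size=50)

def standardize(x):
    return (x - x.mean()) / x.std()

# regression: fitted values are identical whether the proxy is flipped or not
X = np.column_stack([np.ones(50), proxy, other])
Xf = np.column_stack([np.ones(50), -proxy, other])
fit = X @ np.linalg.lstsq(X, target, rcond=None)[0]
fit_flipped = Xf @ np.linalg.lstsq(Xf, target, rcond=None)[0]
print(np.allclose(fit, fit_flipped))  # True: regression is sign-insensitive

# CPS-style composite: the average of standardized series changes
cps = (standardize(proxy) + standardize(other)) / 2
cps_flipped = (standardize(-proxy) + standardize(other)) / 2
print(np.allclose(cps, cps_flipped))  # False: orientation matters
```

So Mann’s “insensitive to the sign of predictors” defense can only apply, at best, to the regression leg, not to CPS; and, as argued above, even in the regression leg a spurious correlation can lock in the wrong orientation.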

Mann says “potential nonclimatic influences on the Tiljander and other proxies were discussed in the SI, which showed that none of our central conclusions relied on their use”. These are not “potential” influences; they are clearly identified as actual influences by Tiljander. The SI alludes to problems, but falls well short of providing anything like a rational explanation of why this data was used given the problems. The SI also failed to disclose that the proxies were used upside down.

At this point, there are also issues of whether the SI actually shows that “none of their central conclusions” relied on their use. One of their central conclusions was that they could “get” a stick without dendro proxies – but their non-dendro recon used upside-down Tiljander. Their SI showed that they could “get” a stick without Tiljander but, as far as I can tell, the non-Tiljander comparandum used dendro series and, in particular, relied heavily on a Graybill bristlecone. It’s a large job analyzing the impact of this sort of thing. At the time, I didn’t have a working version of Mannian EIV; one of the reasons for working through Steig RegEM in such detail was to get a handle on Mannian RegEM and I may well re-visit this matter in the near future.

Steig Eigenvectors and Chladni Patterns #2

Yesterday, I showed an interesting comparison between the 3 Steig eigenvectors and “Chladni patterns” generated by performing principal components on a grid of spatially autocorrelated sites on a disk. Today I’ll show a similar analysis, but this time using a random sample of points from actual Antarctica. The results are pretty interesting, to say the least.

Key points for the disk included:
1) the first disk eigenvector and the first Steig eigenvector weight interior points more heavily than points around the circumference, but the first Steig eigenvector is displaced somewhat to the east. I speculated as follows:

Antarctica is by no means perfectly circular and the “center of gravity” is displaced to the east as well. My guess is that the same sort of graphic done on the actual Antarctic shape will displace to the east as well. I’ll check that some time.

2) the 2nd and 3rd Steig eigenvectors and disk eigenvectors were both “two-lobed”. In the disk, any axis orientation is as likely as any other, while the axis of the Steig eigenvectors could perhaps be construed as being related to the peninsula and the Transantarctic Mts.

The Steig AVHRR grid information contains 5509 gridcells with lat-longs. I took a random sample of 300 cells (by sampling from 1:5509 and taking the corresponding gridcells). I calculated the distances between gridcells (a network with 90,000 from-to pairs) and the correlations assuming an exponential decorrelation of exp(-distance/1200) – this is sort of consistent with what we see, but I’m mainly just experimenting right now. I converted this into a 300×300 correlation matrix and took principal components. I then used the akima package to make this into a contour map (converting everything into x-y coordinates and using Roman’s pretty extraction of the Antarctic contour from mapproj to overlay the continent onto the contour map). I need to white out some of the ocean areas, but that’s a little fiddly and not germane to the plots shown below.
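The core of the procedure (sample points, build the exp(-distance/1200) correlation matrix, take principal components) can be sketched as follows. This is Python rather than the R used for the actual figures, and the point set is a random disk standing in for the sampled AVHRR gridcells; only the 1200 km scale comes from the description above:

```python
import numpy as np

rng = np.random.default_rng(2)

# random sample of 300 sites on a ~2000 km disk (stand-in for AVHRR gridcells)
n = 300
r = np.sqrt(rng.uniform(size=n)) * 2000.0  # sqrt gives uniform area density
theta = rng.uniform(0.0, 2 * np.pi, size=n)
pts = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# pairwise distances and assumed exponential decorrelation exp(-d/1200)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
corr = np.exp(-d / 1200.0)

# principal components = eigenvectors of the correlation matrix
evals, evecs = np.linalg.eigh(corr)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

pc1 = evecs[:, 0]
if pc1.sum() < 0:  # fix the arbitrary overall sign
    pc1 = -pc1
print("one-signed PC1:", bool(np.all(pc1 > 0)))
```

Because every entry of the correlation matrix is positive, the leading eigenvector is one-signed (Perron-Frobenius), with the largest loadings in the interior: the "dome" pattern that appears whatever the data, which is the point of the comparison below.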

On the left as before are the Steig eigenvectors; on the right are eigenvectors from the above procedure (with the order of the 2 and 3 eigenvectors reversed for a reason that will be obvious). Using preferred Team terminology, I submit that the patterns are “remarkably similar”.

[Figures: the three Steig eigenvectors (left) alongside the corresponding eigenvectors from the spatially autocorrelated random sample on the Antarctic shape (right)]

As before, let’s return to Steig’s assertions about these three eigenvectors (for which no evidence was provided):

The first three principal components are statistically separable and can be meaningfully related to important dynamical features of high-latitude Southern Hemisphere atmospheric circulation, as defined independently by extrapolar instrumental data. The first principal component is significantly correlated with the SAM index (the first principal component of sea-level-pressure or 500-hPa geopotential heights for 20S–90S), and the second principal component reflects the zonal wave-3 pattern, which contributes to the Antarctic dipole pattern of sea-ice anomalies in the Ross Sea and Weddell Sea sectors [refs. 4, 8]

Now consider some of the following as possible “confirmation” that the form of these eigenvectors results from nothing more than principal components on spatially autocorrelated series on a figure with an Antarctic shape.

In addition to the high interior weighting of the 1st eigenvector, it is displaced towards the east as I predicted.

The orientations of the 2nd and 3rd eigenvectors now match the 3rd and 2nd spatially autocorrelated eigenvectors. So the axis orientations seem to be derived merely from the shape of the continent. There is a little extra oomph in the eigenvector with a NW-SE axis relative to the eigenvector with a perpendicular axis.

I’m not saying that this model explains everything in the Steig eigenvectors, but it sure accounts for most of the major features.

Note: we still haven’t seen any actual AVHRR data, only the rank 3 AVHRR version. As Jean S observed, it appears increasingly likely that the rank 3 data was what Steig, Mann et al used in their RegEM process. Keep an eye on this story.