Ryan’s Tiles

Ryan O has produced a very interesting series of Antarctic tiles by calculating Steigian trends under various settings of retained AVHRR principal components and retained Truncated Total Least Squares eigenvectors (Schneider’s “regpar”). The figure below collates various trend tiles provided by Ryan in a previous comment, arranged more or less in order of increasing retained AVHRR PCs from top to bottom and increasing retained TTLS eigenvectors from left to right. Obviously, in terms of any putative regional reconstruction, the results are totally unstable under what de Leeuw would describe as “uninteresting” variations of regpar and retained PCs.

I want to review two things in today’s note. First, the instability reminds me a lot of a diagram in Bürger and Cubasch (GRL 2005), which built on our prior results. There’s something remarkable about the Bürger and Cubasch 2005 presentation that we’ve not discussed before. Second, I thought that it would be worthwhile to review what Steig actually said about fixing on PC=3 and regpar=3, in light of this diagram. We’ve touched on this before, but only in the context of varying regpar, not the joint variation of retained PCs and regpar.


Figure 1. Collation of Ryan O Trend Tiles. I caution readers that I haven’t verified these results. However, Ryan has built on my porting to R of the Schneider RegEM-TTLS algorithm and has placed relevant code online, in keeping with the open source analysis that we’ve all been conducting during this project.
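To make the setup concrete, the kind of parameter sweep behind these tiles can be sketched in R roughly as follows. This is only a sketch: do_recon is a stand-in for a RegEM-TTLS Antarctic reconstruction (e.g. Ryan’s adaptation of the R port) that returns gridcell trends in deg C/decade; it is not actual code from Ryan, Steig or Schneider.

# Hypothetical sketch of a (retained AVHRR PCs, regpar) sensitivity sweep.
# do_recon() is a placeholder for a RegEM-TTLS reconstruction returning
# gridcell trends (deg C/decade) for a given parameter pair.
settings <- expand.grid(n_pc = 1:6, regpar = 1:6)

tiles <- lapply(seq_len(nrow(settings)), function(i)
  do_recon(n_pc = settings$n_pc[i], regpar = settings$regpar[i]))

# Continent-wide trend for each tile; with a stable method these numbers
# would barely move as the "uninteresting" parameters change.
settings$trend <- sapply(tiles, mean, na.rm = TRUE)
settings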

Bürger and Cubasch 2005
Buried in the Supplementary Information to Bürger and Cubasch 2005 is the following graphic, which shows high early 15th century values under some “flavors” of MBH98 parameters – “flavors” corresponding to parameter variations with reduced bristlecone weights, which correspond especially to similar diagrams in MM05b (EE).


Bürger and Cubasch SI Figure 2. SI readme says: This directory contains 3 additional Figures showing …, (2) the analysis for the MBH98-step for AD 1600 [a different step than the AD1400 step discussed in MM05a,b] …[Figure] 1600.eps [shows] the 32 variants from combining criteria 1-5 (grey, with CNT=0), distinguished by worse (light grey) or better (dark grey) performance than the MBH98-analogue MBH (10011, black). Note the remarkable spread in the early 16th and late 19th century. [my bold].

This figure is not only not presented in the article itself; it is not even referred to in the running text, which refers to the Supplementary Information only as follows:

Figure 1 shows the 64 variants of reconstructed millennial NHT as simulated by the regression flavors. Their spread about MBH is immense, especially around the years 1450, 1650, and 1850. No a priori, purely theoretical argument allows us to select one out of the 64 as being the “true” reconstruction. One would therefore check the calibration performance, e.g. in terms of the reduction of error (RE) statistic. But even when confined to variants better than MBH a remarkable spread remains; the best variant, with an RE of 79% (101001; see supplementary material), is, strangely, the variant that most strongly deviates from MBH.

Bürger and Cubasch Figure 1 is shown below. While it is somewhat alarming for anyone seeking “robustness” in the MBH quagmire, they refrained from including or even referencing the diagram that would have been perceived as giving fairly direct support to our work. I don’t blame Gerd Bürger for this at all; he cited our articles and has always discussed them fairly. In 2005, the mood was such that Zorita and von Storch felt that citing us in connection with bristlecones and MBH would compromise their ability to get their 2005 Science reply to Wahl and Ammann through reviewers, so they discussed the issue without citing us (Zorita apologized afterwards), even though we were obviously associated with the issue and they were well aware of this. In the Bürger and Cubasch case, the diagram was buried in the SI. (We have obviously been aware of this diagram and have used it from time to time, including in our NAS presentation.)

Bürger and Cubasch 2005 Figure 1.

I apologize for the digression, but I think that there are some useful parallels between the non-robustness observed in Bürger and Cubasch 2005 and in Ryan’s tiles. The reason for such instability in the MBH network was the inconsistency between proxies – an issue that we referred to recently in our PNAS Comment on Mann et al 2008, where we cited Brown and Sundberg’s calibration approach to inconsistency – something that I’ll return to in connection with Steig.

Regpar and PC=k in Steig et al 2009
On earlier occasions, the two Jeffs, Ryan and I have all commented on the instability of trends under different regpar choices, noting that the maximum for the overall trend occurred at or close to regpar=3. It was hard to avoid the impression that the choice of regpar=3 was, at best, opportunistic. Let’s review exactly how Steig et al described their selection of regpar=3 and their selection of PC=3.

In the online version of their article (though not all versions), they say (links added by me):

We use the RegEM algorithm [11- T. Schneider 2001], developed for sparse data infilling, to combine the occupied weather station data with the T_IR and AWS data in separate reconstructions of the Antarctic temperature field. RegEM uses an iterative calculation that converges on reconstructed fields that are most consistent with the covariance information present both in the predictor data (in this case the weather stations) and the predictand data (the satellite observations or AWS data). We use an adaptation of RegEM in which only a small number, k, of significant eigenvectors are used [10 – Mann et al, JGR 2007]. Additionally, we use a truncated total-least squares (TTLS) calculation [30 – Fierro et al 1997] that minimizes both the vector b and the matrix A in the linear regression model Ax=b. (In this case A is the space-time data matrix, b is the principal component time series to be reconstructed and x represents the statistical weights.) Using RegEM with TTLS provides more robust results for climate field reconstruction than the ridge-regression method originally suggested in ref. 11 for data infilling problems, when there are large differences in data availability between the calibration and reconstruction intervals [10 – Mann et al, JGR 2007]. For completeness, we compare results from RegEM with those from conventional principal-component analysis (Supplementary Information).

The monthly anomalies are efficiently characterized by a small number of spatial weighting patterns and corresponding time series (principal components) that describe the varying contribution of each pattern… The first three principal components are statistically separable and can be meaningfully related to important dynamical features of high-latitude Southern Hemisphere atmospheric circulation, as defined independently by extrapolar instrumental data. The first principal component is significantly correlated with the SAM index (the first principal component of sea-level-pressure or 500-hPa geopotential heights for 20S–90S), and the second principal component reflects the zonal wave-3 pattern, which contributes to the Antarctic dipole pattern of sea-ice anomalies in the Ross Sea and Weddell Sea sectors [4 – Schneider et al J Clim 2004; 8 – Comiso, J Clim 2000]. The first two principal components of TIR alone explain >50% of the monthly and annual temperature variabilities [4 – Schneider et al J Clim 2004.] Monthly anomalies from microwave data (not affected by clouds) yield virtually identical results [4 – Schneider et al J Clim 2004.]

Principal component analysis of the weather station data produces results similar to those of the satellite data analysis, yielding three separable principal components. We therefore used the RegEM algorithm with a cut-off parameter k=3. A disadvantage of excluding higher-order terms (k > 3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this tradeoff because the Peninsula is already the best-observed region of the Antarctic.

Virtually all of the above is total garbage. We’ve seen in earlier posts that the first three eigenvector patterns can be explained convincingly as Chladni patterns. This sort of problem has long been known in the climate literature, dating back at least to Buell in the 1970s – see the posts on Castles in the Clouds. “Statistical separability” in this context can be shown (through a reference in Schneider et al 2004, a paper by two Steig coauthors) to be the separability of eigenvalues discussed in North et al (1982). Chladni patterns frequently occur in pairs and may well be hard to separate – however, that doesn’t mean that the pair can be ignored. The more salient question is whether Mannian principal component methods are a useful statistical method when the target field is spatially autocorrelated – an interesting and obvious question that is clearly not on the horizon of Nature reviewers.
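For reference, the “statistical separability” being invoked is, via Schneider et al 2004, the North et al (1982) rule of thumb for sampling error in eigenvalues. A minimal sketch in R (the function name is mine, and the choice of effective sample size n_eff is left to the user – which is exactly where the mischief hides):

# North et al (1982) rule of thumb: the sampling error of eigenvalue k is
# roughly lambda_k * sqrt(2/n_eff); two neighbouring eigenvalues (and hence
# their EOF patterns) are effectively degenerate when their spacing is
# smaller than this error.  n_eff = effective number of independent samples.
north_separable <- function(lambda, n_eff) {
  err <- lambda * sqrt(2 / n_eff)
  gap <- -diff(lambda)             # lambda_k - lambda_(k+1); assumes decreasing order
  gap > err[-length(err)]          # TRUE where eigenvalue k is separated from k+1
}

# Made-up eigenvalues: the 2nd and 3rd come out "inseparable" at n_eff = 100,
# but that hardly licenses throwing the pair away.
north_separable(c(10, 4.1, 3.9, 1.5), n_eff = 100)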

Obviously, the quoted sentences fall well short of being any sort of adequate argument for the use of 3 PCs. In fairness, the use of 3 PCs seems to have been developed in the predecessor literature, especially Schneider et al JGR 2004, which I’ll try to review at some point.

However, the regpar=3 decision does not arise in the earlier Steig and Schneider literature and is entirely related to the use of Mannian methods in Steig et al 2009. The only justification is the one provided in the sentences cited above:

Principal component analysis of the weather station data produces results similar to those of the satellite data analysis, yielding three separable principal components. We therefore used the RegEM algorithm with a cut-off parameter k=3.

This argument barely even rises to arm-waving. I don’t know of any reason why the value of one parameter should be the same as the value of the other. It’s hard to avoid the suspicion that they tried other parameter combinations and did not report combinations that yielded lower trends.
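Since regpar is nothing more than the truncation parameter in the TTLS step, it may help to see what it actually controls. Here is a minimal sketch of the truncated total least squares solution for a single right-hand side, in the spirit of Fierro et al (1997); the function is mine and makes no claim to match Schneider’s RegEM implementation detail for detail.

# Truncated TLS for A x ~ b (single right-hand side) with truncation k:
# take the SVD of the augmented matrix [A b], and build the solution from
# the discarded (k+1, ..., n+1) right singular vectors.
ttls_solve <- function(A, b, k) {
  n   <- ncol(A)
  sv  <- svd(cbind(A, b))                            # SVD of [A b]
  V12 <- sv$v[1:n, (k + 1):(n + 1), drop = FALSE]    # top rows of discarded vectors
  v22 <- sv$v[n + 1, (k + 1):(n + 1)]                # bottom row of discarded vectors
  drop(-V12 %*% v22 / sum(v22^2))                    # x = -V12 * pseudo-inverse of v22
}

# Toy example: smaller k (the analogue of regpar) regularizes more aggressively.
set.seed(1)
A <- matrix(rnorm(200), 50, 4)
b <- A %*% c(1, 2, 0, -1) + rnorm(50, sd = 0.1)
ttls_solve(A, b, k = 3)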

Ryan O: More Mannian Algorithm Problems

Ryan O has observed a remarkable property of the Mannian algorithm used in Steig et al’s Antarctic temperature reconstruction, described in a lengthy post at Jeff Id’s here and cross-posted at Anthony’s. Source code is here (the source code style, BTW, evidencing an engineering tidiness from which we should all take a lesson). I’m reporting here on one aspect of the post; readers are urged to consult either of the original postings.

As I understand his exposition, he took a hypothetical Antarctic temperature history (his “model_frame”) in which the overall Antarctic trend was 0.060 deg C/decade, with cooling on the Ross and Weddell ice shelves and near the South Pole and with a maximum trend in the Peninsula, as illustrated below (showing 1957-2002 trends):


Figure 1. Stipulated Temperature History, excerpted from Ryan O’s original.

Ryan extracted from this:
1) the post-1982 gridcells, in lieu of AVHRR data;
2) for stations, the same pattern of missing data as in the actual Steig reconstruction.

He then did a Mannian reconstruction in which he:
1) used 3 PCs for the “AVHRR” data;
2) set the PTTLS regpar parameter equal to 3.

In this case, we know the “correct” answer (0.060 deg C/decade). Instead of getting the correct answer, the Mannian algorithm yielded a trend that was 70% higher (0.102 deg C/decade).
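A minimal sketch of this kind of check, using Ryan’s name model_frame for the synthetic field; recon and the helper function are my own placeholder names, not Ryan’s code:

# model_frame: synthetic monthly temperature matrix (months x gridcells)
# with a known continental trend; recon: the RegEM-TTLS output produced
# from the degraded station/"AVHRR" inputs.  Both are assumed to exist.
decadal_trend <- function(x) {
  t_yr <- seq_along(x) / 12                   # monthly series -> time in years
  10 * unname(coef(lm(x ~ t_yr))[2])          # OLS slope in deg C/decade
}

true_trend  <- decadal_trend(rowMeans(model_frame))   # stipulated ~0.060
recon_trend <- decadal_trend(rowMeans(recon))         # Ryan reports ~0.102
c(true = true_trend, recon = recon_trend, inflation = recon_trend / true_trend)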

In climate science terminology, this trend is “remarkably similar” to the trend reported in the original article. Ryan dryly observed:

If “robust” means the same answer pops out of a fancy computer algorithm regardless of what the input data is, then I guess Antarctic warming is, indeed, “robust”.

Ryan’s example is pretty convincing evidence that this particular Mannian algorithm is biased towards yielding higher trends. However, I think that it’s important to understand the mechanism of the bias a little more clearly.

For example, in our analysis of Mannian PCA, it was important that we were able to demonstrate the mechanism of the bias – short-segment centering led to the algorithm in effect “mining” for HS-shaped series, which, in that case, were the bristlecones. In the present case, our collective understanding of the problem is still a little empirical – though obviously even that seems to be a considerable advance, to say the least, on the level of analysis achieved by the Nature referees. IMO it should be possible to provide a more analytic explanation. With this end in mind, some time in the next few days, I’ll post some notes that will attempt to connect RegEM with known statistical methods i.e. methods used by someone other than a Team member.

FOI and the "Unprecedented" Resignation of the British Speaker

Readers have sometimes proposed that I try to enlist the support of a British MP for efforts to get information from the various stonewalling UK climate institutions, such as Fortress CRU. In fact, it seems that British MPs have had their own personal reasons for not supporting FOI. For the past 5 years, they have stonewalled FOI requests by journalist Heather Brooke for details of their expenses (2006 comment here). Speaker Michael Martin led the stonewalling campaign. Brooke challenged the MPs in court and, in May 2008, won a notable success. But a year later, even with a court victory, she was still no further ahead.

In the last couple of weeks, the ground suddenly shifted. Using a tried-and-true method (chequebook journalism), the Daily Telegraph purchased a disk with details of MP expenses. The reasons for the stonewalling became pretty clear. MPs had expensed the public for everything from repairs to the moat at one MP’s family castle to a Playboy Channel subscription for another MP’s husband. No item seemed too large or too small to be charged to the public. A climate scientist would have said that the situation was “worse than we thought”.

The worst of the trough-feeding arose over provisions entitling MPs to purchase and improve 2nd and 3rd homes at public expense, under the guise of being nearer Parliament or nearer their constituency, even if the 2nd home was only a few miles closer to Parliament than the original home. MPs as a class seem to have become small-time real estate speculators, with the public underwriting the cost of their speculations but not sharing in the capital gains. The public anger is not just about the chiseling, though the anger about the chiseling is real enough, but about the influence of these sorts of perqs and benefits on MPs as a class.

The exposure of the pigs at the trough has angered the British public and amused the rest of the world (e.g. CBC in Canada here).

One of the first casualties of the affair was Speaker Michael Martin, who had administered the expense program and who had directed the prolonged litigation against revealing the expenses. Martin became the first speaker since the Little Ice Age to resign – in climate science, this is known as an “unprecedented” resignation.

After spending five years trying unsuccessfully to get the expenses, Heather Brooke was understandably a bit sour at being scooped by the Daily Telegraph, which simply bought the information, but she gamely expressed some vindication at this sorry mess being exposed, observing sensibly:

But I don’t begrudge the paper. It is getting the story out in the most cost-effective way possible. What’s unforgiveable is that the House of Commons repeatedly obstructed legitimate requests and then delayed the expense publication date and that MPs went so far as to try to exempt themselves from their own law. I wonder, too, how much we would have actually seen if we’d waited for the Commons to publish, given that MPs were given a free hand to black out anything that was “personal” or a danger to their “security”. These terms have been so overused by MPs that I’ve no doubt that items such as cleaning the moat would have been removed for “security” reasons, as would the house-flipping scandal, as an invasion of MPs’ privacy…

And now MPs are feeling morose. Tough! They’ve had plenty of opportunities to do the right thing by parliament and by the people. At every juncture they behaved in the worst possible way. They refused legitimate requests, they wasted public money going to the high court, they delayed publication, they tried to exempt themselves from their own law, they succeeded in passing a law to keep secret their addresses from their constituents so as to hide the house flipping scandal …

As CA readers know, David Holland, Willis Eschenbach and I have been given a variety of fanciful and untrue excuses by climate scientists stonewalling FOI requests. Within this reverse beauty contest, the excuses of Hadley Center executive John Mitchell for refusing to provide his Review Comments on IPCC AR4 chapter 6 are among the most colorful: first, Mitchell said that he had destroyed all his correspondence with IPCC; then he said that the comments were his personal property. David Holland then submitted FOI requests for Mitchell’s expenses for trips to IPCC destinations and for information on whether he had made the trips on vacation time, while also confronting Hadley Center with its representations to the public on how Hadley Center scientists were doing the British public proud through their participation, as Hadley Center employees, in IPCC. So Hadley Center foraged around for a new excuse – this time arguing that releasing Mitchell’s review comments would compromise British relations with an international organization (IPCC), IPCC in the meantime having informed Hadley Center that it did not consent to the review comments being made public – ignoring provisions in the IPCC by-laws that require such comments to be made public. In administrative law terms, there is unfortunately no recourse against IPCC – an interesting legal question that we’ve pondered from time to time (see also the Global Administrative Law blog here).

We’ve also tried unsuccessfully to obtain Caspar Ammann’s secret review comments on chapter 6, which IPCC failed to include in their compilation of Review Comments and which Ammann and Fortress CRU have refused to make public.

It’s hard to picture exactly what’s in the Mitchell correspondence (or the Ammann correspondence, for that matter) that’s caused the parties to be so adamant about not disclosing comments that are properly part of the public record. In Mitchell’s case, I suspect that the reluctance arises not so much from anything particularly bad having been said, but merely from the record being embarrassingly empty – thereby showing (what I believe to be) an almost complete casualness about discharging any obligations as a Review Editor other than swanning off to IPCC destinations.

As long as we don’t know, it will of course be a mystery. A couple of weeks ago, MP expenses were a mystery as well.

Bob Tisdale on SST

A shout out for Bob Tisdale’s blog here. Bob cross-posts at Anthony’s from time to time. At his own blog, he’s done a number of excellent analyses of SST data sets.

On many occasions, I’ve observed that critical analysis of the temperature record has devoted a disproportionate amount of attention to land data sets relative to SST. While the issues there are not necessarily resolved, they are well known. Over the past few years, many readers, including myself, have built up some familiarity with the alphabet soup of land data. Many of us, like connoisseurs of fine wine, can distinguish between GHCN, CRU, NOAA, GISS and the various vintages of USHCN.

However, speaking for myself, I’m a bit at sea, so to speak, in my connoisseurship of SST data. The differences between SST data sets were recently discussed at Anthony’s and, curiously enough, they also play a role in Santer et al 2008, which made a curious switch in SST versions from prior articles – a switch that has thus far gone unnoticed and which affects results. I’ll return to this on another occasion.

After Anthony raised a question about a recent divergence between GISS and NOAA/NCDC temperatures, Bob T pitched in with a careful explanation of the differences between the SST data sets used in the major global indices. Following are quotes from Bob T. In following the nomenclature for the data sets, keep in mind that NOAA originates a few different SST versions, with GISS using one version and NOAA (for its own temperature index) using another.

GISS

GISS has used the NCDC OI.v2 SST anomaly data since December 1981, and before that they had used the Hadley Centre’s HADSST data. GISS then splices the two datasets together….

NOAA describes the Optimum Interpolation (OI.v2) SST anomaly data (used by GISS) as, “The optimum interpolation (OI) sea surface temperature (SST) analysis is produced weekly on a one-degree grid. The analysis uses in situ and satellite SST’s plus SST’s simulated by sea-ice cover.” The in situ data is from buoy and ship measurements. The full description of the OI.v2 data is here: http://www.cdc.noaa.gov/data/gridded/data.noaa.oisst.v2.html

The OI.v2 SST anomaly data is attributed by Bob T to Smith and Reynolds 2002.

NOAA/NCDC

NCDC has their own SST anomaly dataset for their global surface temperature product, and they calculate anomalies against the base years of 1901 to 2000.

The NCDC identifies the “Global Ocean Temperature” dataset as SR05 in its Global Surface Temperature Anomalies webpage:
http://www.ncdc.noaa.gov/oa/climate/research/anomalies/index.php#sr05

Linked to the webpage is a paper by Smith et al (2005) “New surface temperature analyses for climate monitoring” GEOPHYSICAL RESEARCH LETTERS, VOL. 32, L14712, doi:10.1029/2005GL023402, 2005.


On page 2, Smith et al describe the SR05 data as, “The SR05 SST is based on the International Comprehensive Ocean Atmosphere Data Set (ICOADS [Woodruff et al., 1998]). It uses different, though similar, historical bias adjustments to account for the change from bucket measurements to engine intake SSTs [Smith and Reynolds, 2002]. In addition, SR05 is based on in situ data.”

It appears, from that quote and the rest of the paper, the SR05 SST dataset does NOT use satellite data. This is consistent with NCDC’s other long-term SST datasets. They also abstain from satellite data.
….
I have found no source of SR05 SST anomaly data, other than the Global, Northern Hemisphere, and Southern Hemisphere “Ocean Temperature” datasets linked to the Global Surface Temperature webpage.

ERSST v2 and ERSST v3
Bob T observes that NOAA/NCDC also supports the ERSST v2 (now discontinued) and ERSST v3 SST data sets and, in the linked posts, compares these data sets to the other ones. Bob also mentions two vintages of ERSST v3, one of which seems to have disappeared (though Bob mentions that he had saved some of the disappeared data).

In addition to the SR05 SST data, the NCDC also has two other long-term SST datasets called Extended Reconstructed SST (ERSST) data. The ERSST.v2 (Version 2) data was introduced in 2004 with the Smith and Reynolds (2004) paper Improved Extended Reconstruction of SST (1854-1997), Journal of Climate, 17, 2466-2477. Many of my early Smith and Reynolds SST Posts used ERSST.v2 data through the NOAA NOMADS system. Unfortunately, ERSST.v2 data is no longer available through that NOAA system, so the latest ERSST.v2 global SST anomaly data from NOMADS I have on file runs through October 2008.

The ERSST.v2 data was updated with ERSST.v3 data. In my opinion, it provides the most detailed analysis of high latitude SST in the Southern Hemisphere (the Southern Ocean). The ERSST.v3 data was introduced last year with the Smith et al (2008) paper: Improvements to NOAA’s Historical Merged Land-Ocean Surface Temperature Analysis (1880-2006), Journal of Climate,21, 2283-2296. The NCDC updated it with their ERSST.v3b version later in 2008, but more on that later. A limited number of datasets (based on latitude) for the ERSST.v3b data are available from NCDC (though it is available on a user-selected coordinate basis through the KNMI Climate Explorer website, as is ERSST.v2 data).

From Bob’s comments, I get the impression that ERSST v3 is not used in the three major global temperature indices (CRU, GISS, NOAA); I will seek confirmation of this.

HadCRU
I haven’t checked Bob’s posts for HadCRU yet, but I presume that they use HadSST – a newish HadISST version is now in circulation, which I glanced at recently but haven’t studied.

Santer
I was going to do this in another post, but I’ll mention the point here to perhaps motivate readers to ponder the difference between these data sets.

In the CCSP report that prompted Douglass et al 2007 (and thus Santer et al 2008), the surface observations used as comparanda were CRU, NOAA and GISS. These data sets were also used in Douglass et al.

However, in a hitherto unnoticed swap, Santer et al 2008 used ERSST v2, the then hot-off-the-press ERSST v3, and HadISSTv2 as surface comparanda. Santer purported to justify the substitution as follows:

The three SST datasets are more appropriate to analyse in order to determine whether observed lower tropospheric temperature changes follow a moist adiabatic lapse rate (Wentz and Schabel, 2000).

However, the ERSST versions also have a noticeably lower trend than GISS or NOAA (the two “hottest” series). The lower trends in the surface data reduced the mismatch between the surface and tropospheric observations, contributing to Santer’s claim that:

There is no longer a serious and fundamental discrepancy between modelled and observed trends in tropical lapse rates, despite DCPS07’s incorrect claim to the contrary. Progress has been achieved by the development of new T_SST , T_L+O, and T2LT datasets, …

Here the “new T_SST” data set appears to be the ERSST data – which seems to have had a somewhat complicated recent history. It looks like earlier ERSST versions used satellite data in their construction, while the most recent version has got rid of the “satellite bias” by discontinuing use of the satellite data.

Whatever the merits of the ERSST data, it’s hard to see that the comparison of satellite data to ERSST, interesting as it may be, is particularly relevant to the issue of whether there is a statistically significant difference between satellites and the “big” indices (GISS, CRU, NOAA) if it isn’t actually used in the “big” indices.
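For readers who want to poke at this sort of claim themselves, here is a minimal sketch, assuming monthly SST anomaly series have already been retrieved (e.g. from KNMI Climate Explorer) onto a common time axis; the object names are mine:

# sst_list: a named list of monthly tropical SST anomaly series on the same
# time axis, e.g. list(ersst_v3 = ..., noaa_oi = ..., hadsst = ...).
trend_per_decade <- function(x) {
  t_yr <- seq_along(x) / 12
  10 * unname(coef(lm(x ~ t_yr))[2])          # OLS slope, deg C/decade
}
trends <- sapply(sst_list, trend_per_decade)
trends
trends - max(trends)                          # differences relative to the warmest series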

Again, I provide the caveat that my own personal handling of these SST data sets is very limited and I thus have limited connoisseurship of them at this time.

Baby whirls: improved detection of marginal tropical storms

With the North Atlantic hurricane season officially starting in a couple of weeks (June 1), but possibly getting a head start with a developing low-pressure system in the Bahamas, considerable attention will be paid by the media to each and every storm that gets a name. In the North Atlantic, a name is granted to a tropical or subtropical storm with sustained winds of greater than 34 knots AND when the National Hurricane Center declares it to be so. It is these storm counts that are used for a variety of purposes, including insurance rates and climate change research.

Back in 2007, David Smith described short-lived, generally weak or marginal tropical storms as Tiny Tims. A couple of posts were dedicated to various aspects of the North Atlantic Tiny Tim climatology including here and here.

One useful reference period is the modern period (the last twenty years, 1988-2007). This is a period of good (and ever-improving) detection tools, like advanced satellites, improved recon devices, denser buoy networks and so forth.

The modern period also matches the 1988-2007 list of Tiny Tim storms. Tiny Tims are storms so weak, small, remote and/or short-lived that there’s no record of ships or land experiencing storm-force winds, yet they were classified as tropical storms. By historical standards these modern Tiny Tims would have been regarded as depressions or disturbed weather, not tropical storms.

In a local Florida newspaper, Chris Landsea describes a new paper he has (re-)submitted to the Journal of Climate along with three other prominent tropical cyclone and/or climate researchers: Gabe Vecchi (NOAA/GFDL), Lennart Bengtsson (Reading, UK), and Thomas Knutson (NOAA/GFDL).

From Kate Spinner’s article online:

Landsea scrutinized the hurricane center’s storm data and corrected for technological advances in hurricane detection and tracking. He concluded that hurricane seasons of the past rivaled today’s activity, suggesting the influence of a periodic climate cycle in the Atlantic, not global warming, is behind the current spike in storms…

Landsea’s new study, currently under review by other scientists, stemmed from his objection to studies in 2006 and 2007 linking the increased number of recorded hurricanes with a rise in global temperatures.

“I did not agree with the studies because I thought their assumption that all the storms were in the database was faulty,” Landsea said.

However, perhaps the most illuminating part of the article includes two quotations from Landsea’s “critics”, including Michael Mann and Kerry Emanuel, both professors who have published various papers on aspects of the Atlantic hurricane climatology. Many of their papers have received a high level of scrutiny here at Climate Audit during the past 4 years.

Mann disputed Landsea’s research, saying that his technology argument ignores the chance that a single storm could have been counted twice before satellite records could show the exact track. He expressed doubt that the study would pass muster to be published.

Kerry Emanuel, a leading hurricane researcher and professor of atmospheric science at Massachusetts Institute of Technology, said Landsea’s work is scientifically robust, but not as important as looking at whether warming causes hurricanes to gain strength.

“I don’t think the number of storms is a terribly interesting thing,” Emanuel said, emphasizing Atlantic storms now rarely exceed Category 2 strength, but that the majority of damage-inflicting storms are Category 3 or higher. “We’re pretty confident that intensity increases with global temperature. There are arguments about the amount.”

Mann helpfully provides an editorial comment on the likelihood of publication, apparently low in his estimation. Emanuel (who has collaborated with Mann on several hurricane-related papers), on the other hand, finds that the work is scientifically robust but not important to the issue of global warming. The frequency argument has largely died away with regard to its relationship with SST, with a few exceptions including Holland and Webster (2007) and, remarkably, a paper by Mann and Emanuel (2006) that does exactly what Emanuel describes as not “terribly interesting”: it correlates historical North Atlantic tropical storm frequency with SST warming and tests the hypothesis of a multi-decadal oscillation (AMO) impact on storm activity.

Here is the abstract of Landsea et al. (submitted):

Records of Atlantic basin tropical cyclones (TCs) since the late-19th Century indicate a very large upward trend in storm frequency. This increase in documented TCs has been previously interpreted as resulting from anthropogenic climate change. However, improvements in observing and recording practices provide an alternative interpretation for these changes: recent studies suggest that the number of potentially missed TCs is sufficient to explain a large part of the recorded increase in TC counts. This study explores the influence of another factor – TC duration – on observed changes in TC frequency, using a widely-used Atlantic TC database: HURDAT. We find that the occurrence of short-lived storms (duration two days or less) in the database has increased dramatically, from less than one per year in the late-19th/early-20th Century to about five per year since about 2000, while moderate to long-lived storms have increased little, if at all. Thus, the previously documented increase in total TC frequency since the late 19th Century in the database is primarily due to an increase in very short-lived TCs.

We also undertake a sampling study based upon the distribution of ship observations, which provides quantitative estimates of the frequency of “missed” TCs, focusing just on the moderate- to long-lived systems with durations exceeding two days. Both in the raw HURDAT database, and upon adding the estimated numbers of missed TCs, the time series of moderate to long-lived Atlantic TCs show substantial multi-decadal variability, but neither time series exhibits a significant trend since the late-19th Century, with a nominal decrease in the adjusted time series. Thus, to understand the source of the century-scale increase in Atlantic TC counts in HURDAT, one must explain the relatively monotonic increase in very short duration storms since the late-19th Century. While it is possible that the recorded increase in short duration TCs represents a real climate signal, we consider it is more plausible that the increase arises primarily from improvements in the quantity and quality of observations, along with enhanced interpretation techniques, which have allowed National Hurricane Center forecasters to better monitor and detect initial TC formation, and thus incorporate increasing numbers of very short-lived systems into the TC database.

The first figure from Landsea’s paper shows the unadjusted frequency of tropical (and subtropical) storms from 1878-2008, which demonstrates the significant upward trend. The second figure shows the frequency of storms lasting longer than 2 days, which no longer has a significant trend. Is it possible that the earlier Tiny Tims were simply lost? We’ll keep our eyes out for more of these “Baby Whirls” and at the same time see if Landsea’s paper can “pass muster”.
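The duration-threshold exercise is easy to sketch. Assuming a per-storm data frame has been extracted from HURDAT with columns year and duration_days (my names, not the raw HURDAT format), something like the following reproduces the two counts:

# storms: one row per Atlantic tropical storm, 1878-2008, with columns
#   year          - season year
#   duration_days - lifetime at tropical storm strength, in days
# (an assumed pre-processed extract of HURDAT, not the raw archive)
count_by_year <- function(df)
  aggregate(list(n = rep(1, nrow(df))), by = list(year = df$year), FUN = sum)

all_counts  <- count_by_year(storms)
long_counts <- count_by_year(subset(storms, duration_days > 2))  # drop the Tiny Tims

# Linear trends in annual counts: if Landsea et al are right, the short-lived
# storms carry most of the century-scale increase.  (Years with no qualifying
# storms drop out - fine for a sketch.)
coef(summary(lm(n ~ year, data = all_counts)))["year", ]
coef(summary(lm(n ~ year, data = long_counts)))["year", ]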

References:

Holland, G. J., and P. J. Webster, 2007: Heightened tropical cyclone activity in the North Atlantic: natural variability or climate trend? Philos. Transact. R. Soc. A: Math. Phys. Eng. Sci., 365, 2695-2716.

Landsea, C. W., G. A. Vecchi, L. Bengtsson, and T. R. Knutson: Impact of duration thresholds on Atlantic tropical cyclone counts. Submitted to J. Climate, May 7, 2009.

Mann, M., and K. Emanuel, 2006: Atlantic hurricane trends linked to climate change. Eos, Trans. Amer. Geophys. Union, 87, 233-241.

Santer and the "Power of Poop"

Rather than spending time archiving information from his various publications, Santer has directed his scientific priorities toward introducing a remarkable cartoon (Youtube here), which closes (see 7 minutes on) with a ditty urging its audience to “do something about the power of poop”. The video ends with a close-up of a large odiferous dropping, with the narrator singing in one of the most annoying falsettos that you will ever hear:

we must do something about the power of poop,
the power of poop,
the power of poop,
the power of poop, poop, poop…

Who would have guessed that Santer’s interests were quite so scatological? And who says that the age of lyric poetry is over?

The closing remarks also provide a seamless segue to anticipated reviewer comments.

We learn at the opening of the cartoon that:

The snows of Kilimanjaro are over 10,000 years old.

Lonnie Thompson makes this claim in Thompson et al 2000, but the actual evidence is very slight and it’s not hard to contemplate circumstances in which the Kilimanjaro glacier is less than 10,000 years old, perhaps much less. We’ve discussed this point on a number of occasions in previous CA posts.

I’ve also observed that IPCC made this claim in the First Draft of AR4, but withdrew the claim in later drafts. I’ve discussed this before, but it’s useful to review it in the light of Santer’s venture into amateur cartoons.

The IPCC AR4 First Draft (ch 6) stated:

There is only scarce information on the African glacier history, but ice cores retrieved from the Kilimanjaro ice cap reveal that the current retreat is unprecedented in the Holocene (Thompson et al., 2002).

One (and only one) IPCC reviewer questioned this claim in comment 6-1076:

Thompson’s dating of Kilimanjaro is very precarious. The assumed accumulation is implausibly low – it’s only 50 m thick (as compared to 160 m at Quelccaya), but is dated to 11700 BP versus start of AD440 at Quelccaya.

The IPCC Author Response apparently conceded the point and the claim was not made in the Second Draft or the Final Report. The Author Response was:

Noted, I know this point concerning the dating of Kili – we have to decide together shall we keep this reference or not – we cannot discuss the dating problem within the Holocene glacier box.

I guess that the IPCC reviewers did not fully anticipate the “power of poop”.

Re-Visiting CCSP 1.1 on Lapse Rate Trends

As noted in an earlier post, I’ve now managed to synchronize 48 of the 49 Santer tropospheric series with KNMI surface temperature series and have looked at versions of some key figures in CCSP 1.1 and previously inaccessible figures in Santer.

First, here is an important figure from CCSP 1.1 showing a histogram of relative trends (surface minus T2LT) for models, together with observations (RSS and UAH T2LT versus CRU and NOAA/GISS). CCSP stated in the caption that “each histogram is based on results from 49 individual realizations of the 20CEN experiment, performed with 19 different models (Table 5.1)”. These are the same numbers as in Santer et al 2005 and Santer et al 2008. The “Convening Lead Author” of this CCSP section, to no surprise, turns out to be Santer himself. So I think that we can prima facie assume that Santer did the same amount of “independent” due diligence on Santer et al 2005 as Mann, in his capacity as IPCC TAR lead author, did on MBH98.


Figure 1. CCSP 1.1 Figure 5.3G, showing a histogram of T_surface minus T2LT trends, against corresponding observed trends for RSS and UAH T2LT versus CRU (lowest line) and NOAA (upper line). GISS is said by CCSP to be close to NOAA.

Next, here is my attempt to replicate this figure from my laborious matching of surface and Santer T2LT information. It’s quite close, but a bit different. I’ve used 48 runs as opposed to 49 – I was unable to match one CCSM3.0 run with a KNMI surface series, and I truncated the screwed-up Santer version of CNM3.0 before 1965. I got a few more outliers than shown in the CCSP report. I got three runs with lapse rate trends more negative than -0.1 deg C/decade – these result from highly positive model runs in the singleton Canadian CGCM3.1 run, in the singleton HadGEM1 run and in a GFDL2.1 run. I didn’t get a run above 0.05 and only got three positive runs, i.e. runs with low T2LT trends relative to surface: a singleton run from INM CM3, a singleton from MRI hi-res and an MRI med-res run. As bridge players know, singletons are not incidental. No model run overlaps CRU minus UAH T2LT (or UAH T2) in my calculation – CCSP shows one overlap. My observational trends use data up to 2009 (while model trends run only to 1999). As Santer says in his SI, it’s reasonable to project the model trends forward. (A1B runs, where available, may be another alternative.)
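The calculation behind Figure 2 below is simple enough to sketch, assuming each model run has already been matched to a surface series; the list structure and names here are mine, not Santer’s or KNMI’s:

# runs: list of matched model runs, each a data frame with monthly columns
#   tas (surface) and t2lt (synthetic MSU lower troposphere), 1979-1999.
# obs: named vector of observed surface-minus-T2LT trend differences,
#   e.g. c(cru_uah = ..., noaa_rss = ...).  All hypothetical names.
trend_dec <- function(x) 120 * unname(coef(lm(x ~ seq_along(x)))[2])   # deg C/decade from monthly data

d_model <- sapply(runs, function(r) trend_dec(r$tas) - trend_dec(r$t2lt))

hist(d_model, breaks = 20, col = "grey",
     xlab = "Surface minus T2LT trend (deg C/decade)",
     main = "20CEN runs versus observed differences")
abline(v = obs, col = "red", lty = 2)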


Figure 2. Simulation of CCSP Figure 5.3G.

As an exercise, I did a similar plot for the T2 lapse rate, which proved to yield less favorable results for the CCSP consistency argument, as you can see below. In this case, no models overlapped GISS minus RSS_T2 and only one model (INM CM3.0) overlapped CRU minus RSS_T2.


Figure 3. As CCSP Figure 5.3G, only for Surface minus T2.

It definitely seems odd that they argue so strenuously that there is no “statistical” inconsistency between models and observations.

Tomorrow I’ll continue the parsing of these results for individual models, re-visiting Santer et al 2008.

Deciding Which Runs to Archive

Have any of you seen any articles discussing which model runs are archived? It doesn’t appear to me that all model runs are archived. So what criteria are used to decide which model runs are archived by the modelers at PCMDI? (This is a different question than IPCC selections from the PCMDI population.) We’re all familiar with cherrypicking bias in Team multiproxy studies e.g. the addiction to bristlecones and Yamal. It would be nice to think that the PCMDI contributors don’t have a corresponding addiction.

Figure 1 below shows the number of 20CEN runs per model in the Santer collection of 49 runs. A few models have 5 runs (GISS EH, GISS ER, NCAR CCSM, Japan MRI), but many models have only one run.


Figure 1. Number of Runs (49) by Model for Santer 20CEN Population

PCMDI now has 81 20CEN runs (KNMI – 78), but the distribution has become even more unbalanced, with much of the increase coming from further additions to already well-represented models, e.g. NCAR CCSM.

Figure 2. KNMI 20CEN Runs (78) by Model

It’s hard to envisage circumstances under which a modeling agency would have only 1 or 2 runs in its portfolio. Models with only one archived 20CEN run include BCCR BCM2.0, Canadian CGCM 3.1(T63), CNRM CM3, ECHAM4, INM CM3.0, IPSL CM4 and MIROC 3.2 (hi-res). Models with only two archived 20CEN runs include the influential HadCM3 and HadGEM1. Surely there are other runs lying around? Why are some archived and not others?

The non-archiving affects analyses like Santer et al 2008. One of the forms of supposed uncertainty used by Santer to argue against a statistically significant difference between models and observations is autocorrelation uncertainty in the models. While we are limited on the observation side by the fact that we’ve only got one earth to study, a few more available runs of each model would do wonders in reducing the supposed uncertainty in model trends. Santer should probably have 1) thrown out any models for which only one run was archived; 2) written to the modeling agencies asking for more runs; and 3) included a critical note about non-archiving agencies in his paper (though I’m led to believe by reviewers that such criticisms would be “unscientific”).

Here’s another interesting scatter plot illustrating an odd relationship between trend magnitude (for each model) and trend standard deviation (for each model). This is done only for multi-run models – as standard deviation is obviously not defined for singletons.


Figure 3. Santer Population : Trend standard deviation (by model) versus mean trend (by model).
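For concreteness, the stratification behind Figure 3 can be sketched as follows, assuming per-run tropical T2LT trends and a model label for each run have already been extracted (object names mine):

# trends: numeric vector of per-run T2LT trends (deg C/decade);
# model:  factor giving the source model of each run.  Both assumed to exist.
by_model <- data.frame(mean_trend = tapply(trends, model, mean),
                       sd_trend   = tapply(trends, model, sd))
by_model <- subset(by_model, !is.na(sd_trend))        # singletons have no SD

plot(by_model$mean_trend, by_model$sd_trend,
     xlab = "Mean trend by model", ylab = "Trend standard deviation by model")
summary(lm(sd_trend ~ mean_trend, data = by_model))   # the "significant" relationship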

The above relationship is “significant” in statistical terms. But why should there be a relationship between the mean stratified by model and the standard deviation stratified by model? I’ve had to scratch my head a little even to think up how this might happen. I think that such a relationship could be established by a bias in favor of inclusion of (shall we say) DD trends relative to their less endowed cousins.

Or perhaps there’s some mundane reason that would trouble no one. Unfortunately, IPCC doesn’t seem to have established objective criteria requiring modeling agencies to archive all their results and so, for now, I’m left a bit puzzled. For the record, I’m not alleging such a bias on the present record. But equally it is entirely legitimate to ask what the selection criteria are. Not that I expect to get an answer.

A1B and 20CEN Models

Lucia did a recent post on the construction of IPCC Figure 9.5, which I’d also been looking at in light of the Santer model information, though with different issues in mind. The caption to IPCC Figure 9.5 says that they extended selected 20th century runs (the “20CEN” runs) with A1B runs in order to produce the graph shown below up to 2005. The splice is intriguing on a number of counts – not least of which is the first question: how’d they do it?

Here is the original version of IPCC AR4 Figure 9.5

Original Caption: Figure 9.5a. Comparison between global mean surface temperature anomalies (°C) from observations (black) and AOGCM simulations forced with (a) both anthropogenic and natural forcings …. All data are shown as global mean temperature anomalies relative to the period 1901 to 1950, as observed (black, Hadley Centre/Climatic Research Unit gridded surface temperature data set (HadCRUT3); Brohan et al., 2006) and, in (a) as obtained from 58 simulations produced by 14 models with both anthropogenic and natural forcings. The multimodel ensemble mean is shown as a thick red curve and individual simulations are shown as thin yellow curves. Vertical grey lines indicate the timing of major volcanic events. Those simulations that ended before 2005 were extended to 2005 by using the first few years of the IPCC Special Report on Emission Scenarios (SRES) A1B scenario simulations that continued from the respective 20th-century simulations, where available. … The multi-model ensemble mean is shown as a thick blue curve and individual simulations are shown as thin blue curves. Simulations are selected that do not exhibit excessive drift in their control simulations (no more than 0.2°C per century). Each simulation was sampled so that coverage corresponds to that of the observations. Further details of the models included and the methodology for producing this figure are given in the Supplementary Material, Appendix 9.C. After Stott et al. (2006b).

I hadn’t really thought about it before, but, if I’d been asked, I would have assumed that the A1B simulations were done separately from the 20CEN simulations. If so, it’s not obvious how you’d go about splicing A1B and 20CEN simulations. For example, in many cases, there are multiple realizations of each model – how would you go about linking individual A1B runs to individual 20CEN runs?

There are other interesting aspects of this figure – including the selection of 20CEN runs: not all runs are used. AR4 Chapter 8 SI provides information on which runs were selected. I’ll return to this issue on another occasion. Today I want to walk through the splicing.

Over the past few days, I’ve scraped tropical (20S-20N) averages for all 78 20CEN runs (25 models) and all 57 A1B runs (24 models) from KNMI. (KNMI has some excellent tools, but they are still pretty labor-intensive; I’ve written a pretty little scraping program that eliminates 99% of the cut-and-paste drudgery.)

Interestingly, virtually all of the A1B runs start in the late 19th century – and have the same start dates as the 20CEN runs. The two exceptions were GISS AOM and FGOALS – for these two models, A1B starts the year after 20CEN ends. This strongly suggested the possibility that individual A1B runs were associated with individual 20CEN runs and that a lexicon linking runs could be constructed. A hint exists in the AR4 chapter 9 SI (page 9-7), where 28 20CEN runs are shown as being extended with A1B runs.

It appears that this is the case and that accordingly, there is a “natural” extension of the 20CEN runs with A1B runs.

However, there is not a one-to-one map between 20CEN and A1B models. Overall there are 25 20CEN models and 24 A1B models – the one missing A1B model is BCC CM1, which therefore cannot be extended. I wonder whether the absence of an A1B run for BCC CM1 might be a clerical miss – earlier this week, I notified KNMI that two PCM 20CEN runs at PCMDI were not on their system. They promptly responded that the runs were there, but that the linking webpage hadn’t been updated (which they promptly fixed). Maybe the A1B run for BCC CM1 is around somewhere.

For 14 of the remaining 24 models, the number of 20CEN and A1B runs is the same (ranging from 1 to 5). For each of these models, I did cross-correlations over the overlap period and, in every case, there was one and only one A1B-20CEN mapping with a correlation of about 0.99 or higher; in every case, the “natural” order was preserved in the map. While the values on the cross-correlation “diagonal” were around 0.99 or higher, the “off-diagonal” values were significantly lower, with characteristics varying a lot from model to model. For example, CGCM3.1 had cross-correlations between “different” runs of around 0.9, while they were around 0.6 for CCSM3.0 and a very low 0.05 or so (with a couple negative) for ECHAM5.
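The matching exercise itself is easy to sketch one model at a time. Here a1b and c20 are assumed to be lists of that model’s monthly tropical anomaly series scraped from KNMI, restricted to the common overlap period; the names are mine:

# a1b, c20: lists of monthly anomaly series for one model's A1B and 20CEN
# runs over the common overlap period (assumed already scraped from KNMI).
cc <- outer(seq_along(a1b), seq_along(c20),
            Vectorize(function(i, j)
              cor(a1b[[i]], c20[[j]], use = "pairwise.complete.obs")))
round(cc, 3)                            # "diagonal" entries ~0.99+, off-diagonal much lower

match_20cen <- apply(cc, 1, which.max)  # A1B run i appears to continue 20CEN run match_20cen[i]
match_20cen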

Only one model (CCSM3.0) had more A1B runs (7) than 20CEN runs (6). This meant that only one out of the 57 A1B runs was left without a “natural” 20CEN link. Again, I wonder whether there might be another CCSM3.0 20CEN run somewhere.

Given the existence of this one-to-one map, it seemed odd that the correlations were 0.995 or even 0.9995 rather than 0.999999 or 1.000000.

This had an interesting explanation, which in turn confirmed the identity of the runs. My comparisons were done using “anomalies”, one of the KNMI options – and the reference period for the 20CEN and A1B datasets is different. As a result, the reference means used to create the anomalies differ between the 20CEN and A1B versions. KNMI also permits the retrieval of non-anomaly versions expressed in deg C. I spot-checked the CCCma series and these values were identical between versions to all decimal places, confirming that, in this case at least, the 20CEN and A1B runs were identical in the overlap period. The lack of perfect correlation resulted from the fact that the pattern of monthly normals was slightly different between the 20CEN version and the A1B version, resulting in a slight decorrelation.

The existence of this connection between 20CEN and A1B runs makes one scratch one’s head a little in trying to understand exactly what the IPCC authors meant by saying that the A1B runs were an “extension” of the corresponding 20CEN runs if, as appears to be the case, they are actually alter egos of the same run. (One odd exception to the IPCC “extensions”: PCM A1B runs are available at KNMI but were not used to “extend” the corresponding 20CEN runs.)

There’s an interesting connection to Santer in this, which I’ll visit on another occasion.

Pielke Sr on the "New" USHCN

The new USHCN was scheduled to come out a couple of years ago. A paper describing it has finally appeared, discussed by Pielke Sr here. I haven’t reviewed the new paper – one thing that I’ll be looking for is whether they rely on “homemade” changepoint methods to supposedly achieve homogeneity – “homemade” in the sense that the changepoint methods were developed within USHCN and are not algorithms described in Draper and Smith or a similar statistical text, or in statistical literature off the Island.

If so, intuitively, I’m suspicious of the idea that software by itself is capable of fixing “bad” data. For me, one of the main lessons of the Hansen Y2K episode was that it refuted the claim that Hansen’s wonder adjustments were capable of locating and adjusting for bad data – simply because the GISS quality control mechanisms were incapable of locating substantial Y2K jumps throughout the USHCN network. The argument with Mann’s bristlecones is similar – Mann’s “fancy” software was incapable of fixing bad data; in that case, the opposite happened: it magnified bad data.

These are the sorts of things that one has to watch out for when a “fancy” method without a lengthy statistical pedigree is introduced to resolve a contentious applied problem.