Emanuel 2008: Global warming and hurricanes

A new peer-reviewed paper has been published in an American Meteorological Society journal that raises many more questions on the linkages between hurricane activity and global warming. Eric Berger at the Houston Chronicle (SciGuy) did the leg-work and is the first (and only) mainstream media outlet to report the findings of Emanuel et al. (2008) and get reaction from other scientists in the climate/hurricanes community. Andrew Revkin at the Old Gray Lady has also blogged reaction: NY Times DotEarth

Important update (04/17). MIT press release: “New MIT study validates hurricane prediction. Provides confirmation that climate change intensifies storms.”

Apparently the above blogs misconstrued, misinterpreted, or just plain flunked this lesson, because the above Press Release has some very different conclusions.

While the earlier study was based entirely on historical records of past hurricanes, showing nearly a doubling in the intensity of Atlantic storms over the last 30 years, the new work is purely theoretical.

“It strongly confirms, independently, the results in the Nature [2005] paper,” Emanuel said. “This is a completely independent analysis and comes up with very consistent results.”

Emanuel does discuss some of the uncertainties, which seem rather important (I think this summarizes about 100% of the contradictory views on North Atlantic + AGW causation):

There are several possibilities, Emanuel says. “The last 25 years’ increase may have little to do with global warming, or the models may have missed something about how nature responds to the increase in carbon dioxide.”

Or:

Another possibility is that the recent hurricane increase is related to the fast pace of increase in temperature. The computer models in this study, he explains, show what happens after the atmosphere has stabilized at new, much higher CO2 concentrations. “That’s very different from the process now, when it’s rapidly changing,” he says.

And the final conclusion:

In the many different computer runs with different models and different conditions, “the fact is, the results are all over the place,” Emanuel says. But that doesn’t mean that one can’t learn from them. And there is one conclusion that’s clearly not consistent with these results, he said: “The idea that there is no connection between hurricanes and global warming, that’s not supported,” he says.

…Emanuel employs a downscaling approach to the IPCC model scenarios, using synthetic tropical cyclone seeds to judge the impact of a warming world on future storm activity metrics (frequency, power dissipation). The method’s ability to reproduce past activity fairly accurately from historical reanalysis products lends credibility to Emanuel’s technique, especially for projecting future activity. Of course there are many caveats concerning model downscaling efforts using IPCC scenarios, which have been discussed extensively at CA.

From SciGuy’s blog:

“The results surprised me,” Emanuel said of his work, adding that global warming may still play a role in raising the intensity of hurricanes, but what that role is remains far from certain.

In the new paper, Emanuel and his co-authors project activity nearly two centuries hence, finding an overall drop in the number of hurricanes around the world, while the intensity of storms in some regions does rise. For example, with Atlantic hurricanes, two of the seven model simulations Emanuel ran suggested that the overall intensity of storms would decline. Five models suggested a modest increase.

Dr. Curry of Georgia Tech is also quoted:

“The issue probably will not be resolved until better computer models are developed… By publishing his new paper, and by virtue of his high profile, Emanuel could be a catalyst for further agreement in the field of hurricanes and global warming…”

Kerry Emanuel has provided a link on his homepage to the BAMS 2008 article, and while a little technical, the paper is a good primer on the current state of the “debate” and presents an even-handed examination of his findings. I encourage all to read it and post their own reviews for consumption by the gallery. Historically, Emanuel’s change in thinking represents one aspect of the evolution of climate science.

Back in 1987, Time magazine reported on one of the first articles in Nature to sound the alarm (Emanuel 1987):

a warmer climate could result in hurricanes packing up to 50% more destructive power. This could happen, he suggests, within 40 to 80 years, when some scientists think CO2 levels will have doubled and ocean temperatures will have increased by 2 degrees C to 3 degrees C.

Emanuel (2005) in Nature largely confirmed this hypothesis almost 20 years later, with the so-called climate shift of 1995 (Goldenberg et al. 2001, Science). Back in 1987, the computer technology, climate model development, and physical understanding probably were not there to adequately test the hurricane-warming hypothesis. In 2027, it will be interesting to see what “more work” has been done on the problem…

Flashback to July 31, 2005: Press Release MIT’s presser on Emanuel (2005)

Supplementary Information and Flaccid Peer Reviewing

Based on my limited experience, it seems to me that journal peer reviewing faces an interesting challenge with the increased use of Supplementary Information (and I absolutely endorse detailed SI and obviously encourage even more detailed SI). In a very non-random sample of articles that I know inside out (Team journal publications), my conclusion is that, in these cases, the journal reviewers didn’t even look at the SI or even verify that the SI actually exists. Whether or not this sort of flaccid reviewing extends to other journals or even to other climate science articles not involving the Team, I can’t say. But I think that the observations are well supported in respect to flaccid reviewing of Team articles.

It’s one thing if the SI merely provides data. But in some cases, in addition to providing data, authors use the SI to derive results that are applied in the main article. In such cases, the reviewers should surely be obliged to examine the SI as an integral part of the article.

I’ll discuss two examples – the MBH Corrigendum and the recent Wahl and Ammann 2007/Ammann and Wahl 2007.

We got some insight into Nature’s practices with SI because we had a submission to Nature under review concurrent with the MBH Corrigendum. One of the reviewers of our submission said, in respect of a Mann reply point, that this methodological point was not in the original article or original SI and that it should have been included in the Corrigendum SI. Both Marcel Crok and I pressed Nature and eventually found out that not only was the Corrigendum SI not peer reviewed, but the Corrigendum itself was not peer reviewed (it was handled by editors); the Corrigendum SI was not even examined by the editors. Examination of SI did not, in general, appear to be included in the peer review process. I must say that it seemed odd to me that Nature did not peer review the Corrigendum. I would have thought that, if they felt that a Corrigendum was warranted, then there was at least as great an obligation to peer review the Corrigendum as the original article.

In addition to providing data, the Corrigendum SI also included methodological information, which, unfortunately, remained very unsatisfactory. Given the problems with MBH replication that had already been demonstrated by then, it’s hard not to think that some sort of peer review would have improved the MBH Corrigendum SI, whose methodological descriptions remained evasive and unreplicable.

A second example arises with Ammann and Wahl/Wahl and Ammann. Both articles make extensive references to Supplementary Information. Wahl and Ammann 2007 makes 8 references to its Supplementary Information, while Ammann and Wahl 2007 makes 6 references. The SI references are not limited to the provision of supporting data. In some cases, they refer to figures supposedly illustrating results in the text; in other cases, they refer to statistics and tables in the SI.

Key discussion in Ammann and Wahl 2007 is exported to the SI. For example, they state:

Only the MBH 1600-network is significant at a slightly lower level (89%), and the much discussed 1400- and 1450-networks are significant at the 99% and 96% levels, respectively. (See electronic supplement for further discussion and details, including code and tables with established thresholds for a variety of calibration/verification RE ratios and for the other WA scenarios examined.)

or

The effect of using “princomp” without specifying that the calculation be performed on the correlation matrix (an alternate argument of “princomp”) forces the routine to extract eigenvectors and PCs on the variance-covariance matrix of the unstandardized proxy data, which by its nature will capture information in the first one or two eigenvectors/PCs that is primarily related to the absolute magnitude of the numerically largest-scaled variables in the data matrix (Ammann and Wahl, 2007, supplemental information).

The latter statement is a foolish assertion about tree ring networks that are already standardized to a mean of 1 (under standard tree ring chronology procedures) and which thus have the same absolute magnitude. A pseudo-citation for this claim makes it seem more impressive, but the pseudo-citation is not to something that has itself been peer reviewed, but to a non-peer reviewed SI.
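For readers unfamiliar with the technical point at issue, here is a minimal numpy sketch of the general covariance-vs-correlation effect on synthetic data (not tree ring chronologies; the column scales and helper function are invented for illustration). PCA on the covariance matrix of unstandardized data loads on whichever column has the largest numerical scale, while PCA on the correlation matrix does not; the dispute above is over whether this matters for series already standardized to a common mean.

```python
# Minimal sketch (synthetic data, not the MBH proxy network) of the "princomp"
# point: PCA on the covariance matrix of unstandardized data is dominated by
# the largest-scaled column, while PCA on the correlation matrix is not.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
X[:, 0] *= 50.0          # one column on a much larger numerical scale

def first_eigvec(M):
    """Return the eigenvector associated with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]

cov_load = first_eigvec(np.cov(X, rowvar=False))       # covariance-matrix PCA
cor_load = first_eigvec(np.corrcoef(X, rowvar=False))  # correlation-matrix PCA

print("PC1 loadings, covariance matrix: ", np.round(np.abs(cov_load), 2))
print("PC1 loadings, correlation matrix:", np.round(np.abs(cor_load), 2))
```

On the covariance matrix, essentially all of the first-PC weight goes to the rescaled column; on the correlation matrix the weight is spread across all columns.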

But where is the SI? No URL is given in either publication, other than the following:

Additional information and illustrations beyond the present text are provided in an electronic supplement and on our WEB site (http://www.cgd.ucar.edu/ccr/ammann/millennium/MBH_reevaluation.html).

The website http://www.cgd.ucar.edu/ccr/ammann/millennium/MBH_reevaluation.html has not been updated in nearly 2 years, contains no mention of Ammann and Wahl 2007, and does not currently contain any of the promised information. The sentence quoted above specifically states that there is an “electronic supplement” in addition to the UCAR website. The editor and then the publisher of Climatic Change were approached about the location of the SI and neither of them knew anything about it. They said to contact the authors.

All of this strongly suggests to me that the peer reviewers did not have access to the SI – otherwise wouldn’t the editor and/or publisher have had access to it in order to supply it to the reviewers?

This says something about the quality of peer review for Team articles – none of the reviewers even bothered to ask for the SI that was cited in the articles or even bothered to determine whether the SI was in existence. I have no doubt that Ammann did up some sort of SI and that the SI will ultimately see the light of day – but shouldn’t the peer reviewers have considered the SI when they were reviewing the article?

BTW, I contacted Ammann about this and another question. The other question was:

In Ammann and Wahl 2007, you state without a reference:

Standard practice in climatology uses the red-noise persistence of the target series (here hemispheric temperature) in the calibration period to establish a null-model threshold for reconstruction skill in the independent verification period, which is the methodology used by MBH in a Monte Carlo framework to establish a verification RE threshold of zero at the >99% significance level.

Can you please provide me with supporting references demonstrating the validity of this statement?

Ammann promptly gave a very surly response.

I must say that I find it disturbing that you don’t appear to acknowledge scientific arguments as such, rather you dismiss anything we say and reduce it to below “low level authority”. Under such circumstances, why would I even bother answering your questions, isn’t that just lost time?

I wrote a very measured response urging him to re-consider this refusal, to which Ammann did not initially reply. I then sent a follow-up, copying Nychka, and Ammann said this time that he would answer in a few days but he had other “pressing” business. That was a few days ago, so it will be interesting to see what his response is.

GISS Adjusts the Heartland

One of the surfacestations.org intrepid traveling volunteers, Eric Gamberg, has been traveling through Nebraska as of late, picking up stations as he goes.

He recently visited the USHCN station of record, COOP #256040, in North Loup, NE, not to be confused with Loup City, which he also visited. Records describe this station as being in a rural area, which is true. As some might say, it is surrounded by a “whole lotta nothing”. See the map. According to the Nebraska Home Town Locator website: “North Loup had a population of 339 with 192 housing units; a land area of 0.41 sq.” Seems quite small.

At first glance, there doesn’t appear to be much wrong with this station.


Looking South

No obvious heat sources or asphalt/concrete nearby. But let’s take a look at another angle:
Continue reading

Hurricanes 2008

Discussion of 2008 Hurricane seasonal forecasts, old and new research papers, and interpretation of trends over the entire globe.

Ice Ages

Discussion of tectonic changes and climate change over long (million+) year periods. Note recent discussion by Hansen of climate change on 50 million year time frame.

Unthreaded #33

Continued

Svalgaard #5

Continued from here.

The MBH AD1450 Network

Most of my previous discussion of MBH pertained to the AD1400 network. In recent discussion over at Tamino, some of the posters have stated that BCPs only matter for the AD1400 network and that everything is fine for the AD1450 and later networks, relying here on statements in Wahl and Ammann 2007. (I don’t suppose any reviewers at Climatic Change actually checked the Wahl and Ammann calculations.) Here is a sampling of the sort of statement about 1450 that’s becoming a mantra:

W&A tested for removal of bristlecones from the 1450 network and found that it passed significance testing. BTW, the 1450 network could actually be used back to 1428 if any real climate scientist (or anyone else for that matter) thought it was worth bothering with. …

Saying that “bristlecones improve the data quality of that stage (1400-1499)” is misleading because it omits the important fact that the bristlecones don’t enhance the data quality when PC summaries are used (as in MBH98) over 1450-1499….

I just want everyone to be clear that the bristlecones don’t become essential in MBH9x until before 1450, and if any scientist thought it was worth bothering with recalculation, they could actually be left out back to 1428….

There is a perfectly good hockeystick without bcps after 1450 (probably also after 1428 if anyone could be bothered checking). The only thing the bcps add is a longer shaft….

Chop off those 50 years, and you have a hockey stick from 1450-present that’s robust in the absence of the BCPs.

Now in one sense, it seems to me that salvaging the post-1450 network would be rather hollow. MBH99 splices onto MBH98 and it’s not just the period 1400-1450 that would be discarded, but the period from 1000-1450 or nearly half the results, including the MWP results. So it would be impossible to make any claims one way or the other about the “warmest year in the millennium”, which was where we started.

One preamble point of definition about “BCP removal”. TCO observed in the Tamino discussion that “BCP removal” for the purposes of the understanding that he is seeking is Mann’s “Censored” network plus Gaspe. TCO:

Pedant caveat: when I say bcp removal that is shorthand for “bcp cum Gaspe” removal (the trial data manipulation of Mann’s “Censored” directory).

Note: Whether combination of Gaspe with bcp is reasonable or too much as a robustness test is also a debatable position. I don’t have a firm position on that (although it’s interesting that Mike thought to perform it). But my main concern with this clarification is to preempt any Groundhog Day-like repetition of the snarky comments that it’s not just bcps.

This is the same definition that I use. MM2005b referred to the Censored directory as studying “a small group of 20 primarily bristlecone pine sites, all but one of which were collected by Donald Graybill and which exhibit an unexplained 20th century growth spurt”. While the Censored directory is “primarily” Graybill bristlecone sites, it also includes strip bark foxtail and limber pine sites. In the analysis below, “No Bristle” means that none of the 20 Censored sites (or cana036 – which is a duplicate use of Gaspé) are used.

Another of Tamino’s posters purported to explain the supposed difference between the 1450 network and the 1400 network as follows:

The reconstruction for 1450-1980 contained many more proxies of different types and was therefore robust. As were the later ones.

Many commenters are at a disadvantage in this debate, because they haven’t actually gone to the trouble of learning what’s in the various networks. In fact, the AD1450 network only has 3 more proxies entering into the regression calculation than the AD1400 network: Jacoby’s Coppermine series (cana153), the Vaganov PC1 and a tree ring site in Mexico (mexi001) that is similar in character to series from the Stahle SWM network. None of these 3 series have a relevant HS shape (even though the Vaganov PC1 was calculated using Mannian pseudo-PCA). The AD1450 network does not contain “many more proxies” of “different” types. It contains a very few additional tree ring series. In the NOAMER network, the number of sites increases from 70 to 86, but similarly none of the additional sites have relevant HS-ness.

So why does the AD1450 network supposedly yield “different” results than the AD1400 network? Continue reading

More on Li, Nychka and Ammann

by Hu McCulloch

A recent discussion of the 2007 Tellus paper by Bo Li, Douglas Nychka and Caspar Ammann, “The ‘hockey stick’ and the 1990s: a statistical perspective on reconstructing hemispheric temperatures,” at OSU by Emily Kang and Tao Shi has prompted me to revive the discussion of it with some new observations. A PDF of the paper is online at Li’s NCAR site.

The 8/29/07 CA thread MBH Proxies in Bo Li et al has already discussed certain data issues in depth, but the 11/18/07 general discussion Li et al 2007 never really got off the ground. (See also a few comments by Jean S and others in an unrelated CA thread.)

The lead and corresponding author, Bo Li, is an energetic young statistician (PhD 2006) at NCAR. Presumably her role was to bring some new statistical techniques to bear on the problem, while the two senior authors provided most of the climatology know-how. I’ll start with my positive comments, and then offer some criticisms.

My first positive comment is that the paper recalibrates the 14 MBH99 1000 AD proxies using the full instrumental period of 1850-1980, rather than just 1902-1980 as in MBH. Cross-validation with subperiods is a fine check on the results (see below), but in the end, if you really believe the model, you should use all the data you can find to estimate it.

Secondly, I was happy to see that the authors simply calibrated the MBH proxies directly to NH temperature rather than to intermediate temperature PC’s as in MBH. Perhaps someone can correct me if I’m wrong, but I don’t see the point of the intermediate temperature PC’s if all you are ultimately interested in is NH (or global) temperature.

Third, although the cumbersome Monte Carlo simulation at first puzzled me, because confidence limits for the individual temperature reconstructions are calculable in closed form, it eventually became clear that this was done because the ultimate goal of the paper is to investigate the characteristically MBH question of whether current temperatures are higher than any previous temperature over the past 1000 years. A confidence interval for each year (or decade or whatever) does not answer this, because it merely places a probability on whether that particular year’s temperature was higher or lower than any given level. There is no closed form way to test the much more complex hypothesis that current temperatures are higher than every past temperature. Monte Carlo simulations of past temperatures based on perturbations of the estimated coefficients potentially enable one to place probabilities on such statements. (I would call these “simulations” rather than “ensembles”, but perhaps that’s just a matter of taste.)
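As a toy illustration of why this question calls for simulation rather than per-year confidence intervals, here is a minimal numpy sketch. All of the numbers (reconstruction level, standard error, the “current” temperature) are made up for illustration and are not taken from LNA:

```python
# Toy illustration (synthetic numbers, not the LNA reconstruction) of why
# "is the current value higher than EVERY past value?" needs simulation:
# per-year confidence intervals bound one year at a time, whereas the event
# of interest involves the maximum over all past years.
import numpy as np

rng = np.random.default_rng(1)
n_years, n_sims = 1000, 10_000
recon = np.full(n_years, -0.15)   # flat reconstruction at -0.15 dC (assumed)
se = 0.2                          # assumed per-year standard error
current = 0.4                     # assumed "current" temperature

# Draw whole past-temperature paths consistent with the reconstruction errors
sims = recon + se * rng.standard_normal((n_sims, n_years))
p_record = np.mean(sims.max(axis=1) < current)
p_single = np.mean(sims[:, 0] < current)   # naive per-year answer

print(f"P(current exceeds every past year) ~ {p_record:.3f}")
print(f"P(current exceeds one given year)  ~ {p_single:.3f}")
```

The per-year probability is much higher than the probability of exceeding the whole past record, which is exactly why the simulation approach is needed for the MBH-style claim.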

And fourth, the authors explicitly check for serial correlation in the residuals of their regression, find that it is present as AR(2), and incorporate it appropriately into the model. MBH seem to have overlooked this potentially important problem entirely.

Unfortunately, I see a number of problems with the paper as well.

My first criticism is that the authors start with an equation (their (1)) that is supposed to relate NH temperature to a linear combination of the MBH proxies, but then provide no indication whatsoever of the significance of the coefficients.

The whole difference between statistics and astrology is supposed to be that statisticians make statements of statistical significance to determine how likely or unlikely it is that an observed outcome could have happened by chance, while astrologers are satisfied with merely anecdotal confirmation of their hypotheses. Perhaps climate scientists like MBH can’t be expected to know about statistical significance, but the omission is doubly glaring in the present paper, since Nychka is a statistician as well as Li.

The MBH proxies are perhaps sufficiently multicollinear that none of the individual t-statistics is significant. However, it is elementary to test the joint hypothesis that all the coefficients (apart from the intercept — see below) are zero with an F statistic. This particular test is known, in fact, as the Regression F Statistic. The serial correlation slightly complicates its calculation, but this is elementary once the serial correlation has been estimated. Or, since the whole system is being estimated by ML, an asymptotically valid Likelihood Ratio statistic can be used in its place, with what should be similar results. (My own preference would be to take the estimated AR coefficients as true, and compute the exact GLS F statistic instead.)
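For readers who want the mechanics, here is a minimal sketch of the Regression F statistic on synthetic data. It ignores the AR(2) correction just mentioned (which would require a GLS transformation first), and the sample size and proxy count are illustrative only:

```python
# Minimal sketch of the regression F statistic for H0: all slope coefficients
# are zero (synthetic data; the AR(2)/GLS refinement discussed in the text is
# omitted for brevity).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 131, 14                                  # e.g. 1850-1980, 14 proxies
X = rng.standard_normal((n, k))
y = 0.1 * X[:, 0] + rng.standard_normal(n)      # weak signal in one regressor

Xc = np.column_stack([np.ones(n), X])           # add intercept
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
resid = y - Xc @ beta

ssr_full = resid @ resid
ssr_restricted = np.sum((y - y.mean()) ** 2)    # intercept-only model
F = ((ssr_restricted - ssr_full) / k) / (ssr_full / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)

print(f"Regression F = {F:.2f}, p = {p_value:.3f}")
```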

Since for all we know all the slope coefficients in (1) are 0, the paper’s reconstruction may just be telling us that given these proxies, a constant at the average of the calibration period (about -.15 dC relative to 1961-90 = 0) is just as good a guess as anything. Indeed, it doesn’t differ much from a flat line at -.15 dC plus noise, so perhaps this is what is going on. This is not to say that there might not be valid temperature proxies out there, but this paper does nothing to establish that the 14 MBH99 proxies are among them. (Nor did MBH, for that matter.)

My second criticism deals with the form of their (1) itself. As UC has already pointed out, this equation inappropriately puts the independent variable (temperature) on the left hand side, and the dependent variables (the tree ring and other proxies) on the right hand side. Perhaps tree ring widths and other proxies respond to temperature, but surely global (or even NH) temperature does not respond to tree ring widths or ice core isotope ratios.

Instead, the proxies should be individually regressed on temperature during the calibration period, and then, if (and only if) the coefficients are jointly significantly different from zero, reconstructed temperatures backed out of the proxy values using these coefficients. This is the “Classical Calibration Estimation” (CCE) discussed by UC in the 11/25/07 thread “UC on CCE”. The method is described in PJ Brown’s paper in the Journal of the Royal Statistical Society B, 1982 (SM – now online here), and is also summarized on UC’s blog. See my post #78 on the “UC on CCE” thread for a proposed alternative to Brown’s method of computing confidence intervals for temperature in the multivariate case.

In this CCE approach, the joint hypothesis that all the slope coefficients are 0 is no longer the standard Regression F statistic. Nevertheless, it can still be tested with an F statistic constructed using the estimated covariance matrix of the residuals of the individual proxy calibration regressions. Brown shows, using a complicated argument involving Wishart distributions, that this test is exactly F in the case where all the variances and covariances are estimated without restriction.
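For concreteness, here is a stripped-down sketch of the CCE idea with a single synthetic proxy. The multivariate case, the joint F test, and Brown-style confidence intervals are considerably more involved; every number here is invented:

```python
# Stripped-down sketch of Classical Calibration Estimation (CCE) with one
# synthetic proxy: regress the proxy ON temperature in the calibration
# period, then invert the fitted relation to back out temperature in the
# reconstruction period.
import numpy as np

rng = np.random.default_rng(3)

# Calibration period: the proxy responds (noisily) to temperature
T_cal = rng.normal(0.0, 0.3, size=131)
proxy_cal = 1.0 + 2.0 * T_cal + rng.normal(0.0, 0.5, size=131)

# Fit proxy = a + b * T  (proxy on the left, temperature on the right)
b, a = np.polyfit(T_cal, proxy_cal, 1)

# Reconstruction period: only the proxy is observed; invert the calibration
proxy_past = 1.0 + 2.0 * rng.normal(-0.2, 0.3, size=50) + rng.normal(0.0, 0.5, size=50)
T_recon = (proxy_past - a) / b

print(f"Fitted calibration: proxy = {a:.2f} + {b:.2f} * T")
print("First few reconstructed temperatures:", np.round(T_recon[:5], 2))
```

The contrast with LNA's approach is that the regression is run in the causal direction (proxy on temperature) and only then inverted, rather than regressing temperature directly on the proxies.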

To MBH’s credit, they at least employed a form of the CCE calibration approach, rather than LNA’s direct regression of temperature on proxies. There are, nevertheless, serious problems with their use (or non-use) of the covariance matrix of the residuals. See, eg, “Squared Weights in MBH98” and “An Example of MBH ‘Robustness’”.

One pleasant dividend that I would expect from using CCE instead of direct regression (UC’s “ICE” or Inverse Calibration Estimation) is that there is likely to be far less serial correlation in the errors.

My third criticism is that even though 9 of the 14 proxies are treerings, and even though the authors acknowledge (p. 597) that “the increase of CO2 may accelerate the growth of trees”, they make no attempt to control for atmospheric CO2. Data from Law Dome is readily available from CDIAC from 1010 to 1975, and this can easily be spliced into the annual data from Mauna Loa from 1958 to the present. This could simply be added to the authors’ equation (1) as an additional “explanatory” variable for temperature, or better yet, added to each CCE regression of a treering proxy on temperature. Since, as MBH99 themselves point out, the fertilization effect is likely to be nonlinear, log(CO2), or even a quadratic in log(CO2), might be appropriate.
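A sketch of what such a control might look like for a single proxy calibration regression (illustrative synthetic numbers only; the real exercise would splice actual Law Dome and Mauna Loa CO2 into the calibration period):

```python
# Sketch of adding log(CO2) as a control in a proxy-on-temperature calibration
# regression (synthetic stand-in numbers; not actual Law Dome / Mauna Loa or
# proxy data).
import numpy as np

rng = np.random.default_rng(4)
n = 131
T = rng.normal(0.0, 0.3, n)                     # calibration temperatures
co2 = np.linspace(285.0, 340.0, n)              # stand-in CO2 ramp, ppm
ring_width = 1.0 + 1.5 * T + 0.8 * np.log(co2 / 280.0) + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), T, np.log(co2 / 280.0)])
beta, *_ = np.linalg.lstsq(X, ring_width, rcond=None)
print("intercept, temperature, log(CO2) coefficients:", np.round(beta, 2))
```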

Of course, any apparent significance to the treering proxies that is present will most likely immediately disappear once CO2 is included, but in that case so much for treerings as temperature proxies…

A fourth big problem is the data itself. The whole point of this paper is just to try out a new statistical technique on what supposedly is a universally acclaimed data set, so LNA understandably take it as given. Nevertheless, their conclusions are only as good as their data.

The authors do acknowledge that “many specific criticisms on MBH98” have been raised, but they do not provide a single reference to the papers that raised these objections. McIntyre and McKitrick 2003, MM05GRL and MM05EE come to mind. Instead, they cite no less than 6 papers that supposedly have examined these unspecified objections and found that “only minor corrections were found to be necessary” to address the phantom-like concerns. LNA have no obligation to cite McIntyre and McKitrick, but it is rather tacky of them not to when they cite the replies.

Many specific data problems have already been discussed on CA on the thread MBH Proxies in Bo Li et al and elsewhere on CA. See, in particular, the numerous threads on Bristlecone Pines. I might add that it is odd that MBH include no less than 4 proxies (out of 14) from the South American site Quelccaya in what is supposedly a study of NH temperature.

A fifth problem is with the authors’ variance “inflation factor”. They commendably investigate k-fold cross-validation of the observations with k = 10 in order to see if the prediction errors for the withheld samples are larger than they should be. This makes a lot more sense to me than the “CE” and “RE” “validation statistics” favored by MBH and others. However, they appear to believe that the prediction errors should be of the same size as the regression errors. They find that they are in fact about 1.30 times as large, and hence inflate the variance of their Monte Carlo simulations by the same factor.

In fact, elementary regression analysis tells us that if an OLS regression

y = X\beta + e

has error variance \sigma^2, then the variance of the prediction error for an out-of-sample value y^* using a row vector of regressors x^* should be (1 + x^* (X^{T} X)^{-1} x^{*T})\sigma^2, not \sigma^2 itself.

Since the quadratic form in x^* is necessarily positive, the factor is necessarily greater than unity, not unity itself. Exactly how much bigger it is depends on the nature and number of the regressors and the sample size. Using 10,000 Monte Carlo replications, with estimation sample size 117 and 14 regressors plus a constant, I find that if the X’s are iid N(0,1), the factor is on average 1.15, substantially greater than unity. If the X’s are AR(1) with coefficient \rho = .8, and x^* is the next element of this process, the factor rises to 1.19. With \rho = .9, it is 1.22 (corrected 11/17/12; originally given as 2.18), and if the X’s are a random walk (\rho = 1), it becomes 1.28, essentially the 1.30 found by LNA. So in fact their “inflation factor” may simply be what is expected when the model is perfectly valid. (These simulations take only about 10 seconds each on a PC in GAUSS.)
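For anyone who wants to reproduce this kind of calculation without GAUSS, here is a minimal Python re-creation of the simulation described above. Exact values will depend on the random seed and on details such as how the AR(1) series are initialized, but the factors should be well above unity and rise with the persistence of the X’s:

```python
# Minimal Python re-creation of the simulation described above (the original
# was run in GAUSS): the expected prediction-error variance factor
# 1 + x* (X'X)^{-1} x*' for an out-of-sample regressor row x*, with the X's
# generated as iid (rho = 0), AR(1), or random-walk (rho = 1) series.
import numpy as np

def avg_inflation(rho, n=117, k=14, reps=10_000, seed=0):
    # reps can be reduced (e.g. to 2_000) for a faster but noisier estimate
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        e = rng.standard_normal((n + 1, k))
        X = np.empty_like(e)
        X[0] = e[0]
        for t in range(1, n + 1):            # AR(1) recursion for each column
            X[t] = rho * X[t - 1] + e[t]
        Xc = np.column_stack([np.ones(n), X[:n]])   # estimation sample + constant
        x_star = np.concatenate([[1.0], X[n]])      # next out-of-sample row
        total += 1.0 + x_star @ np.linalg.solve(Xc.T @ Xc, x_star)
    return total / reps

for rho in (0.0, 0.8, 0.9, 1.0):
    print(f"rho = {rho}: average factor ~ {avg_inflation(rho):.2f}")
```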

If the variance of the cross-validation prediction errors were significantly greater than it should be according to the above formula, that in itself would invalidate the model, instead of calling for an ad hoc inflation of the variance by the discrepancy. The problem could be a data error (eg 275 punched in as 725), non-Gaussian residuals, non-linearity, or a host of other possibilities. In fact, the authors make no attempt to perform such a test. I don’t know whether it has an exact F distribution, but with two statisticians on board it shouldn’t have been hard to figure it out.

Although LNA make no use of the above prediction error variance formula in their reconstruction period, it is in fact implicit in their Monte Carlo simulations, since in each simulation they generate random data using the estimated parameters, and then re-estimate the parameters using the synthetic data and use these perturbed estimates to construct one simulated reconstruction. They therefore did not need to incorporate their inflation factor into the simulations. They should instead have just used it to test the validity of their model.

One last small but important point is that their (1) does not include a constant term, but definitely needs one. Even if they de-meaned all their variables before running it so that the estimated constant will be identically 0, this still is a very uncertain “0”, with the same standard error as if the dependent variable had not been demeaned, and will be an important part of the simulation error.

Anyway, a very interesting and stimulating article. I wish Dr. Li great success in her career!

Rewriting History, Time and Time Again

Update:

As noted in the comments below, GISS updated the GLB.Ts+dSST anomalies which show a large 0.67 degC value for March. This addition of March 2008 temperature data to the record caused a corresponding drop in annual average temperature for the years 1946 and 1903. According to GISS, 1946 is now colder than 1960 and 1972, and 1903 dropped into a tie with 1885, 1910 and 1912.

That’s really neat.

End update.

In February I wrote a post asking How much Estimation is too much Estimation? I pointed out that a large number of station records contained estimates for the annual average. Furthermore, the number of stations used to calculate the annual average had been dropping precipitously for the past 20 years. One was left to wonder just how accurate the reported global average really was and how meaningful rankings of the warmest years had become.

One question that popped into my mind back then was whether or not – with all of the estimation going on – the historical record was static. One could reasonably expect that the record is static. After all, once an estimate for a given year is calculated there is no reason to change it, correct? That would be true if your estimate did not rely on new data added to the record, in particular temperatures collected at a future date. But in the case of GISStemp, this is exactly what is done.

Last September I noted that an estimate of a seasonal or quarterly temperature when one month is missing from the record depends heavily on averages for all three months in that quarter. This can be expressed by the following equation, where {m}_{a}, {m}_{b}, {m}_{c} are the months in the quarter (in no particular order) and one of the three months {m}_{a} is missing:

T_{q,n} = \frac{1}{3}\,\overline{T}_{m_a,N} + \frac{1}{2}\left(T_{m_b,n} + T_{m_c,n}\right) - \frac{1}{6}\left(\overline{T}_{m_b,N} + \overline{T}_{m_c,N}\right)

In the above, T is temperature, q is the given quarter, n is the given year, and N is all years of the record.

One can readily see that as new temperatures are added to the record, the average monthly temperatures will change. Because those average monthly temperatures change, the estimated quarterly temperatures will change, as will the estimated annual averages.
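A small numerical illustration of this ripple effect, using the quarterly estimator given above with made-up monthly temperatures (not actual GISS data):

```python
# Small numerical illustration (made-up temperatures, not GISS data) of how
# appending a later year to the record changes an earlier year's estimated
# quarterly mean when one month of that quarter is missing. Implements:
#   T_q,n = (1/3)*mean(T_ma over all years)
#         + (1/2)*(T_mb,n + T_mc,n)
#         - (1/6)*(mean(T_mb over all years) + mean(T_mc over all years))
import numpy as np

def quarterly_estimate(ma_all, mb_all, mc_all, year_idx):
    """Estimated quarterly mean for `year_idx` when month a is missing that year."""
    return (np.mean(ma_all) / 3.0
            + (mb_all[year_idx] + mc_all[year_idx]) / 2.0
            - (np.mean(mb_all) + np.mean(mc_all)) / 6.0)

# Month a is missing in year 0, so it is only observed in years 1..3;
# months b and c are observed in every year (degC).
ma = np.array([-1.5, -3.0, -2.5])            # years 1..3
mb = np.array([ 1.0,  2.0,  0.5,  1.5])      # years 0..3
mc = np.array([ 5.0,  6.0,  4.5,  5.5])

before = quarterly_estimate(ma, mb, mc, year_idx=0)

# A new, warm year is appended; every long-term monthly mean shifts, and so
# does the estimate for year 0, even though year 0's own data never changed.
after = quarterly_estimate(np.append(ma, -0.5), np.append(mb, 3.0),
                           np.append(mc, 7.0), year_idx=0)

print(f"Year-0 quarterly estimate before the new year: {before:.3f} degC")
print(f"Year-0 quarterly estimate after the new year:  {after:.3f} degC")
```

The year-0 estimate moves simply because a later year was added to the record, which is the ripple effect described above.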

Interestingly, application of the “bias method” used to combine a station’s scribal records can have a ripple effect all the way back to the beginning of a station’s history. This is because the first annual average in every scribal record is estimated, and the bias method relies on the overlap between all years of record, estimated or not. Recall that annual averages are calculated from December of the prior year through November of the current year. However, all scribal records begin in January (well, I have not found one that does not begin in January), so that first winter average is estimated due to the missing December value. Thus, with the bias method, at least one of the two records contains estimated annual values.

Of course, it is fair to ask whether or not this ultimately has any effect on the global annual averages reported by GISS. One does not have to look very hard to find out that the answer is “yes”. Continue reading