Can anyone on the Team actually hit a target?

A couple of days ago, I reported that Santer’s own method yielded failed t-tests on UAH when data up to 2008 (or even 2007) was used. I also reported that their SI (carried out in 2008) included a sensitivity test on their H1 hypothesis up to 2006, but they neglected to either carry out or report the corresponding test on the H2 hypothesis (the results reported in Table III).

The reason why the H2 test failed using up-to-date data is not that the trend changed materially from the 1979-99 trend, but that the AR1-style uncertainty decreased a lot. While I had calculated the changes in the SE (for the observed trend) in order to make these calculations, I didn't discuss them. For reference, the SE(observed trend) for 1979-99 (Santer Table 1, which I've replicated using their methods) was 0.138 deg C/decade. Comparing apples to apples, using information up to 2007 (available to Santer and his Team as of the submission of their article), the SE(observed trend) had decreased to 0.067 deg C/decade.

The “correct” CI, according to the **actual** methods of Santer et al, is obtained by taking the Pythagorean sum of the SE(observed trend) (0.138 in their analysis) and the much-decried SE of the inter-model mean (0.092/sqrt(19-1) = 0.0217), giving 0.14 deg C/decade.
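As a sanity check, the combined uncertainty can be computed directly (a minimal sketch; the 0.138 and 0.092 inputs are the Santer et al. values quoted above):

```python
import math

# Pythagorean (quadrature) combination of the two standard errors,
# per the Santer et al. method described above.
se_obs = 0.138           # SE of the observed UAH trend, 1979-99 (deg C/decade)
sd_models = 0.092        # inter-model standard deviation of trends
n_models = 19            # number of models in the ensemble
se_model_mean = sd_models / math.sqrt(n_models - 1)   # SE of the model mean

se_combined = math.sqrt(se_obs**2 + se_model_mean**2)
print(round(se_model_mean, 4))   # 0.0217
print(round(se_combined, 2))     # 0.14
```

The inter-model term is so small that the combined SE is almost entirely dominated by the observational term.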

Santer et al. state that Douglass et al had ignored uncertainty in the observations:

> DCPS07 ignore the pronounced influence of interannual variability on the observed trend (see Figure 2A). They make the implicit (and incorrect) assumption that the externally forced component in the observations is perfectly known (i.e. the observed record consists only of φ_o(t), and η_o(t) = 0).

Most readers have taken this for granted. But let’s see what Douglass et al themselves say that they did:

> Agreement means that an observed value’s stated uncertainty overlaps the 2σSE uncertainty of the models.

In examples in the running text of Douglass et al, one can see applications of this procedure. For example:

> Given the trend uncertainty of 0.04 °C/decade quoted by Free et al. (2005) and the estimate of ±0.07 for HadAT2, the uncertainties do not overlap according to the 2σSE test and are thus in disagreement.

So it’s a bit premature to take at face value Santer’s allegation that Douglass et al “ignored” trend uncertainty in the observations. Douglass et al **say** that they considered trend uncertainty in the observations (I haven’t parsed this in their article yet). For now, let’s compare the uncertainties that Douglass said they used with uncertainties calculated according to the method Santer said should be used. Here’s what Douglass et al said about trend uncertainty for the UAH T2LT that we’ve been looking at:

> For T2LT, Christy et al. (2007) give a tropical precision of ±0.07 °C/decade, based on internal data-processing choices and external comparison with six additional datasets, which all agreed with UAH to within ±0.04. Mears and Wentz (2005) estimate the tropical RSS T2LT error range as ±0.09.

Do you recognize any numbers here? Look at the trend uncertainty for UAH T2LT up to the end of 2007 – calculated according to Santer’s own methods – **0.067 deg C/decade**. Compare to the Douglass number – **±0.07 °C/decade**.

So if you use up-to-date information in calculating the trend uncertainty according to the Santer method, you get the same value (actually a titch less) as the one that Douglass et al say they used (0.07). I suppose that Santer might then try to *argue* that the Douglass et al estimate of 0.07 deg C covers *different* uncertainties than their 0.07 deg C, but what’s the evidence? And since this is a critical point, shouldn’t Santer have proved the point with some sort of exegesis?

[Update - Oct 26] John Christy writes in with the following exegesis of the topic:

In our paper, we addressed “measurement” uncertainty – i.e. how precise the trend really is given the measurement problems with the systems (I introduced the separation of these two types of uncertainty in the IPCC TAR text in the upper air temperature section.) We wanted to know how accurate the observational trend estimates were.

The subtle point here is that “temporal” or “statistical” uncertainty is not needed because of the precondition that the surface trends of the observations and the model simulations be the same. Sure, you can create a surface trend of +0.13 C/decade from scrambling the El Ninos and volcanoes any way you like, but when the surface trend comes out to +0.13 it will be accompanied by (in the models) a specific tropospheric trend – no need for the “temporal” uncertainty. When you scramble the interannual variations and get a trend different from +0.13, then we don’t want to use it in our specific experimental design.

This notion of the precondition seems lost in the discussion, but when understood, the arguments of Santer17 become irrelevant.

On this information, I’m not quite sure what the ±0.07 °C/decade in Douglass et al would represent in Santer terms. At this point, it *appears* that the Santer accusation that Douglass et al had “ignored” this form of uncertainty is incorrect, but the way that they handled it is, as Christy says, sufficiently “subtle” that, for me to express an opinion on whether the Douglass method is adequate in the context of Santer’s data, I’d need to emulate exactly what was done on Santer’s data and see how it works.

At present, I’ve requested the 49 Santer runs as used, but have received no response from Santer other than that he is at a workshop. The Santer coauthor with whom I’ve cordially corresponded did not personally ever receive a copy of the data and indicates that Santer will probably take the position that I be obliged to run the irrelevant gauntlet of trying to reconstruct his dataset from first principles, even to do simple statistical tests – the sort of petty silliness that is incomprehensible to the public. [End update]

PS. There’s one other issue raised in Santer (citing Lanzante 2005). Lanzante 2005 observed that the denominator when combining different uncertainties is a Pythagorean sum (sqrt(ssq)) and that a t-test using this gives different results than a visual comparison of overlap of uncertainties. If one uncertainty is a lot larger than the other, the Pythagorean sum ends up being dominated by the larger uncertainty.

For example, in the Santer Table III example, sqrt(0.138^2 + 0.0217^2) = 0.1397, only 1% larger than the greater uncertainty by itself. A visual comparison in which the overlap of separate 2-sigma intervals is inspected implies a somewhat more onerous t-test than using t=2 against the Pythagorean sum. Lanzante 2005 criticized these sorts of visual comparisons in IPCC TAR and many climate articles, but the practice continues.
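Numerically (using the SE values quoted earlier, with the inter-model term as 0.0217), the two criteria compare as follows. This is only an illustration of Lanzante's point, not a replication of his analysis:

```python
import math

se1, se2 = 0.138, 0.0217
se_pyth = math.sqrt(se1**2 + se2**2)   # ~0.1397, only ~1% above se1 alone

# t = 2 criterion on the Pythagorean sum: the difference must exceed
d_ttest = 2 * se_pyth                  # ~0.279

# "visual overlap" criterion: the separate 2-sigma intervals must not
# overlap, i.e. the difference must exceed 2*se1 + 2*se2
d_overlap = 2 * se1 + 2 * se2          # ~0.319

print(d_overlap > d_ttest)   # True: the overlap test is more onerous
```

When one uncertainty dominates, the Pythagorean sum barely exceeds the larger term, while the overlap criterion adds the two intervals outright.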

While the point is valid enough against Douglass, it is equally so against dozens of other studies; however, in the matter at hand, the issue does not appear to be material in any event.

## 68 Comments

Sloppy. When you get down to it, the best one word description for most climate “science” studies is sloppy.

I’m no statistician. But doesn’t the series of posts on Santer et al, and those on Mann et al beforehand, point to a considerable problem with the peer review process?

The peer review process is what it is. As due diligence, it’s very cursory; it’s not an “audit”. My main issue is that all too often in climate science, the pea gets moved under the thimble in the sense that cursory journal peer review process is treated as though the equivalent of an audit.

Attempting to do adequate pre-publication due diligence on articles would be impossible under the present system.

That’s why I stress that journals should require authors to archive data and code so that due diligence is possible without having to run the gauntlet of author obstruction and stonewalling. There’s nothing impossible in such a system. It already exists in econometrics.

If the global warming community is going to open up their confidence levels for testing models versus observations, then they must also open up their confidence intervals on the sensitivity of GHGs and CO2 that models produce.

They must change the expected warming impact from doubled GHGs to 0.0C to 6.0C per doubling.

You can’t have it both ways obviously. You can’t say we expect 3.0C +/- 1.0C per doubling but then simultaneously argue that anything within 0.0C to 6.0C in observations is consistent.

And the shorter length of the observation dataset (versus the time expected for doubled GHGs to have their full impact) should not be relevant here, since the models are projecting 100 years into the future, which should, in fact, increase the uncertainty rather than confine it to a lower +/- 1.0C.

The logic here just seems entirely backwards.

Wow, very nice.

Ah! Another climate science newbie.

When was the last time you heard that a weather or climatic event was

inconsistent with the models? Me neither.

Re: John A (#6),

The widespread 2007 summer floods in England were described as “not linked to climate change“, but as simple “freak weather”, since the models had predicted drier summers. That may have been because the models had earlier been run after a couple of drier summers.

However, the UK Government review Learning lessons from the 2007 floods managed to twist this by saying

and then talk about “climate change” 140 more times in the rest of the report.

#2 Dave, I agree with Steve. When I have been a peer reviewer (in a different field of science), I have nearly always found “show stoppers” that partly or completely negated the main conclusions. But, usually I dug deep — once producing a counter-report in more mathematical detail than the original I was reviewing. Most peer reviewers I have observed directly seem to spend very little time — sometimes just an afternoon — particularly if they are in general agreement with the conclusions. And, to be fair, a couple times I took a “pass” too.

In my post above, I stated:

John Christy wrote me asking me to provide some of the suggested “exegesis”. He writes:

I will add a note to this effect in the thread. However, at this point, I am unable to test this procedure against the Santer ensemble because Santer did not archive his data as used and thus far, I have received no response to my request for the data other than that he was at a workshop.

Just to add to the previous points about peer review, that has been my experience as well, both as a submitter and as a reviewer.

What science ultimately counts on is the “self-correcting” nature of publishing, at least in important areas worth paying attention to. The problem, as has become quite evident, is that in fields with heavy policy implications, the self-correction is often not given time to run its course.

The field of dendroclimatology seems particularly peculiar, since practitioners in the field not only seem to resent close review, but leave it to “amateurs”, “retired mining executives”, and (worst of all) “Canadians” (gasp!) to do the job.

Do journals accept papers that dispute other papers? If so, who cares about peer review?

As a layperson attempting to comprehend all the nuanced meanings of the Santer et al. (2008) paper (and as a consequence reviewing what the Douglass et al. (2007) paper was saying), I must admit that I have thoroughly enjoyed the analysis that Steve M, with the help of others and in particular Lucia, has provided in these threads. It made me go back and carefully reread the Santer paper and SI. What a splendid relief this has been as something to make one think, as contrasted with all the banal crap to which I have been exposed with the prattle of politicians in the current US Presidential election cycle and the talking heads’ over-simplified explanations and emotional reactions to the current financial crisis.

Although I suspect that the analyses have been sufficiently comprehensive that it is all there to see in these Santer threads, for the purposes of order, this grumpy old man likes to see it spelled out in one place. To that end I will outline below my layperson’s summary of the analysis to date.

I think we can sum this up by attempting to answer the question: What was Douglass attempting to show as opposed to the Santer approach?

Douglass was looking at the differences/ratios between the tropical temperature trends at the surface and in the troposphere for model and observed results. Santer spends most of its paper (see the title for further evidence) discussing why the Douglass approach will not work when comparing the modeled and observed troposphere temperature trends. Santer finally discusses the issues of the surface to troposphere differences, which radically changes the results for the UAH to model result comparison for H1. I searched the Santer SI for more discussion on how the difference statistical tests were calculated, without success.

Douglass points to the use of surface to troposphere ratios and was able, as I recall, to use the troposphere trends in their comparison because fortuitously the surface trends in modeled and observed were essentially the same. Douglass also points to not using modeled results that do not show a reasonably agreed-upon surface trend, i.e. zero or negative.

The H2 hypothesis test is the only one in Santer relating to what Douglass did, i.e. using the model mean for comparison with the observed. That test, in order to compare with Douglass, would have to have been carried out with troposphere to surface trend differences/ratios, and I do not recall that test being done in Santer.

Santer, like Douglass, uses the model mean with SEM, but instead of using that alone to compare to an established target observed trend (ratio) with its established or reasonable precision limits minus the model mean, Santer uses the standard deviation of the regressed surface trend. Now this is where the Beaker and Gavin Schmidt concept comes in, except that Santer does not use a standard deviation of the model results, as Beaker/Schmidt prescribed, but the observed trend standard deviation. I suspect that Beaker and Gavin saw one observed result to compare the several model results against, did not immediately see how one could derive a standard deviation from the observed, and thus decided that the standard deviation should come from the model results. The overall effect, as I see it, is practically the same if one assumes the model and observed standard deviations are the same.

In the light of the Beaker concept of the uncertain content of a single realization of the actual earthly climate, the choice of Santer to use the observed trend standard deviation seems like a convenient but not rigorous method of getting an uncertainty into their comparison.

Douglass appears to be making a reasonable attempt at estimating the observed mean within the context of a warming tropical surface. On the other hand, given the Santer standard deviations used for the troposphere temperature trend, one could not conclude that we have any statistically significant surface warming, and in fact the trend could be cooling (the conjecture being that the troposphere is warmer than the surface, and with the possibility of zero to negative trends within the Santer observed limits in the troposphere, that would indicate the statistically significant possibility of even cooler surface trends). This is where I judge that Santer, if I read it correctly, is arguing in circles.

If, as Table 1 in Santer shows, the 1 SDs used in their comparison of observed to model trend results are larger than the trends themselves (in all cases, for both the models and the observed results), then one should stop right there. The conjecture has two parts: that the tropics are warming, and that with tropical warming the tropical troposphere to surface temperature ratio should be greater than 1. Without showing the first part (tropical warming), the second part has no meaning.

Since Santer did not do the Douglass test of differences/ratios of the surface to troposphere temperature trends for comparing model mean and observed results, there really is nothing direct to compare in the way of conclusions. Indirectly, when Santer uses differences for comparing the observed to individual model results, his results essentially agree with Douglass, i.e. UAH shows a significant difference while RSS does not.

That leaves the comparison of the observed results of the radio sonde adjustments for comparison with the modeled results. Douglass included some of the radio sondes in their comparison while Santer does not make any comparisons – that I detected. Santer instead makes most of their claims for the observed and modeled results being “more compatible” based on newer radio sonde adjustments reported in Santer, but not in Douglass. The Santer claim is based on some of the newer sonde adjustments peaking into the higher temperature trend zones in the middle troposphere altitudes. However, when one looks at the overall pattern of these sonde adjustments one sees that they are lower at the lower and upper altitudes and do not fit the pattern of the models or the other radio sonde adjustments over the entire altitude range.

There is a figure in Santer where all the trend data is compared for the two regions of interest in troposphere. If one looks at the lower troposphere region, one finds good agreement between the sonde adjustments with the UAH adjustments for the MSU data and in sharply contrasting disagreement with the RSS adjustment. Of course, the favored data in Santer is rather obviously the RSS adjustment.

As an aside, the Beaker argument about the use of the SEM in comparing means appeared to me to be saying that one should never use that method because of what happens to the test statistic when using infinitely large sample sizes. I believe that argument was countered by Lucia’s contention that a larger sample size affects the statistic’s numerator and denominator such that neither goes to 0, nor does the ratio.

Finally, the cut-off of the time period tested in Santer becomes a concern, as Steve M has shown, and the SI references to it must be carefully dissected. What they state is: “Even with longer records, however, no more than 23% of the tests performed lead to rejection of the hypothesis H1 at the nominal 5% significance level”. We, of course, must leave the heavy lifting to Steve M to determine what exactly that sensitivity test really means with regards to hypothesis H2 and, for that matter, what the H1 rejection rate was for UAH.

I forgot to include a couple of thoughts I had about the autocorrelation problem discussed in Santer and by Steve M., and about testing using differences between surface and troposphere temperature trends.

First, if one uses annual data in these comparisons, as I recall, the autocorrelation is reduced dramatically from the case when using monthly data. Obviously the use of annual data will reduce the degrees of freedom, but at the same time it will require a significantly smaller adjustment to the degrees of freedom, if any adjustment is required at all, for autocorrelation. I think it is UC who always recommends finding a way to regress without AR rather than correcting for it.
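To illustrate the first point, here is a small simulation (my own sketch, not Santer's or Kenneth's calculation): a pure AR(1) monthly series with lag-1 coefficient 0.88, roughly the monthly residual autocorrelation reported in these threads, averaged into annual means. The annual lag-1 autocorrelation comes out well below the monthly value, though an idealized AR(1) drops less dramatically than the ~0.11 found in the actual annual residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

def lag1(x):
    """Lag-1 autocorrelation (biased estimator) of a series."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# simulate a long monthly AR(1) series with lag-1 coefficient 0.88
phi, n = 0.88, 252 * 50       # long series so the estimates are stable
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

annual = x.reshape(-1, 12).mean(axis=1)   # 12-month block means

print(round(lag1(x), 2))        # close to 0.88
print(round(lag1(annual), 2))   # markedly lower
```

Averaging into annual means washes out much of the month-to-month persistence, which is why the annual ARFact adjustments reported below are so much smaller than the monthly ones.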

Secondly, I was wondering if anyone here has any ideas how Santer et al. did the comparison when using differencing for H1 and how that would be applied to testing the H2 case.

I really should not drink coffee and read these. I almost lost a keyboard when I read:

Being a “co-author” when you’ve never seen the data? Danger, Will Robinson, danger.

I might consider doing that in some fields, but in climate science, this is either a sign of a devoted undying friendship no matter what the dangers (plus a hidden masochistic streak) … or a first sign of early-onset Alzheimer’s.

Danger indeed,

w.

Re: Willis Eschenbach (#18),

Willis E., when a paper like the Santer et al. comes out with claims of refuting a paper challenging a cherished modelling feature, I would think the appearance of 17 authors was more for a “consensus” showing than an indication that the 17 authors had much involvement with putting the paper together. Wasn’t it Gavin Schmidt who told Lucia she would have to check back with him on equation 12?

Re: Kenneth Fritsch (#19),

17 authors is truly farcical and simply illustrates the pitiful state of climate “science”.

Reminds me of Einstein’s comment on the publication of “100 Scientists Against Einstein”: “If I was wrong, one scientist would have been enough.”

I’ll agree with the above descriptions of peer review.

The crucial piece is the next step after publication. When a new paper appeared in my specific area, we’d at least replicate the math itself. Just from an ‘understand what the competitors are doing’ standpoint this is quite useful. If your competitors are using technique X… Why? Does it make sense? Should we adopt it? Is it required by the mechanistic details of the data gathering?

But we treated papers from other labs as competition. It keeps looking like this particular field of research is too wide for the number of participants. Nobody is studying essentially the same thing; everyone is picking different types of trees, or new areas.

Nobody is doing what Ababneh did: go back to an examined area and do a more comprehensive, more detailed, more elaborate study of the same trees.

If Santer did not share his data with a lead co-author, then one may infer that he also failed to share his data with his other co-authors. Since his paper was primarily concerned with statistics, no-one could substantively contribute to it without having the original data in hand. Furthermore, this data would need to be in electronic format, unless the primary author intended to turn the co-authors into data entry clerks.

I appreciate my co-authors fact checking my work, not just my grammar.

Hmmm. I expect that a claim of a temperature delta of 8/1000ths of a degree C./decade will forever be proof against any contradiction by mere observation.

On peer review: in my field, ecology, it is not too hard to obtain results that are squishy (that’s in metric units) because the study of ecosystems is difficult and factors are often confounded. That said, I have rarely encountered errors of the sort Steve documents when I review papers (about 20 per year). Instead often results are somewhat ambiguous and the authors admit as much. On the plus side, people are NOT afraid to contest a result they think is wrong, and there are vigorous debates going on at the moment about many topics, as I have mentioned before. Circling the wagons? Not so much.

Question for Steve:

Has your “cordial correspondent” who coauthored Santer et al ever posted at CA?

a) No

b) Yes

c) Not sure (meaning you’re not sure, not the regular who posts as “Not Sure”)

d) I’d rather not answer that question

Steve: d or c.

You’d think co-authors Michael Isle and Joe Hatter would be very familiar with the content of Urbinto Hatter Isle 2008.

Frank Wilczek, the Nobel laureate in physics who helped work out some of the theory of the strong force, wrote that as a doctoral student he was intimidated by the articles he read in the physics journals, to the point of having trouble getting started on any problems of his own. He was enlightened by (future co-laureate) David Gross, who pointed out that 90+% of what was published in physics was junk: attractively packaged, made to seem important and correct, but junk nonetheless, despite the rigors of peer review. (Sf fans will naturally recall Sturgeon’s Law here.) This revelation enabled him to get rolling on his own research. So the comments above about the limits of peer review seem well-founded.

I was sufficiently curious about what the adjusted trend standard deviation (ATSD) from Santer et al. (2008) would do when the unadjusted trend standard deviation (TSD) was calculated using annual data in place of the monthly data that was used in Santer et al. I did the comparison for the time periods 1979-1999, as was used in Santer et al., and 1979-2007. For the adjustment, I used the AR1 (lag 1) correlation (ARC) of the residuals as Santer et al. did and adjusted the trend standard deviation by multiplying it by the factor ((1+ARC)/(1-ARC))^1/2 (ARFact), again as was applied in Santer et al.
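For readers who want to replicate the flavor of this calculation, here is a sketch on synthetic data (not the RSS series). Note the inflation factor is written as sqrt((1+r)/(1-r)), the direction consistent with the ARFact values reported below (e.g. 4.02 for r = 0.884):

```python
import numpy as np

def adjusted_trend_se(y):
    """OLS trend standard error inflated for AR(1) autocorrelation of the
    residuals (the approximate Santer et al.-style adjustment described above)."""
    n = len(y)
    t = np.arange(n)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    se = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((t - t.mean())**2))
    r = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation
    ar_fact = np.sqrt((1 + r) / (1 - r))            # inflation factor
    return se, r, ar_fact, se * ar_fact

# illustration on synthetic monthly data: linear trend plus AR(1) noise
rng = np.random.default_rng(1)
n = 252                                   # 21 years of monthly data
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.85 * noise[i - 1] + rng.standard_normal()
y = 0.0015 * np.arange(n) + 0.1 * noise   # ~0.18 deg C/decade trend

tsd, r, fact, atsd = adjusted_trend_se(y)
print(f"TSD={tsd:.4f}  ARC={r:.3f}  ARFact={fact:.2f}  ATSD={atsd:.4f}")
```

With strongly autocorrelated monthly residuals, the adjusted standard error is several times the naive OLS value, which is the crux of the monthly-versus-annual comparison below.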

The results below would indicate (if my calculations are correct) that using annual data in place of the monthly data that was used in Santer decreases the all-important ATSD, as does going from the 1979-1999 period that was used by Santer to the 1979-2007 period (as was previously shown by Steve M). It would appear that the sampling interval and time period selected by Santer were nearly optimal for producing a large ATSD.

All of the trends calculated in the above exercise were in the range of approximately 0.15 to 0.17 degrees C per decade. In the results below, the TSD and ATSD are also reported in degrees C per decade. The data used for the above calculations was from the RSS T2LT troposphere temperature series, which was the data used in the Santer et al. example in their paper.

1979-1999 monthly: TSD = 0.031; ARC = 0.884; ARFact = 4.02; ATSD = 0.124

1979-1999 annual: TSD = 0.085; ARC = 0.111; ARFact = 1.12; ATSD = 0.095

1979-2007 monthly: TSD = 0.0173; ARC = 0.874; ARFact = 3.85; ATSD = 0.067

1979-2007 annual: TSD = 0.047; ARC = 0.018; ARFact = 1.02; ATSD = 0.047

Next I want to look at the ATSDs that are calculated using the differences between the RSS T2LT series and a surface temperature series – and when using both monthly and annual data.

Would it be useful now to turn these threads into a published paper? Not as a critique of Santer et al, primarily. First and foremost address the question: how well do the models match measured trends, using the latest data? Santer et al seem to have left this avenue open by ignoring data after 1999.

Re: braddles (#31),

This blog represents something better than a publication could ever provide, because it is live and dynamic. As soon as you publish it becomes stale, and it becomes fodder for political interests.

I think that if a “co-author” would answer questions on the topic of receiving data or not, that they might be amenable to a few more. I would simply ask if they cared if they got it, and specifically why not if they didn’t. The answer might be completely forgivable.

As an addendum to my previous post, I have used the exact formula for calculating the ARFact and ATSD. It uses the total number of dof and the adjusted number of dof, with 2 subtracted from both, and then takes the square root of that ratio. Notice that that operation puts my results for monthly data over the period 1979-1999 with the RSS T2LT temperature series in complete agreement with the example given in Santer et al.

The results of the exact calculation below show very minor differences from the approximate one. My observation stands that Santer et al. did well in selecting data that made the all-important trend standard deviation large. That large standard deviation makes showing a significant difference between the model and observed results more difficult, while at the same time putting a larger amount of uncertainty on whether there has been any warming in the tropics at all.

Besides doing these same calculations using the differences between troposphere and surface temperatures, I want to use the Santer approach in looking at the trend standard deviation for the global surface temperature trend over the past 29 years.

1979-1999 monthly: TSD = 0.031; ARC = 0.884; ARFact = 4.29; ATSD = 0.132

1979-1999 annual: TSD = 0.085; ARC = 0.111; ARFact = 1.13; ATSD = 0.096

1979-2007 monthly: TSD = 0.0173; ARC = 0.874; ARFact = 4.01; ATSD = 0.069

1979-2007 annual: TSD = 0.047; ARC = 0.018; ARFact = 1.02; ATSD = 0.047
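As I read the "exact" version described above, the effective sample size is n_eff = n(1-r)/(1+r) and the factor is sqrt((n-2)/(n_eff-2)). This is my own reconstruction, but it reproduces the monthly 1979-1999 ARFact to within rounding:

```python
import math

def exact_ar_fact(n, r):
    """Reconstruction of the 'exact' inflation factor described above:
    effective sample size n_eff = n*(1-r)/(1+r), then the square root
    of the ratio (n-2)/(n_eff-2)."""
    n_eff = n * (1 - r) / (1 + r)
    return math.sqrt((n - 2) / (n_eff - 2))

# 1979-1999 monthly case: n = 252 months, lag-1 residual correlation 0.884
print(round(exact_ar_fact(252, 0.884), 2))
# ~4.3, in line with the 4.29 reported in the table above
# (the small difference presumably reflects an unrounded r)
```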

Re: Kenneth Fritsch (#32),

I have now determined the trends for 1979-2007 using monthly data for various time series and regions of the globe. The results are reported below with the Region/Series, Trend, Trend Standard Deviation (TSD) and Adjusted Trend Standard Deviation (ATSD) all given in degrees C per decade. I used the Santer et al. adjustment for the residual versus lag1 autocorrelation. All time series used in the calculations below included land and ocean.

The adjustments to the tropics temperature trends (not shown here) that result from the large AR1 autocorrelations of the lag-1 residuals are evidently unique to that region of the globe.

I also did these calculations for the 1979-1999 time period (not shown) that was used in Santer et al. The ATSDs are larger relative to the trends than was the case for the 1979-2007 time period, but nowhere near the very large ATSDs that Santer et al. found for the tropics over 1979-1999. As a result, it appears that when using monthly data, as was employed in Santer et al., the temperature trends outside the tropics can be deemed statistically different from zero (using the ATSDs) while those in the tropics cannot. And if one cannot say there is significant warming in the tropics (using monthly data), I wonder how one can assert anything of the form “given AGW in the tropics, the troposphere should warm faster than the surface” when the monthly data cannot show the given part.

I plan to look next at troposphere to surface temperature trend differences for T2LT and T2 using annual data for the time period 1979-2007. Santer et al. concentrates on the T2LT troposphere and says little about the T2 troposphere comparisons.

Global/RSS T2LT: Trend = 0.182; TSD = 0.011; ATSD = 0.030

Global/UAH T2LT: Trend = 0.142; TSD = 0.011; ATSD = 0.032

Global/GISS: Trend = 0.171; TSD = 0.009; ATSD = 0.018

Global/HadCru: Trend = 0.169; TSD = 0.008; ATSD = 0.020

SH/ RSS T2LT: Trend = 0.112; TSD = 0.011; ATSD = 0.024

SH/UAH T2LT: Trend = 0.074; TSD = 0.012; ATSD = 0.027

SHXTrop RSS T2LT: Trend = 0.069; TSD = 0.011; ATSD = 0.017

Kenneth, that’s interesting for the annual results. I just noticed the following:

Their SENS3 case ends in 2006, and results become more adverse for them with 2007 in. I guess that not reporting 2007 was simply a clerical oversight.

Just to throw some more spanners in the works, I found this:

Mears and Wentz TMT TTS and TLS

I’m not sure how this relates to the references in the Santer or Douglass paper but its interesting that they account for autocorrelation in this data set. So these trends shouldn’t have their sd inflated. Is this the same for previous data sets?

Another thing that struck me was that the ENSO structure seems to be independent of altitude for the troposphere. Not sure how this matches the SST, but it may imply that a change in linear trend with altitude may be reasonably deconvoluted from any weather signal (or climate ‘noise’).

I’m trying to follow all this stuff, and indeed find it most fascinating and instructive, but I wonder whether the heavy concentration on comparing two data sets, each described by a standard simple linear regression is really a sound approach.

I have just downloaded http://vortex.nsstc.uah.edu/data/msu/t2lt/tltglhmam_5.2 which I presume is closely related to the data under discussion. If I am mistaken in this belief please put me straight and point me in the right direction, ignoring the remainder of this contribution.

Looking at these data (for NH, SH and Global) it seems to me that fitting a standard linear model over the period from Dec 1978 to the latest available date may be less than wise. If this is the case, detailed discussion on the relative niceties of techniques for adjusting standard errors of the parameter estimates by manipulating their degrees of freedom in various ways may be a bit premature.

My strong impression is that the data above fall fairly naturally into two or perhaps three regimes. Taking the NH data, the first quite stable regime (no discernible steady change) lasts from 1978 to about 1987, when an abrupt change to a higher value associated with a less stable situation takes place, but again with no substantial trend. This regime ends abruptly in 1997 with a change to yet higher values (especially noticeable for 1998 of course) but with no hint of a further increase up to (virtually) the present. There might even have been a slight downward trend, rather dependent on the data for the last year or so.

Thus my worry about using a simple linear model on which to lavish the industry and skills of contributors to this blog. I would have thought that a dummy variable model which uses index variables to denote the regimes I’ve described above could well be more fruitful.
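A dummy-variable regime model of the kind described here can be sketched in a few lines. The data below are synthetic, with break years and regime levels chosen only to mimic the description above (roughly 1987 and 1997), so this is a sketch of the approach rather than an analysis of the actual UAH series:

```python
import numpy as np

# Synthetic monthly anomalies with three step regimes and no within-regime trend
rng = np.random.default_rng(0)
t = np.arange(1978, 2008, 1 / 12)                      # monthly time axis
level = np.where(t < 1987, 0.0, np.where(t < 1997, 0.15, 0.40))
y = level + 0.1 * rng.standard_normal(t.size)          # step signal + noise

# Design matrix: intercept plus indicator (dummy) columns for regimes 2 and 3
X = np.column_stack([
    np.ones_like(t),
    (t >= 1987) & (t < 1997),
    t >= 1997,
]).astype(float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # OLS fit
print(beta)   # ~ [0.00, 0.15, 0.40]: regime 1 level, then offsets for regimes 2 and 3
```

With real data the break dates would themselves have to be estimated (e.g. by a breakpoint search), which costs degrees of freedom and complicates the inference.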

Of course, these remarks are contingent on me having examined an appropriate data set!

The SH data may be best described by a two regime model with the change point at about 1998. Naturally, the Global data fall between its two constituents.

Hope you find this sensible, and even interesting!

Robin

Re: Robinedwards (#35),

Robin, I tend to agree with your “seeing” regime changes in the trends, particularly when you look at zonal temperature data, though my perspective is that of a layperson, not a statistician, viewing the trend data. I have presented some analyses here on the subject previously and even used breakpoint statistical methods that I found in the literature.

I think the critical part of regime changes is to show independent evidence for the occurrences of the breakpoints, and therein lies the problem.

Re: Robinedwards (#35),

Of course

This issue is completely and convincingly dealt with in D. Koutsoyiannis’ paper “Nonstationarity versus scaling in hydrology” (Journal of Hydrology 234 (2006)).

I have not the link handy but it is easy to find (visit DK thread here f.ex) .

In short, DK shows that the nonstationarity hypothesis (which postulates that the data can be interpreted as a sum of a linear trend and some kind of “noise”) is misleading, self-contradictory and generally wrong in the presence of scaling phenomena, to which most climatic phenomena belong.

Re: Robinedwards (#35),

I agree with your observation regarding the “step function” nature of the record. It truly seems that a linear “fit” does not capture the “essence” of the data — and that intrigues me. Thing is, before I would try a different fit and try to “make something of it”, I would like to understand a physical basis (rather than merely fitting with something that “looks right”).

Re: Robinedwards (#35), There is reason to believe you might be right, especially with regard to the influence of Pacific Decadal Oscillation regimes. See Roy Spencer threads here. However, one reason to look at linear trends is 1) because CO2 is rising more or less continuously and 2) because this is about the simplest trend analysis one can do. When you start getting more complex, with breakpoints etc., not all data sets show the same behavior (e.g., Hadley vs MSU) and people can take issue with your choice of model.

Re Tom Vonk (35). Many thanks for that reference to D Koutsoyiannis’ paper. I’ve had a good look at it, and find it really very impressive indeed. I wonder whether his ideas should be much more widely applied in climate science. They might help avoid too many false trails. Glad to see that some of you don’t immediately dismiss the notion of a step function. It would indeed be really great to be able to point towards independent evidence for such a model. All I can add here is that very many other very clear steps exist in the climate data archives, none of them having been pointed out by the owners of the data to my knowledge.

A case in point is the wealth of data now available for the North West Atlantic region (Greenland coasts, mainly). Here, as I might well have written before, there is very compelling evidence for a major shift (upward) in 1922, or to be more precise Sept 1922 to perhaps Jan 1923. In some cases my impression is of a step taking just one month to make itself felt (though not to the residents of the region, who would not have spotted the change because of the standard and accepted seasonal (month-to-month) variation). All the sites from this area that I have looked at show this step. Its size? Very close to 2 deg C! I’d have thought that the experts would have spotted this already, but all I can find is talk about a substantial warming in Greenland in the 1920s. The appropriate model can hardly be a standard linear one, however much one might like that to be so. Interestingly, on either side of the break the climates were remarkably stable – virtually no trend, until around the 1960s, but things get a bit too complex to describe without graphs. Don’t know how to post them here :-((

I accept Craig’s comment about one reason for using a straight line model – the CO2 data, which lie closely on such a model apart from their large seasonal variation (which seems to be well understood). Effectively, Time and CO2 are the same variable as far as climate data analysis is concerned, especially over time spans of a few years (say 20). Perhaps you saw my comment on your data, Craig. It would really be great to have available the data in their totally unsmoothed form. Does it exist? The assembly of data (20 columns?) would of course if used together generate typical “average smoothing”, so individual sites’ data could be much more informative. Yes, linear models are the simplest to interpret, but we seem to be having some trouble with the present one. The big question is “For how long is it reasonable to suppose (guess) that a linear model really does underlie all the numbers?”

Kenneth Fritsch’s table of analytical results is great! Just what I need to check that I’m using the right data set. I’ll have a go with linear models for the whole period later on this evening!

Robin

I have finished my calculations using 1979-2007 annual data to determine the difference trends for the following data series: GISS – UAH T2LT; GISS – RSS T2LT; GISS – UAH T2 and GISS – RSS T2. I used the Santer et al. adjustment for AR1 correlation of the regression residuals. It was small in all cases.

Below I report the difference trend, the adjusted trend standard deviation (ATSD) and the p value for the regression trend (p). The GISS series was the surface temperature anomaly for land and ocean from 28S to 28N. The RSS series was from 20S to 20N, and the UAH series was labeled tropical zone, which I believe is from 30S to 30N. The trend and ATSD results are given in degrees C per decade.

I normalized the annual temperature anomalies so that all series were based on the time period 1979-2007. I took the differences in anomalies and regressed them for the years 1979-2007.

What I find is a positive trend that is statistically significantly different from zero for the UAH T2LT and T2 differences with GISS. For RSS differences with GISS, I found no significant trend, either negative or positive.

I did not have the model results data used in Santer et al. (2008) for differencing (surface minus T2LT and T2) with annual data from 1979-2007. However, a quick inspection of the average means and their standard deviations for monthly data for 1979-1999, as reported in Santer et al., would appear to readily extrapolate to annual data from 1979-2007, showing a significant and negative difference for model surface minus model T2LT and T2 (why else would we be discussing this issue). I, therefore, judge that the UAH and RSS results of differences with the GISS surface temperature are different from the model results, and both in the same direction.

GISS minus UAH T2LT: Trend = 0.084; ATSD = 0.019; p = 0.00

GISS minus RSS T2LT: Trend = -0.012; ATSD = NA; p = 0.49

GISS minus UAH T2: Trend = 0.115; ATSD = 0.021; p = 0.00

GISS minus RSS T2: Trend = 0.025; ATSD = NA; p = 0.22
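For readers wanting to replicate this kind of calculation, a minimal sketch follows. It is my reading of the AR1 adjustment discussed in this thread (effective sample size n_eff = n(1−r)/(1+r) from the lag-1 residual autocorrelation), not Kenneth’s or Santer et al.’s actual code, and it is run on a synthetic annual difference series rather than the GISS/UAH/RSS data:

```python
import numpy as np

def ar1_adjusted_trend(y):
    """OLS trend with an AR1-adjusted standard error (sketch).

    Fit a linear trend, estimate the lag-1 autocorrelation r of the
    residuals, shrink the effective sample size to n_eff = n*(1-r)/(1+r),
    and recompute the trend SE with n_eff - 2 degrees of freedom.
    """
    n = y.size
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones(n), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r = np.corrcoef(resid[:-1], resid[1:])[0, 1]      # lag-1 autocorrelation
    n_eff = n * (1 - r) / (1 + r)
    s2 = resid @ resid / (n_eff - 2)                  # residual variance, reduced df
    se = np.sqrt(s2 / ((t - t.mean()) ** 2).sum())    # adjusted SE of the slope
    return beta[1], se, r

# Synthetic annual difference series, 29 points standing in for 1979-2007
rng = np.random.default_rng(0)
y = 0.08 * np.arange(29) / 10 + 0.05 * rng.standard_normal(29)
trend, se, r = ar1_adjusted_trend(y)
print(trend * 10, se * 10, r)    # per-decade trend and adjusted SE
```

With annual data r is typically small, so n_eff stays close to n and the adjustment is minor, which is the point made above about annual versus monthly series.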

Re: Kenneth Fritsch (#43),

I think the direction of this thread has gotten away from the discussion that I thought fit my analyses presented here, but be that as it may, here is some more that I should have added to my previous post.

In that post, I noted that the temperature trend differences for the model surface minus T2LT and surface minus T2, as listed in Santer et al. (2008) with monthly data for the time period 1979-1999, could probably be extrapolated for comparison with those I did with GISS, UAH and RSS annual data for the time period 1979-2007. At that time I did not post those differences. The model average trend differences are:

Surface minus T2LT = -0.069 degrees C per decade

Surface minus T2 = -0.053 degrees C per decade

Now, if one assumes that the standard deviations for the model differences would be nearly the same as those for the GISS to UAH and RSS differences (a safe assumption in my view from looking at many other of these distributions), we would have something around 0.020 degrees C per decade and could state that the model results have difference trends that are statistically significant and negative for both surface to T2LT and surface to T2. Comparing that to the results in my previous post gives GISS to UAH T2LT and T2 differences that are statistically significant and positive, while the GISS to RSS T2LT and T2 differences are flat and not statistically significantly negative as the model differences are.

I would like to have a copy of the model data that Santer et al. use in their paper (and, as I recall, Steve M has requested it) so that I could make my analysis without any assumptions. Without it, I think I can rather safely state that the model difference results are significantly different from those for the UAH and RSS data series when the calculations use the longer time period from 1979-2007 and annual data (with its much smaller AR1 adjustment required). This conclusion would be in agreement with that of Douglass et al. (2007).

I understand the use of a linear trend in comparison to a linear CO2 change, but it has bothered me about this paper ever since Steve first started posting on it. We certainly know the temperature history better than a linear model would suggest. Assigning a linear trend rather than an even slightly shaped trend creates a maximally wide variance to correlate with. It is similar to Tamino’s weather variance post, where he assumes all temperature variation outside of linear is noise, fits an ARMA model and then shows that recent temperatures lie within the 95% CI.

I did a post which shows how changing the assumption using 21 yr filtered gauss data in place of a linear trend has a big impact on the overall variance in the noise level.

http://noconsensus.wordpress.com/2008/10/30/dont-get-fooled-againagain/

It would be easier to accept the conclusions if every little point didn’t coincidentally fall in favor of the ‘team’.

I think this is on-topic as long as we’re discussing step changes and whether they are real. I have a question from a “layman-who-just-saw-something-cool-on-the-Science-Channel.” There was a program on the history of fractal mathematics and how fractal patterns reflect much of nature, from mountain range ridges to cloud formations to a beating heart. In the case of the beating heart, it was postulated that a normal heart displays a fractal pattern when viewed over various time frames. The theory is that finding heart patterns that are not fractal could be a means of detecting heart problems.

I wonder if anyone has looked at the temperature data from this fractal perspective? Surely someone has done something like this already. Might be interesting for tree rings, too…

Re: Dan White (#45),

Benoît Mandelbrot, the “father of fractal geometry”, worked with hydrological series and possibly temperature series. You might try googling around him.

Re: Dan White (#45),

Dan, fractal geometry has been part of chaos theory for decades.

Chaotic attractors with non-integer dimension (called “strange attractors”) are an example of fractal N-dimensional objects.

Now, before beginning to dabble in chaos theory, it is necessary to distinguish temporal chaos, which happens in the phase space (so NOT in the ordinary space), from spatio-temporal chaos, which has both a temporal component and an ordinary-space component. The famous Lorenz system or heart oscillations are examples of the former; turbulent convection and clouds are examples of the latter.

Those are two very different categories that one shouldn’t mix.

So there are of course hundreds of people who, over the decades, have looked at climate from a chaos-theoretical perspective.

However, this issue is only indirectly on topic for this thread.

To get back on topic, this paper points out that when you are trying too hard to refute someone rather than just study a phenomenon, it is easy to be biased and, let us say, not so careful. It also points out that 17 coauthors don’t mean that anyone actually double-checked anything, i.e., more does not equal “better” when it comes to authors.

Craig–

It is an unfortunate fact that when there are more than 3 co-authors for a journal article, most of the co-authors have little involvement in the actual writing. The majority have their names on the paper because they provided a specific analysis used in the paper, but they may not have been involved in the entire argument presented by the first author. This is certainly the case when there is one co-author per page.

For example– one person might have run the Monte Carlo synthetic AR(1) process to create Figure 5. Another might have downloaded the model data to create Figure 3. This sort of collaboration can work well, but it’s also a method in which implicit assumptions are made and never thought about. It’s not implausible that some authors will not have thought deeply about every assumption made– certainly many will overlook implicit assumptions!

Re: lucia (#49),

I had a professor colleague who was surprised to find his name on a paper in a language that he didn’t speak. He had helped some friends at another university on a topic. They decided that he deserved to be a co-author on a paper but forgot to tell him. He couldn’t read it, so he did not know initially what it was about.

Re: Stan Palmer (#50),

I received several co-authorships as acknowledgment of contributions to the subject science and to the particular study reported. However, I rarely saw them until they were published. I had no issue with it as it helped my reputation and publication count.

Ironically, though, my key contribution in one field was to overturn a key assumption that had been taken as decided for nearly 50 years by all the top people in the field (including me, initially). I was taken to the mat over that — everyone including my managers one and two levels up thought I had lost it and were on the verge of replacing me. At the 11th hour, I was vindicated and “rewarded” (unlike what seems to happen in Climate Science). The vindication happened because one of the field’s “giants” (who originally was one of my worst critics) was guardedly open minded, ultimately seriously considered my position, agreed, and thereafter included me on many of his reports that used my “new” assumption — were it not for his open mindedness I would have been “toast” (as seems to happen in Climate Science).

Having been through that, the “appeal to authority”, “consensus”, and “established fact” nature of some pro-AGW arguments hold no water with me. Rather, show me the data — I’ll make up my own mind (pro or con), thank you.

Re: Allen63 (#52), One caveat: no one on the Team ever gets “toasted.”

Re: PhilH (#60),

But does anyone ever get buttered?

Re: Sam Urbinto (#62), Oh yes.

Re: Stan Palmer (#50),

Yes. Sometimes people’s names appear for reasons you stated. However, in this case, I suspect all co-authors at least knew their names were attached. Re: Kenneth Fritsch (#55), It would be even more interesting were that not so!

Re: Mike B (#51), Recent statements?

It was interesting when beaker insisted that equation 12 must contain a typo as the 1/n in the equation would clearly contradict what Gavin appeared to communicate in a relatively recent article at Real Climate. Turns out there is no typo. :)

Re: Kenneth Fritsch (#55),

I think many people would like a copy of that model data! :)

Re: Stan Palmer (#50),

I was once co-author on a conference paper, but could not even attend the session because my security-clearance level was not high enough.

Re: lucia (#49),

Lucia – I completely agree with everything you and Craig have written here about co-authorship. However I think we have something a little bit different than the typical too-many-cooks situation that you describe. Here we have lead co-authors making public statements that may well misrepresent fundamental assumptions in the paper. That is, how shall I say it, unprrrreeeccccented.

Re: lucia (#49),

Which, of course, when applied, as it certainly seems appropriate, to the AGW consensus generally puts it in a light not often admitted – or at least openly.

Re: lucia (#49),

Re: UC (#58),

UC wins the thread!

Don’t scientists realise that if they are cited as “co-author” on a paper, that implies that they specifically agree with and endorse the paper? It implies that they have read the paper, checked the calculations, and specifically consented to their name appearing on the paper as a co-author.

If they are cited as co-author but in fact haven’t actually given their consent, then to protect themselves they must send a letter to the authors and to the journal publishing the paper disassociating themselves from the paper. I think that failure to do so would be taken, in law, as implying consent, under the legal concept of ‘estoppel’.

Re: Steve M.’s difficulty (frequently) in getting data and code from authors, as here.

From Karl Popper: If an oracle has a vision and announces that “the structure of DNA is a double helix”, that is not a SCIENTIFIC truth, for science is a method, a method the oracle did not employ. Further–and more to the point here–, if Robinson Crusoe has on his island a laboratory with every conceivable scientific instrument, etc., does research and comes up with the finding that “the structure of DNA is a double helix”, writes this down and sends it off in a bottle, this too is not science. This is so, Popper argues, because in order to be science any finding has to undergo a thorough analysis/vetting by the rest of the “scientific community”– a necessary element of the process known as “science”. Therefore, in order for some finding to enter into being science the researcher has to make available to others everything he used to arrive at his conclusion, such as data and code. If Santer, or Mann, or anyone else does not archive all of this at the time of publication, then they are, in essence, using the note-in-a-bottle approach, period, and that is not science. Now this seems like a no-brainer to me, so show me where/if I am wrong: Shouldn’t all SCIENTIFIC journals require that all papers submitted for publication be accompanied by completely archived data and code before publication will even be considered?

Hehe, I almost had to clean off my monitor after seeing that graphic. Very good UC, very good. View at your own peril.

Mark

Ken Fritsch’s numbers in #30 and 32 above reveal an interesting and important property of trend-coefficient standard errors: Abstracting from serial correlation, the standard error of an estimate of a mean or the central intercept of a trend line will decrease roughly as 1/n^(1/2) as the sample size n increases, since every observation is equally informative. However, the standard error of the estimate of the slope will decrease much faster as the observation period is increased, since the end points of a time trend are much more informative about its slope than are the more interior points. Thus, adding points at either end increases the precision of the slope at a much faster rate than just adding new observations along the original line portion. In fact, the standard deviation of the slope will decrease roughly as 1/n^(3/2).
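This scaling can be checked directly. For unit-variance white noise sampled at t = 1..n, the SE of the sample mean is 1/sqrt(n), while the SE of an OLS slope is 1/sqrt(sum((t − tbar)²)) = sqrt(12/(n(n² − 1))), which falls off like n^(−3/2) for large n:

```python
import numpy as np

def se_mean(n):
    # SE of the mean of n unit-variance white-noise observations
    return 1.0 / np.sqrt(n)

def se_slope(n):
    # SE of the OLS slope: 1/sqrt(sum((t - tbar)^2)) with sum = n*(n^2-1)/12
    return np.sqrt(12.0 / (n * (n**2 - 1)))

for n in (100, 200, 400):
    print(n, se_mean(n), se_slope(n))

# Doubling n halves se_mean but cuts se_slope by ~2**1.5 ≈ 2.83
```

This is only the white-noise case; serial correlation inflates both SEs, but the relative n^(−1/2) versus n^(−3/2) behavior is the point being illustrated.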

This must be the primary reason Ken is getting much smaller standard errors using the full sample (to 2007 or so) than when he stops, as Santer, Nychka et al (2008) did, in 1999. The autoregression adjustment is important, but since the autoregressive coefficient isn’t much different for the two samples, it is not causing much change in the se.

As I noted in comment 64 of the Oct 22 thread “Replicating Santer Tables 1 and 3″, the excellent Nychka, Santer, et al (2000) unpublished working paper on serial correlation found by Jean S demonstrates that the adjustment Santer, Nychka et al use in 2008 is in fact inadequate, so that the true standard errors are somewhat larger than are obtained using the 2008 adjustment. It is unfortunate that Santer, Nychka et al did not heed Nychka, Santer et al!

As Jeff Id points out above, it should be borne in mind that linear trends are highly suspect as literal models of climatic data. It would be sufficient for the purposes of global warming advocates to make a case that temperature has merely drifted up to a level that is significantly higher than it used to be, without requiring it to be a linear trend. Nevertheless, for the data at hand, the significance or insignificance of the “trend” (and differences between measures of the trend) is probably a descriptively adequate way of characterizing such a warming.

As for multiple co-authorship, an unfortunate incentive for co-authors to pile on is that universities (and research centers, I assume) often count raw citations as a measure of performance for salary purposes etc. If they counted co-author-adjusted citations instead, eg giving each of the 17 authors of the 2008 paper 1/17 citation credit apiece for having co-authored it, gratuitous piling on would quickly come to an end!

Re: Hu McCulloch (#64), In support of the authorship comment, a colleague was told by dept head that a book he wrote (alone) counted as 1 publication.

Re: Hu McCulloch (#64), As I understand it the assumption is that the forcing of the atmosphere can be approximated as a linear trend over time. This of course assumes ENSO/higher order oscillations to be noise, and that is where the signal + noise equation for both papers comes in. Interestingly, the post I made earlier for MSU RSS data, Re: MC (#34), appears to show that there is a general troposphere ENSO signature independent of height, with an additional linear trend.

The equation could then be Y0 = ENSO function + linear trend + noise. The errors in the linear trend would be smaller, as the autocorrelation effects would be much less, provided the ENSO function was modelled. But then again that seems to be some way off in the future by the look of things.
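The proposed decomposition can be sketched on synthetic data. The “ENSO function” here is a crude sinusoidal stand-in (a real analysis would use an observed index such as Niño 3.4), so the point is only that adding the extra regressor shrinks the residual variance around the trend:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 29 * 12                                    # monthly points standing in for 1979-2007
t = np.arange(n) / 120.0                       # time in decades
enso = np.sin(2 * np.pi * np.arange(n) / 48)   # crude ~4-year stand-in ENSO index
y = 0.15 * t + 0.2 * enso + 0.1 * rng.standard_normal(n)

# Fit trend-only vs trend-plus-ENSO and compare residual variance
results = {}
for name, X in [("trend only", np.column_stack([np.ones(n), t])),
                ("trend + ENSO", np.column_stack([np.ones(n), t, enso]))]:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    results[name] = (beta[1], resid.var())
    print(name, beta[1], resid.var())          # trend estimate, residual variance
```

The smaller, less autocorrelated residuals are what would tighten the trend’s standard error, which is the motivation given above.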

Re: UC (#58), Is that the new Eminem album cover?

Re: Hu McCulloch (#64),

Hu, I appreciate your professional thoughts on these analyses. I do not have a comprehensive background on the theoretical underpinnings of some of these statistical issues and thus your explanations help this (very)layperson considerably.

In my view, I think that using the annual data to avoid the serial correlation that results from using the monthly data should reduce the uncertainty in the value of the adjusted trend standard deviation used in Santer et al. I use an example below that might be all wet from a theoretical standpoint (and I hope someone here evaluates that aspect and comments), but it makes my point.

Using the RSS T2LT time series from Santer et al. (2008) for monthly data from 1979-1999, one obtains an unadjusted trend standard deviation of 0.0307 degrees C per decade. With a lag-1 residual autocorrelation of r = 0.886, an adjusted trend standard deviation of 0.132 is obtained.

Now, I go to the potentially illegitimate part. If the lag-1 residual-versus-residual regression is carried out, a correlation of r = 0.887 is obtained, with a 5-95% CI of 0.827 to 0.944. Using those limits one can calculate the 5-95% limits on the adjusted trend standard deviation and obtain a range from 0.104 to 0.212 degrees C per decade.

If annual data are used, these uncertainties, if legitimate, are reduced to very small and unimportant values due to the sharply reduced serial correlations. This approach goes along with UC’s reminders, which I have read at CA, that in effect state that avoidance of autocorrelation is preferable to correcting for it.
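The adjustment described here can be sketched as a one-line inflation of the raw standard error. Assuming the effective-sample-size formula n_eff = n(1−r)/(1+r) with n = 252 months (1979-1999) — my reading of the procedure, not the authors’ actual code — this reproduces the numbers quoted above to within rounding:

```python
import numpy as np

def adjust_se(se_raw, r, n):
    """Inflate a raw OLS trend standard error for AR1 residuals (sketch).

    n_eff = n*(1-r)/(1+r) is the effective sample size; re-estimating the
    residual variance with n_eff - 2 degrees of freedom inflates the SE
    by a factor of sqrt((n - 2) / (n_eff - 2)).
    """
    n_eff = n * (1 - r) / (1 + r)
    return se_raw * np.sqrt((n - 2) / (n_eff - 2))

n = 21 * 12   # monthly data, 1979-1999
print(adjust_se(0.0307, 0.886, n))   # ~0.13 deg C/decade, the central estimate
print(adjust_se(0.0307, 0.827, n))   # ~0.10, lower end of r's CI
print(adjust_se(0.0307, 0.944, n))   # ~0.21, upper end of r's CI
```

The strong sensitivity of the adjusted SE to r is exactly why the uncertainty in r itself matters with monthly data, and why annual data (small r) largely sidesteps the problem.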

Re: Allen63 (#7)

People outside science have a hopelessly exaggerated idea of the quality of peer review. I am a regular reviewer for physics journals, and I probably spend about three or four hours reviewing a typical manuscript. I check that it is comprehensible, that the authors haven’t made any really glaring errors, and that they give enough references to place the work in its proper context. If I have time, and the paper is very close to my own field, I check a couple of calculations. If the manuscript is for a really top journal I spend a little longer; if it’s from a group I know and trust I’m not so careful. And that’s it. Comparing my reviews with other reviews of the same manuscripts I get the impression that I am at the careful end of reviewing in my field.

I used to reckon as a handy rule of thumb that 10% of published papers in my field were fraudulent, 30% were erroneous, 30% were technically correct but completely irrelevant, and the remaining 30% were worth bothering with.

Re: Jonathan (#11),

We see things the same way then. Your heuristic is interesting, plausible, and relevant to AGW literature (pro & con) I imagine.

Heck, even I have written “a report or two” (euphemistically) that were more to meet my publication count requirement than to provide new knowledge. And, yes, one of my first, simplistic models did get through to publication with a serious mistake that was not noticed by reviewers (over 30 years later, I am still embarrassed to think about it — fortunately, that occurrence “rubbed my nose” in the fact that I could be just as subjective as the next person — consequently, I developed thorough objective ways of double/triple checking my work which have stood me in good stead). I (anonymously) admit to these things for the greater good.

Knowing what I (and you) know from experience, I am thankful for Steve’s very public AGW “auditing” efforts.

Re: Jonathan (#9),

Judging by this, it’s pretty clear to me that even this type of review, which I would hardly call rigorous, isn’t happening with “climate science”. That, or some are purposefully avoiding the hard questions.

Re: Allen63 (#11), and 12

In reviewing a paper for a conference once, I noticed that the central example provided had a serious error in it. The error was instructive since it showed that the technique being described was completely unsound. I noted this in my review and recommended that the paper be rejected. To my surprise, my two fellow reviewers did not recommend rejection. Their reviews were not enthusiastic but they still recommended that the paper be considered for acceptance.

I pointed out what I thought to be the error to them and asked for their opinion on it. They agreed that the mistake was there and it was serious but still recommended acceptance. I asked them for their opinions on what somebody had to do to earn rejection – “What does somebody have to do to get a paper rejected around here?”

I agree with the comments above about the effectiveness of peer review.

Re: Stan Palmer (#13), Strange as it sounds, it is helpful in science to have a paper with bad methods leading to bad results. This can provide an example of what not to do. It can be better for the process of scientific learning that a paper is incorrect because it may shed some light on a current issue. A famous example from astrophysics was a paper about stellar temperature modelling that was great for far away stars but woeful for the Sun. Then a guy called Kurucz came up with a better model that matched stars to our own. However the other paper still gets loads of quotes. There is a limit though.

Re: Jeff Alberts (#22)

It is difficult for me to assess whether climate science reviewing is worse than reviewing in other areas of physics as I’m not closely involved in that field. It is, however, reasonably clear to me that the standard of reviewing is “broadly comparable”, and certainly not much better than in my field, where all peer-review guarantees is that the claims are “probably not complete rubbish”.

In my field that’s probably good enough: nobody is basing multi-billion dollar expenditures on my results. Given the political importance of climate science we should expect better standards. And that, of course, is pretty much the whole point of this blog!

## One Trackback

[...] the paper has its merits, it also has some curious features some of which have been discussed at Climate Audit. I’ve mostly focused examining the conclusions we would draw if the method of Santer17 were [...]