Joelle Gergis, Data Torturer

In 2012, the then much-ballyhooed Australian temperature reconstruction of Gergis et al 2012 mysteriously disappeared from Journal of Climate after being criticized at Climate Audit. Now, more than four years later, a successor article has finally been published. Gergis says that the only problem with the original article was a “typo” in a single word. Rather than “taking the easy way out” and simply correcting the “typo”, Gergis instead embarked on a program that ultimately involved nine rounds of revision, 21 individual reviews, two editors and took longer than the American involvement in World War II. However, rather than being an improvement on or confirmation of Gergis et al 2012, Gergis et al 2016 is one of the most extraordinary examples of data torture (Wagenmakers, 2011, 2012) that any of us will ever witness.

Also see Brandon S’s recent posts here and here.

The re-appearance of Gergis’ Journal of Climate article was accompanied by an untrue account at Conversation of the withdrawal/retraction of the 2012 version. Gergis’ fantasies and misrepresentations drew fulsome praise from the academics and other commenters at Conversation. Gergis named me personally as having stated in 2012 that there were “fundamental issues” with the article, claims which she (falsely) said were “incorrect” and supposedly initiated a “concerted smear campaign aimed at discrediting [their] science”.  Their subsequent difficulty in publishing the article, a process that took over four years, seems to me to be as eloquent a confirmation of my original diagnosis as one could expect.

I’ve drafted up lengthy notes on Gergis’ false statements about the incident, in particular, about false claims by Gergis and Karoly that the original authors had independently discovered the original error “two days” before it was diagnosed at Climate Audit. These claims were disproven several years ago by emails provided in response to an FOI request. Gergis characterized the FOI requests as “an attempt to intimidate scientists and derail our efforts to do our job”, but they arose only because of the implausible claims by Gergis and Karoly to priority over Climate Audit.

Although not made clear in Gergis et al 2016 (to say the least), its screened network turns out to be identical to that of the Australasian reconstruction in PAGES2K (Nature, 2013), and the reconstructions themselves are nearly identical. PAGES2K was published in April 2013 and one cannot help but wonder why it took more than three years and nine rounds of revision to publish something so similar.

In addition, one of the expectations of the PAGES2K program was that it would identify and expand available proxy data covering the past two millennia. In this respect, Gergis and the AUS2K working group failed miserably. The lack of progress from the AUS2K working group is both astonishing and dismal, a failure unreported in Gergis et al 2016 which purported to “evaluate the Aus2k working group’s regional consolidation of Australasian temperature proxies”.

Detrended and Non-detrended Screening

The following discussion of data torture in Gergis et al 2016 draws on my previous and similar criticism of data torture in PAGES2K.

Responding to then recent scandals in social psychology, Wagenmakers (2011 pdf, 2012 pdf) connected the scandals to academics tuning their analysis to obtain a “desired result”, which he classified as a form of “data torture”:

we discuss an uncomfortable fact that threatens the core of psychology’s academic enterprise: almost without exception, psychologists do not commit themselves to a method of data analysis before they see the actual data. It then becomes tempting to fine tune the analysis to the data in order to obtain a desired result—a procedure that invalidates the interpretation of the common statistical tests. The extent of the fine tuning varies widely across experiments and experimenters but is almost impossible for reviewers and readers to gauge…

Some researchers succumb to this temptation more easily than others, and from presented work it is often completely unclear to what degree the data were tortured to obtain the reported confession.

As I’ll show below, it is hard to contemplate a better example of data torture, as described by Wagenmakers, than Gergis et al 2016.

The controversy over Gergis et al, 2012 arose over ex post screening of data, a wildly popular technique among IPCC climate scientists, but one that I’ve strongly criticized over the years.  Jeff Id and Lucia have also written lucidly on the topic (e.g. Lucia here and, in connection with Gergis et al, here).  I had raised the issue  in my first post on Gergis et al on May 31, 2012.  Closely related statistical issues arise in other fields under different terminology e.g. sample selection bias, conditioning on post-treatment variable, endogenous selection bias.  The potential bias of ex post screening seems absurdly trivial if one considers the example of a drug trial, but, for some reason, IPCC climate scientists continue to obtusely deny the bias. (As a caveat, objecting to the statistical bias of ex post screening does not entail that opposite results are themselves proven. I am making the narrow statistical point that biased methods should not be used.)
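The bias is easy to demonstrate with a toy simulation (my own construction, with arbitrary sizes and threshold, not anything from Gergis’ code): screen pure red-noise “proxies” by their undetrended correlation with a warming trend over a calibration window, and the composite of the survivors acquires a calibration-era uptrend despite containing no climate signal at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n_proxies, n_years, n_cal = 200, 500, 70        # hypothetical network sizes
trend = np.linspace(0.0, 1.0, n_cal)            # instrumental warming trend

# "proxies" are pure red noise (random walks): they contain no climate signal
proxies = np.cumsum(rng.normal(size=(n_proxies, n_years)), axis=1)
cal = proxies[:, -n_cal:]                       # calibration-period segments

# ex post screening on UNdetrended data: keep series correlated with the trend
r = np.array([np.corrcoef(seg, trend)[0, 1] for seg in cal])
screened = proxies[r > 0.3]                     # arbitrary pass threshold

# compare calibration-era slopes of the full and screened composites
years = np.arange(n_cal)
slope_all = np.polyfit(years, proxies.mean(0)[-n_cal:], 1)[0]
slope_scr = np.polyfit(years, screened.mean(0)[-n_cal:], 1)[0]
print(f"{len(screened)}/{n_proxies} noise series passed screening")
print(f"calibration slope, all series: {slope_all:+.3f}; screened: {slope_scr:+.3f}")
```

Every series that passes necessarily has a positive calibration-era slope (its correlation with a linear trend is positive), so the screened composite is guaranteed an upward blade regardless of what the data actually contain.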

Despite the public obtuseness of climate scientists about the practice, shortly after my original criticism of Gergis et al 2012, Karoly privately recognized the bias associated with ex post screening in an email to Neukom (June 7, 2012; FOI K,58):

If the selection is done on the proxies without detrending ie the full proxy records over the 20th century, then records with strong trends will be selected and that will effectively force a hockey stick result. Then Stephen Mcintyre criticism is valid. I think that it is really important to use detrended proxy data for the selection, and then choose proxies that exceed a threshold for correlations over the calibration period for either interannual variability or decadal variability for detrended data…The criticism that the selection process forces a hockey stick result will be valid if the trend is not excluded in the proxy selection step.

Gergis et al 2012 had purported to avoid this bias by screening on detrended data, even advertising this technique as a method of “avoid[ing] inflating the correlation coefficient”:

For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921-1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921-1990 period were selected for analysis. This process identified 27 temperature-sensitive predictors for the SONDJF warm season.
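The stated rule is simple enough to sketch in a few lines (a sketch only: the significance benchmark, seasonal targets and exact detrending in G12 are not reproduced here, and I use an ordinary Pearson p-value for the p<0.05 test). Detrending both series before correlating removes the inflation that a shared warming trend would otherwise contribute:

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import pearsonr

def passes_detrended_screening(proxy, instrumental, alpha=0.05):
    """Screening rule as described (but not executed) in Gergis et al 2012:
    linearly detrend both series over the calibration window, then test the
    correlation at the stated significance level."""
    r, p = pearsonr(detrend(proxy), detrend(instrumental))
    return r, p, p < alpha

# toy example: a "proxy" that shares only the warming trend with temperature
rng = np.random.default_rng(1)
years = np.arange(1921, 1991)
temp = 0.01 * (years - 1921) + 0.1 * rng.normal(size=years.size)
proxy = 0.01 * (years - 1921) + 0.3 * rng.normal(size=years.size)

r_raw, _ = pearsonr(proxy, temp)            # inflated by the shared trend
r_det, p_det, ok = passes_detrended_screening(proxy, temp)
print(f"undetrended r = {r_raw:.2f}, detrended r = {r_det:.2f} (p = {p_det:.2f})")
```

A proxy like this one passes undetrended screening on the strength of the trend alone, while the detrended correlation reveals that it tracks nothing but noise year to year.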

As is now well known, they didn’t actually perform the claimed calculation. Instead, they calculated correlation coefficients on undetrended data.  This error was first reported by CA commenter Jean S on June 5, 2012 (here).  Two hours later (nearly 2 a.m. Swiss time), Gergis coauthor Raphi Neukom notified Gergis and Karoly of the error (FOI 2G, page 77). Although Karoly later (falsely) claimed that his coauthors were unaware of the Climate Audit thread, emails obtained through FOI show that Gergis had sent an email to her coauthors (FOI 2G, page 17) drawing attention to the CA thread, that Karoly himself had written to Myles Allen (FOI 2K, page 11) about comments attributed to him on the thread (linking to the thread) and that Climate Audit and/or I are mentioned in multiple other contemporaneous emails (FOI 2G).

When correlation coefficients were re-calculated according to the stated method, only a handful actually passed screening, a point reported at Climate Audit by Jean S on June 5 and written up by me as a head post on June 6. According to my calculations, only six of the 27 proxies in the G12 network passed detrended screening.  On June 8 (FOI 2G, page 112), Neukom reported to Karoly and Gergis that eight proxies passed detrended screening (with the difference between his results and mine perhaps due to drawing from the prescreened network or to difference in algorithm) and sent them a figure (not presently available) comparing the reported reconstruction with the reconstruction using the stated method:

Dashed reconstruction below is using only the 8 proxies that pass detrended screening. solid is our original one.

This figure was unfortunately not included in the FOI response. It would be extremely interesting to see.

As more people online became aware of the error, senior author Karoly decided that they needed to notify Journal of Climate. Gergis notified the journal of a “data processing error” on June 8 and their editor, John Chiang, rescinded acceptance of the paper the following day, stating his understanding that they would redo the analysis to conform with their described methodology:

After consulting with the Chief Editor regarding your situation, my decision is to rescind the acceptance of your manuscript for publication. My understanding is that you will be redoing your analysis to conform to your original description of the predictor selection, in which case you may arrive at a different conclusion from your original manuscript. Given this, I request that you withdraw the manuscript from consideration.

Contrary to her recent story at Conversation, Gergis tried to avoid redoing the analysis, instead trying to persuade the editor that the error was purely semantic (an “error in words”) rather than a programming error, invoking support for undetrended screening from Michael Mann, who was egging Gergis on behind the scenes:

Just to clarify, there was an error in the words describing the proxy selection method and not flaws in the entire analysis as suggested by amateur climate skeptic bloggers…People have argued that detrending proxy records when reconstructing temperature is in fact undesirable (see two papers attached provided courtesy of Professor Michael Mann) .

The Journal of Climate editors were unpersuaded and pointedly asked Gergis to explain the difference between the first email in which the error was described as a programming error and the second email describing the error as semantic:

Your latest email to John characterizes the error in your manuscript as one of wording. But this differs from the characterization you made in the email you sent reporting the error. In that email (dated June 7) you described it as “an unfortunate data processing error,” suggesting that you had intended to detrend the data. That would mean that the issue was not with the wording but rather with the execution of the intended methodology. would you please explain why your two emails give different impressions of the nature of the error?

Gergis tried to deflect the question. She continued to try to persuade the Journal of Climate to acquiesce in her changing the description of the methodology, as opposed to redoing the analysis with the described methodology, offering only to describe the differences in a short note in the Supplementary Information:

The message sent on 8 June was a quick response when we realised there was an inconsistency between the proxy selection method described in the paper and actually used. The email was sent in haste as we wanted to alert you to the issue immediately given the paper was being prepared for typesetting. Now that we have had more time to extensively liaise with colleagues and review the existing research literature on the topic, there are reasons why detrending prior to proxy selection may not be appropriate. The differences between the two methods will be described in the supplementary material, as outlined in my email dated 14 June. As such, the changes in the manuscript are likely to be small, with details of the alternative proxy selection method outlined in the supplementary material.

The Journal of Climate editor resisted, but reluctantly gave Gergis a short window of time (to end July 2012) to revise the article, but required that she directly address the sensitivity of the reconstruction to proxy selection method and “demonstrate the robustness” of her conclusions:

In the revision, I strongly recommend that the issue regarding the sensitivity of the climate reconstruction to the choice of proxy selection method (detrend or no detrend) be addressed. My understanding that this is what you plan to do, and this is a good opportunity to demonstrate the robustness of your conclusions.

Chiang’s offer was very generous under the circumstances. Gergis grasped at this opportunity and promised to revert by July 27 with a revised article showing the influence of this decision on resultant reconstructions:

Our team would be very pleased to submit a revised manuscript on or before the 27 July 2012 for reconsideration by the reviewers. As you have recommended below, we will extensively address proxy selection based on detrended and non detrended data and the influence on the resultant reconstructions.

Remarkably, this topic is touched on only in passing in Gergis et al 2016 and the only relevant diagram conceals, rather than addresses, its effect.

Gergis’ Trick to Hide the Discrepancy

We know that Neukom had sent Gergis a comparison of the “original” reconstruction to the reconstruction using the stated method as early as June 8, 2012.  It would have been relatively easy to add such a figure to Gergis et al 2012 and include a discussion, if the comparison “demonstrate[d] the robustness of your conclusions”.   This obviously didn’t happen and one has to ask why not.

Nor is the issue prominent in Gergis et al 2016.  The only relevant figure is Figure S1.3 in the Supplementary Information.  Gergis et al asserted that this figure suggested that “decadal-scale temperature variability is not highly sensitive to the predictor screening methods”. (In the following text, “option #4”, with “nine predictors”, is a variation of the network calculated using the stated G12 methodology.)

Figures S1.3-S1.5 compare the R28 reconstruction for just the PCR method presented in the main text, with the results based on the range of alternative proxy screening methods. They show that the variations reconstructed for each option are similar. The results always lie within the 2 SE uncertainty range of our final reconstruction (option #1), except for a few years for option #4 (Figure S1.3), which only uses nine predictors. This suggests that decadal-scale temperature variability is not highly sensitive to the predictor screening methods.

If, as Gergis et al say here, their results were “not highly sensitive” to predictor screening method – even the difference between detrended and non-detrended screening, then Gergis’ failure to comply with editor Chiang’s offer by July 31, 2012 is all the more surprising.

However, there’s a trick in Gergis’ Figure S1.3.  On the left is Gergis’ original Figure S1.3. It gives a strong rhetorical impression of coherence between the four illustrated reconstructions. (“AR1 detrending fieldmean” corresponds to the reconstruction using the stated method of Gergis et al 2012.)  On the right is a blowup showing that one of the four reconstructions (“AR1 detrending fieldmean”) has been truncated prior to AD1600, when it is well outside the supposed confidence interval.



CA readers are familiar with this sort of truncation in connection with the trick to “hide the decline” in the IPCC AR4 chapter edited by Mann.  One can only presume that earlier values were also outside the confidence interval on the high side and that Gergis truncated the series at AD1600 in order to “hide” the discrepancy.

Although I haven’t seen the “dashed” reconstruction in Neukom’s email of June 8, I can only assume that it also diverged upward before AD1600, a discrepancy that Gergis et al had been unable to resolve within editor Chiang’s deadline of July 2012.

Torturing and Waterboarding the Data

In the second half of 2012, Gergis and coauthors embarked on a remarkable program of data torture in order to salvage a network of approximately 27 proxies, while still supposedly using “detrended” screening.  Their eventual technique for ex post screening bore no resemblance to the simplistic screening of (say) Mann and Jones, 2003.

One of their key data torture techniques was to compare proxy data correlations not simply to temperatures in the same year, but to temperatures of the preceding year and following year.

To account for proxies with seasonal definitions other than the target SONDJF season (e. g., calendar year averages), the comparisons were performed using lags of -1, 0, and +1 years for each proxy (Appendix A).

This mainly impacted tree ring proxies. In their practice, a lag of -1 year meant that a tree ring series is assigned one year earlier than the chronology (+1 is assigned one year later.)  For a series with a lag of -1 year (e.g. Celery Top East), ring width in the summer of (say) 1989-90 is said to correlate with summer temperatures of the previous year. There is precedent for correlation to previous year temperatures in specialist studies. For example, Brookhouse et al (2008) (abstract here) says that the Baw Baw tree ring data (a Gergis proxy), correlates positively with spring temperatures from the preceding year. In this case, however, Gergis assigned zero lag to this series, as well as a negative orientation.

The lag of +1 years assigned to five sites is very hard to interpret in physical terms.  Such a lag requires that (for example) Mangawhera ring widths assigned to the summer of 1989-1990 correlate to temperatures of the following summer (1990-1991), with ring widths in effect acting as a predictor of next year’s temperature.  Gergis’ supposed justification in the text was nothing more than arm-waving, but the referees do not seem to have cared.

Of the 19 tree ring series in the 51-series G16 network, an (unphysical) +1 lag was assigned to five series, a -1 lag to two series and a 0 lag to seven series, with five series being screened out.  Of the seven series with 0 lag, two had inverse orientation in PAGES2K. In detail, there is little consistency for trees and sites of the same species. For example, New Zealand LIBI composite-1 had a +1 lag, while New Zealand LIBI composite-2 had a 0 lag.  Another LIBI series (Urewara) is assigned an inverse orientation in the (identical) PAGES2K and thus presumably in the CPS version of G16.  Two LIBI series (Takapari and Flanagan’s Hut) are screened out in G16, though Takapari was included in G12.  Because the assignment of lags is nothing more than an ad hoc after-the-fact attempt to rescue the network, it is impossible to assign meaning to the results.


In addition, Gergis also borrowed from and expanded a data torture technique pioneered in Mann et al 2008.  Mann et al 2008 had been dissatisfied with the number of proxies passing a screening test based on correlation to local gridcell, a commonly used criterion (e.g. Mann and Jones 2003). So Mann instead compared results to the two “nearest” gridcells, picking the highest of the two correlations but without modifying the significance test to reflect the “pick two” procedure. (See here for a contemporary discussion.)  Instead of comparing only to the two nearest gridcells, Gergis expanded the comparison to all gridcells “within 500 km of the proxy’s location”, a technique which permitted comparisons to 2-6 gridcells depending both on the latitude and the closeness of the proxy to the edge of its gridcell:

As detailed in appendix A, only records that were significantly (p < 0.05) correlated with temperature variations in at least one grid cell within 500 km of the proxy’s location over the 1931-90 period were selected for further analysis.

As described in the article, both factors were crossed in the G16 comparisons. Multiplying three lags by 2-6 gridcells, Gergis appears to have made 6-18 detrended comparisons, retaining those proxies for which there was a “statistically significant” correlation.  It doesn’t appear that any allowance was made in the benchmark for the multiplicity of tests. In any event, using this “detrended” comparison, they managed to arrive at a network of 28 proxies, one more than the network of Gergis et al 2012. Most of the longer proxies are the same in both networks, with a shuffling of about seven shorter proxies.  No ice core data is included in the revised network and only one short speleothem. It consists almost entirely of tree ring and coral data.
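The cumulative effect of the “pick any” rule can be illustrated with a toy Monte Carlo (my own construction, with made-up sizes, not Gergis’ actual screening code): test a pure-noise “proxy” at three lags against each of six independent “gridcells” at a nominal p<0.05, passing it if any comparison succeeds, and the effective pass rate for noise is an order of magnitude above the nominal rate.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_years, n_lags, n_cells = 2000, 60, 3, 6
crit = 2.0 / np.sqrt(n_years)       # rough |r| threshold for p ~ 0.05, n = 60

hits = 0
for _ in range(n_trials):
    proxy = rng.normal(size=n_years + 2)            # pure-noise "proxy"
    cells = rng.normal(size=(n_cells, n_years))     # independent gridcell temps
    # try lags -1, 0, +1 against every nearby gridcell; pass if ANY correlates
    rs = [abs(np.corrcoef(proxy[lag:lag + n_years], cell)[0, 1])
          for lag in range(n_lags) for cell in cells]
    hits += max(rs) > crit
print(f"nominal rate 5%, effective pass rate for noise: {hits / n_trials:.0%}")
```

With 18 quasi-independent chances per proxy, the expected pass rate for noise is roughly 1 − 0.95^18, around sixty percent, which is why an unadjusted benchmark makes the “significance” of the screened network meaningless.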

Obviously, Gergis et al’s original data analysis plan did not include a baroque screening procedure.  It is evident that they concocted this bizarre screening procedure in order to populate the screened population with a similar number of proxies to Gergis et al 2012 (28 versus 27) and to obtain a reconstruction that looked like the original reconstruction, rather than the divergent version that they did not report.  Who knows how many permutations and combinations and iterations were tested, before eventually settling on the final screening technique.

It is impossible to contemplate a clearer example of “data torture”; even Mann et al 2008 does not match it.

Nor does this fully exhaust the elements of data torture in the study, as torture techniques previously used in Gergis et al 2012 were carried forward to Gergis et al 2016. Using original and (still) mostly unarchived measurement data, Gergis et al 2012 had re-calculated all tree ring chronologies except two, using an opaque method developed by the University of East Anglia. The two exceptions were the two long tree ring chronologies reaching back to the medieval period:

All tree ring chronologies were developed based on raw measurements using the signal-free detrending method (Melvin et al., 2007; Melvin and Briffa, 2008) …The only exceptions to this signal-free tree ring detrending method was the New Zealand Silver Pine tree ring composite (Oroko Swamp and Ahaura), which contains logging disturbance after 1957 (D’Arrigo et al., 1998; Cook et al., 2002a; Cook et al., 2006) and the Mount Read Huon Pine chronology from Tasmania which is a complex assemblage of material derived from living trees and sub-fossil material. For consistency with published results, we use the final temperature reconstructions provided by the original authors that includes disturbance-corrected data for the Silver Pine record and Regional Curve Standardisation for the complex age structure of the wood used to develop the Mount Read temperature reconstruction (E. Cook, personal communication, Cook et al., 2006).

This raises the obvious question why “consistency with published results” is an overriding concern for Mt Read and Oroko, but not for the other series, which also have published results.  For example, Allen et al (2001), the reference for Celery Top East, shows the chronology at left for Blue Tier, while Gergis et al 2016 used the chronology at right for a combination of Blue Tier and a nearby site.  Using East Anglia techniques, the chronology showed a sharp increase in the 20th century, and “consistency” with the results shown in Allen et al (2001) was not a concern of the authors. One presumes that Gergis et al had done similar calculations for Mount Read and Oroko, but had decided not to use them.  One can hardly avoid wondering whether the discarded calculations failed to support the desired story.


Nor is this the only ad hoc selection involving these two important proxies.  Gergis et al said that their proxy inventory was a 62-series subset taken from the inventory of Neukom and Gergis, 2011. (I have been unable to exactly reconcile this number and no list of 62 series is given in Gergis et al 2016.)  They then excluded records that “were still in development at the time of the analysis” (though elsewhere they say that the dataset was frozen as of July 2011 due to the “complexity of the extensive multivariate analysis”) or “with an issue identified in the literature or through personal communication”:

Of the resulting 62 records we also exclude records that were still in development at the time of the analysis .. and records with an issue identified in the literature or through personal communication

However, this criterion was applied inconsistently.  Gergis et al acknowledge that the Oroko site was impacted by “logging disturbance after 1957” – a clear example of an “issue identified in the literature” but used the data nonetheless.  In some popular Oroko versions (see CA discussion here), proxy data after 1957 was even replaced by instrumental data. Gergis et al 2016 added a discussion of this problem, arm-waving that the splicing of instrumental data into the proxy record didn’t matter:

Note that the instrumental data used to replace the disturbance-affected period from 1957 in the silver pine [Oroko] tree-ring record may have influenced proxy screening and calibration procedures for this record. However, given that our reconstructions show skill in the early verification interval, which is outside the disturbed period, and our uncertainty estimates include proxy resampling (detailed below), we argue that this irregularity in the silver pine record does not bias our conclusions.

There’s a sort of blind man’s buff in Gergis’ analysis here, since it looks to me like G16 may have used an Oroko version which did not splice instrumental data. However, because no measurement data has ever been archived for Oroko and a key version only became available through inclusion in a Climategate email, it’s hard to sort out such details.


The precise timing of Gergis’ data torture can be constrained by the publication of the PAGES2K compilation of regional chronologies used in IPCC AR5.  The IPCC First Order Draft had included a prominent graphic with seven regional reconstructions, one of which was the Australian reconstruction of Gergis et al, 2012 (cited as under review).  The AR5 Second Order Draft, published in July 2012 after the withdrawal of Gergis et al 2012, included a more or less identical reconstruction, this time cited to PAGES2K, under review.

The PAGES2K compilation had been submitted to Science in July 2012, barely meeting the deadline. Remarkably, it was rejected.  Mann, one of the reviewers, argued that it was impossible to review so many novel regional reconstructions and that they should be individually reviewed in specialist journals before attempting a compilation. This left IPCC in rather a jam.  However, Nature stepped in and agreed to publish the rejected article.  Keith Briffa, one of the Nature reviewers, “solved” the problem of trying to review so many novel reconstructions by suggesting that the article be published as a “Progress Article”, a type of article which had negligible peer review requirements.  Everyone readily agreed to this diplomatic solution and thus the sausage was made (also see discussion by Hilary Ostrov here).

The Gergis contribution to PAGES2K screened the AUS2K proxy network down to 28 proxies – exactly the same selection as Gergis et al 2016, published three years later.  The PAGES2K Paico reconstruction is identical to the G16 Paico reconstruction up to a slight rescaling: the correlation between the two versions is exactly 1. Their “main” reconstruction used principal components regression – a technique harking back to Mann et al 1998, which is commonly defended on the grounds that later articles use different techniques.  The G16 version is nearly identical to the PAGES2K version, as shown below.


The PAGES2K article was mentioned on a variety of occasions in Gergis et al 2016, but I’m not sure how a reader of G16 could become aware of the identity of the networks and reconstructions.

Given that the PAGES2K network was accepted with no more than cursory peer review, it’s interesting that it took nine rounds of revision for the Journal of Climate to accept Gergis et al 2016 with its identical network and virtually identical reconstruction.

The Dismal Lack of Progress by the AUS2K Working Group

Despite the long-standing desire for more “long” SH proxies, the AUS2K working group provided Gergis with only three records (Law Dome d18O, Mt Read Tasmania tree rings, Oroko NZ tree rings) in the target geographical area which began prior to AD1100, with the Law Dome series being screened out.  None of these are new records.

Closely related versions of all three series were used in Mann and Jones (2003), which also selected series by screening against gridcell temperatures, but with different results. Mann and Jones screened according to “decadal correlations”, resulting in selection of Tasmania (r=0.79) and Law Dome (r=0.76) and exclusion of Oroko (r=-0.25) – a different screening result than Gergis et al.



All three series have been discussed at Climate Audit from time to time over the years (tags: tasmania, oroko, lawdome).   Two of the three series (Mt Read, Oroko) were illustrated in AR4 (which didn’t show Oroko values after 1957), but AR4 lead authors snickered at my request that they also show Law Dome (see here.)  The authors realized that the Law Dome series had very elevated values in the late first millennium (see figure below from Jones and Mann, 2004) and there was no way that they were going to show a series which “diluted the message”. Compare the two series used in Mann and Jones 2003 in the first figure below with the two series shown in AR4 in the second figure below.


Figure 1. Excerpt from Mann and Jones 2003, showing Law Dome and Mount Read series.


Figure 2. Excerpt from IPCC AR4, showing Oroko and Mount Read series.

Thus, despite any aspirations for AR5, Gergis et al 2016 contained no long series which had not been used in Mann and Jones 2003.

It is also obvious that long results from combining Law Dome and Mt Read will have a considerably different appearance than long results from combining Mt Read and Oroko.  Although Gergis et al claimed that screening had negligible impact on results, Law Dome was excluded from all such studies.

Nor did Gergis et al actually use the Tasmania “Regional Curve Standardisation” series, as claimed.  Cook archived two versions of his Tasmania chronology in 1998, one of which (“original”) was the RCS chronology, while the other (“arfilt”) was a filtered version of the RCS chronology.  Gergis used the “arfilt” rather than “original” version – perhaps inheriting this version from Mann et al 2008, which also used the arfilt version.  Cook’s original article (Cook et al 2000) also contained an interesting figure showing mean ring widths in Tasmania prior to adjustment (for juvenile growth).   This data is plotted below (showing Cook’s figure as an inset). It shows a noticeable increase in 20th century ring widths, which, however, are merely returning to levels achieved earlier in the millennium and surpassed in the first millennium. High late first millennium values are also present in the Law Dome data.


Many of the Gergis series are very short – with coral series nearly all starting in the 18th and even late 19th centuries.  To the extent that the Gergis reconstruction shows a 20th century hockey stick, it’s not because this is a feature that is characteristic of the long data, but through the splicing of short strongly trending coral data with the longer tree ring data.  The visual result will depend on how the coral data is scaled relative to the tree ring data.
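The point about relative scaling can be made with a toy composite (entirely my own construction, with invented amplitudes): splice a short, strongly trending “coral” series onto a long, trendless “tree ring” series, and the size of the modern blade is set by nothing more than the weight given to the short series.

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_short = 1000, 150

# one long, trendless "tree ring" series spanning the whole millennium
long_series = rng.normal(scale=0.2, size=n_years)
# one short "coral" series with a strong modern trend and nothing earlier
coral = np.full(n_years, np.nan)
coral[-n_short:] = np.linspace(0.0, 1.5, n_short)

def blade(scale):
    """Modern-era elevation of the composite above the pre-coral mean."""
    rec = np.nanmean(np.vstack([long_series, scale * coral]), axis=0)
    return rec[-50:].mean() - rec[:-n_short].mean()

# the apparent 20th-century hockey stick grows with the coral weighting
for s in (0.5, 1.0, 2.0):
    print(f"coral scaling {s}: blade = {blade(s):+.2f}")
```

The long series is identical in every case; only the scaling of the short trending series changes, yet the composite’s modern uptick can be made as large or small as one likes.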

While Gergis and coauthors made no useful contribution to understanding past climate change in the Australasian region, on a more positive note, large and interesting speleothem datasets have recently been published, though not considered by Gergis et al, including very long d18O series from Borneo and Indonesia (Liang Luar), both located in the extended G16 region.  I find the speleothem data particularly interesting since some series provide data both on a Milankovitch scale and through the 20th century. For example, the Borneo series (developed by Jud Partin, Kim Cobb and associates) has very pronounced Milankovitch variability and comes right to the present.   In ice age extremes, d18O values are less depleted (more depleted in warm periods).   Modern values do not appear exceptional.  Results at Liang Luar are similar.




Gergis has received much credulous praise from academics at Conversation, but none of them appear to have taken the trouble to actually evaluate the article before praising it.   Rather than the 2016 version being a confirmation of or improvement on the 2012 article, it constitutes as clear an example of data torture as one could ever wish.  We know Gergis’ ex ante data analysis plan, because it was clearly stated in Gergis et al 2012.   Unfortunately, they made a mistake in their computer script and were unable to replicate their results using the screening methodology described in Gergis et al 2012.

In order to get a reasonably populated network and a reconstruction resembling the Gergis et al 2012 reconstruction, Gergis and coauthors concocted a baroque and ad hoc screening system, requiring a complicated and implausible combination of lags and adjacent gridcells.   A more convincing example of “fine tun[ing] the analysis to the data in order to obtain a desired result” (data torture) is impossible to imagine. None of the supposed statistical tests have any significance under the weight of such extreme data torture.
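The statistical consequence of such screening can be illustrated with a toy simulation. This is a sketch under invented assumptions (the proxy count, lag set, gridcell noise and critical correlation below are all illustrative, not values from Gergis et al): pure-noise “proxies” are screened against a temperature target, once with a single pre-committed test and once with an any-of-18 screen in the style of three lags by six adjacent gridcells.

```python
# Toy illustration (not the authors' code): screening pure-noise "proxies"
# against a temperature target, either with one pre-committed test or with
# an any-of-18 screen (three lags x six adjacent gridcells). All parameters
# here are invented for illustration.
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
n_years = 60
r_crit = 0.254   # approx. two-sided p = 0.05 critical |r| for n = 60

# One "true" temperature series, padded by a year at each end for lags,
# plus six correlated "gridcell" versions of it.
target = [random.gauss(0, 1) for _ in range(n_years + 2)]
cells = [[t + random.gauss(0, 0.5) for t in target] for _ in range(6)]

def survives_screen(proxy, lags, cells_used):
    """Retain the proxy if ANY lag/gridcell correlation clears r_crit."""
    for cell in cells_used:
        for lag in lags:
            window = cell[1 + lag : 1 + lag + n_years]
            if abs(pearson_r(proxy, window)) > r_crit:
                return True
    return False

# 1000 proxies of pure noise: none has any real climate signal.
proxies = [[random.gauss(0, 1) for _ in range(n_years)] for _ in range(1000)]
pass_1 = sum(survives_screen(p, [0], cells[:1]) for p in proxies)
pass_18 = sum(survives_screen(p, [-1, 0, 1], cells) for p in proxies)
print(pass_1, pass_18)   # the 18-way screen retains many more noise proxies
```

Because each extra lag and gridcell is another chance for noise to clear the bar, the any-of-18 screen retains substantially more spurious proxies than the single committed test, at the same nominal p = 0.05, which is the sense in which the nominal significance benchmarks lose their meaning.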

Because IPCC AR5 had used results of Gergis et al 2012 in a prominent diagram that it was committed to using, and continued to use the results even after Journal of Climate rescinded acceptance of Gergis et al 2012 (see here),  Gergis et al had considerable motivation, to say the least, to “obtain” a result that looked as much like Gergis et al 2012 as possible.  The degree to which they subsequently tortured the data is somewhat breathtaking.

One wonders whether the editors and reviewers of Journal of Climate fully understood the extreme data torture that they were asked to approve.  Clearly, there seems to have been some resistance from editors and reviewers – otherwise there would not have been nine rounds of revision and 21 reviews.  Since the various rounds of review left the network unchanged even one iota from the network used in the PAGES2K reconstruction (April 2013), one can only assume that Gergis et al eventually wore out a reluctant Journal of Climate, who, after four years of submission and re-submission, finally acquiesced.

As noted above, Wagenmakers defined data torture as succumbing to the temptation to “fine tune the analysis to the data in order to obtain a desired result” and diagnosed the phenomenon as being particularly likely when authors did not “commit themselves to a method of data analysis before they see the actual data”.  In this case, Gergis et al had, ironically, committed themselves to a method of data analysis not just privately, but in the text of an accepted article, but they obviously didn’t like the results.

One can understand why Gergis felt relief at finally getting approval for such a tortured manuscript, but, at the same time, the problems were entirely of her own making.  Gergis took particular umbrage at my original claim that there were “fundamental issues” with Gergis et al 2012, a claim that she called “incorrect”. But there is nothing “incorrect” about the actual criticism:

One of the underlying mysteries of Gergis-style analysis is how one of two seemingly equivalent proxies can be “significant” while the other isn’t. Unfortunately, these fundamental issues are never addressed in the “peer reviewed literature”.

This comment remains as valid today as it was in 2012.

In her Conversation article, Gergis claimed that her “team” discovered the errors in Gergis et al 2012 independently of and “two days” before the errors were reported at Climate Audit.  These claims are untrue. They did not discover the errors “independently” of Climate Audit or before Climate Audit. I will review their appropriation of credit in a separate post.


  1. Posted Jul 21, 2016 at 6:45 PM | Permalink

    Steve, What a comprehensive rebuttal to recent falsehoods spread by Joelle Gergis at The Conversation, or as it’s known in rational circles: “The Con”. Having followed this for some time, sadly I am not surprised at the level of misinformation, vitriol and poor science being promulgated by activists such as Gergis. Sadly this sort are immune to facts.
    I am still also surprised at the lack of good quality palaeoclimate data from Australia. You’d think with all the millions being spent, some could be set aside for acquiring new data. There are plenty of suitable sites around but I guess this means getting your hands dirty.

  2. Elbert Marks II
    Posted Jul 21, 2016 at 9:35 PM | Permalink

    One of many questions that arises in my mind is: who is the sugar daddy that pays for Gergis’s four years of submission and re-submission?

  3. tmitsss
    Posted Jul 21, 2016 at 9:35 PM | Permalink

    God Bless you, Mr.McIntyre

  4. Willis Eschenbach
    Posted Jul 21, 2016 at 10:53 PM | Permalink

    Man, this nonsense has surfaced again? Vampires and the living undead are nowhere near as hard to kill as bogus proxy-based science.

    In any case, thanks so much for squaring the circle and pointing out that the latest incarnation is even worse than the previous version. Your attention to detail is lethal. Anyone funding Gergis or believing in this study after your deconstruction is … well … not paying attention.

    One astounding part was this:

    Multiplying three lags by 2-6 gridcells, Gergis appears to have made 6-18 detrended comparisons, retaining those proxies for which there was a “statistically significant” correlation. It doesn’t appear that any allowance was made in the benchmark for the multiplicity of tests.

    I can’t believe it’s 2016 and they still haven’t heard of Bonferroni. It’s not new information. As you point out, if you are looking for significance at the p=0.05 level and you look in 6 places, you need to find something significant at the 0.05 / 6 ≈ 0.008 level. And if you have 18 places to look, you need to find something significant at the 0.05 / 18 ≈ 0.003 level …
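The multiple-testing arithmetic in the comment above can be checked with a short sketch (illustrative only; it assumes the screening tests are independent, which correlations against adjacent gridcells would violate):

```python
# Quick check of the multiple-testing arithmetic (illustrative; assumes
# the screening tests are independent, which adjacent gridcells are not).
alpha = 0.05

for m_tests in (6, 18):
    # Chance of at least one spurious "hit" when every test is run at alpha:
    family_wise = 1 - (1 - alpha) ** m_tests
    # Bonferroni per-test threshold needed to keep the family-wise rate ~alpha:
    bonferroni = alpha / m_tests
    print(m_tests, round(family_wise, 3), round(bonferroni, 4))
# -> 6 0.265 0.0083
#    18 0.603 0.0028
```

At 18 looks per proxy, a nominal 5% test has roughly a 60% chance of producing a spurious pass, which is why the per-test threshold would have to drop to about 0.003 to keep the overall error rate near 5%.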

    Profound thanks to you as always for your masterful, detailed, and well-explained analyses,


    • Posted Jul 23, 2016 at 11:57 AM | Permalink

      Presumably the temperature trends in adjacent grid cells are highly correlated. So the advantage of trying correlations across these would be small.

      On the other hand the lags are horrible. Gergis & co should decide which lag they are going to use in advance and not test the others.

      • stevefitzpatrick
        Posted Jul 24, 2016 at 8:21 PM | Permalink

        Gergis et al have discovered trees can foretell the future… they know what the weather will be like a year in the future. That this passes review shows how silly the entire field is.

      • TimTheToolMan
        Posted Jul 24, 2016 at 10:09 PM | Permalink

        As Steve stated, and to be fair, the “future” ability of trees might be explained by the fact that the Southern Hemisphere has its growing season spanning two years. So if the proxies are marked as, say, 1988 but the temperature data is 1989 because summer is still happening in Feb of that year, then that *might* explain the seemingly crystal-ball-ish capability of the trees.

        I would personally want to see that fully documented though! Otherwise arbitrary use of +1 lag to fit the story is simply torturing the data in the extreme.

  5. ngard2016
    Posted Jul 22, 2016 at 12:23 AM | Permalink

    The Calvo et al study found that there has been a southern Australian SST cooling for at least 6,500 years. This seems to agree with other Australian studies and Antarctica as well.

  6. Posted Jul 22, 2016 at 3:26 AM | Permalink

    Congratulations on an excellent and important article, Steve, and thanks for carrying out the careful analysis underlying it.

  7. Posted Jul 22, 2016 at 5:19 AM | Permalink

    And Brandon Shollenberger has now shown that they again did not do what they said they did. Instead of using the calibration period 1931 to 1990 as claimed they seem to have used 1921 to 1990, overlapping with the verification period. See links to Izuru in trackbacks below.

  8. Posted Jul 22, 2016 at 6:39 AM | Permalink

    There are some noticeable differences between the Mount Read (Tasmania) curves in the Mann&Jones 2003 figure and the IPCC AR4 figure. In particular, the peaks in the AR4 version (at ~1150, ~1300, ~1500, ~1950-2000) are all approximately equal, whereas in MJ2003 the modern peak is clearly highest, with the ~1500 peak well above the remainder within 1000-1950. MJ2003 also has a broader plateau around 1125-1175, compared to the ~1150 peak in AR4. Is this an artifact of the choice of filtering? (The MJ2003 version seems to have less high-frequency content.) Or is it due to a different choice among Cook’s versions (“original” vs. “arfilt”)?

  9. Jimmy Haigh
    Posted Jul 22, 2016 at 7:33 AM | Permalink

    Reading this reminded me of watching the Borat movie: too excruciatingly painful to stomach in anything but small bites.

  10. bob
    Posted Jul 22, 2016 at 7:34 AM | Permalink

    You wrote:
    “The lag of +1 years assigned to 5 sites is very hard to interpret in physical terms. ”

    It appears that some Canadians are infected with the virus that causes English Understatement.

    More seriously, recall that thiotimoline is derived from the shrub rosacea karlsbadensis rugo. Thiotimoline compounds provide a physical mechanism for a +1 lag and I believe that r. k. rugo is found throughout Australasia.


  11. dearieme
    Posted Jul 22, 2016 at 7:50 AM | Permalink

    “amateur climate skeptic bloggers”: I hooted with laughter.

    Is Micky Mann the Hillary Clinton of Climate Science?

    • Posted Jul 23, 2016 at 6:19 AM | Permalink

      Four cuts with one blade. What was wrong with the single word “others”?

  12. Gary
    Posted Jul 22, 2016 at 8:04 AM | Permalink

    “…Mangawhera ring widths assigned to the summer of 1989-1990 correlate to temperatures of the following summer (1990-1991) – ring widths in effect acting as a predictor of next year’s temperature.”

    I knew the coloration of caterpillars and the length of squirrel tails could predict the severity of an approaching winter, but thanks to Gergis we can now predict next year’s summer weather with this year’s tree rings. The wonders of climate science are amazing.

    • Jeff Norman
      Posted Jul 22, 2016 at 12:38 PM | Permalink

      “The wonders of climate science are amazing… Explain again how sheep’s bladders may be employed to prevent el Ninos.” [/Holy_Grail]

  13. mpainter
    Posted Jul 22, 2016 at 9:29 AM | Permalink

    Steve again documents here the dysfunctional peer review that infects climate science:
    Steve McIntyre:
    “One wonders whether the editors and reviewers of Journal of Climate fully understood the extreme data torture that they were asked to approve. Clearly, there seems to have been some resistance from editors and reviewers – otherwise there would not have been nine rounds of revision and 21 reviews. Since the various rounds of review left the network unchanged even one iota from the network used in the PAGES2K reconstruction (April 2013), one can only assume that Gergis et al eventually wore out a reluctant Journal of Climate, who, after four years of submission and re-submission, finally acquiesced.”


    One’s imagination boggles in the effort to perceive a regular process of peer review in this.

    • Steve McIntyre
      Posted Jul 22, 2016 at 10:19 AM | Permalink

      one can only assume that Gergis et al eventually wore out a reluctant Journal of Climate, who, after four years of submission and re-submission, finally acquiesced.”

      Maybe it would have been mildly more felicitously written if I had said:

      one can only assume that Gergis et al eventually wore out a reluctant Journal of Climate, who, after four years of submission and re-submission, finally submitted (sensu WWE) ”

      • Skiphil
        Posted Jul 22, 2016 at 10:36 AM | Permalink

        4 years!? I wonder if there were any personnel changes at Journal of Climate that helped Gergis et al. to finally break through?

      • Clark
        Posted Jul 25, 2016 at 11:12 AM | Permalink

        This “reviewer fatigue” is a real thing – that is if the editors are determined to allow it. My best example is a paper that underwent 7 rounds of review at Nature. Despite never fixing the major methodological flaw identified in the first review, the paper was accepted over my objections. The other reviewers eventually folded under the weight of pages and pages of rebuttals to wade through.

        Of course a year later another group published a rebuttal paper showing that the entire Nature paper was an artifact of bad methodology.

        Unfortunately, that type of competitive science that acts as a corrective for bad peer review seems woefully lacking in climate science.

  14. Posted Jul 22, 2016 at 9:36 AM | Permalink

    “not commit themselves to a method of data analysis before they see the actual data. It then becomes tempting to fine tune the analysis to the data in order to obtain a desired result—a procedure that invalidates the interpretation of the common statistical tests.”

    This is a big temptation that probably prevails in many investigations.

    I write code to do some simple analysis on climate data sets, motivated by curiosity, which I think has value.

    But after running an initial analysis, usually with graphical output to understand a data set, inevitable refinements follow. These refinements all involve choices, choices colored by preconception.

    This is beyond the biases that occur even when method is predetermined – experience colors those choices also.

    • Duster
      Posted Jul 29, 2016 at 12:38 PM | Permalink

      Empirical time-series field data like weather observations that someone wants to generalize into “climate data” has a flaw in that you cannot, ever, precisely replicate the collection conditions. Running an EDA (Exploratory Data Analysis) can give you a view of that specific data. It can easily reveal correlations that occur within that specific data set. What it can’t tell you is whether those correlations are evidence of causal linkages (either between variables, or influences on variables by a third, etc.) or purely accidental. Highly unlikely events occur all the time when you start considering things that happen on a planetary scale. That doesn’t make them “significant” or informative. Even re-sampling the data runs into the problem that a high correlation between variables in the original data should reappear in most subsets of the original data to some degree. Even predetermined analytical methods are at the mercy of happenstance.

      Climate and weather are ephemeral, and the historical traces they (climate and weather) leave are subject to the ephemeral conditions under which they were left and the ephemeral and changing conditions that preserved them. Every attempt to explain a given data set ultimately comes down to an argument by analogy, an “all things being equal …” approach that is inherently logically wrong. We can confidently say that CO2 and LWIR interact in a specific manner in a laboratory, but, since we cannot squeeze the rest of the planet into the laboratory and apply strict control protocols, things never were “equal.”

  15. Posted Jul 22, 2016 at 10:33 AM | Permalink

    Coughlan, 1979
    Recent Variations in Annual Mean Temperatures Over Australia
    [Fig. 3, pg. 711, shows no warming over Australia from 1911-1975.]

  16. Jeff Norman
    Posted Jul 22, 2016 at 12:33 PM | Permalink

    One wonders if the editorial staff at the Journal of Climate was “de-trended” during the interval.

  17. Posted Jul 22, 2016 at 1:06 PM | Permalink

    I can understand that it must be annoying if someone doesn’t represent a situation as you would have done, especially if their representation presents you in a somewhat unflattering light. I can also understand if you don’t get credit for something you think you did. However, what I don’t understand is what you hope to achieve by delving into this again; well, other than simply another round of Climateball(TM). Is there some realistic outcome that you would regard as both positive and constructive? If so, what would it be?

    Also, part of Joelle Gergis’s article was about her treatment on blogs and how it made it difficult for climate scientists to engage on those forums. You’ve ended up calling her a whining, self-serving data torturer who isn’t telling the truth. Doesn’t that somewhat confirm what she was saying about the treatment of climate scientists on blogs?

    • Curious George
      Posted Jul 22, 2016 at 1:27 PM | Permalink

      “What you hope to achieve by delving into this again?” What does Joelle Gergis hope to achieve by delving into this again? Or do you believe that she is not worth a response?

      • Posted Jul 22, 2016 at 1:29 PM | Permalink

        I don’t know what Joelle Gergis hopes to achieve, but presumably the timing was based on her article appearing.

        Or do you believe that she is not worth a response?

        Do you believe this is worth a response?

        • miker613
          Posted Jul 22, 2016 at 2:44 PM | Permalink

          ATTP, you mentioned in your post that “To be clear, I don’t know if what she presents in her article is the whole truth and nothing but the truth…” Isn’t there some difference between not quite telling the exact “whole truth” and there pretty much not being one word of truth in anything she said?
          She said she “just found a typo”. It wasn’t; it was an error that undid the result of her paper.
          She said she could have “taken the easy way out and just corrected the single word”. She couldn’t have, but she tried! – the editor withdrew his acceptance, and refused to let her “correct that word”.
          She said that her group discovered the error two days before the bloggers did. They didn’t: the emails here make it clear that the error was brought to her group’s attention by one of them pointing to climateaudit.
          She said that FOI requests were an attempt to “find ammunition” and a technique to “intimidate”. They weren’t: the emails make it clear that McIntyre was trying to get the actual data they used so that he could analyze it, and they didn’t want to provide it.
          She said that there were false claims by McIntyre that there were fundamental errors with the study. They weren’t false claims: her own group’s leader made it clear that making the study detrended was an important attempt to try to get past the “Selection Error” issue that was discrediting the work of Hockey Stick scientists.
          She said that the study has verified that everything is fine and shows “virtually the same conclusions.” It doesn’t; according to the post here, simple detrending would have yielded no more than six to eight proxies and very different results. Instead the study has chosen to allow verification methods that are insanely egregiously wrong, to bring the number of proxies back up to where it was before.

          Gergis is trying to re-litigate a scientific issue where she was badly fisked. Those who take anyone’s word for it if they’re critical of skeptics are her prey. Read the comments at ATTP, or at theconversation; they didn’t study the issue, won’t read what climateaudit has to say, don’t know what really happened in 2012, but they’re all ready to denigrate those awful anti-science skeptics.

          In any case, I’m a little astonished, ATTP, that you question the need for this post. Gergis’s group has re-published; a paper that had been rejected by science as being wrong is being reconsidered to be right. Why isn’t this the right time to slap it down again, if that’s what it deserves? See BBD’s immortal words at ATTP (“It’s still a hockey stick, despite all the years of contrarian fussing. And that’s what counts in the end.”) Does it matter to him if the paper is wrong?

          As I’ve pointed out before, anyone who cares about AGW should be trying as hard as possible to get rid of this kind of really bad climate science. BBD will be happy because he won’t know any better, but those of us who see both sides will have a harder time taking a lot of other good climate science seriously.

        • Posted Jul 22, 2016 at 2:53 PM | Permalink

          I simply can’t see what will be gained by delving into the whole who said what when from 3 years ago. If someone wants to publish another paper that does a better analysis, or shows what Gergis et al. have done wrong and the significance of their error, that would be fantastic – that’s how science progresses and how we improve our understanding of whatever it is we’re studying.

          Also, if a climate scientist writes an article pointing out how the tone of some blogs is such that engaging there is difficult/not worth it and some of the blogs mentioned then respond by calling them a whiny, self-serving, data torturer who isn’t telling the truth, it’s hard to then dismiss that aspect of their article.

        • Steve McIntyre
          Posted Jul 22, 2016 at 3:31 PM | Permalink

          Contrary to your remarks, I am unable to find anything in my original commentary at Climate Audit that Gergis could have legitimately objected to as inappropriate in tone. If you can identify commentary that you think is inappropriate, I’d welcome the feedback. From my perspective, climate scientists often treat technical criticism as personal or even “hate mail”.

          Gergis’ article linked to some offensive comments at a blog that I’d never heard of, then smeared the offence against blogs, such as Climate Audit, which did not deserve the smear.

        • miker613
          Posted Jul 22, 2016 at 3:00 PM | Permalink

          And for those who actually are awful anti-science skeptics, “denigrate” means “put down”. 🙂

        • Sven
          Posted Jul 22, 2016 at 3:08 PM | Permalink

          “I simply can’t see what will be gained by delving into the whole who said what when from 3 years ago.”

          So why, then, do you think Gergis did it?

        • miker613
          Posted Jul 22, 2016 at 3:15 PM | Permalink

          ATTP, Gergis is the one playing Climateball. Her whole article was an attack on climate skeptics, McIntyre in particular, and their scientific criticism of her work. It was vindictive, boastful, and mendacious. It had no other point, as you can see by the comments there; no one cared about her publication or even really knew what it was. That didn’t bother you, but a powerful response by her intended victims does. The more effective the response, the more you feel that they are overdoing it. Also Climateball.

        • Posted Jul 22, 2016 at 3:40 PM | Permalink


          Contrary to your remarks, I am unable to find anything in my original commentary at Climate Audit that Gergis could have legitimately objected to as inappropriate in tone. If you can identify commentary that you think is inappropriate, I’d welcome the feedback. From my perspective, climate scientists often treat technical criticism as personal or even “hate mail”.

          Gergis’ article linked to some offensive comments at a blog that I’d never heard of, then smeared the offence against blogs, such as Climate Audit, which did not deserve the smear.

          I don’t quite see how this is contrary to my remarks. I was referring to your current post, not your earlier ones. I don’t think Gergis specifically accused your earlier posts as being inappropriate, but that some of what she was subjected to at the time was inappropriate. I was simply suggesting that the tone of your current post is such that she would probably be unwilling to engage.

        • Steve McIntyre
          Posted Jul 22, 2016 at 6:54 PM | Permalink

          You say “I don’t think Gergis specifically accused your earlier posts as being inappropriate”. I’m not sure that I agree.

          I normally avoid editorializing adjectives, in part to avoid giving excuses since people like yourself quickly seize on occasional lapses from this policy. You seem to agree that the original posts did not contain inappropriate language. In that respect, I think that it was very unfair for Gergis to mention me by name in connection with comments from a blog that I’ve never heard of. It’s also disappointing that someone like yourself would not have contradicted her on this issue.

          Having said that, in the interests of improving relations, I will remove the extra adjectives that offended you and change the phrase “a whinging, self-serving and untrue account at Conversation of the withdrawal/retraction of the 2012 version” to an “untrue account at Conversation of the withdrawal/retraction of the 2012 version”. The term “data torture” is a term that is used in statistical commentary – I cited Wagenmakers. It has a technical meaning that precisely fits Gergis et al 2016.

          I’ve noticed that your own blog is not entirely free of aggression, let alone micro-aggression, but I do not require or even request that you live up to the micro-aggression-free environment that you insist upon for me.

          I hope that these compromises achieve an environment so free of micro-aggression that even tender flowers, such as perhaps yourself, can bask in its warmth.

        • Posted Jul 23, 2016 at 2:49 AM | Permalink

          I’m not insisting on anything and certainly do not require, or even request, that you change anything. I’ve no idea why you think that was what I was wanting. If anything, I’d request you put it all back to what it was and own what you initially wrote. I was simply interested in what you hoped to achieve. I’m well aware that my blog is not free of aggression, or even micro-aggression, and have never claimed that it is. I try to make it a site where one can engage in discussions; I don’t always succeed. This isn’t about my blog.

          I hope that these compromises achieve an environment so free of micro-aggression that even tender flowers, such as perhaps yourself, can bask in its warmth.

          Sure, with responses like this delicate flowers like myself will feel all warm and welcome. Seriously, I don’t give a damn. I fully expect the type of response that you’ve given; thanks for not surprising me.

          I was actually genuinely interested in what you hoped to achieve with your post and you still haven’t really answered the question. One reason I was interested was that I noticed you commenting on Twitter how your requests weren’t taken seriously and how your impact was typically negative. Well, if this post is an illustration of your style of engagement, I’m confused as to why you’re surprised.

        • Steve McIntyre
          Posted Jul 23, 2016 at 8:47 AM | Permalink

          It would be easier to respond to your opening question, if you would resist the temptation to add snarky editorials when you are ignorant of the history.

          I am polite by instinct and my natural style of correspondence is formal and professional. Nonetheless, I experienced great difficulty in getting data that way and the Climategate dossier showed that Mann, Jones and others reacted very unprofessionally to the requests. I didn’t expect such treatment, especially from a community that was simultaneously expecting a wider society to move quickly on their findings.

          Nor do I begin by presuming the worst of systems. In respect to AR4, I did not feel that I would have any right to complain if I did not participate. I made constructive comments on paleoclimate topics on which I had expert-level knowledge without snark. So your closing jibe is completely unwarranted. (In that respect, my prior closing jibe was also unnecessary, but was overtly satirical.)

          You also have a bad habit of paraphrases that distort the original point. I try to directly quote people that I am criticizing and I urge you to do the same, both out of courtesy and to improve understanding.

          Nor does Twitter permit very nuanced comments, as I’m sure that you are aware. This undoubtedly contributes to Twitter flame wars.

          Your comment about “negative” impact pertained to Richard Betts’ criticism of me not participating as a peer reviewer of AR5. I had participated in good faith as a reviewer of AR4 and made review comments without snark.

          I do not believe that my review comments were handled fairly. One example – and I am reminded of it because of the handling of the Law Dome proxy in Gergis et al – was IPCC’s handling of Law Dome in the AR4 graphic of long SH proxies, an issue that I had raised as a review comment and on which there were Climategate emails (see CA post). In my AR4 review comments, I repeatedly urged IPCC authors to properly disclose the inconsistency of proxies and the repeated use of proxies with known properties in multiproxy studies that were not “independent”, despite claims to the contrary. In that respect, I asked them to show the Law Dome d18O series which they had not shown in the draft graphic. The IPCC authors sneered at me, continued to withhold the Law Dome series, but added some legalistic text which they could point to if criticized.

          In the controversy with which I was then associated, I commented on the characterization of the hockey stick controversy in the First and Second Drafts. The characterization of the dispute in the Second Draft was not ideal, but was non-committal. Subsequently, the assessment was secretly re-written by Eugene Wahl (at the invitation of Keith Briffa) to one that was much more adverse to us with us having no chance to object or comment, with Wahl’s comments being withheld from the IPCC record which was supposed to be complete. When some such intervention was later suspected, Jones, Briffa, Mann, Wahl and Ammann entered into an agreement to destroy the evidence of Wahl’s involvement by deleting the emails after they had been requested by David Holland in FOI.

          Based on such experience, I decided not to participate in the AR5 review process. I did comment on AR5 drafts based on copies of draft documents that were provided to me, despite IPCC attempts to prevent such commentary.

        • Posted Jul 23, 2016 at 5:03 PM | Permalink

          This has all drifted rather far away from what I was intending. I’m still no closer to really understanding what you would regard as some kind of realistic, constructive, positive outcome, but I doubt I’m about to get any closer. You also seem to be seeing snarky editorials and jibes in what I write. I’m certainly not intending to write snarky editorials or jibes, so maybe you should consider that I mean whatever it is that I have written. Anyway, I’m back on holiday tomorrow, so will leave it at that.

        • Steve McIntyre
          Posted Jul 23, 2016 at 6:17 PM | Permalink

          Your main question, minus the jibes, is a fair one but not one that can be answered quickly any more.

        • Posted Jul 23, 2016 at 5:39 PM | Permalink

          I continue to be amazed that ATTP can’t acknowledge that there might be important technical issues here that should really be the focus. Perhaps since he has already acknowledged his technical ignorance it’s understandable that he has produced a huge number of words on a really peripheral issue about motivations that is largely irrelevant to the real issues of scientific replication, the reliability of the literature, and the elements of pseudo-science in paleoclimatology. Steve McIntyre’s main focus is, it seems to me, on those very important issues. It’s not surprising that paleoclimatologists are not amused, but they will survive I think.

        • Posted Jul 23, 2016 at 8:40 PM | Permalink

          Steve, The problem here with ATTP goes very deep and involves the attitude to science and the science literature, particularly the climate science literature. If you believe as ATTP does, that everything is fine with science, and these things are all resolved in the normal course of the literature, then your whole blog enterprise (and your forays into criticism outside the literature) seems illegitimate. A lot of this I believe is simple denial on ATTP’s part. If most medical results are wrong, then that’s a problem in the medical field. If psychology is corrupted by bias, that’s not a problem in physics. It’s a little naive and shows I think an unfounded and slightly dogmatic reverential attitude to science. Not surprising since ATTP’s job is as a public relations man for an astronomy department. It is also a part of the left wing view that there is science, which is right, and superstition and prejudice opposed to it. Usually, the view is that this prejudice is associated with the political right. It is hard for most to rise above their own interests and those of their associates in the science establishment. That’s why there is an abundance of words that really say very little.

        • Posted Jul 23, 2016 at 8:46 PM | Permalink

          This is a big issue for me because I believe that the literature in my own field, computational fluid dynamics, is strongly affected by bias. It is the norm to show only results that agree with the data. Other results are always hidden and attributed to some superficial issue. Another excuse I’ve seen a lot in climate science too is that the data is wrong when it disagrees with the model. The problem here is that the real experts, those who develop the turbulence models, are far more honest in private. One only wishes they were so frank in public. There is tremendous peer pressure to present models in a positive light and negative results are usually ignored even when they are published.

        • jddohio
          Posted Jul 23, 2016 at 9:02 PM | Permalink

          ATTP “This has all drifted rather far away from what I was intending. I’m still no closer to really understanding what you would regard as some kind of realistic, constructive, positive, outcome, but I doubt I’m about to get any closer.”

          I think it is good that you are trying to have a dialog, but in this case, I believe it is close to impossible. If Steve (and Brandon) are correct, even with 4 years to work on it, Gergis has done a very poor job, which strikes deeply at her professionalism. With both sides so far apart and with Gergis having so much to lose professionally, I don’t see any way she can give credit to Steve’s work in any but the most minor way.

          Also, there is a long history of Hansenite science impugning the motives and competence of skeptics (a better term, in my view, is climate realists). Realists are routinely denigrated with unnecessary, disparaging name-calling by people at the very top of Hansenite science. Additionally, for instance, Mann has repeatedly lied about an important Excel file, and no one in the Hansenite science community calls him out.

          About the only way I can see that any dialog can arise out of this is for Gergis to respond. I don’t expect it would be a friendly dialog, but it would be an exchange of ideas. Of course, if she doesn’t respond and the article is never retracted, the issue will never go away.


      • HAS
        Posted Jul 22, 2016 at 4:00 PM | Permalink


        To the extent this process makes it harder for climate scientists to use the desired results to select the method it will be a good thing, don’t you think?

        It is a common problem that many plying their trade in this area don’t understand experimental design and appropriate methods, so having a few high-profile cases will focus the minds of those who are in it for the science, even if it only encourages those who are in it for the PR.

    • TerryMN
      Posted Jul 22, 2016 at 1:43 PM | Permalink

      I can understand why you’d rather talk about tone and treatment of climate scientists than defend or come up with a physical basis for why some trees can apparently predict temperatures a year out, too.

    • Will J Richardson
      Posted Jul 22, 2016 at 1:47 PM | Permalink


      As I read Mr. McIntyre’s post, it is not a matter of whether or not “[Gergis] doesn’t represent a situation as [McIntyre] would have done”. Mr. McIntyre’s timeline, derived from Gergis and co-authors’ emails, demonstrates the mendacity of Gergis’ claim to prior discovery of what Gergis describes as a typographical error. When Gergis’ excuses are contradicted by the facts, making statements demonstrably contrary to those facts is not a matter of not “represent[ing] a situation as [McIntyre] would have done”.

      As for Mr. McIntyre “calling [Gergis] a whining, self-serving, data tutoror (sic) who isn’t telling the truth”, with which of those appellations do you disagree, and what are your reasons? Has Mr. McIntyre not demonstrated a case of data torturing? Is Gergis’ typographical-error explanation not self-serving? Were Gergis’ emails to the publisher not a bit whiny?

    • mpainter
      Posted Jul 22, 2016 at 2:11 PM | Permalink

      Ken Rice (ATTP):”Is there some realistic outcome that you would regard as both positive and constructive? If so, what would it be.”

      Realistic outcome: an audit of Gergis, et al 2016.
      Isn’t it nice to have a thorough and complete examination of a scientific study?

      Or do you take the position that such an examination (i.e., audit) should not be done in climate science?

      • Geoff Sherrington
        Posted Jul 24, 2016 at 2:50 AM | Permalink

        As an Australian scientist, can I note that it is so disappointing to see work like this represented internationally as “Australia’s view” or similar.
        I welcome the accuracy and diligence of Steve McIntyre and hope that one outcome of his studies will be a review of the paper by Gergis et al before it is accepted in international forums and publications as “Australia’s view”.
        We Aussies had, and have, essentially no choice about who represented or represents us on the IPCC and other international efforts. We see a standard of work that we would rather not see, but are comparatively helpless to correct.

    • DaveS
      Posted Jul 22, 2016 at 2:24 PM | Permalink

      If academics wish to be taken seriously, perhaps they should stop producing, or defending, dross.

    • Sven
      Posted Jul 22, 2016 at 2:33 PM | Permalink

      The crap that ATTP is willing to accept when it’s on “his side” is just astonishing. Both on the science (Journal of Climate) and behavior (The Conversation) fronts. Utterly dubious techniques? Knowingly lying? Pas de problème! This is truly bizarre.

    • None
      Posted Jul 22, 2016 at 2:36 PM | Permalink

      “However, what I don’t understand is what you hope to achieve by delving into this again”

      You don’t think presenting the facts in response to more slanderous fabrications from climate scientists is important ?

    • Posted Jul 22, 2016 at 2:50 PM | Permalink

      ATTP, as usual you avoid the main issues. It is blindingly obvious to readers of the original 2012 article that was withdrawn that Gergis and her co-authors intended to detrend the proxies and the instrumental temperatures but, by accident, failed to do so. It is also perfectly clear that if they had done what they intended there would have been no usable reconstruction – and this is itself important information. Rather than simply say so, what Gergis has done is to change her own criteria for which proxies to include, by devices such as allowing the “best” of any correlation within two regions and three dates. According to standard statistical criteria, this multiplication of targets will inflate conventional “nominal” significance levels way above the true significance levels. Then there are tricks like truncating the reconstruction by 600 years without clearly noting it.
      The impression I get is that the conclusion was fixed in advance and a search was made over methods to find those that give the closest match with the conclusions of the withdrawn article. This is not robust science. Then throw in some obviously inaccurate statements; that it was only a one-word typo, that the authors themselves decided to completely redo the paper rather than being required to by the journal, and that Climate Audit only spotted the mistake two days after it was spotted by one of the researchers. It is then quite understandable that our host should be annoyed and want to set the record straight, both about the substantive statistical issues and the simple questions of who said what to whom and when. There is a really serious issue about how to choose proxies for temperature reconstructions which the main names in the field simply refuse to address. Until they do so there is no reason to take their temperature reconstructions seriously.
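      The “multiplication of targets” point is easy to demonstrate with a toy simulation (a hypothetical sketch, not Gergis’s actual data, proxies, or code; the six candidate targets standing in for two regions times three date windows, and the 70-year overlap, are illustrative assumptions):

```python
import numpy as np

# Toy simulation of "best of several targets" screening (invented numbers:
# 6 candidate targets stands in for two regions x three date windows, 70 years
# of overlap, and pure-noise pseudo-proxies unrelated to any target).
rng = np.random.default_rng(0)
n_years, n_trials, n_targets = 70, 2000, 6

# critical |r| for a nominal two-sided p < 0.05 at n = 70 is roughly 0.235
crit = 0.235

def best_abs_corr(proxy, targets):
    """Highest absolute correlation of the proxy against all candidate targets."""
    return max(abs(np.corrcoef(proxy, t)[0, 1]) for t in targets)

hits = 0
for _ in range(n_trials):
    proxy = rng.standard_normal(n_years)                 # pure-noise "proxy"
    targets = rng.standard_normal((n_targets, n_years))  # independent targets
    if best_abs_corr(proxy, targets) > crit:
        hits += 1

# With 6 independent looks, roughly 1 - 0.95**6, about 26%, of noise series pass
print(f"nominal rate: 5%, observed pass rate: {hits / n_trials:.1%}")
```

      A nominal 5% screen passes roughly a quarter of pure-noise series once six looks are allowed, which is the sense in which nominal significance overstates true significance.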

      • mpainter
        Posted Jul 22, 2016 at 7:24 PM | Permalink

        Phelps: “… that Gergis and her co-authours intended to detrend the proxies and the instrumental temperatures but, by accident, failed to do so.”
        Maybe no accident. Remember the Gergis quote retrieved from the Wayback Machine? Something about doing her part in the “Guerilla War Against Climate Change”. This phrase speaks quite eloquently, imo.

        Steve: this is the sort of extra accusation that people like Gergis use to ignore valid criticisms. There is no point in such speculation, so please avoid unless you can supply evidence. I have zero doubt that the original error was inadvertent.

        • mpainter
          Posted Jul 23, 2016 at 10:11 AM | Permalink

          It is a curiosity that in light of all the FOI emails and other documentation which shows otherwise, Gergis persists in her appropriation of credit for the detection of the omission of detrending. Here we are provided a window on her character. This would suffice in a courtroom as evidence in an evaluation of her honesty, imo.

        • Steve McIntyre
          Posted Jul 23, 2016 at 3:15 PM | Permalink

          one oddity is that she actually ratcheted up their false claims to priority – this time claiming that they had discovered the error “two days” before it was reported at Climate Audit, even though their first document is two hours after the error had been reported at Climate Audit.

          Even before Jean S reported the specific error, I had raised questions about screening when it was nowhere on their radar, so their claim to have “independently” discovered the error has no more truthfulness than Gavin Schmidt’s claim that a mystery man had independently discovered the Harry error after it had been mentioned at Climate Audit.

        • mpainter
          Posted Jul 23, 2016 at 5:20 PM | Permalink

          From David Karoly’s email to you:

          “An issue has been identified in the processing of the data used in the study, which may affect the results. While the paper states that “both proxy climate and instrumental data were linearly detrended over the 1921–1990 period”, we discovered on Tuesday 5 June that the records used in the final analysis were not detrended for proxy selection, making this statement incorrect.”

          Tuesday, June 5 was the day that the error was identified at Climate Audit by Jean S.
          Now Gergis claims that Karoly is wrong, that her “team” detected the error two days previously.
          Karoly also identifies it as a data processing error, not a “typo” as claimed by Gergis. Good grief.
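          Why the missing detrending mattered for proxy selection can be seen in a minimal sketch (all numbers are invented for illustration and are not the actual proxy network): over a trending calibration period such as 1921–1990, any series with an incidental drift will correlate strongly with instrumental temperature, so screening on raw correlations admits series that detrended screening would reject.

```python
import numpy as np

# Invented illustration: an instrumental "temperature" series and an unrelated
# "proxy" that both happen to drift upward over 1921-1990. Neither is real data.
rng = np.random.default_rng(1)
years = np.arange(1921, 1991)
t = years - years[0]

temp = 0.01 * t + 0.1 * rng.standard_normal(t.size)    # trending target
proxy = 0.02 * t + 0.2 * rng.standard_normal(t.size)   # unrelated drifting series

def detrend(x):
    """Remove the least-squares linear trend from a series."""
    steps = np.arange(x.size)
    slope, intercept = np.polyfit(steps, x, 1)
    return x - (slope * steps + intercept)

r_raw = np.corrcoef(proxy, temp)[0, 1]
r_det = np.corrcoef(detrend(proxy), detrend(temp))[0, 1]
print(f"raw r = {r_raw:.2f}, detrended r = {r_det:.2f}")
```

          The raw correlation is large purely because both series trend; after detrending both over the calibration window, the correlation collapses toward zero.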

    • Posted Jul 22, 2016 at 2:52 PM | Permalink

      Anders is fascinating. He wrote a blog post promoting the piece Joelle Gergis wrote for The Conversation, showing absolutely no interest in whether or not anything she wrote was true. He now asks our host:

      I can understand that it must be annoying if someone doesn’t represent a situation as you would have done, especially if their representation presents you in a somewhat unflattering light. I can also understand if you don’t get credit for something you think you did. However, what I don’t understand is what you hope to achieve by delving into this again; well, other than simply another round of Climateball(TM). Is there some realistic outcome that you would regard as both positive and constructive? If so, what would it be.

      The bias and close-mindedness is obvious as again Anders shows absolutely no interest in what is or is not true. Instead he asks why people would bother to respond to a newly published paper, one he promoted, as though somehow people discussing newly published scientific papers requires an explanation.

      For those who cannot understand the obvious, the “positive and constructive” purpose posts like this can serve is to highlight bad work so people can become aware of its flaws. If the flaws are serious enough, like they were for the earlier version of this paper, that awareness might even lead to corrective actions such as a corrigendum or even withdrawal. As for his claim:

      Also, part of Joelle Gergis’s article was about her treatment on blogs and how it made it difficult for climate scientists to engage on those forums. You’ve ended up calling her a whining, self-serving, data tutoror who isn’t telling the truth. Doesn’t that somewhat confirm what she was saying about the treatment of climate scientists on blogs?

      Joelle Gergis’s narrative of the treatment she’s received is entirely self-serving. Much of what she says is false or at least misleading. The sad reality is any criticism of her and her co-authors’ work will be portrayed as “abuse” by many people, to the extent she can lie and be defended for it because people like Anders think pointing out her lies is somehow inappropriate.

      Put simply, when new work gets published, people may choose to discuss it. We know this as even Anders chose to discuss this work. I cannot explain why he feels it natural for him to promote this work and the narrative surrounding it while refusing to actually examine any of it. What I do know is it is utterly absurd for him to express surprise or exasperation at people criticizing work he himself writes blog posts to promote.

      • Posted Jul 22, 2016 at 2:54 PM | Permalink

        snip – ATTP, your engagement on these issues is more than welcome, but please avoid this sort of pointless food fight.

    • Steve McIntyre
      Posted Jul 22, 2016 at 3:16 PM | Permalink

      I’ve published commentary on numerous multiproxy studies over the years and, right now, probably know as much about the topic as anyone in the world. I’d published comments on Gergis et al 2012. I was interested in Gergis et al 2016 as a multiproxy study, and many readers here anticipated that I would comment on the new version. If you look at my post, I think that most of it was concerned with the Gergis study and not the backstory. In writing the post, I noted, but didn’t parse, some key backstory controversies.

      Data torture is an important issue in statistics (e.g. the Wagenmakers articles) and has been a long-standing issue at Climate Audit in respect to paleoclimate studies. When I examined the evolution from Gergis et al 2012 to Gergis et al 2016, I concluded that “data torture” was an appropriate description. Indeed, because we know the ex ante plan from Gergis et al 2012, it is a virtually unique example of “data torture”. Do you disagree with the appropriateness of the term?

      Her narrative at Conversation was untrue and self-serving, and named me in a derogatory context. As far as I can tell, few commenters either at Conversation or at your blog (other than oneuniverse) challenged her untrue narrative. It seems unfair that you call it “climateball” when I respond, but didn’t hold Gergis accountable when she published her untrue narrative.

      The easy way for Gergis to have avoided dispute over data torture would have been for her not to have done it. The best way for the climate science community to avoid criticism from me on such points is for it to avoid such practices in the first place.

      The easy way for Gergis to avoid disputes over her untrue narrative would have been for her to stick to the facts in the first place. It’s unreasonable for her to expect anything other than an antagonistic reaction when her narrative is untrue and derogatory.

      • Posted Jul 22, 2016 at 3:36 PM | Permalink

        That still doesn’t really answer my question. I wrote a single post about the Gergis article because it seemed topical and interesting from a blog perspective. However, I didn’t engage in the comments on her article, have never had a discussion with Joelle Gergis, and may never have a chance to do so. If I did, I might indeed ask why she wrote some of what she did, but that’s somewhat beside the point. I was more interested in what you hope to achieve, rather than a justification for what you’ve chosen to do.

        From a scientific perspective, what’s interesting is what they present in their paper, not really what happened when they first tried to publish it. It’s not clear how delving into the past really helps that.

        Also, if part of Gergis’s article is about how it is difficult for climate scientists to engage on some blogs, it’s hard to see how your post doesn’t at least somewhat confirm this.

        The easy way for Gergis to have avoided dispute over data torture would have been for her not to have done it. The best way for the climate science community to avoid criticism from me on such points is for it to avoid such practices in the first place.

        The easy way for Gergis to avoid disputes over her untrue narrative would have been for her to stick to the facts in the first place. It’s unreasonable for her to expect anything other than an antagonistic reaction when her narrative is untrue and derogatory.

        This seems overly simplistic. It would be wonderful if every single scientist always performed their analysis in a manner that could not be criticised by anyone, and if every scientist who ever commented publicly did so in a way that could not be criticised by anyone. However, we live in the real world, where not every analysis is perfect, where even the best analyses can be criticised, where even scientists might present a more positive picture of some issue than is justified; where even scientists who believe they’re being completely honest might still say something that can be disputed.

        If you think the only way to avoid disputes is to never do anything that can be criticised, then it seems like you’re suggesting that you can always find ways to introduce a dispute.

        To be clear, my point was about the manner in which it is done, rather than the act of criticising; I think criticising what people have said publicly is absolutely fine – I do it all the time. Hence, if your goal is to simply find reasons to criticise climate scientists, then you’re probably doing fine. However, if your goal is to actually move to a scenario where we can actually engage in serious discussions about this topic, then I think it is counter-productive to call people self-serving, whiny, data-torturers who don’t tell the truth. That’s just my view. YMMV, of course.

        Maybe I’ll re-pose my question. What realistic outcome would you regard as both positive and constructive?

        • Posted Jul 22, 2016 at 5:01 PM | Permalink

          I’ll respond Ken, because the answer I think is pretty obvious. This is not just a personal dispute. The goal should be to get the paper corrected. There is evidence that with proper screening, the result is a lot different.

          And of course, no one ever said everything in the literature should be above criticism. It should be free of obvious errors such as data torture, and it should be corrected with a corrigendum when errors are pointed out. The problem Steve McIntyre has had in the past is that peer reviewers have been hostile to his work, even when it is correct. That goes, of course, to the “replication crisis” that you seem to wink at and excuse.

          My question to you is what do you hope to accomplish by your blog post? You say you don’t know enough to comment on the technical issues involved.

        • Posted Jul 22, 2016 at 5:08 PM | Permalink

          Amazing, there is virtually no link between the answer you’ve given and the question I asked.

        • Steve McIntyre
          Posted Jul 22, 2016 at 8:39 PM | Permalink

          ATTP, at your blog, you say:

          Fighting about something that is probably irrelevant and happened a few years ago seems entirely pointless.

          In that respect, Mann’s lawsuits against Steyn and Ball are surely prime examples of actions that advance nothing (except Mann’s vanity.) To my recollection, I haven’t seen any criticism or condemnation of these lawsuits. But you criticize me for spending a much smaller amount of time examining defective methodology in Gergis et al.

        • Posted Jul 22, 2016 at 9:01 PM | Permalink

          “What realistic outcome would you regard as both positive and constructive?”

          Quality (as in good work… not poop)

        • Posted Jul 23, 2016 at 3:06 AM | Permalink

          Why does Michael Mann always appear in these discussions?

          I’m not really criticising you. I was asking a question. I was interested in what you were hoping to achieve. My own view is that delving back into the details of some dispute from a few years ago is rather pointless if your goal is to move forward and improve our understanding of whatever it is we’re studying. You may, of course, disagree. That may not be your goal.

        • Steve McIntyre
          Posted Jul 23, 2016 at 9:53 AM | Permalink

          I was asking a question. I was interested in what you were hoping to achieve. My own view is that delving back into the details of some dispute from a few years ago is rather pointless if your goal is to move forward and improve our understanding of whatever it is we’re studying. You may, of course, disagree.

          My primary concern was to review Gergis et al 2016, rather than to comment on Gergis’ false narrative of events. From my perspective, I referred to her false narrative mainly in passing and did not attempt a detailed exegesis.

          “Data torture” is a very important consideration in applied statistics e.g. Wagenmakers’ discussion of data torture in connection with social psychology. To avoid data torture, Wagenmakers convincingly advocates that analysts specify their planned analytic procedures in advance and that they document failed tests, as well as “successful” tests, since statistical “significance” depends on the process – a point poorly understood in paleoclimate.

          Gergis et al used a very convoluted screening procedure which, to my knowledge, occurs nowhere else. If Gergis et al 2016 had been presented on its own, I would have expressed doubt that this was their ex ante screening plan and presumed that it emerged from data torture. In this somewhat unusual case, we know the ex ante plan from Gergis et al 2012, and thus there is no mystery about the ex ante plan.

          I do not see how it is possible to review Gergis et al, 2016 without considering the issue of “data torture” (Wagenmakers 2011, 2012). Do you seriously dispute the applicability of the criticism?

        • Posted Jul 23, 2016 at 6:20 AM | Permalink

          ATTP, please return to the charge. At your blog you wrote: “What makes the story more interesting, is the response of various bloggers to the original paper. It appears (which is not surprising) that they made a mountain out of a molehill.” Your own words, not Gergis’. The reality seems to be that the 2012 paper had some heavy scientific defects (the reason for withdrawing it), and the 2016 paper does too.
          Had you read the paper before you wrote the cited sentences? You elevated the whole thing to the emotion-only level, and now it is coming back around, because the only possible emotion in the face of this data massage is shame on the part of the scientists involved.

        • Steve McIntyre
          Posted Jul 23, 2016 at 10:05 AM | Permalink

          At your blog you wrote:”What makes the story more interesting, is the response of various bloggers to the original paper. It appears (which is not surprising) that they made a mountain out of a molehill.” Your own words, not Gergis’.

          As you observe, ATTP was quick to disparage the original blog commentary, despite having no knowledge of the facts. This shooting from the hip is, unfortunately, all too characteristic of alarmists. The original commentary was exactly right and on point: the problem was sufficient that Journal of Climate rescinded acceptance of Gergis et al 2012 because of the error.

          Blogs didn’t rescind acceptance; the journal did. If ATTP believes that the journal should not have rescinded acceptance, then it is the Journal of Climate that ATTP should be criticizing, not the blogs. In my opinion, the journal was entirely correct in rescinding acceptance. They gave Gergis six weeks to revise before rejection, but Gergis failed to comply.

        • davideisenstadt
          Posted Jul 24, 2016 at 12:16 AM | Permalink


          I direct this reply to you and hope that it deals with your comments with all of the respect and decorum your posts justify.

          It would be nice if practitioners in the field adhered to the standards and practices of the community that developed the tools that climate science attempts to use.

          The degree of malfeasance exhibited by Gergis defies belief.

          This isn’t a case of requiring researchers to employ methods of analysis that will not be criticized by anyone; it is a question of whether these researchers should be allowed to publish results that employ techniques defended by NO ONE in the relevant community.
          Not one credible statistician would approve of the type of ex post selection that she employed.
          That you continue to defend this practice only underscores the problems inherent in having a PR flack for an astronomy department opine on these issues while posing as some type of authority on statistical analysis.

        • Steven Mosher
          Posted Jul 24, 2016 at 1:49 PM | Permalink

          required reading

        • Duster
          Posted Jul 25, 2016 at 4:59 PM | Permalink

          Science that works often works like a pressure cooker. Sometimes, to get a finished product, one needs to apply some pressure. Post-normal science runs more to “story telling” than empirically useful work. Gergis seems, in much of her remarks, to think that she should be treated better merely because she slapped a “science” label on her “work.” Worse, she asserts that she independently “discovered” that there was a critical blunder in her paper, and offers no support for that beyond the assertion itself. That, by any other name, is simply a justification offered for what would otherwise look like plagiarism once you shave the fuzz off it. She is grabbing credit where there appears to be none due and a considerable load of fault to be shouldered. So, Steve, keeping the pressure on might inspire Gergis to clean up her act a bit. At the least, she will know that merely because something is published doesn’t make it valid or beyond criticism.

        • Steve McIntyre
          Posted Jul 25, 2016 at 6:47 PM | Permalink

          That by any other name is simply a justification offered for what would otherwise look like plagiarism when you shave the fuzz off of it

          According to Melbourne University policy on academic misconduct, plagiarism is:

          the use of another person’s ideas, work or data without appropriate acknowledgement

          I’ve previously expressed the opinion that Gergis et al used “ideas” and “work” of Climate Audit on screening “without appropriate acknowledgement” and Gergis has given no evidence to change my mind.

    • Posted Jul 22, 2016 at 3:29 PM | Permalink

      Ken Rice, One thing that I would expect to be a final outcome is to finally have a more scientifically defensible paleoclimate reconstruction for the Southern Hemisphere. If McIntyre is right, and he presents strong evidence that he is, the record needs to be corrected, does it not, because the paper is wrong and perhaps badly wrong? Or is it OK, for example, to leave wrong papers out there to mislead?

      This kind of thing is exactly what the replication crisis is about. I’ve mentioned this very well documented crisis to you many times, and you do exactly what you did above. You deflect and say it’s not serious, or invoke the Clinton defense, “At this point what difference does it make?”. Surely, we should be trying to raise standards in science, not just continue the long slide to meaninglessness.

    • Posted Jul 22, 2016 at 8:18 PM | Permalink

      ATTP, Ms. Gergis resurrected the issue, not Mr. McIntyre. As she appears to have written poorly, perhaps even in bad faith, about Mr. McIntyre in the interval between her two publications, I struggle to understand why you think it wrong for Mr. McIntyre to show her errors again.

      I note that you wrote about Gergis et al before Mr. McIntyre–why are you resurrecting the past?

      As for what Mr. McIntyre is trying to achieve, I sort of think he’s been clear about that for several years now. He audits papers on climate science that he suspects have been misapplying statistical methods. He shows errors and hopes (usually in vain) that they are corrected.

      How is it possible that you could miss this?

    • grmy
      Posted Jul 22, 2016 at 11:17 PM | Permalink


      You are an amazing piece of work. You write “I can understand that it must be annoying if someone doesn’t represent a situation as you would have done, especially if their representation presents you in a somewhat unflattering light.” Your choice of wording is astonishing. “….doesn’t represent a situation as you would have done…” is quite the euphemism for “flat out lied”.

      This is not a grey issue. The FOI documents are all there. Steve M made it easy for you and provided all the references. Spend 5 minutes, follow the links, and report back. Perhaps then you could salvage a little of your credibility.

      You state that you don’t understand what Steve hopes to achieve by delving into this again. So, I guess you don’t regard establishing facts and correcting misrepresentations as important. Long ago, when I was teaching at the University of Natal (which I understand you attended), I tried to instill a focus on ethics and regard for the truth in my students. Unfortunately it appears you were not one of my students.

      • Layman Lurker
        Posted Jul 23, 2016 at 3:06 PM | Permalink


    • Wayne
      Posted Jul 29, 2016 at 4:37 PM | Permalink

      ATTP: Steve may not be able to put it succinctly at this point, but I’d suggest that a clear, constructive outcome would be a withdrawal of the 2016 paper, with an acknowledgement of the clear lessons learned that:

      1. P-value hacking and its equivalents are not allowed. Doing obvious multiple comparisons without appropriate compensations to find data that agrees with your point is impermissible.

      2. If a paper’s aim is to break new ground or extend analysis to new datasets and it fails, it should not be allowed to regroup and essentially cover the same ground.

      3. Plagiarism is not allowed, which includes claiming you found problems in your paper that someone else found. If proof emerges that credit has been claimed so that it may be withheld from a critic, the paper should be immediately withdrawn.

      4. If (apparently) physically implausible values are used (for example +1 year offset in a predictor), they must be explained in the main body of the text. If they are not, the paper should be dropped.

      5. Ex-post screening should only be allowed with the most rigorous of defenses, in the main body of the text, which explain in detail how the natural bias that results from such methods has been actively mitigated.

      Principles we can all agree on, and which Steve makes a strong case that the 2016 paper violated.

      It does take some history and explanation for Steve to point out things like how points 2 and 3 were violated. This may strike you as petty or over-critical, but not the rest of us.

      You go on at some length seemingly implying that the 2016 paper is pretty much reasonable, and it is Steve who is unhappy because… well, because it’s not the way _he_ would have done it, rather than it’s not the way that _anyone_ should do it.

      As for mentioning Mann, it seems perfectly valid to point out that an obvious defense (“an authority figure in our field has set the precedent that this apparently flawed method is in fact acceptable”) is invalid and to provide context.

      • Posted Jul 30, 2016 at 8:00 AM | Permalink

        The withdrawal of a paper is relatively rare, especially if it is simply due to a disagreement as to the method used. Maybe there are examples of authors withdrawing a paper when someone else highlights a problem, but it is much more common for others to publish a response that the original authors and others can consider. That’s how we learn. So, if what you say is true, then ideally someone should publish a response and it can be considered and people can draw their own conclusions. Restricting this to a discussion on a blog is unlikely to have much impact. That’s not an argument against blog discussions, but I don’t think they, alone, have much impact.

        • Posted Jul 30, 2016 at 9:25 AM | Permalink

          But rarely because of a disagreement about the method used. Typically retractions are because of some kind of actual misconduct; fraud, plagiarism, lack of ethics approval, etc. As I said, there are some examples of papers being withdrawn because of methodological concerns, but it is still rare. Also, if you think this is what should happen, a blog post by itself is unlikely to be sufficient. As I said above, I’d be quite keen to see this taken further as that would make it easier to see the significance of the criticisms.

          One of the problems with your suggestion is that journals simply will not publish a paper that is simply showing that past work is wrong. They want something original.

          I don’t think this is strictly true, as many journals allow for comments to be published. However, if your comment is simply a list of criticisms without any indication of their significance and how they would influence the final result, then I can see why they would be reluctant to publish. We do research in order to understand something. This requires making assumptions and sometimes judgements as to how to undertake the analysis. It’s probably pretty easy to find things to criticise. It’s not quite as easy to illustrate the significance of one’s criticism, but that is a pretty important part of the process.

        • mpainter
          Posted Jul 30, 2016 at 10:09 AM | Permalink

          Well, ATTP, Gergis has been presented with some expert criticism on Climate Audit. Steve McIntyre is working on a more comprehensive evaluation, which I feel assured will be accomplished in his usual meticulous and accurate way.

          So give us your opinion, please and thanks, whether you think Joelle Gergis should respond to these criticisms in detail or whether you think that she should _dismiss_them_ as unworthy of a response?

        • Posted Jul 30, 2016 at 10:21 AM | Permalink

          I don’t really think anyone should be dismissed as unworthy of a response. However, that doesn’t mean that everything deserves a response; I think that whether or not someone does respond is entirely up to them. It also depends on the forum. If someone actually tries to publish a response, then it would be silly to not respond. If it remains on blogs, then it’s easy to ignore, and probably will be.

        • mpainter
          Posted Jul 30, 2016 at 10:34 AM | Permalink

          Well see, that’s my whole point. If Joelle Gergis ignores the expert criticism offered here, then what is to be concluded?

          Those who do not hate Climate Audit will draw appropriate conclusions. Even those who despise Climate Audit still understand its influence among thoughtful scientists.
          Thanks for responding.

        • Posted Jul 30, 2016 at 10:44 AM | Permalink

          Alternatively, those who choose to criticise could aim to use a forum where not only is there likely to be a response, but where their criticism is also likely to be evaluated by the broader community. The lack of a response doesn’t somehow validate a criticism.

        • mpainter
          Posted Jul 30, 2016 at 11:27 AM | Permalink

          Imo, there is no basis for suggesting that Climate Audit is not eminently suited for examination and discussion of paleoclimate proxy reconstruction. In fact, it is likely the best suited forum for this. Or do you mean the personal aversion some feel toward Climate Audit?

        • Posted Jul 30, 2016 at 11:34 AM | Permalink

          What I’m getting at is that if you want to engage the broader scientific community then you really do need to publish in the scientific literature and present your work at conferences. That way you can both make more of the community aware of your work, but also make it more likely that it will be properly scrutinised by others in the field.

          There’s nothing wrong with blogs and there can be interesting and useful discussions on them, but it tends to be a minority who engage on them and – at the moment at least – is not really seen as somewhere where one would normally engage in serious scientific discussions. Maybe this will change, but I think this is the case at the moment.

        • mpainter
          Posted Jul 30, 2016 at 11:51 AM | Permalink

          ATTP, you say: “That way you can both make more of the community aware of your work, but also make it more likely that it will be properly scrutinised by others in the field.”
          This is disingenuous. Of course Gergis et al know of the criticisms here. They watch it like a hawk. And you can bet the others do as well.

          If you are making the point that Gergis et al will ignore the criticisms here, then I must say that I agree. But if you are arguing that Gergis et al are justified in doing so, then you are on very shaky ground.

        • Posted Jul 30, 2016 at 11:58 AM | Permalink

          I think Gergis et al. will indeed ignore what is presented here. I also think they are perfectly entitled to do so. I don’t think anyone is really obliged to respond to criticism or engage with their critics if they choose not to do so. There may be exceptions to this, but this is – I think – generally valid.

          I think you’re somewhat missing my overall point. People are, of course, free to criticise what others have said/done publicly. However, if you really think your criticism has merit and that it is important that others are made aware of it and, ideally, respond to it, then you need to put the effort into making the criticism in a forum where it is most likely to have the greatest impact. If you put it somewhere where it is most likely to be ignored, then that is your own fault, not the fault of those who chose to ignore it.

          I also think that it’s important that the criticism is also scrutinised. If your criticism is largely ignored, then not only have you had no impact, it’s also more difficult for others to evaluate the merits of your criticism.

        • Patrick M.
          Posted Jul 30, 2016 at 1:40 PM | Permalink

          ATTP typed: “I think that whether or not someone does respond is entirely up to them. It also depends on the forum. If someone actually tries to publish a response, then it would be silly to not respond. If it remains on blogs, then it’s easy to ignore, and probably will be.”

          A well thought out critique should never be “easy to ignore” no matter where it comes from. That’s not Science.

        • Posted Jul 30, 2016 at 3:40 PM | Permalink

          Ken Rice wrote:

          The withdrawal of a paper is relatively rare, especially if it is simply due to a disagreement as to the method used.

          This observation points to a very real problem in my field, I think. Papers, particularly by “big names”, that fail replication stay around forever to pad the resumes of those involved and lead others astray. Bad methods often don’t die; they just fade away over decades. An example is the continuous adjoint method for design, invented by a big name but well known to be inferior to more traditional numerical optimization methods. There are literally hundreds of papers about the inferior method and never a hint of any of the issues. It’s really pretty shabby.

          Anyone with integrity it seems to me would want the goal for the literature to be balanced, accurate, and fair. That doesn’t always happen, for example with Vioxx where critical fatal side effect data was omitted from the published paper.

          However, Ken, if we are so biased we can’t admit there is a serious problem, addressing that problem becomes impossible. And that’s where the whole advocacy issue becomes very problematic I think.

          The other thing to bear in mind about the paleoclimatology field is that there is a long history of crony review and a lot of evidence of bias. You may be too new to this area to know what this consisted of. Some of the big names in paleoclimate turned out not to be very trustworthy or honest.

        • leoh
          Posted Jul 30, 2016 at 9:40 PM | Permalink


          Assuming you are not just concern trolling, you are being remarkably naive in believing that Steve’s observations would get more engagement and constructive discussion from climate scientists if he published them in more traditional ways. Unfortunately, history and the facts belie your assertion.

          You’ve been around a long time, so I assume you remember the sorry story of MBH 2008. Mann, Bradley and Hughes published a proxy-based reconstruction in PNAS in 2008. Steve and Ross McKitrick identified that they had made a rather boneheaded error in using a Tiljander sediment series upside down (in the calibration step of the CPS reconstruction). Steve and Ross did everything according to your suggestion – they promptly submitted a comment (which I believe was peer reviewed) to PNAS pointing out the error.

          Did they get your “engagement” and “constructive discussion”? On the contrary, MBH, rather than acknowledge an error, replied to their comment calling it “bizarre”! Mann doubled down in his book, refusing to accept the mistake. And not a single mainstream climate scientist called them out. To this day, I don’t believe any of them will accept the error. If you bring it up with warmists, they invariably have an excuse not to address it. These include variations on: they haven’t read the paper and gone through the calculations; it is irrelevant because the reconstruction has been subsequently confirmed by other papers; or (more recently) they are not interested in rehashing old arguments. (As an aside, if the Mann v Steyn case ever gets to discovery, it will be delicious to see how Mann tries to dance his way out of this under oath.)

          The end result is that a climate paper with an egregious error remains unretracted in a major journal, and climate scientists circled the wagons to protect one of their own, even at the expense of their own integrity.

          I’m curious, which excuse do you use not to acknowledge the MBH 2008 error?

      • bob
        Posted Jul 30, 2016 at 10:58 AM | Permalink

        Re ATTP at 10:44 AM Jul 30

        It seems to me that the observations posted here are often reviewed by the “broader community”. The apparent fact that, on occasion, parts of the broader community find useful information here but do not acknowledge that they did so, tells us something about the “broader community.”


        • Posted Jul 30, 2016 at 11:11 AM | Permalink

          Quite possibly, but my impression is that the vast majority of the broader community largely ignore blogs, so I doubt the impact is all that great. Also, if you really want to ensure that you get due credit, you really should publish in a forum that is slightly more formal than a blog. One obvious problem with blogs is that you can edit posts at any time, so how can you ensure that what is in a blog post now is what was there originally? You could archive posts as a way of showing this, but there isn’t – I think – a formal way of locking a post so that it is final, rather than something that can be continually updated.

        • davideisenstadt
          Posted Jul 30, 2016 at 1:47 PM | Permalink

          Willful obtuseness isn’t a desirable trait.
          Now, the problem isn’t that Gergis et al ignore blogs.
          It’s that they ignore common practices of the statistical community.
          They not only don’t utilize best practices, they refuse to utilize minimally acceptable ones.
          We understand that your training hasn’t been centered on statistical analysis; your responses to critiques presented here have established, amply, your ignorance regarding this subject.
          But for those trained in the field, the degree of malfeasance employed by these people is astounding.
          From ignoring the last three decades of work on meta-analytical research, and failing to correct for their data-mining expeditions (something even social scientists figured out was a necessary act years ago), to post hoc selection of proxy data sets, as well as post hoc selection of individual members of proxy data sets, the field is rife with misapplication of the tools of statistical analysis.
          That you’ve had this explained ad nauseam, and still cling to your misapprehensions, is troubling.

        • Hoi Polloi
          Posted Jul 31, 2016 at 3:29 AM | Permalink

          If climate blogs are so unimportant according to blogger Ken, then why was Real Climate created by Mann and (man of mystery) Schmidt? And why all the references to SkS, Tamino etcetera? (Digital) society has come a long way since the invention of book printing and smoke signals. In the end it’s all about the truth and good governance.

      • mpainter
        Posted Jul 30, 2016 at 12:24 PM | Permalink

        ATTP, there you go again. There is no forum with greater impact or more legitimacy for this subject than Climate Audit, your pretensions otherwise notwithstanding, and no, I have not missed your point nor your purpose, which is all too transparent.
        Gergis et al will never defend their methodology because it is simply indefensible. You must not imagine that your deflection of the issue justifies their refusal to address the defects in their science.

        • Posted Jul 30, 2016 at 12:28 PM | Permalink


          no , I have not missed your point nor your purpose, which is all too transparent.

          Given this, I fully expect that you have missed my point and my purpose, and have also illustrated why Climate Audit probably isn’t a site where most climate scientists would be particularly interested in engaging. Mind-probing and absolute certainty are not really conducive to constructive discussions.

          Also, just to be clear, I’m not trying to justify their lack of engagement, simply pointing out something that seems self-evident: they probably won’t, and there’s not much you can do about that, given that it’s entirely within their rights to decide not to.

        • mpainter
          Posted Jul 30, 2016 at 12:37 PM | Permalink

          Furthermore, you assume that Gergis et al would agree to participate in a forum that includes qualified statisticians, also assuming such a forum could be devised (by your lights). I consider it extremely doubtful that they would. They would get clobbered, and they know this full well.

        • Posted Jul 30, 2016 at 1:04 PM | Permalink

          No, I don’t assume any such thing. In fact, Gergis et al. don’t really have to get involved at all. All that needs to happen is for the critic to convince everyone else. I doubt that will happen with a blog post, though.

        • mpainter
          Posted Jul 30, 2016 at 1:07 PM | Permalink

          While we are on the subject of missing points, here is one that you seem to have missed: in science, it is never an acceptable practice to ignore valid criticisms. Indeed, such attention is often welcome in other sciences. But never in climate science, which seems to operate by a different set of mores; ignoring criticism is the rule there. Climate science’s rules are the basis for the coined term “ClimateBall”. Other sciences operate on principles of collegiality.

        • Posted Jul 31, 2016 at 12:50 AM | Permalink

          ATTP, Gergis called and wanted to let you know you’re not helping.

        • kim
          Posted Jul 31, 2016 at 2:43 AM | Permalink

          Ken’s been through the looking glass to call on the Duchess who asked him if he wanted milk or lemon with his t-stat.

  18. Curious George
    Posted Jul 22, 2016 at 1:16 PM | Permalink

    Why screening in the first place? Computers can process lots of data these days. Does screening have any benefits other than excluding inconvenient data?

    • davideisenstadt
      Posted Jul 24, 2016 at 12:17 AM | Permalink


  19. MikeN
    Posted Jul 22, 2016 at 3:05 PM | Permalink

    So they have done hide the decline and hide the pause. Does this count as hide the incline?

  20. Posted Jul 22, 2016 at 3:22 PM | Permalink

    To clarify a matter raised in this post, I don’t think the authors truncated anything at 1600. If you look at the blue line in their figure, you’ll see it extends to a point before 1600. My interpretation is that the line extends back to ~1577, when the Kauri proxy enters the network. Prior to that point, there is only one proxy in the network (Mount Read), and it is presumably impossible to create a reconstruction via the given method with only one proxy.

    This still shows the importance of screening for the paper: if the authors had used a different approach to it, one which they claim creates only minor differences,* they would lose nearly 600 years of their reconstruction. That’s a significant problem. It shows that if the authors had used the methodology described in their 2012 paper, they wouldn’t have been able to create useful results. It was only by making post-hoc decisions about how to screen their proxies that they managed to get this paper published.

    *I believe the authors intentionally used a semantic trick to give the impression all screening approaches give equitable results while not technically saying such. Pea, thimble and all that jazz. I doubt anyone will use this technicality as a defense though.

  21. miket
    Posted Jul 22, 2016 at 3:27 PM | Permalink

    Guys, don’t let ATTP wind you up. Save your energy.

    • Sven
      Posted Jul 22, 2016 at 3:50 PM | Permalink

      I agree, this would be a wise thing to do. The man is bizarre.

  22. DaveO
    Posted Jul 22, 2016 at 3:35 PM | Permalink

    Gergis et al 2012 was introduced at RealClimate by Eric Steig on May 22 2012. It’s rather “entertaining” to review comments made by Steig and other assorted Team members as the story broke at CA.

  23. Matt Skaggs
    Posted Jul 22, 2016 at 4:25 PM | Permalink

    “There’s a sort of blind man’s buff in Gergis’ analysis here, since it looks to me like G16 may have used an Oroko version which did not splice instrumental data. However, because no measurement data has ever been archived for Oroko and a key version only became available through inclusion in a Climategate email, it’s hard to sort out such details.”

    I’ve tried to follow the pea on Oroko over the years but it is difficult. I was first intrigued because logging supposedly reduced the tree ring width in recent times, not impossible but not the direction normally claimed for clearing of adjacent trees. Mann needed Oroko to claim global coverage, so a version with the instrumental data tacked on was used (IIRC), making it a hockey stick. Now G16, still needing the areal coverage and also additional proxies in general, but hoping to avoid criticism of tacking on instrument data, reverted to the original RCS version?

    Steve: since Cook never archived anything, it’s guesswork. I plotted the Gergis version onto my raster image of the Cook et al 2006 diagrams and it looks to me like they used an unspliced chronology – but didn’t seem to realize it. As I said, blind man’s buff.

  24. Hoi Polloi
    Posted Jul 22, 2016 at 4:29 PM | Permalink

    Does anybody know how much government funding has gone into this research project?

  25. kenfritsch
    Posted Jul 22, 2016 at 6:26 PM | Permalink

    There is a lesson to be learned here from Gergis’ manipulations, albeit not the message she is attempting to deliver.

    Proxies can be correlated with temperature on an annual basis even though the response to temperature is often affected by other variables that are realized in the response signal as noise – and, unfortunately for those attempting temperature reconstructions, not necessarily on a random basis over time. These other-variable effects, compounded by the temperature-to-proxy response not being well known for a given location and time period, lead to a response signal that may show wiggles at the correct times that temperature changes, but with amplitudes that are not proportional to the temperature change. The result is that a temperature-to-proxy response with a significantly high-frequency correlation will not necessarily follow the trend of temperature changes. This problem with temperature reconstructions is why a Mann will advise staying away from detrended correlation as an after-the-fact selection criterion (and I need to state here that all post facto selections are improper).

    The other side of this problem is that a proxy signal with long term persistence or high autocorrelation that has no trend but is simply red and white noise can be readily found with a reasonable low frequency correlation to an observed upward trending temperature signal. That is why Gergis was advised to avoid the correlation without detrending. Detrending in these cases will normally give low correlation of the residuals.

    Putting this all together is what I call the dilemma of the post facto selection process, which is a wrong-headed approach to begin with. Adding post facto selection criteria simply makes the situation worse and puts the process into the realm of picking stocks after the fact using complicated selection criteria that do great at in-sample picking but are doomed to failure when applied to out-of-sample picking – and for very valid statistical reasons.

    The case for a detrended temperature and proxy response having an excellent correlation but poor correlation for the series before detrending can be shown in the following example:
    Take the GHCN temperature series from 1910-2015 and find the residuals (detrending). With the residuals, use the R function corgen from library(ecodist) to generate a correlated series with r=0.40. Now compare the correlation of the original (not detrended) GHCN series with the generated series.

    The linear fitted trend of the original GHCN series is 0.73 degrees C per century with a p-value < 2e-16, i.e. an excellent fit to a linear trend for that time period, while the linear fitted trend for the generated series is effectively 0. The correlation of these two series detrended as residuals is r=0.42 with a p-value = 8.967e-06. The correlation, using cor.test in R, of these two series before detrending (the generated series is a residual that requires no detrending) is r=0.24 with a p-value = 0.0126. So here we can obtain a significant correlation both with and without detrending even though the paired series trends are very different. If the detrended paired-series correlation were somewhat lower, the paired-series correlation without detrending would not be significant.

    For the other side of the selection process – using series that are not detrended, in the preferred Mann mode – take the same GHCN series and use corgen to generate a series with a 0.05 correlation to the residuals of the detrended GHCN series. The paired generated series and the GHCN residual series have r=0 and a p-value = 0.59, i.e. no significant correlation. Now add the fitted linear trend of the GHCN series to the generated series: paired with the GHCN series before detrending, the two series have nearly the same linear regressed trends, but residuals with no significant correlation. The correlation of the paired series before detrending is r=0.84, with a p-value < 2.2e-16.
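    For readers without R’s ecodist package, this two-sided demonstration can be approximated in Python. Note the sketch below is my own: it uses a synthetic trending series as a stand-in for GHCN, so the numbers only illustrate the effect rather than reproduce the figures above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 106  # 1910-2015, as in the GHCN example above

years = np.arange(n)
# Synthetic stand-in for GHCN: ~0.73 C/century trend plus noise.
temp = 0.0073 * years + rng.normal(0, 0.15, n)

# Detrend: residuals from an OLS linear fit.
slope, intercept = np.polyfit(years, temp, 1)
resid = temp - (slope * years + intercept)

# Case 1: a trendless series built to correlate r~0.4 with the residuals
# (playing the role of corgen's output).
z = (resid - resid.mean()) / resid.std()
gen = 0.4 * z + np.sqrt(1 - 0.4**2) * rng.normal(0, 1, n)
r_detrended = np.corrcoef(resid, gen)[0, 1]  # recovers a clear correlation
r_raw = np.corrcoef(temp, gen)[0, 1]         # typically weaker: gen has no trend

# Case 2: a series with ~zero residual correlation, plus the fitted trend
# added back in. Raw correlation becomes large even though the residuals
# share essentially nothing.
gen2 = rng.normal(0, 1, n)
gen2_trended = gen2 * resid.std() + slope * years
r2_detrended = np.corrcoef(resid, gen2)[0, 1]   # ~0
r2_raw = np.corrcoef(temp, gen2_trended)[0, 1]  # large

print(r_detrended, r_raw, r2_detrended, r2_raw)
```

    The asymmetry is the dilemma described above: without detrending, a shared trend alone manufactures correlation; with detrending, trendless noise that happens to wiggle in phase can pass instead.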

    Steve: agreed

  26. kenfritsch
    Posted Jul 22, 2016 at 6:36 PM | Permalink

    I see that Gergis and her followers want to discuss the personalities of this dust-up without addressing the problems presented by the post facto selection processes used for proxies in the temperature reconstruction under discussion here, and in most other attempts at reconstruction reported in the literature. That reaction is so oriented to advocacy politics, and so lacking in science and statistics, that it should be ignored until such time as these people are willing to truly discuss the pertinent issues – only then should the results of their work be taken seriously.

    • Phil Howerton
      Posted Jul 23, 2016 at 9:02 AM | Permalink

      Ken: It sounds as if Gergis and her fans are discussing Steve on G16. If that’s the case, do you have a link?

  27. Follow the Money
    Posted Jul 22, 2016 at 9:15 PM | Permalink

    Re: Gergis et al 2016, and its supplemental materials, why do all their reconstructions end around 1980?

    And why, then, do the supp’s RE figures, figs. S1.6, .7, .8, & .10 end in 2000? And plotted like a smudge at the end?

  28. Manniac
    Posted Jul 22, 2016 at 9:51 PM | Permalink

    O Bender, where art thou?

    • Phil Howerton
      Posted Jul 24, 2016 at 12:53 PM | Permalink

      Yes. where in the hell is Bender?

    • David Jay
      Posted Jul 26, 2016 at 12:13 PM | Permalink

      I saw that Bender’s swimming pool cleaner made a brief appearance above…

      • Posted Jul 26, 2016 at 12:19 PM | Permalink

        Memory can play tricks but I thought Bender cleaned pool for Mosh.

        • David Jay
          Posted Jul 28, 2016 at 12:04 PM | Permalink

          Correct Richard, Mosh (Steven Mosher) commented above. I was referring to that comment.

  29. David Brewer
    Posted Jul 22, 2016 at 11:39 PM | Permalink

    This devastating demolition of Gergis’ paper appears almost kind after one reads her “Conversation” article.

    For a start, her defense there contains other scientific howlers not even discussed here, e.g.:

    – The Earth is “a planet that has never been hotter in human history” (her study only covers 1000-2000 AD)
    – “these results are more uncertain as they are based on sparse network of only two records” (she thinks she can assign a confidence interval measured in hundredths of a degree to the temperature of the entire Australasian region 800 years ago based on two proxies??!!)
    – “Importantly, the climate modelling component of our study also shows that only human-caused greenhouse emissions can explain the recent warming recorded in our region” (since any attribution depends on the factors allowed into the model, the model can never be considered to have ruled out another cause)

    The “whinging” that Steve mentions is also of epic proportions. Somebody called her a bimbo in a blog post and she goes on and on as if she is Giordano Bruno being burnt at the stake.

    All in all I think this commentary shows commendable restraint.

    • dfhunter
      Posted Jul 25, 2016 at 4:23 PM | Permalink

      In that “Conversation” article she also references this from the Guardian:

      “This is part of a range of tactics used in Australia and overseas in an attempt to intimidate scientists and derail our efforts to do our job.”

      which has the sub heading –

      “Climate Science Legal Defense Fund is forced to defend climate scientists against constant frivolous lawsuits”

      which then states –

      “E&E originally attacked Dr. Michael Mann, whose research shows a dramatic increase in recent temperatures in a graph popularly known as the “hockey stick.”

      Funny how Mann –

      “Steve, Why does Michael Mann always appear in these discussions?”

      – always appears in these discussions, ATTP?

      • Steve McIntyre
        Posted Jul 25, 2016 at 4:36 PM | Permalink

        Climate Science Legal Defense Fund is forced to defend climate scientists against constant frivolous lawsuits

        The Fund doesn’t seem to have assisted Tim Ball in his defence against Mann’s lawsuit. I wonder whether Mann is getting legal assistance from it in order to attack Tim Ball.

        • Sven
          Posted Jul 25, 2016 at 4:43 PM | Permalink

          Or Mark Steyn

        • dfhunter
          Posted Jul 25, 2016 at 5:06 PM | Permalink


          Lauren Kurtz is the Executive Director of the Climate Science Legal Defense Fund (CSLDF), a non-profit that defends scientists against legal attack. CSLDF was founded to fund Dr. Mann’s defense, represented Dr. Maibach, and filed amicus briefs in support of the University of Arizona. Help protect the scientific endeavor by donating to CSLDF, where a trustee is currently matching all donations up to $50,000.

        • Sven
          Posted Jul 25, 2016 at 5:18 PM | Permalink

          What legal actions have there been? I remember Mann suing Tim Ball, National Review and Mark Steyn. These are all climate scientists suing critics. What else? How many climate scientists have been sued?

          Steve: The only climate scientist that has been sued, to my knowledge, is Tim Ball. FOI requests re Mann have caused controversy, but not apparently FOI requests re Willie Soon or Roy Spencer. In addition, climate activists have initiated libel actions (Weaver against Tim Ball and National Post, Mann against Tim Ball and Mark Steyn), press complaints to the UK Press Council (against Booker, Delingpole (unsuccessful) and Durkin (mostly unsuccessful)), and misconduct complaints by Bradley/Mashey against Wegman for the Wegman Report and the CSDA article. There might be others.

        • dfhunter
          Posted Jul 25, 2016 at 6:32 PM | Permalink

          wonder who the “trustee” matching all donations up to $50,000 is?
          must be hoping for not too many donations, or has unlimited funds.

          Big Soil ?

        • mpainter
          Posted Jul 25, 2016 at 7:21 PM | Permalink

          I believe that Steyn has countersued Mann.

        • Sven
          Posted Jul 25, 2016 at 8:38 PM | Permalink

          So, this represents “constant frivolous lawsuits” against climate scientists…

        • mpainter
          Posted Jul 26, 2016 at 4:06 AM | Permalink

          Tim Ball should request aid of the Climate Science Legal Defense Fund. This should bring advantage whether the request is granted or refused.

  30. charliexyz
    Posted Jul 23, 2016 at 7:32 AM | Permalink

    Maybe for the next go around, 4 years from now, the journal will ask Steve McIntyre to review the paper.

    Or maybe they are slow learners.

    • mpainter
      Posted Jul 23, 2016 at 9:39 AM | Permalink

      The problem would be resolved easily enough if the Journal of Climate would engage a qualified statistician for reviewing studies such as Gergis et al. One can only wonder at their negligence.

    • Steven Mosher
      Posted Jul 23, 2016 at 9:42 AM | Permalink

      4 years isn’t that long

    • Steven Mosher
      Posted Jul 23, 2016 at 9:53 AM | Permalink

      I dunno Charlie nobody is perfect

      Watts and Gergis make an interesting comparison. Errors were found in both prior to publication.

      With both, it was a simple switch to re-run the analysis… What did Gergis find when she re-ran the numbers?
      What did Watts find? We won’t ever know…

      Finally, with data from neither, it’s really hard to nail down all the various forms of torture…

      • kenfritsch
        Posted Jul 23, 2016 at 10:28 AM | Permalink

        I would hope that readers at this blog understand that SteveM’s analysis here is pointing to methods used in the Gergis temperature reconstruction that from a statistical standpoint are just plain wrong. It is not a matter of the methods being a little off and thus having small effects on the final reconstructions, but rather a totally wrong approach.

        I sometimes think that even critics of climate science find it difficult to believe that these people doing temperature reconstructions could get it so wrong, and thus do not fully understand the import of an analysis like the one SteveM is doing here. Even if the reader does not fully understand and appreciate the statistics involved here (and that is not unlikely, given that even climate scientists fail that test), the Gergis defense noted above should be a clue to her understanding of the critical statistical issues involved, for why else would she avoid discussing those issues? She appears to do as so many climate scientists doing temperature reconstructions do, which is to defer to the fact that those who preceded her used the same (wrong) approaches. There is no one active in this field, or for that matter among defenders of the work coming from this field, who is willing to look in sufficient depth at these methods and approaches to see the problems. While it is satisfying for me to see analyses like the ones in this post, I doubt very much that those criticisms will change any minds of those working in this field.

        I hold out hope for someone or a group to come along that uses temperature proxies with a good physical, chemical and perhaps biological basis, spells out criteria for selecting proxies a priori based on established physical principles, and then uses all the selected proxy data in doing a temperature reconstruction.

        • stevefitzpatrick
          Posted Jul 24, 2016 at 8:49 PM | Permalink

          “….spells out criteria for selecting proxies a priori based on established physical principles and then uses all the selected proxy data in doing a temperature reconstruction.”

          My guess is that would make it too difficult to find “statistically significant” proxies. Post hoc selection gives you so many ways to find just what you are looking for.

        • dfhunter
          Posted Jul 25, 2016 at 5:32 PM | Permalink


          have a look at her profile from this –

          part of which states –

          “In 2007 she was one of three national finalists for the 2007 Eureka Prize for Young Leaders in Environmental Issues and Climate Change, and was one of nineteen Wentworth Group of Concerned Scientists’ Science Leaders Scholarship recipients selected nationwide. Professor Tim Flannery, the 2007 Australian of the Year, was one of her mentors during the program aimed at training outstanding young scientists to help bridge the communication gap between science and public policy.”

          so when you ask/advise –

          “There is no one active in this field, or for that matter among defenders of the work coming from this field, who is willing to look in sufficient depth at these methods and approaches to see the problems”

          I think you can begin to see why the problems persist. Having said that, I do believe they try to be unbiased, but it must be conflicting.

        • Stu
          Posted Jul 29, 2016 at 9:10 AM | Permalink

          Ken says:

          ‘While it is satisfying for me to see analyses like the ones in this post, I doubt very much that those criticisms will change any minds of those working in this field’

          I often like to think that the legacy of Climate Audit will be as a goldmine for future historical study, after the current politics has died down and people are no longer looking at this issue in such a binary way. Even if you feel that this kind of technical criticism may not ‘change the minds of those working in the field’, I certainly don’t think this kind of situation can go on forever.

          Look at what Gergis wrote over at ‘the Conversation’. Then look at Steve’s post (and Miker613’s helpful summary on Jul 22, 2016 at 2:44 PM). People in the future shouldn’t be confused about any of this.

  31. Phil Howerton
    Posted Jul 23, 2016 at 10:03 AM | Permalink

    Never mind, Ken. I found it on Conversation.

  32. EdeF
    Posted Jul 23, 2016 at 10:47 AM | Permalink

    Can’t believe they are still doing the same things. First, evaluate the proxy ahead of time, then take all of the data for that proxy and do correlations. Do not toss anything unless you have very good reason to believe the proxy was somehow damaged physically (i.e. a dam broke and sent a large gusher of water over it, an avalanche scarred it, etc.), and include all data after detrending for age effects. Science is really that simple. Unfortunately, the results may not match expectations.

    • davideisenstadt
      Posted Jul 23, 2016 at 11:55 PM | Permalink

      There’s the rub… you get crappy results doing that, and if you want to prove something, that kind of practice doesn’t help.

  33. Posted Jul 23, 2016 at 4:37 PM | Permalink

    The official Melbourne Uni Press Release of 11 July also misinforms…

    some quotes…

    “The manuscript was originally published online in the Journal of Climate in 2012, but was withdrawn by the authors before full publication after noticing a minor inconsistency between the description of the methods and the data analysis conducted.”

    withdrawn by the authors? minor inconsistency!

    “The study has been externally reviewed by nine independent assessors.”

    Wonder how independent they were?

  34. Willis Eschenbach
    Posted Jul 23, 2016 at 8:46 PM | Permalink

    …and Then There’s Physics said:

    This has all drifted rather far away from what I was intending. I’m still no closer to really understanding what you would regard as some kind of realistic, constructive, positive, outcome, but I doubt I’m about to get any closer.

    Perhaps I can help. Steve has been doing this kind of analysis of shonky climate “science” papers for some years now. And whether you can see it or not, he and others have had a huge influence on the climate discussion.

    For starters, Steve was among the first to start pushing for publication of data as used and code as used for the various scientific papers. As a result of this push, the major scientific journals now have those policies as part of their standard. They don’t always enforce it … but at least it is in the regs.

    Next, it appears that you misunderstand the process of science. Science is not a thing, it is a process. Making new scientific claims is only half of the process of science. Falsification of incorrect scientific claims is equally and perhaps more important.

    The process of science works like this:

    • Someone makes a new scientific claim, and they make it public along with all of the supporting data, math, logic, code, and whatever they think buttresses their view.

    • Everyone else publicly tries to poke holes in it, to falsify some part of the math, logic, code, or whatever the first person used to buttress their claims.

    • If nobody can poke any holes in it, then it is accepted as provisional scientific “truth” … meaning it’s only true until a future date when and if someone does falsify it.


    But you can’t discredit something like the Gergis claims by handwaving at them and saying “this is bogus science”. Instead, to falsify scientific claims often requires as much data, math, logic, code and the rest as did the original claim … you know, the exact stuff provided by Steve above.

    And his having done so means that anytime someone starts babbling about Gergis, I can just refer them here and be done with the discussion. It’s no longer a question. The study is discredited.

    So when you ask what “realistic, constructive, positive, outcome” will come of falsifying bogus science … well, I can only conclude that you are unclear about how the scientific process works.

    Finally, another positive outcome from it is that Gergis and her co-authors are shown to be typical of far too many alarmists, willing to say or do most anything to advance their cause … and the more that people notice that fact, the less likely it is that the poor of the planet will be shafted by rising energy prices in a futile fight against fossil fuels. In fighting against the alarmists, demonstrating publicly that they are not only wrong but are lying about being wrong is always valuable.

    You also seem to be seeing snarky editorials and jibes in what I write. I’m certainly not intending to write snarky editorials or jibes, so maybe you should consider that I mean whatever it is that I have written.

    Mmm … well, reading through your screeds and the responses, I can only say that if I were you … I wouldn’t put that question to a vote of the spectators. I know I’d vote “snarky”, but of course, YMMV …

    All the best,


    PS—You objected above to a characterization of Gergis as a “whining, self-serving, data tutoror [sic] who isn’t telling the truth”. Apart from the fact that nobody said that but you, it appears you did not pause to consider that in the US (but curiously not in the UK) truth is an absolute defense against libel …

    • Curious George
      Posted Jul 24, 2016 at 8:53 PM | Permalink

      He lives in a fantasy world, without a cause-and-effect relation. He does not try to be obnoxious; it is simply the way he is. I can’t argue rationally with him.

  35. kenfritsch
    Posted Jul 23, 2016 at 9:08 PM | Permalink

    Why continue a conversation with a poster here who rather obviously does not understand the statistical implications of SteveM’s analysis? I would not be too hard on him since there is a whole group of climate scientists doing temperature reconstructions who also do not.

    • Jeff Alberts
      Posted Jul 30, 2016 at 11:02 PM | Permalink

      I would not be too hard on him since there is a whole group of climate scientists doing temperature reconstructions who also do not.

      I think that’s a poor assumption. It’s clear to me that they do understand, they just don’t care.

  36. davideisenstadt
    Posted Jul 23, 2016 at 11:52 PM | Permalink

    ” (As a caveat, objecting to the statistical bias of ex post screening does not entail that opposite results are themselves proven. I am making the narrow statistical point that biased methods should not be used.)”

    Well put Mr McIntyre.

    But your point really isn’t “narrow”.
    Whenever one violates one of the basic assumptions that undergird the entire enterprise of statistical analysis, one does so at one’s own peril, or in this case at all of our peril.
    Exploratory statistical analysis must be employed carefully, lest it morph into a selective mode of sampling, that is, taking the data that one likes.
    A data set composed of ex-post selected data isn’t random in any sense of the word; it’s a selected sample, and the metric used for the inclusion of data is whether the data tell the story the selector wishes to tell.
    The practice is so fundamentally absurd as to beggar belief… yet it is tolerated.
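    The bias described above is easy to demonstrate with a toy simulation (a sketch with made-up numbers, not Gergis’ actual network or method): generate pure-noise “proxies”, keep only those that happen to correlate with a rising “instrumental” series over a calibration window, flip signs to match, and average the survivors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_proxies, cal = 1000, 60, 100      # 100-year calibration window
target = np.linspace(0.0, 1.0, cal)          # rising "instrumental" series

# Pure red-noise pseudo-proxies: AR(1) series with no climate signal at all
proxies = np.zeros((n_proxies, n_years))
noise = rng.normal(size=(n_proxies, n_years))
for t in range(1, n_years):
    proxies[:, t] = 0.4 * proxies[:, t - 1] + noise[:, t]

# Ex post screening: keep any proxy with |r| > 0.2 against the target,
# flipping its sign so every survivor "matches" the calibration trend
r = np.array([np.corrcoef(p[-cal:], target)[0, 1] for p in proxies])
keep = np.abs(r) > 0.2
screened = (np.sign(r[keep])[:, None] * proxies[keep]).mean(axis=0)

# The composite of screened noise acquires a spurious upward slope in the
# calibration window: a manufactured "uptick" with no signal behind it
slope = np.polyfit(np.arange(cal), screened[-cal:], 1)[0]
print(f"{keep.sum()} of {n_proxies} noise proxies pass screening")
print(f"calibration-period slope of screened composite: {slope:.4f}")
```

    The point is exactly the narrow one made above: the selected sample is not random, so the calibration-period behaviour of the composite is guaranteed by construction, not discovered.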

  37. Geoff Sherrington
    Posted Jul 24, 2016 at 1:47 AM | Permalink

    It should be obvious that one part of the answer to ex-post selected data lies in the proper calculation and use of error boundaries. As a general principle, when numerous ‘selections’ are performed on the data before subjective choice of a favourite ‘selection’, then all of the ‘runs’ should be included in the final calculation of the error.
    Thus, a selection that is rejected because it gave too high or too low a result cannot be forgotten. It should live on through making the final error bounds higher above the line, or lower, as the case may be, but certainly wider.
    In the more extreme cases, one then finds that all of the selections fall within the final 2 sigma or whatever bounds, meaning that one selection can not be put forward as better than another.
    Nature in Her wisdom does not play the game of choice of favourite data selections. That is the Hand of Man at work and it should be shown to be this by error bounds, properly calculated.
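    One simple way to implement this idea is sketched below, with made-up numbers (the “within-run” and “between-run” sigmas are assumptions for illustration, not anyone’s published values): treat each rejected ‘selection’ as a member of an ensemble, and fold the between-selection spread into the final bounds in quadrature.

```python
import numpy as np

rng = np.random.default_rng(1)
years = 200

# Five hypothetical "selection variants" (different screening choices),
# each yielding its own reconstruction: simulated here as a shared
# signal plus a variant-specific offset and noise (purely illustrative)
signal = np.cumsum(rng.normal(0.0, 0.1, years))
runs = np.array([signal + rng.normal(0.0, 0.15) + rng.normal(0.0, 0.2, years)
                 for _ in range(5)])

within = 0.2                        # assumed per-run 1-sigma uncertainty

# Naive bounds: the favourite run +/- 2 within-run sigma
naive_half_width = 2 * within

# Geoff's proposal: fold the spread ACROSS the rejected selections into
# the final bounds (here: within- and between-run sigma in quadrature)
between = runs.std(axis=0, ddof=1).mean()
full_half_width = 2 * np.sqrt(within**2 + between**2)

print(f"naive 2-sigma half-width:              {naive_half_width:.2f}")
print(f"half-width including selection spread: {full_half_width:.2f}")
```

    The more selections are tried, and the more they diverge, the wider the honest bounds become, which is exactly the property Geoff describes.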

    • Pat Frank
      Posted Jul 24, 2016 at 11:45 AM | Permalink

      snip – blog policies discourage editorializing about very general CAGW issues else all threads become the same

  38. Geoff Sherrington
    Posted Jul 24, 2016 at 1:53 AM | Permalink

    It still remains unclear why the Australian mainland is so sparsely represented in this exercise. IIRC, apart from trees at Baw Baw, the rest comes from islands like Tasmania, New Zealand, Pacific coral atolls, some even north of the Equator.
    My preference would be to describe the region as “Australasia less Australia.”
    One might speculate that work has been done in the rest of Australia but that it is not reported by Gergis et al because it does not pass objective and subjective selection to carry the chosen message. If so, that would be deplorable because it is restricting Science rather than advancing Science.

    • HAS
      Posted Jul 24, 2016 at 2:43 AM | Permalink

      Geoff, I understand that you are possibly a resident of the West Island in which case you should understand that New Zealand isn’t an island. The Oroko series comes from the prosaically named South Island that along with a number of others, including the North Island, make up NZ. If we are feeling generous on a good day we sometimes even include the West Island.

      However, in more seriousness, I haven’t really ever looked at this stuff, but went looking and quickly found Lorrey et al 2008, “Speleothem stable isotope records interpreted within a multi-proxy framework and implications for New Zealand palaeoclimate reconstruction”, which seems to my untutored eye to suggest more activity in this area than seems to be being reported here.

  39. mpainter
    Posted Jul 24, 2016 at 2:47 AM | Permalink

    In my recollection, Ken Rice (attp) does not comment on the issues of science raised at this blog. I doubt that he is capable of that. Instead, he usually raises some non-science issue peripheral to the issues of science addressed here.

    Ken Rice has embarrassed himself in his recent post wherein he gives unstinting approval of the latest Gergis fiasco. His sense of embarrassment has motivated his comments here where, in his usual fashion, he attempts to deflect from the issues of science and his admitted incompetence in these. Thus he harps on Steve Mc’s motivation in posting on Gergis et al. To this end, he mulishly reiterates that his “question” is ignored. Steve McIntyre sees value in such comments for their counterpoint opportunity.

    • Posted Jul 24, 2016 at 12:05 PM | Permalink

      Mosher tried to explain the problem, approaching the level of the crowd over there. The moderator remains silent. Silence gives consent?

      • mpainter
        Posted Jul 24, 2016 at 1:06 PM | Permalink

        Interesting thread. Rice’s retreat position, which he reiterates, is that the question of Gergis’s truthfulness or lack thereof is a non-issue. That leaves him with only the science, and he has confessed that part is beyond him.

        • Posted Jul 24, 2016 at 1:30 PM | Permalink

          Indeed! All that jazz started with the indescribable article by Joelle Gergis at the “Conversation”. What was the reason for it? Disturbed self-perception? It’s very unlikely that she didn’t know about the pitfalls in the paper. Was there some underestimation of the skills of Steve and others? Or the firm conviction that other “brothers in arms” would sort it out? What cheek!

      • kenfritsch
        Posted Jul 26, 2016 at 2:57 PM | Permalink

        I skimmed that blog thread on the Gergis paper and I do not believe the posters on either side of the debate really understand the very basic statistical problem of post facto selection of proxies for temperature reconstructions. It is not whether the data is used trended or detrended, or whether the proxy selection used a number of additional selection criteria without accounting for it in the statistical treatment, but about selecting proxies after the fact, knowing how the proxy related to the instrumental record.

        Follow the original discourse where the advantage of using detrended data was espoused as a means of avoiding the selection of series-ending trends that are merely part of the red noise in a highly correlated proxy time series and not a trend at all. Given that those scientists doing temperature reconstructions appear not bothered in the least in using post facto selection of proxies, nor in taking that process to the point of data torture using additional post facto selection criteria if required to obtain the expected result, it would not be too difficult to believe that when the Gergis authors obtained a “good” result with post facto selection, with the added benefit of doing it with what they thought was detrended data, they felt the data needed no further torturing to get it published and that they had the detrending bonus. Based on the paper that did finally get published, it becomes obvious that trended versus detrended data was never the issue, but rather what post facto criteria were required to obtain what the authors felt was a “good” result. All these additional selection problems stem from not seeing the problem of using post facto selection criteria for the proxies. Once that barn door has been opened, any additional selection criteria can be allowed through that same door.

        Bonferroni correction for using multiple post facto selection criteria is difficult to apply in practice, since it is never known what other criteria were tried, or simply considered and glanced at before being rejected.
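        The scale of the multiplicity problem can be illustrated with a null simulation. The counts below (9 gridcells, 3 lags) are illustrative guesses at the search space, not Gergis’ actual numbers, and real gridcells are spatially correlated, which shrinks the effective number of independent looks without removing the problem:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 70          # years of proxy/temperature overlap (illustrative)
m = 9 * 3       # e.g. 9 gridcells x 3 lags searched per proxy (illustrative)
trials = 2000

# Null distribution of the BEST |r| found when one pure-noise "proxy"
# is screened against m independent noise "temperature" series
best = np.empty(trials)
for k in range(trials):
    proxy = rng.normal(size=n)
    temps = rng.normal(size=(m, n))
    rs = np.abs([np.corrcoef(proxy, t)[0, 1] for t in temps])
    best[k] = rs.max()

# For n = 70 the single-test 5% critical |r| is about 0.235; after a
# search over 27 cell/lag combinations, noise clears that threshold
# roughly three-quarters of the time
single_crit = 0.235
frac = (best > single_crit).mean()
print(f"P(best |r| > single-test critical value) = {frac:.3f}")
print(f"5% critical |r| for the searched family  = {np.quantile(best, 0.95):.3f}")
```

        The second printed number is, in effect, the simulated Bonferroni-style threshold for the family of searches actually performed, and it is much higher than the single-test value. As noted above, the practical difficulty is that the true family size is never disclosed.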

        • mpainter
          Posted Jul 26, 2016 at 3:34 PM | Permalink

          Ouija Board science, an apt phrase.
          One does not have to be a statistician to see the flaw of post hoc selection. Using the same Ouija Board, one can derive a result that refutes the original conclusion.

          The problem with those who cannot understand this is that they wish not to understand.

        • kenfritsch
          Posted Jul 26, 2016 at 4:26 PM | Permalink

          I should have been more specific that the blog thread to which I refer here is the one from the Ken Rice blog that frankclimate linked about a Steve Mosher post there.

        • Posted Jul 27, 2016 at 1:33 PM | Permalink

          Yes Ken, that post is pretty obtuse even by Rice’s loose standards. No technical content, but a lot of Cawley apologizing for bad papers and making them seem OK, just the collateral damage of modern science. No paper is perfect, so what’s the problem?

  40. Steve McIntyre
    Posted Jul 24, 2016 at 3:07 PM | Permalink

    I’ve downloaded a version of HadCRUT3v (used in the article) and calculated correlations (and some other regression stats) for all gridcells within 500 km × three lags. Brandon has been working on this as well – check hiizuru. I’ve done calculations including data up to 2000, since instrumental data is sparse early but not late, and many of the proxies go well into the 1990s if not to 2000, so the later data is relevant.

    It gets stranger and stranger.

    In the following, I’m working through Gergis’ protocol. Obviously, I do not endorse ex post screening. Some of Gergis’ data torture doesn’t seem to have made much difference to the number of cells recovered by screening and seems to have been unnecessary to populate the network.

    In nearly all cases where the highest correlation (t-statistic) is significant and is NOT for the “central” gridcell (the one containing the proxy), i.e. comes from data dredging, the correlation for the central gridcell (at the same lag) is also significant. That is, there was no need under their method to trawl through the other gridcells. The only exception is the Lough Great Barrier Reef series, where there is a stronger “significant” correlation to 17.5S, 142.5E than to 17.5S, 147.5E – a series which has negligible effect on the shape of the reconstruction.

    So the multiple gridcell criterion is a headscratcher.

    Lags are different. Although Gergis’ rationalization for checking lags (“seasonality”) was foolishness, there is a purpose to checking lags that she didn’t mention. The proxies are summer proxies and the six-month season (SONDJF) spans two calendar years. The convention for SH tree ring proxies is that they are supposed to be dated to the opening year, while, by experiment, I’m 99% sure that Gergis’ summer temperature data is assigned to the closing year, i.e. the 1989-90 summer is dated to 1990. So some vagaries could crop up in the assignment of “NH” calendar year to SH proxies that make lags worth checking.

    In many cases, there is a sharp spike in the t-statistic at one of the lags. In such cases, unless one has clear contrary documentation of the dating in the specialist publication (not usually available), I think that one can justify coordinating the proxy year with the temperature year. In cases where there is no sharp spike, i.e. high correlation with two or more years, there’s something wrong with the linearity of the proxy (e.g. very U-shaped). Gergis didn’t check regression relationships between proxy and temperature, though such relationships (correlation) are used for screening. Durbin-Watson statistics in some cases are very poor.

    Nearly all of the coral data has lag zero (exceptions Bunaken, New Caledonia lag -1) indicating that it has been assigned to the closing year as well. (Because coral data is often available monthly, this can be checked.)

    A possible explanation for lag +1 is that the tree ring data from summer 1988-89 is dated to 1988 – per convention – while the temperature data is dated to 1989, but both are actually from the same year. Most of the specific tree ring series with +1 lag have rather weak correlations.

    The majority of Gergis’ tree ring proxy data synchronizes at lag 0. Perhaps tree ring data is assigned to the closing year by some authors – this ought to be checked. Alternatively, this could be tree ring data from summer 1990-91 (dated to 1990 by convention) responding to temperature from the previous summer 1989-90 (seemingly dated to 1990 by Gergis) – a response pattern that is attested in specialist literature.

    Two series from the same author (Celery Top East, West) have lag -1. This might mean that tree rings dated to 1990 correlated to temperature for the summer of 1988-89. For this to work, it seems that the author would need to have dated the rings to the closing summer and that the tree rings responded with a lag of one year.

    There is one very big red flag that I’ve encountered. I’ll write this up as a separate comment and, after analysis, probably a post.
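    The kind of lag check described above is straightforward to sketch with synthetic data (not the Gergis proxies): a “proxy” that actually records the previous year’s temperature, mimicking a one-year dating-convention offset, shows a sharp correlation spike at exactly one lag, whereas a flat profile across lags would instead point to a problem with the proxy itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
temp = rng.normal(size=n)

# Synthetic "proxy" recording the PREVIOUS year's temperature, mimicking
# a one-year dating-convention offset, plus measurement noise
proxy = np.empty(n)
proxy[0] = rng.normal()
proxy[1:] = 0.7 * temp[:-1] + rng.normal(0.0, 0.7, n - 1)

def lagged_r(x, y, lag):
    """Correlation of x[t] with y[t + lag]."""
    if lag >= 0:
        return np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1]
    return np.corrcoef(x[-lag:], y[:lag])[0, 1]

# A sharp spike at a single lag flags the dating offset
for lag in (-1, 0, 1):
    print(f"lag {lag:+d}: r = {lagged_r(proxy, temp, lag):+.3f}")
```

    Checking a small, pre-specified set of lags to diagnose a dating convention is quite different from trawling many lags and gridcells for the best correlation; the former is a data check, the latter is data dredging.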

    • MikeN
      Posted Jul 24, 2016 at 4:54 PM | Permalink

      The multiple gridcells tactic is likely the work of Mann. He submitted evidence to help them avoid redoing the paper. At RealClimate, there was commentary that they could use the methods of Mann 08 to resurrect the paper.

    • Geoff Sherrington
      Posted Jul 25, 2016 at 6:57 AM | Permalink

      It gets more complicated with the summer temperatures. In the Tropics, there are 2 days a year when the Sun is directly overhead. Close to the Tropic the days are close together, but near the Equator they are 6 months apart. Thus the choice of SONDJF months has less purpose close to the Equator.

    • Geoff Sherrington
      Posted Jul 25, 2016 at 7:04 AM | Permalink

      Is there a numerical typo with “17.5S, 142.5E than to 17.5S, 147.5E”?
      The former is in the middle of Cape York, not near the reef.

      Steve: Gridcell locations are conventionally in the middle of 5 deg grids. The GBR proxy was shown at 18S, 147E in the Gergis info.

      • Posted Jul 25, 2016 at 9:59 AM | Permalink

        Figure S1 of Gergis et al. (2012) shows a negative correlation between the regional temperature (field mean) and that of the mainly off-shore gridcell centered at (17.5S, 147.5E). However, it has a positive correlation to the mainly land gridcell at (17.5S, 142.5E).

  41. Steve McIntyre
    Posted Jul 24, 2016 at 3:36 PM | Permalink

    CA readers are aware of my longstanding interest in the Law Dome d18O series as a SH proxy and the determination of IPCC authors not to show it. An earlier version, with about 4-year resolution, had noticeably high values in the late first millennium and early second millennium. AR4 authors refused to show it, despite the scarcity of SH proxies.

    G16 has a version of Law Dome that Gergis (and Tas van Ommen) refused to provide me in 2012. The correlation of Law Dome d18O to summer temperature in the local gridcell over 1931-2000 is a very high 0.504. Data is not available before World War II, but there are 47 years of data, enough to yield a very high t-statistic of 4.407. The Durbin-Watson statistic is good (much better than many tree ring series). The high correlations refute IPCC AR4 reasons for not showing this series.

    Gergis also excluded this series despite its very convincing relationship to local temperature. To be included in her network, she required 50 values in 1921-1990. Law Dome had 47 over 1931-2000. Here’s what the series looks like.


    Because the Gergis reconstruction only has two series in its early portion, inclusion of Law Dome would make a difference. Also, all the series available in 1000 are available for several centuries earlier, so there is no objective reason not to show the reconstruction to an earlier date.

    • Steve McIntyre
      Posted Jul 24, 2016 at 5:28 PM | Permalink

      Gergis’ amateur statistics is well evidenced by her use of correlations rather than t-statistics to measure regressions. A t-statistic incorporates the number of observations.

      Of course, if Gergis had used a t-statistic, she wouldn’t have been able to screen out the Law Dome series, since it has a better t-statistic than nearly all of the data.

      The more that I look at this, the more likely it seems to me that Gergis concocted an ex post criterion for excluding Law Dome.

      Update: she did calculate a t-statistic but ignored it!!!
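      The point that a t-statistic incorporates the number of observations can be made concrete with the textbook conversion t = r·sqrt(n−2)/sqrt(1−r²). This is a sketch of the principle only; it will not necessarily reproduce the exact t quoted for Law Dome, which depends on the details of the regression setup.

```python
import math

def t_from_r(r, n):
    """t-statistic for a Pearson correlation r estimated from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# The same correlation carries very different evidential weight
# depending on how many observations stand behind it:
for r, n in [(0.50, 15), (0.50, 47), (0.30, 200)]:
    print(f"r = {r:.2f}, n = {n:3d}  ->  t = {t_from_r(r, n):.2f}")
```

      With 47 observations, r = 0.50 gives t of roughly 3.9, while the same r with only 15 observations gives about 2.1, which is why screening on raw correlation, ignoring n, can discard a long well-behaved series in favour of shorter, noisier ones.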

      • TimTheToolMan
        Posted Jul 24, 2016 at 10:18 PM | Permalink

        Agreed. I rather suspect the more standard “30 years” would have been enough if it was a proxy that supported her findings!

      • Michael Jankowski
        Posted Jul 26, 2016 at 8:38 PM | Permalink

        New post coming soon to delve into the t-stat issue?

      • Jimmy Haigh
        Posted Jul 27, 2016 at 7:47 AM | Permalink

        You couldn’t make it up…

      • kim
        Posted Jul 27, 2016 at 1:43 PM | Permalink

        Ignored? Not likely; it goaded her.

      • scf
        Posted Jul 30, 2016 at 9:25 AM | Permalink

        A good analogy of this behaviour is cooking. A good chef will mix and match, trying this and that, until the end result is just the kind of taste the chef was looking for.
        If a t-statistic causes the stew to go sour, then change the recipe.

    • mpainter
      Posted Jul 25, 2016 at 9:25 AM | Permalink

      Van Ommen, in a 2011 interview with Science Poles, mentioned two other “well-resolved” ice core studies that he had conducted. These were: Mill Island, 100°E and Porpoise Bay, at 130°E, both on the Antarctic coast at approximately the same latitude as Law Dome (65 S, 112 E). Assuming that d18O analyses were performed on these, it would be interesting to compare these two with Law Dome. I can’t say whether these presumed analyses were published. The information should be available through the Australian Antarctic Survey.
      If d18O analysis were performed, then these are two more ice core temperature proxies ignored by Gergis et al.

      Steve: We’ve been waiting nearly 30 years for proper publication of Law Dome d18O data, most of which remains unpublished. So it may take a while. This has been a longstanding irritation and issue here; see prior posts.

      • mpainter
        Posted Jul 25, 2016 at 11:05 AM | Permalink

        Worth noting that Gergis, Karoly, and van Ommen have been co-authors. I would venture that van Ommen has made the d18O data on the three sites available to his colleagues.

  42. Steve McIntyre
    Posted Jul 24, 2016 at 5:59 PM | Permalink

    BTW Gergis’ claim that her team independently discovered the error prior to the CA post reminds me of Gavin Schmidt’s claim that “BAS [British Antarctic Survey] were notified by people Sunday night who independently found the Gill/Harry mismatch” that I had drawn attention to at Climate Audit.

    • Michael Jankowski
      Posted Jul 24, 2016 at 7:56 PM | Permalink

      Gergis just applied a lag of -1 days to her statements and +1 days to yours, hence the final result is that she found it two days ahead of you.

    • Gerald Machnee
      Posted Jul 26, 2016 at 10:16 PM | Permalink

      Yes, it is hard to believe that they missed the Super Bowl to find the error.

  43. thingodonta
    Posted Jul 25, 2016 at 11:26 PM | Permalink

    If you expand lags and grid cells long and large enough, you can come up with just about anything. There has to be some kind of stop to this. In mineral resource statistics there are rules for such things; why not within climate publications?

    It seems science now works like the Red Queen, one has to keep running faster and faster just to keep up with new and innovative ways to produce pre-determined outcomes. Better checking by journal editors would go some way to address this.

  44. Posted Jul 26, 2016 at 12:05 AM | Permalink

    “There is precedent for correlation to previous year temperatures in specialist studies. For example, Brookhouse et al (2008) (abstract here) says that the Baw Baw tree ring data (a Gergis proxy), correlates positively with spring temperatures from the preceding year. In this case, however, Gergis assigned zero lag to this series, as well as a negative orientation.”

    The Brookhouse (2008) abstract states that “ring width correlates significantly with net radiation, precipitation and mean minimum and maximum air temperature during the preceding winter and spring of the growing season.” As I read it, this is the *current* year, not the preceding one. E.g. temps for JJASON 2015 affect growth during the SOND 2015/JF 2016 period. While there might be uncertainty for some proxies as to which calendar-year-assignment convention is used (SOND calendar year vs. that of JF), this proxy at least should be clear. And the temperature index has no such ambiguity. For this proxy at least, one can ex ante fix the correlation direction and lag.

  45. Posted Jul 27, 2016 at 6:02 AM | Permalink

    “The lag of +1 years assigned to 5 sites is very hard to interpret in physical terms. Such a lag requires that (for example) Mangawhera ring widths assigned to the summer of 1989-1990 correlate to temperatures of the following summer (1990-1991) – ring widths in effect acting as a predictor of next year’s temperature. Gergis’ supposed justification in the text was nothing more than armwaving, but the referees do not seem to have cared.”

    Haha! Usually you hide this sort of poor statistical inference in some sort of arbitrary average or moving average. Even then it is very dubious. But a lag that is pre-constructed in order to make the data fit? I’ve never heard of anything like it. But let me give the anti-BS warriors in Climateland some succour from a similar enterprise run in a different profession.

    In the 1970s and 1980s the economics profession got themselves busy trying to correlate money supply growth with inflation. They too tried to datamine a relationship based on a time lag. The a priori time lag was assumed, for (sort of) logical reasons, to be two years. When the international data emerged the economists were faced with a very different picture. (The following is from evidence presented to the UK Treasury by Nicholas Kaldor in 1983):

    “Of the thirty observations recorded in Table VII… in nineteen cases the closest ‘fit’ was obtained when no time-lag was assumed… a one year lag gave the ‘best fit’ in four cases and a two year lag in seven cases… There is certainly nothing in these figures that would justify the far-reaching and confident assertions of Treasury Ministers about the existence of a significant time-lag, which, as was repeatedly asserted, is based on ’empirical evidence’ or ‘practical experience’.”

    The good news is that monetarism is today completely dead. The bad news is that it had to fade out of fashion and was not discredited by the evidence presented. I’m sure that the evidence certainly helped to push it out of fashion. But ultimately the fashion had to die before the theory was discredited. Today most economists have forgotten these debates and continue to make similarly dubious claims about new faddish theories.

    • Michael Jankowski
      Posted Jul 27, 2016 at 7:35 PM | Permalink

      Well so we just need to find the trees that predict future temps greater than one year out, and we can get rid of all of the model “projections.” We can have both treenometers and treemodelers.

      • MikeN
        Posted Jul 29, 2016 at 6:25 PM | Permalink

        Are the treerings autocorrelated? If so, then they can predict temperatures a year in advance, and can predict treerings a year in advance, so the one year plus treering predictions can be used to predict year+2 temperatures and so on.

  46. Posted Jul 27, 2016 at 6:05 AM | Permalink

    Oh, and a quote that you all might enjoy from John von Neumann.

  47. Gonzo
    Posted Jul 27, 2016 at 5:53 PM | Permalink

    A pinch OT, but I wandered over to the (not) Conversation to see for myself, and boy have they “tortured” that conversation. Talk about moderation on steroids. You can’t make heads or tails of it with all the comments deleted. Cheers and thanks for all you do Steve. You’re able to communicate so well to laymen like myself. I’m sure it’s a chore.

  48. AntonyIndia
    Posted Jul 28, 2016 at 3:45 AM | Permalink

    I am wondering if the new article by Lijing Cheng, Kevin E. Trenberth, Matthew D. Palmer, Jiang Zhu, and John P. Abraham called “Observed and simulated full-depth ocean heat-content changes for 1970–2005” is another case of data torture. Argo buoys have only been operational since 2000, and only down to 2000 m depth. The ship measurements before that were pretty sparse and shaky. Splicing the two looks tricky.

    Steve: not everything is “data torture”

    • John Bills
      Posted Jul 28, 2016 at 2:31 PM | Permalink

      More like making up data.

  49. kenfritsch
    Posted Jul 28, 2016 at 1:18 PM | Permalink

    I have been analyzing the 27 Gergis proxies on an individual basis. I am separating the trends in these series using Singular Spectrum Analysis (SSA) from R [library(Rssa) and using the functions ssa and reconstruct] and finding in my analysis of the first five proxies that the series are for the most part noise and in this regard very unlike observed temperature series for the instrumental period. In these 5 analyses I have used the first component (group 1) as a trend even though that first component tends to show more interconnection with other components than is seen with observed temperature series. The analysis involves some heuristics in determining the trend and I think I might be generous in assigning a trend. Using a trend does allow for the determination of confidence intervals (CIs) for the series. For CIs and mean temperatures (0.50 level) an ARMA model is used to determine the red and white noise in the residuals, followed by 1000 simulations using the ARMA model and the original SSA derived trend. The trend and the 95% CIs are presented in graphs with the proxy name as a heading.

    I would like to show those results here incrementally as I obtain the results from this rather time consuming analysis. If anyone here has any suggestions for doing this analysis or has objections to doing it I would like to hear from you.

    Steve: if you’ve got a lot of plots in similar format, why not make a pdf of them and link. The “proxies” look more like noise to me, as well.
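For readers without R, the core step that Rssa’s ssa()/reconstruct() perform can be sketched in plain numpy. This is a bare-bones illustration of the embed/decompose/group/diagonal-average recipe, not a substitute for the Rssa implementation Ken is using:

```python
import numpy as np

def ssa_trend(x, L=15, components=(0,)):
    """Basic Singular Spectrum Analysis reconstruction.

    Embed the series in an L-lag trajectory (Hankel) matrix, take its
    SVD, keep the chosen elementary components (the leading one or two
    often carry the trend), and diagonal-average back to a series."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    # trajectory matrix: column j is the window x[j:j+L]
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # sum the selected rank-one pieces
    Xr = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in components)
    # diagonal averaging (Hankelization) back to a length-N series
    out = np.zeros(N)
    cnt = np.zeros(N)
    for i in range(L):
        for j in range(K):
            out[i + j] += Xr[i, j]
            cnt[i + j] += 1
    return out / cnt

# example: recover a linear trend buried in noise
rng = np.random.default_rng(0)
true = np.linspace(0.0, 2.0, 100)
series = true + 0.3 * rng.standard_normal(100)
trend = ssa_trend(series, L=15, components=(0, 1))
```

The subjective step Ken mentions is the grouping: deciding which elementary components belong to the trend (here hard-coded as components 0 and 1) is the heuristic that Rssa’s W-correlation diagnostics help with.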

    • kenfritsch
      Posted Jul 28, 2016 at 4:33 PM | Permalink

      SteveM, I think I will either do a dropbox link to an Excel or Word file with all 27 proxies shown. Word might be better for putting comments under the graphs.

      I get rather obsessed with the point that failing to analyze the underlying proxy data for these reconstructions hides the weaknesses in the reconstruction conclusions, even when the selections are wrong-headed in being made post facto. A publishing author and editor can probably use the excuse of the space required to show details of the individual proxies, but I think the whole point is then lost.

  50. Posted Jul 28, 2016 at 2:16 PM | Permalink

    If you have 100 trees and 10 of them are highly correlated to global average temperatures, the 90 that are not correlated are telling you that the other 10 are an accident. They are matching simply by chance.

    Take any arbitrary group of 10 trees in place of the 10 you select by “calibration”. You will find that they predict past temperatures statistically equal to the 10 that you calibrated.

    This statistical problem is well known in other disciplines outside of climate science. If temperature is what you are trying to study, then you cannot use temperature as a filter for your data. The moment in which you do, you immediately invalidate further statistical analysis.
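ferdberple’s point can be made concrete with a toy simulation (scaled to 500 series for stability; the numbers are hypothetical and have nothing to do with the actual Gergis network): screen a population of pure-noise “trees” on calibration-period correlation with a rising temperature target, and the survivors’ composite acquires an instrumental-period uptrend that the full population does not have.

```python
import numpy as np

rng = np.random.default_rng(42)

n_trees, n_years, cal = 500, 200, 50        # cal = final 50 "instrumental" years
temp = np.linspace(0.0, 1.0, cal)           # rising calibration target
trees = rng.standard_normal((n_trees, n_years))  # pure noise, no climate signal

# screen: keep trees whose calibration-period correlation looks "significant"
r = np.array([np.corrcoef(t[-cal:], temp)[0, 1] for t in trees])
passed = trees[r > 0.28]                    # roughly a 2.5% one-sided cutoff for n=50

all_mean = trees.mean(axis=0)
screened_mean = passed.mean(axis=0)

def cal_trend(s):
    """OLS slope over the calibration years."""
    return np.polyfit(np.arange(cal), s[-cal:], 1)[0]

# the screened composite trends up in the calibration period purely by
# construction; the full-population mean does not
print(len(passed), cal_trend(screened_mean), cal_trend(all_mean))
```

Because the screening condition is itself “correlate with the target”, the survivors are guaranteed to echo the target inside the calibration window while reverting to noise outside it, which is exactly how post hoc screening manufactures a spurious signal.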

    • Posted Jul 28, 2016 at 5:59 PM | Permalink

      ferdberple: “They are matching simply by chance.”

      There is a logical way in which I think you can use temperature as a filter; pass the following tests:
      1) Do your calibration in both a declining temperature site and an ascending one for each species.
      2) Make certain that the correlating sample of the proxy outperforms randomness in both cases.

      • davideisenstadt
        Posted Jul 31, 2016 at 2:23 PM | Permalink

        one problem with your suggested course of action is that proxy sites often dont show any relation to climate before one discards the inconvenient trees that dont correlate to temperature, whether negatively or positively.
        And even if one finds one site that appears to correlate with temperature, if that site is one culled from a prospective field of ten sites, then one has mined for the correlation.
        The only way to do this properly is to establish a rationale for discarding individual members of the particular proxy data set before analysis, and then report one’s results.
        If one is going to perform a metastudy of a group of proxies, then one should decide before analysis just which proxies one wishes to use, and then, the rationale for using or discarding a particular proxy should not be correlation with temperature during the calibration period.
        If one is going to pick cherries from a pile of proxies, then at least one should correct for this by using the technique suggested by Bonferroni….and this isnt really a solution to the problem, only a band aid.

        • Posted Jul 31, 2016 at 3:47 PM | Permalink

          I agree with your comment. I was simply saying that one could in fact use for a proxy filter the variable of study. In fact, I can’t think of another test for validation of the proxy’s accuracy and precision. My point was that one needs to take extreme precaution against cherry picking (testing against randomness) and also confounding influence. For example, if the infamous Strip Bark Bristle-cone Pine proxy had been tested in a local environment known to have a descending modern trend then the CO2 fertilization influence could have been discovered before improperly using it.

        • davideisenstadt
          Posted Jul 31, 2016 at 4:50 PM | Permalink

          ” if the infamous Strip Bark Bristle-cone Pine proxy had been tested in a local environment known to have a descending modern trend then the CO2 fertilization influence could have been discovered before improperly using it.”
          I had not considered that Ron.

    • Curious George
      Posted Jul 28, 2016 at 6:04 PM | Permalink

      That’s where screening comes handy. You simply screen out anything that contradicts your scientifically predetermined conclusion.

  51. Stu
    Posted Jul 29, 2016 at 10:45 AM | Permalink

    ATTP writes- 13th July 2016 (on his own blog):

    “There do indeed seem to be some who would like to be taken seriously, but then find excuses why they don’t actually publish anything. They complain bitterly if they feel that they’ve been maligned in some way, but then find reasons why their “attacks” on others are justified. It’s all just a game; I think it might be called ClimateballTM”

    ATTP writes- 22nd July, 2016 (here):

    “I don’t think Gergis specifically accused your earlier posts as being inappropriate, but that some of what she was subjected to at the time was inappropriate. ”

    The first quote from ATTP is obviously meant as a portrait/description of Steve Mc himself, and not of random other people on the internet. In the second quote he admits that Gergis could hardly accuse Steve’s communications with her before July 2016 of being at all inappropriate.

    ATTP accuses Steve of playing a game and wonders aloud several times what he hopes to accomplish by ‘bringing up the past’.

    ATTP is playing Climateball all by himself.

  52. venus
    Posted Jul 30, 2016 at 5:58 AM | Permalink

    This is reminiscent (I dont know where to put the i’s, n’s and m’s in that word..) of something in a sw project I was involved in, where the author of some module said the machine crashed because of just one bit (his module was many megabytes).
    He didnt fix the bit either; it all had to be rewritten.

    I also wonder (not really) if a Bayesian interpretation of this can be given

  53. Posted Jul 31, 2016 at 3:28 AM | Permalink

    Asking ATTP to leave out snarky editorials and jibes is asking him to be silent.

  54. miker613
    Posted Jul 31, 2016 at 9:15 AM | Permalink

    ATTP says that Gergis et al are entitled to ignore these issues until an actual journal article refutation is published. Could be, but the question remains for the rest of us: In the six months to a year that is needed to publish, what should the presumption of science be? Given that everyone involved in the issue has seen this article – I take that as a given and would find it hard to trust anyone who claims otherwise – the ball is now in Gergis et al’s court. Their paper should currently be presumed to be wrong until they respond. Anyone quoting them should be pointed to this article with a note: That paper has been tentatively refuted.

    • mpainter
      Posted Jul 31, 2016 at 9:27 AM | Permalink

      Exactly. Gergis et al need to respond or the assumption will be made that they have no answer. I don’t think ATTP realizes that. He seems to imagine that they can somehow dodge the issue.

    • Posted Jul 31, 2016 at 1:00 PM | Permalink

      miker, What ATTP is expressing is a pre-replication crisis attitude. In his field, which is policy irrelevant, that attitude may be more defensible. In fields that society relies on such as medicine, that attitude is just inexcusable and a very lazy response. We should demand more and expect it.

  55. Jeff Alberts
    Posted Aug 2, 2016 at 9:11 PM | Permalink

    Looks like everyone has let ATTP derail this post, with a bunch of “did not, did too” tit for tat.

  56. Posted Aug 3, 2016 at 9:49 AM | Permalink

    Reblogged this on I Didn't Ask To Be a Blog.

  57. kenfritsch
    Posted Aug 4, 2016 at 6:24 PM | Permalink

    While I have often made my judgment clear on the post fact selection of proxies for temperature reconstructions and given reasons why it is incorrect from a statistical point of view, I think that perhaps when the proxy data are presented in compiled or spaghetti graph form we lose sight of how weak even post fact selected proxies are when those doing reconstructions attempt to use the proxies as noisy thermometers. That is why I always start my analysis of temperature reconstructions by looking at the individual proxy series.

    I have decomposed and reconstructed 27 Gergis proxies into the components of trend, periodic and red and white noise using Singular Spectrum Analysis (SSA) with L=15 years. Before applying the SSA, the 27 proxies were centered and standardized (subtracted means and divided by standard deviations). The anomaly period was 1888-1990. The separation into components requires using some heuristic tools that come with the package in the R library{Rssa}. There are some subjective calls that are made with these tools that can be made less subjective by comparing results using simulated series.

    In these proxies there was no clear cut evidence for periodic components. The trend components in these series had some connection with the noise components in most cases (not surprisingly, more than is seen with SSA derived trends of observed temperature series) and in a number of cases more than one group had to be combined to reconstruct the trend. The critical heuristic in deciding on the groups that contain the trend and periodic components is the W-correlation matrix. This matrix “is a standard way of checking for weak separability between the elementary components. In particular, the strongly correlated elementary components should be placed into the same group. The function calculates such a matrix either directly from ‘ssa’ object or from the matrix of elementary series.”

    With components that are difficult to separate the best outcome is to find the trend and allow the remaining components to be noise. The trend component groups will always have much less correlation with other groups than the noise and periodic components will show with other groups. If 2 successive groups have a high correlation and little with other groups that pair can be designated as part of a periodic component.

    After extracting the SSA derived trends from the 27 Gergis proxies the residuals were used to fit an AR1 model. That model provided an ar coefficient and a standard deviation of the AR1 model residuals that were in turn used in 1000 simulations to estimate the confidence intervals (CIs) for the SSA derived trends for the 27 proxies. Those trends and CIs are shown in an Excel file in graphic form in the first Dropbox link below. Looking at the series derived trends and the CIs, the proxies can be, as a first step, categorized as to whether the trend and width of the CIs would allow an ending trend to be deemed significantly higher than previous trends. This can be done by looking at the lower CI of the ending trend and comparing it visually to peaks that come earlier in the series. For a number of these proxies it is rather easy to see that the CIs are too wide to make a case for a higher ending trend and that a number of these series contain rather non directional trends that run both up and down over the entire trend series. After applying this visual filter and allowing marginal cases through, we have the remaining proxies:
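The CI recipe Ken describes (fit an AR1 model to the detrended residuals, then simulate) can be sketched as follows. The function name, the lag-1 fitting shortcut and the stationary-start detail are my own choices for illustration, not necessarily his exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1_trend_ci(trend, resid, n_sim=1000):
    """Pointwise 95% CI band for a trend via AR(1) residual simulation.

    Fit AR(1) to the detrended residuals (lag-1 autocorrelation as the
    coefficient), simulate n_sim AR(1) noise series with that coefficient
    and the implied innovation sd, add each to the trend, and take the
    2.5%/97.5% quantiles at each time step."""
    resid = np.asarray(resid, dtype=float) - np.mean(resid)
    phi = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    sd_innov = np.std(resid) * np.sqrt(1 - phi**2)
    N = len(trend)
    sims = np.empty((n_sim, N))
    for k in range(n_sim):
        e = rng.normal(0, sd_innov, N)
        x = np.empty(N)
        x[0] = e[0] / np.sqrt(1 - phi**2)   # start at the stationary sd
        for i in range(1, N):
            x[i] = phi * x[i - 1] + e[i]
        sims[k] = np.asarray(trend) + x
    lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)
    return lo, hi

# usage on synthetic data: a flat trend plus AR(1) residuals with phi = 0.5
e = rng.normal(0, 1, 120)
resid = np.empty(120)
resid[0] = e[0]
for i in range(1, 120):
    resid[i] = 0.5 * resid[i - 1] + e[i]
lo, hi = ar1_trend_ci(np.zeros(120), resid, n_sim=500)
```

The redder the residuals (the higher the fitted AR coefficient), the wider the band, which is why red noise makes it so hard to declare an ending trend significantly higher than earlier peaks.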

    3 tree ring proxies of Pink Pine South Island Composite (marginal), Mangowhero and Stewart Island HABI Composite, and 3 coral proxies of Palmyra (using only the most recent segment), Maiana and Rarotonga.3R (marginal).

    All 3 of these tree ring proxies have in common an upward series ending trend that would appear to be out of synchronization with the expected trend for AGW in that trends start upward for Pink Pine near 1930, Mangowhero near 1925 and Stewart Island near 1840. These 3 proxies also show a flattening to downward ending trend starting around 1980. Some of the other tree ring proxies show the same upward ending trends but have CI widths such that that period cannot be shown to be significantly warmer than previous periods.

    A majority of the coral proxies show longer upward ending trends like those mentioned above for the tree rings, with a number again starting out of synchronization with AGW timing. Of the three mentioned above, Palmyra and Maiana appear in synchronization with AGW while Pink Pine starts upward around 1940.

    It is important to test here the point that post fact selection is going to “look” for upward ending trends – and in this case many of the proxies do exhibit that trend – and those upward trends, if merely from the selection process, would not be expected to have the same high correlation throughout the series. To that end I have performed cross correlation tests on all of the paired 27 proxy SSA derived trend series using the common data between pairs for the entire common period, the common period excluding any data after 1949, the common period for 1898-1949 and the common period for the data after 1949. Those results are shown below in the second Dropbox linked Excel file. I should note here that a coherence analysis may have been informative, but I had problems determining the optimum smoothing function to use in determining significant frequencies and corresponding time periods. At some level of smoothing there existed very little coherence between proxy pairs outside of the post fact selected series ending trends.

    The change in the correlations from using the after 1949 period in comparison with the other periods used above indicates that post fact selection biases the after 1949 period for paired proxy correlations. The absolute mean and standard error for all 27 proxy pairs correlations were:

    After 1949 mean = 0.47, SE = 0.014; Entire period mean = 0.38, SE= 0.014; Before 1950 period mean = 0.27, SE = 0.012 ; 1898-1949 period mean = 0.35, SE = 0.014.

    The number of negative paired correlations, even in the period after 1949, indicates that trend signs may have been flipped in the methodology used in the Gergis temperature reconstruction. The flipping of signs from one period to another makes a measure of the absolute value of the change in correlation an important comparison between periods, and thus I summarize that measure below using the after 1949 period pairs as the reference:

    Entire period mean of absolute value of change = 0.38, SE= 0.017; Before 1950 period mean of absolute value of change = 0.51, SE = 0.019 ; 1898-1949 period mean of absolute value of change = 0.53, SE = 0.023.

    Finally to see the differences in low and high frequency correlations I did cross pair correlations using the standardized series (before extracting the SSA derived trend) for the entire period and for the after 1949 period. The means and SEs for those periods were:

    Entire period mean = 0.19, SE = 0.007; After 1949 period mean = 0.22, SE = 0.008

    I make no claims here for any discovery of new findings in this analysis of temperature proxies, or that the analysis is particularly pretty in presentation, but rather I put the analysis forward as some of the grunt work on temperature reconstructions that is almost always missing, the part that sheds some light on the weaknesses of the approaches and of the data used as proxies for historical thermometers. Those authors publishing reconstructions are more like the adversarial defense lawyer (and unlike the disinterested scientist), giving only one side of the factual case. That makes my case here more in line with the prosecution.

    • davideisenstadt
      Posted Aug 5, 2016 at 5:05 AM | Permalink

      This is my take on your laudable efforts:
      You examined 27 proxies, and 3 (about 10%) show what appears to be a significant correlation to temperature.
      But almost 90% didnt make the cut.
      Why should one expect that the 10% of the data sets that did show some weak correlation to temperature did so because they actually were responding to changes in temperature?
      Its more probable that they merely appear to reflect temperature.
      After all, the other 90% of the proxies, presumably collected and analyzed because the researchers who collected them thought they should (reflect variance in temperature), didnt.
      Also, a plausible physical or biological rationale for correlations flipping from positive to negative for different periods should be articulated before they are flipped for the purposes of curve fitting, no?
      What these guys do is data mining, no more no less.

      • kenfritsch
        Posted Aug 5, 2016 at 9:41 AM | Permalink

        davideisenstadt, you have stated conclusions that can be drawn from my analyses.

        A more general point I had hoped to get across is the lack of in-depth analyses of the individual proxies used for temperature reconstructions by those authoring reconstruction papers. The lack of those analyses and the wrongheaded post fact selection of proxies can only be taken together as an unscientific approach of assuming a “correct” answer and working backwards in attempts to show evidence. If looking deeper with more detailed analyses shows the evidence might not support the assumed answer – as should be the practice of the true scientist – apparently these scientists are willing to forgo it in their haste to support what they have already concluded as the truth.

        • pauldennis2014
          Posted Aug 5, 2016 at 10:05 AM | Permalink

          I’m writing as a scientist on the periphery of palaeoclimate studies and am puzzled by the lack of progress made with these multi-proxy, regional analyses. To date every reconstruction I’ve seen is merely ‘exploratory’ in nature involving ad hoc, post hoc data screening for proxies that might, or might not be responding to temperature.

          The approach that these groups doing such analyses should now be taking is to define the physical nature of the response and determine defining characteristics of suitable proxies (e.g. are the temperature sensitive proxies tree line sites, high latitude ice core, temperate speleothems, coral etc?). New samples can then be collected from a range of proxies from suitable sites and a climatic record determined that involves no post hoc data screening or selection. This would either confirm or refute the exploratory studies.

          Until such research is done all present reconstructions should be treated as provisional and merely exploratory in nature.

        • Steve McIntyre
          Posted Aug 5, 2016 at 10:51 AM | Permalink

          Hi, Paul,
          nice to hear from you. I agree 1000% with your statement.

          New samples can then be collected from a range of proxies from suitable sites and a climatic record determined that involves no post hoc data screening or selection. This would either confirm or refute the exploratory studies. Until such research is done all present reconstructions should be treated as provisional and merely exploratory in nature.

          These were absolutely my very first instincts when I encountered this field. If the “proxies” were faithful thermometers, then they should be off the charts in up-to-date data. If bristlecone chronologies were magic thermometers, then show this through up-to-date data. Similarly Greenland isotopes etc.

          As someone who was familiar with (or at least aware of) attempts to correlate stock market indices to supposed “proxies”, I was well aware that one could fit the data in the past, but the models/systems quickly fell apart with fresh data. I cited such articles in our original articles. Financial analysts were also well aware of the practice of substituting new regressors when the old ones failed.

          Unfortunately, rather than taking this approach, the multiproxy authors have doubled down on their recycling of the same old data. Mann et al 2008 purported to comply with NAS 2006 recommendations, but used the bristlecones once again, despite the NAS recommendation that they be avoided. Doddering Gerry North reviewed Mann et al 2008 and appeared to be unaware of the trick. Gergis uses the same long proxies (Mt Read, Oroko) as the IPCC 2007 illustration, with Mt Read being used in Mann et al 1998 and Jones et al 2008.

          99% of the effort in the field seems to be in the application of opaque multivariate methods by academics who may like the outdoors or care about the environment, but who have limited mathematical skill and negligible understanding of the linear algebra underpinning their enterprise.

        • davideisenstadt
          Posted Aug 5, 2016 at 11:47 AM | Permalink


          Absolutely. (and can I add, thanks for the analysis…)
          You have articulated this point far better than I could.
          The obstacle to the type of analyses you suggest is the expected result, which as you show, aint so good.
          IOW you can put mayonnaise all over chicken feces, but you dont get chicken salad.
          You get a plate full of chicken crap and mayonnaise.

          Steve, above, touched on multi proxy studies.
          I believe he has posted on these before, and I recall him providing us with a graphical presentation of many of the most cited proxy time series, along with a “provenance” of the samples from which they are derived.
          To treat these proxies as independent of each other when they share major constituent elements is indefensible.
          In the end the basic assumptions made in order for statistical analysis to be valid, and be of some utility are routinely violated.
          (thats my pet peeve)

        • mpainter
          Posted Aug 5, 2016 at 11:54 AM | Permalink

          Steve: “Unfortunately, rather than taking this approach, the multiproxy authors have doubled down on their recycling of the same old data.”

          It has always been my impression that the datasets that could be teased into showing a warming “signal” were in limited supply, and that the supply is now exhausted.

  58. kenfritsch
    Posted Aug 5, 2016 at 1:10 PM | Permalink

    Paul Dennis, I wholeheartedly agree with your suggested path for putting the search for temperature proxies on a scientific basis that can be supported with proper statistics. I have made this suggestion many times at these blogs and have stated that I cannot take the results of temperature reconstructions seriously until such a course of investigation appears to be in the works.

    There would be a good deal of scientific work required to properly address this issue, and one would think that point would not be lost on the potential participants. Perhaps the problem becomes a political one of not questioning past temperature reconstructions because doing so would appear to weaken the current consensus on AGW and the unprecedented occurrence of the modern warming period.

    • davideisenstadt
      Posted Aug 5, 2016 at 3:49 PM | Permalink

      The construct that one should define, a priori, criteria for the collection of samples in order to construct a proxy time series, that these criteria should be defined by physical attributes, like altitude, orientation, location, intact status (no strip bark samples) and species of whatever, plant or animal,
      that the physical or biological relationship of the proxy to temperature have some rational basis,
      that it be articulated, and that the nature of the correlation be posited before the collection of the data,
      while viewed as sensible, natural, and necessary by any with even a scintilla of experience in applied statistical analysis simply accepted in that community.
      IMO they dont do this because:
      1) collecting new data isnt as much fun as crapping around with the same data sets you’ve been mining for years,
      2) its not as lucrative
      3) it may create inconvenient data, like Law Dome, for example.
      This degree and pervasiveness of malfeasance simply cant be coincidental.

      • davideisenstadt
        Posted Aug 5, 2016 at 5:27 PM | Permalink


        “… simply IS NOT accepted in that community”

    • pauldennis2014
      Posted Aug 6, 2016 at 3:33 AM | Permalink

      Steve, Ken and David,

      I’m reluctant to take this debate further since it is outwith the topic of this discussion thread. However, I wouldn’t ascribe the lack of progress with multi-proxy palaeoclimate reconstructions to any motive on the part of those practising it.

      Steve says that 99% of the effort seems to go into recycling old proxy archives amongst the different studies with the invention of new multivariate statistical methods, many of which seem to have ill defined properties. This is surely a consequence of a lack of mathematical and statistical knowledge and skills. At the same time many of these groups also seem to have little, or no, background in experiment design, whether field or laboratory based. I’m sure they are sincere and think they are practising good science.

      As a field based scientist (geologist) with a strong experimental background (experimental mineralogy and isotope geochemistry) such an approach is anathema to me. I think it is the same for most others with backgrounds in the physical sciences and engineering. The approach described by Ken and David is intuitively obvious to those who have had to design experiments to determine the physical properties and behaviour of systems.

      The fact remains that however you dress it up, data mining is still data mining and exploratory in nature. It tells us little, or nothing, about the climate of the past until confirmed by robust scientific studies.

      The climate change industry is vast and populated by people with different skills and backgrounds. Most are not in the natural sciences, e.g. geographers, social scientists, economists, medical practitioners etc. At present it is these groups that are framing the debate. I have made at least four attempts in the past five years, all without success, to secure funding for investigations into the properties of isotope based geothermometers. All external review comments have been strongly positive and supportive, with the exception of one – ‘why do we need another palaeothermometer when we have enough already?’. It leads one to question who is making decisions and on what basis.

      The upshot is that my experience suggests that fundamental studies are increasingly difficult to secure research funding for. The lead time to major advances in palaeoclimate reconstruction studies is too long and leaves the field open to those studies that promise new, independent chronologies with little effort expended in either the field or laboratory. In reality these are re-hashes of existing bowls of spaghetti.

      • Posted Aug 6, 2016 at 12:59 PM | Permalink

        Paul, Thanks for this information. This problem of getting funding for fundamental research is not limited to climate science. It’s true in fluid dynamics as well.

        I would say that part of the responsibility for this state of affairs rests with the scientific community itself. The relentless marketing of science, and computational models, has given a lot of laymen the idea that these problems are solved or “good enough” to do what society needs to do. You can see ATTP advocating this doctrine for example. Then people complain when no one wants to invest money in a “solved problem.”

        In fairness, there is another side of the coin. In fluid dynamics, those who design and calibrate models are pretty honest about the issues and the pitfalls, in some cases devastatingly so. It is really those at the periphery, whose job is to “communicate” or to “market” who, either out of ignorance, or out of self interest, just give a very biased impression to outsiders. Also, those who run models and apply their results also benefit personally from the idea that the models are better than they are. It generates more employment for them. In climate science, most people run the models. 🙂

  59. TW
    Posted Aug 8, 2016 at 3:28 PM | Permalink

    Hi Steve,

    Another great job analyzing another very poorly done paper. I have a specific, narrow suggestion.

    Would it be worthwhile sending a letter to the journal requesting a corrigendum on the calculation of statistical significance, applying a Bonferroni correction, which drops the calculated statistical significance below the value stated in the published article? The methods section of this paper clearly states that a larger number of comparisons were made than the number on which the statistical significance calculation was based. That seems like something that the editor might go along with. After all, the editor was sharp enough to catch the change of reasoning from Gergis about why the paper needed modification in the first place.
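For concreteness, here is roughly what a Bonferroni adjustment does to a screening threshold. The calibration length and proxy count below are illustrative placeholders, not the paper's exact figures, and the large-n normal approximation stands in for an exact t-based critical value:

```python
import math
from statistics import NormalDist

def critical_r(n, alpha=0.05, m=1):
    """Approximate critical |correlation| for a two-sided test at level
    alpha with n observations, Bonferroni-adjusted for m comparisons.
    Uses the large-n normal approximation r_crit ~ z / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (alpha / m) / 2)
    return z / math.sqrt(n)

n = 70   # illustrative calibration length
print(round(critical_r(n), 3))        # single pre-specified test
print(round(critical_r(n, m=27), 3))  # 27 screened proxies: a much higher bar
```

Dividing alpha by the number of comparisons raises the critical |r| substantially, so correlations that clear a single-test threshold can easily fail the corrected one.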


  60. kenfritsch
    Posted Aug 11, 2016 at 6:43 PM | Permalink

    I decided to look at the correlation of the Gergis 27 proxy series with the local temperature station series, both before and after detrending. The local station data are sparse, so I used the 10 nearest stations (10NN) to each proxy site. The station data were extracted from KNMI for the GHCN adjusted mean temperature series. Taking more stations allows the in-filling of missing data over longer periods of time and gives more validity to the proxy response as representing a larger area. I used annual data rather than seasonal, since it is the annual representation that matters for reconstructions; if the seasonal data do not correspond well with the annual data, that should be part of the evaluation up front. That seasonal data might correlate better with the proxy response is a secondary issue that does not add value to the reconstruction. For the correlations I used all the common periods of the proxy and the mean of the station series, and in this way obtained a better look at these relationships using more data.

    The trends and residuals (detrended) were determined using Singular Spectrum Analysis on the proxy and 10NN series with L=30 years. The correlations were corrected for red and white noise and are reported in the table in the link below. The corrections for noise were determined by finding the best-fitting AR1 model for the residuals and using that ar coefficient and the standard deviation of the AR1 model for 1000 simulations. Before detrending, the correlations were significant at the 95% confidence level, with the correct sign, for 18 of the 27 proxy and 10NN series; the Urewera correlation was significant but with the wrong sign. After detrending, 6 proxy and 10NN residual series had significant correlations with the correct sign, with Urewera again significant with the wrong sign.
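    For readers who want to reproduce this kind of noise correction, here is a minimal Python sketch (not ken's actual R code; the persistence values are invented for illustration): simulate pairs of independent AR(1) series with a given coefficient and use the distribution of their correlations as the null band.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_sim(phi, n, rng):
    """Simulate an AR(1) series y[t] = phi*y[t-1] + e[t]."""
    e = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = e[0]
    for t in range(1, n):
        y[t] = phi * y[t - 1] + e[t]
    return y

def null_corr_ci(phi_a, phi_b, n, n_sim=1000):
    """95% band for the correlation of two INDEPENDENT AR(1) series."""
    r = [np.corrcoef(ar1_sim(phi_a, n, rng), ar1_sim(phi_b, n, rng))[0, 1]
         for _ in range(n_sim)]
    return np.percentile(r, [2.5, 97.5])

# Toy example: two series of length 100, each with AR(1) coefficient 0.5.
lo, hi = null_corr_ci(0.5, 0.5, 100)
print(round(lo, 2), round(hi, 2))  # band noticeably wider than the white-noise ±0.196
```

    The point of the correction: persistence widens the null band, so a correlation that looks significant against white noise may not be significant against realistic red noise.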

    What I noticed in the SSA decomposition and reconstruction into components was how different the various proxy and 10NN series were with respect to the groups required to reconstruct the trend components, and how different those trend series (and raw series) appeared. I have not read the current Gergis paper, but I do not think the (spatial) sampling error in these temperature reconstructions is detailed, or even estimated. Given the variations in trends over a relatively small region of the globe, and the not particularly high correlations among the proxies as is the case for the Gergis reconstruction, I am curious what that sampling error would be. I intend to use the approach applied by Phil Jones as described in the second link below and apply it to the Gergis reconstruction. If someone reading here has a better suggestion, please let me know.

    Important disclaimer here: This analysis is done merely as a check on the reconstruction approach even with the basic error of post fact selection of proxies and in no manner or form makes any concessions on how flat out wrong post fact selection of proxies is for temperature reconstructions.

    • davideisenstadt
      Posted Aug 12, 2016 at 7:21 AM | Permalink

      you are certainly doing the lifting there, let alone the heavy lifting…

      An interesting observation (at least to me):

      An R of 0.53 (Mt Read) looks great, especially when contrasted with an R of 0.09 (!) (Oroko),
      but an R of 0.53 implies an R squared of 0.2809, meaning that this proxy “explains” a little less than 30 effing percent of the variance in temperature.
      And this is a “good proxy”?
      Only in paleoclimatology, it appears.
      As a snarky aside, Ken (snark directed at others than you, of course):
      Don’t you get tired of trying to make chicken salad?

      • kenfritsch
        Posted Aug 12, 2016 at 9:14 AM | Permalink

        David, please recall that obtaining a higher correlation of a proxy to the observed temperature series without detrending is a relatively easy post fact selection. Most of the observed series have an upward-ending trend, so any proxy with an upward-ending trend will obtain a significant correlation. For series with higher serial correlations, obtaining a false (stochastic) trend is not a difficult task.

        I should note here that there are 7 detrended proxies, and not 6 as reported above, with significant correlations with the detrended observed 10NN series with the correct sign.

        • davideisenstadt
          Posted Aug 12, 2016 at 9:52 AM | Permalink

          Yes… but for use as a reliable proxy for variance in temperature, I think an R of 0.6, say, isn’t very good (detrended or not).
          As you so politely note, achieving an R score like that of Mt Read, given that one has thrown out a bunch of proxies that don’t generate the proper wiggle, isn’t very convincing evidence.
          A significant correlation doesn’t, in and of itself, indicate that one has found a reliable, robust proxy, especially when one screens for the correlation.

          BTW, you’re quite droll, you know?
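          The screening point made above is easy to demonstrate numerically. A hedged toy sketch (all numbers invented, no real proxy data): generate pure red-noise "proxies" containing no temperature signal at all, keep only those that happen to correlate with a trending target, and the composite of the survivors acquires the target's trend anyway.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_proxies = 100, 200

# A trending "temperature" target.
target = np.linspace(0.0, 1.0, n_years) + 0.3 * rng.standard_normal(n_years)

# Pure AR(1) noise "proxies" -- by construction, no temperature signal.
proxies = np.zeros((n_proxies, n_years))
for i in range(n_proxies):
    for t in range(1, n_years):
        proxies[i, t] = 0.7 * proxies[i, t - 1] + rng.standard_normal()

# Screen: keep only proxies correlating with the target at r > 0.2.
r = np.array([np.corrcoef(p, target)[0, 1] for p in proxies])
selected = proxies[r > 0.2]

# The composite of the survivors trends upward with the target.
composite = selected.mean(axis=0)
trend = np.polyfit(np.arange(n_years), composite, 1)[0]
print(len(selected), trend > 0)
```

          This is the screening fallacy in miniature: the "hockey stick" in the composite comes entirely from the selection step, not from any signal in the data.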

    • Posted Aug 12, 2016 at 12:11 PM | Permalink

      kenfritsch –
      As you are correlating to local stations, you may want to note some location errors:
      Rarotonga: 21.5°S 160°W (not 160E);
      Maiana: 1°N 173°E (not 1S); and
      Bunaken: 1°30’N 124°50’E (not 3S 123E).
      [There may be other errors, but these stand out, being in the N or W hemispheres.]

      (Prior post with links is held up in moderation. Just chase the references down.)

      • kenfritsch
        Posted Aug 16, 2016 at 12:45 PM | Permalink

        HaroldW, I just saw your post on the incorrect Gergis proxy locations and I thank you for that information. It would not make much difference for my most recent post above, but it well could for the Rarotonga proxies and their correlations with near-neighbor stations in my previous post. I’ll go back and see what I get with the corrected locations. Thanks again.

    • kenfritsch
      Posted Aug 16, 2016 at 8:38 PM | Permalink

      Per HaroldW’s observation posted here, I went back to the NOAA paleoclimatology database to recheck the coordinates of all the Gergis 27 proxies and corrected the locations.

      The correlation results changed for Rarotonga, Rarotonga.3R and Bunaken and I have included those changes in the corrected table linked below. The overall numbers of significant correlations between proxies and the 10 station near neighbors for the series and detrended series did not change.

      • Posted Aug 17, 2016 at 7:56 AM | Permalink

        Thanks ken, appreciate the extra effort.
        I recall being surprised that the G12 authors didn’t catch those oversights in their Table 1, as the proxies which are outside of the defined region (0–50S, 110–180E) are so prominent in Figure 1. But it was only a typo; it would not have made any difference to the analysis, because correlations were calculated only between the proxy records and the regional average, not against any local series.

  61. John A
    Posted Aug 14, 2016 at 7:25 AM | Permalink

    The lag of +1 years assigned to 5 sites is very hard to interpret in physical terms. Such a lag requires that (for example) Mangawhera ring widths assigned to the summer of 1989-1990 correlate to temperatures of the following summer (1990-1991) – ring widths in effect acting as a predictor of next year’s temperature.


    Steve: As I observed in comments, while Gergis’ interpretation of such a lag (“seasonality”) was implausible, there is a possible rationale which needs to be examined. The assignment of a calendar year to a SH summer proxy is conventional. Tree ring measurement data is often assigned to the calendar year opening the austral summer, while the calibration to “annual” summer temperature might be to the Jan-Feb-Mar average of the calendar year closing the austral summer. So the tree ring chronology could be assigned to 1989 while the instrumental data for the same *summer* is assigned to 1990. So there’s a rationale for examining the lag, just not the one that Gergis provided.
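    Steve's dating-convention point can be made concrete with a toy example (invented data, not the Gergis series): if tree rings are labeled with the calendar year opening the austral summer while the JFM instrumental average is labeled with the year closing it, the same physical summer carries two different year labels, and a naive lag scan peaks at +1 even though the tree responds to its own summer.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
summer_temp = rng.standard_normal(n)                 # one value per physical summer
rings = summer_temp + 0.5 * rng.standard_normal(n)   # tree tracks its own summer

# Dating conventions: rings keyed to the opening calendar year; the JFM
# instrumental average keyed to the closing year (one slot later).
ring_by_year = rings
temp_by_year = np.roll(summer_temp, 1)   # year y holds the y-1/y summer

def r(lag):
    """Correlate ring[year] with temp[year + lag], trimming wrap-around ends."""
    b = np.roll(temp_by_year, -lag)
    return np.corrcoef(ring_by_year[2:-2], b[2:-2])[0, 1]

print(round(r(0), 2), round(r(1), 2))  # lag +1 recovers the true alignment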

    • Posted Aug 14, 2016 at 9:33 AM | Permalink

      Year assignment of proxy data is binary; in your example, SOND89-JF90 could be assigned to 1989 or 1990. But there is no ambiguity in assigning a year to the instrumental data, whichever decision the authors took. Hence even in cases where the proxy dating choice is unknown, there are only two lags which are reasonable to examine, not three.

      Steve: sometimes specialists believe that growth is responding to temperatures of the previous season – for actual reasons, not just data mining. There weren’t that many studies and the specialist authors were involved, so Gergis could have consulted the specialists ex ante, rather than data mining.

      • Posted Aug 15, 2016 at 7:20 AM | Permalink

        Steve, in the original post you mentioned the possibility of a response to prior year’s temperatures, citing Brookhouse et al.(2008). However, I believe this is a misreading. That paper reasoned that lower winter/spring temperatures prolong snow cover, inhibiting growth in the ensuing months. But that’s not a dependence of growth on the prior year’s temperature; using your example, growth during SOND89-JF90 season is correlated to winter/spring (JJA/SON) temperatures of 1989. While the JJA connection extends beyond the temperatures of the warm months (SONDJF) considered by Gergis2016, it doesn’t go as far as the previous year, which seems to be your justification for the third lag.

  62. kenfritsch
    Posted Aug 16, 2016 at 12:38 PM | Permalink

    I have been attempting to estimate the uncertainty of the Singular Spectrum Analysis (SSA) derived trend line for the standardized series of the Gergis 27 proxies due to the trend, sampling and measurement errors. I do not believe I have seen this attempted by any of those authors doing temperature reconstructions. The Gergis 27 proxies were standardized by subtracting the mean for the period common to all the proxies (1888-1990) for the individual proxies and subsequently dividing by the standard deviation.

    The trend error was estimated by determining the best-fitting ar1 model for the SSA trend residuals of each series, and using the ar1 coefficient and the standard deviation of the modeled residuals in simulations that produced 1000 realizations of the composited SSA-derived trend line of all 27 proxies. From those realizations the 2.5%, 50.0% and 97.5% probabilities were determined and used to construct a graph with the trend line of the composite and the 95% confidence intervals over the entire reconstruction period, 1000–2001. That graph is shown in the link below.

    The sampling error was determined using an adaptation of the approach of P. Jones in the paper: “Estimating Sampling Errors in Large Scale Temperature Averages”

    That approach uses estimates of the average of the paired correlations of station series (proxy locations and series) and the average standard deviation of all the station series (proxy series) for the area of interest. That information is combined with the number of stations (proxy locations) to calculate a standard error for sampling over the entire area. If the area is well covered by stations (proxy locations), the average paired correlation can be used in the following equation:

    SE^2 = Sibar^2 * rbar * (1 - rbar) / (1 + (n - 1) * rbar), where SE is the standard error for sampling the area, Sibar is the average of the standard deviations of the individual series, and rbar is the average of the paired series correlations.

    If the location of the individual series does not cover the area well – as is the case of the Gergis proxies and the Australasia area – the Jones paper gives an alternative method for determining rbar from the equation:

    rbar = (x0/X) * (1 - e^(-X/x0)), where x0 is the correlation decay length and X is the farthest possible separation of series locations (the diagonal of a rectangular area). x0 is determined from a plot of the correlations versus the separation distance of proxy pairs, at the point where the trend line reaches a correlation value of 0.368 (1/e). Unfortunately, for the Gergis proxies there is very little dependency of the paired series correlations on distance, and the decay length would actually be a negative distance of approximately 1000 km. The graph depicting that relationship is shown in the link below.
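    The two formulas above can be transcribed directly (my reading of them, offered as a sketch; the numbers in the example are invented for illustration, not Gergis or Jones values):

```python
import math

def rbar_from_decay(x0, X):
    """rbar = (x0/X) * (1 - exp(-X/x0)): area-average paired correlation."""
    return (x0 / X) * (1.0 - math.exp(-X / x0))

def sampling_se(sibar, rbar, n):
    """SE^2 = Sibar^2 * rbar * (1 - rbar) / (1 + (n - 1) * rbar)."""
    return math.sqrt(sibar ** 2 * rbar * (1.0 - rbar) / (1.0 + (n - 1) * rbar))

# Invented illustration: 19 effective sites, unit standard deviation,
# a 1500 km decay length over a 7000 km maximum separation.
rbar = rbar_from_decay(1500.0, 7000.0)
se = sampling_se(1.0, rbar, 19)
print(round(rbar, 3), round(se, 3))  # 0.212 0.186
```

    Note how the sampling SE shrinks as either rbar or n grows; with few, weakly correlated sites the SE stays large, which is the nub of the sampling-error problem described here.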

    That leaves, as my only alternative for estimating the sampling error of the Gergis 27 proxies in that large bounded Australasia area (0S–50S and 110E–180E), limiting the effective number of locations, n, where stations are either located in close proximity to one another or fall outside the bounded area. For this purpose n is reduced by 3 for the 2 Vostok ice proxies and the Palmyra proxy, which lie outside the boundary, and by a further 2 for the 2 Rarotonga and 2 Fiji proxies, which are collocated. While n will vary downward from a maximum n in years where not all proxies have data in common with the Gergis 27 composite, the maximum n is reduced from 27 to 22 by this first step. A second step is to assume collocation of stations separated by less than one-half degree in latitude and longitude. That would include the pair Mt Read and Buckley’s Chance and the triple consisting of North Island LIBI Composite 1 and 2 and Takapari, taking the maximum n to 19. The link below gives a table showing the locations of the proxies within the Australasia boundary, and shows that my use of effective values of n is conservative, given the clumping together of the proxy locations.

    The proxy composite trend line and the trend plus sampling error 95% CIs are shown in graphical form in the link below. I have not been able to do a proxy measurement error, but I do have some ideas on approaching that problem and when I feel I have a good handle on it I could add that error to sampling and trend error to obtain something perhaps close to a total uncertainty range for the trend lines. Currently I feel that the proxy measurement errors could be quite large with tree ring proxies – if one can consider the differences between cores from the same tree as a measurement error.

    In conclusion, even when the construction of a temperature reconstruction is (incorrectly) allowed to select proxies post facto, the uncertainties, when properly included, can obliterate any opportunity to show the modern warming period as unprecedented in the post facto selected proxy responses.

  63. kenfritsch
    Posted Sep 17, 2016 at 1:11 PM | Permalink

    I finally got the data for the Gergis 2016 paper (but not the paper) and some time to analyze it. I want to state off the top here that post fact selection, as used in Gergis 2016, is not a statistically proper approach, and that even granting that bias, other choices of selection method and presentation of the results can paint a very different picture vis-à-vis the modern warming period.

    My analysis was based on the 51 proxies (G51) that Gergis used as a pool for post fact selection, and on the smoothing that can be applied to the composite of the 28 proxies (G28) that passed the authors’ tortured post fact selection process. There were 2 or 3 proxies for which I had to search for the locations; I think I have those locations correct, but I am not certain.

    In my approach for the G51, I used all the common data points for the years 1880–2001 for the standardized proxy series (centered and scaled) and the HadCRUT4 temperature series, extracted from KNMI, for the corresponding 5x5 grid in which the proxy was located. On these data I computed both correlations and trend differences for the proxy versus grid data. The confidence intervals (CIs) were adjusted for ar1 autocorrelation by ARMA modeling the linearly regressed residuals, then using the fitted ar1 coefficient and the standard deviation of the ARMA residuals to determine the CIs with 1000 simulations — for each proxy and corresponding grid series in the case of correlation, and for the proxy minus grid difference series in the case of trend differences. The correlation results reported in the table below are for detrended series, and the trend differences were derived by calculating the linearly regressed trend of the proxy minus HadCRUT4 grid series.

    I used HadCRUT4 because it is the latest HadCRUT temperature series available. The 5×5 grids were used to provide a reasonable amount of temperature data without missing data intervals from the common points with the proxy data. I used annual data because that is the temperature that should be of interest when looking at historical versus modern era differences. If there is not a very high correlation of seasonal temperatures to annual temperatures for the entire reconstruction period then the reconstruction becomes much less useful. I used a longer time period for lessening the uncertainties of the results. I added a trend difference measurement since while high frequency annual (or seasonal) correlations might indicate something about that correspondence it is really the trend or low frequency correspondence that we are most interested in when comparing modern to historic temperature changes.

    The results in the table show that a statistically significant correlation of detrended proxy to detrended grid data has little to no predictive power for whether those pairs have the same trends over these same common data points, within the bounds of statistical significance. When using longer time periods for comparison of proxy/grid pairs, it appears that more of the proxy responses to grid temperature are the reverse of what would be expected for that type of proxy. I put an asterisk by those relationships that show a significant correlation but with the wrong sign. From a close look at the table it can be seen that there are other proxy/grid pairs whose correlations and trends show the wrong sign but no statistically significant relationship. Tree ring, speleothem, luminescence and ice core accumulation proxies should have a positive response to temperature, while coral and ice core d18O proxies should have a negative response. I should also point out that for the Vostok and Talos 5x5 grids I used the CW infilled HadCRUT4 temperature series in order to get a reasonable number of common data points. I would think that if a scientist were doing serious work attempting to find valid proxies for developing a priori selection criteria with a hard physical basis, a more detailed look at those proxies that pass both the detrended-correlation and trend-difference tests might be in order. That work, though, would have to remain a preliminary test for finding proper a priori criteria, and not the basis for post fact selection.

    My final analysis involved the sensitivity of the appearance of the composite series of the G28 post fact selected proxies to some simple changes in the smoothing function used. In the 4 graphs presented below I used the smooth.spline function from R to smooth the G28 composite using df=7 and spar values of 1.0, 0.75, 0.50 and 0.25. It is rather obvious that the unprecedented appearance of the modern warming period can be greatly affected by the choice of smoothing parameter. I should note that the proxy responses to temperature were adjusted for the expected orientation as noted above.
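    The smoothing-parameter sensitivity is easy to reproduce. R's smooth.spline isn't used here, so as a hedged stand-in this sketch applies a simple Gaussian-kernel smoother to invented data (a flat noisy series with a modest late upswing): the apparent size of the final upswing depends strongly on the bandwidth, the same qualitative effect as varying spar.

```python
import numpy as np

def gauss_smooth(y, bw):
    """Gaussian-kernel smoother; bw is the bandwidth in index (year) units."""
    x = np.arange(y.size)
    out = np.empty(y.size)
    for i in range(y.size):
        w = np.exp(-0.5 * ((x - i) / bw) ** 2)
        out[i] = (w * y).sum() / w.sum()
    return out

rng = np.random.default_rng(3)
series = 0.5 * rng.standard_normal(1001)        # "years 1000-2000": flat noise...
series[-100:] += np.linspace(0.0, 1.0, 100)     # ...plus a modest late upswing

for bw in (5, 15, 50):                          # light -> heavy smoothing
    print(bw, round(float(gauss_smooth(series, bw)[-1]), 2))
```

    Light smoothing leaves the endpoint near the last noisy values, while heavy smoothing averages the upswing down toward the long-run mean, so the "unprecedented" look of the final decades is partly an artifact of the smoothing choice.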

    • kenfritsch
      Posted Sep 18, 2016 at 10:54 AM | Permalink

      Here are the Gergis 2016 tortured, post fact selected 28 proxies in a separate table, with the results of my correlation and trend analyses against the corresponding HadCRUT4 5x5 grid temperatures.

      • kenfritsch
        Posted Sep 22, 2016 at 10:22 AM | Permalink

        Again, I have made an analysis of the Gergis 28 proxies selected through the tortured and incorrect post fact selection process, showing that even with this biased process the results do not support the conclusions that the authors of Gergis 2016 offer.

        In the graphs below I show, in scatter plots with a smooth-spline best fit, the relationship between distance separation and series correlation for the Gergis 28 proxies, and the same relationship for 55 Australasia temperature stations using GHCN adjusted mean temperatures. It is obvious that the relationship present in the station data breaks down for the proxies. While the breakdown points to the problem of treating proxy responses in the same sense as temperature responses at the stations, another problem arising from this lack of a distance–correlation relationship is that it prevents using that relationship to reduce the sampling error where the spatial coverage is sparse, as it is in Gergis 2016.

        The Gergis 2016 temperature reconstruction, like other reconstructions as I recall, does not attempt to deal with the sampling and measurement errors. I have made some simplistic estimates of these errors for Gergis 2016 and have found that the resulting confidence intervals for a standardized proxy composite become sufficiently large to make the reconstruction rather meaningless with regard to claims of unprecedented warming in the modern era. I am wondering whether any readers of these posts can recall published temperature reconstructions that dealt in detail with measurement and sampling errors in constructing confidence intervals for their series. I also recall that accounting for sampling and measurement errors in temperature series from stations and from ocean satellite, buoy and ship data is itself a rather recent development.

  64. AntonyIndia
    Posted Sep 21, 2016 at 10:30 PM | Permalink

    Now even the Guardian has allowed an article about “bad science”, probably because it ran in the science section rather than the environment section.
    It is based on a study by Paul E. Smaldino and Richard McElreath published in Royal Society Open Science.

4 Trackbacks

  1. By Gergis « Climate Audit on Jul 21, 2016 at 9:02 PM

    […] redirect to here […]

  2. […] some background on this issue, there's a good post up at Climate Audit about this paper, and I'm going to try not to rehash the points it covers. […]

  3. […] Steve McIntyre debunks Gergis et al.  [link] […]

  4. […] Read the full post here: […]
