David Letterman sometimes has a segment entitled “Stupid Pet Tricks”, which is an apt title for today’s post – more parsing of Mannian CPS, recently discussed here. With helpful contributions from Jean S, UC and Roman M, I can now pretty much replicate Mannian CPS, but only through a variety of devices that fall into the category of stupid pet tricks. I’m not sure which is stupider – the original pet trick or spending the time trying to figure out what’s going on. Probably the latter.
I realize that this isn’t the most interesting material in the world, especially compared to talking about U.S. politics, but this takes a lot of time to do and I find it useful to document these results while they’re fresh.
The first stupid pet trick is Mannian smoothing. With particular thanks here to Roman M, we now have an exact replication of Mann’s smoothing method in R. Mannian smoothing has been described in two different articles in GRL (see references below); the algorithm as used in Mann et al 2008 differs somewhat from either.
I suspect that we are the first people – including the reviewers of Mann’s two GRL articles – to have figured out what Mann’s smoothing method actually does. The real issue is whether the world needs Mannian smoothing at all – there are standard smoothing algorithms already developed by statisticians (e.g. lowess).
And why is GRL publishing articles on smoothing methodology in the first place? In my opinion, if Mann (or I or anyone else) has something to contribute to the smoothing literature, it should be submitted to a statistical journal where it can be properly reviewed. Review by a couple of climate scientists at GRL is worse than pointless, as it lends faux authenticity to the method.
The tricky thing about Mannian smoothing is, as noted before, the use of Butterworth filters. These may be meritorious in audio engineering, where you are, say, trying to recover music, which is governed by frequency. But they seem less apt for red-noise time series, not least because they have strong mean-reverting properties at their endpoints.
Because Mann doesn’t even bother centering the series, this leads to very odd endpoint behavior, which he tries to patch with endpoint padding of ever-increasing length. The most recent aspect of the method winkled out by Roman M is that Mann’s mean padding uses the mean of the entire series, rather than the mean of the last M (say M=10) values, the method used in IPCC padding.
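To make the padding distinction concrete, here is a minimal Python sketch – not Mann’s code; the filter order, cutoff and pad length are arbitrary illustrative choices – contrasting padding with the full-series mean against padding with the mean of the first/last 10 values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_with_mean_padding(x, cutoff=0.05, order=4, npad=50, full_mean=True):
    """Butterworth lowpass with constant-mean endpoint padding.

    full_mean=True pads both ends with the mean of the *entire* series
    (the Mannian choice described above); full_mean=False pads each end
    with the mean of the nearest 10 values, IPCC-style."""
    b, a = butter(order, cutoff)
    lead = np.full(npad, x.mean() if full_mean else x[:10].mean())
    tail = np.full(npad, x.mean() if full_mean else x[-10:].mean())
    padded = np.concatenate([lead, x, tail])
    smooth = filtfilt(b, a, padded)      # zero-phase filtering
    return smooth[npad:npad + len(x)]    # drop the padding again

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=300))      # red-noise-like random walk
s_full = lowpass_with_mean_padding(x, full_mean=True)
s_last = lowpass_with_mean_padding(x, full_mean=False)
```

For a drifting series, the two padding rules can disagree noticeably near the endpoints, since the full-series mean may sit far from the recent values.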
Here’s a plot comparing our emulation of Mannian smoothing for Dongge to a result extracted by UC using Mann’s source code. The validity is further confirmed by the ability to replicate other results, as shown below.
Counting more than once
The next stupid pet trick relates to double-counting and even quadruple-counting of certain proxies, an issue first noted by Jean S, who observed that a couple of Mannian gridcells were identical.
If the Mannian lat-longs for a proxy (which may well be in error) are on the border of a cell e.g. exactly 50E as with the (incorrect) Socotra location, Mann places the proxy in both adjacent cells. A proxy located precisely at a 4-corner is allocated to all 4 gridcells.
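The border rule can be sketched as follows; the grid origin and index conventions here are assumptions for illustration, not Mann’s actual indexing:

```python
def gridcells(lat, lon, size=5.0):
    """Return the set of 5x5 gridcell (lat_index, lon_index) pairs a proxy
    falls into, duplicating across borders as described above: a point on
    a cell edge goes into both adjacent cells, a point at a 4-corner into
    all four.  Grid origins (-90 lat, 0 lon) are illustrative assumptions."""
    def axis_bins(v, lo):
        i = int((v - lo) // size)
        bins = {i}
        if (v - lo) % size == 0 and i > 0:   # exactly on a border
            bins.add(i - 1)
        return bins
    return {(i, j) for i in axis_bins(lat, -90.0) for j in axis_bins(lon, 0.0)}

on_border = gridcells(12.5, 50.0)   # longitude exactly on a cell edge
on_corner = gridcells(15.0, 50.0)   # both coordinates on cell edges
interior  = gridcells(12.5, 52.0)   # strictly inside one cell
```

With this rule, the border proxy lands in two cells and the corner proxy in four, matching the allocation behavior described above.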
It took a while to figure out this stupid pet trick, even with the code. This got most gridcells right but there were further traps for the unwary.
The next stupid pet trick wasn’t easy to find. I eventually tracked the difference between the allocations that I was calculating and Mann’s results to gridcells where the proxy latitude was at the northern edge of a gridcell.
UC had recovered a Mann file showing the latitudes for gridcell centers that Mann had calculated – these ran from -88.5 to +86.5 (S to N), rather than the correct -87.5 to 87.5 (S to N). So a proxy located in the top 1 degree of a gridcell got assigned to the wrong gridcell.
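The effect of the shifted centers can be demonstrated with a nearest-center allocation – a sketch of the consequence described above, not Mann’s code:

```python
import numpy as np

# Correct 5-degree gridcell centers run -87.5 .. 87.5 (S to N); the centers
# in the recovered Mann file ran -88.5 .. 86.5, one degree low.
correct = np.arange(-87.5, 90, 5.0)
mannian = np.arange(-88.5, 89, 5.0)

def nearest_cell(lat, centers):
    """Index of the gridcell whose center is nearest the given latitude."""
    return int(np.argmin(np.abs(centers - lat)))

# A proxy in the top 1 degree of a cell (e.g. 29.5N, in the cell centered
# at 27.5N) gets pulled into the next cell north by the shifted centers:
ok  = nearest_cell(29.5, correct)
bad = nearest_cell(29.5, mannian)
```

Both grids have 36 cells, but the shifted centers assign the proxy one cell too far north, exactly the mismatch described above.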
This goof can be 100% confirmed in one of the Matlab runs that UC sent to me. Re-doing the allocation with the (incorrect) Mannian gridcell centers, I was able to match the AD1000 selections exactly.
Smoothing over and over
The next stupid pet trick is that the proxy data are smoothed over and over again in all 19 steps. On each occasion (e.g. the AD1000 step which we’ve been studying), the proxy data are truncated to AD1000 and then padded at their beginning even when actual values exist before AD1000.
Mannian smoothing is not especially efficient in the first place, so, aside from using padded values when known values are available, the procedure adds to the overheads. It would be perfectly feasible to smooth the master proxy data set once and select from this.
In any event, for a given step (e.g. 1000 here), Mann truncates the data to start at AD1000 and selects the subset of series with “passing” correlations. The smoothed series are all given short-segment standardization on 1850-1995 – note that this period is slightly different from the 1855-1995 used in correlations – and then averaged. The average is then re-scaled to the mean and standard deviation of the instrumental gridcell over 1850-1995.
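The standardize-average-rescale sequence can be sketched as follows – a minimal illustration of the steps just described, assuming proxies and the instrumental series sit on a common annual grid, not Mann’s actual code:

```python
import numpy as np

def cps_gridcell(proxies, instrumental, years, calib=(1850, 1995)):
    """Composite-plus-scale for one gridcell: standardize each proxy over
    the calibration period, average, then rescale the composite to the
    mean and sd of the instrumental series over the same period."""
    mask = (years >= calib[0]) & (years <= calib[1])
    std = [(p - p[mask].mean()) / p[mask].std() for p in proxies]
    comp = np.mean(std, axis=0)
    return ((comp - comp[mask].mean()) / comp[mask].std()
            * instrumental[mask].std() + instrumental[mask].mean())

# Illustrative fake data: three red-noise proxies and a noisy "instrumental"
# series on an annual grid from AD1000.
years = np.arange(1000, 1996)
rng = np.random.default_rng(1)
proxies = [np.cumsum(rng.normal(size=years.size)) for _ in range(3)]
instr = rng.normal(10.0, 2.0, size=years.size)
recon = cps_gridcell(proxies, instr, years)
```

By construction, the resulting series matches the instrumental mean and standard deviation exactly over the calibration period.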
Here is the comparison of the emulation to the series (that UC extracted from Mann’s Matlab code) for the gridcell containing the Dongge O18 series. A perfect emulation.
You’ll notice that, unlike the Socotra O18 speleothem series, the Dongge O18 speleothem series has been flipped over, a point that I’ll return to in a separate post. (Gavin Schmidt excoriated Loehle for using Mangini’s O18 speleothem series where the O18 series had been flipped over, but, for some reason, failed to criticize Mann for flipping over the Dongge O18 series.)
This example only had one contributing proxy, but the emulation in a gridcell with multiple proxies is also accurate, as shown below for the gridcell containing the 4 Tiljander proxies (which Mann inverted from the usage of the original author, who warned against non-climate anthropogenic effects in the latter part of the series.)
After making these gridded series, Mann then re-grids all series north of 30N into 15×15 gridcells, weighting the contributing gridcells by the cosine of their latitude. This yields 15 regridded cells with contributions.
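The cosine-latitude weighting for a pair of contributing gridcells can be sketched as follows (an illustration of the weighting rule, with cell-center latitudes chosen arbitrarily):

```python
import numpy as np

def regrid_pair(series_a, lat_a, series_b, lat_b):
    """Combine two 5-degree gridcell series into one coarser cell,
    weighting each by the cosine of its cell-center latitude."""
    wa = np.cos(np.radians(lat_a))
    wb = np.cos(np.radians(lat_b))
    return (wa * series_a + wb * series_b) / (wa + wb)

# The lower-latitude cell gets the larger weight:
combined = regrid_pair(np.ones(4), 62.5, np.zeros(4), 67.5)
```

When the two cells sit at the same latitude, the weights are equal and the result reduces to a simple average.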
I’ve compared all 15 to versions extracted by UC and the emulations are mostly exact, with the “worst” emulation being the one below combining the Tiljander and Tornetrask gridcells, affected only by very slight rounding.
While I’m thinking about it, another stupid pet trick is the huge Matlab files that Mann created in the CPS program. The gridded series for the 9 steps from AD200 to AD1000 totaled over 250 MB. In the AD1000 step, 2567 out of 2592 columns were empty, and, in addition, all values prior to AD1000 were empty.
Adding to the absurdity, Mann created a huge 3-dimensional matrix of these series. I guess they simply buy more and more powerful work stations to perform stupid pet tricks.
If you keep only the used portion of each series, name each column to keep track of it, and store the information from each step in a list rather than a 3-D matrix, the same information takes about 1 MB in R. Handling the information then becomes effortless rather than a strain. This absurd data handling was repeated time after time.
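The storage difference is easy to demonstrate; the sketch below uses the column counts quoted above (25 used columns out of 2592 in the AD1000 step), with the array sizes otherwise illustrative:

```python
import numpy as np

# Dense Mannian-style storage: one years x 2592-gridcell matrix per step,
# almost entirely empty.
n_years, n_cells, n_used = 996, 2592, 25
dense = np.full((n_years, n_cells), np.nan)
used = np.random.default_rng(2).choice(n_cells, n_used, replace=False)
for j in used:
    dense[:, j] = 0.0   # placeholder for the 25 cells that have data

# Compact alternative: keep only the used columns, named so they can be
# traced, one dict per step instead of a 3-D matrix.
compact = {f"cell_{j}": dense[:, j].copy() for j in used}

dense_bytes = dense.nbytes
compact_bytes = sum(v.nbytes for v in compact.values())
```

The dense form is larger by a factor of 2592/25 – roughly a hundredfold – before even considering the empty pre-AD1000 rows.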
I’ll carry this post through to the NH reconstruction in a little while.
Mann, M. E. 2004. On smoothing potentially non-stationary climate time series. Geophys. Res. Lett. 31: 710713.
Mann, M. E. 2008. Smoothing of climate time series revisited. Geophys. Res. Lett. 35. http://holocene.meteo.psu.edu/shared/articles/MannGRL08.pdf
Soon, W. H. S., D. R. Legates, and S. L. Baliunas. 2004. Estimation and representation of long-term (>40 year) trends of Northern-Hemisphere-gridded surface temperature: A note of caution. Geophys. Res. Lett. 31: L03209. http://www.cfa.harvard.edu/~wsoon/myownPapers-d/SLB-GRL04-NHtempTrend.pdf