How'd They Do That?

Take a look at today’s puzzle.

On the left, I’ve plotted the three EIV reconstructions for the NH using the infilled CRU series as a “target”. The data is straight from Mann’s website. One shows the “full” global network, one shows the “full” NH network (both of which I take to really be series selected as discussed in my Full Network post, but it’s always hard to be sure) and the third shows the “screened” NH network. As I understand it, SH proxies are used in the “GL full” network to teleconnect with the NH reconstruction, and this is the difference from the NH “full” network, but again I’m not 100% sure.

The right panel shows the pairwise differences between the series.
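For concreteness, here is a rough sketch of how the two panels can be generated in R (the object names glfull, nhfull and nhscreened are mine, standing for the three EIV reconstructions after reading them from Mann’s archive as ts objects):

recons=ts.union(glfull,nhfull,nhscreened)
matplot(time(recons),recons,type="l",lty=1,col=c("black","red","green")) #left panel
diffs=ts.union(glfull-nhfull,glfull-nhscreened,nhfull-nhscreened)
matplot(time(diffs),diffs,type="l",lty=1) #right panel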

Take a careful look at both panels because lots of questions jump out at me.

First, all three reconstructions go to 2006 even though there aren’t any proxies that go to 2006. How’d they do that? (Of course, we can’t tell by consulting the EIV code, because, contrary to the representation by PNAS, working EIV code isn’t provided.) Did they splice the instrumental record? Could be. It’s hard to tell.

Second, all three reconstructions are identical in the calibration period. How’d they do that? Is this from splicing or overfitting?

Third, look at the year 600 in the right hand panel. The black and red series essentially flip orientation. How’d they do that? It looks like there’s a reconstruction step and the regression sign for a series flips over when the composition of the network changes. Whatever it is, it’s not a good sign.

Fourth, what about the green series in the left hand panel? What’s going on in the early portion? How’d they do that?

Just to show that I’m not making this up, here’s a graphic from the SI in which I think I discern some of these patterns:

Stranger and stranger.

Off-center Butterworth Filters

Today I’m doing a first reconnaissance of Butterworth filters, used extensively in Mann et al 2008.

The comments here are notes and thoughts and I hope that some issues can be clarified by readers. As of 12.30 pm today, I think that the main problem that I’ve had pertains to a difference between R and Matlab implementations, but it illustrates an interesting effect.

We’ve had discussions of smoothing from time to time. Just before the release of Mann et al 2008, UC posted an interesting comment on Mannian smoothing here based on an earlier smoothing paper. Within 24 hours, Mann changed his code for lowpassadaptive.m, noting (but not describing) the change here, “minor errors fixed 9/2/08”, failing to cite or acknowledge Climate Audit as a source, although Penn State policies appear to require students to properly acknowledge sources under similar circumstances. Consult the interesting comments by Roman M, UC and others.

This thread was written just before the release of Mann et al 2008. UC observed shortly after its release that lowpassmin.m is used everywhere in Mann et al 2008, something that I’m now trying to come to grips with. My take on the earlier thread was that the issues mostly seemed to involve padding of future values and their potential impacts. This inter-relates to today’s topic.

I ran into serious problems trying to replicate Mann’s smoothing operations in R. Here’s the Dongge series, where I encountered problems with Butterworth filters applied to uncentered data (though it’s taken a while to diagnose the problem). What I observed was the huge displacement at the end of the series (which seems to be “zero-seeking” behavior at the end point).

If you are so inclined, you can download Mann’s version of this series as follows:

url="http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/data/proxy/allproxy1209/dongge.ppd"
test=read.table(url,colClasses="numeric",skip=3)
dongge=ts(test[,2],start=test[1,1],end=1996);x=dongge
temp=!is.na(x);sum(temp);year=c(time(x))[temp]
x=ts(x[temp],start=min(year)); N=length(x); N #2003

The figure below shows the results of a simple Gaussian filter and an order-10 Butterworth filter with frequency 0.05, padded according to lowpassmin.m (as shown in the SI), with selection among padding alternatives according to Mann’s procedures. In this example, padding was not an issue, as all the padding alternatives led to negligibly different results. The problem lay entirely with the implementation of the Butterworth filter, perhaps with the filtfilt algorithm. (The results are actually worse than indicated on this graphic, as the overshoot on the right goes FAR off the chart.)

Here’s the code for my implementation of the Butterworth filter (using the signal package in R), which has an implementation of filtfilt (which I presume does the same thing as the Matlab filtfilt, but this may be an issue):

library(signal)
mannsmooth=dongge
bf=butter(10,.05,"low");bf
M=10
mannextend=array(rep(x,9),dim=c(N,9))
#start padding: mean, reflection, and reflect-and-flip (minimum roughness)
dummy1=cbind(rep(mean(x[1:M]),M),x[(M+1):2],2*x[1]-x[(M+1):2]); dummy1=cbind(dummy1,dummy1,dummy1)
#end padding: the same three schemes
dummy2=array(c(rep(mean(x[(N-M+1):N]),3*M),rep(x[(N-1):(N-M)],3),rep(2*x[N]-x[(N-1):(N-M)],3)),dim=c(M,9))
mannextend=rbind(dummy1,mannextend,dummy2);dim(mannextend) #2023 9
#extend beginning and end with 3 padding schemes (9 combinations in all)
working=apply(mannextend,2, function(x) filtfilt(bf,x) )
test=ts(working[(M+1):(N+M),],start=tsp(x)[1])
fred=apply(test-x,2,function(x) sum(x^2)) #squared deviation for each alternative
index=min((1:9)[fred==min(fred)]);index #pick the combination with lowest deviation
mannsmooth[temp]=working[(M+1):(N+M),index]

[Note: for the subsequent graphic below, I’ve centered before filtering and then added back the mean.]

I’ve corresponded with UC and Jean S about this over the last couple of days. It seems that using Butterworth filters (and filtfilt) in R on off-center series introduces a problem that is unrelated to end-point padding. UC observes:

This might cause the difference with Matlab. filtfilt.m sets the initial state + does minimum roughness, and centering doesn’t matter. If zeros are padded, centering probably matters.

I’m pretty sure that this is the difference.

Sorting this out has not been entirely easy. In an earlier post, UC noted that the R-manual says:

“In this version, we zero-pad the end of the signal to give the reverse filter time to ramp up to the level at the end of the signal.”

and in another comment on the prior thread, UC observed that the Matlab filtfilt manual stated:

filtfilt reduces filter startup transients by carefully choosing initial conditions, and by prepending onto the input sequence a short, reflected piece of the input sequence. For best results, make sure the sequence you are filtering has length at least three times the filter order and tapers to zero on both edges.

[2 pm Eastern: Dealing here only with the zero-tapering recommendation, UC specifically reviewed this question and emailed me that his view is that the zero-tapering part of the recommendation doesn’t matter given the Matlab implementation of filtfilt and he doesn’t understand why it’s in the manual. Arggh.]

There are some odd comments about filtfilt in the manuals. The manual for the R-package “signal” says of the implementation of filtfilt:

This corrects for phase distortion introduced by a one-pass filter, though it does square the magnitude response in the process. That’s the theory at least. In practice the phase correction is not perfect, and magnitude response is distorted, particularly in the stop band.

In response to an inquiry from a CA reader, Matlab (worryingly) stated:

The filtfilt function in Signal Processing Toolbox does not implement the initial condition proposed in Gustafsson paper. However, in his paper, Gustafsson pointed out that we used to have a bug in the initial condition we implemented in filtfilt. Therefore, we put Gustafsson’s paper as a reference to the function.

Padding was raised as a problem in the previous thread. UC observed to me today that the padding in lowpassmin.m as used in Mann et al 2008 is only one-third of the padding used in lowpass.m as reported in Mann’s 2008 smoothing paper! The padding code in lowpassmin.m (the reconstruction paper) is:

fn=frequency*2;
npad=1/fn;

while the corresponding padding in the smoothing paper lowpass.m is three times larger:

fn=frequency*2;
% pad far enough to remove any influence of boundaries of padded series on the interior
npad=3*round(1/fn);

This would be 30 years of padding if the code in Mann et al (GRL 2008) were used, rather than 10 years of padding for the code in Mann et al (PNAS 2008). One would really hope for a more stable software platform.
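In R terms, the arithmetic at the frequency of 0.05 used in the Dongge example is simply:

fn=0.05*2
npad=1/fn #10 years - lowpassmin.m (PNAS 2008)
npad3=3*round(1/fn) #30 years - lowpass.m (GRL 2008)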

There are some disquieting comments about both Butterworth filters and filtfilt in technical literature and manuals, though I’m not sure that these comments are relevant to the present situation. One caveat here says:

This filter is usually used in frequency domain work. It offers great slope and phase match, but adds lots of overshoot/preshoot errors to a transient or single shot event. A Butterworth filter is good for frequency domain analysis, but bad for time domain data acquisition.

Despite all these caveats, centering the data makes the problem disappear, as shown below. Here I first centered the series, then applied the Butterworth filter (there was negligible difference between 10 and 30 years of padding), then added back the mean to restore the original level. Whatever the theoretical concerns over padding and filtfilt, this smooth looks OK for practical purposes.
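A minimal sketch of the workaround, reusing x and bf from the code above (no padding shown; add it as before if desired):

mu=mean(x)
centered=filtfilt(bf,as.numeric(x)-mu)+mu #center, filter, then add back the mean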

Does any of this matter? Dunno. I’m not really sure why this smoothing is done before standardization. I guess the moral will be to try the CPS recon without using Mannian smoothing and see if it makes any difference. (Once this bottleneck is removed, we’ll be pretty close to being able to emulate the CPS recon, though the EIV procedure still is quite distant.)

Roman:

Now when filtfilt.m receives this padded result, it adds further padding at each end as well. In the case used by Mann, Butterworth (10, .025), it pads each end with 30 more values which are both reflected and flipped about the ends of the already padded sequence.

The "Full" Network

The “full” network isn’t.

Mann et al 2008 describe a “full” network consisting of 1209 proxies:

We made use of a multiple proxy (“multiproxy”) database consisting of a diverse (1,209) set of annually (1,158) and decadally (51) resolved proxy series … Reconstructions were performed based on both the “full” proxy data network and on a “screened” network (Table S1)

The locations of the “full” network are illustrated in their Figure 1, captioned “spatial distribution of full (A) and screened (B)” network. Their Figure S7 compares “long-term CPS NH land (a) and EIV NH land plus ocean (full global proxy network) (b) with and without using tree-ring data”. Their Figure S8 compares “long-term CPS NH land (a) and EIV NH land plus ocean (b) reconstructions (full global proxy network)”. Their Figure S11 shows “plots of the NH CPS Land reconstruction (A) and EIV Land plus ocean (full global proxy network)”.

Table S1 shows breakdown counts by dendro and non-dendro, by annual-resolved and total, by NH, SH and Global. The top portion of Table S1 is excerpted here:

The problem is that their actual calculations appear almost certainly not to have used the network with the count statistics shown in Table S1, but a truncated version of that network, with their “full” network using only 663 (560 NH) of the 1209 sites (1036 NH). Jean S has run the gridproxy.m program using the FULL option and so we have been able to “prove” this for this part of the program (CPS) through an actual run of the program. (I’m not sure that the EIV portion of the program can be run with present code.)

First, here is a table showing counts that compare to the top right counts (NH all) for Table S1. The column headed Matlab shows the counts obtained from the gridproxy.m program, while the column headed Reported shows the corresponding column from Table S1. I calculated the other 3 columns from first principles and will explain them below. For now, you can see that my NH total in the third column matches the counts from gridproxy.m exactly in nearly every case and in no case differs by more than 1.

NH Dendro   NH Other   NH Total   Matlab   Reported
421         139        560        560      1036
297         129        426        427      700
162         124        286        287      408
90          120        210        211      277
62          39         101        102      151
28          34         62         62       102
19          30         49         49       77
11          30         41         41       63
6           24         30         30       46
4           21         25         25       40
4           20         24         24       37
4           20         24         24       34
4           19         23         23       33
4           18         22         22       28
4           18         22         22       28
3           18         21         21       27
2           17         19         19       24

An Unreported Screening Procedure on the Full Network
The difference appears to result from an unreported screening procedure on the “full” network.

Mann’s SI does describe a screening procedure on his “screened” network:

Screening Procedure. To pass screening, a series was required to exhibit a statistically significant (P < 0.10) correlation with either one of the two closest instrumental surface temperature grid points over the calibration interval (although see discussion below about the influence of temporal autocorrelation, which reduces the effective criterion to roughly P < 0.13 in the case of correlations at the annual time scales)

The implementation of a procedure along these lines can be discerned in gridproxy.m in the case (confusingly) entitled “WHOLE”, which actually means screened. At one stage of the program, correlation hurdles are defined for various proxy classes (using the proxy class codes to set the hurdle). “Annually” and “decadally” resolved proxies have different benchmarks. Documentary series and speleothems (the 5000 and 6000 series) have somewhat different benchmarks, and corals (the 7000 series) are checked for negative correlation. The relevant lines are:

case 'WHOLE'
ia=4; %% annually resolved
ib=5; %% decadally resolved
corra=0.106; %% 9000,8000,7500,4000,3000,2000
corra2=-0.106; %% 7000
corra3=0.136; %% 6000, 5000
corrb=0.337; %% same as above +1
corrb2=-0.337;
corrb3=0.425;

Later in the program, they test the observed correlation for each proxy against the relevant benchmark (their code is very inelegant; they speak Matlab with a heavy Fortran accent). For the “screened” network, corra is 0.106.

for i=1:m1-1 % This is for searching annually-resolved proxies
if (z(3,i)==9000 | z(3,i)==8000 | z(3,i)==7500 | z(3,i)==4000 | z(3,i)==3000 | z(3,i)==2000) &...
x(kk,i+1)>-99999 & x(kkk,i+1)>-99999 &...
z(1,i)>=ilon1 & z(1,i)<=ilon2 & z(2,i)>=ilat1 & z(2,i)<=ilat2 & …>=corra
n=n+1;
end

For corals and speleothems, they do it a little differently, testing the absolute value of the correlation:

… z(1,i)>=ilon1 & z(1,i)<=ilon2 & z(2,i)>=ilat1 & z(2,i)<=ilat2 & abs(…)>=corra3

Now watch what happens in the “FULL” case. Here’s how they set their hurdles. Can you see what happens?

case 'FULL_'
ia=4;
ib=5;
corra=0;
corra2=0;
corra3=0;
corrb=0;
corrb2=0;
corrb3=0;

If a series has a negative correlation to temperature (as about half of the tree ring series do), the test shown above will lead to the exclusion of this series. So this test either intentionally or unintentionally eliminates all the series with negative correlations to gridcell temperature – thus the reduction in NH count from 1036 to 560. Jean S got this count from running gridproxy.m and I got this number by independently comparing reported correlations to a benchmark of 0. So while it’s not impossible that I’ve misinterpreted something here, until we see some alternate explanation, I think that this interpretation should be regarded as valid.

Now this creates problems – both with the statistical significance of what was reported and with the accuracy of the methodological description in the PNAS article, both familiar themes in this corpus.

We’ve discussed on many occasions that you can “get” a HS merely by picking upward-trending series from networks of red noise. (David Stockwell had a good note on this phenomenon on his blog a couple of years ago; my first experiments of this type were on the cherry picks in the original Jacoby network.) Since gridcell temperature series from the mid-19th century to the late 20th century by and large have an upward trend, this test is equivalent to a simple pick operation. Whatever the merits of effectively screening the data set for upward-trending data, this affects statistical benchmarks, as the appropriate null requires applying the same operation to red noise networks, as the toy simulation below illustrates.
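Here is a toy simulation of my own (this is not Mann’s code): screen AR1 red noise against an upward-trending “instrumental” target and average the survivors.

set.seed(42)
yrs=1400:1995; n=length(yrs)
noise=replicate(1000,as.numeric(arima.sim(list(ar=0.4),n))) #red noise "proxies"
calib=(n-145):n #the last 146 pseudo-years, ~1850-1995
target=seq(0,1,length.out=146)+rnorm(146,sd=0.2) #trending "instrumental" target
r=apply(noise,2,function(x) cor(x[calib],target))
passed=noise[,r>0.106] #the same hurdle as the "screened" network
plot(yrs,rowMeans(scale(passed)),type="l") #flat shaft plus 20th century blade

The survivors match the target’s trend in the calibration period but average out to noise before it – a hockey stick from pure noise.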

Secondly, whatever one may think of the Mannian PC algorithm (and defenders are reduced to a pretty hard core right now), few people defend the non-reporting of their modification of the PC algorithm and their failure to assess the statistical significance and properties of these changes (quite aside from whether they can “get” a HS using some other method). We’re into something pretty similar here. Again there seems to be an unreported screening operation on the “full” network. The value of archiving code is once again demonstrated, as the unreported operation can only be discerned through parsing code. Because Mann et al commendably archived code concurrent with publication, it hasn’t taken years to identify the issue. (I note that the precise problem with Mannian PCs was likewise only identified after a little bit of code was inadvertently left in a directory identified for the public after MM03.) So code really does help pin down problems.

Code and realcode

Mann et al (PNAS 2008) states:

All data used in this study and Matlab source codes for implementing the algorithms described, as well as additional supporting information are available at http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07.

They did make a much better effort than others in the field, but unfortunately the reality falls short of the PNAS representation. They have already coopered up some defects in the original documentation in response to observations made at Climate Audit e.g. by adding “original” data to the archive in early September (notably failing to acknowledge us, presumably following the example set by industry leader, James Hansen).

One of the sections of code that was added post-publication was the directory http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/code/codeprepdata/infillproxydata/ , which contained code for the infilling of proxy data. There’s something interesting about the code in this directory.

First take a look at the programs in the directory http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/code/coderecon/ supposedly containing code to do the EIV reconstruction. This contains 14 programs:

center.m gcvfctn.m gcvridge.m iridge.m mridge.m nancov.m nanmean.m nanstd.m nansum.m newregem.m peigs.m pttls.m regrechigh.m regreclow.m

Now return to the directory http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/code/codeprepdata/infillproxydata/ which supposedly contains code to do proxy infilling. It contains 24 programs, of which 14 program names are identical to the names in the recon code directory. In a quick spot check, it appears that it is not just the names of the programs that are identical, but the programs themselves are identical, even up to the file names of output.

The recon code directory lacks any programs comparable to the 3 covering programs (listed below) which actually carry the water in the infillproxydata operations. I originally wrote that it also lacked a readme – my partial error. As observed by a reader below, the recon code directory does contain a readme, which states the following:

This directory contains all the codes used for reconstuction. Users need to run regrechigh.m for the reconstruction of high-frequency band, and run regreclow.m for the reconstruction of low-frequency band. The other codes are subroutines that have brief description of the function of the codes.

The infillproxydata code directory includes a Readme, which states that the following 3 programs need to be run in order to execute the operations:

1) trimdata1227nolu.m 2) masterinfillproxy.m 3) backsiginfilled1227nolu.m

Given that the programs regrechigh.m and regreclow.m in the recon code directory are identical to the corresponding programs in the infillproxydata directory, it seems odd that the same program can be used in both places in an identical way and that covering programs are needed in the one case, but not the other. Thoughts on this welcomed.

Proxy Counts: Today’s Crossword Puzzle
Table S1 contains tables showing the counts of proxies starting before a specified date. There are 12 such tables. I’ve excerpted the tables at the bottom of the first page of the SI showing the global (GLB) proxy counts; the 3 columns on the left show “annually” resolved proxies, the 3 columns on the right “all” proxies. The classification into “annual” is not provided anywhere and appears to have some traps for the unwary.

I tried to replicate the counts shown in the 3 columns to the left (code shown in a comment below). My counting method replicated the top line counts of the 3 right-hand columns.

The “dendro” count (1032) equals the total of the code-9000 and code-7500 proxies. In passing, this means that there is another Mannian trap for the unwary: some dendro series are classified as non-dendro – the ones with code 3000. These are two of Cook’s Tasmanian dendro series, the Jacoby-D’Arrigo Mongolian series and the Taymyr series, all long series (through the MWP). One feels that this detracts somewhat from whatever point was being made in the “non-dendro” reconstructions. The Luterbacher series (instrumental in their later portion) are included among the “other” proxies, but are really sui generis.
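As a sketch of the count (the object here is hypothetical: info stands for the 1209-row proxy inventory from Mann’s archive, with the proxy class code in a column named code):

sum(info$code %in% c(9000,7500)) #the 1032 "dendro" count
sum(info$code==3000) #the dendro series classified as non-dendro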

However, as shown below, I was unable to replicate the counts for any earlier period. I place this before you as today’s crossword puzzle. Given that observed counts exceed Mann counts in most periods, the discrepancy is a real braintwister.

Dendro   Other   Total
1032     177     1209
926      154     1080
553      145     698
317      139     456
197      58      255
117      52      169
90       47      137
73       45      118
58       37      95
43       34      77
31       31      62
27       30      57
24       29      53
19       28      47
17       25      42
11       25      36
10       24      34
10       24      34
9        23      32

More on the MXD Data Set

On Sep 9, 2008, I sent an FOI request to the University of East Anglia, requesting a copy of the MXD data set as provided to Mann et al. Today (Oct 2, 2008), I was notified that they would provide this data and, sure enough, the data is now posted at http://www.cru.uea.ac.uk/~timo/datapages/mxdtrw.htm under the heading Rutherford et al 2008.

There are some puzzles.

The website reports the use of 341 sites in Rutherford et al 2005, while the text of Rutherford et al reports the use of 387 sites. So one or the other is incorrect. An earlier article (Briffa et al, Holocene, 2002a) used 387 sites, listed here. I presume that the website is correct and the article is wrong, and that a small corrigendum on this matter should be issued. All 341 sites said to have been used in Rutherford et al 2005 are included in the list of 387. Why were 46 sites removed from the network? Surely some sort of explanation should be provided.

The website (as of Oct 2, 2008) states:

The values after 1960 are a combination of information from high-frequency MXD variations and low-frequency instrumental temperature variations. We recommend, therefore, that the post-1960 values be deleted or ignored in any analysis that might be biased by the inclusion of this observed temperature information, such as the calibration of these data to form a climate reconstruction, or comparision of these data with instrumental climate observations for the purpose of assessing the ability of these data to represent temperature variability.

This is also a very puzzling comment, as Rutherford et al 2005 nowhere mentions the blending of instrumental temperature variations back into proxy data after 1960. And if this was done, it is rather troubling. The explanation in Rutherford et al 2005 (Briffa and Osborn being coauthors with the Mann crowd) said:

Because the age-banding method requires large numbers of samples throughout the time period being studied, it has been applied only at a regional scale for the MXD network used here, rather than at the level of the 387 original site chronologies.

OSB therefore worked first with the traditionally standardized data at the individual chronology scale and gridded them to provide values in 115 5° by 5° grid boxes (26 available back to A.D. 1400) in the extratropical NH (Fig. 1b). They then developed temperature reconstructions by the local calibration of the MXD grid-box data against the corresponding instrumental grid-box temperatures.

The “missing” low-frequency temperature variability was then identified as the difference between the 30-yr smoothed regional reconstructions of Briffa et al. (2001) and the corresponding 30-yr smoothed regional averages of the gridded reconstructions. OSB add this missing low-frequency variability to each grid box in a region. After roughly 1960, the trends in the MXD data deviate from those of the collocated instrumental grid-box SAT data for reasons that are not yet understood (Briffa et al. 1998b, 2003; Vaganov et al. 1999). To circumvent this complication, we use only the pre-1960 instrumental record for calibration/cross validation of this dataset in the CFR experiments.

As I read the above, it describes a sort of coercion of the individual gridded series at “low frequency” to the corresponding “low frequency” shape of the regional ABD data (which is available at WDCP, by the way). However, it’s hard to be sure right now.
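Here is a minimal sketch of that reading in R (the names are mine, and a 30-year running mean stands in for their smoother; abd is the regional age-banded series, grid a years-by-gridbox matrix of local reconstructions):

smooth30=function(x) stats::filter(x,rep(1/30,30),sides=2)
missing.lf=smooth30(abd)-smooth30(rowMeans(grid))
grid.adj=grid+as.numeric(missing.lf) #add the missing low frequency back to each box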

Here is a plot of average of the Briffa-Osborn gridded data, with dotted red line showing the part deleted in Mann et al 2008 (where Osborn and Briffa are coauthors).

Note that Briffa and Osborn also archived today various data used in IPCC AR4 graphics – previously unavailable in these versions.

It’s Saturday Night Live

When you’re trading in puts and calls (or derivatives), it’s important to know the sign of the relationship between the value of the derivative and the market. Short positions will go up in value as the market goes down. And, unfortunately, you don’t get to decide afterwards whether you wanted to be short or long. Proxies in climate can, in a sense, either be “short” or “long”, in the sense that the values of some proxies (e.g. coral dO18) are said to go down with higher temperatures, while the values of other proxies (e.g. ice core dO18) are said to go up with higher temperatures.

One feels that it is not asking too much of paleoclimatologists to know the expected sign of a proxy derivative. Traders would like to decide on the sign of a proxy derivative after the fact, by taking a correlation to market performance, but this luxury is denied to them, as it should be denied to climate scientists.

In Mann et al 2008, there is a truly remarkable example of opportunistic after-the-fact sign selection, which, in addition, beautifully illustrates the concept of spurious regression, a concept that seems to baffle signal-mining paleoclimatologists. For this example, we turn to the highly HS-shaped Finnish sediment series of Tiljander et al 2003. (Update: We reported this in a PNAS Comment, to which Mann responded. See continued discussion at Upside Down Mann and the Peer Reviewed Literature.)

Tiljander et al cored varved sediments from Lake Korttajarvi, Finland, going back through most of the Holocene. In Tiljander et al 2003, they distinguished the amount of mineral and organic matter in each varve. The basis for using mineral and organic matter as climate proxies is set out as follows:

The amounts of inorganic and organic matter, form the basis of the climate interpretations. Periods rich in organic matter indicate favourable climate conditions, when less snow accumulates in winter by diminished precipitation and/or increased thawing, causing weaker spring flow and formation of a thin mineral layer. In addition, a long growing season thickens the organic matter. More severe climate conditions occur with higher winter precipitation, a longer cold period and rapid melting at spring, shown as thicker mineral matter within a varve.

The caption to their Figure 5 reports the following link between X-ray density and their climate mechanism:

High X-ray density corresponds to high amount of mineral matter (light grey value tints in X-ray film) and low X-ray density corresponds to dark grey values caused by a higher proportion of organic matter.

Putting the two paragraphs together: warmer climate favors more organic material and thus a low X-ray density. In order to show warm values at the top of a graph, you need to invert the plot (i.e. you have to pay attention to the sign of your climate derivative).
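In R terms (year and xray stand for the Tiljander data columns; names mine), the inversion is just a reversed y-axis:

plot(year,xray,type="l",ylim=rev(range(xray,na.rm=TRUE))) #"warm" (low density) at top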

In the figure below, on the left, I show an excerpt from their Figure 5 which they show vertically (only the X-ray density is shown here – consult the original paper for the other plots.) The left portion of their Figure 5 shows an organic-rich period in the MWP, about which they say:

An organic rich period from AD 980 to 1250 in the Lake Korttajarvi record is chronologically comparable with the well-known ‘Medieval Warm Period’ (e.g. Lamb 1965; Grove & Switsur 1994; Broecker 2001). The sediment structure changes, less mineral material accumulates on the lake bottom than at any other time in the 3000 years sequence analysed and the sediment is quite organic rich (LOI ~20%). Thus, the winter snow cover must have been negligible, if it existed at all, and spring floods must have been of considerably lower magnitude than during the instrumental period (since AD 1881). According to the scenarios presented by Solantie & Drebs (2001), a 2°C increase in winter temperature would decrease the amount of snow in southern Finland significantly. Under such conditions, winter snow accumulation and intense spring floods would be rare events….

The Lake Korttajarvi record also indicates a climatically more severe period in the 17th century. Two periods, AD 1580–1630 and AD 1650–1710, are marked by an increase in both sedimentation (varve thickness) and mineral matter accumulation (relative Xray density). Also, magnetic susceptibility values are high between AD 1650 and 1710, indicating increasing mineral matter input into the lake.

They cite literature, including Hulden 2001, showing mild conditions in Finland in the MWP.

On the right, I’ve plotted the corresponding data so that “warm” grey values are on the top. I’ve also highlighted the (MWP) period identified as having elevated values of organic matter. If you squint, you can satisfy yourself that the left-hand and right-hand panels are showing the same data.

Fig. 1. Left: from Tiljander et al 2003, Figure 5; right: plot of X-ray density (inverted).

Plotted according to the climatic interpretation offered by Tiljander et al, the modern warm period shows as colder than the Little Ice Age, something which makes no sense if this data is to be used as a climate proxy. Tiljander et al provide a plausible interpretation of the “divergence” of the proxy from its climatic interpretation as a result of agricultural and construction disturbance to sediment patterns, actually tying several especially thick varves to ditch and bridge construction:

This recent increase in thickness is due to the clay-rich varves caused by intensive cultivation in the late 20th century. …

There are two exceptionally thick clay-silt layers caused by man. The thick layer of AD 1930 resulted from peat ditching and forest clearance (information from a local farmer in 1999) and the thick layer of AD 1967 originated due to the rebuilding of the bridge in the vicinity of the lake’s southern corner (information from the Finnish Road Administration).

Now let’s see what Mann et al did with this data. All of the 20th century values of varve thickness, X-ray density etc. go up like crazy with the agricultural and construction activities, as shown below for 2 of the 4 series (the other two are similar). Instead of using the climatic interpretation of the data described by Tiljander et al, Mann correlates the increases in varve thickness and changes in density and color, originating from local construction and farming, to world climate.

Figure 2. Two of 4 versions used in Mann et al 2008.

By flipping the data opposite to the interpretation of Tiljander et al, Mann shows the Little Ice Age in Finland as being warmer than the MWP, 100% opposite to the interpretation of the authors and the paleoclimate evidence. The flipping is done because the increase in varve thickness due to construction and agricultural activities is interpreted by Mann et al as a “nonlocal statistical relationship” or “teleconnection” to world climate. Mann:

the EIV approach, which makes use of nonlocal statistical relationships, allowing temperature changes over distant regions to be effectively represented through their covariance with climatic changes recorded by the network

A more convincing example of spurious regression in “peer reviewed” literature will be hard to find. After reading through this, I keep expecting someone to say:

Live from New York, it’s Saturday Night.

H/t to Howard Wiseman.

Reference:
Tiljander, M., M. Saarnisto, A.E.K. Ojala and T. Saarinen, 2003. A 3000-year palaeoenvironmental record from annually laminated sediment of Lake Korttajarvi, central Finland. Boreas 32(4): 566–577.

Mann et al 2008 and GAAP Accounting

Let’s say that you were a partner of the firm North, Hegerl and Cicerone, charged with issuing an opinion on the financial statements of Team Capital Management Inc. (TCM). And let’s say that you were doing so in heady pre-crash days when markets were going up and mark-to-market accounting was something that the companies wanted to do.

The footnotes to the TCM statements said that mark-to-market accounting had not been used for non-arms-length investments in securities issued by Briffa MXD Inc., which had shown a lack of “sensitivity” to rising world stock market prices, while mark-to-market accounting had been used for a high-flying Finnish penny stock (Korttajarvi Lake Gold). And while TCM statements did not mention their holdings in controversial Bristlecone Estates, a project that was much in the news, it turned out that mark-to-market accounting had been used on this speculation as well.

In the MD&A (Management Discussion and Analysis), the company said that they were profitable even without their investments in U.S. mortgages. Elsewhere in the footnotes, they said that TCM was profitable even without mark-to-market accounting in the Finnish penny gold stock. However, the combined effect wasn’t discussed.

Can the partners in North, Hegerl and Cicerone certify that these statements meet GAAP? Of course they can’t.

If you’re using mark-to-market accounting on the penny gold stock that went up, then you have to use mark-to-market accounting on the shares in Briffa MXD Inc. that went down. While disclosure is important, you can’t disclose your way around this sort of inconsistent non-GAAP accounting. You can’t use mark-to-market accounting on stocks that went up and not do so for stocks that went down.

In case this parable seems too harsh, here are the exact statements from Mann et al 2008. Obviously the precise issue of “mark-to-market” accounting doesn’t arise: I use this as a parable for inconsistent accounting for data that goes up as opposed to data that goes down.

Accounting for MXD
Mann et al say of their handling of the Briffa MXD series (which “diverge” from rising world markets):

Because of the evidence for loss of temperature sensitivity after ~1960, MXD data were eliminated for the post-1960 interval.

In 1998, Briffa postulated an unknown anthropogenic cause for the post-1960 downturn, but nothing has turned up 10 years later. An open alternative is that there is a non-linear upside-down U-shaped temperature response, an alternative referred to even in IPCC AR4 as follows:

Others, however, argue for a breakdown in the assumed linear tree growth response to continued warming, invoking a possible threshold exceedance beyond which moisture stress now limits further growth (D’Arrigo et al., 2004). If true, this would imply a similar limit on the potential to reconstruct possible warm periods in earlier times at such sites. At this time there is no consensus on these issues (for further references see NRC, 2006)

An opinion that you’d think the auditing firm of North, Hegerl and Cicerone would be aware of.

In respect to the accounting of Mann et al, they truncated this data merely because of a potential non-climate anthropogenic impact on the data, an impact which is postulated, but not proven.


Accounting for Finnish Speculative Stocks

The Mann et al 2008 footnotes say of the speculative Finnish sediments:

These records include the four Tijander et al. (12) series used (see Fig. S9) for which the original authors note that human effects over the past few centuries unrelated to climate might impact records (the original paper states “Natural variability in the sediment record was disrupted by increased human impact in the catchment area at A.D. 1720.” and later, “In the case of Lake Korttajarvi it is a demanding task to calibrate the physical varve data we have collected against meteorological data, because human impacts have distorted the natural signal to varying extents”)…

We therefore … compaired [sic] the reconstructions both with and without the above seven potentially problematic series, as shown in Fig. S8.

Unlike their handling of Briffa MXD data which was truncated merely because of a potential non-climate anthropogenic impact (without reporting on the impact of this truncation), Mann et al included the Finnish sediment (which went up), purporting to justify this by arguing that they were still profitable without the data. A policy which, regardless of its individual merit or lack of merit, is inconsistent with the handling of the Briffa MXD data which went down.

It’s actually worse here, because the original authors do not merely say that human effects unrelated to climate “might” impact the record. They categorically say that they were so caused. No “might” about it. For example:

This recent increase in thickness is due to the clay-rich varves caused by intensive cultivation in the late 20th century. …

There are two exceptionally thick clay-silt layers caused by man. The thick layer of AD 1930 resulted from peat ditching and forest clearance (information from a local farmer in 1999) and the thick layer of AD 1967 originated due to the rebuilding of the bridge in the vicinity of the lake’s southern corner (information from the Finnish Road Administration).

There is another shoe to drop in the handling of the Finnish sediments, which I will discuss in my next post. It appears to me that Mann et al actually have a short position on the Finnish speculative stocks, which they have inadvertently accounted for as a long position. But more on this in another post.

Accounting for Subprime Mortgages on Bristlecone Estates
A consulting report by North and associates, also sponsored by Cicerone, said that subprime mortgages on Bristlecone Estates, in particular, “strip bark” derivatives, should be “avoided”. However, Mann et al 2008 nowhere mention that strip bark derivatives remain on their books.

GAAP Accounting
It’s one thing to object individually to (1) the truncation of post-1960 Briffa MXD data due to a potential anthropogenic disturbance (which may actually be a non-linear response); (2) the use of the 20th century Graybill bristlecone chronologies despite (a) potential non-climate anthropogenic fertilization or (b) recommendations by the NAS panel not to use strip bark derivatives; (3) the use of 20th century Finnish sediment data despite an established non-climate anthropogenic disturbance.

However, the combination of the three is much worse than any one of these individually. The inconsistency of (1) and (3) is particularly opportunistic.

I make no imputation as to intent on the part of the authors, as intent is irrelevant (and, as is blog policy, I ask readers not to speculate on intent either). All that matters is what is presented. Auditors confronted with opportunistic accounting policies of this type cannot certify that the statements comply with GAAP. If North, Hegerl and Cicerone were chartered accountants and certified that such inconsistent and opportunistic practices complied with GAAP, they would be treated very severely by any supervisor of accounting practices.

The MBH98 Corrigendum

The SI for MBH98 listed 34 tree ring series that were not actually used. This was acknowledged in their 2004 Corrigendum which provided the following implausible excuse:

These series, all of which come from the International Tree Ring Data Bank (ITRDB), met all the tests used for screening of the ITRDB data used in ref. 1 (see ref. 5), except one—namely, that in 1997, either it could not be ascertained by the authors how these series had been standardized by the original contributors, or it was known that the series had been aggressively standardized, removing multidecadal to century-scale fluctuations.

A number of the unused series were generated by Schweingruber. Other Schweingruber series arising from the same collection and publication were used in MBH98, making the application of the Corrigendum excuse to the Schweingruber series implausible to say the least. Other unused series were generated by prominent dendrochronologists in published articles, whose methodologies were either published or easily available to MBH. Whatever the actual reason for the omission of the 34 series, the stated reason does not hold up.

On a small whim, I cross-checked the Mann et al 2008 network against the Corrigendum exclusions.

25 of the 34 series said to be unsuitable in the Corrigendum were used directly in Mann et al 2008; 4 excluded Schweingruber MXD series are used in the MXD grid. The only series not used in Mann et al 2008 are 4 Schweingruber RW series and one oddball (chil016, a Roig series – other Roig series are used.)

For 23 of the 25 series, the versions used in Mann et al 2008 are identical to the ITRDB version, previously said to be unsuitable in the Corrigendum. For the other 2 (arge065, cana106), the version differences were not material.
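The comparison itself is simple; a sketch (mann08 and itrdb are hypothetical names for the two versions of a given chronology, as ts objects):

both=ts.intersect(mann08,itrdb)
range(both[,1]-both[,2],na.rm=TRUE) #c(0,0) if the versions are identical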

Wouldn't it be nice…

if Team methodological descriptions were correct?

Mann says:

All ITRDB tree-ring proxy series were required to pass a series of minimum standards to be included in the network: (i) series must cover at least the interval 1750 to 1970, (ii) correlation between individual cores for a given site must be 0.50 for this period, (iii) there must be at least eight samples during the screened period 1800–1960 and for every year used.

I checked two series familiar to CA readers – Gaspe (cana036) and Sheep Mountain (ca534). Aside from any other issues with these series, the early portion of the Gaspe chronology has only a couple of samples as does the post-1983 portion of the Sheep Mountain chronology. I checked these series in both the “original” and “infilled” versions. The “original” version transcribed ITRDB crn series, including portions with fewer than 8 samples. The “infilled” version was identical to the “original” version where the original version had values. Thus, contrary to the statement in the SI, the “infilled” versions used data with only one sample.

Does it “matter”? Well, why say it if it’s untrue?

When Good Proxies Go Bad

Many of the good folks who write the papers and keep the databases seem not to use their naked eyeballs. By that I mean, they seriously think that you can invent some new procedure, and then apply it across the board to transform a group of a thousand datasets without looking at each and every dataset to see what effect it might have. Or alternatively, they simply grab a great whacking heap of proxies and toss them into their latest statistical meat grinder. They take a hard look at what comes out of the grinder … but they seem not to have applied the naked eyeball to the proxies going into the grinder.

So, I thought I might remedy that shortcoming for the proxies used in Mann 2008. As a first cut at understanding the proxies, I use a three-pronged approach — I examine a violinplot, a “q-q” plot, and a conventional plot (amplitude vs. time) of each proxy. To start with, let’s see what some standard distributions look like. The charts below are in horizontal groups of three (violinplot, qq plot, and amp/time) with the name over the middle (qq) chart of each group of three.

A “violinplot” (left of each group of three, yellow) can be thought of as two smoothed histograms back-to-back. In addition, the small boxplot in the center of the violinplot shows the interquartile range as a small black rectangle. The median of the dataset is shown by the red dot. Below each violinplot are the best estimates of the AR and MA coefficients as fit by the R function arima(x,c(1,0,1)). Note that in some cases either the MA coefficient, or both the AR and MA coefficients, are not significant and are not shown.

The “QQ Plot” (center of three, black circles) compares the actual distribution with the theoretical distribution. If the distribution is normal, the circles will run from corner to corner of the box. The green “QQLine” is drawn through the first and third quartile points. The red dotted line shows the Normal QQLine, and goes corner to corner. It is drawn underneath the QQLine, which sometimes hides it if the distribution is near-normal.

The standard plot (right of three, blue) shows the actual data versus time. Heteroscedasticity (Breusch-Pagan test) is shown above the standard plot, with a rejection value of p less than .05.
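For those who want to reproduce the layout, here is a sketch of the three-panel diagnostic (my own code, not a definitive implementation; x is one proxy series, and the violinplot comes from the vioplot package):

library(vioplot)
op=par(mfrow=c(1,3))
vioplot(x,col="yellow") #two smoothed histograms back-to-back, boxplot inset
points(1,median(x),col="red",pch=19) #median as a red dot
qqnorm(x);qqline(x,col="green") #QQ plot with line through 1st/3rd quartiles
plot.ts(x,col="blue") #amplitude versus time
par(op)
round(arima(x,c(1,0,1))$coef[c("ar1","ma1")],2) #AR and MA estimates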

Below are some possible distributions that we might find. A number of them look related to “power law” type distributions (exponential, pareto, lognorm, gamma, fdist), although there are subtle differences. These all look like the letter “J”, and seem to differ in the upper tails, the tightness of the bend, and the placement of the QQLine.

The sinusoidal, Sin + Noise, and Uniform distributions all look like the letter “S”. Comparison with the QQLine shows that there is too much data in the tails. On the other hand, the “normal squared” distribution is a signed square of a normal distribution, and has too little data in the tails.

Finally, note that the ARMA distributions are not at first blush visually distinguishable from normal distributions. However, as you can see by the blue plots, they are more lifelike.

With that in hand, let’s take a look at some of the Mann proxies. What I did was simply go through the proxies and look for anomalies, proxies that seemed strange in some way. I didn’t look at numbers, that comes later.
