TCO has done a long review of Rybski et al., 2006, Long-term Persistence in Climate and the Detection Problem, GRL 33, L06718.
Rather than burying the topic separately in a post, here is a well-behaved TCO post on the masthead. (I haven’t edited it and don’t vouch for it, other than there’s serious effort in it.) So here’s TCO on Rybski et al:
This article has gotten interest on the site because of two issues: support for AGW and citing MM03 as a “reconstruction”. Big picture: what the authors do is (1) look at the change in temp over the last 100 years (instrument) and compare it to the historic changeability of climate (in proxy reconstructions) to see how “unusual” (in the sense of odds) the recent warming is and (2) look at “long term persistence” (LTP) within the reconstructions. The discussion, a bit unnecessarily, mixes the two issues (perhaps to make the simple point (1) seem more “technical”). Really they are BOTH interesting issues on their own. The problems of LTP just make for a case where temp rises are more likely than if the data were independent (thus setting a higher bar for “remarkability” of the recent temp increase).
The article is a bit more technically written than needed, but with a little effort, one can get most of the physical insights from the text. I think I got about 85%. This discussion will contain criticism, observations and questions from reading the paper. It is organized in the order of reading the paper (by paragraph number). Some points are minor, some stylistic.
While most of my comments are negative, that does not mean I think the work without merit. I just like to note all the assumptions and things that could influence the result or the interpretation of the result.
 (reads like an abstract)
A. 6 reconstructions (Jones98, Mann99, Briffa00, Esper02, “MM03”, and Mo05) were examined.
1. The paper does not explain the selection process: Are these the only 6 reconstructions of the NH temps? Do they use the entire records or chop them to be historic only?
2. Are the reconstructions independent (in method or data)? Could one get a false picture of 6 different samples for this analysis? (I think a verbal caveat is called for here.)
3. In particular, “MM03” was not intended as a reconstruction, so selecting it misrepresents the population of reconstructions. All the rest of the samples were at least trying to solve the question of what happened, and in that sense there is some Bayesian benefit to looking at several recons. But not with MM03 (the authors don’t stand behind it as an attempt to measure temps). More troubling, MM03 was run as a variant of MBH98 (to test robustness), so it is very much non-independent of MBH in inputs and method.
4. If variants of MBH are of interest to subject to this analysis, it would be good to look at the Burger and Cubasch full factorial of MBH variants as well.
B. There is an implicit assumption (not stated, granted, but not caveated either) that the proxy recons are historical records equivalent to the instrument period (thus allowing the point (1) examination). If the proxies are contaminated by CO2 or cherry-picked then that affects the results. Same issue with the instrumental record (if it is inaccurate); in particular, the instrumental record is based on ground stations, not on satellite or balloon measurements. That might be fine. But it should be noted.
C. It’s not clear to me why we compare the instrument temp change to the proxy records. Why not compare recent proxy history (last 100 years) to overall proxy records? This type of analysis would better show the unreliability of the proxies (or their divergence from instrument), and would bring in the divergence problem and the lack of recent proxy data.
D. (Observation) The paper looks at delta temperatures from midpoint of one averaged temp period to the midpoint of another averaged period. This is where (L, m) come in. This is made more technical sounding than it needs to be.
E. (Minor) Not clear to me why m=30 (averaging period) and L=100 are “climatologically significant”. Certainly sigma ratio = 2.5 is not “climatologically significant”. I think 100 is really significant since it is closely related to the observed recent warming (especially when you have to average across a period, so you’re really looking at 100+15+15=130 years within the delta T). I would rephrase this to be more precise and thoughtful and less “puffy”.
F. The difference between the onset of detection and the “excessive odds versus historic variability” point (isubc and isubd) is just a feature of the “smoothing” or averaging inherent in the 30 year number. I’m not clear why there is a 14 year difference versus a 15 year difference.
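To make the (m, L) quantity in D and E concrete, here is a minimal Python sketch. It assumes the delta T is simply the difference between two m-year averages whose windows start L years apart; the function names and the toy record are illustrative and not taken from the paper.

```python
def moving_average(series, m):
    """Mean of each run of m consecutive values; result[k] averages series[k:k+m]."""
    return [sum(series[k:k + m]) / m for k in range(len(series) - m + 1)]


def delta_t(series, m, L):
    """Difference between two m-year averages whose windows start L years apart."""
    smooth = moving_average(series, m)
    return [smooth[k + L] - smooth[k] for k in range(len(smooth) - L)]


# Toy record (made up): a steady rise of 0.01 degrees per year over 200 years.
toy = [0.01 * year for year in range(200)]
changes = delta_t(toy, m=30, L=100)

# Years touched by one delta T: first year of the early window
# through the last year of the late window.
span_years = 100 + 30
```

On the toy trend every delta T equals the trend times L, and each value draws on 130 years of data, which is the 100+15+15 arithmetic in E.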
 This is a good introductory paragraph.
A. While the paper gives a nice hat tip cite to others who have worked on “attribution and detection problem”, none of the later discussion compares the results obtained to those other papers.
B. Is there going to be a “longer paper”? In theory, letters should be followed up by longer papers that go into more detail. Recently, in the physical sciences, people have started blowing this off. I wonder if climatology/GRL is even worse about this.
 This para has a tiny bit more detail on recons selected, but still fails to answer the questions from .
A. Also gives a false impression (to the extent of a bad error or a lie) that MM03 is a “time series supposed to describe the historical development of the variable”. (BTW, that sentence is bloated and poorly worded as well as being inaccurate wrt MM03.)
B. The authors fail to say how they obtained the MM03 data (not on file at WDCP). This is poor form, both for future readers, wanting that info, and because they fail to acknowledge or thank MM for providing the info (even knowing that it would be used in a manner per A above that they disagree with.) To the extent that they make it look like MM are failing to archive adequately (when MM03 is like a Burger and Cubasch run), that is a really nasty, nasty trick.
C. The second half of the paragraph (about LTP) should be its own paragraph. It’s a different subject.
D. The many cited “real” (I don’t like that term btw) climate records with LTP are never directly compared to the obtained results here. We just get a claim that there is similar behavior. Of course in the longer paper, this will all be better described.
 This para is more intro and is about both the moving averages and the comparisons. (Overall poor construction of the paper in layout of subjects: excess repetition and lacks a “pyramid” organization of content.)
 This para is about Figures 1 (spaghetti graph) and 2a (normal distribution of temperature deviation from the mean).
A. This should be two separate paragraphs.
B. Both figures are very tiny and hard to read.
C. The spaghetti graph would be better shown in smoothed version.
D. SG has bad background color.
E. 2a is a semilog plot (always look at semilog and log-log suspiciously because a lot of range is compressed) and the normalization by std deviation obscures the difference in range of the curves. In addition, I would like to have a numerical value of some test or percent normality. Not just this visual.
F. Last sentence has a lot of puffery. Just say that the visual confirms normal distribution (or what have you). The Gaussian comment is not needed and the “fully described” is puffery and non-quantitative.
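One way to get the numerical value asked for in E is a moment-based normality statistic. The sketch below uses the Jarque-Bera statistic as a swapped-in example; it is not something the paper reports. The data are synthetic, and the p-value formula assumes independent samples, which records with LTP would violate, so treat the p-value as a rough guide only.

```python
import math
import random


def jarque_bera(values):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 d.o.f.)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(1000)]    # synthetic, normal
skewed = [rng.expovariate(1.0) for _ in range(1000)]     # synthetic, clearly non-normal

jb_gauss, p_gauss = jarque_bera(gaussian)
jb_skew, p_skew = jarque_bera(skewed)
```

A statistic like this puts a number next to the semilog plot: the skewed sample gives a far larger value than the Gaussian one.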
 Is about the assessment of LTP in the reconstructions.
A. I could not follow the math. (Just my fault.) I don’t know what the bra ket notation means or what the triple bar equals sign means.
B. I wonder if this is a “vanilla style” of LTP or of explanation of what LTP means. Why aren’t they talking about “redness” or about ARIMA coefficients? At least with ARMA, I have an intuition what the parameters mean (storage effects and the like).
C. Is DFA analysis conventional? (very recent publications). Could we do similar analysis with something more “standard”?
D. (Figure 2b) not clear to me why none of the lines cross. Luck? Significant? Something about how the analysis was done (for graphical purposes)?
E. What is the significance of the deviations from the line (on log-log)? BTW, lines on log-log are very, very easy to get.
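For readers who, like the reviewer, have no feel for DFA, here is a rough Python sketch of first-order detrended fluctuation analysis as it is usually described: integrate the series, detrend it linearly in windows of length s, and read the exponent off the log-log slope of the fluctuation against s. The detrending order, the scales and the synthetic inputs are all assumptions; the paper's exact variant may differ.

```python
import math
import random


def linear_residual_variance(segment):
    """Mean squared residual after removing a least-squares straight line."""
    n = len(segment)
    x_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (x - x_mean)) ** 2
               for x, y in enumerate(segment)) / n


def dfa_fluctuation(series, s):
    """Root-mean-square fluctuation F(s) over non-overlapping windows of length s."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:            # cumulative sum of deviations from the mean
        total += value - mean
        profile.append(total)
    segments = [profile[k:k + s] for k in range(0, len(profile) - s + 1, s)]
    return math.sqrt(sum(linear_residual_variance(seg) for seg in segments)
                     / len(segments))


def dfa_exponent(series, scales):
    """Least-squares slope of log F(s) against log s."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(dfa_fluctuation(series, s)) for s in scales]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))


rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(4000)]   # uncorrelated noise
walk, level = [], 0.0
for step in white:                                   # random walk built from it
    level += step
    walk.append(level)

scales = [8, 16, 32, 64, 128]
alpha_white = dfa_exponent(white, scales)   # expected near 0.5
alpha_walk = dfa_exponent(walk, scales)     # expected near 1.5
```

Both synthetic series give a straight-looking log-log line; only the slope tells them apart. That bears on the remark in E about how easy such lines are to get.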
 Is a comparison of Mo recon to an artificial LTP function.
A. Not clear to me why so much text space (and two of the only readable figures) are dedicated to this essentially didactic point. (nature of LTP).
B. If one wants to make this didactic point (nature of LTP), better done as on Steve’s blog by showing some iid, some ARMA of different types, etc.
C. The Bunde citation sentence seems odd and forced. Is there really something so special about this 2005 paper on heart rates that needs to be compared to these climate records? Wouldn’t it be better to cite a classic, older paper (the Nile hydrology)? And it’s such a throwaway remark that it doesn’t seem to mean much, the way it’s said.
D. The function used to generate 3b is not ARMA and seems overly technical (not sure why we need to look at a 1996 reference instead of a textbook).
 This para is about variability in the instrument record.
A. Not clear to me what the author means by use of the word “natural” within the period of instrument time. In the past it seems that he uses this word for “historic times”.
B. I don’t see a graph or a table where the results of this “examination within the instrumental period” is recorded. Confusing para really.
C. Oh wait, I think I get it. They are using Tsubi not for T(instrument), but for T(ith reconstruction)? Confusing. Grr. (If the para is about T(instrument), then figure 2a is missing a set of records for the instrument.) Actually, come to think about it…if we have a hockey stick occurring, how can we possibly have this normal distribution at all for any records? Wouldn’t it be skewed normal?
D. Is it a truism that if the overall dataset has normally distributed (from the mean) temperatures then the m, L results will be normally distributed? Not sure I buy that as a truism.
E. “very unreasonable” is puffery. Just say “low”.
 Is about the relationship of the standard deviation to the (m, L) parameters. Basically, as you get a larger L (several values shown) or smaller m (two values shown), you get a larger standard deviation. This makes implicit sense. Smaller m means less “smoothing”. Larger L means end points further apart (less correlated).
A. The stuff about “error bars” of the artificial functions is just another proof that the climate records have autocorrelation or can be compared to classic autocorrelation functions.
B. Why are only 4 records looked at?
C. What is the significance of crossings of the error bars. Is there a better (numeric) way to describe goodness of fit?
D. The comment about significantly greater variability in these records than in uncorrelated ones ought to be proven/quantified. I believe them. I just think if they want to make the point, they should prove it.
E. As expected, the “bumpier recons” have higher SD than the smooth ones.
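The point in D can be quantified with a small simulation. The Python sketch below compares the standard deviation of the (m, L) delta T for uncorrelated noise against an AR(1) series of the same variance. AR(1) is short-memory and only a stand-in here, not the LTP model of the paper, and the coefficient 0.9, the series length and the function names are arbitrary assumptions.

```python
import math
import random
import statistics


def delta_t(series, m, L):
    """Difference between two m-point averages whose windows start L points apart."""
    smooth = [sum(series[k:k + m]) / m for k in range(len(series) - m + 1)]
    return [smooth[k + L] - smooth[k] for k in range(len(smooth) - L)]


def ar1(n, phi, rng):
    """AR(1) series scaled so that its stationary variance is 1."""
    innovation_sd = math.sqrt(1.0 - phi ** 2)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, innovation_sd)
    return out


rng = random.Random(2)
n = 20000
iid = [rng.gauss(0.0, 1.0) for _ in range(n)]   # uncorrelated, unit variance
red = ar1(n, 0.9, rng)                          # autocorrelated, unit variance

sd = {}
for m, L in [(5, 20), (30, 100)]:
    sd[("iid", m, L)] = statistics.pstdev(delta_t(iid, m, L))
    sd[("ar1", m, L)] = statistics.pstdev(delta_t(red, m, L))
    # For uncorrelated unit-variance data with non-overlapping windows,
    # the standard deviation is sqrt(2/m) whatever L is.
    sd[("theory_iid", m, L)] = math.sqrt(2.0 / m)
```

With equal overall variance, the autocorrelated series gives a visibly larger standard deviation at both parameter choices. That is the kind of number one would want next to the paper's claim.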
 Para is about figure 5a and 5 b.
A. 5a just shows a good view of the instrumental record. Para makes the point (which we grasped earlier) that the (m, L) blabla is just a deltaT on a smoothed graph.
B. 5b jumps two steps down the explication train by both having the delta T and by dividing it by the SD of the reconstruction(s): for L=20, m=5. The different reconstruction SDs are just scalars, so what you have are six versions of the same curve, just shifted different amounts from the axis.
C. During the instrumental time, there are a couple of periods (1940ish, 1995ish) where the curves veer over significance limits. The text discusses the likelihood of this happening (pretty unlikely).
D. It’s not clear to me if this likelihood equates to a per period sense. That is, if we have something that is 1/44 likelihood, does it mean it happens once per 44 years (on average)?
E. As might be expected, given the general warming experienced in last 100ish years, there are no crossings of the significance boundaries in the negative direction. A couple come close. I think some comparison of negative and positive excursions should be made (and it helps the warmer case).
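On the 1/44 figure in D: it matches the one-sided Gaussian tail beyond 2 standard deviations. That suggests (an inference, not something stated in the text above) that the odds are tail probabilities for a single normalized delta T. A quick check in Python, with an illustrative helper name:

```python
from statistics import NormalDist


def one_sided_odds(ratio):
    """Return N such that a standard normal value exceeds `ratio` with probability 1/N."""
    tail = 1.0 - NormalDist().cdf(ratio)
    return 1.0 / tail


odds_2 = one_sided_odds(2.0)     # delta T / sigma = 2
odds_25 = one_sided_odds(2.5)    # delta T / sigma = 2.5
```

If that reading is right, the number is a probability for any one (m, L) difference, not a waiting time. Successive differences share most of their averaging windows and are strongly correlated, so “once per 44 years” does not follow directly.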
This para is about the same concept but with L=100, m=30 (incidentally, why vary both at the same time?). Because the instrumental record is only about 150 years long and we need 130 years of data to start to do this method, the graph doesn’t show much extent of time (1970-1990).
A. The curves start out significant (or near significant) and with time all show some excess of the control limits.
B. The 14 year delta of isubc versus isubd is finally clear here (versus a 15 year delta). The smooths are for 30 record years, thus covering 29 calendar years duration. They do 14 ahead and 15 behind. I think it would just be better to do 31 year smooths and keep it symmetrical, but no biggie.
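The 14-versus-15 arithmetic in B can be laid out explicitly. This small Python sketch assumes, as B describes, that an even window puts the extra year behind the centre; the helper name and the example year are made up.

```python
def window_years(center, m):
    """Years covered by an m-year smooth centred on `center` (extra year goes behind)."""
    behind = m // 2
    ahead = m - 1 - behind
    return list(range(center - behind, center + ahead + 1))


w30 = window_years(1980, 30)   # 15 years behind, 14 ahead
w31 = window_years(1980, 31)   # 15 behind, 15 ahead (symmetric)
```

With m=30 the smoothed value for a given year needs data only 14 years past it, which is where a 14 year lag would come from; a 31 year smooth would make it 15 on both sides.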
 This para just gives the different years where we cross significance thresholds and when detectable, (because of the smooth). The bumpier curves (Mo and Jones) give a later detection limit (it’s just a function of the larger scalar divisor in the SD).
 Conclusion para (claims that observed warming is inconsistent with natural variation).
A. This is a reasonable, mathematical expression of the proxy records versus instrument. I think it is better to drill things down into math than to just show spaghetti graphs (I couldn’t for the life of me tell from the spaghetti whose side it supported). This at least moves it to math.
B. There is a gratuitous comment about the “quality controlled” instrumental data. But the earlier text of the article did not establish the quality level of the instrumental data. I’m not a UHI whiner, but this kind of gratuitous comment in a conclusion is uncalled for. A conclusion should have no new facts/points, just a natural denouement.
C. Similarly it seems gratuitous and forced and even a little political to frame this result in terms of supporting the arguments of several other “detection and attribution” writers rather than to just give own conclusion. And some summary of these earlier results/arguments was never given earlier in the paper. It’s another case of bringing in new stuff in the conclusion (and in this case stuff that one has to go to a cited reference to even use…or just be part of the club and be familiar with those papers).
D. I think the prominent (by having it in conclusion) and almost titillating comment about the similarity of results from MM03 and MBH is a bit nasty. There are a lot of criticisms of MBH by MM that are not captured in this particular analysis of the two “records”. One might get entirely the wrong point from this paper. Also (as discussed before) the reconstructions are closely dependent (given that one was a variation for effect of the other). However, it is interesting that this analysis shows that the prominent hump at 1450 of the MM03 variation may not be so significant from a mathematical standard deviation/confidence interval point of view.