Since I showed the effect of smoothing on the relationship of Dunde to temperature, I thought that it would be useful to post up a table showing the Jones et al [1998] proxy correlations to temperature versus my calculations using HadCRU2.
This table shows: col 1 - my calculated correlation for 1881-1980 using HadCRU2 (I haven’t tested why they used 1881-1980); col 2 - correlation as reported in Jones et al [1998]; col 3 - original reported correlation, according to Table 4 of Jones et al [1998]; col 4 - t-statistic from obtaining the correlation by a regression model (which matches the usual calculation). I’ve not attempted to deal with spurious t-statistics issues here. In a number of cases, Jones et al [1998] already reported lower correlations than the original publication, e.g. Jacoby treeline; Lenca and Rio Alerce; Galapagos corals. In some cases, my calculations using HadCRU2 are lower again, sometimes substantially so. For example, the correlation for the Jacoby NH treeline study is reduced here to 0.07, down from the original 0.72 (Jones: 0.37). The same holds for the Rio Alerce and Lenca series. A few series show higher correlations, e.g. Svalbard melt.
Jones says (p. 461) :
"The most surprising correlation in Table 4 is for ENG. The value is relatively low because the gridbox series (50-55N, 0-5W) incorporates up to 10 station records and some SST data while ENG is based on only 3 inland stations."
It is surprising that the reconstruction of gridcell temperature from Tornetrask tree rings (Fenno in the table; this includes the "adjustment" discussed before) supposedly tracks its gridcell more accurately than the Central England temperature series tracks the HadCRU2 gridcell. This seems unlikely and suggests that some portion of this high correlation may be spurious. A t-statistic seems to me to be a more sensible approach, but there are no surprises in the t-statistics here. Relative to the usual 2.96 benchmark, 5 of 7 SH series are insignificant, as are 3 of 10 NH series.
Series | Calculated | Jones 98 | Original Report (Jones 98) | tOLS |
Fenno | 0.73 | 0.79 | 0.74 | 10.59 |
Urals | 0.61 | 0.83 | 0.82 | 6.01 |
Jasper | 0.56 | 0.48 | 0.42 | 6.58 |
Svalbard | 0.30 | 0.08 | NA | 2.76 |
C England | 0.62 | 0.84 | NA | 7.80 |
C Europe | 0.94 | 0.90 | NA | 27.19 |
Kameda.melt | 0.10 | 0.17 | NA | 0.98 |
Jacoby.treeline | 0.07 | 0.34 | 0.73 | 0.52 |
Briffa.WUSA | 0.38 | 0.60 | NA | 4.12 |
Crete, Greenland.O18 | 0.41 | 0.30 | NA | 4.13 |
Tasmania.92 | 0.35 | 0.42 | 0.57 | 3.64 |
Lenca | 0.14 | 0.36 | 0.61 | 1.31 |
Alerce | 0.05 | 0.35 | 0.61 | 0.43 |
Law.Dome | 0.08 | 0.26 | NA | 0.38 |
Great Barrier Reef (5) | 0.19 | 0.18 | 0.31 | 1.86 |
Galapagos | 0.13 | 0.39 | 0.66 | 1.31 |
New.Caledonia | 0.41 | 0.41 | NA | 4.38 |
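As a rough check on the tOLS column, the t-statistic for the slope of a simple one-variable regression can be recovered from the correlation r and the number of observations n as t = r·sqrt(n−2)/sqrt(1−r²). The sketch below is only illustrative: the function name `t_from_r` is made up for this example, and n = 100 assumes a series with complete annual coverage over 1881-1980, which will not hold for every row.

```python
import math


def t_from_r(r: float, n: float) -> float:
    """OLS slope t-statistic implied by a Pearson correlation r over n points."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Fenno row: r = 0.73; n = 100 is an assumption (full 1881-1980 coverage).
t_fenno = t_from_r(0.73, 100)
print(round(t_fenno, 2))  # about 10.57, close to the tabulated 10.59
```

Rows where this does not come close to the tabulated value presumably have fewer than 100 overlapping years, or differ through rounding of r.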
Jones et al [1998] Table 4 shows "decadal" correlations which are supposedly measuring "low frequency" effects. I’ve spent quite a bit of time recently pondering the statistical issues of handling "low frequency" relationships and they are by no means easy if you’re trying to handle them in an advanced way. In this table, col 1 - my calculation of the correlation (I made decadal averages for both proxies and temperature, not by smoothing); col 2 - reported in Jones et al [1998]; col 3 - OLS t-statistic.
The Law Dome statistic here is meaningless as it is based on 3 decades (i.e. the correlation results from 3 values), which are hardly enough to ground a reportable correlation statistic. There are a couple of large decreases: the Polar Urals decadal correlation declines from 0.92 to 0.23 (whereas Tornetrask is unchanged). I’m not sure why. It might be due to changes in the HadCRU2 version. The decadal correlation in the Svalbard melt series improves. I’ve noticed major changes in Greenland temperature series in HadCRU editions and maybe there was a change in Svalbard here. Jones et al [1998] mentions as a caveat that inaccurate temperature series may contribute to low correlations.
The t-statistics here show the impact of the reduced number of values used in the correlation calculations. Despite the seemingly high decadal correlations, only 4 series have significant decadal OLS t-statistics (let alone t-statistics allowing for spurious significance issues). The four are: the Central England series – hardly a "proxy" for temperature; the Central Europe historical series; Svalbard melt %; and the Tornetrask reconstruction. The Svalbard melt series is hugely non-normal. I’ve been meaning to check the effect of normalizing this series. The Tornetrask series was "adjusted". This may have an effect. Again, it is disquieting that this reconstruction is "more accurate" than the CEng series.
Series | Calculated | Jones 1998 | tOLS |
Fenno | 0.82 | 0.80 | 4.05 |
Urals | 0.23 | 0.92 | 0.63 |
Jasper | 0.30 | 0.45 | 0.89 |
Svalbard | 0.79 | 0.38 | 3.67 |
C England | 0.84 | 0.80 | 4.31 |
C Europe | 0.90 | 0.83 | 5.68 |
Kameda.melt | -0.40 | -0.28 | -1.24 |
Jacoby.treeline | 0.76 | 0.87 | 2.36 |
Briffa.WUSA | 0.65 | 0.79 | 2.40 |
Crete.O18 | 0.27 | 0.49 | 0.80 |
Tasmania.92 | 0.65 | 0.58 | 2.40 |
Lenca | 0.31 | 0.55 | 0.93 |
Alerce | 0.12 | 0.16 | 0.34 |
Law.Dome | 0.57 | 0.98 | 0.70 |
GBR.5 | 0.55 | 0.52 | 1.87 |
Galapagos | 0.14 | 0.16 | 0.41 |
New.Caledonia | 0.60 | 0.48 | 2.11 |
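To make the decadal calculation concrete, here is a minimal sketch of averaging into non-overlapping 10-year blocks (rather than smoothing) and then correlating the block means. The series are synthetic placeholders, not the proxy or HadCRU2 data, and the helper names `decadal_means` and `pearson_r` are made up for illustration. The point is that 100 annual values collapse to 10 decadal values, so the t-statistic has only 8 degrees of freedom.

```python
import math


def decadal_means(values, block=10):
    """Average consecutive non-overlapping blocks; drop an incomplete trailing block."""
    n_blocks = len(values) // block
    return [sum(values[i * block:(i + 1) * block]) / block for i in range(n_blocks)]


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-ins for an annual temperature and proxy series, 1881-1980.
years = list(range(1881, 1981))
temp = [0.005 * (y - 1881) + 0.2 * math.sin(y) for y in years]
proxy = [0.004 * (y - 1881) + 0.3 * math.cos(1.7 * y) for y in years]

temp_dec = decadal_means(temp)
proxy_dec = decadal_means(proxy)
r_dec = pearson_r(temp_dec, proxy_dec)
n_dec = len(temp_dec)  # 10 decades
t_dec = r_dec * math.sqrt(n_dec - 2) / math.sqrt(1.0 - r_dec ** 2)
print(n_dec, round(r_dec, 2), round(t_dec, 2))
```

With 10 points, a decadal correlation of 0.82 corresponds to a t-statistic of about 4.05 (the Fenno row), whereas the same correlation over 100 annual values would give a t-statistic above 14.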
36 Comments
1. does smoothing always (usually) help the r factor?
2. When is it appropriate, not appropriate (if I’m looking at it from a management
perspective).
3. I don’t understand the difference between col2 and 3. what is jones 98 and jones
original 98? OK. I think I get it: original study report. But WTF. How come 3 people
get different results so often just from basic crunching??
4. “col 4- t-statistic from obtaining the correlation by a regression model (which matches
the usual calculation). I’ve not attempted to deal with spurious t-statistics issues here.”
a. whose t-stat (yours or Jones and if Jones, did you check them?)
b. “from obtaining the correlation by a regression model (which matches the usual
calculation)” HUH? What should I take away?
c. “…spurious…” You think the number crunching of the t-stat is wrong? Or that there
is something else more subtle that is wrong? Is the t-stat not the appropriate metric
for evaluation?
5. Are we doing correlation to gridcell or to “climate field”?
6. “Jones says (p. 461) :
“The most surprising correlation in Table 4 is for ENG. The value is relatively low because
the gridbox series (50-55N, 0-5W) incorporates up to 10 station records and some SST data
while ENG is based on only 3 inland stations.”
It is surprising that the reconstruction of gridcell temperature from Tornetrask tree rings
(this includes the “adjustment” discussed before) is supposedly more accurate than the
Central England temperature version is to the HadCRU2 gridcell. This seems unlikely and
suggests that some portion of this high correlation may be spurious. A t-statistic seems to
me to be a more sensible approach, but there are no surprises in the t-statistics here.
Relative to the usual 2.96 benchmark, 5 of 7 SH series are insignificant and 3 of 10 NH
series.”
a. Is ENG= central england?
b. Tornetrask? It’s not even in the list.
c. Did you completely change topics in the middle of a paragraph?
d. I have a hard time following you…
7. YEAH…I don’t know what to think about the “low frequency” stuff. Is it right thing to
do to get measurements of interest? A copout to make trends look better? People acting
“cool” with long words?
8. “col 3- OLS t-statistic”: yours or Jones?
9. “The Law Dome statistic here is meaningless as it is based on 3 decades (i.e. the
correlation results from 3 values) which are hardly enough to ground a reportable
correlation statistic. There are a couple of significant decreases: the Polar Urals decadal
correlation declines from 0.92 to 0.23 (whereas Tornetrask is unchanged). I’m not sure why.
It might be due to changes in the HadCRU2 version. The decadal correlation in the Svalbard
melt series improves. I’ve noticed major changes in Greenland temperature series in HadCRU
editions and maybe there was a change in Svalbard here. Jones et al [1998] mentions as a
caveat that inaccurate temperature series may contribute to low correlations.”
a. seems like another paragraph with mixed issues.
b. declines mean from “smoothed Jones” to “your smoothed”. Ok. Checked on that. my first
assumption would be smoothed versus unsmoothed from the text.
c. “changes in HadCRU2 version”: I guess this would explain the differences between all
the versions?
d. “Jones et al [1998] mentions as a caveat that inaccurate temperature series may
contribute to low correlations.” He is hypothesizing fault in CRU?
10. This website (http://www.cru.uea.ac.uk/cru/data/temperature/#faq) talks about CRU
editions of data changing. But I’m surprised that there would be this much
impact…especially since FAQ says changes are mostly in recent years data. Is also a note
about the poor quality of data in 1850s.
11. “let alone t-statistics allowing for spurious significance issues” watchu talkin’ bout
Willis?
12. “the Central England series – hardly a “proxy” for temperature”: what is it? a set of
instrumental records? what is it doing in here?
13. “The Svalbard melt series is hugely non-normal.” Implication? (for civilians?)
14. “The Tornetrask series was “adjusted”. This may have an effect.” (What’s going on?
What’s your concern?)
15. Again, it is disquieting that this reconstruction is “more accurate” than the CEng
series: Which one? Tornetrask? where is that on the chart?
So much for notepad…
Try turning word wrap off next time, I guess.2
My cat typed that last “2”
TCO,
I’ve been concerned with using tree-ring widths as temperature proxies since I first came across the subject a few years ago. The signal-to-noise ratio is so poor that you have to jump through a lot of statistical hoops to get anything out of them. The problem is, how do you know that what you are seeing when you are done is indeed real, what with the data so heavily massaged? My takeaway from Steve’s work is: doubt. We still don’t know if we have recovered the real signal there, but it won’t be done by using private datasets and secret data processing procedures. Before we bet the future on this stuff it has got to be much more solid, reliable, and repeatable.
Oh sure, agreed. I’m just trying to follow the thought thread in the post…
I’ll pick up a few now:
1. the t-statistic is related to the r-statistic but allows for the number of measurements. Smoothing reduces the number of “effective” readings, so the significance of any reading is much reduced. For example, if you have only two readings, you will get 100% correlation half the time, but it doesn’t mean anything.
2. for calculating statistical significance – I’m not sure. I’m trying to understand “low frequency” issues.
3. It’s hard to say why the answers are so different. I often have trouble replicating Hockey Team results as you know. Whether it’s different series versions, different temperature versions, I don’t know. There are some weird differences between temperature editions, which I’ve not explored, but would be fertile ground for someone to do. The Hockey Team does not annotate weird changes.
4. my calculation of the t-statistic. The t-statistic from OLS tables assumes independence, which is not the case in autocorrelated series. So this is an upper limit on the t-statistic.
5. gridcell. That’s what Jones et al do; it’s different than Mann who correlate to “climate fields”, which are weighted averages of gridcell temperatures.
6. Tornetrask=Fenno(scandia); ENG=C England. It’s probably better as 2 paragraphs.
7. The assumption that you can have a “low frequency” relationship between a proxy and temperature without having a “high frequency” one raises lots of problems about how you’d go about establishing one. I haven’t seen any statistical discussions by the Hockey Team on this, and it seems to me to raise a lot of problems.
8. my calculation. Jones doesn’t use t-statistics. Anyone looking at the t-statistics would know that they are insignificant but the correlations “look” significant. Usual Hockey Team stuff.
9. He’d say this is for the earlier HadCRU. I’m a little worried about someone working both sides of these statistical relationships – there’s a temptation to tailor results a little, especially when there’s no backup on the HadCRU temperature results.
10. I’ve noticed some very odd changes. For example, 4 gridcells used in MBH, which supposedly had over 50% available observations in HadCRU v1, had 0 observations in HadCRU v2. What’s going on?
11. see above.
12. I can see why you want to use long temperature series in reconstructing past temperatures, but don’t call them “proxies” and use performance statistics based on partial use of actual temperature data to sell results for steps with only proxy data. Let’s see the proxy-only step. It’s a Hockey Team thing.
13. At best it’s a nonlinear transformation of temperature and will cause bias when averaged.
14. See my post in April or May on Tornetrask adjustments. Their reconstruction went down in the 20th century; so they tilted it up.
15. Tornetrask=Fenno.
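One way to picture the independence point in items 1 and 4: when both series are autocorrelated, the number of “effective” observations is smaller than the nominal count, and the t-statistic shrinks with it. The sketch below uses a commonly quoted AR(1) approximation, n_eff = n(1 − ρx·ρy)/(1 + ρx·ρy), where ρx and ρy are the lag-1 autocorrelations of the two series. This is an assumption chosen for illustration, not necessarily the adjustment Steve has in mind, and the example numbers (r = 0.4, ρ = 0.7) are hypothetical.

```python
import math


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a sequence."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def effective_n(n, rho_x, rho_y):
    """AR(1) approximation to the effective sample size for a correlation."""
    return n * (1.0 - rho_x * rho_y) / (1.0 + rho_x * rho_y)


def t_from_r(r, n):
    """t-statistic implied by correlation r with n (possibly effective) points."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical case: r = 0.4 over 100 years, both series with lag-1 autocorrelation 0.7.
n_eff = effective_n(100, 0.7, 0.7)
t_naive = t_from_r(0.4, 100)
t_adjusted = t_from_r(0.4, n_eff)
print(round(n_eff, 1), round(t_naive, 2), round(t_adjusted, 2))
```

In this made-up case the nominal 100 observations behave like roughly 34, and a t-statistic of about 4.3 drops to about 2.5.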
Thanks, man(n). 😉
does Jones compile CRU? Is there an alternate compilation? I guess you could rerun against the alternate. maybe even different versions of the alternate. That would show if large changes are to be expected from revisions done by different compilers.
Sorry, I keep suggesting more experiments for you…
Doesn’t everyone know by now that direct measurement of temperature using thermometers is much less accurate than a few well chosen stands of trees, a laptop, and a PhD in saving the world?
Steve,
Is Jones still keeping his data exempt from scrutiny?
I would have thought that temperature data collected by a taxpayer funded organisation should be available to all.
As an interesting aside I am off bush next week personally doing an electromagnetic survey, single-user, backpack machine. Its efficiency is temperature dependent and as long as ambient air temp is low (say up to 30 deg Celsius), instrumental drift is linear.
Drift is of course easily recorded, repeated measurements during the day of the same station in the survey.
If the drift is linear, we can correct the data, and essentially a no-brainer.
However if the drift is not linear, then we reject the data totally and repeat the survey next day, which at the end of the day is again examined for drift linearity.
Electromagnetic surveys of the earth’s surface used by the mining industry are restricted by “atmospheric effects” (“spherics”), or electrical noise in the earth’s atmosphere. If the “spherics” are high, no surveying is done. It would be possible, but the data would be junk.
This atmospheric noise is essentially electrical disturbances in the earth’s atmosphere, or electric field. This means the existence of electric currents passing through a resistive medium, (the air), and for those of us who understand how light-bulbs work, an indication of an input of energy. Heat is produced.
Temperature is an indication of energy state of a substance.
And I leave it here for contemplation.
Jones gridcell temperature data is still private. Warwick Hughes has tried hard to get it without success. Update: What I was meaning to say (and is clear from prior comments on this site) was that the station data underlying the gridcell data is kept under lock and key; the gridcell results are publicly available.
Is there an alternate?
“Is there an alternate?”
Yes.
Mike,
I have the impression that Steve is spending less time online today than usual, so let me mention that Steve has discussed Durbin-Watson in other threads, for example in http://www.climateaudit.org/?p=317
Speaking of not spending time at one’s blog, what has happened to RealClimate? I hadn’t been there in quite some time and figure there’d be tons of new threads. But there was exactly 1 new thread in over a month. Have the principals lost interest?
Merger talks…
What happened to Mike Hollinshead’s comments? They disappeared from this thread, as well as from at least one other thread.
John A: call your office!
Mike,
FYI, this site uses a software package called “Spam Karma” to block spam, but sometimes it misinterprets good stuff as spam.
I would expect that Steve, or John A, will reinstate your comments fairly quickly.
Don’t take the (presumably temporary) deletions personally.
Ah, pink potted meat strikes again!
Why low significance? Consider a theoretical model of trees of a single species uniformly and randomly distributed in an area with a range of altitude. Assume the species has an optimal growth temperature outside of which growth rate declines. Now increase the temperature. The increase in growth of trees in colder-than-optimal conditions will be compensated by the decrease in growth of trees in warmer-than-optimal conditions. So, if you randomly sample the temperature history of the area, by coring random trees, the average of any measure of deviation should be zero. Multiply this by multiple species and you have a real world situation. Since there should be no reason, on a theoretical analysis, to expect a temperature signal from averages of tree rings, it follows that the only reason a temperature signal has been detected is biased sampling. Note this should not necessarily apply to other proxies such as glacier length, or to studies constrained to local areas.
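The cancellation argument above can be illustrated with a toy calculation. Everything in it is an assumption made for the sketch: a parabolic growth curve with a hypothetical optimum of 10 degrees, sites spread symmetrically from 5 to 15 degrees, and a handful of regional temperature anomalies. Under those assumptions the stand-average growth has essentially zero linear correlation with the anomaly (only a second-order term survives), while a single tree at the cold edge tracks temperature closely.

```python
import math


def growth(temp, t_opt=10.0, k=0.01):
    """Hypothetical parabolic growth response, peaking at t_opt."""
    return 1.0 - k * (temp - t_opt) ** 2


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Site temperatures spread symmetrically about the optimum (5.0 to 15.0 degrees).
site_temps = [5.0 + 0.1 * i for i in range(101)]
anomalies = [-1.0, -0.5, 0.0, 0.5, 1.0]  # regional warming/cooling, in degrees

# Average growth over all sites for each anomaly (a "random coring" average).
stand_mean = [sum(growth(t + a) for t in site_temps) / len(site_temps) for a in anomalies]
# Growth of one tree sitting at the cold edge of the range.
cold_edge = [growth(5.0 + a) for a in anomalies]

r_stand = pearson_r(anomalies, stand_mean)
r_cold = pearson_r(anomalies, cold_edge)
print(round(r_stand, 6), round(r_cold, 3))
```

The contrast between the stand average and the cold-edge tree is the reason site selection matters so much in this toy model; whether real stands behave this way is a separate question.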
Assumes perfect and instantaneous response to the forcing function. That forests have legs. Not reasonable.
No, lags are irrelevant. No Ents required.
Wrong. try doing a forcing function of high frequency on a system whose resonance is at low frequency.
that sounded pretty mathematical (gotta hold me ground now and hope that I’m not “Sidding”)
OK, making me think. I don’t see how the frequency matters as it is relative and there is no scale. We also assume that the age of the trees is considerably longer than the duration of the forcing, if that’s what you mean.
Jerryb,
there is an alternate? If so, where and how?
Steve,
thanks – Warwick is persistent but this rejection of data requests means that Jones is afraid of the data becoming public. No need to write here what I think about Jones’ action on that account.
Just reread your post. I guess in your model world, with your model methods that would be the result. However, there is no reason for a real researcher to be so constrained (e.g. to “random coring” or (implicitly) to not including locational issues (e.g. elevation) as a variable within a multiple regression). And there is no reason why your trees have to be situated in such a manner or have such a response.
Louis,
First, let me clarify the question: “is there an alternate?”.
When Steve wrote: “Jones gridcell temperature data is still private.”, his wording was open to multiple interpretations. When he added: “Warwick Hughes has tried hard to get it without success.”, he narrowed the possible interpretations to (land based) surface station temperature data. (Other kinds of “Jones gridcell temperature data” are publicly available.)
One collection of land based surface station temperature data that is publicly available is called GHCN (Global Historical Climatology Network). A brief overview of GHCN is available at http://www.ncdc.noaa.gov/oa/climate/research/ghcn/ghcnoverview.html
Caveat: any such collection includes problematic data.
That is my point. The theoretical model predicts that in the real world, extracting a temperature signal from trees in environments of variable optimality REQUIRES selection, or call it ‘cherry picking’. There is no a priori reason to expect dendroclimatology to produce a reliable climate signal from a random sample of trees. This is not the case however with other proxies such as glacier length or isotope ratios where one expects, a priori, the response to be linear with temperature, not parabolic as is the case with trees.
It’s the station data that’s not available – the gridcell information is available.
Did you follow what I meant with the multiple regression comment? The implication is that with extra equations, you can solve for extra unknowns. It’s an algebra concept and is reasonable (not cherrypicking). Basically, in “Dave’s model world”, I would just record elevation and note that the areas on the “too hot zone” were getting worse and on the “too cold zone” were getting better during any warming period.
Steve can back me up here. I’m not doing anything snaky here. The algebra issue is that in your model world, we have a quadratic response to temp, so that I need to solve a regression for both t and tsq. Having the extra elevation variable allows me to do that.
Slight pedant point: if RW, MXD are both collected (and differ in their reaction to temp) that may also allow me to deconvolute your conundrum. But elevation recording and input into the model is the obvious method.
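A minimal sketch of the regression idea, under loudly stated assumptions: each cored tree has a recorded elevation, local temperature is the regional temperature minus a hypothetical lapse rate of 6.5 degrees per km, and growth follows the same made-up parabola as in the model world (optimum at 10 degrees). With elevation recorded, the samples span both sides of the optimum, so a least-squares fit on t and tsq can recover the curve. The helpers `det3` and `fit_quadratic` are written for this example only, and the data are noise-free, which real ring widths are not.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations (Cramer's rule)."""
    s = [sum(v ** p for v in x) for p in range(5)]                    # sums of x^0 .. x^4
    sxy = [sum((v ** p) * w for v, w in zip(x, y)) for p in range(3)]  # sums of x^p * y
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for i in range(3):
            M[i][col] = sxy[i]  # replace one column with the right-hand side
        coeffs.append(det3(M) / d)
    return tuple(coeffs)


LAPSE_RATE = 6.5  # degrees C per km of elevation; a hypothetical value for this sketch


def true_growth(local_temp):
    """Hypothetical parabolic response with its optimum at 10 degrees."""
    return 1.0 - 0.01 * (local_temp - 10.0) ** 2


elevations_km = [0.0, 0.5, 1.0, 1.5, 2.0]
regional_temps = [14.0, 15.0, 16.0]

local_temps, ring_index = [], []
for t in regional_temps:
    for e in elevations_km:
        local = t - LAPSE_RATE * e  # recorded elevation gives each tree's local temperature
        local_temps.append(local)
        ring_index.append(true_growth(local))

a, b, c = fit_quadratic(local_temps, ring_index)
optimum = -b / (2.0 * c)  # vertex of the fitted parabola
print(round(b, 4), round(c, 4), round(optimum, 2))
```

With noisy data and an unknown lapse rate the fit would be far less tidy, which is roughly the objection raised in the reply further down.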
Why is the station data not available (short the names of people). Is it still a privacy concern (if exact locations are given maybe)?
What is the source for the gridcell temps in that other record (the stations)? Who did the construction? Still seems like a worthwhile test to look at how that series tends to move with version and interact with proxies, versus the Jones CRUs (if you want to snoop for possible finagling by Jones).
#31 TCO, yes you could try that, and attempt to use all trees, but the following problems come to mind. First you have to know the optimal temperature and response curve for each species, and the position of each tree relative to that. Also, temperature is only one factor defining optimal habitat. The more parameters you need in your model, the more potential errors you incorporate, and more data required. Not saying you couldn’t with a great deal of selection or additional information calibrate a more complex model, just that one wouldn’t expect a temperature signal from a simple model using a parabolic proxy.
Agreed.
Steve, or John A,
In case you do not get to review yesterday’s comments, let me mention that some comments by Mike Hollinshead were deleted from this thread yesterday between comments 14 and 18.
It seems that he exceeded Spam Karma’s new visitor first day threshold.
#34 And relevant to today’s hot thread on treelines, while averages of tree cores have expectation of zero temperature signal, the methodology is probably not ‘foxable’, a treeline proxy would not suffer from the same cancellation problem, as we would expect treeline response to be roughly linear with temperature.