In September 2013, Marcott et al. 2013 is still highly praised by Stefan Rahmstorf http://www.realclimate.org/index.php/archives/2013/09/paleoclimate-the-end-of-the-holocene/

and Gavin Schmidt prefers this kind of data to adjust his climate model sensitivities http://www.realclimate.org/index.php/archives/2013/09/on-mismatches-between-models-and-observations/comment-page-1/#comment-408404

even after all the criticism written here.

On another blog, the following studies were offered as verification / proof of Mann’s hockey stick workings.

Huang et al. 2000: https://tinyurl.com/3arux4s

Oerlemans et al. 2005: https://tinyurl.com/a3afj4x

Smith et al. 2006: https://tinyurl.com/jewmm

Wahl and Ammann 2007: https://tinyurl.com/asrvvo8

Kellerhals et al. 2010: https://tinyurl.com/ams6l7t

Ljungqvist 2010: https://tinyurl.com/c96g3ej

Thibodeau et al. 2010: https://tinyurl.com/d73p33p

Kemp et al. 2011: https://tinyurl.com/3o743qu

Marcott et al. 2013: https://tinyurl.com/cu9z9kd

PAGES 2k Consortium 2013: https://tinyurl.com/blblfe2

I thought all of this business was put to bed a long time ago, but I see the warmanistas can’t quite part with their bent sticks.

Without getting too far into the weeds, any suggestions on a response?

]]>I am not saying that they did use grid averages; I am saying that they *should* use grid averages. How can you claim that you have a global reconstruction when you don’t have proxies in a large majority of the gridcells or even attempt to estimate the temperatures in those cells? Surely this is a significant source of error (in terms of reality, not in terms of error used in the paper).

In any case, temperature trends can and do vary considerably by location, making the use of ~50 proxies worldwide as a process variance estimate inappropriate. If you suppose that we had proxies with no error (perfect calibration, no standard error of the calibration, no timing uncertainty) distributed across some gridcells that did vary in actual temperature, each gridcell (supposing a gridcell had multiple proxies) would show 0 in-sample variance. However, Marcott (as you have described it — I have not verified that Marcott actually implemented what you describe) would show some variance due to differences between actual temperatures in the gridcells, since the in-sample variance is calculated without regard to gridcell.

Thus, the in-sample variance erroneously accounts for legitimate geographic differences in actual temperature and attributes it to the standard error of the regression, but also ignores the uncertainty arising from a lack of global coverage. The net wrongness of this could go in either direction and could be of any size as far as I know; I can simply observe that it is wrong to do things this way.
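
A toy illustration of the point, as a minimal sketch only. It assumes error-free proxies and made-up gridcell temperatures; the names `cell_temps` and `proxies_per_cell` and all values are hypothetical and have nothing to do with Marcott's actual data or code. Within each gridcell the variance is zero, yet the variance pooled across all proxies is not, because it reflects geography alone.

```python
import statistics

# Hypothetical true temperatures (deg C) for three gridcells; illustrative only.
cell_temps = {"A": 10.0, "B": 14.0, "C": 18.0}
proxies_per_cell = 4  # error-free proxies: each one reads its cell's true temperature

readings = {cell: [t] * proxies_per_cell for cell, t in cell_temps.items()}

# Variance within each gridcell: zero, since the proxies are perfect.
within_cell_var = {cell: statistics.pvariance(vals) for cell, vals in readings.items()}

# Sample variance pooled across all proxies, ignoring gridcell membership.
pooled = [v for vals in readings.values() for v in vals]
pooled_var = statistics.variance(pooled)

print(within_cell_var)
print(pooled_var)
```

The pooled figure here is nonzero even though no proxy has any measurement error, which is the misattribution described above.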

]]>Carrick,

*“jumping the shark”*

You are getting more mystifying. I was responding to your statement:

*“The temperature in 1960 isn’t known exactly, nor will it ever be. The process of measurement is to obtain a number using a reproducible methodology that places an uncertainty bounds on what that number would be. “*

It sure sounds like you’re asking for something like 12.5±0.5. So I responded that no major organisation produces or attempts to produce a figure like 12.5°C for global temperature. And they explain why.

So you say: *“This has nothing to do with anything.”*

But it must have. How can the ±0.5 make sense without the 12.5?

Then you have a whole lot of stuff about differences. But that is exactly what anomalies are for. We don’t know the temperatures for 1960 or 2012. But we know the anomalies, with CIs, and, with common base, we can difference them. That’s how it’s done.
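
A minimal sketch of that differencing, with made-up anomaly values and uncertainties. The function name `anomaly_difference` is hypothetical, and combining the uncertainties in quadrature assumes the two errors are independent, which is an assumption of this sketch rather than anything a temperature index necessarily guarantees.

```python
import math

def anomaly_difference(a1, s1, a2, s2):
    """Difference of two anomalies on a common base period.

    a1, a2: anomalies (deg C); s1, s2: their 1-sigma uncertainties.
    Assumes independent errors, so the uncertainties add in quadrature.
    """
    return a2 - a1, math.hypot(s1, s2)

# Hypothetical anomalies for an early and a late year (not real data).
diff, sigma = anomaly_difference(0.0, 0.1, 0.5, 0.1)
print(diff, sigma)
```

No absolute temperature appears anywhere in the calculation; only the anomalies and their uncertainties are needed.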

]]>Jason,

I think you overestimate the role of grid-cells. They say:

*“We took the 5° × 5° area-weighted mean of the 73 records”*

They aren’t using grid averages; they are simply calculating an area density and using it to weight the proxy data. And since it’s mostly one per cell, they are equal weight except for cos latitude reduction.

They aren’t losing dof, and aren’t resolving geography.
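
Here is a rough sketch of one reading of that weighting, not the paper's actual code. The function names `cell_index` and `area_weighted_mean` are hypothetical, and it assumes each occupied cell gets a weight of cos(latitude of the cell centre), shared equally by the records in that cell.

```python
import math
from collections import defaultdict

def cell_index(lat, lon, size=5.0):
    """Return (row, col) of the size x size degree cell containing a point."""
    n_rows = int(180 / size)
    n_cols = int(360 / size)
    row = min(int((lat + 90.0) // size), n_rows - 1)   # clamp lat = 90
    col = min(int((lon + 180.0) // size), n_cols - 1)  # clamp lon = 180
    return row, col

def area_weighted_mean(records, size=5.0):
    """records: iterable of (lat, lon, value).

    Records sharing a cell are averaged first, so each is downweighted;
    each occupied cell is then weighted by cos(latitude of its centre).
    Empty cells contribute nothing: no infilling is attempted.
    """
    cells = defaultdict(list)
    for lat, lon, value in records:
        cells[cell_index(lat, lon, size)].append(value)
    num = den = 0.0
    for (row, _col), values in cells.items():
        centre_lat = -90.0 + (row + 0.5) * size
        weight = math.cos(math.radians(centre_lat))
        num += weight * sum(values) / len(values)
        den += weight
    return num / den

# Two records share a cell and one sits alone at the same latitude.
same_lat = area_weighted_mean([(2.5, 2.5, 1.0), (3.0, 3.0, 3.0), (2.5, 12.5, 5.0)])
# A high-latitude record counts for less than a tropical one.
mixed_lat = area_weighted_mean([(2.5, 2.5, 1.0), (62.5, 2.5, 3.0)])
print(same_lat, mixed_lat)
```

With mostly one record per cell, the only real effect is the cos latitude reduction, as in the second example.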

]]>I wasn’t talking about within gridcell variance, but rather across gridcells. The whole point of using gridcells is that we think that temperatures within a gridcell are roughly the same (even though they aren’t always, let’s proceed on that assumption). I further assume that “global temperature” is equal to an area weighted average of each gridcell’s estimated temperature.

If you have 1000 proxies in each gridcell, known timing, known calibration, and normality of residuals, you can get a reasonable confidence interval for global temperature by using the between-sample variance as the population variance.

Now suppose that you have 1000 observations in one gridcell only. You now need to assume a covariance structure with the other gridcells in order to say anything with any confidence in those cells (and you do need to estimate these and include the error of such estimates, otherwise you do not have a global estimate).

Alternatively, suppose that you have 3 observations in each gridcell. Remember, global temperature is an average of each gridcell’s temperatures. Suppose that gridcells are 0% correlated to each other. Then, having only 3 observations per grid cell, you have an extremely poor estimate of the true temperature in each gridcell, and therefore an extremely poor estimate of global temperature (despite ~1000 proxies being used). If we assume a covariance structure between all gridcells, we can use some of the information from other gridcells in estimating each gridcell, reducing error.

My point was simply that having 50 observations is wholly insufficient, as you are using those degrees of freedom to simultaneously estimate both sampling error AND geography. I’m not sure what’s typically done in multiproxy studies with this issue, but I fail to see how 50 obs with sparse geographic distribution (in terms of number of cells actually containing data or having data nearby) could possibly give a proper estimate of the uncertainty from in-sample variance alone.

]]>Jason,

*“This problem wouldn’t be that bad if N was sufficiently large; you obviously don’t need infinite observations to get close to a population variance estimate. However, when you consider that this is a global 5×5 reconstruction, you would want a sufficient N in *each* of the gridcells.”*

No, there’s no within gridcell analysis. In fact, the gridding has negligible effect. There are 73 proxies and 2592 cells. All that happens is that a few proxies get downweighted because there are 2 in a cell (in fact, it’s the same core). And Arctic cells get downweighted, which I think is a bad idea, so I didn’t do it.

Globally, you usually have a sample of more than 50. That isn’t bad. But I don’t see how importing anything from Muller can help. You just have a small sample.

]]>“There is of course variation about that value. There would be if the calibration was totally certain”

Right, we are all on the same page now, as this is what everybody has been saying the entire time.

“That variation is captured in the between proxy variation”

I don’t think so, at least not correctly. Presume that there is 0 error in the calibration relationship, but that r^2 is .01 as before. Now, assume that a given time period has 1 observation. The standard error of the mean at that point in time, as you’ve described it, is also 0, since there is only one observation. If we had infinite i.i.d. observations at each point, your point would stand; the variance of all f(proxy,t)|t would be the variance of the estimate. What happens when you don’t have infinite observations at any given point in time? The true uncertainty is not merely the between proxy variation of a small number of observations; it is something larger.
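
To make the single-observation case concrete, here is a small sketch. The helper `sem` is hypothetical, written out by hand rather than taken from any library. With one observation, the spread-based standard error is either exactly 0 (population formula) or undefined (sample formula); neither says anything about how far that one proxy sits from the true value.

```python
import math

def sem(values, ddof=1):
    """Standard error of the mean from the spread of the values alone.

    ddof=1 gives the usual sample formula, ddof=0 the population formula.
    Returns NaN when there are too few values for the chosen formula.
    """
    n = len(values)
    if n - ddof <= 0:
        return float("nan")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return math.sqrt(variance / n)

one_obs = [12.3]  # a single hypothetical reconstructed value
print(sem(one_obs, ddof=0))  # 0.0
print(sem(one_obs))          # nan
print(sem([1.0, 3.0]))       # 1.0
```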

This problem wouldn’t be that bad if N was sufficiently large; you obviously don’t need infinite observations to get close to a population variance estimate. However, when you consider that this is a global 5×5 reconstruction, you would want a sufficient N in *each* of the gridcells. If this isn’t feasible, you need to include some kind of error for the lack of true global coverage. You would also need to include some error for timing uncertainties and sparse sampling (if using interpolations; not sure what Marcott did) that interacted with all of these other errors somehow. The calibration error would be used as well.

I am not sure what Marcott did in terms of testing timing/sampling issues. I know that they tested calibration error, but they could not possibly have captured the true variance of the global mean using <100 proxies of a given type at any given time.

]]>