**A guest post by Nicholas Lewis**

*Introduction*

A new paper in Nature by Jochem Marotzke and Piers Forster: ‘Forcing, feedback and internal variability in global temperature trends’[i] investigates the causes of the mismatch between climate models that simulate a strong increase in global temperature since 1998 and observations that show little increase, and the influence of various factors on model-simulated warming over longer historical periods. I was slightly taken aback by the paper, as I would have expected either one of the authors or a peer reviewer to have spotted the major flaws in its methodology. I have a high regard for Piers Forster, who is a very honest and open climate scientist, so I am sorry to see him associated with a paper that I think is very poor, even as co-author (a position that perhaps arose through him supplying model forcing data to Marotzke) and therefore not bearing primary responsibility for the paper’s shortcomings.

In putting together this note, I have had the benefit of input from two statistical experts: Professor Gordon Hughes (Edinburgh University) and Professor Roman Mureika (University of New Brunswick, now retired). Both of them regard the statistical methods in Marotzke’s paper as fatally flawed.

The Marotzke and Forster paper analyses trends in simulated global mean surface temperature (GMST) over all 15- and 62-year periods between 1900 and 2012, and relates them to contemporaneous trends in model effective radiative forcing (ERF) and to measures of model feedback strength (alpha) and model ocean heat uptake efficiency (kappa).

The paper is very largely concerned with the behaviour of climate models, specifically atmosphere-ocean general circulation models used in the CMIP5 simulations. In discussing relevance to the actual climate system, it ‘assumes that the simulated multimodel ensemble spread accurately characterizes internal variability’.

The authors’ principal conclusions are:

*The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale. For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations. The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.*

Marotzke claims to have shown that in model simulations the structural (alpha and kappa) elements – which encapsulate model GMST responses to increases in CO₂ forcing – contributed nothing even to recently-ending, longer-term GMST trends. It is difficult to see how that can be so if the models work properly. It is certainly possible (in fact likely) that over the period 1900–2012 the combined contribution of alpha and kappa to model GMST trends was largely obscured by countervailing variations in model ERF trends: high sensitivity models tend to have more negative aerosol forcing than lower sensitivity models, enabling both to match 20th century GMST trends. But aerosol levels have changed little over the last 35 years and higher sensitivity models have been warming much faster than observed GMST over that period.

In order to show why the paper’s conclusions are not justified, I need to explain what Marotzke has done.

*What Marotzke did*

Marotzke starts with a ‘physical foundation’ of energy balance: Δ*T* = Δ*F* / (*α* + *κ*), where Δ*F* is the change in ERF; *α* is the climate feedback parameter (the reciprocal of equilibrium/effective climate sensitivity [ECS] normalised by $F_{2\times\mathrm{CO_2}}$, the ERF from a doubling of CO₂ concentration: $\alpha = F_{2\times\mathrm{CO_2}}/\mathrm{ECS}$); *κ* is the ratio of change in the rate of heat uptake by the climate system – or in its counterpart, top-of-atmosphere (TOA) radiative imbalance – to change in GMST, termed ocean heat uptake efficiency; and Δ*T* is the change in GMST.[ii]

Marotzke then adds a random term, *ε*, to represent internal variability in GMST, resulting in the equation

$$\Delta T = \frac{\Delta F}{\alpha + \kappa} + \varepsilon \qquad (1)$$

which is taken to apply to linear trends, rather than changes, in GMST and ERF.

He then takes temperature data and individual ERF time series relating to their historical simulations[iii] from an ensemble of 18 CMIP5 models.[iv] The ERF time series were not included in the model simulation output but had previously been diagnosed (estimated) therefrom by Forster et al, along with values for *α* and *κ*.

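Marotzke expresses each quantity $x$ in (1) as $x = \bar{x} + x'$, where the overbar represents the ensemble mean and the prime the across-ensemble variation. By considering a linear expansion of equation (1), using those expressions, he arrives at the approximation

$$\Delta T' \approx \frac{\Delta F'}{\bar\alpha + \bar\kappa} - \frac{\overline{\Delta F}}{(\bar\alpha + \bar\kappa)^{2}}\,(\alpha' + \kappa') + \varepsilon \qquad (2)$$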

Marotzke states that this equation suggests a regression model

$$\Delta T_j = a + b\,\Delta F_j + c\,\alpha_j + d\,\kappa_j + \varepsilon_j \qquad (3)$$

where the value of *j* identifies the particular model run involved. Some models have multiple simulation runs, but each model’s values for Δ*F*, *α* and *κ* are common to all its runs.

I’m rather dubious about the validity of the approximations used in (2), given that *α* is typically somewhat larger than *κ* and there is nearly a threefold variation in *α* across the models. This means that many of the *α’* terms are substantial in relation to the model ensemble mean $\bar\alpha$. But I will leave that aside for present purposes.
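To get a feel for how much a first-order approximation like (2) can miss when *α* varies roughly threefold, here is a minimal numerical sketch. The ensemble values are invented for illustration and are not the CMIP5 values. The noise term is left out. The sketch compares the exact across-ensemble anomaly implied by (1) with the first-order Taylor expansion of Δ*F* / (*α* + *κ*) about the ensemble means. The function names are hypothetical.

```python
import numpy as np

def exact_trend(dF, alpha, kappa):
    """Equation (1) without the noise term: dT = dF / (alpha + kappa)."""
    return dF / (alpha + kappa)

def linearised_anomaly(dF, alpha, kappa):
    """First-order expansion of dF/(alpha+kappa) about the ensemble means,
    giving the across-ensemble anomaly dT' (noise term omitted)."""
    dF_bar, a_bar, k_bar = dF.mean(), alpha.mean(), kappa.mean()
    s = a_bar + k_bar
    return (dF - dF_bar) / s - dF_bar * ((alpha - a_bar) + (kappa - k_bar)) / s**2

# Hypothetical ensemble: identical forcing trend, alpha spanning a threefold range
alpha = np.array([0.6, 0.9, 1.2, 1.5, 1.8])   # W m^-2 K^-1 (illustrative)
kappa = np.full_like(alpha, 0.7)               # W m^-2 K^-1 (illustrative)
dF = np.full_like(alpha, 2.0)                  # W m^-2 (illustrative)

exact = exact_trend(dF, alpha, kappa)
exact_anom = exact - exact.mean()
approx_anom = linearised_anomaly(dF, alpha, kappa)
abs_err = np.abs(exact_anom - approx_anom)
print("exact anomalies: ", np.round(exact_anom, 3))
print("linearised:      ", np.round(approx_anom, 3))
```

With these made-up numbers, the low-*α* member is where the linearisation goes most wrong, because 1/(*α* + *κ*) is strongly curved there.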

Marotzke performs multiple linear regressions according to the statistical model (3) for each start year (1900–1998 for 15-year trends; 1900–1951 for 62-year trends). He then determines the extent to which the across-ensemble variations in Δ*F*, *α* and *κ* contribute to the ensemble spread of GMST trends. Marotzke’s main factual conclusions follow from two findings. First, these three factors explain little of the ensemble spread of 15-year GMST trends, with the majority being attributed to internal variability. Second, for 62-year periods starting from the 1920s onwards, variations in Δ*F* (ERF trends) dominate, while variations in model feedback *α* and ocean heat uptake efficiency *κ* have almost no effect.
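The per-start-year procedure can be sketched roughly as follows. The sketch assumes annual GMST and ERF series are held as NumPy arrays of shape (runs, years), with one *α* and one *κ* value per run. The function names and the synthetic ensemble are hypothetical, and this is not Marotzke's code. In particular, his split of the spread into contributions from each factor is not reproduced; the sketch only shows the fit of (3) for each start year and its R².

```python
import numpy as np

def window_trends(series, years, start, length):
    """OLS linear trend (per year) of each row of `series` over the
    `length` years beginning at `start`; `series` has shape (n_runs, n_years)."""
    mask = (years >= start) & (years < start + length)
    # np.polyfit fits every column of a 2-D y, so pass shape (n_years, n_runs)
    slope, _intercept = np.polyfit(years[mask], series[:, mask].T, 1)
    return slope

def regress_start_year(gmst, erf, alpha, kappa, years, start, length):
    """Fit dT_j = a + b*dF_j + c*alpha_j + d*kappa_j + e_j across runs j
    for one start year; return the coefficients (a, b, c, d) and R^2."""
    dT = window_trends(gmst, years, start, length)
    dF = window_trends(erf, years, start, length)
    X = np.column_stack([np.ones_like(dT), dF, alpha, kappa])
    coef, *_ = np.linalg.lstsq(X, dT, rcond=None)
    resid = dT - X @ coef
    return coef, 1.0 - resid.var() / dT.var()

# Synthetic ensemble (all values hypothetical)
rng = np.random.default_rng(0)
years = np.arange(1900, 2013)
n_runs = 18
alpha = rng.uniform(0.6, 1.8, n_runs)
kappa = rng.uniform(0.4, 0.9, n_runs)
ramp = np.linspace(0.0, 2.5, years.size)              # idealised forcing history
erf = ramp * rng.uniform(0.8, 1.2, (n_runs, 1))       # per-run scaling of the ramp
gmst = erf / (alpha + kappa)[:, None] + rng.normal(0.0, 0.1, (n_runs, years.size))

# 62-year trends for start years 1900..1951 (the last window ends in 2012)
results = {start: regress_start_year(gmst, erf, alpha, kappa, years, start, 62)
           for start in range(1900, 1952)}
```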

*Flaws in Marotzke’s methods*

To a physicist, the result that variations in model *α* and *κ* have almost no effect on 62-year trends is so surprising that the immediate response should be: ‘what has Marotzke done wrong?’

Some statistical flaws are self-evident. Marotzke’s analysis treats the 75 model runs as being independent, but they are not. Only 18 models are analysed, and only one set of predictor variables is used per model. For a model with multiple runs, the difference between each individual run's temperature simulation and that model's run-ensemble mean is therefore noise that the regression could not be expected to explain. Using all the individual runs invalidates the simple statistical model and the error estimates derived from it. Also, moving from equation (1) to (3) above will have made the errors correlated with the predictor variables, biasing the coefficient estimates. Uncertainty in the values of the parameters *α* and *κ* and in the forcing time series is also ignored. As I show later, uncertainty in *κ*, at least, is large. And in equation (1) *α* and *κ* appear only as their sum. Allowing a separate predictor variable for each of them may result in part of the internal variability being misallocated.
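The effect of treating runs that share one set of predictors as independent observations can be shown with a small synthetic example. The run counts below are hypothetical: they give 75 runs across 18 models but are not the actual CMIP5 allocation. Each model carries an error shared by all its runs. The conventional OLS standard error computed from all 75 runs will typically come out much smaller than the one computed from the 18 run-ensemble means. Yet the amount of independent information is set by the number of models, not the number of runs.

```python
import numpy as np

def ols_slope_se(x, y):
    """Slope of y = a + b*x + e and its conventional OLS standard error,
    which assumes independent, equal-variance errors."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
# Hypothetical run counts: 75 runs spread over 18 models
runs_per_model = np.array([10, 10, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 1])
n_models = runs_per_model.size
x_model = rng.normal(0.0, 1.0, n_models)      # one predictor value per model
u_model = rng.normal(0.0, 0.5, n_models)      # model-level error shared by all its runs
model_id = np.repeat(np.arange(n_models), runs_per_model)

# Each run = common model signal + shared model error + run-specific noise
y_runs = (0.8 * x_model + u_model)[model_id] + rng.normal(0.0, 0.2, model_id.size)

# (a) all 75 runs treated as independent observations
b_runs, se_runs = ols_slope_se(x_model[model_id], y_runs)
# (b) one observation per model: the run-ensemble mean
y_means = np.bincount(model_id, weights=y_runs) / runs_per_model
b_means, se_means = ols_slope_se(x_model, y_means)
print(f"all runs:    b = {b_runs:.3f}, naive SE = {se_runs:.3f}")
print(f"model means: b = {b_means:.3f}, SE = {se_means:.3f}")
```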

However, there is an even more fundamental problem with Marotzke’s methodology: its logic is circular.

The Δ*F* values were taken from Forster et al (2013)[v]. For each model, historical/RCP scenario time series for Δ*F* were diagnosed by Forster et al using an equation of the form:

$$\Delta F = \alpha\,\Delta T + \Delta N \qquad (4)$$

where Δ*T* and Δ*N* are the model-simulated GMST and TOA radiative imbalance respectively, and *α* is the model feedback parameter, diagnosed in the same paper.

Moreover, *κ* had been diagnosed from the model transient climate response[vi] (TCR) as $\kappa = F_{2\times\mathrm{CO_2}}/\mathrm{TCR} - \alpha$. Therefore, the denominator in equation (1) is simply $\alpha + \kappa = F_{2\times\mathrm{CO_2}}/\mathrm{TCR}$, termed *ρ* (rho) in Forster et al (2013). Note that $F_{2\times\mathrm{CO_2}}$, the ERF from a doubling of CO₂ concentration, does not take a standard value (3.71 W m⁻² per IPCC AR5). Instead, it is a diagnosed value that differs significantly between models.
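Take *κ* = *ρ* − *α*, the relation also used later in this post, with *ρ* = F2xCO2/TCR as in Forster et al (2013). Then the denominator of (1) carries no information beyond *ρ*, whatever *α* a model has. The trivial check below uses invented numbers: changing *α* while holding F2xCO2 and TCR fixed leaves *α* + *κ* unchanged.

```python
import numpy as np

F2x = 3.7                             # hypothetical model F_2xCO2, W m^-2
tcr = 1.8                             # hypothetical model TCR, K
rho = F2x / tcr                       # climate resistance, W m^-2 K^-1
alpha = np.array([0.8, 1.2, 1.6])     # alternative feedback values for the same model
kappa = rho - alpha                   # kappa diagnosed as rho - alpha
denominator = alpha + kappa           # the denominator of equation (1)
print(denominator)                    # every entry equals rho
```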

One can therefore restate the ‘physical foundation of energy balance’, with added random term representing internal variability, (equation (1)) as:

$$\Delta T = \frac{\alpha\,\Delta T + \Delta N}{\rho} + \varepsilon \qquad (5)$$

As is now evident, Marotzke’s equation (3) involves regressing Δ*T* on a linear function of itself. This circularity fundamentally invalidates the regression model assumptions. Accordingly, reliance should not be placed on any of the results in the Nature paper. That is particularly the case for the 62-year trend results, where the offending, non-exogenous Δ*F’* term dominates the ensemble spread of GMST trends for start years from the 1920s on.

The Δ*F* predictor variable is a linear function of the response variable Δ*T*, whose trend becomes larger relative to noise as the start year progresses. It is therefore hardly surprising that the across-ensemble variations of Δ*F* are the main contributor to the ensemble spread of GMST 62-year trends starting from the 1920s onwards. As the start date progresses, the intermodel variation in 62-year trends in Δ*F* is increasingly determined by intermodel variation in trends in *α* Δ*T*. Δ*N* trends are noisy, but intermodel variation in trends in Δ*N* is of lesser relative importance for later start years. However, since Δ*T* is not an exogenous variable, the fact that variation in Δ*F* trends in turn dominates intermodel variation in GMST trends tells one nothing reliable. It says nothing about the relative contributions of forcing, feedback and ocean heat uptake efficiency to the intermodel spread in GMST trends.
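The statistical mechanism can be seen in a toy simulation. It is a sketch under simplifying assumptions, not a reproduction of the CMIP5 analysis. The temperature trends are generated so that they depend strongly on *α* and *κ*. A "diagnosed" forcing is then built, in the spirit of (4), as a0·Δ*T* + Δ*N*. To keep the algebra linear, a single hypothetical feedback value a0 is used for every model, and Δ*N* is treated as independent noise. Regressing on an exogenous forcing recovers a clear *α* effect. Regressing on the circular forcing instead gives a higher R² and an *α* coefficient shrunk towards zero.

```python
import numpy as np

def fit(y, *predictors):
    """OLS of y on an intercept plus the given predictors; returns (coefs, R^2)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 5000                                  # large synthetic ensemble to damp sampling noise
alpha = rng.uniform(0.6, 1.8, n)          # feedback parameter (illustrative)
kappa = rng.uniform(0.4, 0.9, n)          # ocean heat uptake efficiency (illustrative)
F_exog = rng.normal(2.0, 0.2, n)          # exogenous forcing trend (illustrative)
eps = rng.normal(0.0, 0.1, n)             # internal variability

# By construction, the temperature trend depends strongly on alpha and kappa
dT = F_exog / (alpha + kappa) + eps

# Circular "diagnosed" forcing: a0*dT plus a noisy dN term (a0 and dN hypothetical)
a0 = 1.2
dN = rng.normal(0.6, 0.05, n)
F_diag = a0 * dT + dN

coef_exog, r2_exog = fit(dT, F_exog, alpha, kappa)   # coefs: a, b, c (alpha), d (kappa)
coef_circ, r2_circ = fit(dT, F_diag, alpha, kappa)
print(f"exogenous forcing: alpha coef {coef_exog[2]:.3f}, R^2 {r2_exog:.3f}")
print(f"circular forcing:  alpha coef {coef_circ[2]:.3f}, R^2 {r2_circ:.3f}")
```

The smaller the noise in Δ*N* relative to the spread in Δ*T*, the closer the *α* and *κ* coefficients are driven to zero, even though Δ*T* depends on them by construction.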

*Examining the effects of the circularity in Marotzke’s method*

One could, at the expense of changing the error characteristics somewhat, rearrange (5) to eliminate Δ*T* from the RHS and remove the circularity, which (since *κ* = *ρ* – *α*) results in simply[vii]

$$\Delta T = \frac{\Delta N}{\kappa} + \varepsilon \qquad (6)$$
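Spelled out, with *κ* = *ρ* − *α*, the rearrangement runs as follows. The rescaled noise term on the last line is what changes the error characteristics, and it is simply relabelled *ε* in (6):

$$
\begin{aligned}
\Delta T &= \frac{\alpha\,\Delta T + \Delta N}{\rho} + \varepsilon \\
(\rho - \alpha)\,\Delta T &= \Delta N + \rho\,\varepsilon \\
\Delta T &= \frac{\Delta N}{\kappa} + \frac{\rho}{\kappa}\,\varepsilon
\end{aligned}
$$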

However, Marotzke does not do so. In any case, this equation only deals with the element of forcing associated with ocean (and other) heat uptake, not with the (larger) element associated with increasing GMST, and it does not include *α*.

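I’ll stick with Marotzke’s approach for the time being but derive a regression equation from (5) via a similar expansion to that employed by him. Here I keep the two terms comprised in Δ*F* separate but do not split the *ρ* term between *α* and *κ*. Linearly expanding (5) yields:

$$\Delta T' \approx \frac{(\alpha\,\Delta T)' + \Delta N'}{\bar\rho} - \frac{\overline{\alpha\,\Delta T} + \overline{\Delta N}}{\bar\rho^{\,2}}\,\rho' + \varepsilon \qquad (7)$$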

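which, on keeping only the lowest-order terms but separating the influence of the $\alpha\,\Delta T$ and $\Delta N$ terms, leads to a regression equation of this form:

$$\Delta T_j = a + b\,(\alpha\,\Delta T)_j + c\,\Delta N_j + d\,\rho_j + \varepsilon_j \qquad (8)$$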

I have carried out a regression analysis based on equation (8) using the same set of models. I used the run-ensemble mean where a model had multiple runs, not all the individual runs. The justification for not using all the separate runs was given earlier. Using run-ensemble means will however result in model internal variability not being fully represented in the regression residuals.

Over early, middle and late 15-year periods within the 1900–2012 analysis period, the intermodel spread in GMST trend is dominated by the $\alpha\,\Delta T$ term; internal variability (assessed from the variance not explained by the regression fit) is small. Over the earliest and latest 62-year periods (1900–1961 and 1951–2012) the $\alpha\,\Delta T$ term continues to be dominant but less so, with a greater amount of unexplained variance. The other two terms explain very little of the intermodel spread, save for modest contributions from the *ρ* term in the 62-year trend cases. There is little point in examining more than two or three historical 62-year trend cases, as results from periods with substantial overlap are far from independent.

The results from this rejigged regression show that the apparent internal variability shrinks greatly when different coefficients are permitted for the two terms in the diagnosed forcing. And one can actually get even better fits for all periods by regressing using just and terms.

However, none of the analysis examined so far is valid, because in all of it Δ*T* appears on both sides of the equation – whether explicitly as in equation (8) or, as in Marotzke’s paper, concealed within Δ*F* – so there is circularity either way. Naturally, separating out two terms, as in equation (8), lets the regression explain more of the variability in the response variable. The first is the predictor term in which the response variable appears simply multiplied by a parameter – $\alpha\,\Delta T$. The second is an associated noisy term with little explanatory power – $\Delta N$. But the $\Delta N$ term – the only exogenous part of $\Delta F$ – has no significant explanatory power. That suggests Marotzke’s 62-year period results likely just reflect the decline in the intermodel variation in the noisy term relative to that in the circular term as the period considered ends closer to 2012.

*Another reason why Marotzke’s approach is doomed *

Another major problem with this type of attribution approach, even if the circularity could be removed by somehow diagnosing Δ*F* differently and other statistical problems dealt with, is that the underlying assumption that the previously diagnosed *α* and *κ* values for individual models are realistic enough to use in equation (1), or in its circularity-free reduced *κ-*only version (6), appears to be false.

I have compared *κ* values based on the ratio of Δ*N* and Δ*T* trends over 1951–2012 from the model simulations with the values used by Marotzke, which, as explained, were diagnosed in Forster et al (2013) by a quite different method. The Δ*N* / Δ*T* trend-based estimates vary from 0.54 times to 2.48 times those Marotzke uses; for only five models are the two estimates the same within 10%. Estimates of *κ* based on the 2005–2066 period under the RCP8.5 scenario, which provides a strong greenhouse gas forcing ramp with little influence from variations in aerosol forcing, range from 0.46 times to 1.09 times those Marotzke uses, and from 0.18 times to 1.75 times those estimated from changes over 1951–2012. And estimates of *κ* based on changes in the rate of simulated ocean heat uptake during 1961–2005,[viii] rather than simulated TOA radiative imbalance, are substantially different again. It seems doubtful that estimates of *α* values would be robust enough either.
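The trend-ratio estimate of *κ* described above can be sketched as follows. The series here are synthetic, with a true ratio of 0.7 built in, and the function name is hypothetical. With real model output, N and T would be the simulated global-mean TOA imbalance and GMST anomalies for one model. Re-running the estimator over a different window is all it takes to see how period-dependent the estimate can be.

```python
import numpy as np

def kappa_from_trends(N, T, years, start, end):
    """Estimate kappa as the ratio of the OLS trends of TOA imbalance N
    and GMST T over the inclusive year range [start, end]."""
    m = (years >= start) & (years <= end)
    trend_N = np.polyfit(years[m], N[m], 1)[0]
    trend_T = np.polyfit(years[m], T[m], 1)[0]
    return trend_N / trend_T

# Synthetic single-model series (hypothetical values): warming ramp from 1951
rng = np.random.default_rng(5)
years = np.arange(1900, 2013)
ramp = np.clip(years - 1951, 0, None).astype(float)
T = 0.012 * ramp + rng.normal(0.0, 0.05, years.size)        # K
N = 0.7 * 0.012 * ramp + rng.normal(0.0, 0.05, years.size)  # W m^-2
kappa_hat = kappa_from_trends(N, T, years, 1951, 2012)
print(f"trend-ratio kappa over 1951-2012: {kappa_hat:.2f}")
```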

With this degree of apparent variation in *κ* when estimated by different methods and over different periods, one would expect equation (6) (regressing Δ*T* on Δ*N* / *κ*) to have very little explanatory power. And that is indeed the case. The intermodel spread in GMST trend is dominated by internal variability over both 15- and 62-year periods, whether towards the start or end of the analysis period. The more valid, circularity-free version of the surface energy-balance equation is useless for investigating the intermodel spread in GMST trends. The same applies when using a regression equation based on (6) but separating the $\Delta N$ and *κ* terms, leading to this form:

$$\Delta T_j = a + b\,\Delta N_j + c\,\kappa_j + \varepsilon_j$$

*Conclusions*

I have shown that there are no valid grounds for the assertions made in the paper that ‘For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends’ and that ‘The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded’.

Marotzke’s conclusion that for periods ending in the last few decades the non-noise element of 62-year GMST trends in models is determined just by their ERFs is invalid, since he hasn’t used an exogenous ERF estimate. Indeed, if the models are working properly, their GMST trends must logically also reflect their feedback strengths and their ocean heat uptake efficiencies.

The interesting question is how much the large excess of model ensemble-mean simulated GMST trends relative to observed trends over the satellite era is attributable to respectively: use of excessive forcing increases; inadequate feedback strength (excessive ECS); inadequate ocean heat uptake efficiency; negative internal variability in the real climate system; and other causes. The Marotzke and Forster paper does not bring us any closer to providing an answer to this question. It certainly does *not* show the claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations to be unfounded.

One of Marotzke’s conclusions is, however, quite likely correct despite not being established by his analysis: it seems reasonable that differences between simulated and observed trends may have been dominated – except perhaps recently – by random internal variability over the shorter 15-year timescale.

Gordon Hughes had some pithy comments about the Marotzke and Forster paper:

*The statistical methods used in the paper are so bad as to merit use in a class on how not to do applied statistics.*

*All this paper demonstrates is that climate scientists should take some basic courses in statistics and* Nature *should get some competent referees.*

The paper is methodologically unsound and provides spurious results. No useful, valid inferences can be drawn from it. I believe that the authors should withdraw the paper.

[i] Jochem Marotzke & Piers M. Forster. Forcing, feedback and internal variability in global temperature trends. Nature, 517, 565–570 (2015)

[ii] This so-called kappa model does not respect conservation of energy over long periods, but as Marotzke says it is a reasonable approximation (at least in climate models) over periods of one to several decades.

[iii] Extended from 2005 to 2012 using, it appears, the RCP4.5 scenario runs.

[iv] The NorESM1-M model was incorrectly shown as not having forcing estimates available, but does seem to have been included in the models used.

[v] Forster, P. M. et al. Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models. J. Geophys. Res. 118, 1–12 (2013).

[vi] The rise in GMST over a ~70-year period during which CO₂ concentration increases at 1% per year, thereby doubling.

[vii] Although it would arguably be more logical to regard Δ*N* rather than Δ*T* as the response variable in this equation.

[viii] Derived from IPCC AR5 Fig. 9.17b.

## 867 Comments

Between equations 7 and 8 some missing words occur:

“separating the influence of the and terms”

and more missing letters or words:

“the intermodel spread in GMST trend is dominated by the term;”

“the term continues to be dominant but less so”

“And one can actually get even better fits for all periods by regressing using just and terms.”

“But the fact that the term – the only exogenous part of – “

One more–there may be others

“The same applies when using a regression equation based on (6) but separating the and κ terms”

Thanks. I think I’ve now fixed all the expressions that WordPress didn’t convert. Sorry about that.

Hi Nic,

There are some character insertions in your text that don’t appear.

Let me see if I have digested your argument. What Marotzke and Forster did was run a regression for every year from 1900 to 1950 in the form:

[1] dT = b0 + b1*dF + b2*alpha + b3*kappa + e

where in each year there are 75 observations, taken from each of the 75 model runs. dT is the warming trend counting forward 62 years from start date 1900, 1901, 1902, etc. b0 is the constant term. dF is the trend in forcing counting forward 62 years from start date 1900, 1901, etc.; alpha is the model’s GHG sensitivity (or some transformation thereof), kappa is the model’s ocean heat uptake efficiency (or some transformation thereof), and e is the residual term. The forcing trend (dF) is meant to be a summation of the net effect of GHG+aerosols+solar+volcanoes+other warming/cooling influences, and it is assumed to be exogenous, i.e. determined by data that are independent of temperature trends.

In their regression results they show that b1 is large and significant (I guess? they don’t actually report the regression results in the paper!) while b2 and b3 are nearly zero. And the residuals are also large. So their conclusion is that variations in forcing (the dF term) and noise (the residuals) account for the spread of model-estimated trends, while variations in sensitivity and ocean heat uptake play no role in accounting for the spread of model-generated temperature trends. Hence, they conclude, it can’t be the case that the ‘pause’ implies models are too sensitive to GHGs since sensitivity (alpha) plays no role in short- or long-term trends.

Before turning to your particular point about circularity, your first observation is, in essence, that they seem to be asserting that the structural elements of their models (alpha and kappa) play no role in the key model behaviour. And this, we are to believe, is the basis for their defence of the validity of models. Alternatively, it is prima facie evidence that they have screwed up somewhere because it is inconceivable that the main structural elements of the models play no role in the behaviour of the models.

Your diagnosis of where they went wrong, as I understand it, is singularly devastating. The authors took their forcing trend estimates (dF) from an earlier paper that constructed them using the equation

[2] dF = a0*dT + dN

where a0 is a feedback term, dN is a term capturing the Top of Atmosphere radiative imbalance, which in this context is just a source of noise, and dT is… dT! So in [1] they regressed dT on itself + other terms! The Marotzke regression is actually something like

[3] dT = c0 + c1*(a0*dT+ dN) + c2*alpha + c3*kappa + e

and not surprisingly they found alpha and kappa contribute nothing. The only reason the regression model didn’t collapse due to dT being on both sides is that a bit of noise in the form of dN is added to dT on the right hand side.

And when they do the same regression on 15-year trends they find dT and the residuals e again “explain” everything and this time the noise component is even larger.

Now, as the saying goes, just because it was published in Nature doesn’t automatically mean it’s wrong. But I have difficulty seeing how this wreck can be salvaged. Have I correctly summarized your argument? If so, how do the authors defend the claim that dF is exogenous?

As a layman, my observation/impression is that we have had a very consistent overall warming trend since circa 1850. The natural ocean cycles have amplified this trend (as in the 1920s/30s and 80s/90s) and dampened it (as in the 60s/70s and the current pause). In other words, there is one warming trend with almost the same slope over the entire period, overlaid with a 60–70 year ocean cycle (fwiw, far too much emphasis is placed on the pause). My beef with the models and the discrepancy is the failure to incorporate the ocean cycles into the models, especially since they were reasonably well known by the mid-1990s.

I am unable to tell from the critique of the paper whether the ocean cycles are given credit for any of the discrepancy or are treated as having no effect on the discrepancy. Any commentary or enlightenment on the subject would be appreciated.

The paper gives no role to ocean cycles as such. Some models do exhibit multidecadal ocean oscillations, but as they are unlikely to be in phase in different models (or with the real climate system) they generally show up as part of random climate noise.

I agree that if one incorporates a 60-70 year ocean cycle (the AMO being the obvious one) then warming over the instrumental period bears a more consistent relationship to external forcing influences.

Hi Ross

Thanks for your very helpful comment. Your summary of my circularity argument is good.

Just to clarify, Marotzke & Forster find that for 15 year periods the regression residuals are dominant for all start years – which they interpret as implying that internal variability (in the real climate system) dominates the difference between model simulations and observations during the hiatus period.

They find that for 62 year periods the regression residuals are dominant for early start years – when the temperature and forcing trends were low – but that forcing dominates thereafter.

For both periods and all start years, the structural sensitivity and ocean heat uptake characteristics, as represented by alpha and kappa, are found to have a negligible influence on model temperature trends.

Thanks Ross.

much clearer now

Is alpha actually prescribed in a specific model? I thought it was a number that was determined ex post based on the resulting temperature trend (?) Or am I conflating alpha with a more encompassing term that takes into account all feedback (TCR maybe?)

Your understanding is correct. Alpha is usually estimated as minus the slope coefficient in an OLS regression of ΔN on ΔT over the first 150 years of a model simulation. That simulation starts with CO2 concentration being abruptly doubled or quadrupled from a previously equilibrated state.
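A minimal sketch of that regression (often called a Gregory regression) uses a synthetic abrupt-4×CO₂-style run. The step forcing, the feedback value and the two-timescale temperature response below are invented for illustration. With real model output, N and T would be annual-mean global TOA imbalance and GMST anomalies relative to the control run.

```python
import numpy as np

def gregory_alpha(N, T):
    """Feedback parameter alpha as minus the OLS slope of N on T; the
    intercept is returned too, as an estimate of the step forcing."""
    slope, intercept = np.polyfit(T, N, 1)
    return -slope, intercept

rng = np.random.default_rng(3)
years = np.arange(1, 151)                 # first 150 years after the CO2 step
F4x = 7.4                                 # hypothetical 4xCO2 forcing, W m^-2
alpha_true = 1.1                          # hypothetical feedback, W m^-2 K^-1
# Hypothetical two-timescale warming towards the equilibrium F4x / alpha_true
T = (F4x / alpha_true) * (0.6 * (1 - np.exp(-years / 4)) + 0.4 * (1 - np.exp(-years / 250)))
N = F4x - alpha_true * T + rng.normal(0.0, 0.3, years.size)   # TOA imbalance plus noise

alpha_hat, F_hat = gregory_alpha(N, T)
print(f"alpha estimate {alpha_hat:.2f} W m^-2 K^-1, forcing estimate {F_hat:.2f} W m^-2")
```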

“Now, as the saying goes, just because it was published in Nature doesn’t automatically mean it’s wrong.” Aha!, a treasure of a saying! Kind of reminds me of W. C. Fields: “Anyone who hates children and dogs can’t be all bad.”

@Ross McKitrick,

“…just because it was published in Nature doesn’t automatically mean it’s wrong.”

Heh. +3!

Ross, I hope I am not being dumb here but the starting equation is

ΔT = ΔF / (α + κ)

The units of temperature are degrees and units of force are Watts.

Does that mean that the units of both α + κ must be in W/K, as they are additive?

The units of both are W m^-2 / K: the forcing is expressed per square metre (averaged over the Earth’s surface area).

I was wondering the same thing. Here is the Wiki on climate sensitivity units:

http://en.wikipedia.org/wiki/Climate_sensitivity

A long-time lurker, I must confess that I only occasionally understand completely the posts that people like Steve McIntyre and Nic Lewis write. No doubt this says more about my limitations and lack of effort than what those posts effectively tell the intended audience. But I always get it when Dr. McKitrick boils it down for us laymen.

Thank you, Dr. McKitrick, for this and the other oases of clarity that you have provided over the years.

So without dN the regression would pick up c1=ao and zeros elsewhere because $(X^TX)^{-1}X^TX=I$ and residuals would be zero because $(I-X(X^TX)^{-1}X^T)X=0$ ?

$\hat{c}_1 = 1/a_0$
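Here is a quick numerical check of that correction, using synthetic values (all hypothetical). With dN omitted, the regressor a0·dT is just a rescaled copy of dT. Least squares therefore returns ĉ1 = 1/a0, zero for the other coefficients and zero residuals, to floating-point precision.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 75
a0 = 1.3                                   # hypothetical feedback value
dT = rng.normal(1.0, 0.3, n)               # synthetic temperature trends
alpha = rng.uniform(0.6, 1.8, n)           # synthetic alpha values
kappa = rng.uniform(0.4, 0.9, n)           # synthetic kappa values

# Regressors: intercept, a0*dT (dN omitted), alpha, kappa
X = np.column_stack([np.ones(n), a0 * dT, alpha, kappa])
coef, *_ = np.linalg.lstsq(X, dT, rcond=None)
resid = dT - X @ coef
print(coef)            # approximately [0, 1/a0, 0, 0]
```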

Whatever happened to Peer Review?

Who are the peers of idiots and scoundrels?

Another effort to explain the model/pause discrepancy collapses.

Nature’s reviewers may not have had Nic Lewis and Ross McKitrick’s statistical chops, but they should have caught the fatal contradiction in a conclusion that the two self-proclaimed most important model emergent features, alpha and kappa, do not statistically influence model behavior. That is illogical to the point of the absurd.

As I’ve posted elsewhere, I don’t understand what Marotzke and Forster are trying to prove. Maybe someone can explain it. A cursory glance at model outputs vs. global temperature measurements shows that the models do a reasonable job following temperatures for the past century. Therefore, they were not “running hot” then, correct? On the other hand, they are not doing a good job of following temperatures this century, and seem to be running hot.

Surely the claim that they ought to be rebutting is: The models were overfitted somehow, those with too-high sensitivity were balanced by other factors, and the balancing lasted for the century of training data. Now that we are looking at new data, the balancing isn’t working anymore and their too-high sensitivity is becoming apparent.

How is a study of last century’s data going to answer that issue? All they are showing is that things work for the last century.

“Predicting” history is not that difficult…

The paper is another effort to argue that the now over 18 year pause does not falsify CMIP5. BAMS said in 2009 that 15 years would. Santer’s 2011 paper said 17 years. OOPS! That is why the Max Planck Institute gave it the media spin it did.

Many fun details (and other absurdities as bad as this paper) in essays An Awkward Pause and Unsettling Science in ebook Blowing Smoke. Nic’s evisceration of Marotzke would have made a nice additional example to the latter essay.

“♫♪…Those were the days, my friend, we thought they’d never end…♫♪”

Miker,

I think they are trying to do pretty much what Foster & Rahmstorf were trying to do a couple of years back with their silly curve-fit paper: show that the rather glaring discrepancy between modeled and measured warming is NOT due to the models just being too sensitive to forcing. If the true values of transient and equilibrium sensitivity are much lower than the model ensemble (Lewis & Curry, for example), then there is less urgency for costly immediate forced reductions in fossil fuel use…. and IMO, that is why there have been so many recent papers published which offer a host of ‘explanations’ for the model/reality divergence, none of which seriously contemplate the most obvious explanation: the models have too much net positive feedback.

miker613 – neatly put. As I see it, their ε is the difference between model results and reality. Since ε, in their view, over time tends to zero, it doesn’t matter how big ε is now, because over time the models will be correct. Circular logic indeed.

Wow: http://julesandjames.blogspot.com/2015/02/that-marotzkeforster-vs-lewis-thing.html

“My first thought on a superficial glance at the paper was that it wasn’t really that useful an analysis, as we already know that the models provide a decent hindcast of 20th century temps, so it’s hardly surprising that looking at shorter trends will show the models agreeing on average over shorter trends too (since the full time series is merely the sum of shorter pieces). That leaves unasked the important question of how much the models have been tuned to reproduce the 20th century trend, and whether the recent divergence is the early signs of a problem or not. (Note that on the question of tuning, this is not even something that all modellers would have to be aware of, so honestly saying “we didn’t do that” does not answer the question…)”

I have not looked at the paper – Do we know what the peer reviewers said?

No. Peer review comments are not made public.

Normally, comments by peer reviewers are confidential. Only the editors and the authors see them unless the authors choose to share them. But a peer reviewer can choose to make themselves known and share the comments I believe.

And in some cases an editor (possibly with the reviewers’ permission) has shared the comments without disclosing the name of the reviewer to show that there was due diligence.

I have found the peer review process to be very uneven. Many senior people just skim the paper. If the paper is controversial, it is likely to get a more careful review. But in my experience, attempts to replicate the work are rare. I do very few reviews these days because I have higher standards than editors and get tired of seeing inferior or mediocre papers published.

I share your concern. I used to act as a peer reviewer for a couple of high impact factor analytical journals and I stopped for the very same reason. It saddens me to see the decline in Peer Review standards especially as it coincides with the raising of Peer Review on to an unjustifiable pedestal.

Hi David,

If substantive peer review comments (not corrections of typos!) were published along with papers, then I suspect there would be more people willing to do solid reviews, and a lot fewer silly papers like M & F published.

Too bad Nic’s analysis didn’t appear before the paper was actually published. An opportunity was missed to have the paper “gergised”.

There was no such opportunity in this case, I think. Nature maintains a tight embargo system and the paper was only published online on 28 January.

Now that it is published, will you be seeking to make a comment in Nature?

Just to be clear, Nic: you spotted the problems wrt this paper (flaws in its methodology) and then asked Roman & Gordon to independently review the stats-only methodology problems?

only trying to figure out if a reviewer would ever be expected to go this deep into a paper & the problems you highlight are so glaring the reviewers should blush !!!

ps – in engineering we now have about 6 votes before parts are good for manufacture, any no vote has to be countered or the “reject for changes as per xx comments” button is pushed🙂

Yes, I found the paper’s results about 62-year trends very difficult to believe, and when I read it I spotted the circularity. It was easy for me to do so because I was familiar with the (very well known) Forster et al (2013) paper from which Marotzke and Forster got their model forcings, and I knew how model forcings had been derived there. Then I asked Roman and Gordon to review my arguments and other aspects of the paper from a statistical angle.

I think that reviewers who were expert in this field should really have realised that the results in the paper were extremely surprising and, in view of that, delved deeper than they might normally be expected to do. But reviewing is an unpaid role with no kudos earned, so it is probably unrealistic to expect too much of it. The fact that a paper has been peer reviewed doesn’t count for much IMO. Papers that go against the ruling paradigm tend to get tougher peer review, so the poor ones are more likely to get weeded out.

“The fact that a paper has been peer reviewed doesn’t count for much IMO.”

+1

I didn’t know that Roman was at UNB. I seem to remember taking a stats course from a curly haired blond hippy looking dude back around 84. Maybe it was before his time there?

I arrived there in 1976 so it could very well have been me.

Were you the guy who always sat in the back of the class and never paid attention to what I was saying?

Bingo! How’d you guess? I’ll have to dig out my transcript to see if it was you and what my grade was.

AJ, have you thought about comparing your transcript copies of your grades to your school’s current official grade records to be sure your alma mater hasn’t adjusted your scores either up or down in the years since you graduated?

BB, according to Wikipedia, Canadian universities have experienced grade inflation comparable to those in the U.S. I don’t see any reason why my school wouldn’t be affected by the same influences either. Luckily, GPA’s influence on future career prospects has a fairly short half-life.

“Were you the guy who always sat in the back of the class and never paid attention to what I was saying?”

– I think that was the Marotzke fella.

Marotzke was the dude folding paper rockets, lighting the tails, and flighting them down the banked lecture theater. LOL we had a bloke that did this (back in ’75) and our maths lecturer abandoned the class for 2 weeks.

Roman, the class I think I might have taken from you was “STAT3083 – Prob and Math Stat I” Fall 84. A few things stand out in my memory. When demonstrating the Birthday Problem, there was a match with the first person asked. The match was in the adjoining seat. I also remember some infinity arithmetic that blew my little undergrad mind away. Maybe a demonstration that 0.99999… = 1.0. I also remember the gal that sat next to me. I had a dirty liking for her which probably explains why I actually attended class that semester.

Yes, that would have been my course. When covering combinatorics, I always did the birthday problem in class by having people state consecutively their birth day and month. Others would then respond upon hearing their own date mentioned. It made the students more involved so I could occasionally sneak some math and stat in before they were aware I was doing so.

I remember you being one of the better profs I had. You kept the subject matter interesting. Coming from someone with attention “difficulties”, that’s a compliment. I got a good mark, so you must have given easy exams🙂

Cheers, AJ

http://berkeleyearth.org/graphics/model-performance-against-berkeley-earth-data-set#gcm-acceleration

another piece

“even as co-author (a position that perhaps arose through him supplying model forcing data to Marotzke) and therefore not bearing primary responsibility for the paper’s shortcomings.”

I strongly disagree with this. All authors are equal, no matter what order they are listed in. They all take credit and blame for the contents of the paper in equal portions.

We all know that in most papers with many authors, one or a few of them are the main drivers, and some of the authors might barely know what the paper is about. But upon publication they officially become equally responsible. There is no hierarchy.

Well, in lots of high status journals like Nature the author contributions are listed and there is a hierarchy. One person may only be on a paper because they provided some technical info….doesn’t mean they should be equally responsible for any flaws.

Or… the contribution of the “principal” author may simply be that they supervise the actual author.

Seems like open access journals (like CPD and other EGU journals) where peer review is open are the way forward and should stop all arguments like this before the paper is published.

This circularity seems like an easy thing for a (somewhat) careless peer reviewer to miss. He just has to not track down how the previous paper Forster 2013 calculated its values for the forcings.

Maybe the reviewer was careless. But it’s not reasonable to expect reviewers to do a full audit of a paper….it takes too long and there are only so many hours in a day. It’s not reasonable IMO for a reviewer to dig out old papers and redo the calculations. All the reviewer has to focus on is whether: the paper is well written; appears sound; is replicable; whether conclusions follow on from the results etc.

That is true. But in this case both the abstract and conclusions contain a logical flaw that should have been a huge red flag, as pointed out upthread. How on earth can a model’s major emergent structural properties NOT influence its outputs? Circumstantial evidence of editorial bias and pal review.

OK. Point taken. I haven’t read it myself yet!

Agreed, the authors and the reviewers should have woken up that something was wrong as soon as their regression coefficients showed the inputs to the models were not determining the outputs. In that case of course random variability would dominate, because it would mean that the models are basically generating random noise regardless of their inputs. Which may of course be true, but it would be devastating for climate science and it would mean that climate models are no better than dice or a coin toss at predicting climate. Maybe that is why they didn’t catch the error. It came as no surprise that the model’s inputs were not determining the outputs, so they didn’t see the error.

It seems rather difficult to determine whether a result is replicable without running through the calculations.

In any case in the observational sciences replication is often not possible. You can hardly turn down a paper on e. g. Shoemaker-Levy’s collision with Jupiter on the grounds that the observations aren’t replicable.

“But it’s not reasonable to expect reviewers to do a full audit of a paper….it takes too long and there are only so many hours in a day. It’s not reasonable IMO for a reviewer to dig out old papers and redo the calculations.”

Perhaps, but if the paper came to a different conclusion, such as “the models are wrong”, I’ll bet they would have had lots of time, and plenty of hours in the day, to do a full audit of the paper. I only say that because it has played out so often before.

Any paper that is contrary to a dominant paradigm will get a very close look, while fetid papers that support the paradigm will get their typos fixed during review. It is not just in climate science… but in climate science, as in any field with serious real-world policy implications, the effect is likely worse.

A superficial looking-over could easily miss the circularity. More than once, after hours of analysis of n equations with n unknowns, I’ve discovered that combining two of the relationships gave:

Z = Z,

a relationship that, while reassuring in a post-Normal way, was of about as much utility as Marotzke & Forster.
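The “Z = Z” outcome is what linear algebra calls a rank-deficient system: one relationship is a combination of the others, so n equations carry only n - 1 independent pieces of information. A minimal sketch, assuming numpy and a made-up 3×3 coefficient matrix whose third row is the sum of the first two, shows how a rank check catches this before any hand algebra:

```python
import numpy as np

# Hypothetical system: each row holds the coefficients of (x, y, z).
# The third equation is the sum of the first two, so it adds no information.
A = np.array([
    [1.0, 2.0, -1.0],
    [0.0, 1.0,  3.0],
    [1.0, 3.0,  2.0],  # = row 0 + row 1
])

n_equations = A.shape[0]
rank = np.linalg.matrix_rank(A)
print(f"{n_equations} equations, but only {rank} are independent")

# Combining the dependent row with the other two cancels every coefficient,
# the linear-algebra analogue of ending up with "Z = Z".
combo = A[0] + A[1] - A[2]
print(combo)  # all zeros
```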

Don’t all equations have this problem:

E = mc²

substituting E for mc² gives:

E=E

In your case, maybe the expressions weren’t fully simplified?

AJ, Nope. Erroneous algebraic substitution. You can add, subtract, divide, multiply… Anything to both sides at the same time.

But you cannot just substitute one side for the other. Operators have to work on both sides of the equation simultaneously. Al-Khwarizmi’s logic from long ago. (The title of his treatise, al-jabr, eventually gave English the word algebra. google)

Follow established mathematical rules, and your post would produce

mc² = E. Not nearly as revolutionary as this bogus circular paper.

I’ll give you a thumbs up on this Rud. I’ll confess I didn’t give it much thought… thanks

Energy equals mass times the speed of light squared. How can they possibly be the same? There aren’t even very many letters in common and they don’t sound the least bit equal.

=======================

Rud,

“AJ, Nope. Erroneous algebraic substitution. … But you cannot just substitute one side for the other.”

Taking that as a general statement, I don’t agree. Quite the contrary.

Take the simple example of a system of 3 linear equations in 3 unknowns (*x*, *y* and *z*, say). Forget any of that row-reduced echelon-form juggling and just do it as it comes: take the first equation in which *z* appears and jiggle it around to get *z* on the LHS. Then substitute the resulting RHS for *z* wherever *z* appears in the 2 other equations. You’re now down to 2 linear equations in 2 unknowns, and on you go.

The point here is that you made a *substitution*, a wholly legitimate one. Indeed, anything on one side of an equals sign can *always* be *substituted* for any occurrence of the other side, wherever it appears; otherwise there’s pretty much no point in the notion of equality and no point in bothering to have such a thing as an equals sign. Of course, whether such a substitution is useful is an entirely different matter.

Obviously, I had N-1 equations, not N, as I thought.
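The substitution route described in the reply to Rud above (rearrange an equation for z, substitute into the other two, repeat) can be sketched in a few lines of Python. This is an illustrative sketch only: `solve_by_substitution` is a hypothetical helper, it uses exact `Fraction` arithmetic to avoid rounding, and it assumes the system has a unique solution.

```python
from fractions import Fraction


def solve_by_substitution(equations):
    """Solve a square linear system by repeated substitution.

    Each equation is a (coefficients, rhs) pair meaning sum(c_j * v_j) = rhs.
    Assumes the system is non-singular (has a unique solution).
    """
    eqs = [([Fraction(c) for c in coeffs], Fraction(rhs)) for coeffs, rhs in equations]
    n = len(eqs)
    unused = list(range(n))   # equations not yet rearranged
    eliminated = []           # (variable index, (coeffs, constant)) in elimination order

    for var in reversed(range(n)):  # z first, then y, then x
        # Take the first unused equation in which this variable appears.
        k = next(i for i in unused if eqs[i][0][var] != 0)
        unused.remove(k)
        coeffs, rhs = eqs[k]
        pivot = coeffs[var]
        # Rearrange to: var = constant + sum(expr_coeffs[j] * v_j) over j != var
        expr_coeffs = [Fraction(0) if j == var else -c / pivot for j, c in enumerate(coeffs)]
        constant = rhs / pivot
        eliminated.append((var, (expr_coeffs, constant)))
        # Substitute that expression for var wherever var appears in the other equations.
        for i in unused:
            c_i, r_i = eqs[i]
            factor = c_i[var]
            new_coeffs = [Fraction(0) if j == var else c_i[j] + factor * expr_coeffs[j]
                          for j in range(n)]
            eqs[i] = (new_coeffs, r_i - factor * constant)

    # Back-substitute, starting from the last variable eliminated.
    values = [Fraction(0)] * n
    for var, (expr_coeffs, constant) in reversed(eliminated):
        values[var] = constant + sum(expr_coeffs[j] * values[j] for j in range(n) if j != var)
    return values


# x + y + z = 6,  2y + 5z = -4,  2x + 5y - z = 27
system = [([1, 1, 1], 6), ([0, 2, 5], -4), ([2, 5, -1], 27)]
solution = solve_by_substitution(system)
print([int(v) for v in solution])  # → [5, 3, -2]
```

Every step is a plain substitution of one side of an equation for the other, which is the point being made: substitution is legitimate, it just is not always useful.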

Jorge, it is well established that: ZZZ = Fail.

Peer review opinions are exactly that: opinions. The ultimate decision lies with the editor, who can choose to publish despite a reviewer’s strong criticisms.

I’m in that situation now. I’m reviewing a paper that I will recommend not be published, but the editor knows I have a jaundiced view concerning this piece of research (cause I warned him ahead of time) and might publish anyway.

As you mention, the runs of each model are not independent of each other. If you use the ensemble mean of runs of each model, you have 18 models (data points) and 4 Beta terms, which seems very iffy to me inference-wise.

Yes. It doesn’t necessarily help much using all the individual runs since the differences between each run and the run-ensemble mean for a model will not carry any extra information about the beta terms. And the regression will be weighted towards the models with multiple runs (some models have 10 runs, some only 1 run, some an in between number).
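To see the weighting point concretely: in ordinary least squares each observation counts equally, so if every run is entered as its own data point, a model’s share of the data (and, roughly, of its influence on the fitted betas, which also depends on leverage) scales with its number of runs. A minimal sketch with hypothetical run counts, not the actual CMIP5 ensemble:

```python
# Hypothetical run counts per model; illustrative only, not the actual CMIP5 ensemble.
runs_per_model = {"model_A": 10, "model_B": 1, "model_C": 3, "model_D": 1}

total_runs = sum(runs_per_model.values())
n_models = len(runs_per_model)

# Share of the data each model contributes under the two set-ups.
pooled_share = {m: n / total_runs for m, n in runs_per_model.items()}  # every run is a data point
mean_share = {m: 1 / n_models for m in runs_per_model}                 # one run-ensemble mean per model

for m in runs_per_model:
    print(f"{m}: pooled {pooled_share[m]:.2f}, ensemble-mean {mean_share[m]:.2f}")
```

Averaging each model’s runs first gives every model equal weight, at the cost of fewer data points.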

If I do an experiment on different groups’ ability to hit a bullseye, but I get repeated trials from only 18 participants, I don’t have 75 data points. There are statistical tests to handle this.

To clarify my point, if you do a regression with 4 parameters and 18 data points (18 ensemble means), the confidence intervals are going to be pretty wide, so hard to “prove” anything.
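The width of those confidence intervals can be illustrated with a small simulation. This is a sketch on made-up data, assuming numpy: 18 points and 4 coefficients leave 14 residual degrees of freedom, and the coefficient names `b0` to `b3`, the true values and the noise level are all hypothetical. The critical value 2.145 is the standard two-sided 95% Student t value for 14 degrees of freedom and would need changing for any other sample size.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_points = 18   # e.g. one run-ensemble mean per model
n_params = 4    # intercept plus three slope terms

# Simulated (hypothetical) regressors and response; not the paper's data.
X = np.column_stack([np.ones(n_points), rng.normal(size=(n_points, n_params - 1))])
true_beta = np.array([0.0, 0.5, 0.2, -0.3])
y = X @ true_beta + rng.normal(scale=1.0, size=n_points)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
df = n_points - n_params                     # residual degrees of freedom
sigma2 = residuals @ residuals / df          # unbiased error-variance estimate
cov_beta = sigma2 * np.linalg.inv(X.T @ X)   # OLS coefficient covariance
se = np.sqrt(np.diag(cov_beta))

T_CRIT_14 = 2.145  # two-sided 95% Student t critical value, 14 df only
half_width = T_CRIT_14 * se

for name, b, hw in zip(["b0", "b1", "b2", "b3"], beta_hat, half_width):
    print(f"{name}: {b:+.2f} ± {hw:.2f}")
```

Rerunning with different seeds shows how much the estimates move around when only 14 degrees of freedom are left.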

You can get the paper here.

Nick Stokes, thank you for the link to the full paper.

That deadpan comment from Nick Stokes is possibly the most damning criticism of the M&F paper one could imagine.

“That deadpan comment from Nick Stokes is possibly the most damning criticism of the M&F paper one could imagine.”

Huh? All Nick did was link to a free copy. ???

Um, that’s the point. If there was an argument to be made Nick S would have tried.

Looks like nicky racehorse is pleading nolo contendere on this one.

Despite Nic’s courteous review, this looks a bit like supply/demand to me. I am sorry for the cookie-cutter remark but the technical sophistication of the regression with the surprisingly inexplicable result…

Circular in a spiraling swirling flushing sort of way.

Regression dilution by Charybditic Bay.

============

Nice breakdown on the paper. I’ll have to go through this later when I have more time.

I wrote a short article on the incompetent use of OLS in climatology (and elsewhere). Much of the misattribution and spurious “forcings” is due to a basic misunderstanding of how and when to use linear regression.

https://climategrog.wordpress.com/2014/03/08/on-inappropriate-use-of-ols/

Nic touches on some of these issues here, but the biggest one is probably regression dilution.
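Regression dilution (attenuation bias) arises when the explanatory variable in an OLS fit is measured with noise: the fitted slope is biased towards zero by roughly the factor var(x) / (var(x) + var(noise)). A minimal simulation sketch, assuming numpy and classical measurement error that is independent of everything else (all values are made up):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000
true_slope = 1.0

x_true = rng.normal(scale=1.0, size=n)                 # error-free explanatory variable
y = true_slope * x_true + rng.normal(scale=0.2, size=n)
x_obs = x_true + rng.normal(scale=1.0, size=n)         # same variable measured with noise

slope_clean = np.polyfit(x_true, y, 1)[0]
slope_noisy = np.polyfit(x_obs, y, 1)[0]

# Classical errors-in-variables: E[slope] ≈ true_slope * var(x) / (var(x) + var(noise))
attenuation = 1.0 / (1.0 + 1.0)

print(f"OLS slope, clean x: {slope_clean:.3f}")
print(f"OLS slope, noisy x: {slope_noisy:.3f} (expected about {true_slope * attenuation:.3f})")
```

Here the noise variance equals the signal variance, so OLS recovers only about half the true slope.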

When we consider whether something is ‘right’ or ‘wrong’, we first need to define what we actually mean by ‘right’.

In science this should be straightforward, as we have ideas such as empirical testing, peer review, etc.

However, in practice it’s not, sometimes because we are dealing with theories where there is no clear ‘right’ answer.

In this case we do have an opportunity to have a ‘right answer’, but we failed to achieve it because?

Well, because in this case the ‘right answer’ had little to do with the facts and much to do with the ‘impact’ this paper had on the AGW community and, more importantly, the ‘media’. A classic case of science by press release: the paper was ‘right’ in that, for the authors, it achieved what they wanted it to do; that its facts were ‘wrong’ makes no difference to that. And within climate ‘science’ we have seen this time and again: far from being a problem for the authors, coming up with the ‘right answer’, no matter the method, has often been rewarding.

The massive expansion of climate ‘science’ as an area of study, thanks to a mixture of lots of money and its ‘progressive politics’, means there are a lot of people coming from studies into professions who have been taught how to be ‘right’ even when they’re wrong. So, if anything, papers such as this will be a growing problem.

knr,

In the climate obsessed community the right answer is always, “we are right”.

Facts, data, methodology are good as long as that answer is the conclusion.

Nic,

Nice post. Your analysis looks devastating to the conclusions in the paper. Have you contacted the authors and asked them to comment?

Yes, I liaised with Piers Forster and sent him a draft of my article for his (and Jochem Marotzke’s) comments nearly 24 hours before posting it. I have received no comments.

Check the comment #2 on this one, and the moderator response:

http://www.skepticalscience.com/climate-climate-models-overestimate-warming-unfounded.html

They got themselves some attitude!

Steve: 🙂 An SKS reader politely wrote:

SKS Moderator JH responded:

There’s another strange comment by Tom Dayton (no. 9) that is surreal considering what he’s responding to. SkS is getting ridiculouser and ridiculouser…🙂

The current theme from the SKS kids is that the models are underestimating the warming, i.e. are way too conservative in their estimates.

joe,

And they call skeptics “deniers”. lol.

That thread trails into the plaintive.

==========

The SkS kids are in full meltdown mode … including lighting up even ‘believers’ posts with incendiary Mod comments.

A number of the usual suspects popping up with blind defense such as ‘it’s the extra special folks at the big timey Nature journal – they’re a stupid ‘ol blog’ … yet nary a single comment, let alone rebuttal, of Nic’s diligent and speedy work.

Well done Nic …

SKS kids are at it again. The only thing missing to make it perfect was the word “independent” before review.

Moderator Response:

~~[JH] Your comment appears to be a thinly-disguised attempt to cast a shadow on the information presented in the OP. If so, please cease and desist playing such a game on this website.~~ Upon further review, this comment is retracted.

Whether this paper stands or falls is of little consequence to AGW really. What really matters is that Nature and its peer review process let through a fatally flawed paper, and that rightly puts the whole question of the sanctity of peer review and editor’s prerogative at a major journal into doubt.

Heads should roll.

Regarding the “thinly-disguised” comment by the SKS moderator; that statement has now been lined out, replaced by this:

“Upon further review, this comment is retracted.”

Maybe the jig is up….

whoops didn’t realize repeat

Nic,

Another excellent catch.

Yet, in a certain sense, the findings of the authors are correct and inevitable i.e. that, based on the use of Forster’s abstracted forcings, the temperature gain in the models is not dependent on feedback and ocean heat uptake, provided that the values are also taken from the same source.

There should be a giant bell ringing here for Forster, quite apart from the glaring problem with this paper. The circularity in this argument does not start with this paper. Since Gregory and Forster 2008, an entire edifice of Escherian stairwells has been built, founded on the same illusions. It starts with the unnecessary and demonstrably inapplicable use of a degenerative ocean model (the “kappa model”) to analyse GCM results. It continues with the demonstrably inapplicable assumption of an invariant feedback in the GCMs, an assumption absolutely rebuffed by the GCM data themselves. It continues with the simultaneous abstraction of Adjusted Forcing (AF) values and feedback values from the inapplicable model, having (only) the properties that (a) in combination they will track the late-time temperature behaviour to the given ECS of the GCM and (b) unknown forcings can be estimated as an approximately scalable function of temperature. It is readily shown that the estimated feedbacks are unrelated to the “true” feedbacks apparent in the GCMs, since the shorter-term feedbacks (up to several decades) are eliminated arithmetically by a mechanical reduction of the actual forcing; and the resulting AF values are then so disconnected from the emulated GCM’s reality that the forcings cannot be related to verification of that model’s RTE against LBL code, nor indeed to any independent estimate of forcing.

To give an example, under this Escherian architecture, HadGEM2-ES ends up with an AF value of 2.9 W/m² for a doubling of CO2, against an estimated stratospheric adjusted forcing of over 4.0 W/m²; the derived AF value in 2003 for the historical runs is then 0.8 W/m², less than half of the average of the AF values abstracted from the other models, but this is the value required to match the HadGEM2-ES temperature evolution.

This amounts to taking a poorly qualified emulation model, plugging in physically meaningless values, and then scaling the historical forcing values to produce some sort of a match to the GCM results. Marotzke et al’s results should therefore not surprise us, but it was still a very nice catch.

Forster & Gregory 2006 addresses the regression dilution issue (which leads to an exaggerated climate sensitivity) in the appendix but avoids mentioning it in the body of the paper and the conclusion:

http://www.image.ucar.edu/idag/Papers/Forster_sensitivity.pdf

They explain this as basically not wanting to distract attention from the main point of the paper by rocking the boat too much. It may now be long overdue for this boat to be rocked.

http://judithcurry.com/2015/02/06/on-determination-of-tropical-feedbacks/

Paul, I recall a very useful exchange with you over at Lucia’s blog a year or two back. I would appreciate someone of your background criticising the above linked article of mine that Judith Curry has just posted.

regards, Greg Goodman.

Paul, Thanks for your insightful comment. I think you are right that the findings in the paper are, at least in large part, inevitable.

I agree that the kappa model is physically unsatisfactory, although it does appear reasonably to represent heat uptake behaviour in many AOGCMs over periods of up to several decades in idealised CO2 forced simulations.

The assumption made in the paper of time-invariant feedback value α is indeed problematical for many, perhaps the majority, of AOGCMs. Moreover, as you say it leads to Adjusted Forcing values for ERF derived from the product of α and ΔT that may be a considerable way away from ERF estimates derived using more direct techniques, which are very probably more realistic. That may well be part of the reason why my equation (6), to which their simple physical model equations reduce, has no explanatory power.

However, the main point I make in my article is that, even if the assumptions embodied in the equations (1) and (4) that the paper relies upon were valid, the results of the analysis carried out in the paper are invalid because of the circularity involved.
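The circularity can be mimicked with synthetic numbers. A minimal sketch, assuming numpy and assuming, as described above, that each run’s forcing trend is diagnosed from its own temperature trend as dF = alpha*dT + dN. The temperature trends here are drawn independently of alpha, kappa and dN, so any explanatory power dF shows is manufactured by the construction. All values and ranges are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_runs = 2000  # hypothetical model runs

# Hypothetical per-run properties; none of them drive dT below.
alpha = rng.uniform(0.8, 1.8, n_runs)   # feedback parameter, W/m2/K
kappa = rng.uniform(0.5, 1.0, n_runs)   # ocean heat uptake efficiency, W/m2/K
dN = rng.normal(0.1, 0.05, n_runs)      # trend in TOA imbalance, W/m2

# Temperature trends generated independently of alpha, kappa and dN.
dT = rng.normal(0.2, 0.05, n_runs)      # K per period

# Circular step: forcing diagnosed from the run's own temperature trend.
dF = alpha * dT + dN

# Regress dT on [1, dF, alpha, kappa].
X = np.column_stack([np.ones(n_runs), dF, alpha, kappa])
beta_hat, *_ = np.linalg.lstsq(X, dT, rcond=None)
r_squared = 1 - np.sum((dT - X @ beta_hat) ** 2) / np.sum((dT - dT.mean()) ** 2)

print(f"coefficient on dF: {beta_hat[1]:.3f}, R^2: {r_squared:.2f}")
```

The fit reports a clearly positive coefficient on dF and a substantial R², even though dT was generated with no dependence on forcing at all.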

Correct me if I am mistaken, but I believe this system can be estimated using Full Information Maximum Likelihood (FIML), 3 Stage Least Squares, etc. These models will account for the endogeneity associated with dependent variables appearing on the right hand side of the regression equation, creating correlation between the errors and dependent variables.

[1] dT = b0 + b1*dF + b2*alpha + b3*kappa + e

[2] dF = a0*dT + dN

In SAS I would use Proc Model (Syslin if the system is linear) with FIML estimation to get efficient parameter estimates and significance levels.

Can anyone with the data try this and report back? It would be interesting to see the true results once the model is correctly specified.

For a system estimation to work you need enough exogenous variables to identify the endogenous ones. In the system as you’ve written it, dF is given by equation [2] by construction, so there is no additional information in the system to identify a0 through estimation. Substituting [2] in for dF in [1] is therefore equivalent to the 2-equation system. To estimate this as a system and identify a0 empirically you would need at least one other variable that explains some of the variation of dF but that is independent of dT.
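Writing out the substitution with the two equations as stated upthread (with [2] an exact identity, no error term) shows where the trouble lies: dT appears on both sides, and once terms are collected the error e sits inside dT, and hence inside dF. A worked sketch:

$$
\begin{aligned}
dT &= b_0 + b_1\,(a_0\,dT + dN) + b_2\,\alpha + b_3\,\kappa + e \\
(1 - b_1 a_0)\,dT &= b_0 + b_1\,dN + b_2\,\alpha + b_3\,\kappa + e \\
dF &= a_0\,\frac{b_0 + b_1\,dN + b_2\,\alpha + b_3\,\kappa + e}{1 - b_1 a_0} + dN, \qquad b_1 a_0 \neq 1
\end{aligned}
$$

Because e enters dF through dT, dF is correlated with the error term in [1] whenever a_0 ≠ 0. That is the endogeneity that makes plain OLS on [1] inappropriate; system estimation only helps if some variable shifts dF independently of dT.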

Tom in Indy:

“[1] dT = b0 + b1*dF + b2*alpha + b3*kappa + e”

I agree on the use of FIML in Proc Model, but there is a difficulty with the IV and DV as written. As Nic Lewis noted, dF is calculated from dTm, so the model is circular. As I wrote below, I think that Marotzke et al used a misleading notation, and dT is not actually the change in temperature, but a deviation of the particular slope from the mean of all slopes.

It’s possible that my confusion is different from what I think it is. I am waiting to read what corrections I receive.

Matthew Marler

In the paper’s notation, deltaT is the trend in model GMST over the period concerned, whereas the regression model involves the deltaT_primes, the deviations of the particular slope of each model run from the mean of all slopes. It is the explanation for inter-model variation in slopes that is being sought here.

Nic Lewis, thank you. I have been rereading, and both deltaT and deltaF are linear trends.

Is it not peculiar that they compute the mean trend and then compute each trend deviation from the mean? Most regression packages do that automatically if you specify that you want an intercept in the model.
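On the point about intercepts: regressing deviations from the mean without an intercept gives exactly the same slope estimates as regressing the raw values with an intercept (a special case of the Frisch–Waugh–Lovell theorem). A quick check with made-up data, assuming numpy; the three regressors are hypothetical stand-ins for quantities like dF, alpha and kappa:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 50
X = rng.normal(size=(n, 3))  # hypothetical regressors
y = 0.7 + X @ np.array([0.4, -0.2, 0.1]) + rng.normal(scale=0.3, size=n)

# (a) Raw data with an explicit intercept column.
X_with_const = np.column_stack([np.ones(n), X])
slopes_a = np.linalg.lstsq(X_with_const, y, rcond=None)[0][1:]

# (b) Deviations from the means, no intercept.
Xd = X - X.mean(axis=0)
yd = y - y.mean()
slopes_b = np.linalg.lstsq(Xd, yd, rcond=None)[0]

print(np.allclose(slopes_a, slopes_b))  # → True
```

So working with deviations changes nothing in the slope estimates; it only makes explicit that it is the spread between models that is being explained.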

I suggest that a polite letter to the editor of Nature summarizing these objections should be sent.

Whether or not it is accepted for publication is not the issue; rather, it should be done as a matter of public record.

Surely the lesson that should be learnt is that any scientist purporting to obtain information from data should have a reasonably competent knowledge of statistics. Those scientists and others attempting to produce models from the data should have a greater understanding of statistics; the more complex the model the greater the knowledge of statistics required.

In my world if a statistician is not included in the list of authors then it is expected that any statistics in the paper have been reviewed by a competent statistician.

Nic Lewis, who is far more than just a competent statistician, shows the way by having sought the advice of two other statistical experts before publishing his note.

It is interesting to contemplate whether this site would even exist were all Climate Scientists to have a better understanding of statistics.

Can I just clarify something? It seems that all the variables, both dependent and (allegedly) independent, come from model runs and there is no direct use of observational data. The dependent variable for any given time period is the linear trend produced by a particular model run over that time period, and the independent variables are values “diagnosed” from the relevant model runs, i.e. they are estimates of climate sensitivity, forcings, etc. derived from model runs. Then the residual term is essentially that part of the model temperature trend that can’t be accounted for by the other RHS variables. Is this correct?

Correct. It is the intermodel differences that are being investigated, with the same regression coefficients being applied to each model.

Someone help me out here.

With no direct use of or comparison with observational data, how can any of this be used as a public defense of model skill? Even if Marotzke & Forster had gotten the stats right and shown that something other than climate sensitivity is responsible for the model/observational gap, we still have the gap. At best it is a defense of why the models don’t work, but it does nothing to rehabilitate them.

It’s not that climate modelers do not know statistics. It’s that climate modelers do not know how to carry out a physical error analysis.

Every experimental scientist (and engineer) needs to know how to do that, in order to judge the accuracy of a result.

Climate modelers do not; they invariably equate model ensemble variance with accuracy. A more basic mistake is hard to imagine.

They do not understand propagation of error, and do not understand that conformance with an observable is meaningless if the result is not unique.

With all the model parameter uncertainties, no model expectation value is a unique result. I’ve yet to encounter a climate modeler who understands this basic concept of physical science.

Results such as in the Marotzke paper are physically meaningless.

Reblogged this on I Didn't Ask To Be a Blog.

Has Climate-Science scientific media turned into Climate-Science social media where facts and opinions depend on whom one’s friends are?

I see in this site, for the most part, cold, hard analysis backed up by cold, hard numbers and logic, pitted against those whose view of the world is as fanciful as it is cheerful about the inevitability of impending doom brought about by our perceived excesses!

Thank you CA for highlighting, once again, the absurdity of equating populist expectations and the scientific method in a way that suspends any notion of disbelief.

I think you went further than necessary with this neo-scientific ‘study’, Nic. When I read the press release it sounded to me like a schoolboy claiming that he’d got the correct answer even though his required proof work didn’t support it. It’s like being asked to make ten: with luck, choosing any of the 9 ‘right’ answers (1+9, 2+8, 3+7, etc. up to 9+1) doesn’t necessarily reflect the actual one. In this case it’s 9 correct possibilities with just two variables; how many variables are there in a typical climate ‘model’? But they’re all correct because… climate statistics. C

How much can ΔF (the radiative forcing from doubling CO2) vary from year to year in the real world? Models may disagree about the correct value for ΔF, but the absorption cross-section for CO2 itself doesn’t change. Clouds, water vapor and lapse rate have a small effect on the ΔF one calculates, but do their annual average values change as much as M&F postulate?

One might also ask the same question about the climate feedback parameter (α), which tells us how much OLR plus reflected SWR increase with surface warming. Planck feedback is a constant. Water vapor, cloud and lapse rate feedbacks, for their part, may reach equilibrium on a yearly time scale: the average water molecule remains in the atmosphere for only about a week, and temperature anomalies show autocorrelation for months, but not years.

We also have observational evidence about the annual variation in ocean heat uptake efficiency (κ) from ARGO and the climate feedback parameter from CERES and ERBE. Re-analysis data could be used to determine how ΔF varies with time. So M&F’s

Nic, I think you should send this to Nature as a ‘Communication arising’.

http://www.nature.com/nature/authors/gta/commsarising.html

“Critical comments on recent Nature papers may, after peer review, be published online as Brief Communications Arising, usually alongside a reply from the criticized Nature authors.”

Me too.

Me three. Please.

To qualify as a ‘Brief communication arising’, Nature’s criteria include the following: “Manuscripts …. should not exceed 600 words (main text), with an additional 100 words for Methods, if applicable.”

The length of Nic’s post is currently about 3000 words, including about 250 words as footnotes. Might be a challenging précis exercise…perhaps Ross could help.

Re: Coldish (Feb 7 09:00), I’m sure Nic could get his main point across in 600+100 words.

I think so too. Someone needs to.

Have other papers published by Nature been withdrawn? In what other ways has Nature handled the sort of thing we are looking at here?

J ferguson,

It does happen: http://www.iflscience.com/health-and-medicine/controversial-stem-cell-paper-set-be-withdrawn-nature

But I suspect it is quite rare in a ‘high impact’ journal like Nature, and when it happens, usually involves obvious fraud, rather than obvious error. It is one thing for journal editors to have egg on their faces, but far worse to have to publicly admit that egg exists. I very much doubt 1) that the paper will be withdrawn, or 2) that Nature will allow publication of any letter/comment/paper which shows the circularity (and silliness) of the paper’s logic. The only chance I see for withdrawal is if Forster is sufficiently embarrassed by the paper to request that Nature remove his name as an author…. and I don’t see much chance of that happening either. Some people are more easily embarrassed by stupid errors than others. Some just don’t care, because they have ‘bigger fish to fry’.

Nature does retract. See for instance this link:

http://www.nature.com/nature/journal/v505/n7485/full/nature12968.html

It was the groundbreaking discovery that stem cells could be induced from somatic cells by a short treatment with lactic acid. It created a complete frenzy and a firestorm in the medical/cell biological world. It appeared to be all due to a contaminated sample…

In December 2014, Willis posted GMT series generated by 42 CMIP5 models, along with HADCRUT4 series, all obtained from KNMI.

http://wattsupwiththat.com/2014/12/22/cmip5-model-temperature-results-in-excel/

We were able to analyze the temperature estimates of CMIP5 models and compare them with HADCRUT4 (1850 to 2014), as well as UAH (1979 to 2014). The models estimate global mean temperatures (GMT) backwards from 2005 to 1861 and forwards from 2006 to 2101.

Bottom Line:

In the real world, temperatures go up and down. This is also true of HADCRUT4.

In the world of climate models, temperatures only go up. Some variation in rates of warming, but always warming, nonetheless.

The best of the 42 models according to the tests I applied was Series 31. Here it is compared to HADCRUT4, showing warming rates in °C per decade for periods defined by generally accepted change points.

| Period | HADCRUT4 | Series 31 | Series 31 minus HADCRUT4 |
|---|---|---|---|
| 1850-1878 | 0.035 | 0.036 | 0.001 |
| 1878-1915 | -0.052 | -0.011 | 0.041 |
| 1915-1944 | 0.143 | 0.099 | -0.044 |
| 1944-1976 | -0.040 | 0.056 | 0.096 |
| 1976-1998 | 0.194 | 0.098 | -0.096 |
| 1998-2013 | 0.053 | 0.125 | 0.072 |
| 1850-2014 | 0.049 | 0.052 | 0.003 |

In contrast with Series 31, the other 41 models typically match the historical warming rate of 0.05 °C per decade by accelerating warming from 1976 onward and projecting it into the future.

Over the entire time series, the average model has a warming trend of 1.26 °C per century. This compares to the UAH global trend of 1.38 °C per century, measured by satellites since 1979.

However, over the same period as UAH, the average model shows a rate of +2.15 °C per century. Moreover, for the 30 years from 2006 to 2035, the warming rate is projected at 2.28 °C per century. These estimates contrast with the 145 years of history in the models, where the trend is 0.41 °C per century.

Clearly, the CMIP5 models are programmed to warm in the future at more than five times the rate of the past.
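The comment does not say how the per-period rates were computed; an ordinary least-squares slope over each period is the usual choice. A minimal R sketch under that assumption (the helper name `ols_trend` and the synthetic series are illustrative only, not Ron C's actual data or code), which also checks the "more than five times" ratio from the quoted rates:

```r
# Hypothetical helper: OLS slope of temps on years within [start, end],
# scaled to degrees per century (per = 100) or per decade (per = 10).
ols_trend <- function(years, temps, start, end, per = 100) {
  keep <- years >= start & years <= end
  slope <- unname(coef(lm(temps[keep] ~ years[keep]))[2])
  slope * per
}

# Synthetic annual series warming at exactly 0.5 C/century (not real data).
years <- 1861:2014
lin <- 0.005 * (years - 1861)
ols_trend(years, lin, 1861, 2014)            # 0.5 C per century
ols_trend(years, lin, 1861, 2014, per = 10)  # 0.05 C per decade

# The "more than five times" claim is arithmetic on the quoted model-mean rates.
2.28 / 0.41                                  # about 5.56
```

Real monthly model output would first need annual averaging (or a monthly time axis), which this sketch does not attempt.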

Ron C, thanks for drawing attention to Willis’s post. The spreadsheet it linked to for the data didn’t identify which run came from which model. But Willis has very kindly just rechecked for me.

The best series, 31, was the (single) run from the inmcm4 model, as I suspected might be the case. That is the CMIP5 model with the lowest climate sensitivity (ECS), and it has a TCR of 1.3 C, in line with good observational estimates. It comes out top at matching the BEST temperature record as well – see their website.

I happened to read some comments by the blogger Anders, and they were so funny I had to share them. According to him, this post is wrong. His explanation is… remarkable:

A commenter responded by saying circular arguments are circular even if they happen to be right. Anders responded:

Apparently, Nic Lewis would know that using temperature to estimate the effect of forcings, and then using those estimates to estimate the effects forcings have on temperature, is okay if he had talked to climate scientists:

I think Anders has a point. You have to talk to climate scientists. After all, who but climate scientists would accept arguments like these?

Rice will only accept that he was wrong when Nature pulls the research, and maybe not even then. I mean, this fitted so nicely into the argument that the models don’t overestimate; such a waste to throw that away….

I think Anders is perhaps in over his head here. I think he just hasn’t had time to really look at it carefully. I suggested he come here and talk to Nic about it to get it resolved. I suspect that won’t happen.

I do think Nic should publish his critique as a note. That would more likely result in the paper’s authors either defending their work or retracting it.

I am confused by the notation used by Marotzke et al in equation 4. In the text, they seem to describe using the 15-year trend errors as the dependent variable, but in the equation preceding equation 4 the dependent variable is denoted by “delta T prime sub j”. In Equation 4 the dv is denoted “delta T hat sub (reg, j)”. The text in between says “The complete GMST trend is obtained by adding the ensemble mean trend to the regression for the across-ensemble variations:”

It looks to me like Nic Lewis has written a lot about poor notation. Being confused about the notation, I suggest this with more than my usual modesty.

However, Marotzke has definitely used the wrong estimation/testing procedures for what are in fact several autocorrelated and possibly cross-correlated time series.

Just a layman, no background in stats, so someone correct me if I’m wrong… Just trying to wrap my head around the basics.

It seems to me that this critique is only relevant to the 2nd & 3rd sections (“Energy balance and multiple regression” & “Deterministic versus quasi-random spread”)… and leaves the first section (“Observed and simulated 15-year trends”) completely intact.

Am I totally off here? Because it’d mean that the models have, to a layman, been shown to be valid & bias-free, and only the paper’s attempt to explore why they fail on short runs has been trashed.

If I’m completely misunderstanding this, could somebody please explain it, in layman terms? Thx.

One attempt, anyhow: I think that the paper misses the point, even if its statistics were right. https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-750648

Section 1 (“Observed and simulated 15-year trends”) relies on the assumption “that the simulated multimodel-ensemble spread accurately characterizes internal variability” and goes on to say “We now test the validity of this assumption by identifying deterministic and quasi-random causes of ensemble spread”. It is that testing which my article shows to be fatally flawed. In fact, had their parts 2 and 3 results been correct, they would have shown the CMIP5 models to be unphysical rather than valid. As it is, they prove nothing at all.

Part 1 does not show the CMIP5 models to be bias free. It merely shows that over 1900-2012 (and therefore on average for 15-year sub-periods within 1900-2012), they roughly match the historic record, but with rather greater variability of 15-year trends. As matching the 1900-2012 record can be achieved by many different combinations of model forcings, model climate sensitivity and model ocean heat uptake efficiency, and the temperature record was very largely known when the model versions were selected, that does not at all prove that the models are bias free.
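The point that one historical warming figure can be matched by many combinations of forcing, feedback and ocean heat uptake can be illustrated with the simple relation ΔT = ΔF/(α + κ) that comes up later in this thread. A minimal R sketch; the helper name and all parameter values are made up for illustration and are not CMIP5 values:

```r
# Toy energy-balance relation discussed in this thread: dT = dF / (alpha + kappa).
emulated_dT <- function(dF, alpha, kappa) dF / (alpha + kappa)

# Three hypothetical "models" with different forcing and feedback strength.
combos <- data.frame(
  dF    = c(2.0, 2.5, 1.5),   # W/m^2, hypothetical forcing changes
  alpha = c(1.0, 1.5, 0.5),   # W/m^2/K, hypothetical feedback strengths
  kappa = c(1.0, 1.0, 1.0)    # W/m^2/K, hypothetical ocean heat uptake efficiency
)
combos$dT <- with(combos, emulated_dT(dF, alpha, kappa))
print(combos)                 # every row gives the same dT of 1 K
```

Because only the ratio ΔF/(α + κ) is constrained, a stronger forcing paired with a larger α reproduces the same warming as a weaker forcing with a smaller α.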

Nic Lewis wrote:

I think that Nic Lewis’ statement above reflects an issue that concerned me with the M&F result. If their result were correct, then the usefulness of comparing model results to empirical measurements would be lost. Any result could be justified by an appeal to natural variability. This would be a major setback for research into the potentially critical issue of AGW. Am I correct in this interpretation?

I’m trying to check if I get the basic idea here. I brushed off my rusty R skills and tried a simulation:

```
# x = 4a + 3b + 5f + e, where e is the error term
> set.seed(1)
> a <- rnorm(1000, 3, 2)
> b <- rnorm(1000, 4, 3)
> f <- rnorm(1000, 2, 2)
> e <- rnorm(1000, 0, .7)
> x <- 4*a + 3*b + 5*f + e
> plot(x)
> lmm <- lm(x ~ a + b + f)
> summary(lmm)

Call:
lm(formula = x ~ a + b + f)

Residuals:
     Min       1Q   Median       3Q      Max
-2.26987 -0.49973 -0.00857  0.50395  2.14144

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.032828   0.053684  -0.612    0.541
a            4.008051   0.011139 359.818   <2e-16 ***
b            3.003538   0.007383 406.831   <2e-16 ***
f            5.003245   0.011183 447.403   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7278 on 996 degrees of freedom
Multiple R-squared:  0.9981,  Adjusted R-squared:  0.9981
F-statistic: 1.741e+05 on 3 and 996 DF,  p-value: < 2.2e-16

> lmf <- lm(f ~ x)
> summary(lmf)

Call:
lm(formula = f ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-5.6699 -1.0461  0.0670  0.9834  5.3395

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.727846   0.111691  -6.517 1.14e-10 ***
x            0.081310   0.002956  27.511  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.556 on 998 degrees of freedom
Multiple R-squared:  0.4313,  Adjusted R-squared:  0.4307
F-statistic: 756.8 on 1 and 998 DF,  p-value: < 2.2e-16

> f2 <- fitted(lmf)
> lmm2 <- lm(x ~ a + b + f2)
> summary(lmm2)

Call:
lm(formula = x ~ a + b + f2)

Residuals:
       Min         1Q     Median         3Q        Max
-3.249e-14 -1.780e-15  1.360e-16  1.913e-15  6.075e-14

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 8.951e+00  2.622e-16 3.414e+16   <2e-16 ***
a           6.899e-16  7.721e-17 8.935e+00   <2e-16 ***
b           1.089e-15  5.322e-17 2.045e+01   <2e-16 ***
f2          1.230e+01  1.448e-16 8.492e+16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.844e-15 on 996 degrees of freedom
Multiple R-squared:      1,  Adjusted R-squared:      1
F-statistic: 6.251e+33 on 3 and 996 DF,  p-value: < 2.2e-16

# note that f2 has totally swallowed any dependence on a and b
```

Hmm – a lot of things didn’t show up properly. I hope it’s at all comprehensible.

The main parts that didn’t show up:

```r
a=rnorm(1000,3,2)
b=rnorm(1000,4,3)
f=rnorm(1000,2,2)
e=rnorm(1000,0,.7)
x=4*a+3*b+5*f+e
lmm=lm(x ~ a+b+f)

# now let's try deriving f from x
lmf=lm(f~x)
summary(lmf)
f2=fitted(lmf)

# now let's try regression again, this time using f2 instead of f
lmm2=lm(x ~ a+b+f2)
summary(lmm2)
```
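Why f2 swallows everything can be checked directly: f2 is an exact linear function of x, f2 = b0 + b1·x, so x = (f2 - b0)/b1 and the regression on f2 fits perfectly, with coefficient 1/b1 on f2 and intercept -b0/b1. With the estimates in the transcript (b0 = -0.727846, b1 = 0.081310) that gives about 12.30 and 8.95, matching the reported 1.230e+01 and 8.951e+00. A minimal R sketch re-running the same setup; the names `b0` and `b1` are introduced here for illustration:

```r
# Same simulated setup as in the comment above.
set.seed(1)
a <- rnorm(1000, 3, 2)
b <- rnorm(1000, 4, 3)
f <- rnorm(1000, 2, 2)
e <- rnorm(1000, 0, .7)
x <- 4*a + 3*b + 5*f + e

# "Derive" f from x: f2 is an exact affine function of x.
lmf <- lm(f ~ x)
b0 <- unname(coef(lmf)[1])   # intercept of f ~ x
b1 <- unname(coef(lmf)[2])   # slope of f ~ x
f2 <- fitted(lmf)            # f2 = b0 + b1 * x

# Regressing x on f2 recovers x exactly, so a and b get (numerically) zero weight.
lmm2 <- lm(x ~ a + b + f2)
c(predicted_f2 = 1 / b1, predicted_intercept = -b0 / b1)
coef(lmm2)
```

The perfect fit says nothing about how x depends on a, b or f; it only reflects that f2 was built from x.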

There is now apparently a response from M&F at climate lab book. See pingback.

I’ve done a quick read of the post at Climate Lab Book. I don’t get how their article is supposed to rebut Nic’s article. They do not appear to contest Nic’s equation linking F and N – an equation that I did not notice in the original article. Their only defence seems to be that the N series needs to be “corrected”, but they do not face up to the statistical consequences of having T series on both sides.

Based on my re-reading of the two articles, Nic’s equation (6) seems to me to be the only logical exit and Nic’s comments on the implications of (6) the only conclusions that have a chance of meaning anything. (But this is based on cursory reading only.)

Steve

The equation linking F and N is from Forster et al (2013) (so it’s not Nic’s equation, and it’s hardly surprising that Jochem and Piers don’t “contest” it!)

In their Climate Lab Book post, Jochem and Piers say:

and

So they are aware that they rely on some assumptions, but have already checked these out in previous work.

This can of course be tested here in future work – if the CMIP6 models allow F to be obtained more directly, the M&F procedure can be re-done with that.

The halt leading the blind, but it’s the halt that’s blinding.

=============

Richard, all of this is new to me so I’m commenting just in data analysis/statistical terms based on partial understanding. You say:

Maybe that’s the objective, but, viewed in statistical terms, adding a linear function of T to one of the right side components and then regressing T against the right side appears to be precisely the lunacy that Nic described. You just can’t do this. A couple of very competent statisticians have already weighed in on this and, if I’ve understood the setup correctly, Nic and they are right and you and Marotzke-Forster are wrong in terms of meeting the requirements of a regression.

There was nothing in the reply at Lab Book that was responsive to Nic’s criticism. As I read it, they more or less just re-asserted that they were right. But to this third-party reader with specialist statistical knowledge, they look completely out of their depth. Exactly the sort of ad hoc and homemade statistical analysis undertaken for advocacy that has so marred Team paleoclimate.

A further gloss on this article. Once again, I think that the penultimate section provides the most direct analysis. In equation (6), the circularity is removed and a regression can be done and these (very negative) results are the only ones with any conceivable meaning. Marotzke and Forster (and Betts) made no attempt to address the findings in this paragraph, which are very clear.

“As I read it, they more or less just re-asserted that they were right.”

That’s what I took away from the article. I kept reading for the part that says ‘…this is why our argument is not circular” and couldn’t find any.

Inference-making is a chain of logic. When the mind locks on, it becomes hard to see other perspectives. Maybe the authors should try being more explicit, if they think they are correct.

Ron C, thanks for your kind response. Of course HadCRUT4 is also a model anyway... with a fresh initialisation every month, in contrast to CMIP5. And of course the mean is a mean of so many models… some with an ECS below the mean and some above it. I also made a comparison for 1975…2004 only, and the models (Willis sheet…) with a very small difference in trend slope from HadCRUT4 are very suspicious of matching some parameters (only) for this interval… the first candidate is aerosol forcing. It seems to me that this “tuning” makes the big difference during other periods. The less a model is tuned with aerosols for good performance during 1975…2004, the better it is in other periods?

Steve, Thanks for your input, with which I fully agree.

To recap, there are only two relevant CMIP5 model Historical simulation outputs available and used here, T and N. The simple physical model used in the paper’s analysis and to diagnose F in the first place reduce algebraically to ΔT = ΔN / κ which, with an error term added, is my equation (6). Whether or not that equation has any significant explanatory power here (it doesn’t), it does not enable any separate estimate of ΔF to be made that would enable the relationship between ΔT, ΔF and α to be investigated. Jochem Marotzke and Piers Forster do not appear to have realised this when they undertook their analysis, and neither have they addressed the issue now that I have pointed it out.
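The algebraic reduction described above can be written out step by step, assuming the two relations discussed in this thread: forcing diagnosed as ΔF = ΔN + αΔT (Forster et al. 2013) and the emulation ΔT = ΔF/(α + κ), with the error term omitted:

$$
\begin{aligned}
\Delta T &= \frac{\Delta F}{\alpha+\kappa} = \frac{\Delta N + \alpha\,\Delta T}{\alpha+\kappa} \\
(\alpha+\kappa)\,\Delta T &= \Delta N + \alpha\,\Delta T \\
\kappa\,\Delta T &= \Delta N \\
\Delta T &= \frac{\Delta N}{\kappa}
\end{aligned}
$$

The α terms cancel in the second line, which is why, on these two relations alone, the combination does not yield a separate handle on ΔF or α.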

A minor point, but although Jochem and Piers write of “correcting” ΔN for the increased back radiation (α ΔT), that correction term is larger than the ΔN term for most of the 62-year periods they analyse. It might be better to say that forcing is diagnosed from the increased back radiation resulting from the rise in surface temperature it causes, with a correction for changes in the rate of heat absorption by the not-yet-equilibrated climate system (the counterpart of, and equal to, the change in top-of-atmosphere radiative imbalance ΔN).

Nic,

I have been trying to work out what Profs Marotzke and Forster are trying to say, and am having great difficulty. I think that they are trying to make a non-trivial, but entirely erroneous, point in their response.

According to Profs M&F, you are confusing two different entities which are both called temperature. (You aren’t, but I think that this is the basis for their argument rejecting circularity.) The first entity is represented by the actual temperature (anomaly) observed in the given GCM. This temperature anomaly is made up of two components: the first, Tf, is the forced response including temperature-dependent feedbacks, and the second, Tnv, is the surface temperature variation caused by “natural variability” in the GCM. By assumption in Forster’s energy balance model, the restorative flux responds linearly to surface temperature change; the model does not care what caused that temperature change. Hence, the restorative flux is represented as a simple linear function of *both* components of this observed temperature. Hence, in order to estimate the forcing from the net flux time series, it is necessary to adjust the net flux using the total actual temperature change observed in the GCM. So the derived adjusted forcing (AF) value is given by:

F(t) = N(t) + α*T(t)

= N(t) + α * (Tf(t) + Tnv(t))

where F, N, Tf and Tnv all denote changes in value from some initial theoretical steady state, T(t) = Tf(t) + Tnv(t), and the time series N and T come directly from the GCM results. (None of this will come as any surprise to you, but their response seems to imply that it should.)

Now Profs M&F want to separate out the forced change in temperature plus associated feedbacks, from the “natural variability” change in temperature in the same GCM. The model they use to do this assumes that

ΔTf = ΔF/(α + κ)

I think that they are arguing that the ΔTf is not the same animal as T(t) above, since the natural variation component is now excluded. If we substitute the (exact) expression for the derived ΔF, we obtain:-

ΔTf = [N(t) + α*T(t)]/(α + κ)

Hence, since the temperatures on the LHS and RHS are different animals, they reject your argument of circularity. Bingo.

In reality, the problem has not gone away at all, since the actual regression itself is not against ΔTf, but against a mean shifted T(t). I don’t think that they have thought this through.

On a different point, there is substantial error in the emulation model in terms of its ability to match GCM temperature results, since it relies on (a) the assumption of an infinite ocean and constant flux per degree of temperature change, (b) a linearly changing forcing with time and (c) in this instance a zero intercept in a plot of N vs T, which implies a zero surface layer capacity. None of these assumptions are perfectly met, and the “model error” is especially substantial over the shorter 15-year periods. All of this model error is dumped into the regression error term and ends up being dubbed natural variation.

On a third point, I do find the entire M&F paper ironically amusing. In summary it seems to be:- The AOGCMs have done a cr*p job of matching variation over 15 year periods, so why should you expect them to get the last 15 years right? It is a pity that their methodology is flawed. If it wasn’t I would love to see it applied to 31 year periods instead of the 62 years they adopted. The latter conveniently eliminates the quasi-60 year oscillations from the picture. I suspect that if the M&F logic were applied to 31 year periods, we would find that the models have also done a cr*p job at matching variation over this period. (smiley)

Paul_K,

Nice summary. I think you are correct about the impact of the 62-year period they use, since 62 years is close to the apparent period of ‘oscillation’ in the instrumental record. As to the motivation behind the paper, I think it is very clear that this is but one of many recent papers that offer ‘explanations’ for the post-2000 divergence between modeled and measured response to warming. Some of the ‘explanations’ are plausible, some, like this one, risible. One might suggest that the blizzard of papers along these lines is an effort to… err… paper over the obvious divergence.

Paul_K,

sorry, that should have been “measured and modeled response to forging”.

‘forcing’, not ‘forging’….

Piers Forster comments at Climate Lab Book:

Paul,

Thanks. You may be right as to what M&F are arguing; I’m having some difficulty telling what exactly they mean. In any case, as you say (and I knew), it is the same T that they use in both equations, and the circularity does not go away no matter how much they protest that there is none involved.

I agree that “model error” (of their simple physical model used to emulate the GCMs, particularly as linearised into their regression model) is a major component of the regression residuals here.

But surely if the same variable dT appears on both sides of the expression, then any regression analysis is automatically meaningless.

In my comment above, I mentioned an analysis of CMIP5 temperature series. In presenting the CMIP5 dataset, Willis raised a question about which of the 42 models could be the best one. I put the issue this way: Does one of the CMIP5 models reproduce the temperature history convincingly enough that its projections should be taken seriously?

To reiterate, the models generate estimates of monthly global mean temperatures in kelvin backwards to 1861 and forwards to 2101, a period of 240 years. This comprises 145 years of history to 2005, and 95 years of projections from 2006 onwards.

I identified the models that produced a historical trend of nearly 0.5 K/century over the 145-year period, and those whose trend from 1861 to 2014 was in the same range. Then I looked to see which of that subset could match the UAH trend from 1979 to 2014.

Out of these comparisons I am impressed most by the model producing Series 31, which Willis confirms is output from the inmcm4 model.

It shows warming 0.52K/century from 1861 to 2014, with a plateau from 2006 to 2014, and 0.91K/century from 1979-2014. It projects 1.0K/century from 2006 to 2035 and 1.35K/century from now to 2101.

Note that this model closely matches HADCrut4 over 60-year periods, but deviates from it over 30-year periods. That is, shorter periods of warming in HADCrut4 run less warm in the model, and shorter periods of cooling in HADCrut4 run flat or slightly warming in the model. Over 60 years the differences offset.

IMO Piers Forster’s comment at CLB seems to be justifying the circularity by treating the model response to forcing as tautological. Just an extension of the same circular logic.

Ron C, Nic: It’s interesting to look at the model mean of the “Willis sheet” and see its relative failure compared with HadCRUT4. (Not) surprisingly, the trend failure of the mean for 1975…2004 is only about 1% (!), while the failure for 1979…2013 (the satellite period) is 37%, and for 1998…2013 it’s 196%. This result could mean that the mean is matched to the period 1975…2004 (see Mauritsen http://onlinelibrary.wiley.com/doi/10.1029/2012MS000154/full) and fails dramatically during other periods. If the M/F conclusions are correct, this would mean that during 1975…2004 there was NO internal variability in the climate system, because the models are nearly 100% on track, while in all other intervals we saw much greater internal variability?? This seems to me not very likely. Just another thought: over at Climate Lab Book, M/F claim that the method for splitting dT from forcings and dT from internal variability is robust, and that’s why there is no circularity. This is the main point of the discussion and an essential basis of the paper. Anyway, they didn’t give a comprehensible justification in the paper. It should have been an essential core, and not a matter for discussion on blogs AFTER the release of the paper in “Nature” with so much PR. So IMO the paper is very, very questionable.

Frank, your analysis is interesting, and seems to support your conclusions (which fit the category “Suspicions Confirmed”).

Please help me understand the mean failure rates. Do these cover all 42 series? Are you comparing slopes? How is mean failure defined and calculated?

Thanks.

Ron C., for the first approach I calculated 30-year running trends from 1880 on for both HadCRUT4 and the model mean (from CE). The difference in trend slopes over time is shown here: http://kauls.selfhost.bz:9001/uploads/trenddelta.png with the upper and lower 1 sigma. It’s very strange that the difference is near zero for the trends ending in 1995…2005, when noise due to internal variability should be at work… Look at the trends ending in 1905-1915: the difference is greater than 2 sigma.
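The running-trend comparison described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions: annual, contiguous years, OLS slopes over 30-year windows, and synthetic stand-ins for HadCRUT4 and the model mean (the function name `running_trends` and both series are hypothetical; Frank's sigma bands are not reproduced):

```r
# Hypothetical sketch: 30-year running OLS trends (degrees per decade) for an
# annual series with contiguous years; each trend is named by its start year.
running_trends <- function(years, temps, window = 30, per = 10) {
  starts <- years[1]:(years[length(years)] - window + 1)
  trends <- sapply(starts, function(s) {
    keep <- years >= s & years < s + window
    unname(coef(lm(temps[keep] ~ years[keep]))[2]) * per
  })
  setNames(trends, starts)
}

years <- 1880:2013
# Synthetic stand-ins, not real data:
obs   <- 0.005 * (years - 1880) + 0.1 * sin(2 * pi * (years - 1880) / 60)  # "observations"
model <- 0.007 * (years - 1880)                                             # "model mean"

delta <- running_trends(years, model) - running_trends(years, obs)
head(delta)   # model-minus-observed trend difference for windows starting 1880, 1881, ...
```

With a 60-year oscillation in the stand-in observations, the difference swings sign across windows even though the underlying trends are fixed, which is the kind of pattern the plot is meant to expose.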

Frank, thanks for that. So the %s are extent of variance of the mean from the HADCrut4 slope for each period. This does provide a measure of how the set of models compare to HADCrut4 estimates. I hasten to add that, of course, GMT is a statistical construct, and not a physical reality that can be measured. Thus, HADCrut4 is also an estimate, albeit starting with surface thermometers rather than model parameters.

Of course, the mean of the models includes many individual model variances both + and – which can offset, giving a misleading impression of accuracy. That is a major argument why the ensemble mean is a bad indicator, combining as it does many deviations. One good model is much better than averaging an ensemble with so many deficient models.

Paul_K says:

This seems to be an artificial distinction. How can one kind of T cause a difference in diffusion while another “kind” of T does not?!

The factor α/(α + κ) is spurious.

Temperatures do not wear a little yellow star to show their ethnic origins.

I had an interesting colloquy with anders at ATTP… he is apparently incapable of understanding that regressing a variable on itself isn’t particularly enlightening…and is also unwilling to describe his own training and background in statistical analysis.

snip

David,

His background and CV are here: http://www.roe.ac.uk/%7Ewkmr/

He is an astronomer who works on the mechanics of planetary formation from accretion discs. Name: Ken Rice. After his undergrad work he was employed by the South African Environmental agency, and made several trips to Antarctica. He finished his PhD in astronomy in 1998, IIRC. Short of getting a list of courses he has taken, there is no way to judge what specific statistical training he may have had, if any. My impression is that he does not understand (or perhaps doesn’t care) that regressing a variable against itself is … ahem…. ‘uninformative’.

But is anders the same person as ATTP?

Bill,

Yes. They are one and the same. Anders is apparently a shortened version of andThenTheresPhysics (aTTP).

David,

Or, maybe, I disagreed with your assertion that Marotzke & Forster are actually regressing a variable on itself. I guess, however, that accusing me of being dishonest and disingenuous is much simpler than considering that possibility.

And if you were annoyed by how I responded to your comments, maybe don’t start with a demand that I answer your question and maybe don’t be quite so condescending yourself. Also, if I’m annoyed with someone, I still try not to go around calling them a liar, but maybe that’s just me.

Steve: I agree that such accusations should be avoided. I’ve removed the language.

snip

I note that you still have not answered the question, now put to you for the fourth time:

“what is your training and background in statistics?”

Steve: this is now a foodfight and this is last bite.

David,

Seriously, you think I’m interested in answering your question? Also, how did I mischaracterise your question?

I’ll explain something to you. You demanded I answer a question on my blog. I don’t need to answer your question on my blog. I don’t even need to answer it here. Of course, Steve could insist that I do, but I still don’t have to. This is not a complicated concept. Additionally, my interest in engaging with someone who has called me a liar is normally limited to one snarky response (this one) and then ignoring because anyone who thought doing otherwise would be constructive is a fool.

Steve: this is now a food fight where you seem merely petulant. My usual practice would be to snip both responses, but I’ve left one extra bite for both of you.

David, it seems the lack of a background in statistics is not a hindrance in the practice of climate science. It actually helps to be naive when it is often necessary to make up novel statistical approaches to get the right answer. The Nature reviewers and editors know this. They are smart.

Steve,

There’s no need to avoid snipping these on my behalf. I have no great interest in these discussions.

Steve: I give a longer leash to critics and let this go on so you could have a last word. But my comments were intended to draw a line.

Steve, thanks.

Since I’m commenting here, I think that if you want to argue that this analysis is circular, you’re essentially suggesting that climate models do not conserve energy. Consider a climate model that is known to have a climate sensitivity of alpha. Consider that it starts in equilibrium and that you apply a change in forcing of dF. If the temperature response is dT, then the TOA flux has to be (by energy conservation)

dN = dF – alpha dT.

Unless these models don’t conserve energy, the above is true.

Now, dF is an external forcing and so does not depend on dT by definition. However, you can still rewrite the above as

dF = alpha dT + dN

Since dF does not depend on dT, the quantity alpha dT + dN also does not depend on dT. Any change in dT is compensated for by a corresponding change in dN (i.e., if the surface temperature goes up without a change in dF, then dN goes down, and vice versa).

Therefore if you use the output from climate models (dT, dN, and alpha) to determine the forcing timeseries, dF, it is independent of dT as long as the model conserves energy. Of course, climate models are not perfect and don’t conserve energy exactly, but that doesn’t really change that dF is not explicitly dependent on dT.

Therefore, I would argue that this analysis is not circular. Just because the temperatures are used to determine the external forcings does not mean that the external forcings depend on temperature.
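The energy-conservation argument above can be checked numerically under its own assumption. A minimal R sketch with made-up series and parameter values: if the TOA imbalance obeys dN = dF - α·dT exactly, the diagnosed dN + α·dT returns dF and the random part of dT cancels. The second part (the name `alpha_used` is hypothetical) shows what the same diagnosis gives if the α used differs from the one actually governing dN:

```r
# Toy check of dF = dN + alpha * dT, ASSUMING dN = dF - alpha * dT holds exactly.
# All series and parameter values are synthetic.
set.seed(42)
n     <- 100
alpha <- 1.2                               # hypothetical feedback parameter, W/m^2/K
dF    <- seq(0, 2, length.out = n)         # hypothetical external forcing, W/m^2
dT    <- dF / 2 + rnorm(n, sd = 0.1)       # forced response plus random "internal variability"
dN    <- dF - alpha * dT                   # exact energy balance (the assumption)

dF_diag <- dN + alpha * dT                 # forcing diagnosed from model outputs
max(abs(dF_diag - dF))                     # essentially 0: the dT noise cancels

# Hypothetical mismatch: diagnose with a different alpha than the one governing dN.
alpha_used <- 1.5
dF_diag2 <- dN + alpha_used * dT
# The diagnosis error is (alpha_used - alpha) * dT, so it moves in lockstep with dT.
cor(dF_diag2 - dF, dT)                     # 1, up to rounding
```

So the independence claim holds exactly only to the extent that dN responds to dT with the same α used in the diagnosis.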

It depends on what your definition of “depends” is….

Let me amend my comment about the definition of “depends” since it appears flippant against the seriousness of ATTP’s point. Whether models conserve energy well or poorly, modeled dT is not entirely independent of modeled dF since you are solving for one value by assuming the others. The results depend upon the initial assumptions in the models, and changing the assumptions changes the model’s (and the simplified equation’s) output. For example, changes in forcing (dF) may not be dependent upon changes in temperature (dT) to the same degree (pun intended) that changes in temperature are dependent on changes in forcing — but they are interconnected. Consider that cloud formation responds to temperature changes and clouds can produce both feedback and forcing.

In any event, much of the debate is actually over climate sensitivity (alpha, in the above), and Nic Lewis’ original posting implicitly challenges the majority’s calculation(s) of sensitivity. It does so by undermining the Marotzke and Forster defense of modeled dT’s divergence from recently observed dT. Marotzke and Forster’s paper suggested, essentially, that the assumptions used to produce model results are sufficiently accurate to reproduce the recent pause in the global temperature trend — after accounting for a few more assumptions about internal variability. Nic Lewis has presented a serious challenge to the statistical methods employed by Marotzke and Forster and most of us are still trying to work our way through the arguments. Intelligent comments, therefore, are greatly appreciated from all sides in the debate.

opluso,

My laptop has died, so am using a tablet and am not that used to this. Excuses out of the way. I think there is some confusion. Forcings are, by definition, external. Things like water vapour, clouds, albedo, are feedbacks. They’re all included in the alpha term in front of dT. Therefore if energy is conserved, the term dN + alpha dT gives the external forcing and is, by definition, independent of dT. The forcings are driving the temperature changes, not the other way around.

There obviously seems to be a difference between how a physicist and a statistician approach statistical analysis. It seems to me that the physicists are to some extent hypothesizing a can-opener.

But watch what happens (as I understand it and I haven’t parsed it) if you start from the data: in this case, what you have are the series N and T. ATTPhysics says: dN depends on dT while dF does not. Well, if dN depends on dT, that’s precisely the sort of thing that you want in a statistical relationship. Rather than being ill-suited to regression, isn’t it ideally suited to regression? ATTP’s comment seems to misunderstand the entire purpose of statistics.

From a physics point of view, you may want to add alpha*T to N to get F, but from a data/statistics point of view, the two series, T and N+alpha*T, are going to be related by construction. Even if there is a real relationship somewhere, you won’t be able to disentangle it from the tautological relationship created by construction.

Again, the statement: “dN depends on dT while dF does not” really seems to show how you’ve grabbed the wrong end of the stick so to speak.
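The "related by construction" point can be seen in a toy simulation. A minimal R sketch, assuming nothing about real GCM output: dT and dN are generated as independent noise, yet dT correlates strongly with the constructed series dN + α·dT (α = 1 is a hypothetical choice):

```r
# Toy illustration: dT and dN are independent noise, yet dT correlates with the
# constructed series dN + alpha * dT purely by construction. Synthetic values only.
set.seed(1)
n     <- 1000
alpha <- 1                      # hypothetical feedback value
dT    <- rnorm(n)               # stand-in for temperature changes
dN    <- rnorm(n)               # stand-in for TOA imbalance changes, independent of dT
dF_c  <- dN + alpha * dT        # "forcing" constructed from both

cor(dT, dN)                     # close to 0: no real relationship
cor(dT, dF_c)                   # close to 1/sqrt(2), about 0.71, from construction alone
coef(lm(dT ~ dF_c))             # slope near 0.5 = var(dT) / var(dF_c)
```

A regression of dT on dF_c therefore finds a "significant" relationship even though, by design, there is nothing physical linking the two inputs.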

Steve,

I’m not actually talking about statistical analysis, though. Let’s do this in two steps. First you have Forster et al. (2013) who use the dT and dN values from the climate models to determine the external forcing. They use that conservation of energy means that dF = dN + alpha dT and that even though there is a dT on the right-hand side, dF does not depend on dT.

Now we have Marotzke & Forster who take the dF time series and use them to determine the forced trend and then add a residual (epsilon) to estimate internal variability. Since dF does not depend on dT there is no actual circularity.

I’m not quite sure why you think I’ve got the wrong end of the stick. If you want to determine the forced trend, you need to use the external forcings. You can’t do it using dN, for example, because you can’t get the forced trend from dN. You can’t really criticise Marotzke & Forster for not doing what you think they should have done. You can only really criticise them for not doing what they said they’d done, properly. You can’t do their analysis using dN.

oplus,

I should check this, but I think the cloud radiative effect is because of anthropogenic aerosols seeding clouds, and so is a forcing. It’s not the same as the cloud feedback response.

Steve: there’s a difference between the ideal concepts and what you can measure. Once you use dT to construct F, you end up with a tautological property because of the math. Andthentheresmath, so to speak. At the end of the day, linear regression is just some matrix algebra: if you do all the matrix algebra, you should be able to see the tautology that Nic observed. He’s right.

Anders/ATTP/Ken,

But dF was in fact calculated directly from dT in Forster et al (2013), as Nick Stokes and others have pointed out in criticizing the circularity of a post at WUWT by Willis E, based on the self-same temperature-calculated forcing from Forster et al (2013). dF is calculated from dT by Forster et al, and the equations:

dF = alpha dT + dN

combined with:

dT = dF / (α + κ) + ε, or

dF = (dT – ε) * (α + κ)

Makes the circularity of regressing one function of dT against the Forster et al (2013) calculated forcing…. which is really just another function of dT…. explicit. Surely you can see that the temperatures used are the same.

There is nothing that is going to remove the circularity except an independent forcing history for each model which is not calculated from dT.
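Substituting the first of the equations above into the second makes explicit what remains once the forcing is replaced by its Forster et al. (2013) reconstruction. This is only the algebra of the two equations as written, with the d-prefixed trends written as Δ and with α and κ treated as fixed for each model:

```latex
\begin{aligned}
\Delta T &= \frac{\Delta F}{\alpha + \kappa} + \varepsilon,
  \qquad \Delta F = \Delta N + \alpha\,\Delta T \\
(\alpha + \kappa)\,\Delta T &= \Delta N + \alpha\,\Delta T + (\alpha + \kappa)\,\varepsilon \\
\kappa\,\Delta T &= \Delta N + (\alpha + \kappa)\,\varepsilon
\end{aligned}
```

With the reconstructed forcing, the same model ΔT appears on both sides, and the residual can be written as ε = (κΔT - ΔN)/(α + κ).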

FWIW, I would normally try to respect someone’s pseudonymity on my blog. However, I guess this is Climateball(TM) and so there aren’t any rules and the only losing move is to not play.

Stevef,

I don’t know what Willis did or why Nick Stokes criticised it. That may or may not be relevant for this discussion. However, the thing that you seem to be ignoring is energy conservation that adds an extra constraint. So, yes, dT is used to determine dF, but so is dN. If climate models conserve energy (as they will to within the accuracy of the method) then the following quantity

dN + alpha dT

is independent of dT and depends only on the change in external forcing dF. As Ed Hawkins, Piers Forster and others are pointing out on Ed Hawkins’ post, by combining dT and dN in this way you can determine dF in a manner that does not make dF depend on dT.

Denote X = dN + alpha dT. The analysis requires X to be independent of dT. If it isn’t, then the regression coefficients are biased and inconsistent. So we are asked to assume that it is, namely dX/d(dT) = alpha = 0. Or, if we treat alpha as a function of dT, then we need dT * d(alpha)/d(dT) + alpha = 0.

Either expression requires alpha to be independent of dT, so one of the main empirical “results” is an assumption required for the empirical method to work. This is methodologically invalid.

There is a prima facie case that X is likely dependent on dT. It is not sufficient for M&F simply to assert that it ain’t. The empirical issue could be settled using a Hausman endogeneity test, though it would require collecting additional data to serve as valid instruments for dF.
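A minimal single-regressor sketch of the Durbin-Wu-Hausman idea, on synthetic data only. It assumes a hypothetical instrument `z` that drives the regressor `x` but is unrelated to the error term `u`. The statistic compares the OLS and IV slopes and is referred to a chi-square distribution with one degree of freedom, whose 5% critical value is about 3.84. The sketch says nothing about which real variable could serve as an instrument for dF. That is the additional data Ross mentions.

```python
import random
import statistics


def cov(xs, ys):
    """Sample covariance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def hausman_single(y, x, z):
    """Return (H, b_ols, b_iv) for y = a + b*x + u with one instrument z."""
    n = len(y)
    b_ols = cov(x, y) / cov(x, x)
    b_iv = cov(z, y) / cov(z, x)
    # Error variance from OLS residuals (consistent under the null of exogeneity).
    a_ols = statistics.fmean(y) - b_ols * statistics.fmean(x)
    sigma2 = sum((yi - a_ols - b_ols * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    var_ols = sigma2 / ((n - 1) * cov(x, x))
    rho2 = cov(x, z) ** 2 / (cov(x, x) * cov(z, z))  # squared corr(x, z)
    var_iv = var_ols / rho2                          # IV slope variance, one instrument
    return (b_iv - b_ols) ** 2 / (var_iv - var_ols), b_ols, b_iv


def simulate(n, endogeneity, true_slope=2.0, seed=0):
    """Synthetic data; endogeneity > 0 makes x correlated with the error u."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [zi + endogeneity * ui + rng.gauss(0.0, 1.0) for zi, ui in zip(z, u)]
    y = [true_slope * xi + ui for xi, ui in zip(x, u)]
    return y, x, z


h, b_ols, b_iv = hausman_single(*simulate(2000, endogeneity=1.0))
print(h > 3.84, b_ols, b_iv)
```

In the endogenous case the OLS slope drifts away from the true value of 2 while the IV slope stays near it. The size of that gap is what the statistic measures.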

Ross,

“Denote X = dN + alpha dT. The analysis requires X to be independent of dT.”

Yes, because of energy conservation. If you apply a change of forcing, dF, to a climate model with a climate feedback parameter alpha, then if the temperature response is dT, the TOA imbalance has to satisfy (because of energy conservation),

dN = dF – alpha dT,

where dF, above, is your X. Therefore, if the above is true,

dF = dN + alpha dT,

and since dF does not, by definition, depend on dT, neither does dN + alpha dT.
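One way to make the disagreement concrete is a toy Monte Carlo on synthetic trends. It assumes a hypothetical parameter `gamma`, the TOA response (W m-2 K-1) to the internally generated part `eps` of each member's temperature trend, while the forced part responds with alpha.

* If gamma equals alpha, dN + alpha dT returns the true dF exactly, which is the energy-conservation case argued above.
* If gamma differs from alpha, the reconstruction equals dF + (alpha - gamma)*eps. It then carries part of the same internal variability that is in dT, and T regressed on the reconstruction shows a higher R^2 than T regressed on the true forcing.

All numerical ranges except the alpha range quoted in this thread are illustrative assumptions. Which gamma describes CMIP5 models is an empirical question that the sketch does not settle.

```python
import random
import statistics


def r_squared(xs, ys):
    """R^2 of a simple OLS fit of ys on xs (squared Pearson correlation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def toy_ensemble(gamma_fraction, n=2000, seed=42):
    """Synthetic trends; gamma = gamma_fraction * alpha for every member (hypothetical)."""
    rng = random.Random(seed)
    f_true, f_rec, temp = [], [], []
    for _ in range(n):
        alpha = rng.uniform(0.64, 1.79)  # W m-2 K-1, range quoted in this thread
        kappa = rng.uniform(0.4, 0.9)    # W m-2 K-1, hypothetical spread
        forcing = rng.gauss(0.3, 0.1)    # forcing trend, hypothetical values
        eps = rng.gauss(0.0, 0.1)        # internally generated part of the T trend
        t_forced = forcing / (alpha + kappa)
        t = t_forced + eps
        gamma = gamma_fraction * alpha
        # TOA imbalance: the forced part responds with alpha, the internal part with gamma.
        toa = forcing - alpha * t_forced - gamma * eps
        f_true.append(forcing)
        f_rec.append(toa + alpha * t)  # reconstruction dF = dN + alpha dT
        temp.append(t)
    return f_true, f_rec, temp


for frac in (1.0, 0.0):
    f_true, f_rec, temp = toy_ensemble(frac)
    print(frac, r_squared(f_true, temp), r_squared(f_rec, temp))
```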

I suspect that it would be difficult to find a physics textbook that shows that conservation of energy implies anything at all about dN and dT, since these are fitted trends in statistical constructions based on ad hoc averages rather than basic physical variables. alpha isn’t a basic physical variable either.

However, if what you say is true, then a Hausman test should rule out endogeneity bias.

It is absurd to assert “dN + alpha dT” is independent of dT.

The presence of dT in the former expression means otherwise, unless you also assert alpha is 0.

If you do not assert alpha is 0, then it is clear that the former expression changes as dT changes, which by definition of the word “independent” means the two quantities are not independent. The equation means they are not independent, by definition.

It is fascinating to see what people are willing to argue.

Steve, what Willis post was that?

Don,

A post where Willis showed that the model temperature histories were nothing more than the lagged forcing histories from Forster et al (2013). (Model climate sensitivities calculated directly from model results, was the title, I think.) Like M&F, his analysis was circular, because the forcing came from the temperature, though he probably did not appreciate where the forcing came from.

I see what you mean, SteveF. It didn’t take nicky long to jump on Willis for circularity:

http://wattsupwiththat.com/2013/12/01/mechanical-models/

“In fact, the close association with the “canonical equation” is not surprising. F et al say:

‘The FT06 method makes use of a global linearized energy budget approach where the top of atmosphere (TOA) change in energy imbalance (N) is split between a climate forcing component (F) and a component associated with climate feedbacks that is proportional to globally averaged surface temperature change (ΔT), such that:

N = F – α ΔT (1)

where α is the climate feedback parameter in units of W m-2 K-1 and is the reciprocal of the climate sensitivity parameter.’

IOW, they have used that equation to derive the adjusted forcings. It’s not surprising that if you use the thus calculated AFs to back derive the temperatures, you’ll get a good correspondence.”

The author of that? Nick Stokes. The irony, it is so rich!

In the present context, Willis Eschenbach’s post http://wattsupwiththat.com/2013/12/01/mechanical-models/ is well worth re-reading. I haven’t parsed it, but at a quick read, Willis seems to have tried to go down the same road as Marotzke.

Steve McIntyre,

Nick was perfectly justified in pointing out the circularity of Willis’ calculations. I do wish he would bring his considerable analytical talents to bear on the M&F paper, since, while it is more complicated than what Willis did, it suffers from exactly the same circular reasoning…. and having been published in ‘Nature’, can lead to a lot more confusion and incorrect understanding about climate models than what Willis posted on WUWT. What is good for the Willis ought to be good for the Jochem you know. I figured Nick would find such an obvious and glaring error offensive to his scientific sensibilities, but so far he has been pretty quiet. Count me surprised.

Nicky’s comment on M&F was quoted by Not Sure, above. Oh wait, it’s his comment on Willis’s similar circularity. Of course, nicky could deny it applies to M&F. Will nicky talk? Or will he take the fifth?

SteveF,

“while it is more complicated than what Willis did, it suffers from exactly the same circular reasoning”

It is more complicated, and I haven’t had a lot of time for it lately. And if Nic Lewis and Piers Forster are in disagreement, it needs thinking about. It’s not exactly the same.

Willis’s was simple. He just said: look, all models are doing is taking in ΔF and producing a linear ΔT. Silly models. But, as I well know, the models aren’t working on ΔF as input. Forster explicitly back-computes it by pretty much that same formula.

OK Nick, maybe not exactly the same reasoning, but IMO, terribly close to the same reasoning. Forster et al does indeed ‘back-calculate’ the individual model forcings from the change in temperature, so it is difficult to see how those calculated forcings could ever be independent of temperature, as some are now insisting. I mean, you can just do the algebra and see that delta-T ends up on both sides, and, as Nic Lewis points out, the alpha*delta-T term dominates the TOA imbalance term.

I do hope you find some time to think about it.

Thank you, Dr. Stokes. We get it.

It only took Dr. Stokes 3 hours after the first comment on the post to make a monkey out of Willis. He has to think about this M&F paper, which is pretty much the same thing.

Here is another interesting Dr. Stokes comment on that post. He agrees with Nic, who had also spotted Willis’s error:

“Nick Stokes

December 2, 2013 at 7:24 am

Joe Born says: December 2, 2013 at 6:19 am

“Whether the forcings values you use are the models’ actual stimuli or represent the forcings they respectively infer from the stimuli they do use, I find it telling that, after all their machinations, their results differ from respective simple one-pole linear models by much less than they differ from each other.”

I think you’ve missed the point of my earlier comment, and of Nic Lewis. Forster et al took the temperature outputs of the models and calculated adjusted forcings ΔF (they call it F) using the formula

N = ΔF – α ΔT (1)

Here N, the TOA imbalance, has to be small by conservation of energy, and some models constrain it to be zero.

This post substitutes those ΔF into a regression and finds that, presto

ΔF – λ ΔT=0.

But of course, they have to. It has nothing to do with what the models actually do. It’s just repeating the arithmetic of Forster et al by which ΔF was derived.”

Dr. Stokes is not programmed to criticize alarmists.

SteveF, you lumped me, I think, in with a couple of defenders of the paper being critiqued here, and like lumping the individual climate models, I do not think that is a good idea. (I don’t do emoticons).

I do hope the circular reasoning criticism does not take away from some other aspects of this paper that I think bear further discussion. Putting all the models/model runs in one population to do regression appears to me an artificial construct, and one that would not necessarily stand up statistically if the authors had looked at individual model outputs. The kinds and levels of noise differ between the individual models. Further, conflating stochastic noise with differences in the deterministic output of the individual models does not seem correct in the eyes of this layperson. If the authors are using overlapping trends, there must be some autocorrelation issues that need to be addressed in the analysis.
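A minimal sketch of the overlapping-trends point in the simplest possible case: white-noise input, chosen purely for illustration, since model and observed series are themselves autocorrelated. Trends fitted over windows shifted by one year share most of their data. For 15-point windows, the correlation between successive trends is therefore exactly 224/280 = 0.8, even when the underlying series has no memory at all. The helper names are illustrative.

```python
import random
import statistics


def ols_slope(ys):
    """OLS trend of ys against the index 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = statistics.fmean(ys)
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(ys))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den


def overlapping_trends(series, length):
    """Trends over every window of `length` points, each window shifted by one step."""
    return [ols_slope(series[i:i + length]) for i in range(len(series) - length + 1)]


def lag_corr(xs, lag):
    """Sample correlation between xs[t] and xs[t + lag]."""
    a, b = xs[:-lag], xs[lag:]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


def white_noise_trend_corr(length, lag):
    """Exact correlation of two white-noise trends whose windows start `lag` steps apart."""
    centre = (length - 1) / 2
    w = [i - centre for i in range(length)]  # OLS trend weights, up to a constant
    overlap = sum(w[i] * w[i - lag] for i in range(lag, length))
    return overlap / sum(v * v for v in w)


rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
trends = overlapping_trends(noise, 15)
print(white_noise_trend_corr(15, 1), lag_corr(trends, 1))  # exact value is 0.8
```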

Kenneth,

No, I referenced the real first name of ATTP/Anders/Ken Rice…. nothing I wrote was directed toward you. Now if you would put together a guest post or two on your work, that could change…😉

To ATTP:

There is more than one point of confusion, at least on my part. You originally stated:

In my longer reply, I was thinking of the fact that the IPCC refers to “cloud radiative forcing” effects (which precede any feedback). Perhaps this effect (which ultimately influences alpha estimates) is officially subsumed solely under the feedback mechanisms inherent in warming-induced changes in the clouds themselves. So if clouds are exclusively and always “feedback” I stand corrected in my chosen example. Otherwise, “clouds” are on both sides of the equation dF = alpha dT + dN.

Although my initial comment about the definition of “depends” was somewhat contingent upon cloud forcing/feedback assumptions I also was uncertain whether you were talking about the “dependent” and “independent” variables in an equation. Typically, one thing has to be the dependent variable you are testing for. You originally stated:

Yet if dF “does not depend” on dT, why is dT in the equation in the first place?

Even if you accept observed measurements for dT and dN, you still have to use a model generated climate sensitivity (alpha) to produce a result for dF — since neither is directly observed. Thus, plugging values into the discussed equation requires a bit of bootstrapping because there are multiple uncertainties hidden in the equation’s symbols. The underlying assumptions seem to generate most of the confusion. In other words, it all depends…

I get the feeling I am trying to converse in Portuguese with someone who knows no Portuguese.

ATTP keeps saying that the forcing is independent of the model temperature. For the true forcing, that is correct. But the forcing from Forster et al is NOT the real forcing; it is a value for forcing calculated from the temperature rise in the model, after taking into account the model-diagnosed TOA imbalance. There IS NO explicit data for forcing… it is 100% inferred from the temperature and TOA imbalance. It is not possible to remove the circularity with arguments about energy conservation; it is implicit in the Forster et al calculation. This is a very strange thread.

SteveF

Thanks for articulating that which I was unable to do.

ATTP refers to the actual forcing function, saying that it is independent of T, but that’s not what M&F used… they calculated that function using the very variable they then used as a dependent variable.

It’s like walking into the argument clinic.

Can’t they just pretend that it’s the real forcing? M&F must be saved, somehow.

Don,

They already are pretending it is the real forcing…. it’s not, it’s calculated from the temperature change.

stevef:

thanks so much… I felt like I was being gaslit, so to speak.

Stevef,

“But the forcing from Forster et al is NOT the real forcing, it is a value for forcing calculated from the temperature rise in the model, after taking into account the model diagnosed TOA imbalance. There IS NO explicit data for forcing… it is 100% inferred from the temperature and TOA imbalance.”

Well, yes, but if the models conserve energy, then

dF = dN + alpha dT,

and, because dF is independent of dT, so is dN + alpha dT.

Hence the point I made above: the argument being made here is essentially that climate models do not conserve energy. Of course, they don’t conserve it exactly, but they do to within the accuracy of the method.

ATTP,

If you are calculating dF from the equation:

dF = dN + alpha dT

using dN, dT, and alpha from the models, then

how on Earth is the calculated value of dF independent of dT? Forster et al use that equation to calculate dF; there is no possibility the value of dF calculated from dT is independent of dT. Like I said, I feel like we are speaking different languages.

Just so I know my apples and oranges:

dN = dF – alpha dT.

Now dT is in K, dF is in watts.

What are the units of alpha and N?

Doc,

F is expressed in units of Wm-2, as is N. α has units of Wm-2/K, and varies (model to model) over a range of 0.64 to 1.79 Wm-2/K per Table 1 of reference (v) of the original post.
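A quick dimensional check of dN = dF - alpha dT with those units. The numbers in the second line are illustrative only, with α taken from inside the quoted range:

```latex
\begin{aligned}
[\alpha\,\Delta T] &= \mathrm{W\,m^{-2}\,K^{-1}} \times \mathrm{K}
  = \mathrm{W\,m^{-2}} = [\Delta N] = [\Delta F] \\
\alpha = 1.2\ \mathrm{W\,m^{-2}\,K^{-1}},\ \Delta T = 0.5\ \mathrm{K}
  &\;\Rightarrow\; \alpha\,\Delta T = 0.6\ \mathrm{W\,m^{-2}}
\end{aligned}
```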

ATTP is not only wrong on the statistics, he’s wrong on the physics as well.

The idea that conservation of energy in a GCM is narrowly closed on GMST and TOA imbalance is quite wrong. There are plenty of energy transfers in GCMs, the obvious ones being the rest of the atmosphere and the ocean, but many more subtle ones as well, and the conservation of energy closes around all of them, not narrowly GMST and TOA radiation. Unfortunately when ATTP is out of his depth, you can pretty much guarantee an appeal to energy conservation will be the argument of last resort. One day, ATTP will perhaps realise that while energy conservation is a key constraint to close the system, it is a weak constraint in terms of defining the model dynamics. Until then, Zzzz.

Spence,

The amount of energy in a box depends only on the fluxes through the surface, not on the movement of energy inside the box.

ATTP, the box does not just consist of GMST in a GCM. This much should be obvious. The deltaT you refer to is not “the whole box”. In fact it isn’t even the largest part of the box. It’s a tiny bit of the corner of the box. And we’re not directly measuring the fluxes in and out of that corner of the box.

Remember, the values populating deltaT and deltaN here are not from a simple one-box or two-box model; they are GCM outputs. Those values are then used to feed a simplistic model, but the input values are not constrained in the way you think they are.

That is empirically obvious from what Nic Lewis has already pointed out – that variations in T are larger than variations in N. As a result, the difference between the two must be dominated by variations in T, which means in turn the regression is necessarily broken.

Your physics and your statistics are both wrong here. I note at your blog Pekka is politely steering you away from this red herring conservation of energy argument. You would do well to heed his advice.

“There obviously seems to be a difference between how a physicist and a statistician approach statistical analysis.”

There shouldn’t be. If ATTP is actually a physicist, I am embarrassed for physicists as a whole, because his argument is completely wrong. Or, as Pauli would have said, it’s not even wrong.

What ATTP does not seem to understand, and what is absolutely critical to any statistical analysis, is that for statistical analysis the important thing is not whether the *true* physical values are independent or not, but rather how the *estimates* for those true physical values are obtained. In this case, the estimate of the physical value for dF was obtained by using dT. Doesn’t matter if the true values of dT and dF are independent or not.

The very fact that dT was used to estimate dF means that, for statistical analyses, the two are NOT independent. ATTP’s argument shows a shocking lack of statistical understanding. As I said before, I am embarrassed for physicists everywhere. I assure you that most competent experimental physicists would not make such a basic mistake.

By the way, as a side note, multiple regression and PCA are really quite elementary statistical methods. They were invented when computational power was very limited, and they can be quite useful as long as one understands them well.

A valid multiple regression has three main requirements: first, that relationships between variables are linear, second, that the errors on the measurements are Gaussian, and, finally, that the values of different measurements are statistically independent.

As Nic describes the paper, it is a trifecta of bad: relationships have no reason to be linear, the errors are nowhere near Gaussian, and multiple runs from the same models were treated as independent.

It’s going to be a black mark on the resumes of everyone involved.

You don’t need Gaussian errors for ordinary least squares to be the best linear unbiased estimator of the coefficients. And there exists a host of methods for correctly performing non-linear regressions to deal with limited or censored or discrete dependent variables. For some reason, the very extensive development of regression theory and practice by econometricians is not widely appreciated in other disciplines.

“You don’t need Gaussian errors for ordinary least squares to be the best linear unbiased estimator of the coefficients.”

There are two problems here: first, justifying the use of least squares for non-normal errors requires detailed knowledge of the actual error distribution, and second, any uncertainty estimates on the parameters will be useless.

For many (I would say most) error distributions, least squares gives a biased estimate. The estimate is unbiased if and only if the distribution is symmetric about the mean. Proof is left as a (trivial) exercise for the reader.

Parameter uncertainties from least-squares regressions arise as a result of the application of the Maximum Likelihood Ratio theorem to the Gaussian distribution. If the underlying distribution is not Gaussian, then least squares cannot be used to estimate parameter uncertainties.

If you’re going to use multiple regression on non-Gaussian errors, then why not just do Markov-chain Monte Carlo and just get the right answer directly?

I’m just stating the Gauss-Markov theorem, Day 1 in econometrics. Yes, you need the errors to be uncorrelated and homoskedastic with mean zero, but not necessarily Gaussian. OLS would still be BLUE. (And you need the independent variables to be measured correctly and uncorrelated with the error term. And of course the model can’t be functionally misspecified or omit variables.)

Gaussian errors in addition also make OLS maximum likelihood, which is nice, but not necessary to be BLUE. For non-Gaussian errors you have to break it down into the case where you know the error distribution versus where you don’t. For the former, you would use the correct covariance matrix; for the latter, something like bootstrap estimators can be tried. Your general statement is too strong.
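Below is a minimal sketch of one bootstrap estimator of the kind mentioned: the pairs (case-resampling) bootstrap for an OLS slope. It runs on synthetic data with skewed, mean-zero errors. The data-generating numbers and the helper names are assumptions for illustration, and the classical formula is included only for comparison.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Return (slope, intercept) of the OLS line of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def classical_se(xs, ys):
    """Textbook slope standard error, assuming homoskedastic errors."""
    slope, intercept = ols_fit(xs, ys)
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in resid) / (len(xs) - 2)
    mx = statistics.fmean(xs)
    return (sigma2 / sum((x - mx) ** 2 for x in xs)) ** 0.5


def bootstrap_se(xs, ys, reps=1000, seed=0):
    """Pairs bootstrap: resample (x, y) rows with replacement, refit, take the spread."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_fit([xs[i] for i in idx], [ys[i] for i in idx])[0])
    return statistics.stdev(slopes)


rng = random.Random(11)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
# expovariate(1.0) has mean 1, so subtracting 1 gives skewed, mean-zero errors
ys = [1.0 + 2.0 * x + (rng.expovariate(1.0) - 1.0) for x in xs]
print(classical_se(xs, ys), bootstrap_se(xs, ys))
```

Resampling whole (x, y) rows avoids assuming a particular error distribution. It does still assume the rows are independent, so it would not by itself handle runs from the same model or overlapping trends.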

Steve Postrel is correct. Assuming that the error terms are Gaussian is one way to conduct OLS regressions, but not necessary. Hermann Bierens and other econometricians developed the asymptotic theory that requires only that the error terms be independent and identically distributed with a finite variance as the sample size goes to infinity.
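A small Monte Carlo illustrating the Gauss-Markov point on synthetic data. The design is fixed and the errors are independent, homoskedastic and mean zero but strongly skewed (a centred exponential, chosen arbitrarily). Under those conditions the OLS slope averages out to the true value. The sketch addresses unbiasedness only. It does not show that these conditions hold for pooled CMIP5 trend data.

```python
import random
import statistics


def ols_slope(xs, ys):
    """OLS slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def mean_ols_slope(true_slope=2.0, n=50, reps=2000, seed=7):
    """Average OLS slope over repeated samples with skewed, mean-zero errors."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]  # fixed (non-random) design
    estimates = []
    for _ in range(reps):
        # Centred exponential errors: mean zero, constant variance, strongly right-skewed.
        ys = [1.0 + true_slope * x + (rng.expovariate(1.0) - 1.0) for x in xs]
        estimates.append(ols_slope(xs, ys))
    return statistics.fmean(estimates)


print(mean_ols_slope())  # should land close to the true slope of 2.0
```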

I’m not sure what you just said but I hope you just proved with statistical certainty that evaluating a circular equation will produce garbage if anything at all.

James:

Do you think that those basic assumptions required by Bierens are met? Are the error terms of repeated runs of the same model really independent? Are the error terms of different models’ runs identically distributed?

Do you think the variance is finite as the number of runs goes to infinity?

And as the length of those runs goes to infinity?

It just seems like they will…

David Eisenstadt:

Without having worked with this data, I have no idea about the answers to your questions.

I am simply pointing out that Steve is correct. Gaussian errors are not necessary in order to run an OLS regression.

I’ve idled part of my recent life away trying to understand why the simple point that Ross McK makes about the amount of information available isn’t somehow instinctive to many in the community. So help me, ATTP (if you are still monitoring this thread).

It seems to me that you make a number of empirical testable assertions in your initial comment. For example you assert climate models conserve energy and do this in a particular way, you assert “dF, .. is independent of dT as long as the model conserves energy” while conceding “climate models are not perfect and don’t conserve energy exactly but that doesn’t really change that dF is not explicitly dependent on dT”.

At that point you then assert “this analysis is not circular. Just because the temperatures are used to determine the external forcings does not mean that the external forcings depend on temperature.”

Now as I said all those assertions are empirical, and as a good empiricist you’ll be keen to test them.

Now here’s the thing. We aren’t dealing with abstract theoretical concepts; we are dealing with specific climate models, warts and all. We have some rumpty incomplete data with which to do this. We can take the cheat’s way and just add into the paper that we assume all the above, but if we did that we wouldn’t have much of a conclusion.

And in fact M&F try and take the high road. They attempt to estimate the various relationships from the data to hand. They admit that their information is incomplete. But they don’t test their assumptions including those required by the tools they use with the data along the way. And by that I mean the real data they have from the models under study.

Help me understand: why is it sufficient to simply assert these things as givens, as you have done, when it is obvious they are empirical?

“We aren’t dealing with abstract theoretical concepts…”

It looks like ATTP is looking for help from the realm of abstract theological concepts. He is hoping and praying that M&F can be saved by some miracle.

This post is very timely with regard to my learning and study of the CMIP5 models. What I see most notably, on reading Nic Lewis’s criticism of the Marotzke and Forster paper and the paper itself (ignoring the more fundamental errors pointed to by Nic), is that the authors’ motivation is based on an assumption that the model outputs can be lumped into a single database for statistical analysis. In my studies I have attempted to find tools to look at differences in model outputs that might well question the validity of this lumping, or at least warn against over-interpreting the analysis results.

My efforts have been based on first attempting to decompose the temperature series of the CMIP5 models and observed data sets into deterministic (or at least secular) trends, cyclical components, and red and white noise using Singular Spectrum Analysis (SSA). While my analysis to this point using SSA has not been rigorous in determining significant differences, these decompositions and subsequent reconstructions visually reveal some very different patterns, and different residual white and red noise, between the models and the observed temperature series.

For the time being I left the noise study and went on to study the individual CMIP5 models’ equilibrium climate sensitivity (ECS) and transient climate response (TCR) emergent parameters. What I find interesting is that the attempts to classify the warming pause of the past 15 or so years in terms of model and observed differences have tended to obscure looking further back in time, say 40 years, where the white and red noise have a lesser effect in finding statistically significant differences. While one can find significant differences between observed and model trends over that period after accounting for the autocorrelations, there remains the potential for a difficult-to-measure low-frequency cyclical (60 to 70 year) component that could affect the result if not properly accounted for. These difficulties motivated me to look at the more deterministic part of the model outputs, like ECS. I was further motivated by Nic Lewis’s postings at these blogs about the estimation of ECS and TCR from observable data and the comparisons with the climate models.

My first surprise from looking at the individual CMIP5 models’ ECS estimates from the abrupt 4XCO2 experiment was the need to correct the net TOA radiation and surface temperature using the pre-industrial control runs. The control runs, in general, do not appear to have reached equilibrium even after a 200-year run-up. Forster, whom Nic has mentioned here, was a coauthor of the paper that used ordinary least squares regression on the net TOA radiation and surface temperature to estimate ECS. Based on a later paper coauthored by Andrews using the same regression method, those estimated values appear in AR5 Chapter 9. On the suggestion of Carrick, I did both ordinary and total least squares regressions and found that the estimated ECS values were, in general, 10% or more larger using total least squares. I am currently finishing downloading the CMIP5 model and model-run radiation values in order to compare the net TOA radiation to the global sea water potential temperature for individual CMIP5 models and runs. Ultimately I want to determine whether the TOA budget is truly made to balance by tuning, as noted in publications, or whether, for at least some models, there remains a residual TOA imbalance not accounted for by changes in ocean heat content as reflected in the global sea water temperature change.
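Below is a minimal sketch of the regression Kenneth describes, fitted both by ordinary and by total least squares. It regresses net TOA imbalance on surface temperature in an abrupt 4xCO2 run, after subtracting the control run. Its assumptions are:

* The control run is subtracted year by year. This is one simple drift correction; published analyses differ in detail.
* The 2xCO2 ECS is taken as half the 4xCO2 temperature-axis intercept.
* The input series are synthetic, with hypothetical parameters.

The TLS here is plain orthogonal regression. It treats errors in T (K) and N (W m-2) as equally scaled, so its answer depends on the units chosen. The helpers are hypothetical, not CMIP5 tools, and the sketch makes no claim about which fit gives the larger ECS for real model output.

```python
import math
import random
import statistics


def ols_line(xs, ys):
    """Ordinary least squares of ys on xs: only vertical (N) residuals are minimised."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def tls_line(xs, ys):
    """Total least squares with equal error scales in x and y (orthogonal regression).

    The slope is that of the major axis of the 2x2 scatter matrix.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def drift_corrected(abrupt, control):
    """Subtract the parallel control-run value year by year (one simple drift correction)."""
    return [a - c for a, c in zip(abrupt, control)]


def gregory_ecs(delta_t, delta_n, fit=ols_line):
    """Fit N against T and return half the T-axis intercept as a 2xCO2 ECS.

    Halving the 4xCO2 intercept assumes forcing scales with log(CO2).
    """
    slope, intercept = fit(delta_t, delta_n)
    return -intercept / slope / 2.0


# Synthetic example; all values hypothetical (F_4x = 7.4 W m-2, alpha = 1.2 W m-2 K-1).
rng = random.Random(5)
years = range(150)
forced_t = [(7.4 / 1.2) * (1 - math.exp(-(yr + 1) / 50.0)) for yr in years]
drift_t = [0.002 * yr for yr in years]        # slow control-run drift in T
drift_n = [0.3 - 0.001 * yr for yr in years]  # control-run TOA imbalance drift
t_ctl = [d + rng.gauss(0, 0.1) for d in drift_t]
n_ctl = [d + rng.gauss(0, 0.3) for d in drift_n]
t_4x = [f + d + rng.gauss(0, 0.1) for f, d in zip(forced_t, drift_t)]
n_4x = [7.4 - 1.2 * f + d + rng.gauss(0, 0.3) for f, d in zip(forced_t, drift_n)]
dt_anom = drift_corrected(t_4x, t_ctl)
dn_anom = drift_corrected(n_4x, n_ctl)
print(gregory_ecs(dt_anom, dn_anom, ols_line), gregory_ecs(dt_anom, dn_anom, tls_line))
```

Whatever the data, the TLS slope is never shallower than the OLS slope of N on T, a consequence of the Cauchy-Schwarz inequality. The two fits therefore generally give different intercepts whenever the points scatter about the line.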

Like some others commenting here, I don’t fully grasp the stats that are at the center of the argument. In situations like this I look for what Pekka has to say. He will defend the consensus side when possible, but he is honest and he knows his doo-doo:

Pekka on Lab Book:

“Basically we have first

F = N + α T

Then we do regression

T = a + b F + c α + d κ + e

Using the first in the second and moving one term to the left hand side

(1 – α b) T = a + b N + c α + d κ + e

That seems to lead to problems, if the coefficient of T may be close to zero. Thus we should perhaps not trust the results, if the regression tells that b is close to 1/α even in part of the situations.”
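Pekka's substitution can be checked numerically. In the sketch below every value passed in is arbitrary, and `e` is defined as the residual of the regression form so that the identity holds exactly. The second helper illustrates his caveat: a perturbation on the right-hand side moves the implied T by a factor 1/(1 - αb), which grows without bound as b approaches 1/α. Both helper names are hypothetical.

```python
def substituted_sides(a, b, c, d, alpha, kappa, n_toa, temp):
    """Both sides of (1 - alpha*b) T = a + b N + c alpha + d kappa + e.

    e is defined as the residual of T = a + b F + c alpha + d kappa + e,
    with F reconstructed as N + alpha T, so the identity must hold exactly.
    """
    forcing = n_toa + alpha * temp
    e = temp - (a + b * forcing + c * alpha + d * kappa)
    lhs = (1 - alpha * b) * temp
    rhs = a + b * n_toa + c * alpha + d * kappa + e
    return lhs, rhs


def implied_temperature(a, b, c, d, e, alpha, kappa, n_toa):
    """Solve the substituted form for T; undefined when b == 1/alpha."""
    return (a + b * n_toa + c * alpha + d * kappa + e) / (1 - alpha * b)


# Arbitrary illustrative values.
lhs, rhs = substituted_sides(0.05, 0.4, -0.1, 0.02, 1.2, 0.7, 0.3, 0.25)
print(lhs, rhs)

# Sensitivity of the implied T to a small change in e is 1 / (1 - alpha*b).
for b in (0.3, 0.6, 0.8):
    t0 = implied_temperature(0.05, b, -0.1, 0.02, 0.0, 1.2, 0.7, 0.3)
    t1 = implied_temperature(0.05, b, -0.1, 0.02, 0.01, 1.2, 0.7, 0.3)
    print(b, (t1 - t0) / 0.01)
```

With α = 1.2, the amplification factor rises from about 1.6 at b = 0.3 to 25 at b = 0.8.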

Also, the absence of a racehorse defense from nicky is telling.

Pekka’s latest comment on aTTP seems interesting:

We can see that all regression parameters multiply variables that have significant variability. Therefore the regression is not hampered by the circularity. The situation is not nearly as bad as Nic claims.

One problem remains. The coefficient of temperature may be very small in some cases. Therefore there may be situations where the results of the regression lead to large uncertainties in the calculation of the temperatures from the results of the regression. It’s possible that this effect affects the spread of predictions from regression seen in Figure 2b over years 1950-70 (Figure 2 is shown in aTTP’s post). That’s at least a possible consequence of this issue. (M&F propose other possible reasons for the effect, but only propose.)

If my above proposal is correct, it might contribute also to the somewhat less increased variability of the latest predictions and to the variability over most of the full period in the case of the 62 year trends.

Thus I do not think that the whole analysis would be affected strongly as Nic claims, but the circularity might have influence. Certainly it would be nice to know, whether the coefficient of T is small at all, and if it is, how much influence that would have on the results. Checking that would be possible either from full information from the original calculation or from a repetition of that calculation recording the relevant coefficients during the calculation.

> the absence of a racehorse defense from nicky is telling.

An alternative to this innuendo is that Nick may still have problems commenting:

http://moyhu.blogspot.com/2015/01/echo-chamber-at-climate-audit.html

Steve: I have a longstanding record of allowing critics to comment. It is ludicrous to think that I would depart from this longstanding policy in Nick’s case. Nick has posted hundreds of comments here, including a comment on this thread https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-750695. The commenter’s innuendo is entirely justified and your proffered rebuttal isn’t. In addition, I was unaware of Stokes’ whinge as he had never contacted me about an issue. It’s ludicrous for Stokes to complain that he’s being censored. Nick has had some problems with WordPress from time to time. I haven’t tracked his recent complaints and am too busy right now to do so. But to speculate that I’ve suddenly changed a longstanding policy of allowing adverse comments is absurd. If he has posting problems from the filters or otherwise, it’s easy enough to email me and he should have done so before whining. You Climateballers are pieces of work.

I wrote a comment at Nick Stokes’ blog explaining how comments get chewed up at blogs, and the website just disappeared it. I couldn’t be bothered to re-write it so I let it pass.

At a later date, I wrote another comment at Nick’s blog and his blog swallowed it again, repeatedly.

I even made a movie of it: https://twitter.com/shubclimate/status/563820850441252865

Shub,

That is an editing issue with the Google Blogger software, which is outside my control. It seems that if you choose an ID after writing your comment, it clears the comment space. Nothing comes through to me. I do not moderate comments there.

At CA, on the other hand, my comments were making it into the moderation queue, where they stayed (visible to me) for a very long time. Below is one that stayed for a week, and never made it. It is just a simple protest at my comment being said to be “making stuff up” when I had based it entirely on the paper which I quoted.

I haven’t had problems at any other WordPress site for months. In fact, I was in the clear at CA until some time during the Sheep Mountain thread. And my one comment on this thread passed without moderation.

After losing many comments at Nick’s blog, I learned to copy my comment to the clipboard (or into a word processing program) before trying to post it. When it gets eaten, I just paste it into the comment field a second time and it usually works. There are also issues with trying to type from an iPad or iPhone, where the cursor gets locked up, but these are minor compared to losing whole comments.

SteveF,

“I just paste it into the comment field a second time and it usually works.” Yes. I think it works because by then you have supplied your ID.

Nick, the comment I wrote on your website said exactly this: my comment would remain visible to me with ‘Your comment is awaiting moderation’ on top but would never clear moderation. Sometimes I would get a grey blank screen after posting a comment saying ‘oops looks like you’ve already said that’, but the original comment would not appear.

So, when it happens on your blog, it’s “an editing issue with the Google Blogger software”, but when it happens to you, Nick, at CA, it’s the evil CA and Steve Mc. I think an apology from you, Nick, to Steve is the right way forward.

Shub,

The animation you showed does not show a comment going into moderation. It shows the edit screen clear as soon as you press the ID select button. And the remedy for that is to adopt an ID first. I (and the system) can only deal with comments that are actually submitted.

I have the blog set to no moderation, and there is no moderation queue. Comments go either straight to screen or to spam. Very few now go to spam, and I think none of yours. Some time ago I did set it to moderate comments on posts more than four weeks old (which were mostly spam). But not recently.

Nick, you say “The animation you showed does not show a comment going into moderation”

I said I made a movie to show how “…comments get chewed up at blogs”.

You misread.

Not at all. Have you read the Sicre paper?

Testing.

Dear Auditor,

You assert that “The commenter’s innuendo is entirely justified and your proffered rebuttal isn’t.” I disagree. Don Don’s innuendo mismanages the commitments in play:

Nick owes no constant room service on CA, more so considering the reception he constantly gets. This leads to a Procrustean bed: slimed if Nick comments, slimed if he doesn’t. Since you allow yourself not to say what you think from time to time, the more Omertà “trick” on CA may very well be suboptimal.***

Here are some instances with commitments you have yet to fulfill. You still have not declared having read Sicre at the time of writing. No response to Fabio Gennaretti has been forthcoming [1]. For more than a year now, you failed to respond to Robert Way [2] regarding your use of his private correspondence. In this very thread, you have yet to opine on Nic’s exhortations:

Neither have you endorsed Gordon Hughes’ comment:

May I suggest that it is now time for you to lead by example and, at the very least, commit to Nic’s and Gordon’s claims? Show us why you are the fiercest player in the history of ClimateBall ™, after all. Unless you prefer more Omertà. If you would be so kind as to resurrect the MMH10 thread [3], that would be nice.

Thank you for the kind words,

W

PS: Some, but not me, might wonder why Nic assumes α is constant. Any auditor to wonder why too?

[0]: scientistscitizens.wordpress.com/2011/07/26/debate-in-the-blogosphere-a-small-case-study/

[1]: climateaudit.org/2014/10/13/millennial-quebec-tree-rings/#comment-740376

[2]: neverendingaudit.tumblr.com/post/110629968739

[3]: moyhu.blogspot.com/2011/06/effect-of-selection-in-wegman-report.html?showComment=1308008366856#c7386160124216218370

So, first there was an insinuation by Nick and thyself that Steve is censoring Nick. When this did not turn out to be true, then, instead of an apology, we hear that it doesn’t matter, Nick is not nicely treated… This is just stupid, snip

Steve: was in moderation for some word. No need to engage in Willard’s foodfight.

No, it still went into moderation; it has to be some other word… Agh, doesn’t matter🙂

“PS: Some, but not me” blah

Ach, we have another rabett in our midst…

Steve, one thing I worked out during comments to my recent post at Judith’s, and which probably explains many blogs seeing these “unexplainable” false positives, is that the list of words that moderators provide via the WP admin page are NOT words but strings.

WP will match any occurrence of that STRING of characters, not its presence as a space delimited word.

I sussed this because Agung was triggering moderation holds. It turned out that Judy had added “gun” (for some reason) to her block list, thinking it was a word match.

Similarly, I posted using the word “familiar”, which triggered moderation because it contains the string “liar”.

The solution, should it be the case here, is to provide moderation traps as quoted strings including leading and trailing spaces if you wish to trap those words.

e.g.

" liar "

" gun "

" Stokes " 😉
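The difference between matching a string and matching a space-delimited word can be sketched in a few lines of Python. This is a hypothetical illustration of the behaviour described above, assuming case-insensitive matching; the function names are made up, and this is not WordPress's own moderation code.

```python
def held_by_substring(comment: str, keys: list[str]) -> list[str]:
    """Keys that occur anywhere in the comment, i.e. plain substring matching."""
    text = comment.lower()
    return [k for k in keys if k.lower() in text]


def held_by_padded(comment: str, keys: list[str]) -> list[str]:
    """Keys that occur only as whole, space-delimited words.

    The comment itself is padded so that a key at the very start or end still matches.
    """
    text = f" {comment.lower()} "
    return [k for k in keys if f" {k.lower()} " in text]


if __name__ == "__main__":
    keys = ["liar", "gun", "Nick"]
    for c in ["There is a familiar example", "the Agung eruption", "nice knickers"]:
        print(c, "->", held_by_substring(c, keys), held_by_padded(c, keys))
```

One caveat of the padded form: a word followed by punctuation, as in “liar,” or “liar.”, no longer matches, so it trades false positives for false negatives.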

HTH

Well that comment (which the world will see later) got held for moderation … Bingo.

I’ll bet this has correctly identified the problem here and that Steve is including L_I_A_R in his block list.

The original exercise of Marotzke is essentially to reconcile the outputs of models of a chaotic system to a measured global average and to justify the very significant differences as being down to “natural” variability. It is obvious why such an exercise was undertaken, i.e. as an attempt to keep the models from being thrown on the scrapheap, because they are diverging significantly from observations and the hypothesis that the predicted dangerous warming is a result of anthropogenic CO2 is becoming less and less credible.

It’s like saying that the predictions don’t fit the measurements because there is internal stuff that we don’t know. Well, knock me down with a feather. If we knew the unknown internal stuff, then the output of the models would be running cooler, i.e. not as dangerous (or not dangerous at all), and it would be unreasonable to demonise anthropogenic CO2.

They apparently did not think things through enough. Note that Marotzke of MPI works for the German center of climate modeling, dependent on ever more funding for same. Such a defence was to be expected. Just not this poorly. Or, to paraphrase Napoleon, never interrupt an enemy in the process of making a fatal mistake.

So, let’s suppose for the sake of argument that they are right and Nic is wrong. (The opposite appears true; this is for the sake of argument only.) Then they showed that ‘internal’ aka ‘natural’ variability has a large role in temperature time series.

Ah, but the about 1975 to about 2000 temp rise attributed by fundamental GCM model design to GHG is then falsified. And so the model parameterizations. And so the model outputs. Either way falsified… See my horns of a dilemma comment below.

Many people have learned that a strong enough positive feedback makes the system unstable and behave very differently from the system without feedback. A moderate positive feedback leaves the system stable, but adds to its variability, while a negative feedback makes it only more stable and less variable than the system without feedback.
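Pekka's analogy can be illustrated with the simplest linear feedback loop. The sketch below is a hypothetical discrete-time toy, not a climate model: a constant input u is added to a fraction f of the previous output. For |f| < 1 the output settles at u/(1 - f), amplified above u for positive f and damped below it for negative f; for f ≥ 1 it never settles.

```python
def settle(f: float, u: float = 1.0, steps: int = 200) -> float:
    """Iterate the toy loop y <- u + f*y from y = 0 and return the final y.

    For |f| < 1 this approaches u / (1 - f); for f >= 1 it grows without bound.
    """
    y = 0.0
    for _ in range(steps):
        y = u + f * y  # the output is fed back to the input with gain f
    return y


for f in (0.0, 0.5, -0.5, 1.1):
    print(f"f = {f:+.1f}: output after 200 steps = {settle(f):.4g}")
```

The gain 1/(1 - f) is what "adds to its variability" in Pekka's wording: any disturbance entering the loop is amplified by the same factor as the input.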

This case is analogous to that. The kind of circularity the analysis involves affects the stability properties of the whole analytic process that includes both the earlier Forster et al (2013) analysis and this paper. In that analogy the present circularity seems to have a nature similar to the positive feedback making the analysis less stable. It’s, however, not at all obvious that the method becomes worthless. Actually the results are quite reasonable, in general, proving that nothing very drastic takes place. How much the circularity affects the accuracy of the final results is another question, one I’m unable to answer.

Pekka

Thanks for your comment; I agree that your analogy seems apt. If the variability of the ΔN term dominated that of the α ΔT term, then the diagnosed forcing would be nearly exogenous in relation to ΔT. However, as I wrote in my article, that does not appear to be the case for recently ending 62 year trends. The inter-model variance of α ΔT is around three times that of ΔN. That seems to correspond to quite strong positive feedback, in your analogy.
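Nic's variance argument can be illustrated with synthetic data. The sketch below assumes Gaussian noise, a single fixed α and 20,000 synthetic draws rather than the 18 CMIP5 models. It makes ΔT pure noise and then diagnoses ΔF from it. With var(αΔT) three times var(ΔN), the correlation between ΔT and the diagnosed ΔF should come out near the theoretical √(3/4) ≈ 0.87, so the "forcing" appears to explain about three quarters of the ΔT variance even though nothing drives ΔT.

```python
import random


def mean(x):
    return sum(x) / len(x)


def variance(x):
    m = mean(x)
    return sum((a - m) ** 2 for a in x) / (len(x) - 1)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 20_000
alpha = 1.0  # hypothetical, identical for every synthetic "model"

# dT is pure noise: by construction nothing here forces it.
dT = [rng.gauss(0.0, 1.0) for _ in range(n)]
# dN is independent noise, scaled so that var(alpha*dT) is about 3 * var(dN).
dN = [rng.gauss(0.0, 3 ** -0.5) for _ in range(n)]
# The "forcing" is then diagnosed from dT itself, as in dF = alpha*dT + dN.
dF = [alpha * t + v for t, v in zip(dT, dN)]

ratio = variance([alpha * t for t in dT]) / variance(dN)
r = pearson(dT, dF)
print(f"var(alpha*dT)/var(dN) = {ratio:.2f}")
print(f"corr(dT, dF) = {r:.3f}, share of dT variance 'explained' = {r * r:.2f}")
```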

I think that you wrote elsewhere that your suggested regression had similarities to one based on my equation (6)? But isn’t the obvious starting place my simple equation (6), rather than your complex regression system? There are only 18 models, and if several coefficients have to be estimated from noisy data and for a linearised version of the equation (1) model that is extremely approximate, I am doubtful that one would be able to obtain reliable results. In any case, if a regression based on equation (6) – arguably best performed on a logarithmic rather than linearised version – does not yield significant and reasonably stable results for 62 year periods ending in recent decades, doesn’t that strongly suggest that the whole simple model edifice on which the paper’s analysis is based isn’t valid? And, as I wrote, regression based on equation (6) generally has very little explanatory power.

You say here that the paper’s results in general are quite reasonable. But you wrote at ATTP: “The models have highly different parameter values for α and κ. They are much closer in their temperature trends. Thus they must have highly different forcing histories. Those highly different forcing histories are used in the comparison presented in the paper.”, which implies that you – like me – believe that the parameter values for α and κ do, between them, have a significant impact on model temperature trends, at least over multidecadal periods ended recently. Am I right? Yet the paper claims that they have no significant impact on 62 year trends ending recently, with forcing variations alone dominating.

Nic,

It’s not important that the variability of the ΔN term dominates that of the α ΔT term, it’s enough that the contribution from ΔT to the right hand side of the regression is significantly less than the left hand side. If that’s not the case we might expect some strange variability in the temperatures.

By reasonable I mean that the time series behave essentially as expected. That’s a different thing than the ultimate results of the analysis, and I consider it more likely that the reason for the surprising final results is somewhere else, perhaps simply in the fact that 62 years is so long and more than half of the full period considered, or perhaps the CMIP5 ensemble is not representative due to the implicit (and in part explicit) selective processes that have contributed.

Marotzke and Forster have also listed several problems that they have recognized. Perhaps the real problems are among those.

My impression based on the output of the analysis is that the circularity has probably not affected the outcome very much. Its influence should be checked, but that’s my guess.

Pekka, with all due respect, your reply is illogical. Think it through rather than reflexively defending the apparently indefensible. Logic explained in more detail in other comments.

Rud Istvan, with all due respect, you not following the argument doesn’t make it illogical.

Pekka, this is not the forum for a discussion of system control under different feedback regimes, but you might ask Anders to give you an ATL where you can explain what you mean, in terms that are used in classical control theory and the huge body of statistical validation that has been explored in control theory.

I have yet to see any application of control theory to cAGW.

DocMartyn,

Pekka was using control theory to provide an analogy, which is fair enough whether or not classical control theory itself is applicable for analysing climate system behaviour.

I generally find Pekka’s comments sensible and informative, and I hope he will comment more often at CA.

Doc Martyn wrote:

“I have yet [to see] any application of control theory to cAGW.”

One reason for that is climate science hasn’t the slightest clue as to the historical mix of feedback versus system capacitance. If you cannot tease those two apart, the control equation cannot be resolved in the time domain, at least not to the point of elucidating decadal trends. There is still the (relatively unexplored) possibility of deriving useful control information using a control volume or control mass approach over very long time intervals, but that won’t help with decadal variability.

I agree with Pekka, and I think the feedback analogy is appropriate. I see here assertions that you can’t do regressions where there is dependence, but that is not true. You just have to use an appropriate covariance matrix, which will be less well conditioned because of the dependence.

There is a familiar example in finding the trend of a time series with autocorrelation. There is dependence between the terms. That modifies the covariance of the random term, often approximated (with AR(1)) by a Quenouille correction. And it generally means the result is more uncertain than OLS, but the expected value is not much different.
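Nick's example can be made concrete. The sketch below fits an OLS trend to a synthetic series with AR(1) noise (the trend of 0.02 per step and φ = 0.6 are arbitrary, hypothetical choices), estimates the lag-1 autocorrelation r1 of the residuals, and uses a common form of the Quenouille effective sample size, n_eff = n(1 - r1)/(1 + r1), to inflate the standard error. It is an illustration of the correction, not the procedure used by M&F.

```python
import random


def ols_trend(y):
    """OLS slope of y against t = 0..n-1, its naive standard error, and the residuals."""
    n = len(y)
    t = list(range(n))
    tm, ym = sum(t) / n, sum(y) / n
    stt = sum((a - tm) ** 2 for a in t)
    slope = sum((a - tm) * (b - ym) for a, b in zip(t, y)) / stt
    resid = [b - ym - slope * (a - tm) for a, b in zip(t, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return slope, (s2 / stt) ** 0.5, resid


def lag1_autocorr(e):
    m = sum(e) / len(e)
    num = sum((a - m) * (b - m) for a, b in zip(e[:-1], e[1:]))
    den = sum((a - m) ** 2 for a in e)
    return num / den


def adjusted_trend_se(y):
    slope, se, resid = ollshelper(y) if False else ols_trend(y)
    n = len(y)
    r1 = lag1_autocorr(resid)
    n_eff = n * (1 - r1) / (1 + r1)  # Quenouille effective sample size
    if n_eff <= 2:
        raise ValueError("too few effective degrees of freedom")
    se_adj = se * ((n - 2) / (n_eff - 2)) ** 0.5  # fewer effective dof -> wider SE
    return slope, se, se_adj, r1, n_eff


# Synthetic series: linear trend plus AR(1) noise.
rng = random.Random(1)
phi, noise, y = 0.6, 0.0, []
for k in range(200):
    noise = phi * noise + rng.gauss(0.0, 1.0)
    y.append(0.02 * k + noise)

slope, se, se_adj, r1, n_eff = adjusted_trend_se(y)
print(f"slope={slope:.4f} naive SE={se:.4f} adjusted SE={se_adj:.4f} r1={r1:.2f} n_eff={n_eff:.0f}")
```

The slope itself is unchanged by the correction; only its uncertainty grows, which matches Nick's point that the expected value is not much different from OLS.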

Steve: this post was in moderation for 5 hours. I moderate after the fact. I have learned that Nick Stokes has been whinging at his blog about being supposedly censored – even though he’s posted hundreds of comments here. Stokes has gotten onto some wordpress moderation lists but claims to have solved them all. I don’t know why Stokes’ comment went into moderation. He’s not on any CA blacklist despite whatever claims he may make at his blog.

Sometimes the angle you take on things makes my head spin. Are you saying it is “not true” to claim an analysis is invalid due to the failure of key assumptions because it is possible to redo the analysis in a more general way, even though that was not actually done? Or did MF, in your opinion, actually specify an appropriate covariance matrix in this instance to address the linear dependency?

Nick, SteveMc,

FWIW, I have had several comments held up at Climate Audit (including one on this thread) for ~2 to ~5 hours. I figured there must be some key words that trigger transfer to a moderation list rather than immediate posting. I did not ever think there were nefarious motives involved, and I rather suspect that is also the case with Nick Stokes’ above comment which ended up in moderation.

Steve: as you observe, Nick is not the only person that gets a comment tied up from time to time for reasons that seem puzzling. It’s easier for me to deal with such incidents manually than to try to figure out the interaction between various spam filters. Because Nick has gotten on wordpress blacklists unrelated to CA in the past, it is entirely possible that he’s run into problems additional to those experienced by others. And yes, there are a variety of key words that trigger moderation, some of which are related to spam control rather than to good manners. For the most part, his comments seem to come through, so I’m not going to lose any sleep trying to figure things out. Because Nick is in an opposite time zone, such delays are longer. I’m also a little inconsistent in my editing diligence and sometimes don’t do things for a few days.

“I see here assertions that you can’t do regressions where there is dependence, but that is not true. You just have to use an appropriate covariance matrix, which will be less well conditioned because of the dependence.”

Maybe I should have said something more like: you can’t do a *meaningful* regression when there is dependence. You can use a covariance matrix when there are correlations between the explanatory variables (those whose coefficients are being estimated), but doing one when there is a correlation built in between the dependent variable (the one being predicted) and the explanatory variables makes interpretation of the results iffy.

In this case, the explanatory variables are not independent of each other, because several came from the same models. And one of them, ΔF, is estimated using the dependent variable, ΔT, so those two are not independent either.

It doesn’t matter whether the impact on the results is large or small — in a competent, rigorous, scholarly paper, these problems would have been identified and attempts made to quantify the resultant uncertainties. Nothing of the sort appears in the Nature article. If an author omits a crucial set of issues in a paper, then the entire thing should be called into question until it can be proven valid.

It seems to me that people are defending this paper because they like the conclusions, not because it represents good science. Think hard about that — is that really how you want climate science to move into the future?

I posted an explanation of where this comes from, but because of the forbidden word in the explanation it is itself held in moderation, and Steve has not cleared it.

The problem is that admins put words for moderation traps into the WP interface but WP regards them as strings and matches any instance where a comment contains one of the forbidden words as a sub-string.

Nick says:

“There is a fami_L_I_A_R example …!”

Geddit?

The solution is to pad the words with leading and trailing spaces and put them in quotes. Most admins don’t realise this and commenters across WP are railing against apparently spurious moderation holds that no one can understand.

Some then get all paranoid and start concluding they are banned.

For example, to ban Nick Stokes, Steve would need to enter:

" Nick "

" Stokes "

and not

Nick

Stokes

since that would end up trapping words like “knickers” and holding the comment for moderation.

Hopefully, once our host reads my comment that got held he will fix this and everyone can feel less paranoid.

Probably my comment is very late and will not be noticed.

I have a blog called Science of Doom – it is hosted by WordPress. I do not moderate, but have a bunch of words and a few rules that cause WordPress to send a comment into moderation – waiting for me to release it.

That aspect works pretty well. WordPress never lets a comment through if it contains a word – or violates a rule – that I have given it.

However – and this is a bit of a kicker – some comments get put into moderation until I release them, and I can’t work out why. I review the comment: no keywords, no rules violated.

We could say – plenty of “false positives”.

One particular commenter comes to mind – in a given week he might have 3 out of 5 comments held in moderation. In another week he might have 0 out of 5 or 0 out of 10. I can’t understand the logic and I can’t see the reason.

What I *guess* is that some combination of his IP address/name/words are triggering other rules that wordpress has decided are bad.

For someone without a WordPress-hosted site it will be different, but the “behind the scenes magic”, even with a client-side hosted account, is not at all clear.

Sometimes people whose comments end up in moderation get a little testy. Other times people are understanding. It all depends on their day, their week and their demeanor.

Your comment is noticed and appreciated. So too is your blog.

It seems M&F have placed themselves on the horns of a dilemma with their reply.

If their procedure is correct, as they claim, it leads to the conclusion that the two most important emergent structural properties, α and κ, do not influence model outputs. Illogical, but if taken at face value then ‘internal’ climate variability caused the pause. But then that ‘internal’ natural variability would have also been present in the hindcast period back to roughly 1975 to which the models were parameterized for best hindcasts per the CMIP5 experimental ‘near term’ protocol. And so the attribution of the underlying temperature rise in this period to GHG is falsified. So the models run excessively hot.

Or, their procedure is faulty because of circularity, and so just produces an illogical result. Since M&F have not addressed the point (because they cannot, since it is true), their explanation fails and the models are now falsified. They are too sensitive, because the parameterization tuning period contained natural variation. Moreover, since the rise from 1975 to about 2000 is indistinguishable from the rise from about 1920 to 1945 (Lindzen’s point, noting that even the IPCC does not attribute the earlier rise to GHG), natural variation could well be most of the later rise as well. Still more observational support for the root cause of the model/temperature divergence.

Either way, the model results are unsupportable in the bigger picture.

The authors talk about the emergent parameters not affecting the differences in the models’ trend outputs much, I think, while still, I would assume, acknowledging that the values of the parameters affect the trends, particularly the longer-term ones. To me this is saying that the noise level is the overwhelming factor in determining the trend differences, and that is so even though individual-model ECS and TCR values can differ by 100%.

Is there or should there be any dependency between the emergent parameter ECS or TCR and the natural variation that I call noise? If not the authors should be decomposing individual model outputs into secular trends, noise and cyclical structure and not doing the group thing and assuming the regression residual from these differences is all noise – or quasi-random variability as the authors reference it. In my mind, a better comparison would be the observed climate versus the individual model output and better still where the individual model has multiple runs and the noise levels can be better estimated (modeled).

See Akasofu 2009 on this point. Summarized in essay Unsettling Science with reference footnotes. There is a dependency through the necessary model parameterization (see essay Models all the way Down) for ‘best fit’ multidecadal hindcasts specified in the ‘experimental design’ published by Taylor, Meehl et al. in BAMS in 2012.

Should have said ECS and TCR estimated values for individual models can be at ratios of 2:1.

More on this topic:

http://www.reportingclimatescience.com/news-stories/article/blog-row-erupts-over-nature-model-paper.html

Steve: ironically their article doesn’t contain a hyperlink to the CA article. They refer to CA, but link to Nature.

Link fixed. Apologies.

L

Now they link to CA but refer to (2015, Nature ).🙂

No it says: “Post by Nic Lewis criticising Marotzke & Forster (2015, Nature) here.”

Marotzke & Forster (2015, Nature)is the citation for the paper…

L

Thank you sir, I stand corrected.

“Andthentheresmath, so to speak”…lol Steve!

Made me laugh too🙂

Nic Lewis,

Seems to me the circularity is complete in the paper. Forster et al (2013) defined delta F as you showed above:

ΔF = α ΔT + ΔN (1)

where ΔN is the change in the TOA imbalance and α is the climate feedback parameter (inversely proportional to ECS).

M&F start with the basic equation:

ΔT = ΔF / (α + κ) (2)

Here α is the climate feedback parameter (inversely proportional to ECS) and κ is the “ocean heat uptake efficiency”, the ratio of the change in TOA imbalance to the change in temperature; M&F then add an error term ε. But κ is related to ΔN and ΔT:

κ = ΔN / ΔT (3)

Substituting (3) into (2) we get:

ΔT = ΔF / (α + ΔN / ΔT) (4)

And rearranging:

α ΔT + ΔN = ΔF (5)

Which is nothing more than the equation (1) used by Forster et al (2013). So the M&F paper uses the SAME equation as Forster et al (2013), slightly rearranged, and with an error term added. I don’t see how using an equation to calculate forcing from change in temperature, and then that same equation to calculate change in temperature from calculated forcing adds much to the world’s knowledge.
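Steve's substitution can be checked numerically. The sketch assumes, as his equation (3) does, that κ is exactly ΔN/ΔT for the same run; the input values are arbitrary and hypothetical. Under that assumption, feeding the diagnosed ΔF back through equation (2) returns the original ΔT to rounding error, whatever α is. As Nic points out in reply, M&F actually take κ from different simulations, so this is the limiting case of the circularity.

```python
def diagnosed_forcing(alpha: float, dT: float, dN: float) -> float:
    """Equation (1)/(5): forcing diagnosed from temperature change and TOA imbalance."""
    return alpha * dT + dN


def predicted_dT(dF: float, alpha: float, kappa: float) -> float:
    """Equation (2) without the error term."""
    return dF / (alpha + kappa)


# Arbitrary illustrative values (not taken from any CMIP5 model).
for alpha, dT, dN in [(1.2, 0.8, 0.5), (0.6, 1.5, 0.9), (2.0, 0.3, 0.2)]:
    kappa = dN / dT  # equation (3)
    dF = diagnosed_forcing(alpha, dT, dN)
    print(f"alpha={alpha}: input dT={dT}, recovered dT={predicted_dT(dF, alpha, kappa):.12f}")
```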

stevefitzpatrick,

I think you have overlooked that the estimates they use for κ come from a different set of simulations (see the paragraph below my equation 4) and are quite different from the values of ΔN / ΔT obtained during the Historical simulations – see 2nd paragraph of my section ‘Another reason why Marotzke’s approach is doomed’.

The circularity thus appears to be somewhat diluted, but at the expense of falsifying a key assumption in their eqn. (1), where they apply the previously diagnosed values for κ to model behaviour in the Historical period.

This is fascinating. M&F and their defenders basically argue that there is no big deal because the paper’s statistical analysis is validated by its producing the result that was expected (by them). The threat of circular logic was apparently the furthest thing from their minds.

For sure astrology had a root in strong foundations of settled assumptions proven by generations of observation.

Science and humanity owe a debt to Nic, Steve, Ross and many others for acting as inspectors general of sorts for the tax-payer funded science driving our politics, which in turn drives our press, voter choices and science funding. Is there a valid equation for that?

Nic, Steve, Ross: Nature does not need to call on you to be a referee/reviewer. You are changing the way science is done, making history here. Public audits are the wave.

I strongly, STRONGLY recommend the work of Judea Pearl and others who have developed a theory of causal modeling. Pearl’s book begins with the striking observation that physical ideas as expressed by equations contain no causal information. If F=ma, does F cause a or does a cause F or do F and a cause m? The theory particularly clarifies complicated observational relationships versus operational relationships in which an exogenous actor changes a variable. The confusion between an exogenous forcing and the endogenous estimation of the forcing would not have occurred if the entire model were rebuilt with these ideas in mind.

The point you just made should be included in the lead paragraph of Nic Lewis’ submission to Nature:

That might limit the back-and-forth arguments over circularity as a necessary result of the M&F formula’s design.

Sorry, but as there has not been an Unthreaded post recently I will steal a moment to post this offtopic tip. The Australian BoM’s adjustments to land temperature records in ACORN-SAT will be checked by a government-appointed panel of stats experts.

So Climate Audit, meet the climate auditors…

http://www.environment.gov.au/minister/baldwin/2015/mr20150119.html

No findings yet, but one to watch in the coming months.

I see Pekka’s comment this morning at Climate Lab Book on M&F’s posted response, where he concludes a lengthy statistical summary demonstrating the problem with:

“Starting values are from a database, formulas are given. Results follow from that. Variable F is not a real forcing, it’s a derived construct (ERF) defined in Forster (2013), motivated by physics, but not an externally given forcing.”

His analysis boils down to this: the use of modeled forcing to evaluate modeled forcing cannot enlighten the real world.

My present conclusion is that the circularity occurs in this analysis in the way that it does not result in any problems. It’s typical that the same effect results in a zero in one direction of the analysis and singularity (infinity) in the inverted direction. In this case only the zero occurs in the relevant calculations and that does not cause any problems.

The zero (or variable sign) may occur in some coefficients of the regression, but the regression is well behaved in all these cases. The model obtained by the regression is also well behaved in all the calculations M&F perform. In the mathematical sense it’s possible to define additional questions that involve explicitly ΔN and that would involve singular behavior, but such cases are not part of the M&F analysis and do not affect that analysis.

More on that at Climate Lab Book and still more at aTTP.

My apologies for misinterpreting your final conclusion. I think you are saying that the dT caused by dN (the temperature change at the surface theoretically attributable to TOA radiative imbalance) is insignificant to their results. But as I read M&F’s reply, they maintain it is absolutely necessary to correct for this dT and absolutely deny any circularity.

Pekka, when we have recurrence relationships, isn’t it the case you have to iterate until you’ve achieved convergence?

It seems like even if the recurrence relationship is stable, the result of a single iteration isn’t likely to be accurate.

Carrick,

The nature of this case is not such that iteration is needed.

The starting point is fixed: the results included in the CMIP5 model archive. All input values of the regression come from that archive either directly or through the earlier analysis of Forster et al (2013), or other earlier analyses that have deduced the values of α and κ, which are used and reported also in Forster (2013).

The issue that Nic observed is that the CMIP5 archive does not contain estimates of model-specific forcings, only temperatures and TOA imbalances. The forcings are calculated from these. That’s a one-time final calculation; no later corrections based on temperatures derived from the regression model are needed. If it were necessary to calculate such new corrections, then we would end up in iteration and further problems.

Pekka, to be honest I really can’t tell without getting more immersed in this, whether there are quantities you could update using the new value of T (or F).

There is a similar problem in sensor calibration, where you measure the ratios of sensitivities of two sensors relative to a third source or microphone, and use that to separately compute the calibrations of the two sensors.

In that case, looks on paper like it’s totally circular, but the trick to straighten the reader out is to subscript quantities so you can track where the quantities are being measured.

Here, for example:

ΔT1 = ΔF / (α + κ) + ε

ΔF = α ΔT2 + ΔN

But are ΔT1 and ΔT2 really supposed to be independent measurements here? This is not obvious to me.

What I wrote above – and what I believe to be the case in full agreement with the response of Marotzke and Forster – means that the calculation is not circular in the serious sense that the result obtained on the left hand side would be used iteratively as input on the right hand side. It’s circular only in the way that ΔT appears on both sides of the regression formula, when it’s written as it is written in the paper and when ΔN + αΔT is substituted for ΔF.

The substitution is used (implicitly) in the determination of the regression parameters, but, after the regression parameters have been determined, ΔN is not of further interest in the use of the regression formula, which now tells, how ΔT depends on ΔF, α, and κ in the regression model that tells approximately, how ΔT, ΔF, α, and κ are related in the actual CMIP5 models. Thus the whole regression is just a simple multilinear fit to the model behavior.

This is a totally well behaved way of figuring out something about the model ensemble. (I’m a bit embarrassed that I didn’t see that more rapidly, but I wasn’t alone in not understanding the situation immediately.) The main limitation of the approach may be that it’s a multilinear regression that cannot describe any more complex variation of ΔT in the 3-dimensional space of the other variables, while the formula used to motivate the regression

ΔT = ΔF/(α+κ)

is clearly not linear in the variables and leads to a dependence that cannot be approximated well by the multilinear regression over a wide range of parameter values. The values of the other variables vary, however, over a wide range. The sum in the denominator has the range 1.17-2.81 in the model ensemble, the two parts of it vary even a little more. The overall adjusted forcing from doubling the CO2 concentration varies similarly significantly (2.59-4.31), but the values of ΔF over the 15 year and 62 year periods vary surely more than that, and include also periods of decreasing forcing. Thus the linearization is a crude approximation that may affect the outcome a lot.
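The size of the linearisation error can be gauged in one dimension. The sketch holds ΔF fixed and compares the exact response 1/(α + κ) with its first-order Taylor expansion about the middle of the 1.17-2.81 range quoted above. This is a deliberate simplification: M&F's regression is linear in ΔF, α and κ jointly, so its actual approximation error need not match these figures.

```python
def exact_response(rho: float) -> float:
    """Temperature change per unit forcing, dT/dF = 1 / rho, with rho = alpha + kappa."""
    return 1.0 / rho


def linear_response(rho: float, rho0: float) -> float:
    """First-order Taylor expansion of 1 / rho about rho0."""
    return 1.0 / rho0 - (rho - rho0) / rho0**2


def relative_error(rho: float, rho0: float) -> float:
    return linear_response(rho, rho0) / exact_response(rho) - 1.0


lo, hi = 1.17, 2.81     # ensemble range of alpha + kappa quoted above
rho0 = 0.5 * (lo + hi)  # expand about the middle of the range
for rho in (lo, rho0, hi):
    print(f"rho = {rho:.2f}: relative error of linearisation = {relative_error(rho, rho0):+.1%}")
```

Algebraically the relative error is -((ρ - ρ0)/ρ0)², which is about -17% at both ends of the range and zero at the expansion point.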

Carrick,

Perhaps the comment that I wrote simultaneously with yours helps you understand the case. If not, I’ll try to add more.

Thanks Pekka, that makes sense.

Pekka, Carrick,

One question I have is the sequence of the regression performed. Pekka, when you write your version of the equation removing deltaF (dF) as you have here, it seems to set up a regression model that simultaneously produces a “best fit” of dT to all the various empirical parameters. However, unless I’m mistaken, they are using the dF results of a prior fit to the models (Forster 2013), which doesn’t consider all the factors that this paper does. Thus they haven’t performed the multiple regression simultaneously on all parameters, as one is really supposed to…

bill_c,

The regression is done in a way that’s totally equivalent to doing it simultaneously. The result is fully well defined for the determination of the regression coefficients that they define. The only problem is that the resulting regression formula may have diverging coefficients when it is solved for the temperature trend with ΔN, α, and κ as the free parameters, but this formula is not needed in any application included in M&F, and it’s difficult to see where it could be needed.

With this set of variables, the regression formula gives without any problems the energy flux contributions based on ΔN, α, and κ, but these contributions balance closely by themselves without the term proportional to ΔT, which in that case has a coefficient near zero and is therefore small for all reasonable values of ΔT. That’s enough to justify all the analysis of M&F. It’s never necessary to use that formula to calculate ΔT, and that’s the only step that could be problematic.

The division of ΔT into contributions from the other parameters and a residual is well behaved when the energy flux parameter is ΔF, but not necessarily when it is ΔN; there are also other reasons to pick ΔF.

Pekka:

Did you not read the paper? The authors state:

The calculation of the predicted temperatures and the residuals from the regression is *the* main reason for carrying out the regression.

The possible “diverging coefficients” are by far not the only problem with the regression when it has been reformulated in the proper manner. The situation is not as simple as it appears to you. I hope to post a comment on CLB (and here) on that later today.

Thanks Pekka.

RomanM,

Yes, that’s the idea of the paper, but that’s done for the regression model that uses the free variables ΔF, α, and κ. That’s perfectly legitimate and supported by physical arguments. That does not lead to any problems in determining the residuals.

I repeat once more some of the caveats that the authors also acknowledge:

– A linear regression model is not an accurate model, but can describe only some leading features of the models.

– The variable ΔF used effectively in the determination of the regression parameters is not exactly the same as the ΔF that occurs in other connections, it’s only an approximation, but the best approximation they have at their disposal.

The list can be continued. Thus it’s justified to have some doubts about the accuracy of their results. There may very well also be some more fundamental issues, but going step by step through what they must have done shows that the circularity does not enter in a damaging way. All steps are stable against its consequences.

Why not use ΔF from the RCP files, which is certainly independent of temperature, rather than ΔF_est = ΔN + α ΔT?

HaroldW,

Values of forcing taken from the RCP files would give rise to a huge divergence between the emulated temperature from Forster’s simple model and the GCM’s actual historical temperature. Some of the calculated AF values from Forster 2013 are less than half those in the RCP files.

HaroldW, you bring up an interesting point here. The abrupt 4XCO2 experiment used for CMIP5 models was a special experiment devised, I assume, to better capture the equilibrium climate sensitivity without having to run the models for millennial time periods. A lot of the output of the regression on that data depends strongly on the pre-industrial control data (piControl) used for each model to adjust the TOA and surface temperature from the 4XCO2 experiment.

However, why wouldn’t M&F run their regression/model output on RCP4.5 or other scenarios and determine how well it agrees, as a kind of out-of-sample test? I’m not sure I have completely thought this through, but it is a thought.

From what I can gather by reading comments here and at Climate Lab Book, there appears to be an emerging consensus that M&F does incorporate a degree of circularity in its use of deltaF derived from an earlier study. The disagreement now seems to be centering around how significantly this has affected the (surprising) outcome of the paper. Nic Lewis above states that the circularity may be ‘somewhat diluted’ but only at the expense of untenable assumptions in the calculations elsewhere.

Pekka, whilst acknowledging the circularity, believes it may not significantly affect the analysis, though admits that it may make it ‘unstable’. Pekka furthermore suggests that the “surprising final results” may have their origin elsewhere. Pekka also states on CLB that “Variable F is not a real forcing, it’s a derived construct (ERF) defined in Forster (2013), motivated by physics, but not an externally given forcing”.

So, from a purely logical point of view, the paper appears to be seriously flawed by its inclusion of this circularity, whether or not it significantly affects the final results. Another problem seems to be the poor choice of ‘independent’ periods, particularly the 62-year ones. For these reasons, it appears to me that the ‘surprising’ conclusions cannot be relied upon from a technical perspective, nor indeed from a scientifically purist viewpoint.

Good summary. I was struggling to put something similar into words.

This is an interesting read from top to bottom. Thank you all.

Another problem with many of the discussions above is that you cannot invert a regression. That is, if y = a * x + error is an optimally fitted OLS regression, then x = y/a + error is NOT an optimal fit. This is also true for more complicated regressions (other than OLS). Claiming that the coefficient is a real physical constant does not fix this problem if the constant was estimated with regression.
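A quick simulation with made-up numbers illustrates the point: regressing x on y does not give 1/a. Assuming var(x) = 1, a true slope of 2 and unit noise variance, the OLS slope of y on x is about 2 (so 1/a is about 0.5), while the OLS slope of x on y converges to cov(x, y)/var(y) = 2/5 = 0.4. The helper `ols_slope` is a hypothetical name used only for this sketch.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line y = c + slope * x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(0.0, 1.0, n)            # predictor, var(x) = 1
y = 2.0 * x + rng.normal(0.0, 1.0, n)  # true slope a = 2, noise variance 1

a_hat = ols_slope(x, y)      # close to 2
inverted = 1.0 / a_hat       # close to 0.5: "solving" y = a*x for x
reverse = ols_slope(y, x)    # close to cov(x, y)/var(y) = 2/5 = 0.4

print(f"y on x: {a_hat:.3f}  inverted: {inverted:.3f}  x on y: {reverse:.3f}")
```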

I have linked to an Excel file at Dropbox that shows the plots I made for a Singular Spectrum Analysis decomposition and reconstruction of some CMIP5 model Historical and Pre-Industrial control runs and observed temperature series. Another worksheet shows the percent variance explained by the principal components used and some ARMA modeling results. The plots show a secular trend (red line), some cyclical components and the residuals (black line).

One can see some large differences in the secular trends, which can represent the deterministic part of the series. The cyclical and noise components, while visually different pattern-wise among the models, are at nearly the same level. I make no great claims for this analysis other than that it does show differences from model to model and further shows a measure of the deterministic trend and natural variation for the models and observed temperature series. I cannot reconcile these plots with the findings in M&F under discussion here.

https://www.dropbox.com/home?select=SSA_Obs_CMIP5_Models.xlsx#

Are there mistakes in the expansion and regression of Equation (3) in M&F’s paper (that have nothing to do with circularity)?

First, they appear to be using the approximation that 1/(1+x) is approximately equal to 1-x when x is small. In this case, x is equal to (a’+k’)/(a+k), where a is the ensemble-mean climate feedback parameter (a_overbar in M&F), k is the ensemble-mean ocean heat uptake efficiency (k_overbar in M&F), a’ is the “across-ensemble variation” in the climate feedback parameter and k’ is the “across-ensemble variation” in the ocean heat uptake efficiency. The ensemble ranges for a and k are 0.6–1.8 and 0.45–1.52 W/m²/K, respectively. It isn’t obvious to me that x must be small enough for this approximation to be valid.
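The size of the error in this first-order approximation is easy to quantify: since (1 − x)(1 + x) = 1 − x², the relative error of 1 − x as an approximation to 1/(1 + x) is exactly −x². If, hypothetically, the ensemble-mean α + κ sat near the middle of the 1.17–2.81 range quoted earlier in the thread, x would reach roughly ±0.41 at the ends of that range, where the approximation is off by about 17%. The sketch below simply tabulates this for a few illustrative x values; the real per-model values of x are not given here.

```python
def exact(x):
    """The exact factor 1/(1 + x)."""
    return 1.0 / (1.0 + x)

def first_order(x):
    """The first-order series approximation 1 - x."""
    return 1.0 - x

def relative_error(x):
    """Relative error of 1 - x as an approximation to 1/(1 + x); algebraically -x**2."""
    return (first_order(x) - exact(x)) / exact(x)

# Illustrative values of x = (a' + k')/(a + k); not taken from the paper.
for x in (-0.5, -0.3, -0.1, 0.1, 0.3, 0.5):
    print(f"x={x:+.1f}  1/(1+x)={exact(x):.3f}  1-x={first_order(x):.3f}  "
          f"rel.err={relative_error(x):+.1%}")
```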

The authors define a’ and k’ using the phrase “across-ensemble variation”. It isn’t clear to me what this phrase means. During regression, each model presumably has a distinct a’_j and k’_j. Presumably each a’_j must come from subtracting the model climate feedback parameter from the ensemble mean climate feedback parameter. If so, the approximation is incorrect for at least some of the models.

Second, when one transforms the expansion they obtained into the regression equation immediately below, the coefficients beta2 and beta3 are required to be equal. Instead of two independent terms, there should be a single coefficient multiplied by (a’+k’).
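Written out in the thread’s α, κ notation (α ≡ a, κ ≡ k, overbars for ensemble means, primes for across-ensemble deviations), the first-order expansion Frank describes gives identical coefficients on α′ and κ′. This is a sketch of the algebra as described in the comment above, not a reproduction of the paper’s own equation:

```latex
\begin{aligned}
\Delta T &= \frac{\Delta F}{\bar{\alpha}+\bar{\kappa}+\alpha'+\kappa'}
          = \frac{\Delta F}{\bar{\alpha}+\bar{\kappa}}\cdot\frac{1}{1+x},
          \qquad x = \frac{\alpha'+\kappa'}{\bar{\alpha}+\bar{\kappa}} \\
         &\approx \frac{\Delta F}{\bar{\alpha}+\bar{\kappa}}\,(1-x)
          = \frac{\Delta F}{\bar{\alpha}+\bar{\kappa}}
            - \frac{\Delta F}{(\bar{\alpha}+\bar{\kappa})^{2}}\,\alpha'
            - \frac{\Delta F}{(\bar{\alpha}+\bar{\kappa})^{2}}\,\kappa'
\end{aligned}
```

If ΔF/(ᾱ + κ̄)² is treated as common to all models in a given period, a regression derived strictly from this expansion would carry a single coefficient multiplying (α′ + κ′).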

Frank,

All the apparent derivation is really only motivation for the rest. The actual analysis starts from the equation that has the betas in it, i.e. the unnumbered equation above equation (4).

Pekka: One can dream up many possible regression models for fitting this data. If the models are purely statistical in nature, one has difficulty deciding which model to use and what type of noise it may contain. (For example, the IPCC has arbitrarily chosen to use linear AR1 models to fit the historical temperature record, and I’m sure you are aware of the controversy that choice has caused.) Our understanding advances much more rapidly when we use “physical models” in place of “statistical models”. However, you must handle the physics equations correctly, not be “motivated” by flawed mathematics.

Furthermore, if your explanation is correct (and they were aware of these mistakes), the authors have inexcusably deceived the readers of this paper.

You are correct on all counts here.

The approximation is indeed just the first two terms of the series expansion of 1/(1+x). In the paper, the authors state: “This equation holds for each start year separately and suggests the regression model…”, which somehow justifies separating a and k even though, as you noticed, the a and k terms had the same multiplier in the expanded equation. This separation helps to remove the individual effects of a and k from ΔT, and they are then surprised that the residuals and the predicted values seem not to depend on the differences among the various models (which are related to the values of a and k).

Roman: Thanks for confirming my work. It always seems more likely that I have made a mistake than that errors like these two got all the way into a published paper.

The coefficient is the same in the expansion of the simple formula, but α and κ have different roles in the models. Therefore it’s not known whether they affect ΔT very similarly or not.

As I already wrote, all the discussion that precedes the first formula with betas as coefficients is only motivation, not derivation.

Pekka:

α and κ may have different roles in the models, but the starting-point *physics* equation for analyzing ΔT postulated by the authors is ΔT = ΔF/(α + κ) + ε. In that relationship, α and κ affect the result *only* through their sum. A change of δ in α has the same effect on ΔT as a change of δ in κ. So the proper “expanded” equation for this situation is to keep the two variables together as a single variable ρ = α + κ. This is indeed still the case in the unnumbered equation next to Figure 2c in the paper.

However, the authors then make the unjustified assumption that the two variables have different effects on ΔT by presenting the “suggested” version of the regression equation that is actually used. If the two are not separable, introducing an extra parameter provides room for *wiggle-matching* and possible distortion of the results. Do you not think that it would have been more appropriate to do the initial regression using ρ and then look at whether the assumed relationship exists between α or κ individually and the residuals and predicted values of the regression?

As it stands, I did not see any formal analysis in the paper that justifies using the regression in the form the authors used.
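The two-step check suggested above can be sketched on synthetic data. The code assumes, for illustration only, a linearized truth in which ΔT depends on α and κ only through ρ = α + κ; the coefficients, noise level and ΔF distribution are invented, and the α and κ ranges are those quoted elsewhere in this thread. Step 1 fits ΔT on ΔF and ρ; step 2 regresses the step-1 residuals on α alone to see whether α carries any separate signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
dF = rng.uniform(0.0, 1.0, n)        # hypothetical forcing trends
alpha = rng.uniform(0.6, 1.8, n)     # feedback range quoted in the thread
kappa = rng.uniform(0.45, 1.52, n)   # ocean heat uptake efficiency range quoted in the thread
rho = alpha + kappa

# Invented linearized "truth": alpha and kappa matter only through their sum rho.
dT = 0.1 + 0.5 * dF - 0.2 * rho + rng.normal(0.0, 0.05, n)

# Step 1: regression with rho as a single predictor.
X = np.column_stack([np.ones(n), dF, rho])
coef, *_ = np.linalg.lstsq(X, dT, rcond=None)
resid = dT - X @ coef

# Step 2: does alpha on its own explain anything left in the residuals?
Z = np.column_stack([np.ones(n), alpha])
g, *_ = np.linalg.lstsq(Z, resid, rcond=None)

print("step 1 coefficients (a, b, c):", np.round(coef, 3))
print("step 2 slope of residuals on alpha:", round(float(g[1]), 3))
```

With data generated this way the step-2 slope should be close to zero; in real model output a clearly non-zero slope would be evidence that separate α and κ terms are warranted. A more formal alternative would be a nested-model F-test of the restricted (ρ) model against the unrestricted one.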

Roman: Suppose I want to use linear regression to analyze data that arises from a physical situation that produces y = a/(1+x) relationship. I inappropriately use the approximation y = a-ax (ignoring -ax^2 and possibly higher order terms that may be significant). Aren’t I going to end up with residuals that are much bigger than necessary? In M&F, the residuals are interpreted as unforced variability. If you apply ANY physically-inappropriate regression equation to the CMIP5 data, you will artificially inflate the unforced variability present in the CMIP5 output.

M&F have constructed a model that converts the histograms in Figure 1 into the histograms in Figure 2. Common sense tells me that they have made a mistake somewhere in the process.

Furthermore, the regression equation should not have separate terms for α and κ, since they only appear as a sum. If this degree of freedom were removed, I suspect the spread of the histograms in Figure 2 would widen even more.

From M&F response on CLB: “Because radiative forcing over the historical period cannot be directly diagnosed from the model simulations, it had to be reconstructed from the available top-of-atmosphere radiative imbalance in Forster et al. (2013) by applying a correction term that involves the change in surface temperature. This correction removed, rather than introduced, from the top-of-atmosphere imbalance the very contribution that would cause circularity.“

One can read this paragraph many times and still not understand, IMO, that M&F are completely ignoring the accusation of derivation circularity and instead replying with a recitation of Forster’s 2013 methodology for finding F (forcing), which was to derive a conjugate assumed temperature increase for every TOA imbalance whose sum would be assumed to equal the forcing caused by the known increase in GHG. And thus their coy response is that if one failed to plug in the temperature correction, as Forster aptly did, one would chase one’s tail theoretically as the forcing was satisfied toward equilibrium (radiant balance).

Technical talk here about their regressions being statistically troublesome (or well behaved) is, IMO, missing the forest for the trees. The aim of F&M was to validate CMIP5 wholesale to quash murmurs of it already failing the old-fashioned way. So they selected 36 models (not 114 as reported in Science Daily) out of CMIP5. Perhaps Nic knows if they were the same models, unchanged from Forster’s 2013 study. They also ran a subset using 12 of these, filling them with AR5 data and running them from 1900 (as if nobody ever thought to do this before). Then they were relieved to report that the largest divergence for any 15 years was 0.3 K. Yes, they could hit a barn.

Their conclusion is the same as their assumption: that 15 years is completely filled with random chaos that covers the forcing signal. One must wait 62 years before forcing can resolve truly into view (with some uncertainty from 5–95%). The unmentioned huge assumption, which everyone on CA is all too familiar with, is that there is no centennial variability to worry about (thank you, Mann Hockey Stick).

“The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded. “ — Marotzke and Forster

Ross at top wrote: “…dN is a term capturing the Top of Atmosphere radiative imbalance, which in this context is just a source of noise, …”

I believe dN is central to F&M’s work here. It’s likely inflated and taking up the slack to account for the lack of temperature rise, which should be taking its place in the energy balance. But I find the proposition that the atmosphere lacks the ability to warm itself to a new equilibrium in one year troublesome, never mind 15 years. I know what you are thinking: ocean heat banking. But isn’t this accounted for now in the kappa term? So what am I missing?

What will the next paper’s assumptions and conclusions be if the pause continues to 20 years? Is anyone making predictions anymore?

I believe Ross’s “context” for dN is within the circular regression equation where dF is rewritten as a*dT + dN. dN is then simply an added term on the rhs of the regression equation and not a predictor wrt dT, no different than the way the error (noise) term is configured in a simple regression equation.

I don’t see any discussion about the model selection, which, since the models themselves are what is being analyzed here, is of course crucial to the scientific validity. Nic wrote there were 18 models; the paper’s abstract lists 36 and 12; Nature’s table shows 35 models; and Science Daily reported 114 models. The Nature link here shows the table of models, 17 with forcing data, each model having (randomly?) 1 to 10 realizations run, for a total of 114.

http://www.nature.com/nature/journal/v517/n7536/fig_tab/nature14117_ST1.html

As we now know from M&F that the forcing data was from Forster (2013), the question becomes how familiar he already was with the models and their behavior as he selected them for study, and which ones to repeat and aggregate subsets of. All these questions, I submit, are fraught with peril.

If I understand Pekka’s current feeling on the circularity, it’s that if the forcings were averaged into one value before reuse, that would dilute the identity problem of F somewhat. Does anyone know if the ensembles were all fed the same F value?

R Graf:

How valid are multiple entries from the same model? Presumably if you imagine the extreme case where all data points are from separate runs of the same model, then all variation in dT is due to the internal variability. Maybe only one run from each model should be included.

Now that M&F have proved the models reliable and Mother Nature not there is time to run all the models in a 100 different ways. Right?

Forster et al 2013 analysed 23 models for which at least some data they needed was available, but for some of these TOA radiation data was not available for the relevant CMIP5 simulations. This data is required to compute N and hence derive F. Also, data for the FGOALS-s2 Historical run was subsequently withdrawn as faulty.

It is unclear to me why the inmcm4 model was excluded from M&F’s study, but otherwise the set of 18 models used looks logical to me. Note that although the NorESM1-M model is not ticked in Extended Data Table 1 of the study, I believe that is probably an error and that it was in fact included in the analysis.

Every observation of real data has some error attached. The errors will have some statistical characteristics, possibly pernicious, possibly benign. If the observations are fed into an equation in order to infer an estimate of another quantity, the equation itself is an observation process for the inferred quantity. For example, if x = a + b, and a and b are to be inferred from an observation of x, then the errors from observing x will be distributed to a and b, but, constrained by the equation, the errors will be negatively correlated. This will be a property of the estimate, regardless of the assumed physical properties of a, b and x.
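One concrete instance of this, with invented numbers: suppose x = a + b is observed with error, a is estimated separately with its own error, and b is then inferred as b̂ = x_obs − â. The error in b̂ is then e_x − e_a, so its covariance with the error in â is −var(e_a), which is negative whatever a, b and x represent physically. With the standard deviations assumed below (0.3 for x, 0.2 for a) the correlation is −0.04/(0.2 × √0.13) ≈ −0.55.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
a_true, b_true = 1.0, 2.0                            # hypothetical true values

x_obs = (a_true + b_true) + rng.normal(0.0, 0.3, n)  # noisy observation of x = a + b
a_hat = a_true + rng.normal(0.0, 0.2, n)             # a estimated separately, with its own error
b_hat = x_obs - a_hat                                # b inferred from the constraint x = a + b

err_a = a_hat - a_true
err_b = b_hat - b_true                               # equals e_x - e_a
corr = float(np.corrcoef(err_a, err_b)[0, 1])
print(f"correlation of the errors in a and b: {corr:.3f}")
```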

No response yet from Marotzke on my request for the data as used and details of his methodology.

It’s a really tough decision. It might take him a while.

But what if they find something wrong with it?

(This comment is a re-post of a comment posted at the Climate Lab Book blog)

Pekka has proposed that the regression can be done in a restated form of the original equation. This is incorrect. The problems with the regression model adopted in M and F are due to the endogeneity of the situation and in no way do they depend on (nor does this comment address) the correctness of the specification of the model.

In order to understand the arguments on the effects of circularity on the regression used in M&F, it is necessary to look at the Least Squares methodology in a bit more detail.

The authors start with a mathematically based statistical model:

ΔT = a + b ΔF + c α + d κ + ε

In the model, the variables ΔF, α and κ are assumed to be independent of ε which accounts for the random variation of ΔT in the statistical model. The ε’s are assumed to be independent of each other and to have means equal to 0. In this case, the authors have implicitly assumed that the ε’s are also homoscedastic, i.e. each having the same variance. There is a further very important assumption that the ε’s also be independent from all of the predictors.

In LS, estimates of the coefficients and the variance of the ε’s are obtained by first forming a sum of squares of the residuals:

SSE = ∑ε² = ∑[ΔT – (a + b ΔF + c α + d κ)]²

and then minimizing SSE with respect to the parameters a, b, c and d. It should be noted that the parameter estimates are functions not only of the non-random variables but of the ε’s as well, so they are random variables within this structure. In this case, the minimization procedure is simple to carry out using easily calculated matrix algebra.
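For the ordinary case, the minimization has the familiar closed-form least-squares solution. The sketch below is a minimal illustration on synthetic data with invented coefficients (the α and κ ranges are those quoted elsewhere in the thread): it builds the design matrix [1, ΔF, α, κ] and solves it with NumPy’s `lstsq`. `fit_ols` is a hypothetical helper name, not anything from M&F.

```python
import numpy as np

def fit_ols(dT, dF, alpha, kappa):
    """Least-squares (a, b, c, d) for dT = a + b*dF + c*alpha + d*kappa + eps.

    Returns the coefficients, the residuals and SSE = sum(residual**2).
    """
    X = np.column_stack([np.ones_like(dF), dF, alpha, kappa])
    coef, *_ = np.linalg.lstsq(X, dT, rcond=None)
    resid = dT - X @ coef
    return coef, resid, float(resid @ resid)

# Synthetic check: predictors drawn independently of eps, as the model assumes.
rng = np.random.default_rng(4)
n = 500
dF = rng.uniform(0.0, 1.0, n)
alpha = rng.uniform(0.6, 1.8, n)
kappa = rng.uniform(0.45, 1.52, n)
eps = rng.normal(0.0, 0.02, n)
dT = 0.1 + 0.5 * dF - 0.1 * alpha - 0.2 * kappa + eps

coef, resid, sse = fit_ols(dT, dF, alpha, kappa)
print("estimated (a, b, c, d):", np.round(coef, 3))
print("SSE:", round(sse, 5))
```

Because the predictors here are generated independently of ε, the estimates should land close to the values used to generate the data.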

Now what happens if ΔF is calculated from a previous relationship with two variables: ΔF = α ΔT’ + ΔN?

We substitute this relationship into the original equation to get:

ΔT = a + b(α ΔT’ + ΔN) + c α + d κ + ε

If ΔT’ is not the same as ΔT, then nothing is changed. The variables on the right hand side are still unrelated to ε and the entire procedure gives identical results to the previous case. However, if ΔT’ and ΔT are identical, the situation becomes radically different.

Now, ΔT has become a predictor of itself and the ε’s are present not only at the end of the regression equation but also (invisibly) through the ΔT on the right hand side. The predictors have violated a very important assumption: that they must be independent of the ε’s. Hence, the usual simple regression procedure fails and all results from it are spurious. Estimates of the parameters, confidence intervals and p-values will be biased and therefore neither reliable nor scientifically meaningful. This violation occurs even if one uses ΔF in the regression procedure. Despite the fact that you can’t “see” ΔT in the equation, its effect is still present mathematically because it has been used in the calculation of ΔF.
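The endogeneity mechanism, and the kind of synthetic-data test suggested further down the thread, can be illustrated with a deliberately simplified one-predictor sketch. It assumes a true forcing trend independent of ε, ΔT = 0.5·F + ε, and a hypothetical TOA term ΔN that balances the energy budget except for a component gamma·ε shared with ΔT’s internal variability. The diagnosed forcing ΔF = ΔN + αΔT then equals F + gamma·ε and is therefore correlated with ε. All names and numbers (alpha, gamma, noise levels) are invented; this is not a reconstruction of M&F’s data or of the Forster (2013) procedure.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line y = c + slope * x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(3)
n = 200_000
alpha = 1.2                         # hypothetical feedback parameter, held fixed
gamma = -2.0                        # hypothetical: how strongly dN shares dT's noise
F_true = rng.normal(0.0, 1.0, n)    # forcing trend, independent of eps
eps = rng.normal(0.0, 0.2, n)       # internal variability in dT
dT = 0.5 * F_true + eps             # true slope b = 0.5

dN = F_true - alpha * dT + gamma * eps   # hypothetical TOA imbalance trend
dF_est = dN + alpha * dT                 # diagnosed forcing, built from the same dT

b_clean = ols_slope(F_true, dT)     # close to 0.5
b_circular = ols_slope(dF_est, dT)  # close to (0.5 + gamma*0.04) / (1 + gamma**2 * 0.04)
print(f"b with true forcing: {b_clean:.3f}; b with diagnosed forcing: {b_circular:.3f}")
```

For these settings, regressing on the diagnosed forcing pulls the slope from 0.5 to about 0.36 (the probability limit 0.42/1.16). In this toy setup the bias comes entirely from the assumed shared term; with gamma = 0 the αΔT terms cancel exactly and the diagnosed forcing equals F.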

To produce a solution for this situation, the regression equation can be rewritten as Pekka suggests:

(1 – b α) ΔT = a + b ΔN + c α + d κ + ε

and the sum of squares becomes:

∑ε² = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]²

Minimizing this with respect to the coefficients in the equation is not as simple as in the above cases, but can be done with a little bit of programming or by using available optimization techniques. But the story does not end here. From the regression, we need to form the decomposition:

ΔT = Predicted(ΔT) + Residuals(ΔT)

In the ordinary regression case, the predicted value is calculated by substituting the corresponding values of the predictor variables into the equation for each model. The residuals are then calculated by simple subtraction or taken directly from the minimizing process for SSE. For the circular case, the entire equation must be divided by (1 – b α). Note that α (and therefore 1 – b α) are in fact vectors whose elements have different values depending on which climate model the particular observation is from. This has some important consequences, not the least of which is the introduction of bias into the entire estimation process.

First, b (and therefore 1 – b α) is itself a function of the ε’s in the model. The distribution of the ratio of two random quantities is very complicated and can be unstable, particularly if the divisor is close to zero.

A second consequence is that the effective coefficients for the predictor variables will be different for each climate model in the regression. As noted above the divisor is a vector so the actual value will be different for every observation.

Finally, the residuals are no longer the ε’s themselves. Due to the division process, they have become ε/(1 – b α). Their independence has been destroyed due to the common presence of the estimate of b and they are now heteroscedastic with a variability depending on the sign of b as well as the magnitude of α.
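The heteroscedasticity point can be made concrete by treating b as fixed (which ignores the first point above, that the estimate of b itself depends on the ε’s). If every ε has the same standard deviation σ, the residual ε/(1 – b α) for a model with feedback α has standard deviation σ/|1 – b α|. The values of b and α below are hypothetical, chosen only to show that the multiplier grows with α for positive b and shrinks for negative b, and that it blows up as b α approaches 1.

```python
import numpy as np

def residual_scale(b, alpha):
    """Factor 1/|1 - b*alpha| applied to eps when the equation is divided by (1 - b*alpha)."""
    divisor = 1.0 - b * np.asarray(alpha, dtype=float)
    return 1.0 / np.abs(divisor)

alpha = np.array([0.6, 1.0, 1.4, 1.8])   # hypothetical per-model feedback values
for b in (0.3, -0.3):                    # hypothetical estimates of b, one of each sign
    print(f"b={b:+.1f}  residual SD multipliers:", np.round(residual_scale(b, alpha), 3))
```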

The bottom line is that the regression done in the M and F publication is inappropriate and their subsequent results are scientifically unreliable and difficult, if at all possible, to correct.

RomanM,

In the part of the analysis where the regression coefficients are determined, ΔT is external data that’s used to determine the regression coefficients. As external data, not affected by the calculation, it may occur any number of times in any number of places without causing problems in the calculation; it’s just a fixed set of numerical values, as are the values of ΔN and the model-specific parameters α and κ. This is data picked from the database. This data is used to determine the regression coefficients, that’s all, and that’s so simple.

Pekka,

How did they derive the data in the database?

As I understand it the value for F is not external data but the estimate of the value of external data. It is derived from internal values. There is an external but unknown parameter that is being estimated and the estimate is contained within the variable F.

Since it is an estimate derived from a calculation, the value contained in variable F is subject to the errors associated with the internal values T and N, and is thus dependent on them. It is not an independent external value.

I’m confused. The issue is not whether deltaT is exogenous but whether deltaF is really deltaFhat(deltaT), a stochastic estimator correlated with the error term that is pretending to be a deterministic variable uncorrelated with the error term. “That’s so simple.” Apparently neither view of this is simple to those seeing it the other way.

stevepostrel: ” “That’s so simple.” Apparently neither view of this is simple to those seeing it the other way. ”

Apparently Steve Mc’s statement to aTTP, “you’ve grabbed the stick by the wrong end”, which Ed Hawkins snipped at his site as offensive or inappropriate, was not actually meant to be offensive but was quite an apt description of why they misunderstand each other.

Roman, Pekka,

You are speaking two different languages. I suspect that only analysis of synthetic data with known characteristics using the paper’s methods will resolve the issue. My guess is that Roman is correct here, and that the circularity makes the paper worthless. But in any case, it is a question which should be possible to resolve with little doubt.

We have seen this situation several times before (Beenstock’s papers, for example).

Essentially, the physicists say that the statisticians lack physical knowledge and acumen, while the statisticians respond that the physicists abuse known statistical procedures.

They do indeed talk past each other. I expect this never to be resolved.

As to the debate as to whether ΔF is an external independent input variable, in most cases it would be, because it is an independently measured or estimated input used to get an output of ΔT. Models output ΔT, which was evaluated by M&F. But in Forster (2013) a new machine outputting ΔF was created using the climate model’s parts. This would also be fine unless you are using the same ΔF as feedstock to the same model you got it from. If you dilute the output by averaging ΔF from multiple outputs, or use one output as input to multiple models, you should expect fewer and fewer inherited traits, but they are still there.

ΔT = ΔF / (α + κ) + ε, where we now realize ΔF is dependent on ΔT twice, once through the equation and once through its internal construction (as I understand it). Does anyone think the equation is acceptable for analysis as it stands? Having alpha and kappa in the denominator of this equation gives no information as to whether they are feedbacks that only temporarily affect T, like the TOA imbalance (N) and ocean heat banking (κ), each with its own timescale, or something like convection efficiency, which is independent of time but dependent on temperature. The equation needs more components to be meaningful.

Really interesting discussion – tks.

For me, it’d be even better without crud like personal jabs, blog-vs-blog sniping, discussions of who ATTP’s Clark Kent is, moderation whining, general low-rent rhetoric etc etc unless formulated so as to deliver a high humor/tedium ratio, which seldom occurs.

CA obviously has a better signal/noise ratio than the propaganda sites, Climate Etc and so on, but I think it would be a lot better with a bit more moderation rigor:

– Nothing OT.

– No tedious ideological rhetoric.

– Nothing about individuals’ identities, characters, motivations, politics, dress sense.

– No room for blog-war spill-over from elsewhere.

– Unless funny.

Any thoughts on the issue you would like to contribute?

Rigour is more precise than rigor if you want more [unless]funny.

Szilard–

I got myself schooled mostly as a scientist (chemistry) but am totally out of my depth here in nearly all these posts, such that often when I contribute it is with “humor” only. And I tend not to get snipped for it, but it has no real value otherwise; it advances no discussion. So perhaps the “tedious ideological rhetoric” and “blog-war spill-over” are more valuable overall. Perhaps. Myself, I say, “It’s the internet!”, which roughly translated means, “Here on the internet things are supposed to be free-wheeling and wild.” And on top of it all, I learn a great deal from S. McIntyre in how he “runs” this blog; I learn things about how to run my own life (!). For example, he allows food fights up to a point; he’s most generous with critical comments; he doesn’t try to be a “blog tyrant”; he tries to get at the truth and stick with it; he is able to make his points often in colorful ways. I say that all this is good. If you come here a lot I think that you will have a similar evaluation. And also, thanks for your honesty in speaking your mind as you did.

Several people have raised questions about my comments. I won’t answer them separately, but will try to explain once more how I see the situation.

The CMIP5 database contains data on model runs from many models, and several runs from most of them. The model runs have resulted in a “spaghetti” of temperature histories. As the models and model runs differ in many ways, it’s difficult to figure out what the temperature histories tell us. Marotzke and Forster present an attempt to extract information from that spaghetti. Based on various arguments, they arrive at the hypothesis that the variability of the temperature might be related to three other characteristics of the model runs by the formula

ΔT = a + b ΔF + c α + d κ + e, (1)

in the way that the residual e would be mainly internal variability that cannot be explained causally. Δ refers here to change in the variable expressed as linear trend over a period of 15 or 62 years.
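The Δ quantities, linear trends over fixed-length periods, can be sketched as follows, assuming one value per year. `window_trends` is a hypothetical helper, not M&F’s code; `np.polyfit` with degree 1 returns the slope first, which is taken here as the trend in units per year.

```python
import numpy as np

def window_trends(years, series, length):
    """OLS linear trend (units per year) over every run of `length` consecutive annual values.

    Returns the start year of each window and its fitted slope.
    """
    years = np.asarray(years, dtype=float)
    series = np.asarray(series, dtype=float)
    starts, trends = [], []
    for i in range(len(years) - length + 1):
        slope, _intercept = np.polyfit(years[i:i + length], series[i:i + length], 1)
        starts.append(int(years[i]))
        trends.append(float(slope))
    return starts, trends
```

Applied to one model run’s annual GMST with a length of 15 or 62, it returns one trend per start year; exactly which start and end years are admitted at the edges of 1900–2012 is left as an assumption here.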

None of the variables ΔT, ΔF, α, and κ is directly input to the models, all are determined from the model results. As far as I understand, only ΔT can be found directly from the results of the model runs, all others are determined by analyzing the model outputs in separate studies. The values of F have been obtained by Forster (2013) for every year and every model run using the formula

ΔF = ΔN + α ΔT, (2)

As all variables are based on model results, the meaning of each of them is defined strictly only by the procedure that’s used to determine its values from the model output. Thus use of the regression formula involves two assumptions:

1) we understand what the variables mean

2) the formula (1) is a good enough description of the behavior of the actual models.

Marotzke and Forster state the first point at least implicitly, and the second point explicitly. Thus they agree that the assumptions are only assumptions, part of the hypothesis they have made.

After these preliminary considerations we have two main steps in the analysis that they report:

1) Determination of the regression coefficients a, b, c, and d separately for every period considered (98 periods of 15 years, and 51 periods of 62 years).

2) Use of the regression models to produce the graphical presentation of the results and to draw other conclusions.

The first step does not contain any circularity at all. It’s a set of straightforward calculations to determine the parameters a, b, c, and d. All the numbers picked from the model runs are well defined. The fact that ΔT appears on the left-hand side of (1) and also affects the right-hand side of (1) through (2) does not change that observation. (The value on the left-hand side is not determined by formula (1) and fed into (2); it is totally fixed by the CMIP5 data.) All the variable values in (1) are totally fixed by the CMIP5 data. Now we have the regression models

ΔT = a + b ΔF + c α + d κ (3)

for every period with coefficients determined in the first step.
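The first step can be sketched as a plain least-squares fit of formula (1) for a single period, with one row per model run. This is an illustrative Python sketch, not M&F’s code: the ensemble values are hypothetical, and the small normal-equation solver is a deliberately minimal stand-in for a proper linear-algebra library. Once ΔT, ΔF, α and κ are fixed numbers the fit is a deterministic calculation; whether the resulting coefficients estimate anything meaningful is the separate question argued over in the rest of this thread.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_eq1(d_t, d_f, alpha, kappa):
    """Least-squares a, b, c, d in dT = a + b*dF + c*alpha + d*kappa + e."""
    rows = [[1.0, f, al, k] for f, al, k in zip(d_f, alpha, kappa)]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, d_t)) for i in range(4)]
    return solve(xtx, xty)

# Hypothetical ensemble for one period: one entry per model run.
d_f = [0.30, 0.42, 0.25, 0.38, 0.50, 0.33, 0.45, 0.28]
alpha = [1.1, 0.9, 1.5, 1.3, 0.8, 1.2, 1.0, 1.6]
kappa = [0.6, 0.7, 0.5, 0.9, 0.65, 0.55, 0.8, 0.75]
d_t = [0.05 + 0.4 * f - 0.1 * al + 0.2 * k for f, al, k in zip(d_f, alpha, kappa)]
print(fit_eq1(d_t, d_f, alpha, kappa))
```

With noise-free synthetic data like this, the fit simply returns the coefficients used to generate it; repeating it for every window gives one set of a, b, c, d per period.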

Next we face the question of what formula (3) really means and how it can be used. We have a well-estimated regression model for variables whose meaning is not as well understood. α and κ are perhaps not really what their names imply. The dependence of equilibrium temperature on forcing is not necessarily as simple as the defining formula assumes, so α may vary over time. κ may also vary in the models and depend on the initial state. Similar problems apply to F. Forcing is a consequence of changes in external variables like CO2 concentration, volcanic activity, and aerosols, but the operational model F defined by (2) is not controlled in a well-understood way by these external factors.

Despite all the issues in the above paragraph, we can find the ranges that the variables have, according to their operational definitions, in the ensemble of model runs, and we can calculate the residuals when we apply (3) to each model run to predict ΔT. M&F report the predicted values of ΔT and the residuals in their Figures 2 and 3. They also report the contributions of the three variable terms of (3), either in the paper or in the extended data.

So far so good. But what does this mean? Do the results support their conclusion:

Or should we believe that the surprising result of the paper, that α and κ have little influence on the temperature trends, is solid and valid for an unbiased set of models whose α and κ mean what they are usually defined to mean?

These conclusions are not as obviously correct as their basic approach is technically correct. The authors also describe caveats that might undermine these conclusions. There’s room for further study. Access to their data would help in some of that further study, but it would probably be better to go to the original source of the data (the CMIP5 database) and use its content in some other way.

It’s also possible that the CMIP5 database is too restricted as a source and that further and quite different model runs are needed to learn essentially more even about the present models.

Pekka, non-independent analysis is common in research of all stripes. Whether the conclusions drawn from such work are correct or incorrect is immaterial. That their methods and analysis do not lead to the conclusions is the key. Healthy science cuts out such wrong methods quickly, before others ‘build’ on them and advance more theories, professors commit their students to blind alleys of inquiry, and funding is poured into chasing ghosts. Non-independence is sometimes not easily detected, and the clues are papered over with egos and reputation. At least that is not the case here.

There’s no reason to worry about spread of some bad methodology in this case.

The only real method used is regression. That remains as good and as prone to misuse as before.

The task is to find a more transparent way of telling what an ensemble of model runs contains about one limited question.

The rest is so case specific that there’s nothing to spread.

I should have been clearer. Wrong methodology should be cut out before conclusions from such methods become accepted as part of the scientific discourse – in this case, among others, the conclusion that models contain meaningful natural variability. If not, there are two possibilities – either this paper becomes the last word on this topic or others build on the assumption without questioning it – and both are bad outcomes. If the methods are not correct, the conclusions are not useful.

Well, does it, Pekka? Do you agree that this paper proves that the climate models do not overestimate, that everything is fine and dandy, and that the hiatus not being in the models does not seem to be a problem? At least that’s what M&F claim, and it has been parroted in the MSM.

I find it puzzling that many here and on other blogs have difficulty seeing what the authors want to prove anyway.

Another thing: Nic Lewis, who wrote this scathing rebuttal, seems to be busier with other things (see his Climate Lab comment) than with reacting to Pekka’s points. I find this rather disappointing.

“I find this rather disappointing”

+1

Pekka

Your argument seems to completely ignore the existence of variations in ΔT that are not explained by the regression (residual “errors”). These appear on both sides of the regression equation, contrary to the assumptions in ordinary least squares regression. That is what leads Roman M to say:

“Hence, the usual simple regression procedure fails and all results from it are spurious. Estimates of the parameters, confidence interval and p-values will be biased and therefore neither reliable nor scientifically meaningful.”

The fact that the values for ΔT are simply numbers in a database is not relevant.

You also write:

“Access to their data would help in some of that further study, but probably it would be better to go to the original source of the data (the CMIP5 database) and use it’s content in some other way.”

I don’t know if you have ever tried obtaining and processing CMIP5 data, but it is a complex and time consuming business. The file structures, model grids, etc. vary from model to model. There is a great deal of processing needed just to get the raw data into useful form, and in my experience it is not easy to automate the processing. There are also errors in CMIP5 data, and it gets updated from time to time.

There is also quite a lot of post-processing involved. For example, it may be necessary to identify corresponding segments of the preindustrial control runs and to deduct the offset and drift occurring in them from the data being used (here a splice of Historical and RCP4.5 experiment data). M&F don’t mention doing so, but this was done in Forster et al 2013.

Masking for HadCRUT4 observational availability requires further processing and defining rules as to what counts as enough data in each time period. M&F provide little detail of their methods, and replicating their work would be difficult if not impossible without a detailed, algorithmic statement of their processing steps (in the form of their computer code or otherwise), along with all non-publicly-available data used.

Nic,

I do not ignore anything that is there. I only observe that it has no effect of the kind you seem to think. The idea that having ΔT on both sides in that way is a problem is a red herring. It’s not; the claim that it is a problem is an unfounded assertion based on a misunderstanding of the situation.

That the values of ΔT are simply numbers in a database is not only relevant but essential, because that is what prevents any problems of that kind from entering the analysis at any point in the determination of the regression formula.

Similar formulas cause problems in some other situations, where they are used in a different way. Therefore it took me, too, some time to realize that it’s not a problem at all in this case.

Wouldn’t it be relatively simple to set up an experiment with synthetic data to investigate whether or not the variations in ΔT make a difference? The effect could be quantified by simulation and its significance for the result determined. I’ve followed the discussion, as much as I could, here and at the other site. It just seems to be a trading of assertions. One side says the requirements for OLS are not met. The other side says either (a) the requirements for OLS are met, or (b) they are not met but any error is insignificant. An investigation with simulated data could resolve this dispute.
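A minimal sketch of the kind of synthetic-data experiment suggested here, in Python. Everything in it is a hypothetical toy rather than M&F’s setup: one regressor instead of three, independent Gaussian draws standing in for model-run trends, and a made-up parameter gamma describing how much of the internal variability e in ΔT is mirrored in ΔN. Forcing is diagnosed as in formula (2). If gamma equals alpha, the diagnosed ΔF equals the true forcing; otherwise it carries an extra (alpha - gamma)·e term that is correlated with the regression error, and the OLS slope settles on a value other than the true b even as the sample grows. The closed-form large-sample limit is included for comparison.

```python
import random

def ols_slope(x, y):
    """Slope of an ordinary least-squares fit of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate(n, b_true=0.5, alpha=1.2, gamma=0.0, sd_f=1.0, sd_e=0.5, seed=1):
    """OLS slope of dT on diagnosed dF for n hypothetical model-run trends.

    Toy data-generating process (an assumption, not taken from M&F):
      dT = b_true * F + e                       e = internal variability in dT
      dN = F - alpha * b_true * F - gamma * e   gamma: how far e shows up in dN
      dF = dN + alpha * dT                      forcing diagnosed as in formula (2)
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        f = rng.gauss(0.0, sd_f)
        e = rng.gauss(0.0, sd_e)
        d_t = b_true * f + e
        d_n = f - alpha * b_true * f - gamma * e
        xs.append(d_n + alpha * d_t)  # equals f + (alpha - gamma) * e
        ys.append(d_t)
    return ols_slope(xs, ys)

def limiting_slope(b_true=0.5, alpha=1.2, gamma=0.0, sd_f=1.0, sd_e=0.5):
    """Large-n limit of the OLS slope for the toy process above."""
    k = alpha - gamma
    return (b_true * sd_f**2 + k * sd_e**2) / (sd_f**2 + k**2 * sd_e**2)

print(simulate(100_000), limiting_slope())                      # gamma = 0: near 0.588, not 0.5
print(simulate(100_000, gamma=1.2), limiting_slope(gamma=1.2))  # gamma = alpha: near 0.5
```

The outcome therefore hinges on gamma, an assumption of the toy about how internal variability shows up in ΔN, which the sketch itself cannot establish for the actual CMIP5 models.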

Tom,

What synthetic data?

The procedure of M&F is basically stable and would certainly work without any problems on any synthetic data (I assume no simple technical errors, and I have no reason to think that there are any).

If we introduce a model with some suitably chosen properties and generate from that model both the synthetic data and the results that the analysis should reach, we may find a contradiction, but there’s no way of proving that the expectations we have generated are really correct. Thus the contradiction proves nothing. The error could equally well be in our model as in their method.

The only models that can be trusted to produce relevant synthetic data are the models used to generate the original data in the CMIP5 database, because the analysis is supposed to tell us about those models. Further model runs by the same models using significantly different forcings might show that the final conclusions of M&F are erroneous – and I would not be surprised if that were the case. That would not mean that they made technical errors in the analysis, but it would mean that their assumptions are not accurate enough or that the CMIP5 runs used are not representative of models running under different conditions.

“It’s a red herring that having ΔT in that way on both sides is a problem.” Hmm. Don’t know much about climate science or statistics, but I do try to be a student of the Scientific Method. Karl Popper, whom I love to channel from time to time, has written that if an oracle in ancient Greece had said, “The structure of DNA is a double helix,” that would NOT be a SCIENTIFIC truth. A truth is only a scientific truth if arrived at by scientific methodology, so the METHOD is extremely important at all times. So here we are, staring at delta T on both sides of the equation–and how can we look at that and say that this method is still scientifically valid? Maybe the RESULT is true, is valid–and it certainly has all those scientific-looking trappings–, but one might say that any such result cannot be SCIENTIFIC truth, more like oracular truth.

I don’t think Pekka grasps the idea that there are supposed to be true parameters a, b, c, etc. inherent in the model and that these are what we care about. Later he says “The goal is to describe the output of the models summarizing certain potential relationships between the variables considered. Thus looking at the output is what must be done – and is done.” This statement seems clearly wrong–the goal of the regression is not to summarize the observable data relationships in an exploratory fashion but rather to infer the hidden true values of a, b, and c in the model.

For that purpose the regression estimators are random variables that either do or do not converge to the true values with more data accumulation (i.e. they either are or are not consistent estimators of the parameters in the sense of plimming to the truth). Roman and Nic have shown that they do not converge. Yes, you get “numbers” from performing the regression steps but those numbers have no necessary relationship to the true values a, b, and c that we care about. The reason for the regression failure is correlation of the RHS variables with the error term, a garden-variety endogeneity problem often encountered in trying to run regressions.

For ordinary mortals working in the biomedical research field, getting a paper published in high-profile journals such as Nature and Science is a very challenging enterprise. The vast majority of submitted manuscripts won’t even pass the first selection by the editorial board and are declined upfront. The ones that get through face a very stern review process and, more often than not, the reviewers will ask for a multitude of additional experiments to further strengthen the conclusions. Mind you, we are talking here about laborious experimental work, which, believe it or not, is even more painstaking than reading thermometers, measuring tree-ring widths or copy-pasting data into computers. You may understand that I was a bit surprised to read that for climate research it is sufficient to present indicative, exploratory results in a Nature paper. Some animals are indeed more equal than others.

I should perhaps add that what I have written is true for the regression model chosen by the authors. An essential detail in it is the way the residual is introduced. If we keep the basic idea that regression coefficients are determined by minimizing the sum of squares of residuals, an alternative definition is to link the residual directly to the value of ΔT, rather than as is done in the paper. In this case we have, using my notation,

ΔT – e = a + b(ΔN + α(ΔT – e)) + c α + d κ

and

e = ΔT – (a + b ΔN + c α + d κ) / (1 – b α)
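The step between the two formulas is a rearrangement: collect the (ΔT – e) terms on the left and divide. Because α is the value for the particular model run, the division fails at a different b for each run:

```latex
\begin{aligned}
\Delta T - e &= a + b\bigl(\Delta N + \alpha(\Delta T - e)\bigr) + c\,\alpha + d\,\kappa \\
(1 - b\alpha)(\Delta T - e) &= a + b\,\Delta N + c\,\alpha + d\,\kappa \\
e &= \Delta T - \frac{a + b\,\Delta N + c\,\alpha + d\,\kappa}{1 - b\alpha}, \qquad b\alpha \neq 1
\end{aligned}
```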

In this alternative regression model, for some model runs and some values of b, all the terms other than ΔT have very large coefficients, since the factor 1/(1 – b α) blows up as b approaches 1/α. In this alternative model the residual is not a linear function of the coefficients, and the minimization is therefore more complex. Because the sum of the squares of residuals is minimized, a correct minimization procedure avoids such a situation, i.e. it keeps b away from the inverse alphas of every model. Even this is probably not very serious, because the coefficients need not change very much to get far enough from the singularities.

The basic observation is that this alternative is not the regression model of M&F.

The conclusion M&F have produced comes from their analysis of complex machines that have to be considered black boxes. The quality of the boxes’ output is impossible to determine by immediate inspection. Thus M&F are testing the boxes’ output. But one of their tools is, as they admit, itself partly output of the black boxes. Without getting into statistics, can you explain why the quality cannot be trusted? And where does the burden of proof for quality rest?

I meant: how can the quality be trusted? The AR5 data is not a source of new input. The only new input, F, was itself derived from the black boxes.

The goal is to describe the output of the models summarizing certain potential relationships between the variables considered. Thus looking at the output is what must be done – and is done. A simple linear model is used as the tool, whose coefficients are determined by regression. Qualitative physical arguments are used to argue that the approach makes sense. The authors list also several caveats.

After getting over the confusion created by Nic’s post, I think that the paper describes well enough what they have done and what their analysis has produced. There’s obviously some new information in their results, but personally I’m not convinced that the results allow for strong conclusions. The acknowledged caveats alone allow for doubt, and there may be additional issues that have a major effect. What Nic proposed is not among them, in my opinion.

Pekka, you have been a great referee and generous with your time. I know this question is a tough one, but do you believe the authors’ response shows that they appreciated the statistical orthodoxy issue here and came to your conclusion, or was it just a near miss: no harm, no foul?

Pekka,

Perhaps the dodgy statistical modeling used by the authors was indeed innocuous. You said that it took some time spent checking before you came to that conclusion. If so, that’s not surprising, because inferior statistical methods sometimes do lead to valid conclusions, but that doesn’t justify endorsing their use. The problem here is that other researchers may now use similar methods and cite this paper as support. The next time around the effect may not be innocuous.

R Graf,

I think that they are fully aware that their linear regression model is at best a crude representation of the actual model runs. It’s a scientific paper and it can be assumed that readers of such a paper understand that a linear regression model cannot be accurate and reliable, even if that’s not emphasized in the paper.

The uncertainties in estimating forcing were discussed in the earlier Forster (2013) paper that is used as a source for this paper.

There are no objective measures to tell how much trust should be given to their results. As authors of the study they may choose an optimistic view of that and discuss their results on that basis. I’m presently less confident, but that’s just a personal judgment.

The results should perhaps be taken rather as indicative and exploratory than fully quantitative. For this reason the paper does not contain any error estimates (at least I haven’t observed any) for the results, all ranges tell about the spread of different models, not about accuracy of any conclusion.

Pekka,

I think that you have to drive through two STOP signs in order to get to your starting point in this analysis. The first barrier is that the “emulation model error” (i.e. the ability of the emulation model to predict the forced temperature change) is (a) substantial and (b) has a bias which is a function of the interval length of the period(s) selected for regression (“temporal bias”). This emulation model error does not go away even if the circularity could be intelligently circumvented. And it ends up being interpreted by M&F as natural variation.

The second problem is that M&F set up a logical contradiction by including F, alpha and K as free variables in their regression form. It contravenes the physical model which M&F establish as a basis and justification. This is not the same as the circularity error.

For a specific GCM, the value of alpha is taken from a step-doubling or step-quadrupling of CO2. The forcing for a doubling of CO2 is defined simultaneously such that AF2x = alpha * ECS. The assumption that alpha is invariant with time and temperature immediately forces linearity onto a curve (in the vast majority of models), and results in the GCM information from the first decade or three being discarded. This introduces the first component of temporal bias in the emulation model via the calculation of the forcing term.

For the same GCM, the value of rho (= alpha + k) is taken from a 1% per annum increasing CO2 run. It is the gradient of a plot of Forcing against Temperature. However, the forcing which is used as ordinate on this plot is determined by the value of AF2x as calculated above (simultaneously with alpha). Hence the gradient is always equal to AF2x/TCR.

So we have rho = AF2x/TCR = alpha*ECS/TCR .
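The arithmetic of this construction can be checked with a short Python sketch. The input values are hypothetical, chosen only to give round numbers, and `implied_parameters` is an illustrative helper name. Since rho = alpha + kappa, the construction also implies kappa = alpha·(ECS/TCR - 1), so kappa is fixed once alpha, ECS and TCR are.

```python
def implied_parameters(alpha, ecs, tcr):
    """Hypothetical helper: F2x, rho and kappa implied by the construction above.

    alpha in W m-2 K-1; ecs and tcr in K.
    """
    f2x = alpha * ecs    # AF2x = alpha * ECS
    rho = f2x / tcr      # gradient of forcing against temperature in the 1%/yr run
    kappa = rho - alpha  # since rho = alpha + kappa
    return f2x, rho, kappa

# Hypothetical round-number inputs, not values from any particular GCM.
f2x, rho, kappa = implied_parameters(alpha=1.0, ecs=3.6, tcr=2.0)
print(f"F2x = {f2x:.2f} W/m2, rho = {rho:.2f} W/m2/K, kappa = {kappa:.2f} W/m2/K")
# prints: F2x = 3.60 W/m2, rho = 1.80 W/m2/K, kappa = 0.80 W/m2/K
```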

The predictive component of the M&F emulation model is a simple linear scaling of forcing:

DelT(predicted forced) = DelF*TCR/(alpha*ECS) + model error (1)

This is derived as a degenerate solution of a 2-body feedback model. The first assumption is that the feedback is linear with temperature (which is not valid for the GCMs, hence the enforced linearization when alpha is selected). The second assumption is of an infinite-acting ocean, which leads to CdT/dt = F(t) – rho*T. The third assumption is of a forcing that increases linearly at a constant rate. The analytic solution for this case asymptotes to a linear relationship between forcing and temperature, with gradient rho. The fourth assumption is that the surface mixed-layer heat capacity is negligible (C->0), which leads to (1) and which, by eliminating the early asymptotic behavior, introduces a second component of temporal bias in the emulation model error.
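The effect of the fourth assumption can be illustrated with the single-box equation quoted above, C dT/dt = F(t) - rho·T, driven by a forcing ramp F = beta·t. This Python sketch uses hypothetical parameter values and illustrative function names. It relies on the exact solution T(t) = (beta/rho)(t - tau + tau·exp(-t/tau)) with tau = C/rho, and compares rho·ΔT/ΔF over a trend window with the value 1 implied by the C -> 0 limit, formula (1) of that comment.

```python
import math

def temperature(t, beta, rho, heat_capacity):
    """Exact solution of C dT/dt = beta*t - rho*T with T(0) = 0 (needs C > 0)."""
    tau = heat_capacity / rho  # adjustment time scale of the mixed layer
    return (beta / rho) * (t - tau + tau * math.exp(-t / tau))

def scaled_trend_ratio(t0, length, beta, rho, heat_capacity):
    """rho * dT / dF over the window [t0, t0 + length]; 1.0 means no lag."""
    d_t = (temperature(t0 + length, beta, rho, heat_capacity)
           - temperature(t0, beta, rho, heat_capacity))
    d_f = beta * length
    return rho * d_t / d_f

# Hypothetical values: beta in W m-2 per year, rho in W m-2 K-1, C in W yr m-2 K-1.
beta, rho, c = 0.04, 1.8, 8.0
early_15 = scaled_trend_ratio(0.0, 15.0, beta, rho, c)   # 15 years from the start of the ramp
late_62 = scaled_trend_ratio(100.0, 62.0, beta, rho, c)  # 62 years, well into the ramp
tiny_c = scaled_trend_ratio(0.0, 15.0, beta, rho, 1e-6)  # near the C -> 0 limit
print(early_15, late_62, tiny_c)
```

With these made-up numbers the early 15-year ratio is roughly 0.71, while the late 62-year window and the near-zero heat capacity both give essentially 1, so neglecting C matters most for short windows early in a forcing change.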

Moving on, DelF is taken from the historic run data over the period of interest in the form

DelF = DelNactual – alpha *DelTactual

=DelNactual – alpha*(DelTf + DelTnv) (2)

Where Tf and Tnv represent the partitioning of the observed GCM model temperature change into its forced component and “natural variation in the GCM”.

Hence, substituting (2) into (1), we obtain:-

DelT(predicted forced) = DelNactual*TCR/(alpha*ECS) – DelTf*TCR/ECS – DelTnv*TCR/ECS +model error

Even if you wave away the problem of circularity in this expansion, notice that the parameter, K, does not appear anywhere. Notice also from (1) that if a free regression coefficient is allowed against DelF, then the regression is apparently insensitive to any variation in alpha as a free variable. By breaking out alpha and K as free variables in the regression, therefore, one concludes that the predicted forced temperature response is not sensitive to either alpha or K! Which seems to be what the authors found.

Pekka writes:

I think that they are fully aware that their linear regression model is at best a crude representation of the actual model runs.

So why use it?

It’s a scientific paper and it can be assumed that readers of such a paper understand that a linear regression model cannot be accurate and reliable, even if that’s not emphasized in the paper.…

There are no objective measures to tell, how much trust should be given for their results.…

the paper does not contain any error estimates (at least I haven’t observed any) for the results, all ranges tell about the spread of different models, not about accuracy of any conclusion.

I’m sorry, but publishing a paper based on an inaccurate and unreliable model with no attempt at uncertainty quantification doesn’t seem all that scientific to me. Typically when I submit a paper, I’ve done months of work to be sure that all possible errors are addressed and quantified. To not do so evinces a complete disregard for the process of science.

The fact that there are people in the climate science community defending this paper says a great deal about the standards of the community. None of it is positive.

Because the CMIP5 archive contains information about the models; that information is difficult to interpret, and evidently nobody has presented a better method for extracting information on the question M&F study.

Science is a process that has resulted in more and more understanding that describes better and better the real world.

That’s characteristic of the full scientific process.

Scientists typically study small details at the edge of knowledge. The issues they study are mostly difficult. When they think that they have made progress, they publish. That brings their results to wider attention and allows other scientists to look at them. It’s very common for their results to turn out to be either simply wrong or, more often, partly right and partly wrong or misleading.

Science would develop very slowly if scientists did not publish their findings despite their potential errors.

Thus it cannot be excluded that the results of M&F are erroneous and misleading. My point in this thread has not been to dispute that possibility – I have expressed my own doubts on the reliability of the results pretty directly. What I have emphasized is that they have not made such an obvious and stupid error that the error alone would make the analysis worthless. The circularity found by Nic does not affect the calculations they have made, but the inaccuracy of the relationship that leads to circularity in Nic’s argument does affect the accuracy of the analysis. It also leads to some questions about the interpretation of their results.

As a general rule every scientific paper should be read with a skeptical mind, only multiply confirmed results represent well established scientific knowledge (and even that may turn out to be wrong, although that’s not so common).

In all fields of science new unconfirmed results are publicized in a way that I don’t like, and climate science is no exception. This paper is an example of that. I do not believe that its conclusions are solid enough to justify the way they are presented in some media.

I have even the cynical thought that some of the conclusions of the paper were written as they were just to get the paper published in Nature. To me Nature and Science are not the most reliable sources of scientific information. They tend to accept papers that present strong conclusions even when those conclusions are not fully supported by the actual science reported, the most interesting conclusions may be even purely speculative. More narrowly focused top journals publish better and more accurately reported science.

Pekka Pirilä: “To me Nature and Science are not the most reliable sources of scientific information.”

…or as Ross McKitrick put it, “just because it was published in Nature doesn’t automatically mean it’s wrong.”

Pekka, in your last comment about science you sound more like a politician than a scientist; that’s why you didn’t answer my direct question of whether you agree with the conclusion of the paper: “The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded”.

Maybe you stated somewhere that you don’t agree, but that has vanished in your word salad (“Wortsalat”).

When I think that any simple answer is misleading, I don’t give any.

Pekka’s perspective is very interesting. It doesn’t match my experience very well, but my publications have mostly been in Phys Rev Letters and related journals, not in Nature or Science.

The thought of publishing something with strong conclusions and no detailed, quantitative uncertainty analysis seems alien to me. I’m trying to decide if it reflects a lack of professionalism on the part of the authors or if it is related to something about the field or the content.

Climate science, compared to physics, is extremely data-poor. That is, you (generally) cannot design your data collection in advance and you can’t collect new data; you are stuck with what you have. Thus, every bit of data has to be squeezed to within an inch of its metaphorical life in order to do anything “new.”

Maybe that accounts for several troubling features of climate science: the tendency to treat model outputs as if they were data, the tendency to use questionable statistical methodologies to reach conclusions much stronger than justified, and the tendency to overemphasize “peer reviewed” as a synonym for “correct.”

The current paper under discussion would never be accepted for publication if it were in a field where replication with new data is possible, because it makes no testable predictions.

fizzymagic,

I’m not proposing exactly what you read from my comment.

Once a decision has been made on the preferred journal for a planned paper, that very often affects the way the paper is written, as scientists know from their own experience or other sources what kind of papers the journal is most likely to publish. In the field of physics, Physical Review may be more neutral in that respect, but the chances of getting a paper accepted in Physical Review Letters are improved by emphasizing some points, like topicality, even with little real argument to support those claims.

Nature and Science have a very diverse readership, and many scientists have expressed the view that it’s of particular value to them that the paper is likely to be noticed also by that wider readership. That’s a different criterion from what’s most important in more narrowly focused top journals. Looking at the papers published has, indeed, shown (IMO) that their conclusions more often contain rather speculative additions on issues that would be highly interesting, if true, than conclusions published in other journals.

I have seen similar observations, presented as criticism of Nature and Science, on several occasions. There have also been counterarguments to that, but at least I’m far from alone in my views.

Pekka,

“They tend to accept papers that present strong conclusions even when those conclusions are not fully supported by the actual science reported, the most interesting conclusions may be even purely speculative. More narrowly focused top journals publish better and more accurately reported science.”

About this at least we can completely agree; I canceled my subscription to Science 15 years ago when I found the publication had become more about surprising (even shocking!) and ‘glamorous’ results….. and much less about solid science.

This episode reminds me of the Steig et al paper on Antarctic warming that appeared on the cover of Nature. Steig et al claimed little warming over the Antarctic Peninsula, in direct conflict with extensive thermometer data for the Peninsula, but lots of warming elsewhere, including Eastern Antarctica, again in conflict with thermometer data. Nobody at Nature cared, or even seemed to notice, these glaring discrepancies. But many other people did.

In the case of M&F, I think Nature has published a paper with similarly doubtful methodology, but upon which the authors have based similarly strong conclusions: “There is absolutely no reason to doubt the accuracy of CMIP5 warming projections.” (What!? Are they actually serious?) As with Steig et al, I very much doubt M&F’s methods and conclusions will stand up to scrutiny over time.

Pekka,

I think you are correct in your assessment of Nature and Science. I have never been a reviewer for either, but I have for Physical Review and Phys. Rev. Letters. And I have indeed recommended publication of papers that I believed were erroneous because of their topicality and provocative conclusions.

But those papers for which I recommended publication had several things in common: the errors were either a result of bad data or speculation about a new theory that had yet to be tested experimentally. I would never recommend publication of an article that included incorrect data analysis or mathematical errors. I would also never allow publication of an experimental paper that did not properly characterize the experimental errors.

I did author and publish a paper (in Phys. Rev. Lett., as it happens) that contained a result I did not believe was completely correct; however, in that paper, we very carefully explained all the possible errors we had considered and all the corrections we had performed, and we were careful not to overstate the significance of the result. Basically, we hoped that having other eyes looking at the result would help us understand what we had observed.

And there is a history of erroneous experimental results from improper statistical analysis being published in those journals, though it is (relatively) quite rare. In all those cases that I can recall, however, subsequent papers explored the problems with the analysis and although the results were never formally retracted, the community recognized the error and the papers stopped being cited in a positive way.

This case feels different to me. Here there are clearly problems with the methodology that reviewers should have caught. The paper seems to have been published not because it reported something unexpected and provocative, but because it reinforced the community’s prior biases. It seems likely that, like Mann’s early climate reconstructions, it will continue to be cited as evidence long after its flaws have been recognized and it has been shown to be incorrect.

fizzymagic (Posted Feb 13, 2015 at 4:56 PM),

Excellent comment.

Pekka

As I commented over at CLB, if we strip away the stats and think about the maths, as you are doing, then Forster & Taylor appear to estimate your equation (2) using OLS, so in fact what M&F use is the calculated F, which is (N + α ΔT – Residue(t)). The Residue(t) are presumed to be independent with zero mean but an SD that looks like a reasonably large proportion of F.

This Residue(t) term then passes back into your equation (3) as b*Residue(t). Unfortunately it then seems to me it gets in the way of subsequent analysis, and in particular making any estimation of the internal variability unreliable (without knowledge of Residue(t) there is insufficient information to estimate it).

HAS,

An error (and possibly a significant error) is introduced by the procedure of Forster et al (2013). That affects the results, but it’s not the same as the residue of the regression calculation, and it does not feed back into the calculation to cause problems more serious than those that significant inaccuracies always cause.

That error cannot be avoided, when only the presently available data is used. Marotzke and Forster discuss this issue in their response at CLB:

Pekka

I understand the difficulty in estimating F, but that is a limitation of the available data and is acknowledged by M&F as you say.

What we are discussing here is Lewis’ suggestion that there is an avoidable methodological error caused by the two stage process of analysis and the multiple regressions in t to get trends. I am attempting to understand why that may or may not be a problem in simple terms that a mathematician might grasp (aka writing down the formula :))

I’m unclear from your response if you are saying the problem with the two stage analysis doesn’t exist, it’s immaterial or something else.

It can be eliminated by not using F at all, but you then end up unable to estimate all the parameters you need (I think).

HAS,

One way of looking at the issue is to follow the calculational process and check whether any step of it has problems related to the appearance of ΔT both on the left-hand side and in the estimate of ΔF. The answer is very clearly that none of the steps needed in Marotzke and Forster is affected by such problems. That’s easy to see, and no one has presented any argument to the contrary. None of Nic’s arguments enters into that process. So far the situation is really simple.

It may be more difficult to argue that the resulting well-determined regression model is fit for use in the way M&F use it, not because its coefficients were badly determined, but because its variables, and in particular ΔF, are badly defined. The model ΔF is not the same as the forcing that’s used outside of that analysis. It’s certainly strongly related, but there may be significant differences. Here we have the problem that M&F acknowledge in their response.

My view is that it’s a weakly justified hypothesis that the model is fit for use in the way M&F use it. It’s a reasonable enough hypothesis to make, and to try to see what the results are, but how much more than that it is can be questioned.

It’s an open question to me whether better analyses with the same goal that M&F have chosen can be devised without additional model runs that use the same CCM models over a wider range of forcings and collect more data from the runs.

Pekka

That’s what I was trying to do, starting with what the F M&F use really is. The first problem I strike is the residue from its estimation in Forster. What happens to that in the subsequent steps of the analysis?

I have written in several comments that:

– the calculations of M&F are stable (robust might be more accurate) and without significant problems from circularity

– their regression model is at best a very crude description of the model behavior

– ΔF that they use is not the real ΔF, but an operationally defined substitute that may deviate significantly from the real one. Probably more for the 15 year trends than the 62 year trends

M&F seem to agree on the issues I mention above.

There’s one further point that may be significant, but I have not discussed extensively in my earlier comments. That’s the final step in calculation of the results shown in their paper.

All the above concerns the derivation of the regression coefficients, but the regression model is used further to tell how much of the variability between models originates from each characteristic of the model and the model run, and how much is residual (typically unresolved internal variability). This can be done in two ways. They have chosen the one that comes naturally when ΔF was calculated in a separate earlier step.

In their approach the values of previously determined ΔF, α, and κ and the coefficients a, b, c, and d are used to calculate

ΔT = a + b ΔF + c α + d κ

The difference of that from the ΔT extracted from the model run is the residual. Contributions of the three variables are also collected for use in the figures.

The alternative approach doesn’t use the ΔF determined by Forster (2013), but uses ΔN, and calculates the predictor for ΔT from the equation

ΔT = (a + b ΔN + c α + d κ)/(1 – b α)

This alternative approach leads to different results for ΔT. As bα is probably positive in almost all cases, this alternative approach results in larger contributions of α and κ to ΔT, but the overall ΔT may also be very large meaning that the residual is large as well. Finding sometimes large residuals is related to the fact that the regression coefficients were not determined based on this formula. (I have discussed that in an earlier comment.)

Due to the nature of ΔN this second alternative is not physically justified, but in case the two approaches lead to very different final results, we might ask, whether their method is really valid either. Whether the second approach leads to very different results, could be checked from their data. The results of that check might then either strengthen or weaken our trust in their final results. (I repeat: The second alternative is not better, but the difference between the results tells something about the robustness of their method.)
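The difference between the two approaches can be made concrete. The sketch below uses hypothetical numbers (five made-up models and assumed coefficients a, b, c, d, not values from M&F) and computes the residual of each predictor. Algebraically, the residual of the ΔN-based predictor equals the residual of the ΔF-based predictor divided by (1 – b α), so when 0 < b α < 1 every residual is amplified, and more so as b α approaches 1.

```python
import numpy as np

def predict_direct(a, b, c, d, dF, alpha, kappa):
    """M&F-style predictor using the previously diagnosed forcing dF."""
    return a + b * dF + c * alpha + d * kappa

def predict_via_dN(a, b, c, d, dN, alpha, kappa):
    """Alternative predictor: substitute dF = dN + alpha*dT and solve for dT."""
    return (a + b * dN + c * alpha + d * kappa) / (1.0 - b * alpha)

# Hypothetical five-model ensemble (illustrative numbers, not CMIP5 data).
rng = np.random.default_rng(0)
alpha = rng.uniform(0.8, 1.8, 5)    # feedback parameter, W m^-2 K^-1
kappa = rng.uniform(0.5, 0.9, 5)    # ocean heat uptake efficiency, W m^-2 K^-1
dT = rng.uniform(0.5, 1.0, 5)       # simulated temperature change, K
dN = rng.uniform(0.3, 0.7, 5)       # change in TOA imbalance, W m^-2
dF = dN + alpha * dT                # operational definition of the forcing

a, b, c, d = 0.0, 0.4, -0.1, -0.2   # assumed regression coefficients
r_direct = dT - predict_direct(a, b, c, d, dF, alpha, kappa)
r_via_dN = dT - predict_via_dN(a, b, c, d, dN, alpha, kappa)

# Residuals of the second approach are those of the first divided by (1 - b*alpha).
print(np.allclose(r_via_dN, r_direct / (1.0 - b * alpha)))
```

With real ensemble data the same two residual vectors could be compared directly, which is the robustness check proposed above.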

Pekka, there is so much misunderstanding of statistics in this comment that one is tempted to say that “you aren’t even wrong!”.

Loosely interpreted: if you can do the arithmetic, and it comes out the same every time, then the results are correct. This is nonsense. I explained to you in my earlier, rather lengthy comment that the regression procedure has assumptions which the data must satisfy. If any of those assumptions are violated, the results will be affected, in some cases more seriously than in others. In some cases (e.g. as in time series regression), one must make adaptations to the original procedure to get results which are correct.

Without passing judgement as to whether it is correct let’s start again with the M and F model:

ΔT = a + b ΔF + c α + d κ + ε

ΔT is the temperature variable. a, b, c and d are assumed to be unknown fixed values. ΔF is the unknown forcing and α and κ are unknown parameters. ε is the “random” part of ΔT which accounts for the fact that substituting the same values of ΔF and the two parameters into the predictive equation does not always produce the same value of ΔT. I described the various assumptions made about this model in a previous comment on this thread.

You write:

This is correct. So let’s make the following simple substitution in the equation: ΔT” = a + b ΔF + c α + d κ, so that the model is ΔT = ΔT” + ε, where ΔT” is how much of the model variation can be explained by the predictors and ε is the residual, i.e. the *unexplained* portion. It should also be noted that in the prior calculation of the *estimated* forcing, the relationship ΔF = α ΔT + ΔN was used. There is absolutely no reason why this calculation needs to be made in advance. We can substitute it directly into the regression equation and the least squares calculation will work exactly as before. Thus

ΔT = a + b(α ΔT + ΔN) + c α + d κ + ε

At this point you say that we should carry out the ordinary regression and get estimates of all the parameters and of the residuals (which we will denote in bold):

ΔT = **a** + **b**(α ΔT + ΔN) + **c**α + **d**κ + **ε**

The estimate of ΔT” is obtained by replacing *all* of the estimated residuals with their zero means, which you claim would look like this:

**ΔT”** = **a** + **b**(α ΔT + ΔN) + **c**α + **d**κ

However, did *all* of the residuals get replaced? The answer is no:

**ΔT”** = **a** + **b**(α [**ΔT”** + **ε**] + ΔN) + **c**α + **d**κ

The correct result for the predicted values should look like:

**ΔT”** = **a** + **b**(α **ΔT”** + ΔN) + **c**α + **d**κ

The ordinary regression procedure gives you the wrong predicted values, wrong residual estimates and therefore the wrong estimates for the coefficients.

If you are going to try to correct for this problem, this is certainly NOT the way to proceed. In my earlier comment, I wrote the equations:

(1 – b α) ΔT = a + b ΔN + c α + d κ + ε

and the sum of squares to be minimized becomes:

∑ε² = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]²

The solution to this is not “clean”. It is subject to biases due to the later calculations necessary for calculating predicted values and residuals. The sole reason for all of this is that the same ΔT that one is analyzing has been used in the estimation of ΔF.
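To make the last point explicit: if ΔN is treated as given (an assumption made only for this sketch), solving (1 – b α) ΔT = a + b ΔN + c α + d κ + ε for ΔT and substituting into the operational definition of the forcing gives

```latex
\Delta T = \frac{a + b\,\Delta N + c\,\alpha + d\,\kappa + \varepsilon}{1 - b\,\alpha},
\qquad
\Delta F = \Delta N + \alpha\,\Delta T
         = \Delta N + \frac{\alpha\,(a + b\,\Delta N + c\,\alpha + d\,\kappa)}{1 - b\,\alpha}
           + \frac{\alpha\,\varepsilon}{1 - b\,\alpha}.
```

Because α is non-zero, the last term makes the regressor ΔF correlated with the error ε, which is exactly the situation the standard OLS assumptions rule out.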

However, these difficulties do not justify carrying out an incorrect analysis just because it produces “robust results.”

Pekka: M&F seem to agree on the issues I mention above.

And yet: Two million viewers of Science Daily were blasted with an article that begins “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…”

And not without help from the authors…

Sorry, I did not specify that the second quote, “Two million viewers of Science Daily were blasted with an article that begins ‘Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…’”, was from R Graf.

I am somewhat confused, I am informed that both α and κ have units of W m^-2 / K, so for your equation

ΔT = a + b ΔF + c α + d κ (3)

it follows that

a must have units of K,

b must have units of K/(W m^-2)

c must have units of K^2/(W m^-2)

d must have units of K^2/(W m^-2)

is that correct?

Doc,

The expressions leave some room for interpretation, but at least your approach seems correct.
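This unit bookkeeping can be checked mechanically. The sketch below represents each unit as a map of base-unit exponents (`unit`, `mul` and `div` are hypothetical helpers written for this illustration, not part of any units library) and confirms what b, c and d must carry for every term of equation (3) to come out in K, given that α and κ are in W m^-2 K^-1.

```python
from collections import Counter

def unit(**exponents):
    """A unit as a Counter of base-unit exponents, e.g. unit(W=1, m=-2) for W m^-2."""
    return Counter({k: e for k, e in exponents.items() if e != 0})

def mul(u, v):
    """Multiply two units by adding exponents; zero exponents are dropped."""
    total = Counter(u)
    for k, e in v.items():
        total[k] += e
    return Counter({k: e for k, e in total.items() if e != 0})

def div(u, v):
    """Divide two units by subtracting exponents."""
    return mul(u, Counter({k: -e for k, e in v.items()}))

K = unit(K=1)                      # temperature change, K
forcing = unit(W=1, m=-2)          # ΔF, W m^-2
feedback = div(forcing, K)         # α and κ, W m^-2 K^-1

# Each term of ΔT = a + b ΔF + c α + d κ must come out in K.
a_units = K
b_units = div(K, forcing)          # K / (W m^-2), i.e. K m^2 W^-1
c_units = div(K, feedback)         # K^2 / (W m^-2), i.e. K^2 m^2 W^-1
d_units = div(K, feedback)         # same as c, since κ has the units of α

assert mul(b_units, forcing) == K and mul(c_units, feedback) == K
print(dict(b_units), dict(c_units))
```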

At the risk of tilting at a straw man (not having read all earlier comments), this argument (Pirila) seems flawed. Lewis is correct that (1) is a misspecified regression. How far this matters might be argued.

One might test this by taking (1) and replacing dF with dN. Also drop the e: this is just an exercise in (orthogonal) projection without statistics. Doing this gives a new set of regression coefficients. Convert the estimated equation

dT = a+bdN+c(alpha)+d(kappa)

into an equation in dT and dF simply by rewriting it using eqn (2). That is, replace each regression coefficient x by

x^=x/(1+b*alpha), x=a,b,c,d,

giving an equation

dT=a^+b^(dF)+c^(alpha)+d^(kappa).

If this equation approximates what M&F obtain by a direct regression using dF, then there is no real problem from the circularity, at least in terms of the basic fitted model.
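The suggested check can be sketched as follows. Everything here is an assumption for illustration: the ensemble is synthetic, `ols` and `convert_dN_to_dF` are hypothetical helper names, and because the factor 1/(1 + b α) differs between models, the conversion is evaluated at the ensemble-mean α to get a single set of coefficients.

```python
import numpy as np

def ols(y, *regressors):
    """Ordinary least squares with an intercept; returns [a, b, c, d, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def convert_dN_to_dF(coef_dN, alpha):
    """Rewrite dT = a + b dN + c alpha + d kappa in terms of dF = dN + alpha dT.

    Each coefficient x becomes x / (1 + b*alpha); since alpha differs between
    models, the result depends on which alpha value is supplied.
    """
    return coef_dN / (1.0 + coef_dN[1] * alpha)

# Hypothetical synthetic ensemble (not CMIP5 data): dN is treated as given,
# and dT carries a random "internal variability" term.
rng = np.random.default_rng(1)
n = 40
alpha = rng.uniform(0.7, 1.8, n)
kappa = rng.uniform(0.4, 0.9, n)
dN = rng.uniform(0.2, 0.8, n)
dT = 0.1 + 0.5 * dN - 0.1 * alpha - 0.1 * kappa + rng.normal(0.0, 0.05, n)
dF = dN + alpha * dT                      # operationally diagnosed forcing

coef_direct = ols(dT, dF, alpha, kappa)   # M&F-style regression on dF
coef_dN = ols(dT, dN, alpha, kappa)       # the suggested regression on dN
coef_converted = convert_dN_to_dF(coef_dN, alpha.mean())

print("direct dF fit:    ", np.round(coef_direct, 3))
print("converted dN fit: ", np.round(coef_converted, 3))
```

For any single model the conversion itself is exact: a point that satisfies the fitted dN equation also satisfies the converted dF equation evaluated at that model's α. How closely the two printed coefficient sets agree for a given ensemble is what the test is meant to reveal.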

This still leaves the statistics of course, but that is too much to think about just now.

Though the graphs are okay this time, I feel there are too many dF+e=kappa (1, 2, whatever) etc. formulas around. The proverbial cat does not find her young in that.

M&F published a conclusion regarding “all” CMIP5 models, yet they were selective in the models they tested. One reason given by M&F (2015) was that most of the hard work of deriving F had already been done in Forster (2013). The model list accompanying the 2015 M&F paper in Nature numbers 35 models, 17 of them with forcings, with the number of runs per model varying from 1 to 10, totaling 113 runs. The paper reports 36 models, 18 with forcings, and 114 runs, so apparently one model with forcings, run once, is missing from the list. Comparing the two studies’ model lists, 19 of the 35 models are the same, 15 of the 17 with forcings are the same (2 had forcings that were not used), and 4 models with forcings in 2015 are not on the 2013 list. I think that even if an explanation of the selectivity of the sample is given, it’s messy.

The commonly reported number of CMIP5 models is 112, which may be why Science Daily misreported that M&F studied all 114. There are 56 coupled pairs of models in CMIP5, of which M&F omitted 21 pairs (including the one missing from their list). There seems to be no pattern as far as variables covered or variable complexity, as seen from the table on pg 747 of IPCC AR5.

As I said over at CLB in response to ATTP, it would be very useful if M&F were to release their data and code to Steve in order that he can try to replicate their results. We could at least then see if the method they use is flawed as claimed, or robust. I can understand though that they might be resistant to such a request. Additionally, others could try to replicate the study using independent analyses, which would be a good test, not of M&F’s method, but of their results.

I believe it is fairly straightforward to get CMIP5 model runs using KNMI Climate Explorer – whether or not this would be sufficient to construct those independent studies, I’m not sure.

If Nic does a study which does not match M&F’s conclusion, there will be claims of bias. And, unfortunately, our bench of published academics who are looking to contradict a director at the Max Planck is bare. Two million viewers of Science Daily were blasted with an article that begins “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…”

The problems with good science do not always make as interesting a headline as mediocre science can.

Perhaps a simple test model of the model could be constructed with control data, to test the methodology at various extremes and see whether the results come out as the methodology predicts.

I have to admit I learned a lot in this discussion. Before now I did not know you could test the behavior of variables with an equation containing variables derived from the same system you are testing.

Pekka has written in several comments that “the calculations of M&F are stable (robust might be more accurate) and without significant problems from circularity”, and asserts (I think correctly) that Marotzke and Forster agree with him about this. He has also now clarified mathematically what he is arguing.

If I understand correctly, Pekka argues that the fact that ΔF is a linear function of ΔT does not involve a circularity since it is the actual model-simulated ΔT (ΔTs) that is used to calculate ΔF, not the purely forced, free-of-internal-variability-etc-error, version of ΔT (ΔTf), which is what the regression fit represents (if the regression model is appropriate). I will explain why Pekka’s argument does not support the conclusions of Marotzke and Forster in relation to 62 year periods.

Suppose that over the 62 year period involved simulated multidecadal internal variability leads to ΔTs exceeding ΔTf in some models and falling below it in other models, without the simulated value of ΔN (ΔNs) being similarly affected. This seems both plausible and likely; many models exhibit substantial multidecadal internal variability, and show little correlation between multidecadal ΔTs and ΔNs (after detrending).

In this situation, models with ΔTs > ΔTf will generally have a relatively high diagnosed value (ΔFs) for ΔF, since ΔFs = α ΔTs + ΔNs. (Note that although Marotzke and Forster write of α ΔT being a “correction” to ΔN, it is the larger of the two terms in most cases.) As a consequence of such internal variability, intermodel spread in ΔTs will be positively related to that in ΔFs, increasing the proportion of the intermodel spread in ΔTs that is “explained” by the ΔFs, or the “contribution to the regression by the ERF trend”, which Marotzke and Forster state is dominant for start years from the 1920s onward. This effect is what I refer to as circularity; it is not total and I did not claim that it was.
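The mechanism in the last paragraph can be illustrated with a small synthetic ensemble, along the lines of a “model of the model” test. This is a minimal sketch under stated assumptions, not a reproduction of the M&F analysis: all coefficients and distributions are hypothetical, and ΔNs is assumed (as in the scenario above) to be unaffected by the internal-variability term v, so the diagnosed forcing is ΔFs = F0 + α v. If ΔNs instead absorbed –α v, ΔFs would equal F0 exactly and the effect below would vanish.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with intercept; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, 1.0 - resid.var() / y.var()

rng = np.random.default_rng(42)
n = 2000                                   # large synthetic ensemble to reduce sampling noise
a, b, c, d = 0.0, 0.4, -0.2, -0.2          # hypothetical "true" coefficients
alpha = rng.uniform(0.8, 1.6, n)           # W m^-2 K^-1
kappa = rng.uniform(0.4, 0.9, n)           # W m^-2 K^-1
F0 = rng.normal(1.0, 0.3, n)               # exogenous forcing trend, W m^-2

dT_f = a + b * F0 + c * alpha + d * kappa  # purely forced response (ΔTf)
v = rng.normal(0.0, 0.15, n)               # internal variability, in T only
dT_s = dT_f + v                            # simulated ΔTs
dN_s = F0 - alpha * dT_f                   # ΔNs assumed unaffected by v
dF_s = dN_s + alpha * dT_s                 # diagnosed forcing = F0 + alpha*v

coef_exog, r2_exog = ols(dT_s, F0, alpha, kappa)
coef_diag, r2_diag = ols(dT_s, dF_s, alpha, kappa)
print(f"b using exogenous forcing: {coef_exog[1]:.3f} (R^2 = {r2_exog:.2f})")
print(f"b using diagnosed forcing: {coef_diag[1]:.3f} (R^2 = {r2_diag:.2f})")
```

Under these assumptions the regression on the diagnosed forcing returns a larger b and a higher R² than the regression on the exogenous forcing, because the same v appears on both sides of the equation.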

I consider a contribution to intermodel spread in ΔTs that arises purely from the same elements of internal variability appearing on both sides of the regression equation to be an artefact of an unsatisfactory method. Perhaps on reconsideration Pekka may also come to this view.

Whether the circularity element that exists in the regression method used is the largest source of error in this study is uncertain. I identified other potentially serious sources of error involved in it; they may be more important. Paul_K has set out further issues with the study’s methods.

Note that it would be unsurprising if Marotzke and Forster had simply found that, over historical 62-year periods starting from the 1920s on, the ERF trend ΔFs has a considerably larger influence than model feedback strength and model ocean heat uptake efficiency. Aerosol forcing varies hugely between models (by over 1 W/m2). Up to the turn of the century, 62-year ΔTs trends have a correlation of 0.9 with diagnosed or estimated aerosol forcing levels for the models used by Marotzke and Forster. And over the entire Historical simulation period, 1860-2005, ΔTs trends have as high a correlation with aerosol forcing strength in models as with ΔFs.

However, that intermodel differences in the ERF trend have to date had a considerably larger influence than those in model feedback strength would not justify Marotzke’s claim: “The difference in sensitivity explains nothing really”. And even if variations in model sensitivity explain relatively little of the intermodel spread over the Historical period that would not justify his statement that “The claim that climate models systematically overestimate global warming caused by rising greenhouse gas concentrations is wrong”. It is entirely possible that systematically-excessive model sensitivities have until recently been largely offset by systematically-excessive aerosol forcing and/or obscured by a positive influence of actual multidecadal internal variability on observed GMST.

The determination of the regression parameters is robust, but there are some issues that must be considered, when the model is used, as I discuss in my recent comment.

Pekka,

Despite your reassertion of your claim, I think my comment shows why M&F’s determination of the regression parameters is not in fact robust.

That may be a matter of defining what determining the regression coefficients means. I define that operationally, following the approach of M&F, accepting that the ΔF of the formula is *defined* as ΔF = ΔN + α ΔT.

If ΔF is defined in some other way, then the result may be much less reliable.

I shift the discussion of the potential problems to the step of using the model. At that stage, whether serious problems arise depends on the application.

Another technical assumption that M&F have made is that the regression coefficients are determined by minimizing the sum of the squares of residuals calculated as

e = ΔT – a – b ΔF – c α – d κ

or equivalently with the operational definition of ΔF

e = (1 – bα)ΔT – a – b ΔN – c α – d κ
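A quick numerical check of this equivalence, on random hypothetical numbers rather than model data: with ΔF defined operationally as ΔN + α ΔT, the two residual expressions give identical values, and the least-squares fit makes the residuals orthogonal to every regressor.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30                                     # hypothetical ensemble size
alpha, kappa = rng.uniform(0.8, 1.6, n), rng.uniform(0.4, 0.9, n)
dT, dN = rng.uniform(0.4, 1.0, n), rng.uniform(0.2, 0.8, n)
dF = dN + alpha * dT                       # operational definition of ΔF

# Minimize the sum of squared residuals over the ensemble (one period).
X = np.column_stack([np.ones(n), dF, alpha, kappa])
(a, b, c, d), *_ = np.linalg.lstsq(X, dT, rcond=None)

e_dF = dT - a - b * dF - c * alpha - d * kappa
e_dN = (1 - b * alpha) * dT - a - b * dN - c * alpha - d * kappa
print(np.allclose(e_dF, e_dN))
```

The check confirms only the arithmetic equivalence; it says nothing about whether the OLS assumptions hold for these variables.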

This is not the only possible choice that can be made, but this is what they chose.

The sum I refer to above is over the models in the ensemble. The calculation is done separately for each period.

When these two choices are made, the calculation is robust. The potential problems are in the interpretation of the resulting formula, and in the choice of the input variables, when the resulting regression formula is used to calculate ΔT.

The next step to ponder is the one I discuss here.

I still have mixed thoughts on how this affects the results of the M&F analysis.

“It is entirely possible that systematically-excessive model sensitivities have until recently been largely offset by systematically-excessive aerosol forcing and/or obscured by a positive influence of actual multidecadal internal variability on observed GMST.”

I was going to comment on the potential effect of the aerosol factor on the M&F regression comparisons, but it is better that you did. Model sensitivity can and should be addressed independently.

Having finally read in more detail the Marotzke and Forster paper, I think I could repeat what they did in their regressions of 15 and 62 year trends. I will describe here my version and hope to obtain agreement or disagreement at this thread by others who have studied the paper.

I have the CMIP5 model historical temperature series in Excel and will be presently locating it for my use. I could make it available to others here although it is readily downloadable from KNMI Climate Explorer. The alpha and kappa data used in the M&F paper are in the paper at this link:

http://onlinelibrary.wiley.com/doi/10.1002/jgrd.50174/epdf

Another paper here that I have not had time to digest includes CMIP3 and CMIP5 alpha and kappa values that on first glance do not jibe with the ones used by M&F.

http://onlinelibrary.wiley.com/doi/10.1029/2012GL052952/epdf

I have searched high and low for a published series of the individual model forcing (effective radiative forcing) used in M&F, and now believe that it was derived using the historical temperature series for each model run and the information for N (TOA energy imbalance) and alpha in the equation provided in the paper linked immediately above.

N=F-alpha*deltaT or F=N+alpha*deltaT

I counted data for 23 models in that link.

For the multiple regression, successive, overlapping 15- or 62-year trends in the historical global mean surface temperature, and similarly constructed trends in global forcing change, were calculated and tabulated. A measure of the variation of these trends across all models/model runs was then calculated for each trend start year. I am unsure which measure of variation was used: standard deviation, variance or range. Alpha and kappa are assumed by the authors to be constant over time, and thus the variation across model runs for those variables would be the same for each trend start year. In my view it is those 4 variations that are used in the M&F multiple regression.
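The trend-table step can be sketched as follows. This is an illustration under assumptions: `trend_table` is a hypothetical helper, the runs are synthetic, and the standard deviation is only one candidate for the measure of spread, since which measure (if any) M&F used is not established here.

```python
import numpy as np

def trend_table(runs, years, length):
    """Least-squares trend of every run over every overlapping window.

    runs   : 2-D array, one row per model run, one column per year
    years  : 1-D array of calendar years matching the columns
    length : window length in years (e.g. 15 or 62)
    Returns (start_years, trends); trends has shape (n_runs, n_windows),
    in units of the series per year.
    """
    n_windows = len(years) - length + 1
    trends = np.empty((runs.shape[0], n_windows))
    for j in range(n_windows):
        window = slice(j, j + length)
        # np.polyfit accepts a 2-D y (one column per dataset), so all runs
        # are fitted at once; row 0 of the result holds the slopes.
        trends[:, j] = np.polyfit(years[window], runs[:, window].T, 1)[0]
    return years[:n_windows], trends

# Hypothetical example: 5 synthetic runs over 1900-2012 (not CMIP5 output).
years = np.arange(1900, 2013)
rng = np.random.default_rng(0)
runs = 0.008 * (years - 1900) + rng.normal(0.0, 0.1, (5, years.size))
starts, trends = trend_table(runs, years, 62)

# For each start year, column trends[:, j] is the cross-ensemble sample of
# trends; one possible measure of its spread is the standard deviation.
spread = trends.std(axis=0, ddof=1)
```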

Kenneth,

Yes, the model ERF series used in M&F were derived from CMIP5 Historical run T and N series (run-ensemble means for each model), along with the previously diagnosed alpha values. But I believe that drift in the corresponding section of the PI control run was deducted from the T and N time series. It is not specified whether monthly or annual means were used; I imagine annual.
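A minimal sketch of that diagnosis, F = N + alpha*T, is given below. It assumes annual means, a control-run section of the same length as the Historical series, and drift removed by subtracting a straight-line fit to each control series. The function name and this drift treatment are assumptions, not the documented M&F procedure. Subtracting the fitted control line also removes any constant control-run TOA imbalance.

```python
import numpy as np

def diagnose_erf(T_hist, N_hist, T_ctrl, N_ctrl, alpha):
    """Effective radiative forcing F = N + alpha * T from annual-mean series.

    T_hist, N_hist : Historical-run GMST (K) and TOA imbalance (W m^-2)
    T_ctrl, N_ctrl : the corresponding section of the pre-industrial control run
    alpha          : the model's previously diagnosed feedback parameter
    Drift handling is an assumption: a straight-line fit to each control
    series is subtracted, removing linear drift and any constant offset.
    """
    t = np.arange(T_hist.size)
    T = T_hist - np.polyval(np.polyfit(t, T_ctrl, 1), t)
    N = N_hist - np.polyval(np.polyfit(t, N_ctrl, 1), t)
    return N + alpha * T
```

As a usage check, building synthetic Historical series as a linearly drifting control plus anomalies that satisfy N = F – alpha*T returns the prescribed F exactly.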

I think you will find it requires some work to derive the ERF values they used. If you succeed in doing so, and upload them somewhere, I will check them against the values I am using.

Nic, I have the CMIP5 model piControl run data for N and T to use for correction and used it when attempting to duplicate the ECS values derived in the published regression method. I could duplicate the results after adjustment in a general way but not exactly for all models. The differences could be accounted for by a difference in a constant amount used to adjust N. Looking at the piControl data for N (actually rsdt-(rlut+rsut)) leads me to believe that the adjustment required is not so much drift but rather a more or less constant residual amount by which the model fails to balance the TOA energy.

Great discussion!

+1

Pekka says:

“It’s a scientific paper and it can be assumed that readers of such a paper understand that a linear regression model cannot be accurate and reliable, even if that’s not emphasized in the paper.”

Oh what a pure and ideal world you live in Pekka. I would that it was that way.

Before diving into the scrum here, let me recount the reality of the awareness of the applicability of OLS.

Many years ago I was a young contract programmer in the maths dept. of a major UK university. Every Friday afternoon there was a session where a member of the department would describe their research, highlight any problems they were having and put it out to the assembled mass of mathematics PhDs for comments and solution.

On one occasion a student presented her nearly finished thesis, which tried to reconcile a model with a large volume of observational data. There was a problem in extracting a linear relationship between two experimental variables from the satellite data.

A scatter plot was presented with the OLS regression fit. It was visibly obvious that the slope did not match the “cloud” of data points: it was obviously underestimating the slope. She was stuck as to why this was and put it out for discussion and suggestions. The assembled body of doctors and professors of mathematics spent two hours of intellectual jousting without coming up with a useful explanation.

Not being a member of the academic staff, I kept quiet and listened. After the meeting I approached the student and pointed out that she could not use OLS in a situation with significant uncertainty in the x variable. It would typically underestimate the slope, and this was indeed the problem she was facing.

She was rather taken aback and asked if I was sure.

The next morning I presented her with a page of maths showing the derivation and the point at which it is necessary to apply the condition err(x) << err(y).

She thanked me, but said it was too late to make any substantial change to the paper; she then added a paragraph of waffled excuses and the usual "need for further study" and presented her thesis without any correction.

What shocked me most was that none of the assembled body of egg-heads seemed to be aware of the issue.

So Pekka, with the utmost respect, I have to say that your basic assumption that both the readership and presumably the authors can be assumed to know this kind of thing is, sadly, unfounded.
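The slope underestimation in this anecdote (regression dilution, or attenuation bias) is easy to reproduce. The sketch below uses synthetic data with hypothetical variances. With independent errors in x, the OLS slope tends toward the true slope multiplied by var(x_true) / (var(x_true) + var(x_err)), which is 1/2 in this setup.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
true_slope = 2.0
var_x, var_x_err = 1.0, 1.0                                  # hypothetical variances

x_true = rng.normal(0.0, np.sqrt(var_x), n)
y = true_slope * x_true + rng.normal(0.0, 0.5, n)            # noise in y
x_obs = x_true + rng.normal(0.0, np.sqrt(var_x_err), n)      # noise in x as well

ols_slope = np.polyfit(x_obs, y, 1)[0]
# Classical attenuation: E[slope] is approximately true_slope * var_x / (var_x + var_x_err)
attenuated = true_slope * var_x / (var_x + var_x_err)
print(f"OLS slope {ols_slope:.3f}, attenuation formula {attenuated:.3f}, true {true_slope}")
```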

I wrote a short article about this last year:

https://climategrog.wordpress.com/2014/03/08/on-inappropriate-use-of-ols/

Much of it was incorporated into my recent article at Judith’s:

http://judithcurry.com/2015/02/06/on-determination-of-tropical-feedbacks/

**In fact this lack of understanding of applicability of OLS is at the heart of the exaggerated estimations of climate sensitivity.**

On second thought.

Taking into account that the journal where this was published is Nature, the need for an explicit statement of caveats is greater than in a journal whose readers are most likely specialists in the same narrow field.

To be fair to Piers Forster, Forster & Gregory 2006 did discuss the dilution issue and showed that a better estimation of the rad vs temp regression gave a notably lower climate sensitivity.

Sadly they relegated this to an appendix and avoided any mention of it in the conclusion or abstract of the paper itself.

I cover this in my OLS article.

It does not seem to get any mention in the current article, and my gut feeling is that it was at Gregory’s insistence that this got into F&G.

So in this case the author was aware, but that does not detract from the point that the assumption that either authors or readers are aware of the issues is unfounded.

You seem to accept that this does need to be explicitly stated. Thanks.

Greg,

Your reference to Forster & Gregory 2006 brought up the observation that they write:

This is relevant to the present discussion, because Q of that paper is forcing and Y the feedback parameter (α). Earlier results of this kind are surely a factor in Forster’s expectation that their approach works.

I don’t follow your logic here. What is the basis of their expectation?

The earlier paper just derived an estimate of climate sensitivity. There is nothing to suggest that this value was correct and thus that the method should be re-used with the “expectation that their approach works.”

In fact the opposite is true: the earlier paper established that simple regression actually produces exaggerated climate sensitivity. In 2006 F&G did not want to “distract from the main point of the paper” by including the lower results in the conclusion and abstract (where they would get reported), and so tucked the whole discussion and its results away in an appendix.

Now nearly ten years later he does not even mention the issue at all.

This looks close to scientific misconduct to me. There is a serious issue with simplistic regression being used in a context where it is technically invalid.

At least one of the authors is aware of this because he already published a paper discussing the issue and its impact on climate sensitivity.

Greg,

I refer to the way ΔF is determined from ΔN. If that connection were accurate for every case considered, no circularity at all would be present in the calculation at any point, because we could equally well pick ΔF as a direct model result as ΔN. The error in that relationship occurs in a way that leads to some ambiguity in the interpretation of the results. That ambiguity is affected by the “circularity”.

The excerpt from Forster and Gregory 2006 shows that, at least in one case, the relationship was found to be close by the standards of the typical relationships that they consider.

(It’s not obvious from Nic’s post that no problem whatsoever would come from the “circularity” if the relationship were accurate. In that case F could be added to the CMIP5 database as a value as accurate as N, and there would be no reason to consider N more primary data than F.)

Pekka

What do you mean by “accurate”? Forster et al. state that it has been estimated by regression, and give the distribution of the errors in F. As I asked above (to which you didn’t really respond), doesn’t that mean that we all know it involves circularity?

What I meant by that sentence is that some unavoidable error terms might be amplified by extra coefficients that result from an equation that is perhaps dependent on something that goes into the residual. I’m not sure whether even that is the case, but there may be reasonable physical frameworks where that would take place.

The error term that I refer to is not the full difference between ΔF and ΔN, but a part of it that results from the approximate nature of the formula used for calculating the difference.

pekka…I taught statistics at the college level for a decade or so but I still have no idea of what you refer to….

just what do you mean by:

” that some unavoidable error terms might be amplified by extra coefficients that result from an equation that is perhaps dependent on something that goes into the residual”

“some unavoidable error terms”…which ones?

“is perhaps dependent on something that goes into the residual”

perhaps dependent? something?

really?

this is the language of mathematics? of physics?

Can you write that as an equation?

David,

It might have been more appropriate not to have written that at all, as I really couldn’t say anything clear. I just wanted to add something on a small point that could also have been left out of my earlier comment without affecting its real content. Both are related to the fact that there are acknowledged uncertainties that affect the accuracy of the quantitative results. How they do that depends on the inner workings of the climate models of the CMIP5 ensemble.

“On inappropriate use of OLS”

That’s really interesting, Greg. I hope someone who really understands the implications will read and comment. Thanks.

We had this discussion about using OLS and TLS at the Blackboard for the regression used to derive the ECS values that were published in Chapter 9 of the AR5 review for IPCC using temperature and radiation data. Carrick suggested I try TLS. The ECS values were 10 per cent or so higher using TLS. I emailed Tim Andrews, who has coauthored papers with Gregory and Forster on this subject, and he told me that they had not considered TLS. The paper I refer to here is linked below and does not discuss the recommendation you make about doing a reverse regression, but I believe that in a previous paper on the same subject that was done and used as evidence that OLS was sufficient.

http://onlinelibrary.wiley.com/doi/10.1029/2012GL051607/epdf
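For readers who want to try the alternatives mentioned here, the sketch below compares three slope estimators on synthetic data: ordinary OLS (all error assumed in y), the reverse regression (regress x on y and invert, all error assumed in x), and total least squares (orthogonal regression via SVD). The helper names are illustrative. For positively correlated data, the OLS and inverted reverse-regression slopes bracket the TLS slope. TLS is not scale invariant, so its answer depends on the units chosen for x and y.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of y regressed on x (all error assumed in y)."""
    return np.polyfit(x, y, 1)[0]

def reverse_slope(x, y):
    """Slope implied by regressing x on y, then inverting (all error assumed in x)."""
    return 1.0 / np.polyfit(y, x, 1)[0]

def tls_slope(x, y):
    """Total least squares (orthogonal regression) slope via SVD.

    The first right singular vector of the centred data gives the direction
    of the best-fit line; the result depends on the scaling of x and y.
    """
    A = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    dx, dy = vt[0]
    return dy / dx

# Hypothetical data with comparable noise in both variables.
rng = np.random.default_rng(0)
x_true = rng.normal(size=500)
x_obs = x_true + rng.normal(0.0, 0.5, 500)
y_obs = 1.5 * x_true + rng.normal(0.0, 0.5, 500)
print(ols_slope(x_obs, y_obs), tls_slope(x_obs, y_obs), reverse_slope(x_obs, y_obs))
```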

Thanks Kenneth. Figure 1 in the Andrews paper underlines Paul_K’s point in his posts at Lucia’s that the model responses are not linear anyway, they are “curvilinear”. So why is a linear model being regressed in the first place?

Most of the models shown there are clearly steeper for larger deviations. In rad vs temp plots that means less sensitive. So the ‘average’ sensitivity, dominated by the bulk of small deviations, is being used to project future large deviations where it is inappropriate.

Citing this “as evidence that OLS was sufficient” is simple bias confirmation, not science.

Is the bolded part Greg’s? Or a comment by Steve M.?

Sorry, this part: In fact this lack of understanding of applicability of OLS is at the heart of the exaggerated estimations of climate sensitivity.

Nic: In a simple regression analysis, one interprets the residuals as noise in the data or as a sign of an inappropriate regression equation. M&F15 regress data that has a chaotic component and interpret ALL of the residuals as unforced variability. Most of these residuals arise because of the limitations of the dT = dF/(a+k) model being applied to the CMIP ensemble. For example, papers have shown that kappa decreases with time in TCR simulations. (As the top of the ocean warms, it becomes more stably stratified.) Paul_K has studied how estimates of climate sensitivity vary when they are deduced from different periods of model output using this approach. These parameters are abstracted from long periods, so they are more relevant for 62-year trends than for 15-year trends.

The dilemma, as I see it, is why M&F’s regression can reproduce the ensemble mean as well as it does in Figure 2b. In M&F’s regression equation 4 (your equation 3), alpha and kappa have separate regression coefficients. This additional and inappropriate degree of freedom allows the regression equation to fit the ensemble mean more closely and thereby assign more of the variance to unforced variability. As you have pointed out, the dF term has already been derived from simulated temperatures. So the dF term is circular AND the remaining two terms have an inappropriate degree of freedom.

If you haven’t already done so, it would be interesting to see what happens if the regression is performed using the sum of alpha plus kappa and a single coefficient and if alpha and kappa are completely omitted.

Frank:

If I understand your statement correctly, there is no dilemma here. The way that the regression is set up using variable deviations from their mean, the following is true for equation 4:

beta-0 is equal to 0. The average across the ensemble for each of the following variables is equal to 0: ΔF’, α’, κ’ and the residuals. Thus, the average of the predicted ΔT over the ensemble must be exactly equal to the ensemble ΔT average for each year.

The same would be true for the reduced case using (α + κ)’ which is the same as α’ + κ’ or if the latter are removed completely.

Roman: I didn’t describe the dilemma correctly. dT = dF/(a+k) is an imperfect way to analyze output from climate models. Approximating 1/(1+x) as 1-x introduces additional error. Despite all of these limitations, M&F’s regression describes the multi-model ensemble mean shockingly well. How does one show which factors are responsible for this surprising result: circularity? the additional degree of freedom? something else?
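Frank’s point about the linearization can be quantified with a short sketch. Writing one model’s (α + κ) as the ensemble mean times (1 + x), the exact scaling 1/(1 + x) and the first-order form 1 − x differ by exactly x²/(1 + x), so the neglected term is second order in x and always non-negative. The range of x below is an assumption for illustration, not the actual CMIP5 spread.

```python
import numpy as np

# Hypothetical relative deviation x of a model's (alpha + kappa) from the ensemble mean.
x = np.linspace(-0.4, 0.4, 9)
exact = 1.0 / (1.0 + x)   # scaling factor in dT = dF/(alpha + kappa)
linear = 1.0 - x          # first-order approximation
error = exact - linear    # equals x**2 / (1 + x): the linear form under-predicts

for xi, e in zip(x, error):
    print(f"x = {xi:+.1f}  error = {e:.4f}")
```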

Frank, it is something else.

Each regression takes place over a single year of ensemble data. Any regression of the form (with n predictors)

Y = a0 + a1*X1 + a2*X2 + … + an*Xn + e

has predicted values from the least squares solution that look like:

Predicted(Y) = m(Y) + a1’*(X1-m(X1)) + a2’*(X2-m(X2)) + … + an’*(Xn-m(Xn))

where the primes on the coefficients denote that they are the estimates and m() is the mean of a given variable.

If you calculate the average of the predicted values you get

m(Predicted(Y)) = m(Y) + a1’*0 + a2’*0 + … + an’*0 = m(Y)

because the sum of the deviations of any variable from its mean is always 0.

There is nothing in the M and F data set that causes that. It is true for every linear regression.
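RomanM’s algebra can be checked numerically. The sketch below (arbitrary ensemble size and random predictors, chosen only for illustration) fits OLS with an intercept both to a response that depends on the predictors and to one that is pure noise. In both cases the mean of the predicted values equals the mean of the response, so a close match to the ensemble mean is not, by itself, evidence that the regression model is good.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3                               # arbitrary ensemble size and predictor count
X = rng.normal(size=(n, p))
design = np.column_stack([np.ones(n), X])  # intercept column corresponds to a0

def fitted_mean_gap(y):
    """Return mean(Predicted(Y)) - mean(Y) for an OLS fit with intercept."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.mean(design @ coef) - np.mean(y)

y_related = 1.0 + X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.3, size=n)
y_noise = rng.normal(size=n)               # unrelated to X entirely

# Both gaps are zero up to floating-point rounding.
print(fitted_mean_gap(y_related), fitted_mean_gap(y_noise))
```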

Frank,

Full support for RomanM’s explanation is found between Equations (3) and (4) in the paper.

If, on the other hand, you want some general indication of how well AF/rho works as a predictor of GCM temperature change over a long period (when it should perform close to its best), then you might want to look at Figure 9(a) in Forster et al 2013. This gives you some idea of the predicted/actual spread of temperature change over the historical period up to 2003. A between-model bias towards underprediction is apparent, most pronounced in higher sensitivity models. The free regression on AF carried out by M&F (across the ensemble for each period) should largely correct for this between-model bias, but then still leaves significant within-model residuals, which represent some combination of model error plus natural variation. All the residuals are deemed to be “natural variation” and the model error is not quantified (and is probably unquantifiable).

The National Academy of Sciences said that despite the errors found in Dr. Mann’s paper the conclusions were basically correct. In the present case the friendliest view is that the methods were hazardous and the conclusions unwarranted, but the concept basically correct. The quality of climate science, and perhaps the credibility of western science, rests on these souls’ courage.

That’s one of the better trolling jobs I’ve seen!

I’m actually 100% sincere.

So much the sadder

We can’t say much here as it derails the thread topic, but just look up the relevant Mann issues on climateaudit.

You must’ve gotten Valentine’s Day confused with April Fools Day.

I read RG’s post to say, not that Mann was correct, but that the NAS was wrong to find the errors (which undermine Mann), yet find him “correct”; just as Pekka is acknowledging error while arguing M&F could still be “correct”. tingtg, I think you misread this as RGraf saying Mann was correct.

The whole multivariate regression idea is fundamentally invalid since the temperature is not a direct linear function of the forcing, neither is the diffusion of ocean uptake.

Douglass et al 2006 figure 1 shows the temporal development of forcing, temperature and ocean diffusion:

On the assumption of a linear feedback mechanism, the response to forcing is an exponential convolution of the forcing time series, which introduces change in the profile in relation to the forcing. In crude terms a lag and a change in magnitude. So any direct regression between such quantities will not correctly deduce the assumed linear relationship, even if the regression dilution issue is ignored.
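Greg’s lag argument can be illustrated with a minimal one-box sketch, C dT/dt = F − λT, driven by a sinusoidal forcing. The heat capacity C, feedback λ (`lam`) and 10-year forcing period are assumptions chosen for illustration, not values from any CMIP5 model. For a periodic forcing the response is both damped and delayed, so neither the zero-lag regression slope of T on F nor the slope at the best-correlated lag equals 1/λ, the equilibrium response per unit forcing.

```python
import numpy as np

# One-box energy balance model  C dT/dt = F(t) - lam*T  (all values assumed, for illustration).
C, lam = 8.0, 1.2                   # heat capacity (W yr m^-2 K^-1), feedback (W m^-2 K^-1)
dt = 1.0 / 12.0                     # monthly time step, years
t = np.arange(0.0, 100.0, dt)
F = np.sin(2.0 * np.pi * t / 10.0)  # idealised 10-year periodic forcing, W m^-2

T = np.zeros_like(t)
for i in range(1, t.size):          # forward-Euler integration of the ODE
    T[i] = T[i - 1] + dt * (F[i - 1] - lam * T[i - 1]) / C

keep = t >= 40.0                    # discard the start-up transient
Fk, Tk = F[keep], T[keep]

# Zero-lag OLS slope of T on F
slope_zero_lag = np.polyfit(Fk, Tk, 1)[0]

# Lag (months) maximising the correlation of T with earlier F, and the OLS slope at that lag
lags = np.arange(0, 60)
corrs = [np.corrcoef(Fk[: Fk.size - k], Tk[k:])[0, 1] for k in lags]
best = int(lags[np.argmax(corrs)])
slope_lagged = np.polyfit(Fk[: Fk.size - best], Tk[best:], 1)[0]

print(f"1/lam = {1 / lam:.3f} K per W m^-2")
print(f"zero-lag slope = {slope_zero_lag:.3f}, lag = {best} months, lagged slope = {slope_lagged:.3f}")
```

The mismatch shrinks for forcing that varies slowly compared with the relaxation time C/λ: as the forcing period grows, both slopes approach 1/λ.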

Throwing in extra variables to do a multivariate regression in no way improves the situation, it compounds it.

There seems to be a large body of work getting published by people who think they can describe a complex, physically interlinked system by almost arbitrarily chosen linear regressions, without any consideration of the physical reality of what the quantities are and how their time series are related.

There is an unstated assumption that if they do an invalid regression enough times, with enough variables, it will somehow “converge” to the right answer.

The field of “Earth Sciences” seems to be largely devoid of any training in science or statistics yet produces prolific quantities of papers based on essentially home spun methods that have no grounding in existing knowledge nor are tested to prove their validity as novel techniques.

It’s a one pony show based on linear trends.

In essence they have spent 30 years pushing the idea that the system can be modelled as a linear AGW “trend” + “noise”.

My conclusion on all this is expressed well by RomanM:

BTW “climtegrog” is my WP account, the above posts are mine.

Someone wrote in this thread that physicists and statisticians live in different worlds, or something similar. That should, of course, not be the case; arguments from both should be combined in a consistent way. In this particular example the starting point is a set of physics-based models. Physics-based models obey the laws of physics, at least approximately. Thus an analysis of their properties is extremely inefficient if that’s not taken into account. It’s typical that the model structure is chosen based on physical arguments, but the coefficients of that model are determined by some method of fitting. Physical arguments are also used in the choice of the measure that’s used to decide which are the “best” parameter values.

That’s the basic nature of this analysis as well. The authors have selected the regression model and the measure to be minimized based on physical considerations. The model is crude, and some of their arguments may be criticized, but with these reservations the model makes sense.

One of the choices is that one of the model variables is ΔF in spite of the problem that ΔF is not directly available from the CMIP5 archive, but must be deduced from ΔN using a formula that’s not fully accurate, and that happens to contain ΔT.

The above comment allows for writing the formula in a way that ΔT appears on both sides of the defining equation, but the way this takes place in M&F does not change the way the residual appears in the formula. The residual might be added to the formula that is used to calculate ΔF from ΔN. If that were done, circularity would result, or, to put it another way, the formula that defines the residual would change. The physical argument of the authors is, however, that this should not be done, and that the residual should be calculated as before from their original definition.

Selection between the alternative M&F use and the alternative Nic and RomanM use has to be based on physical arguments; more precisely, the right choice could be fully confirmed by further calculations with the CMIP5 models. At present we have some reasonable, but not very strong, physics-based arguments from the authors, detailed in their response at CLB. For the alternative we have no physics-based argument, to the best of my knowledge.

The detailed argument of RomanM in this thread is based on the alternative choice that lacks support. It does not apply to the model of the paper, and the model of the paper has at least some arguments to support it.

By the above I do not claim that I see the analysis as very strong. There are many issues that may make it too inaccurate and unreliable. Some of these are discussed by Nic in the post. The circularity is, however, essentially a red herring. It appears only when the problem is analyzed forgetting that physics does play its role in the correct way of doing the analysis, and that the authors have legitimate arguments in support of their choice.

The difference of the temporal evolution of the forcing and the relaxation response of the linear model are shown here:

The lagged-correlation plot of post-2000 CERES data from Spencer & Braswell 2011, is shown here: (negative lag: radiation change leads temperature change.)

It is clear that a lag of 12 months between the peak forcing and the peak in the response will decorrelate the regression and produce an incorrect result. Even if a lagged regression is performed, the ratio of the forcing and the response is NOT the scaling constant in the ODE that is used to define the relaxation relationship.

There will be a final equilibrium temperature change associated with a change in radiative forcing, but this is not available from the time series of rad and temp (even less so from some clumsy proxy value taken from the model).

It is not sufficient to simply ignore the fact that both the in-phase and the orthogonal signals are present in the data, pretend there is no lag and hope it will all come out in the wash.

What is being done is not physically meaningful. The rest of the argument is barely of even academic interest.

I went into all that in extensive detail over at Judith’s CE:

http://judithcurry.com/2015/02/06/on-determination-of-tropical-feedbacks/

Greg,

I have made no attempt to figure out how your observations affect 15-year and 62-year trends. Thus I don’t make any claims on that.

Thanks Pekka, I take that to be a very polite way of saying I’m off topic but I don’t agree.

My point is that there is little point in arguing about such trends if they are trends in something that is not physically meaningful resulting from ignoring the linear relaxation upon which the whole thing is based.

The dT=k.dR kind of equation is the equilibrium of the linear relaxation model giving rise to the whole concept of lambda and climate sensitivity.

Trying to ignore the fact that the data being examined is not just the dT=k.dR but also contains dT/dt=k2.dR which is orthogonal to the former, means the whole exercise is just playing with numbers. Just more Shakun Mix climatology.

One can pop up Excel and start fitting “trends” to anything, but until you have a credible model and a statistically acceptable reason for fitting a linear model, it is meaningless.

Regression of a linear model may produce a best estimate of the slope (linear relationship) under specific conditions. There seems to be a whole body in the field of climatology that believes you can always fit a “trend”, that it is always meaningful, and that OLS will always give you the best estimate of the linear relationship.

Only the first of those assumptions is true. You can always fit a straight line.

BTW the whole idea of GMST is another aberration. You cannot do an energy budget analysis where you are averaging the temperature of things with hugely different specific heat capacities.

It’s like asking what is the average of an apple and an orange. The answer is a fruit salad.

Greg,

Climate science and how its results are presented is full of simplifications that distort the reality in one way or another. In most cases that does not change the main message significantly, but there may be exceptions.

GMST is, indeed, not as significant as the average of a quantity that can be calculated by dividing the total value of an extensive variable like energy by the total mass (or volume) of the material that carries that energy. Still, GMST is a reasonably good descriptor of global temperature changes.

Thank you for that insightful remark. I think the constancy of the “message” is indeed the driving force behind the presentation “full of simplifications that distort the reality in one way or another”.

The arguments I presented above, that you chose not to consider, explain how this has become bereft of physical meaning.

Figure 2f shows that studying these models’ global statistics is just studying the output of a complex random number generator.

What this study is showing is that the current divergence problem, serious as it is, is no worse than the models’ inability to reproduce past climate in general. Any divergence is, by definition, regarded as “internal variability”, and it is also well known that models are particularly bad at reproducing internal variability. They have been tuned to reproduce a very small segment of the historic record fairly closely, but we just have not been paying enough attention to the fact that models are as bad at reproducing earlier climate as they are at reproducing the post-2000 pause.

We should probably credit the authors for pointing this out.

Pekka,

You have offered many thoughtful comments on this thread, and I thank you for that effort.

However, I think you will find that it is going to be a very difficult (impossible?) task to convince many scientists and engineers working in other fields that the apparent circularity in the M&F paper (effectively using the same model ΔT on both sides of their equation via ΔF from Forster et al 2013) could ever generate anything but suspect (at best) or nonsense (at worst) results. This obvious circularity is the sort of thing that most everyone is taught NOT to do in an introductory course on statistics or experimental design… so as to avoid wasting time and effort in the production of meaningless results. You have on this thread comments from many experienced scientists, engineers, and statisticians all saying pretty much the same thing…. one just can’t do a circular analysis.

I have carefully read your comments, and there is nothing you have said which shows me M&F have addressed the question of circularity in their paper in a meaningful way, nor anything which makes me think the most important conclusions drawn by M&F are anything other than highly suspect. Calling the obvious circularity in M&F ‘a red herring’ does not make that circularity go away. Further, if the circularity is removed mathematically (as I believe it must be to have a defensible analysis) then most of the M&F paper must disappear; the paper depends almost entirely on the circular analysis.

Time will tell, of course, but if I were a gambler, I would not place a bet on the conclusions of M&F remaining credible in the future. I would bet that continued divergence between current CMIP5 projections and reality over the next decade or two will make M&F irrelevant, even if there is no published refutation. ‘Nature’ can continue to publish whatever suspect papers it wants. Reality will not read them.

Steve,

They have not discussed the circularity largely because their method does not introduce circularity. The model of Nic and RomanM is different, and it does have circularity. What determines which model is better justified is not an issue of statistics but of physics, or, more precisely, in this case an issue related to the properties of the models of the CMIP5 ensemble. M&F have presented arguments to justify their choice.

I have described in several comments how the method of M&F is not circular. I have also explained where the model of Nic and RomanM differs, and why that difference makes it circular.

I will make one more stab at explaining the issue in terms which do not involve any statistical analysis.

Let us suppose that we know all of the coefficients in the predictive equation exactly so there is no estimation involved. To simplify things we combine all of the predictors except ΔF into a single term A so they are less intrusive. Thus, we have:

ΔT = A + b ΔF + ε

where ε is understood as the effect of “weather” in the model.

We will also suppose that ΔN is known, but ΔF must be calculated exactly by the equation ΔF = α ΔT + ΔN as in the M and F paper. I am told the value of ΔT for a given situation. Calculate ε and ΔTo = what the value of ΔT would be if ε = 0.
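Working RomanM’s exercise through under his stated assumptions (coefficients known, ΔN held fixed, ΔF computed from ΔT through ΔF = α ΔT + ΔN) gives the following; the subscript o follows his ΔTo notation.

```latex
\begin{aligned}
\Delta T &= A + b\,\Delta F + \varepsilon, \qquad \Delta F = \alpha\,\Delta T + \Delta N \\
\Rightarrow\; (1 - b\alpha)\,\Delta T &= A + b\,\Delta N + \varepsilon \\
\varepsilon &= \Delta T - A - b\,\Delta F \\
\Delta T_o &= \frac{A + b\,\Delta N}{1 - b\alpha} \qquad (\varepsilon = 0,\ \Delta N \text{ fixed}) \\
\Delta T - \varepsilon \;=\; A + b\,\Delta F &= \Delta T_o + \frac{b\alpha\,\varepsilon}{1 - b\alpha}
\end{aligned}
```

Under these assumptions ΔT − ε (which is what A + b ΔF returns) differs from the ε-free value ΔTo by bαε/(1 − bα), a term that grows with the model’s α. If instead, as Pekka argues, ΔF rather than ΔN is what stays fixed when ε is set to zero, then ΔTo = A + b ΔF and the extra term vanishes; which of the two is held fixed is precisely the point in dispute.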

RomanM,

Why do you “suppose that ΔN is known, but ΔF must be calculated exactly by the equation ΔF = α ΔT + ΔN as in the M and F paper”, unless you accept that it must be calculated using the real ΔT that includes the residual and that can be obtained only from the original data?

ΔF is supposed to be rather stable and unaffected by the internal variability, but ΔN and ΔT are affected by the internal variability. Therefore it’s wrong to use predicted ΔT in the determination of ΔF. It must be determined only once from the CMIP5 data to get it right.

An additional point is that moving from non-zero ε to ε = 0 does not change ΔF in a meaningful model; it changes ΔN, but in this analysis no attention is given to the new value of ΔN.

Whereas the value of a scientific paper is in its innovative reasoning to make a testable conclusion, do you feel that the approach by M&F is either innovative or allows for a testable conclusion? Are you saying the paper must stand because its critics could not improve upon it? Are you disputing Greg’s assertion that it in fact does not follow the actual physics well?

In a comment upthread, Pekka Pirilä appears to make the case that physics or physical arguments can trump statistics or math.

Yet when a physical proxy was used upside down, we were told that since the math worked out, criticisms were “bizarre”.

It would be refreshing if the “community” were consistent in its use of physical meaningfulness as compared to mathematical meaningfulness.

Physics cannot overturn mathematics, but mathematics is a tool that must be used on correct concepts.

Here the essential point is how the residual enters the equations, and this in turn depends on what the principal source of the residual is. Statistics or mathematics cannot answer these questions. In the case of physical systems only physics can.

Pekka, I have just posted the comment below the ### on CLB. For some reason, it is currently under moderation. Your fixation on “a separate model” is misguided and indicates to me that you might not have had much experience with statistical theory. [Update: It is no longer under moderation at CLB.]

###

The problem here is statistical in nature. It has nothing to do with physics or which variables are external or internal or when they are observable or what drives what. If there are flaws in the way the data reflects the “physics”, this is a different situation which should have been dealt with before ever submitting the publication.

The authors have provided a data set and a statistical model which underlies the data and the relationships between those variables and the analysis of that data is done within the context of the statistical model. From this juncture on, it is purely a mathematical and a statistical problem.

The basic relationship in this model is given by the equation

ΔT = a + b ΔF + c α + d κ + ε

It contains several unknown parameters and a variable ε (usually termed the “error”) which accounts for the “random” variation in the model.

The intent of the analysis is to determine how much of the variable ΔT can be accounted for by a given set of other variables. In order to do this, the unknown parameters need to be estimated along with estimates of the values of ε. The authors have chosen to use Least Squares methodology to do this.

The starting point for this analysis was an error sum of squares:

SSE = ∑ε^2 = ∑[ΔT – (a + b ΔF + c α + d κ)]^2

Its format is not an arbitrary choice by the authors, but rather based on certain optimal properties of the solution within the model structure.

This quantity is minimized with respect to the unknown parameters a, b, c and d. From these we can estimate the values of ε and calculate the predicted values of ΔT along with the residuals = Observed(ΔT) – Predicted(ΔT).

It should be pointed out that there is a distinction between the residuals and the estimated errors (http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics). In ordinary linear regression they are the same; however, this need not always be the case in least squares methodology.

Now, it turns out that in this data set, there is an identity relating three of the variables: ΔF = α ΔT + ΔN. If we substitute this identity into the above SSE, and rearrange terms, we get

SSE = ∑ε^2 = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]^2

This is not the sum of squares of a “new model”. It is exactly the same SS as that above with exactly the same unknown parameters and exactly the same ε’s and exactly the same relationships between variables in the data set. Describing it as “a different model invented by the critics” indicates a lack of understanding of statistical models and of the mechanics of least squares methodology in general.

Since the two sums of squares are just two representations of the same equation, the following principle seems to be quite evident.

If the presence of the hidden relationship between ΔT and ΔF in the data has no effect, then minimizing the latter SS must produce the same estimates of the unknown parameters and ε’s as the former.

The two minimizations do not produce the same results. In particular, the residuals for the latter SS are now dependent on the individual climate model’s α: res = ε’/(1 – b’ α), where the ’ denotes an estimated value. This clearly indicates that there is a systematic effect on the residuals due to α which has not been accounted for in the equation coefficient c. You will also note that in this case the residuals are in fact not the same as the estimated error terms.

I have pointed out exactly where the shortcomings occur when applying the standard regression calculations to the data in the comment linked by Nic Lewis. (https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751723) The analysis using the revised form of the SS takes this into account, but unfortunately there are other effects present in the results (such as bias) which are due to the fact that the minimization procedure has become non-linear because of the circularity in the data.
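A minimal numerical sketch of the circularity argument, under the assumption RomanM makes: that the data satisfy the regression equation and the identity ΔF = α ΔT + ΔN simultaneously, run by run, with ΔN taken as given. All coefficients and distributions below are hypothetical. Because ΔF then carries a share of ε, OLS attributes part of the unforced component to the forcing term: with these signs (α > 0, 1 − bα > 0) the estimate of b is biased upward, and the OLS residuals are smaller than the true ε. Pekka’s position is that ΔF from the CMIP5-derived data does not respond to ε in this way; the sketch does not settle that physical question, it only shows what follows if the identity holds.

```python
import numpy as np

# Synthetic ensemble obeying BOTH identities (all numbers are illustrative assumptions):
#   dT = a + b*dF + c*alpha + d*kappa + eps   (regression model)
#   dF = alpha*dT + dN                        (forcing diagnosed from dN and dT)
rng = np.random.default_rng(0)
n = 5000
a, b, c, d = 0.5, 0.3, -0.2, -0.3
alpha = rng.uniform(0.8, 1.8, n)   # feedback parameter per run
kappa = rng.uniform(0.5, 1.0, n)   # ocean heat uptake efficiency per run
dN = rng.normal(0.8, 0.3, n)       # TOA imbalance, treated as given here
eps = rng.normal(0.0, 0.15, n)     # unforced ("weather") component

# Solving both identities jointly: (1 - b*alpha) dT = a + b*dN + c*alpha + d*kappa + eps
dT = (a + b * dN + c * alpha + d * kappa + eps) / (1.0 - b * alpha)
dF = alpha * dT + dN               # so dF itself contains eps

# Ordinary least squares of dT on [1, dF, alpha, kappa]
X = np.column_stack([np.ones(n), dF, alpha, kappa])
coef, *_ = np.linalg.lstsq(X, dT, rcond=None)
resid = dT - X @ coef

print(f"true b = {b}, OLS b = {coef[1]:.3f}")
print(f"sum of squared residuals = {np.sum(resid**2):.1f}, sum of eps^2 = {np.sum(eps**2):.1f}")
```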

Roman,

It does not matter how clearly you explain this problem; some people will not or cannot think it through and draw the obvious conclusion: the results of M&F are highly doubtful, and likely non-informative (AKA, wrong).

At CLB I wrote the following in response to RomanM (three successive comments):

—

RomanM,

The model is different, because you assume that the new ΔT obtained from the model should be used to determine ΔF, while the M&F assumption is that the estimate of ΔF given by the original data from CMIP5 data base gives the result that should be used at every later step.

Which of the two alternatives is the correct choice is an issue of physics, not of statistics.

—

(This was preceded by a comment of aTTP)

In slightly different words.

The assumption of M&F is that ΔF for each model run is obtained from ΔN and ΔT of that model run. All these values come from the CMIP5 database. They do not vary during the determination of the model. There’s explicitly no feedback.

When the model has been determined it’s taken to be a model that links ΔF to the estimates of ΔT.

No analysis is done related to the values of ΔN any more. ΔN is not part of the further analysis, and it cannot be part of a feedback equation.

—

A few more words about the physics.

The TOA imbalance is almost identical to the net energy flux into the ocean, because the heat capacity of the atmosphere is small. The net heat flux into the ocean varies rather strongly due to the El Niño-La Niña variability and other forms of variability that are present also in the models. Therefore N is not very stable. F is expected to be more stable. That’s possible, because surface temperatures vary due to the same processes that cause N to vary.

Whether the values of F calculated from the formula used are, indeed, more stable than the values of N can be checked. The authors write in their response

The paper Forster et al (2013) contains timeseries of the forcing obtained by this approach, but not those of TOA imbalance to compare with.

Roman: It may be again worth pointing out that a more physically relevant regression model would be:

ΔT = a + b ΔF + c /(α + κ) + ε

If Nic is right about circularity, will the b ΔF term account for all of the variance?

Frank, you may very well be right about the inappropriateness of the choice of model in the paper. However, the problem with the circularity would still remain.

No, the b ΔF term would not account for all of the variance because the effect variable of ΔT is masked in ΔF by ΔN.

Pekka

Sorry to harp on about this, but if M&F choose to use F as reported by Forster, then they have to use the uncertainty term he reports too. In reality Forster doesn’t report F for any particular T and N, he reports N+adT. M&F should be using the latter.

On your own logic M&F are diagnosing what happens in the model database (thus justifying ignoring uncertainty in it); F isn’t in it and is derived from it and has uncertainty when so derived. In being derivative it is no different from internal variation so needs to get the same treatment.

Could I too ask that you put your arguments in equations. It removes ambiguity.

HAS,

I have said in several comments that I’m not defending the paper more generally. I have expressed doubts on its accuracy and reliability.

The reasons for my doubts include both the crudeness of the linear regression model, when it’s justified as an approximation of nonlinear formulas that cannot be approximated well by a linear regression model over the relevant range of variables, and the uncertainties in the determination of ΔF, as well as other uncertainties in the validity of input assumptions like the constancy of α and κ for each model.

Hi Pekka

However above you clarified your defense of it (to the extent it exists) in terms of M&F just demonstrating attributes of the model database. My point is that even on that very narrow interpretation their methodology isn’t fit for purpose.

HAS,

I wouldn’t say that it isn’t fit for the purpose. I don’t have enough evidence to say that, but I can say that I’m not at the moment convinced of its value.

Yanis Varoufakis.

Surprised to see this post have so much comment.

Fig 2f from the paper, showing the distribution of the regression residuals, looks to be a fairly classic Gaussian distribution.

This seems to support a result that Willis Eschenbach reported a couple of years ago: that despite their immense complexity, all models are really doing is adding random noise to a linear trend. As I already commented above, it seems that despite all the huffing and puffing and inordinate investment of time and money we are still stuck with the naive paradigm of “trend” + “noise” of 30 years ago.

Greg, I cited your comment in CLB about the complex non-linearity of the feedback variables. I had a similar thought in a comment here 4 days ago, but you make a much better authority. You should post a composition of your arguments on CLB. It will sit in moderation if it is your first post and go live within 12 hours. I am wondering if you can tell me how the error span on the GMT projection graphs gets calculated. Is the M&F paper saying that the error gap is wide enough now to accommodate the pause, or does their result necessitate widening the error wedge now plotted?

As part of an exchange with ATTP several days ago, (above at https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751228) I speculated about the climate forcing/feedback contribution of clouds and whether it might, in some fashion, contaminate M&F’s analysis of feedback sensitivity (and simultaneously the model calculations of forcing).

It appears that Piers Forster got there first. He was co-author of “Cloud Adjustment and its Role in CO2 Radiative Forcing and Climate Sensitivity: A Review.” http://www.see.ed.ac.uk/~shs/Climate%20change/Geo-politics/IAGP/Forster%20cloud%20adjustment.pdf

Forster, et al., discuss the concept of non-feedback cloud adjustments to radiative forcing which are distinct from the aerosol-induced cloud formation forcing that is always included in climate models. The paper also mentions that at least a few models now account for various estimates of these additional cloud adjustments (non-aerosol forcing).

Even after reading his interesting review, I’m still uncertain whether it matters with regard to my original speculation. But I wanted to post the link here in case anyone else cared to investigate cloud feedback/forcing estimates further.

In addition, M&F (2015) relies upon Forster, et al. (2013) (http://onlinelibrary.wiley.com/doi/10.1002/jgrd.50174/epdf ) to reconstruct radiative forcing from top-of-atmosphere radiative imbalances for a subset of CMIP5 models. Forster et al., (2013) explicitly considers rapid cloud adjustments in response to temperature changes due to initial forcing. So it appears that they considered the situation I was concerned with to some extent (at least for a portion of the CMIP5 ensemble), even though that is hard for me to determine from M&F (2015) itself. Indeed, at one point M&F state:

I seem to understand Pekka’s point. The dF value from the second equation is not fed back into the first one but rather obtained from CMIP5 data. However, I still fail to see how the analysis is not circular. If the estimated value of dF from CMIP5 is a really good estimate, that is, dF is indeed dF, and the second equation linking dF to dN and dT holds true, then the model thus obtained has an inbuilt circularity. It does not matter where the values of dF come from as long as in the context of this analysis they are still regarded as legitimately representing dF.

Pekka’s point about dF not being calculated from dN and dT seems to be of no importance since, on M&F’s main assumptions, the dF from CMIP5 (M&F’s model) and the dF from calculations using dN and dT (purportedly RomanM’s model) are one and the same.

I would think that for Pekka’s responses to RomanM to proceed he would have to demonstrate his arguments in mathematical form as RomanM has done in showing the effects of circularity.

Kenneth,

There’s no extra mathematics. That’s exactly the point. Nic and Roman have made extra assumptions that make the model more complicated. They have introduced circularity, when there isn’t any.

All input to the calculation of ΔF is from the CMIP5 database. It never changes, and therefore does not create any extra mathematics on top of the basic defining formula.

I have presented the basic formulas very many times, and so have Nic and Roman, but they have not stopped there; they have added the circularity.

I have had more than one request to present the formulas.

They are just these two

ΔT = a + b ΔF + c α + d κ + e, (1)

which is the regression model with e as residual, and

ΔF = ΔN(CMIP5) + α(CMIP5) ΔT(CMIP5), (2)

which is used only once for every CMIP5 model run included and every period, using ΔN, ΔT, and α determined from the CMIP5 database. (The label CMIP5 in the formula is added to emphasize that.) This step is done in Forster (2013).

The first equation is analyzed in the M&F paper by first estimating the coefficients by OLS and then calculating each of the components, as well as the residuals, for use in the graphics and other reported results. Actually, the values of all variables are fixed throughout the whole calculation; when the formula is written as in (1), only the regression coefficients and residuals are output values determined by the process.

The main point that differs from Nic’s arguments is that the only consistent way of using (2) is to use it only once, before the analysis. It’s not expected to be valid when ΔT is not the full value from the same model run as ΔN. Therefore it cannot be used with a value of ΔT estimated from the regression formula.

So you are saying that equation (1) is NOT ΔT(CMIP5) = a + b ΔF + c α + d κ + e.

OK, I’ll bite. How is the ΔT in equation (1) different from the ΔT(CMIP5) in equation (2)?

No, I wrote

Actually, the values of all variables are fixed throughout the whole calculation. Thus it is the same value when the residual is included in the formula. The formula without the residual could be used to predict the value of ΔT for any chosen value of ΔF, α, and κ. Thus the formula is valid without the residual for any value within a range bounded by some limits of applicability. In that sense it differs from (2), which cannot be used at all for values other than those from the CMIP5 database, because the full ΔT including “weather” is available for those values only.

Is this climate science™ statistics?

Roman,

I have realized that I could have written the formulas more systematically, essentially as you wrote in your reply to Frank, or just copying the formula (4) and the preceding unnumbered formula from the paper of M&F. (The latter is also formula (3) in Nic’s post.)

That does not, however, change the content, which is very simple and straightforward.

Pekka,

Your comment posted Feb 14, 2015 at 5:01 PM seems to a) not answer Roman’s question and b) be impossible for me to understand. Please clarify: is ‘ΔT(CMIP5)’ in equation #2 the same as ‘ΔT’ in equation #1, or not?

Hi Pekka, I agree with you that if the forcing derived at each time interval and for each model is identical to the forcing used in each realization (model run) of that model, whose feedback settings were also identical, then the T would cancel symmetrically. Is that what you are saying?

The forcings are in that analysis always the ones extracted from CMIP5 data.

The regression formula could be used also for other externally specified forcings, but the analysis of M&F does not involve such use.

I may try to clarify some points later, but not anymore tonight.

I just posted this on CLB:

Although there is disagreement on whether avoidance of statistical orthodoxy can be excused by the nature of the physics being represented, as I believe some are saying, I think it is universally agreed to be important that the physics be accurately represented mathematically. It has been pointed out by Greg Goodman at https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751840 that the feedbacks in the models are not linear relationships. The ocean surface heat flux is known to oscillate, as well as to become less responsive as temperature equilibrium with the air is approached. In addition, the change in TOA imbalance (TOAI) is itself non-linear, as shown in the NASA graph at the link. Indeed, this complex relationship made the derivation of the forcing a difficult task, as the authors state. And it is another point of uncertainty as to whether all the model assumptions and the authors’ interpretations were correct.

Do the authors not agree that the models, complex as they have become, still do not approach the complexity of nature? In fact, the authors conclude that this unknown portion, dubbed “natural variability,” is dominant over the 15-year period. But isn’t it true that the models have been constructed with 56 groups of guesses based on trying to duplicate the behavior of GST over the past 150 years? And, as Nick Stokes pointed out, since most of the models have the ocean oscillations out of phase with each other, isn’t the result basically a linear guess of forcing mixed with an artificially generated amplitude of noise? Are we to understand that the purpose of this paper, and of all the work being done to analyze its validity, is in the end to see whether the models have had enough noise added or need more?

While it was I that pointed it out here, the credit for this observation should go to Paul_K who originally made the point and did the work:

http://rankexploits.com/musings/2012/the-arbitrariness-of-the-ipcc-feedback-calculations/

There’s a band waiting in the wings that can bring all the cowbell you might desire.

==============

I think I understand Pekka to be saying that it is sufficient to show that without “feedback” changing any variable values during the regression there is no circularity or maybe that there remains circularity but with no or little effect on the regression results – I am not sure which.

As a layperson, I was not aware that the values of the variables could change once a regression calculation commences (without some kind of iteration process), and that would be the case with or without circularity. The damage from circularity would appear to me to have been done before the regression is calculated. Are there any examples in the literature that pertain to this particular discussion?

I found this one at a website, referring to the inflated R value for regressions involving GDP on both sides of the regression equation:

https://statswithcats.wordpress.com/2011/04/24/regression-fantasies-part-ii/
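A generic illustration of the kind of inflation described at that link (this is not the article's GDP example; the variables and sample size are hypothetical): regressing a variable y on an independent variable w gives an R² near zero, while regressing y on (w − y), which contains y itself, gives an R² near 0.5 when y and w have equal variance, purely because y appears on both sides.

```python
import numpy as np

def r_squared(y, x):
    """R^2 of an OLS fit of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 10_000
y = rng.normal(size=n)   # hypothetical regressand
w = rng.normal(size=n)   # hypothetical variable, independent of y

# y on w: no relationship, so R^2 is close to 0.
print(round(r_squared(y, w), 3))
# y on (w - y): y sits on both sides, so R^2 is close to 0.5.
print(round(r_squared(y, w - y), 3))
```

The 0.5 follows from corr(y, w − y) = −1/√2 for equal variances; it says nothing about any real relationship between y and w.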

Kenneth,

That’s a correct interpretation of what I have written.

There’s no circularity of the type discussed by Nic, if we assume that the difference ΔF – ΔN is determined by the actual surface temperature of the CMIP5 model run. This assumption is physically natural as an approximation and studied by Piers Forster and his collaborators in many earlier papers. The relationship has been found very good in some models, worse in some others (the observations I know are based on earlier CMIP3 runs, Forster and Taylor 2006).

It’s quite possible that using both the observed and the predicted temperatures would explain the difference slightly better. If that’s a real phenomenon (not only a spurious signal that comes out of every regression at statistically insignificant level) we would have a feedback related to this correction. Such a feedback would almost certainly be so weak that it adds little to the other uncertainties of the approach.

It’s worth noticing that Piers Forster is a real expert on these issues. He has already studied most of the issues we might propose as amateurs in this field. A real expert may make misjudgments, but amateurs are much more likely to make them.

Pekka,

Your claim “There’s no circularity of the type discussed by Nic” is plain wrong. One day, I am sure, you will realise that. There is no point in your just repeating your claims based on approximate physically-based relationships.

There clearly is an element of circularity of the type I have pointed out. But it is certainly possible that other shortcomings in the methods used, of which I pointed out several in my article, may be equally or even more important. The decision to analyse overlapping 62 year trends rather than the trend over the entire analysis period also makes a significant difference.

Piers Forster is indeed an expert in forcings. However, I do not think he would claim to be an expert in statistical methods.

Nic,

Defining the problem is an issue for the subject science, in this case climate science and its subfield of climate modelling. Solving the problem is an issue for the method science. In this case the subject science defines the problem in such a way that there isn’t any feedback. That’s explicitly true. When you follow the steps, circularity never enters.

I have explained the point at which you erroneously introduce circularity into the calculation.

You define your own problem, which is not the same one that M&F analyze. As experts in the subject science they have justified their choices. You refer to statistics, but statistics has nothing to say on this point. Referring to it is a moot argument.

You have not presented a single argument to show that their choice is not well justified on that issue. You have discussed other issues that I also consider relevant issues and that Marotzke and Forster have not contested.

Would it be possible to set up simulations with synthetic data to quantify any effect here? That would be much more useful than appeals to authority such as

Nature and mathematics can confound even the most erudite of experts.

Tom,

Only the full climate models themselves can tell reliably about their properties. Because new model runs are possible, and because it is possible to collect more data from the new model runs, it is possible, in principle, to find the answer. In practice this is probably not a sufficient reason for making those model runs.

Climate models will be run again, and more data will be collected. There are surely many proposals for the additions to the set of data that gets collected from the next round of model runs.

I was considering the significance (or lack of it) of the circularity and of the approximation used to separate α and κ. Some have claimed these effects are significant and others that they are trivial. It would be useful to see a quantitative analysis, concentrating on the mathematics used in the M&F paper, that would clarify whether the M&F procedures are able to produce useful results.
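One way to approach this is a toy synthetic ensemble, sketched here under loudly stated assumptions: the parameter ranges, the model and run counts, the forced response ΔT = ΔF/(α+κ), and the `alpha_error` knob are all hypothetical choices, not M&F's setup. Forcing is diagnosed as ΔF = ΔN + α_used·ΔT, the sign convention of Pekka's equation (2) earlier in the thread. When α_used equals the α that generated ΔN, the diagnosed forcing equals the true forcing exactly and carries no internal variability; with a mis-estimated α it picks up a term (α_used − α)·ΔT, noise included. The regression residual can then be compared with the known noise.

```python
import numpy as np

def synthetic_ensemble(n_models=18, runs_per_model=4, alpha_error=0.0,
                       noise_sd=0.1, seed=1):
    """Hypothetical toy ensemble; every number here is illustrative."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.6, 1.8, n_models)    # feedback per model
    kappa = rng.uniform(0.4, 1.0, n_models)    # ocean heat uptake efficiency
    forcing = rng.uniform(1.5, 2.5, n_models)  # model-specific forcing trend
    m = np.repeat(np.arange(n_models), runs_per_model)  # model index per run
    f_true = forcing[m]
    noise = rng.normal(0.0, noise_sd, m.size)  # internal variability per run
    dT = f_true / (alpha[m] + kappa[m]) + noise
    dN = f_true - alpha[m] * dT                # consistent with dF = dN + alpha*dT
    alpha_used = alpha[m] * (1.0 + alpha_error)  # possibly mis-estimated alpha
    dF = dN + alpha_used * dT                  # diagnosed forcing
    return {"dT": dT, "dF": dF, "f_true": f_true, "alpha": alpha[m],
            "alpha_used": alpha_used, "kappa": kappa[m], "noise": noise}

def fit_and_residual(d):
    """OLS of dT on an intercept, diagnosed dF, alpha_used and kappa."""
    X = np.column_stack([np.ones_like(d["dT"]), d["dF"],
                         d["alpha_used"], d["kappa"]])
    coef, *_ = np.linalg.lstsq(X, d["dT"], rcond=None)
    return d["dT"] - X @ coef

for err in (0.0, 0.2, -0.2):
    d = synthetic_ensemble(alpha_error=err)
    r = fit_and_residual(d)
    print(f"alpha_error={err:+.1f}: residual sd={r.std():.3f}, "
          f"true noise sd={d['noise'].std():.3f}, "
          f"corr(residual, noise)={np.corrcoef(r, d['noise'])[0, 1]:.2f}")
```

Because the forced response in this toy is not linear in α and κ, the residual contains some misfit even when α is exact, so the comparison across `alpha_error` values, rather than any single number, is the informative part.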

Tom,

My answer applies fully to that. Assuming (counterfactually) that all the required model runs had been done and all relevant data collected, it’s possible to figure out whether the difference ΔF-ΔN correlates well with the ΔT from the same model runs (ΔT(CMIP)) or has a better correlation with the ΔT predicted by an appropriate regression model (ΔT(Regr)) derived from the model runs.

If it correlates well with ΔT(CMIP) and adding ΔT(Regr) as an additional explanatory variable does not significantly improve the ability to predict ΔF-ΔN, then the additional calculations prove that M&F are right and no circularity is observed. In the opposite case circularity seems to be present.

The earlier work that Piers Forster and others have done indicates that their assumption is justified. It does not prove that it’s the best that could be done, but lacking further information it’s the natural choice to make. There isn’t any justification for modifying the model in the way Nic has done. That’s actually an extreme modification, where only the predicted ΔT is taken into account, and that extreme modification is almost certainly worse than the assumption of M&F.
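The check Pekka describes can be sketched on synthetic data. Assumptions, all hypothetical rather than taken from M&F or CMIP5: each run has its own α and κ; ΔT(CMIP) is a forced response plus noise; ΔT(Regr) is the fitted value from an M&F-style OLS of ΔT on ΔF, α and κ; and, under the ΔF = ΔN + αΔT convention, the candidate explanatory variables for ΔF − ΔN are α·ΔT(CMIP) and α·ΔT(Regr). Case 1 generates ΔN from the run's own temperature (the M&F assumption); case 2 generates it from the predicted temperature (the alternative Pekka attributes to Nic).

```python
import numpy as np

def ols(y, *cols):
    """OLS of y on an intercept plus the given columns; returns (coef, fitted, R^2)."""
    X = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - np.var(y - fitted) / np.var(y)
    return coef, fitted, r2

rng = np.random.default_rng(2)
n = 200                                   # hypothetical number of runs
alpha = rng.uniform(0.6, 1.8, n)          # hypothetical feedback per run
kappa = rng.uniform(0.4, 1.0, n)          # hypothetical ocean heat uptake efficiency
dF = rng.uniform(1.5, 2.5, n)             # hypothetical forcing trends
dT_cmip = dF / (alpha + kappa) + rng.normal(0.0, 0.1, n)  # forced part + variability

# dT(Regr): fitted values of an M&F-style regression of dT on dF, alpha, kappa.
_, dT_regr, _ = ols(dT_cmip, dF, alpha, kappa)

def pekka_test(dN):
    """Regress dF - dN on alpha*dT(CMIP) alone, then add alpha*dT(Regr)."""
    y = dF - dN
    _, _, r2_cmip = ols(y, alpha * dT_cmip)
    coef_both, _, r2_both = ols(y, alpha * dT_cmip, alpha * dT_regr)
    return r2_cmip, r2_both, coef_both

# Case 1: dN built from the run's own temperature (M&F assumption).
print(pekka_test(dF - alpha * dT_cmip))
# Case 2: dN built from the regression-predicted temperature.
print(pekka_test(dF - alpha * dT_regr))
```

In each toy case the comparison simply recovers the variable used to generate ΔN (coefficient near 1 on that term and near 0 on the other). Real model output would satisfy neither relationship exactly, so in practice the question would be which term explains ΔF − ΔN better, not which explains it perfectly.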

Hi Pekka

Are you basically saying that T actual doesn’t appear anywhere in the derivation of F (right back through Forster and into the model outputs used), so there is no circularity when T actual gets to be used by M&F to estimate internal variability?

HAS,

The values of T given in the CMIP5 database are used to calculate F. When that’s done F is fixed as if it had been listed in the database and taken directly from there. No calculated value of T enters at any stage in the determination of F, only the one taken from the database.

Another related issue is that each value of N is used only once, when F is determined, and that was done in Forster (2013). N enters nowhere in M&F, and that’s as it should be.

Pekka

I think you answered “yes”?


So what they find is that models make such a bad job of reproducing anything on a scale of 15 years or less that the errors swamp anything else. This is not particularly surprising since this is mostly injected noise to make the output look more “climatey” in the absence of any meaningful short-term modelling of climate.

They have already noted that error swamps the shorter records, so including this in “either”, though semantically accurate, is misleading. The shorter records are not informative of anything except their own lack of value.

For the longer records, this simply reflects the degree of selection bias that is present. Models that are presented in CMIP have all been pre-selected and tweaked to reproduce the climate record as well as they can. But there is nothing here that the orthodoxy would regard as “perturbed physics experiments”.

That there is “no traceable imprint” of the model climate feedback simply demonstrates that the range of values used in the sub-selection of models studied is so restrained as to be dominated by other variability and errors in the models.

This in no way proves or even suggests those values are correct.

It does show that there is such a degree of selection bias in the models chosen that they are totally uninformative about the significance of their climate sensitivity on any time scale. Since the main point of interest is CS, this means that the CMIP group of models is totally uninformative on this question and thus not fit for that purpose.

Quoted from the last paragraph of the Max Planck news release: “The community of climatologists will greet this finding with relief, but perhaps also with some disappointment. It is now clear that it is not possible to make model predictions more accurate by tweaking them — randomness does not respond to tweaking.”

A robust science community would have appealed to these model experts to declare a re-evaluation of the (5-95%) error band, not that it has not been a moving target, and it will continue to be one as the models are improved (hopefully based on better physics rather than emulation). But as this study has simply expanded the uncertainty window by an undefined amount, it provides no value (IMO).

Yanis Varoufakis talking about mathematical economics:

Yanis Varoufakis also talking about mathematical economics:

As I put all the data I have into shape to duplicate, as best I can, the work of M&F, I note from past analysis that the deterministic part of the temperature series (secular trends) for CMIP5 models tends to be very much alike for duplicate runs of the same model. The trends from duplicate runs of the same individual CMIP5 model would be very much the same for the deterministic part, while these runs can produce very different trends from the noise component of the temperature series (natural variations, as termed by M&F). Given the use of duplicate runs in the M&F analysis, I am wondering what effect these differences between the deterministic and noise components within a model would have on the results.

Kenneth,

Using the same model and same input data on all externally determined factors that cause forcings should result in the same predicted temperatures and same estimated forcings (ΔF) if the method works perfectly.

Thus your observation seems to be a partial confirmation of the correctness of their approach and evidence that there isn’t circularity that would result in erroneous results.

This would be the case for any regression. It cannot confirm anything.

Apparently you are not aware that the *correctly* done analysis also has the same property. This should have been obvious due to the fact that the same sum of squares is being minimized with respect to the same variables.

RomanM,

Let’s have a closer look.

We have two model runs based on the same external input, but with different internal variability, i.e. with different residuals, different ΔT and different ΔN.

As the calculated temperatures are nearly the same, and the regression coefficients are common, we can conclude that also the ΔF values are essentially the same. As the internal variability and residuals are different, we know that the observed ΔT values and ΔN values are different in such a way that their differences cancel in the calculation of ΔF.

That’s exactly the no-circularity M&F case. That case is not consistent with the assumption that the difference ΔF-ΔN is determined by the predicted ΔT.

Roman:

First, a compliment…you are patient and competent and articulate.

Now, if the concept that the same variable shouldn’t show up on both sides of the equation in a regression scheme doesn’t strike one as…incorrect, then where can you go with the dialogue?

People who should know better, who would know better, feel it’s OK to violate the basic assumptions that form the foundation for statistical analysis.

The concept of ensemble means, as if the models were independent…

Using linear regression to model nonlinear functions….

Using decentered PCA,

looking at paleoproxies right side up, or upside down, depending on what gives you a better R-squared,

these are the hallmarks of this field today.

It’s really no different from marketing derivative investments made up of portfolios of subprime mortgages, and arguing that the performance of all those sh#tty pieces of paper was actually independent.

One violates the basic assumptions necessary for linear regressions at one’s own risk, or in this case our own risk.

The same *variable* on both sides leads to an equation that must be solved for that variable. The same constant value on both sides leads to nothing special.

For this analysis, and for the physical model taken as the starting point, we have the same constant values on both sides during the task of determining the regression coefficients. That the constant values are marked by a symbol does not make their values vary as in an equation to be solved.

In the phase of producing predictions from the resulting formula, we have ΔF directly on the right hand side. ΔN does not enter, nor does the *variable* ΔT appear on the right hand side. In this phase ΔT appears only on the left hand side.

Pekka, aren’t you surprised that the authors of the paper are not here nor at the lab defending their work?

pekka

I appreciate your engagement on this issue… we just don’t see eye to eye, as it were…

delta T isn’t a constant, it’s a variable… used to compute an approximation of F, and then used on the left side of the equation in a regression scheme.

that you can’t, or won’t see this, is puzzling, not only to me, but to others who still teach in the field we are discussing.

Pekka:

You have me confused. By predicted dT I presume you mean the fitted values? It has never been suggested that the fitted values determine the dT used in Forster’s derivation of dF, has it? What has been suggested is that model dT is regressed on a linear function of model dT. Am I missing something?

Pekka, Thanks for your herculean devotion to help here. Just a few questions:

Wasn’t the difficult task, and accomplishment, of Forster (2013) the diagnosis of F, which required not just considering the TOA imbalance supplied by the model input spec. but also making adjustments to F to account for the feedbacks? I think the confusion is that Forster (2013) tried to strip away the feedbacks from what was labeled AF (adjusted forcing) to get back to RF (radiative forcing). If he had not, there would have been complete circularity. To the extent that he was successful, this could perhaps aid the 2015 paper’s objective (which is puzzling IMO). Now, to the extent that Forster fails in his approximation of true RF, that amount gets contributed to “natural variability,” which is not so bad except that that is exactly what Forster 2015 is trying to quantify. I think if you clarify this it will answer a lot for a lot of people. Here is the link to Forster (2013); the forcing issue is on lines 40-60ish. http://www.atmos.washington.edu/~mzelinka/Forster_etal_subm.pdf

I try to explain once more, how the M&F approach proceeds, and what is the physical assumption that makes it free from circularity. It is, indeed, dependent on an assumption or hypothesis, but this hypothesis is justified by physical understanding and earlier research.

The alternative that Nic proposes is based on a different hypothesis, and this different hypothesis lacks comparable support, and is actually not consistent with observations, including evidently also the calculations of Kenneth.

==========

We start with the model run ensemble CMIP5. The essential part of that ensemble consists of 75 model simulations of the climate history done using 18 different models. That includes 10 simulations from two models, fewer from the rest, and only one from 4 models. In all model runs, input data on the factors that cause forcings are used to make the history correspond approximately to the real history of forcings.

Model results from other calculations done with the same models (like determining what results from quadrupling the CO2 concentration) have been analyzed to determine parameters α and κ for each model. These parameters are equal for each model run of the same model.

Now we switch to the history runs. Some of the calculated values are collected and stored in the CMIP5 database. For the 75 model runs included in this analysis, those values include surface temperature T and TOA imbalance N, but they do not include forcings. (Other models were dropped due to missing data.) Forster et al. determined Adjusted Forcings (called Effective Radiative Forcings, ERF, in M&F). These forcings can be approximately determined from the TOA imbalance by subtracting the influence of warming of the surface relative to a reference period. The warmer the surface is at the moment, the more it radiates. That doesn’t depend on the cause of the temperature: internal variability matters just as much as longer-term trends. Therefore the right subtraction is based on the actual temperature found in the database. The coefficient that applies to this calculation is α:

F = N – αT (1)

Here the values are deviations from the reference period, where all are 0 by definition.

It’s essential to notice that the logic of this subtraction requires that T is the real temperature at the moment, as stored in the CMIP5 database. It cannot be changed or recalculated by the regression model without making the formula fail. The formula is not exact, but it definitely fails badly if a value of T that’s strongly influenced by internal variability is replaced by the estimated average value for that particular time without the contribution of the variability. Emission does not know about some average; it’s determined by the actual temperature. I emphasize this point so much because it is the source of the controversy here.

Now it is possible to start the regression analysis. M&F present the hypothesis that the average surface temperature excluding influence of internal variability varies in the different models so that the trend of the model i over a given period j can be estimated from the linear formula

ΔT[i,j] = a[j] + b[j]ΔF[i,j] + c[j]α[i] + d[j]κ[i] (2)

We see from the indices that the coefficients a, b, c, and d are the same for every model, when the period is the same, but different for each period. α and κ are different for each model, but the same for every period. ΔT and ΔF depend on both the period and the model.

This formula gives a prediction for the temperature trend: the expected average temperature for the case, if the model is correct. The coefficients a, b, c, and d are determined by the requirement that the combined deviation of the predictions from the observed values of the database be as small as possible. In practice this means that the sum of squares of the deviations of the predictions from the observed values is minimized (this is OLS, ordinary least squares, analysis). I denote the values found by capital letters A, B, C, and D. Now we can calculate for each case the predicted value

ΔTpred[i,j] = A[j] + B[j]ΔF[i,j] + C[j]α[i] + D[j]κ[i] (3)

We can also estimate, how much internal variability has contributed to the observed values as the difference

ε[i,j] = ΔT[i,j] – ΔTpred[i,j] (4)

where ΔT[i,j] is the observed value.

Where is the circularity? It’s not anywhere in this calculation, and this calculation is correct when it is assumed that forcings are determined by formula (1) using the real observed temperatures, as I argued they must be. Nic introduced circularity by assuming that forcings must be redetermined from the TOA imbalance using the predicted temperatures. When that assumption is put into the formulas, ΔTpred[i,j] occurs on both sides of the formula that replaces formula (3), but that’s wrong. Forcings must be calculated using observed temperatures that contain the contribution of internal variability and that correspond to the same physical case as the value of N, not from predicted temperatures that contain only the average that didn’t really occur, and that would have changed N if it had occurred.
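A minimal sketch of the per-period regression in equations (2) to (4) of Pekka's comment, assuming ΔT and ΔF are arrays indexed [run, period] and α, κ are per-run arrays (repeated for runs of the same model). ΔF is treated as fixed input, as Pekka describes; the function name `mf_regression` and the synthetic data in the usage lines are hypothetical, and this is not M&F's own code.

```python
import numpy as np

def mf_regression(dT, dF, alpha, kappa):
    """Per-period OLS: dT[i,j] ~ a[j] + b[j]*dF[i,j] + c[j]*alpha[i] + d[j]*kappa[i].

    dT, dF: arrays of shape (n_runs, n_periods); alpha, kappa: shape (n_runs,).
    Returns coefficients (A, B, C, D) per period, shape (n_periods, 4),
    and residuals eps with the same shape as dT.
    """
    n_runs, n_periods = dT.shape
    coefs = np.empty((n_periods, 4))
    eps = np.empty_like(dT)
    for j in range(n_periods):
        X = np.column_stack([np.ones(n_runs), dF[:, j], alpha, kappa])
        sol, *_ = np.linalg.lstsq(X, dT[:, j], rcond=None)
        coefs[j] = sol
        dT_pred = X @ sol                 # equation (3)
        eps[:, j] = dT[:, j] - dT_pred    # equation (4)
    return coefs, eps

# Hypothetical usage: 6 models x 3 runs, 5 periods.
rng = np.random.default_rng(3)
n_models, runs_per_model, n_periods = 6, 3, 5
model = np.repeat(np.arange(n_models), runs_per_model)
alpha = rng.uniform(0.6, 1.8, n_models)[model]  # same alpha for runs of one model
kappa = rng.uniform(0.4, 1.0, n_models)[model]
dF = rng.uniform(0.0, 0.5, (model.size, n_periods))
dT = dF / (alpha + kappa)[:, None] + rng.normal(0.0, 0.05, dF.shape)
coefs, eps = mf_regression(dT, dF, alpha, kappa)
print(coefs.shape, eps.shape)  # (5, 4) (18, 5)
```

Because the coefficients are fitted separately for each period, A[j] to D[j] can differ between periods while α and κ stay fixed per model, matching the index structure described above.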

“Where is the circularity?”

In equation 1.

M&F use primed variables to indicate deviations from the ensemble mean, and they regress on those values. Do you agree? Not that it changes the circularity issue.

The circularity is obvious if you substitute for ΔF[i,j] using the Forster et al. formula, ΔF[i,j] = ΔN[i,j] – αΔT[i,j], which is equivalent but uses your clearer notation.

Now when you calculate the error:

ε[i,j] = ΔT[i,j] – ΔTpred[i,j]

The actual calculation is:

ε[i,j] = ΔT[i,j] – (A[j] + B[j]{ΔN[i,j] – αΔT[i,j]} + C[j]α[i] + D[j]κ[i])

In other words, the predicted ΔT depends on the modeled ΔT, and the calculated ‘error’ comes from subtracting a linear function of the modeled ΔT from itself. In short, it is circular.
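Expanding Steve's substitution one more step, and writing the feedback as α_i to match the C[j]α[i] term (an indexing assumption), makes the shared ΔT term explicit. With the ΔF = ΔN + αΔT form used earlier in the thread, the factor (1 + B_j α_i) becomes (1 − B_j α_i). This is only an algebraic rearrangement; by itself it does not settle whether the dependence biases the estimates, which is the point in dispute.

```latex
\begin{aligned}
\varepsilon_{ij} &= \Delta T_{ij} - \bigl(A_j + B_j\,[\Delta N_{ij} - \alpha_i\,\Delta T_{ij}] + C_j\,\alpha_i + D_j\,\kappa_i\bigr) \\
&= (1 + B_j\,\alpha_i)\,\Delta T_{ij} - A_j - B_j\,\Delta N_{ij} - C_j\,\alpha_i - D_j\,\kappa_i
\end{aligned}
```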

Steve,

I did my best to explain that it’s *wrong* to substitute for ΔF[i,j] a formula that contains any ΔT[i,j] other than that obtained directly from the database. The database contains the value of ΔT[i,j] that gives the best estimate of the difference between ΔF[i,j] and ΔN[i,j].

Because that substitution was done already in Forster (2013) and can never change, the whole formula is not needed in M&F. Picking the value ΔF[i,j] from there is all that’s needed.

Whether or not you explicitly write delta F as a function of delta T, does not the variation of delta T occur in delta F? And what’s worse, do not random errors in the estimate of delta T propagate into delta F? I had a correspondence with ATTP at CLB in which he asserts that delta T is statistically independent of delta F, but I do not understand the assertion at all.

I would tend to agree that there is no pernicious circularity here if I understood that dF is uncorrelated to dT, possibly because dF = dN – adT asserts some physics identity that necessarily cancels the covariance of dT with dF. Even then, in the presence of measurement error, the equation itself will create covariance between dT and dF.
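The last point can be checked numerically. In this hypothetical sketch (all values invented), an error e is added to ΔT before diagnosing ΔF = ΔN + αΔT. The error then enters both the regressand ΔT + e and the diagnosed forcing ΔF + αe, adding roughly α·var(e) to their covariance. Under the ΔF = ΔN − αΔT convention used by Steve, the induced covariance would have the opposite sign.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
alpha = 1.2                                  # hypothetical feedback parameter
dF_true = rng.normal(2.0, 0.3, n)            # hypothetical forcing trends
dT_true = dF_true / 2.0 + rng.normal(0.0, 0.1, n)  # hypothetical response + variability
dN = dF_true - alpha * dT_true               # consistent with dF = dN + alpha*dT
err = rng.normal(0.0, 0.05, n)               # hypothetical error in dT
dT_obs = dT_true + err
dF_diag = dN + alpha * dT_obs                # equals dF_true + alpha*err

cov_true = np.cov(dT_true, dF_true)[0, 1]
cov_obs = np.cov(dT_obs, dF_diag)[0, 1]
# The extra covariance is close to alpha * var(err), created by the shared error.
print(cov_obs - cov_true, alpha * err.var())
```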

rwnj,

Many kinds of correlations and relationships between variables are present in this analysis. Correlations that relate ΔT to ΔF, α and κ are the subject of the study.

ΔN is not part of the physical model being studied and not the subject of this study. It entered only in an earlier step, in another paper (Forster et al. 2013), where the value that ΔF has in the CMIP5 model runs was determined with the help of the recorded values of N. The determined value of ΔF is by nature input data to the M&F regression analysis, as are α and κ. All this input data is determined through procedures that involve the values of surface temperature as well as other output that the CMIP5 models have produced. They are all properties of the models and model runs, and they are in no way dependent on the results of further analysis done using them as input. The M&F paper published in 2015 does not change the results of Forster et al. 2013.

There’s no circularity that goes back from 2015 to the earlier results of 2013 or tells about any needs to modify those earlier results. Circularity would mean that those earlier results must be modified.

Pekka:

Where has it been suggested that some other dT[i,j] that is not “obtained directly from the database” is being used to derive dF? Does your argument come down to this? I believe this goes back to Roman’s question to you from the other day. How are the values for dT obtained by F13 different from the values of dT used as predictand input for running the M&F regression? It seems to me that you might be confusing the fitted values (regression output) for the predictand.

You are off the mark here. Because M&F use the derivation of dF from F13, one *can* substitute dN – αdT[i,j] for dF to show the circularity in the regression. There is no escaping the circularity. The regression uses the same dT[i,j] values obtained from CMIP5 as both predictand and predictor (by virtue of the substitution) to perform the regression.

layman… this point has been made numerous times in this thread; I fear you will also not be successful in articulating it to Pekka.

When you do the substitution you imply that the temperature in the substituted expression is used to determine F. If that temperature is from the database, nothing gets modified. If it’s something else, the model is modified and a temperature is used that does not correspond to the value of N in the same substitution. When nothing gets modified, no circularity is introduced. The other case causes circularity, but it is against the physics-based requirement that N and T must come from the same case.

Pekka

This suggests that your answer wasn’t “yes” above, you are acknowledging that T-model is used both to determine F and then again to estimate internal variability. You were using the term observed temperatures to refer to “observed in the models” not the actually observed T (my T_actual).

The problem then comes back to my earlier point that F from Forster isn’t the same as N+dT (where N and T are from the models). F is an estimate from linear regression, so we have an error term to deal with. Either T is the same throughout, in which case your statement that N is only used once breaks down (it bounces all over the place), or the effective T gets modified.

In either case things are getting modified, and as you acknowledge circularity is introduced.

Correct. And basically the F13 derivation right?

This is wrong. Nothing is modified wrt the regression. The circularity concern is in the regression. You obviously disagree with me, so let me ask… again, how do the CMIP5 dT values used by F13 to derive dF differ from the CMIP5 dT values used as predictand input of the regression in M&F?

Layman, it just sank in again this morning that there is no fast-talking around the circularity. I mapped out the effect of T being different in 2013 than in 2015. It’s not good. It’s just algebra. We got blinded by statistics.🙂

Please reply at the bottom of the post on my formula derivation if you agree or need to correct anything.

-Ron

Pekka,

Thinking about your responses today, I realized that you are not understanding the circularity argument. Perhaps I can clarify.

You seem to be saying that since the values of ΔF do not change during the regression, but are fixed beforehand, there is no circularity. But that is not the argument! The circularity is in the regression that estimates coefficients for terms, one of which is an explicit function of the regression’s dependent variable!

Let me give a very basic example. Suppose you have a set of measurements Θ that represent some physical quantity. Suppose there is another physical quantity Λ that can be estimated from Θ as a result of conservation laws. To make this as obvious as possible, let’s make it a simple linear relationship:

Λ = α Θ

So you make estimates of the values of Λ_i from the measurements Θ_i and stick them in a database somewhere.

Now somebody proposes a new model that says that Θ is a linear function of several parameters, say Λ, Γ, and Ξ. The model then looks like this:

Θ = a₀ + a₁Λ + a₂Γ + a₃Ξ

You regress on the observed values for Θ and discover that the only parameter that matters is a₁. Now, according to your posts above, you can validly claim that this counts as experimental evidence that Θ is a function of Λ, since the values of Λ were taken from a database and not changed during the regression!

Unfortunately, you would be wrong, as the regression I just described is perfectly circular. The conclusion that emerged was completely fallacious and did not depend in any way on the values of Λ changing during the regression.
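fizzymagic’s extreme example can be reproduced numerically. The short Python sketch below is illustrative only: the value of α, the sample size and the unrelated regressors Γ and Ξ are arbitrary assumptions, and the data are random numbers rather than anything from CMIP5. Because Λ is built exactly from Θ, ordinary least squares recovers a₁ = 1/α with a perfect fit, although nothing has been learned about Θ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
alpha = 2.5                          # hypothetical conversion factor in Λ = α Θ

theta = rng.normal(10.0, 2.0, n)     # synthetic "measurements" Θ
lam = alpha * theta                  # Λ estimated from Θ and stored in the "database"
gamma = rng.normal(0.0, 1.0, n)      # Γ: unrelated synthetic regressor
xi = rng.normal(0.0, 1.0, n)         # Ξ: unrelated synthetic regressor

# Regress Θ on [1, Λ, Γ, Ξ] by ordinary least squares.
X = np.column_stack([np.ones(n), lam, gamma, xi])
coef, *_ = np.linalg.lstsq(X, theta, rcond=None)
fitted = X @ coef
r_squared = 1.0 - np.sum((theta - fitted) ** 2) / np.sum((theta - theta.mean()) ** 2)

print("a0, a1, a2, a3 =", np.round(coef, 6))
print("R^2 =", round(r_squared, 6))
```

The "only parameter that matters" result appears no matter what Θ physically is, because it is fixed by the way Λ was constructed.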

My example is extreme, but that is the circularity we are discussing. And it most assuredly is present in the paper.

Fizzymagic, I think you discovered the parrot is nailed to its perch. Look at the bottom of the thread, and I wouldn’t mind if you could find that great old skit on YouTube or somewhere and post it. Good night.

fizzymagic

I think the problem here is that people are asserting things (eg your penultimate para) without demonstrating them.

The simple way I’ve been trying to do that is to draw the distinction (that your example hides) that the Λ calculated by your first equation is only an estimate of the Λ used in the second. If your first equation was an identity then you could gather terms together in your second equation and happily go on your way.

It is because it isn’t an identity that it breaks down. Your first equation has an error term/residue and if you want to use Eq 1 to help with Eq. 2 you have to include it. So the system becomes:

Λ = α Θ + residue

Θ = a0 + a1(α Θ + residue) + a2Γ + a3Ξ
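The effect of the residue can be made quantitative in the simplest case. Assuming, purely for illustration, that the residue r has variance σ_r² and is uncorrelated with Θ (variance σ_Θ²), and dropping Γ and Ξ, the population least-squares slope and R² from regressing Θ on Λ = αΘ + r are:

```latex
\hat a_1 = \frac{\operatorname{Cov}(\Theta,\Lambda)}{\operatorname{Var}(\Lambda)}
         = \frac{\alpha\,\sigma_\Theta^2}{\alpha^2\sigma_\Theta^2 + \sigma_r^2},
\qquad
R^2 = \frac{\operatorname{Cov}(\Theta,\Lambda)^2}{\operatorname{Var}(\Theta)\,\operatorname{Var}(\Lambda)}
    = \frac{\alpha^2\sigma_\Theta^2}{\alpha^2\sigma_\Theta^2 + \sigma_r^2}.
```

With no residue (σ_r = 0) this reduces to the identity case, â₁ = 1/α and R² = 1; a growing residue shrinks both toward zero.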

My hope in doing this is that one doesn’t need to simply assert things that depend on a knowledge of stats to see the problem.

In my latest, lengthier comment I attempted to explain the case as clearly and thoroughly as I can. The later discussion shows that I cannot explain it any better with additional comments. I give up on trying. It’s all in that comment.

One detail that I might still try to clarify concerns misunderstandings related to the use of the words “model” and “observed”. “Model” may refer both to the original GCMs, whose results are in the CMIP5 database, and to the regression model. It seems that some of my statements have been interpreted to refer to the regression model, when they referred to the archived GCM results. “Observed” in this analysis never means observed in the real world; it means observed in the GCM results stored in the CMIP5 database. In several situations I have used the word “model” in connection with such values. I have tried to make it clear that the values are from GCM runs and thus also “observed”, but evidently failed in some cases. This ambiguity could be corrected by saying that ΔF must always be calculated from observed values in the CMIP5 database, never from anything that comes out of the regression model.

Pekka, thanks for your participation in this discussion. Obviously there is argument and disagreement, but also respect. This is a discussion thread that will be bookmarked by many.

Pekka

Regrettably, Forster (2013) calculates F from a regression model using values from the CMIP5 database. It isn’t an observation from that database.

It still isn’t clear to me whether you regard this as acceptable or not.

(I should note that F wasn’t the dependent variable in that model, N was; that adds another complication, but for the sake of exposition we can put it aside.)

I realized that there’s one crucial point that I have emphasized in some earlier comments but not in my latest long comment. That long comment is actually misleading on this point, as I used ε[i,j] as the symbol for the residual. I should have kept my earlier e[i,j] (the indices were also erroneous) and emphasized that e is not the random error or noise typical of most uses of regression, but a real physical contribution like everything else: internal variability of unknown nature that affects the values of N observed in the GCM run.

The hypothesis is that the original GCM runs produce results that can be expressed as:

ΔT = ΔT_pred(ΔF_real, α, κ) + e (1)

ΔN = ΔF_real – α ΔT + error (2)

where error is the inaccuracy of the second formula. Thus also

ΔN = ΔF_real – α(ΔT_pred(ΔF_real, α, κ) + e) + error (3)

When (2) is inserted in the formula used in Forster (2013)

ΔF_est = ΔN + αΔT (4)

we get

ΔF_est = ΔF_real + error (5)

We see that under this physical assumption the term error is left in formula (5), but otherwise the value of estimated ΔF depends only on the real ΔF. The terms that would have caused circularity cancel out at this level. This is the model of M&F.
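The cancellation in (1) to (5) can be checked numerically, together with what happens if its physical premise fails. The Python sketch below uses made-up values for α, κ, the forcing spread and the noise levels (none is taken from M&F or Forster et al. 2013), and a simple ΔT_pred = ΔF_real/(α + κ). Case A follows equations (1), (2) and (4): ΔN responds to the full ΔT, including e, so ΔF_est = ΔF_real + error exactly. Case B is a hypothetical alternative in which ΔN responds only to the forced part of ΔT; then αe leaks into ΔF_est and the regression residuals understate the internal variability. The sketch does not show which case GCMs actually follow.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 75                                   # hypothetical number of runs
alpha, kappa = 1.2, 0.7                  # hypothetical feedback / uptake parameters

dF_real = rng.normal(2.0, 0.5, n)        # hypothetical "true" forcing changes
e = rng.normal(0.0, 0.1, n)              # internal variability in dT
dT = dF_real / (alpha + kappa) + e       # eq. (1) with a simple ΔT_pred
err = rng.normal(0.0, 0.05, n)           # inaccuracy of eq. (2)

def fit(x, y):
    """OLS fit of y on x; returns (slope, residual standard deviation)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return slope, resid.std(ddof=2)

# Case A: eq. (2), N responds to the full dT (including e).
dN_a = dF_real - alpha * dT + err
dF_est_a = dN_a + alpha * dT             # eq. (4): reduces to dF_real + err
# Case B (hypothetical): N responds only to the forced part of dT.
dN_b = dF_real - alpha * (dT - e) + err
dF_est_b = dN_b + alpha * dT             # reduces to dF_real + alpha*e + err

slope_a, resid_sd_a = fit(dF_est_a, dT)
slope_b, resid_sd_b = fit(dF_est_b, dT)
print(f"true slope 1/(alpha+kappa) = {1 / (alpha + kappa):.3f}; sd of e (parameter) = 0.1")
print(f"case A: slope {slope_a:.3f}, residual sd {resid_sd_a:.3f}")
print(f"case B: slope {slope_b:.3f}, residual sd {resid_sd_b:.3f}")
```

In case A the residual spread tracks the assumed internal variability; in case B it comes out markedly smaller, which is what a genuinely circular regressor would do.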

What Pekka was trying to tell us earlier was that the ΔF, which in reality is an estimate of the actual forcing in the model, was being treated as the exact value of the forcing in the model at that stage. Because of the relationship ΔF = ΔN + αΔT, this implied that, for a fixed ΔF, a change of δ in ΔT, either in the modelling process or in the analysis procedures, must correspond to an exact change of –αδ in ΔN, thereby supposedly masking some of the circularity arising from the use of the same data in the earlier derivation of the estimating equation.

What happens if we do not ignore the error in ΔF? I will use my own notation rather than Pekka’s to minimize any confusion. Let ΔF = ΔF_real + φ, where φ (phi) is the error in estimating the actual forcing ΔF_real. Both ΔF_real and φ depend only on the specific model they come from, and it can be noted that the distribution of the φ’s may differ from model to model. Now we look at the equation used by M&F:

ΔT = a + b ΔF + c α + d κ + ε

This equation assumes that a very specific set of models is being considered. The values of a, b, c and d depend strongly on the specific models chosen, and the ε’s are supposed to be only the “internal variation” in the individual models’ ΔT’s after accounting for various characteristics of the model. If we now include the errors in ΔF and rearrange terms:

ΔT = a + b(ΔF_real + φ) + c α + d κ + ε = a + b(ΔF_real) + c α + d κ + (ε + bφ)

The “internal variation” of the models has become conflated with the error in the estimation of the ΔF’s and the resulting residuals from the regression procedure will overestimate that internal variation. The fact that the φ’s have possibly different distributions will produce heteroscedasticity (unequal variance in the random portion of the regression equation) meaning that some models may inordinately dominate the regression procedure calculations.
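The errors-in-variables effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the coefficients a and b, the spread of ΔF_real and the standard deviations of ε and φ are invented, the α and κ terms are left out, and φ has the same distribution for every model, so the heteroscedasticity mentioned above is not modelled. Regressing on the error-laden ΔF flattens the fitted slope and inflates the residual spread relative to the true internal variation ε.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400                                   # hypothetical number of model trends
a, b = 0.1, 0.5                           # hypothetical intercept and slope
dF_real = rng.normal(2.0, 0.5, n)         # hypothetical true forcing changes
eps = rng.normal(0.0, 0.1, n)             # "internal variation" in dT
phi = rng.normal(0.0, 0.3, n)             # error in estimating the forcing
dT = a + b * dF_real + eps
dF = dF_real + phi                        # the estimate actually used as regressor

def slope_and_resid_sd(x, y):
    """OLS fit of y on x; returns (slope, residual standard deviation)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return slope, resid.std(ddof=2)

slope_clean, sd_clean = slope_and_resid_sd(dF_real, dT)
slope_noisy, sd_noisy = slope_and_resid_sd(dF, dT)
print(f"regressor dF_real: slope {slope_clean:.3f}, residual sd {sd_clean:.3f}")
print(f"regressor dF:      slope {slope_noisy:.3f}, residual sd {sd_noisy:.3f}")
```

In population terms the residual variance here is σ_ε² + b²σ_F²σ_φ²/(σ_F² + σ_φ²), a little less than the σ_ε² + b²σ_φ² suggested by the rearranged equation, because OLS also attenuates the slope.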

This does not reflect well on the claimed “robustness” of the results in the paper.

HAS,

You totally, completely, absolutely missed the point. Actually, in much the same way as Pekka seems to.

I am not asserting anything that someone who has taken an introductory statistics class should not understand.

The residuals from the initial estimate have zero impact on the circularity. Zero. None. They could only be significant if you knew what they were, but you don’t, which is the entire reason for using the estimated values for (in my example) Λ.

Because you have already used the values of Θ to estimate the values of Λ, you cannot turn around and attempt to estimate (via regression) the values of Θ again from the estimated values of Λ. That is circular. It has nothing to do with the residuals or the quality of the estimates or the fact that the estimates are fixed in the regression or anything else.

This is stuff any student who has completed a basic statistics course should know. I am absolutely stunned that people who are apparently serious researchers in climate science cannot grasp it!

Roman,

I think we agree now on the technicalities. At least I understood your comment in a way that’s consistent with my thinking.

The model that M&F propose is based on hypotheses that have not been tested and probably cannot be tested without further GCM model runs. M&F have presented some justification for their hypotheses, but some justification is not equivalent to good evidence for the sufficient quantitative validity of the models.

I dispute RomanM’s idea that, because ΔF_est = ΔN + αΔT is assumed to be exactly ΔF_real, the ΔT terms are effectively cancelled on further substitution, as asserted by Pekka. The goal is to quantify the error e from the regression so as to assess internal variability. Recalling Pekka’s proposed means of cancelling the ΔT terms:

ΔN = ΔF_real – α ΔT + error (1)

Forster’s approximation:

ΔF_est = ΔN + αΔT (2)

Now,

ΔF_est = ΔF_real + e*

From the M&F standing assumption:

e* ~ 0

Thus, as used in the M&F model,

ΔF_est = ΔF_real

ΔN = ΔF_real – α ΔT + error

= ΔF_est – α ΔT + error (3)

M&F regression equation:

ΔT = ΔT_pred(ΔF_real, α, κ) + e (4)

Rearranging the RHS:

ΔT = a + b ΔF_est + c α + d κ + ε

= a + b(ΔN + αΔT) + c α + d κ + ε (5)

After determining coefficients:

X = A + BΔF_est + Cα + Dκ + ε

However, because of the obvious circularity in (5), the OLS breaks down and X, whatever it is, is not identical to ΔT.

What if OLS were still applicable in the face of circularity, so that values could be fitted? Well, Pekka’s proof would proceed unimpeded:

ΔT = X

= A + BΔF_est + Cα + Dκ + ε (6)

ΔN = ΔF_real – α(A + BΔF_est + Cα + Dκ + ε) + error

= ΔF_real – α(ΔT) + error

= ΔF_est – α(ΔT) + error (7)

In which case, using (2),

ΔF_est = ΔF_real + e* with no contribution from terms leading to circularity.

However, as hammered in since the start, the problem is mathematical and not physical. Since OLS does break down due to circularity, X is not equal to ΔT and thus the cancellation cannot proceed. Note that this is because OLS is susceptible to circularity; if another method were used, or the equations were rearranged in less problematic ways, the M&F procedure would not be subject to the circularity objection.

First of all, so that these are in one place:

Gregory 2004 which introduces the N=F+aT equation.

Forster & Gregory 2006 which expands the equation (then ignores the expanded terms) and applies it to observations of N, F, and T to get a.

Forster & Taylor 2006 which analyzes 20 GCMs.

Forster et al 2013 which analyzes CMIP5 GCMs.

Pekka, you say that Forster’s equation is “ΔFest = ΔN + αΔT”. F2013 calls “Fest” in the equation AF, or “adjusted forcing”. FT2006 called it “climate forcing”. Climate forcing in FT2006 is described as containing the forcings plus internal feedbacks, scaled with an efficacy factor. So it seems to me that the calculated F is not a physically pure estimate of forcings at TOA, but contains information about the model’s response to the true external forcings.

fizzymagic @ Feb 16, 2015 at 6:15 AM

Hi

While what you said might be obvious to you, much of what has been going to and fro on this post has been punters just asserting things. What’s needed is some demonstration of why stats says this is a problem, and the error term helps make that point explicitly. At least it has led to Pekka and Roman reaching some form of agreement, although I’m still not sure that Pekka has accepted that this represents a methodological error, rather than just that a hypothesis is as yet unproven.

And I’m sorry, but it is the error term that causes the problem: if your first equation were an identity (no error) there would be no problem doing the substitution. Think of a change in units as an example.

Equation (1):

ΔT* = a + b ΔF + c α + d κ + ε

Equation (2):

ΔF = ΔN + α ΔT**

The claim is not that fitted values of dT obtained from the regression are fed back into equation 2 to calculate dF. This is a red herring.

The claim is that, for the requisite time periods,

ΔT* = ΔT** (3)

If the identity (3) is true, then there is an obvious circularity. As an aside, I also find it self-serving that any discrepancy ε is a priori defined as internal variability. It seems that M&F are working on the basic assumption that the models are right while at the same time investigating whether they are indeed representative of the observed climate. In effect they are claiming that models = real trend + unquantified noise, as opposed to models = wrong trend + unquantified noise. They are testing whether the first proposition is possibly true, as opposed to testing whether it is necessarily true and the second one is plain wrong. This does not warrant their conclusions, imo.

andersongo,

They are not assuming that the GCMs are right; they are studying the properties of the ensemble of GCMs, which may be right or wrong.

What they are assuming is that it’s a good enough approximation to use the formula

ΔF – ΔN = αΔT

as valid for the GCMs considered.

They have support for that assumption from their earlier studies, but they acknowledge that the formula is not exact and contributes to an error.

Pekka, Andersongo, I’m afraid the only forcing involved has been that of a psychological experiment of sorts. We have failed to accept the obvious: This is a circular equation.

It doesn’t matter what else is true once the right-hand side’s value depends on the left’s and the left’s depends on the right’s. It doesn’t matter if you know you have one less fingernail on the left; you still cannot say how many fingers you have, or hands for that matter. It is the definition of a circular equation. “This is an ex-equation.” “Hello Polly!”

R Graf,

Every equation can be made apparently circular by adding terms that cancel out. Such circularity is spurious. That’s the case also here. There’s apparent circularity, but that’s spurious.

Nic added terms by inserting a formula to replace one variable. My comment above explains how a further insertion can be made into Nic’s insertion. Doing both insertions, the circularity cancels out. Thus it was spurious, not real, under the physics-based hypothesis of M&F.

Pekka, thanks for your reply. I understand what you say, and it is in fact the basis of algebra to add terms to both sides in order to help simplify. But you need at least one completely independent variable. You need a valid equation underneath. That is the rub.

R Graf,

I included in the equation the term error. I included also the term e in another place. When these terms are included, the equations are exact. We can follow, how these unknown terms affect the outcome. I have shown that.

To the extent the term error contains dependence on other variables, it may cause circularity, but that’s not the circularity that Nic has presented.

Pekka, have you looked at my algebra at the bottom of the post? There is no independent variable. Mathematics requires one.

“The first principle is that you must not fool yourself and that you are the easiest to fool.”

-R. Feynman

It’s pining for the ΔFjorcings.

Pekka @ Feb 16, 2015 at 6:10 AM (and andersongo from earlier thread)

“… they are assuming is that it’s a good enough approximation to use the formula”

Actually Forster (2013) reports the errors in the estimate of F and they are not insignificant.

“It’s essential to notice that the logic of this subtraction requires that T is the real temperature ………….where ΔT[i,j] is the observed value…….”

I have decided to go a different route with my analysis of the M&F paper result in partitioning the deterministic and noise parts of the historical temperature series. I have found that going with the original plan would require downloading the appropriate rsdt, rsut and rlut data from dkrz for the historical CMIP5 model runs. I have done this for the RCP4.5 runs and it was a lengthy and time consuming task for me. I would download the data from KNMI but I have had differences in the converted gridded data in nc files to global means between my latitude weighting and KNMI’s. I have lost contact with KNMI over the past week or so on this issue after explaining what I thought was the source of the difference. KNMI had the RCP and piControl runs in a form for easy automated downloading of the radiation variables, but unfortunately not for the historical runs.

My new plan is to decompose/reconstruct the CMIP5 historical temperature series into a secular trend, without assuming it is linear, and into cyclical and red/white noise residuals, without making assumptions that could confound the variation from the various series components. I’ll use singular spectrum analysis with functions from R. I will determine the differences in components from model to model (and/or run to run) in the manner of M&F, and on an individual model and model-run basis.
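For readers unfamiliar with singular spectrum analysis, the basic steps are: embed the series in a lagged (Hankel) trajectory matrix, take its singular value decomposition, and average each rank-one piece back along anti-diagonals into a series. The Python sketch below is a generic textbook version, not Kenneth’s R workflow; the window length, the synthetic series and the choice of the two leading components as the “trend” are all illustrative assumptions, and in practice deciding how to group components is the hard part.

```python
import numpy as np

def ssa_components(x, window):
    """Basic SSA: return one reconstructed series per singular value.
    Summing all rows reproduces x exactly."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    counts = np.zeros(n)
    for j in range(k):
        counts[j:j + window] += 1
    comps = np.zeros((len(s), n))
    for i in range(len(s)):
        elem = s[i] * np.outer(u[:, i], vt[i])    # rank-one piece of traj
        # Diagonal averaging: entry (r, j) of elem belongs to time r + j.
        for j in range(k):
            comps[i, j:j + window] += elem[:, j]
        comps[i] /= counts
    return comps

# Synthetic annual series for 1900-2012: trend + 60-year cycle + noise (all made up).
rng = np.random.default_rng(4)
years = np.arange(1900, 2013)
x = (0.007 * (years - 1900)
     + 0.1 * np.sin(2 * np.pi * (years - 1900) / 60)
     + rng.normal(0.0, 0.1, years.size))
comps = ssa_components(x, window=30)
trend = comps[:2].sum(axis=0)     # grouping choice: an assumption, not a rule
```

Because every component is returned, the split into trend, cycles and noise is made explicitly afterwards, which is where any assumptions enter.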

“The alternative that Nic proposes is based on a different hypothesis, and this different hypothesis lacks comparable support, and is actually not consistent with observations, including evidently also the calculations of Kenneth.”

My calculations only show that for a given model with multiple runs the net TOA radiation outputs (accumulation rate or trends) are very much the same as well as the trends in the potential global sea water temperature (OHC changes). From model to model these outputs are in most cases very different. These findings are based on analysis of RCP4.5 runs – which showed that most CMIP5 models do not balance the TOA energy budget given that most of that difference should show in the difference in OHC.

I do not see how this finding bears on your argument here. I only posed it because it is my understanding that M&F used multiple model runs where they existed.

Kenneth,

I explained in my reply to RomanM how I understood your earlier comment, and what that would imply. Now I’m confused about what you actually observed. I don’t know whether my earlier interpretation is correct or not.

In short, how I understood your earlier comment is:

1) You have repeated the analysis of M&F determining the regression coefficients.

2) You have observed that model runs based on the same model produce predicted temperature trends that match substantially more closely than the temperature trends determined from the original data of the CMIP database for the same models.

Is that correct, or did I misunderstand what you have done and observed?

No, it is not correct. My comments on what I did are as I posted above. I have not repeated M&F and in fact do not plan to do it unless I can readily get my hands on the historical forcing data. I believe SteveM has requested that data from M&F. On further thought it may not be a good idea to repeat a flawed method, except to look in better detail at the raw data used. I am going to try an alternative method (SSA) to do what M&F attempted, i.e. separate the deterministic part from the internal variability parts, and I think with fewer assumptions. Selecting principal components will be, I think, my main problem.

Interestingly, Michael Mann started his climate science career doing spectral analysis on climate series and more or less stopped after publishing the hockey stick paper.

Kenneth,

In that case all my references to your calculation can be forgotten.

My misunderstanding did, however, lead to the realization that comparing model runs that differ only by the internal variability (i.e. by the initial state, so that internal variability ends up different) would offer a test of the accuracy of the approach. It would not be a full test, but an interesting partial test anyway. I don’t know whether the model runs contain such repetitions. Alternatively, the input forcings may differ when the same model is used in several runs. That would not provide an equally easily interpretable test.
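The test described here amounts to estimating internal variability directly from runs of the same model that differ only in initial conditions. A minimal sketch, assuming such runs exist and share identical forcings: pool the within-model variance of the trends. The model names and trend values below are invented placeholders, not CMIP5 data.

```python
import numpy as np

def pooled_within_model_variance(trends_by_model):
    """Pooled variance of trends across runs of the same model.
    Models with a single run carry no information and are skipped."""
    ss, dof = 0.0, 0
    for runs in trends_by_model.values():
        runs = np.asarray(runs, dtype=float)
        if runs.size < 2:
            continue
        ss += np.sum((runs - runs.mean()) ** 2)
        dof += runs.size - 1
    if dof == 0:
        raise ValueError("need at least one model with two or more runs")
    return ss / dof

# Hypothetical 15-year GMST trends (K per decade) for three made-up models.
trends = {
    "model_A": [0.10, 0.14, 0.12],
    "model_B": [0.20, 0.24],
    "model_C": [0.18],                  # single run: ignored
}
var_internal = pooled_within_model_variance(trends)
print(f"pooled within-model variance: {var_internal:.6f}")
```

If the M&F interpretation of their regression residual as internal variability is sound, an estimate like this should be roughly comparable to their residual variance for the same trend length.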

Pekka,

I will leave to Roman and the statistics people the acceptability of performing linear regression on non-linear variables in the manner done, and the problems of having your independent variable derived earlier from essentially the same equation run backwards. But if I do not understand the following, many others do not either. My question is not Nic’s. Mine involves your assumptions when you wrote above:

“These forcings can be approximately determined from the TOA imbalance by subtracting the influence of warming of the surface relative to a reference period. The surface radiates the warmer it is at the moment. That doesn’t depend on the cause of the temperature. “

The conservation of energy equation:

F = N + αT (1)

Radiative forcing (F) = TOA imbalance (N) + temperature (T) * feedbacks (a, k, and unknown)

Forster approximates TOAI by simulating a massive (4x) dose of CO2, assuming you will see a maximum imbalance and can diagnose the relationship between T, N and the feedback components. This is his expertise. Forster employs linear regression here, and perhaps intuition and unknown abilities, to coax out the values of N and the feedbacks. Let’s remember that in real life these feedbacks are non-linear: some long-term, some short-term, some permanent but temperature dependent. And from all this he deduces F, or RF, radiative forcing.

Now, two years later, the degree of feedback and RF becomes increasingly important as the pause continues, so we need to see whether the models are still performing as programmed or whether they need to be adjusted once again, like from CMIP5 to CMIP7 (or something).

So M&F get together, use some existing calculated F values from 2013, toss a couple, and create 4 fresh ones. By the way, they add 16 new models, so I don’t understand why they didn’t need to make 16 new forcings. They do 114 runs to create new data, and now, instead of deducing all the RF and feedbacks the way Forster did, they do it by direct equation. The obvious question becomes: what is the innovation over Forster 2013? Why don’t they diagnose feedbacks the way Forster did before? Why use an equation that just oversimplifies and compounds an earlier error?

My point is that any error that is compounded will be attributed to “natural variability,” (which are unknown feedbacks.) Getting good clean numbers on natural variability seems particularly relevant since this again was a major point of the exercise.

R Graf,

My earlier comment applies to other concerns on this paper.

I’m not sure if somebody has already done this, but I put in the assumption that T from Forster 2013 is the same as in 2015, with an added error from diagnosis. This is not well behaved (IMO). See here:

F = Ta+N Forster (2013)

where F is forcing, T is temp, a is feedbacks and N is TOA imbalance.

T = F / (a+k) +e M&F (2015)

where T is temp, F is forcing, a + k is feedback and e is variability

Now, combining a, k and e as all being known or unknown feedbacks and substituting the 2013 equation into the 2015 and adding e now as the error in Forster’s diagnosis of T 2013 we have:

T = [(T+e)a + N]/a

multiplying both sides by the feedback a

Ta = Ta + ea + N

simplifying for a, to see what the Forster 2013 error does:

a = -N/e

where e must be negative only, or the feedback flips to a negative (equating to a positive physical feedback, as water vapor was thought to be).

The higher the error the lower the feedback and the anomaly will be assigned to natural variability.

As N decreases in later years, feedback will diminish and the anomaly will be assigned to natural variability.

If you keep the variability separate from the feedback, the error becomes tied to the variability, and the result simplifies to this:

V = -e - N/a, where V is variability.

Here’s the work:

T = F / (a+k) + V

T = [(T+e)a + N]/a + V

Ta = Ta + ea + N + Va

-ea - N = Va

V = -e - N/a

R Graf

“… and adding e now as the error in Forster’s diagnosis of T 2013 we have:”

I’m confused by this – in my reading of F(2013) they say:

“As in Andrews et al. [2012b], this analysis uses the CMIP5 abrupt 4xCO2 simulations and regresses N against ΔT to diagnose the 4xCO2 AF as an intercept term and a as the slope of the regression line.”

Thus the error term in Forster’s diagnosis is in AF, not T, isn’t it?
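The two-step procedure quoted from Forster et al. (2013) can be sketched as follows. This is a toy version under stated assumptions: the synthetic abrupt-4xCO2 run uses made-up forcing, feedback and timescale values, and N is exactly linear in ΔT plus white noise, which real GCM runs need not be. Step 1 regresses N on ΔT (intercept = AF, slope = −α); step 2 applies AF = N + αΔT to a forced run’s diagnostics. Written that way, any error in the diagnosed α enters the AF estimate multiplied by ΔT.

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 1: synthetic abrupt-4xCO2 run (all parameter values hypothetical).
F4x, alpha, tau = 7.4, 1.2, 20.0            # W m^-2, W m^-2 K^-1, years
years = np.arange(1, 151)
dT_4x = (F4x / alpha) * (1.0 - np.exp(-years / tau))
N_4x = F4x - alpha * dT_4x + rng.normal(0.0, 0.1, years.size)

slope, intercept = np.polyfit(dT_4x, N_4x, 1)   # regress N against dT
AF_4x_est, alpha_est = intercept, -slope

# Step 2: adjusted forcing from a forced run's N and dT diagnostics.
def adjusted_forcing(N, dT, alpha_hat):
    """AF = N + alpha_hat * dT, evaluated element-wise."""
    return np.asarray(N, dtype=float) + alpha_hat * np.asarray(dT, dtype=float)

print(f"AF(4xCO2) ~ {AF_4x_est:.2f} W m^-2, alpha ~ {alpha_est:.3f} W m^-2 K^-1")
```

The function in step 2 takes the same ΔT that later appears as the predictand, which is the overlap the thread is arguing about.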

The abrupt 4XCO2 simulations were to aid in diagnosing the radiative forcing component out of the overall adjusted forcing, which included feedbacks. So they knew that if they over-forced the model with 4XCO2, the response of the TOA imbalance (N) would be much stronger than the feedback, which is much slower to respond. They then could run regression analysis, which obviously I am less than knowledgeable about, in order to end up with radiative forcing separated from (a) alpha, which is both air and ocean feedbacks (combined for simplicity). The resulting equation: F = Ta + N

Then in 2015 M&F needed an equation that included the feedbacks and variability all broken out so they could run linear regression on them. So they went back to the energy balance and looked to see what they could do and came up with their 2015 equation:

T = F / (a+k) +e

Now the alpha (a) in the first equation is brought into the denominator of the other side, and ocean and air are separate because CMIP5 is proud they now account for the Pacific Decadal Oscillation, or to give more terms; I am not sure. I simply combined them back for simplicity and changed (e) to (V) for variability, because I am now using (e) for the error in T that occurs from diagnosing it. Even though I am sure there is nobody on the planet more qualified than Forster to do so, he could not have nailed it exactly. And since T in the equation is the observed temperature, even though it is from a model, it is different from Forster’s diagnosed value. Thus I have:

T = [(T+e)a + N]/a + V

Here I just substituted the (F) from 2013 and show (e) as the error in diagnosis. Feedback is in both numerator as part of T, like in the 2013 equation and in the denominator from the 2015 equation. The rest you can see above.

They were getting variability in all sorts of ways, but all meaningless. The equation is invalid. “It’s a dead parrot nailed to a perch” (Monty Python).

I can see in F (2013) they say they go on “to diagnose the time series for F in a transient scenario run, using diagnostics of N and ΔT. In step 2 we substitute these a terms into equation (1), using N and ΔT diagnostics from various forced scenarios to compute each model’s AF.”

I had assumed that the “N and ΔT diagnostics from various forced scenarios” would be exactly what M&F used when they say: “we determine the extent to which the across-ensemble variations of ΔF, α and κ contribute to the ensemble spread of GMST trends ΔT, using the 75-member subensemble of CMIP5 historical simulations for which radiative forcing information can be obtained from the CMIP5 archive.”

So I thought the problem is with the dF error of estimation on Pekka’s logic, not the dT.

ΔF from 2013 contains both T and N. You can place error, probably more appropriately, in both terms of F to get:

T= (Ta+N+e)/a +V

Ta = Ta + N + e +V

V = -e -V

I don’t see the parrot moving any.

They planned to use 2013 (F) on 2015 runs of mostly the same models. I think they just forgot that (T) was in (F), since they did not care about (T). Or they just thought, like some here, that if you run it again it’s a new variable.

We were just discussing the colour of the parrot.

I messed up the end of that equation.

T= (Ta+N+e)/a +V

Ta= Ta + N + e +Va

V= -(N+e)/a

still nonsense..

Pekka:

“…ΔF_est = ΔF_real + error (5)

We see that under this physical assumption the term error is left in formula (5), but otherwise the value of estimated ΔF depends only on the real ΔF. The terms that would have caused circularity cancel out at this level.”

And how is that error calculated? By regressing ΔT on a linear function of itself.

It’s not calculated. It’s not known.

One hypothesis of the analysis of M&F is that it’s so small that the results remain useful. That’s part of the hypothesis. It’s justified by their earlier work, but not proven correct.

Justified but not proven correct… Is this climate science newspeak?

Also, once again, the claim is not that ΔT is fed back from equation (1) to equation (2) to obtain an estimate of ΔF. The claim is that ΔT is regressed on a linear function of itself, since the ΔF estimate used in the regression (the real ΔF is not available because of unquantifiable error) is really a function of ΔT by virtue of ΔF – ΔN = αΔT. In this case, the α(ΔT_pred(ΔF_real, α, κ) + e) term in your equation

ΔN = ΔF_real – α(ΔT_pred(ΔF_real, α, κ) + e) + error

seems dubious, and cancellation of ΔT upon further substitution is thus not possible. Also the error e, where

e = ΔT – predicted ΔT

and which is then used as an indication of internal variability, is obviously affected by this circularity. By error I mean the error calculated from the regression and not the error term in

ΔF_est = ΔF_real + error

Note that for the circularity to hold, the M&F assumption that the error term in

ΔF_est = ΔF_real + error

must be approximately zero must be true.

Pekka says: “It’s essential to notice that the logic of this subtraction requires that T is the real temperature at the moment as stored in the CMIP5 database. It cannot be changed or recalculated by the regression model without making the formula fail.”

And by the same token, it has to be the real surface flux. It cannot be changed or recalculated from TOA by a regression model without making the formula fail.

The natural variability “error” term is assumed to be “random”, and this is a necessary condition for the regression to correctly remove it. However, this is not the case, either in the models or, even less so, in the real climate record.

Figure 1b panels b,c from M&F2015 shows the distribution of model trends and ( vertical line ) the observed HadCRUT trend.

Here we realise the “watch the pea” issue. What they are showing is that the recent over-estimation of the trend is comparable to the under-estimation of the trend in the thirties. This is presented as showing the current model failure is not significant.

To the extent that it shows the models are a failure across the board, this is worth noting; however, it does not increase our confidence that their ECS informs us about the real climate.

In discussing simply the magnitude of deviation, they are obscuring the fact that there is a systematic bias in one sense in the 30s and in the other sense since 1998, with a very limited section, 1965-1985, in between where the models have been tuned to fit reasonably well.

This would appear to be a clear indication that model ECS is exaggerated and that the conclusions of the paper are unjustified.

Here we analyse simulations and observations of GMST from 1900 to 2012, and show that the distribution of simulated 15-year trends shows no systematic bias against the observations.

…

The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded

The models fail to reproduce the early 20th-century warming, when CO2 was not significant, are tuned to fit the late 20th-century warming, and fail by running consistently hot since 1998. This is consistent with them being oversensitive to CO2.

The “innovation” of this paper is in kicking up enough pseudo-statistical dust to disguise the fact.

Sorry, the formatting seems to have been stripped off the quoted sections there. This initial para was quoting Pekka, the following is from the paper:

A sliding trend analysis like the one they are doing is mathematically equivalent to doing a 15y running mean on the rate of change.

I discussed the issues and distortions injected by running means here:

https://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/
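A quick check of the sliding-trend remark above, assuming ordinary least-squares trends over 15-point windows of a made-up annual series (113 values, matching 1900-2012 in length only). Strictly, the OLS slope over a window is a parabolically weighted mean of the year-on-year differences, with weight proportional to k(n - k) on the k-th difference, i.e. a tapered rather than flat running mean of the rate of change. The sketch confirms that identity numerically and shows that a flat mean of the differences gives different values.

```python
import random


def ols_slope(y):
    """Ordinary least-squares slope of y against x = 0, 1, ..., n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (yi - y_mean) for i, yi in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def weighted_mean_of_differences(y):
    """The same slope written as a weighted mean of first differences.

    The difference y[k] - y[k-1] (k = 1 .. n-1) gets weight k (n - k),
    and the weights are normalised by their sum, n (n^2 - 1) / 6.
    """
    n = len(y)
    norm = n * (n * n - 1) / 6
    return sum(k * (n - k) * (y[k] - y[k - 1]) for k in range(1, n)) / norm


def flat_mean_of_differences(y):
    """Unweighted mean of the n-1 differences: only the end points matter."""
    return (y[-1] - y[0]) / (len(y) - 1)


def sliding(func, series, window):
    """Apply func to every full window of the given length."""
    return [func(series[i:i + window]) for i in range(len(series) - window + 1)]


random.seed(1)
# Hypothetical annual series: a small linear trend plus noise (illustrative only).
series = [0.01 * t + random.gauss(0.0, 0.1) for t in range(113)]

trends = sliding(ols_slope, series, 15)
weighted = sliding(weighted_mean_of_differences, series, 15)
flat = sliding(flat_mean_of_differences, series, 15)

max_gap_weighted = max(abs(a - b) for a, b in zip(trends, weighted))
max_gap_flat = max(abs(a - b) for a, b in zip(trends, flat))
print(f"OLS vs parabolic-weighted differences: {max_gap_weighted:.1e}")
print(f"OLS vs flat mean of differences:       {max_gap_flat:.1e}")
```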

A significant amount of the model output is injected noise to make the time series look more climate-like. There is very little real modelling of the internal variability.

Their naive methods and their gross approximations are injecting additional noise into both the model and observed data that will be unrelated.

Any conclusions about the statistical significance of the recent deviation based on this kind of method are therefore without value.

This naive “AGW + noise” paradigm that has dominated climatology for the last 30 years has been an abysmal failure from a scientific point of view (though remarkably successful politically).

Using that same failed model to assess the success or failure w.r.t the divergence problem brings us back to my earlier quotation of Varoufakis about mythology and the delphic oracle.

https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751971

I’m just curious how long it will take everyone to notice there is no independent variable. Saying the variable on the right hand is an approximation of the left hand does not give you any information.

BTW, I noticed that the conclusion in the abstract of the 2013 paper was that feedbacks dominated forcing in the 85-year span and the opposite was true for the short span. This is 180 degrees from the direction of the claim in 2015.

Everyone here has noticed, with the notable exception of Pekka.

That is not accurate.

Even accepting Nic’s substitution, it is sufficient to rearrange the equation to find out that alpha disappears totally and it ends up as a regression of T vs N to determine kappa.
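The rearrangement can be written out in a couple of lines, assuming the simplified energy-balance form ΔT = ΔF/(α + κ) and Nic's substitution ΔF = αΔT + N, with the error terms omitted for clarity:

```latex
\begin{aligned}
\Delta T &= \frac{\Delta F}{\alpha + \kappa},
\qquad \Delta F = \alpha\,\Delta T + N \\
\Rightarrow\quad (\alpha + \kappa)\,\Delta T &= \alpha\,\Delta T + N \\
\Rightarrow\quad \kappa\,\Delta T &= N
\end{aligned}
```

The α terms cancel on both sides, leaving only a relation between ΔT, N and κ.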

Whether that is meaningful and can be compared to the kappa introduced by the authors from earlier work remains to be considered. IMO this diffusion constant is yet another fiddle parameter that can be used to make the models fit a restricted part of the climate record whilst maintaining the desired high sensitivity.

For this result to have any significance, the authors need to show that it is capable of showing a positive.

They describe this as a “novel” method, yet, as is customary in climatology, there is no analysis and verification of the method itself to establish that it is valid before using it and publishing results and conclusions.

The first thing that is necessary with a new method is to establish its validity before using it.

Getting the answer you desired or “knew” to be right before doing the study does not count as validation of a method.

We can identify three questions:

(1) Does the approach of M&F defined by their stated hypotheses involve circularity by the mechanism proposed by Nic?

(2) Are the hypotheses of M&F justified at all and not contradictory or obviously wrong?

(3) Are the hypotheses good enough to produce useful insight to the models of the CMIP5 ensemble?

My answer to the question (1) is that such circularity is not present. The claimed circularity cancels out (or in another correct formulation does not enter at all), when their hypotheses are accepted. I have explained the reasons for my conclusion in my recent comments.

To the question (2) my answer is that M&F present enough justification to pass the weak requirements of that question.

To the question (3) I do not have clear answers. I remain myself skeptical of the usefulness of the results.

The combination of my answers to (2) and (3) means also that I have doubts on the justification of publishing the paper in a journal like Nature.

Pekka,

To understand how this happened here one has to remember that the models never created information that could be used to test themselves with their own output. It really is that simple.

Given the M&F paper depends upon such shaky hypotheses I propose adding an additional error term to their equation to more accurately account for its influence. The internal variability error (epsilon) must be modified by the error of hypothesis (eta) to produce:

ΔT = ΔF / (α + κ) + (ɛ + η)

Are you saying the dead parrot is just stunned or really dead?

Although M&F defenders claim it is just pining for the fjords, this Norwegian Blue is definitely deceased. It’s bleeding demised. Passed on. Bereft of life. It’s run down the curtain and joined the choir invisible. This is an ex-parrot.

Pekka,

We were all fooled for some time. I thought for a while that there was just circularity contamination, as did you. The fundamental circularity comes from the fact that the only connection of the model to the real world is temperature. The energy balance and temperature trend were used to derive all the other components of the model, a model whose output is temperature, with the same trends used to program it. The model’s output can allow extrapolation into the future, but it does not give you more insight than you already programmed into the model. The energy balance is useless for giving information about past model runs to current ones.

Temperature cannot be both the dependent and independent variable at the same time.

R Graf:

yes.

Exactly. This is quite clear from the regression equation.

So is there anyone who thinks this equation is “probably pining for the fjords”?

This equation has become awfully quiet.

“The claimed circularity cancels out”…say what?

Pekka,

I will count your answer as a thumbs up on support to Nic.

If we do not hear from anyone else who has support for the equation or the conclusion I propose the following:

Nic, Steve,

Clearly we will all today come to the conclusion you reached a week ago, and I hope you will forgive us for not finishing this sooner. I propose that all who have been debating here and would like to submit their opinion to CLB compose our own independent comments from here down, have both of you review them for a day, and then report them on CLB.

This paper should be withdrawn.

Congratulations Nic.

You would do better to limit yourself to stating your own opinion rather than trying to rewrite those of others.

climategrog,

I am for debate, not for attributing my opinions to anyone, or anyone’s to me. Do you not see the perpetual motion machine, or does it need more study to see if it is one? You can read my logic written to Pekka above. Please tell me your thoughts on the equation and, if it is circular, what in your opinion should be done beyond ending the debate to support Nic?

Agree. R Graf, you’re jumping the shark on this one. The discussion should continue with input from McIntyre. I appreciate Pekka’s input, and he can think and speak for himself without you summarizing.

Fair enough. I was thinking we would have heard more people realizing the circularity.

I agree I should not be the leader on this. Nic, Steve, where are you?

“Nic, Steve, where are you?”

Well, my guess is that they figure they have much better things to do with their time.

The entire thread is borderline nutzo; yes, you have to respect the requirement Pekka insists on: “use only the CMIP5 archived T data”. But even then, the whole exercise is beyond the pale; you can’t logically use model-diagnosed temperature ‘data’ to ‘verify/evaluate’ the self-same models which generated the temperature ‘data’. It is the silliest waste of time I have encountered in a while. That the ‘climate science community’ is unable, or unwilling, to see the obvious problems with this kind of paper simply means they are unable to practice rational discrimination between reality and rubbish. As I have said many times before, the field is not well.

steve:

Good luck with your hopes… if the guys can’t understand the point that has been articulated countless times in this thread by now, they will never get it.

“Math is hard”, as Barbie said.

The condenscenscion and abusive language that have been applied to critics of this “scheme” only go to demonstrate the innumeracy and lack of training of those who delve into this field.

If I had only known, when I was a kid, just how much money could be made from such cheesy applications of statistical analysis….

oh well.

OK, dave.

“condenscenscion”

Steve has started a trend. How about “condenscensation”.

I stand corrected

😉

The fundamental circularity comes from the fact that the only connection of the model to the real world is temperature. The energy balance and temperature trend were used to derive all the other components of the model, a model whose purpose is to output a temperature from its temperature-derived assumed input forcings. The model can extrapolate into the future, but it does not give insight into the past or present, because that is what was used to program it. Although it is valid to compare the model’s conformity with the past to check one’s program, as the authors claim to do, you cannot use this equation. Never mind the problems with non-linear variables being assumed to be linear and the unknown accuracy of the estimation of F. One absolutely CAN NOT have the same value (T) act as the independent variable and the dependent variable at the same time. Trust your grade school algebra teacher. Follow my algebraic trial solutions a few comments above.

Who challenges this? Who supports? Be brave.

https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-752164

I support your take on this issue; to me it’s really as plain as you put it forth… but what do I know?

I only know statistics… not climate science.

Apparently the two really don’t ever meet.

If we accept the authors’ hypothesis and method, they demonstrate that the serious divergence since 1998 is nothing unusual for the CMIP5 model group. In fact, it is typical of their ability to reproduce even a hindcast for which they know the answer before tuning the model.

This seems to be a clear indication that they are not fit for the purpose of reproducing climate behaviour and certainly not for extrapolating it way outside the calibration period.

Keep up the good work on the modelling. Come back in 10 years and let us know how you are progressing.

I want to give them a fair chance to show their stuff, so I would suggest they come back in 62 years.

Interesting discussion, but instead of fighting over possible circularity in the Marotzke regression analysis, it seems to me more profitable to take a look at the model results presented in great detail by Forster et al. 2013 (Lewis ref. v). Here is a brief summary of the essentials. 23 CMIP5 models are included in the analysis. As discussed by Lewis, the adjusted climate forcing, AF, is derived from the imbalance, N, of the energy fluxes at the top of the atmosphere by addition of a response term, the change in outgoing radiation due to the change in temperature,

AF=N+αΔT. (1)

N and ΔT are outputs from the historical model runs. The constant α is determined independently from model runs with an abrupt increase by a factor of 4 of the CO2 concentration in the atmosphere. The jump in radiation imbalance (intercept in linear regression of N vs ΔT) is named 4xCO2 AF and –α is the slope of the following decrease of the imbalance with increasing temperature. Both α and κ, the efficiency of heat uptake by the oceans, can be derived from runs with a gradual change in CO2. For such a scenario of continuously increasing forcing F, the temperature change is approximately given by

ΔT=F/ρ, with ρ=α+κ. (2)

This scenario is close to the real situation modeled by the 23 different models for the time period late 19th century to 2005 (historical). The results, averaged over 5 years, are given for 2003.

There are large differences among the models with respect to all the results and parameters, as illustrated in a number of tables and figures. The most relevant figure in relation to the Marotzke and Forster paper is Fig. 9, containing four different scatter plots. In all plots the x-coordinate is the temperature increase ΔT in 2003, varying among the models from about 0.3 K to 1.9 K. In Fig. 9a the y-coordinate corresponds to the right hand side of Eq. (2), y=2003 AF/ρ. As expected from Eq. (2) the 23 points fall pretty well on a straight line, with slope not too far from unity and correlation coefficient R=0.87. Figs. 9b, 9c, and 9d, show separate correlations of ΔT with AF, ρ, and α. The correlation with AF is weaker than with AF/ρ but still strong, with coefficient R=0.72. However, there is insignificant correlation with the other two variables – just as claimed by Marotzke and Forster!

How can we understand this? The reason becomes clear from the scatter plot in Fig. 8a, showing the correlation between the parameters α and 2003 AF. There is a strong positive correlation with coefficient R=0.62. This means that the negative correlation between ΔT and ρ or α, expected from Eq. (2), is eliminated by the positive correlation of both parameters with AF!

It appears to me that this is the most serious problem with the Marotzke regression analysis. The parameters AF and α are not statistically independent in the sample. (I do not see this as a necessary consequence of the relation in Eq. (1), claimed to give circularity.)

Figure 7 in the Forster et al. paper is also interesting. It shows a scatter plot of the climate sensitivity (ECS) against the adjusted forcing, 2003 AF, for the 23 models. Both parameters vary by about a factor of three with little apparent correlation. However, when models are selected with linear trend from 1906 to 2005 between the IPCC (2007) 90% confidence limits, 0.56 K to 0.92 K, then there is a clear correlation, very similar to that found earlier by Kiehl and by Knutti. Models with high sensitivity have low forcing and vice versa.

Welcome JUA. We are in our eleventh day here. Please read above to catch up.

Anyone who wants to read Forster 2013, it’s here: http://www.atmos.washington.edu/~mzelinka/Forster_etal_subm.pdf

And for M&F 2015 search “Stokes”, who provided a link near the top of this post.

The 2013 abstract says: “Multi-model time series of temperature change and AF from 1850 to 2100 have large inter-model spreads throughout the period. The inter-model spread of temperature change is principally driven by forcing differences in the present day and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095.”

The 2015 abstract says: “The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale.”

As I read them, the two statements are in direct contradiction. I don’t know whether that is acceptable (even in climate science) when it is the same research and the same models. But I am admittedly new here.

BTW, I believed the same as Pekka for all but about the last day, and I came from the opposite bias.

JUA,

Thanks for your informative and constructive comment, with which I very largely concur.

Forster et al (2013) is an excellent, very useful paper, and its Fig.9 is indeed relevant. That figure is based on changes over the full historical simulation period (1860-2005), which is probably better than taking changes over 62 year periods as in M&F 2015. I don’t think Fig.9.c) and d), showing regressions of α and κ on ΔT, are that useful. Only the sum of those parameters, ρ=α+κ, enters into the surface energy-balance equation, and separating the individual influences of α and κ on ΔT is not simple.

Fig. 9.a) and b) however are highly relevant, as they show regressions of ΔT on AF alone and on AF/ρ (actually the regressions have the variables the other way around, which doesn’t correspond to physical causation, but I’ll ignore that). The R^2 for AF alone is 0.512, which implies the standard deviation of the deterministic regression predictions is 71.5% of the predictand’s standard deviation – the R of 0.72 that you refer to. For AF/ρ, the R^2 is 0.246 higher at 0.757. That 0.246 increase in R^2 corresponds, roughly speaking, to a standard deviation of deterministic predictions from ρ of sqrt(0.246) = 49.6% of the predictand’s standard deviation. So AF and ρ contribute in terms of prediction standard deviation in the ratio 71.5%:49.6% or 1.44:1. Hardly an insignificant contribution from ρ.

This calculation is only very approximate, and properly calculated the ratio is probably higher, but the point that the contribution from ρ is significant would remain valid.
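The rough arithmetic above can be reproduced directly from the R² values as quoted. This sketch inherits the same approximation (treating the R² increase as if it were an independent variance share); it does not recompute anything from the Forster et al. data.

```python
import math

# R^2 values as quoted in the comment above (Forster et al. 2013, Fig. 9).
R2_AF = 0.512           # dT regressed on AF alone
R2_AF_OVER_RHO = 0.757  # dT regressed on AF / rho
R2_GAIN = 0.246         # quoted increase in R^2

# The rounded figures give 0.757 - 0.512 = 0.245, consistent with the quoted 0.246.
gain_from_rounded = R2_AF_OVER_RHO - R2_AF

# Rough "prediction standard deviation" shares, as fractions of the predictand's SD.
share_af = math.sqrt(R2_AF)      # equals R for the AF-only regression
share_rho = math.sqrt(R2_GAIN)   # rough share attributable to adding rho
ratio = share_af / share_rho

print(f"gain from rounded R^2 values: {gain_from_rounded:.3f}")
print(f"AF share ~ {share_af:.3f}, rho share ~ {share_rho:.3f}, ratio ~ {ratio:.2f}:1")
```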

One of the reasons for the high correlation of AF with ΔT is the circularity, or non-exogenicity, problem. Internal model variability can change ΔT without compensating changes in N over 62 year or longer periods, implying that it will cause diagnosed AF to covary with ΔT, which artificially increases the R^2 of the regression between AF and ΔT. I think failure to appreciate this fact is probably at the root of Pekka’s misunderstanding on the circularity issue.

AF is almost bound to be correlated with both α and in principle also with κ. That follows from the fact that AF is calculated as AF = α ΔT + N and κ is meant to represent N/ΔT, implying that N = κ ΔT, if κ as diagnosed in Forster et al (2013) is truly a model property (a dubious assumption).
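A minimal numerical illustration of this point, assuming the idealised case where N = κΔT holds exactly for every model. AF diagnosed via AF = αΔT + N then equals (α + κ)ΔT, so AF/ρ returns ΔT identically, whatever actually drove ΔT. The per-model values below are random placeholders, not CMIP5 values.

```python
import random

random.seed(3)

models = []
for _ in range(23):  # same ensemble size as Forster et al. (2013); values are made up
    alpha = random.uniform(0.6, 1.8)     # W m^-2 K^-1 (illustrative)
    kappa = random.uniform(0.4, 0.9)     # W m^-2 K^-1 (illustrative)
    delta_t = random.uniform(0.3, 1.9)   # K, arbitrary: not generated from any forcing
    n_toa = kappa * delta_t              # idealised N = kappa * dT
    af = alpha * delta_t + n_toa         # diagnosed AF = alpha * dT + N
    models.append((alpha, kappa, delta_t, af))

# AF / rho reproduces dT to rounding error, regardless of how dT arose.
max_err = max(abs(af / (alpha + kappa) - dt) for alpha, kappa, dt, af in models)
print(f"max |AF/rho - dT| = {max_err:.1e}")
```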

Nic,

Can you present some justification for your claim that the way M&F perform their analysis introduces rather than removes circularity?

It’s totally clear that N contains a contribution from surface temperature that must be removed to make a meaningful analysis in the spirit of M&F. It’s also totally clear that the calculation of AF removes much of that effect.

It is not known how much error is left in the analysis, but the term that you have described as the source of the circularity has exactly the opposite role: it is there to remove the circular effect of temperature.

Pekka,

Do you agree that if your argument were correct then fluctuations in T and N over 15 and 62 year periods arising from model internal variability with unchanging external forcing should be strongly negatively correlated?

Nic,

Yes. That is part of the hypothesis. Real ERF is affected very little by internal variability. Thus M&F must assume that the contribution of internal variability to N is approximately -α times the change in temperature due to the variability.

Pekka, most of what you constantly refer to as a hypothesis should more correctly be termed an assumption. Hypotheses are statements made ahead of time which are checked using the data and analysis to come to a decision as to their believability. Assumptions are statements which are assumed to be true throughout the analysis without necessarily checking at any point whether they were true or appropriate to the analysis.

Using the equation ΔF = α ΔT + N as an identity is an assumption to facilitate the analysis. As far as I can see, nothing in the paper was offered to show that its use does not substantially affect the results of the analysis. In fact, I pointed out to you that there were indeed problems, due to its presence, with interpreting the magnitude of the “internal variability” of the climate models. If the assumption is not made, then there is indeed quantitative “circularity” in the equations used by the authors, which would completely invalidate the conclusions of the paper.

Nic and Pekka: Circularity and other problems ALSO arise from M&F’s INTERPRETATION of the results from their regression equation. The abstract says:

“Using a multiple regression approach that is physically motivated by surface energy balance, we isolate the impact of radiative forcing, climate feedback and ocean heat uptake on GMST—with the regression residual INTERPRETED as internal variability—and assess all possible 15- and 62-year trends”

Immediately after the regression equation, M&F say:

“We INTERPRET the ensemble spread of the regression result … as the deterministic spread …”

If I understand correctly, M&F believe that all of the unforced variability from each model run is found in the regression residuals and all of the deterministic temperature change is found in the sum of the other terms of the regression. Forcing is deterministic, but effective radiative forcing contains both a deterministic component (from rising GHGs etc) and unforced variability. (ERF is derived from temperature output, which contains both forced and unforced variability.) Therefore, their regression CAN’T be used to separate deterministic (forced) warming from unforced temperature variability. I suspect Nic is correct in believing that a circular regression is inherently flawed, but M&F’s INTERPRETATION of the regression results is unambiguously flawed.

Furthermore, it is inappropriate to interpret the regression residuals as unforced variability. The residuals from a simple regression are often interpreted as measurement error or flaws in the regression equation used. Suppose I measured the acceleration of gravity by dropping an object at various altitudes above the surface of the earth and measuring the distance fallen with time. I can regress distance fallen as a linear function of time squared, but I can get a better fit to the data by adding a second term with the starting altitude. Any second term will improve the fit at least slightly, even though the first power of the starting altitude isn’t directly involved in the physics of the situation. If I know how air density varies with altitude (negative exponential of altitude divided by scale height), then I could add an appropriate term to the regression equation that accounts for the drag, which varies directly with air density. I could add a third term accounting for the fact that g gets weaker with altitude above the surface of the earth. If all of the important physics affecting a falling object were correctly incorporated into the regression equation, then I might be able to interpret the residuals as measurement error, or perhaps as unforced variability arising from conducting my experiment where the atmosphere might be rising or subsiding. Rarely do we understand the physics of a problem well enough to confidently assign a meaning to residuals.
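The point that any second term improves the fit is a general property of least squares: adding a regressor to a nested model can never increase the residual sum of squares. A minimal sketch with synthetic drop data, assuming vacuum free fall plus small measurement noise and a physically irrelevant starting-altitude regressor (all numbers hypothetical):

```python
import random

random.seed(5)
GRAVITY = 9.81  # m s^-2

times = [0.2 * k for k in range(1, 21)]                      # s
altitudes = [random.uniform(0.0, 3000.0) for _ in times]      # m, irrelevant regressor
distances = [0.5 * GRAVITY * t * t + random.gauss(0.0, 0.05) for t in times]

x1 = [t * t for t in times]
x2 = altitudes


def sse(pred, obs):
    """Residual sum of squares."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))


# Model 1: d = a * t^2 (no intercept).
a1 = sum(u * y for u, y in zip(x1, distances)) / sum(u * u for u in x1)
sse1 = sse([a1 * u for u in x1], distances)

# Model 2: d = a * t^2 + b * h, solved from the 2x2 normal equations (Cramer's rule).
s11 = sum(u * u for u in x1)
s22 = sum(v * v for v in x2)
s12 = sum(u * v for u, v in zip(x1, x2))
r1 = sum(u * y for u, y in zip(x1, distances))
r2 = sum(v * y for v, y in zip(x2, distances))
det = s11 * s22 - s12 * s12
a2 = (r1 * s22 - s12 * r2) / det
b2 = (s11 * r2 - s12 * r1) / det
sse2 = sse([a2 * u + b2 * v for u, v in zip(x1, x2)], distances)

print(f"SSE, t^2 only:         {sse1:.5f}  (g estimate {2 * a1:.3f})")
print(f"SSE, t^2 plus altitude: {sse2:.5f}  (b = {b2:.2e})")
```

The altitude coefficient is fitted to noise, yet the residuals shrink; smaller residuals alone say nothing about whether the extra term is physical.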

Since M&F made serious mistakes in the derivation of their regression equation, it is unreasonable to interpret the residuals from that regression as unforced variability. (The regression contains an inappropriate additional degree of freedom because the regression coefficient for alpha and kappa must be the same. They have inappropriately approximated 1/(1+x) as 1-x. Kappa is known to decrease with time/warming in model output.) So there is no way the residuals from this primitive regression should be interpreted as unforced variability – they include flaws in the regression equation.

The place to learn about unforced variability is the unforced control runs for each model, not these regression residuals.

Frank,

The word interpret is their way of telling that additional assumptions are used to allow extracting information.

That does not tell us more about their beliefs than that they consider it likely that the results they obtain by making those additional assumptions have useful informational content. They must think that the values are likely to be close enough to the correct ones to add to understanding of the models rather than lead to wrong conclusions.

I haven’t seen any valid claim to support your assertion that “M&F made serious mistakes in the derivation of their regression equation.” Their analysis rests on hypotheses and assumptions that can be contested, but it’s not based on serious mistakes.

Pekka wrote: “The word interpret is their way of telling that additional assumptions are used to allow extracting information.”

Frank replies: Those assumptions appear to be wrong. The part of the regression equation they claim is deterministic has one term with unforced variability in it. The residuals they claim are unforced variability also contain systematic errors from their regression equation.

As I discussed more fully above, the “linear expansion of equation 3” involves approximating 1/(1+x) as 1-x. The range of both alpha and kappa is about 50% of the ensemble means, so x is probably too big for this to be a reasonable approximation.
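The size of this approximation error can be quantified: the relative error of 1 - x as a stand-in for 1/(1 + x) is exactly x², since (1 + x)(1 - x) = 1 - x². That is about 1% at x = 0.1, 6.25% at x = 0.25 and 25% at x = 0.5. The x values below are chosen purely for illustration; how large x actually is across M&F's ensemble is not computed here.

```python
def exact(x):
    """The true factor 1 / (1 + x)."""
    return 1.0 / (1.0 + x)


def linear(x):
    """First-order approximation 1 - x."""
    return 1.0 - x


def relative_error(x):
    """Relative error of the linear approximation; algebraically equal to x**2."""
    return (exact(x) - linear(x)) / exact(x)


for x in (0.1, 0.25, 0.5):
    print(f"x = {x:4}: relative error = {relative_error(x):.2%}")
```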

If you look at the coefficients in front of the a’ and k’ terms in this expansion, you will find they are the same. The regression coefficients for a’ and k’ are optimized independently even though the physics of equation 3 says that only the sum of these terms (and ERF) are important.
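For reference, a generic first-order expansion (a sketch, not necessarily term-for-term identical to M&F's own linearisation, which is not reproduced here), writing ΔF = ΔF̄ + ΔF′, α = ᾱ + α′, κ = κ̄ + κ′:

```latex
\frac{\Delta F}{\alpha + \kappa}
\approx \frac{\overline{\Delta F}}{\bar{\rho}}
+ \frac{\Delta F'}{\bar{\rho}}
- \frac{\overline{\Delta F}}{\bar{\rho}^{2}}\,\alpha'
- \frac{\overline{\Delta F}}{\bar{\rho}^{2}}\,\kappa',
\qquad \bar{\rho} = \bar{\alpha} + \bar{\kappa}
```

α′ and κ′ enter with the identical coefficient, which is why only their sum is constrained by the physics of the energy-balance relation.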

Frank,

It’s totally clear that their regression formula cannot be derived. M&F write

Poor Pekka. He ends up defending the absolutely indefensible. The CMIP5 models are circular self-references… and he can’t see that. Sad… very sad.

Steve, I would advise against being condescending towards Pekka. He knows his stuff and is a proper scientist, not a politically motivated alarmist. From your comments here, you do not appear to be in the same class.

He clearly has some serious doubts about the worth of the paper but weighs his words a little more carefully than many here.

I’ve actually learned a great deal by reading Pekka’s thoughtful comments, here and elsewhere. The fact that he is willing (at least for this debate) to accept the “black box” aspect of M&F’s selected climate model simulations may simply be due to the fact that he has more experience in the general field than others.

On the other hand he seems to agree, on the whole, that several non-statistical aspects of the M&F paper raise questions about the overall value the paper provides to other researchers. I’m impressed by his willingness to engage and, especially, educate.

Kiitos (thank you), Pekka!

Steve… really uncalled-for remarks about Pekka. Because he doesn’t see things your way, he’s “poor, sad Pekka”? He disagrees with some points here, but he’s always been respectful and professional. It’s silly comments like yours that bring meaningful discussions to a halt.

I also disagree with some of his points, but he’s certainly qualified to have his own views, and in light of the time he has put in to explain himself, he deserves a little more respect.

SteveS,

He would deserve a bit more respect if he could see the logical fallacy of M&F. The ‘archived’ delta-Ts from the CMIP5 ensemble can’t logically be used to diagnose delta-F, and then that delta-F used to diagnose a different delta-T.

true that

“He knows his stuff and is a proper scientist, not a politically motivated alarmist.”

Ya, well, I am a ‘proper scientist’ as well. I intend no condensation. But Pekka really is defending a paper with very serious logical problems: you can’t use the models to diagnose delta-F and then use that delta-F to ‘verify’ the self-same models. It is a silly exercise. I trust you can see that.

“condescension”, not “condensation”… spelling correction, ugg.

Actually, Steve, “condensation” was really good. Really creative use of the language is unfortunately all too rare these days.

“I intend no condensation.” I think Yogi Berra said that first. Just kidding steve. Anyway, I agree with your analysis. You don’t have to delve very far to see that this one doesn’t hold condensation.

Well, our model really doesn’t get natural variability (just watch us prove it), so we have a convenient excuse when we miss.

You know what I mean. Steve threw me off with the condensation.

I said somewhere near the top that I was interested in Pekka’s take on this. He knows what he is talking about and he’s honest. And though I lean towards circularity, I have reservations due to Pekka’s persistence. He remains a gentleman and a scholar.

Yes Don, Pekka is an honest broker, a gentleman (certainly more than me!) and a scholar… he is just very, very wrong in this case. M&F is rubbish.

Talking hot air leads to condensation; it was a Freudian slip.😉

But at least he is polite when expressing his opinions, which is certainly worth respecting.

This discussion has been somewhat confusing, what with all the different “errors” being thrown around, as well as terms like “real T,” “actual T,” etc. getting mixed in. The closest I can come to making sense of Pekka’s position is that by using a backed-out measure deltaFhat using only older, holdout runs of the model, an instrumental variable of sorts has been constructed that can then get plugged into a different, new set of model runs to try to overcome the endogeneity problem. That only works if the instrument deltaFhat is valid, having a high correlation with the true deltaF and no correlation with the residuals in the new model runs, hence his caveat about it only working if certain parameter values hold. That’s my best shot: if this is not an IV regression (for some reason not described as such by the authors), then the circularity critique must hold.

It is normal that new ways to handle numbers evolve over time. The way to test the new way is typically to include it in a published paper, where others can review it.

Unless the paper is about the new math itself, it is best to incorporate it, as the authors have done here, in a paper that has little relevance to scientific advancement.

The most important thing when introducing a new method would be to TEST it and validate that it works.

Publishing unfounded conclusions based on non-validated “innovative” methods, and leaving it to others to waste their time fulfilling YOUR obligation to test, and publishing rebuttals, is not acceptable science.

That is what is happening here.

Had they published a validation with an error, it could be the role of others to point this out and possibly correct it.

Normal scientific practice seems to have been put on hold for the special case of climatology, especially in the once “prestigious” journal Nature.

It appears that the circularity argument has become a bit of a red-herring.

Pekka has a valid point in that the flux from the model is the proper, most direct quantity to use, except that this is not what the authors do, since they do not have the relevant flux: the SURFACE flux. So they substitute the TOA flux, crudely adjusted by a highly approximative linear regression model.

So Pekka’s objection to Nic’s substitution applies equally well to the paper itself.

Even if Nic’s substitution does not provide a more accurate form, it is valid in demonstrating that there is collinearity in both the dependent and independent variables, something that the authors failed to discuss or consider. Presumably they failed to realise it.

This collinearity will lead to a bias in the regression result. Unless this is recognised and corrected or shown to be negligible, this invalidates the method and hence the conclusions. This seems to be RomanM’s main point.

Rearranging the equation after Nic’s substitution to isolate the temperature variables on one side of the regression shows that alpha is no longer present. Thus it is hardly surprising that any further analysis is found to be insensitive to alpha !

Before adopting the “innovative” method the authors need to show that it is capable of showing a dependence on alpha if one is present.

If they had attempted this, they would probably have found that it is the method itself that is insensitive to alpha and realised what Nic has pointed out.

There is quite a lot of information in the paper about the models that may be useful for the record, such as figure 1b,1c, which shows they completely underestimate the earlier warming as badly as they fail to capture the recent lack thereof.

For that reason I don’t think a retraction is necessary, but a published comment is required to point out the fundamental flaws in the method, its lack of validation, and hence the invalidity of the conclusions.

This would preferably avoid discussion of circularity and concentrate on collinearity and the absence of the key parameter alpha.

Greg Goodman:

Excellent post. RomanM additionally has pointed out some of the apparent typos/mistakes in the presentation of data that frustrate efforts to replicate the authors’ exact methods. Yet as you point out, the method itself seems to drive the result, not the data.

I am also interested in the extent to which the definition of terms determines the outcome and diminishes the apparent influence of feedbacks (alpha). For example, in his response at Climate Lab Book, Marotzke admitted that:

The methodological problems identified by critics would seem to explain why M&F’s results were “insensitive to the ambiguity”.

M&F admit that feedbacks change over time: in other words, feedbacks can “strengthen” to partially negate an increase in exogenous forcing (otherwise we would quickly experience the dreaded “runaway greenhouse” effect). Many of us wonder how that process fails to display significance in multi-decade runs. Perhaps it is related to the authors’ decision to ignore the fact that many models allow alpha and kappa to change over time, although I’m not clear on how they managed this if they were using unadjusted model output.

Marotzke explained that, in their chosen formula (1),

Yet this seems to suggest that α can have a “significant” influence even within their chosen method.

Forster should read my post at Climate Etc. (linked above).

It would be surprising if he had not read Douglass and Knox 2005 and their reply to Wigley et al.’s criticism of that paper in Douglass and Knox 2006 (refs in my article).

If one is going to make gross simplifications and linearise everything, it is possible to use the linear relaxation response, which correctly contains both and allows extraction of the appropriate scaling.

Greg Goodman, “they completely underestimate the earlier warming as badly as they fail to capture the recent lack thereof:”

It is more like the models completely missed the early cooling that led to the 1941 rebound.

Calling it a “rebound” implies a lot of unstated and unjustified assumptions.

I too completely missed the cooling that ended in 1941.

I think the key thing shown by the paper, and the figures I posted in particular, is a negative error in the 1930s, a close fit in 1975–1995 and all model fits above reality in the last 15-year segment.

That is consistent with models being too sensitive. Omitting the sign of the divergence and looking only at its magnitude allows the conclusion that the recent divergence is to be expected.

If it is not significant that the models are completely wrong for 15 years or more, it is equally insignificant when they are correct for 15–20 years.
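A minimal sketch of the sign-versus-magnitude point, using synthetic, hypothetical annual series (not CMIP5 or observational data). All 15-year OLS trends are computed for a “model” and an “observed” series, and the model-minus-observed differences are summarised both with their sign (signed mean, fraction positive) and without it (mean absolute difference). The function names are illustrative only.

```python
import math


def ols_slope(y):
    """OLS slope of y against its index (units of y per time step)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (yi - y_mean) for i, yi in enumerate(y))
    return sxy / sxx


def running_trends(series, window=15):
    """Trends over every window-length segment, one per start year."""
    return [ols_slope(series[i:i + window]) for i in range(len(series) - window + 1)]


def summarize_divergence(model, obs, window=15):
    """Signed mean, mean absolute value and fraction positive of model-minus-observed trends."""
    diffs = [m - o for m, o in zip(running_trends(model, window), running_trends(obs, window))]
    signed = sum(diffs) / len(diffs)
    magnitude = sum(abs(d) for d in diffs) / len(diffs)
    frac_positive = sum(d > 0 for d in diffs) / len(diffs)
    return signed, magnitude, frac_positive


# Synthetic, hypothetical annual series standing in for 1900-2012 (113 values).
years = range(113)
obs = [0.007 * i + 0.1 * math.sin(2 * math.pi * i / 60) for i in years]
model = [0.009 * i for i in years]
print(summarize_divergence(model, obs))
```

Because the mean absolute difference can never be smaller than the magnitude of the signed mean, a summary based on magnitude alone cannot reveal a systematic bias of one sign; the signed mean and the fraction positive can.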

It also needs to be noted that even this agreement was achieved by adjusting a multitude of essentially free parameters and so is statistically meaningless.

Taken at face value, MF2015 seems like a fairly good demonstration that the models are of no value at this stage and should be ignored until they can produce meaningful results.

Greg, “Calling it a “rebound” implies a lot of unstated and unjustified assumptions.”

Worse, actually: it involves looking at the modeled estimated temperatures versus instrumental and trying to be descriptive🙂 While I haven’t looked at every possible model combination, in general, the models don’t get cooling and reversion to “normal” or mean. Crowley and Unterman have a volcanic reconstruction that should be considered in the pi and historical “experiments” but AFAIK isn’t. If the models do a great job of matching an incorrectly assumed near-zero-variability past, why expect them to produce anything worthwhile in the future?

I hope this will be the last comment I feel necessary to write on this topic.

I have explained my points using formulas in two comments that complement each other. The logic of the M&F approach here, and the error that Nic made here. RomanM presented similar formulas using a different notation and discussing a little more the influence of the error.

I add one more explanation of the same arguments without formulas.

M&F wrote a paper, where a crude linear regression model was used to learn about the properties of models in CMIP5 database.

The hypothesis that a crude model is applicable is not proven correct, but to formulate the point so weakly that it should not be controversial: “Their hypothesis is not totally unreasonable”. Thus it makes sense to study what follows from that hypothesis.

One part of the hypothesis is that the three variables that may contribute to the temperature trends of 15 years and 62 years are α and κ of the models, and the effective radiative forcing ERF that results from external input of GHG concentrations, aerosols, volcanism, and solar radiation.

In Forster 2013 the effective radiative forcing (variable F) is estimated from the TOA imbalance N, using α and the surface temperature T from the same model runs. N is strongly correlated with T; the resulting F much less so, as it is approximately the ERF determined by external input.

Performing the regression analysis on this basis does not technically involve any circularity. There’s nothing in the later stage that affects the data obtained from the earlier stage. The forcing used as input is independent, because it contains the independent data about TOA imbalance.

The value of ΔF has, however, been obtained from a formula that’s not exact for ERF.

It’s now possible to use the regression formula of M&F to determine how much of the temperature trends is contributed by the three explaining variables, and how much is residual that’s assumed to come from internal variability. Technically that’s not problematic, but one problem remains. We get the contributions of the three variables defined by the method used in the work of Forster et al (2013) and M&F. Forcings, in particular, are determined from the same model runs as temperatures, but the formula may have significant errors. It’s possible that these errors correlate with the values of the other variables. If that’s the case, then the attribution of the contributions to the three explained parts and to the internal variability is distorted.

The formula used in the analysis (in Forster 2013) has been justified and is arguably the best available. The correlation between the values of TOA imbalance and surface temperatures in the CMIP5 data is very strong. The calculation of forcings by their formula reduces the correlation very much. In the ideal case the calculation would result in the ERF determined almost totally by the externally input sources of forcings (not quite 100%, as it’s ERF, not RF).

The results of M&F may attribute the temperature trends somewhat erroneously, because the ΔF that’s one of the explaining variables is not the real ERF. In what way the attribution is erroneous depends on the correlation of the error in ERF with the other variables. Correlation with temperature affects mainly the share of ERF and the residual, correlation with α or κ affects also their shares.
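A minimal sketch of this point in a hypothetical quasi-steady toy, not M&F's or Forster et al.'s code: T = (F + r + u)/(α + κ), with an exogenous forcing ramp F, radiative noise r and ocean-exchange noise u, and a TOA imbalance N = F + r − αT. The diagnosed forcing F_diag = N + αT then equals F + r. With r switched off the diagnosis recovers F exactly; with r on, the error F_diag − F correlates with T, and regressing T on F_diag moves that part of the unforced variability out of the residual and into the “forced” term. All parameter values are arbitrary assumptions.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (len(xs) * (var(xs) * var(ys)) ** 0.5)


def ols(xs, ys):
    """Least-squares fit ys = a + b*xs; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def toy_run(alpha, kappa, n=2000, sigma_r=0.5, sigma_u=0.5, seed=1):
    """Hypothetical quasi-steady toy: T = (F + r + u) / (alpha + kappa).
    N = F + r - alpha*T, so the diagnosed F_diag = N + alpha*T equals F + r."""
    rng = random.Random(seed)
    F = [0.002 * i for i in range(n)]                   # exogenous forcing ramp
    r = [rng.gauss(0.0, sigma_r) for _ in range(n)]     # radiative (TOA) noise
    u = [rng.gauss(0.0, sigma_u) for _ in range(n)]     # ocean-exchange noise
    T = [(f + ri + ui) / (alpha + kappa) for f, ri, ui in zip(F, r, u)]
    N = [f + ri - alpha * t for f, ri, t in zip(F, r, T)]
    F_diag = [n_ + alpha * t for n_, t in zip(N, T)]
    unforced = [(ri + ui) / (alpha + kappa) for ri, ui in zip(r, u)]
    return F, T, F_diag, unforced


alpha, kappa = 1.2, 0.8

# Case 1: no radiative noise, so the diagnosis recovers the exogenous forcing exactly.
F, T, F_diag, unforced = toy_run(alpha, kappa, sigma_r=0.0)
max_err_exact = max(abs(a - b) for a, b in zip(F_diag, F))

# Case 2: radiative noise, so the diagnosis error F_diag - F is correlated with T.
F, T, F_diag, unforced = toy_run(alpha, kappa, sigma_r=0.5)
err_T_corr = corr([a - b for a, b in zip(F_diag, F)], T)

# Regress T on F_diag: the residual is what would be labelled internal variability.
a, b = ols(F_diag, T)
resid = [t - (a + b * f) for f, t in zip(F_diag, T)]
print(f"max |F_diag - F| without radiative noise: {max_err_exact:.2e}")
print(f"corr(F_diag - F, T): {err_T_corr:.2f}")
print(f"residual variance {var(resid):.3f} vs true unforced variance {var(unforced):.3f}")
```

Here the regression residual measures only the ocean-exchange part u/(α + κ) of the unforced variability; whether anything similar happens in CMIP5 models depends on how their N actually behaves, which this toy does not address.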

The error I discuss is, however, not circularity in the formulas that form the hypothesis; the error comes from the inaccuracy of the formula that’s used. That formula is actually the natural simple choice to minimize the spurious circularity, not the reason for the circularity. That’s because the F used is close to the real ERF, which is essentially external input, while N is highly dependent on T. Nic’s error was not to realize that M&F remove the circularity error as well as they can rather than create it.

Pekka Pirilä, could you explain this remark of yours:

This seems to be a non-sequitur. Whether or not later portions of a methodology affect earlier portions does not determine whether or not there is circularity. That only determines if the methodology is recursive.

If this is an accurate description of your position, your argument is nothing more than misinterpreting “circular” as “recursive.” If it is not an accurate description of your position, you ought to fix it so people aren’t misled.

Brandon,

One issue that may be confusing is the use of the word “circular” when the analysis is a single pass (once through). The same problem could be described better using a different expression. What happens in the cases labeled “circular” is that the variables are not independent, as they should be in a regression model to avoid erroneous interpretation of the results. This is the approach I picked when I postponed the discussion of the potential issues to the final stage of the analysis – the interpretation of the results. At that stage I discussed how correlations of the error term might affect the results.

One way of describing the problem is to say that when the explaining variables are not fully independent, variability due to one explaining variable leaks into the contributions attributed to the other variables, making them either too large or too small.

Pekka Pirilä, I’m afraid I don’t see how that explains your remark. I don’t find the word “circular” remotely confusing, and I don’t see that anyone else here has either. The only confusion I’ve seen regarding the word is your apparent claim there is no circularity because there is:

Which would only determine if the approach was recursive, not circular.

It would help if you would address the question I raised head-on. As it stands, I’m not certain I properly understand your response to me. It sounds like you’re actually acknowledging the circularity people have pointed out, but I can’t tell because I can’t see how what you say is supposed to respond to what I said.

Brandon,

My wording was, indeed, not very good. Another form of circularity is that a variable must be solved from an equation where it occurs in many places, and that’s the case I did discuss in my recent comments.

In this case we have the equation where ΔT occurs

– explicitly only on the left hand side in M&F

– explicitly once on both sides after Nic’s substitution

– explicitly once on the left hand side and twice on the right hand side after the second substitution I introduced.

The third case goes back to the first, as the two terms on the right hand side cancel. These two equivalent alternatives are the ones that are consistent with the physical hypotheses of M&F. Nic’s alternative makes sense for a different physical hypothesis.

Based on the above, I think that the question you asked referred to a bad formulation of the point on my side, not any real disagreement on the substance.

Pekka,

“One part of the hypothesis is that the three variables that may contribute to the temperature trends of 15 years and 62 years are α and κ of the models, and the effective radiative forcing ERF that results from external input of GHG concentrations, aerosols, volcanism, and solar radiation.”

This may be the hypothesis, but your description of ERF is not what Forster provides. Forster 2013 uses the algorithm from Forster and Taylor 2006. An example of the diagnosed forcing values is given in that paper.

“Imagine, for example, that the atmosphere alone (perhaps through some cloud change unrelated to any surface temperature response) quickly responds to a large radiative forcing to restore the flux imbalance at the top of the atmosphere, yielding a small effective climate forcing. In this case the ocean would never get a chance to respond to the initial radiative forcing, so the resulting climate response would be small and this would be consistent with our diagnosed effective climate forcing rather than the conventional radiative forcing.”

I’m no climate scientist, but it seems to me that the ΔF from Forster is essentially the forcing that was required to effect the ΔT seen in the models.

Once again, the circularity is that ΔT is regressed onto a linear function of itself, leading to erroneous results from the linear relationship thus obtained, and more specifically in the error term, which is used to assess internal variability. Your assertion that the error term from the regression does not affect results is puzzling, as it has already been shown that there is no such cancellation of the ΔT term. Discussion of recursive use of ΔT is a red herring.

“What happens in the cases labeled circular is that the variables are not independent as they should be in a regression model to avoid erroneous interpretation of the results. This is the approach I picked, when I postponed the discussion of the potential issues to the final stage of the analysis – the interpretation of the results. At that stage I discussed, how correlations of the error term might affect the results.”

This right there indicates the confusion; it seems that you are admitting that the circular regression yields garbage but then assert that it does not affect final results. You are in effect claiming that the OLS has no significance and can thus be disregarded; otherwise you can’t postpone discussion of whether circular regression leads to erroneous final results. The question is simple: does the regression equation where ΔT appears on both sides lead to circularity, and thus to a breakdown of the OLS, or not? It appears you agree after all.

My current working hypothesis (in accord with Roman) is that Pekka is making the charitable assumption that M&F put aside Forster (2013)’s estimation of F from T and N using regression, and instead elevated it to an identity by way of assumption.

Thus all conclusions from the paper need to be prefaced with “If we assume ΔF = α ΔT + N in climate models ….”.

As I said, this is “a working hypothesis”, which Pekka has not confirmed, and also a “charitable” one; I’m reminded of the old adage “If we had eggs, we could have ham and eggs, if we had ham”.

Even if this was so, we would still be left with circular regression. Rearranging the equation to put all ΔT terms on the left side may circumvent this difficulty. But this is still problematic for:

(1) That’s not what M&F did. They explicitly regressed ΔT onto ΔF and thus onto a linear function of itself.

(2) The minimization so as to obtain coefficients becomes problematic and unreliable.

RomanM’s suggestion is that, due to the identity ΔF = α ΔT + N, a change Δ in F must be reflected by a change αΔ in N, which will then supposedly “alleviate” the circularity. But the regression process is still mathematically flawed.

The problem at hand is to clarify Pekka’s assumption that gives M&F a pass.

I think his point is that if there is an assumed identity, regressing against ΔF is fine, it contains no more or less information than its component parts.

I was multi-tasking when I pressed Post.

Actually, reflecting on this, perhaps Pekka’s assumption is also that F and T are independent.

Forster (2013)’s estimation of F is done by regressing N against T and setting F to the intercept. On that basis, and assuming an identity, I think M&F get to pass Go.

Very early on in the thread we were talking about M&F’s failure to do tests on this.
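A sketch of the kind of estimation described above, regressing N on T and reading F off the intercept (a Gregory-style regression), in a hypothetical one-box model under a step forcing. The model, parameter values and noise level are assumptions for illustration, not Forster's procedure or data. In this sign convention the fitted slope comes out as −α.

```python
import random


def ols(xs, ys):
    """Least-squares fit ys = a + b*xs; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def step_forcing_run(F0=7.0, alpha=1.2, kappa=0.8, C=30.0, years=150, sigma=0.2, seed=2):
    """Hypothetical one-box model under a step forcing F0 (W m^-2), yearly steps.
    TOA imbalance: N = F0 - alpha*T + noise; surface box: C dT/dt = N - kappa*T."""
    rng = random.Random(seed)
    T, Ts, Ns = 0.0, [], []
    for _ in range(years):
        N = F0 - alpha * T + rng.gauss(0.0, sigma)
        Ts.append(T)
        Ns.append(N)
        T += (N - kappa * T) / C  # forward-Euler step of one year
    return Ts, Ns


Ts, Ns = step_forcing_run()
intercept, slope = ols(Ts, Ns)
print(f"diagnosed F = {intercept:.2f} W/m2, diagnosed alpha = {-slope:.2f} W/m2/K")
```

The intercept estimates F only because, in this toy, N is exactly linear in T plus noise that is uncorrelated with the current T; any departure from that would feed into the intercept.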

Which identity is assumed, please?

F is defined as α ΔT + N as produced directly from the model runs, i.e. not as α ΔT + N + residue as estimated by Forster using regression.

Pekka’s argument seems to be it is legit to use the direct outputs from models if models are all you are making statements about, so if F is just a linear combination of those it can be used. That assumption falls over if there is a residue floating around i.e. F is a product of regression.

I was musing that this is probably not a sufficient condition to make it legit, you needed also to have F and ΔT independent. I also think that is what Pekka has been implying in some of his comments.

This assumes no variation in ΔF over time, that is, that a change in ΔT is perfectly cancelled out by an equivalent change in N. Pekka and others have to prove this over the relevant time periods before declaring the circularity issue a red herring. If by “fixed” Pekka really meant that ΔF is constant, then we can all see where the misunderstanding lies.

For the case where F is being defined as an intercept:

ΔF = N + α ΔT for N = 0

Isn’t it weaker than that – just that the partial derivative of N with respect to ΔT is −α?

Indeed, but only on the assumption that ΔF is independent of ΔT, in which case ∂ΔF/∂ΔT = 0. Good luck proving that.

Correction: F independent of T.

Pekka’s point would be that they are allowed to assume that, and the fact that Forster calculated F as the intercept of the regression of N against ΔT might give some comfort in this regard.

In fact I see from further down the thread, where Pekka has now spelled out his views in the lingua franca of maths, that he is saying he doesn’t need an identity: he can stand an error in the equation provided F is independent of ΔT, i.e. the partial derivative of the error with respect to ΔT is 0 (or close to 0).

This is of course testable in F as estimated by Forster 2013, a test that some suggested should have been done when the thread first got going.

This sounds like Groundhog Day. The issue was that ΔF was thought to be linearly dependent on ΔT, resulting in ΔT being regressed on a function of itself. If Pekka’s point all this time was that F is independent of T by virtue of compensation from N, then why did he not say so from the start?

People went round and round in circles for nothing due to strange arguments such as “Performing the regression analysis on this basis does not technically involve any circularity. There’s nothing in the later stage that affects the data obtained from the earlier stage.”

His argument is testable, and from Nic’s test regarding the negative correlation of N and T (compensatory contribution of N to changes in T), Pekka’s premise seems flawed.

Pekka and others: I’ve been trying to understand this situation from a slightly different point of view that may be useful to others.

In climate models and the real world, deterministic variability is caused by various forcings: GHGs, aerosols, volcanoes or solar, while unforced variability is caused by fluctuations in ρ (α+κ) – the rate at which heat is removed from the surface compartment (surface, atmosphere and mixed layer). For example, if an unusual number of strong El Ninos occur in a 15-year period, less warm water is buried in the deeper Western Pacific and there is less upwelling in the Eastern Pacific. The warmer surface waters there warm the atmosphere. The unusual warmth in this period would be due to a reduction in κ (kappa). Hopefully, these details about ENSO are correct, but they aren’t essential to my argument.

Using ΔT = ΔF / (α + κ), M&F have created individual models of each climate model (probably of each run), and then one comprehensive model for the ensemble of model runs. The average values for α and κ are obtained from model output, but unforced variability is NOT created by allowing α and κ to vary with time. These parameters are fixed. (If I were trying to model unforced variability, I might start by saying ΔT = ΔF / (α+α″ + κ+κ″), where the double-primed parameters represent the unforced natural variability that is observed in these parameters and the unprimed values are their average. M&F are doing something very different.)

If the model of M&F doesn’t allow α or κ to vary, how can ΔT show unforced variability? Obviously ΔF must contain both forced and unforced variability! (And there should be a different ΔF for each model run.) If a climate model chaotically produced an unusual number of strong El Ninos and α and κ are not allowed to change, then the effective radiative forcing for that period (ΔF) must be higher than normal. In the real world and climate models, forcing is deterministic. In M&F’s equations, however, ΔF also contains the unforced variability that produces unforced variability in ΔT.

ΔF = α ΔT + ΔN. Which terms provide the forced and unforced variability in ΔF? α ΔT is usually the larger term and it certainly provides unforced variability. ΔN may also provide some unforced variability. So Nic’s mathematical argument for circularity makes physical sense to me. You just need to remember that “effective radiative forcing” is not solely a deterministic forcing. It doesn’t help that we often use the same symbol (ΔF) for both forcing and “effective radiative forcing”.

To complete my argument, M&F’s regression equation contains ΔF, which contains both deterministic and unforced variability. Therefore M&F’s claim that the regression equation provides only the deterministic temperature change is wrong.
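A minimal sketch of the thought experiment above, assuming (as the comment does) that unforced variability enters only through a fluctuating κ″ while the true forcing is held constant. Holding α and κ at their means, the “effective forcing” needed to reproduce each temperature, ΔT·(α + κ), fluctuates even though the forcing does not. Note that this quantity is not the same as M&F's diagnosed ΔF = αΔT + ΔN; the function names and values are hypothetical.

```python
import random


def frank_toy(F=3.0, alpha=1.2, kappa=0.8, n=500, spread=0.2, seed=3):
    """Hypothetical illustration: constant forcing F, with unforced variability
    entering only through a fluctuating kappa'' (alpha held at its mean)."""
    rng = random.Random(seed)
    kappa_dd = [rng.uniform(-spread, spread) for _ in range(n)]
    # Temperature from dT = dF / (alpha + kappa + kappa'').
    T = [F / (alpha + kappa + k) for k in kappa_dd]
    # "Effective forcing" needed to reproduce each T if alpha and kappa stay fixed.
    F_eff = [t * (alpha + kappa) for t in T]
    return T, F_eff


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5


T, F_eff = frank_toy()
print(f"std of T: {std(T):.3f} K; std of F_eff: {std(F_eff):.3f} W/m2 (true forcing is constant)")
```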

Isn’t “internal variability” essentially a lagged component of α and κ? Adding a climatic cosmological constant to the equation shouldn’t change the cause-and-effect analysis. Even if you “cheat” a little by including portions of feedback in your forcing values, the oscillating return to equilibrium would seem to be driven by feedbacks (including the lagged effect of the ocean heat sink).

Opluso: Alpha and kappa are parameters that control the rate of heat flow from the surface compartment (surface, troposphere and ocean mixed layer) to space and the deep ocean respectively, in response to a change in surface temperature. AVERAGE values for these parameters can be abstracted from the output of climate models. Unless I’m sadly mistaken, unforced variability around the average value of these parameters is responsible for unforced variability in climate. For a simple derivation of alpha and kappa, see Isaac Held’s blog. Alpha is called beta in this post.

http://www.gfdl.noaa.gov/blog/isaac-held/2011/03/11/3-transient-vs-equilibrium-climate-responses/

I think we are agreeing. My point was that the M&F claim (that “internal variability” explains climate model inaccuracies while feedback does not) is just another way of saying that climate models do a poor job of representing α and κ.

Splitting “feedback” into multiple parts is no doubt helpful in calibrating models. However, the simplified physics equation only requires an error correction value because we don’t accurately model α.

ΔT = ΔF / (α + κ) + ɛ

The surprising conclusion of M&F

depends on treating α and κ and ɛ as independent functions rather than related parts of total “feedback”.

“If the model of M&F doesn’t allow α or κ to vary, how can ΔT show unforced variability?”

Injected noise as I’ve noted twice above.

“If a climate model chaotically produced an unusual number of strong El Ninos …”

Generally they won’t. There is no understanding of the cause of El Nino et al that permits even an approximate modelling of the process. Climate models usually just add some randomly distributed noise to make the output look more climatey.

Climategrog: I’ve never heard of “injected noise”. Lorenz showed long ago that the solutions to coupled non-linear differential equations used in weather forecasting and climate models exhibit chaotic behavior. For a review, see: Lorenz (1991), “Chaos, Spontaneous Climatic Variations and the Detection of the Greenhouse Effect”. If forcing is kept constant, surface temperature in climate models show unforced variability on a decadal time scale about an average value. If changes in forcing are added (GHGs for example), there is both a deterministic change in temperature and unforced variability in temperature.

Although it isn’t critical to my argument, climate models do reproduce some aspects of El Nino. Kosaka recently showed that recent multi-decadal variability in the rate of global warming (faster from 1975-1998, slower since) could be observed in the output of a climate model if SSTs in a small portion of the Eastern Equatorial Pacific were constrained to match observed SSTs in that region.

Thanks Frank,

if you force (constrain) a certain critical part of the Pacific to follow observed SST, there will be certain effects like wind feedbacks and the Bjerknes effect that are modelled and to some degree produce a knock-on effect in a broader region.

This says nothing about modelling the original cause of ENSO, it is about ASSERTING it into the model.

Some models ( not many ) do display a degree of ENSO-like patterns but not at the right time. Just comparable wiggles. That again confirms what I said that they have no skill or understanding of the underlying cause and do not model it.

Just for the record, I hypothesise that the root cause of ENSO is tidal effects on the thermocline. Since we still cannot model surface tides in a deterministic way, I’d guess we are a long way from being able to model gravitational effects on the thermocline.

However, if you look at the ratio of the density differences at the surface and at the thermocline it suggests the predominant 12h response of the surface would correspond to something of the order of 2 years at the thermocline.

As a back-of-envelope figure, that is about right for ENSO.
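The back-of-envelope scaling can be made explicit. Assuming, as the comment does, that the response period scales linearly with the ratio of density contrasts, stretching the 12 h surface period to about 2 years requires a ratio of roughly 1,460. The sketch below only does that arithmetic; it supplies no density values and does not test the scaling assumption itself.

```python
HOURS_PER_YEAR = 365.25 * 24  # Julian year, in hours


def implied_ratio(surface_period_hours=12.0, target_period_years=2.0):
    """Density-contrast ratio implied if the response period scales linearly with it
    (the comment's assumption, not an established result)."""
    return target_period_years * HOURS_PER_YEAR / surface_period_hours


print(implied_ratio())  # 1461.0
```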

Spectral analysis of trade wind data suggests that the classic “3 to 5 years” pseudo-periodicity of ENSO may be a modulation of a lunar-driven 4.43y periodicity.

http://climategrog.wordpress.com/?attachment_id=283

At that point the modelled feedbacks would have something causal to bite on. But I digress.

Clive Best has written a lot on tides: http://clivebest.com/blog/?p=5986

Can you give me your opinion on my comments regarding Second Law on bottom of post?

Rock me tender though the cradle is constrained.

==========

A very interesting and insightful comment.

For those who may wonder why I haven’t answered:

I have written one more comment to express my points better in some respect, but that comment is in moderation. Thus there’s an unknown delay before you see it.

Oh dear, you did not use the F-word, did you? Fami-l-i-a-r.

I posted how to fix this a couple of days ago but our host oddly chose not to do so.

If you did use that word, I suggest reposting using “au fait” instead😉

Hi Pekka,

Let me ask the following: If I grant that M&F can create any energy balance they want for T and diagnose any value they want for the black-box values as long as the equation is in balance, is that what they truly did here? I see them taking one of the black-box values from another diagnosed run on another day and substituting it in. Once they do that, aren’t they introducing the assumptions of all the rest of the black-box variables left behind? This is why I felt it appropriate to take the equation from the past diagnosis of the variable and substitute it in, because that is how it was derived. But plugging in any value of F from another run or other model makes the equation invalid, as it should. The algebra is telling us the truth. M&F’s results are not. Am I making sense?

Pekka,

I realize in your timezone you should be off to bed, but tomorrow I would ask that you let me know whether you agree or disagree that my definition of circularity is valid, and whether or not it applies to climate models. Thx. Here is my test:

“Any value brought forward from a past trial (or alternate of average) to be placed into an equation brings with it all of its assumptions. If any of those assumptions appear on both sides of the equation, no matter how small, you are breaking the law.”

Pekka

Thanks for mentioning this problem – I don’t get involved in moderation, but I have been able to release your comment.

You are however incorrect in still thinking that I have made a mistake.

Nic,

On the question of mistake, we have 100% opposite views, and I have presented the evidence with formulas in the latter of the two earlier comments linked in that comment.

I agree that there are uncertainties. I have also doubts on the accuracy of their results, but discussing the real issues requires that the false claims are first put aside.

Thanks Pekka, I found that explanation clearer than your earlier posts.

On a number of occasions here you have stated that the object of the subtraction is to remove the correlation. This is not the case.

The term αΔT is the climate feedback. The strongly negative Planck feedback +/- whatever *assumed* parametrised feedbacks they build into the various models.

Under the current orthodox view, these are predominantly +ve and reduce the magnitude of the Planck f/b.

A -ve feedback stabilises the system and will reduce the correlation. However, it will not remove it, nor is the subtraction of the climate f/b “intended” to remove the correlation. It is intended to leave the net forcing after all feedbacks (here called ERF).

This will still be correlated with surface temps, since it is what is driving them. If it were essentially uncorrelated, as you appear to be suggesting, there would be almost zero sensitivity to external forcing. As we know, this is far from the case in the models.

So, to the extent that climate is sensitive to forcings, there is still correlation in the explanatory variable. It is neither accidental nor a result of the various inaccuracies and approximations. It is the essence of the question being studied.

Now, as I have already posted, and as yet no one has objected, if Nic’s substitution (which, as argued, is legitimate) is done, the equation can be rearranged to isolate ΔT in one term on the left, and alpha disappears from the equation.

That is why the results are found to be insensitive to alpha. If there is some residual but insignificant correlation, it is likely due to the rather gross approximations and linearisations that are being done.

To put this one to bed and look at the real issues, I think it necessary to agree to drop the entrenched argument about whether this is “circularity” or not.

There is collinearity between the dependent and independent variables as presented in the paper, and if the equation is rearranged to remove it, alpha disappears.

Since the authors have been fairly open about highlighting possible issues, the fact that this was not mentioned suggests that they had not realised it.

Thus there is a legitimate issue that has been raised.

Greg,

I have stated that I find the final results of the paper very surprising. That remains true. Thus I’m not surprised, if it will be found that some part of their hypothesis turns out to be violated by the GCMs. The only thing I have argued for is that the reason is not in the basic circularity of that particular step.

There may be problems in the accuracy of the formula that connects N and F, but there are also many other details that can deviate so much from the hypothesis that the explanation is in some of those assumptions. The use of a linear regression model can also significantly influence the results when the actual relationships may deviate very much from linearity.

Nic did discuss also some of the other problems in the original post. Those may be worth a closer look.

It would be nice to get the strong expectations and the results of analysis to agree better, whether that happens by finding where the analysis fails to describe the CMIP5 model behavior correctly, or by helping in correcting wrong intuitive expectations.

Thanks for the reply Pekka,

I think the continued argument about whether this should be called circularity is getting in the way. It seems to have degenerated into a battle of pride where someone has to say “yes, OK, it is circularity” or someone else has to say “OK, I was mistaken, it isn’t”.

Everyone agrees the result is counter-intuitive and as you say the importance is to understand why.

I think I have pointed out why but no one seems to be considering whether I’m correct and either agreeing or refuting what I’ve pointed out:

Using Nic’s substitution shows where the problem is, because alpha falls out of the equation and we see immediately that the rest of the study could never produce a result that was sensitive to alpha.

Had the authors attempted to validate whether their “innovative” method was capable of showing sensitivity to alpha before using it and publishing, they would have realised this themselves.

Many people incorrectly talk of “using a methodology” where they mean “using a method”. A methodology is a *study* of the method. It seems that in this case there was NO methodology, just an assumption that what seemed like a good idea would work. When it gave a result that tied in with the authors’ desire to explain the pause, confirmation bias kicked in, they concluded they had made a “significant innovation”, and published.

Uncritical peer review at Nature let it through.

Apologies, this has got so long I lost track. This is exactly what Nic did in eqn 6:

ΔT = ΔN / κ + ε

He also notes that this has effectively removed everything but ocean heat uptake. So the continual approximations, substitutions and linearisations have ended up evacuating the rest of the climate system into the error term, which of course is assumed by M&F to be “random”.

Pekka has objected to this substitution but this is precisely the source of ΔF used by the authors and is drawn from the 2013 paper.

Nic’s eqn 6, which is nothing but an algebraic rearrangement of the authors’ present and previous work, shows that a regression where ε is (erroneously) assumed to be randomly distributed is in effect using κ as the only explanatory parameter linking TOA variability to surface temperatures.

The apparent presence of alpha in M&F is thus an algebraic illusion, and no matter what the models do, their result would be insensitive to alpha.
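To make the cancellation explicit, here is the substitution written out, assuming the simple energy-balance form ΔT = ΔF/(α + κ) + ε that Nic quotes later in the thread, together with the definition ΔF = ΔN + αΔT used by Forster et al. (2013). This is a worked sketch of the algebra, not a reproduction of Nic's own derivation:

```latex
\begin{aligned}
\Delta T &= \frac{\Delta F}{\alpha + \kappa} + \varepsilon,
\qquad \Delta F = \Delta N + \alpha\,\Delta T \\
(\alpha + \kappa)(\Delta T - \varepsilon) &= \Delta N + \alpha\,\Delta T \\
\kappa\,\Delta T &= \Delta N + (\alpha + \kappa)\,\varepsilon \\
\Delta T &= \frac{\Delta N}{\kappa} + \varepsilon',
\qquad \varepsilon' = \frac{\alpha + \kappa}{\kappa}\,\varepsilon
\end{aligned}
```

In this form α survives only as a scale factor on the error term, so it cannot act as an explanatory variable for ΔT.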

Curiosity

Spoke to itself all about

Strong expectations.

============

I also objected earlier to their not accounting for the phase lag introduced by a linear relaxation process.

They do mention this obliquely but do not state what it really means or implies.

I would suggest that this is exactly the point at which alpha disappears from the equation. The phase lag and the difference in temporal evolution between driver and climate response (which is not a simple fixed time lag) is precisely what determines the depth of ocean involved and informs us about the parameters of the response.

In sweeping this under the carpet, by calling it quasi-steady state, they are in effect assuming ZERO lag and thus instantaneous equilibration.

From that point onwards, the real climate response that tells us about the sensitivity (and alpha) either gets falsely attributed to ocean heat uptake or dumped into the error term and dismissed as part of random variability.

Through the interstices of mathematics, and the choate cage of logic, shines beaming physics.
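The lag argument can be illustrated with a minimal sketch, assuming a single first-order relaxation dT/dt = (F/α - T)/τ driven by a sinusoidal forcing. The one-box form, the time constant τ and all numerical values are illustrative assumptions, not taken from any CMIP5 model. In steady state the response lags the forcing by arctan(ωτ); treating the response as quasi-steady, with T = F/α at every instant, sets that lag to zero.

```python
import numpy as np


def relax_response(forcing, dt, tau, alpha):
    """Integrate dT/dt = (F/alpha - T)/tau with forward Euler.

    Forward Euler is used only for illustration; dt must be much smaller
    than tau for the result to be accurate.
    """
    temp = np.zeros_like(forcing)
    for k in range(1, len(forcing)):
        temp[k] = temp[k - 1] + dt * (forcing[k - 1] / alpha - temp[k - 1]) / tau
    return temp


def phase_lag(signal, t, omega):
    """Fit signal ~ a*sin(wt) + b*cos(wt) by least squares; return the lag in radians."""
    design = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
    (a, b), *_ = np.linalg.lstsq(design, signal, rcond=None)
    return np.arctan2(-b, a)


dt = 0.01                  # time step, years
tau = 3.0                  # relaxation time constant, years (illustrative)
alpha = 1.2                # feedback parameter, W m-2 K-1 (illustrative)
omega = 2 * np.pi / 10.0   # forcing with a 10-year period (illustrative)

t = np.arange(0.0, 200.0, dt)
forcing = np.sin(omega * t)
temp = relax_response(forcing, dt, tau, alpha)

keep = t >= 50.0           # discard the start-up transient
lag_relaxed = phase_lag(temp[keep], t[keep], omega)
lag_instant = phase_lag(forcing[keep] / alpha, t[keep], omega)  # quasi-steady state

print(f"relaxation lag:    {lag_relaxed:.3f} rad (theory {np.arctan(omega * tau):.3f})")
print(f"instantaneous lag: {lag_instant:.3f} rad")
```

The relaxed response comes out roughly one radian behind the forcing for these settings, while the quasi-steady version has no lag at all; whatever information that lag carries about the response is simply absent from the second treatment.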

===========

The basic question (IMO) boils down to: can F be assumed to be a constant in the same model? If so then the substitution problem is thrown out. But once F depends on T then you cannot use F from any other model or run or it will necessarily bring error.

This is how come M&F’s results led them to believe that feedbacks dominated early years and forcings the long haul when right in the abstract of Forster 2013 he concluded:

“The inter-model spread of temperature change is principally driven by forcing differences in the present day and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095.” -Forster

R Graf, “can F be assumed to be a constant in the same model?”

Not so sure. In one view, a constant is a well known number that is subject to continuing refinement, like pi to a million places, so that its error can properly be ignored and the calculation algorithm can be classed as exact. Then you get the next stage of accuracy,

Rydberg constant: value 10 973 731.568 539 m⁻¹, standard uncertainty 0.000 055 m⁻¹, relative standard uncertainty 5.0 × 10⁻¹², concise form 10 973 731.568 539(55) m⁻¹. Here a measurement is involved, and therefore there is a limitation on significant figures.

Next comes a less rigorous constant such as the Stefan-Boltzmann constant: value 5.670 373 × 10⁻⁸ W m⁻² K⁻⁴, standard uncertainty 0.000 021 × 10⁻⁸ W m⁻² K⁻⁴, relative standard uncertainty 3.6 × 10⁻⁶, concise form 5.670 373(21) × 10⁻⁸ W m⁻² K⁻⁴.

For better symbols, please see http://physics.nist.gov/cgi-bin/cuu/Value?ryd|search_for=abbr_in!

When it is possible to use only far fewer significant figures for a “constant”, a limitation arises because of the physical limits of a measurement, from the absence of a verification method of sufficient validity or from known or unknown exogenous factors entering the calculation of the value of the “Constant”, or 2 or even all 3 of these. Somewhere in this spectrum we move from the nomenclature of a “constant” to a “variable”. To me, the F discussed here is the latter.

Sorry to nit pick, I’m not even sure if this adds to the discussion.

Taking an assumption for a variable from a prior event and plugging it into an equation that has one of its dependencies on the other side, no matter how small or in what form, violates algebra because it violates the arrow of determination. So M&F complied with the First Law but violated the Second Law.

If you’re looking for evidence of Nic’s assertion that M&F used values of F from 2013, all I found in M&F 2015 is a footnote to Forster 2013. But their CLB response contains the following admission:

“Because N is readily available but F is not, Forster et al. (2013), from where the time series of F were taken, used the pre-determined model property α to obtain F by:

F = N + αT (3)

using the N and T that they diagnosed from simulations of the 20th century.”

Dilbert: I’m obsessed with inventing a perpetual motion machine. Most scientists think it’s impossible, but I have something they don’t.

Dogbert: A lot of spare time?

Dilbert: Exactly.

Steve McIntyre posted nine days ago: “I’ve done a quick read of the post at Climate Lab Book. I don’t get how their article is supposed to rebut Nic’s article. They do not appear to contest Nic’s equation linking F and N – an equation that I did not notice in the original article. Their only defence seems to be that the N series needs to be “corrected” but they do not face up to the statistical consequences of having T series on both sides.”

ANY value brought forward from a past trial (or alternate or average) to be placed into an equation brings with it all of its assumptions. If any of those assumptions appear on both sides of the equation, no matter how small, you are breaking the law, period!!!

This is another way of stating the Second Law of Thermodynamics. It must be followed or it is not science.

Please tell me what assumption I am breaking. The silence is deafening. Steve, Nic, are you OK?

R. Graf – please be quiet. You are spoiling a really interesting thread. Let’s leave commentary to those who know what they are talking about.

I don’t know whether repeating the same argument once more with equations makes it clearer to anyone who hasn’t understood my argument already, but here it is one more time.

The basic hypothesis is that the temperature trends produced by GCMs follow the equation

ΔT[i,j] = a[i] + b[i] ΔERF[i,j] + c[i] α[j] + d[i] κ[j] + e[i,j] (1)

where ΔERF[i,j] is determined (almost) totally by externally given input and e[i,j] is internal variability that averages to zero.
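To make the structure of (1) concrete: for each start year i it is a separate cross-model least-squares fit, with one observation per model j. The sketch below fits a single start year on a synthetic ensemble; the ensemble size, the coefficient values and the distributions of ΔERF, α and κ are all hypothetical, chosen only so that the fit can be checked against known values.

```python
import numpy as np


def fit_start_year(d_t, d_erf, alpha, kappa):
    """OLS fit of dT[j] = a + b*dERF[j] + c*alpha[j] + d*kappa[j] + e[j]
    across models j for one start year i. Returns the array (a, b, c, d)."""
    design = np.column_stack([np.ones_like(d_erf), d_erf, alpha, kappa])
    coef, *_ = np.linalg.lstsq(design, d_t, rcond=None)
    return coef


rng = np.random.default_rng(42)
n_models = 75                                   # illustrative ensemble size
true_coef = np.array([0.1, 0.4, -0.2, -0.3])    # hypothetical a, b, c, d

d_erf = rng.normal(1.0, 0.3, n_models)          # forcing trends (hypothetical)
alpha = rng.uniform(0.6, 1.8, n_models)         # feedback parameters (hypothetical)
kappa = rng.uniform(0.4, 1.0, n_models)         # ocean heat uptake efficiencies (hypothetical)
noise = rng.normal(0.0, 0.05, n_models)         # internal variability e[i,j]

d_t = (true_coef[0] + true_coef[1] * d_erf + true_coef[2] * alpha
       + true_coef[3] * kappa + noise)

coef = fit_start_year(d_t, d_erf, alpha, kappa)
print("a, b, c, d =", np.round(coef, 3))
```

Repeating the fit for every start year gives the time series of coefficients; the regression itself knows nothing about how ΔERF was obtained, which is why the construction of the forcing series matters so much in what follows.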

An additional part of the hypothesis is that the internal variability is mainly due to variability in the oceans. When ocean currents bring cold water to the surface we have a cold phase in surface temperatures. When the surface is colder, it radiates less, and that is reflected in a larger downward TOA imbalance. That relationship is assumed to be the same for the temperature trends caused by internal variability as for the establishment of the long-term balance, which defines α. The equation

ΔN = ΔERF – α ΔT (2)

defines α, when the other variables refer to the difference between two long-term equilibrium situations. Model runs in which CO2 concentration is quadrupled and a long subsequent period is simulated are used to determine α. The assumption is that the same α can be used for the influence of internal variability on the 15-year and 62-year trends.
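A standard way of extracting α from such abrupt-4xCO2 runs is a Gregory-style regression of N on T, whose slope is -α and whose intercept is the 4xCO2 forcing. Whether this matches the exact procedure behind the α values used by M&F is an assumption here. The run below is synthetic: a one-box response with hypothetical forcing, α and time constant, so its Gregory plot is exactly linear apart from noise, unlike real models, whose plots are often curved.

```python
import numpy as np


def gregory_alpha(n_toa, temp):
    """Regress N on T (N = F - alpha*T); return (alpha, F) from slope and intercept."""
    slope, intercept = np.polyfit(temp, n_toa, 1)
    return -slope, intercept


rng = np.random.default_rng(1)
years = np.arange(150)
f_4x = 7.4          # hypothetical 4xCO2 forcing, W m-2
alpha_true = 1.1    # hypothetical feedback parameter, W m-2 K-1
tau = 20.0          # hypothetical single-box time constant, years

# One-box response to an abrupt forcing step, plus annual noise in N.
temp = (f_4x / alpha_true) * (1.0 - np.exp(-years / tau))
n_toa = f_4x - alpha_true * temp + rng.normal(0.0, 0.3, years.size)

alpha_hat, f_hat = gregory_alpha(n_toa, temp)
print(f"alpha = {alpha_hat:.2f} W m-2 K-1, F_4x = {f_hat:.2f} W m-2")
```

An α estimated this way describes the long-term response to a large forcing step; applying it to trends driven by internal variability is exactly the extra assumption described above.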

I follow Roman and use the symbol φ for the error in this assumption. Thus φ is defined by

ΔN[i,j] = ΔERF[i,j] – α[j] ΔT[i,j] + φ[i,j] (3)

The estimate of effective forcing is now

ΔF[i,j] = ΔN[i,j] + α[j] ΔT[i,j] (4)

and

ΔF[i,j] = ΔERF[i,j] + φ[i,j] (5)

M&F use the equation

ΔT[i,j] = a[i] + b[i] ΔF[i,j] + c[i] α[j] + d[i] κ[j] + e[i,j] (6)

to estimate the regression coefficients. This differs from the original assumption by including F rather than ERF. We can insert (4) to get

ΔT[i,j] = a[i] + b[i] (ΔN[i,j] + α[j] ΔT[i,j]) + c[i] α[j] + d[i] κ[j] + e[i,j] (7)

but we can also insert (5) to get

ΔT[i,j] = a[i] + b[i] (ΔERF[i,j] + φ[i,j]) + c[i] α[j] + d[i] κ[j] + e[i,j] (8)

Equation (7) has ΔT[i,j] on both sides indicating apparent “circularity”, equation (8) doesn’t. The equations are equally correct, but which tells more correctly about circularity?

In equation (8) we may have an influence of ΔT in φ; in (7) we have an influence of ΔT both explicitly and through ΔN, which is certainly affected by ΔT. Thus equation (7) describes circularity correctly only when both contributions are taken into account. Doing that, we end up with equation (8), which has no explicit circularity but may have something of unknown nature through φ.

The only correct way to discuss circularity in the M&F paper is by discussing the properties of φ. M&F acknowledge that such an error term leads to uncertainty. That error term introduces circularity if the partial derivative of φ with respect to ΔT is non-zero when the other independent variables ΔERF, α and κ are held constant. If that partial derivative is small, the circularity is of little concern. We have no reliable knowledge of φ, but the explicit assumption of M&F is that φ is not so large that it would severely modify the results. Contesting that assumption is legitimate when the arguments used are not weaker than those M&F give for their assumption.
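The role of φ can be illustrated with a synthetic sketch of equations (3)-(5). Everything here is hypothetical: ΔT is a forced part plus internal variability, and φ is set to γ times that internal component, so that ∂φ/∂ΔT = γ at fixed ΔERF and α (κ plays no role in this toy and is omitted). With γ = 0 the regression of ΔT on the diagnosed ΔF recovers the true coefficient; with γ = 1 both the slope and the correlation are inflated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000                              # large synthetic sample for stable estimates
b_true = 0.5                          # hypothetical response of dT to true forcing

d_erf = rng.normal(1.0, 0.3, n)       # true effective radiative forcing trends
alpha = rng.uniform(0.8, 1.6, n)      # feedback parameters (hypothetical)
internal = rng.normal(0.0, 0.15, n)   # internal-variability part of dT
d_t = b_true * d_erf + internal


def diagnose_forcing(gamma):
    """Diagnose dF = dN + alpha*dT (eq. 4), with phi = gamma*internal in eq. (3)."""
    phi = gamma * internal
    d_n = d_erf - alpha * d_t + phi        # eq. (3)
    d_f = d_n + alpha * d_t                # eq. (4)
    assert np.allclose(d_f, d_erf + phi)   # eq. (5) holds identically
    return d_f


results = {}
for gamma in (0.0, 1.0):
    d_f = diagnose_forcing(gamma)
    slope = np.polyfit(d_f, d_t, 1)[0]
    r = np.corrcoef(d_f, d_t)[0, 1]
    results[gamma] = (slope, r)
    print(f"gamma={gamma}: regression slope={slope:.3f}, correlation={r:.3f}")
```

Note that α drops out of the diagnosed ΔF entirely; only φ carries any trace of ΔT, which is consistent with the point that the circularity question is really a question about φ.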

This whole exercise seems like a very complicated version of:

A = B + α - α + ε

A = Beff + a + φ; this does not resurrect α, it simply hides the fact that it is irrelevant.

Further obfuscation of the problem by introducing alternative and additional terms and definitions does not improve understanding; it just buries it another layer deeper. It seems that the authors confused themselves.

Perhaps you could consider my reply to your last comment, where I point out that the lag of the climate response is being inadvertently equated to zero.

While the logic of doing this was that the equilibration time is sufficiently shorter than the averaging period, it is a mistake to ignore it.

This move is *asserting* that the temporal difference between driver and surface effect is zero and *assuming* that the error this introduces does not matter.

As I have pointed out, this is equivalent to saying there is instantaneous equilibration: zero depth of ocean involved in the climate response and a time constant tau of zero, with the errors this induces dumped into the supposedly “random” error term.

Santer et al. cited earlier works finding a time constant of 30-40 months for the CMIP5 models.

Nic’s eqn 6 shows that what is really being done is regressing N and T via κ and ε. The banished relaxation response, present in both the models and the physical climate, is being partly interpreted as ocean heat uptake and the rest as “random error”.

The new eqns 7 and 8 simply add four or five more parameters and four or five more variables; this is an exercise in over-fitting the data that adds no information.

Until the importance of that inappropriate simplification is addressed, the rest becomes academic. IMO.

Since it is always better to visualise what all this talk of relaxation response is all about, I again suggest looking at my recent article on Judith Curry’s site:

http://judithcurry.com/2015/02/06/on-determination-of-tropical-feedbacks/

Start with pictures😉

Note that the models have a tau far longer than the climate system, but that is a story for another day.

The form of the temporal relationship is still typical of what happens.

Now clearly if we are going to start out by saying that the difference between the light blue and the dark blue line “doesn’t matter”, we’re going to be dumping most of the information about climate response into the “random error” box.

If further analysis turns out to be “insensitive” to values of TCS, this should not be surprising.

Now most here, including Pekka, find the results surprising and I’m suggesting this is why.

Pekka,

Using equations to spell out your argument does indeed help. You wrote earlier (comment 752295) that it was hypothesised that:

“Real ERF is affected very little by internal variability. Thus M&F must assume that the contribution of internal variability to N is approximately -α times the change in temperature due to the variability.”

I agree that is being assumed. To test this assumption, I have examined the correlation between non-overlapping 62-year trends in N and T in ten CMIP5 model preindustrial control runs: nine models used by M&F and one model with an exceptionally long control run. The correlation was only significantly (p=0.05) negative in one case, and it was weakly positive in four cases. Internal variability appears able to change forcing, not just GMST, on multidecadal timescales, in models as well as the real world. The standard deviation of 62-year trends in T in control runs is material – typically a 0.1 to 0.2 K change.
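The kind of check described above can be sketched as follows: split each control-run series into non-overlapping 62-year blocks, fit an OLS trend in each block, and correlate the N trends with the T trends. The synthetic "control run" below is hypothetical and is deliberately built so that N does respond as -α T plus noise, so the correlation comes out strongly negative; the point of running the test on real control runs is that it need not.

```python
import numpy as np


def block_trends(series, window):
    """OLS slopes over consecutive non-overlapping windows of length `window`."""
    n_blocks = len(series) // window
    t = np.arange(window)
    return np.array([
        np.polyfit(t, series[k * window:(k + 1) * window], 1)[0]
        for k in range(n_blocks)
    ])


def trend_correlation(n_toa, temp, window=62):
    """Pearson correlation between the block trends of N and of T."""
    return np.corrcoef(block_trends(n_toa, window), block_trends(temp, window))[0, 1]


# Synthetic control run: slowly wandering T, with N = -alpha*T plus noise (alpha = 1.2, hypothetical).
rng = np.random.default_rng(3)
n_years = 62 * 20
temp = np.cumsum(rng.normal(0.0, 0.1, n_years))
n_toa = -1.2 * temp + rng.normal(0.0, 0.5, n_years)

r = trend_correlation(n_toa, temp)
print(f"correlation of 62-year trends in N and T: {r:.2f}")
```

A correlation near zero or positive on real control runs, as reported above for several models, would mean internal variability is changing N through something other than the -α ΔT pathway.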

There is another problem. Variations across models in how well the equation ΔT = ΔF/( α+κ) holds in relation to ‘true’ forcing, arising from causes other than internal variability (such as time or state dependence of α and κ), will lead to error in estimating ΔF using (4) in the same direction as the error it causes in estimation of ΔT. That will artificially boost the correlation between ΔT and ΔF and hence the apparent ability of ΔF differences to explain intermodel differences in ΔT, similar to the effect of internal variability that affects ΔF.

Nic,

I don’t have any immediate comments on what you wrote. I just observed that Pehr Björnbom has written a comment to CLB on changes over the full 144-year period available. I haven’t studied that comment carefully, but his approach seems similar to what I have had in mind but not done. The results of that calculation seem closer to what I would expect intuitively, as α and κ have a stronger influence on temperature than in M&F.

There may be something interesting to learn in the unexpected results of M&F, whether they turn out to be reasonably correct or seriously erroneous. Many uncertain assumptions enter in their analysis, but how much the inaccuracies of each of them affects the outcome is difficult to tell without quantitative work. I haven’t spent the effort needed to obtain CMIP5 data, and have presently no plans to proceed on that.

I tried to find in Forster et al 2013 some comments on the differences between multiple entries from the same model in the CMIP5 ensemble. Those differences might help in testing some ideas. They seem to say in every case that they have used the average of such entries for quantities supposed to be independent of internal variability, without any comment on the spread between the results. You might have the data and the tools to look at that.

Pekka:”There may be something interesting to learn in the unexpected results of M&F, whether they turn out to be reasonably correct or seriously erroneous.”

Since M&F’s unexpected-results-producing analysis was built on assumption piled on top of assumption, wouldn’t it have been sciency of them to have done the kind of checking that has been done by Nic and Pehr? And I wonder what’s stopping them from doing some testing to find support for their methods and assumptions, after having seen the criticisms from Nic, Ross M, Roman M. et al. I am also wondering why Pekka hasn’t done something along those lines, given the amount of time and effort he has spent defending this baloney. But what do I know.

Don,

There are many kinds of scientific papers, and that’s as it should be.

Looking at the results, I have the suspicion that the CMIP5 ensemble cannot really answer the questions M&F try to figure out. The models and model runs perhaps do not cover sufficiently different combinations of forcing and sensitivity feedback parameters to allow for strong conclusions, or even to avoid misleading results. The limited information content of the ensemble on the issues analyzed forces them to use a very simple model. Choosing a linear regression model is methodologically safe, but perhaps a nonlinear model based on the original equations would still have been better. The issues of statistical analysis would, however, have been more complex in that case. A maximum likelihood method might perhaps have been appropriate for estimating the parameters of such a model.

Many alternatives for this analysis can be proposed in retrospect (including the alternative of forgetting the whole idea). They picked one, and got this much out of it.

I had a quick look at the CMIP5 database and noticed that using it takes some effort, as Nic said in one of his comments. That made me stop digging deeper into it myself. I never planned to spend this much time explaining my views of the method, so no choice was based on an estimate of that effort.

I have tried to write several comments in the way that would clarify also some realities of scientific work that apply more generally than only to this paper. In that this paper has acted as a case study rather than the goal by itself. (Whether I have succeeded in that at all, I don’t know.)

The whole concept of an ensemble of models, or an ensemble mean, or averaging different runs of the same model... are all dubious statistical constructs.

Pekka, this is not some inconsequential little paper in a backwater science, like entomology. We wouldn’t be talking about an obscure journal publishing speculative research full of unexplained assumptions, on why green eyed gnats prefer diddling on Thursdays.

“The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.”

That’s blatant propaganda not supported by the research. You know that. The authors have gone into hiding. They aren’t going to provide their data and methods for examination. They don’t deserve the defense you have been putting on for them. Cut em loose. Just my very humble opinion.

Don,

The paper is a small contribution to science. It draws attention by having been published in Nature, but that attention does not last for long.

The attention has lasted long enough for political purposes. You are racehorsing this one, Pekka. Science is not being well served.

Nic Lewis:

Yes indeed.

Thanks for taking the time to write this out carefully.

There is a very old English expression, believed to date from 1579, which seems to apply here: “You can’t make a silk purse from a sow’s ear.”

Especially when this particular sow’s ear has been put through a meat grinder, too. They are trying to infer the surface properties of the skin from the ensuing pulp.

Accepting that I know the least of anyone here, I can still tell this equation is not just shagged out after a long squawk.

R Graf–

In the words of the man in “The Princess Bride”, “You have made one of the all-time classic blunders!” Your blunder is in stating that you know the least of anyone here–that dubious honor certainly applies to moi instead. But hey, I get to learn a few things, and from you as well. Yes, you gloriously said it: “This equation is not just shagged out after a long squawk.” To paraphrase someone else in some other universe, “My computer takes so long to shut down that I am thinking of naming it ‘M&F at CA’.”

Pekka wrote: “It’s totally clear that their regression formula cannot be derived. M&F write

“This equation holds for each start year separately and suggests ..”

They do not say that the following formula follows from the previous, they say that the previous arguments suggest that the linear regression model might be useful.

You have also questioned the separation of dependence on α and κ. That’s a choice made to separate the two influences, whether they really turn out different or not. That makes sense, because the values of α and κ obtained from Forster et al (2013) for the different models are almost totally uncorrelated. The dependencies on α and κ might be expected to deviate more for the 60 year trends than for the 15 year trends, as their sum is inversely related to TCR while α alone is inversely related to ECS.”

Frank replies: And I can add a population term to their regression equation and say that it is “suggested” by UHI. That term will improve the fit. In the abstract, M&F claim that their regression approach is “physically motivated by surface energy balance”, but their algebra and regression for energy balance is wrong. Surface energy balance considerations suggest a different model. One can play “assume a statistical model”, but Doug Keenan and the Met Office have taught me that a model must be based on physics or you can’t “prove” that it is “significantly” warmer today than it was a century ago. The regression equation must have the correct physics – especially when you INTERPRET the residuals as unforced variability.

Compare the histograms for 1998-2012 trends in Figure 1c and Figure 2e. Add the ensemble mean trend of about 0.2 degC/decade to the Figure 2e histogram so we are looking at the total trend in both cases. Processing the output from 75 of 114 model runs through M&F’s flawed regression model has widened the histogram so that 1998-2012 is no longer an outlier. If their model is obviously flawed, what have they accomplished? They haven’t separated deterministic variability from unforced variability. One-third of the model output (39/114) wasn’t suitable for their analysis. And now the 1998-2012 outlier in the output from 114 models is magically within M&F’s 5-95% confidence interval.

If one looks closely at Figure 2a of M&F15, the error bars fail to connect or barely connect the ensemble mean to observations more often than appropriate for a normal distribution. For 1954-1956 the 5-95% error bars are actually too short to bridge the gap between observations and the ensemble mean. From 1950-1953, the error bars barely span the gap. The error bar for 1927 just spans the gap, but 1928-1931 are nearly as bad. From 1962-1965, the error bars don’t bridge the gap, while 1961 and 1966 span the gap. The error bars fail to span the gap in 1995, 1997 and 1998, while just barely spanning the gap 1990-1994 and 1996. I count 10 years out of 98 where the error bars don’t span the gap (not unreasonable) and 16 more years when they barely span the gap (probably unreasonable). That would put 25% of the data outside the 10-80% confidence interval. Of course, these errors are not randomly distributed; they mostly occur because excessive cooling by aerosols in the models creates excessive warming and cooling trends when aerosols change.

Note that M&F cleverly put the error bars on the OBSERVATIONS in Figure 2a, not on the ensemble mean, where they belong. If they had put the error bars on the ensemble mean, we would see that their 5-95% confidence interval of 0.26 degC extends up to +0.57 degC/decade for 1992-2006 and down to -0.28 degC/decade from 1951 to 1956. This is a total change over 15 years of +0.86 degC (equalling all of 20th century warming) and -0.42 degC. For the 20-year period beginning in 1951, more than -0.5 degC. (Negligible cooling was actually observed during this period.)
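The 15-year totals quoted above are just the per-decade trends multiplied by 1.5 decades:

```latex
\begin{aligned}
+0.57\ {}^\circ\mathrm{C/decade} \times 1.5\ \mathrm{decades} &= +0.855 \approx +0.86\ {}^\circ\mathrm{C} \\
-0.28\ {}^\circ\mathrm{C/decade} \times 1.5\ \mathrm{decades} &= -0.42\ {}^\circ\mathrm{C}
\end{aligned}
```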

RE: Pekka Pirilä Posted Feb 18, 2015 at 4:37 PM

“The paper is a small contribution to science.”

As much as I (and most here) appreciate your efforts to show there is always another side . . . I don’t think I’ve understood your position on the main point of the paper.

IE: “The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.”

I think you provide an excellent argument, perhaps from a different point of view/perspective than many (most) of the commenters here . . . however . . .

Pardon my spelling and interpretation but, rehellisesti sanoen (honestly speaking): do you believe climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations?

I have asked Pekka the same question already a week ago. Pekka answered as a true politician; a no-answer.

I have posted the following question at Climate Lab Book. It’s my first post, so it is held for moderation. I’ll see whether they accept it and what response they have.

These kind of non validated, home-spun techniques are typical of much published work in climatology and are one of the main problems that the Wegman report highlighted in 2006.

Panel a of their figure 3 shows the 62-year trends. It would be easier to visualise had they used the mid-point of the period rather than the start date for the x axis, so it is helpful to add 31 years to the dates they use. The x axis would then show that the periods are centred on 1931 to 1983.

http://www.nature.com/nature/journal/v517/n7536/fig_tab/nature14117_F3.html

We can see that ensemble mean trends centred on 30s and 40s were well below the observational data. Firmly outside the declared measurement uncertainty.

As modelled AGW kicks in, they begin to narrow the gap. There is a limited period from 1975-1980 where they match, then a steadily increasing divergence on the hot side, which becomes more marked in the last 7 years of the record.

This is also highlighted in figure 2a,b I excerpted above.

That is, even after taking a 62 year average that will remove AMO, PDO and ENSO variability there is a very clear progression across the full range of the record from serious underestimation to an increasing over estimation of the rate of change of temperature.

Plotting the difference of the 62 year trends of the ensemble mean and HadCRUT4 and comparing to the calculated AGW should be a good indication of the degree to which they are over-estimating AGW.

Though their method is untested and their conclusions about alpha probably erroneous, there is some useful information to be gained from some of the other information provided in this paper.

http://www.theguardian.com/science/2015/feb/18/haruko-obokata-stap-cells-controversy-scientists-lie

Yes, I saw that article, entitled “Why scientists lie?”

It is interesting to compare how that issue was dealt with to how such issues are handled in climatology.

At least the Japanese still have enough honour and integrity to know when to fall on a sword.

The “prestigious”😉 journal Nature was at the heart of that one too.

greg, “Panel a of their figure 3 shows the 62 years trends.”

But what are they comparing? To get an apples-to-apples comparison I have to use 70% tos (ocean) and 30% tas (air) for hadcrut4, or a combination of ERSSTv4 with Berkeley. The models mainly miss the oceans, and since there isn’t an official marine tas, there isn’t an official “global” tas. Comparing model tos in degrees with global ocean SST in degrees provides a more realistic comparison. Comparing model land tas to Berkeley land tas in degrees takes more time, but is better than comparing anomalies on an arbitrary baseline.
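The blend described above can be written as a simple fixed weighting. In this sketch the 70/30 split is the one suggested in the comment, `tas_land` is assumed to be a land-only tas mean, and the input temperatures are made up. A fixed global weighting ignores sea ice and the spatial pattern of change, so it is only a first approximation to a proper masked blend.

```python
import numpy as np


def blended_mean(tos, tas_land, ocean_fraction=0.7):
    """Fixed-weight blend of ocean SST (tos) and land near-surface air
    temperature (tas_land) global means.

    A crude approximation: a careful blend would combine the fields grid
    cell by grid cell using land, ocean and sea-ice masks.
    """
    tos = np.asarray(tos, dtype=float)
    tas_land = np.asarray(tas_land, dtype=float)
    return ocean_fraction * tos + (1.0 - ocean_fraction) * tas_land


# Hypothetical absolute temperatures (degC) for two years.
print(blended_mean([18.0, 18.2], [9.0, 9.5]))
```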

I agree in principle that these sea+land, water+air means are physically corrupt. I usually ask: what is the average of an apple and an orange? Answer: a fruit salad.

However, eternal arguments about circularity aside, there are some things that can be taken from this paper that the authors will not be able to argue against if we use their choice of data and their graphs.

It appears that simply subtracting HadCRUT4 from the ensemble mean leaves a pretty clear rising bias that looks a lot like the progression of CO2 “forcing”.

It seems that this would be a much clearer indication of the fact that the models are over sensitive to CO2.

“The models mainly miss the oceans and since there isn’t an official marine tas”

Its called MAT.

http://www.metoffice.gov.uk/hadobs/hadisst/HadISST_paper.pdf

http://www.ncdc.noaa.gov/bams-state-of-the-climate/2009-time-series/mat

and there isnt an “official” anything.

well mosh…

if the complaint is “there isn’t an official marine as”

and your response is “Its called MAT”, followed by

“….there isnt an “official” anything.”, one is hard pressed to understand your response. Is there an official marine TAS?

apparently according to you there is…and then, there isn’t.

BTW,

there is an official soft drink of the olympics…

so to write that there isn’t “an “official” anything” is,

on its face, incorrect.

For someone who has an obsession with pissant corrections and usage, you’re pretty sloppy today.

That’s “official marine TAS”, my apologies.

Mosher, there is no official anything. I thought you were the official spokes model for models🙂

There are MOHMAT and HADMAT1 which are attempts at a night marine air temperature but with the models it appears you only have tas, tas max, tas min and tos to choose from. So if you compare the model mean tas to hadcrut4 you have apples and oranges. I believe you told me “I” can’t compare models to observations, but apparently M/F are trying to do just that.

By comparing the 62 year model anomaly with the 62 year hadcrut4 anomaly they are completing the fruit salad. When Berkeley published their product they (that would include you) provided a global mean “temperature” with a remarkable +/- ~0.06 C of uncertainty and the models produce a “temperature” output. That should allow a more meaningful direct comparison of real temperatures not anomalies, unless of course your uncertainty interval is meaningless.

simple david.

there are many MAT products. see the link.

none is labelled “official”

same with SAT. there are 5 or more products. each slightly different.

there is no international body that says This is the official.

there are versions. and folks argue about which is best.

so there is MAT for the observations (several)

none is labelled “official” that I know of.

Finding a certification would be your first step

sorry mosh…

there is always the issue of ambiguity in your brief posts…

I inferred that your quote “its called MAT” referred to the closest thing to an “official” Marine product, as you put it…apparently you were correcting the commenter and letting him know that the time series in question wasn’t TAS, it was MAT.

BTW, even though you can be quite prickly, you were of great help to me when I first encountered R, so I think kindly of you for that.

David, for the CMIP5 model runs there were “official” as in recommended data sets. For SST and sea ice there is a merged product of HADISST1 and version 2 of the NOAA optimally interpolated ocean temperature data sets. The models output a tas (temperature air surface) for the globe, meaning 70% of that would be a marine tas which may or may not be equivalent to one of the MAT products. Since the original CMIP5 model runs for AR5 there have been a few changes to some of the versions, and volcanic forcing has a new reconstruction by Crowley and Unterman 2013 which is considerably different from the volcanic forcing estimates used for AR5.

As I said, most of the model misses are sst related and likely due to outdated volcanic forcing estimates among other things. I don’t consider model error due to poor input data to be a very good test of model “natural” variability emulation.

Either what M&F did was just too sophisticated for any of the dumb readers here to understand, or it is not clear what they did and what it means even to a roomful of statisticians and engineers. After 700 comments and applying Occam’s razor, I conclude the latter.

I think what they did was too sophisticated for the authors to understand too. But the answer fitted their personal biases so they concluded it must be “right”.

Peer reviewers at the “prestigious” journal Nature were as uncritical as ever and it got published.

The protracted discussion here has been trying to identify how they got to this surprising result and to find where they went wrong.

This whole waste of time would be unnecessary had the authors attempted to validate their “innovative” method before publishing a study based on it.

Once upon a time, when publishing a new method, the custom was to show that it worked. First.

The authors do not even seem to realise that what they were doing with their sliding “trend” is applying a running mean filter to the rate of change of temperature.

Running mean is a crappy low-pass filter that introduces a lot of distortion.

Looking at the spike in 1991 in their figure 2a of 15y “trends”, it is obvious that they have a significant amount of sub-15y variability in their result.

On top of the climate “noise”, measurement error, the additional errors introduced by piling on linearisation approximations at all stages, and ignoring the lagged nature of the response, they then insert further distortions through poor data processing.

It is really unsurprising that this method fails to detect anything.

Therefore, that failure to detect an influence of α and κ tells us nothing about climate or climate models but tells us a lot about the competence and rigour of these authors.

On the evidence of this paper I have a lot of trouble agreeing with Pekka’s suggestion that Piers Forster is an “expert” and we amateurs may be mistaken.

This paper is frankly amateurish.

Greg –

The sliding trend is not the same as a running mean filter on the rate of change of temperature. It’s equivalent to a low-pass-filtered version of the temperature first differences, with a weighting function which is parabolic in shape, similar to a Welch window.
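HaroldW’s point can be checked numerically. The sketch below assumes unit-spaced samples and NumPy, and uses a made-up random-walk series. It compares a brute-force sliding OLS slope with a weighted moving sum of first differences, where the weights are the parabola that falls out of the OLS slope formula for x = 0..n-1.

```python
import numpy as np

def sliding_ols_trend(y, n):
    """OLS slope over every window of n consecutive, unit-spaced samples."""
    x = np.arange(n)
    return np.array([np.polyfit(x, y[i:i + n], 1)[0] for i in range(len(y) - n + 1)])

def parabolic_difference_weights(n):
    """Weights on the n-1 first differences inside one window.

    For x = 0..n-1 the OLS slope equals sum_k w_k * (y[k+1] - y[k]) with
    w_k = 6 (k+1)(n-1-k) / (n (n^2 - 1)): a parabola that sums to 1.
    """
    k = np.arange(n - 1)
    return 6.0 * (k + 1) * (n - 1 - k) / (n * (n**2 - 1))

def sliding_trend_via_differences(y, n):
    """The same sliding slope, computed as a weighted moving sum of first differences."""
    return np.correlate(np.diff(y), parabolic_difference_weights(n), mode="valid")

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=120))   # hypothetical random-walk series
n = 15
print(np.allclose(sliding_ols_trend(y, n), sliding_trend_via_differences(y, n)))  # True
print(parabolic_difference_weights(n).round(4))  # largest mid-window, small at the ends
```

A plain running mean of the differences would use equal weights 1/(n-1); the sliding trend instead down-weights the differences near the window edges.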

HaroldW:

Whether they were aware of this is another question.

Nick has a couple of posts up on derivative (Savitzky-Golay) filters that comment on this.

(But for some reason he only shows the real part of the transfer function. I never got around to asking him why.)

Thanks Carrick, I hadn’t seen Nick’s posts.

Carrick,

“(But for some reason he only shows the real part of the transfer function. I never got around to asking him why.)”

The filters are real and either symmetric or antisymmetric, so either Im or Re is zero. I show the non-zero part.
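A short sketch of why that holds, assuming NumPy and a filter length of 15 chosen only for illustration: a real symmetric filter (here a running mean) has a purely real frequency response, while a real antisymmetric one (here the sliding OLS slope weights) has a purely imaginary one, so plotting the non-zero part loses nothing.

```python
import numpy as np

def frequency_response(h, freqs):
    """Complex response H(f) = sum_k h_k exp(-2*pi*i*f*k), with k centred on the middle tap."""
    k = np.arange(len(h)) - (len(h) - 1) / 2
    return np.exp(-2j * np.pi * np.outer(freqs, k)) @ h

n = 15
k = np.arange(n) - (n - 1) / 2
smoother = np.ones(n) / n          # symmetric filter: a running mean
trend = k / np.sum(k**2)           # antisymmetric filter: sliding OLS slope weights

f = np.linspace(0.0, 0.5, 201)     # cycles per sample, up to Nyquist
H_s = frequency_response(smoother, f)
H_t = frequency_response(trend, f)

# Only rounding error should survive in the "missing" part of each response.
print(np.abs(H_s.imag).max())      # symmetric -> purely real
print(np.abs(H_t.real).max())      # antisymmetric -> purely imaginary
```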

That makes perfect sense, thanks.

C-Lion Man

Craig,

I suspect that you are right of course, but all the 700+ comments on this thread may not amount to much.

What is needed is a formal publication which lays out the logical and procedural problems with the M&F paper, similar to how O’Donnell et al pointed out the serious flaws in the Steig et al Antarctic warming paper that made the cover of Nature in 2009. Like M&F, Steig et al appeared motivated by a desire to ‘explain’ an apparent discrepancy between models and reality. Like Steig et al, M&F adopted very doubtful methods, and claim as credible results which are contrary to any reasonable expectation.

If no formal refutation of M&F is ever published, then this very dubious paper will for the next decade or more be trotted out as an ‘explanation’ for the divergence between the CMIP5 ensemble projections and measured reality, or at least until the divergence is so large that even M&F’s ‘internal variation’ can’t explain it. I accept that the M&F authors believe their analysis, as did the authors of Steig et al. I also believe they are similarly mistaken; perhaps they were beguiled by results which fit their hopes/expectations, just as Steig et al almost certainly were.

I still find it very odd that so many recent papers reach the conclusion of ‘no significant error in the models’, despite their divergence from reality, even while those papers reach that conclusion via a dozen different assumed mechanisms. William of Ockham would perhaps have a different suggestion for the cause of model/reality divergence.

” Like M&F, Steig et al appeared motivated by a desire to ‘explain’ an apparent discrepancy between models and reality. Like Steig et al, M&F adopted very doubtful methods, and claim as credible results which are contrary to any reasonable expectation.”

It would be interesting to know how many doubtful innovative methods they tried before they came up with the results they were after. We should compile a list of the other numerous examples of the climate science method: confirmation bias motivated by noble cause corruption.

Don,

I do not suggest willful effort to find a desired result. In Steig et al, the authors made choices in their analysis (e.g. the small number of retained PCs) which basically smeared substantial peninsula warming over the entire continent… while peninsula warming mostly disappeared (in conflict with reliable on-the-ground measurements for the peninsula!). The rest of Antarctica, with fewer ground measurements, was artificially warmed by the smearing. Steig et al were expecting/hoping to discover warming over the whole of the continent, so my guess is they didn’t critically examine whether their analysis choices made any sense… In light of the loss of peninsula warming, it seems pretty clear they didn’t make sense, but the authors probably were not aware of that. I think M&F fall into the same kind of trap. As Feynman noted, the easiest person for you to fool is yourself.

Yeah Steve, you can tell how sincere they are by how conscientiously they make their data and methods available to those who want to check their work and by how willingly they own up to their errors.

There’s circularity in the duct-taping around and around and around the models.

=============

“I still find it very odd that so many recent papers reach the conclusion of ‘no significant error in the models’, despite their divergence from reality”

I’m not at all sure that is what the current paper shows, even if some may conclude that.

What it shows is that the current divergence is no larger than past divergence.

Put this the other way around: the models have been consistently as bad in hindcasting the period with known data that they were trying to match as they have been in anticipating the lack of warming.

The divergence is not a new problem; models have always been this bad. Too much attention has been focused on a relatively short period between 1975 and 1998, which they were tuned to reproduce more accurately.

Look at their fig 3: the divergence in the earlier 62y periods (centred on 1930-1940) was far worse than the divergence at the end.

Their fig 2a,b also shows this.

The authors should be credited for pointing this out. The reader should not allow himself to be guided only by the abstract.

Greg,

I guess it depends on how other people ‘use’ the results from M&F. It is clear from headlines that M&F is already being used to say that the current divergence of models from reality is “not significant”. If the authors are trying to say the models are overall pretty poor over the entire instrumental period, then they ought to say that in response to inaccurate claims about the conclusions of their paper.

Makes it a lot easier, Greg, when the historical ‘data’ is constantly ‘homogenised’ to convergence with the hindcasts. In either direction, they are not fit for purpose.

Steve, was Steig et al ever retracted, or a corrigendum issued?

No.

https://climateaudit.org/2009/08/05/the-steig-corrigendum/

You may delete my previous comment.

If our aim is to improve the methodology, we will need to clarify the problem to make it apparent to an audience outside of top physicists and statisticians. A lawyer would break the issue down for a jury, but it helps the lawyer think too. What if we model the method in everyday terms, like travel time to work? What if we start with the assumption that we can break travel time down into time stopped in traffic and time in motion? We know those two items account for total time, but we want to account for traffic and possible breakdowns as well, so those are variables we add.

travel time = (time stopped + time moving) (traffic factor) + Unpredictable breakdowns

t = (S + M)(f) + U

We can run tests when there is no traffic to normalize perfect traffic factor as f = 1

How about if one of our climate model experts translates Forster’s and M&F’s equations into these terms and describes it as a historical travel time study? Then one of our statisticians can test validity, if we even need to get to that point. Just a suggestion. Anybody in?

BTW, I have a comment in moderation. I don’t know why, but it can be discarded.

Still trying to get my head around this particular issue. I wonder if the following analogy is apt? When using the least squares method to analyse trends, a requirement is that the individual results are statistically independent (I am led to believe, though I have never actually done this, that this is because the calculation requires the inversion of a large matrix, and if the data points are independent the matrix will be “orthogonal”, making this step trivial). Of course you can apply least squares with data elements that are not independent and you will get results. However, the best that can be said of such results would be that the estimates of the errors are overstated, and at worst meaningless?

Many thanks to anyone who can take the time to respond.
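The error-estimate part of that question can be made concrete with a small Monte Carlo sketch. It assumes NumPy, and the AR(1) noise model, series length and persistence are all hypothetical choices. It fits a straight line to trend-plus-AR(1) noise many times and compares the actual scatter of the fitted slopes with the standard error that the textbook formula reports when it assumes independent errors. For positively autocorrelated noise, that textbook error bar tends to come out too small rather than too large.

```python
import numpy as np

def ar1_noise(n, phi, rng):
    """AR(1) noise e[t] = phi * e[t-1] + white noise, started in its stationary state."""
    e = np.empty(n)
    e[0] = rng.normal() / np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal()
    return e

def ols_slope_and_naive_se(x, y):
    """OLS slope plus the textbook standard error, which assumes independent errors."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
n, phi, reps = 100, 0.8, 2000           # hypothetical series length, persistence, trials
x = np.arange(n, dtype=float)
slopes, naive_ses = [], []
for _ in range(reps):
    y = 0.01 * x + ar1_noise(n, phi, rng)
    b, se = ols_slope_and_naive_se(x, y)
    slopes.append(b)
    naive_ses.append(se)

actual_sd = np.std(slopes)              # real scatter of the trend estimates
typical_naive_se = np.mean(naive_ses)   # what the independence formula claims
print(actual_sd / typical_naive_se)     # well above 1 for positively autocorrelated noise
```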

Yes, there are many conditions for OLS to give the “best linear unbiased estimator” that are being gleefully ignored.

Sadly this is not limited to climatology, although the level of general incompetence in this field leaves one reeling.

Another condition that is often ignored (but is not the case here) is that the x variable should have negligible error.

That is particularly pertinent in the many attempts to estimate climate sensitivity by regressing dRad on dT.

I wrote an article on this, much of which was incorporated into my recent article on Judith’s blog.

http://climategrog.wordpress.com/2014/03/08/on-inappropriate-use-of-ols/

Some corrections can be applied in some cases, but the first requirement is to know what you are doing and avoid improper regressions in the first place, if possible.

I thought it was Nature that had recently adopted a policy of having at least one statistically competent reviewer on all climate-related papers.
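The x-error condition can be illustrated with a toy sketch of regression dilution, assuming NumPy; the slope, noise levels and sample size are made-up numbers. When the regressor carries measurement error, the OLS slope is pulled toward zero by the factor var(x) / (var(x) + var(error)), which is the concern when dT is the noisy regressor in a regression of dRad on dT.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_slope = 2.0                          # hypothetical "true" sensitivity of y to x
x_true = rng.normal(0.0, 1.0, n)          # signal in x, variance 1
y = true_slope * x_true + rng.normal(0.0, 0.5, n)

x_noise_sd = 1.0                          # measurement error in x, as large as its signal
x_obs = x_true + rng.normal(0.0, x_noise_sd, n)

ols_slope = np.polyfit(x_obs, y, 1)[0]
# Classical errors-in-variables result: slope shrinks by var(x) / (var(x) + var(error))
expected = true_slope * 1.0 / (1.0 + x_noise_sd**2)
print(round(ols_slope, 3), expected)      # slope lands near 1.0, not 2.0
```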

Was I dreaming when I read that?

PS: another condition is that the “error” terms (i.e. all that is not the linear relationship being sought) should be random, with a Gaussian distribution.

There is quite some flexibility on this, but strong cyclic variability, for example, will bias the result.

Also a significant lag will decorrelate the relationship.
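A toy illustration of the lag point, assuming NumPy and a hypothetical AR(1) driver with persistence 0.9. The “response” is an exact copy of the driver delayed by a fixed lag, yet regressing it on the driver at the same time step recovers only about phi**lag of the true one-to-one relationship.

```python
import numpy as np

def ar1(n, phi, rng):
    """AR(1) series x[t] = phi * x[t-1] + white noise, started in its stationary state."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def lagged_slope(x, lag):
    """OLS slope of y[t] = x[t - lag] regressed on x[t] (same-time pairing)."""
    if lag == 0:
        return np.polyfit(x, x, 1)[0]
    return np.polyfit(x[lag:], x[:-lag], 1)[0]

rng = np.random.default_rng(3)
phi = 0.9                                  # hypothetical persistence of the driver
x = ar1(200_000, phi, rng)

for lag in (0, 5, 10, 20):
    # Fitted slope versus the theoretical value phi**lag for an AR(1) driver
    print(lag, round(lagged_slope(x, lag), 3), round(phi**lag, 3))
```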

If my memory serves, “Science” made that stipulation.

Least squares methods (and maximum likelihood estimation) do not require that any of the variables be independent of each other. What is important is the proper identification of the (unobserved) random components in the statistical model and how they relate to each other and to the observed data being analyzed. The mathematics can then deal with the estimation of unknown parameters and of the values of the random variables themselves.

In the case where a variable containing such a random component appears more than once in the set of equations defining the relationships in the system, one needs to ensure that all of the appearances of the randomness are properly taken into account when applying the math. Otherwise, the results will not be reliable and any statistical interpretation of those results will be incorrect.

In the regression in M and F, the authors posit that ΔT can be decomposed additively into a deterministic component and a random component: ΔT = ΔT″ + ε. The same ΔT is used in defining ΔF: ΔF = α ΔT + ΔN = α (ΔT″ + ε) + ΔN. The only way ε can disappear from that equation is to have ΔN = ΔN″ − αε, where ΔN″ (as well as ΔF) is either deterministic or itself has a random portion which is independent of ΔT. This is what I meant in an earlier comment about ΔN “masking” the effect of ΔT in defining ΔF. Assuming that ΔN is of this form does not seem to me to be warranted, so carrying out the simple regression done in M and F would be flawed.

Roman, your breakdown of the problem in common sense terms is the best that I have seen.

Am I understanding correctly that the Forster equation simply did not break out variability as a factor? Yet it certainly was there and had a hand in determining F, radiative forcing.

Then M&F write their equation with F and the variability in it is forgotten. Is this right?

Also, whereas all of these relationships are circular and the only thing that allows the variables to manifest is their relative kinetics, it seems that time is a critical missing component to all the equations. After all, the way the equations are written the effects are reversible. (I am not suggesting time should be added and anyone spend their time trying to solve it. They should have programmed the computer to output the data that was desired.)

Thanks for the expert input, Roman.

“What is important is the proper identification of the (unobserved) random components in the statistical model and how they relate to each other ”

Isn’t this a rather liberal use of the word “random”? Something the authors do liberally😉

The description that they are “random” implies that they have no effect on the regression estimation. Indeed the authors frequently use the adjective random with the implicit assumption that they can then be ignored.

In the purest sense OLS assumes normally distributed “errors”, doesn’t it?

It seems to me that a lot of the reason for the surprising conclusions of this paper is that they are shunting off statistically important variability into “random error” terms and duly ignoring them entirely.

If you are still in agreement with Nic’s substitution, it would appear that the dependency on alpha has been shunted off into the error term.

Could you comment on that interpretation?

Just a follow up on RomanM’s comment… while it is true that the variables (basis functions) of the LSF need not be orthogonal to each other, you do pay a price when they are not.

You get “noise amplification” which is proportional to the square root of the ratio of the largest to smallest eigenvalue of the Hessian matrix

This is why singular value decomposition or similar techniques are used in the inversion process: they reduce the amount of noise amplification by dropping eigenmodes with very small eigenvalues (at the expense of a loss of fidelity). In particular, for Gaussian white noise the factor is just $\sqrt{\lambda_{\max}/\lambda_{\min}}$

(hopefully the notation is obvious).

Whether M&F used SVD (or similar) is something I haven’t checked, but technically for a problem like this, they should.
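A minimal sketch of the noise-amplification point, assuming NumPy and two made-up design matrices (one with nearly orthogonal columns, one with nearly collinear columns). It computes the square root of the eigenvalue ratio of the Hessian XᵀX, the resulting white-noise standard deviations of the coefficients, and a truncated-SVD solve that drops weak singular modes. The `truncated_svd_solve` helper and its `rel_tol` cut-off are illustrative choices, not anything taken from M&F.

```python
import numpy as np

def noise_amplification(X):
    """sqrt(lambda_max / lambda_min) of the Hessian X^T X of the least-squares problem."""
    eig = np.linalg.eigvalsh(X.T @ X)
    return np.sqrt(eig.max() / eig.min())

def coefficient_sd(X, noise_sd=1.0):
    """Standard deviations of OLS coefficients for white noise: sqrt(diag(sigma^2 (X^T X)^-1))."""
    return noise_sd * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

def truncated_svd_solve(X, y, rel_tol):
    """Least squares that drops singular modes below rel_tol * (largest singular value)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rel_tol * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
X_orth = np.column_stack([a, rng.normal(size=n)])               # nearly orthogonal columns
X_coll = np.column_stack([a, a + 0.05 * rng.normal(size=n)])    # nearly collinear columns

for name, X in (("orthogonal", X_orth), ("collinear", X_coll)):
    print(name, noise_amplification(X).round(1), coefficient_sd(X).round(3))

y = X_coll @ np.array([1.0, 1.0]) + rng.normal(size=n)
print(np.linalg.lstsq(X_coll, y, rcond=None)[0])    # full solution: large scatter along the weak mode
print(truncated_svd_solve(X_coll, y, rel_tol=0.1))  # weak mode dropped: far less scatter
```

Note that the square root of the eigenvalue ratio of XᵀX is the same number as the ordinary condition number of X itself.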

My link to the Hessian Matrix got dropped somehow. (In case this one also is dropped, it’s a standard term that you can find in Wikipedia.)

Roman wrote: “In the regression in M and F, the authors posit that ΔT can be decomposed additively into a deterministic component and a random component.”

Don’t regression residuals contain things besides “random components”? If I perform a linear regression when one variable has a quadratic influence, the residuals will contain systematic errors, in addition to “random components”. M&F’s regression equation is a poor approximation of the physics of surface energy balance and contains an extra, inappropriate degree of freedom. I don’t understand how they get away with equating the residuals with “unforced variability”. I don’t know how any analysis of a signal with a chaotic component can separate “unforced variability” from possible flaws in the regression equation and from uncertainty in the variables (T, N, alpha and kappa). I would benefit from a clear discussion of “random components” in chaotic systems.

The allegedly deterministic portion of the regression equation contains ΔF, which is calculated from ΔT, and T contains unforced variability. M&F need to prove that the unforced variability contributed by ΔT and ΔN is negligible.
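A toy sketch of that leakage, assuming NumPy, with a hypothetical α and made-up variances. It is not M&F’s actual regression (which also involves α and κ terms); it only shows the mechanism Roman describes. If ΔF is diagnosed as αΔT + ΔN with ΔN independent of ε, the regressor ΔF carries ε, so it is correlated with the very “error” term the regression is supposed to treat as separate, and the fitted ΔT-ΔF slope shifts accordingly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000                                # hypothetical number of (run, period) samples
alpha = 1.2                             # hypothetical feedback parameter
dT_forced = rng.normal(0.0, 0.1, n)     # deterministic part, varying across runs
eps = rng.normal(0.0, 0.1, n)           # "unforced variability" in dT
dN = rng.normal(0.0, 0.1, n)            # dN with no link to eps

dT = dT_forced + eps
dF = alpha * dT + dN                    # forcing diagnosed from dT itself
dF_clean = alpha * dT_forced + dN       # what dF would be if eps did not leak in

corr = np.corrcoef(dF, eps)[0, 1]
print(corr)                             # clearly non-zero: regressor correlated with the "error"
slope_leaky = np.polyfit(dF, dT, 1)[0]
slope_clean = np.polyfit(dF_clean, dT, 1)[0]
print(slope_leaky, slope_clean)         # the leaked eps inflates the apparent dT-dF relation
```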

Here is a comparison of the 15y running average of CMIP5_rcp4.5 (essentially the same thing they are doing with “sliding trends”) with a real low-pass filter of the same data.

Compare this to the red line in figure 2a of the paper.

It follows that their conclusions about 15y “trends” are dominated by the distortions and inadequacies of their data processing and thus their conclusions in this regards are spurious.

I think the problem with 62y “trends” lies elsewhere.

BTW the peak at 2000 here corresponds to the peak they show around 1992, since they use the beginning of the 15y period, not the mid-point.

I use a 78mo 3-sigma Gaussian filter which has similar frequency characteristics to the 180mo (15y) running mean, without the leakage and distortions of the latter.
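A sketch of the difference in frequency response, assuming NumPy and monthly samples. The Gaussian width here (σ = 56 months, cut off at ±3σ) is our own choice, picked so that both filters pass half amplitude at roughly the same frequency; it is not necessarily how the 78-month filter above is parameterised. The running mean’s response dips well below zero, while the Gaussian’s does not.

```python
import numpy as np

def frequency_response(h, freqs):
    """Real response of a symmetric filter h (taps centred, any length)."""
    k = np.arange(len(h)) - (len(h) - 1) / 2
    return np.cos(2 * np.pi * np.outer(freqs, k)) @ h

def running_mean(n):
    return np.ones(n) / n

def gaussian_kernel(sigma, truncate=3.0):
    """Sampled Gaussian, cut off at +/- truncate*sigma and normalised to unit gain."""
    m = int(np.ceil(truncate * sigma))
    k = np.arange(-m, m + 1)
    g = np.exp(-0.5 * (k / sigma) ** 2)
    return g / g.sum()

f = np.linspace(0.0, 0.5, 5001)                  # cycles per month, up to Nyquist
H_rm = frequency_response(running_mean(180), f)  # 180-month (15y) running mean
H_g = frequency_response(gaussian_kernel(56.0), f)

print(H_rm.min())   # about -0.22: some bands come through with their sign flipped
print(H_g.min())    # about 0: nothing is inverted
```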

A sliding 15y “trend” is the same thing as a running average of dT/dt. Identical mathematically.

Their red line is somewhat smoother since they are doing individual regressions. I used the ‘anomaly’ of the CMIP5 ensemble mean, 60S-60N, rather than the detailed HadCRUT mask.

Here is the excess rate of change between CMIP5-rcp4.5 tas and HadCRUT4:

M&F’s observation that the current departure is not exceptional is true. They really have been just as inaccurate at hindcasting even when trying to match the historical record. This certainly does not increase the confidence we should have in the models.

However, what they are trying to sweep under the carpet in presenting it like that is that the deviation has progressed from totally missing the early 20th c. warming, to currently missing the lack of warming.

There has been a steady progress from under-estimation to over-estimation of warming. This, despite also over-estimation of the cooling effect of volcanoes.

All of this underlines that the models are oversensitive to radiative forcing.

M&F’s primary conclusion is counter to what is shown by the very data they chose to try to demonstrate it.

Slight correction for the record. The RM of dT/dt is not identical to the sliding “trend” since the mean minimises the absolute deviations not the squares of the deviations.

However, it has the same temporal structure which is the origin of the distortions and inversions caused by a running mean.

http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/

The peak of the inverting lobe of the running mean is at window period / 1.433. In the case of a 15y window that is 10.5 years. So any variability around that period will not be removed but inverted.

Their choice of a 15y window is most unfortunate for studying models where a very significant part of the variability comes from volcanic forcing and the latter part of the record is dominated by two major eruptions about 10.25 years apart.

I just found a comment by HaroldW above about the difference between sliding trend and running mean of dT/dt.

In effect the square window of the RM is replaced by a Welch window, so it is slightly less distorting. As I said above, the OLS trend minimises the squared errors (the variance) rather than absolute errors.

However, the essence of the problem remains. He links to a very good discussion by Nick Stokes who shows the frequency response of the sliding trend “filter”.

We see the large negative lobes that cause the inversions in the data that I referred to.

Though not “identical” the problem is essentially the same. The distortions introduced by the sliding trend are very similar to those of a running mean and it would be far better to use a properly chosen filter.

Nick Stokes’ articles, from which the above graph comes:

http://www.moyhu.blogspot.com.au/2015/01/trends-breakpoints-and-derivatives.html

http://moyhu.blogspot.fr/2015/01/trends-breakpoints-and-derivatives-part.html

The red line in Nick’s graph above is the frequency response of the “sliding trend”.

It can be seen that the negative inverting peak is almost 50% of the main peak that we are interested in.

It’s worse than we thought (TM)!

It is not surprising that M&F could not find anything after the way they mangled the data.

Had they used a Gaussian derivative (with sigma=5y, for example) they might have had more of a chance of getting a result.
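A sketch of that comparison, assuming NumPy, annual samples, and a Gaussian truncated at ±4σ (both our assumptions). Each filter is treated as a derivative estimator, and its gain is measured relative to the true derivative of a sinusoid. A negative gain means the estimated rate of change comes out with the wrong sign. The 15-point sliding OLS trend has a band of negative gain; the σ = 5y Gaussian derivative stays non-negative apart from tiny truncation error.

```python
import numpy as np

def derivative_gain(c, omegas):
    """Gain of an antisymmetric derivative filter relative to the true derivative.

    For d_hat[t] = sum_k c[k] x[t+k] with c[-k] = -c[k], a sinusoid sin(w t)
    gives d_hat = cos(w t) * sum_k c[k] sin(w k), while the true derivative is
    w cos(w t). Their ratio is the gain; a negative gain means an inverted output.
    """
    m = (len(c) - 1) // 2
    k = np.arange(-m, m + 1)
    return (np.sin(np.outer(omegas, k)) @ c) / omegas

def sliding_trend_filter(n):
    """OLS slope over an odd-length window of n unit-spaced samples."""
    k = np.arange(n) - (n - 1) // 2
    return k / np.sum(k**2)

def gaussian_derivative_filter(sigma, truncate=4.0):
    """Derivative of a Gaussian smoother, normalised to unit gain at low frequency."""
    m = int(np.ceil(truncate * sigma))
    k = np.arange(-m, m + 1)
    c = k * np.exp(-0.5 * (k / sigma) ** 2)
    return c / np.sum(c * k)

omegas = np.linspace(1e-3, np.pi, 4000)       # radians per year, up to Nyquist
g_trend = derivative_gain(sliding_trend_filter(15), omegas)
g_gauss = derivative_gain(gaussian_derivative_filter(5.0), omegas)
print(g_trend.min())   # clearly negative: some periods come out inverted
print(g_gauss.min())   # about 0 (truncation error only): no inversion
```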

To bring back the focus on the core questions here:

1) Was the equation used by M&F appropriate for the goal, valid, fed with untainted values?

2) Were M&F’s results in contradiction to other studies?

3) Was M&F’s conclusion warranted by their results?

4) How can we devise tests to determine the above?

How about we do an inventory and have each weigh in on each of the above?

Would it be good if afterward someone took the lead to assign further investigation?

Good idea.

No one seems too interested in what I’ve shown about their defective sliding “trends”, but it basically invalidates anything they are doing with 15y windows. The rest of the questions then become immaterial.

However, I think 62y is long enough, and there does not appear to be significant energy in the system around 62/1.433 = 43 years, so they probably got lucky there.

So I suggest further consideration of 1) and 3) should be restricted to the 62y case. All their results and conclusions for 15y are invalid.

Here’s a silly question, if someone doesn’t mind answering: if M&F’s goal was to determine variability in the temperature signal versus its rise, why not simply analyze only that? Why bring feedbacks into it at all? Why not simply run the models and use the equation T = aF + e? After all, who cares what the CO2 forcing is vs. the aerosol forcing vs. the feedback? It’s different in most of the models, which is the whole point of the design of eventually determining the right mix? The skeptic’s question is: “are the models systematically overestimating adjusted forcing?” Could M&F’s unnecessary complication of adding feedbacks have brought in error that is clouding the analysis?

My guess is that their goal was to debunk the claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations and they didn’t get what they wanted, until they tried innovative methods.

That would be untested innovative methods. Ones that invert 50% of the signal.