Marotzke and Forster’s circular attribution of CMIP5 intermodel warming differences

A guest post by Nicholas Lewis

Introduction

A new paper in Nature by Jochem Marotzke and Piers Forster: ‘Forcing, feedback and internal variability in global temperature trends’[i] investigates the causes of the mismatch between climate models that simulate a strong increase in global temperature since 1998 and observations that show little increase, and the influence of various factors on model-simulated warming over longer historical periods. I was slightly taken aback by the paper, as I would have expected either one of the authors or a peer reviewer to have spotted the major flaws in its methodology. I have a high regard for Piers Forster, who is a very honest and open climate scientist, so I am sorry to see him associated with a paper that I think is very poor, even if only as co-author (a position that perhaps arose through him supplying model forcing data to Marotzke), and therefore not bearing primary responsibility for the paper’s shortcomings.

In putting together this note, I have had the benefit of input from two statistical experts: Professor Gordon Hughes (Edinburgh University) and Professor Roman Mureika (University of New Brunswick, now retired). Both of them regard the statistical methods in Marotzke’s paper as fatally flawed.

The Marotzke and Forster paper analyses trends in simulated global mean surface temperature (GMST) over all 15- and 62-year periods between 1900 and 2012, and relates them to contemporaneous trends in model effective radiative forcing (ERF) and to measures of model feedback strength (alpha) and model ocean heat uptake efficiency (kappa).

The paper is very largely concerned with the behaviour of climate models, specifically atmosphere-ocean general circulation models used in the CMIP5 simulations. In discussing relevance to the actual climate system, it ‘assumes that the simulated multimodel ensemble spread accurately characterizes internal variability’.

The authors’ principal conclusions are:

The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale. For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations. The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.

Marotzke claims to have shown that in model simulations the structural (alpha and kappa) elements – which encapsulate model GMST responses to increases in CO2 forcing – contributed nothing even to recently-ending, longer-term GMST trends. It is difficult to see how that can be so if the models work properly. It is certainly possible (in fact likely) that over the period 1900–2012 the combined contribution of alpha and kappa to model GMST trends was largely obscured by countervailing variations in model ERF trends: high sensitivity models tend to have more negative aerosol forcing than lower sensitivity models, enabling both to match 20th century GMST trends. But aerosol levels have changed little over the last 35 years and higher sensitivity models have been warming much faster than observed GMST over that period.

In order to show why the paper’s conclusions are not justified, I need to explain what Marotzke has done.

What Marotzke did

Marotzke starts with a ‘physical foundation’ of energy balance: ΔT = ΔF / (α + κ), where ΔF is the change in ERF; α is the climate feedback parameter (the reciprocal of equilibrium/effective climate sensitivity [ECS] normalised by F2xCO2, the ERF from a doubling of CO2 concentration: α = F2xCO2/ECS ); κ is the ratio of change in the rate of heat uptake by the climate system – or in its counterpart, top-of-atmosphere (TOA) radiative imbalance – to change in GMST, termed ocean heat uptake efficiency; and ΔT is the change in GMST.[ii]

Marotzke then adds a random term, ε, to represent internal variability in GMST, resulting in the equation

.                                             ΔT = ΔF / (α + κ)  +  ε                                         (1)

which is taken to apply to linear trends, rather than changes, in GMST and ERF.

He then takes temperature data and individual ERF time series relating to their historical simulations[iii] from an ensemble of 18 CMIP5 models.[iv] The ERF time series were not included in the model simulation output but had previously been diagnosed (estimated) therefrom by Forster et al, along with values for α and κ.

Marotzke expresses each quantity in (1) as x = x̄ + x′, where the overbar represents the ensemble mean and the prime the across-ensemble variation. By considering a linear expansion of equation (1), using those expressions, he arrives at the approximation

.                              ΔT′ ≈ ΔF′/(ᾱ+κ̄)  −  [ΔF̄/(ᾱ+κ̄)²] α′  −  [ΔF̄/(ᾱ+κ̄)²] κ′  +  ε                              (2)

Marotzke states that this equation suggests a regression model

.                              ΔT′ⱼ = β₁ ΔF′ⱼ  +  β₂ α′ⱼ  +  β₃ κ′ⱼ  +  εⱼ                              (3)

where the value of j identifies the particular model run involved. Some models have multiple simulation runs, but each model’s values for ΔF, α and κ are common to all its runs.

I’m rather dubious about the validity of the approximations used in (2), given that α is typically somewhat larger than κ and there is nearly a threefold variation in α across the models, meaning that many of the α′ terms are substantial in relation to the model ensemble mean (ᾱ+κ̄). But I will leave that aside for the present purposes.

Marotzke performs multiple linear regressions according to the statistical model (3) for each start year (1900–1998 for 15-year trends; 1900–1951 for 62-year trends). He then determines the extent to which the across-ensemble variations in ΔF, α and κ contribute to the ensemble spread of GMST trends. Marotzke’s main factual conclusions follow from these three factors explaining little of the ensemble spread of GMST 15-year trends, with the majority being attributed to internal variability, whilst for 62-year periods starting from the 1920s on, variations in ΔF (ERF trends) dominate, with variations in model feedback α and ocean heat uptake efficiency κ having almost no effect.
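To make the set-up concrete, here is a minimal sketch of one such regression in Python, using synthetic stand-ins for the CMIP5 diagnoses. All the numbers below are illustrative assumptions, not values from the paper, and measuring each term’s contribution to the spread as the across-ensemble standard deviation of coefficient × predictor deviation is one plausible reading of the method, not necessarily Marotzke’s exact calculation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_models = 18

# Illustrative per-model diagnoses for one start year / trend length
alpha = rng.uniform(0.6, 1.8, n_models)   # feedback parameter (W m-2 K-1)
kappa = rng.uniform(0.4, 1.2, n_models)   # ocean heat uptake efficiency (W m-2 K-1)
dF    = rng.normal(0.3, 0.08, n_models)   # ERF trend (illustrative units)

# Equation (1), applied to trends: dT = dF/(alpha + kappa) + noise
eps = rng.normal(0.0, 0.02, n_models)     # internal variability
dT  = dF / (alpha + kappa) + eps

# Equation (3): regress across-ensemble deviations on dF', alpha', kappa'
dev = lambda x: x - x.mean()
X = np.column_stack([np.ones(n_models), dev(dF), dev(alpha), dev(kappa)])
beta, *_ = np.linalg.lstsq(X, dev(dT), rcond=None)

# Contribution of each predictor to the ensemble spread of GMST trends
for name, b, x in zip(["dF'", "alpha'", "kappa'"], beta[1:], X.T[1:]):
    print(f"{name:7s} contribution to spread: {np.std(b * x):.4f}")
print(f"residual (attributed to internal variability): "
      f"{np.std(dev(dT) - X @ beta):.4f}")
```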

Flaws in Marotzke’s methods

To a physicist, the result that variations in model α and κ have almost no effect on 62-year trends is so surprising that the immediate response should be: ‘what has Marotzke done wrong?’

Some statistical flaws are self-evident. Marotzke’s analysis treats the 75 model runs as being independent, but they are not. Only 18 models are analysed, and only one set of predictor variables is used per model. The difference between temperature simulations from each individual run by a model with multiple runs and the run-ensemble mean for that model is accordingly noise that one could not expect to be explained by the regression. The use of all the individual runs invalidates the simple statistical model used and the error estimates derived from it. Also, moving from equation (1) to (3) above will have made the errors correlated with the predictor variables, biasing the coefficient estimates. Uncertainty in the values of the parameters α and κ and in the forcing time series is also ignored. As I show later, uncertainty in κ, at least, is large. And in equation (1) α and κ appear only in terms of their sum. Allowing a separate predictor variable for each of them may result in part of the internal variability being misallocated.
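The first of these flaws can be illustrated in a few lines. In this sketch (a synthetic set-up: the run counts, predictor and noise levels are invented for illustration), runs of the same model share identical predictor values, so pooling all the individual runs shrinks the nominal standard errors without adding any information about the coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models = 18
runs_per_model = rng.integers(1, 11, n_models)   # 1 to 10 runs per model

x_model = rng.normal(0.0, 1.0, n_models)                  # one predictor value per model
y_model = 0.5 * x_model + rng.normal(0.0, 0.1, n_models)  # run-ensemble mean responses

# Expand to individual runs: identical predictors, extra run-level noise
x_runs = np.repeat(x_model, runs_per_model)
y_runs = np.repeat(y_model, runs_per_model) + rng.normal(0.0, 0.1, runs_per_model.sum())

def slope_and_se(x, y):
    """OLS slope and its nominal standard error."""
    X = np.column_stack([np.ones_like(x), x])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = res[0] / (len(y) - 2)                    # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

print("run-ensemble means: slope %.3f, SE %.3f" % slope_and_se(x_model, y_model))
print("all runs pooled:    slope %.3f, SE %.3f" % slope_and_se(x_runs, y_runs))
# The pooled fit reports a spuriously small standard error and is weighted
# towards whichever models happen to have the most runs.
```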

However, there is an even more fundamental problem with Marotzke’s methodology: its logic is circular.

The ΔF values were taken from Forster et al (2013)[v]. For each model, historical/RCP scenario time series for ΔF were diagnosed by Forster et al using an equation of the form:

.                                                 ΔF = α ΔT + ΔN                                                         (4)

where ΔT and ΔN are the model-simulated GMST and TOA radiative imbalance respectively, and α is the model feedback parameter, diagnosed in the same paper.

Moreover, κ had been diagnosed from the model transient climate response[vi] (TCR) as κ = F2xCO2/TCR − α. Therefore, the denominator in equation (1), (α + κ), is simply F2xCO2/TCR, termed ρ (rho) in Forster et al (2013). Note that F2xCO2, the ERF from a doubling of CO2 concentration, does not take a standard value (3.71 Wm‑2 per IPCC AR5) but is a diagnosed value that differs significantly between models.

One can therefore restate the ‘physical foundation of energy balance’, with added random term representing internal variability, (equation (1)) as:

.                                       ΔT = (α ΔT+ ΔN )  / ρ  +  ε                                            (5)

As is now evident, Marotzke’s equation (3) involves regressing ΔT on a linear function of itself. This circularity fundamentally invalidates the regression model assumptions. Accordingly, reliance should not be placed on any of the results in the Nature paper. That is particularly the case for the 62-year trend results, where the offending, non-exogenous ΔF’ term dominates the ensemble spread of GMST trends for start years from the 1920s on.

Since the ΔF predictor variable is a linear function of the response variable ΔT, which becomes larger relative to noise as the start year progresses, it is hardly surprising that the across-ensemble variations of ΔF are the main contributor to the ensemble spread of GMST 62-year trends starting from the 1920s onwards. As the start date progresses the intermodel variation in 62-year trends in ΔF is increasingly determined by intermodel variation in trends in α ΔT: ΔN trends are noisy but intermodel variation in trends in ΔN is of lesser relative importance for later start years. However, since ΔT is not an exogenous variable, domination in turn of intermodel variation in trends in GMST by variation in trends in ΔF tells one nothing reliable about the relative contributions of forcing, feedback and ocean heat uptake efficiency to the intermodel spread in GMST trends.
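The circularity can be demonstrated with a few lines of synthetic-data code (again, all numbers are invented for illustration). Even if the model GMST trends were pure noise, containing no forcing signal whatsoever, regressing them on a ‘forcing’ diagnosed via equation (4) would appear to explain most of their spread:

```python
import numpy as np

rng = np.random.default_rng(2)
n_models = 18

alpha = rng.uniform(0.6, 1.8, n_models)   # model feedback parameters
dT = rng.normal(0.0, 0.1, n_models)       # GMST trends: pure noise by construction
dN = rng.normal(0.0, 0.05, n_models)      # TOA imbalance trends: more noise
dF = alpha * dT + dN                      # 'forcing' diagnosed as in equation (4)

# Regress dT on the diagnosed dF (deviations from the ensemble mean)
x = dF - dF.mean()
y = dT - dT.mean()
b = (x @ y) / (x @ x)
r2 = 1.0 - np.sum((y - b * x) ** 2) / np.sum(y ** 2)
print(f"R^2 of dT regressed on the diagnosed 'forcing': {r2:.2f}")
# High R^2, even though dT contains no forcing signal at all
```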

Examining the effects of the circularity in Marotzke’s method

One could, at the expense of changing the error characteristics somewhat, rearrange (5) to eliminate ΔT from the RHS and remove the circularity, which (since κ = ρ − α) results in simply[vii]

.                                                 ΔT = ΔN   / κ  +  ε                                                     (6)
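To spell out the rearrangement: multiplying (5) through by ρ gives ρ ΔT = α ΔT + ΔN + ρ ε, so (ρ − α) ΔT = ΔN + ρ ε; dividing through by κ = ρ − α then yields ΔT = ΔN/κ + (ρ/κ) ε, i.e. equation (6) with a rescaled error term.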

However, Marotzke does not do so, and in any case this equation only deals with the element of forcing that is associated with ocean (and other) heat uptake, not with the (larger) element associated with increasing GMST, and it does not include α.

I’ll stick with Marotzke’s approach for the time being but derive a regression equation from (5) via a similar expansion to that employed by him, here keeping the two terms comprising ΔF separate but not splitting the ρ term between α and κ. Linearly expanding (5) yields:

.                              ΔT′ ≈ [(αΔT)′ + ΔN′]/ρ̄  −  [ΔF̄/ρ̄²] ρ′  +  ε                              (7)

which on keeping only the lowest-order terms but separating the influence of the (αΔT)′ⱼ and ΔN′ⱼ terms leads to a regression equation of this form:

.                              ΔT′ⱼ = β₁ (αΔT)′ⱼ  +  β₂ ΔN′ⱼ  +  β₃ ρ′ⱼ  +  εⱼ                              (8)

I have carried out a regression analysis based on equation (8) using the same set of models. I used the run-ensemble mean where a model had multiple runs, not all the individual runs. The justification for not using all the separate runs was given earlier. Using run-ensemble means will however result in model internal variability not being fully represented in the regression residuals.

Over early, middle and late 15-year periods within the 1900–2012 analysis period, the intermodel spread in GMST trend is dominated by the (αΔT)′ⱼ term; internal variability (assessed from the variance not explained by the regression fit) is small. Over the earliest and latest 62-year periods (1900–1961 and 1951–2012) the (αΔT)′ⱼ term continues to be dominant but less so, with a greater amount of unexplained variance. The other two terms explain very little of the intermodel spread, save for modest contributions from the ρ′ term in the 62-year trend cases. There is little point in examining more than two or three historical 62-year trend cases, as results from periods with substantial overlap are far from independent.

The results from this rejigged regression show that the apparent internal variability shrinks greatly when different coefficients are permitted for the two terms in the diagnosed forcing. And one can actually get even better fits for all periods by regressing using just the (αΔT)′ⱼ and α′ⱼ terms.

However, none of the analysis examined so far is valid, because in all of it ΔT appears on both sides of the equation – whether explicitly as in equation (8) or, as in Marotzke’s paper, concealed within ΔF – so there is circularity involved either way. Naturally, if one separates out, as in equation (8), the predictor variable term in which the response variable appears simply multiplied by a parameter – (αΔT)′ⱼ – from an associated noisy term with little explanatory power – ΔN′ⱼ – the regression will explain more of the variability in the response variable. But the fact that the ΔN′ⱼ term – the only exogenous part of ΔF′ⱼ – has no significant explanatory power suggests that Marotzke’s 62-year period results likely just reflect the decline in the intermodel variation in the noisy ΔN′ⱼ term relative to that in the circular (αΔT)′ⱼ term as the period considered ends closer to 2012.

Another reason why Marotzke’s approach is doomed

Another major problem with this type of attribution approach, even if the circularity could be removed by somehow diagnosing ΔF differently and other statistical problems dealt with, is that the underlying assumption that the previously diagnosed α and κ values for individual models are realistic enough to use in equation (1), or in its circularity-free reduced κ-only version (6), appears to be false.

I have compared κ values based on the ratio of ΔN and ΔT trends over 1951–2012 from the model-simulations with the values used by Marotzke, which as explained were diagnosed in Forster et al 2013 by a quite different method. The ΔN / ΔT trend-based estimates vary from 0.54 times to 2.48 times those Marotzke uses; for only five models are the two estimates the same within 10%. Estimates of κ based on the 2005–2066 period under the RCP8.5 scenario, which provides a strong greenhouse gas forcing ramp with little influence from variations in aerosol forcing, range from 0.46 times to 1.09 times those Marotzke uses, and from 0.18 times to 1.75 times those estimated from changes over 1951–2012. And estimates of κ based on changes in the rate of simulated ocean heat uptake during 1961–2005,[viii] rather than simulated TOA radiative imbalance, are substantially different again. It seems doubtful that estimates of α values would be robust enough either.
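A minimal sketch of the trend-ratio estimate of κ referred to above, with synthetic series standing in for the model-simulated ΔT and ΔN (the trend slopes and noise levels below are illustrative assumptions, not model output):

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1951, 2013)
t = years - years.mean()

# Synthetic stand-ins for a model's simulated GMST and TOA imbalance
dT_series = 0.012 * t + rng.normal(0.0, 0.10, t.size)   # K
dN_series = 0.008 * t + rng.normal(0.0, 0.30, t.size)   # W m-2

slope = lambda y: np.polyfit(t, y, 1)[0]
kappa_est = slope(dN_series) / slope(dT_series)         # W m-2 K-1
print(f"trend-based kappa estimate: {kappa_est:.2f} W m-2 K-1")
# Because the dN series is noisy, this ratio is itself quite uncertain;
# different periods and methods give widely differing kappa values.
```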

With this degree of apparent variation in κ when estimated by different methods and over different periods, one would expect equation (6) to have very little explanatory power (regressing ΔT on ΔN/κ). And that is indeed the case. The intermodel spread in GMST trend is dominated by internal variability over both 15- and 62-year periods, whether towards the start or end of the analysis period. The more valid, circularity-free version of the surface energy-balance equation is useless for investigating the intermodel spread in GMST trends. The same applies when using a regression equation based on (6) but separating the ΔN and κ terms, leading to this form:

.                              ΔT′ⱼ = β₁ ΔN′ⱼ  +  β₂ κ′ⱼ  +  εⱼ                              (9)

Conclusions

I have shown that there are no valid grounds for the assertions made in the paper that ‘For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends’ and that ‘The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded’.

Marotzke’s conclusion that for periods ending in the last few decades the non-noise element of 62-year GMST trends in models is determined just by their ERFs is invalid, since he hasn’t used an exogenous ERF estimate. Indeed, if the models are working properly, their GMST trends must logically also reflect their feedback strengths and their ocean heat uptake efficiencies.

The interesting question is how much the large excess of model ensemble-mean simulated GMST trends relative to observed trends over the satellite era is attributable to respectively: use of excessive forcing increases; inadequate feedback strength (excessive ECS); inadequate ocean heat uptake efficiency; negative internal variability in the real climate system; and other causes. The Marotzke and Forster paper does not bring us any closer to providing an answer to this question. It certainly does not show the claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations to be unfounded.

One of Marotzke’s conclusions is, however, quite likely correct despite not being established by his analysis: it seems reasonable that differences between simulated and observed trends may have been dominated – except perhaps recently – by random internal variability over the shorter 15-year timescale.

Gordon Hughes had some pithy comments about the Marotzke and Forster paper:

The statistical methods used in the paper are so bad as to merit use in a class on how not to do applied statistics.

All this paper demonstrates is that climate scientists should take some basic courses in statistics and Nature should get some competent referees.

The paper is methodologically unsound and provides spurious results. No useful, valid inferences can be drawn from it. I believe that the authors should withdraw the paper.

 

[i] Jochem Marotzke & Piers M. Forster. Forcing, feedback and internal variability in global temperature trends. Nature, 517, 565–570 (2015)

[ii] This so-called kappa model does not respect conservation of energy over long periods, but as Marotzke says it is a reasonable approximation (at least in climate models) over periods of one to several decades.

[iii] Extended from 2005 to 2012 using, it appears, the RCP4.5 scenario runs.

[iv] The NorESM1-M model was incorrectly shown as not having forcing estimates available, but does seem to have been included in the models used.

[v] Forster, P. M. et al. Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models. J. Geophys. Res. 118, 1–12 (2013).

[vi] The rise in GMST over a ~70 year period during which CO2 concentration increases at 1% pa, thereby doubling.

[vii] Although it would arguably be more logical to regard ΔN rather than ΔT as the response variable in this equation.

[viii] Derived from IPCC AR5 Fig.9.17.b

867 Comments

  1. Lance Wallace
    Posted Feb 5, 2015 at 11:13 AM | Permalink

    Between equations 7 and 8 some missing words occur:
    “separating the influence of the and terms”

  2. Lance Wallace
    Posted Feb 5, 2015 at 11:18 AM | Permalink

    and more missing letters or words:

    “the intermodel spread in GMST trend is dominated by the term;”

    “the term continues to be dominant but less so”

    “And one can actually get even better fits for all periods by regressing using just and terms.”

    “But the fact that the term – the only exogenous part of – “

  3. Lance Wallace
    Posted Feb 5, 2015 at 11:23 AM | Permalink

    One more–there may be others

    “The same applies when using a regression equation based on (6) but separating the and κ terms”

    • Posted Feb 5, 2015 at 12:00 PM | Permalink

      Thanks. I think I’ve now fixed all the expressions that WordPress didn’t convert. Sorry about that.

  4. Posted Feb 5, 2015 at 11:34 AM | Permalink

    Hi Nic,

    There are some character insertions in your text that don’t appear.

    Let me see if I have digested your argument. What Marotzke and Forster did was run a regression for every year from 1900 to 1950 in the form:

    [1] dT = b0 + b1*dF + b2*alpha + b3*kappa + e

    where in each year there are 75 observations, taken from each of the 75 model runs. dT is the warming trend counting forward 62 years from start date 1900, 1901, 1902, etc. b0 is the constant term. dF is the trend in forcing counting forward 62 years from start date 1900, 1901, etc.; alpha is the model’s GHG sensitivity (or some transformation thereof), kappa is the model’s ocean heat uptake efficiency (or some transformation thereof), and e is the residual term. The forcing trend (dF) is meant to be a summation of the net effect of GHG+aerosols+solar+volcanoes+other warming/cooling influences, and it is assumed to be exogenous, i.e. determined by data that are independent of temperature trends.

    In their regression results they show that b1 is large and significant (I guess? they don’t actually report the regression results in the paper!) while b2 and b3 are nearly zero. And the residuals are also large. So their conclusion is that variations in forcing (the dF term) and noise (the residuals) account for the spread of model-estimated trends, while variations in sensitivity and ocean heat uptake play no role in accounting for the spread of model-generated temperature trends. Hence, they conclude, it can’t be the case that the ‘pause’ implies models are too sensitive to GHGs since sensitivity (alpha) plays no role in short- or long-term trends.

    Before turning to your particular point about circularity, your first observation is, in essence, that they seem to be asserting that the structural elements of their models (alpha and kappa) play no role in the key model behaviour. And this, we are to believe, is the basis for their defence of the validity of models. Alternatively, it is prima facie evidence that they have screwed up somewhere because it is inconceivable that the main structural elements of the models play no role in the behaviour of the models.

    Your diagnosis of where they went wrong, as I understand it, is singularly devastating. The authors took their forcing trend estimates (dF) from an earlier paper that constructed them using the equation

    [2] dF = a0*dT + dN

    where a0 is a feedback term, dN is a term capturing the Top of Atmosphere radiative imbalance, which in this context is just a source of noise, and dT is… dT! So in [1] they regressed dT on itself + other terms! The Marotzke regression is actually something like

    [3] dT = c0 + c1*(a0*dT+ dN) + c2*alpha + c3*kappa + e

    and not surprisingly they found alpha and kappa contribute nothing. The only reason the regression model didn’t collapse due to dT being on both sides is that a bit of noise in the form of dN is added to dT on the right hand side.

    And when they do the same regression on 15-year trends they find dT and the residuals e again “explain” everything and this time the noise component is even larger.

    Now, as the saying goes, just because it was published in Nature doesn’t automatically mean it’s wrong. But I have difficulty seeing how this wreck can be salvaged. Have I correctly summarized your argument? If so, how do the authors defend the claim that dF is exogenous?

    • joe
      Posted Feb 5, 2015 at 12:26 PM | Permalink

      As a layman, my observation/impression is that we have had a very consistent overall warming trend since circa 1850, with the natural ocean cycles amplifying this trend (as in the 1920/30’s and 80/90’s) and dampening it at other times, including the 60/70’s and the current pause (i.e. one warming trend with almost the same slope over the entire period, interposed with 60-70 year ocean cycles). (fwiw, far too much emphasis is placed on the pause.) My beef with the models and the discrepancy is the failure to incorporate the ocean cycles into the models, especially since they were reasonably well known by the mid 1990’s.

      I am unable to tell from the critique of the paper whether the ocean cycles are given credit for any of the discrepancy or are treated as having no effect on the discrepancy. Any commentary or enlightenment on the subject would be appreciated

      • Posted Feb 5, 2015 at 1:34 PM | Permalink

        The paper gives no role to ocean cycles as such. Some models do exhibit multidecadal ocean oscillations, but as they are unlikely to be in phase in different models (or with the real climate system) they generally show up as part of random climate noise.

        I agree that if one incorporates a 60-70 year ocean cycle (the AMO being the obvious one) then warming over the instrumental period bears a more consistent relationship to external forcing influences.

    • Posted Feb 5, 2015 at 12:33 PM | Permalink

      Hi Ross
      Thanks for your very helpful comment. Your summary of my circularity argument is good.

      Just to clarify, Marotzke & Forster find that for 15 year periods the regression residuals are dominant for all start years – which they interpret as implying that internal variability (in the real climate system) dominates the difference between model simulations and observations during the hiatus period.

      They find that for 62 year periods the regression residuals are dominant for early start years – when the temperature and forcing trends were low – but that forcing dominates thereafter.

      For both periods and all start years, the structural sensitivity and ocean heat uptake characteristics, as represented by alpha and kappa, are found to have a negligible influence on model temperature trends.

    • Steven Mosher
      Posted Feb 5, 2015 at 1:20 PM | Permalink

      Thanks Ross.

      much clearer now

    • S. Geiger
      Posted Feb 6, 2015 at 12:02 PM | Permalink

      Is alpha actually prescribed in a specific model? I thought it was a number that was determined ex post based on the resulting temperature trend (?) Or am I conflating alpha with a more encompassing term that takes into account all feedback (TCR maybe?)

      • Posted Feb 9, 2015 at 5:04 AM | Permalink

        Your understanding is correct. Alpha is usually estimated as minus the slope coefficient in an OLS regression of ΔN on ΔT over the first 150 years of a model simulation that starts with CO2 concentration being abruptly doubled or quadrupled from a previously equilibrated position.

    • William Larson
      Posted Feb 6, 2015 at 1:39 PM | Permalink

      “Now, as the saying goes, just because it was published in Nature doesn’t automatically mean it’s wrong.” Aha!, a treasure of a saying! Kind of reminds me of W. C. Fields: “Anyone who hates children and dogs can’t be all bad.”

    • pdtillman
      Posted Feb 6, 2015 at 3:13 PM | Permalink

      @Ross McKittrick,

      “…just because it was published in Nature doesn’t automatically mean it’s wrong.”

      Heh. +3!

    • DocMartyn
      Posted Feb 6, 2015 at 7:17 PM | Permalink

      Ross, I hope I am not being dumb here but the starting equation is

      ΔT = ΔF / (α + κ)

      The units of temperature are degrees and units of force are Watts.

      Does that mean that the units of both α + κ must be in W/K, as they are additive?

    • Joe Born
      Posted Feb 8, 2015 at 8:22 AM | Permalink

      A long-time lurker, I must confess that I only occasionally understand completely the posts that people like Steve McIntyre and Nic Lewis write. No doubt this says more about my limitations and lack of effort than what those posts effectively tell the intended audience. But I always get it when Dr. McKitrick boils it down for us laymen.

      Thank you, Dr. McKitrick, for this and the other oases of clarity that you have provided over the years.

    • Posted Feb 9, 2015 at 2:50 AM | Permalink

      So without dN the regression would pick up c1=ao and zeros elsewhere because $(X^TX)^{-1}X^TX=I$ and residuals would be zero because $(I-X(X^TX)^{-1}X^T)X=0$ ?

      • Posted Feb 9, 2015 at 3:00 AM | Permalink

        $\hat{c}_1 = 1/a_0$

  5. Arthur Dent
    Posted Feb 5, 2015 at 12:05 PM | Permalink

    Whatever happened to Peer Review?

    • Posted Feb 5, 2015 at 12:12 PM | Permalink

      Who are the peers of idiots and scoundrels?

  6. Posted Feb 5, 2015 at 12:07 PM | Permalink

    Another effort to explain the model/pause discrepancy collapses.
    Nature’s reviewers may not have had Nic Lewis and Ross McKitrick’s statistical chops, but they should have caught the fatal contradiction in a conclusion that the two self-proclaimed most important model emergent features, alpha and kappa, do not statistically influence model behavior. That is illogical to the point of the absurd.

  7. miker613
    Posted Feb 5, 2015 at 12:39 PM | Permalink

    As I’ve posted elsewhere, I don’t understand what Marotzke and Forster are trying to prove. Maybe someone can explain it. A cursory glance at model outputs vs. global temperature measurements shows that the models do a reasonable job following temperatures for the past century. Therefore, they were not “running hot” then, correct? On the other hand, they are not doing a good job of following temperatures this century, and seem to be running hot.
    Surely the claim that they ought to be rebutting is: The models were overfitted somehow, those with too-high sensitivity were balanced by other factors, and the balancing lasted for the century of training data. Now that we are looking at new data, the balancing isn’t working anymore and their too-high sensitivity is becoming apparent.
    How is a study of last century’s data going to answer that issue? All they are showing is that things work for the last century.

    • Posted Feb 5, 2015 at 12:50 PM | Permalink

      “Predicting” history is not that difficult…

    • Posted Feb 5, 2015 at 1:16 PM | Permalink

      The paper is another effort to argue that the now over 18-year pause does not falsify CMIP5. BAMS said in 2009 that 15 years would. Santer’s 2011 paper said 17 years. OOPS! That is why the Max Planck Institute gave it the media spin it did.
      Many fun details (and other absurdities as bad as this paper) in essays An Awkward Pause and Unsettling Science in ebook Blowing Smoke. Nic’s evisceration of Marotzke would have made a nice additional example to the latter essay.

    • jorgekafkazar
      Posted Feb 5, 2015 at 7:06 PM | Permalink

      “♫♪…Those were the days, my friend, we thought they’d never end…♫♪”

    • stevefitzpatrick
      Posted Feb 7, 2015 at 10:41 AM | Permalink

      Miker,
      I think they are trying to do pretty much what Foster & Rahmstorf were trying to do a couple of years back with their silly curve-fit paper: show that the rather glaring discrepancy between modeled and measured warming is NOT due to the models just being too sensitive to forcing. If the true values of transient and equilibrium sensitivity are much lower than the model ensemble (Lewis & Curry, for example), then there is less urgency for costly immediate forced reductions in fossil fuel use…. and IMO, that is why there have been so many recent papers published which offer a host of ‘explanations’ for the model/reality divergence, none of which seriously contemplate the most obvious explanation: the models have too much net positive feedback.

    • Mike Jonas
      Posted Feb 14, 2015 at 10:49 AM | Permalink

      miker613 – neatly put. As I see it, their ε is the difference between model results and reality. Since ε, in their view, over time tends to zero, it doesn’t matter how big ε is now, because over time the models will be correct. Circular logic indeed.

    • miker613
      Posted Feb 22, 2015 at 2:24 PM | Permalink

      Wow: http://julesandjames.blogspot.com/2015/02/that-marotzkeforster-vs-lewis-thing.html
      “My first thought on a superficial glance at the paper was that it wasn’t really that useful an analysis, as we already know that the models provide a decent hindcast of 20th century temps, so it’s hardly surprising that looking at shorter trends will show the models agreeing on average over shorter trends too (since the full time series is merely the sum of shorter pieces). That leaves unasked the important question of how much the models have been tuned to reproduce the 20th century trend, and whether the recent divergence is the early signs of a problem or not. (Note that on the question of tuning, this is not even something that all modellers would have to be aware of, so honestly saying “we didn’t do that” does not answer the question…)”

  8. Gerald Machnee
    Posted Feb 5, 2015 at 12:42 PM | Permalink

    I have not looked at the paper – Do we know what the peer reviewers said?

    • Posted Feb 5, 2015 at 1:09 PM | Permalink

      No. Peer review comments are not made public.

    • Bill
      Posted Feb 5, 2015 at 1:29 PM | Permalink

      Normally, comments by peer reviewers are confidential. Only the editors and the authors see them unless the authors choose to share them. But a peer reviewer can choose to make themselves known and share the comments I believe.

      And in some cases an editor (possibly with reviewers permission) has shared the comments without disclosing the name of the reviewer to show that there was due diligence.

    • David Young
      Posted Feb 5, 2015 at 10:18 PM | Permalink

      I have found the peer review process to be very uneven. Many senior people just skim the paper. If the paper is controversial, it is likely to get a more careful review. But in my experience, attempts to replicate the work are rare. I do very few reviews these days because I have higher standards than editors and get tired of seeing inferior or mediocre papers published.

      • Arthur Dent
        Posted Feb 6, 2015 at 11:11 AM | Permalink

        I share your concern. I used to act as a peer reviewer for a couple of high impact factor analytical journals and I stopped for the very same reason. It saddens me to see the decline in Peer Review standards especially as it coincides with the raising of Peer Review on to an unjustifiable pedestal.

      • stevefitzpatrick
        Posted Feb 7, 2015 at 10:49 AM | Permalink

        Hi David,
        If substantive peer review comments (not corrections of typos!) were published along with papers, then I suspect there would be more people willing to do solid reviews, and a lot fewer silly papers like M & F published.

  9. AJ
    Posted Feb 5, 2015 at 12:59 PM | Permalink

    Too bad Nic’s analysis didn’t appear before the paper was actually published. An opportunity was missed to have the paper “gergised”.

    • Posted Feb 5, 2015 at 1:42 PM | Permalink

      There was no such opportunity in this case, I think. Nature maintains a tight embargo system and the paper was only published online on 28 January.

      • Posted Feb 5, 2015 at 7:06 PM | Permalink

        Now that it is published, will you be seeking to make a comment in Nature?

      • dfhunter
        Posted Feb 5, 2015 at 7:49 PM | Permalink

        Just to be clear Nic, you spotted the problems wrt this paper (flaws in its methodology) and then asked Roman & Gordon to independently review just the statistical methodology problems?

        only trying to figure out if a reviewer would ever be expected to go this deep into a paper & the problems you highlight are so glaring the reviewers should blush !!!

        ps – in engineering we now have about 6 votes before parts are good for manufacture, any no vote has to be countered or the “reject for changes as per xx comments” button is pushed 🙂

        • Posted Feb 7, 2015 at 7:00 AM | Permalink

          Yes, I found the paper’s results about 62-year trends very difficult to believe, and when I read it I spotted the circularity. It was easy for me to do so because I was familiar with the (very well known) Forster et al (2013) paper from which Marotzke and Forster got their model forcings, and I knew how model forcings had been derived there. Then I asked Roman and Gordon to review my arguments and other aspects of the paper from a statistical angle.

          I think that reviewers who were expert in this field should really have realised that the results in the paper were extremely surprising and, in view of that, delved deeper than they might normally be expected to do. But reviewing is an unpaid role with no kudos earned, so it is probably unrealistic to expect too much of it. The fact that a paper has been peer reviewed doesn’t count for much IMO. Papers that go against the ruling paradigm tend to get tougher peer review, so the poor ones are more likely to get weeded out.

        • k scott denison
          Posted Feb 7, 2015 at 8:53 AM | Permalink

          “The fact that a paper has been peer reviewed doesn’t count for much IMO.”

          +1

  10. AJ
    Posted Feb 5, 2015 at 1:04 PM | Permalink

    I didn’t know that Roman was at UNB. I seem to remember taking a stats course from a curly haired blond hippy looking dude back around 84. Maybe it was before his time there?

    • RomanM
      Posted Feb 5, 2015 at 2:10 PM | Permalink

      I arrived there in 1976 so it could very well have been me.

      Were you the guy who always sat in the back of the class and never paid attention to what I was saying?

      • AJ
        Posted Feb 5, 2015 at 2:17 PM | Permalink

        Bingo! How’d you guess? I’ll have to dig out my transcript to see if it was you and what my grade was.

        • Beta Blocker
          Posted Feb 6, 2015 at 6:37 PM | Permalink

          AJ, have you thought about comparing your transcript copies of your grades to your school’s current official grade records to be sure your alma mater hasn’t adjusted your scores either up or down in the years since you graduated?

        • AJ
          Posted Feb 6, 2015 at 7:22 PM | Permalink

          BB, according to Wikipedia, Canadian universities have experienced grade inflation comparable to those in the U.S. I don’t see any reason why my school wouldn’t be affected by the same influences either. Luckily, GPA’s influence on future career prospects has a fairly short half-life.

      • S. Geiger
        Posted Feb 5, 2015 at 2:43 PM | Permalink

        “Were you the guy who always sat in the back of the class and never paid attention to what I was saying?”

        – I think that was the Marotzke fella.

        • Streetcred
          Posted Feb 6, 2015 at 12:50 AM | Permalink

          Marotzke was the dude folding paper rockets, lighting the tails, and flighting them down the banked lecture theater. LOL we had a bloke that did this (back in ’75) and our maths lecturer abandoned the class for 2 weeks.

      • AJ
        Posted Feb 5, 2015 at 5:33 PM | Permalink

        Roman, the class I think I might have taken from you was “STAT3083 – Prob and Math Stat I” Fall 84. A few things stand out in my memory. When demonstrating the Birthday Problem, there was a match with the first person asked. The match was in the adjoining seat. I also remember some infinity arithmetic that blew my little undergrad mind away. Maybe a demonstration that 0.99999… = 1.0. I also remember the gal that sat next to me. I had a dirty liking for her which probably explains why I actually attended class that semester.

        • RomanM
          Posted Feb 6, 2015 at 7:29 AM | Permalink

          Yes, that would have been my course. When covering combinatorics, I always did the birthday problem in class by having people state consecutively their birth day and month. Others would then respond upon hearing their own date mentioned. It made the students more involved so I could occasionally sneak some math and stat in before they were aware I was doing so.

        • AJ
          Posted Feb 6, 2015 at 10:49 AM | Permalink

          I remember you being one of the better profs I had. You kept the subject matter interesting. Coming from someone with attention “difficulties”, that’s a compliment. I got a good mark, so you must have given easy exams 🙂

          Cheers, AJ

  11. Steven Mosher
    Posted Feb 5, 2015 at 1:12 PM | Permalink

    http://berkeleyearth.org/graphics/model-performance-against-berkeley-earth-data-set#gcm-acceleration

    another piece

  12. rabbit
    Posted Feb 5, 2015 at 2:31 PM | Permalink

    even as co-author (a position that perhaps arose through him supplying model forcing data to Marotzke) and therefore not bearing primary responsibility for the paper’s shortcomings.

    I strongly disagree with this. All authors are equal, no matter what order they are listed in. They all take credit and blame for the contents of the paper in equal portions.

    We all know that in most papers with many authors, one or a few of them are the main drivers, and some of the authors might barely know what the paper is about. But upon publication they officially become equally responsible. There is no hierarchy.

    • Monty
      Posted Feb 5, 2015 at 3:45 PM | Permalink

      Well, in lots of high status journals like Nature the author contributions are listed and there is a hierarchy. One person may only be on a paper because they provided some technical info….doesn’t mean they should be equally responsible for any flaws.

      • Jeff Norman
        Posted Feb 5, 2015 at 8:39 PM | Permalink

        Or… the contribution of the “principal” author may simply be that they supervise the actual author.

  13. Monty
    Posted Feb 5, 2015 at 3:36 PM | Permalink

    Seems like open access journals (like CPD and other EGU journals) where peer review is open are the way forward and should stop all arguments like this before the paper is published.

  14. miker613
    Posted Feb 5, 2015 at 3:42 PM | Permalink

    This circularity seems like an easy thing for a (somewhat) careless peer reviewer to miss. He just has to not track down how the previous paper Forster 2013 calculated its values for the forcings.

    • Monty
      Posted Feb 5, 2015 at 3:49 PM | Permalink

      Maybe the reviewer was careless. But it’s not reasonable to expect reviewers to do a full audit of a paper….it takes too long and there are only so many hours in a day. It’s not reasonable IMO for a reviewer to dig out old papers and redo the calculations. All the reviewer has to focus on is whether the paper is well written, appears sound, is replicable, and whether the conclusions follow from the results, etc.

      • Posted Feb 5, 2015 at 4:14 PM | Permalink

        That is true. But in this case both the abstract and conclusions contain a logical flaw that should have been a huge red flag, as pointed out upthread. How on earth can a model’s major emergent structural properties NOT influence its outputs? Circumstantial evidence of editorial bias and pal review.

        • Monty
          Posted Feb 5, 2015 at 4:20 PM | Permalink

          OK. Point taken. I haven’t read it myself yet!

        • Posted Feb 6, 2015 at 10:23 AM | Permalink

          Agreed, the authors and the reviewers should have realised that something was wrong as soon as their regression coefficients showed the inputs to the models were not determining the outputs. In that case of course random variability would dominate, because it would mean that the models are basically generating random noise regardless of their inputs. Which may of course be true, but it would be devastating for climate science and it would mean that climate models are no better than dice or a coin toss at predicting climate. Maybe that is why they didn’t catch the error. It came as no surprise to them that the models’ inputs were not determining the outputs, so they didn’t see the error.

      • tty
        Posted Feb 5, 2015 at 4:28 PM | Permalink

        It seems rather difficult to determine whether a result is replicable without running through the calculations.
        In any case in the observational sciences replication is often not possible. You can hardly turn down a paper on e.g. Shoemaker-Levy’s collision with Jupiter on the grounds that the observations aren’t replicable.

      • TerryMN
        Posted Feb 5, 2015 at 4:38 PM | Permalink

        But it’s not reasonable to expect reviewers to do a full audit of a paper….it takes too long and there are only so many hours in a day. It’s not reasonable IMO for a reviewer to dig out old papers and redo the calculations.

        Perhaps, but if the paper came to a different conclusion, such as “the models are wrong”, I’ll bet they would have had lots of time, and plenty of hours in the day, to do a full audit of the paper. I only say that because it has played out so often before.

        • stevefitzpatrick
          Posted Feb 7, 2015 at 11:01 AM | Permalink

          Any paper that is contrary to a dominant paradigm will get a very close look, while fetid papers that support the paradigm will get their typos fixed during review. It is not just in climate science…. but in climate science, as in any field with serious real-world policy implications, the effect is likely worse.

    • jorgekafkazar
      Posted Feb 5, 2015 at 7:22 PM | Permalink

      A superficial looking-over could easily miss the circularity. More than once, after hours of analysis of n equations with n unknowns, I’ve discovered that combining two of the relationships gave:

      Z = Z,

      a relationship that, while reassuring in a post-Normal way, was of about as much utility as Marotzke & Forster.

      • AJ
        Posted Feb 5, 2015 at 8:02 PM | Permalink

        Don’t all equations have this problem:
        E=mc2
        substituting E for mc2 gives:
        E=E

        In your case, maybe the expressions weren’t fully simplified?

        • Posted Feb 5, 2015 at 8:32 PM | Permalink

          AJ, Nope. Erroneous algebraic substitution. You can add, subtract, divide, multiply… Anything to both sides at the same time.
          But you cannot just substitute one side for the other. Operators have to work on both sides of the equation simultaneously. Al Hazan’s logic from long ago. (His name and writings eventually gave the English name to algebra. google)
          Follow established mathematical rules, and your post would produce
          MC^2 = E. Not nearly as revolutionary as this bogus circular paper.

        • AJ
          Posted Feb 5, 2015 at 9:21 PM | Permalink

          I’ll give you a thumbs up on this Rud. I’ll confess I didn’t give it much thought… thanks

        • kim
          Posted Feb 6, 2015 at 3:23 AM | Permalink

          Energy equals mass times the speed of light squared. How can they possibly be the same? There aren’t even very many letters in common and they don’t sound the least bit equal.
          =======================

        • John Archer
          Posted Feb 7, 2015 at 5:01 PM | Permalink

          Rud,

          AJ, Nope. Erroneous algebraic substitution. … But you cannot just substitute one side for the other.

          Taking that as a general statement, I don’t agree. Quite the contrary.

          Take the simple example of a system of 3 linear equations in 3 unknowns (x, y and z, say). Forget any of that row-reduced echelon-form juggling and just do it as it comes: take the first equation in which z appears and jiggle it around to get z on the LHS. Then substitute the resulting RHS for z wherever z appears in the 2 other equations. You’re now down to 2 linear equations in 2 unknowns, and on you go.

          The point here is that you made a substitution — a wholly legitimate one.

          Indeed, anything on one side of an equals sign can always be substituted for any occurrence of the other side, wherever it appears — otherwise there’s pretty much no point in the notion of equality and no point in bothering to have such a thing as an equals sign. Of course, whether such a substitution is useful is an entirely different matter.

        • jorgekafkazar
          Posted Feb 9, 2015 at 10:25 PM | Permalink

          Obviously, I had N-1 equations, not N, as I thought.

      • Streetcred
        Posted Feb 6, 2015 at 12:54 AM | Permalink

        Jorge, it is well established that: ZZZ = Fail.

  15. rabbit
    Posted Feb 5, 2015 at 3:52 PM | Permalink

    Peer review opinions are exactly that: opinions. The ultimate decision lies with the editor, who can choose to publish despite a reviewer’s strong criticisms.

    I’m in that situation now. I’m reviewing a paper that I will recommend not be published, but the editor knows I have a jaundiced view concerning this piece of research (cause I warned him ahead of time) and might publish anyway.

  16. Craig Loehle
    Posted Feb 5, 2015 at 4:10 PM | Permalink

    As you mention, the runs of each model are not independent of each other. If you use the ensemble mean of runs of each model, you have 18 models (data points) and 4 Beta terms, which seems very iffy to me inference-wise.

    • Posted Feb 5, 2015 at 5:01 PM | Permalink

      Yes. It doesn’t necessarily help much using all the individual runs since the differences between each run and the run-ensemble mean for a model will not carry any extra information about the beta terms. And the regression will be weighted towards the models with multiple runs (some models have 10 runs, some only 1 run, some an in-between number).

    • Craig Loehle
      Posted Feb 5, 2015 at 10:47 PM | Permalink

      If I do an experiment on different groups’ ability to hit a bullseye, but I get repeated trials from only 18 participants, I don’t have 75 data points. There are statistical tests to handle this.
      To clarify my point, if you do a regression with 4 parameters and 18 data points (18 ensemble means), the confidence intervals are going to be pretty wide, so hard to “prove” anything.

  17. Posted Feb 5, 2015 at 5:25 PM | Permalink

    You can get the paper here.

    • Matthew R Marler
      Posted Feb 5, 2015 at 6:08 PM | Permalink

      Nick Stokes, thank you for the link to the full paper.

    • AndyL
      Posted Feb 5, 2015 at 6:38 PM | Permalink

      That deadpan comment from Nick Stokes is possibly the most damning criticism of the M&F paper one could imagine.

      • pdtillman
        Posted Feb 6, 2015 at 4:08 PM | Permalink

        “That deadpan comment from Nick Stokes is possibly the most damning criticism of the M&F paper one could imagine.”

        Huh? All Nick did was link to a free copy. ???

        • k scott denison
          Posted Feb 6, 2015 at 7:59 PM | Permalink

          Um, that’s the point. If there was an argument to be made Nick S would have tried.

    • Don Monfort
      Posted Feb 6, 2015 at 2:03 PM | Permalink

      Looks like nicky racehorse is pleading nolo contendere on this one.

  18. Posted Feb 5, 2015 at 10:46 PM | Permalink

    Despite Nic’s courteous review, this looks a bit like supply/demand to me. I am sorry for the cookie-cutter remark but the technical sophistication of the regression with the surprisingly inexplicable result…

  19. hunter
    Posted Feb 6, 2015 at 12:23 AM | Permalink

    Circular in a spiraling swirling flushing sort of way.

    • kim
      Posted Feb 6, 2015 at 3:32 AM | Permalink

      Regression dilution by Charybditic Bay.
      ============

  20. Posted Feb 6, 2015 at 2:30 AM | Permalink

    Nice breakdown on the paper. I’ll have to go through this later when I have more time.

    All this paper demonstrates is that climate scientists should take some basic courses in statistics and Nature should get some competent referees.

    I did a short article on the incompetent use of OLS in climatology (and elsewhere). Much of the misattribution and spurious “forcings” is due to a basic misunderstanding of how and when to use linear regression.

    On inappropriate use of least squares regression

    Nic touches on some of these issues here but the biggest one is probably regression dilution.

  21. knr
    Posted Feb 6, 2015 at 6:59 AM | Permalink

    When we consider whether something is ‘right or wrong’ we first need to define what we actually mean by ‘right’.
    In science this should be straightforward, as we have ideas such as empiricism, peer review etc.
    However, in practice it is not, sometimes because we are dealing with theories where there is no clear ‘right’ answer.
    In this case we do have an opportunity to have a ‘right answer’, but we failed to achieve it. Why?

    Because in this case the ‘right answer’ has little to do with the facts and much to do with the ‘impact’ this paper had with the AGW community and, more importantly, the ‘media’. A classic case of science by press release: the paper was ‘right’ in that it achieved what the authors wanted it to do, and that its facts were ‘wrong’ makes no difference to that. Within climate ‘science’ we have seen this time and again, and far from being a problem for the authors, coming up with the ‘right answer’ no matter the method has been rewarded.

    The massive expansion of climate ‘science’ as an area of study, thanks to a mixture of lots of money and its ‘progressive politics’, means there are a lot of people moving from their studies into professions having been taught how to be ‘right’ even when they are wrong. So if anything, papers such as this will be a growing problem.

    • hunter
      Posted Feb 6, 2015 at 8:14 AM | Permalink

      knr,
      In the climate obsessed community the right answer is always, “we are right”.
      Facts, data, methodology are good as long as that answer is the conclusion.

  22. clays
    Posted Feb 6, 2015 at 8:30 AM | Permalink

    Nic,

    Nice post. Your analysis looks devastating to the conclusions in the paper. Have you contacted the authors and asked them to comment?

    • Posted Feb 7, 2015 at 7:13 AM | Permalink

      Yes, I liaised with Piers Forster and sent him a draft of my article for his (and Jochem Marotzke’s) comments nearly 24 hours before posting it. I have received no comments.

  23. juakola
    Posted Feb 6, 2015 at 10:45 AM | Permalink

    Check the comment #2 on this one, and the moderator response:
    http://www.skepticalscience.com/climate-climate-models-overestimate-warming-unfounded.html
    They got themselves some attitude!

    Steve: 🙂

    An SKS reader politely wrote:

    I see that there is some information on climateaudit that the statistical methods used are flawed. I do not have the background to double check that. I hope the authors can check it out and act appropriately…. quickly.

    SKS Moderator JH responded:

    [JH] Your comment appears to be a thinly-disguised attempt to castr a shadow on the information presented in the OP. If so, please cease and desist playing such a game on this website.

    • Sven
      Posted Feb 6, 2015 at 1:21 PM | Permalink

      There’s another strange comment by Tom Dayton (no. 9) that is surreal considering what he’s responding to. SkS is getting ridiculouser and ridiculouser… 🙂

    • joe
      Posted Feb 6, 2015 at 2:39 PM | Permalink

      The current theme from the SKS kids is that the models are underestimating the warming ie way too conservative in the modeling estimates.

      • hunter
        Posted Feb 7, 2015 at 8:38 AM | Permalink

        joe,
        And they call skeptics “deniers”. lol.

        • kim
          Posted Feb 7, 2015 at 8:18 PM | Permalink

          That thread trails into the plaintive.
          ==========

    • A. Scott
      Posted Feb 7, 2015 at 5:51 PM | Permalink

      The SkS kids are in full meltdown mode … including lighting up even ‘believers’ posts with incendiary Mod comments.

      A number of the usual suspects popping up with blind defense such as ‘its the extra special folks at the big timey Nature journal – they’re a stupid ‘ol blog’ … yet nary a single comment, let alone rebuttal, of Nic’s diligent and speedy work.

      Well done Nic …

    • TimTheToolMan
      Posted Feb 7, 2015 at 9:59 PM | Permalink

      SKS kids are at it again. The only thing missing to make it perfect was the word “independent” before review.

      Moderator Response:
      [JH] Your comment appears to be a thinly-disguised attempt to castr a shadow on the information presented in the OP. If so, please cease and desist playing such a game on this website.

      Upon further review, this comment is retracted.

    • TimTheToolMan
      Posted Feb 7, 2015 at 10:13 PM | Permalink

      Whether this paper stands or falls is of little consequence to AGW really. What really matters is that Nature and its peer review process let through a fatally flawed paper, and that rightly puts the whole question of the sanctity of peer review and editors’ prerogative at a major journal into doubt.

      Heads should roll.

    • harkin
      Posted Feb 8, 2015 at 5:22 PM | Permalink

      Regarding the “thinly-disguised” comment by the SKS moderator; that statement has now been lined out, replaced by this:

      “Upon further review, this comment is retracted.”

      Maybe the jig is up….

      • harkin
        Posted Feb 8, 2015 at 5:22 PM | Permalink

        whoops didn’t realize repeat

  24. Paul_K
    Posted Feb 6, 2015 at 11:32 AM | Permalink

    Nic,

    Another excellent catch.

    Yet, in a certain sense, the findings of the authors are correct and inevitable, i.e. that, based on the use of Forster’s abstracted forcings, the temperature gain in the models is not dependent on feedback and ocean heat uptake, provided that those values are also taken from the same source.

    There should be a giant bell ringing here for Forster, quite apart from the glaring problem with this paper. The circularity in this argument does not start with this paper. Since Gregory and Forster 2008, an entire edifice of Escherian stairwells has been built, founded on the same illusions. It starts with the unnecessary and demonstrably inapplicable use of a degenerative ocean model (the “kappa model”) to analyse GCM results. It continues with the demonstrably inapplicable assumption of an invariant feedback in the GCMs, an assumption absolutely rebuffed by the GCM data themselves. It continues with the simultaneous abstraction of Adjusted Forcing (AF) values and feedback values from the inapplicable model, having (only) the properties that (a) in combination they will track the late-time temperature behaviour to the given ECS of the GCM and (b) unknown forcings can be estimated as an approximately scalable function of temperature. It is readily shown that the estimated feedbacks are unrelated to the “true” feedbacks apparent in the GCMs, since the shorter-term feedbacks (up to several decades) are eliminated arithmetically by a mechanical reduction of the actual forcing; and the resulting AF values are then so disconnected from the emulated GCM’s reality that the forcings cannot be related to verification of that model’s RTE against LBL code, nor indeed to any independent estimate of forcing.

    To give an example, under this Escherian architecture, HadGEM2-ES ends up with an AF value of 2.9 W/m2 for a doubling of CO2, against an estimated stratospherically adjusted forcing of over 4.0; the derived AF forcing value in 2003 for the historical runs is then 0.8 W/m2 – less than half of the average of the AF values abstracted from the other models, but this is the value required to match the HadGEM2-ES temperature evolution.

    This amounts to taking a poorly qualified emulation model, plugging in physically meaningless values, and then scaling the historical forcing values to produce some sort of a match to the GCM results. Marotzke et al’s results should therefore not surprise us, but it was still a very nice catch.

    • Posted Feb 6, 2015 at 2:39 PM | Permalink

      Forster & Gregory 2006 addresses the regression dilution issue (which leads to an exaggerated climate sensitivity) in the appendix but avoids mentioning it in the body of the paper and the conclusion:

      Click to access Forster_sensitivity.pdf

      They explain this as basically not wanting to distract attention from the main point of the paper by rocking the boat too much. It may now be long overdue that this boat got rocked.

      On determination of tropical feedbacks

    • Posted Feb 6, 2015 at 2:45 PM | Permalink

      Paul, I recall a very useful exchange with you over at Lucia’s blog a year or two back. I would appreciate someone of your background criticising the above linked article of mine that Judith Curry has just posted.

      regards, Greg Goodman.

    • Posted Feb 9, 2015 at 5:38 AM | Permalink

      Paul, Thanks for your insightful comment. I think you are right that the findings in the paper are, at least in large part, inevitable.

      I agree that the kappa model is physically unsatisfactory, although it does appear reasonably to represent heat uptake behaviour in many AOGCMs over periods of up to several decades in idealised CO2 forced simulations.

      The assumption made in the paper of time-invariant feedback value α is indeed problematical for many, perhaps the majority, of AOGCMs. Moreover, as you say it leads to Adjusted Forcing values for ERF derived from the product of α and ΔT that may be a considerable way away from ERF estimates derived using more direct techniques, which are very probably more realistic. That may well be part of the reason why my equation (6), to which their simple physical model equations reduce, has no explanatory power.

      However, the main point I make in my article is that, even if the assumptions embodied in the equations (1) and (4) that the paper relies upon were valid, the results of the analysis carried out in the paper are invalid because of the circularity involved.

  25. Tom In Indy
    Posted Feb 6, 2015 at 11:54 AM | Permalink

    Correct me if I am mistaken, but I believe this system can be estimated using Full Information Maximum Likelihood (FIML), 3-Stage Least Squares, etc. These estimators account for the endogeneity associated with dependent variables appearing on the right-hand side of the regression equation, which creates correlation between the errors and those variables.

    [1] dT = b0 + b1*dF + b2*alpha + b3*kappa + e
    [2] dF = a0*dT + dN

    In SAS I would use Proc Model (Syslin if the system is linear) with FIML estimation to get efficient parameter estimates and significance levels.

    Can anyone with the data try this and report back? It would be interesting to see the true results once the model is correctly specified.

    • Posted Feb 7, 2015 at 11:55 AM | Permalink

      For a system estimation to work you need enough exogenous variables to identify the endogenous ones. In the system as you’ve written it, dF is given by equation [2] by construction, so there is no additional information in the system to identify a0 through estimation. Estimating [1] with [2] substituted in for dF is therefore equivalent to estimating the two-equation system. To estimate this as a system and identify a0 empirically, you would need at least one other variable that explains some of the variation of dF but that is independent of dT.
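
      A minimal sketch in R of this identification point, using the systemfit package on entirely made-up data (all names and parameter values are hypothetical, not taken from the paper): with an exogenous variable z that drives dF but is excluded from equation [1], 3SLS recovers the structural coefficients; drop z and a0 is no longer identified.

      library(systemfit)   # systemfit() estimates equation systems by 2SLS/3SLS

      set.seed(1)
      n     <- 1000
      alpha <- rnorm(n)        # stand-ins for the model-structure regressors
      kappa <- rnorm(n)
      dN    <- rnorm(n)
      z     <- rnorm(n)        # hypothetical exogenous driver of dF
      e     <- rnorm(n, 0, 0.5)
      u     <- rnorm(n, 0, 0.5)
      # assumed structural system:
      #   [1] dT = 1 + 2*dF + 0.3*alpha - 0.4*kappa + e
      #   [2] dF = 0.2*dT + dN + 0.5*z + u
      # solve the two simultaneous equations to generate consistent data:
      dT  <- (1 + 2*(dN + 0.5*z + u) + 0.3*alpha - 0.4*kappa + e) / (1 - 2*0.2)
      dF  <- 0.2*dT + dN + 0.5*z + u
      dat <- data.frame(dT, dF, alpha, kappa, dN, z)
      eqs <- list(temp = dT ~ dF + alpha + kappa,
                  forc = dF ~ dT + dN + z)
      fit <- systemfit(eqs, method = "3SLS",
                       inst = ~ alpha + kappa + dN + z, data = dat)
      summary(fit)   # recovers roughly 2, 0.3, -0.4 in [1] and 0.2 in [2]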

    • Matthew R Marler
      Posted Feb 7, 2015 at 7:54 PM | Permalink

      Tom in Indy: [1] dT = b0 + b1*dF + b2*alpha + b3*kappa + e

      I agree on the use of FIML in Proc Model, but there is a difficulty with the IV and DV as written. As Nic Lewis noted, dF is calculated from dTm so the model is circular. As I wrote below, I think that Marotzke et al. used a misleading notation, and dT is not actually the change in temperature, but a deviation of the particular slope from the mean of all slopes.

      It’s possible that my confusion is different from what I think it is. I am waiting to read what corrections I receive.

      • Posted Feb 9, 2015 at 4:43 PM | Permalink

        Matthew Marler
        In the paper’s notation, deltaT is the trend in model GMST over the period concerned, whereas the regression model involves the deltaT_primes, the deviations of the particular slope of each model run from the mean of all slopes. It is the explanation for inter-model variation in slopes that is being sought here.

        • Matthew R Marler
          Posted Feb 9, 2015 at 9:34 PM | Permalink

          Nic Lewis, thank you. I have been rereading, and both deltaT and deltaF are linear trends.

          Is it not peculiar that they compute the mean trend and then compute each trend deviation from the mean? Most regression packages do that automatically if you specify that you want an intercept in the model.

  26. Posted Feb 6, 2015 at 11:57 AM | Permalink

    I suggest that a polite letter to the Editor of Nature summarizing these objections should be sent.

    Whether or not it is accepted for publication is not the issue; rather, it should be done as a matter of public record.

  27. Solomon Green
    Posted Feb 6, 2015 at 2:29 PM | Permalink

    Surely the lesson that should be learnt is that any scientist purporting to obtain information from data should have a reasonably competent knowledge of statistics. Those scientists and others attempting to produce models from the data should have a greater understanding of statistics; the more complex the model the greater the knowledge of statistics required.

    In my world if a statistician is not included in the list of authors then it is expected that any statistics in the paper have been reviewed by a competent statistician.

    Nic Lewis, who is far more than just a competent statistician, shows the way by having sought the advice of two other statistical experts before publishing his note.

    It is interesting to contemplate whether this site would even exist were all Climate Scientists to have a better understanding of statistics.

  28. milep
    Posted Feb 6, 2015 at 5:45 PM | Permalink

    Can I just clarify something? It seems that all the variables, both dependent and (allegedly) independent, come from model runs, and there is no direct use of observational data. The dependent variable for any given time period is the linear trend produced by a particular model run over that time period, and the independent variables are values “diagnosed” from the relevant model runs – i.e. they are estimates of climate sensitivity, forcings, etc., estimated from model runs. Then the residual term is essentially that part of the model temperature trend that can’t be accounted for by the other RHS variables. Is this correct?

    • Posted Feb 8, 2015 at 9:41 AM | Permalink

      Correct. It is the intermodel differences that are being investigated, with the same regression coefficients being applied to each model.

      • george h
        Posted Feb 9, 2015 at 7:41 AM | Permalink

        Someone help me out here.

        With no direct use of or comparison with observational data, how can any of this be used as a public defense of model skill? Even if Marotzke & Forster had gotten the stats right and shown that something other than climate sensitivity is responsible for the model/observational gap, we still have the gap. At best it is a defense of why the models don’t work, but it does nothing to rehabilitate them.

  29. Pat Frank
    Posted Feb 6, 2015 at 7:49 PM | Permalink

    It’s not that climate modelers do not know statistics. It’s that climate modelers do not know how to carry out a physical error analysis.

    Every experimental scientist (and engineer) needs to know how to do that, in order to judge the accuracy of a result.

    Climate modelers do not; they invariably equate model ensemble variance with accuracy. A more basic mistake is hard to imagine.

    They do not understand propagation of error, and do not understand that conformance with an observable is meaningless if the result is not unique.

    With all the model parameter uncertainties, no model expectation value is a unique result. I’ve yet to encounter a climate modeler who understands this basic concept of physical science.

    Results such as in the Marotzke paper are physically meaningless.
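
    A toy illustration in R of the spread-versus-accuracy distinction drawn above (all numbers invented): an ensemble whose members share a common bias shows a small spread, yet every member, and the ensemble mean, is far from the truth.

    set.seed(2)
    truth   <- 0.0
    members <- 1.5 + rnorm(20, mean = 0, sd = 0.1)   # shared bias 1.5, small spread
    sd(members)             # ensemble spread ~0.1: looks precise
    mean(members) - truth   # error of the ensemble mean ~1.5: inaccurate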

  30. Posted Feb 6, 2015 at 8:57 PM | Permalink

    Reblogged this on I Didn't Ask To Be a Blog.

  31. RoyFOMR
    Posted Feb 6, 2015 at 9:13 PM | Permalink

    Has Climate-Science scientific media turned into Climate-Science social media, where facts and opinions depend on who one’s friends are?
    I see on this site, for the most part, cold, hard analysis backed up by cold, hard numbers and logic, pitted against those whose view of the world is as fanciful as it is cheerful about the inevitability of impending doom brought about by our perceived excesses!
    Thank you CA for highlighting, once again, the absurdity of equating populist expectations and the scientific method in a way that suspends any notion of disbelief.

  32. C Wells
    Posted Feb 6, 2015 at 11:33 PM | Permalink

    I think you went further than necessary with this neo-scientific ‘study’, Nic. When I read the press release it sounded to me like a schoolboy insisting that he’d got the correct answer even though his required proof work didn’t support it. It’s like being asked to prove ten: with luck, any of nine ‘right’ answers (1+9, 2+8, 3+7, and so on up to 9+1) can be chosen without necessarily reflecting the actual sum involved. In this case there are nine correct potentials with just two variables – how many variables are there in a typical climate ‘model’? But they’re all correct because… climate statistics. C

  33. Frank
    Posted Feb 7, 2015 at 1:32 AM | Permalink

    How much can ΔF (the radiative forcing from doubling CO2) vary from year to year in the real world? Models may disagree about the correct value for ΔF, but the absorption cross-section for CO2 itself doesn’t change. Clouds, water vapor and lapse rate have a small effect on the ΔF one calculates, but do their annual average values change as much as M&F postulate?

    One might also ask the same question about the climate feedback parameter (α), which tells us how much outgoing OLR plus reflected SWR increase with surface warming. Planck feedback is a constant. Therefore water vapor, cloud and lapse rate feedbacks may reach equilibrium on a yearly time scale. The average water molecule remains in the atmosphere for only about a week. Temperature anomalies show autocorrelation for months, but not years.

    We also have observational evidence about the annual variation in ocean heat uptake efficiency (κ) from ARGO and the climate feedback parameter from CERES and ERBE. Re-analysis data could be used to determine how ΔF varies with time. So M&F’s

  34. Posted Feb 7, 2015 at 4:52 AM | Permalink

    Nic, I think you should send this to Nature as a ‘Communication arising’.

    http://www.nature.com/nature/authors/gta/commsarising.html

    “Critical comments on recent Nature papers may, after peer review, be published online as Brief Communications Arising, usually alongside a reply from the criticized Nature authors.”

    • Posted Feb 7, 2015 at 6:47 AM | Permalink

      Me too.

      • pdtillman
        Posted Feb 8, 2015 at 2:32 AM | Permalink

        Me three. Please.

    • Coldish
      Posted Feb 7, 2015 at 9:00 AM | Permalink

      To qualify as a ‘Brief communication arising’ Nature’s criteria include the following: “Manuscripts …. should not exceed 600 words (main text), with an additional 100 words for Methods, if applicable.”
      The length of Nic’s post is currently about 3000 words, including about 250 words as footnotes. Might be a challenging précis exercise…perhaps Ross could help.

  35. j ferguson
    Posted Feb 7, 2015 at 7:45 AM | Permalink

    Have other papers published by Nature been withdrawn? In what other ways has Nature handled the sort of thing we are looking at here?

    • stevefitzpatrick
      Posted Feb 7, 2015 at 12:21 PM | Permalink

      J ferguson,

      It does happen: http://www.iflscience.com/health-and-medicine/controversial-stem-cell-paper-set-be-withdrawn-nature

      But I suspect it is quite rare in a ‘high impact’ journal like Nature, and when it happens, it usually involves obvious fraud rather than obvious error. It is one thing for journal editors to have egg on their faces, but far worse to have to publicly admit that the egg exists. I very much doubt 1) that the paper will be withdrawn, or 2) that Nature will allow publication of any letter/comment/paper which shows the circularity (and silliness) of the paper’s logic. The only chance I see for withdrawal is if Forster is sufficiently embarrassed by the paper to request that Nature remove his name as an author…. and I don’t see much chance of that happening either. Some people are more easily embarrassed by stupid errors than others. Some just don’t care, because they have ‘bigger fish to fry’.

    • henk
      Posted Feb 7, 2015 at 1:11 PM | Permalink

      Nature does retract. See for instance this link:
      http://www.nature.com/nature/journal/v505/n7485/full/nature12968.html
      It was the groundbreaking discovery that stem cells could be induced from somatic cells by a short treatment with lactic acid. It created a complete frenzy and a firestorm in the medical/cell biological world. It appeared all due to a contaminated sample…

  36. Ron C.
    Posted Feb 7, 2015 at 1:24 PM | Permalink

    In December 2014, Willis posted GMT series generated by 42 CMIP5 models, along with HADCRUT4 series, all obtained from KNMI.

    CMIP5 Model Temperature Results in Excel

    We were able to analyze the temperature estimates of CMIP5 models and compare them with HADCRUT4 (1850 to 2014), as well as UAH (1979 to 2014). The models estimate global mean temperatures (GMT) backwards from 2005 to 1861 and forwards from 2006 to 2101.

    Bottom Line:
    In the real world, temperatures go up and down. This is also true of HADCRUT4.
    In the world of climate models, temperatures only go up. Some variation in rates of warming, but always warming, nonetheless.

    The best of the 42 models according to the tests I applied was Series 31. Here it is compared to HADCRUT4, showing decadal rates in degrees C over periods defined by generally accepted change points.

    Periods     HADCRUT4   SERIES 31   31 MINUS HADCRUT4
    1850-1878    0.035      0.036       0.001
    1878-1915   -0.052     -0.011       0.041
    1915-1944    0.143      0.099      -0.044
    1944-1976   -0.040      0.056       0.096
    1976-1998    0.194      0.098      -0.096
    1998-2013    0.053      0.125       0.072
    1850-2014    0.049      0.052       0.003

    In contrast with Series 31, the other 41 models typically match the historical warming rate of 0.05C/decade by accelerating warming from 1976 onward and projecting it into the future.

    Over the entire time series, the average model has a warming trend of 1.26C per century. This compares to the UAH global trend of 1.38C per century, measured by satellites since 1979.

    However, the average model over the same period as UAH shows a rate of +2.15C/century. Moreover, for the 30 years from 2006 to 2035, the warming rate is projected at 2.28C/century. These estimates are in contrast to the 145 years of history in the models, where the trend shows as 0.41C per century.

    Clearly, the CMIP5 models are programmed for the future to warm at more than five times the rate of the past.
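
    A minimal sketch in R of the per-period decadal-rate calculation described above (the series gmt is an invented stand-in for HADCRUT4 or a model run; the breakpoints are those in the table):

    set.seed(1)
    gmt <- data.frame(year = 1850:2014,   # hypothetical annual GMT series
                      temp = cumsum(rnorm(165, mean = 0.005, sd = 0.1)))
    rate_per_decade <- function(df, y0, y1) {
      d <- df[df$year >= y0 & df$year <= y1, ]
      10 * coef(lm(temp ~ year, data = d))[["year"]]   # deg C per decade
    }
    breaks <- c(1850, 1878, 1915, 1944, 1976, 1998, 2013)
    for (i in seq_len(length(breaks) - 1)) {
      cat(breaks[i], "-", breaks[i + 1], ":",
          round(rate_per_decade(gmt, breaks[i], breaks[i + 1]), 3), "\n")
    }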

    • Posted Feb 8, 2015 at 4:40 AM | Permalink

      Ron C, thanks for drawing attention to Willis’s post. The spreadsheet it linked to giving the data didn’t identify which run came from which model. But Willis has very kindly just rechecked for me.

      The best series, 31, was the (single) run from the inmcm4 model, as I suspected might be the case. That is the CMIP5 model with the lowest climate sensitivity (ECS), and it has a TCR of 1.3 C, in line with good observational estimates. It comes out top at matching the BEST temperature record as well – see their website.

  37. Brandon Shollenberger
    Posted Feb 7, 2015 at 4:18 PM | Permalink

    I happened to read some comments by the blogger Anders, and they were so funny I had to share them. According to him, this post is wrong. His explanation is… remarkable:

    What I think Nic Lewis is suggesting is that the earlier work that determined the external forcings used the temperatures to do so. Therefore using the external forcings to then determine the temperatures is circular. The problem is that this would only be true if the estimates of the external forcings were not a reasonable representation of the actual external forcings. If they are a fair representation of the actual external forcings, then there is no problem with then using them to determine the externally forced trend. It’s kind of how it’s defined. So, unless Nic Lewis can really show that the earlier work that produced the external forcings has a problem (i.e., that these estimates are not correctly representing the actual external forcings) then I don’t think his criticism is actually valid.

    A commenter responded by saying circular arguments are circular even if they happen to be right. Anders responded:

    But I didn’t say anything like this, did I? The point is simply that using model temperatures to determine the external forcings does not imply that the external forcings depend on temperature (they don’t).

    Apparently, Nic Lewis would have known that using temperature to estimate the effect of forcings, then using those estimates to estimate the effects forcings have on temperature, is okay, if only he had talked to climate scientists:

    miker613,
    I’ll have to have a look but a quick glance indicates that he’s had help/assistance from an economist and a retired mathematician. Can’t he discuss this with other climate scientists?

    I think Anders has a point. You have to talk to climate scientists. After all, who but climate scientists would accept arguments like these?

    • Hoi Polloi
      Posted Feb 8, 2015 at 6:30 AM | Permalink

      Rice will only accept that he was wrong if Nature pulls the research, and even then… I mean, this fitted so nicely into the argument that the models don’t overestimate; such a waste to throw that away….

    • David Young
      Posted Feb 8, 2015 at 12:19 PM | Permalink

      I think Anders is perhaps in over his head here. I think he just hasn’t had time to really look at it carefully. I suggested he come here and talk to Nic about it to get it resolved. I suspect that won’t happen.

      I do think Nic should publish his critique as a note. That would more likely result in the paper’s authors either defending their work or retracting it.

  38. Matthew R Marler
    Posted Feb 7, 2015 at 7:47 PM | Permalink

    I am confused by the notation used by Marotzke et al. in equation 4. In the text, they seem to describe using the 15-year trend errors as the dependent variable, but in the equation preceding equation 4 the dependent variable is denoted by “delta T prime sub j”. In Equation 4 the DV is denoted “delta T hat sub (reg, j)”. The text in between says “The complete GMST trend is obtained by adding the ensemble mean trend to the regression for the across-ensemble variations:”

    It looks to me like Nic Lewis has written a lot about a poor notation. Being confused about the notation, I suggest this with more than my usual modesty.

    However, Marotzke has definitely used the wrong estimation/testing procedures for what are in fact several autocorrelated and possibly cross-correlated time series.

  39. frenchie
    Posted Feb 7, 2015 at 10:14 PM | Permalink

    Just a layman, no background in stats, so someone correct me if I’m wrong… Just trying to wrap my head around the basics.

    It seems to me that this critique is only relevant to the 2nd & 3rd sections (“Energy balance and multiple regression” & “Deterministic versus quasi-random spread”)… and leaves the first section (“Observed and simulated 15-year trends”) completely intact.

    Am I totally off, here? Because it’d mean that the models have, to a layman, been shown to be valid & bias-free; and only the paper’s attempt to explore why they fail on short runs, has been trashed.

    If I’m completely misunderstanding this, could somebody please explain it, in layman terms? Thx.

    • miker613
      Posted Feb 8, 2015 at 1:22 AM | Permalink

      One attempt, anyhow: I think that the paper misses the point, even if its statistics were right. https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-750648

    • Posted Feb 8, 2015 at 5:05 AM | Permalink

      Section 1 (“Observed and simulated 15-year trends”) relies on the assumption “that the simulated multimodel-ensemble spread accurately characterizes internal variability” and goes on to say “We now test the validity of this assumption by identifying deterministic and quasi-random causes of ensemble spread”. It is that testing which my article shows to be fatally flawed. In fact, had their parts 2 and 3 results been correct, they would have shown the CMIP5 models to be unphysical rather than valid. As it is, they prove nothing at all.

      Part 1 does not show the CMIP5 models to be bias-free. It merely shows that over 1900-2012 (and therefore on average for 15-year sub-periods within 1900-2012), they roughly match the historic record, but with rather greater variability of 15-year trends. As matching the 1900-2012 record can be achieved by many different combinations of model forcings, model climate sensitivity and model ocean heat uptake efficiency, and the temperature record was very largely known when the model versions were selected, that does not at all prove that the models are bias-free.

      • TAG
        Posted Feb 8, 2015 at 11:30 AM | Permalink

        Nic Lewis wrote:

        In fact, had their parts 2 and 3 results been correct, they would have shown the CMIP5 models to be unphysical rather than valid.

        I think that Nic Lewis’s statement above reflects an issue that concerned me with the M&F result. If their result were correct, then the usefulness of comparing model results to empirical measurements would be lost. Any result could be justified by an appeal to natural variability. This would be a major setback for research into the potentially critical issue of AGW. Am I correct in this interpretation?

  40. miker613
    Posted Feb 7, 2015 at 10:52 PM | Permalink

    I’m trying to check if I get the basic idea here. I brushed off my rusty R skills and tried a simulation:

    //x = 4a + 3b +5f + e, where e is error term

    > set.seed(1)
    > a=rnorm(1000,3,2)
    > b=rnorm(1000,4,3)
    > f=rnorm(1000,2,2)
    > e=rnorm(1000,0,.7)
    > x=4*a+3*b+5*f+e
    > plot(x)
    > lmm=lm(x ~ a+b+f)
    > summary(lmm)

    Call:
    lm(formula = x ~ a + b + f)

    Residuals:
    Min 1Q Median 3Q Max
    -2.26987 -0.49973 -0.00857 0.50395 2.14144

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.032828 0.053684 -0.612 0.541
    a 4.008051 0.011139 359.818 <2e-16 ***
    b 3.003538 0.007383 406.831 <2e-16 ***
    f 5.003245 0.011183 447.403 <2e-16 ***

    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.7278 on 996 degrees of freedom
    Multiple R-squared: 0.9981, Adjusted R-squared: 0.9981
    F-statistic: 1.741e+05 on 3 and 996 DF, p-value: < 2.2e-16

    // now lets try deriving f from x
    > lmf=lm(f~x)
    > summary(lmf)

    Call:
    lm(formula = f ~ x)

    Residuals:
    Min 1Q Median 3Q Max
    -5.6699 -1.0461 0.0670 0.9834 5.3395

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.727846 0.111691 -6.517 1.14e-10 ***
    x 0.081310 0.002956 27.511 < 2e-16 ***

    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.556 on 998 degrees of freedom
    Multiple R-squared: 0.4313, Adjusted R-squared: 0.4307
    F-statistic: 756.8 on 1 and 998 DF, p-value: < 2.2e-16

    > f2=fitted(lmf)

    // now let's try regression again, this time using f2 instead of f
    > lmm2=lm(x ~ a+b+f2)
    > summary(lmm2)

    Call:
    lm(formula = x ~ a + b + f2)

    Residuals:
    Min 1Q Median 3Q Max
    -3.249e-14 -1.780e-15 1.360e-16 1.913e-15 6.075e-14

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) 8.951e+00 2.622e-16 3.414e+16 <2e-16 ***
    a 6.899e-16 7.721e-17 8.935e+00 <2e-16 ***
    b 1.089e-15 5.322e-17 2.045e+01 <2e-16 ***
    f2 1.230e+01 1.448e-16 8.492e+16 <2e-16 ***

    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.844e-15 on 996 degrees of freedom
    Multiple R-squared: 1, Adjusted R-squared: 1
    F-statistic: 6.251e+33 on 3 and 996 DF, p-value: < 2.2e-16

    // note that f2 has totally swallowed any dependence on a and b

    • miker613
      Posted Feb 7, 2015 at 10:59 PM | Permalink

      Hmm – a lot of things didn’t show up properly. I hope it’s at all comprehensible.
      The main parts that didn’t show up:
      a=rnorm(1000,3,2)
      b=rnorm(1000,4,3)
      f=rnorm(1000,2,2)
      e=rnorm(1000,0,.7)
      x=4*a+3*b+5*f+e
      lmm=lm(x ~ a+b+f)

      // now lets try deriving f from x
      lmf=lm(f~x)
      summary(lmf)

      f2=fitted(lmf)

      // now let’s try regression again, this time using f2 instead of f
      lmm2=lm(x ~ a+b+f2)
      summary(lmm2)

  41. Posted Feb 8, 2015 at 1:23 PM | Permalink

    There is now apparently a response from M&F at climate lab book. See pingback.

    • Steve McIntyre
      Posted Feb 8, 2015 at 2:11 PM | Permalink

      I’ve done a quick read of the post at Climate Lab Book. I don’t get how their article is supposed to rebut Nic’s article. They do not appear to contest Nic’s equation linking F and N – an equation that I did not notice in the original article. Their only defence seems to be that the N series needs to be “corrected” but they do not face up to the statistical consequences of having T series on both sides.

      Based on my re-reading of the two articles, Nic’s equation (6) seems to me to be the only logical exit and Nic’s comments on the implications of (6) the only conclusions that have a chance of meaning anything. (But this is based on cursory reading only.)

      • Posted Feb 8, 2015 at 6:59 PM | Permalink

        Steve

        The equation linking F and N is from Forster et al (2013) (so it’s not Nic’s equation, and it’s hardly surprising that Jochem and Piers don’t “contest” it!)

        In their Climate Lab Book post, Jochem and Piers say:

        Because radiative forcing over the historical period cannot be directly diagnosed from the model simulations, it had to be reconstructed from the available top-of-atmosphere radiative imbalance in Forster et al. (2013) by applying a correction term that involves the change in surface temperature. This correction removed, rather than introduced, from the top-of-atmosphere imbalance the very contribution that would cause circularity.

        and

        Not correcting for the increased back radiation would, on physical grounds, imply using N, which contains the very contribution from the surface response T that we must eliminate in our estimate of F.

        Of course one could legitimately ask how accurate this correction is, and we would hope that in future generations of coordinated model simulations a better direct diagnostic of F is possible. But for the CMIP5 models used in our study and in Forster et al. (2013), applying equation (3) has been the only approach possible. Forster et al. (2013) performed a number of tests of their procedure and found it to be adequate to produce time series of radiative forcing.

        So they are aware that they rely on some assumptions, but have already checked these out in previous work.

        This can of course be tested in future work – if the CMIP6 models allow F to be obtained more directly, the M&F procedure can be re-done with that.

        • kim
          Posted Feb 8, 2015 at 8:51 PM | Permalink

          The halt leading the blind, but it’s the halt that’s blinding.
          =============

        • Steve McIntyre
          Posted Feb 8, 2015 at 10:15 PM | Permalink

          Richard, all of this is new to me so I’m commenting just in data analysis/statistical terms based on partial understanding. You say:

          This correction removed, rather than introduced, from the top-of-atmosphere imbalance the very contribution that would cause circularity.

          Maybe that’s the objective, but, viewed in statistical terms, adding a linear function of T to one of the right side components and then regressing T against the right side appears to be precisely the lunacy that Nic described. You just can’t do this. A couple of very competent statisticians have already weighed in on this and, if I’ve understood the setup correctly, Nic and they are right and you and Marotzke-Forster are wrong in terms of meeting the requirements of a regression.

          There was nothing in the reply at Lab Book that was responsive to Nic’s criticism. As I read it, they more or less just re-asserted that they were right. But to this third-party reader with specialist statistical knowledge, they look completely out of their depth. Exactly the sort of ad hoc and home-made statistical analysis undertaken for advocacy that has so marred Team paleoclimate.

        • Steve McIntyre
          Posted Feb 8, 2015 at 10:32 PM | Permalink

          A further gloss on this article. Once again, I think that the penultimate section provides the most direct analysis. In equation (6), the circularity is removed and a regression can be done and these (very negative) results are the only ones with any conceivable meaning. Marotzke and Forster (and Betts) made no attempt to address the findings in this paragraph, which are very clear.

        • Posted Feb 9, 2015 at 3:51 AM | Permalink

          “As I read it, they more or less just re-asserted that they were right.”

          That’s what I took away from the article. I kept reading for the part that says ‘…this is why our argument is not circular’ and couldn’t find it.

          Inference-making is a chain of logic. When the mind locks on, it becomes hard to see other perspectives. Maybe the authors should try being more explicit, if they think they are correct.

        • Frank
          Posted Feb 11, 2015 at 2:44 PM | Permalink

          Ron C, thanks for your kind response. Of course HadCRUT4 is also a model, anyway… with a fresh initialisation every month, in contrast to CMIP5. And of course the mean is a mean of so many models… some with an ECS below the mean and some above. I also made a comparison only for 1975–2004, and the models (Willis’ sheet…) with a very small difference of trend slopes to HadCRUT4 are very suspicious of matching some parameters (only) for this interval… the first candidate is aerosol forcing. It seems to me that this ‘tuning’ makes the big difference during other periods. The less a model is tuned with aerosols to a good performance during 1975–2004, the better it is in other periods?

      • Posted Feb 9, 2015 at 4:48 AM | Permalink

        Steve, Thanks for your input, with which I fully agree.

        To recap, there are only two relevant CMIP5 model Historical simulation outputs available and used here, T and N. The simple physical model used in the paper’s analysis, and to diagnose F in the first place, reduces algebraically to ΔT = ΔN / κ (substituting ΔF = α ΔT + ΔN into ΔT = ΔF / (α + κ) gives (α + κ) ΔT = α ΔT + ΔN, hence κ ΔT = ΔN), which, with an error term added, is my equation (6). Whether or not that equation has any significant explanatory power here (it doesn’t), it does not enable any separate estimate of ΔF to be made that would enable the relationship between ΔT, ΔF and α to be investigated. Jochem Marotzke and Piers Forster do not appear to have realised this when they undertook their analysis, and neither have they addressed the issue now that I have pointed it out.

        A minor point, but although Jochem and Piers write of “correcting” ΔN for the increased back radiation (α ΔT), that correction term is larger than the ΔN term for most of the 62-year periods they analyse. It might be better to say that forcing is diagnosed from the increased back radiation resulting from the rise in surface temperature it causes, with a correction for changes in the rate of heat absorption by the not-yet-equilibrated climate system (the counterpart of, and equal to, the change in top-of-atmosphere radiative imbalance ΔN).
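
        A minimal numerical sketch in R of this reduction (parameter values invented): when forcing is diagnosed as ΔF = αΔT + ΔN and heat uptake follows the simple model ΔN = κΔT, the ‘explanatory’ variable ΔF/(α + κ) is identically ΔT, so the regression is circular by construction.

        set.seed(1)
        n     <- 100
        alpha <- 1.3           # assumed feedback parameter
        kappa <- 0.7           # assumed ocean heat uptake efficiency
        dT    <- rnorm(n)      # arbitrary model trend deviations
        dN    <- kappa * dT            # simple model: uptake scales with dT
        dF    <- alpha * dT + dN       # forcing diagnosed from T and N
        all.equal(dF / (alpha + kappa), dT)   # TRUE: the regressor is dT itself
        # lm() warns of an essentially perfect fit; R^2 is 1 by construction
        summary(lm(dT ~ I(dF / (alpha + kappa))))$r.squared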

        • Paul_K
          Posted Feb 9, 2015 at 6:24 AM | Permalink

          Nic,
          I have been trying to work out what Profs Marotzke and Forster are trying to say, and am having great difficulty. I think that they are trying to make a non-trivial, but entirely erroneous, point in their response.

          According to Profs M&F, you are confusing two different entities which are both called temperature. (You aren’t but I think that this is the basis for their argument rejecting circularity.) The first entity is represented by the actual temperature (anomaly) observed in the given GCM. This temperature anomaly is made up of two components, the first of which, Tf, is the forced response including temperature-dependent feedbacks, and the second of which, Tnv, are surface temperature variations caused by “natural variability” in the GCM. By assumption in Forster’s energy balance model, the restorative flux responds linearly to surface temperature change; the model does not care what caused that temperature change. Hence, the restorative flux is represented as a simple linear function of both components of this observed temperature. Hence, in order to estimate the forcing from the net flux time series, it is necessary to adjust the net flux using the total actual temperature change observed in the GCM. So the derived adjusted forcing (AF) value is given by:-
          F(t) = N(t) + α*T(t)
          = N(t) + α * (Tf(t)+Tnv(t))
          Where F, N, Tf and Tnv all denote changes in values from some initial theoretical steady state, T(t) = Tf(t) + Tnv(t), and the time series, N and T, come directly from the GCM results. (None of this will come as any surprise to you, but their response seems to imply that it should.)
          Now Profs M&F want to separate out the forced change in temperature plus associated feedbacks, from the “natural variability” change in temperature in the same GCM. The model they use to do this assumes that
          ΔTf = ΔF/(α + κ)
          I think that they are arguing that the ΔTf is not the same animal as T(t) above, since the natural variation component is now excluded. If we substitute the (exact) expression for the derived ΔF, we obtain:-
          ΔTf = [N(t) + α*T(t)]/(α + κ)
          Hence, since the temperatures on the LHS and RHS are different animals, they reject your argument of circularity. Bingo.
          In reality, the problem has not gone away at all, since the actual regression itself is not against ΔTf, but against a mean-shifted T(t). I don’t think that they have thought this through.

          On a different point, there is substantial error in the emulation model in terms of its ability to match GCM temperature results, since it relies on (a) the assumption of infinite ocean and constant flux per degree temperature change (b) a linearly changing forcing with time and (c) in this instance a zero intercept in a plot of N vs T, which implies a zero surface layer capacity. None of these assumptions are perfectly met, and the “model error” is especially substantial over the shorter 15 year periods. All of this model error is dumped into the regression error term and ends up being dubbed as natural variation.

          On a third point, I do find the entire M&F paper ironically amusing. In summary it seems to be:- The AOGCMs have done a cr*p job of matching variation over 15 year periods, so why should you expect them to get the last 15 years right? It is a pity that their methodology is flawed. If it wasn’t I would love to see it applied to 31 year periods instead of the 62 years they adopted. The latter conveniently eliminates the quasi-60 year oscillations from the picture. I suspect that if the M&F logic were applied to 31 year periods, we would find that the models have also done a cr*p job at matching variation over this period. (smiley)

        • stevefitzpatrick
          Posted Feb 9, 2015 at 7:05 AM | Permalink

          Paul_K,
          Nice summary. I think you are correct about the impact of the 62-year period they use, since 62 years is close to the apparent period of ‘oscillation’ in the instrumental record. As to the motivation behind the paper, I think it is very clear that this is but one of many recent papers that offer ‘explanations’ for the post-2000 divergence between modeled and measured response to warming. Some of the ‘explanations’ are plausible; some, like this one, risible. One might suggest that the blizzard of papers along these lines is an effort to… err… paper over the obvious divergence.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 7:07 AM | Permalink

          Paul_K,
          sorry, that should have been “measured and modeled response to forging”.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 7:09 AM | Permalink

          ‘forcing’, not ‘forging’….

        • Posted Feb 9, 2015 at 7:54 AM | Permalink

          Piers Forster comments at Climate Lab Book:

          Nic is right that deltaT does appear on both sides, we are not arguing about this we are arguing about the implications.

          We see the method as a necessary correction to N, to estimate the forcing, F. This is what we are looking for in the model spread, not the role of N – it would be more circular to use N as this contains a large component of surface T response.

          We know this method of diagnosing F work very well – e.g. see Fig 5 of Forster et al. 2013

          We only see it as a problem as it affects the partitioning of the spread between alpha and F. We find that this creates some ambiguity over the 62 year trends but not the 15 year trends.

          Uncertainty in the partitioning is different than creating a circular argument. We simply don’t do this.

        • Posted Feb 9, 2015 at 8:06 AM | Permalink

          Paul,

          Thanks. You may be right as to what M&F are arguing; I’m having some difficulty telling what exactly they mean. In any case, as you say (and I knew), it is the same T that they use in both equations, and the circularity does not go away no matter how much they protest that there is none involved.

          I agree that “model error” (of their simple physical model used to emulate the GCMs, particularly as linearised into their regression model) is a major component of the regression residuals here.

        • Arthur Dent
          Posted Feb 9, 2015 at 8:18 AM | Permalink

          But surely, if the same variable dT appears on both sides of the expression, then any regression analysis is automatically meaningless?

        • Ron C.
          Posted Feb 9, 2015 at 8:53 AM | Permalink

          In my comment above, I mentioned an analysis of CMIP5 temperature series. In presenting the CMIP5 dataset, Willis raised a question about which of the 42 models could be the best one. I put the issue this way: Does one of the CMIP5 models reproduce the temperature history convincingly enough that its projections should be taken seriously?

          To reiterate, the models generate estimates of monthly global mean temperatures in degrees Kelvin backwards to 1861 and forwards to 2101, a period of 240 years. This comprises 145 years of history to 2005, and 95 years of projections from 2006 onwards.

          I identified the models that produced an historical trend nearly 0.5K/century over the 145 year period, and those whose trend from 1861 to 2014 was in the same range. Then I looked to see which of the subset could match the UAH trend 1979 to 2014.

          Out of these comparisons I am impressed most by the model producing Series 31, which Willis confirms is output from the inmcm4 model.

          It shows warming 0.52K/century from 1861 to 2014, with a plateau from 2006 to 2014, and 0.91K/century from 1979-2014. It projects 1.0K/century from 2006 to 2035 and 1.35K/century from now to 2101.

          Note that this model closely matches HADCrut4 over 60 year periods, but shows variances over 30 year periods. That is, shorter periods of warming in HADCrut4 run less warm in the model, and shorter periods of cooling in HADCrut4 run flat or slightly warming in the model. Over 60 years the differences offset.

        • Layman Lurker
          Posted Feb 9, 2015 at 8:56 AM | Permalink

          IMO Piers Forster’s comment at CLB seems to be justifying the circularity by treating the model response to forcing as tautological. Just an extension of the same circular logic.

        • Frank
          Posted Feb 9, 2015 at 1:12 PM | Permalink

          Ron C, Nic: It’s interesting to look at the model mean of the “Willis sheet” and see the relative failure in relation to HadCRUT4. (Not) Surprisingly, the trend failure of the mean for 1975–2004 is only about 1% (!), while the failure for 1979–2013 (the satellite period) is 37%, and for 1998–2013 it’s 196%. This result could mean that the mean is matched to the period 1975–2004 (see Mauritsen, http://onlinelibrary.wiley.com/doi/10.1029/2012MS000154/full) and fails dramatically during other periods. If the M&F conclusions are correct, this would mean that during 1975–2004 there was NO internal variability in the climate system, because the models are nearly 100% on track, while in all other intervals we saw a much greater internal variability?? This seems to me not very likely. Just another thought: over at Climate Lab Book, M&F claim that the method for splitting dT between forcings and internal variability is robust, and that’s why there is no circularity. This is the main point of the discussion and an essential basic of the paper. Anyway, they didn’t show a comprehensible justification in the paper. It should have been an essential core, and not a matter of discussion at blogs AFTER the release of the paper in Nature with so much PR. So IMO the paper is very, very questionable.

        • Ron C.
          Posted Feb 10, 2015 at 11:26 AM | Permalink

          Frank, your analysis is interesting, and seems to support your conclusions (which fit the category “Suspicions Confirmed”).
          Please help me understand the mean failure rates. Do these cover all 42 series? Are you comparing slopes? How is mean failure defined and calculated?
          Thanks.

        • Frank
          Posted Feb 11, 2015 at 5:18 AM | Permalink

          Ron C., For the first approach I calculated 30-year running trends from 1880 on for both HadCRUT4 and the model mean (from CE). The differences of the trend slopes over time are shown here: http://kauls.selfhost.bz:9001/uploads/trenddelta.png with the upper and lower 1 sigma. It’s very strange that the difference is near zero for the trends ending in 1995–2005, just when noise due to internal variability should be at work… Look at the trends to 1905–1915: the difference is greater than 2 sigma.
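
          A minimal sketch in R of this running-trend comparison (obs and mod are invented stand-ins for the HadCRUT4 and model-mean series; all names hypothetical):

          run_trend <- function(x, years, width = 30) {
            ends <- (min(years) + width - 1):max(years)
            sapply(ends, function(y1) {
              i <- years >= (y1 - width + 1) & years <= y1
              coef(lm(x[i] ~ years[i]))[[2]]   # OLS slope of window ending y1
            })
          }
          set.seed(1)
          yr  <- 1880:2014
          obs <- cumsum(rnorm(length(yr), 0.005, 0.1))
          mod <- cumsum(rnorm(length(yr), 0.005, 0.1))
          delta <- run_trend(mod, yr) - run_trend(obs, yr)   # slope differences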

        • Ron C.
          Posted Feb 11, 2015 at 9:23 AM | Permalink

          Frank, thanks for that. So the %s are the extent of variance of the model-mean slope from the HADCrut4 slope for each period. This does provide a measure of how the set of models compares to HADCrut4 estimates. I hasten to add that, of course, GMT is a statistical construct, and not a physical reality that can be measured. Thus, HADCrut4 is also an estimate, albeit one starting from surface thermometers rather than model parameters.

          Of course, the mean of the models includes many individual model variances both + and – which can offset, giving a misleading impression of accuracy. That is a major argument why the ensemble mean is a bad indicator, combining as it does many deviations. One good model is much better than averaging an ensemble with so many deficient models.

        • Greg Goodman
          Posted Feb 14, 2015 at 8:11 AM | Permalink

          Paul_K says;

          I think that they are arguing that the ΔTf is not the same animal as T(t) above, since the natural variation component is now excluded. If we substitute the (exact) expression for the derived ΔF, we obtain:-
          ΔTf = [N(t) + α*T(t)]/(α + κ)
          Hence, since the temperatures on the LHS and RHS are different animals, they reject your argument of circularity. Bingo.

          This seems to be an artificial distinction. How can one kind of T cause a difference in diffusion while another “kind” of T does not?!

          The factor α/(α + κ) is spurious.

          Temperatures do not wear a little yellow star to show their ethnic origins.

  42. davideisenstadt
    Posted Feb 8, 2015 at 9:25 PM | Permalink

    I had an interesting colloquy with anders at ATTP… he is apparently incapable of understanding that regressing a variable on itself isn’t particularly enlightening…and is also unwilling to describe his own training and background in statistical analysis.

    snip

    • stevefitzpatrick
      Posted Feb 9, 2015 at 6:45 AM | Permalink

      David,
      His background and CV are here: http://www.roe.ac.uk/%7Ewkmr/
      He is an astronomer who works on the mechanics of planetary formation from accretion discs. Name: Ken Rice. After his undergrad work he was employed by the South African Environmental agency, and made several trips to Antarctica. He finished his PhD in astronomy in 1998, IIRC. Short of getting a list of courses he has taken, there is no way to judge what specific statistical training he may have had, if any. My impression is that he does not understand (or perhaps doesn’t care) that regressing a variable against itself is … ahem…. ‘uninformative’.

      • Bill
        Posted Feb 9, 2015 at 7:31 AM | Permalink

        But is anders the same person as ATTP?

        • LaurieChilds
          Posted Feb 9, 2015 at 7:57 AM | Permalink

          Bill,

          Yes. They are one and the same. Anders is apparently a shortened version of andThenTheresPhysics (aTTP).

    • Posted Feb 9, 2015 at 7:49 AM | Permalink

      David,
      Or, maybe, I disagreed with your assertion that Marotzke & Forster are actually regressing a variable on itself. I guess, however, that accusing me of being dishonest and disingenuous is much simpler than considering that possibility.

      And if you were annoyed by how I responded to your comments, maybe don’t start with a demand that I answer your question and maybe don’t be quite so condescending yourself. Also, if I’m annoyed with someone, I still try not to go around calling them a liar, but maybe that’s just me.

      Steve: I agree that such accusations should be avoided. I’ve removed the language.

      • davideisenstadt
        Posted Feb 9, 2015 at 10:04 AM | Permalink

        snip

        I note that you still have not answered the question, now put to you for the fourth time:
        “what is your training and background in statistics?”

        Steve: this is now a foodfight and this is the last bite.

        • Posted Feb 9, 2015 at 10:09 AM | Permalink

          David,
          Seriously, you think I’m interested in answering your question? Also, how did I mischaracterise your question?

          I’ll explain something to you. You demanded I answer a question on my blog. I don’t need to answer your question on my blog. I don’t even need to answer it here. Of course, Steve could insist that I do, but I still don’t have to. This is not a complicated concept. Additionally, my interest in engaging with someone who has called me a liar is normally limited to one snarky response (this one) and then ignoring because anyone who thought doing otherwise would be constructive is a fool.

          Steve: this is now a food fight where you seem merely petulant. My usual practice would be to snip both responses, but I’ve left one extra bite for both of you.

        • Don Monfort
          Posted Feb 9, 2015 at 10:26 AM | Permalink

          david, it seems the lack of a background in statistics is not a hindrance in the practice of climate science. It actually helps to be naive when it is often necessary to make up novel statistical approaches to get the right answer. The Nature reviewers and editors know this. They are smart.

        • Posted Feb 9, 2015 at 12:51 PM | Permalink

          Steve,
          Don’t not snip these on my behalf. I have no great interest in these discussions.

          Steve: I give a longer leash to critics and let this go on so you could have a last word. But my comments were intended to draw a line.

  43. Posted Feb 9, 2015 at 9:55 AM | Permalink

    Steve, thanks.

    Since I’m commenting here, I think that if you want to argue that this analysis is circular, you’re essentially suggesting that climate models do not conserve energy. Consider a climate model that is known to have a climate sensitivity of alpha. Consider that it starts in equilibrium and that you apply a change in forcing of dF. If the temperature response is dT, then the TOA flux has to be (by energy conservation)

    dN = dF – alpha dT.

    Unless these models don’t conserve energy, the above is true.

    However, dF is an external forcing and so does not depend on dT by definition. You can still rewrite the above as

    dF = alpha dT + dN

    Since dF does not depend on dT, the quantity alpha dT + dN also does not depend on dT. Any change in dT is compensated for by a corresponding change in dN (i.e., if the surface temperature goes up without a change in dF, then dN goes down, and vice versa).

    Therefore if you use the output from climate models (dT, dN, and alpha) to determine the forcing timeseries, dF, it is independent of dT as long as the model conserves energy. Of course, climate models are not perfect and don’t conserve energy exactly, but that doesn’t really change that dF is not explicitly dependent on dT.

    Therefore, I would argue that this analysis is not circular. Just because the temperatures are used to determine the external forcings does not mean that the external forcings depend on temperature.

    • Posted Feb 9, 2015 at 10:22 AM | Permalink

      It depends on what your definition of “depends” is….

      • Posted Feb 9, 2015 at 1:21 PM | Permalink

        Let me amend my comment about the definition of “depends” since it appears flippant against the seriousness of ATTP’s point. Whether models conserve energy well or poorly, modeled dT is not entirely independent of modeled dF since you are solving for one value by assuming the others. The results depend upon the initial assumptions in the models, and changing the assumptions changes the model’s (and the simplified equation’s) output. For example, changes in forcing (dF) may not be dependent upon changes in temperature (dT) to the same degree (pun intended) that changes in temperature are dependent on changes in forcing — but they are interconnected. Consider that cloud formation responds to temperature changes and clouds can produce both feedback and forcing.

        In any event, much of the debate is actually over climate sensitivity (alpha, in the above) and Nic Lewis’ original posting implicitly challenges the majority’s calculation(s) of sensitivity. It does so by undermining the Marotzke and Forster defense of modeled dT’s divergence from recently observed dT. Marotzke and Forster’s paper suggested, essentially, that the assumptions used to produce model results are sufficiently accurate to reproduce the recent pause in the global temperature trend — after accounting for a few more assumptions about internal variability. Nic Lewis has presented a serious challenge to the statistical methods employed by Marotzke and Forster and most of us are still trying to work our way through the arguments. Intelligent comments, therefore, are greatly appreciated from all sides in the debate.

        • Posted Feb 9, 2015 at 1:41 PM | Permalink

          opluso,
          My laptop has died, so am using a tablet and am not that used to this. Excuses out of the way. I think there is some confusion. Forcings are, by definition, external. Things like water vapour, clouds, albedo, are feedbacks. They’re all included in the alpha term in front of dT. Therefore, if energy is conserved, the term dN + alpha dT gives the external forcing and is, by definition, independent of dT. The forcings are driving the temperature changes, not the other way around.

        • Steve McIntyre
          Posted Feb 9, 2015 at 3:17 PM | Permalink

          There obviously seems to be a difference between how a physicist and a statistician approach statistical analysis. It seems to me that the physicists are to some extent hypothesizing a can-opener.

          But watch what happens (as I understand it and I haven’t parsed it) if you start from the data: in this case, what you have are the series N and T. ATTPhysics says: dN depends on dT while dF does not. Well, if dN depends on dT, that’s precisely the sort of thing that you want in a statistical relationship. Rather than being ill-suited to regression, isn’t it ideally suited to regression? ATTP’s comment seems to misunderstand the entire purpose of statistics.

          From a physics point of view, you may want to add alpha*T to N get F, but from a data/statistics point of view, the two series: T and N+alpha*T, are going to be related by construction. Even if there is a real relationship somewhere, you won’t be able to disentangle it from the tautological relationship created by construction.

          Again, the statement: “dN depends on dT while dF does not” really seems to show how you’ve grabbed the wrong end of the stick so to speak.

        • Posted Feb 9, 2015 at 3:36 PM | Permalink

          Steve,
          I’m not actually talking about statistical analysis, though. Let’s do this in two steps. First you have Forster et al. (2013), who use the dT and dN values from the climate models to determine the external forcing. They use the fact that conservation of energy means that dF = dN + alpha dT, and that even though there is a dT on the right-hand side, dF does not depend on dT.

          Now we have Marotzke & Forster who take the dF time series and use them to determine the forced trend and then add a residual (epsilon) to estimate internal variability. Since dF does not depend on dT there is no actual circularity.

          I’m not quite sure why you think I’ve got the wrong end of the stick. If you want to determine the forced trend, you need to use the external forcings. You can’t do it using dN, for example, because you can’t get the forced trend from dN. You can’t really criticise Marotzke & Forster for not doing what you think they should have done. You can only really criticise them for not doing what they said they’d done, properly. You can’t do their analysis using dN.

          opluso,
          I should check this, but I think the cloud radiative effect is because of anthropogenic aerosols seeding clouds, and so is a forcing. It’s not the same as the cloud feedback response.

          Steve: there’s a difference between the ideal concepts and what you can measure. Once you use dT to construct F, you end up with a tautological property because of the math. Andthentheresmath, so to speak. At the end of the day, linear regression is just some matrix algebra: if you do all the matrix algebra, you should be able to see the tautology that Nic observed. He’s right.

    • stevefitzpatrick
      Posted Feb 9, 2015 at 11:27 AM | Permalink

      Anders/ATTP/Ken,

      But dF was in fact calculated directly from dT in Forster et al (2013), as Nick Stokes and others have pointed out in criticizing the circularity of a post at WUWT by Willis E, based on the self-same temperature-calculated forcing from Forster et al (2013). dF is calculated from dT by Forster et al, and the equation:

      dF = α dT + dN

      combined with:

      dT = dF / (α + κ) + ε, or

      dF = (dT – ε) * (α + κ)

      makes the circularity of regressing one function of dT against the Forster et al (2013) calculated forcing… which is really just another function of dT… explicit. Surely you can see that the temperatures used are the same.

      There is nothing that is going to remove the circularity except an independent forcing history for each model which is not calculated from dT.

      • Posted Feb 9, 2015 at 1:00 PM | Permalink

        FWIW, I would normally try to respect someone’s pseudonymity on my blog. However, I guess this is Climateball(TM) and so there aren’t any rules and the only losing move is to not play.

        Stevef,
        I don’t know what Willis did or why Nick Stokes criticised it. That may or may not be relevant for this discussion. However, the thing that you seem to be ignoring is energy conservation, which adds an extra constraint. So, yes, dT is used to determine dF, but so is dN. If climate models conserve energy (as they will, to within the accuracy of the method) then the following quantity

        dN + alpha dT

        is independent of dT and depends only on the change in external forcing dF. As Ed Hawkins, Piers Forster and others are pointing out on Ed Hawkins’ post, by combining dT and dN in this way you can determine dF in a manner that does not make dF depend on dT.

        • Posted Feb 9, 2015 at 4:05 PM | Permalink

          Denote X = dN + alpha dT. The analysis requires X to be independent of dT. If it isn’t, then the regression coefficients are biased and inconsistent. So we are asked to assume that it is, namely that dX/d(dT) = alpha = 0. Or, if we treat alpha as a function of dT, then we need dT * d(alpha)/d(dT) + alpha = 0.
          Either expression requires alpha to be independent of dT, so one of the main empirical “results” is an assumption required for the empirical method to work. This is methodologically invalid.

          There is a prima facie case that X is likely dependent on dT. It is not sufficient for M&F simply to assert that it ain’t. The empirical issue could be settled using a Hausman endogeneity test, though it would require collecting additional data to serve as valid instruments for dF.
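
For readers unfamiliar with the test Ross mentions, here is a minimal sketch of the Hausman idea on simulated data. Nothing here uses M&F’s data, and the instrument z is purely hypothetical, standing in for the additional data that would have to be collected; the point is only the mechanics: compare OLS with an instrumental-variables estimate, and treat a large discrepancy as evidence of endogeneity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated setting: x is endogenous (it shares the disturbance u with y);
# z is an assumed-valid instrument: correlated with x, uncorrelated with u.
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + 0.5 * rng.normal(size=n)
y = 2.0 * x + u                       # true slope is 2

def slope_and_var(w, y):
    """OLS slope of y on w (with intercept) and its estimated sampling variance."""
    W = np.column_stack([np.ones(n), w])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ beta
    cov = (resid @ resid / (n - 2)) * np.linalg.inv(W.T @ W)
    return beta[1], cov[1, 1]

b_ols, v_ols = slope_and_var(x, y)    # biased upward here, since cov(x, u) > 0

# Two-stage least squares: project x on the instrument, regress y on the projection
Z = np.column_stack([np.ones(n), z])
xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_iv, v_iv = slope_and_var(xhat, y)   # second-stage variance used as a rough proxy

# Hausman-type statistic: ~ chi-squared(1) under exogeneity; large values reject
H = (b_iv - b_ols) ** 2 / (v_iv - v_ols)
print(f"b_OLS = {b_ols:.2f} (biased), b_IV = {b_iv:.2f}, Hausman stat = {H:.1f}")
```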

        • Posted Feb 10, 2015 at 4:26 AM | Permalink

          Ross,
          Denote X = dN + alpha dT. The analysis requires X to be independent of dT.
          Yes, because of energy conservation. If you apply a change of forcing, dF, to a climate model with a climate sensitivity of alpha, then if the temperature response is dT, the TOA imbalance has to satisfy (because of energy conservation),

          dN = dF – alpha dT,

          where dF, above, is your X. Therefore, if the above is true,

          dF = dN + alpha dT,

          and since dF does not, by definition, depend on dT, neither does dN + alpha dT.

        • Posted Feb 10, 2015 at 1:21 PM | Permalink

          I suspect that it would be difficult to find a physics textbook that shows that conservation of energy implies anything at all about dN and dT, since these are fitted trends in statistical constructions based on ad hoc averages rather than basic physical variables. alpha isn’t a basic physical variable either.

          However, if what you say is true, then a Hausman test should rule out endogeneity bias.

        • scf
          Posted Feb 10, 2015 at 9:25 PM | Permalink

          It is absurd to assert “dN + alpha dT” is independent of dT.
          The presence of dT in the former expression means otherwise, unless you also assert alpha is 0.
          If you do not assert alpha is 0, then it is clear that the former expression changes as dT changes, which by definition of the word “independent” means the two quantities are not independent. The equation means they are not independent, by definition.

          It is fascinating to see what people are willing to argue.

      • Don Monfort
        Posted Feb 9, 2015 at 1:03 PM | Permalink

        Steve, what Willis post was that?

        • stevefitzpatrick
          Posted Feb 9, 2015 at 4:41 PM | Permalink

          Don,
          A post where Willis showed that the model temperature histories were nothing more than the lagged forcing histories from Forster et al (2013). (Model climate sensitivities calculated directly from model results, was the title, I think.) Like M&F, his analysis was circular, because the forcing came from the temperature, though he probably did not appreciate where the forcing came from.

        • Don Monfort
          Posted Feb 9, 2015 at 5:10 PM | Permalink

          I see what you mean, SteveF. It didn’t take nicky long to jump on Willis for circularity:

          Mechanical Models

        • Not Sure
          Posted Feb 9, 2015 at 7:03 PM | Permalink

          “In fact, the close association with the “canonical equation” is not surprising. F et al say:

          ‘The FT06 method makes use of a global linearized energy budget approach where the top of atmosphere (TOA) change in energy imbalance (N) is split between a climate forcing component (F) and a component associated with climate feedbacks that is proportional to globally averaged surface temperature change (ΔT), such that:
          N = F – α ΔT (1)
          where α is the climate feedback parameter in units of W m-2 K-1 and is the reciprocal of the climate sensitivity parameter.’

          IOW, they have used that equation to derive the adjusted forcings. It’s not surprising that if you use the thus calculated AFs to back derive the temperatures, you’ll get a good correspondence.”

          The author of that? Nick Stokes. The irony, it is so rich!

        • Steve McIntyre
          Posted Feb 9, 2015 at 8:35 PM | Permalink

          In the present context, Willis Eschenbach’s post http://wattsupwiththat.com/2013/12/01/mechanical-models/ is well worth re-reading. I haven’t parsed it, but at a quick read, Willis seems to have tried to go down the same road as Marotzke.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 9:10 PM | Permalink

          Steve McIntyre,

          Nick was perfectly justified in pointing out the circularity of Willis’ calculations. I do wish he would bring his considerable analytical talents to bear on the M&F paper, since, while it is more complicated than what Willis did, it suffers from exactly the same circular reasoning… and having been published in ‘Nature’, can lead to a lot more confusion and incorrect understanding about climate models than what Willis posted on WUWT. What is good for the Willis ought to be good for the Jochem, you know. I figured Nick would find such an obvious and glaring error offensive to his scientific sensibilities, but so far he has been pretty quiet. Count me surprised.

        • Don Monfort
          Posted Feb 9, 2015 at 9:38 PM | Permalink

          Nicky’s comment on M&F was quoted by Not Sure, above. Oh wait, it’s his comment on Willis’s similar circularity. Of course, nicky could deny it applies to M&F. Will nicky talk? Or will he take the fifth?

        • Posted Feb 9, 2015 at 9:48 PM | Permalink

          SteveF,
          “while it is more complicated than what Willis did, it suffers from exactly the same circular reasoning”
          It is more complicated, and I haven’t had a lot of time for it lately. And if Nic Lewis and Piers Forster are in disagreement, it needs thinking about. It’s not exactly the same.

          Willis’s was simple. He just said – look, all the models are doing is taking in ΔF and producing a linear ΔT. Silly models. But, as I well know, the models aren’t working on ΔF as input. Forster explicitly back-computes it by pretty much that same formula.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 10:40 PM | Permalink

          OK Nick, maybe not exactly the same reasoning, but IMO terribly close to the same reasoning. Forster et al does indeed ‘back-calculate’ the individual model forcings from the change in temperature, so it is difficult to see how those calculated forcings could ever be independent of temperature, as some are now insisting. I mean, you can just do the algebra and see that delta-T ends up on both sides, and as Nic Lewis points out, the alpha*delta-T term dominates the TOA imbalance term.

          I do hope you find some time to think about it.

        • Don Monfort
          Posted Feb 9, 2015 at 10:46 PM | Permalink

          Thank you, Dr. Stokes. We get it.

        • Don Monfort
          Posted Feb 9, 2015 at 11:21 PM | Permalink

          It only took Dr. Stokes 3 hours after the first comment on the post to make a monkey out of Willis. He has to think about M&F, which is pretty much the same thing.

          Here is another interesting Dr. Stokes comment on that post. He agrees with Nic, who had also spotted Willis’s error:

          “Nick Stokes
          December 2, 2013 at 7:24 am

          Joe Born says: December 2, 2013 at 6:19 am
          “Whether the forcings values you use are the models’ actual stimuli or represent the forcings they respectively infer from the stimuli they do use, I find it telling that, after all their machinations, their results differ from respective simple one-pole linear models by much less than they differ from each other.”

          I think you’ve missed the point of my earlier comment, and of Nic Lewis. Forster et al took the temperature outputs of the models and calculated adjusted forcings ΔF (they call it F) using the formula
          N = ΔF – α ΔT (1)
          Here N, the TOA imbalance, has to be small by cons eng, and some models constrain it to be zero.

          This post substitutes those ΔF into a regression and finds that, presto
          ΔF – λ ΔT=0.
          But of course, they have to. It has nothing to do with what the models actually do. It’s just repeating the arithmetic of Forster et al by which ΔF was derived.”

        • Hoi Polloi
          Posted Feb 9, 2015 at 11:44 PM | Permalink

          Dr. Stokes is not programmed to criticise alarmists.

      • Kenneth Fritsch
        Posted Feb 9, 2015 at 4:11 PM | Permalink

        SteveF, you lumped me, I think, in with a couple of defenders of the paper being critiqued here, and like lumping the individual climate models, I do not think that is a good idea. (I don’t do emoticons).

        I do hope the circular reasoning criticism does not take away from some other aspects of this paper that I think bear further discussion. Putting all the model/model runs in one population to do regression appears to be an artificial construct to me, and one that would not necessarily stand up statistically if the authors had looked at individual model outputs. The kinds of noise and noise levels are different for the individual models. Further, conflating stochastic noise with differences in deterministic output of the individual models does not seem correct in the eyes of this layperson. If the authors are using overlapping trends there must be some autocorrelation issues that need to be addressed in the analysis.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 5:01 PM | Permalink

          Kenneth,
          No, I referenced the real first name of ATTP/Anders/Ken Rice…. nothing I wrote was directed toward you. Now if you would put together a guest post or two on your work, that could change… 😉

    • Posted Feb 9, 2015 at 3:22 PM | Permalink

      To ATTP:

      There is more than one point of confusion, at least on my part. You originally stated:

      However, dF is an external forcing and so does not depend on dT by definition.

      In my longer reply, I was thinking of the fact that the IPCC refers to “cloud radiative forcing” effects (which precede any feedback). Perhaps this effect (which ultimately influences alpha estimates) is officially subsumed solely under the feedback mechanisms inherent in warming-induced changes in the clouds themselves. So if clouds are exclusively and always “feedback” I stand corrected in my chosen example. Otherwise, “clouds” are on both sides of the equation dF = alpha dT + dN.

      Although my initial comment about the definition of “depends” was somewhat contingent upon cloud forcing/feedback assumptions I also was uncertain whether you were talking about the “dependent” and “independent” variables in an equation. Typically, one thing has to be the dependent variable you are testing for. You originally stated:

      Since dF does not depend on dT, the quantity alpha dT + dN also does not depend on dT. Any change in dT is compensated for by a corresponding change in dN (i.e., if the surface temperature goes up without a change in dF, then dN goes down, and vice versa).

      Yet if dF “does not depend” on dT, why is dT in the equation in the first place?

      Even if you accept observed measurements for dT and dN, you still have to use a model generated climate sensitivity (alpha) to produce a result for dF — since neither is directly observed. Thus, plugging values into the discussed equation requires a bit of bootstrapping because there are multiple uncertainties hidden in the equation’s symbols. The underlying assumptions seem to generate most of the confusion. In other words, it all depends…

      • stevefitzpatrick
        Posted Feb 9, 2015 at 5:43 PM | Permalink

        I get the feeling I am trying to converse in Portuguese with someone who knows no Portuguese.

        ATTP keeps saying that the forcing is independent of the model temperature. For the true forcing, that is correct. But the forcing from Forster et al is NOT the real forcing, it is a value for forcing calculated from the temperature rise in the model, after taking into account the model diagnosed TOA imbalance. There IS NO explicit data for forcing… it is 100% inferred from the temperature and TOA imbalance. It is not possible to remove the circularity with arguments about energy conservation; it is implicit in the Forster et al calculation. This is a very strange thread.

        • davideisenstadt
          Posted Feb 9, 2015 at 6:00 PM | Permalink

          SteveF
          Thanks for articulating that which I was unable to do.
          ATTP refers to the actual forcing function, saying that it is independent of T, but that’s not what M&F used… they calculated that function using the very variable they then used as a dependent variable.

          It’s like walking into the argument clinic.

        • Don Monfort
          Posted Feb 9, 2015 at 6:11 PM | Permalink

          Can’t they just pretend that it’s the real forcing? M&F must be saved, somehow.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 8:15 PM | Permalink

          Don,

          They already are pretending it is the real forcing…. it’s not, it’s calculated from the temperature change.

        • davideisenstadt
          Posted Feb 9, 2015 at 8:21 PM | Permalink

          stevef:
          Thanks so much… I felt like I was getting gaslit, so to speak.

        • Posted Feb 10, 2015 at 4:29 AM | Permalink

          Stevef,
          But the forcing from Forster et al is NOT the real forcing, it is a value for forcing calculated from the temperature rise in the model, after taking into account the model diagnosed TOA imbalance. There IS NO explicit data for forcing… it is 100% inferred from the temperature and TOA imbalance.
          Well, yes, but if the models conserve energy, then

          dF = dN + alpha dT,

          and, because dF is independent of dT, so is dN + alpha dT.

          Hence, the point I made above, that the argument being made here is essentially that climate models do not conserve energy. Of course, they don’t conserve energy exactly, but they do to within the accuracy of the method.

        • stevefitzpatrick
          Posted Feb 10, 2015 at 7:54 PM | Permalink

          ATTP,
          If you are calculating dF from the equation:
          dF = dN + alpha dT
          using dN, dT, and alpha from the models, then
          how on Earth is the calculated value of dF independent of dT? Forster et al use that equation to calculate dF; there is no possibility the value of dF calculated from dT is independent of dT. Like I said, I feel like we are speaking different languages.

    • DocMartyn
      Posted Feb 9, 2015 at 6:16 PM | Permalink

      Just so I know my apples and oranges;

      dN = dF – alpha dT.

      Now dT is in K, dF is in watts.

      What are the units of alpha and N?

      • Posted Feb 9, 2015 at 7:26 PM | Permalink

        Doc,
        F is expressed in units of Wm-2, as is N. α has units of Wm-2/K, and varies (model to model) over a range of 0.64 to 1.79 Wm-2/K per Table 1 of reference (v) of the original post.

    • Spence_UK
      Posted Feb 9, 2015 at 6:22 PM | Permalink

      ATTP is not only wrong on the statistics, he’s wrong on the physics as well.

      The idea that conservation of energy in a GCM is narrowly closed on GMST and TOA imbalance is quite wrong. There are plenty of energy transfers in GCMs, the obvious ones being the rest of the atmosphere and the ocean, but many more subtle ones as well, and the conservation of energy closes around all of them, not narrowly GMST and TOA radiation. Unfortunately when ATTP is out of his depth, you can pretty much guarantee an appeal to energy conservation will be the argument of last resort. One day, ATTP will perhaps realise that while energy conservation is a key constraint to close the system, it is a weak constraint in terms of defining the model dynamics. Until then, Zzzz.

      • Posted Feb 10, 2015 at 4:27 AM | Permalink

        Spence,
        The amount of energy in a box depends only on the fluxes through the surface, not on the movement of energy inside the box.

        • Spence_UK
          Posted Feb 10, 2015 at 4:39 AM | Permalink

          ATTP, the box does not just consist of GMST in a GCM. This much should be obvious. The deltaT you refer to is not “the whole box”. In fact it isn’t even the largest part of the box. It’s a tiny bit of the corner of the box. And we’re not directly measuring the fluxes in and out of that corner of the box.

          Remember, the values populating deltaT and deltaN here are not from simple one-box or two-box models, they are GCM outputs. Those values are then used to feed a simplistic model, but the input values are not constrained in the way you think they are.

          That is empirically obvious from what Nic Lewis has already pointed out – that variations in T are larger than variations in N. As a result, the difference between the two must be dominated by variations in T, which means in turn the regression is necessarily broken.

          Your physics and your statistics are both wrong here. I note at your blog Pekka is politely steering you away from this red herring conservation of energy argument. You would do well to heed his advice.

    • fizzymagic
      Posted Feb 9, 2015 at 8:56 PM | Permalink

      There obviously seems to be a difference between how a physicist and a statistician approach statistical analysis.

      There shouldn’t be. If ATTP is actually a physicist, I am embarrassed for physicists as a whole, because his argument is completely wrong. Or, as Pauli would have said, it’s not even wrong.

      What ATTP does not seem to understand, and what is absolutely critical to any statistical analysis, is that for statistical analysis the important thing is not whether the true physical values are independent or not, but rather how the estimates for those true physical values are obtained.

      In this case, the estimate of the physical value for dF was obtained by using dT. Doesn’t matter if the true values of dT and dF are independent or not. The very fact that dT was used to estimate dF means that, for statistical analyses, the two are NOT independent.

      ATTP’s argument shows a shocking lack of statistical understanding. As I said before, I am embarrassed for physicists everywhere. I assure you that most competent experimental physicists would not make such a basic mistake.

      • fizzymagic
        Posted Feb 9, 2015 at 9:29 PM | Permalink

        By the way, as a side note, multiple regression and PCA are really quite elementary statistical methods. They were invented when computational power was very limited and they can be quite useful as long as one understands them well.

        A valid multiple regression has three main requirements: first, that relationships between variables are linear, second, that the errors on the measurements are Gaussian, and, finally, that the values of different measurements are statistically independent.

        As Nic describes the paper, it is a trifecta of bad: relationships have no reason to be linear, the errors are nowhere near Gaussian, and multiple runs from the same models were treated as independent.

        It’s going to be a black mark on the resumes of everyone involved.

        • Posted Feb 11, 2015 at 3:30 PM | Permalink

          You don’t need Gaussian errors for ordinary least squares to be the best linear unbiased estimator of the coefficients. And there exists a host of methods for correctly performing non-linear regressions to deal with limited or censored or discrete dependent variables. For some reason, the very extensive development of regression theory and practice by econometricians is not widely appreciated in other disciplines.

        • fizzymagic
          Posted Feb 11, 2015 at 6:56 PM | Permalink

          You don’t need Gaussian errors for ordinary least squares to be the best linear unbiased estimator of the coefficients.

          There are two problems here: first, justifying the use of least squares for non-normal errors requires detailed knowledge of the actual error distribution, and second, any uncertainty estimates on the parameters will be useless.

          For many (I would say most) error distributions, least squares gives a biased estimate. The estimate is unbiased if and only if the distribution is symmetric about the mean. Proof is left as a (trivial) exercise for the reader.

          Parameter uncertainties from least-squares regressions arise as a result of the application of the Maximum Likelihood Ratio theorem to the Gaussian distribution. If the underlying distribution is not Gaussian, then least squares cannot be used to estimate parameter uncertainties.

          If you’re going to use multiple regression on non-Gaussian errors, then why not do Markov chain Monte Carlo and get the right answer directly?
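
Since MCMC is invoked here, a minimal sketch of what that might look like for a straight-line fit with non-Gaussian (Laplace) errors, using a random-walk Metropolis sampler on synthetic data; this is only an illustration of the suggestion, not a claim about what M&F should have done.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic straight-line data with Laplace (double-exponential) errors
x = np.linspace(0, 1, 100)
y = 1.0 + 2.0 * x + rng.laplace(scale=0.3, size=x.size)

def log_post(theta):
    """Laplace log-likelihood with flat priors on (a, b, log s)."""
    a, b, log_s = theta
    s = np.exp(log_s)
    resid = y - a - b * x
    return -y.size * np.log(2 * s) - np.abs(resid).sum() / s

# Random-walk Metropolis sampler
theta, lp = np.zeros(3), log_post(np.zeros(3))
samples = []
for i in range(20000):
    prop = theta + rng.normal(scale=0.05, size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if i >= 5000:                      # discard burn-in
        samples.append(theta.copy())

slopes = np.array(samples)[:, 1]
print(f"posterior mean slope = {slopes.mean():.2f}, "
      f"95% interval = {np.percentile(slopes, [2.5, 97.5]).round(2)}")
```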

        • Posted Feb 11, 2015 at 10:24 PM | Permalink

          I’m just stating the Gauss-Markov theorem, Day 1 in econometrics. Yes, you need the errors to be uncorrelated and homoskedastic with mean zero, but not necessarily Gaussian. OLS would still be BLUE. (And you need the independent variables to be measured correctly and uncorrelated with the error term. And of course the model can’t be functionally misspecified or omit variables.)

          Gaussian errors in addition also make OLS maximum likelihood, which is nice, but not necessary to be BLUE. For non-Gaussian errors you have to break it down into the case where you know the error distribution versus where you don’t. For the former, you would use the correct covariance matrix; for the latter, something like bootstrap estimators can be tried. Your general statement is too strong.
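
A quick Monte Carlo illustration of the Gauss-Markov point (synthetic data only): the OLS slope remains unbiased when the iid, zero-mean errors are uniform or Laplace rather than Gaussian; what non-Gaussianity changes is efficiency and the validity of Gaussian-based intervals, not the unbiasedness.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, true_b = 50, 5000, 2.0
x = rng.uniform(0, 1, n)              # fixed design across replications
X = np.column_stack([np.ones(n), x])

# Three iid zero-mean error distributions, all scaled to unit variance
errors = {
    "gaussian": lambda: rng.normal(0, 1, n),
    "uniform":  lambda: rng.uniform(-np.sqrt(3), np.sqrt(3), n),
    "laplace":  lambda: rng.laplace(0, 1 / np.sqrt(2), n),
}

for name, draw in errors.items():
    slopes = [np.linalg.lstsq(X, true_b * x + draw(), rcond=None)[0][1]
              for _ in range(reps)]
    print(f"{name:8s} mean OLS slope = {np.mean(slopes):.3f}")  # all close to 2.0
```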

        • James McCown
          Posted Feb 16, 2015 at 6:19 AM | Permalink

          Steve Postrel is correct. Assuming that the error terms are Gaussian is one way to conduct OLS regressions, but not necessary. Herman Bierens and other econometricians developed the asymptotic theory that requires only that the error terms be independent and identically distributed with a finite variance as the sample size goes to infinity.

        • R Graf
          Posted Feb 16, 2015 at 6:28 AM | Permalink

          I’m not sure what you just said but I hope you just proved with statistical certainty that evaluating a circular equation will produce garbage if anything at all.

      • davideisenstadt
        Posted Feb 16, 2015 at 6:25 AM | Permalink

        James :
        Do you think that those basic assumptions required by Bierens are met? Are the error terms of repeated runs of the same model really independent? Are the error terms of different models’ runs identically distributed?
        Do you think the variance is finite as the number of runs goes to infinity?

        • davideisenstadt
          Posted Feb 16, 2015 at 6:57 AM | Permalink

          and the length of those runs goes to infinity?

        • michael hart
          Posted Feb 16, 2015 at 9:57 AM | Permalink

          It just seems like they will…

        • James McCown
          Posted Feb 16, 2015 at 12:59 PM | Permalink

          David Eisenstadt:

          Without having worked with this data, I have no idea about the answers to your questions.

          I am simply pointing out that Steve is correct. Gaussian errors are not necessary in order to run an OLS regression.

    • HAS
      Posted Feb 10, 2015 at 12:29 AM | Permalink

      I’ve idled part of my recent life away trying to understand why the simple point that Ross McK makes about the amount of information available isn’t somehow instinctive to many in the community. So help me ATTP (if you are still monitoring this thread).

      It seems to me that you make a number of empirical testable assertions in your initial comment. For example you assert climate models conserve energy and do this in a particular way, you assert “dF, .. is independent of dT as long as the model conserves energy” while conceding “climate models are not perfect and don’t conserve energy exactly but that doesn’t really change that dF is not explicitly dependent on dT”.

      At that point you then assert “this analysis is not circular. Just because the temperatures are used to determine the external forcings does not mean that the external forcings depend on temperature.”

      Now as I said all those assertions are empirical, and as a good empiricist you’ll be keen to test them.

      Now here’s the thing. We aren’t dealing with abstract theoretical concepts, we are dealing with specific climate models warts and all. We have some rumpty incomplete data with which to do this. We can take the cheat’s way and just add into the paper that we assume all the above, but if we did that we wouldn’t have much of a conclusion.

      And in fact M&F try and take the high road. They attempt to estimate the various relationships from the data to hand. They admit that their information is incomplete. But they don’t test their assumptions including those required by the tools they use with the data along the way. And by that I mean the real data they have from the models under study.

      Help me understand why it is sufficient to simply assert these things as givens, as you have done, when it is obvious they are empirical?

      • Don Monfort
        Posted Feb 10, 2015 at 3:04 AM | Permalink

        “We aren’t dealing with abstract theoretical concepts…”

        It looks like ATTP is looking for help from the realm of abstract theological concepts. He is hoping and praying that M&F can be saved by some miracle.

  44. Kenneth Fritsch
    Posted Feb 9, 2015 at 10:16 AM | Permalink

    This post is very timely with regards to my learning and study of the CMIP5 models. What I see most notably on reading Nic Lewis’s criticism of the Marotzke and Forster paper and the paper itself (ignoring the more fundamental errors pointed to by Nic) is that the authors’ motivation is based on an assumption that the model outputs can be lumped into a single database for statistical analysis. In my studies I have attempted to find tools to look at differences in model outputs that might well question the validity of this lumping effort or at least warn against the interpretation of the analysis results.

    My efforts have been based on first attempting to decompose the temperature series of the CMIP5 models and observed data sets into deterministic, or at least secular, trends, cyclical components and red and white noise using Singular Spectrum Analysis (SSA). While my analysis to this point using SSA has not been rigorous in determining significant differences, these decompositions and subsequent reconstructions visually reveal some very different patterns and residual white and red noise, and differences between models and observed temperature series.

    For the time being I left the noise study and went on to the study of the individual CMIP model equilibrium climate sensitivity (ECS) and transient climate response (TCR) emergent parameters. What I find interesting is that the attempts to classify the warming pause of the past 15 or so years in terms of model and observed differences have tended to obscure looking further back in time, like 40 years, where the white and red noise have a lesser effect in finding statistically significant differences. While one can find significant observed-to-model trend differences over that time period after accounting for the autocorrelations, there remains the potential of the difficult-to-measure low frequency cyclical (60 to 70 year) component that could affect the analysis result if not properly accounted for. I was motivated by these difficulties to look at the more deterministic part of the model outputs like ECS. I was further motivated by Nic Lewis posting at these blogs about the estimation of ECS and TCR from observable data and the comparisons with the climate models.

    My first surprise from looking at the individual CMIP5 models’ ECS estimation from the abrupt 4XCO2 experiment was the need for correcting the net TOA radiation and surface temperature with the pre-industrial control runs. The control runs, in general, do not appear to be going to an equilibrium even after a 200 year run up. Forster, who Nic has mentioned here, was a coauthor of the paper that used ordinary least squares regression on the net TOA radiation and surface temperature to estimate ECS. Based on a later paper coauthored by Andrews using the same regression method, those estimated values appear in the AR5 chapter 9. On the suggestion of Carrick, I did both ordinary and total least squares regressions and found that the estimated ECS values were, in general, larger by 10% or more using total least squares regression. I am currently finishing downloading the CMIP5 model and model run radiation values in order to compare the net TOA radiation to the potential global sea water temperature for individual CMIP models and runs. Ultimately I want to determine whether the TOA is truly made to balance by tuning, as noted in publications, or if, for at least some models, there remains a residual TOA not accounted for by changes in the ocean heat content as realized in the global sea water temperature change.
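
For readers who want to see the OLS versus total least squares difference Kenneth describes, here is a hedged sketch on synthetic Gregory-style data (illustrative numbers only, not CMIP5 output). With noise in both T and N, OLS attenuates the slope toward zero, while TLS, computed here from the SVD of the centred data, does not; which way the implied ECS moves depends on how the noise is apportioned between the two variables.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 150

# Synthetic abrupt-4xCO2 regression of TOA imbalance N on temperature T,
# with observational-style noise in BOTH variables (illustrative values).
F4x, alpha_true = 7.4, 1.2                       # W m-2 and W m-2 K-1
T_true = np.linspace(0.5, 6.0, m)
T = T_true + rng.normal(0, 0.6, m)               # noisy temperature
N = F4x - alpha_true * T_true + rng.normal(0, 0.4, m)

# OLS slope of N on T: attenuated toward zero by the noise in T
b_ols = np.polyfit(T, N, 1)[0]

# Total least squares: the smallest right singular vector of the centred
# data matrix is the direction orthogonal to the fitted line.
A = np.column_stack([T - T.mean(), N - N.mean()])
v = np.linalg.svd(A)[2][-1]
b_tls = -v[0] / v[1]

print(f"alpha (OLS) = {-b_ols:.2f}, alpha (TLS) = {-b_tls:.2f}, "
      f"true = {alpha_true}")
```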

  45. Don Monfort
    Posted Feb 9, 2015 at 11:17 AM | Permalink

    Like some others commenting here, I don’t fully grasp the stats that are at the center of the argument. In situations like this I look for what Pekka has to say. He will defend the consensus side when possible, but he is honest and he knows his doo-doo:

    Pekka on Lab Book:

    “Basically we have first

    F = N + α T

    Then we do regression

    T = a + b F + c α + d κ + e

    Using the first in the second and moving one term to the left hand side

    (1 – α b) T = a + b N + c α + d κ + e

    That seems to lead to problems, if the coefficient of T may be close to zero. Thus we should perhaps not trust the results, if the regression tells that b is close to 1/α even in part of the situations.”

    Also, the absence of a racehorse defense from nicky is telling.

    • AndyL
      Posted Feb 9, 2015 at 11:26 AM | Permalink

      Pekka’s latest comment on aTTP seems interesting:

      We can see that all regression parameters multiply variables that have significant variability. Therefore the regression is not hampered by the circularity. The situation is not nearly as bad as Nic claims.

      One problem remains. The coefficient of temperature may be very small in some cases. Therefore there may be situations, where the results of the regression lead to large uncertainties in the calculation of the temperatures from the results of the regression. It’s possible that this effect affects the spread of predictions from regression seen in the Figure 2b over years 1950-70 (the Figure 2 is shown in aTTP’s post). That’s at least a possible consequence of this issue. (M&F propose other possible reasons for the effect, but only propose).

      If my above proposal is correct, it might contribute also to the somewhat less increased variability of the latest predictions and to the variability over most of the full period in the case of the 62 year trends.

      Thus I do not think that the whole analysis would be affected strongly as Nic claims, but the circularity might have influence. Certainly it would be nice to know, whether the coefficient of T is small at all, and if it is, how much influence that would have on the results. Checking that would be possible either from full information from the original calculation or from a repetition of that calculation recording the relevant coefficients during the calculation.
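
Pekka’s worry can be made concrete. Substituting F = N + αT into the regression T = a + bF + cα + dκ + e gives (1 − αb)T = a + bN + cα + dκ + e, so if the fitted b approaches 1/α the coefficient multiplying T nearly vanishes, and recovering T from the right-hand side amplifies any error enormously. A tiny illustration with hypothetical numbers:

```python
# Illustration of the (1 - alpha*b) instability described above (numbers hypothetical)
alpha = 1.1                      # illustrative feedback parameter, W m-2 K-1
for b in (0.5, 0.8, 0.89):       # hypothetical regression coefficients on F
    factor = 1.0 - alpha * b
    print(f"b = {b:.2f}:  1 - alpha*b = {factor:+.3f},  "
          f"error amplification ~ {1 / abs(factor):.0f}x")
```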

    • Posted Feb 9, 2015 at 3:03 PM | Permalink

      > the absence of a racehorse defense from nicky is telling.

      An alternative to this innuendo is that Nick may still have problems commenting:

      http://moyhu.blogspot.com/2015/01/echo-chamber-at-climate-audit.html

      Steve: I have a longstanding record of allowing critics to comment. It is ludicrous to think that I would depart from this longstanding policy in Nick’s case. Nick has posted hundreds of comments here, including a comment on this thread https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-750695. The commenter’s innuendo is entirely justified and your proffered rebuttal isn’t.

      In addition, I was unaware of Stokes’ whinge as he had never contacted me about an issue. It’s ludicrous for Stokes to complain that he’s being censored. Nick has had some problems with wordpress from time to time. I haven’t tracked his recent complaints and am too busy right now to do so. But to speculate that I’ve suddenly changed a longstanding policy of allowing adverse comments is absurd. If he has posting problems from the filters or otherwise, it’s easy enough to email me and he should have done so before whining. You Climateballers are pieces of work.

      • Posted Feb 9, 2015 at 8:11 PM | Permalink

        I wrote a comment at Nick Stokes’ blog explaining how comments get chewed up at blogs – the website just disappeared it. I couldn’t be bothered to re-write it so I let it pass.

        At a later date, I wrote another comment at Nick’s blog and his blog swallowed it again, repeatedly.

        I even made a movie of it: https://twitter.com/shubclimate/status/563820850441252865

        • Posted Feb 9, 2015 at 8:35 PM | Permalink

          Shub,
          That is an editing issue with the Google Blogger software, which is outside my control. It seems that if you choose an ID after writing your comment, it clears the comment space. Nothing comes through to me. I do not moderate comments there.

          At CA, on the other hand, my comments were making it into the moderation queue, where they stayed (visible to me) for a very long time. Below is one that stayed for a week, and never made it. It is just a simple protest at my comment being said to be “making stuff up” when I had based it entirely on the paper which I quoted.

          I haven’t had problems at any other WordPress site for months. In fact, I was in the clear at CA until some time during the Sheep Mountain thread. And my one comment on this thread passed without moderation.

        • stevefitzpatrick
          Posted Feb 9, 2015 at 9:22 PM | Permalink

          After losing many comments at Nick’s blog, I learned to copy my comment to the clipboard (or into a word processing program) before trying to post it. When it gets eaten, I just paste it into the comment field a second time and it usually works. There are also issues with trying to type from an iPad or iPhone, where the cursor gets locked up, but these are minor compared to losing whole comments.

        • Posted Feb 9, 2015 at 9:51 PM | Permalink

          SteveF,
          “I just paste in into the comment field a second time and it usually works.”
          Yes. I think it works because by then you have supplied your ID.

        • Posted Feb 10, 2015 at 4:24 AM | Permalink

          Nick, the comment I wrote on your website said exactly this: my comment would remain visible to me with the ‘Your comment is awaiting moderation’ notice on top but would never clear moderation. Sometimes I would get a grey blank screen after posting a comment saying ‘oops, looks like you’ve already said that’ but the original comment would not appear.

        • Sven
          Posted Feb 10, 2015 at 4:30 AM | Permalink

          So, when it happens on your blog, it’s “an editing issue with the Google Blogger software”, but when it happens to you, Nick, at CA, it’s the evil CA and Steve Mc. I think an apology from you, Nick, to Steve is a right way forward.

        • Nιck Stοkes
          Posted Feb 10, 2015 at 4:41 PM | Permalink

          Shub,
          The animation you showed does not show a comment going into moderation. It shows the edit screen clear as soon as you press the ID select button. And the remedy for that is to adopt an ID first. I (and the system) can only deal with comments that are actually submitted.

          I have the blog set to no moderation, and there is no moderation queue. Comments go either straight to screen or to spam. Very few now go to spam, and I think none of yours. Some time ago I did set it to moderate comments on posts more than four weeks old (which were mostly spam). But not recently.

        • Posted Feb 12, 2015 at 4:38 AM | Permalink

          Nick, you say “The animation you showed does not show a comment going into moderation”

          I said I made a movie to show how ‘…comments get chewed up at blogs’.

          You misread.

        • MikeN
          Posted Feb 14, 2015 at 11:20 AM | Permalink

          Not at all. Have you read the Sicre paper?

          Testing.

      • Posted Feb 11, 2015 at 7:20 AM | Permalink

        Dear Auditor,

        You assert that “The commenter’s innuendo is entirely justified and your proffered rebuttal isn’t.” I disagree. Don Don’s innuendo mismanages the commitments in play:

        My sense was that my audience at Climate Audit had placed me on “one side” of what they saw as a “two sided debate,” and held me responsible for everything “my side” had ever said. That kind of refusal to allow a conversation partner to define the responsibilities she is willing to undertake is unlikely to lead to a productive discussion. In this particular case, I think the demands to defend things we hadn’t said occluded possible areas of agreement about what we did say.[0]

        Nick owes no constant room service on CA, more so considering the reception he constantly gets. This leads to a Procrustean bed: slimed if Nick comments, slimed if he doesn’t. Since you allow yourself not to say what you think from time to time, the more Omertà “trick” on CA may very well be suboptimal.

        ***

        Here are some instances with commitments you have yet to fulfill. You still have not declared having read Sicre at the time of writing. No response to Fabio Gennaretti has been forthcoming [1]. For more than a year now, you failed to respond to Robert Way [2] regarding your use of his private correspondence. In this very thread, you have yet to opine on Nic’s exhortations:

        The paper is methodologically unsound and provides spurious results. No useful, valid inferences can be drawn from it. I believe that the authors should withdraw the paper.

        Neither have you endorsed Gordon Hughes’ comment:

        All this paper demonstrates is that climate scientists should take some basic courses in statistics and Nature should get some competent referees.

        May I suggest that it is now time for you to lead by example and, at the very least, commit to Nic’s and Gordon’s claims? Show us why you are the fiercest player in the history of ClimateBall ™, after all. Unless you prefer more Omertà.

        If you would be so kind as to resurrect the MMH10 thread [3], that would be nice.

        Thank you for the kind words,

        W

        PS: Some, but not me, might wonder why Nic assumes α is constant. Any auditor to wonder why too?

        [0]: scientistscitizens.wordpress.com/2011/07/26/debate-in-the-blogosphere-a-small-case-study/

        [1]: climateaudit.org/2014/10/13/millennial-quebec-tree-rings/#comment-740376

        [2]: neverendingaudit.tumblr.com/post/110629968739

        [3]: moyhu.blogspot.com/2011/06/effect-of-selection-in-wegman-report.html?showComment=1308008366856#c7386160124216218370

        • Sven
          Posted Feb 11, 2015 at 7:40 AM | Permalink

          So, first there was an insinuation by Nick and thyself that Steve is censoring Nick. When this did not turn out to be true, then, instead of an apology, we hear that it doesn’t matter, Nick is not nicely treated… This is just stupid, snip

          Steve: was in moderation for some word. No need to engage in Willard’s foodfight.

        • Sven
          Posted Feb 11, 2015 at 7:44 AM | Permalink

          No, it still went into moderation, has to be some other word… Agh, doesn’t matter 🙂

        • Hoi Polloi
          Posted Feb 11, 2015 at 8:01 AM | Permalink

          “PS: Some, but not me” blah

          Ach, we have another rabett in our midst…

      • Greg Goodman
        Posted Feb 14, 2015 at 5:03 AM | Permalink

        Steve, one thing I worked out during comments to my recent post at Judith’s, and which probably explains many blogs seeing these “unexplainable” false positives, is that the list of words that moderators provide via the WP admin page is NOT a list of words but of strings.

        WP will match any occurrence of that STRING of characters, not its presence as a space delimited word.

        I sussed this because Agung was triggering moderation holds. It turned out that Judy had added “gun” (for some reason) to her block list, thinking it was a word match.

        Similarly, I posted using the word “familiar”, which triggered moderation because it contains the string “liar”.

        The solution, should it be the case here, is to provide moderation traps as quoted strings including leading and trailing spaces if you wish to trap those words.

        e.g.

        ” liar ”
        ” gun ”
        ” Stokes ” 😉

        HTH
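
Greg’s diagnosis is easy to demonstrate with generic code (this is not the actual WordPress implementation, just the substring-versus-word-boundary distinction he describes):

```python
import re

blocklist = ["liar", "gun"]        # stored as plain strings, as in the WP admin page

def substring_match(comment):      # what the filter effectively does
    return [w for w in blocklist if w in comment.lower()]

def word_match(comment):           # what moderators usually intend
    return [w for w in blocklist if re.search(rf"\b{re.escape(w)}\b", comment, re.I)]

comment = "That is a familiar eruption pattern for Agung."
print(substring_match(comment))    # ['liar', 'gun'] -- false positives
print(word_match(comment))         # []              -- no whole-word hits
```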

        • Greg Goodman
          Posted Feb 14, 2015 at 5:11 AM | Permalink

          Well that comment (which the world will see later) got held for moderation … Bingo.

          I’ll bet that has correctly identified the problem here and that Steve is including L_I_A_R in his block list.

  46. son of mulder
    Posted Feb 9, 2015 at 12:15 PM | Permalink

    The original exercise of Marotzke is essentially to reconcile the outputs of models of a chaotic system to a measured global average and justify that the very significant differences are down to “natural” variability. It is obvious why such an exercise was undertaken, i.e. an attempt to keep the models from being thrown on the scrapheap, because they are diverging significantly from observation and the hypothesis that predicted dangerous warming is a result of anthropogenic CO2 is becoming less and less credible.

    It’s like saying that the predictions don’t fit the measurements because there is internal stuff that we don’t know. Well, knock me down with a feather. If we knew the unknown internal stuff then the output of the models would be running cooler, i.e. not as dangerous (or dangerous at all), and it would be unreasonable to demonise anthropogenic CO2.

    • Posted Feb 9, 2015 at 5:47 PM | Permalink

      They apparently did not think things through enough. Note that Marotzke of MPI works for the German center of climate modeling, dependent on ever more funding for same. Such a defence was to be expected. Just not this poorly. Or, to paraphrase Napoleon, never interrupt an enemy in the process of making a fatal mistake.

      So, let’s suppose for the sake of argument they are right and Nic is wrong. (The opposite appears true – this is for the sake of argument only.) Then they showed that ‘internal’ aka ‘natural’ variability has a large role in temperature time series.
      Ah, but then the roughly 1975 to 2000 temperature rise attributed by fundamental GCM model design to GHGs is falsified. And so are the model parameterizations. And so are the model outputs. Either way, falsified… See my horns of a dilemma comment below.

  47. Posted Feb 9, 2015 at 1:06 PM | Permalink

    Many people have learned that a strong enough positive feedback makes the system unstable and behave very differently from the system without feedback. A moderate positive feedback leaves the system stable, but adds to its variability, while a negative feedback makes it only more stable and less variable than the system without feedback.

    This case is analogous to that. The kind of circularity the analysis involves affects the stability properties of the whole analytic process that includes both the earlier Forster et al (2013) analysis and this paper. In that analogy the present circularity seems to have a nature similar to the positive feedback making the analysis less stable. It’s, however, not at all obvious that the method becomes worthless. Actually the results are quite reasonable, in general, proving that nothing very drastic takes place. How much the circularity affects the accuracy of the final results is another question that I’m unable to tell.

    • Posted Feb 9, 2015 at 3:40 PM | Permalink

      Pekka

      Thanks for your comment; I agree that your analogy seems apt. If the variability of the ΔN term dominated that of the α ΔT term, then the diagnosed forcing would be nearly exogenous in relation to ΔT. However, as I wrote in my article, that does not appear to be the case for recently ending 62 year trends. The inter-model variance of α ΔT is around three times that of ΔN. That seems to correspond to quite strong positive feedback, in your analogy.

      I think that you wrote elsewhere that your suggested regression had similarities to one based on my equation (6)? But isn’t the obvious starting place my simple equation (6), rather than your complex regression system? There are only 18 models, and if several coefficients have to be estimated from noisy data and for a linearised version of the equation (1) model that is extremely approximate, I am doubtful that one would be able to obtain reliable results. In any case, if a regression based on equation (6) – arguably best performed on a logarithmic rather than linearised version – does not yield significant and reasonably stable results for 62 year periods ending in recent decades, doesn’t that strongly suggest that the whole simple model edifice on which the paper’s analysis is based isn’t valid? And, as I wrote, regression based on equation (6) generally has very little explanatory power.

      You say here that the paper’s results in general are quite reasonable. But you wrote at ATTP: “The models have highly different parameter values for α and κ. They are much closer in their temperature trends. Thus they must have highly different forcing histories. Those highly different forcing histories are used in the comparison presented in the paper.”, which implies that you – like me – believe that the parameter values for α and κ do, between them, have a significant impact on model temperature trends, at least over multidecadal periods ended recently. Am I right? Yet the paper claims that they have no significant impact on 62 year trends ending recently, with forcing variations alone dominating.

      • Posted Feb 9, 2015 at 4:07 PM | Permalink

        Nic,

        It’s not important that the variability of the ΔN term dominates that of the α ΔT term, it’s enough that the contribution from ΔT to the right hand side of the regression is significantly less than the left hand side. If that’s not the case we might expect some strange variability in the temperatures.

        By reasonable I mean that the time series behave essentially as expected. That’s a different thing than the ultimate results of the analysis, and I consider it more likely that the reason for the surprising final results is somewhere else, perhaps simply in the fact that 62 years is so long and more than half of the full period considered, or perhaps the CMIP5 ensemble is not representative due to the implicit (and in part explicit) selective processes that have contributed.

        Marotzke and Forster have also listed several problems that they have recognized. Perhaps the real problems are among those.

          My impression based on the output of the analysis is that the circularity has probably not affected the outcome very much. Its influence should be checked, but that’s my guess.

        • Posted Feb 9, 2015 at 6:25 PM | Permalink

          Pekka, with all due respect, your reply is illogical. Think it through rather than reflexively defending the apparently indefensible. Logic explained in more detail in other comments.

        • Carrick
          Posted Feb 10, 2015 at 8:25 AM | Permalink

          Rud Istvan, with all due respect, you not following the argument doesn’t make it illogical. :-/

    • DocMartyn
      Posted Feb 9, 2015 at 7:30 PM | Permalink

      Pekka, this is not the forum for a discussion of system control under different feedback regimes, but you might ask Anders to give you an ATL where you can explain what you mean, in terms that are used in classical control theory and the huge body of statistical validation that has been explored in control theory.
      I have yet to see any application of control theory to cAGW.

      • Posted Feb 10, 2015 at 3:52 AM | Permalink

        DocMartyn,

        Pekka was using control theory to provide an analogy, which is fair enough whether or not classical control theory itself is applicable for analysing climate system behaviour.

        I generally find Pekka’s comments sensible and informative, and I hope he will comment more often at CA.

      • Matt Skaggs
        Posted Feb 10, 2015 at 10:25 AM | Permalink

        Doc Martyn wrote:

        “I have yet [to see] any application of control theory to cAGW.”

        One reason for that is climate science hasn’t the slightest clue as to the historical mix of feedback versus system capacitance. If you cannot tease those two apart, the control equation cannot be resolved in the time domain, at least not to the point of elucidating decadal trends. There is still the (relatively unexplored) possibility of deriving useful control information using a control volume or control mass approach over very long time intervals, but that won’t help with decadal variability.

    • Posted Feb 10, 2015 at 4:00 AM | Permalink

      I agree with Pekka, and I think the feedback analogy is appropriate. I see here assertions that you can’t do regressions where there is dependence, but that is not true. You just have to use an appropriate covariance matrix, which will be less well conditioned because of the dependence.

      There is a familiar example in finding the trend of a time series with autocorrelation. There is dependence between the terms. That modifies the covariance of the random term, often approximated (with AR(1)) by a Quenouille correction. And it generally means the result is more uncertain than OLS, but the expected value is not much different.

      Steve: this post was in moderation for 5 hours. I moderate after the fact. I have learned that Nick Stokes has been whinging at his blog about being supposedly censored – even though he’s posted hundreds of comments here. Stokes has gotten onto some wordpress moderation lists but claims to have solved them all. I don’t know why Stokes’ comment went into moderation. He’s not on any CA blacklist despite whatever claims he may make at his blog.
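
For the record, the Quenouille-style AR(1) adjustment Nick mentions is easy to sketch on synthetic data; the sqrt((1 + r1)/(1 − r1)) inflation of the trend standard error is the standard first-order approximation:

```python
import numpy as np

rng = np.random.default_rng(5)

# AR(1) noise plus a small linear trend
n, phi, trend = 120, 0.6, 0.01
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()
y += trend * np.arange(n)

t = np.arange(n)
b, a = np.polyfit(t, y, 1)
resid = y - (a + b * t)

# Naive OLS standard error of the trend (treats residuals as independent)
se_naive = np.sqrt((resid @ resid / (n - 2)) / ((t - t.mean()) ** 2).sum())

# AR(1)/Quenouille adjustment: inflate by sqrt((1 + r1) / (1 - r1))
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
se_adj = se_naive * np.sqrt((1 + r1) / (1 - r1))
print(f"trend = {b:.4f}, naive SE = {se_naive:.4f}, adjusted SE = {se_adj:.4f}")
```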

      • igsy
        Posted Feb 10, 2015 at 11:08 AM | Permalink

        Sometimes the angle you take on things makes my head spin. Are you saying it is “not true” to claim an analysis is invalid due to the failure of key assumptions because it is possible to redo the analysis in a more general way, even though that was not actually done? Or did MF, in your opinion, actually specify an appropriate covariance matrix in this instance to address the linear dependency?

      • stevefitzpatrick
        Posted Feb 10, 2015 at 9:48 PM | Permalink

        Nick, SteveMc,

        FWIW, I have had several comments held up at Climate Audit (including one on this thread) for ~2 to ~5 hours. I figured there must be some key words that trigger transfer to a moderation list rather than immediate posting. I did not ever think there were nefarious motives involved, and I rather suspect that is also the case with Nick Stokes’ above comment which ended up in moderation.

        Steve: as you observe, Nick is not the only person that gets a comment tied up from time to time for reasons that seem puzzling. It’s easier for me to deal with such incidents manually than to try to figure out the interaction between various spam filters. Because Nick has gotten on wordpress blacklists unrelated to CA in the past, it is entirely possible that he’s run into problems additional to those experienced by others. And yes, there are a variety of key words that trigger moderation, some of which are related to spam control rather than to good manners. For the most part, his comments seem to come through, so I’m not going to lose any sleep trying to figure things out. Because Nick is in an opposite time zone, such delays are longer. I’m also a little inconsistent in my editing diligence and sometimes don’t do things for a few days.

      • fizzymagic
        Posted Feb 11, 2015 at 2:30 AM | Permalink

        I see here assertions that you can’t do regressions where there is dependence, but that is not true. You just have to use an appropriate covariance matrix, which will be less well conditioned because of the dependence.

        Maybe I should have said something more like you can’t do a meaningful regression when there is dependence.

        You can use a covariance matrix when there are correlations among the dependent variables (those whose coefficients are being regressed), but doing so when there is a correlation between the independent variable and the dependent variables makes interpretation of the results iffy.

        In this case, the dependent variables are not independent because several came from the same models. And one of them is estimated using the independent variable, making those not independent either.

        It doesn’t matter whether the impact on the results is large or small — in a competent, rigorous, scholarly paper, these problems would have been identified and attempts made to quantify the resultant uncertainties. Nothing of the sort appears in the Nature article. If an author omits a crucial set of issues in a paper, then the entire thing should be called into question until it can be proven valid.

        It seems to me that people are defending this paper because they like the conclusions, not because it represents good science. Think hard about that — is that really how you want climate science to move into the future?

      • Greg Goodman
        Posted Feb 15, 2015 at 9:04 AM | Permalink

        I posted an explanation of where this comes from but, because of the forbidden word in the explanation, it is itself held in moderation and Steve has not cleared it.

        The problem is that admins put words for moderation traps into the WP interface but WP regards them as strings and matches any instance where a comment contains one of the forbidden words as a sub-string.

        Nick says:
        “There is a fami_L_I_A_R example …!”

        Geddit?

        The solution is to pad the words with leading and trailing spaces and put them in quotes. Most admins don’t realise this and commenters across WP are railing against apparently spurious moderation holds that no one can understand.

        Some then get all paranoid and start concluding they are banned.

        For example to ban Nick Stokes, Steve would need to enter:

        ” Nick ”
        ” Stokes ”

        and not

        Nick
        Stokes

        since that would end up trapping words like “knickers” and holding the comment for moderation.

        Hopefully, once our host reads my comment that got held he will fix this and everyone can feel less paranoid.

      • Posted Feb 17, 2015 at 10:43 PM | Permalink

        Probably my comment is very late and will not be noticed.

      I have a blog called Science of Doom – it is hosted by wordpress. I do not moderate, but I have a bunch of words and a few rules that cause wordpress to send a comment into moderation – waiting for me to release it.

        That aspect works pretty well. WordPress never lets a comment through if it contains a word – or violates a rule – that I have given it.

      However – and this is a bit of a kicker – some comments get put into moderation until I release them. And I can’t work out why. I review the comment – no keywords or rules violated.

        We could say – plenty of “false positives”.

        One particular commenter comes to mind – in a given week he might have 3 out of 5 comments held in moderation. In another week he might have 0 out of 5 or 0 out of 10. I can’t understand the logic and I can’t see the reason.

        What I *guess* is that some combination of his IP address/name/words are triggering other rules that wordpress has decided are bad.

      For someone without a wordpress-hosted site it will be different, but the “behind the scenes magic”, even with a client-side hosted account, is not at all clear.

        Sometimes people whose comments end up in moderation get a little testy. Other times people are understanding. It all depends on their day, their week and their demeanor.

        • Don Monfort
          Posted Feb 17, 2015 at 11:45 PM | Permalink

          Your comment is noticed and appreciated. So too is your blog.

  48. Posted Feb 9, 2015 at 1:24 PM | Permalink

    It seems M&F have placed themselves on the horns of a dilemma with their reply.
    If their procedure is correct, as they claim, it leads to the conclusion that the two most important emergent structural properties, a and k, do not influence model outputs. Illogical, but if taken at face value then ‘internal’ climate variability caused the pause. But then that ‘internal’ natural variability would also have been present in the hindcast period back to roughly 1975, to which the models were parameterized for best hindcasts per the CMIP5 experimental ‘near term’ protocol. And so the attribution of the underlying temperature rise in this period to GHG is undermined. So the models run excessively hot.
    Or, their procedure is faulty because of circularity, and so just produces an illogical result. Since M&F have not addressed the point (because they cannot, since it is true), their explanation fails and the models are now falsified. They are too sensitive – because the parameterization tuning period contained natural variation. Moreover, since the rise from 1975 to about 2000 is indistinguishable from the rise from about 1920 to 1945 (Lindzen’s point, noting that even the IPCC does not attribute the earlier rise to GHG), natural variation could well be most of the later rise as well. Still more observational support for the root cause of the model/temp divergence.
    Either way, the model results are unsupportable in the bigger picture.

    • Kenneth Fritsch
      Posted Feb 9, 2015 at 4:46 PM | Permalink

      The authors talk about the emergent parameters not affecting the differences in the models’ trend outputs much, I think, while still, I would assume, acknowledging that the values of the parameters affect the trends, particularly the longer-term ones. To me this is saying that the noise level is the overwhelming factor in determining the trend differences – and that in an ensemble where individual-model ECS and TCR values can differ by 100%.

      Is there, or should there be, any dependency between the emergent parameters ECS or TCR and the natural variation that I call noise? If not, the authors should be decomposing individual model outputs into secular trends, noise and cyclical structure, and not doing the group thing and assuming the regression residual from these differences is all noise – or quasi-random variability, as the authors reference it. In my mind, a better comparison would be the observed climate versus the individual model output, and better still where the individual model has multiple runs and the noise levels can be better estimated (modeled).

      • Posted Feb 9, 2015 at 5:09 PM | Permalink

        See Akasofu 2009 on this point, summarized in the essay Unsettling Science with reference footnotes. There is a dependency through the necessary model parameterization (see essay Models all the way Down) for ‘best fit’ multidecadal hindcasts specified in the ‘experimental design’ published by Taylor, Meehl et al. in BAMS in 2012.

      • Kenneth Fritsch
        Posted Feb 9, 2015 at 8:16 PM | Permalink

        Should have said ECS and TCR estimated values for individual models can be at ratios of 2:1.

  49. RomanM
    Posted Feb 9, 2015 at 2:19 PM | Permalink

    More on this topic:

    http://www.reportingclimatescience.com/news-stories/article/blog-row-erupts-over-nature-model-paper.html

    Steve: ironically their article doesn’t contain a hyperlink to the CA article. They refer to CA, but link to Nature.

    • Posted Feb 9, 2015 at 2:50 PM | Permalink

      Link fixed. Apologies.
      L

      • David K
        Posted Feb 9, 2015 at 3:10 PM | Permalink

        Now they link to CA but refer to (2015, Nature ). 🙂

        • Posted Feb 9, 2015 at 3:18 PM | Permalink

          No, it says: “Post by Nic Lewis criticising Marotzke & Forster (2015, Nature) here.”
          Marotzke & Forster (2015, Nature) is the citation for the paper…

          L

        • David K
          Posted Feb 9, 2015 at 3:34 PM | Permalink

          Thank you sir, I stand corrected.

  50. Michael Jankowski
    Posted Feb 9, 2015 at 6:05 PM | Permalink

    “Andthentheresmath, so to speak”…lol Steve!

    • TimTheToolMan
      Posted Feb 10, 2015 at 4:43 PM | Permalink

      Made me laugh too 🙂

  51. stevefitzpatrick
    Posted Feb 9, 2015 at 11:10 PM | Permalink

    Nic Lewis,

    Seems to me the circularity is complete in the paper. Forster et al (2013) defined delta F as you showed above:

    ΔF = α ΔT + ΔN (1)

    Where ΔN is the change in the TOA imbalance and α is the inverse of the ECS

    M&F start with the basic equation:

    ΔT = ΔF / (α + κ) (2)

    Where α is the inverse of the ECS, and κ is the “ocean uptake efficiency”, or the ratio of change in TOA imbalance to change in temperature, and then add an error term ε. But κ is related to ΔN and ΔT:

    κ = ΔN / ΔT (3)

    Substituting (3) into (2) we get:

    ΔT = ΔF / (α + ΔN / ΔT) (4)

    And rearranging:

    α ΔT + ΔN = ΔF (5)

    Which is nothing more than equation (1) as used by Forster et al (2013). So the M&F paper uses the SAME equation as Forster et al (2013), slightly rearranged, and with an error term added. I don’t see how using an equation to calculate forcing from change in temperature, and then using that same equation to calculate change in temperature from the calculated forcing, adds much to the world’s knowledge.
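
    The algebra above is easy to confirm symbolically. A quick check (Python/sympy; the symbol names simply mirror the quantities in the equations):

```python
import sympy as sp

dT, dF, dN, alpha = sp.symbols('DeltaT DeltaF DeltaN alpha', positive=True)

kappa = dN / dT                              # equation (3): kappa diagnosed as dN/dT
relation = sp.Eq(dT, dF / (alpha + kappa))   # equation (2) with (3) substituted

# Solving for dF recovers Forster et al (2013)'s equation (1): dF = alpha*dT + dN
dF_solved = sp.solve(relation, dF)[0]
print(sp.simplify(dF_solved - (alpha * dT + dN)))   # prints 0
```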

    • Posted Feb 10, 2015 at 3:43 AM | Permalink

      stevefitzpatrick,

      I think you have overlooked that the estimates they use for κ come from a different set of simulations (see the paragraph below my equation 4) and are quite different from the values of ΔN / ΔT obtained during the Historical simulations – see the 2nd paragraph of my section ‘Another reason why Marotzke’s approach is doomed’.

      The circularity thus appears to be somewhat diluted, but at the expense of falsifying a key assumption in their eqn. (1): that the previously diagnosed values for κ apply to model behaviour in the Historical period.

  52. R Graf
    Posted Feb 10, 2015 at 1:47 AM | Permalink

    This is fascinating. M&F and their defenders basically argue that there is no big deal because the paper’s statistical analysis is validated by its producing the result that was expected (by them). The threat of circular logic was apparently the furthest thing from their minds.

    Astrology, for sure, also had roots in strong foundations of settled assumptions ‘proven’ by generations of observation.

    Science and humanity owe a debt to Nic, Steve, Ross and many others for providing an inspector general of sorts for tax-payer-funded science driving our politics, which in turn drives our press, voter choices and science funding. Is there a valid equation for that?

    Nic, Steve, Ross: Nature does not need to call you to be a referee/reviewer. You are changing the way science is done, making history here. Public audits are the wave.

  53. rwnj
    Posted Feb 10, 2015 at 6:01 AM | Permalink

    I strongly, STRONGLY recommend the work of Judea Pearl and others who have developed a theory of causal modeling. Pearl’s book begins with the striking observation that physical ideas as expressed by equations contain no causal information. If F=ma, does F cause a or does a cause F or do F and a cause m? The theory particularly clarifies complicated observational relationships versus operational relationships in which an exogenous actor changes a variable. The confusion between an exogenous forcing and the endogenous estimation of the forcing would not have occurred if the entire model were rebuilt with these ideas in mind.

    • Posted Feb 10, 2015 at 7:13 AM | Permalink

      The point you just made should be included in the lead paragraph of Nic Lewis’ submission to Nature:

      physical ideas as expressed by equations contain no causal information.

      That might limit the back-and-forth arguments over circularity as a necessary result of the M&F formula’s design.

  54. Andrew McRae
    Posted Feb 10, 2015 at 9:01 AM | Permalink

    Sorry, but as there has not been an Unthreaded post recently I will steal a moment to post this off-topic tip. The Australian BoM’s adjustments to land temperature records in ACORN-SAT will be checked by a government-appointed panel of stats experts.
    So Climate Audit, meet the climate auditors…
    http://www.environment.gov.au/minister/baldwin/2015/mr20150119.html

    No findings yet, but one to watch in the coming months.

  55. R Graf
    Posted Feb 10, 2015 at 9:09 AM | Permalink

    I see Pekka’s comment this morning at Climate Lab Book on M&F’s posted response, where he concludes a lengthy statistical summary demonstrating the problem with:
    “Starting values are from a database, formulas are given. Results follow from that. Variable F is not a real forcing, it’s a derived construct (ERF) defined in Forster (2013), motivated by physics, but not an externally given forcing.”

    His analysis boils down to this: the use of modeled forcing to evaluate modeled forcing cannot enlighten the real world.

    • Posted Feb 10, 2015 at 10:36 AM | Permalink

      My present conclusion is that the circularity occurs in this analysis in a way that does not result in any problems. It’s typical that the same effect produces a zero in one direction of the analysis and a singularity (infinity) in the inverted direction. In this case only the zero occurs in the relevant calculations, and that does not cause any problems.

      The zero (or variable sign) may occur in some coefficients of the regression, but the regression is well behaved in all these cases. The model obtained by the regression is also well behaved in all the calculations M&F perform. In the mathematical sense it’s possible to define additional questions that involve explicitly ΔN and that would involve singular behavior, but such cases are not part of the M&F analysis and do not affect that analysis.

      More on that at Climate Lab Book and still more at aTTP.

      • R Graf
        Posted Feb 10, 2015 at 11:40 AM | Permalink

        My apologies for misinterpreting your final conclusion. I think you are saying that the dT caused by dN (the temp change at the surface theoretically attributable to TOA radiative imbalance) is insignificant to their results. But I read M&F in their reply as maintaining that it is absolutely necessary to correct for this dT, while absolutely denying any circularity.

      • Carrick
        Posted Feb 10, 2015 at 12:26 PM | Permalink

        Pekka, when we have recurrence relationships, isn’t it the case you have to iterate until you’ve achieved convergence?

        It seems like even if the recurrence relationship is stable, the result of a single iteration isn’t likely to be accurate.

        • Posted Feb 10, 2015 at 1:21 PM | Permalink

          Carrick,

          The nature of this case is not such that iteration is needed.

          The starting point is fixed: the results included in the CMIP5 model archive. All input values of the regression come from that archive either directly or through the earlier analysis of Forster et al (2013), or other earlier analyses that have deduced the values of α and κ, which are used and reported also in Forster (2013).

          The issue that Nic observed is that the CMIP5 archive does not contain estimates of model-specific forcings, only temperatures and TOA imbalances. The forcings are calculated from these. That’s a one-time final calculation; no later corrections based on temperatures derived from the regression model are needed. If it were necessary to calculate such new corrections, then we would end up in iteration and further problems.

        • Carrick
          Posted Feb 10, 2015 at 4:35 PM | Permalink

          Pekka, to be honest I really can’t tell without getting more immersed in this, whether there are quantities you could update using the new value of T (or F).

          There is a similar problem in sensor calibration, where you measure the ratios of the sensitivities of two sensors relative to a third source or microphone, and use that to separately compute the calibrations of the two sensors.

          In that case, it looks on paper like it’s totally circular, but the trick to straighten the reader out is to subscript quantities so you can track where each quantity is being measured.

          Here, for example:

          ΔT1 = ΔF / (α + κ) + ε
          ΔF = α ΔT2 + ΔN

          But are ΔT1 and ΔT2 really supposed to be independent measurements here? This is not obvious to me.

        • Posted Feb 10, 2015 at 4:38 PM | Permalink

          What I wrote above – and what I believe to be the case, in full agreement with the response of Marotzke and Forster – means that the calculation is not circular in the serious sense that the result obtained on the left hand side would be used iteratively as input on the right hand side. It’s circular only in the way that ΔT appears on both sides of the regression formula, when it’s written as it is written in the paper and when ΔN + αΔT is substituted for ΔF.

          The substitution is used (implicitly) in the determination of the regression parameters, but, after the regression parameters have been determined, ΔN is of no further interest in the use of the regression formula, which now tells how ΔT depends on ΔF, α, and κ in the regression model – which in turn tells approximately how ΔT, ΔF, α, and κ are related in the actual CMIP5 models. Thus the whole regression is just a simple multilinear fit to the model behavior.

          This is a totally well behaved way of figuring out something about the model ensemble. (I’m a bit embarrassed that I didn’t see that more rapidly, but I wasn’t alone in not understanding the situation immediately.) The main limitation of the approach may be that it’s a multilinear regression that cannot describe any more complex variation of ΔT in the 3-dimensional space of the other variables, while the formula used to motivate the regression

          ΔT = ΔF/(α+κ)

          is clearly not linear in the variables, and leads to a dependence that cannot be approximated well by the multilinear regression over a wide range of parameter values. The values of the other variables do, however, vary over a wide range. The sum in the denominator has the range 1.17–2.81 in the model ensemble, and the two parts of it vary even a little more. The overall adjusted forcing from doubling the CO2 concentration varies similarly significantly (2.59–4.31), but the values of ΔF over the 15-year and 62-year periods surely vary more than that, and also include periods of decreasing forcing. Thus the linearization is a crude approximation that may affect the outcome a lot.

        • Posted Feb 10, 2015 at 4:41 PM | Permalink

          Carrick,
          Perhaps the comment that I wrote simultaneously with your comment helps you in understanding the case. If not, I’ll try to add more.

        • Carrick
          Posted Feb 10, 2015 at 5:26 PM | Permalink

          Thanks Pekka, that makes sense.

        • bill_c
          Posted Feb 10, 2015 at 7:35 PM | Permalink

          Pekka, Carrick,

          One question I have is about the sequence of the regression performed. Pekka, when you write your version of the equation removing deltaF (dF) as you have here, it seems that sets up a regression model that simultaneously produces a “best fit” of dT to all the various empirical parameters. However, unless I’m mistaken, they are using the dF results of a prior fit to the models (Forster 2013), which doesn’t consider all the factors that this paper does. Thus they haven’t performed the multiple regression simultaneously on all parameters as one is really supposed to…

        • Posted Feb 11, 2015 at 5:11 AM | Permalink

          bill_c,

          The regression is done in a way that’s totally equivalent to doing it simultaneously. The result is fully well defined for the determination of the regression coefficients that they define. The only problem is that the resulting regression formula may have diverging coefficients when it is solved for the temperature trend, when the free parameters are ΔN, α, and κ, but this formula is not needed in any application included in M&F, and it’s difficult to see where it could be needed.

          With this set of variables the regression formula tells without any problems the energy flux contributions based on ΔN, α, and κ, but these contributions balance closely by themselves without the term proportional to ΔT, which has a coefficient near to zero in that case, and is therefore small for all reasonable values of ΔT. That’s enough to justify all the analysis of M&F. It’s never necessary to use that formula to calculate ΔT, and that’s the only step that could be problematic.

          The division of ΔT to contributions from the other parameters and a residual is well behaved, when the energy flux parameter is ΔF, not necessarily when it is ΔN, but there are also other reasons to pick ΔF.
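
          The “diverging coefficients” Pekka mentions can be made visible symbolically. A small sketch (Python/sympy; the symbols just mirror the regression formula under discussion) shows that solving the regression equation for ΔT after the Forster substitution introduces a 1/(1 − bα) factor:

```python
import sympy as sp

dT, dN, alpha, kappa, a, b, c, d = sp.symbols('DeltaT DeltaN alpha kappa a b c d')

# Regression formula with dF replaced by its construction alpha*dT + dN
eq = sp.Eq(dT, a + b * (alpha * dT + dN) + c * alpha + d * kappa)

# Determining the coefficients is a finite, well-posed calculation, but
# inverting for dT exposes a 1/(1 - b*alpha) factor, singular as b*alpha -> 1
print(sp.simplify(sp.solve(eq, dT)[0]))
```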

        • RomanM
          Posted Feb 11, 2015 at 7:21 AM | Permalink

          Pekka:

          The only problem is that the resulting regression formula may have diverging coefficients when it is solved for the temperature trend, when the free parameters are ΔN, α, and κ, but this formula is not needed in any application included in M&F, and it’s difficult to see where it could be needed.

          Did you not read the paper? The authors state:

          We thus perform for each start year a multiple linear regression of ΔT against ΔF, α and κ. The regression residual ε is interpreted as the contribution from internal variability. The complete regression-based prediction for GMST trend is obtained by adding the ensemble-mean trend to the regression for the across-ensemble variations:

          The calculation of the predicted temperatures and the residuals from the regression is the main reason for carrying out the regression.

          The possibility of “diverging coefficients” is far from the only problem with the regression when it has been reformulated in the proper manner. The situation is not as simple as it appears to you. I hope to post a comment on CLB (and here) on that later today.

        • bill_c
          Posted Feb 11, 2015 at 6:25 AM | Permalink

          Thanks Pekka.

        • Posted Feb 11, 2015 at 8:31 AM | Permalink

          RomanM,

          Yes, that’s the idea of the paper, but that’s done for the regression model that uses the free variables ΔF, α, and κ. That’s perfectly legitimate and supported by physical arguments. That does not lead to any problems in determining the residuals.

        • Posted Feb 11, 2015 at 8:46 AM | Permalink

          I repeat once more some of the caveats acknowledged also by the authors:

          – A linear regression model is not an accurate model, but can describe only some leading features of the models.

          – The variable ΔF used effectively in the determination of the regression parameters is not exactly the same as the ΔF that occurs in other connections; it’s only an approximation, but the best approximation they have at their disposal.

          The list can be continued. Thus it’s justified to have some doubts about the accuracy of their results. There may very well be some more fundamental issues as well, but going step by step through what they must have done shows that the circularity does not enter in a damaging way. All steps are stable against its consequences.

        • Posted Feb 11, 2015 at 9:15 AM | Permalink

          Why not use ΔF from the RCP files, which is certainly independent of temperature, rather than ΔF_est = ΔN + α ΔT?

        • Paul_K
          Posted Feb 11, 2015 at 12:44 PM | Permalink

          HaroldW,
          Values of forcing taken from the RCP files would give rise to a huge divergence between the emulated temperature from Forster’s simple model and the GCM’s actual historical temperature. Some of the calculated AF values from Forster 2013 are less than half those in the RCP files.

        • Kenneth Fritsch
          Posted Feb 11, 2015 at 2:48 PM | Permalink

          HaroldW, you bring up an interesting point here. The abrupt 4XCO2 experiment used for CMIP5 models was a special experiment devised, I assume, to better capture the equilibrium climate sensitivity without having to run the models for millennial time periods. A lot of the output of the regression on that data depends strongly on the pre-industrial control data (piControl) used for each model to adjust the TOA imbalance and surface temperature from the 4XCO2 experiment.

          However, why would the authors M&F not run their regression/model output on RCP4.5 or other scenarios and determine how well it agrees, as a kind of out-of-sample test? Not sure I have completely thought through this, but it is a thought.

  56. Posted Feb 10, 2015 at 9:18 AM | Permalink

    From what I can gather by reading comments here and at Climate Lab Book, there appears to be an emerging consensus that M&F does incorporate a degree of circularity in its use of deltaF derived from an earlier study. The disagreement now seems to be centering around how significantly this has affected the (surprising) outcome of the paper. Nic Lewis above states that the circularity may be ‘somewhat diluted’ but only at the expense of untenable assumptions in the calculations elsewhere.

    Pekka, whilst acknowledging the circularity, believes it may not significantly affect the analysis, though admits that it may make it ‘unstable’. Pekka furthermore suggests that the “surprising final results” may have their origin elsewhere. Pekka also states on CLB that “Variable F is not a real forcing, it’s a derived construct (ERF) defined in Forster (2013), motivated by physics, but not an externally given forcing”.

    So, from a purely logical point of view, the paper appears to be seriously flawed by its inclusion of this circularity, whether or not it significantly affects the final results. Another problem seems to be the poor choice of ‘independent’ periods, particularly the 62-year ones. For these reasons, it appears to me, the ‘surprising’ conclusions cannot be relied upon from a technical perspective, nor indeed from a scientifically purist viewpoint.

    • Posted Feb 10, 2015 at 9:28 AM | Permalink

      Good summary. I was struggling to put something similar into words.

  57. Jeff Norman
    Posted Feb 10, 2015 at 9:54 AM | Permalink

    This is an interesting read from top to bottom. Thank you all.

  58. rwnj
    Posted Feb 10, 2015 at 10:45 AM | Permalink

    Another problem with many of the discussions above is that you cannot invert a regression. That is, if y = a * x + error is an optimally fit OLS regression, then x = y/a + error is NOT optimally fit. This is also true for more complicated regressions (other than OLS). Claiming that the coefficient is a real physical constant does not fix this problem if the constant was estimated with regression.
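
    rwnj’s point is easy to demonstrate numerically. A minimal sketch with invented data (Python/numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)
y = 2.0 * x + rng.normal(0.0, 1.0, 1000)   # true slope 2.0, plus noise

b_yx = np.polyfit(x, y, 1)[0]   # regress y on x: slope ~2.0
b_xy = np.polyfit(y, x, 1)[0]   # regress x on y: slope ~0.4

print(b_yx, 1.0 / b_xy)   # ~2.0 versus ~2.5: the inverted fit is not the same line
```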

  59. Kenneth Fritsch
    Posted Feb 10, 2015 at 11:58 AM | Permalink

    I have linked to an Excel file at Dropbox that shows the plots I made for a Singular Spectrum Analysis decomposition and reconstruction for some CMIP5 model Historical and Pre-Industrial control runs and observed temperature series. Another worksheet shows the percent variance explained by the principal components used and some ARMA modeling results. The plots show a secular trend (red line), some cyclical components and the residuals (black line).

    One can see some large differences in secular trends, which can represent the deterministic part of the series. The cyclical and noise components, while visually different in pattern from model to model, are at nearly the same level. I make no great claims for this analysis other than that it does show differences from model to model, and further shows a measure of the deterministic trend and natural variation for the models and the observed temperature series. I cannot reconcile these plots with the findings in M&F under discussion here.

    https://www.dropbox.com/home?select=SSA_Obs_CMIP5_Models.xlsx#
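
    For readers unfamiliar with the technique, here is a crude sketch of the kind of SSA decomposition Kenneth describes (Python/numpy; the window length, component count and toy series are arbitrary illustrative choices, not his actual settings):

```python
import numpy as np

def ssa_reconstruct(y, L=40, k=3):
    """Crude SSA: embed the series, SVD the trajectory matrix, and
    reconstruct from the k leading components by diagonal averaging."""
    n = len(y)
    K = n - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])   # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]                      # rank-k approximation
    rec, counts = np.zeros(n), np.zeros(n)
    for i in range(L):                                    # Hankelization
        for j in range(K):
            rec[i + j] += Xk[i, j]
            counts[i + j] += 1
    return rec / counts

# Toy series: secular trend + 60-step cycle + noise
rng = np.random.default_rng(4)
t = np.arange(150)
y = 0.01 * t + 0.3 * np.sin(2 * np.pi * t / 60) + rng.normal(0.0, 0.1, 150)

trend_plus_cycle = ssa_reconstruct(y)
residual = y - trend_plus_cycle   # the "noise" part of the decomposition
```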

  60. Frank
    Posted Feb 10, 2015 at 4:38 PM | Permalink

    Are there mistakes in the expansion and regression of Equation (3) in M&F’s paper (that have nothing to do with circularity)?

    First, they appear to be using the approximation that 1/(1+x) is approximately equal to 1−x when x is small. In this case, x is equal to (a’+k’)/(a+k), where a is the ensemble-mean climate feedback parameter (a_overbar in M&F), k is the ensemble-mean ocean heat uptake efficiency (k_overbar in M&F), a’ is the “across-ensemble variation” in the climate feedback parameter and k’ is the “across-ensemble variation” in the ocean heat uptake efficiency. The ensemble ranges for a and k are 0.6–1.8 and 0.45–1.52 W/m2/K. It isn’t obvious to me that x must be small enough for this approximation to be valid.

    The authors define a’ and k’ using the phrase “across-ensemble variation”. It isn’t clear to me what this phrase means. During regression, each model presumably has a distinct a’_j and k’_j. Presumably each a’_j must come from subtracting the model climate feedback parameter from the ensemble mean climate feedback parameter. If so, the approximation is incorrect for at least some of the models.

    Second, when one transforms the expansion they obtained into the regression equation immediately below, the coefficients beta2 and beta3 are required to be equal. Instead of two independent terms, there should be a single coefficient multiplied by (a’+k’).
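
    A quick numerical check of Frank’s first point, using the 1.17–2.81 range for α+κ quoted elsewhere on this page (the midpoint and edge values below are purely illustrative):

```python
# First-order truncation 1/(1+x) ~= 1 - x, with x = (rho - rho_bar)/rho_bar
rho_bar = (1.17 + 2.81) / 2          # a notional ensemble mean of alpha + kappa
for rho in (1.17, 2.0, 2.81):        # models at the edges and near the middle
    x = (rho - rho_bar) / rho_bar
    exact = 1.0 / (1.0 + x)
    linear = 1.0 - x
    print(f"x={x:+.2f}  exact={exact:.3f}  linear={linear:.3f}  "
          f"error={(linear - exact) / exact:+.1%}")
```

    At the edges of the range the first-order term is off by roughly 15–20%, so “small x” is doing real work in the derivation.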

    • Posted Feb 10, 2015 at 4:54 PM | Permalink

      Frank,
      All the apparent derivation is really only motivation for the rest. The actual analysis starts from the equation that has the betas in it, i.e. the unnumbered equation above equation (4).

      • Frank
        Posted Feb 10, 2015 at 7:31 PM | Permalink

        Pekka: One can dream up many possible regression models for fitting this data. If the models are purely statistical in nature, one has difficulty deciding which model to use and what type of noise it may contain. (For example, the IPCC has arbitrarily chosen to use linear AR1 models to fit the historical temperature record, and I’m sure you are aware of the controversy that choice has caused.) Our understanding advances much more rapidly when we use “physical models” in place of “statistical models”. However, you must handle the physics equations correctly, not be “motivated” by flawed mathematics.

        Furthermore, if your explanation is correct (and they were aware of these mistakes), the authors have inexcusably deceived the readers of this paper.

    • RomanM
      Posted Feb 10, 2015 at 5:52 PM | Permalink

      You are correct on all counts here.

      The approximation is indeed just the first two terms of the series expansion of 1/(1+x). In the paper, the authors state: “This equation holds for each start year separately and suggests the regression model…”, which somehow justifies the separation of a and k when, as you noticed, the a and k terms have the same multiplier in the expanded equation. This separation helps to remove the individual effects of a and k from ΔT, and they are then surprised that the residuals and the predicted values seem not to depend on the differences in the various models (which are related to the values of a and k).

      • Frank
        Posted Feb 10, 2015 at 7:40 PM | Permalink

        Roman: Thanks for confirming my work. It always seems more likely that I have made a mistake than that errors like these two got all the way into a published paper.

      • Posted Feb 11, 2015 at 4:54 AM | Permalink

        The coefficient is the same in the expansion of the simple formula, but α and κ have different roles in the models. Therefore it’s not known whether they affect ΔT very similarly or not.

        As I already wrote, all the discussion that precedes the first formula with betas as coefficients is only motivation, not derivation.

        • RomanM
          Posted Feb 11, 2015 at 7:56 AM | Permalink

          Pekka:

          α and κ may have different roles in the models, but the starting point physics equation for analyzing ΔT postulated by the authors is ΔT = ΔF/(α + κ) + ε.

          In that relationship α and κ impact the result only through their sum. A change of an amount δ in α has the same effect on ΔT as a change of an amount δ in κ. So the proper “expanded” equation for this situation is to keep the two variables together as a single variable ρ = α + κ. This is indeed still the case in the unnumbered equation next to Figure 2c in the paper.

          However, the authors then make an unjustified assumption that the two variables have different effects on ΔT by presenting the “suggested” version of the regression equation actually used. In the case that they are not separable, the introduction of an extra parameter provides room for wiggle-matching and possible distortion of the results. Do you not think it would have been more appropriate to do the initial regression using ρ and then look at whether the assumed relationship exists between α or κ individually and the residuals and predicted values of the regression?

          As it stands, I did not see any formal analysis in the paper that justifies the use of the regression in the form that the authors used.

      • Frank
        Posted Feb 12, 2015 at 12:20 PM | Permalink

        Roman: Suppose I want to use linear regression to analyze data that arises from a physical situation that produces a y = a/(1+x) relationship. I inappropriately use the approximation y = a − ax (ignoring −ax² and possibly higher-order terms that may be significant). Aren’t I going to end up with residuals that are much bigger than necessary? In M&F, the residuals are interpreted as unforced variability. If you apply ANY physically inappropriate regression equation to the CMIP5 data, you will artificially inflate the unforced variability present in the CMIP5 output.

        M&F have constructed a model that converts the histograms in Figure 1 into the histograms in Figure 2. Common sense tells me that they have made a mistake somewhere in the process.

        Furthermore, the regression equation should not have separate terms for α and κ, since they only appear as a sum. If this degree of freedom were removed, I suspect the spread of the histograms in Figure 2 would widen even more.

  61. R Graf
    Posted Feb 10, 2015 at 8:35 PM | Permalink

    From M&F response on CLB: “Because radiative forcing over the historical period cannot be directly diagnosed from the model simulations, it had to be reconstructed from the available top-of-atmosphere radiative imbalance in Forster et al. (2013) by applying a correction term that involves the change in surface temperature. This correction removed, rather than introduced, from the top-of-atmosphere imbalance the very contribution that would cause circularity.“

    One can read this paragraph many times and still, IMO, find no answer to the accusation of derivation circularity. M&F ignore it completely and instead reply with a recitation of Forster’s 2013 methodology for finding F (forcing), which was to derive a conjugate assumed temperature increase for every TOA imbalance, whose sum would be assumed to equal the forcing caused by the known increase in GHG. And thus their coy response is that if one failed to plug in the temperature correction, as Forster aptly did, one would chase one’s tail theoretically as the forcing was satisfied toward equilibrium (radiant balance).

    Technical talk here about their regressions being statistically troublesome (or well behaved) is, IMO, being blinded to the forest by the trees. The aim of M&F was to validate CMIP5 wholesale, to quash murmurs of it already failing the old-fashioned way. So they selected 36 models (not 114 as reported in Science Daily) out of CMIP5. Perhaps Nic knows if they were the same models, unchanged, from Forster’s 2013 study. They also ran a subset of 12 of these, filling them with AR5 data and running them from 1900 (as if nobody had ever thought to do this before). Then they were relieved to report that the largest divergence for any 15 years was 0.3 K. Yes, they could hit a barn.

    Their conclusion is the same as the assumption: that 15 years is completely filled with random chaos that covers the forcing signal. One must wait 62 years before forcing can truly resolve into view (with some uncertainty, from 5–95%). The unmentioned huge assumption, one that all on CA are all too familiar with, is: there is no centennial variability to worry about (thank you, Mann Hockey Stick).

    “The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded. “ — Marotzke and Forster

  62. R Graf
    Posted Feb 10, 2015 at 11:40 PM | Permalink

    Ross at top wrote: “…dN is a term capturing the Top of Atmosphere radiative imbalance, which in this context is just a source of noise, …”

    I believe dN is central to M&F’s work here. It’s likely inflated, taking up the slack to account for the lack of temperature rise which should be taking its place in the energy balance. But I find troublesome the proposition that the atmosphere lacks the ability to warm itself to a new equilibrium in one year, never mind 15 years. I know what you are thinking: ocean heat banking. But isn’t this accounted for now in the kappa term? So what am I missing?

    What will the next paper’s assumptions and conclusions be if the pause continues to 20 years? Is anyone making predictions anymore?

    • Layman Lurker
      Posted Feb 11, 2015 at 2:06 AM | Permalink

      I believe Ross’s “context” for dN is within the circular regression equation where dF is re-written as a*dT + dN. dN is then simply an added term on the rhs of the regression equation and not a predictor wrt dT – no different from the way the error (noise) term is configured in a simple regression equation.

  63. R Graf
    Posted Feb 11, 2015 at 8:29 AM | Permalink

    I don’t see any discussion about the model selection, which, since the models are themselves what is being analyzed here, is of course crucial to the scientific validity. Nic wrote there were 18 models; the paper’s abstract notes listed 36 and 12; Nature’s table shows 35 models; and Science Daily reported 114 models. The Nature link here shows the table of models, 17 with forcing data, each model having (randomly?) 1 to 10 realizations run, for a total of 114.
    http://www.nature.com/nature/journal/v517/n7536/fig_tab/nature14117_ST1.html

    As we now know from M&F that the forcing data came from Forster (2013), the question becomes how familiar he already was with the models and their behavior as he selected them for study, and which to repeat and aggregate subsets of. All these questions, I submit, are fraught with peril.

    If I understand Pekka’s current feeling on the circularity, it’s that if the forcings were averaged and made one value before reuse, that would somewhat dilute the identity problem of F. Does anyone know if the ensembles were all fed the same F value?

    • Posted Feb 11, 2015 at 11:52 AM | Permalink

      R Graf:

      How valid are multiple entries from the same model? Presumably if you imagine the extreme case where all data points are from separate runs of the same model, then all variation in dT is due to the internal variability. Maybe only one run from each model should be included.

      • R Graf
        Posted Feb 11, 2015 at 12:46 PM | Permalink

        Now that M&F have proved the models reliable and Mother Nature not, there is time to run all the models in 100 different ways. Right?

    • Posted Feb 12, 2015 at 7:26 AM | Permalink

      Forster et al 2013 analysed 23 models for which at least some of the data they needed was available, but for some of these, TOA radiation data was not available for the relevant CMIP5 simulations. This data is required to compute N and hence derive F. Also, data for the FGOALS-s2 Historical run was subsequently withdrawn as faulty.

      It is unclear to me why the inmcm4 model was excluded from M&F’s study, but otherwise the set of 18 models used looks logical to me. Note that although the NorESM1-M model is not ticked in Extended Data Table 1 of the study, I believe that is probably an error and that it was in fact included in the analysis.

  64. rwnj
    Posted Feb 11, 2015 at 8:41 AM | Permalink

    Every observation of real data has some error attached. The errors will have some statistical characteristics, possibly pernicious, possibly benign. If the observations are fed into an equation in order to infer an estimate of another quantity, the equation itself is an observation process for the inferred quantity. For example, if x = a + b, and a and b are to be inferred from an observation of x, then the errors from observing x will be distributed to a and b; but, constrained by the equation, the errors will be negatively correlated. This will be a property of the estimate, regardless of the assumed physical properties of a, b and x.
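
    A short Monte Carlo illustrates the negative correlation rwnj describes (the noise levels and the “independent prior estimate” of a are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 1.0, 2.0
x_obs = a_true + b_true + rng.normal(0.0, 0.1, 10000)   # noisy observations of x = a + b

a_hat = a_true + rng.normal(0.0, 0.1, 10000)   # independent noisy estimate of a
b_hat = x_obs - a_hat                          # b inferred through the equation

# The errors in a_hat and b_hat are strongly negatively correlated (~ -0.7 here)
print(np.corrcoef(a_hat - a_true, b_hat - b_true)[0, 1])
```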

  65. Steve McIntyre
    Posted Feb 11, 2015 at 10:01 AM | Permalink

    No response yet from Marotzke on my request for data as used and details on his methodology.

    • Don Monfort
      Posted Feb 11, 2015 at 12:04 PM | Permalink

      It’s a really tough decision. It might take him a while.

      • AJ
        Posted Feb 11, 2015 at 8:57 PM | Permalink

        But what if they find something wrong with it?

  66. RomanM
    Posted Feb 11, 2015 at 2:49 PM | Permalink

    (This comment is a re-post of a comment posted at the Climate Lab Book blog)

    Pekka has proposed that the regression can be done in a restated form of the original equation. This is incorrect. The problems with the regression model adopted in M and F are due to the endogeneity of the situation and in no way do they depend on (nor does this comment address) the correctness of the specification of the model.

    In order to understand the arguments on the effects of circularity on the regression used in M&F, it is necessary to look at the Least Squares methodology in a bit more detail.

    The authors start with a mathematically based statistical model:

    ΔT = a + b ΔF + c α + d κ + ε

    In the model, the variables ΔF, α and κ are assumed to be independent of ε which accounts for the random variation of ΔT in the statistical model. The ε’s are assumed to be independent of each other and to have means equal to 0. In this case, the authors have implicitly assumed that the ε’s are also homoscedastic, i.e. each having the same variance. There is a further very important assumption that the ε’s also be independent from all of the predictors.

    In LS, estimates of the coefficients and the variance of the ε’s are obtained by first forming a sum of squares of the residuals:

    SSE = ∑ε² = ∑[ΔT − (a + b ΔF + c α + d κ)]²

    and then minimizing SSE with respect to the parameters a, b, c and d. It should be noted that the parameter estimates are functions not only of the non-random variables, but of the ε’s as well so they are random variables within this structure. In this case, the minimization procedure is simple to carry out using easily calculated matrix algebra.

    Now what happens if ΔF is calculated from a previous relationship with two variables: ΔF = α ΔT’ + ΔN?

    We substitute this relationship into the original equation to get:

    ΔT = a + b(α ΔT’ + ΔN) + c α + d κ + ε

    If ΔT’ is not the same as ΔT, then nothing is changed. The variables on the right hand side are still unrelated to ε and the entire procedure gives identical results to the previous case. However, if ΔT’ and ΔT are identical, the situation becomes radically different.

    Now, ΔT has become a predictor of itself, and the ε’s are present not only at the end of the regression equation, but also (invisibly) through the ΔT which is also on the right hand side. The predictors have violated a very important assumption: that they must be independent of the ε’s. Hence, the usual simple regression procedure fails and all results from it are spurious. Estimates of the parameters, confidence intervals and p-values will be biased and therefore neither reliable nor scientifically meaningful. This violation occurs even if one uses ΔF in the regression procedure. Despite the fact that you can’t “see” ΔT in the equation, its effect is still present mathematically because it has been used in the calculation of ΔF.

    To produce a solution for this situation, the regression equation can be rewritten as Pekka suggests:

    (1 – b α) ΔT = a + b ΔN + c α + d κ + ε

    and the sum of squares becomes:

    ∑ε² = ∑[(1 − b α) ΔT − (a + b ΔN + c α + d κ)]²

    Minimizing this with respect to the coefficients in the equation is not as simple as in the above cases, but can be done with a little bit of programming or by using available optimization techniques. But the story does not end here. From the regression, we need to form the decomposition:

    ΔT = Predicted(ΔT) + Residuals(ΔT)

    In the ordinary regression case, the predicted value is calculated by replacing the corresponding values of the predictor variables into the equation for each model. The residuals are then calculated by simple subtraction or taken directly from the minimizing process for SSE. For the circular case, the entire equation must be divided by (1 – b α). Note that α (and therefore 1 – b α ) are in fact vectors whose elements have different values depending on which climate model the particular observation is from. This has some important consequences, not the least of which is the introduction of bias into the entire estimation process.

    First, b (and therefore 1 – b α) is itself a function of the ε’s in the model. The distribution of the ratio of two random quantities is very complicated and can be unstable, particularly if the divisor is close to zero.

    A second consequence is that the effective coefficients for the predictor variables will be different for each climate model in the regression. As noted above the divisor is a vector so the actual value will be different for every observation.

    Finally, the residuals are no longer the ε’s themselves. Due to the division process, they have become ε/(1 – b α). Their independence has been destroyed due to the common presence of the estimate of b and they are now heteroscedastic with a variability depending on the sign of b as well as the magnitude of α.

    The bottom line is that the regression done in the M and F publication is inappropriate, and their subsequent results are scientifically unreliable and difficult, if possible at all, to correct.
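
    One way to see the force of this argument is with synthetic data of known properties (stevefitzpatrick makes the same suggestion below). The sketch that follows is not M&F’s actual procedure – every distribution and noise level in it is invented – but it shows how regressing ΔT on a forcing reconstructed from ΔT itself can absorb the injected “internal variability” into the forced term, shrinking the residual that is supposed to measure it:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 500                                   # synthetic "model runs"
alpha = rng.uniform(0.6, 1.8, m)          # feedback parameter
kappa = rng.uniform(0.45, 1.52, m)        # ocean heat uptake efficiency
F_true = rng.normal(1.0, 0.2, m)          # exogenous true forcing trend
eps = rng.normal(0.0, 0.15, m)            # injected internal variability

dT = F_true / (alpha + kappa) + eps                # simulated temperature trend
dN = kappa * dT + rng.normal(0.0, 0.05, m)         # diagnosed TOA imbalance trend
F_hat = alpha * dT + dN                            # reconstructed, Forster-style forcing

def residual_sd(F):
    """Multilinear regression of dT on (F, alpha, kappa); return residual SD."""
    X = np.column_stack([np.ones(m), F, alpha, kappa])
    beta, *_ = np.linalg.lstsq(X, dT, rcond=None)
    return (dT - X @ beta).std()

print("residual SD, true exogenous F:", residual_sd(F_true))   # ~ sd(eps) = 0.15
print("residual SD, reconstructed F :", residual_sd(F_hat))    # substantially smaller
```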

    • Posted Feb 11, 2015 at 4:52 PM | Permalink

      RomanM,

      In the part of the analysis where the regression coefficients are determined, ΔT is external data that’s used to determine the regression coefficients. As external data, not affected by the calculation, it may occur any number of times in any number of places without causing problems in the calculation; it’s just a fixed set of numerical values, as are the values of ΔN and the model-specific parameters α and κ. This is data picked from the database. This data is used to determine the regression coefficients, that’s all, and that’s so simple.

      • John Bills
        Posted Feb 11, 2015 at 8:27 PM | Permalink

        Pekka,

        How did they derive the data in the database?

      • dljvjbsl
        Posted Feb 11, 2015 at 8:45 PM | Permalink

        As I understand it the value for F is not external data but the estimate of the value of external data. It is derived from internal values. There is an external but unknown parameter that is being estimated and the estimate is contained within the variable F.

        • dljvjbsl
          Posted Feb 11, 2015 at 8:48 PM | Permalink

          Since it is an estimate derived from a calculation, the value contained in variable F is subject to the errors associated with the internal values T and N, and is thus dependent on them. It is not an independent external value.

      • Posted Feb 11, 2015 at 10:48 PM | Permalink

        I’m confused. The issue is not whether deltaT is exogenous but whether deltaF is really deltaFhat(deltaT), a stochastic estimator correlated with the error term that is pretending to be a deterministic variable uncorrelated with the error term. “That’s so simple.” Apparently neither view of this is simple to those seeing it the other way.

        • Sven
          Posted Feb 12, 2015 at 3:49 AM | Permalink

          stevepostrel: ” “That’s so simple.” Apparently neither view of this is simple to those seeing it the other way. ”

          Apparently Steve Mc’s statement to aTTP “you’ve grabbed the stick by the wrong end” that was snipped by Ed Hawkins at his site as offensive or inappropriate was actually not meant to be offensive but quite appropriate to describe the reason for misunderstanding each other.

    • stevefitzpatrick
      Posted Feb 11, 2015 at 7:08 PM | Permalink

      Roman, Pekka,

      You are speaking two different languages. I suspect that only analysis of synthetic data with known characteristics using the paper’s methods will resolve the issue. My guess is that Roman is correct here, and that the circularity makes the paper worthless. But in any case, it is a question which should be possible to resolve with little doubt.

      • ianl8888
        Posted Feb 12, 2015 at 3:12 AM | Permalink

        You are speaking two different languages

        We have seen this situation several times before (Beenstock’s papers, for example)

        Essentially, the physicists say that the statisticians lack physical knowledge and acumen, while the statisticians respond that the physicists abuse known statistical procedures

        They do indeed talk past each other. I expect this to never be resolved

    • R Graf
      Posted Feb 11, 2015 at 7:54 PM | Permalink

      As to the debate over whether ΔF is an external independent input variable: in most cases it would be, because it is an independently measured or estimated input used to get an output of ΔT. Models output ΔT, which is what M&F evaluated. But in Forster (2013) a new machine outputting ΔF was created using the climate model’s parts. This also would be fine, unless you are using the same ΔF as feedstock to the same model you got it from. If you dilute the output by averaging ΔF from multiple outputs, or use one output on multiple models as input, you should expect fewer and fewer inherited traits, but they are still there.

  67. R Graf
    Posted Feb 11, 2015 at 2:54 PM | Permalink

    ΔT = ΔF / (α + κ) + ε, where we now realize ΔF is dependent on ΔT twice, once by the equation and once by internal construct (as I understand it). Does anyone think the equation is acceptable for analysis as it stands? Seeing alpha and kappa in the denominator gives no information separating feedbacks that affect T only temporarily – like the TOA imbalance (N) and ocean heat banking (κ), each with its own timescale – from convection efficiency, which is independent of time but dependent on temperature. The equation needs more components to be meaningful.

  68. Szilard
    Posted Feb 11, 2015 at 9:08 PM | Permalink

    Really interesting discussion – tks.

    For me, it’d be even better without crud like personal jabs, blog-vs-blog sniping, discussions of who ATTP’s Clark Kent is, moderation whining, general low-rent rhetoric etc etc unless formulated so as to deliver a high humor/tedium ratio, which seldom occurs.

    CA obviously has a better signal/noise ratio than the propaganda sites, Climate Etc and so on, but I think it would be a lot better with a bit more moderation rigor:

    – Nothing OT.
    – No tedious ideological rhetoric.
    – Nothing about individuals’ identities, characters, motivations, politics, dress sense.
    – No room for blog-war spill-over from elsewhere.
    – Unless funny.

    • davideisenstadt
      Posted Feb 11, 2015 at 9:36 PM | Permalink

      any thoughts on the issue you would like to contribute?

    • clipe
      Posted Feb 11, 2015 at 10:18 PM | Permalink

      Rigour is more precise than rigor if you want more [unless]funny.

    • William Larson
      Posted Feb 12, 2015 at 11:09 AM | Permalink

      Szilard–
      I got myself schooled mostly as a scientist (chemistry) but am totally out of my depth here in nearly all these posts, such that often when I contribute it is with “humor” only. And I tend not to get snipped for it, but it has no real value otherwise; it advances no discussions. So perhaps the “tedious ideological rhetoric” and “blog-war spill-over” is more valuable over-all. Perhaps. Myself, I say, “It’s the internet!”, which roughly translated means, “Here on the internet things are supposed to be free-wheeling and wild.” And on top of it all, I learn a great deal from S. McIntyre in how he “runs” this blog – I learn things about how to run my own life (!). For example, he allows food fights up to a point; he’s most generous with critical comments; he doesn’t try to be a “blog tyrant”; he tries to get at the truth and stick with it; he is able to make his points often in colorful ways. I say that all this is good. If you come here a lot I think that you will have a similar evaluation. And also, thanks for your honesty in speaking your mind as you did.

  69. Posted Feb 12, 2015 at 4:35 AM | Permalink

    Several people have raised questions about my comments. I won’t answer them separately, but will try to explain once more how I see the situation.

    The CMIP5 database contains data on model runs from many models, and several runs from most of them. The model runs have resulted in a “spaghetti” of temperature histories. As the models and model runs differ in many different ways, it’s difficult to figure out what the temperature histories tell us. Marotzke and Forster present an attempt to extract information from that spaghetti. Based on various arguments they end up with the hypothesis that the variability of the temperature might be related to three other characteristics of the model runs by the formula

    ΔT = a + b ΔF + c α + d κ + e, (1)

    in the way that the residual e would be mainly internal variability that cannot be explained causally. Δ refers here to change in the variable expressed as linear trend over a period of 15 or 62 years.

    None of the variables ΔT, ΔF, α, and κ is directly input to the models; all are determined from the model results. As far as I understand, only ΔT can be found directly from the results of the model runs; all the others are determined by analyzing the model outputs in separate studies. The values of F have been obtained by Forster (2013) for every year and every model run using the formula

    ΔF = ΔN + α ΔT. (2)

    As all variables are based on model results, the meaning of each of them is defined strictly only by the procedure that’s used to determine its values from the model output. Thus use of the regression formula involves two assumptions:

    1) we understand what the variables mean
    2) the formula (1) is a good enough description of the behavior of the actual models.

    Marotzke and Forster state the first point at least implicitly, and the second point explicitly. Thus they agree that the assumptions are only assumptions, part of the hypothesis they have made.

    After these preliminary considerations we have two main steps in the analysis that they report:

    1) Determination of the regression coefficients a, b, c, and d separately for every period considered (98 periods of 15 years, and 51 periods of 62 years).

    2) Use of the regression models to draw the graphical presentation of the results and to draw other conclusions.

    The first step does not contain any circularity at all. It’s a set of straightforward calculations to determine the parameters a, b, c, and d. All the numbers picked from the model runs are well defined. The fact that ΔT appears both on the left hand side of (1) and affects the right hand side of (1) through (2) does not change that observation. (The value on the left hand side is not determined by formula (1) and fed to (2), but totally fixed by the CMIP5 data.) All the variable values in (1) are totally fixed by the CMIP5 data.

    Now we have the regression models

    ΔT = a + b ΔF + c α + d κ (3)

    for every period with coefficients determined in the first step.

    Next we face the question of what formula (3) really means, and how it can be used. We have a well-estimated regression model for variables whose meaning is not as well understood. α and κ are perhaps not really what their names imply. The dependence of equilibrium temperature on forcing is not necessarily as simple as the defining formula assumes, perhaps making α vary over time. κ may also be variable in the models and depend on the initial state. Similar problems apply to F. Forcing is a consequence of changes in external variables like CO2 concentration, volcanic activity, and aerosols, but the operational F defined by (2) is not controlled in a well-understood way by these external factors.

    In spite of all the issues of the above paragraph, we can find out the ranges that the variables have, according to their operational definitions, in the ensemble of model runs, and we can calculate the residuals when we apply (3) to each model run to predict ΔT. M&F report the predicted values of ΔT and the residuals in their Figures 2 and 3. They also report the contributions of the three variable terms of (3), either in the paper or in the extended data.

    So far so good. But what does this mean? Do the results support their conclusion:

    The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.

    Or should we believe the surprising result of the paper, that α and κ have little influence on the temperature trends, as solid and valid for a non-biased set of models with α and κ that mean what they are usually defined to mean?

    These conclusions are not as obviously correct as the technical correctness of their basic approach. The authors do also mention caveats that might undermine these conclusions. There’s space for further study. Access to their data would help in some of that further study, but probably it would be better to go to the original source of the data (the CMIP5 database) and use its content in some other way.

    It’s also possible that the CMIP5 database is too restricted as a source, and that further and quite different model runs are needed to learn substantially more even about the present models.

    • Posted Feb 12, 2015 at 4:49 AM | Permalink

      Pekka, non-independent analysis is common in research of all stripes. That the conclusions drawn from such work may be correct or incorrect is immaterial. That their methods and analysis do not lead to the conclusions is the key. Healthy science cuts out such wrong methods quickly, before others ‘build’ on them and advance more theories, professors commit their students to blind alleys of inquiry, and funding is poured into chasing ghosts. Non-independence is sometimes not easily detected, and the clues are papered over with egos and reputation. At least that is not the case here.

      • Posted Feb 12, 2015 at 5:09 AM | Permalink

        There’s no reason to worry about the spread of some bad methodology in this case.

        The only real method used is regression. That remains as good and as prone to misuse as before.

        The task is finding a more transparent way of telling what an ensemble of model runs contains about one limited question.

        The rest is so case-specific that there’s nothing to spread.

        • Posted Feb 12, 2015 at 6:16 AM | Permalink

          I should have been more clear. Wrong methodology should be cut out before conclusions from such methods become accepted as part of the scientific discourse – in this case, namely, that models contain meaningful natural variability, among others. If not, there are two possibilities – either this paper becomes the last word on this topic, or others build on the assumption without questioning it – and both are bad outcomes. If the methods are not correct, the conclusions are not useful.

    • Hoi Polloi
      Posted Feb 12, 2015 at 5:55 AM | Permalink

      So far so good. But what does this mean? Do the results support their conclusion.

      Well, does it, Pekka? Do you agree that this paper proves that the climate models do not overestimate, that everything is fine and dandy, and that the hiatus not being in the models is not a problem? At least that’s what M&F claim, parroted in the MSM.

      I find it puzzling that many here and on other blogs have difficulty seeing what the authors want to prove anyway.

      Another thing: Nic Lewis, who wrote this scathing rebuttal, seems to be more busy with other things (see his Climate Lab comment) than with reacting to Pekka’s points. I find this rather disappointing.

      • Sven
        Posted Feb 12, 2015 at 6:17 AM | Permalink

        “I find this rather disappointing”
        +1

    • Posted Feb 12, 2015 at 6:56 AM | Permalink

      Pekka

      Your argument seems to completely ignore the existence of variations in ΔT that are not explained by the regression (residual “errors”). These appear on both sides of the regression equation, contrary to the assumptions of ordinary least squares regression. That is what leads Roman M to say:

      “Hence, the usual simple regression procedure fails and all results from it are spurious. Estimates of the parameters, confidence interval and p-values will be biased and therefore neither reliable nor scientifically meaningful.”

      The fact that the values for ΔT are simply numbers in a database is not relevant.

      You also write:

      “Access to their data would help in some of that further study, but probably it would be better to go to the original source of the data (the CMIP5 database) and use its content in some other way.”

      I don’t know if you have ever tried obtaining and processing CMIP5 data, but it is a complex and time-consuming business. The file structures, model grids, etc. vary from model to model. There is a great deal of processing needed just to get the raw data into useful form, and in my experience it is not easy to automate the processing. There are also errors in CMIP5 data, and it gets updated from time to time.

      There is also quite a lot of post-processing involved. For example, it may be necessary to identify corresponding segments of the preindustrial control runs and to deduct the offset and drift occurring in them from the data being used (here a splice of Historical and RCP4.5 experiment data). M&F don’t mention doing so, but this was done in Forster et al 2013.

      Masking for HadCRUT4 observational availability requires further processing and the defining of rules as to what counts as enough data in each time period. M&F provide few details of their methods, and replication of their work would be difficult if not impossible without provision of a detailed, algorithmic statement of their processing steps (in the form of their computer code or otherwise), along with all non-publicly-available data used.

      • Posted Feb 12, 2015 at 7:55 AM | Permalink

        Nic,

        I do not ignore anything that is there. I only observe that it has no effect of the kind you seem to think. It’s a red herring that having ΔT in that way on both sides is a problem. It’s not; the claim that it is, is an unfounded assertion based on misunderstanding the situation.

        That the values of ΔT are fixed by the CMIP5 data is not only relevant but essential, because that is what prevents any problems of that kind from entering the analysis at any point in the determination of the regression formula.

        Similar formulas do cause problems in some other settings, where they are used in a different way. Therefore it took me as well some time to realize that it’s not a problem at all in this case.

        • Tom Gray
          Posted Feb 12, 2015 at 8:23 AM | Permalink

          Wouldn’t it be relatively simple to set up an experiment with synthetic data to investigate the conjecture that the variations in ΔT either make or do not make a difference? The effect could be quantified by simulation and its significance for the result determined. I’ve followed the discussion, as much as I could, here and at the other site. It just seems to be a trading of assertions. The requirements for OLS are not met on one side. And on the other side, (a) the requirements for OLS are met, or (b) the requirements for OLS are not met but any error is insignificant. An investigation with simulated data could resolve this dispute, for instance along the lines sketched below.
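
          Something like the following might do it (a sketch only, in Python; every number is invented, and the outcome depends entirely on what one assumes about how internal variability enters ΔN – which, as far as I can tell, is precisely the point in dispute):

            import numpy as np

            rng = np.random.default_rng(1)
            n_models, n_trials, b_true = 18, 2000, 0.5   # b_true: assumed true forcing coefficient
            alpha = rng.uniform(0.8, 1.6, n_models)      # per-model feedback parameters
            bias = []
            for _ in range(n_trials):
                dF_true = rng.normal(0.3, 0.1, n_models)   # true forcing trends
                e = rng.normal(0.0, 0.1, n_models)         # internal variability in the trend
                dTs = b_true * dF_true + e                 # simulated trend = forced + internal
                dN = dF_true - alpha * b_true * dF_true    # TOA trend tracking only the forced part
                dFs = dN + alpha * dTs                     # diagnosed forcing = dF_true + alpha*e
                X = np.column_stack([np.ones(n_models), dFs])
                coef, *_ = np.linalg.lstsq(X, dTs, rcond=None)
                bias.append(coef[1] - b_true)
            print(np.mean(bias))   # systematically nonzero under this assumption; ~0 if dN tracks e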

        • Posted Feb 12, 2015 at 9:12 AM | Permalink

          Tom,

          What synthetic data?

          The procedure of M&F is basically stable and surely works without problems on any synthetic data (I assume no simple technical errors, and I have no reason to think that there are any).

          If we introduce a model with some suitably chosen properties and generate from that model both the synthetic data and the results that the analysis should reach, we may find a contradiction, but there’s no way of proving that the expectations we have generated are really correct. Thus the contradiction proves nothing. The error can equally well be in our model as in their method.

          The only models that can be trusted to produce relevant synthetic data are the models used to generate the original data in the CMIP5 database, because the analysis is supposed to tell about those models. Further model runs by the same models using significantly different forcings might show that the final conclusions of M&F are erroneous – and I would not be surprised if that were the case. That would not mean that they have made technical errors in the analysis, but it would mean that their assumptions are not correct enough or that the CMIP5 runs used are not representative of the models running under different conditions.

        • William Larson
          Posted Feb 12, 2015 at 11:48 AM | Permalink

          “It’s a red herring that having ΔT in that way on both sides is a problem.” Hmm. Don’t know much about climate science or statistics, but I do try to be a student of the Scientific Method. Karl Popper, whom I love to channel from time to time, has written that if an oracle in ancient Greece had said, “The structure of DNA is a double helix,” that would NOT be a SCIENTIFIC truth. A truth is only a scientific truth if arrived at by scientific methodology, so the METHOD is extremely important at all times. So here we are, staring at delta T on both sides of the equation–and how can we look at that and say that this method is still scientifically valid? Maybe the RESULT is true, is valid–and it certainly has all those scientific-looking trappings–, but one might say that any such result cannot be SCIENTIFIC truth, more like oracular truth.

        • Posted Feb 12, 2015 at 7:27 PM | Permalink

          I don’t think Pekka grasps the idea that there are supposed to be true parameters a, b, c, etc. inherent in the model and that these are what we care about. Later he says “The goal is to describe the output of the models summarizing certain potential relationships between the variables considered. Thus looking at the output is what must be done – and is done.” This statement seems clearly wrong–the goal of the regression is not to summarize the observable data relationships in an exploratory fashion but rather to infer the hidden true values of a, b, and c in the model.

          For that purpose the regression estimators are random variables that either do or do not converge to the true values with more data accumulation (i.e. they either are or are not consistent estimators of the parameters in the sense of plimming to the truth). Roman and Nic have shown that they do not converge. Yes, you get “numbers” from performing the regression steps but those numbers have no necessary relationship to the true values a, b, and c that we care about. The reason for the regression failure is correlation of the RHS variables with the error term, a garden-variety endogeneity problem often encountered in trying to run regressions.

        • Allchemistry
          Posted Feb 14, 2015 at 7:29 AM | Permalink

          For ordinary mortals working in the biomedical-research field, getting a paper published in high-profile journals such as Nature and Science is a very challenging enterprise. The vast majority of submitted manuscripts won’t even pass the first selection by the editorial board and are declined upfront. The ones that get through face a very stern review process, and more often than not the reviewers will ask for a multitude of additional experiments to further strengthen the conclusions. Mind you, we are talking here about laborious experimental work, which, believe it or not, is even more painstaking than reading thermometers, measuring tree-ring widths or copy-pasting data into computers. You may understand that I was a bit surprised to read that for climate research it is sufficient to present indicative, exploratory results in a Nature paper. Some animals are indeed more equal than others.

      • Posted Feb 12, 2015 at 8:44 AM | Permalink

        I should perhaps add that what I have written is true for the regression model chosen by the authors. An essential detail in that is the way the residual is introduced. When we keep the basic idea that the regression coefficients are determined by minimizing the sum of squares of the residuals, an alternative definition is to link the residual directly to the value of ΔT, not as it’s done in the paper. In this case we have, using my notation,

        ΔT – e = a + b(ΔN + α(ΔT – e)) + c α + d κ

        and

        e = ΔT – (a + b ΔN + c α + d κ)/(1 – b α)

        For some model runs and some values of b (values close to the inverse of some model’s α), this alternative regression model gives very large values for all the terms other than ΔT. In this alternative regression model the residual is not a linear function of the coefficients, and the minimization is therefore more complex. Because the sum of the squares of the residuals is minimized, a correct minimization procedure avoids such a situation, i.e. it forces the value of b away from the inverse alphas of every model. Even this is probably not very serious, because the coefficients need not change very much to get far enough from the singularities.

        The basic observation is that this alternative is not the regression model of M&F.

        • R Graf
          Posted Feb 12, 2015 at 10:18 AM | Permalink

          The conclusions M&F have produced come from their analysis of complex machines that have to be considered black boxes. The quality of a box’s output is impossible to determine by immediate inspection. Thus M&F are testing the boxes’ output. But one of their tools, they admit, is itself partly output of the black boxes. Without getting into statistics, can you explain why the quality cannot be trusted? And where does the burden of proof rest for quality?

        • R Graf
          Posted Feb 12, 2015 at 10:24 AM | Permalink

          I meant: how can the quality be trusted? The AR5 data is not a source of new input. The only new input, F, was the part derived from the black boxes.

        • Posted Feb 12, 2015 at 10:38 AM | Permalink

          The goal is to describe the output of the models summarizing certain potential relationships between the variables considered. Thus looking at the output is what must be done – and is done. A simple linear model is used as the tool, whose coefficients are determined by regression. Qualitative physical arguments are used to argue that the approach makes sense. The authors list also several caveats.

          After getting over the confusion created by the post of Nic, I think that the paper describes well enough what they have done and what their analysis has produced. There’s obviously some new information in their results, but personally I’m not convinced that the results allow for strong conclusions. The acknowledged caveats alone allow for doubt, and there may be additional issues that have a major effect. What Nic proposed is not among them in my opinion.

        • R Graf
          Posted Feb 12, 2015 at 12:00 PM | Permalink

          Pekka, you have been a great referee and generous with your time. I know this question is a tough one, but do you believe the author response shows they appreciated the statistical orthodoxy issue here and came to your conclusion, or was it just a near miss: no harm, no foul?

        • Bob K.
          Posted Feb 12, 2015 at 12:48 PM | Permalink

          Pekka,

          Perhaps the dodgy statistical modeling used by the authors indeed was innocuous. You said that it took some time of checking before you came to that conclusion. If true it’s not surprising, because inferior statistical methods sometimes do lead to valid conclusions, but that doesn’t justify an endorsement of their usage. The problem here is that other researchers may now use similar methods and cite this paper as support. The next time around the effect may not be innocuous.

        • Posted Feb 12, 2015 at 1:59 PM | Permalink

          R Graf,
          I think that they are fully aware that their linear regression model is at best a crude representation of the actual model runs. It’s a scientific paper and it can be assumed that readers of such a paper understand that a linear regression model cannot be accurate and reliable, even if that’s not emphasized in the paper.

          The uncertainties in estimating forcing were discussed in the earlier Forster (2013) paper that is used as a source for this paper.

          There are no objective measures to tell how much trust should be given to their results. As authors of the study they may choose an optimistic view on that and discuss their results based on it. I’m presently less confident, but that’s just a personal judgment.

          The results should perhaps be taken as indicative and exploratory rather than fully quantitative. For this reason the paper does not contain any error estimates (at least I haven’t observed any) for the results; all ranges describe the spread of different models, not the accuracy of any conclusion.

        • Paul_K
          Posted Feb 13, 2015 at 12:30 AM | Permalink

          Pekka,

          I think that you have to drive through two STOP signs in order to get to your starting point in this analysis. The first barrier is that the “emulation model error” (i.e. the error in the emulation model’s prediction of the forced temperature change) is (a) substantial and (b) has a bias which is a function of the interval length of the period(s) selected for regression (“temporal bias”). This emulation model error does not go away even if the circularity could be intelligently circumvented. And it ends up being interpreted by M&F as natural variation.
          The second problem is that M&F set up a logical contradiction by including F, alpha and K as free variables in their regression form. It contravenes the physical model which M&F establish as a basis and justification. This is not the same as the circularity error.
          For a specific GCM, the value of alpha is taken from a step-doubling or step-quadrupling of CO2. The forcing for a doubling of CO2 is defined simultaneously such that AF2x = alpha * ECS. The assumption that alpha is invariant with time and temperature immediately forces linearity onto a curve (in the vast majority of models), and results in the GCM information from the first decade or three being discarded. This introduces the first component of temporal bias in the emulation model via the calculation of the forcing term.
          For the same GCM, the value of rho (= alpha + k) is taken from a 1% per annum increasing CO2 run. It is the gradient of a plot of Forcing against Temperature. However, the forcing which is used as ordinate on this plot is determined by the value of AF2x as calculated above (simultaneously with alpha). Hence the gradient is always equal to AF2x/TCR.

          So we have rho = AF2x/TCR = alpha*ECS/TCR.
          The predictive component of the M&F emulation model is a simple linear scaling of forcing:
          DelT(predicted forced) = DelF*TCR/(alpha*ECS) + model error (1)

          This is derived as a degenerate solution of a two-body feedback model. The first assumption is that the feedback is linear with temperature (which is not valid for the GCMs, hence the enforced linearization when alpha is selected). The second assumption is of an infinite-acting ocean, which leads to C dT/dt = F(t) – rho*T. The third assumption is of a constant, linearly increasing forcing. The analytic solution for this case asymptotes to a linear relationship between forcing and temperature, with gradient rho. The fourth assumption is that the surface mixed-layer heat capacity is negligible (C -> 0), which leads to (1) and which, by eliminating the early asymptotic behavior, introduces a second component of temporal bias in the emulation model error.

          Moving on, DelF is taken from the historic run data over the period of interest in the form
          DelF = DelNactual + alpha*DelTactual
          = DelNactual + alpha*(DelTf + DelTnv), (2)
          where Tf and Tnv represent the partitioning of the observed GCM temperature change into its forced component and “natural variation in the GCM”.
          Hence, substituting (2) into (1), we obtain:
          DelT(predicted forced) = DelNactual*TCR/(alpha*ECS) + DelTf*TCR/ECS + DelTnv*TCR/ECS + model error

          Even if you wave away the problem of circularity in this expansion, notice that the parameter K does not appear anywhere. Notice also from (1) that if a free regression coefficient is allowed against DelF, then the regression is apparently insensitive to any variation in alpha as a free variable. By breaking out alpha and K as free variables in the regression, therefore, one concludes that the predicted forced temperature response is not sensitive to either alpha or K! Which seems to be what the authors found.
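
          The expansion above is easy to check symbolically. A minimal sympy sketch (illustrative only, using the thread’s convention DelF = DelN + alpha*DelT):

            import sympy as sp

            dN, alpha, dTf, dTnv, TCR, ECS, kappa = sp.symbols('dN alpha dTf dTnv TCR ECS kappa')
            dF = dN + alpha * (dTf + dTnv)        # eqn (2): diagnosed forcing
            dT_pred = dF * TCR / (alpha * ECS)    # eqn (1): simple linear scaling of forcing
            print(sp.expand(dT_pred))
            # -> TCR*dN/(ECS*alpha) + TCR*dTf/ECS + TCR*dTnv/ECS
            # kappa never enters the expansion, as claimed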

        • fizzymagic
          Posted Feb 13, 2015 at 4:20 AM | Permalink

          Pekka writes:

          I think that they are fully aware that their linear regression model is at best a crude representation of the actual model runs.

          So why use it?

          It’s a scientific paper and it can be assumed that readers of such a paper understand that a linear regression model cannot be accurate and reliable, even if that’s not emphasized in the paper.

          There are no objective measures to tell how much trust should be given to their results.

          the paper does not contain any error estimates (at least I haven’t observed any) for the results; all ranges describe the spread of different models, not the accuracy of any conclusion.

          I’m sorry, but publishing a paper based on an inaccurate and unreliable model with no attempt at uncertainty quantification doesn’t seem all that scientific to me. Typically when I submit a paper, I’ve done months of work to be sure that all possible errors are addressed and quantified. To not do so evinces a complete disregard for the process of science.

          The fact that there are people in the climate science community defending this paper says a great deal about the standards of the community. None of it is positive.

        • Posted Feb 13, 2015 at 4:59 AM | Permalink

          So why use it?

          Because the CMIP5 archive contains information about the models, that information is difficult to interpret, and evidently nobody has presented any better method of extracting information bearing on the question M&F study.

        • Posted Feb 13, 2015 at 5:35 AM | Permalink

          Science is a process that has resulted in more and more understanding that describes the real world better and better.

          That’s characteristic of the full scientific process.

          Scientists typically study small details at the edge of knowledge. The issues they study are mostly difficult. When they think that they have made progress, they publish. That brings their results to wider attention and allows other scientists to look at them. It’s very common that their results turn out to be either simply wrong or, more often, partly right and partly wrong or misleading.

          Science would develop very slowly if scientists did not publish their findings in spite of their potential errors.

          Thus it’s not excluded that the results of M&F are erroneous and misleading. My point in this thread has not been to dispute that possibility – I have expressed my own doubts on the reliability of the results pretty directly. What I have emphasized is that they have not made such an obvious and stupid error that the error alone would make the analysis worthless. The circularity found by Nic does not affect the calculations they have made, but the inaccuracy of the relationship that leads to circularity in Nic’s argument does affect the accuracy of the analysis. It also leads to some questions on the interpretation of their results.

          As a general rule every scientific paper should be read with a skeptical mind; only multiply confirmed results represent well-established scientific knowledge (and even that may turn out to be wrong, although that’s not so common).

          In all fields of science new unconfirmed results are publicized in a way that I don’t like, and climate science is no exception. This paper is an example of that. I do not believe that its conclusions are solid enough to justify the way they are presented in some media.

          I have even the cynical thought that some of the conclusions of the paper were written as they were just to get the paper published in Nature. To me, Nature and Science are not the most reliable sources of scientific information. They tend to accept papers that present strong conclusions even when those conclusions are not fully supported by the actual science reported; the most interesting conclusions may even be purely speculative. More narrowly focused top journals publish better and more accurately reported science.

        • Posted Feb 13, 2015 at 5:43 AM | Permalink

          Pekka Pirilä: “To me Nature and Science are not the most reliable sources of scientific information.”

          …or as Ross McKitrick put it, “just because it was published in Nature doesn’t automatically mean it’s wrong.”

        • Hoi Polloi
          Posted Feb 13, 2015 at 9:17 AM | Permalink

          Pekka, in your last comment about science you sound more like a politician than a scientist; that’s why you didn’t answer my direct question whether you agree with the conclusion of the paper: “The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded”.

          Maybe you state somewhere that you don’t agree, but that has vanished in your “Wortsalat”.

        • Posted Feb 13, 2015 at 9:20 AM | Permalink

          When I think that any simple answer is misleading, I don’t give any.

        • fizzymagic
          Posted Feb 13, 2015 at 3:51 PM | Permalink

          Pekka’s perspective is very interesting. It doesn’t match my experience very well, but my publications have mostly been in Phys Rev Letters and related journals, not in Nature or Science.

          The thought of publishing something with strong conclusions and no detailed, quantitative uncertainty analysis seems alien to me. I’m trying to decide if it reflects a lack of professionalism on the part of the authors or if it is related to something about the field or the content.

          Climate science, compared to physics, is extremely data-poor. That is, you (generally) cannot design your data collection in advance and you can’t collect new data; you are stuck with what you have. Thus, every bit of data has to be squeezed to within an inch of its metaphorical life in order to do anything “new.”

          Maybe that accounts for several troubling features of climate science: the tendency to treat model outputs as if they were data, the tendency to use questionable statistical methodologies to reach conclusions much stronger than justified, and the tendency to overemphasize “peer reviewed” as a synonym for “correct.”

          The current paper under discussion would never be accepted for publication if it were in a field where replication with new data is possible, because it makes no testable predictions.

        • Posted Feb 13, 2015 at 4:23 PM | Permalink

          fizzymagic,

          I’m not proposing exactly what you read from my comment.

          After a decision has been made on the preferred journal for a planned paper, that very often affects the way the paper is written, as scientists know from their own experience or other sources what kind of papers the journal is most likely to publish. In the field of physics Physical Review may be more neutral in that respect, but the chances of getting a paper accepted in Physical Review Letters are improved by emphasizing points like topicality, even with little real argument to support those claims.

          Nature and Science have a very diverse readership, and many scientists have expressed the view that it’s of particular value for them that a paper there is likely to be noticed also by the wider readership. That’s a different criterion from what’s most important in more narrowly focused top journals. Looking at the papers published has indeed shown (IMO) that their conclusions more often contain rather speculative additions on issues that would be highly interesting if true than do conclusions published in other journals.

          I have seen similar observations presented as criticism of Nature and Science on several occasions. There have also been counterarguments to that, but at least I’m far from alone in my views.

        • stevefitzpatrick
          Posted Feb 13, 2015 at 4:33 PM | Permalink

          Pekka,

          “They tend to accept papers that present strong conclusions even when those conclusions are not fully supported by the actual science reported; the most interesting conclusions may even be purely speculative. More narrowly focused top journals publish better and more accurately reported science.”

          About this at least we can completely agree; I canceled my subscription to Science 15 years ago when I found the publication had become more about surprising (even shocking!) and ‘glamorous’ results… and much less about solid science.

          This episode reminds me of the Steig et al paper on Antarctic warming that appeared on the cover of Nature. Steig et al claimed little warming over the Antarctic Peninsula, in direct conflict with extensive thermometer data for the Peninsula, but lots of warming elsewhere, including Eastern Antarctica, again in conflict with thermometer data. Nobody at Nature cared, or even seemed to notice, these glaring discrepancies. But many other people did.

          In the case of M&F, I think Nature has published a paper with similarly doubtful methodology, but upon which the authors have based similarly strong conclusions: “There is absolutely no reason to doubt the accuracy of CMIP5 warming projections.” (What!? Are they actually serious?) As with Steig et al, I very much doubt M&F’s methods and conclusions will stand up to scrutiny over time.

        • fizzymagic
          Posted Feb 13, 2015 at 4:56 PM | Permalink

          Pekka,

          I think you are correct in your assessment of Nature and Science. I have never been a reviewer for either, but I have been for Physical Review and Phys. Rev. Letters. And I have indeed recommended publication of papers that I believed were erroneous because of their topicality and provocative conclusions.

          But those papers for which I recommended publication had several things in common: the errors were either a result of bad data or speculation about a new theory that had yet to be tested experimentally. I would never recommend publication of an article that included incorrect data analysis or mathematical errors. I would also never allow publication of an experimental paper that did not properly characterize the experimental errors.

          I did author and publish a paper (in Phys. Rev. Lett., as it happens) that contained a result I did not believe was completely correct; however, in that paper, we very carefully explained all the possible errors we had considered and all the corrections we had performed, and we were careful not to over-state the significance of the result. Basically, we hoped that having other eyes looking at the result would help us understand what we had observed.

          And there is a history of erroneous experimental results from improper statistical analysis being published in those journals, though it is (relatively) quite rare. In all those cases that I can recall, however, subsequent papers explored the problems with the analysis and although the results were never formally retracted, the community recognized the error and the papers stopped being cited in a positive way.

          This case feels different to me. Here there are clearly problems with the methodology that reviewers should have caught. The paper seems to have been published not because it reported something unexpected and provocative, but because it reinforced the community’s prior biases. It seems likely that, like Mann’s early climate reconstructions, it will continue to be cited as evidence long after its flaws have been recognized and it has been shown to be incorrect.

        • stevefitzpatrick
          Posted Feb 13, 2015 at 6:06 PM | Permalink

          fizzymagic (Posted Feb 13, 2015 at 4:56 PM),

          Excellent comment.

    • HAS
      Posted Feb 12, 2015 at 2:34 PM | Permalink

      Pekka

      As I commented over in CLB, if we strip away the stats and think about the maths, as you are doing, then Forster & Taylor appear to estimate your equation (2) using OLS, so in fact what M&F use is the calculated F, which is (N + α ΔT – Residue(t)). The Residue(t) are presumed to be independent with 0 mean but an SD that looks like a reasonably large proportion of F.

      This Residue(t) term then passes back into your equation (3) as b*Residue(t). Unfortunately it then seems to me that it gets in the way of the subsequent analysis, in particular making any estimation of the internal variability unreliable (without knowledge of Residue(t) there is insufficient information to estimate it).

      • Posted Feb 12, 2015 at 3:00 PM | Permalink

        HAS,

        An error (and possibly a significant error) is introduced by the procedure of Forster et al (2013). That affects the results, but it is not the same as the residual of the regression calculation, and it does not feed back into the calculation to cause problems more serious than those that significant inaccuracies always cause.

        That error cannot be avoided when only the presently available data is used. Marotzke and Forster discuss this issue in their response at CLB:

        Of course one could legitimately ask how accurate this correction is, and we would hope that in future generations of coordinated model simulations a better direct diagnostic of F is possible. But for the CMIP5 models used in our study and in Forster et al. (2013), applying equation (3) has been the only approach possible.

        • HAS
          Posted Feb 12, 2015 at 3:16 PM | Permalink

          Pekka

          I understand the difficulty in estimating F, but that is a limitation of the available data and is acknowledged by M&F, as you say.

          What we are discussing here is Lewis’ suggestion that there is an avoidable methodological error caused by the two stage process of analysis and the multiple regressions in t to get trends. I am attempting to understand why that may or may not be a problem in simple terms that a mathematician might grasp (aka writing down the formula :))

          I’m unclear from your response whether you are saying the problem with the two-stage analysis doesn’t exist, is immaterial, or something else.

          It can be eliminated by not using F at all, but you then end up unable to estimate all the parameters you need (I think).

        • Posted Feb 12, 2015 at 3:42 PM | Permalink

          HAS,

          One way of looking at the issue is to follow the calculational process and check whether any step of it has problems related to the appearance of ΔT both on the left-hand side and in the estimate of ΔF. The answer is very clearly that none of the steps needed in Marotzke and Forster is affected by such problems. That’s easy to see, and no-one has presented any proposal to the contrary. None of the arguments of Nic enters that process. So far the situation is really simple.

          It may be more difficult to argue that the resulting well-determined regression model is fit for use in the way M&F use it, not because its coefficients were badly determined, but because its variables, and in particular ΔF, are badly defined. The model ΔF is not the same as the forcing that’s used outside of that analysis. It’s certainly strongly related, but there may be significant differences. Here we have the problem that M&F acknowledge in their response.

          My view is that the hypothesis that the model is fit for use in the way M&F use it is only weakly justified. It’s a reasonable enough hypothesis to make, and to try out to see what the results are, but how much more than that it is can be questioned.

          It’s an open question to me whether better analyses with the same goal that M&F have chosen can be devised without additional model runs using the same CCM models over a wider range of forcings and collecting more data from the runs.

        • HAS
          Posted Feb 12, 2015 at 8:04 PM | Permalink

          Pekka

          That’s what I was trying to do, starting with what the F M&F use really is. The first problem I strike is the residue from its estimation in Forster. What happens to that in the subsequent steps of the analysis?

        • Posted Feb 13, 2015 at 4:08 AM | Permalink

          I have written in several comments that:
          – the calculations of M&F are stable (robust might be more accurate) and without significant problems from circularity
          – their regression model is at best a very crude description of the model behavior
          – the ΔF that they use is not the real ΔF, but an operationally defined substitute that may deviate significantly from the real one, probably more for the 15-year trends than for the 62-year trends

          M&F seem to agree on the issues I mention above.

          There’s one further point that may be significant, but which I have not discussed extensively in my earlier comments. That’s the final step in the calculation of the results shown in their paper.

          All the above concerns the derivation of the regression coefficients, but the regression model is used further to tell how much of the variability between models originates from each characteristic of the model and the model run, and how much is residual (typically unresolved internal variability). This can be done in two ways. They have chosen the one that comes naturally when ΔF has been calculated in a separate earlier step.

          In their approach the previously determined values of ΔF, α, and κ and the coefficients a, b, c, and d are used to calculate

          ΔT = a + b ΔF + c α + d κ

          The difference of that from the ΔT extracted from the model run is the residual. Contributions of the three variables are also collected for use in the figures.

          The alternative approach doesn’t use the ΔF determined by Forster (2013), but uses ΔN, and calculates the predictor for ΔT from the equation

          ΔT = (a + b ΔN + c α + d κ)/(1 – b α)

          This alternative approach leads to different results for ΔT. As bα is probably positive in almost all cases, this alternative approach results in larger contributions of α and κ to ΔT, but the overall ΔT may also be very large, meaning that the residual is large as well. Finding sometimes-large residuals is related to the fact that the regression coefficients were not determined based on this formula. (I have discussed that in an earlier comment.)

          Due to the nature of ΔN this second alternative is not physically justified, but in case the two approaches lead to very different final results, we might ask whether their method is really valid either. Whether the second approach leads to very different results could be checked from their data. The results of that check might then either strengthen or weaken our trust in their final results. (I repeat: the second alternative is not better, but the difference between the results tells something about the robustness of their method.)
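
          In code, the check I have in mind amounts to comparing two predictors built from the same fitted coefficients; a schematic sketch with placeholder values (not their data):

            def predict_from_dF(a, b, c, d, dF, alpha, kappa):
                # their approach: use the previously diagnosed dF directly
                return a + b * dF + c * alpha + d * kappa

            def predict_from_dN(a, b, c, d, dN, alpha, kappa):
                # alternative: eliminate dF via dF = dN + alpha*dT and solve for dT
                return (a + b * dN + c * alpha + d * kappa) / (1.0 - b * alpha)

            a, b, c, d = 0.0, 0.4, -0.05, -0.03            # placeholder coefficients
            alpha, kappa, dT_model, dN = 1.2, 0.6, 0.15, 0.05
            dF = dN + alpha * dT_model
            print(predict_from_dF(a, b, c, d, dF, alpha, kappa))
            print(predict_from_dN(a, b, c, d, dN, alpha, kappa))
            # a large difference between the two would signal sensitivity to the choice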

        • RomanM
          Posted Feb 13, 2015 at 9:32 AM | Permalink

          Pekka, there is so much misunderstanding of statistics in this comment that one is tempted to say that “you aren’t even wrong!”.

          – the calculations of M&F are stable (robust might be more accurate) and without significant problems from circularity

          Loosely interpreted – if you can do the arithmetic, and it comes out the same every time, then the results are correct. This is nonsense. I explained to you in my earlier, rather lengthy comment that the regression procedure has assumptions which the data must satisfy. If any of those assumptions are violated, the results will be affected, in some cases more seriously than in others. In some cases (e.g. in time-series regression), one must make adaptations to the original procedure to get results which are correct.

          Without passing judgement as to whether it is correct, let’s start again with the M and F model:

          ΔT = a + b ΔF + c α + d κ + ε

          ΔT is the temperature variable. a, b, c and d are assumed to be unknown fixed values. ΔF is the unknown forcing and α and κ are unknown parameters. ε is the “random” part of ΔT which accounts for the fact that substituting the same values of ΔF and the two parameters into the predictive equation does not always produce the same value of ΔT. I described the various assumptions made about this model in a previous comment on this thread.

          You write:

          All the above concerns the derivation of the regression coefficients, but the regression model is used further to tell how much of the variability between models originates from each characteristic of the model and the model run, and how much is residual (typically unresolved internal variability). This can be done in two ways. They have chosen the one that comes naturally when ΔF has been calculated in a separate earlier step.

          This is correct. So let’s make the following simple substitution in the equation: ΔT” = a + b ΔF + c α + d κ, so that the model is ΔT = ΔT” + ε, where ΔT” is how much of the model variation can be explained by the predictors and ε is the residual, i.e. the unexplained portion. Also it should be noted that in the prior calculation of the estimated forcing, the relationship ΔF = α ΔT + ΔN was used.

          There is absolutely no reason why this calculation needs to be made in advance. We can substitute it directly into the regression equation, and the least squares calculation will work exactly as before. Thus

          ΔT = a + b(α ΔT + ΔN) + c α + d κ + ε

          At this point you say that we should carry out the ordinary regression and get estimates of all of the parameters and of the residuals (the estimates are denoted here with asterisks):

          ΔT = a* + b*(α ΔT + ΔN) + c* α + d* κ + ε*

          The estimate of ΔT” is obtained by replacing all of the estimated residuals with their zero means, which you claim would look like this:

          ΔT” = a* + b*(α ΔT + ΔN) + c* α + d* κ.

          However, did all of the residuals get replaced? The answer is no:

          ΔT” = a* + b*(α [ΔT” + ε*] + ΔN) + c* α + d* κ.

          The correct result for the predicted values should look like:

          ΔT” = a* + b*(α ΔT” + ΔN) + c* α + d* κ.

          The ordinary regression procedure gives you the wrong predicted values, wrong residual estimates and therefore the wrong estimates for the coefficients.

          The alternative approach doesn’t use the ΔF determined by Forster (2013), but uses ΔN, and calculates the predictor for ΔT from the equation

          ΔT = (a + b ΔN + c α + d κ)/(1 – b α)

          If you are going to try to correct for this problem, this is certainly NOT the way to proceed. In my earlier comment, I wrote the equations:

          (1 – b α) ΔT = a + b ΔN + c α + d κ + ε

          and the sum of squares to be minimized becomes:

          ∑ε² = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]²

          The solution to this is not “clean”. It is subject to biases due to the later calculations necessary for calculating predicted values and residuals. The sole reason for all of this is that the same ΔT that one is analyzing has been used in the estimation of ΔF.

          However, these difficulties do not justify carrying out an incorrect analysis just because it produces “robust results.”
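
          For concreteness, the minimization written out above can be set up numerically along these lines (placeholder data; the point is only that b enters the residual nonlinearly, so ordinary linear least squares no longer applies):

            import numpy as np
            from scipy.optimize import least_squares

            rng = np.random.default_rng(2)
            n = 18
            dT = rng.normal(0.10, 0.05, n)
            dN = rng.normal(0.20, 0.10, n)
            alpha = rng.uniform(0.8, 1.6, n)
            kappa = rng.uniform(0.4, 0.8, n)

            def eps(p):
                # residual with dF eliminated: (1 - b*alpha)*dT - (a + b*dN + c*alpha + d*kappa)
                a, b, c, d = p
                return (1.0 - b * alpha) * dT - (a + b * dN + c * alpha + d * kappa)

            fit = least_squares(eps, x0=[0.0, 0.5, 0.0, 0.0])
            a_hat, b_hat, c_hat, d_hat = fit.x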

        • Sven
          Posted Feb 13, 2015 at 4:14 AM | Permalink

          Pekka: M&F seem to agree on the issues I mention above.

          And yet: Two million viewers of Science Daily were blasted with an article that begins “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…”

          And not without help from the authors…

        • Sven
          Posted Feb 13, 2015 at 4:40 AM | Permalink

          Sorry, I did not specify that the second quote “Two million viewers of Science Daily were blasted with an article that begins “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…” ” was from R Graf

    • DocMartyn
      Posted Feb 13, 2015 at 7:19 AM | Permalink

      I am somewhat confused. I am informed that both α and κ have units of W m^-2 / K, so for your equation

      ΔT = a + b ΔF + c α + d κ (3)

      it follows that

      a must have units of K,

      b must have units of K/(W m^-2)

      c must have units of K^2/(W m^-2)

      d must have units of K^2/(W m^-2)

      is that correct?

      • Posted Feb 13, 2015 at 7:27 AM | Permalink

        Doc,

        The expressions leave some room for interpretation, but at least your approach seems correct.

  70. foas
    Posted Feb 12, 2015 at 5:20 AM | Permalink

    At the risk of tilting at a straw man (not having read all earlier comments), this argument (Pirilä’s) seems flawed. Lewis is correct that (1) is a misspecified regression. How far this matters might be argued.

    One might test this by taking (1) and replacing dF with dN. Also drop the e – this is just an exercise in (orthogonal) projection, without statistics. Doing this gives a new set of regression coefficients. Convert the estimated equation

    dT = a+bdN+c(alpha)+d(kappa)

    into an equation in terms of dT and dF by simply rewriting it using eqn (2). That is, replace the regression coefficients x by

    x^=x/(1+b*alpha), x=a,b,c,d,

    giving an equation

    dT=a^+b^(dF)+c^(alpha)+d^(kappa).

    If this equation approximates what M&F obtain by a direct regression using dF, then there is no real problem from the circularity in terms of the basic fitted model.

    This still leaves the statistics of course, but that is too much to think about just now.

  71. SHOUSSEIN
    Posted Feb 12, 2015 at 7:54 AM | Permalink

    Though okay with the graphs this time, I feel there are too many dF+e=kappa (1, 2, whatever) etc. formulas around.

    The proverbial cat would not find her kittens in all of that.

  72. R Graf
    Posted Feb 12, 2015 at 11:48 AM | Permalink

    M&F published a conclusion regarding “all” CMIP5 models, yet they were selective in the models they tested. One reason given by M&F (2015) was that most of the hard work of deriving F had already been done in Forster (2013). The model list accompanying the 2015 M&F paper in Nature numbers 35 models, 17 with forcings, with runs per model varying from 1 to 10, totaling 113 runs. There were 36 reported models in the paper, with 18 forcings and 114 runs, so apparently one model with forcings, run once, is missing. Comparing the two studies’ model lists, 19 of the 35 models are the same, 15 of the 17 with forcings are the same (2 had forcings that were not used), and 4 models in 2015 with forcings are not on the 2013 model list. I think even if an explanation of the selectivity of the samples is given, it’s messy.

    • R Graf
      Posted Feb 12, 2015 at 6:54 PM | Permalink

      The commonly reported number of CMIP5 models is 112, which is why Science Daily may have misreported that M&F studied all 114. There are 56 coupled pairs of models in CMIP5, of which M&F omitted 21 pairs (including the one missing from their list). There seems to be no pattern as far as variables covered or variable complexity, as seen from the table on pg 747 of IPCC AR5.

  73. Posted Feb 12, 2015 at 5:23 PM | Permalink

    As I said over at CLB in response to ATTP, it would be very useful if M&F were to release their data and code to Steve in order that he can try to replicate their results. We could at least then see if the method they use is flawed as claimed, or robust. I can understand though that they might be resistant to such a request. Additionally, others could try to replicate the study using independent analyses, which would be a good test, not of M&F’s method, but of their results.
    I believe it is fairly straightforward to get CMIP5 model runs using KNMI Climate Explorer – whether or not this would be sufficient to construct those independent studies, I’m not sure.

    • R Graf
      Posted Feb 12, 2015 at 6:59 PM | Permalink

      If Nic does a study which does not match M&F’s conclusion there will be claims of bias. And, unfortunately, our bench of published academics looking to contradict a director at the Max Planck is bare. Two million viewers of Science Daily were blasted with an article that begins “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…”

      The problems with good science do not always make as interesting a headline as mediocre science can.

    • R Graf
      Posted Feb 13, 2015 at 10:36 PM | Permalink

      Perhaps a simple test model of the model could be constructed with control data, to test the methodology at various extremes and see whether the results come out as the methodology predicts.

  74. R Graf
    Posted Feb 12, 2015 at 10:09 PM | Permalink

    I have to admit I learned a lot in this discussion. Before now I did not know you could test the behavior of variables with an equation containing variables derived from the same system you are testing.

  75. Posted Feb 13, 2015 at 7:48 AM | Permalink

    Pekka has written in several comments that “the calculations of M&F are stable (robust might be more accurate) and without significant problems from circularity”, and asserts (I think correctly) that Marotzke and Forster agree with him about this. He has also now clarified mathematically what he is arguing.

    If I understand correctly, Pekka argues that the fact that ΔF is a linear function of ΔT does not involve a circularity since it is the actual model-simulated ΔT (ΔTs) that is used to calculate ΔF, not the purely forced, free-of-internal-variability-etc-error, version of ΔT (ΔTf), which is what the regression fit represents (if the regression model is appropriate). I will explain why Pekka’s argument does not support the conclusions of Marotzke and Forster in relation to 62 year periods.

    Suppose that over the 62 year period involved simulated multidecadal internal variability leads to ΔTs exceeding ΔTf in some models and falling below it in other models, without the simulated value of ΔN (ΔNs) being similarly affected. This seems both plausible and likely; many models exhibit substantial multidecadal internal variability, and show little correlation between multidecadal ΔTs and ΔNs (after detrending).

    In this situation, models with ΔTs > ΔTf will generally have a relatively high diagnosed value (ΔFs) for ΔF, since ΔFs = α ΔTs + ΔNs. (Note that although Marotzke and Forster write of α ΔT being a “correction” to ΔN, it is the larger of the two terms in most cases.) As a consequence of such internal variability, intermodel spread in ΔTs will be positively related to that in ΔFs, increasing the proportion of the intermodel spread in ΔTs that is “explained” by the ΔFs, or the “contribution to the regression by the ERF trend”, which Marotzke and Forster state is dominant for start years from the 1920s onward. This effect is what I refer to as circularity; it is not total and I did not claim that it was.

    I consider a contribution to intermodel spread in ΔTs that arises purely from the same elements of internal variability appearing on both sides of the regression equation to be an artefact of an unsatisfactory method. Perhaps on reconsideration Pekka may also come to this view.

    Whether the circularity element that exists in the regression method used is the largest source of error in this study is uncertain. I identified other potentially serious sources of error involved in it; they may be more important. Paul_K has set out further issues with the study’s methods.

    Note that it would be unsurprising if Marotzke and Forster had just found that the ERF trend ΔFs has a considerably larger influence than model feedback strength and model ocean heat uptake efficiency over historical 62-year periods starting from the 1920s on. Aerosol forcing varies hugely between models (by over 1 W/m2). Up to the turn of the century, 62-year ΔTs trends have a correlation of 0.9 with diagnosed or estimated aerosol forcing levels for the models used by Marotzke and Forster. And over the entire Historical simulation period, 1860-2005, ΔTs trends have as high a correlation with aerosol forcing strength in models as with ΔFs.

    However, that intermodel differences in the ERF trend have to date had a considerably larger influence than those in model feedback strength would not justify Marotzke’s claim: “The difference in sensitivity explains nothing really”. And even if variations in model sensitivity explain relatively little of the intermodel spread over the Historical period that would not justify his statement that “The claim that climate models systematically overestimate global warming caused by rising greenhouse gas concentrations is wrong”. It is entirely possible that systematically-excessive model sensitivities have until recently been largely offset by systematically-excessive aerosol forcing and/or obscured by a positive influence of actual multidecadal internal variability on observed GMST.
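
    The mechanism is easy to demonstrate with toy numbers. A minimal sketch (all values illustrative, under my supposition above that multidecadal internal variability affects ΔTs but not ΔNs):

      import numpy as np

      rng = np.random.default_rng(3)
      n = 18
      alpha = rng.uniform(0.8, 1.6, n)
      dTf = rng.normal(0.15, 0.03, n)   # forced component of the 62-year trend
      e = rng.normal(0.00, 0.05, n)     # multidecadal internal variability
      dTs = dTf + e                     # simulated trend
      dNs = rng.normal(0.10, 0.02, n)   # supposed unaffected by e
      dFs = alpha * dTs + dNs           # diagnosed ERF trend inherits alpha*e

      def r2(x, y):
          # fraction of intermodel spread in y "explained" by regressing y on x
          X = np.column_stack([np.ones(len(x)), x])
          yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
          return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

      print(r2(alpha * dTf + dNs, dTs))   # regressor free of internal variability
      print(r2(dFs, dTs))                 # inflated once alpha*e appears on both sides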

    • Posted Feb 13, 2015 at 8:39 AM | Permalink

      The determination of the regression parameters is robust, but there are some issues that must be considered when the model is used, as I discuss in my recent comment.

      • Posted Feb 13, 2015 at 8:53 AM | Permalink

        Pekka,
        Despite your reassertion of your claim, I think my comment shows why M&F’s determination of the regression parameters is not in fact robust.

        • Posted Feb 13, 2015 at 9:09 AM | Permalink

          That may be a matter of defining what determining the regression coefficients means. I define that operationally, following the approach of M&F, accepting that the ΔF of the formula is defined as

          ΔF = ΔN + α ΔT

          If ΔF is defined in some other way, then the result may be much less reliable.

          I shift the discussion of the potential problems to the step of using the model. At that stage it depends on the application whether serious problems arise or not.

        • Posted Feb 13, 2015 at 9:45 AM | Permalink

          Another technical assumption that M&F have made is that the regression coefficients are determined by minimizing the sum of the squares of residuals calculated as

          e = ΔT – a – b ΔF – c α – d κ

          or equivalently with the operational definition of ΔF

          e = (1 – bα)ΔT – a – b ΔN – c α – d κ

          This is not the only possible choice that can be made, but this is what they chose.

          The sum I refer to above is over the models in the ensemble. The calculation is done separately for each period.

          When these two choices are made, the calculation is robust. The potential problems are in the interpretation of the resulting formula, and in the choice of the input variables, when the resulting regression formula is used to calculate ΔT.

        • Posted Feb 13, 2015 at 9:57 AM | Permalink

          The next step to ponder is the one I discuss here.

          I still have mixed thoughts on how this affects the results of the M&F analysis.

    • Kenneth Fritsch
      Posted Feb 13, 2015 at 9:42 AM | Permalink

      “It is entirely possible that systematically-excessive model sensitivities have until recently been largely offset by systematically-excessive aerosol forcing and/or obscured by a positive influence of actual multidecadal internal variability on observed GMST.”

      I was going to comment on the potential effect of the aerosol factor on the M&F regression comparisons, but better that you did. Model sensitivity can be addressed independently and should be.

  76. Kenneth Fritsch
    Posted Feb 13, 2015 at 9:30 AM | Permalink

    Having finally read the Marotzke and Forster paper in more detail, I think I could repeat what they did in their regressions of 15- and 62-year trends. I will describe my version here and hope to obtain agreement or disagreement at this thread from others who have studied the paper.
    I have the CMIP5 model historical temperature series in Excel and will presently locate it for my use. I could make it available to others here, although it is readily downloadable from KNMI Climate Explorer. The alpha and kappa data used in the M&F paper are in the paper at this link:

    http://onlinelibrary.wiley.com/doi/10.1002/jgrd.50174/epdf

    Another paper here that I have not had time to digest includes CMIP3 and CMIP5 alpha and kappa values that at first glance do not jibe with the ones used by M&F.

    http://onlinelibrary.wiley.com/doi/10.1029/2012GL052952/epdf

    I have searched high and low for a published series of the individual model forcing (effective radiative forcing) used in M&F and now believe that it was derived using the historical temperature series for each model run and the information for N (TOA energy imbalance) and alpha in the equation provided in the paper linked immediately above.

    N = F – α ΔT, or equivalently F = N + α ΔT

    I counted data for 23 models in that link.

    For the multiple regression, the successive, overlapping 15- or 62-year trends from the historical global mean surface temperature and the similarly constructed global forcing change trends were calculated and tabulated. A variation measure of these trends was then calculated across all models/model runs for each trend start year; I am unsure which variation measure that was, whether standard deviation, variance or range. Alpha and kappa are assumed by the authors to be constant over time, and thus the variation across model runs for those variables would be the same for each trend start year. In my view it is those 4 variations that are used in the M&F multiple regression. A sketch of this preprocessing follows.
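
    A minimal sketch of that preprocessing, assuming (as discussed in the replies below) that each model’s F series is reconstructed as N + alpha*T. The arrays are random placeholders, not the actual CMIP5/KNMI data:

    import numpy as np

    rng = np.random.default_rng(2)
    years = np.arange(1900, 2013)
    n_models, L = 23, 62                        # ensemble size, trend length (yr)
    T = rng.normal(0, 0.1, (n_models, years.size)).cumsum(axis=1)  # toy GMST anomalies
    N = rng.normal(0.3, 0.2, (n_models, years.size))               # toy TOA imbalance
    alpha = rng.uniform(0.8, 1.8, n_models)
    kappa = rng.uniform(0.4, 1.2, n_models)

    F = N + alpha[:, None] * T                  # per-model reconstructed forcing

    def window_trends(series, L):
        # OLS trend (per year) of every overlapping length-L window of each row
        x = np.arange(L)
        n_win = series.shape[1] - L + 1
        return np.array([[np.polyfit(x, row[s:s + L], 1)[0]
                          for s in range(n_win)] for row in series])

    trendT = window_trends(T, L)                # shape (n_models, n_start_years)
    trendF = window_trends(F, L)
    # For each start year s, regress trendT[:, s] across models on
    # trendF[:, s], alpha and kappa, as in the multiple regression above.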

    • Posted Feb 13, 2015 at 12:10 PM | Permalink

      Kenneth,
      Yes, the model ERF series used in M&F were derived from CMIP5 Historical run T and N series (run-ensemble means for each model), along with the previously diagnosed alpha values. But I believe that drift in the corresponding section of the PI control run was subtracted from the T and N time series. It is not specified whether monthly or annual means were used; I imagine annual.

      I think you will find it requires some work to derive the ERF values they used. If you succeed in doing so, and upload them somewhere, I will check them against the values I am using.

      • Kenneth Fritsch
        Posted Feb 14, 2015 at 9:19 AM | Permalink

        Nic, I have the CMIP5 model piControl run data for N and T to use for correction, and used it when attempting to duplicate the ECS values derived by the published regression method. I could duplicate the results after adjustment in a general way, but not exactly for all models. The differences could be accounted for by a difference in the constant amount used to adjust N. Looking at the piControl data for N (actually rsdt – (rlut + rsut)) leads me to believe that the adjustment required is not so much drift but rather a more or less constant residual amount by which the model fails to balance the TOA energy.

  77. Patrick M.
    Posted Feb 13, 2015 at 9:39 AM | Permalink

    Great discussion!

  78. Greg Goodman
    Posted Feb 13, 2015 at 12:49 PM | Permalink

    Pekka says:

    “It’s a scientific paper and it can be assumed that readers of such a paper understand that a linear regression model cannot be accurate and reliable, even if that’s not emphasized in the paper.”

    Oh what a pure and ideal world you live in, Pekka. I would that it were so.

    Before diving into the scrum here, let me recount the reality of the awareness of the applicability of OLS.

    Many years ago I was a young contract programmer in the maths dept. of a major UK university. Every Friday afternoon there was a session where a member of the department would describe their research, highlight any problems they were having and put it out to the assembled mass of mathematics PhDs for comments and solution.

    On one occasion a student presented her nearly finished thesis, trying to reconcile a model with a large volume of observational data. There was a problem in extracting a linear relationship between two experimental variables from the satellite data.

    A scatter plot was presented with the OLS regression fit. It was visibly obvious that the slope did not match the “cloud” of data points: it was clearly underestimating the slope. She was stuck as to why this was and put it out for discussion and suggestions. The assembled body of doctors and professors of mathematics spent two hours of intellectual jousting without coming up with a useful explanation.

    Not being a member of the academic staff, I kept quiet and listened. After the meeting I approached the student and pointed out that she could not use OLS in a situation with significant uncertainty in the x variable. It would typically underestimate the slope, and this was indeed the problem she was facing.

    She was rather taken aback and asked if I was sure.

    The next morning I presented her with a page of maths showing the derivation and the point at which it is necessary to apply the condition err(x) << err(y).

    She thanked me, but said it was too late to make any substantial change to the paper. She then added a paragraph of waffled excuses and the usual "need for further study" and presented her thesis without any correction.

    What shocked me most was that none of the assembled body of egg-heads seemed to be aware of the issue.

    So Pekka, with the utmost respect, I have to say that your basic assumption that both the readership and presumably the authors can be assumed to know this kind of thing is, sadly, unfounded.

    I wrote a short article about this last year:

    On inappropriate use of least squares regression

    Much of it was incorporated into my recent article at Judith's:

    On determination of tropical feedbacks

    In fact this lack of understanding of applicability of OLS is at the heart of the exaggerated estimations of climate sensitivity.
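
    The attenuation is easy to demonstrate numerically. A minimal sketch with made-up numbers, showing the OLS slope biased towards zero by the factor var(x)/(var(x) + var(error in x)):

    import numpy as np

    rng = np.random.default_rng(3)
    n, true_slope = 10000, 2.0
    x = rng.normal(0.0, 1.0, n)                # true regressor
    y = true_slope * x + rng.normal(0.0, 0.5, n)
    x_obs = x + rng.normal(0.0, 1.0, n)        # x observed with substantial error

    slope_clean = np.polyfit(x, y, 1)[0]       # close to 2.0
    slope_noisy = np.polyfit(x_obs, y, 1)[0]   # close to 2.0 * 1/(1+1) = 1.0
    print(slope_clean, slope_noisy)            # the noisy-x slope is roughly halved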

    • Posted Feb 13, 2015 at 1:17 PM | Permalink

      On second thought.

      Taking into account that the journal where this was published is Nature, the need for an explicit statement of caveats is greater than in a journal whose most likely reader has closely related expertise.

      • Greg Goodman
        Posted Feb 13, 2015 at 1:55 PM | Permalink

        To be fair to Piers Forster, Forster & Gregory 2006 did discuss the dilution issue and showed that a better estimation of the rad vs temp regression gave a notably lower climate sensitivity.

        Sadly they relegated this to an appendix and avoided any mention of it in the conclusion or abstract of the paper itself.

        I cover this in my OLS article.

        It does not seem to get any mention in the current article, and my gut feeling is that it was only at Gregory's insistence that this got into F&G.

        So in this case the author was aware, but that does not detract from the point that the assumption that either authors or readers are aware of the issues is unfounded.

        You seem to accept that this does need to be explicitly stated. Thanks.

        • Posted Feb 13, 2015 at 4:47 PM | Permalink

          Greg,
          Your reference to Forster & Gregory 2006 brought up the observation that they write:

          OLS regression of Q-N against ΔTs was found to be the best predictor of the climate sensitivity in the Hadley Centre climate model. Values of Y derived from short 10-yr time series agreed best with the value derived from longer-term datasets (and the climate sensitivity to 2XCO2) when OLS regression of Q-N against ΔTs was used.

          This is relevant to the present discussion, because the Q of that paper is forcing and Y is the feedback parameter (α). Earlier results of this kind are surely a factor in Forster's expectation that their approach works.

        • Greg Goodman
          Posted Feb 14, 2015 at 2:23 AM | Permalink

          I don’t follow your logic here. What is the basis of their expectation?

          The earlier paper just derived an estimate of climate sensitivity. There is nothing to suggest that this value was correct and thus that the method should be re-used with “expectation that their approach works.”

          In fact the opposite is true: the earlier paper established that simple regression actually produces exaggerated climate sensitivity. In 2006 F&G did not want to “distract from the main point of the paper” by including the lower results in the conclusion and abstract (where they would get reported), and so tucked the whole discussion and its results away in an appendix.

          Now nearly ten years later he does not even mention the issue at all.

          This looks close to scientific misconduct to me. There is a serious issue with simplistic regression being used in a context where it is technically invalid.

          At least one of the authors is aware of this because he already published a paper discussing the issue and its impact on climate sensitivity.

        • Posted Feb 14, 2015 at 3:36 AM | Permalink

          Greg,

          I refer to the way ΔF is determined from ΔN. If that connection were accurate for every case considered, no circularity at all would be present in the calculation at any point, because we could equally well pick ΔF as a direct model result, just like ΔN. The error in that relationship occurs in a way that leads to some ambiguity in the interpretation of the results. That ambiguity is affected by the “circularity”.

          The excerpt from Forster and Gregory 2006 tells us that at least in one case the relationship was found to be close by the standards of the typical relationships they consider.

          (It's perhaps not obvious from Nic's post that no problem whatsoever would come from the “circularity” if the relationship were accurate. In that case F could be added to the CMIP5 database as a value just as accurate as N, and there would be no reason to consider N more primary data than F.)

        • HAS
          Posted Feb 14, 2015 at 3:08 PM | Permalink

          Pekka

          What do you mean by “accurate” – Forster et al state that it has been estimated by regression, along with the distribution of the errors in F. As I asked above (to which you didn't really respond), doesn't that mean that we all know it involves circularity?

      • Posted Feb 14, 2015 at 3:26 PM | Permalink

        What I meant by that sentence is that some unavoidable error terms might be amplified by extra coefficients that result from an equation that is perhaps dependent on something that goes into the residual. I'm not sure whether even that is the case, but there may be reasonable physical frameworks where that would take place.

        The error term that I refer to is not the full difference between ΔF and ΔN, but a part of it that results from the approximate nature of the formula used for calculating the difference.

        • davideisenstadt
          Posted Feb 14, 2015 at 3:41 PM | Permalink

          pekka…I taught statistics at the college level for a decade or so but I still have no idea of what you refer to….
          just what do you mean by:
          ” that some unavoidable error terms might be amplified by extra coefficients that result from an equation that is perhaps dependent on something that goes into the residual”
          “some unavoidable error terms”…which ones?
          “is perhaps dependent on something that goes into the residual”
          perhaps dependent? something?
          really?
          this is the language of mathematics? of physics?

        • HAS
          Posted Feb 14, 2015 at 3:58 PM | Permalink

          Can you write that as an equation?

        • Posted Feb 14, 2015 at 4:03 PM | Permalink

          David,

          It might have been more appropriate for me not to have written that at all, as I really couldn't say anything clear. I just wanted to add something on a small point that could also have been left out of my earlier comment without affecting its real content. Both are related to the fact that there are acknowledged uncertainties that affect the accuracy of the quantitative results. How they do that depends on the inner workings of the climate models of the CMIP5 ensemble.

    • Don Monfort
      Posted Feb 13, 2015 at 1:44 PM | Permalink

      “On inappropriate use of OLS”

      That’s really interesting, Greg. I hope someone who really understands the implications will read and comment. Thanks.

    • Kenneth Fritsch
      Posted Feb 13, 2015 at 2:36 PM | Permalink

      We had this discussion about using OLS and TLS at the Blackboard for the regression used to derive the ECS values that were published in Chapter 9 of the AR5 review for the IPCC, using temperature and radiation data. Carrick suggested I try TLS. The ECS values were 10 per cent or so higher using TLS. I emailed Tim Andrews, who has coauthored papers with Gregory and Forster on this subject, and he told me that they had not considered TLS. The paper I refer to here is linked below; it does not discuss the recommendation you make about doing a reverse regression, but I believe that in a previous paper on the same subject that was done and used as evidence that OLS was sufficient.

      http://onlinelibrary.wiley.com/doi/10.1029/2012GL051607/epdf

      • Greg Goodman
        Posted Feb 14, 2015 at 2:45 AM | Permalink

        Thanks Kenneth. Figure 1 in the Andrews paper underlines Paul_K's point in his posts at Lucia's that the model responses are not linear anyway; they are “curvilinear”. So why is a linear model being regressed in the first place?

        Most of the models shown there are clearly steeper for larger deviations. In rad vs temp plots that means less sensitive. So the ‘average’ sensitivity, dominated by the bulk of small deviations, is being used to project future large deviations, where it is inappropriate.

        Citing this “as evidence that OLS was sufficient” is simple confirmation bias, not science.

    • Bill
      Posted Feb 14, 2015 at 11:26 AM | Permalink

      Is the bolded part Greg’s? Or a comment by Steve M.?

      • Bill
        Posted Feb 14, 2015 at 11:27 AM | Permalink

        Sorry, this part: In fact this lack of understanding of applicability of OLS is at the heart of the exaggerated estimations of climate sensitivity.

  79. Frank
    Posted Feb 13, 2015 at 3:55 PM | Permalink

    Nic: In a simple regression analysis, one interprets the residuals as noise in the data or as an inappropriate regression equation. M&F15 regress data that has a chaotic component and interpret ALL of the residuals as unforced variability. Most of these residuals arise because of the limitations of the dT = dF/(a+k) model being applied to the CMIP ensemble. However, papers have shown that kappa decreases with time in TCR simulations. (As the top of the ocean warms, it becomes more stably stratified.) Paul_K has studied how estimates of climate sensitivity vary when they are deduced from different periods of model output using this approach. These parameters are abstracted from long periods, so they are more relevant for 62-year trends than for 15-year trends.

    The dilemma – as I see it – is why M&F's regression can reproduce the ensemble mean as well as it does in Figure 2b. In M&F's regression equation 4 (your equation 3), alpha and kappa have separate regression coefficients. This additional and inappropriate degree of freedom allows the regression equation to fit the ensemble mean more closely and thereby assign more of the variance to unforced variability. As you have pointed out, the dF term has already been derived from simulated temperatures. So the dF term is circular AND the remaining two terms have an inappropriate degree of freedom.

    If you haven’t already done so, it would be interesting to see what happens if the regression is performed using the sum of alpha plus kappa and a single coefficient and if alpha and kappa are completely omitted.

    • RomanM
      Posted Feb 13, 2015 at 4:38 PM | Permalink

      Frank:

      The dilemma – as I see it – is why M&F’s regression can reproduce the ensemble mean as well as it does in Figure 2b.

      If I understand your statement correctly, there is no dilemma here. Given the way that the regression is set up, using each variable's deviations from its mean, the following is true for equation 4:

      β0 is equal to 0. The average across the ensemble of each of the following variables is equal to 0: ΔF’, α’, κ’ and the residuals. Thus, the average of the predicted ΔT over the ensemble must be exactly equal to the ensemble ΔT average for each year.

      The same would be true for the reduced case using (α + κ)’, which is the same as α’ + κ’, or if the latter are removed completely.

      • Frank
        Posted Feb 14, 2015 at 4:45 PM | Permalink

        Roman: I didn't describe the dilemma correctly. dT = dF/(a+k) is an imperfect way to analyze output from climate models. Approximating 1/(1+x) as 1-x introduces additional error. Despite all of these limitations, M&F's regression describes the multi-model ensemble mean shockingly well. How does one show what factors are responsible for this surprising result: circularity? the additional degree of freedom? something else?

        • RomanM
          Posted Feb 14, 2015 at 5:23 PM | Permalink

          Frank, it is something else.

          Each regression takes place over a single year of ensemble data. Any regression of the form (with n predictors):

          Y = a0 + a1*X1 + a2*X2 + … + an*Xn + e

          has predicted values from the least squares solution looking like:

          Predicted(Y) = m(Y) + a1’*(X1-m(X1)) + a2’*(X2-m(X2)) + … + an’*(Xn-m(Xn))

          where the primes on the coefficients denote that they are the estimates and m() is the mean of a given variable.

          If you calculate the average of the predicted values you get

          m(Predicted(Y)) = m(Y) + a1’*0 + a2’*0 + … + an’*0 = m(Y)

          because the sum of the deviations of any variable from its mean is always 0.

          There is nothing in the M and F data set that causes that. It is true for every linear regression.
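
          (A quick numerical check of this with arbitrary data; nothing below is specific to M&F:)

          import numpy as np

          rng = np.random.default_rng(4)
          X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])  # intercept + 3 predictors
          y = rng.normal(size=50)
          coef, *_ = np.linalg.lstsq(X, y, rcond=None)
          print(np.isclose((X @ coef).mean(), y.mean()))  # True for any OLS fit with an intercept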

    • Paul_K
      Posted Feb 13, 2015 at 8:55 PM | Permalink

      Frank,
      Full support for RomanM’s explanation is found between Equations (3) and (4) in the paper.

      If, on the other hand, you want some general indication of how well ΔF/ρ works as a predictor of GCM temperature change over a long period (when it should perform close to its best), then you might want to look at Figure 9(a) in Forster et al 2013. This gives you some idea of the predicted/actual spread of temperature change over the historical period up to 2003. A between-model bias towards underprediction is apparent, most pronounced in higher-sensitivity models. The free regression on ΔF carried out by M&F (across the ensemble for each period) should largely correct for this between-model bias, but then still leaves significant within-model residuals, which represent some combination of model error plus natural variation. All the residuals are deemed to be “natural variation” and the model error is not quantified (and is probably unquantifiable).

  80. R Graf
    Posted Feb 13, 2015 at 11:30 PM | Permalink

    The National Academy of Sciences said that despite the errors found in Dr. Mann's paper the conclusions were basically correct. In the present case the friendliest view is that the methods were hazardous and the conclusions unwarranted, but the concept basically correct. The quality of climate science, and perhaps the credibility of western science, rests on these souls' courage.

    • thisisnotgoodtogo
      Posted Feb 14, 2015 at 12:44 AM | Permalink

      That’s one of the better trolling jobs I’ve seen!

      • R Graf
        Posted Feb 14, 2015 at 9:04 AM | Permalink

        I’m actually 100% sincere.

        • thisisnotgoodtogo
          Posted Feb 14, 2015 at 10:22 AM | Permalink

          So much the sadder

        • thisisnotgoodtogo
          Posted Feb 14, 2015 at 10:32 AM | Permalink

          We can’t say much here as it derails the thread topic, but just look up the relevant Mann issues on climateaudit.

    • Michael Jankowski
      Posted Feb 15, 2015 at 7:23 PM | Permalink

      You must’ve gotten Valentine’s Day confused with April Fools Day.

    • Paul Courtney
      Posted Feb 16, 2015 at 12:49 PM | Permalink

      I read RG's post to say, not that Mann was correct, but that the NAS was wrong to find the errors (which undermine Mann) yet find him “correct”; just as Pekka is acknowledging error while arguing M&F could still be “correct”. tingtg, I think you misread this as R Graf saying Mann was correct.

  81. Posted Feb 14, 2015 at 3:43 AM | Permalink

    The whole multivariate regression idea is fundamentally invalid, since the temperature is not a direct linear function of the forcing, and neither is the diffusive ocean heat uptake.

    Douglass et al 2006 figure 1 shows the temporal development of forcing, temperature and ocean diffusion:

    On the assumption of a linear feedback mechanism, the response to forcing is an exponential convolution of the forcing time series, which changes the profile of the response relative to the forcing: in crude terms, a lag and a change in magnitude. So any direct regression between such quantities will not correctly deduce the assumed linear relationship, even if the regression dilution issue is ignored.
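
    A minimal sketch of the point, with a made-up relaxation time constant and forcing series (nothing here is calibrated to any model or to the paper):

    import numpy as np

    tau, dt = 4.0, 1.0                       # assumed relaxation time (yr), step (yr)
    t = np.arange(0, 120, dt)                # 120 years
    F = np.sin(2*np.pi*t/60) + 0.02*t        # toy forcing: 60-yr cycle plus a ramp

    # Discrete exponential convolution: T(t) ~ (dt/tau) * sum_{t'<=t} F(t') e^-((t-t')/tau)
    kernel = np.exp(-t/tau) * dt/tau
    T = np.convolve(F, kernel)[:t.size]

    # Cross-correlation: the response peaks later than the forcing, by roughly tau
    lags = np.arange(-20, 21)
    cc = [np.corrcoef(F[20:-20], T[20+l:t.size-20+l])[0, 1] for l in lags]
    print("best lag (yr):", lags[int(np.argmax(cc))])  # positive: T lags F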

    Throwing in extra variables to do a multivariate regression in no way improves the situation; it compounds it.

    There seems to be a large body of work getting published that assumes a complex, physically interlinked system can be described by almost arbitrarily chosen linear regressions, without any consideration of the physical reality of what the quantities are and how their time series are related.

    There is an unstated assumption that if they do an invalid regression enough times, with enough variables, it will somehow “converge” to the right answer.

    The field of “Earth Sciences” seems to be largely devoid of any training in science or statistics, yet produces prolific quantities of papers based on essentially home-spun methods that have no grounding in existing knowledge and are not tested to prove their validity as novel techniques.

    It's a one-trick pony show based on linear trends.

    • Posted Feb 14, 2015 at 3:48 AM | Permalink

      In essence they have spent 30 years pushing the idea that the system can be modelled as a linear AGW “trend” + “noise”.

    • Greg Goodman
      Posted Feb 14, 2015 at 6:37 AM | Permalink

      My conclusion on all this is expressed well by RomanM:

      Hence, the usual simple regression procedure fails and all results from it are spurious. Estimates of the parameters, confidence interval and p-values will be biased and therefore neither reliable nor scientifically meaningful.

      BTW “climategrog” is my WP account; the above posts are mine.

  82. Posted Feb 14, 2015 at 8:13 AM | Permalink

    Someone wrote in this thread that physicists and statisticians live in different worlds, or something similar. That should, of course, not be the case; rather, arguments from both should be combined in a consistent way. In this particular example the starting point is a set of physics-based models. Physics-based models obey, at least approximately, the laws of physics. Thus an analysis of their properties is extremely inefficient if that's not taken into account. It's typical that the model structure is chosen based on physical arguments, but the coefficients of that model are determined by some method of fitting. Physical arguments are used also in the choice of the measure that's used to decide which are the “best” parameter values.

    That’s the basic nature of this analysis as well. The authors have selected the regression model and the measure to be minimized based on physical considerations. The model is crude, and some of their arguments may be criticized, but with these reservations the model makes sense.

    One of the choices is that one of the model variables is ΔF in spite of the problem that ΔF is not directly available from the CMIP5 archive, but must be deduced from ΔN using a formula that’s not fully accurate, and that happens to contain ΔT.

    The above comment allows for writing the formula in such a way that ΔT appears on both sides of the defining equation, but the way that takes place in M&F does not change the way the residual appears in the formula. The residual might be added to the formula that is used to calculate ΔF from ΔN. If that were done, circularity would result; put another way, the formula that defines the residual would change. The physical argument of the authors is, however, that this should not be done, and that the residual should be calculated as before from their original definition.

    Selection between the alternative M&F use and the alternative of Nic and RomanM has to be based on physical arguments; more precisely, the right choice could be fully confirmed by further calculations with the CMIP5 models. At present we have some reasonable, but not very strong, physics-based arguments of the authors, detailed in their response at CLB. For the alternative we have, to the best of my knowledge, no physics-based argument.

    The detailed argument of RomanM in this thread is based on the alternative choice that lacks support. It does not apply to the model of the paper, and the model of the paper has at least some arguments to support it.

    By the above I do not claim that I see the analysis as very strong. There are many issues that may make it too inaccurate and unreliable; some of these are discussed by Nic in the post. The circularity is, however, essentially a red herring. It appears only when the problem is analyzed forgetting that physics plays its role in the correct way of doing the analysis, and that the authors have legitimate arguments in support of their choice.

    • Greg Goodman
      Posted Feb 14, 2015 at 9:23 AM | Permalink

      The difference between the temporal evolution of the forcing and the relaxation response of the linear model is shown here:

      The lagged-correlation plot of post-2000 CERES data from Spencer & Braswell 2011 is shown here (negative lag: radiation change leads temperature change):

      It is clear that a lag of 12 months between the peak forcing and the peak in the response will decorrelate the regression and produce an incorrect result. Even if a lagged regression is performed, the ratio of the forcing to the response is NOT the scaling constant in the ODE that defines the relaxation relationship.

      There will be a final equilibrium temperature change associated with a change in radiative forcing, but this is not available from the time series of rad and temp (even less so from some clumsy proxy value taken from the model).

      It is not sufficient to simply ignore the fact that both the in-phase and the orthogonal signals are present in the data, pretend there is no lag and hope it will all come out in the wash.

      What is being done is not physically meaningful. The rest of the argument is barely of even academic interest.

      • Greg Goodman
        Posted Feb 14, 2015 at 9:25 AM | Permalink

        I went into all that in extensive detail over at Judith’s CE:

        On determination of tropical feedbacks

      • Posted Feb 14, 2015 at 9:44 AM | Permalink

        Greg,

          I have made no attempt to figure out how your observations affect 15-year and 62-year trends. Thus I make no claims on that.

        • Greg Goodman
          Posted Feb 14, 2015 at 10:30 AM | Permalink

          Thanks Pekka, I take that to be a very polite way of saying I’m off topic but I don’t agree.

          My point is that there is little point in arguing about such trends if they are trends in something that is not physically meaningful, as a result of ignoring the linear relaxation upon which the whole thing is based.

          The dT=k.dR kind of equation is the equilibrium of the linear relaxation model giving rise to the whole concept of lambda and climate sensitivity.

          Trying to ignore the fact that the data being examined contains not just dT = k.dR but also dT/dt = k2.dR, which is orthogonal to the former, means the whole exercise is just playing with numbers. Just more Shakun Mix climatology.

          One can pop up Excel and start fitting “trends” to anything, but until you have a credible model and a statistically acceptable reason for fitting a linear model it is meaningless.

          Regression of a linear model may produce a best estimate of the slope (linear relationship) under specific conditions. There seems to be a whole body of opinion in the field of climatology that believes you can always fit a “trend”, that it is always meaningful, and that OLS will always give you the best estimate of the linear relationship.

          Only the first of those assumptions is true. You can always fit a straight line.

        • Greg Goodman
          Posted Feb 14, 2015 at 10:48 AM | Permalink

          BTW the whole idea of GMST is another aberration. You cannot do an energy budget analysis where you are averaging the temperature of things with hugely different specific heat capacities.

          It’s like asking what is the average of an apple and an orange. The answer is a fruit salad.

        • Posted Feb 14, 2015 at 11:00 AM | Permalink

          Greg,
          Climate science, and how its results are presented, is full of simplifications that distort the reality in one way or another. In most cases that does not change the main message significantly, but there may be exceptions.

          GMST is, indeed, not as significant as an average calculated by dividing the total value of an extensive variable like energy by the total mass (or volume) of the material that carries that energy. Still, GMST is a reasonably good descriptor of global temperature changes.

        • Greg Goodman
          Posted Feb 14, 2015 at 11:55 AM | Permalink

          Thank you for that insightful remark. I think the constancy of the “message” is indeed the driving force behind the presentation “full of simplifications that distort the reality in one way or another”.

          The arguments I presented above, that you chose not to consider, explain how this has become bereft of physical meaning.

          Figure 2f shows that studying these models’ global statistics is just studying the output of a complex random number generator.

          What this study is showing is that the current divergence problem, serious as it is, is no worse than the models’ inability to reproduce past climate in general. Any divergence is, by definition, regarded as “internal variability”, and it is also well known that models are particularly bad at reproducing internal variability.

          They have been tuned to reproduce a very small segment of the historic record fairly closely, but we just have not been paying enough attention to the fact that models are as bad at reproducing earlier climate as they are at reproducing the post-2000 pause.

          We should probably credit the authors for pointing this out.

    • stevefitzpatrick
      Posted Feb 14, 2015 at 9:50 AM | Permalink

      Pekka,

      You have offered many thoughtful comments on this thread, and I thank you for that effort.

      However, I think you will find that it is going to be a very difficult (impossible?) task to convince many scientists and engineers working in other fields that the apparent circularity in the M&F paper (effectively using the same model ΔT on both sides of their equation via ΔF from Forster et al 2013) could ever generate anything but suspect (at best) or nonsense (at worst) results. This obvious circularity is the sort of thing that most everyone is taught NOT to do in an introductory course on statistics or experimental design… so as to avoid wasting time and effort in the production of meaningless results. You have on this thread comments from many experienced scientists, engineers, and statisticians all saying pretty much the same thing…. one just can't do a circular analysis.

      I have carefully read your comments, and there is nothing you have said which shows me M&F have addressed the question of circularity in their paper in a meaningful way, nor anything which makes me doubt that the most important conclusions drawn by M&F are highly suspect. Calling the obvious circularity in M&F ‘a red herring’ does not make that circularity go away. Further, if the circularity is removed mathematically (as I believe it must be to have a defensible analysis) then most of the M&F paper must disappear; the paper depends almost entirely on the circular analysis.

      Time will tell, of course, but if I were a gambler, I would not place a bet on the conclusions of M&F remaining credible in the future. I would bet that continued divergence between current CMIP5 projections and reality over the next decade or two will make M&F irrelevant, even if there is no published refutation. ‘Nature’ can continue to publish whatever suspect papers it wants. Reality will not read them.

      • Posted Feb 14, 2015 at 10:28 AM | Permalink

        Steve,

        They have not discussed the circularity, largely because their method does not introduce circularity. The model of Nic and RomanM is different, and it does have circularity. What tells us which model is better justified is not an issue of statistics but of physics; more precisely, in this case, an issue related to the properties of the models of the CMIP5 ensemble. M&F have presented arguments to justify their choice.

        I have in several comments described how the method of M&F is not circular. I have also explained where the model of Nic and RomanM differs, and why that difference makes it circular.

        • RomanM
          Posted Feb 14, 2015 at 2:18 PM | Permalink

          I will make one more stab at explaining the issue in terms which do not involve any statistical analysis.

          Let us suppose that we know all of the coefficients in the predictive equation exactly so there is no estimation involved. To simplify things we combine all of the predictors except ΔF into a single term A so they are less intrusive. Thus, we have:

          ΔT = A + b ΔF + ε

          where ε is understood as the effect of “weather” in the model.

          We will also suppose that ΔN is known, but ΔF must be calculated exactly by the equation ΔF = α ΔT + ΔN as in the M and F paper.

          Now suppose I am told the value of ΔT for a given situation. Calculate ε, and calculate ΔT0 = the value that ΔT would take if ε were 0.

        • Posted Feb 14, 2015 at 2:45 PM | Permalink

          RomanM,

          Why do you “suppose that ΔN is known, but ΔF must be calculated exactly by the equation ΔF = α ΔT + ΔN as in the M and F paper”, unless you accept that it must be calculated using the real ΔT that includes the residual and that can be obtained only from the original data?

          ΔF is supposed to be rather stable and unaffected by the internal variability, but ΔN and ΔT are affected by the internal variability. Therefore it's wrong to use the predicted ΔT in the determination of ΔF. It must be determined only once, from the CMIP5 data, to get it right.

        • Posted Feb 14, 2015 at 2:52 PM | Permalink

          An additional point is that moving from non-zero ε to ε = 0 does not change ΔF in a meaningful model; it changes ΔN, but in this analysis no attention is given to the new value of ΔN.

    • R Graf
      Posted Feb 14, 2015 at 10:04 AM | Permalink

      Whereas the value of a scientific paper is in its innovative reasoning to make a testable conclusion, do you feel that the approach by M&F is either innovative or allows for a testable conclusion? Are you saying the paper must stand because its critics could not improve upon it? Are you disputing Greg’s assertion that it in fact does not follow the actual physics well?

  83. John M
    Posted Feb 14, 2015 at 10:25 AM | Permalink

    In a comment upthread, Pekka Pirilä appears to make the case that physics or physical arguments can trump statistics or math.

    Yet when a physical proxy was used upside down, we were told that since the math worked out, criticisms were “bizarre”.

    It would be refreshing if the “community” were consistent in its use of physical meaningfulness as compared to mathematical meaningfulness.

    • Posted Feb 14, 2015 at 10:42 AM | Permalink

      Physics cannot overturn mathematics, but mathematics is a tool that must be used on correct concepts.

      Here the essential point is how the residual enters the equations, and this in turn depends on what the principal source of the residual is. Statistics or mathematics cannot answer these questions. In the case of physical systems, only physics can.

      • RomanM
        Posted Feb 14, 2015 at 11:22 AM | Permalink

        Pekka, I have just posted the comment below the ### on CLB. For some reason, it is currently under moderation. Your fixation on “a separate model” is misguided and indicates to me that you might not have had much experience with statistical theory. [Update: It is no longer under moderation at CLB.]

        ###

        The problem here is statistical in nature. It has nothing to do with physics or which variables are external or internal or when they are observable or what drives what. If there are flaws in the way the data reflects the “physics”, this is a different situation which should have been dealt with before ever submitting the publication.

        The authors have provided a data set and a statistical model which underlies the data and the relationships between those variables, and the analysis of that data is done within the context of the statistical model. From this juncture on, it is purely a mathematical and statistical problem.
        The basic relationship in this model is given by the equation

        ΔT = a + b ΔF + c α + d κ + ε

        It contains several unknown parameters and a variable ε (usually termed the “error”) which accounts for the “random” variation in the model.
        The intent of the analysis is to determine how much of the variable ΔT can be accounted for by a given set of other variables. In order to do this, the unknown parameters need to be estimated along with estimates of the values of ε. The authors have chosen to use Least Squares methodology to do this.

        The starting point for this analysis was an error sum of squares. Its format is not an arbitrary choice by the authors, but rather based on certain optimal properties of the solution within the model structure.

        SSE = ∑ε^2 = ∑[ΔT – (a + b ΔF + c α + d κ)]^2

        This quantity is minimized with respect to the unknown parameters a, b, c and d. From these we can estimate the values of ε and calculate the predicted values of ΔT along with the residuals = Observed( ΔT) – Predicted(ΔT).

        It should be pointed out that there is a distinction between the residuals and the estimated errors (http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics). In ordinary linear regression they are the same; however, this need not always be the case in least squares methodology.

        Now, it turns out that in this data set, there is an identity relating three of the variables: ΔF = α ΔT + ΔN. If we substitute this identity into the above SSE, and rearrange terms, we get

        SSE = ∑ε^2 = ∑[(1 – b α) ΔT – (a + b ΔN + c α + d κ)]^2

        This is not the sum of squares of a “new model”. It is exactly the same SS as that above, with exactly the same unknown parameters, exactly the same ε’s and exactly the same relationships between variables in the data set. Describing it as “a different model invented by the critics” indicates a lack of understanding of statistical models and of the mechanics of least squares methodology in general.

        Since the two sums of squares are just two representations of the same equation, the following principle seems to be quite evident. If the presence of the hidden relationship between ΔT and ΔF in the data has no effect, then minimizing the latter SS must produce the same estimates of the unknown parameters and ε’s as the former.

        The two minimizations do not produce the same results. In particular, the residuals for the latter SS are now dependent on the individual climate model’s α: res = ε’/(1 – b’ α), where the ’ denotes an estimated value. This clearly indicates that there is a systematic effect on the residuals due to α which has not been accounted for in the equation coefficient c. You will also note that in this case the residuals are in fact not the same as the estimated error terms.

        I have pointed out exactly where the shortcomings occur when applying the standard regression calculations to these data in the comment linked by Nic Lewis. (https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751723) The analysis using the revised form of the SS takes this into account, but unfortunately there are other effects present in the results (such as bias) which are due to the fact that the minimization procedure has become non-linear because of the circularity in the data.
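
        (A numerical check of the residual identity above. All numbers are synthetic stand-ins for the CMIP5 quantities, chosen arbitrarily; the point is only that res = ε’/(1 – b’ α) holds exactly for the fitted coefficients:)

        import numpy as np

        rng = np.random.default_rng(0)
        n = 36                                   # hypothetical ensemble size
        alpha = rng.uniform(0.8, 1.8, n)         # feedback parameter
        kappa = rng.uniform(0.4, 1.2, n)         # ocean heat uptake efficiency
        dN = rng.normal(0.5, 0.2, n)             # toy TOA imbalance trends
        dT = rng.normal(0.6, 0.3, n)             # arbitrary synthetic GMST trends
        dF = dN + alpha * dT                     # the identity relating the variables

        # OLS fit of dT on (dF, alpha, kappa), minimizing the first SSE above
        X = np.column_stack([np.ones(n), dF, alpha, kappa])
        (a, b, c, d), *_ = np.linalg.lstsq(X, dT, rcond=None)
        eps = dT - X @ np.array([a, b, c, d])    # estimated errors

        # Re-express the same fit via dF = alpha*dT + dN and solve for dT
        dT_pred = (a + b * dN + c * alpha + d * kappa) / (1 - b * alpha)
        res = dT - dT_pred                       # residuals of the dN form

        print(np.allclose(res, eps / (1 - b * alpha)))  # True: res depend on alpha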

        • stevefitzpatrick
          Posted Feb 14, 2015 at 12:09 PM | Permalink

          Roman,

          It does not matter how clearly you explain this problem; some people will not or cannot think it through and draw the obvious conclusion: the results of M&F are highly doubtful, and likely non-informative (AKA wrong).

        • Posted Feb 14, 2015 at 12:38 PM | Permalink

          At CLB I wrote the following in response to RomanM (three successive comments):

          RomanM,
          The model is different, because you assume that the new ΔT obtained from the model should be used to determine ΔF, while the M&F assumption is that the estimate of ΔF given by the original data from the CMIP5 database is the one that should be used at every later step.

          Which of the two alternatives is the correct choice is an issue of physics, not of statistics.

          (This was preceded by a comment of aTTP)

          In slightly different words.

          The assumption of M&F is that ΔF for each model run is obtained from ΔN and ΔT of that model run. All these values come from the CMIP5 database. They do not vary during the determination of the model. There’s explicitly no feedback.

          When the model has been determined it’s taken to be a model that links ΔF to the estimates of ΔT.

          No analysis is done related to the values of ΔN any more. ΔN is not part of the further analysis, and it cannot be part of a feedback equation.

          A few more words about the physics.

          The TOA imbalance is almost identical to the net energy flux into the ocean, because the heat capacity of the atmosphere is small. The net heat flux into the ocean varies rather strongly due to the El Niño–La Niña variability and other forms of variability that are present also in the models. Therefore N is not very stable. F is expected to be more stable. That's possible because surface temperatures vary due to the same processes that cause N to vary.

          Whether the values of F calculated from the formula are, indeed, more stable than the values of N can be checked. The authors write in their response

          Not correcting for the increased back radiation would, on physical grounds, imply using N, which contains the very contribution from the surface response T that we must eliminate in our estimate of F.

          The paper Forster et al (2013) contains time series of the forcing obtained by this approach, but not those of the TOA imbalance to compare with.

        • Frank
          Posted Feb 14, 2015 at 12:55 PM | Permalink

          Roman: It may again be worth pointing out that a more physically relevant regression model would be:

          ΔT = a + b ΔF + c /(α + κ) + ε

          If Nic is right about circularity, will the b ΔF term account for all of the variance?

        • RomanM
          Posted Feb 14, 2015 at 2:27 PM | Permalink

          Frank, you may very well be right about the inappropriateness of the choice of model in the paper. However, the problem with the circularity would still remain.

          No, the b ΔF term would not account for all of the variance, because the effect of the variable ΔT is masked within ΔF by ΔN.

      • HAS
        Posted Feb 14, 2015 at 3:54 PM | Permalink

        Pekka

        Sorry to harp on about this, but if M&F choose to use F as reported by Forster, then they have to use the uncertainty term he reports too. In reality Forster doesn't report F for any particular T and N; he reports N + αΔT. M&F should be using the latter.

        On your own logic M&F are diagnosing what happens in the model database (thus justifying ignoring uncertainty in it); F isn’t in it and is derived from it and has uncertainty when so derived. In being derivative it is no different from internal variation so needs to get the same treatment.

        Could I too ask that you put your arguments in equations? It removes ambiguity.

        • Posted Feb 14, 2015 at 4:11 PM | Permalink

          HAS,

          I have said in several comments that I’m not defending the paper more generally. I have expressed doubts on its accuracy and reliability.

          The reasons for my doubts include the crudeness of the linear regression model, which is justified as an approximation of nonlinear formulas that cannot be approximated well by a linear regression model over the relevant range of variables. They also include the uncertainties in the determination of ΔF, as well as other uncertainties in the validity of the input assumptions, such as the constancy of α and κ for each model.

        • HAS
          Posted Feb 14, 2015 at 4:58 PM | Permalink

          Hi Pekka

          However above you clarified your defense of it (to the extent it exists) in terms of M&F just demonstrating attributes of the model database. My point is that even on that very narrow interpretation their methodology isn’t fit for purpose.

        • Posted Feb 14, 2015 at 5:20 PM | Permalink

          HAS,

          I wouldn't say that it isn't fit for the purpose. I don't have enough evidence to say that, but I can say that I'm not at the moment convinced of its value.

      • Greg Goodman
        Posted Feb 14, 2015 at 4:14 PM | Permalink

        It is not the tool’s fault if it is being abused.

        Mathematics is one of the highest forms of human intellectual activity and it is one of the few weapons that humanity has had in order to conquer the threats around it.

        Yanis Varoufakis.

  84. MikeN
    Posted Feb 14, 2015 at 11:24 AM | Permalink

    Surprised to see this post have so much comment.

  85. Greg Goodman
    Posted Feb 14, 2015 at 11:28 AM | Permalink

    Fig 2f from the paper, showing the distribution of the regression residuals, looks to be a fairly classic Gaussian distribution.

    This seems to support a result that Willis Eschenbach reported a couple of years ago: that despite their immense complexity, all the models are really doing is adding random noise to a linear trend. As I already commented above, it seems that despite all the huffing and puffing and the inordinate investment of time and money, we are still stuck with the naive paradigm of “trend” + “noise” of 30 years ago.

    • R Graf
      Posted Feb 14, 2015 at 5:49 PM | Permalink

      Greg, I cited your comment at CLB about the complex non-linearity of the feedback variables. I had a similar thought in a comment here 4 days ago, but you make a much better authority. You should post a composition of your arguments at CLB. It will sit in moderation if it is your first post and go live within 12 hours. I am wondering if you can tell me how the error span on the GMT projection graphs gets calculated. Is the M&F paper saying that the error gap is wide enough now to accommodate the pause, or does their result necessitate the widening of the error wedge now plotted?

  86. Posted Feb 14, 2015 at 1:08 PM | Permalink

    As part of an exchange with ATTP several days ago (above at https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751228), I speculated about the climate forcing/feedback contribution of clouds and whether it might, in some fashion, contaminate M&F's analysis of feedback sensitivity (and simultaneously the model calculations of forcing).

    It appears that Piers Forster got there first. He was co-author of “Cloud Adjustment and its Role in CO2 Radiative Forcing and Climate Sensitivity: A Review.” http://www.see.ed.ac.uk/~shs/Climate%20change/Geo-politics/IAGP/Forster%20cloud%20adjustment.pdf

    Forster, et al., discuss the concept of non-feedback cloud adjustments to radiative forcing which are distinct from the aerosol-induced cloud formation forcing that is always included in climate models. The paper also mentions that at least a few models now account for various estimates of these additional cloud adjustments (non-aerosol forcing).

    Even after reading his interesting review, I’m still uncertain whether it matters with regard to my original speculation. But I wanted to post the link here in case anyone else cared to investigate cloud feedback/forcing estimates further.

    • Posted Feb 14, 2015 at 3:45 PM | Permalink

      In addition, M&F (2015) relies upon Forster et al. (2013) (http://onlinelibrary.wiley.com/doi/10.1002/jgrd.50174/epdf ) to reconstruct radiative forcing from top-of-atmosphere radiative imbalances for a subset of CMIP5 models. Forster et al. (2013) explicitly considers rapid cloud adjustments in response to temperature changes due to initial forcing. So it appears that they considered the situation I was concerned with to some extent (at least for a portion of the CMIP5 ensemble), even though that is hard for me to determine from M&F (2015) itself. Indeed, at one point M&F state:

      That [alpha] and [kappa] in the CMIP5 models might vary with time and climate state is ignored here.

  87. Andersongo
    Posted Feb 14, 2015 at 1:34 PM | Permalink

    I seem to understand Pekka's point. The dF value from the second equation is not fed back into the first one but rather obtained from CMIP5 data. However, I still fail to see how the analysis is not circular. If the estimated value of dF from CMIP5 is a really good estimate (that is, dF is indeed dF), and the second equation linking dF to dN and dT holds true, then the model thus obtained has an inbuilt circularity. It does not matter where the values of dF come from as long as, in the context of this analysis, they are still regarded as legitimately representing dF.
    Pekka's point about dF not being calculated from dN and dT seems to bear no importance since, on M&F's main assumptions, the dF from CMIP5 (the M&F model) and the dF from calculations using dN and dT (purportedly RomanM's model) are one and the same.

    • Kenneth Fritsch
      Posted Feb 14, 2015 at 3:07 PM | Permalink

      I would think that for Pekka’s responses to RomanM to proceed he would have to demonstrate his arguments in mathematical form as RomanM has done in showing the effects of circularity.

      • Posted Feb 14, 2015 at 3:16 PM | Permalink

        Kenneth,
        There's no extra mathematics. That's exactly the point. Nic and Roman have made extra assumptions that make the model more complicated. They have introduced circularity where there isn't any.

        All input to the calculation of ΔF is from the CMIP5 database. It never changes, and therefore does not create any extra mathematics on top of the basic defining formula.

        I have presented the basic formulas very many times, and so have Nic and Roman, but they have not stopped at that; they have added the circularity.

        • Posted Feb 14, 2015 at 4:40 PM | Permalink

          I have had more than one request to present the formulas.

          They are just these two

          ΔT = a + b ΔF + c α + d κ + e, (1)

          which is the regression model with e as residual, and

          ΔF = ΔN(CMIP5) + α(CMIP5) ΔT(CMIP5), (2)

          which is used only once for every CMIP5 model run included and every period, using ΔN, ΔT, and α determined from the CMIP5 database. (“CMIP5” in the formula is added to emphasize that.) This step is done in Forster et al (2013).

          The first equation is analyzed in the M&F paper by first estimating the coefficients by OLS and then calculating each of the components, as well as the residuals, for use in the graphics and other reported results. Actually the values of all variables are fixed through the whole calculation when the formula is written as in (1); only the regression coefficients and residuals are output values determined by the process.

          The main point that differs from Nic's arguments is that the only consistent way of using (2) is to use it only once, before the analysis. It is not expected to be valid when ΔT is not the full value from the same model run as ΔN. Therefore it cannot be used with a value of ΔT estimated from the regression formula.

        • RomanM
          Posted Feb 14, 2015 at 4:52 PM | Permalink

          So you are saying that equation (1) is NOT ΔT(CMIP5) = a + b ΔF + c α + d κ + e.

          OK, I’ll bite. How is the ΔT in equation (1) different from the ΔT(CMIP5) in equation (2)?

        • Posted Feb 14, 2015 at 5:01 PM | Permalink

          No, I wrote Actually the values of all variables are fixed through the whole calculation. Thus it is the same value when the residual is included in the formula.

          The formula without the residual could be used to predict the value of ΔT for any chosen value of ΔF, α, and κ. Thus the formula is valid without the residual for any value within a range bound by some limits of applicability. In that sense it differs from (2), which cannot be used at all for values other than those from the CMIP5 database, because the full ΔT including “weather” is available for those values only.

        • RomanM
          Posted Feb 14, 2015 at 5:35 PM | Permalink

          No, I wrote Actually the values of all variables are fixed through the whole calculation.

          Thus the formula is valid without the residual for any value within a range bound by some limits of applicability. In that sense it differs from (2), which cannot be used at all for values other than those from the CMIP5 database, because the full ΔT including “weather” is available for those values only.

          Is this climate science™ statistics?

        • Posted Feb 14, 2015 at 5:51 PM | Permalink

          Roman,
          I have realized that I could have written the formulas more systematically, essentially as you wrote in your reply to Frank, or just copying the formula (4) and the preceding unnumbered formula from the paper of M&F. (The latter is also formula (3) in Nic’s post.)

          That does, however, not change the content, which is very simple and straightforward.

        • stevefitzpatrick
          Posted Feb 14, 2015 at 6:35 PM | Permalink

          Pekka,

          Your comment posted Feb 14, 2015 at 5:01 PM seems a) not to answer Roman's question, and b) to be impossible for me to understand. Please clarify: is or is not ‘ΔT(CMIP5)’ in equation #2 the same as ‘ΔT’ in equation #1?

        • R Graf
          Posted Feb 14, 2015 at 6:40 PM | Permalink

          Hi Pekka, I agree with you that if the forcing derived at each time interval and for each model is identical to the forcing used in each realization (model run) of that model, whose feedback settings were also identical, then the T would cancel symmetrically. Is that what you are saying?

        • Posted Feb 14, 2015 at 6:49 PM | Permalink

          The forcings in that analysis are always the ones extracted from CMIP5 data.

          The regression formula could be used also for other externally specified forcings, but the analysis of M&F does not involve such use.

        • Posted Feb 14, 2015 at 6:53 PM | Permalink

          I may try to clarify some points later, but not anymore tonight.

  88. R Graf
    Posted Feb 14, 2015 at 3:23 PM | Permalink

    I just posted this on CLB:

    Although there is disagreement on whether avoidance of statistical orthodoxy can be excused, as I believe some are saying, by the circumstances of the physics being represented, I think it is universally agreed to be important that the physics be accurately represented mathematically. It has been pointed out by Greg Goodman at https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-751840 that the feedbacks in the models are not linear relationships. The ocean surface heat flux is known to oscillate, as well as to become less responsive as temperature equilibrium with the air is approached. In addition, the change in TOA imbalance (TOAI) is itself non-linear, as shown in the NASA graph at the link. Indeed, this complex relationship made the forcing’s derivation a difficult task, as the authors stated. And it is another point of uncertainty whether all the model assumptions and the authors’ interpretations were correct.

    Do the authors not agree that the models, as complex as they have become, do not approach nature’s complexity, which is yet to be modeled? In fact, the authors conclude that this unknown portion, dubbed “natural variability,” dominates the models over the 15-year period. But isn’t it true that the models have been constructed with 56 groups of guesses based on trying to duplicate the behavior of GST over the past 150 years? And, as Nic Stokes pointed out, since most of the models have the ocean oscillations out of phase with each other, isn’t the result basically a linear guess of forcing mixed with artificially generated amplitude of noise? Are we to understand that the purpose of this paper, and of all the work being done to analyze its validity, is in the end to see whether the models have had enough noise added or whether they need more?

  89. Kenneth Fritsch
    Posted Feb 14, 2015 at 8:53 PM | Permalink

    I think I understand Pekka to be saying either that it is sufficient to show that, without “feedback” changing any variable values during the regression, there is no circularity, or maybe that there remains circularity but with no or little effect on the regression results – I am not sure which.

    As a layperson I was not aware that once a regression calculation commences the values of the variables could change (without some kind of iteration process), and that would be the case with or without circularity. The damage of circularity would appear to me to have been done before the regression is calculated. Are there any examples in the literature that would pertain to this particular discussion?

    I found this one at a web site, with reference to the inflated R² value for regressions involving GDP on both sides of the regression equation.

    https://statswithcats.wordpress.com/2011/04/24/regression-fantasies-part-ii/
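
    That inflated-R² effect is easy to reproduce with synthetic data. Here is a minimal sketch (Python, hypothetical numbers) in which the “response” is pure noise yet appears inside its own predictor:

      # The response y is pure noise; the predictor x is constructed from y
      # itself, so OLS reports a healthy R^2 that means nothing.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 75
      y = rng.normal(size=n)          # pure-noise "response"
      z = rng.normal(size=n)          # genuinely independent quantity
      x = z + y                       # predictor built using y itself

      X = np.column_stack([np.ones(n), x])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      resid = y - X @ beta
      print("R^2 =", round(1 - resid.var() / y.var(), 2))   # ~0.5 by construction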

    • Posted Feb 15, 2015 at 4:04 AM | Permalink

      Kenneth,

      That’s a correct interpretation of what I have written.

      There’s no circularity of the type discussed by Nic, if we assume that the difference ΔF – ΔN is determined by the actual surface temperature of the CMIP5 model run. This assumption is physically natural as an approximation and has been studied by Piers Forster and his collaborators in many earlier papers. The relationship has been found to be very good in some models, worse in others (the observations I know of are based on earlier CMIP3 runs; Forster and Taylor 2006).

      It’s quite possible that using both the observed and the predicted temperatures would explain the difference slightly better. If that’s a real phenomenon (not only a spurious signal that comes out of every regression at a statistically insignificant level) we would have a feedback related to this correction. Such a feedback would almost certainly be so weak that it adds little to the other uncertainties of the approach.

      It’s worth noting that Piers Forster is a real expert on these issues. He has already studied most of the issues we might propose as amateurs in this field. A real expert may make misjudgments, but amateurs are much more likely to make them.

      • Posted Feb 15, 2015 at 6:17 AM | Permalink

        Pekka,

        Your claim “There’s no circularity of the type discussed by Nic” is plain wrong. One day you will I am sure realise that. There is no point you just repeating your claims based on approximate physically-based relationships.

        There clearly is an element of circularity of the type I have pointed out. But it is certainly possible that other shortcomings in the methods used, of which I pointed out several in my article, may be equally or even more important. The decision to analyse overlapping 62 year trends rather than the trend over the entire analysis period also makes a significant difference.

        Piers Forster is indeed an expert in forcings. However, I do not think he would claim to be an expert in statistical methods.

        • Posted Feb 15, 2015 at 6:58 AM | Permalink

          Nic,

          Defining the problem is an issue for the subject science – climate science, and its subfield of climate modelling, in this case. Solving the problem is an issue for the method science. The subject science in this case defines the problem in such a way that there isn’t any feedback. That’s explicitly true. When you follow the steps, circularity never enters.

          I have explained at what point you erroneously introduce circularity into the calculation.

          You define your own problem, which is not the same one M&F analyze. As experts in the subject science they have justified their choices. You refer to statistics, but statistics has nothing to say on this point. Referring to it is a moot argument.

          You have not presented a single argument to show that their choice is not well justified on that issue. You have discussed other issues that I also consider relevant and that Marotzke and Forster have not contested.

        • Tom Gray
          Posted Feb 15, 2015 at 7:35 AM | Permalink

          Would it be possible to set up simulations with synthetic data to quantify any effect here? That would be much more useful than appeals to authority such as

          It’s quite possible that using both the observed and the predicted temperatures would explain the difference slightly better. If that’s a real phenomenon (not only a spurious signal that comes out of every regression at a statistically insignificant level) we would have a feedback related to this correction. Such a feedback would almost certainly be so weak that it adds little to the other uncertainties of the approach.

          It’s worth noting that Piers Forster is a real expert on these issues. He has already studied most of the issues we might propose as amateurs in this field. A real expert may make misjudgments, but amateurs are much more likely to make them.

          Nature and mathematics can confound even the most erudite of experts.
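
          As a rough sketch of such a synthetic-data experiment (all numbers hypothetical, and the setup deliberately reduced to a single forcing variable):

            # Synthetic test of the disputed circularity (illustration only).
            # A true forcing trend F_real drives T plus internal variability e;
            # the forcing is then diagnosed from N and T, Forster (2013)-style,
            # and T is regressed on the diagnosed forcing.
            import numpy as np

            rng = np.random.default_rng(1)
            n, alpha_true, b = 10000, 1.1, 0.6
            F_real = rng.normal(3.0, 0.5, n)     # true forcing trend per run
            e = rng.normal(0.0, 0.3, n)          # internal variability in T
            T = b * F_real + e                   # simulated temperature trend
            N = F_real - alpha_true * T + rng.normal(0, 0.1, n)  # TOA imbalance

            def r2(x, y):
                X = np.column_stack([np.ones_like(x), x])
                beta, *_ = np.linalg.lstsq(X, y, rcond=None)
                return 1 - (y - X @ beta).var() / y.var()

            for alpha_diag in (alpha_true, 0.8): # exact vs misspecified feedback
                F_est = N + alpha_diag * T       # diagnosed forcing
                print(alpha_diag, round(r2(F_est, T), 3))

          If the diagnosis uses the model’s exact feedback, the αT terms cancel and regressing T on F_est behaves like regressing it on the true forcing; with a misspecified feedback, part of T leaks into F_est and the apparent fit is inflated. Either way, it quantifies the effect instead of asserting it.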

        • Posted Feb 15, 2015 at 8:07 AM | Permalink

          Tom,

          Only the full climate models themselves can tell reliably about their properties. Because new model runs are possible, and because it is possible to collect more data from the new model runs, it is possible, in principle, to find the answer. In practice this is probably not a sufficient reason for making those model runs.

          Climate models will be run again, and more data will be collected. There are surely many proposals for the additions to the set of data that gets collected from the next round of model runs.

        • Tom Gray
          Posted Feb 15, 2015 at 9:10 AM | Permalink

          I was considering the significance, or lack of it, of the circularity and of the approximation used to separate α and κ. These effects have been claimed by some to be significant and by others to be trivial. It would be useful to see a quantitative analysis concentrating on the mathematics used in the M&F paper, which would clarify whether the M&F procedures are able to produce useful results.

        • Posted Feb 15, 2015 at 9:30 AM | Permalink

          Tom,

          My answer applies fully to that. Assuming (counterfactually) that all the required model runs had been done and all relevant data collected, it’s possible to figure out whether the difference ΔF – ΔN correlates well with the ΔT from the same model runs (ΔT(CMIP)) or has a better correlation with the ΔT predicted by an appropriate regression model (ΔT(Regr)) derived from the model runs.

          If it correlates well with ΔT(CMIP) and adding ΔT(Regr) as an additional explanatory variable does not significantly improve the ability to predict ΔF-ΔN, then the additional calculations prove that M&F are right and no circularity is observed. In the opposite case circularity seems to be present.

          The earlier work that Piers Forster and others have done tells us that their assumption is justified. It does not prove that it’s the best that could be done, but lacking further information it’s the natural choice to make. There isn’t any justification to modify the model in the way Nic has done. That’s actually an extreme modification, where only the predicted ΔT is taken into account, and that extreme modification is almost certainly worse than the assumption of M&F.

      • HAS
        Posted Feb 15, 2015 at 3:52 PM | Permalink

        Hi Pekka

        Are you basically saying that T actual doesn’t appear anywhere in the derivation of F (right back through Forster and into the model outputs used), so there is no circularity when T actual gets to be used by M&F to estimate internal variability?

        • Posted Feb 15, 2015 at 4:03 PM | Permalink

          HAS,
          The values of T given in the CMIP5 database are used to calculate F. When that’s done F is fixed as if it had been listed in the database and taken directly from there. No calculated value of T enters at any stage in the determination of F, only the one taken from the database.

          Another related issue is that each value of N is used only once, when F is determined, and that was done in Forster (2013). N enters nowhere in M&F, and that’s as it should be.

        • HAS
          Posted Feb 15, 2015 at 4:43 PM | Permalink

          Pekka

          I think you answered “yes”?

  90. Greg Goodman
    Posted Feb 15, 2015 at 4:27 AM | Permalink


    The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale.

    So what they find is that models do such a bad job of reproducing anything on a scale of 15 years or less that the errors swamp everything else. This is not particularly surprising, since this is mostly injected noise to make the output look more “climatey” in the absence of any meaningful short-term modelling of climate.

    For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations.

    They have already noted that error swamps the shorter records, so including them in “either”, though semantically accurate, is misleading. The shorter records are not informative of anything except their own lack of value.

    For the longer records, this simply reflects the degree of selection bias that is present. Models that are presented in CMIP have all been pre-selected and tweaked to reproduce the climate record as well as they can. But there is nothing here that the orthodoxy would regard as “perturbed physics experiments”.

    That there is “no traceable imprint” of the model climate feedback simply demonstrates that the range of values used in the sub-selection of models studied is so constrained as to be dominated by other variability and errors in the models.

    This in no way proves or even suggests those values are correct. It does show that there is such a degree of selection bias in the models chosen that they are totally uninformative about the significance of their climate sensitivity on any time scale.

    Since the main point of interest is CS, this means that the CMIP group of models is totally uninformative on this question and thus not fit for that purpose.

    • R Graf
      Posted Feb 15, 2015 at 10:12 AM | Permalink

      Quoted from the last paragraph of the Max Planck news release: “The community of climatologists will greet this finding with relief, but perhaps also with some disappointment. It is now clear that it is not possible to make model predictions more accurate by tweaking them — randomness does not respond to tweaking.”

      • R Graf
        Posted Feb 15, 2015 at 10:34 AM | Permalink

        A robust science community would have appealed to these model experts to declare a re-evaluation of the (5–95%) error band – not that it has not been a moving target, and it will continue to be one as the models are improved (hopefully based on better physics rather than emulation). But as this study has simply expanded the uncertainty window by an undefined amount, it provides no value (IMO).

  91. Greg Goodman
    Posted Feb 15, 2015 at 5:52 AM | Permalink


    Yanis Varoufakis talking about mathematical economics:

    When the Delphic oracle made predictions that did not come about … the powers that be went back to the Delphic oracle for another interpretation of why the previous prediction was false.

    So the problem with mythology is that there can be no event, no observation, that can overthrow it, because myths are reproduced simply because the web of beliefs is such that those who believe in it will always try to explain the failure of their system of beliefs to explain, by appealing to the same system of beliefs.

    • David Holland
      Posted Feb 15, 2015 at 6:20 PM | Permalink

      Yanis Varoufakis also talking about mathematical economics:

      [It’s] not science – religion with equations.

  92. Kenneth Fritsch
    Posted Feb 15, 2015 at 10:29 AM | Permalink

    As I am putting all the data I have into a form to attempt to duplicate, as best I can, the work of M&F, I have noted from past analysis that the deterministic part of the temperature series (the secular trend) for CMIP5 models tends to be very much alike for duplicate runs of the same model. The variations in trends resulting from duplicate runs of the same individual CMIP5 model would be very much the same for the deterministic part, while these runs can produce very different trends from the noise – or natural variations, as termed by M&F – component of the temperature series. Based on the use of duplicate runs in the M&F analysis, I am wondering what effect these differences in deterministic and noise components within a model would have on the results.

    • Posted Feb 15, 2015 at 10:56 AM | Permalink

      Kenneth,

      Using the same model and same input data on all externally determined factors that cause forcings should result in the same predicted temperatures and same estimated forcings (ΔF) if the method works perfectly.

      Thus your observation seems to be a partial confirmation of the correctness of their approach and evidence that there isn’t circularity that would result in erroneous results.

      • RomanM
        Posted Feb 15, 2015 at 11:26 AM | Permalink

        This would be the case for any regression. It cannot confirm anything.

        Apparently you are not aware that the correctly done analysis also has the same property. This should have been obvious due to the fact that the same sum of squares is being minimized with respect to the same variables.

        • Posted Feb 15, 2015 at 11:49 AM | Permalink

          RomanM,

          Let’s have a closer look.

          We have two model runs based on the same external input, but with different internal variability, i.e. with different residuals, different ΔT and different ΔN.

          As the calculated temperatures are nearly the same, and the regression coefficients are common, we can conclude that also the ΔF values are essentially the same. As the internal variability and residuals are different, we know that the observed ΔT values and ΔN values are different in such a way that their differences cancel in the calculation of ΔF.

          That’s exactly the no-circularity M&F case. That case is not consistent with the assumption that the difference ΔF-ΔN is determined by the predicted ΔT.

        • davideisenstadt
          Posted Feb 15, 2015 at 12:20 PM | Permalink

          Roman:
          First, a compliment…you are patient and competent and articulate.
          Now, if the idea of the same variable showing up on both sides of the equation in a regression scheme doesn’t strike one as… incorrect, then where can you go with the dialogue?
          People who should know better, who would know better, feel it’s OK to violate the basic assumptions that form the foundation of statistical analysis.
          The concept of ensemble means, as if the models were independent…
          Using linear regression to model nonlinear functions….
          Using decentered PCA,
          looking at paleoproxies right side up, or upside down, depending on what gives you a better R-squared,
          these are the hallmarks of this field today.
          It’s really no different than marketing derivative investments made up of portfolios of sub-prime mortgages, and arguing that the performances of all those sh#tty pieces of paper were actually independent of each other.
          One violates the basic assumptions necessary for linear regressions at one’s own risk, or in this case our own risk.

        • Posted Feb 15, 2015 at 12:39 PM | Permalink

          The same variable on both sides leads to an equation that must be solved for that variable. The same constant value on both sides leads to nothing special.

          For this analysis and for the physical model taken as starting point we have the same constant values on both sides during the task of determining the regression coefficients. That the constant values are marked by a symbol does not make their values vary as in an equation to be solved.

          In the phase of producing predictions from the resulting formula we have directly ΔF on the right hand side. ΔN does not enter, neither does the variable ΔT appear on the right hand side. In this phase ΔF appears only on the left hand side.

        • sue
          Posted Feb 15, 2015 at 12:59 PM | Permalink

          Pekka, aren’t you surprised that the authors of the paper are not here nor at the lab defending their work?

        • davideisenstadt
          Posted Feb 15, 2015 at 1:02 PM | Permalink

          pekka
          I appreciate your engagement on this issue… we just don’t see eye to eye, as it were…

        • davideisenstadt
          Posted Feb 15, 2015 at 1:04 PM | Permalink

          Delta T isn’t a constant, it’s a variable… used to compute an approximation of F, and then used on the left side of the equation in a regression scheme.

          that you can’t, or won’t see this, is puzzling, not only to me, but to others who still teach in the field we are discussing.

        • Layman Lurker
          Posted Feb 15, 2015 at 1:05 PM | Permalink

          Pekka:

          That’s exactly the no-circularity M&F case. That case is not consistent with the assumption that the difference ΔF-ΔN is determined by the predicted ΔT.

          You have me confused. By predicted dT I presume you mean the fitted values? It has never been suggested that the fitted values determine the dT used in Forster’s derivation of dF, has it? What has been suggested is that model dT is regressed on a linear function of model dT. Am I missing something?

  93. R Graf
    Posted Feb 15, 2015 at 12:48 PM | Permalink

    Pekka, Thanks for your herculean devotion to help here. Just a few questions:

    Wasn’t the difficult task and accomplishment of Forster (2013) the diagnosis of F by having not just to consider the TOA imbalance supplied by the model input spec, but to make adjustments to F to account for the feedbacks too? I think the confusion is that Forster (2013) tried to strip away the feedbacks from what was labeled AF (adjusted forcings) to get back to RF (radiative forcing). If he had not, there would have been complete circularity. The extent to which he was successful could perhaps aid the 2015 paper’s objective (which is puzzling IMO). Now, to the extent that Forster fails in his approximation of the true RF, that amount gets attributed to “natural variability,” which is not so bad except that this is exactly what M&F (2015) are trying to quantify. I think if you clarify this it will answer a lot for a lot of people. Here is the link to Forster (2013); the forcing issue is on lines 40–60ish. http://www.atmos.washington.edu/~mzelinka/Forster_etal_subm.pdf

  94. Posted Feb 15, 2015 at 2:51 PM | Permalink

    I will try to explain once more how the M&F approach proceeds, and what the physical assumption is that makes it free from circularity. It is, indeed, dependent on an assumption or hypothesis, but this hypothesis is justified by physical understanding and earlier research.

    The alternative that Nic proposes is based on a different hypothesis; that different hypothesis lacks comparable support and is actually not consistent with observations, evidently including also the calculations of Kenneth.

    ==========

    We start with the model run ensemble CMIP5. The essential part of that ensemble consists of 75 model simulations of the climate history done using 18 different models. That includes 10 simulations from each of two models, fewer from the rest, and only one from 4 models. In all model runs, input data on the factors that cause forcings is used to make the history correspond approximately to the real history of forcings.

    Model results from other calculations done with the same models (like determining what results from quadrupling the CO2 concentration) have been analyzed to determine the parameters α and κ for each model. These parameters are equal for every model run of the same model.

    Now we switch to the history runs. Some of the calculated values are collected and stored in the CMIP5 database. For the 75 model runs included in this analysis those values include surface temperature T and TOA imbalance N, but they do not include forcings. (Other models were dropped due to missing data.) Forster et al determined Adjusted Forcings (called Effective Radiative Forcings, ERF, in M&F). These forcings can be approximately determined from the TOA imbalance by subtracting the influence of warming of the surface relative to a reference period. The surface radiates more the warmer it is at the moment. That doesn’t depend on the cause of the temperature. Internal variability is of equal importance to longer-term trends. Therefore the right subtraction is based on the actual temperature that is found in the database. The coefficient that applies to this calculation is α:

    F = N + αT   (1)

    Here the values are deviations from the reference period, where all are 0 by definition.

    It’s essential to notice that the logic of this subtraction requires that T is the real temperature at the moment, as stored in the CMIP5 database. It cannot be changed or recalculated by the regression model without making the formula fail. The formula is not exact, but it definitely fails badly if a value of T that’s influenced strongly by internal variability is replaced by the estimated average value for that particular time without the contribution of the variability. Emission does not know about some average; it’s determined by the actual temperature. I emphasize this point so much because this is the source of the controversy here.

    Now it is possible to start the regression analysis. M&F present the hypothesis that the average surface temperature excluding influence of internal variability varies in the different models so that the trend of the model i over a given period j can be estimated from the linear formula

    ΔT[i,j] = a[j] + b[j]ΔF[i,j] + c[j]α[i] + d[j]κ[i]    (2)

    We see from the indices that the coefficients a, b, c, and d are the same for every model, when the period is the same, but different for each period. α and κ are different for each model, but the same for every period. ΔT and ΔF depend on both the period and the model.

    This formula gives a prediction for the temperature trend. That’s the expected average temperature for the case, if the model is correct. The coefficients a, b, c, and d are determined by the requirement that the combined deviation of the predictions from the observed values of the database is as small as possible. In practice this means that the sum of squares of the deviations is minimized (this is OLS, or ordinary least squares, analysis). I denote the values found by the capital letters A, B, C, and D. Now we can calculate for each case the predicted value

    ΔTpred[i,j] = A[j] + B[j]ΔF[i,j] + C[j]α[i] + D[j]κ[i]   (3)

    We can also estimate, how much internal variability has contributed to the observed values as the difference

    ε[i,j] = ΔT[i,j] – ΔTpred[i,j]   (4)

    where ΔT[i,j] is the observed value.

    Where is the circularity? It’s not anywhere in this calculation, and this calculation is correct when it is assumed that forcings must be determined by formula (1) using the real observed temperatures – as I argued they must be. Nic introduced circularity by assuming that forcings must be redetermined from the TOA imbalance using the predicted temperatures. When that assumption is put into the formulas, ΔTpred[i,j] occurs on both sides of the formula that replaces formula (3), but that’s wrong. Forcings must be calculated using observed temperatures that contain the contribution of internal variability and that correspond to the same physical case as the value of N, not from predicted temperatures that contain only the average, which didn’t really occur and which would have changed N if it had occurred.
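
    For concreteness, the three steps just described can be written out for a single period j with synthetic placeholder inputs (the coefficients 0.5, –0.2 and –0.1 below are invented, not taken from the paper):

      # Sketch of the described procedure: OLS fit of (2), predictions (3),
      # residuals (4) read as internal variability. All inputs are synthetic.
      import numpy as np

      rng = np.random.default_rng(2)
      n_runs = 75
      dF    = rng.normal(2.5, 0.6, n_runs)   # ΔF[i,j] for one fixed period j
      alpha = rng.normal(1.1, 0.3, n_runs)   # α[i], one value per model
      kappa = rng.normal(0.7, 0.2, n_runs)   # κ[i], one value per model
      dT = 0.5 * dF - 0.2 * alpha - 0.1 * kappa + rng.normal(0, 0.1, n_runs)

      X = np.column_stack([np.ones(n_runs), dF, alpha, kappa])
      coef, *_ = np.linalg.lstsq(X, dT, rcond=None)  # A, B, C, D of (3)
      dT_pred = X @ coef                             # predicted trends, (3)
      eps = dT - dT_pred                             # residuals, (4)
      print("A, B, C, D =", np.round(coef, 3))
      print("std of eps  =", round(float(eps.std()), 3))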

    • Kenneth Fritsch
      Posted Feb 15, 2015 at 3:57 PM | Permalink

      “Where is the circularity?”

      In equation 1.

      M&F use primed variables to indicate variation from the ensemble mean and they regress on that value. Do you agree? Not that it changes the circularity issue.

    • stevefitzpatrick
      Posted Feb 15, 2015 at 4:02 PM | Permalink

      The circularity is obvious if you substitute for ΔF[i,j] using the Forster et al formula ΔF[i,j] = ΔN[i,j] + αΔT[i,j], which is equivalent, but using your clearer notation.

      Now when you calculate the error:
      ε[i,j] = ΔT[i,j] – ΔTpred[i,j]
      The actual calculation is:
      ε[i,j] = ΔT[i,j] – ( A[j] + B[j]{ΔN[i,j] + αΔT[i,j]} + C[j]α[i] + D[j]κ[i] )

      In other words, the predicted ΔT depends on the modeled ΔT, and the calculated ‘error’ comes from subtracting a linear function of the modeled ΔT from itself. In other words, circular.

      • Posted Feb 15, 2015 at 4:13 PM | Permalink

        Steve,

        I did my best to explain that it’s wrong to substitute for ΔF[i,j] a formula that contains any other ΔT[i,j] than the one obtained directly from the database. The database contains the value of ΔT[i,j] that gives the best estimate of the difference between ΔF[i,j] and ΔN[i,j].

        Because that substitution was already done in Forster (2013) and can never change, the whole formula is not needed in M&F. Picking the value ΔF[i,j] from there is all that’s needed.

        • rwnj
          Posted Feb 15, 2015 at 4:36 PM | Permalink

          Whether or not you explicitly write delta F as a function of delta T, does not the variation of delta T occur in delta F? And, what’s worse, do not random errors in the estimate of delta T propagate into delta F? I had a correspondence with ATTP at CLB in which he asserts that delta T is statistically independent of delta F, but I do not understand the assertion at all.

        • rwnj
          Posted Feb 15, 2015 at 4:40 PM | Permalink

          I would tend to agree that there is no pernicious circularity here if I understood that dF is uncorrelated with dT, possibly because dF = dN + α·dT asserts some physics identity that necessarily cancels the covariance of dT with dF. Even then, in the presence of measurement error, the equation itself will create covariance between dT and dF.

        • Posted Feb 15, 2015 at 5:56 PM | Permalink

          rwnj,

          Many kinds of correlations and relationships between variables are present in this analysis. Correlations that relate ΔT to ΔF, α and κ are the subject of the study.

          ΔN is not part of the physical model being studied and is not the subject of this study. It entered only in an earlier step, in another paper (Forster et al 2013), where the value that ΔF has in the CMIP5 model runs was determined with the help of the recorded values of N. The determined value of ΔF is by nature input data to the M&F regression analysis, as are α and κ. All this input data is determined through procedures that involve the values of surface temperature, as well as other output that the CMIP5 models have produced. They are all properties of the models and model runs, and they are in no way dependent on the results of further analysis done using them as input. The M&F paper published in 2015 does not change the results of Forster et al 2013.

          There’s no circularity that goes back from 2015 to the earlier results of 2013 or implies any need to modify those earlier results. Circularity would mean that those earlier results must be modified.

        • Layman Lurker
          Posted Feb 15, 2015 at 6:23 PM | Permalink

          Pekka:

          I did my best to explain that it’s wrong to substitute for ΔF[i,j] a formula that contains any other ΔT[i,j] than the one obtained directly from the database. The database contains the value of ΔT[i,j] that gives the best estimate of the difference between ΔF[i,j] and ΔN[i,j].

          Because that substitution was already done in Forster (2013) and can never change, the whole formula is not needed in M&F. Picking the value ΔF[i,j] from there is all that’s needed.

          Where has it been suggested that some other dT[i,j], not “obtained directly from the database”, is being used to derive dF? Does your argument come down to this? I believe this goes back to Roman’s question to you from the other day. How are the values of dT obtained by F13 different from the values of dT used as predictand input for running the M&F regression? It seems to me that you might be confusing the fitted values (regression output) with the predictand.

          Because that substitution was already done in Forster (2013) and can never change, the whole formula is not needed in M&F. Picking the value ΔF[i,j] from there is all that’s needed

          You are off the mark here. Because M&F use the derivation of dF from F13, one can substitute dN + αdT[i,j] for dF to show the circularity in the regression. There is no escaping the circularity. The regression uses the same dT[i,j] values obtained from CMIP5 as both predictand and predictor (by virtue of the substitution).

        • davideisenstadt
          Posted Feb 15, 2015 at 6:40 PM | Permalink

          layman… this point has been made numerous times in this thread; I fear you will also not be successful in articulating it to Pekka.

        • Posted Feb 15, 2015 at 6:50 PM | Permalink

          When you do the substitution you imply that the temperature in the substituted expression is used to determine F. If that temperature is from the database, nothing gets modified. If it’s something else, the model is modified and a temperature is used that does not correspond to the value of N in the same substitution. When nothing gets modified, no circularity is introduced. The other case introduces circularity, but it is against the physics-based requirement that N and T must come from the same case.

        • HAS
          Posted Feb 15, 2015 at 7:16 PM | Permalink

          Pekka

          This suggests that your answer wasn’t “yes” above; you are acknowledging that T-model is used both to determine F and then again to estimate internal variability. You were using the term observed temperatures to refer to temperatures “observed in the models”, not the actually observed T (my T_actual).

          The problem then comes back to my earlier point that F from Forster isn’t the same as N + αdT (where N and T are from the models). F is an estimate from a linear regression, so we have an error term to deal with. Either T is the same throughout, in which case your statement that N is only used once breaks down (it bounces all over the place), or the effective T gets modified.

          In either case things are getting modified, and as you acknowledge circularity is introduced.

        • Layman Lurker
          Posted Feb 15, 2015 at 7:20 PM | Permalink

          When you do the substitution you imply that the temperature in the substituted expression is used to determine F.

          Correct. And basically the F13 derivation right?

          If it’s something else the model is modified and a temperature is used that does not correspond to the value of N of the same substitution. When nothing gets modified, no circularity is introduced.

          This is wrong. Nothing is modified with respect to the regression. The circularity concern is in the regression. You obviously disagree with me, so let me ask… again: how do the CMIP5 dT values used by F13 to derive dF differ from the CMIP5 dT values used as predictand input to the regression in M&F?

        • R Graf
          Posted Feb 15, 2015 at 8:08 PM | Permalink

          Layman, it just sank in for me again this morning that there is no fast talking around the circularity. I mapped out the effect of T being different in 2013 than in 2015. It’s not good. It’s just algebra. We got blinded by statistics. 🙂
          Please reply at the bottom of the post on my formula derivation if you agree or need to correct anything.

          -Ron

        • fizzymagic
          Posted Feb 16, 2015 at 12:42 AM | Permalink

          Pekka,

          Thinking about your responses today, I realized that you are not understanding the circularity argument. Perhaps I can clarify.

          You seem to be saying that since the values of ΔF do not change during the regression, but are fixed beforehand, there is no circularity. But that is not the argument!

          The circularity is in the regression that estimates coefficients for terms, one of which is an explicit function of the regression’s dependent variable!

          Let me give a very basic example. Suppose you have a set of measurements Θ that represent some physical quantity. Suppose there is another physical quantity Λ that can be estimated from Θ as a result of conservation laws. To make this as obvious as possible, let’s make it a simple linear relationship:

          Λ = α Θ

          So you make estimates of the values of Λi from the measurements Θi and stick them in a database somewhere.

          Now somebody proposes a new model that says that Θ is a linear function of several parameters, say Λ, Γ, and Ξ. The model then looks like this:

          Θ = a0 + a1Λ + a2Γ + a3Ξ

          You regress on the observed values for Θ and discover that the only parameter that matters is a1. Now, according to your posts above, you can validly claim that this counts as experimental evidence that Θ is a function of Λ, since the values of Λ were taken from a database and not changed during the regression!

          Unfortunately, you would be wrong, as the regression I just described is perfectly circular. The conclusion that emerged was completely fallacious and did not depend in any way on the values of Λ changing during the regression.

          My example is extreme, but that is the circularity we are discussing. And it most assuredly is present in the paper.
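
          The example is short enough to run as written (synthetic data; the names follow this comment, not the paper):

            # Theta is pure noise; Lambda is derived from Theta and archived;
            # Gamma and Xi are genuinely independent. The regression still
            # "finds" Lambda as a perfect predictor of Theta.
            import numpy as np

            rng = np.random.default_rng(3)
            n, alpha = 200, 2.0
            theta = rng.normal(size=n)       # measurements Θ (pure noise here)
            lam   = alpha * theta            # Λ estimated from Θ, then stored
            gam   = rng.normal(size=n)       # Γ
            xi    = rng.normal(size=n)       # Ξ

            X = np.column_stack([np.ones(n), lam, gam, xi])
            a, *_ = np.linalg.lstsq(X, theta, rcond=None)
            resid = theta - X @ a
            print("a1  =", round(a[1], 3))   # recovers 1/alpha = 0.5 exactly
            print("R^2 =", round(1 - resid.var() / theta.var(), 3))  # ~1.0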

        • R Graf
          Posted Feb 16, 2015 at 12:52 AM | Permalink

          Fizzymagic, I think you discovered the parrot is nailed to its perch. Look at the bottom of the thread, and I wouldn’t mind if you could find that great old skit on YouTube or somewhere and post it. Good night.

        • HAS
          Posted Feb 16, 2015 at 1:02 AM | Permalink

          fizzymagic

          I think the problem here is that people are asserting things (eg your penultimate para) without demonstrating them.

          The simple way I’ve been trying to do that is to draw the distinction (which your example hides) that the Λ calculated by your first equation is only an estimate of the Λ used in the second. If your first equation were an identity then you could gather terms together in your second equation and happily go on your way.

          It is because it isn’t an identity that it breaks down. Your first equation has an error term/residual, and if you want to use Eq. 1 to help with Eq. 2 you have to include it. So the system becomes:

          Λ = α Θ + residue

          Θ = a0 + a1(α Θ + residue) + a2Γ + a3Ξ

          My hope in doing this is that one doesn’t need to simply assert things that depend on a knowledge of stats to see the problem.

        • Posted Feb 16, 2015 at 2:26 AM | Permalink

          In my latest, lengthier comment I made an attempt to explain the case as clearly and thoroughly as I can. The later discussion shows that I cannot explain it any better through additional comments. I give up trying. It’s all in that comment.

          One detail that I might still try to clarify concerns misunderstandings related to the use of words model and observed.

          Model may refer both to the original GCMs, whose results are in the CMIP5 database, and to the regression model. It seems that some of my statements have been interpreted as referring to the regression model when they referred to the archived GCM results.

          Observed in this analysis never means observed in the real world; it means observed in the GCM results stored in the CMIP5 database. I have in several situations used the word model in connection with such values. I have tried to make it clear that the values are from GCM runs and thus also observed, but evidently failed in some cases. This ambiguity could be corrected by saying that ΔF must always be calculated from observed values in the CMIP5 database, never from anything that comes out of the regression model.

        • Layman Lurker
          Posted Feb 16, 2015 at 2:41 AM | Permalink

          Pekka, thanks for your participation in this discussion. Obviously there is argument and disagreement but also respect. This is a discussion thread that will be bookmarked by many.

        • HAS
          Posted Feb 16, 2015 at 4:08 AM | Permalink

          Pekka

          Regrettably, Forster 2013 calculates F from a regression model using values from the CMIP5 database. It isn’t an observation from that database.

          It still isn’t clear to me whether you regard this as acceptable or not.

          (I should note that F wasn’t the dependent variable in that model, N was – that adds another complication – but for the sake of exposition we can put that aside.)

        • Posted Feb 16, 2015 at 5:04 AM | Permalink

          I realized that there’s one crucial point that I have emphasized in some earlier comments but not in my latest long comment. That long comment is actually misleading on this point, as I used ε[i,j] as the symbol for the residual. I should have kept my earlier e[i,j] (the indices were also erroneous) and emphasized that e is not random error or noise typical of most uses of regression, but as real a physical contribution as everything else: internal variability of unknown nature that affects the values of N observed from the GCM run.

          The hypothesis is that the original GCM runs produce results that can be expressed as:

          ΔT = ΔTpred(ΔFreal, α, κ) + e   (1)

          ΔN = ΔFreal – α ΔT + error   (2)

          where error is the inaccuracy of the second formula. Thus also

          ΔN = ΔFreal – α(ΔTpred(ΔFreal, α, κ) + e) + error   (3)

          When (2) is inserted in the formula used in Forster (2013)

          ΔFest = ΔN + αΔT   (4)

          we get

          ΔFest = ΔFreal + error   (5)

          We see that under this physical assumption the term error is left in formula (5), but otherwise the value of estimated ΔF depends only on the real ΔF. The terms that would have caused circularity cancel out at this level.

          This is the model of M&F.
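
          The cancellation claimed in going from (2) and (4) to (5) can be checked mechanically, for instance with a small symbolic sketch (this illustrates the algebra only, not the paper’s data handling):

            # Insert (2) into (4) and confirm that dF_est - dF_real reduces to
            # the term "error", i.e. equation (5).
            import sympy as sp

            dT, dF_real, alpha, err = sp.symbols('dT dF_real alpha err')
            dN = dF_real - alpha * dT + err      # equation (2)
            dF_est = dN + alpha * dT             # equation (4)
            print(sp.simplify(dF_est - dF_real)) # -> err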

        • RomanM
          Posted Feb 16, 2015 at 8:17 AM | Permalink

          What Pekka was trying to tell us earlier was that the ΔF, which in reality is an estimate of the actual forcing in the model, was being treated as the exact value of the forcing in the model at that stage. Because of the relationship ΔF = ΔN + αΔT, this implied that, for a fixed ΔF, a change of δ in ΔT either in the modelling process or in the analysis procedures must correspond to an exact change of –αδ in ΔN, thereby supposedly masking some of the circularity arising from the use of the same data in the derivation of the estimating equation earlier.

          What happens if we do not ignore the error in ΔF? I will use my own notation rather than what Pekka used, to minimize any confusion. Let ΔF = ΔF_real + φ, where φ (phi) is the error in estimating the actual forcing ΔF_real. Both ΔF_real and φ depend only on the specific model they come from, and it can be noted that the distribution of the φ’s may differ from model to model. Now we look at the equation used by M and F:

          ΔT = a + b ΔF + c α + d κ + ε

          This equation assumes that a very specific set of models is being considered. The values of a, b, c and d depend strongly on the specific models chosen, and the ε’s are supposed to be only the “internal variation” of the individual models’ ΔT’s after accounting for various characteristics of the model. If we now include the errors in ΔF and rearrange terms:

          ΔT = a + b(ΔF_real + φ) + c α + d κ + ε = a + b(ΔF_real) + c α + d κ + (ε + bφ)

          The “internal variation” of the models has become conflated with the error in the estimation of the ΔF’s and the resulting residuals from the regression procedure will overestimate that internal variation. The fact that the φ’s have possibly different distributions will produce heteroscedasticity (unequal variance in the random portion of the regression equation) meaning that some models may inordinately dominate the regression procedure calculations.

          This does not reflect well on the claimed “robustness” of the results in the paper.
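
          The conflation is easy to exhibit with synthetic numbers (a sketch with hypothetical variances and a single predictor for simplicity):

            # dT depends on the real forcing plus internal variability eps, but
            # the regression sees only dF_est = dF_real + phi, so its residuals
            # absorb both eps and b*phi and overstate internal variability.
            import numpy as np

            rng = np.random.default_rng(4)
            n, b = 100000, 0.5
            dF_real = rng.normal(3.0, 0.7, n)
            eps     = rng.normal(0.0, 0.10, n)   # true internal variability
            phi     = rng.normal(0.0, 0.20, n)   # error in the dF estimate
            dT      = b * dF_real + eps
            dF_est  = dF_real + phi

            X = np.column_stack([np.ones(n), dF_est])
            coef, *_ = np.linalg.lstsq(X, dT, rcond=None)
            resid = dT - X @ coef
            print("std of true eps     :", round(float(eps.std()), 3))    # ~0.10
            print("std of OLS residuals:", round(float(resid.std()), 3))  # ~0.14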

        • fizzymagic
          Posted Feb 16, 2015 at 6:15 AM | Permalink

          HAS,

          You totally, completely, absolutely missed the point. Actually, in much the same way as Pekka seems to.

          I am not asserting anything that someone who has taken an introductory statistics class should not understand.

          The residuals from the initial estimate have zero impact on the circularity. Zero. None. They could only be significant if you knew what they were, but you don’t, which is the entire reason for using the estimated values for (in my example) Λ.

          Because you have already used the values of Θ to estimate the values of Λ, you cannot turn around and attempt to estimate (via regression) the values of Θ again from the estimated values of Λ. That is circular. It has nothing to do with residuals, or the quality of the estimates, or the fact that the estimates are fixed in the regression, or anything else.

          This is stuff any student who has completed a basic statistics course should know. I am absolutely stunned that people who are apparently serious researchers in climate science cannot grasp it!

        • Posted Feb 16, 2015 at 9:00 AM | Permalink

          Roman,

          I think we agree now on the technicalities. At least I understood your comment in a way that’s consistent with my thinking.

          The model that M&F propose is based on hypotheses that have not been tested and probably cannot be tested without further GCM model runs. M&F have presented some justification for their hypotheses, but some justification is not equivalent to good evidence of the sufficient quantitative validity of the model.

        • andersongo
          Posted Feb 16, 2015 at 9:54 AM | Permalink

          I dispute RomanM’s idea that assuming ΔFest in ΔFest = ΔN + αΔT to be exactly ΔF_real means that the ΔT terms are effectively cancelled on further substitution, as asserted by Pekka. The goal is to quantify the error e from the regression so as to assess internal variability. Recalling Pekka’s proposed means of cancelling the ΔT terms:

          ΔN = ΔFreal – α ΔT + error (1)

          Forster’s approximation:

          ΔFest = ΔN + αΔT (2)

          Now,

          ΔFest = ΔF_real + e*

          From M&F standing assumption:

          e* ~ 0

          Thus, as used in the M&F model,
          ΔFest = ΔF_real

          ΔN = ΔF_real – α ΔT + error
          = ΔFest – α ΔT + error (3)

          M&F regression equation:

          ΔT = ΔTpred(ΔFreal, α, κ) + e (4)

          Rearranging RHS

          ΔT = a + b ΔFest + c α + d κ + ε
          = a + b(ΔN + αΔT) + c α + d κ + ε (5)
          After determining the coefficients:

          X = A + BΔFest + Cα + Dκ + ε

          However, because of the obvious circularity in (5), the OLS breaks down and X, whatever it is, is not identical to ΔT.

          What if OLS were still applicable in the face of circularity, so that values could be fitted? Well, Pekka’s proof would proceed unimpeded:

          ΔT = X
          = A + BΔFest + Cα + Dκ + ε (6)

          ΔN = ΔF_real – α(A + BΔFest + Cα + Dκ + ε) + error
          = ΔF_real – α(ΔT) + error
          = ΔFest – α(ΔT) + error (7)
          In which case, using (2),
          ΔFest = ΔF_real + e* with no contribution from terms leading to circularity.

          However, as hammered in since the start, the problem is mathematical and not physical. Since the OLS does break down due to circularity, X is not equal to ΔT and thus the cancellation cannot proceed. Note that this is because OLS is susceptible to circularity; if another method were used, or the equations were rearranged in less problematic ways, the M&F procedure would not be subject to the circularity objection.

        • m.t.
          Posted Feb 16, 2015 at 12:09 PM | Permalink

          First of all, so that these are in one place:

          Gregory 2004 which introduces the N=F+aT equation.
          Forster & Gregory 2006 which expands the equation (then ignores the expanded terms) and applies it to observations of N, F, and T to get a.
          Forster & Taylor 2006 which analyzes 20 GCMs.
          Forster et al 2013 which analyzes CMIP5 GCMs.

          Pekka, you say that Forster’s equation is “ΔFest = ΔN + αΔT”. F2013 calls the “Fest” in the equation AF, or “adjusted forcing”. FT2006 called it “climate forcing”. Climate forcing in FT2006 is described as containing the forcings plus internal feedbacks, scaled with an efficacy factor. So it seems to me that the calculated F is not a physically pure estimate of forcings at the TOA, but contains information about the model’s response to the true external forcings.

        • HAS
          Posted Feb 16, 2015 at 1:57 PM | Permalink

          fizzymagic @ Feb 16, 2015 at 6:15 AM

          Hi

          While what you said might be obvious to you, much of what has been going to and fro on this post has been punters just asserting things. What’s needed is some demonstration of why the stats say this is a problem, and the fact of the error term helps make that point explicit. At least it has led to Pekka and Roman reaching some form of agreement, although I’m still not sure that Pekka has accepted that this represents a methodological error, rather than just that a hypothesis is as yet unproven.

          And I’m sorry, but it is the error term that causes the problem – if your first equation were an identity (no error) there would be no problem doing the substitution. Think of a change of units as an example.

      • andersongo
        Posted Feb 16, 2015 at 5:37 AM | Permalink

        Equation (1):

        ΔT* = a + b·ΔF + cα + dκ + ε

        Equation (2):

        ΔF = ΔN + α·ΔT**

        The claim is not that fitted values of dT obtained from the regression are fed back into equation 2 to calculate dF. That is a red herring.

        The claim is that, for the requisite time periods,

        ΔT* = ΔT** (3)

        If the identity (3) is true, then there is an obvious circularity. As an aside, I also find it self-serving that any discrepancy ε is a priori defined as internal variability: it seems that M&F are working on the basic assumption that the models are right while at the same time investigating whether they are indeed representative of the observed climate (they are claiming in effect that models = real trend + unquantified noise, as opposed to models = wrong trend + unquantified noise. They are testing whether the first proposition is possibly true, as opposed to testing whether it is necessarily true and the second one plain wrong). This does not warrant their conclusions imo.

        • Posted Feb 16, 2015 at 6:10 AM | Permalink

          andersongo,

          They are not assuming that the GCMs are right; they are studying the properties of the ensemble of GCMs, which may be right or wrong.

          What they are assuming is that it’s a good enough approximation to use the formula

          ΔF – ΔN = αΔT

          as valid for the GCMs considered.

          They have support for that assumption from their earlier studies, but they acknowledge that the formula is not exact and contributes an error.

        • R Graf
          Posted Feb 16, 2015 at 6:17 AM | Permalink

          Pekka, Andersongo, I’m afraid the only forcing involved has been that of a psychological experiment of sorts. We have failed to accept the obvious: this is a circular equation.

          Nothing else matters once the right-hand side’s value depends on the left’s, but the left’s depends on the right’s. It doesn’t matter that you know you have one less fingernail on the left; you still cannot say how many fingers you have, or hands for that matter. It is the definition of a circular equation. “This is an ex-equation.” “Hello Polly!”

        • Posted Feb 16, 2015 at 6:24 AM | Permalink

          R Graf,

          Every equation can be made apparently circular by adding terms that cancel out. Such circularity is spurious. That’s the case also here. There’s apparent circularity, but that’s spurious.

          Nic added terms by inserting a formula to replace one variable. My comment above explains how a further insertion can be made into Nic’s insertion. Doing both insertions, the circularity cancels out. Thus it was spurious, not real, under the physics-based hypothesis of M&F.

        • R Graf
          Posted Feb 16, 2015 at 6:32 AM | Permalink

          Pekka, thanks for your reply. I understand what you say, and it is in fact the basis of algebra to add terms to both sides in order to help simplify. But you need at least one completely independent variable. You need a valid equation underneath. That is the rub.

        • Posted Feb 16, 2015 at 6:43 AM | Permalink

          R Graf,

          I included the term error in the equation. I also included the term e in another place. When these terms are included, the equations are exact. We can follow how these unknown terms affect the outcome. I have shown that.

          To the extent that the term error contains dependence on other variables, it may cause circularity, but that’s not the circularity that Nic has presented.

        • R Graf
          Posted Feb 16, 2015 at 6:49 AM | Permalink

          Pekka, have you looked at my algebra at the bottom of the post? There is no independent variable. Mathematics requires one.

          “The first principle is that you must not fool yourself and that you are the easiest to fool.”
          -R. Feynman

        • redcords
          Posted Feb 16, 2015 at 8:01 AM | Permalink

          It’s pining for the ΔFjorcings.

        • HAS
          Posted Feb 16, 2015 at 2:05 PM | Permalink

          Pekka @ Feb 16, 2015 at 6:10 AM (and andersongo from earlier thread)

          “… they are assuming is that it’s a good enough approximation to use the formula”

          Actually Forster (2013) reports the errors in the estimate of F and they are not insignificant.

    • andersongo
      Posted Feb 15, 2015 at 4:06 PM | Permalink

      “It’s essential to notice that the logic of this subtraction requires that T is the real temperature … where ΔT[i,j] is the observed value …”

  95. Kenneth Fritsch
    Posted Feb 15, 2015 at 2:58 PM | Permalink

    I have decided to go a different route with my analysis of the M&F paper’s result in partitioning the deterministic and noise parts of the historical temperature series. I have found that going with the original plan would require downloading the appropriate rsdt, rsut and rlut data from dkrz for the historical CMIP5 model runs. I have done this for the RCP4.5 runs and it was a lengthy and time-consuming task for me. I would download the data from KNMI, but the global means I compute from the converted gridded data in the nc files, using my latitude weighting, have differed from KNMI’s. I have lost contact with KNMI over the past week or so on this issue, after explaining what I thought was the source of the difference. KNMI had the RCP and piControl runs in a form allowing easy automated downloading of the radiation variables, but unfortunately not the historical runs.

    My new plan is to decompose/reconstruct the CMIP5 historical temperature series into a secular trend (without assuming it is linear) and into cyclical and red/white-noise residuals, without making assumptions that could confound the variation from the various series components. I’ll use singular spectrum analysis with functions from R, along the lines of the sketch below. I will determine the differences in components from model to model (and/or between runs) in the manner of M&F, and on an individual model and model-run basis.
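
    For anyone wanting to experiment along the same lines, here is a bare-bones SSA decomposition (sketched in Python rather than R, purely for illustration; the toy series and the 20-year window are placeholders):

      # Basic singular spectrum analysis: embed the series in a trajectory
      # matrix, take the SVD, and reconstruct elementary components by
      # anti-diagonal averaging.
      import numpy as np

      def ssa_components(x, L, kmax=4):
          """First kmax elementary SSA components of series x, window L."""
          N = len(x)
          K = N - L + 1
          traj = np.column_stack([x[k:k + L] for k in range(K)])  # L x K
          U, s, Vt = np.linalg.svd(traj, full_matrices=False)
          comps = []
          for j in range(min(kmax, len(s))):
              elem = s[j] * np.outer(U[:, j], Vt[j])       # rank-1 piece
              rc = np.array([elem[::-1].diagonal(t - L + 1).mean()
                             for t in range(N)])           # hankelization
              comps.append(rc)
          return comps

      # toy "historical run": trend + 60-year oscillation + white noise
      rng = np.random.default_rng(5)
      t = np.arange(113 * 12) / 12.0                       # monthly, 1900-2012
      x = 0.008 * t + 0.1 * np.sin(2 * np.pi * t / 60) + rng.normal(0, 0.1, t.size)
      comps = ssa_components(x, L=240)
      trend = comps[0] + comps[1]      # leading pair usually carries the trend
      print("variance share of leading pair:", round(float(trend.var() / x.var()), 2))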

  96. Kenneth Fritsch
    Posted Feb 15, 2015 at 3:12 PM | Permalink

    “The alternative that Nic proposes is based on a different hypothesis, and this different hypothesis lacks comparable support, and is actually not consistent with observations, including evidently also the calculations of Kenneth.”

    My calculations only show that for a given model with multiple runs the net TOA radiation outputs (accumulation rates or trends) are very much the same, as are the trends in the potential global sea-water temperature (OHC changes). From model to model these outputs are in most cases very different. These findings are based on analysis of RCP4.5 runs – which showed that most CMIP5 models do not balance the TOA energy budget, given that most of that difference should show up as a difference in OHC.

    I do not see how this finding bears on your argument here. I only posed it because it is my understanding that M&F used multiple model runs where they existed.

    • Posted Feb 15, 2015 at 3:29 PM | Permalink

      Kenneth,

      I explained in my reply to RomanM how I understood your earlier comment, and what that would imply. Now I’m confused about what you actually observed. I don’t know whether my earlier interpretation is correct or not.

      In short, how I understood your earlier comment is:

      1) You have repeated the analysis of M&F determining the regression coefficients.

      2) You have observed that model runs based on the same model produce predicted temperature trends that match substantially more closely than the temperature trends determined from the original data of the CMIP database for the same models.

      Is that correct, or did I misunderstand what you have done and observed?

      • Kenneth Fritsch
        Posted Feb 15, 2015 at 4:09 PM | Permalink

        No, it is not correct. My comments on what I did are as I posted above. I have not repeated M&F and in fact do not plan to do so unless I can readily get my hands on the historical forcing data. I believe SteveM has requested that data from M&F. On further thought, it may not be a good idea to repeat a flawed method, except to look in better detail at the raw data used. I am going to try an alternative method (SSA) to do what M&F attempted, i.e. separate the deterministic part from the internal-variability parts – and, I think, with fewer assumptions. Selecting principal components will be, I think, my main problem.

        Interestingly, Michael Mann started his climate science career doing spectral analysis on climate series and more or less stopped after publishing the hockey stick paper.

        • Posted Feb 15, 2015 at 4:22 PM | Permalink

          Kenneth,

          In that case all my references to your calculation can be forgotten.

          My misunderstanding resulted, however, in the realization that comparing model runs that differ only in their internal variability (i.e. in the initial state, so that the internal variability ends up different) would offer a test of the accuracy of the approach. It would not be a full test, but an interesting partial test anyway. I don’t know whether the model runs contain such repetitions. Alternatively, the input forcings may differ when the same model is used in several runs; that would not provide an equally easily interpretable test.

  97. R Graf
    Posted Feb 15, 2015 at 4:57 PM | Permalink

    Pekka,

    I will leave to Roman and the statistics people the acceptability of performing linear regression on non-linear variables in the manner done, and the problems with having your independent variable derived earlier from essentially the same equation run backwards. But if I am misunderstanding the following, many others are too. My question is not Nic’s. Mine involves your assumptions when you wrote above:

    “These forcings can be approximately determined from the TOA imbalance by subtracting the influence of warming of the surface relative to a reference period. The surface radiates the warmer it is at the moment. That doesn’t depend on the cause of the temperature. “

    The conservation of energy equation:

    F = N + αT (1)

    Radiative forcing (F) = TOA imbalance (N) + temperature (T) * feedbacks (a, k, and unknown)

    Forster approximates TOAI by simulating a massive (4x) dose of CO2, assuming you will see a maximum imbalance and can then diagnose the relationship between T, N and the feedback components. This is his expertise. Forster employs linear regression here, plus perhaps intuition and unknown abilities, to coax out the values of N and the feedbacks. Let’s remember that in real life these feedbacks are non-linear: some long-term, some short-term, some permanent but temperature-dependent. And from all this he deduces F, or RF, the radiative forcing.
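
    To see the mechanics, here is a toy sketch in R (the numbers are made up, not Forster’s): in an abrupt-4xCO2 run the balance is N = F4x - a*T, so regressing N on T returns F4x as the intercept and -a as the slope.

      # toy version of the abrupt-4xCO2 diagnosis
      set.seed(4)
      dT <- seq(0.2, 4, length.out = 150)          # warming after the step
      N  <- 7.4 - 1.1 * dT + rnorm(150, sd = 0.4)  # assumed F4x = 7.4, a = 1.1
      coef(lm(N ~ dT))                             # recovers ~7.4 and ~-1.1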

    Now, two years later, the degree of feedback and RF becomes increasingly important as the pause continues, so we need to see whether the models are still performing as programmed or whether they need to be adjusted once again, as from CMIP5 to CMIP6 (or something).

    So M&F get together and use some existing calculated F values from 2013, toss a couple, and create 4 fresh ones. By the way, they add 16 new models, so I don’t understand why they didn’t need to make 16 new forcings. They do 114 runs to create new data, and now, instead of deducing all the RF and feedbacks the way Forster did, they do it by direct equation. The obvious question becomes: what is the innovation over Forster 2013? Why don’t they diagnose feedbacks the way Forster did before? Why use an equation that just oversimplifies and compounds an earlier error?

    My point is that any error that is compounded will be attributed to “natural variability” (which is just the unknown feedbacks). Getting good clean numbers on natural variability seems particularly relevant, since this again was a major point of the exercise.

  98. R Graf
    Posted Feb 15, 2015 at 7:07 PM | Permalink

    I’m not sure if somebody has already done this, but I tested the assumption that T from Forster 2013 is the same as in M&F 2015, with an added error from the diagnosis. It is not well behaved (IMO). See here:

    F = Ta + N — Forster (2013)

    where F is forcing, T is temp, a is feedbacks and N is TOA imbalance.

    T = F/(a+k) + e — M&F (2015)

    where T is temp, F is forcing, a + k is feedback and e is variability

    Now, combining a, k and e as all being known or unknown feedbacks, substituting the 2013 equation into the 2015 one, and re-using e as the error in Forster’s 2013 diagnosis of T, we have:

    T = [(T+e)a + N]/a

    multiplying both sides by the feedback a

    Ta = Ta + ea + N

    solving for a, to see what the Forster 2013 error does:

    a = -N/e

    where e must be negative only, or the feedback a flips negative (equating to a positive physical feedback, like water vapor was thought to be).

    The higher the error, the lower the feedback, and the anomaly will be assigned to natural variability.

    As N decreases in later years, the feedback will diminish and the anomaly will be assigned to natural variability.

    • R Graf
      Posted Feb 15, 2015 at 7:49 PM | Permalink

      If you keep the variability separate from the feedback, the error relates directly to the variability and simplifies to this:

      V = -(e + N/a), where V is variability.

      Here’s the work:

      T = F / (a+k) + V

      T = [(T+e)a + N]/a + V

      Ta = Ta + ea + N + Va

      ea + N = -Va

      V = -(e + N/a)

    • HAS
      Posted Feb 15, 2015 at 9:18 PM | Permalink

      R Graf

      “… and adding e now as the error in Forster’s diagnosis of T 2013 we have:”

      I’m confused by this – in my reading of F(2013) they say:

      “As in Andrews et al. [2012b], this analysis uses the CMIP5 abrupt 4xCO2 simulations and regresses N against ΔT to diagnose the 4xCO2 AF as an intercept term and a as the slope of the regression line.”

      Thus the error term in Forster’s diagnosis is in AF, not T, isn’t it?

      • R Graf
        Posted Feb 15, 2015 at 10:17 PM | Permalink

        The abrupt 4xCO2 simulations were to aid in diagnosing the radiative forcing component out of the overall adjusted forcing, which included feedbacks. They knew that if they over-forced the model with 4xCO2, the response of the TOA imbalance (N) would be much stronger than the feedback, which is much slower to respond. They could then run regression analysis – which, obviously, I am less than knowledgeable about – in order to end up with the radiative forcing separated from alpha (a), which is both air and ocean feedbacks (combined for simplicity). The resulting equation: F = Ta + N

        Then in 2015 M&F needed an equation that included the feedbacks and variability all broken out so they could run linear regression on them. So they went back to the energy balance and looked to see what they could do and came up with their 2015 equation:
        T = F / (a+k) +e

        Now the alpha (a) in the first equation is brought into the denominator on the other side, and ocean and air are separate – because CMIP5 is proud it now accounts for the Pacific Decadal Oscillation, or to give more terms; I am not sure. I simply combined them back for simplicity and changed (e) to (V) for variability, because I am now using (e) for the error in T that arises from diagnosing it. Even though I am sure there is nobody on the planet more qualified than Forster to do so, he could not have nailed it exactly. And since T in the equation is the observed temperature, even though it is from a model, it is different from Forster’s diagnosed value. Thus I have:

        T = [(T+e)a + N]/a + V

        Here I just substituted the (F) from 2013 and show (e) as the error in diagnosis. Feedback is in both the numerator, as part of T as in the 2013 equation, and in the denominator, from the 2015 equation. The rest you can see above.

        They were getting variability in all sorts of ways, but all meaningless. The equation is invalid. “It’s a dead parrot nailed to a perch” — Monty Python.

        • HAS
          Posted Feb 15, 2015 at 10:48 PM | Permalink

          I can see in F (2013) they say they go on “to diagnose the time series for F in a transient scenario run, using diagnostics of N and ΔT. In step 2 we substitute these α terms into equation (1), using N and ΔT diagnostics from various forced scenarios to compute each model’s AF.”

          I had assumed that the “N and ΔT diagnostics from various forced scenarios” would be exactly what M&F used when they say: “we determine the extent to which the across-ensemble variations of ΔF, α and κ contribute to the ensemble spread of GMST trends ΔT, using the 75-member subensemble of CMIP5 historical simulations for which radiative forcing information can be obtained from the CMIP5 archive.”

          So I thought the problem is with the ΔF error of estimation, on Pekka’s logic, not the ΔT.

        • R Graf
          Posted Feb 15, 2015 at 11:01 PM | Permalink

          ΔF from 2013 contains both T and N. You can place the error, probably more appropriately, on both terms of F to get:
          T= (Ta+N+e)/a +V
          Ta = Ta + N + e +V
          V = -e -V

          I don’t see the parrot moving any.

        • R Graf
          Posted Feb 15, 2015 at 11:15 PM | Permalink

          They planned to use the 2013 (F) on 2015 runs of mostly the same models. I think they just forgot that (T) was in (F), since they did not care about (T). Or they just thought, like some here, that if you run it again it’s a new variable.

        • HAS
          Posted Feb 15, 2015 at 11:27 PM | Permalink

          We were just discussing the colour of the parrot.

        • R Graf
          Posted Feb 15, 2015 at 11:30 PM | Permalink

          I messed up the end of that equation.
          T= (Ta+N+e)/a +V
          Ta= Ta + N + e +Va
          V= -(N+e)/a
          still nonsense..

  99. andersongo
    Posted Feb 16, 2015 at 6:11 AM | Permalink

    Pekka:

    “…ΔFest = ΔFreal + error (5)

    We see that under this physical assumption the term error is left in formula (5), but otherwise the value of estimated ΔF depends only on the real ΔF. The terms that would have caused circularity cancel out at this level. ”

    And how is that error calculated? By regressing ΔT on a linear function of itself.

    • Posted Feb 16, 2015 at 7:09 AM | Permalink

      It’s not calculated. It’s not known.

      One hypothesis of the analysis of M&F is that it’s so small that the results remain useful. That’s part of the hypothesis. It’s justified by their earlier work, but not proven correct.

      • andersongo
        Posted Feb 16, 2015 at 7:13 AM | Permalink

        Justified but not proven correct…Is this climate science new-speak?

        • andersongo
          Posted Feb 16, 2015 at 7:43 AM | Permalink

          Also, once again, the claim is not that ΔT is fed back from equation (1) to equation (2) to obtain an estimate of ΔF. The claim is that ΔT is regressed on a linear function of itself, since the ΔF estimate used in the regression (real ΔF not being available, due to unquantifiable error) is really a function of ΔT by virtue of ΔF – ΔN = αΔT. In this case, the α(ΔTpred(ΔFreal, α, κ) + e) term in your equation

          ΔN = ΔFreal – α(ΔTpred(ΔFreal, α, κ) + e) + error

          seems dubious, and cancellation of ΔT upon further substitution is thus not possible. Also the error e, where

          e = ΔT – predicted ΔT,

          which is then used as an indication of internal variability, is obviously affected by this circularity. By error I mean the error calculated from the regression, and not the error term in

          ΔFest = ΔFreal + error.

          Note that for the circularity to hold, M&F’s assumption that the error term in ΔFest = ΔFreal + error is approximately zero must be true.

  100. Greg Goodman
    Posted Feb 16, 2015 at 7:32 AM | Permalink

    Pekka says: “It’s essential to notice that the logic of this subtraction requires that T is the real temperature at the moment as stored in the CMIP5 database. It cannot be changed or recalculated by the regression model without making the formula fail.”

    And by the same token, it has to be the real surface flux. It cannot be changed or recalculated from TOA by a regression model without making the formula fail.

    The natural variability “error” term is assumed to be “random”, and this is a necessary condition for the regression to correctly remove it. However, this is not the case in the models, and even less so in the real climate record.

    Figure 1, panels b and c, of M&F 2015 shows the distribution of model trends and (vertical line) the observed HadCRUT trend.

    Here we meet the “watch the pea” issue. What they are showing is that the recent over-estimation of the trend is comparable to the under-estimation of the trend in the thirties. This is presented as showing that the current model failure is not significant.

    To the extent that this shows the models are a failure across the board it is worth noting; however, it does not increase our confidence that their ECS informs us about the real climate.

    By discussing simply the magnitude of the deviation, they obscure the fact that there is a systematic bias in one sense in the 30s and in the other sense since 1998, with only a very limited section, 1965–1985, in between where the models have been tuned to fit reasonably well.

    This would appear to be a clear indication that model ECS is exaggerated and that the conclusions of the paper are unjustified.

    From the paper: “Here we analyse simulations and observations of GMST from 1900 to 2012, and show that the distribution of simulated 15-year trends shows no systematic bias against the observations. … The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.”

    The models fail to reproduce the early 20th century warming when CO2 was not significant, are tuned to fit the later 20th century warming, and fail by running consistently hot since 1998. This is consistent with their being over-sensitive to CO2.

    The “innovation” of this paper is in kicking up enough pseudo-statistical dust to disguise the fact.

  102. Posted Feb 16, 2015 at 7:53 AM | Permalink

    A sliding trend analysis, like they are doing, is mathematically equivalent to applying a (weighted) 15-year running mean to the rate of change.
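
    The equivalence is easy to check numerically: the OLS slope over a window is exactly a running mean of the first differences with parabolic weights. A sketch in R (any 15-point series will do):

      set.seed(3)
      y <- cumsum(rnorm(15))                # any 15-point window
      ols <- coef(lm(y ~ seq_along(y)))[2]  # the "trend" for this window

      d <- diff(y)                          # the rate of change
      k <- 1:14
      w <- k * (15 - k)                     # parabolic weights
      c(ols, sum(w * d) / sum(w))           # the two values are identical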

    I discussed the issues and distortions injected by running means here:

    Data corruption by running mean “smoothers”

    A significant amount of the model output is injected noise, added to make the time series look more climate-like. There is very little real modelling of the internal variability.

    Their naive methods and gross approximations inject additional noise into both the model and observed data, noise that will be unrelated between the two.

    Any conclusions about the statistical significance of the recent deviation based on this kind of method are thus without value.

  103. R Graf
    Posted Feb 16, 2015 at 7:53 AM | Permalink

    I’m just curious how long it will take everyone to notice there is no independent variable. Saying the variable on the right-hand side is an approximation of the left-hand side does not give you any information.

    BTW, I noticed in the abstract of the 2013 paper that the conclusion was that feedbacks dominated forcing over the 85-year span, with the opposite true for the short span. This is 180 degrees from the direction of the claim in 2015.

    • davideisenstadt
      Posted Feb 16, 2015 at 7:59 AM | Permalink

      everyone here has noticed, with the notable exception of pekka.

      • Posted Feb 16, 2015 at 3:17 PM | Permalink

        That is not accurate.

        Even accepting Nic’s substitution, it is sufficient to rearrange the equation to find that alpha disappears totally and it ends up as a regression of T vs N to determine kappa.

        Whether that is meaningful, and can be compared to the kappa introduced by the authors from earlier work, remains to be considered. IMO this diffusion constant is yet another fiddle parameter that can be used to make the models fit a restricted part of the climate record whilst maintaining the desired high sensitivity.

        For this result to have any significance, the authors need to show that the method is capable of showing a positive.

        They describe this as a “novel” method, yet, as is customary in climatology, there is no analysis and verification of the method itself to establish that it is valid before using it and publishing results and conclusions.

        The first thing that is necessary with a new method is to establish its validity before using it.

        Getting the answer you desired or “knew” to be right before doing the study does not count as validation of a method.

  104. Posted Feb 16, 2015 at 8:06 AM | Permalink

    We can identify three questions:

    (1) Does the approach of M&F defined by their stated hypotheses involve circularity by the mechanism proposed by Nic?

    (2) Are the hypotheses of M&F justified at all and not contradictory or obviously wrong?

    (3) Are the hypotheses good enough to produce useful insight to the models of the CMIP5 ensemble?

    My answer to question (1) is that such circularity is not present. The claimed circularity cancels out (or, in another correct formulation, does not enter at all) when their hypotheses are accepted. I have explained the reasons for my conclusion in my recent comments.

    To the question (2) my answer is that M&F present enough justification to pass the weak requirements of that question.

    To question (3) I do not have clear answers. I myself remain skeptical of the usefulness of the results.

    The combination of my answers to (2) and (3) also means that I have doubts about the justification for publishing the paper in a journal like Nature.

    • R Graf
      Posted Feb 16, 2015 at 9:14 AM | Permalink

      Pekka,

      To understand how this happened, one has to remember that the models never created information that could be used to test themselves with their own output. It really is that simple.

    • Posted Feb 16, 2015 at 9:48 AM | Permalink

      Given that the M&F paper depends upon such shaky hypotheses, I propose adding an additional error term to their equation to more accurately account for their influence. The internal variability error (epsilon) must be modified by the error of hypothesis (eta) to produce:

      ΔT = ΔF / (α + κ) + (ɛ + η)

      • R Graf
        Posted Feb 16, 2015 at 9:54 AM | Permalink

        Are you saying the dead parrot is just stunned or really dead?

        • Posted Feb 16, 2015 at 10:31 AM | Permalink

          Although M&F defenders claim it is just pining for the fjords, this Norwegian Blue is definitely deceased. It’s bleeding demised. Passed on. Bereft of life. It’s run down the curtain and joined the choir invisible. This is an ex-parrot.

    • R Graf
      Posted Feb 16, 2015 at 11:11 AM | Permalink

      Pekka,

      We were all fooled for some time. I thought for a while that there was just circularity contamination, as did you. The fundamental circularity comes from the fact that the only connection of the model to the real world is temperature. The energy balance and temperature trends were used to derive all the other components of the model – a model whose output is temperature, and the same trends were used to program it. The model’s output can allow extrapolation into the future, but it does not give you more insight than you already programmed into the model. The energy balance is useless for carrying information from past model runs to current ones.

      • R Graf
        Posted Feb 16, 2015 at 11:14 AM | Permalink

        Temperature cannot be both the dependent and independent variable at the same time.

        • davideisenstadt
          Posted Feb 16, 2015 at 11:43 AM | Permalink

          R Graf:
          yes.

      • andersongo
        Posted Feb 16, 2015 at 12:23 PM | Permalink

        Exactly. This is quite clear from the regression equation.

        • R Graf
          Posted Feb 16, 2015 at 2:51 PM | Permalink

          So is there anyone who thinks this equation is “probably pining for the fjords”?

          This equation has become awfully quiet.

    • Michael Jankowski
      Posted Feb 16, 2015 at 6:49 PM | Permalink

      “The claimed circularity cancels out”…say what?

  105. R Graf
    Posted Feb 16, 2015 at 8:27 AM | Permalink

    Pekka,

    I will count your answer as a thumbs up on support to Nic.

    If we do not hear from anyone else who has support for the equation or the conclusion I propose the following:

    Nic, Steve,

    Clearly we will all today come to the conclusion you had a week ago, and I hope you will forgive us for not finishing this sooner. I propose that all who have been debating here who would like to submit their opinion to CLB should compose their own independent comments from here down, have both of you review them for a day, and then report them on CLB.

    This paper should be withdrawn.

    Congratulations Nic.

    • Posted Feb 16, 2015 at 3:25 PM | Permalink

      You would do better to limit yourself to stating your own opinion rather than trying to rewrite those of others.

      • R Graf
        Posted Feb 16, 2015 at 4:08 PM | Permalink

        climategrog,

        I am for debate, not for attributing my opinions to anyone, or anyone’s to me. Do you not see the perpetual motion machine, or does it need more study to see if it is one? You can read my logic written to Pekka above. Please tell me your thoughts on the equation and, if it is circular, what in your opinion should be done, beyond ending the debate, to support Nic?

      • SteveS
        Posted Feb 16, 2015 at 4:09 PM | Permalink

        Agree….. R Graf, you’re jumping the shark on this one. The discussion should continue with input from McIntyre. I appreciate Pekka’s input, and he can think and speak for himself without you summarizing.

        • R Graf
          Posted Feb 16, 2015 at 4:44 PM | Permalink

          Fair enough. I was thinking we would have heard from more people realizing the circularity.
          I agree I should not be the leader on this. Nic, Steve, where are you?

        • stevefitzpatrick
          Posted Feb 16, 2015 at 7:45 PM | Permalink

          “Nic, Steve, where are you?”

          Well, my guess is that they figure they have much better things to do with their time.

          The entire thread is borderline nutzo; yes, you have to respect the requirement Pekka insists on: “use only the CMIP5 archived T data”. But even then, the whole exercise is beyond the pale; you can’t logically use model-diagnosed temperature ‘data’ to ‘verify/evaluate’ the self-same models which generated the temperature ‘data’. It is the silliest waste of time I have encountered in some time. That the ‘climate science community’ is unable, or unwilling, to see the obvious problems with this kind of paper simply means they are unable to practice rational discrimination between reality and rubbish. As I have said many times before, the field is not well.

        • davideisenstadt
          Posted Feb 16, 2015 at 11:29 PM | Permalink

          steve:
          good luck with your hopes… if the guys can’t understand the point that has been articulated countless times in this thread by now, they will never get it.
          “math is hard”, as Barbie said.
          the condenscenscion, and abusive language, that has been applied to critics of this “scheme” only goes to demonstrate the innumeracy and lack of training of those who delve into this field.
          If I had only known, when I was a kid, just how much money could be made from such cheesy applications of statistical analysis….
          oh well.

        • Don Monfort
          Posted Feb 16, 2015 at 11:53 PM | Permalink

          OK, dave.

          “condenscenscion”

          Steve has started a trend. How about “condenscensation”.

        • davideisenstadt
          Posted Feb 16, 2015 at 11:58 PM | Permalink

          I stand corrected
          😉

  106. R Graf
    Posted Feb 16, 2015 at 5:18 PM | Permalink

    The fundamental circularity comes from the fact that the only connection of the model to the real world is temperature. The energy balance and temperature trends were used to derive all the other components of the model – a model whose purpose is to output a temperature from its temperature-derived assumed input forcings. The model can extrapolate into the future, but it does not give insight into the past or present, because that is what was used to program it. Although it is valid to compare the model’s conformity with the past to check one’s program, as the authors claim to do, you cannot use this equation. Never mind the problems with non-linear variables being assumed linear and the unknown accuracy of the estimation of F. One absolutely CANNOT have the same value (T) act as the independent variable and the dependent variable at the same time. Trust your grade-school algebra teacher. Follow my algebraic trial solutions a few comments above.

    Who challenges this? Who supports? Be brave.

  107. Greg Goodman
    Posted Feb 16, 2015 at 5:27 PM | Permalink

    If we accept the authors’ hypothesis and method, they demonstrate that the serious divergence since 1998 is nothing unusual for the CMIP5 model group. In fact, it is typical of their ability to reproduce even a hindcast for which they knew the answer before tuning the model.

    This seems a clear indication that they are not fit for the purpose of reproducing climate behaviour, and certainly not for extrapolating it way outside the calibration period.

    Keep up the good work on the modelling. Come back in 10 years and let us know how you are progressing.

  108. Don Monfort
    Posted Feb 16, 2015 at 5:45 PM | Permalink

    I want to give them a fair chance to show their stuff, so I would suggest they come back in 62 years.

  109. JUA
    Posted Feb 16, 2015 at 6:47 PM | Permalink

    Interesting discussion, but instead of fighting over possible circularity in the Marotzke regression analysis it seems to me more profitable to take a look at the model results presented in great detail by Forster et al. 2013 (Lewis ref. v). Here is a brief summary of the essentials. 23 CMIP5 models are included in the analysis. As discussed by Lewis, the adjusted climate forcing, AF, is derived from the imbalance, N, of the energy fluxes at the top of the atmosphere by addition of a response term, the change in outgoing radiation due to the change in temperature,
    AF=N+αΔT. (1)
    N and ΔT are outputs from the historical model runs. The constant α is determined independently from model runs with an abrupt increase by a factor of 4 of the CO2 concentration in the atmosphere. The jump in radiation imbalance (intercept in linear regression of N vs ΔT) is named 4xCO2 AF and –α is the slope of the following decrease of the imbalance with increasing temperature. Both α and κ, the efficiency of heat uptake by the oceans, can be derived from runs with a gradual change in CO2. For such a scenario of continuously increasing forcing F, the temperature change is approximately given by
    ΔT=F/ρ, with ρ=α+κ. (2)
    This scenario is close to the real situation modeled by the 23 different models for the time period late 19th century to 2005 (historical). The results, averaged over 5 years, are given for 2003.

    There are large differences among the models with respect to all the results and parameters, as illustrated in a number of tables and figures. The most relevant figure in relation to the Marotzke and Forster paper is Fig. 9, containing four different scatter plots. In all plots the x-coordinate is the temperature increase ΔT in 2003, varying among the models from about 0.3 K to 1.9 K. In Fig. 9a the y-coordinate corresponds to the right hand side of Eq. (2), y=2003 AF/ρ. As expected from Eq. (2) the 23 points fall pretty well on a straight line, with slope not too far from unity and correlation coefficient R=0.87. Figs. 9b, 9c, and 9d, show separate correlations of ΔT with AF, ρ, and α. The correlation with AF is weaker than with AF/ρ but still strong, with coefficient R=0.72. However, there is insignificant correlation with the other two variables – just as claimed by Marotzke and Forster!

    How can we understand this? The reason becomes clear from the scatter plot in Fig. 8a, showing the correlation between the parameters α and 2003 AF. There is a strong positive correlation with coefficient R=0.62. This means that the negative correlation between ΔT and ρ or α, expected from Eq. (2), is eliminated by the positive correlation of both parameters with AF!

    It appears to me that this is the most serious problem with the Marotzke regression analysis. The parameters AF and α are not statistically independent in the sample. (I do not see this as a necessary consequence of the relation in Eq. (1), claimed to give circularity.)

    Figure 7 in the Forster et al. paper is also interesting. It shows a scatter plot of the climate sensitivity (ECS) against the adjusted forcing, 2003 AF, for the 23 models. Both parameters vary by about a factor of three with little apparent correlation. However, when models are selected with linear trend from 1906 to 2005 between the IPCC (2007) 90% confidence limits, 0.56 K to 0.92 K, then there is a clear correlation, very similar to that found earlier by Kiehl and by Knutti. Models with high sensitivity have low forcing and vice versa.
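
    Returning to the cancellation described above, it is easy to reproduce with made-up numbers (a sketch only; the parameter values are invented so that the correlations roughly resemble Figs. 8a and 9):

      # pseudo-model ensemble: alpha and kappa spread around plausible means
      set.seed(5)
      n     <- 1000                 # large n so the correlations are stable
      alpha <- rnorm(n, 1.1, 0.25)
      kappa <- rnorm(n, 0.7, 0.15)

      # case 1: AF independent of alpha -- Eq. (2) shows through clearly
      AF0 <- rnorm(n, 1.6, 0.27)
      cor(AF0 / (alpha + kappa), alpha)   # strongly negative, roughly -0.6

      # case 2: AF positively correlated with alpha, as in Fig. 8a
      AF1 <- 1.6 + 0.7 * (alpha - 1.1) + rnorm(n, sd = 0.2)
      cor(AF1 / (alpha + kappa), alpha)   # much weaker, roughly -0.2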

    • R Graf
      Posted Feb 16, 2015 at 7:51 PM | Permalink

      Welcome, JUA. We are in our eleventh day here. Please read above to catch up.

      Anyone who wants to read Forster 2013, it’s here: http://www.atmos.washington.edu/~mzelinka/Forster_etal_subm.pdf
      And for M&F 2015, search “Stokes”, who provided a link near the top of this post.

      The 2013 abstract says: “Multi-model time series of temperature change and AF from 1850 to 2100 have large inter-model spreads throughout the period. The inter-model spread of temperature change is principally driven by forcing differences in the present day and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095.”

      The 2015 abstract says: “The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale.”

      My interpretation is that the two are in direct contradiction. I don’t know if that is acceptable (even in climate science) when it is the same research and the same models. But I am admittedly new here.

      BTW, I believed the same as Pekka for all but about the last day, and I came from the opposite bias.

    • Posted Feb 17, 2015 at 2:46 PM | Permalink

      JUA,
      Thanks for your informative and constructive comment, with which I very largely concur.

      Forster et al (2013) is an excellent, very useful paper, and its Fig. 9 is indeed relevant. That figure is based on changes over the full historical simulation period (1860–2005), which is probably better than taking changes over 62-year periods as in M&F 2015. I don’t think Fig. 9c and 9d, showing regressions of α and κ on ΔT, are that useful. Only the sum of those parameters, ρ=α+κ, enters into the surface energy-balance equation, and separating the individual influences of α and κ on ΔT is not simple.

      Fig. 9a and 9b, however, are highly relevant, as they show regressions of ΔT on AF alone and on AF/ρ (actually the regressions have the variables the other way around, which doesn’t correspond to physical causation, but I’ll ignore that). The R^2 for AF alone is 0.512, which implies the standard deviation of the deterministic regression predictions is 71.5% of the predictand’s standard deviation – the R of 0.72 that you refer to. For AF/ρ, the R^2 is 0.246 higher, at 0.757. That 0.246 increase in R^2 corresponds, roughly speaking, to a standard deviation of deterministic predictions from ρ of sqrt(0.246) = 49.6% of the predictand’s standard deviation. So AF and ρ contribute, in terms of prediction standard deviation, in the ratio 71.5%:49.6%, or 1.44:1. Hardly an insignificant contribution from ρ.

      This calculation is only very approximate, and properly calculated the ratio is probably higher, but the point that the contribution from ρ is significant would remain valid.
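
      (For anyone wanting to follow the arithmetic, it takes only a few lines of R, using the R^2 values quoted above:)

        r2_AF   <- 0.512     # R^2 of the regression on AF alone
        r2_gain <- 0.246     # increase in R^2 when AF/rho is used
        sd_AF   <- sqrt(r2_AF)    # 0.715: prediction sd from AF
        sd_rho  <- sqrt(r2_gain)  # 0.496: prediction sd added by rho
        sd_AF / sd_rho            # ~1.44, the ratio quoted above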

      One of the reasons for the high correlation of AF with ΔT is the circularity, or non-exogeneity, problem. Internal model variability can change ΔT without compensating changes in N over 62-year or longer periods, implying that it will cause the diagnosed AF to covary with ΔT, which artificially increases the R^2 of the regression between AF and ΔT. I think failure to appreciate this fact is probably at the root of Pekka’s misunderstanding on the circularity issue.

      AF is almost bound to be correlated with both α and in principle also with κ. That follows from the fact that AF is calculated as AF = α ΔT + N and κ is meant to represent N/ΔT, implying that N = κ ΔT, if κ as diagnosed in Forster et al (2013) is truly a model property (a dubious assumption).

      • Posted Feb 17, 2015 at 3:02 PM | Permalink

        Nic,
        Can you present some justification for your claim that the way M&F perform their analysis introduces, rather than removes, circularity?

        It’s totally clear that N contains a contribution from the surface temperature that must be removed to make a meaningful analysis in the spirit of M&F. It’s also totally clear that the calculation of AF removes very much of that effect.

        It is not known how much error is left in the analysis, but the term that you have described as the source of the circularity has exactly the opposite role: it is there to remove the circular effect of temperature.

        • Posted Feb 17, 2015 at 4:33 PM | Permalink

          Pekka,
          Do you agree that if your argument were correct then fluctuations in T and N over 15 and 62 year periods arising from model internal variability with unchanging external forcing should be strongly negatively correlated?

      • Posted Feb 17, 2015 at 5:33 PM | Permalink

        Nic,

        Yes. That is part of the hypothesis. Real ERF is affected very little by internal variability. Thus M&F must assume that the contribution of internal variability to N is approximately -α times the change in temperature due to the variability.

        • RomanM
          Posted Feb 17, 2015 at 7:04 PM | Permalink

          Pekka, most of what you are constantly referring to as a hypothesis should more correctly be termed an assumption.

          Hypotheses are statements made ahead of time which are checked using the data and analysis to come to a decision as to their believability. Assumptions are statements made which are assumed to be true throughout the analysis without necessarily checking at any point to see whether they were true or appropriate to the analysis.

          Using the equation ΔF = α ΔT + N as an identity is an assumption to facilitate the analysis. As far as I can see, nothing in the paper was posited as supporting its use as not substantially affecting the results of the analysis. In fact, I pointed out to you that there was indeed problems due to its presence with interpreting the magnitude of the “internal variability” of the climate models. If the assumption is not made, then there is indeed quantitative “circularity” in the equations used by the authors which would completely invalidate the conclusions of the paper.

      • Frank
        Posted Feb 18, 2015 at 5:40 AM | Permalink

        Nic and Pekka: Circularity and other problems ALSO arise from M&F’s INTERPRETATION of the results from their regression equation. The abstract says:

        “Using a multiple regression approach that is physically motivated by surface energy balance, we isolate the impact of radiative forcing, climate feedback and ocean heat uptake on GMST—with the regression residual INTERPRETED as internal variability—and assess all possible 15- and 62-year trends”

        Immediately after the regression equation, M&F say:

        “We INTERPRET the ensemble spread of the regression result … as the deterministic spread …”

        If I understand correctly, M&F believe that all of the unforced variability from each model run is found in the regression residuals and all of the deterministic temperature change is found in the sum of the other terms of the regression. Forcing is deterministic, but effective radiative forcing contains both a deterministic component (from rising GHGs etc) and unforced variability. (ERF is derived from temperature output, which contains both forced and unforced variability.) Therefore, their regression CAN’T be used to separate deterministic (forced) warming from unforced temperature variability. I suspect Nic is correct in believing that a circular regression is inherently flawed, but M&F’s INTERPRETATION of the regression results is unambiguously flawed.

        Furthermore, it is inappropriate to interpret the regression residuals as unforced variability. The residuals from a simple regression are often interpreted as measurement error or flaws in the regression equation used. Suppose I measured the acceleration of gravity by dropping an object at various altitudes above the surface of the earth and measuring the distance fallen with time. I can regress distance fallen as a linear function of time squared, but I can get a better fit to the data by adding a second term with the starting altitude. Any second term will improve the fit at least slightly, even though the first power of the starting altitude isn’t directly involved in the physics of the situation. If I know how air density varies with altitude (negative exponential of altitude divided by scale height), then I could add an appropriate term to the regression equation that accounts for the drag, which varies directly with air density. I could add a third term accounting for the fact that g gets weaker with altitude above the surface of the earth. If all of the important physics affecting a falling object were correctly incorporated into the regression equation, then I might be able to interpret the residuals as measurement error, or perhaps as unforced variability arising from conducting my experiment where the atmosphere might be rising or subsiding. Rarely do we understand the physics of a problem well enough to confidently assign a meaning to residuals.
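
        A quick sketch of the point in R, with toy numbers of my own: generate fall data containing a small drag-like term that the fitted model omits, and the residuals come out as a smooth systematic curve (model error), not random scatter:

          set.seed(6)
          t <- seq(0.5, 5, by = 0.5)
          # "truth" includes a small drag-like t^4 term the model omits
          d <- 0.5 * 9.81 * t^2 - 0.01 * t^4 + rnorm(length(t), sd = 0.05)
          fit <- lm(d ~ I(t^2))   # the drag-free model: d = b * t^2
          round(resid(fit), 2)    # a smooth pattern in t, not noise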

        Since M&F made serious mistakes in the derivation of their regression equation, it is unreasonable to interpret the residuals from that regression as unforced variability. (The regression contains an inappropriate additional degree of freedom, because physically the regression coefficients for alpha and kappa must be the same, yet they are fitted independently. They have inappropriately approximated 1/(1+x) as 1-x. Kappa is known to decrease with time/warming in model output.) So there is no way the residuals from this primitive regression should be interpreted as unforced variability – they include flaws in the regression equation.

        The place to learn about unforced variability is the unforced control runs for each model, not these regression residuals.

        • Posted Feb 18, 2015 at 6:23 AM | Permalink

          Frank,

          The word “interpret” is their way of saying that additional assumptions are used to allow extracting the information.

          That does not tell us about their beliefs, beyond that they consider it likely that the results obtained by making those additional assumptions have useful informational content. They must think that the values are likely to be close enough to the correct ones to add to the understanding of the models rather than lead to wrong conclusions.

          I haven’t seen any valid claim to support your assertion that M&F made serious mistakes in the derivation of their regression equation. Their analysis rests on hypotheses and assumptions that can be contested, but it’s not based on serious mistakes.

        • Frank
          Posted Feb 18, 2015 at 8:15 AM | Permalink

          Pekka wrote: “The word interpret is their way of telling that additional assumptions are used to allow extracting information.”

          Frank replies: Those assumptions appear to be wrong. The part of the regression equation they claim is deterministic has one term with unforced variability in it. The residuals they claim are unforced variability also contain systematic errors from their regression equation.

          As I discussed more fully above, the “linear expansion of equation 3” involves approximating 1/(1+x) as 1-x. The range of both alpha and kappa is about 50% of the ensemble means, so x is probably too big for this to be a reasonable approximation.
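
          The size of the problem is easy to gauge (a sketch; a spread of 50% of the mean puts x out to roughly ±0.25):

            x      <- c(-0.25, 0.25)  # edges of the spread in alpha + kappa
            exact  <- 1 / (1 + x)
            approx <- 1 - x
            (approx - exact) / exact  # about -6% error at both edges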

          If you look at the coefficients in front of the a’ and k’ terms in this expansion, you will find they are the same. Yet the regression coefficients for a’ and k’ are optimized independently, even though the physics of equation 3 says that only the sum of these terms (and ERF) is important.

        • Posted Feb 18, 2015 at 9:22 AM | Permalink

          Frank,
          It’s totally clear that their regression formula cannot be derived. M&F write:

          “This equation holds for each start year separately and suggests …”

          They do not say that the following formula follows from the previous; they say that the previous arguments suggest that the linear regression model might be useful.

          You have also questioned the separation of the dependence on α and κ. That’s a choice made to separate the two influences, whether they really turn out different or not. That makes sense, because the values of α and κ obtained from Forster et al (2013) for the different models are almost totally uncorrelated. The dependencies on α and κ might be expected to deviate more for the 62-year trends than for the 15-year trends, as their sum is inversely related to TCR while α alone is inversely related to ECS.

  110. stevefitzpatrick
    Posted Feb 16, 2015 at 6:50 PM | Permalink

    Poor Pekka. He ends up defending the absolutely indefensible. The CMIP5 models are circular self-references… and he can’t see that. Sad… very sad.

    • Greg Goodman
      Posted Feb 16, 2015 at 7:40 PM | Permalink

      steve, I would advise against being condescending towards Pekka. He knows his stuff and is a proper scientist, not a politically motivated alarmist. From your comments here, you do not appear to be in the same class.

      He clearly has some serious doubts about the worth of the paper but weighs his words a little more carefully than many here.

    • Posted Feb 16, 2015 at 8:36 PM | Permalink

      I’ve actually learned a great deal by reading Pekka’s thoughtful comments, here and elsewhere. The fact that he is willing (at least for this debate) to accept the “black box” aspect of M&F’s selected climate model simulations may simply be due to the fact that he has more experience in the general field than others.

      On the other hand he seems to agree, on the whole, that several non-statistical aspects of the M&F paper raise questions about the overall value the paper provides to other researchers. I’m impressed by his willingness to engage and, especially, educate.

      Kiitos Pekka!

    • SteveS
      Posted Feb 16, 2015 at 9:32 PM | Permalink

      Steve…..really uncalled for remarks vs Pekka. Because he doesn’t see things your way he’s ” poor, sad,pekka” ? He disagrees with some points here, but he’s always been respectful and professional. It’s silly comments like yours that bring meaningful discussions to a halt.
      I also disagree with some of his points, but he’s certainly qualified to have his own views, and in light of the time he has put in to explain himself, he deserves a little more respect.

      • stevefitzpatrick
        Posted Feb 16, 2015 at 9:40 PM | Permalink

        SteveS,
        He would deserve a bit more respect if he could see the logical fallacy of M&F. The ‘archived’ delta-T’s from the CMIP5 ensemble can’t logically be used to diagnose delta-F, and then that delta-F used to diagnose a different delta-T.

      • davideisenstadt
        Posted Feb 16, 2015 at 11:19 PM | Permalink

        true that

  111. stevefitzpatrick
    Posted Feb 16, 2015 at 9:29 PM | Permalink

    “He knows his stuff and is a proper scientist, not a politically motivated alarmist.”

    Ya, well, I am a ‘proper scientist’ as well. I intend no condensation. But Pekka really is defending a paper with very serious logical problems: you can’t use the models to diagnose delta-F and then use that delta-F to ‘verify’ the self-same models. It is a silly exercise. I trust you can see that.

    • stevefitzpatrick
      Posted Feb 16, 2015 at 9:33 PM | Permalink

      “condescension”, not “condensation”… spelling correction, ugg.

      • j ferguson
        Posted Feb 16, 2015 at 10:54 PM | Permalink

        Actually, Steve, “condensation” was really good. Really creative use of the language is unfortunately all too rare these days.

    • Don Monfort
      Posted Feb 16, 2015 at 9:36 PM | Permalink

      “I intend no condensation.” I think Yogi Berra said that first. Just kidding steve. Anyway, I agree with your analysis. You don’t have to delve very far to see that this one doesn’t hold condensation.

      Well, our model really doesn’t get natural variability (just watch us prove it), so we have a convenient excuse when we miss.

      • Don Monfort
        Posted Feb 16, 2015 at 9:39 PM | Permalink

        You know what I mean. Steve threw me off with the condensation.

        • Don Monfort
          Posted Feb 16, 2015 at 9:46 PM | Permalink

          I said somewhere near the top that I was interested in Pekka’s take on this. He knows what he is talking about and he’s honest. And though I lean towards circularity, I have reservations due to Pekka’s persistence. He remains a gentleman and a scholar.

        • stevefitzpatrick
          Posted Feb 16, 2015 at 9:51 PM | Permalink

          Yes Don, Pekka is an honest broker, a gentleman (certainly more than me!) and a scholar… he is just very, very wrong in this case. M&F is rubbish.

        • Posted Feb 17, 2015 at 5:42 AM | Permalink

          Talking hot air leads to condensation; it was a Freudian slip. 😉

    • davideisenstadt
      Posted Feb 16, 2015 at 11:20 PM | Permalink

      but, at least he is polite when expressing his opinions, which is certainly worth respecting.

  112. Posted Feb 17, 2015 at 12:48 AM | Permalink

    This discussion has been somewhat confusing what with all the different “errors” being thrown around, as well as terms like “real T,” “actual T,” etc. getting mixed in. The closest I can come to making sense of Pekka’s position is that by using a backed-out measure deltaFhat using only older, holdout runs of the model, an instrumental variable of sorts has been constructed that can then get plugged into a different, new set of model runs to try to overcome the endogeneity problem. That only works if the instrument deltaFhat is valid, having a high correlation with the true deltaF and no correlation with the residuals in the new model runs, hence his caveat about it only working if certain parameter values hold. That’s my best shot–if this is not an IV regression (for some reason not being described as such by the authors), then the circularity critique must hold.
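
    To make that concrete, here is a toy sketch (purely illustrative, with invented parameter values, not M&F’s actual setup). The diagnosed forcing carries alpha times the run’s own noise, so OLS on it is biased; a forcing estimate sharing no noise with the run works as an instrument:

      set.seed(2)
      n <- 10000
      alpha <- 1.0; kappa <- 0.5; rho <- alpha + kappa
      Ftrue <- rnorm(n)                    # "true" ERF across pseudo-runs
      e     <- rnorm(n, sd = 0.5)          # internal variability
      dT    <- Ftrue / rho + e             # dT = dF/(alpha + kappa) + e
      Fdia  <- Ftrue + alpha * e           # diagnosed forcing, noise included
      Fiv   <- Ftrue + rnorm(n, sd = 0.3)  # instrument: no shared noise

      coef(lm(dT ~ Fdia))[2]                    # ~0.73, above 1/rho = 0.667
      coef(lm(dT ~ fitted(lm(Fdia ~ Fiv))))[2]  # 2SLS: ~0.667, bias removed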

  113. Geoff Sherrington
    Posted Feb 17, 2015 at 1:04 AM | Permalink

    It is normal that new ways to handle numbers evolve over time. The way to test a new method is typically to include it in a published paper, where others can review it.
    Unless the paper is themed around the new math, it is best to incorporate it, as the authors have done here, in a paper that has little relevance to scientific advancement.

    • Greg Goodman
      Posted Feb 17, 2015 at 3:02 AM | Permalink

      The most important thing when introducing a new method would be to TEST it and validate that it works.

      Publishing unfounded conclusions based on non-validated “innovative” methods, and leaving it to others to waste their time doing YOUR obligations of testing and publishing rebuttals, is not acceptable science.

      That is what is happening here.

      Had they published a validation with an error, it could be the role of others to point this out and possibly correct it.

      Normal scientific practice seems to have been put on hold for the special case of climatology, especially in the once “prestigious” journal Nature.

  114. Greg Goodman
    Posted Feb 17, 2015 at 3:28 AM | Permalink

    It appears that the circularity argument has become a bit of a red herring.

    Pekka has a valid point in that the flux from the model is the proper, most direct quantity to use – except that this is not what the authors do: though they do have the relevant flux, the SURFACE flux, they substitute the TOA flux, crudely adjusted by a highly approximative linear regression model.

    So Pekka’s objection to Nic’s substitution applies equally well to the paper itself.

    Even if Nic’s substitution does not provide a more accurate form, it is valid in demonstrating that there is collinearity in both the dependent and independent variables – something the authors failed to discuss or consider. Presumably they failed to realise it.

    This collinearity will lead to a bias in the regression result. Unless this is recognised and corrected, or shown to be negligible, it invalidates the method and hence the conclusions. This seems to be RomanM’s main point.

    Rearranging the equation after Nic’s substitution to isolate the temperature variables on one side of the regression shows that alpha is no longer present. Thus it is hardly surprising that any further analysis is found to be insensitive to alpha !
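
    To spell the rearrangement out (a sketch, using Nic’s substitution ΔF = αΔT + ΔN in the regression relation ΔT = ΔF/(α + κ) + ε):

      (α + κ)ΔT = αΔT + ΔN + (α + κ)ε
      κΔT = ΔN + (α + κ)ε

    With ε neglected, the fit collapses to κ = ΔN/ΔT; α survives only as a multiplier on the residual.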

    Before adopting the “innovative” method the authors need to show that it is capable of showing a dependence on alpha if one is present.

    If they had attempted this, they would probably have found that it is the method itself that is insensitive to alpha and realised what Nic has pointed out.

    There is quite a lot of information in the paper about the models that may be useful for the record, such as figures 1b and 1c, which show they completely underestimate the earlier warming as badly as they fail to capture the recent lack thereof.

    For that reason I don’t think a retraction is necessary, but a published comment is required to point out the fundamental flaws in the method and its lack of validation, and hence the invalidity of the conclusions.

    Such a comment would preferably avoid discussion of circularity and concentrate on the collinearity and the absence of the key parameter alpha.

    • Posted Feb 17, 2015 at 7:21 AM | Permalink

      Greg Goodman:

      Excellent post. RomanM additionally has pointed out some of the apparent typos/mistakes in the presentation of data that frustrate efforts to replicate the authors’ exact methods. Yet as you point out, the method itself seems to drive the result, not the data.

      I am also interested in the extent to which the definition of terms determines the outcome and diminishes the apparent influence of feedbacks (alpha). For example, in his response at Climate Lab Book, Marotzke admitted that:

      In fact there is no unambiguous way of splitting forcing and feedback, and this remains a problem that the climate research community has grappled with for some time.

      The methodological problems identified by critics would seem to explain why M&F’s results were “insensitive to the ambiguity”.

      M&F admit that feedbacks change over time — in other words, feedbacks can “strengthen” to partially negate an increase in exogenous forcing (otherwise we would quickly experience the dreaded “runaway greenhouse” effect). Many of us wonder how that process fails to display significance in multi-decade runs. Perhaps it is related to the authors’ decision to ignore the fact that many models allow alpha and kappa to change over time, although I’m not clear on how they managed this if they were using unadjusted model output.

      Marotzke explained that, in their chosen formula (1),

      …it is obvious that ΔT varies proportionally with ΔF whereas ΔT varies less than proportionally with either α or κ (unless κ becomes very small, close to a new equilibrium, in which case ΔT varies inversely proportionally with α); this provides ready explanation for a lesser role of ensemble spread in α or κ over the historical period…

      Yet this seems to suggest that α can have a “significant” influence even within their chosen method.

      • Greg Goodman
        Posted Feb 17, 2015 at 7:35 AM | Permalink

        In fact there is no unambiguous way of splitting forcing and feedback, and this remains a problem that the climate research community has grappled with for some time.

        Forster should read my post at Climate Etc. (linked above).

        It is surprising if he has not read Douglass and Knox 2005 and their reply to Wigley et al’s criticism of that paper in Douglass and Knox 2006 ( refs in my article ).

        If one is going to make gross simplifications and linearise everything, it is possible to use the linear relaxation response, which correctly contains both and allows extraction of the appropriate scaling.

    • Posted Feb 17, 2015 at 8:45 AM | Permalink

      Greg Goodman, “they completely underestimate the earlier warming as badly as they fail to capture the recent lack thereof:”

      It is more like the models completely missed the early cooling that led to the 1941 rebound.

      • Posted Feb 17, 2015 at 2:34 PM | Permalink

        Calling it a “rebound” implies a lot of unstated and unjustified assumptions.

        I too completely missed the cooling that ended in 1941

        I think the key things shown by the paper, and the figures I posted in particular, are the negative error in the 30s, the close fit in 1975–95, and all model fits above reality in the last 15-year segment.

        That is consistent with the models being too sensitive. Omitting the sign of the divergence and just looking at its magnitude allows the conclusion that the recent divergence is to be expected.

        If it is not significant that the models are completely wrong for 15 years or more, it is equally insignificant when they are correct for 15–20 years.

        It also needs to be noted that even this agreement was achieved by adjusting a multitude of essentially free parameters, and so is statistically meaningless.

        Taken at face value, MF2015 seems like a fairly good demonstration that the models are of no value at this stage and should be ignored until they can produce meaningful results.

        • Posted Feb 18, 2015 at 8:10 AM | Permalink

          Greg, “Calling it a “rebound” implies a lot of unstated and unjustified assumptions.”

          Worse, actually: it involves looking at the modeled estimated temperatures versus the instrumental record and trying to be descriptive 🙂 While I haven’t looked at every possible model combination, in general the models don’t get cooling and reversion to “normal”, or the mean. Crowley and Unterman have a volcanic reconstruction that should be considered in the piControl and historical “experiments” but AFAIK isn’t. If the models do a great job of matching an incorrectly assumed near-zero-variability past, why expect them to produce anything worthwhile in the future?

  115. Posted Feb 17, 2015 at 4:59 AM | Permalink

    I hope this will be the last comment I feel necessary to write on this topic.

    I have explained my points using formulas in two comments that complement each other: the logic of the M&F approach here, and the error that Nic made here. RomanM presented similar formulas using a different notation and discussed a little more the influence of the error.

    I add one more explanation of the same arguments without formulas.

    M&F wrote a paper, where a crude linear regression model was used to learn about the properties of models in CMIP5 database.

    The hypothesis that a crude model is applicable has not been proven correct, but to formulate the point weakly enough that it should not be controversial: “Their hypothesis is not totally unreasonable”. Thus it makes sense to study what follows from that hypothesis.

    One part of the hypothesis is that the three variables that may contribute to the temperature trends of 15 years and 62 years are α and κ of the models, and the effective radiative forcing ERF that results from external input of GHG concentrations, aerosols, volcanism, and solar radiation.

    The effective radiative forcing (variable F) is estimated from TOA imbalance N using α and surface temperature T from the same model runs in Forster 2013. N is strongly correlated with T; the resulting F much less so, as it’s approximately the ERF determined by external input.

    Performing the regression analysis on this basis does not technically involve any circularity. There’s nothing in the later stage that affects the data obtained from the earlier stage. The forcing used as input is independent, because it contains the independent data about TOA imbalance.

    The value of ΔF has, however, been obtained from a formula that’s not exact for ERF.

    It’s now possible to use the regression formula of M&F to determine how much of the temperature trends is contributed by the three explaining variables, and how much is residual that’s assumed to come from internal variability. Technically that’s not problematic, but one problem remains. We get the contributions of the three variables as defined by the method used in the work of Forster et al (2013) and M&F. Forcings, in particular, are determined from the same model runs as temperatures, but the formula may have significant errors. It’s possible that these errors correlate with the values of the other variables. If that’s the case, then the attribution of the contributions to the three explained parts and to the internal variability is distorted.

    The formula used in the analysis (in Forster 2013) has been justified and is arguably the best available. The correlation between the values of TOA imbalance and surface temperatures in the CMIP5 data is very strong. The calculation of forcings by their formula reduces the correlation very much. In the ideal case the calculation would result in an ERF determined almost totally by the externally input sources of forcing (not quite 100%, as it’s ERF, not RF).

    The results of M&F may attribute the temperature trends somewhat erroneously, because the ΔF that’s one of the explaining variables is not the real ERF. In what way the attribution is erroneous depends on the correlation of the error in ERF with the other variables. Correlation with temperature affects mainly the share of ERF and the residual; correlation with α or κ affects their shares as well.

    The error I discuss is, however, not circularity in the formulas that form the hypothesis; the error comes from the inaccuracy of the formula that’s used. The formula that’s used is actually the natural, simple choice to minimize spurious circularity, not the reason for the circularity. That’s because the F used is close to the real ERF, which is essentially external input, while N is highly dependent on T. Nic’s error was in not realizing that M&F remove the circularity error as well as they can, rather than create it.
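
    A quick numerical illustration of the diagnosis step: the Python sketch below compares corr(N, T) with corr(F, T) for F = N + αT. All series and the value of α are synthetic stand-ins rather than CMIP5 output, and the toy N is deliberately constructed so that the diagnosed F is nearly constant; it illustrates the computation rather than testing the claim.

    import numpy as np

    def correlation_check(T, N, alpha):
        # Compare corr(N, T) with corr(F, T), where F = N + alpha*T is the
        # diagnosed effective forcing in the style of Forster et al (2013).
        F = N + alpha * T
        return np.corrcoef(N, T)[0, 1], np.corrcoef(F, T)[0, 1]

    # Purely illustrative synthetic series; real use would read CMIP5 diagnostics.
    rng = np.random.default_rng(0)
    T = np.cumsum(rng.normal(0.01, 0.1, 112))        # toy temperature series (K)
    alpha = 1.2                                      # W m-2 K-1, a typical model value
    N = 0.5 - alpha * T + rng.normal(0.0, 0.2, 112)  # toy N built so F is ~constant
    print(correlation_check(T, N, alpha))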

    • Brandon Shollenberger
      Posted Feb 17, 2015 at 5:53 PM | Permalink

      Pekka Pirilä, could you explain this remark of yours:

      Performing the regression analysis on this basis does not technically involve any circularity. There’s nothing in the later stage that affects the data obtained from the earlier stage.

      This seems to be a non-sequitur. Whether or not later portions of a methodology affect earlier portions does not determine whether or not there is circularity. That only determines if the methodology is recursive.

      If this is an accurate description of your position, your argument is nothing more than misinterpreting “circular” as “recursive.” If it is not an accurate description of your position, you ought to fix it so people aren’t misled.

      • Posted Feb 17, 2015 at 6:19 PM | Permalink

        Brandon,

        One issue that may be confusing is the use of the word circular when the analysis is done in a single pass. The same problem could be described better using a different expression. What happens in the cases labeled circular is that the variables are not independent, as they should be in a regression model to avoid erroneous interpretation of the results. This is the approach I picked when I postponed the discussion of the potential issues to the final stage of the analysis – the interpretation of the results. At that stage I discussed how correlations of the error term might affect the results.

        One way of describing the problem is to say that when the explaining variables are not fully independent, variability due to one explaining variable leaks into the contributions attributed to the other variables making them either too large or too small.

        • Brandon Shollenberger
          Posted Feb 17, 2015 at 6:36 PM | Permalink

          Pekka Pirilä, I’m afraid I don’t see how that explains your remark. I don’t find the word “circular” remotely confusing, and I don’t see that anyone else here has either. The only confusion I’ve seen regarding the word is your apparent claim there is no circularity because there is:

          nothing in the later stage that affects the data obtained from the earlier stage.

          Which would only determine if the approach was recursive, not circular.

          It would help if you would address the question I raised head-on. As it stands, I’m not certain I properly understand your response to me. It sounds like you’re actually acknowledging the circularity people have pointed out, but I can’t tell because I can’t see how what you say is supposed to respond to what I said.

        • Posted Feb 17, 2015 at 7:17 PM | Permalink

          Brandon,
          My wording was, indeed, not very good. Another form of circularity is that a variable must be solved from an equation where it occurs in several places, and that’s the case I discussed in my recent comments.

          In this case we have the equation where ΔT occurs
          – explicitly only on the left-hand side in M&F
          – explicitly once on both sides after Nic’s substitution
          – explicitly once on the left-hand side and twice on the right-hand side after the second substitution I introduced.

          The third case goes back to the first, as the two terms on the right hand side cancel. These two equivalent alternatives are the ones that are consistent with the physical hypotheses of M&F. Nic’s alternative makes sense for a different physical hypothesis.

          Based on the above, I think that the question you asked referred to bad formulation of the point from my side, not any real disagreement on the substance.

    • m.t.
      Posted Feb 17, 2015 at 7:41 PM | Permalink

      Pekka,

      “One part of the hypothesis is that the three variables that may contribute to the temperature trends of 15 years and 62 years are α and κ of the models, and the effective radiative forcing ERF that results from external input of GHG concentrations, aerosols, volcanism, and solar radiation.”

      This may be the hypothesis, but your description of ERF is not what Forster provides. Forster 2013 uses the algorithm from Forster and Taylor 2006. An example of the diagnosed forcing values is given in that paper.

      “Imagine, for example, that the atmosphere alone (perhaps through some cloud change unrelated to any surface temperature response) quickly responds to a large radiative forcing to restore the flux imbalance at the top of the atmosphere, yielding a small effective climate forcing. In this case the ocean would never get a chance to respond to the initial radiative forcing, so the resulting climate response would be small and this would be consistent with our diagnosed effective climate forcing rather than the conventional radiative forcing.”

      I’m no climate scientist, but it seems to me that the ΔF from Forster is essentially the forcing that was required to effect the ΔT seen in the models.

    • andersongo
      Posted Feb 17, 2015 at 9:09 PM | Permalink

      Once again, the circularity is that ΔT is regressed onto a linear function of itself, thus leading to erroneous results from the linear relationship thus obtained, and more specifically from the error term which is used to assess internal variability. Your assertion that the error term from the regression does not affect results is puzzling, as it has already been shown that there is no such cancellation of the ΔT term. Discussion of recursive use of ΔT is a red herring.

      “What happens in the cases labeled circular is that the variables are not independent as they should be in a regression model to avoid erroneous interpretation of the results. This is the approach I picked, when I postponed the discussion of the potential issues to the final stage of the analysis – the interpretation of the results. At that stage I discussed, how correlations of the error term might affect the results.”

      This right there indicates the confusion; it seems that you are admitting that the circular regression yields garbage but then assert that it does not affect final results. You are in effect claiming that the OLS has no significance and can thus be disregarded. You can’t otherwise postpone discussion of whether circular regression leads to erroneous final results. The question is simple: does the regression equation where ΔT appears on both sides lead to circularity, and thus a breakdown of the OLS, or not? It appears you agree after all.

      • HAS
        Posted Feb 17, 2015 at 9:53 PM | Permalink

        My current working hypothesis (in accord with Roman) is that Pekka is making the charitable assumption that M&F put aside Forster (2013)’s estimation of F from T and N using regression, and instead elevated it to an identity by way of assumption.

        Thus all conclusions from the paper need to be prefaced with “If we assume ΔF = α ΔT + N in climate models ….”.

        As I said, this is “a working hypothesis” (Pekka has not confirmed it) and a “charitable” one; I’m reminded of the old adage “If we had eggs, we could have ham and eggs, if we had ham”.

        • andersongo
          Posted Feb 17, 2015 at 11:44 PM | Permalink

          Even if this were so, we would still be left with circular regression. Rearranging the equation to put all ΔT terms on the left side may circumvent this difficulty. But this is still problematic for:

          (1) That’s not what M&F did. They explicitly regressed ΔT onto ΔF and thus onto a linear function of itself.
          (2) The minimization used to obtain the coefficients becomes problematic and unreliable.

          RomanM’s suggestion is that, due to the identity ΔF = α ΔT + N, a change Δ in T must be offset by a compensating change -αΔ in N, which will then supposedly “alleviate” the circularity. But the regression process is still mathematically flawed.

        • HAS
          Posted Feb 18, 2015 at 12:07 AM | Permalink

          The problem at hand is to clarify Pekka’s assumption that gives M&F a pass.

          I think his point is that if there is an assumed identity, regressing against ΔF is fine, it contains no more or less information than its component parts.

        • HAS
          Posted Feb 18, 2015 at 12:42 AM | Permalink

          I was multi-tasking when I pressed Post.

          Actually, reflecting on this, perhaps Pekka’s assumption is also that F and T are independent.

          Forster (2013)’s estimation of F is done by regressing N against T and setting F to the intercept. On that basis, and assuming an identity, I think M&F get to pass go.

          Very early on in the thread we were talking about M&F’s failure to do tests on this.
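
          For concreteness, that estimation (a Gregory-style fit of N against T: the slope estimates -α and the intercept estimates F) can be sketched in a few lines. A toy version with assumed numbers, not Forster’s actual calculation:

          import numpy as np

          # Synthetic stand-in for an abrupt-forcing run: N = F - alpha*T + noise.
          rng = np.random.default_rng(1)
          F_true, alpha_true = 7.0, 1.3                              # assumed W m-2 and W m-2 K-1
          T = np.linspace(0.5, 4.5, 150) + rng.normal(0, 0.05, 150)  # warming toward equilibrium
          N = F_true - alpha_true * T + rng.normal(0, 0.3, 150)      # TOA imbalance

          slope, intercept = np.polyfit(T, N, 1)   # straight-line fit of N against T
          print(f"alpha ~ {-slope:.2f} W m-2 K-1, F ~ {intercept:.2f} W m-2")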

        • ianl8888
          Posted Feb 18, 2015 at 12:57 AM | Permalink

          … and assuming an identity

          Which identity is assumed, please ?

        • HAS
          Posted Feb 18, 2015 at 1:36 AM | Permalink

          F is defined as α ΔT + N as produced directly from the model runs, i.e. not as α ΔT + N + residual as estimated by Forster using regression.

          Pekka’s argument seems to be that it is legit to use the direct outputs from models if models are all you are making statements about, so if F is just a linear combination of those it can be used. That assumption falls over if there is a residual floating around, i.e. if F is a product of regression.

          I was musing that this is probably not a sufficient condition to make it legit; you also need F and ΔT to be independent. I also think that is what Pekka has been implying in some of his comments.

        • andersongo
          Posted Feb 18, 2015 at 3:47 AM | Permalink

          This assumes no variation in ΔF over time, that is, that a change in ΔT is perfectly cancelled out by an equivalent change in N. Pekka and others have to prove this over the relevant time periods before declaring the circularity issue a red herring. If by “fixed” Pekka really meant that ΔF is constant, then we can all see where the misunderstanding lies. For the case where F is being defined as an intercept:

          ΔF = N + α ΔT for N = 0

        • HAS
          Posted Feb 18, 2015 at 4:06 AM | Permalink

          Isn’t it weaker than that – just that the partial derivative of N wrt ΔT is -α?

      • andersongo
        Posted Feb 18, 2015 at 5:58 AM | Permalink

        Indeed, but only on assuming that ΔF is indeed independent of ΔT, in which case ∂ΔF/∂ΔT = 0. Good luck proving that.

        • andersongo
          Posted Feb 18, 2015 at 11:59 AM | Permalink

          Correction: F independent of T.

        • HAS
          Posted Feb 18, 2015 at 2:13 PM | Permalink

          Pekka’s point would be that they are allowed to assume that, and the fact that Forster calculated F as the intercept of the regression of N against ΔT might give some comfort in this regard.

        • HAS
          Posted Feb 18, 2015 at 2:38 PM | Permalink

          In fact I see, from further down the thread where Pekka has now spelled out his views in the lingua franca of maths, that he is saying he doesn’t need an identity; he can stand an error in the equation provided F is independent of ΔT, i.e. the partial derivative of the error wrt ΔT is 0 (or close to 0).

          This is of course testable in F as estimated by Forster 2013, a test that some suggested should have been done when the thread first got going.

        • andersongo
          Posted Feb 19, 2015 at 1:53 PM | Permalink

          This sounds like Groundhog Day. The issue was that ΔF was thought to be linearly dependent on ΔT, resulting in ΔT being regressed on a function of itself. If Pekka’s point was for all this time that F is independent of T by virtue of compensation from N, then why did he not say it from the start?
          People went round and round in circles for nothing due to strange arguments such as “Performing the regression analysis on this basis does not technically involve any circularity. There’s nothing in the later stage that affects the data obtained from the earlier stage.”
          His argument is testable, and from Nic’s test regarding negative correlation of N and T (the compensatory contribution of N to changes in T), Pekka’s premise seems flawed.

  116. Frank
    Posted Feb 17, 2015 at 6:55 AM | Permalink

    Pekka and others: I’ve been trying to understand this situation from a slightly different point of view that may be useful to others.

    In climate models and the real world, deterministic variability is caused by various forcings: GHGs, aerosols, volcanoes or solar; while unforced variability is caused by fluctuations in ρ (= α + κ), the rate at which heat is removed from the surface compartment (surface, atmosphere and mixed layer). For example, if an unusual number of strong El Niños occurs in a 15-year period, less warm water is buried in the deeper Western Pacific and there is less upwelling in the Eastern Pacific. The warmer surface waters there warm the atmosphere. The unusual warmth in this period would be due to a reduction in κ (kappa). Hopefully, these details about ENSO are correct, but they aren’t essential to my argument.

    Using ΔT = ΔF / (α + κ), M&F have created individual models of each climate model (probably of each run), and then one comprehensive model for the ensemble of model runs. The average values for α and κ are obtained from model output, but unforced variability is NOT created by allowing α and κ to vary with time. These parameters are fixed. (If I were trying to model unforced variability, I might start by saying ΔT = ΔF / (α+α” + κ+κ”), where the double-primed parameters represent the unforced natural variability that is observed in these parameters and the unprimed values are their averages. M&F are doing something very different.)

    If the model of M&F doesn’t allow α or κ to vary, how can ΔT show unforced variability? Obviously ΔF must contain both forced and unforced variability! (And there should be a different ΔF for each model run.) If a climate model chaotically produced an unusual number of strong El Ninos and α and κ are not allowed to change, then the effective radiative forcing for that period (ΔF) must be higher than normal. In the real world and climate models, forcing is deterministic. In M&F’s equations, however, ΔF also contains the unforced variability that produces unforced variability in ΔT.

    ΔF = α ΔT + ΔN. Which terms provide the forced and unforced variability in ΔF? α ΔT is usually the larger term and it certainly provides unforced variability. ΔN may also provide some unforced variability. So Nic’s mathematical argument for circularity makes physical sense to me. You just need to remember that “effective radiative forcing” is not solely a deterministic forcing. It doesn’t help that we often use the same symbol (ΔF) for both forcing and “effective radiative forcing”.

    To complete my argument, M&F’s regression equation contains ΔF, which contains both deterministic and unforced variability. Therefore M&F’s claim that the regression equation provides only the deterministic temperature change is wrong.
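
    A toy calculation makes this concrete: if the unforced wiggle in T is damped at a rate λ different from the long-term feedback α, the Forster-style diagnosed forcing absorbs part of the wiggle. Every number below is an assumption chosen purely for illustration:

    import numpy as np

    alpha, kappa = 1.2, 0.7                      # fixed model parameters (W m-2 K-1), assumed
    lam_int = 0.6                                # assumed damping felt by internal variability

    F_ext = np.linspace(0.0, 2.5, 112)           # deterministic external forcing
    T_forced = F_ext / (alpha + kappa)           # quasi-steady forced response
    T_int = 0.15 * np.sin(np.arange(112) / 9.0)  # toy unforced wiggle
    T = T_forced + T_int

    # The forced part obeys N = F - alpha*T, but the unforced part is damped
    # at lam_int rather than alpha -- the source of an error term phi.
    N = F_ext - alpha * T_forced - lam_int * T_int

    F_diag = N + alpha * T                       # Forster-style diagnosed forcing
    phi = F_diag - F_ext                         # equals (alpha - lam_int) * T_int
    print(np.corrcoef(phi, T_int)[0, 1])         # ~1.0: the wiggle leaks into F_diag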

    • Posted Feb 17, 2015 at 7:38 AM | Permalink

      Isn’t “internal variability” essentially a lagged component of α and κ? Adding a climatic cosmological constant to the equation shouldn’t change the cause-and-effect analysis. Even if you “cheat” a little by including portions of feedback in your forcing values, the oscillating return to equilibrium would seem to be driven by feedbacks (including the lagged effect of the ocean heat sink).

      • Frank
        Posted Feb 17, 2015 at 11:58 AM | Permalink

        Opluso: Alpha and kappa are parameters that control the rate of heat flow from the surface compartment (surface, troposphere and ocean mixed layer) to space and the deep ocean respectively, in response to a change in surface temperature. AVERAGE values for these parameters can be abstracted from the output of climate models. Unless I’m sadly mistaken, unforced variability around the average value of these parameters is responsible for unforced variability in climate. For a simple derivation of alpha and kappa, see Isaac Held’s blog. Alpha is called beta in this post.

        http://www.gfdl.noaa.gov/blog/isaac-held/2011/03/11/3-transient-vs-equilibrium-climate-responses/

        • Posted Feb 18, 2015 at 7:35 AM | Permalink

          I think we are agreeing. My point was that the M&F claim (that “internal variability” explains climate model inaccuracies while feedback does not) is just another way of saying that climate models do a poor job of representing α and κ.

          Splitting “feedback” into multiple parts is no doubt helpful in calibrating models. However, the simplified physics equation only requires an error correction value because we don’t accurately model α.
          ΔT = ΔF / (α + κ) + ɛ

          The surprising conclusion of M&F

          The role of simulated climate feedback in explaining the difference between simulations and observations is hence minor or even negligible.

          depends on treating α and κ and ɛ as independent functions rather than related parts of total “feedback”.

    • Posted Feb 17, 2015 at 8:56 AM | Permalink

      “If the model of M&F doesn’t allow α or κ to vary, how can ΔT show unforced variability?”

      Injected noise as I’ve noted twice above.

      “If a climate model chaotically produced an unusual number of strong El Ninos …”

      Generally they won’t. There is no understanding of the cause of El Nino et al that permits even an approximate modelling of the process. Climate models usually just add some randomly distributed noise to make the output look more climatey.

      • Frank
        Posted Feb 17, 2015 at 11:48 AM | Permalink

        Climategrog: I’ve never heard of “injected noise”. Lorenz showed long ago that the solutions to the coupled non-linear differential equations used in weather forecasting and climate models exhibit chaotic behavior. For a review, see: Lorenz (1991), “Chaos, Spontaneous Climatic Variations and the Detection of the Greenhouse Effect”. If forcing is kept constant, surface temperatures in climate models show unforced variability on a decadal time scale about an average value. If changes in forcing are added (GHGs for example), there is both a deterministic change in temperature and unforced variability in temperature.

        Although it isn’t critical to my argument, climate models do reproduce some aspects of El Nino. Kosaka recently showed that the recent multi-decadal variability in the rate of global warming (faster from 1975-1998, slower since) could be observed in the output of a climate model if SSTs in a small portion of the Eastern Equatorial Pacific were constrained to match observed SSTs in that region.

        • Greg Goodman
          Posted Feb 17, 2015 at 12:38 PM | Permalink

          Thanks Frank,

          if you force (constrain) a certain critical part of the Pacific to follow observed SST, there will be certain effects like wind feedbacks and the Bjerknes effect that are modelled and to some degree produce a knock-on effect in a broader region.

          This says nothing about modelling the original cause of ENSO, it is about ASSERTING it into the model.

          Some models ( not many ) do display a degree of ENSO-like patterns, but not at the right time. Just comparable wiggles. That again confirms what I said: they have no skill in or understanding of the underlying cause and do not model it.

        • Greg Goodman
          Posted Feb 17, 2015 at 12:56 PM | Permalink

          Just for the record, I hypothesise that the root cause of ENSO is tidal effects on the thermocline. Since we still cannot model surface tides in a deterministic way, I’d guess we are a long way from being able to model gravitational effects on the thermocline.

          However, if you look at the ratio of the density differences at the surface and at the thermocline, it suggests that the predominant 12-hour response of the surface would correspond to something of the order of 2 years at the thermocline.

          As a back-of-envelope figure, that is about right for ENSO.

          Spectral analysis of trade wind data shows the classic “3 to 5 years” pseudo-periodicity of ENSO may be modulation of a lunar-driven 4.43-year periodicity.

          Power Spectrum – W. Pacific Trade Winds (squared)

          At that point the modelled feedbacks would have something causal to bite on. But I digress.

        • R Graf
          Posted Feb 17, 2015 at 1:40 PM | Permalink

          Clive Best has written a lot on tides: http://clivebest.com/blog/?p=5986

          Can you give me your opinion on my comments regarding the Second Law at the bottom of the post?

        • kim
          Posted Feb 17, 2015 at 1:45 PM | Permalink

          Rock me tender though the cradle is constrained.
          ==========

    • Layman Lurker
      Posted Feb 17, 2015 at 1:24 PM | Permalink

      A very interesting and insightful comment.

  117. Posted Feb 17, 2015 at 6:55 AM | Permalink

    For those who may wonder why I haven’t answered:

    I have written one more comment to express my points better in some respects, but that comment is in moderation. Thus there’s an unknown delay before you see it.

    • Greg Goodman
      Posted Feb 17, 2015 at 7:39 AM | Permalink

      Oh dear, you did not use the F-word, did you? Fami-l-i-a-r.

      I posted how to fix this a couple of days ago but our host oddly chose not to do so.

      If you did use that word, I suggest reposting using “au fait” instead 😉

    • R Graf
      Posted Feb 17, 2015 at 8:35 AM | Permalink

      Hi Pekka,

      Let me ask the following: If I grant that M&F can create any energy balance they want for T, and diagnose any value they want for the black-box values as long as the equation is in balance, is that what they truly did here? I see them plugging in one of the black-box values from another diagnosed run on another day and substituting it in. Once they do that, aren’t they introducing the assumptions of all the rest of the black-box variables left behind? This is why I felt it appropriate to take the equation from the past diagnosis of the variable and substitute it in, because that is how it was derived. But plugging in any value of F from another run or another model makes the equation invalid, as it should. The algebra is telling us the truth. M&F’s results are not. Am I making sense?

      • R Graf
        Posted Feb 17, 2015 at 8:09 PM | Permalink

        Pekka,

        I realize in your timezone you should be off to bed, but tomorrow I would ask that you let me know whether my definition of circularity is valid or not, and whether or not it applies to climate models. Thx. Here is my test:
        “Any value brought forward from a past trial (or alternate of average) to be placed into an equation brings with it all of its assumptions. If any of those assumptions appear on both sides of the equation, no matter how small, you are breaking the law.”

    • Posted Feb 17, 2015 at 3:00 PM | Permalink

      Pekka

      Thanks for mentioning this problem – I don’t get involved in moderation, but I have been able to release your comment.

      You are however incorrect in still thinking that I have made a mistake.

      • Posted Feb 17, 2015 at 3:13 PM | Permalink

        Nic,

        On the question of the mistake, we have 100% opposite views, and I have presented the evidence with formulas in the latter of the two earlier comments linked in that comment.

        I agree that there are uncertainties. I also have doubts about the accuracy of their results, but discussing the real issues requires that the false claims first be put aside.

        • Posted Feb 17, 2015 at 4:36 PM | Permalink

          Thanks Pekka, I found that explanation clearer than your earlier posts.

          “N is strongly correlated with T, the resulting F much less as it’s approximately the ERF determined by external input.”

          On a number of occasions here you have stated that the object of the subtraction is to remove the correlation. This is not the case.

          The term αΔT is the climate feedback: the strongly negative Planck feedback, plus or minus whatever *assumed* parametrised feedbacks they build into the various models.

          Under the current orthodox view, these are predominantly +ve and reduce the magnitude of the Planck f/b.

          The effect of a -ve feedback is to stabilise the system and will reduce the correlation. However, it will not remove it, nor is the subtraction of the climate f/b “intended” to remove the correlation. It is intended to leave the net forcing after all feedbacks ( here called ERF ).

          This will still be correlated with surface temps since it is what is driving them. If it were essentially uncorrelated, as you appear to be suggesting, there would be almost zero sensitivity to external forcing. As we know, this is far from the case in the models.

          So, to the extent that climate is sensitive to forcings, there is still correlation in the explanatory variable. It is not accidental, nor a result of the various inaccuracies and approximations. It is the essence of the question being studied.

          Now as I have already posted, and as yet no one has objected, if Nic’s substitution ( which as argued is legitimate ) is done, the equation can be rearranged to isolate ΔT in one term on the left and alpha disappears from the equation.

          That is why the results are found to be insensitive to alpha. If there is some residual but insignificant correlation, it is likely due to the rather gross approximations and linearisations that are being done.

          To put this one to bed and look at the real issues, I think it necessary to agree to drop the entrenched argument about whether this is “circularity” or not.

          There is collinearity in the dependent and independent variables as presented in the paper, and if the equation is rearranged to remove it, alpha disappears.

          Since the authors have been fairly open about highlighting possible issues, it appears they had not realised this, as it was not mentioned.

          Thus there is a legitimate issue that has been raised.

        • Posted Feb 17, 2015 at 5:50 PM | Permalink

          Greg,

          I have stated that I find the final results of the paper very surprising. That remains true. Thus I won’t be surprised if some part of their hypothesis turns out to be violated by the GCMs. The only thing I have argued is that the reason is not a basic circularity in that particular step.

          There may be problems in the accuracy of the formula that connects N and F, but there are also many other details that can deviate enough from the hypothesis that the explanation lies in some of those assumptions. The use of a linear regression model can also significantly influence the results when the actual relationships deviate substantially from linearity.

          Nic also discussed some of the other problems in the original post. Those may be worth a closer look.

          It would be nice to get the strong expectations and the results of the analysis to agree better, whether that happens by finding where the analysis fails to describe the CMIP5 model behavior correctly, or by correcting wrong intuitive expectations.

        • Greg Goodman
          Posted Feb 18, 2015 at 3:15 AM | Permalink

          Thanks for the reply Pekka,

          I think the continued argument about whether this should be called circularity is getting in the way. It seems to have degenerated into a battle of pride where someone has to say “yes, OK, it is circularity” or someone else has to say “OK, I was mistaken, it isn’t”.

          Everyone agrees the result is counter-intuitive, and as you say, the important thing is to understand why.

          I think I have pointed out why, but no one seems to be considering whether I’m correct, either agreeing with or refuting what I’ve pointed out:

          using Nic’s substitution shows where the problem is: alpha falls out of the equation, and we see immediately that the rest of the study could never produce a result sensitive to alpha.

          Had the authors attempted to validate whether their “innovative” method was capable of showing sensitivity to alpha before using it and publishing, they would have realised this themselves.

          Many people incorrectly talk of “using a methodology” where they mean “using a method”. A methodology is a *study* of the method. It seems that in this case there was NO methodology, just an assumption that what seemed like a good idea would work. When it gave a result that tied in with the authors’ desire to explain the pause, confirmation bias kicked in; they concluded they had made a “significant innovation” and published.

          Uncritical peer review at Nature let it through.

        • Greg Goodman
          Posted Feb 18, 2015 at 4:04 AM | Permalink

          Apologies, this has got so long I lost track. This is exactly what Nic did in eqn 6.

          ΔT = ΔN / κ + ε

          He also notes that this has effectively removed everything but ocean heat uptake. So the continual approximations, substitutions and linearisations have ended up evacuating the rest of the climate system into the error term, which of course is assumed by M&F to be “random”.

          Pekka has objected to this substitution but this is precisely the source of ΔF used by the authors and is drawn from the 2013 paper.

          Nic’s eqn 6, which is nothing but an algebraic rearrangement of the authors’ present and previous work, shows that a regression where ε is (erroneously) assumed to be randomly distributed is in effect using κ as the only explanatory parameter linking TOA variability to surface temperatures.

          The apparent presence of alpha in M&F is thus an algebraic illusion, and no matter what the models do, their result would be insensitive to alpha.
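
          The disputed algebra can be checked mechanically. A sympy sketch: substitute ΔF = ΔN + αΔT into the regression equation, solve for ΔT, then impose the physically motivated coefficients (a = c = d = 0, b = 1/(α+κ)) and α cancels, reproducing Nic’s eqn 6 (the error term is omitted for brevity):

          import sympy as sp

          dT, dN, a, b, c, d, alpha, kappa = sp.symbols('dT dN a b c d alpha kappa')

          # M&F's regression with the diagnosed forcing dF = dN + alpha*dT substituted in:
          eq = sp.Eq(dT, a + b*(dN + alpha*dT) + c*alpha + d*kappa)

          solved = sp.solve(eq, dT)[0]
          print(sp.simplify(solved))
          # algebraically: (a + b*dN + c*alpha + d*kappa) / (1 - b*alpha)

          physical = solved.subs({a: 0, c: 0, d: 0, b: 1/(alpha + kappa)})
          print(sp.simplify(physical))
          # -> dN/kappa : alpha has cancelled, as in Nic's eqn 6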

        • kim
          Posted Feb 18, 2015 at 7:49 AM | Permalink

          Curiosity
          Spoke to itself all about
          Strong expectations.
          ============

      • Greg Goodman
        Posted Feb 18, 2015 at 4:23 AM | Permalink

        I also objected earlier to their not accounting for the phase lag introduced by a linear relaxation process.

        They do mention this obliquely but do not state what it really means or implies.

        The thermal adjustment of the surface layer to ΔF is expected to occur within a few years. This means that for timescales of one to several decades, the surface energy balance is in quasi-steady state and reads …

        I would suggest that this is exactly the point at which alpha disappears from the equation.

        The phase lag and the difference in temporal evolution between driver and climate response ( which is not a simple fixed time lag ) are precisely what determine the depth of ocean involved and inform us about the parameters of the response.

        In sweeping this under the carpet, by calling it quasi-steady state, they are in effect assuming ZERO lag and thus instantaneous equilibration.

        From that point onwards, the real climate response that tells us about the sensitivity ( and alpha ) either gets falsely attributed to ocean heat uptake or dumped into the error term and dismissed as part of random variability.
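
        The size of the effect is easy to gauge with a one-box relaxation C dT/dt = F - λT, whose time constant is τ = C/λ; with τ set to 36 months (mid-range of the 30-40 month figure cited elsewhere in the thread from Santer et al), the response to a forcing ramp trails the zero-lag “quasi-steady” answer by roughly τ. A sketch with assumed parameter values:

        import numpy as np

        lam = 1.3                     # net feedback parameter (W m-2 K-1), assumed
        tau = 36.0                    # relaxation time constant (months), per the cited range
        C = lam * tau                 # heat capacity implied by tau = C / lam
        dt = 1.0                      # one-month time step

        months = np.arange(12 * 62)              # a 62-year window
        F = 0.03 * months / 12.0                 # forcing ramp of 0.03 W m-2 per year

        T = np.zeros_like(F)
        for i in range(1, len(F)):               # explicit Euler integration
            T[i] = T[i-1] + dt * (F[i-1] - lam * T[i-1]) / C

        T_quasi = F / lam                        # the zero-lag ("quasi-steady") response
        print(f"lag-induced shortfall at end of ramp: {T_quasi[-1] - T[-1]:.3f} K")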

        • kim
          Posted Feb 18, 2015 at 7:58 AM | Permalink

          Through the interstices of mathematics, and the choate cage of logic, shines beaming physics.
          ===========

  118. R Graf
    Posted Feb 17, 2015 at 9:16 AM | Permalink

    The basic question (IMO) boils down to: can F be assumed to be a constant in the same model? If so then the substitution problem is thrown out. But once F depends on T then you cannot use F from any other model or run or it will necessarily bring error.

    This is how M&F’s results led them to believe that feedbacks dominated the early years and forcings the long haul, when right in the abstract of Forster 2013 he concluded:
    “The inter-model spread of temperature change is principally driven by forcing differences in the present day and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095.” -Forster

    • Geoff Sherrington
      Posted Feb 18, 2015 at 3:18 AM | Permalink

      R Graf, “can F be assumed to be a constant in the same model?”
      Not so sure. In one view, a constant is a well-known number that is subject to continuing refinement, like pi to a million places, so that its error can properly be ignored and the calculation algorithm can be classed as exact. Then you get the next stage of accuracy, where a measurement is involved and therefore a limitation on significant figures:

      Rydberg constant: value 10 973 731.568 539 m⁻¹, standard uncertainty 0.000 055 m⁻¹, relative standard uncertainty 5.0 × 10⁻¹², concise form 10 973 731.568 539(55) m⁻¹.

      Next comes a less rigorous constant such as the Stefan-Boltzmann constant: value 5.670 373 × 10⁻⁸ W m⁻² K⁻⁴, standard uncertainty 0.000 021 × 10⁻⁸ W m⁻² K⁻⁴, relative standard uncertainty 3.6 × 10⁻⁶, concise form 5.670 373(21) × 10⁻⁸ W m⁻² K⁻⁴.

      For better symbols, please see http://physics.nist.gov/cgi-bin/cuu/Value?ryd|search_for=abbr_in!

      When it is possible to use only far fewer significant figures for a “constant”, a limitation arises from the physical limits of a measurement, from the absence of a verification method of sufficient validity, or from known or unknown exogenous factors entering the calculation of the value of the “constant” – or from two, or even all three, of these. Somewhere in this spectrum we move from the nomenclature of a “constant” to that of a “variable”. To me, the F discussed here is the latter.

      Sorry to nit pick, I’m not even sure if this adds to the discussion.

  119. R Graf
    Posted Feb 17, 2015 at 10:29 AM | Permalink

    Taking an assumption for a variable from a prior event and plugging it into an equation that has one of its dependencies on the other side, no matter how small or in what form, violates algebra because it violates the arrow of determination. So M&F complied with the First Law but violated the Second Law.

  120. R Graf
    Posted Feb 17, 2015 at 11:07 AM | Permalink

    If you’re looking for evidence of Nic’s assertion that M&F used values of F from 2013, all I found in M&F 2015 is a footnote to Forster 2013. But their CLB response contains the following admission:

    “Because N is readily available but F is not, Forster et al. (2013), from where the time series of F were taken, used the pre-determined model property α to obtain F by:
    F = N + αT (3)
    using the N and T that they diagnosed from simulations of the 20th century.”

    Dilbert: I’m obsessed with inventing a perpetual motion machine. Most scientists think it’s impossible, but I have something they don’t.
    Dogbert: A lot of spare time?
    Dilbert: Exactly.

  121. R Graf
    Posted Feb 17, 2015 at 1:01 PM | Permalink

    Steve McIntyre posted nine days ago: “I’ve done a quick read of the post at Climate Lab Book. I don’t get how their article is supposed to rebut Nic’s article. They do not appear to contest Nic’s equation linking F and N – an equation that I did not notice in the original article. Their only defence seems to be that the N series needs to be “corrected” but they do not face up to the statistical consequences of having T series on both sides.”

    ANY value brought forward from a past trial (or alternate of average) to be placed into an equation brings with it all of its assumptions. If any of those assumptions appear on both sides of the equation, no matter how small, you are breaking the law, period!!!

    This is another way of stating the Second Law of Thermodynamics. It must be followed or it is not science.

    Please tell me what assumption I am breaking. The silence is deafening. Steve, Nic, are you OK?

  122. Michael Mayson
    Posted Feb 18, 2015 at 12:01 AM | Permalink

    R. Graf – please be quiet. You are spoiling a really interesting thread. Let’s leave commentary to those who know what they are talking about.

  123. Posted Feb 18, 2015 at 4:56 AM | Permalink

    I don’t know whether repeating the same argument once more with equations makes it clearer to anyone who hasn’t understood it already, but here it is one more time.

    The basic hypothesis is that the temperature trends produced by GCMs follow the equation

    ΔT[i,j] = a[i] + b[i] ΔERF[i,j] + c[i] α[j] + d[i] κ[j] + e[i,j]   (1)

    where ΔERF[i,j] is determined (almost) totally by externally given input and e[i,j] is internal variability that averages to zero.

    An additional part of the hypothesis is that the internal variability is mainly due to variability in the oceans. When ocean currents bring cold water to the surface we have a cold phase in surface temperatures. When the surface is colder, it radiates less, and that’s reflected in the TOA imbalance, leading to a larger downwards imbalance. That relationship is assumed to be the same for the temperature trends caused by internal variability as it is in the creation of the long-term balance, which defines α. The equation

    ΔN = ΔERF – α ΔT   (2)

    defines α when the other variables refer to the difference between two long-term equilibrium situations. Model runs where the CO2 concentration is quadrupled, and a long period after that is simulated, are used to determine α. The assumption is that the same α can be used for the influence of internal variability on the 15-year and 62-year trends.

    I follow Roman and use the symbol φ for the error in this assumption. Thus φ is defined by

    ΔN[i,j] = ΔERF[i,j] – α[j] ΔT[i,j] + φ[i,j]   (3)

    The estimate of effective forcing is now

    ΔF[i,j] = ΔN[i,j] + α[j] ΔT[i,j]   (4)

    and

    ΔF[i,j] = ΔERF[i,j] + φ[i,j]   (5)

    M&F use the equation

    ΔT[i,j] = a[i] + b[i] ΔF[i,j] + c[i] α[j] + d[i] κ[j] + e[i,j]   (6)

    to estimate the regression coefficients. This differs from the original assumption by including F rather than ERF. We can insert (4) to get

    ΔT[i,j] = a[i] + b[i] (ΔN[i,j] + α[j] ΔT[i,j]) + c[i] α[j] + d[i] κ[j] + e[i,j]   (7)

    but we can also insert (5) to get

    ΔT[i,j] = a[i] + b[i] (ΔERF[i,j] + φ[i,j]) + c[i] α[j] + d[i] κ[j] + e[i,j]   (8)

    Equation (7) has ΔT[i,j] on both sides indicating apparent “circularity”, equation (8) doesn’t. The equations are equally correct, but which tells more correctly about circularity?

    In equation (8) we may have an influence of ΔT in φ; in (7) we have an influence of ΔT both explicitly and through ΔN, which is certainly affected by ΔT. Thus equation (7) tells correctly about circularity only when both contributions are taken into account. Doing that, we end up at equation (8), which has no explicit circularity but may have something of unknown nature through φ.

    The only correct way to discuss circularity in the M&F paper is by discussing the properties of φ. M&F acknowledge that such an error term leads to uncertainty. That error term has the nature of circularity if the partial derivative of φ with respect to ΔT is not zero when the other independent variables ΔERF, α and κ are kept constant. If that partial derivative is small, the circularity is of little concern. We have no reliable knowledge of φ, but the explicit assumption of M&F is that φ is not so large that it would severely modify the results. Contesting that assumption is legitimate when the arguments used are not weaker than those of M&F for theirs.
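
    A toy Monte Carlo makes the role of φ tangible: generate an ensemble that satisfies a linear version of equation (1) exactly, add a φ correlated with the internal variability e, form ΔF by (5), and fit (6) by ordinary least squares. A sketch under stated assumptions (all numbers invented), not a reconstruction of M&F’s actual computation:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 75                                     # toy ensemble size (assumed)

    alpha = rng.uniform(0.8, 1.8, n)           # per-model feedback (W m-2 K-1)
    kappa = rng.uniform(0.4, 1.0, n)           # per-model uptake efficiency (W m-2 K-1)
    dERF = rng.normal(1.5, 0.3, n)             # per-model external forcing trend
    e = rng.normal(0.0, 0.05, n)               # internal variability, mean zero

    b_true, c_true, d_true = 0.5, -0.2, -0.1   # assumed coefficients for a linear eq. (1)
    dT = b_true*dERF + c_true*alpha + d_true*kappa + e   # the generating relation

    phi = 0.5 * e                              # an error correlated with internal variability
    dF = dERF + phi                            # eq. (5): the diagnosed forcing

    X = np.column_stack([np.ones(n), dF, alpha, kappa])
    coef, *_ = np.linalg.lstsq(X, dT, rcond=None)        # fit eq. (6) by OLS
    resid = dT - X @ coef

    print("fitted b, c, d:", np.round(coef[1:], 3))      # b is biased away from 0.5
    print("residual sd vs true internal sd:", round(resid.std(), 3), round(e.std(), 3))

    The residual standard deviation comes out smaller than the true internal one, because part of e leaks into the contribution attributed to ΔF; exactly the leakage described above.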

    • Greg Goodman
      Posted Feb 18, 2015 at 9:37 AM | Permalink

      This whole exercise seems like a very complicated version of:

      A = B + α − α + ε

      Rewriting it as A = Beff + α + φ does not resurrect α; it simply hides the fact that it is irrelevant.

      Further obfuscation of the problem by introducing alternative and additional terms and definitions does not improve understanding; it just buries it another layer deeper. It seems that the authors confused themselves.

      Perhaps you could consider my reply to your last comment, where I point out that the presence of the climate reaction is being inadvertently equated to zero.

      While the logic of doing this was that the equilibration time is sufficiently less than the averaging period, it is a mistake to ignore it.

      This move is *asserting* that the temporal difference between driver and surface effect is zero and *assuming* that the error this introduces does not matter.

      As I have pointed out, this is equivalent to saying there is instantaneous equilibration – zero depth of ocean involved in the climate response and a time constant tau of zero – and dumping the errors this induces into the supposedly “random” error term.

      Santer et al cited earlier works finding a time constant of 30 – 40 months for the CMIP5 models.

      Nic’s eqn 6 shows that what is really being done is regressing N on T via κ and ε; the banished relaxation response, present in both the models and the physical climate, is being partly interpreted as ocean heat uptake and the rest as “random error”.

      The new eqns (7) and (8) simply add four or five more parameters and four or five more variables; it is an exercise in over-fitting the data that adds no information.

      Until the importance of that inappropriate simplification is addressed, the rest becomes academic. IMO.

      • Greg Goodman
        Posted Feb 18, 2015 at 9:42 AM | Permalink

        Since it is always better to visualise what all this talk of relaxation response is all about, I again suggest looking at my recent article on Judith Curry’s site:

        On determination of tropical feedbacks

        Start with pictures 😉

        Note that the models have a tau far longer than the climate system, but that is a story for another day.

        The form of the temporal relationship is still typical of what happens.

        • Greg Goodman
          Posted Feb 18, 2015 at 9:47 AM | Permalink

          Now clearly, if we are going to start out by saying that the difference between the light blue and the dark blue line “doesn’t matter”, we’re going to be dumping most of the information about the climate response into the “random error” box.

          If further analysis turns out to be “insensitive” to values of TCS, this should not be surprising.

          Now most here, including Pekka, find the results surprising and I’m suggesting this is why.

    • Posted Feb 18, 2015 at 10:46 AM | Permalink

      Pekka,

      Using equations to spell out your argument does indeed help. You wrote earlier (comment 752295) that it was hypothesised that:

      “Real ERF is affected very little by internal variability. Thus M&F must assume that the contribution of internal variability to N is approximately -α times the change in temperature due to the variability.”

      I agree that this is being assumed. To test the assumption, I have examined the correlation between non-overlapping 62-year trends in N and T in ten CMIP5 model preindustrial control runs: nine models used by M&F and one model with an exceptionally long control run. The correlation was significantly (p=0.05) negative in only one case, and it was weakly positive in four cases. Internal variability appears able to change forcing, not just GMST, on multidecadal timescales, in models as well as in the real world. The standard deviation of 62-year trends in T in control runs is material – typically a 0.1 to 0.2 K change.

      There is another problem. Variations across models in how well the equation ΔT = ΔF/(α+κ) holds in relation to ‘true’ forcing, arising from causes other than internal variability (such as time or state dependence of α and κ), will lead to error in estimating ΔF using (4) in the same direction as the error it causes in the estimation of ΔT. That will artificially boost the correlation between ΔT and ΔF, and hence the apparent ability of ΔF differences to explain intermodel differences in ΔT, similar to the effect of internal variability that affects ΔF.
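
      For anyone wishing to replicate the first check, the mechanics are straightforward: cut each control-run series into non-overlapping 62-year blocks, take the least-squares trend of T and of N in each block, and correlate the two sets of trends. A sketch with synthetic stand-in series (a proper significance test would need more care):

      import numpy as np

      def block_trends(x, block=62):
          # Least-squares trend of each non-overlapping `block`-year segment.
          t = np.arange(block)
          return np.array([np.polyfit(t, x[i*block:(i+1)*block], 1)[0]
                           for i in range(len(x) // block)])

      def trend_correlation(T_ctrl, N_ctrl, block=62):
          # Correlation between contemporaneous block trends in T and N.
          return np.corrcoef(block_trends(T_ctrl, block),
                             block_trends(N_ctrl, block))[0, 1]

      # Toy stand-in for a long piControl run; real use would read the CMIP5 series.
      rng = np.random.default_rng(4)
      T_ctrl = 0.02 * np.cumsum(rng.normal(0, 1, 500))
      N_ctrl = -1.2 * T_ctrl + rng.normal(0, 0.2, 500)
      print(trend_correlation(T_ctrl, N_ctrl))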

      • Posted Feb 18, 2015 at 11:22 AM | Permalink

        Nic,
        I don’t have any immediate comments on what you wrote. I just observed that Pehr Björnbom has written a comment at CLB on changes over the full 144-year period available. I haven’t studied that comment carefully, but his approach seems to be similar to what I have had in mind but not done. The results of that calculation seem to be closer to what I would expect intuitively, as α and κ have a stronger influence on temperature than in M&F.

        There may be something interesting to learn in the unexpected results of M&F, whether they turn out to be reasonably correct or seriously erroneous. Many uncertain assumptions enter in their analysis, but how much the inaccuracies of each of them affects the outcome is difficult to tell without quantitative work. I haven’t spent the effort needed to obtain CMIP5 data, and have presently no plans to proceed on that.

        I tried to find in Forster et al 2013 some comments on the differences between the multiple entries from the same models in the CMIP5 ensemble. Those differences might help in testing some ideas. They seem to say that in every case they have used the average of such entries for quantities supposed to be independent of internal variability, without any comment on the spread between the results. You might have the data and the tools to look at that.

        • Don Monfort
          Posted Feb 18, 2015 at 12:39 PM | Permalink

          Pekka:”There may be something interesting to learn in the unexpected results of M&F, whether they turn out to be reasonably correct or seriously erroneous.”

          Since M&F’s unexpected-results-producing analysis was built on assumption piled on top of assumption, wouldn’t it have been sciency of them to have done the kind of checking that has been done by Nic and Pehr? And I wonder what’s stopping them from doing some testing to find support for their methods and assumptions, after having seen the criticisms from Nic, Ross M, Roman M et al. I am also wondering why Pekka hasn’t done something along those lines, given the amount of time and effort he has spent defending this baloney. But what do I know.

        • Posted Feb 18, 2015 at 1:58 PM | Permalink

          Don,

          There are many kinds of scientific papers, and that’s as it should be.

          Looking at the results, I have the suspicion that the CMIP5 ensemble cannot really answer the questions M&F try to figure out. The models and model runs perhaps do not cover sufficiently different combinations of forcing and feedback parameters to allow strong conclusions, or even to avoid misleading results. The limited information content of the ensemble on the issues analyzed forces them to use a very simple model. Choosing a linear regression model is methodologically safe, but perhaps a nonlinear model based on the original equations would still have been better. The issues of statistical analysis would, however, have been more complex in that case. A maximum likelihood method might perhaps have been appropriate for the estimation of the parameters of such a model.

          Many alternatives for this analysis can be proposed in retrospect (including the alternative of forgetting the whole idea). They picked one, and got out this much.

          I had a rapid look at the CMIP5 database and noticed that using it takes some effort – as Nic said in one of his comments. That made me stop digging deeper into it myself. I never planned to spend this much time explaining my views of the method; thus no choice was based on an estimate of that effort.

          I have tried to write several comments in a way that would also clarify some realities of scientific work that apply more generally than to this paper alone. In that sense this paper has acted as a case study rather than the goal in itself. (Whether I have succeeded in that at all, I don’t know.)

        • davideisenstadt
          Posted Feb 18, 2015 at 2:02 PM | Permalink

          The whole concept of an ensemble of models, or an ensemble mean, or averaging different runs of the same model… all of these are dubious statistical constructs.

        • Don Monfort
          Posted Feb 18, 2015 at 3:51 PM | Permalink

          Pekka, this is not some inconsequential little paper in a backwater science, like entomology. We wouldn’t be talking about an obscure journal publishing speculative research full of unexplained assumptions, on why green eyed gnats prefer diddling on Thursdays.

          “The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.”

          That’s blatant propaganda not supported by the research. You know that. The authors have gone into hiding. They aren’t going to provide their data and methods for examination. They don’t deserve the defense you have been putting on for them. Cut em loose. Just my very humble opinion.

        • Posted Feb 18, 2015 at 4:37 PM | Permalink

          Don,

          The paper is a small contribution to science. It draws attention by having been published in Nature, but that attention does not last for long.

        • Don Monfort
          Posted Feb 18, 2015 at 6:38 PM | Permalink

          The attention has lasted long enough for political purposes. You are recehorsing this one, Pekka. Science is not being well served.

      • Carrick
        Posted Feb 18, 2015 at 12:39 PM | Permalink

        Nic Lewis:

        Using equations to spell out your argument does indeed help.

        Yes indeed.

        Thanks for taking the time to write this out carefully.

  124. stevefitzpatrick
    Posted Feb 18, 2015 at 7:40 AM | Permalink

    There is a very old English expression, believed to date from 1579, which seems to apply here: “You can’t make a silk purse from a sow’s ear.”

    • Greg Goodman
      Posted Feb 18, 2015 at 8:37 AM | Permalink

      Especially when this particular sow’s ear has been put through a meat grinder, too. They are trying to infer the surface properties of the skin from the ensuing pulp.

    • R Graf
      Posted Feb 18, 2015 at 9:07 AM | Permalink

      Accepting that I know the least of anyone here, I can still tell this equation is not just shagged out after a long squawk.

      • William Larson
        Posted Feb 18, 2015 at 3:53 PM | Permalink

        R Graf–
        In the words of the man in “The Princess Bride”, “You have made one of the all-time classic blunders!” Your blunder is in stating that you know the least of anyone here–that dubious honor certainly applies to moi instead. But hey, I get to learn a few things, and from you as well. Yes, you gloriously said it: “This equation is not just shagged out after a long squawk.” To paraphrase someone else in some other universe, “My computer takes so long to shut down that I am thinking of naming it ‘M&F at CA’.”

  125. Frank
    Posted Feb 18, 2015 at 4:05 PM | Permalink

    Pekka wrote: “It’s totally clear that their regression formula cannot be derived. M&F write

    “This equation holds for each start year separately and suggests ..”

    They do not say that the following formula follows from the previous, they say that the previous arguments suggest that the linear regression model might be useful.

    You have also questioned the separation of dependence on α and κ. That’s a choice made to separate the two influences, whether they really turn out different or not. That makes sense, because the values of α and κ obtained from Forster et al (2013) for the different models are almost totally uncorrelated. The dependencies on α and κ might be expected to deviate more for the 60-year trends than for the 15-year trends, as their sum is inversely related to TCR while α alone is inversely related to ECS.”

    Frank replies: And I can add a population term to their regression equation and say that it is “suggested” by UHI. That term will improve the fit. In the abstract, M&F claim that their regression approach is “physically motivated by surface energy balance”, but their algebra and regression for energy balance is wrong. Surface energy balance considerations suggest a different model. One can play “assume a statistical model”, but Doug Keenan and the Met Office have taught me that a model must be based on physics or you can’t “prove” that it is “significantly” warmer today than it was a century ago. The regression equation must have the correct physics – especially when you INTERPRET the residuals as unforced variability.

    Compare the histograms for 1998-2012 trends in Figure 1c and Figure 2e. Add the ensemble mean trend of about 0.2 degC/decade to the Figure 2e histogram so we are looking at the total trend in both cases. Processing the output from 75 of 114 model runs through M&F’s flawed regression model has widened the histogram so that 1998-2012 is no longer an outlier. If their model is obviously flawed, what have they accomplished? They haven’t separated deterministic variability from unforced variability. One-third of the model output (39/114) wasn’t suitable for their analysis. And now the 1998-2012 outlier in the output from 114 models is magically within M&F’s 5-95% confidence interval.

  126. Frank
    Posted Feb 18, 2015 at 5:10 PM | Permalink

    If one looks closely at Figure 2a of M&F15, the error bars fail to connect or barely connect the ensemble mean to observations more often than appropriate for a normal distribution. For 1954-1956 the 5-95% error bars are actually too short to bridge the gap between observations and the ensemble mean. From 1950-1953, the error bars barely span the gap. The error bar for 1927 just spans the gap, but 1928-1931 are nearly as bad. From 1962-1965, the error bars don’t bridge the gap, while 1961 and 1966 span the gap. The error bars fail to span the gap in 1995, 1997 and 1998, while just barely spanning the gap 1990-1994 and 1996. I count 10 years out of 98 where the error bars don’t span the gap (not unreasonable) and 16 more years when they barely span the gap (probably unreasonable). That would put 25% of the data outside the 10-80% confidence interval. Of course, these errors are not randomly distributed; they mostly occur because excessive cooling by aerosols in the models creates excessive warming and cooling trends when aerosols change.

    Note that M&F cleverly put the error bars on the OBSERVATIONS in Figure 2a, not on the ensemble mean – where they belong. If they had put the error bars on the ensemble mean, we would see that their 5-95% confidence interval of 0.26 degC extends up to +0.57 degC/decade for 1992-2006 and down to -0.28 degC/decade from 1951 to 1956. This is a total change over 15 years of +0.86 degC (equalling all of 20th-century warming) and -0.42 degC. For the 20-year period beginning in 1951, more than -0.5 degC. (Negligible cooling was actually observed during this period.)

  127. barn E rubble
    Posted Feb 19, 2015 at 12:02 AM | Permalink

    RE: Pekka Pirilä Posted Feb 18, 2015 at 4:37 PM

    “The paper is a small contribution to science.”

    As much as I (and most here) appreciate your efforts to show there is always another side . . . I don’t think I’ve understood your position on the main point of the paper.

    IE: “The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.”

    I think you provide an excellent argument, perhaps from a different point of view/perspective than many (most) of the commenters here . . . however . . .

    Pardon my spelling and interpretation but, rehellisesti sanoen (honestly speaking): do you believe climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations?

    • Hoi Polloi
      Posted Feb 19, 2015 at 3:58 AM | Permalink

      I asked Pekka the same question a week ago already. Pekka answered as a true politician: with a non-answer.

  128. Greg Goodman
    Posted Feb 19, 2015 at 1:52 AM | Permalink

    I have posted the following question at Climate Lab Book. It’s my first post so it is held for moderation. I’ll see whether they accept it and what response they have.

    Greg Goodman
    February 18, 2015 at 5:22 pm

    Your comment is awaiting moderation.

    Many people seem somewhat surprised by the conclusion “For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends…” and it does seem counter intuitive.

    What surprised me about the paper is that they develop an “innovative” method but do not test it before applying it and drawing some fairly certain conclusions.

    Is it not necessary, before applying such a novel method, to establish that it works?

    These kinds of non-validated, home-spun techniques are typical of much published work in climatology and are one of the main problems that the Wegman report highlighted in 2006.

  129. Greg Goodman
    Posted Feb 19, 2015 at 2:17 AM | Permalink

    Panel a of their figure 3 shows the 62-year trends. It would be easier to visualise had they used the mid-point of the period rather than the start date for the x axis. So it is helpful to add 31 years to the dates they use. The x axis would then show that the periods are centred on 1931 to 1983.

    http://www.nature.com/nature/journal/v517/n7536/fig_tab/nature14117_F3.html

    We can see that ensemble mean trends centred on 30s and 40s were well below the observational data. Firmly outside the declared measurement uncertainty.

    As modelled AGW kicks in they begin to narrow the gap. There is a limited period from 1975-1980 where they match, then a steadily increasing divergence on the hot side, which becomes more marked in the last 7 years of the record.

    This is also highlighted in figure 2a,b I excerpted above.

    That is, even after taking a 62-year average that will remove AMO, PDO and ENSO variability, there is a very clear progression across the full range of the record from serious under-estimation to an increasing over-estimation of the rate of change of temperature.

    Plotting the difference of the 62 year trends of the ensemble mean and HadCRUT4 and comparing to the calculated AGW should be a good indication of the degree to which they are over-estimating AGW.

    Though their method is untested and their conclusions about alpha probably erroneous, there is some useful information to be gained from some of the other information provided in this paper.

    • Hoi Polloi
      Posted Feb 19, 2015 at 4:03 AM | Permalink

      Reproducibility is one of the cornerstones of modern science

      http://www.theguardian.com/science/2015/feb/18/haruko-obokata-stap-cells-controversy-scientists-lie

      • Greg
        Posted Feb 19, 2015 at 6:51 AM | Permalink

        Yes, I saw that article. Entitled “Why scientists lie?”

        It is interesting to compare how that issue was dealt with there with how such issues are dealt with in climatology.

        At least the Japanese still have enough honour and integrity to know when to fall on a sword.

        The “prestigious” 😉 journal Nature was at the heart of that one too.

    • Posted Feb 19, 2015 at 7:52 AM | Permalink

      greg, “Panel a of their figure 3 shows the 62-year trends.”

      But what are they comparing? To get an apples to apples comparison I have to use 70% tos (ocean) and 30% tas (air) for hadcrut4 or a combination of ERSSTv4 with Berkeley. The models mainly miss the oceans and since there isn’t an official marine tas, there isn’t an official “global” tas. Comparing model tos in degrees with global ocean SST in degrees provides a more realistic comparison. Comparing model land tas to Berkeley land tas in degrees takes more time, but is better than comparing anomaly on an arbitrary baseline.

      • Greg
        Posted Feb 19, 2015 at 11:04 AM | Permalink

        I agree in principle that these sea+land, water+air means are physically corrupt. I usually ask: what is the average of an apple and an orange? Answer: a fruit salad.

        However, eternal arguments about circularity to one side, there are some things that can be taken from this paper that the authors will not be able to argue against if we use their choice of data and their graphs.

        It appears that simply subtracting HadCRUT4 from the ensemble mean leaves a pretty clear rising bias that looks a lot like the progression of CO2 “forcing”.

        It seems that this would be a much clearer indication of the fact that the models are over sensitive to CO2.

      • Steven Mosher
        Posted Feb 19, 2015 at 1:50 PM | Permalink

        “The models mainly miss the oceans and since there isn’t an official marine tas”

        It’s called MAT.

        Click to access HadISST_paper.pdf

        http://www.ncdc.noaa.gov/bams-state-of-the-climate/2009-time-series/mat

        and there isn’t an “official” anything.

        • davideisenstadt
          Posted Feb 19, 2015 at 1:56 PM | Permalink

          well mosh…
          if the complaint is “there isn’t an official marine as”
          and your response is “Its called MAT”, followed by
          “….there isnt an “official” anything.”, one is hard pressed to understand your response. Is there an official marine TAS?
          apparently according to you there is…and then, there isn’t.
          BTW,
          there is an official soft drink of the olympics…
          so to write that there isn’t “an “official” anything” is,
          on its face, incorrect.
          For someone who has an obsession with pissant corrections and usage, youre pretty sloppy today.

        • davideisenstadt
          Posted Feb 19, 2015 at 1:56 PM | Permalink

          Thats “official marine TAS” my apologies.

        • Posted Feb 19, 2015 at 4:45 PM | Permalink

          Mosher, there is no official anything. I thought you were the official spokes model for models 🙂

          There are MOHMAT and HADMAT1 which are attempts at a night marine air temperature but with the models it appears you only have tas, tas max, tas min and tos to choose from. So if you compare the model mean tas to hadcrut4 you have apples and oranges. I believe you told me “I” can’t compare models to observations, but apparently M/F are trying to do just that.

          By comparing the 62 year model anomaly with the 62 year hadcrut4 anomaly they are completing the fruit salad. When Berkeley published their product they (that would include you) provided a global mean “temperature” with a remarkable +/- ~0.06 C of uncertainty and the models produce a “temperature” output. That should allow a more meaningful direct comparison of real temperatures not anomalies, unless of course your uncertainty interval is meaningless.

        • Steven Mosher
          Posted Feb 20, 2015 at 4:41 PM | Permalink

          simple david.

          there are many MAT products. see the link.
          none is labelled “official”
          same with SAT. there are 5 or more products. each slightly different.

          there is no international body that says This is the official.
          there are versions. and folks argue about which is best.

          so there is MAT for the observations (several)
          none is labelled “official” that I know of.

          Finding a certification would be your first step

        • davideisenstadt
          Posted Feb 21, 2015 at 8:36 AM | Permalink

          sorry mosh…
          there is always the issue of ambiguity in your brief posts…
          I inferred that your quote “its called MAT” referred to the closest thing to an “official” marine product, as you put it… apparently you were correcting the commenter and letting him know that the time series in question wasn’t TAS, it was MAT.
          BTW, even though you can be quite prickly, you were of great help to me when i first encountered R, so I think kindly of you for that.

        • Posted Feb 21, 2015 at 10:23 AM | Permalink

          David, for the CMIP5 model runs there were “official”, as in recommended, data sets. For SST and sea ice there is a merged product of HADISST1 and version 2 of the NOAA optimally interpolated ocean temperature data sets. The models output a tas (near-surface air temperature) for the globe, meaning 70% of that would be a marine tas, which may or may not be equivalent to one of the MAT products. Since the original CMIP5 model runs for AR5 there have been a few changes to some of the versions, and volcanic forcing has a new reconstruction by Crowley and Unterman 2013 which is considerably different from the volcanic forcing estimates used for AR5.

          As I said, most of the model misses are sst related and likely due to outdated volcanic forcing estimates among other things. I don’t consider model error due to poor input data to be a very good test of model “natural” variability emulation.

  130. Craig Loehle
    Posted Feb 19, 2015 at 10:44 AM | Permalink

    Either what M&F did was just too sophisticated for any of the dumb readers here to understand, or it is not clear what they did and what it means even to a roomful of statisticians and engineers. After 700 comments and applying Occam’s razor, I conclude the latter.

    • Greg
      Posted Feb 19, 2015 at 11:12 AM | Permalink

      I think what they did was too sophisticated for the authors to understand too. But the answer fitted their personal biases so they concluded it must be “right”.

      Peer reviewers at the “prestigious” journal Nature were as uncritical as ever and it got published.

      The protracted discussion here has been trying to identify how they got to this surprising result and to find where they went wrong.

      This whole waste of time would be unnecessary had the authors attempted to validate their “innovative” method before publishing a study based on it.

      Once upon a time, when publishing a new method, the custom was to show that it worked. First.

      • Greg
        Posted Feb 20, 2015 at 3:35 AM | Permalink

        The authors do not even seem to realise that what they were doing with their sliding “trend” is applying a running-mean filter to the rate of change of temperature.

        A running mean is a crappy low-pass filter that introduces a lot of distortion.

        Looking at the spike in 1991 in their figure 2a of 15y “trends”, it is obvious that they have a significant amount of sub-15y variability in their result.

        On top of the climate “noise”, measurement error, the additional errors introduced by piling on linearisation approximations at all stages, and ignoring the lagged nature of the response, they then insert further distortions by poor data processing.

        It is really unsurprising that this method fails to detect anything.

        Therefore, that failure to detect an influence of α and κ tells us nothing about climate or climate models but tells us a lot about the competence and rigour of these authors.

        On the evidence of this paper I have a lot of trouble agreeing with Pekka’s suggestion that Piers Forster is an “expert” and we amateurs may be mistaken.

        This paper is frankly amateurish.

        • Posted Feb 20, 2015 at 9:26 AM | Permalink

          Greg –
          The sliding trend is not the same as a running mean filter on the rate of change of temperature. It’s equivalent to a low-pass-filtered version of the temperature first differences, with a weighting function which is parabolic in shape, similar to a Welch window.
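
          A minimal numerical check of this equivalence (a sketch, assuming a window of N annual points; the weight formula is just the standard OLS slope re-expressed in first differences):

          # slope of an OLS fit over a window == weighted mean of the first
          # differences, with parabolic (Welch-like) weights that sum to 1
          import numpy as np

          N = 15
          rng = np.random.default_rng(0)
          y = rng.standard_normal(N).cumsum()        # any series will do

          slope = np.polyfit(np.arange(N), y, 1)[0]  # OLS slope, the usual way

          j = np.arange(N - 1)
          w = 6.0 * (j + 1) * (N - 1 - j) / (N * (N**2 - 1))
          slope_from_diffs = np.sum(w * np.diff(y))

          print(slope, slope_from_diffs, w.sum())    # slopes agree; weights sum to 1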

        • Carrick
          Posted Feb 20, 2015 at 10:43 AM | Permalink

          HaroldW:

          The sliding trend is not the same as a running mean filter on the rate of change of temperature. It’s equivalent to a low-pass-filtered version of the temperature first differences, with a weighting function which is parabolic in shape, similar to a Welch window.

          Whether they were aware of this is another question.

          Nick has a couple of posts up here and here on derivative (Savitzky-Golay) filters that comment on this.

          (But for some reason he only shows the real part of the transfer function. I never got around to asking him why.)

        • Posted Feb 20, 2015 at 11:25 AM | Permalink

          Thanks Carrick, I hadn’t seen Nick’s posts.

        • Posted Feb 20, 2015 at 2:35 PM | Permalink

          Carrick,
          <"(But for some reason he only shows the real part of the transfer function. I never got around to asking him why.)"
          The filters are real and either symmetric or antisymmetric, so Im or Re are zero. I show the non-zero part.

        • Carrick
          Posted Feb 20, 2015 at 4:39 PM | Permalink

          That makes perfect sense, thanks.

          C-Lion Man

    • stevefitzpatrick
      Posted Feb 19, 2015 at 11:46 AM | Permalink

      Craig,

      I suspect that you are right of course, but all the 700+ comments on this thread may not amount to much.

      What is needed is a formal publication which lays out the logical and procedural problems with the M&F paper, similar to how O’Donnell et al pointed out the serious flaws in the Steig et al Antarctic warming paper that made the cover of Nature in 2009. Like M&F, Steig et al appeared motivated by a desire to ‘explain’ an apparent discrepancy between models and reality. Like Steig et al, M&F adopted very doubtful methods, and claim as credible results which are contrary to any reasonable expectation.

      If no formal refutation of M&F is ever published, then this very dubious paper will for the next decade or more be trotted out as an ‘explanation’ for the divergence between the CMIP5 ensemble projections and measured reality, or at least until the divergence is so large that even M&F’s ‘internal variation’ can’t explain it. I accept that the M&F authors believe their analysis, as did the authors of Steig et al. I also believe they are similarly mistaken; perhaps they were beguiled by results which fit their hopes/expectations, just as Steig et al almost certainly were.

      I still find it very odd that so many recent papers reach the conclusion of ‘no significant error in the models’, despite their divergence from reality, even while those papers reach that conclusion via a dozen different assumed mechanisms. William of Ockham would perhaps have a different suggestion for the cause of model/reality divergence.

      • Don Monfort
        Posted Feb 19, 2015 at 12:08 PM | Permalink

        ” Like M&F, Steig et al appeared motivated by a desire to ‘explain’ an apparent discrepancy between models and reality. Like Steig et al, M&F adopted very doubtful methods, and claim as credible results which are contrary to any reasonable expectation.”

        It would be interesting to know how many doubtful innovative methods they tried, before they came up with the results they were after. We should compile a list of the other numerous examples of the climate science method: confirmation bias motivated by noble cause corruption.

        • stevefitzpatrick
          Posted Feb 19, 2015 at 1:23 PM | Permalink

          Don,

          I do not suggest willful effort to find a desired result. In Steig et al, the authors made choices in their analysis (eg. small number of retained PC’s) which basically smeared substantial peninsula warming over the entire continent…. while peninsula warming mostly disappeared (in conflict with reliable on-the-ground measurements for the peninsula!). The rest of Antarctica, with fewer ground measurements, was artificially warmed by the smearing. Steig et al were expecting/hoping to discover warming over the whole of the continent, so my guess is they didn’t critically examine if their analysis choices made any sense…. in light of the loss of peninsula warming, it seems pretty clear they didn’t make sense, but the authors probably were not aware of that. I think M&F fall into the same kind of trap. As Feynman noted, the easiest person for you to fool is yourself.

        • Don Monfort
          Posted Feb 19, 2015 at 2:24 PM | Permalink

          Yeah Steve, you can tell how sincere they are by how conscientiously they make their data and methods available to those who want to check their work and by how willingly they own up to their errors.

        • kim
          Posted Feb 19, 2015 at 8:12 PM | Permalink

          There’s circularity in the duct-taping around and around and around the models.
          =============

      • Greg
        Posted Feb 19, 2015 at 12:56 PM | Permalink

        “I still find it very odd that so many recent papers reach the conclusion of ‘no significant error in the models’, despite their divergence from reality”

        I’m not at all sure that is what the current paper shows, despite some that may conclude that.

        What it shows is that the current divergence is no larger than past divergence.

        Put this the other way around: the models have been consistently as bad in hindcasting the period of known data that they were trying to match as they have been in anticipating the lack of warming.

        The divergence is not a new problem; the models have always been this bad. Too much attention has been focused on a relatively short period between 1975 and 1998 which they were tuned to reproduce more accurately.

        Look at their fig 3: the divergence in the earlier 62y periods (centred on 1930-1940) was far worse than the divergence at the end.

        Their fig 2a,b also shows this.

        The authors should be credited for pointing this out. The reader should not allow himself to be guided only by the abstract.

        • stevefitzpatrick
          Posted Feb 19, 2015 at 1:37 PM | Permalink

          Greg,

          I guess it depends on how other people ‘use’ the results from M&F. It is clear from headlines that M&F is already being used to say that the current divergence of models from reality is “not significant”. If the authors are trying to say the models are overall pretty poor, over the entire instrumental period, then they ought to say that in response to inaccurate claims about the conclusions of their paper.

        • Streetcred
          Posted Feb 19, 2015 at 10:36 PM | Permalink

          Makes it a lot easier, Greg, when the historical ‘data’ is constantly ‘homogenised’ to convergence with the hindcasts. In either direction, they are not fit for purpose.

      • Bob
        Posted Feb 19, 2015 at 4:07 PM | Permalink

        Steve, was Steig et al ever retracted or a corrigendum issued?

  131. R Graf
    Posted Feb 19, 2015 at 1:50 PM | Permalink

    You may delete my previous comment.

  132. R Graf
    Posted Feb 19, 2015 at 2:32 PM | Permalink

    If our aim is to improve the methodology we will need to clarify the problem to make it apparent to an audience outside of top physicists and statisticians. A lawyer would break the issue down for a jury, but it helps the lawyer think too. What if we model the method in every-day terms, like travel time to work? What if we start with the assumption that we can break down travel time into time stopped in traffic and time in motion. We know those two items account for total time, but we want to account for traffic and possible breakdowns as well, so those are variables we add.

    travel time = (time stopped + time moving) (traffic factor) + Unpredictable breakdowns

    t = (S + M)(f) + U

    We can run tests when there is no traffic to normalize the perfect-traffic factor to f = 1.

    How about if one of our climate model experts recasts Forster’s and M&F’s equations in these terms and describes it as a historical travel-time study. Then one of our statisticians can test validity, if we even need to get to that point. Just a suggestion. Anybody in?

    BTW, I have comment in moderation. I don’t know why but it can be discarded.

  133. Curious from Cleathropes
    Posted Feb 20, 2015 at 4:40 AM | Permalink

    Still trying to get my head around this particular issue. I wonder if the following analogy is apt? When using the least squares method to analyse trends, a requirement is that the individual results are statistically independent (I am led to believe – though I have never actually done this – that this is because the calculation requires the inversion of a large matrix, and if the data points are independent the matrix will be “orthogonal”, making this step trivial). Of course you can apply least squares with data elements that are not independent and you will get results. However, the best that can be said of such results is that the estimates of the errors are overstated; at worst they are meaningless?

    Many thanks to anyone who can take the time to respond.

    • Greg
      Posted Feb 20, 2015 at 7:11 AM | Permalink

      yes, there are many conditions for OLS to give the “best linear unbiased estimator” that are being gleefully ignored.

      Sadly this is not limited to climatology, although the level of general incompetence in this field leaves one reeling.

      Another condition that is often ignored (but that is not the case here) is that the x variable should have negligible error.

      That is particularly pertinent in the many attempts to estimate climate sensitivity by regressing dRad on dT.

      I wrote an article on this, much of which was incorporated into my recent article on Judith’s blog.

      On inappropriate use of least squares regression

      Some corrections can be applied in some cases, but the first requirement is to know what you are doing and to avoid improper regressions in the first place, if possible.
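
      A quick numerical illustration of the errors-in-variables point (illustrative numbers, not climate data):

      # regression dilution: noise in the x variable attenuates the OLS slope
      import numpy as np

      rng = np.random.default_rng(3)
      n, true_slope = 5000, 2.0
      x_true = rng.standard_normal(n)
      y = true_slope * x_true + 0.2 * rng.standard_normal(n)

      for x_noise in (0.0, 0.5, 1.0):
          x_obs = x_true + x_noise * rng.standard_normal(n)
          slope = np.polyfit(x_obs, y, 1)[0]
          # expected: true_slope * var(x_true) / (var(x_true) + x_noise**2)
          print(f"x noise sd {x_noise:.1f}: fitted slope {slope:.2f}")

      The fitted slope shrinks by var(x)/(var(x) + var(noise in x)); where the slope is a feedback parameter, its inverse, the inferred sensitivity, is biased high.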


      I thought it was Nature that had recently adopted a policy of having at least one statistically competent reviewer on all climate-related papers.

      Was I dreaming when I read that?

      • Greg
        Posted Feb 20, 2015 at 7:15 AM | Permalink

        PS: another condition is that the “error” terms (ie all that is not the linear relationship being sought) should have a random (gaussian) distribution.

        There is quite some flexibility on this, but having strong cyclic variability, for example, will bias the result.

        Also a significant lag will decorrelate the relationship.

      • thisisnotgoodtogo
        Posted Feb 20, 2015 at 12:30 PM | Permalink

        If my memory serves, “Science” made that stipulation.

    • RomanM
      Posted Feb 20, 2015 at 8:21 AM | Permalink

      Least squares methods (and maximum likelihood estimation) do not require that any of the variables be independent of each other. What is important is the proper identification of the (unobserved) random components in the statistical model and how they relate to each other and to the observed data being analyzed. The mathematics can then deal with the estimation of unknown parameters and of the values of the random variables themselves.

      In the case where a variable containing such a random component appears more than once in the set of equations defining the relationships in the system, one needs to ensure that all of the appearances of the randomness are properly taken into account when applying the math. Otherwise, the results will not be reliable and any statistical interpretation of those results will be incorrect.

      In the regression in M and F, the authors posit that ΔT can be decomposed additively into a deterministic component and a random component: ΔT = ΔT” + ε. Since the same ΔT is used in defining ΔF: ΔF = α ΔT + ΔN = α (ΔT” + ε) + ΔN. The only way ε can disappear from that equation is to have ΔN = ΔN” – αε, where ΔN” (as well as ΔF) is either deterministic or itself has a random portion which is independent of ΔT. This is what I meant in an earlier comment about ΔN “masking” the effect of ΔT in defining ΔF. Assuming that ΔN is of this form does not seem to me to be warranted, so the simple regression carried out in M and F would be flawed.
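
      A toy simulation makes the point concrete (all numbers purely hypothetical):

      # if the regressor dF is constructed as alpha*dT + dN, part of the random
      # component eps of dT leaks into dF; the regression then "explains" some
      # of eps, and the residuals understate the true unforced variability
      import numpy as np

      rng = np.random.default_rng(2)
      n, alpha = 10000, 1.3
      eps = 0.1 * rng.standard_normal(n)          # true unforced variability
      dT = 0.2 + eps                              # deterministic part + eps
      dN = 0.5 + 0.05 * rng.standard_normal(n)    # independent of dT here
      dF = alpha * dT + dN                        # "diagnosed" forcing

      X = np.column_stack([np.ones(n), dF])       # regress dT on dF
      beta, *_ = np.linalg.lstsq(X, dT, rcond=None)
      resid = dT - X @ beta

      print("sd of true unforced variability:", eps.std())
      print("sd of regression residuals     :", resid.std())   # much smaller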

      • R Graf
        Posted Feb 20, 2015 at 9:03 AM | Permalink

        Roman, your breakdown of the problem in common sense terms is the best that I have seen.

        Am I understanding correctly that the Forster equation simply did not break out variability as a factor? Yet it certainly was there and had a hand in determining F, the radiative forcing.

        Then M&F write their equation with F and the variability in it is forgotten. Is this right?

        Also, since all of these relationships are circular and the only thing that allows the variables to manifest is their relative kinetics, it seems that time is a critical missing component in all the equations. After all, the way the equations are written the effects are reversible. (I am not suggesting time should be added and anyone spend their time trying to solve it. They should have programmed the computer to output the data that was desired.)

      • Greg
        Posted Feb 20, 2015 at 9:22 AM | Permalink

        Thanks for the expert input, Roman.

        “What is important is the proper identification of the (unobserved) random components in the statistical model and how they relate to each other ”

        Isn’t this a rather liberal use of the word “random”? Something the authors do liberally 😉

        The description that they are “random” implies that they have no effect on the regression estimation. Indeed the authors frequently use the adjective random with the implicit assumption that they can then be ignored.

        In the purest sense OLS assumes normally distributed “errors”, doesn’t it?

        It seems to me that a lot of the reason for the surprising conclusions of this paper is that they are shunting off statistically important variability into “random error” terms and duly ignoring them entirely.

        If you are still in agreement with Nic’s substitution, it would appear that the dependency on alpha has been shunted off into the error term.

        Could you comment on that interpretation?

      • Carrick
        Posted Feb 20, 2015 at 10:56 AM | Permalink

        Just a follow up on RomanM’s comment… while it is true that the variables (basis functions) of the LSF need not be orthogonal to each other, you do pay a price when they are not.

        You get “noise amplification” which is proportional to the square root of the ratio of the largest to smallest eigenvalue of the Hessian matrix.

        This is why singular value decomposition or similar techniques are used in the inversion process; these reduce the amount of noise amplification by dropping eigenmodes with very small eigenvalues (but at the expense of a loss of fidelity). In particular, for Gaussian white noise the factor is just

        σ_fit² = (λ_max² / λ_min²) σ_N²

        (hopefully the notation is obvious).

        Whether M&F used SVD (or similar) is something I haven’t checked, but technically for a problem like this, they should.
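
        For what it’s worth, a toy example (hypothetical, nearly collinear regressors) shows both the amplification and the SVD fix:

        # noise amplification from near-collinearity, and a truncated-SVD cure
        import numpy as np

        rng = np.random.default_rng(1)
        n = 200
        x1 = rng.standard_normal(n)
        x2 = x1 + 0.01 * rng.standard_normal(n)     # nearly collinear with x1
        A = np.column_stack([x1, x2])
        y = x1 + x2 + 0.1 * rng.standard_normal(n)  # true coefficients (1, 1)

        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        print("condition number:", s[0] / s[-1])    # large => big amplification

        beta_ols = np.linalg.lstsq(A, y, rcond=None)[0]
        k = 1                                       # keep only the large mode
        beta_svd = Vt[:k].T @ (U[:, :k].T @ y / s[:k])
        print("OLS :", beta_ols)                    # unstable individual coeffs
        print("TSVD:", beta_svd)                    # stable, near (1, 1), biased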

        • Carrick
          Posted Feb 20, 2015 at 10:58 AM | Permalink

          My link to the Hessian Matrix got dropped somehow. (In case this one also is dropped, it’s a standard term that you can find in Wikipedia.)

      • Frank
        Posted Feb 21, 2015 at 6:29 PM | Permalink

        Roman wrote: “In the regression in M and F, the authors posit that ΔT can be decomposed additively into a deterministic component and a random component.”

        Don’t regression residuals contain things besides “random components”? If I perform a linear regression when one variable has a quadratic influence, the residuals will contain systematic errors in addition to “random components”. M&F’s regression equation is a poor approximation of the physics of surface energy balance and contains an extra, inappropriate degree of freedom. I don’t understand how they get away with equating the residuals with “unforced variability”. I don’t know how any analysis of a signal with a chaotic component can separate “unforced variability” from possible flaws in the regression equation and from uncertainty in the variables (T, N, alpha and kappa). I would benefit from a clear discussion of “random components” in chaotic systems.

        The allegedly deterministic portion of the regression equation contains ΔF, which is calculated from ΔT – and T contains unforced variability. M&F need to prove that the unforced variability contributed by ΔT and ΔN is negligible.

  134. Posted Feb 20, 2015 at 10:33 AM | Permalink

    Here is a comparison of the 15y running average of CMIP5_rcp4.5 (essentially the same thing they are doing with “sliding trends”) compared to a real low-pass filter of the same data.

    Compare this to the red line in figure 2a of the paper.

    It follows that their conclusions about 15y “trends” are dominated by the distortions and inadequacies of their data processing, and thus their conclusions in this regard are spurious.

    I think the problem with 62y “trends” lies elsewhere.

    • Posted Feb 20, 2015 at 10:41 AM | Permalink

      BTW the peak at 2000 here corresponds to the peak they show around 1992, since they use the beginning of the 15y period, not the mid-point.

      I use a 78mo 3-sigma gaussian filter which has similar frequency characteristics to the 180mo (15y) running mean without the leakage and distortions of the latter.

      A sliding 15y “trend” is the same thing as a running average of dT/dt. Identical mathematically.

      Their red line is somewhat smoother since they are doing individual regressions. I used the ‘anomaly’ of the CMIP5 ensemble mean, 60S-60N, rather than the detailed HadCRUT mask.

      • Posted Feb 20, 2015 at 3:13 PM | Permalink

        here is the excess rate of change between CMIP5-rcp4.5 tas and HadCRUT4:

        M&F’s observation that the current departure is not exceptional is true. They really have been just as inaccurate at hindcasting even when trying to match the historical record. This certainly does not increase the confidence we should have in the models.

        However, what they are trying to sweep under the carpet in presenting it like that is that the deviation has progressed from totally missing the early 20th c. warming, to currently missing the lack of warming.

        There has been a steady progression from under-estimation to over-estimation of warming. This, despite also over-estimating the cooling effect of volcanoes.

        All of this underlines that the models are over-sensitive to radiative forcing.

        M&F’s primary conclusion is counter to what is shown by the very data they chose to try to demonstrate it.

      • Posted Feb 21, 2015 at 1:21 AM | Permalink

        Slight correction for the record. The RM of dT/dt is not strictly identical to the sliding “trend”, since the OLS trend does not weight the first differences uniformly the way a running mean does.

        However, it has the same temporal structure which is the origin of the distortions and inversions caused by a running mean.

        Data corruption by running mean “smoothers”

        The peak of the inverting lobe of the running mean is at window period / 1.433. In the case of 15y window that is 10.5 years. So any variability around that period will not be removed but inverted.

        Their choice of a 15y window is most unfortunate for studying models where a very significant part of the variability comes from volcanic forcing and the latter part of the record is dominated by two major eruptions about 10.25 years apart.
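
        A few lines of numpy reproduce that lobe (a sketch; the continuous-time sinc is an approximation to the discrete running mean):

        # frequency response of a 15y running mean: the first negative lobe
        # peaks near period = window/1.433 with gain of about -0.22
        import numpy as np

        T = 15.0                          # window length, years
        f = np.linspace(1e-4, 0.4, 4000)  # frequency, cycles/year
        H = np.sinc(f * T)                # numpy sinc(x) = sin(pi*x)/(pi*x)

        i = np.argmin(H)                  # deepest (first) negative lobe
        print("lobe gain  :", H[i])                # about -0.217
        print("lobe period:", 1.0 / f[i], "years") # about 15/1.433 = 10.5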

        • Posted Feb 21, 2015 at 3:20 AM | Permalink

          I just found a comment by HaroldW above about the difference between sliding trend and running mean of dT/dt.

          In effect the square window of the RM is replaced by a Welch window, so it is slightly less distorting: the OLS trend minimises the squared errors (the variance) rather than weighting the differences uniformly.

          However, the essence of the problem remains. He links to a very good discussion by Nick Stokes who shows the frequency response of the sliding trend “filter”.

          We see the large negative lobes that cause the inversions in the data that I referred to.

          Though not “identical” the problem is essentially the same. The distortions introduced by the sliding trend are very similar to those of a running mean and it would be far better to use a properly chosen filter.

        • Posted Feb 21, 2015 at 4:24 AM | Permalink

          Nick Stokes’ articles, from which the above graph comes:

          http://www.moyhu.blogspot.com.au/2015/01/trends-breakpoints-and-derivatives.html

          http://moyhu.blogspot.fr/2015/01/trends-breakpoints-and-derivatives-part.html

        • Posted Feb 21, 2015 at 11:45 AM | Permalink

          The red line in Nick’s graph above is the frequency response of the “sliding trend”.

          It can be seen that the negative inverting peak is almost 50% of the main peak that we are interested in.

          It’s worse than we thought (TM)!

          It is not surprising that M&F could not find anything after the way they mangled the data.

          Had they used a gaussian-derivative (with sigma=5y for example) they might have had more of a chance of getting a result.

  135. R Graf
    Posted Feb 20, 2015 at 9:13 PM | Permalink

    To bring back the focus on the core questions here:
    1) Was the equation used by M&F appropriate for the goal, valid, fed with untainted values?
    2) Were M&F’s results in contradiction to other studies?
    3) Was M&F’s conclusion warranted by their results?
    4) How can we devise tests to determine the above?

    How about we do an inventory and have each weigh in on each of the above?
    Would it be good if afterward someone took the lead to assign further investigation?

    • Posted Feb 21, 2015 at 1:31 AM | Permalink

      Good idea.

      No one seems too interested in what I’ve shown about their defective sliding “trends”, but it basically invalidates anything they are doing with 15y windows. The rest of the questions then become immaterial.

      However, I think the 62y is long enough and there does not appear to be significant energy in the system around 62/1.433 = 43 years, so they probably hit lucky there.

      So I suggest further consideration of 1) and 3) should be restricted to the 62y case. All their results and conclusions for 15y are invalid.

    • R Graf
      Posted Feb 21, 2015 at 12:53 PM | Permalink

      Here’s a silly question, if someone doesn’t mind answering: if M&F’s goal was to determine variability in the temperature signal versus its rise, why not simply analyze only that? Why bring feedbacks into it at all? Why not simply run the models and use the equation T = aF + e? After all, who cares what the CO2 forcing is vs. the aerosol forcing vs. the feedback? It’s different in most of the models, which is the whole point of the design: eventually determining the right mix. The skeptic’s question is: “are the models systematically overestimating adjusted forcing?” Could M&F’s unnecessary complication of adding feedbacks have brought in error that is clouding the analysis?

      • Don Monfort
        Posted Feb 21, 2015 at 3:21 PM | Permalink

        My guess is that their goal was to debunk the claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations and they didn’t get what they wanted, until they tried innovative methods.

        • Posted Feb 21, 2015 at 10:55 PM | Permalink

          That would be untested innovative methods. Ones that invert 50% of the signal.

        • Posted Feb 22, 2015 at 1:44 AM | Permalink

          Here is the freq response of the gaussian derivative filter I used above to get the ‘smoothed’ difference between CMIP5 and HadCRUT4 rate of change.

          sigma=5y attenuates 50% at 16y and is roughly what M&F seem to have had in mind by using 15y ‘trends’.
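
          For comparison, a sketch of the response of such a kernel (sigma = 5y and monthly sampling are illustrative assumptions): it attenuates short periods rather than inverting them.

          # derivative-of-gaussian "trend" filter: its response stays positive,
          # unlike the sliding OLS trend with its negative lobes
          import numpy as np

          s = int(5.0 * 12)                  # sigma: 5 years of monthly samples
          t = np.arange(-6 * s, 6 * s + 1)   # kernel support +/- 6 sigma
          g = np.exp(-t**2 / (2.0 * s**2))
          kernel = -t * g / np.sum(t**2 * g) # unit response to a linear ramp

          f = np.linspace(1e-3, 1.0, 2000)   # frequency, cycles/year
          R = np.array([np.sum(kernel * -np.sin(2 * np.pi * (fi / 12) * t))
                        for fi in f])

          i = np.argmin(np.abs(f - 1 / 10.5))             # where the 15y RM inverts
          print("gain at 10.5y / peak:", R[i] / R.max())  # small but positive
          print("worst lobe / peak   :", R.min() / R.max())  # ~0: nothing inverted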

          Now what is needed is to do the same thing for a high-sensitivity and a low-sensitivity model and see whether there is any noticeable difference across the range of TCS available within CMIP5.

          Anyone know which models are high and low TCS?

        • Posted Feb 22, 2015 at 2:08 AM | Permalink

          Compare this to the shape of the red line in Nick’s plot above, which represents the “sliding-trend” used by M&F:

          Marotzke and Forster’s circular attribution of CMIP5 intermodel warming differences

          The negative lobes invert the data and totally corrupt it. That is one reason why they found a negative result.

        • Frank
          Posted Feb 22, 2015 at 2:21 AM | Permalink

          For ECS, TCR of Models see http://eprints.whiterose.ac.uk/76111/22/JGRA_PFjgrd50174%5B1%5D_with_coversheet.pdf Tab. 1.

        • Carrick
          Posted Feb 22, 2015 at 10:10 AM | Permalink

          climate grog:

          That would be untested innovative methods. Ones that invert 50% of the signal.

          I believe that’s an overstatement of the severity of the problem. What you state would be true for white noise.

          For climate signals, typically the amplitude spectrum goes as 1/f to some power, so a number like 10% rather than 50% is probably closer to right.

        • Carrick
          Posted Feb 22, 2015 at 10:22 AM | Permalink

          Also, I think there is no problem with using sliding mean windows or sliding trend windows as long as you use the tool correctly.

          That is, if you are interested in the “gross rate of change” rather than fine details you will not deceive yourself. Except for a few well behaved filter designs, you should disregard variability that is in the roll-off region(s) of the filter.

          The problem is when you start engaging in “wiggleology”, which I would define as the study of non-robust features of the data series…overanalyzing features of the data that may be a product of the measurement process, rather than intrinsic features of the data.

      • R Graf
        Posted Feb 22, 2015 at 12:52 AM | Permalink

        Pehr Björnbom’s Feb. 20th post on CLB seems to say that M&F’s results are the opposite of what should be expected from the CMIP5 historical database of model behavior. That is a powerful indictment that is hard to explain away or for the authors to ignore. Greg, I think your CLB comments are even clearer and also very reasonable.

        Looking at the M&F equation again in light of my realization that it is the same equation as Forster 2013 but modified, I believe the modifications are invalid. When substituting in the 2013 equation, all terms should cancel except for variability, as here:

        F = N + aT   (Forster 2013)

        aT = F – N

        T = (F – N)/a   (correct algebraic manipulation)

        T = F/(a + K)   (M&F 2015)

        The TOAI, –N, needs to be added to the numerator. Although kappa is not a radiative property, it could remain in the denominator as long as it is treated as a non-linear variable. But if placed in the denominator it should also have been used to diagnose the GHG forcing. Otherwise, it could be accounted for like variability, by adding an extra term that diminishes with time. Kappa and variability are outside the energy balance, and these anomalies are adjustments to account for the temporary imbalances. Please provide thoughts and objections.

    • R Graf
      Posted Feb 21, 2015 at 6:46 PM | Permalink

      Still on question 1, but looking at equation validity: in all the furor I think we all missed the obvious. The equation is not circular to Forster 2013; it is the same equation with a simplification on forcing and an elaboration on feedbacks. Although I can see Forster’s 2013 equation as reasonable for his question of how much effect radiative forcing has on delta-T output versus feedbacks over the time series, I do not see M&F’s rationale.

      Forster, recognizing the climate model creators’ oversight in not producing direct output on this question, used obtainable known model TOA imbalance values to deduce radiative forcing separated from feedbacks. Forster could have gone the extra mile to separate out feedbacks, like changes in lapse rate, cloud formation and aerosol concentrations, from non-radiative ocean heat uptake. Of course, that would have been valuable data, but apparently Forster saw that as too hazardous, a bridge too far. Forster also stays mute on variability, a monstrous unknown. He concludes simply: “The inter-model spread of temperature change is principally driven by forcing differences in the present day, and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095.”

      But in 2015 M&F, with adjusted forcings easily available to accomplish their need to run a study on variability, instead opt to use Forster’s painstakingly diagnosed radiative forcings, which included an unknown amount of kappa and unknown variability (although mostly regressed out, still some), and set up an equation that I don’t think accurately represents the physics. Does not alpha belong in the numerator as the inverse of climate sensitivity? And since TOAI is a known quantity, and is hypothesized by M&F to be a significant value relative to T for many years, shouldn’t N and dT be broken out (as we did in our circular substitution), with alpha a factor of dT, as Forster 2013 wrote? Kappa should be there also, but alpha and kappa need to be in different functions relative to time and T, as they are not linear in T alone. Or, this feedback mess could all be avoided by using already-adjusted forcings. Right?

  136. Gordon Hughes
    Posted Feb 22, 2015 at 3:58 AM | Permalink

    Since Nic quoted two of my more pithy remarks about the M&F paper, I would like to pick up on two themes in these comments. First, about peer review. This is not, and cannot be, a form of audit. Even in the days when referees were occasionally paid (not much), academics did not have the time or resources to check the validity of experiments or statistical work. I regard my function as a referee as being to consider whether the questions addressed by a study are interesting, coherent and properly investigated. The majority of papers do not meet one or more of these requirements, including many that appear in refereed journals. It is ridiculous to claim that peer review is the “gold standard” for academic work. Papers that appear in good peer-reviewed journals may be more reliable, on average, than those which appear in other places, but even that is hard to prove and the distributions overlap. Claims about the merits of peer-reviewed publication are just a variant on the self-protection of medieval guilds or some modern professional bodies.

    Second, about statistics and statistical methods. The original purpose of statistics was to provide a way of summarising large amounts of data. It developed into a framework for testing hypotheses. The M&F paper confuses these two elements. It relies upon least squares as a way of summarising results from runs of climate models but then makes claims about hypotheses (concerning climate sensitivity) that are not warranted by the evidence and methods used. The mis-specification of the model due to the inclusion of an endogenous variable means that the coefficient estimates are biased and thus do not provide a reliable summary of the evidence. Further, conventional procedures for testing hypotheses are not robust to such violations of basic assumptions, so that no valid statistical inferences can be drawn from the results.

    For the last 20-30 years econometricians and statisticians have focused increasingly on the way in which statistical data are generated, because that must be the starting point for either of the uses of statistics. In this case, the M&F data consist of runs of a set of climate models using standardised inputs and various values of key parameters. The data do *not* concern climate (except by accident); they relate to the performance of climate models. These are not randomly generated models, since they have been calibrated in different ways to reproduce historical outcomes. This is reinforced by a reliance upon summary statistics for overlapping periods, so that the observations are in no way independent. As a consequence, a competent statistician would devise an entirely different statistical model from that used by M&F to summarise the data, and any hypothesis tests would have to be based on that model.

    The paper may contain useful information about the properties of climate models, but it tells us nothing about the actual climate. But, would Nature have published a paper that was accurately formulated and executed? It seems rather unlikely. One suspects that the merit of the paper as far as the journal was concerned lay in ill-founded claims about the implications of the exercise. While the authors are responsible for the content of the paper, the whole episode tells us a lot about the weight that should – or should not – be put on material published in high impact journals with defective and skewed review processes.

    • HAS
      Posted Feb 22, 2015 at 1:51 PM | Permalink

      The confusion between the inductive and deductive processes is endemic in a lot of research, particularly in the social sciences.

  137. Posted Feb 22, 2015 at 5:25 AM | Permalink

    OK, let’s look at some individual models and how they deviate from the chosen HadCRUT4 surface temp.

    Here I have dropped the sigma of the filter to 3y to get some data around 2000. It still does the job. So this is the ‘smoothed’ rate of change in the model minus the ‘smoothed’ rate of change in HadCRUT4.

    In essence, what M&F were doing with a crap filter.

    I just grabbed a few models without any attempt to assess what was what. This needs to be more methodical; it’s just a first run to test what comes out. Some models (CSIRO and HadGEM) have multiple runs.

    So what can we see?

    They are all badly wrong at the start of the record, failing to get the cooling at the end of the 19th c. These are the biggest deviations, but the record is also less reliable that far back.

    They pretty much all miss the early 20th c. warming up to 1940. They all miss the cooling that the Hadley Centre injected into the climate record and most others have since adopted. Whether that makes the models wrong or the speculative frigging with the data wrong could go either way.

    They are all (with one exception in what I picked) grossly over-estimating volcanic forcing, and thus over-estimating the effect of a lack of volcanic forcing.

    If we look at the last half of the 20th c., the periods in between the major eruptions show marked and progressively worse excess warming.

    In short they over-estimate volcanoes and over-estimate the other prime forcing in the models : AGW.

    Like I’ve been saying for a while they are over-sensitive to ALL radiative forcing.

    Now, let’s list what they get about right:

    …. err ….. I’ll get back you on that when I find something.

    • Posted Feb 22, 2015 at 5:38 AM | Permalink

      BTW, the one exception was CMCC-CMS, which appears not to have any volcanic effect and thus shows excess warming at each eruption, since it fails to follow the data. This one shows slight relative cooling where the surface data warm after Pinatubo at the end of the record. There is nevertheless a long-term rise in the excess which seems similar to the rest.

      If someone knows of a low-sensitivity model it would be good to have at least one to compare to, but it seems that ALL models that get submitted to CMIP5 have high sensitivities, and the defects of their rendition of the historic period seem to outweigh the limited inter-model differences in sensitivity.

      In that respect M&F _may_ be correct.

      This simply reflects selection bias in what gets into CMIP.

      • Posted Feb 22, 2015 at 4:58 PM | Permalink

        inmcm4 is a low sensitivity model. GFDL-ESM2-G and GFDL-ESM2-M also have low transient sensitivities.

    • R Graf
      Posted Feb 22, 2015 at 10:56 AM | Permalink

      Greg, are the adjusted forcings time series published for these models? If so, could you or someone plot them against the temps and evaluate variability? Why wouldn’t this have already been done? How were the error bars for the future forecasts created? Why would M&F not just evaluate variability in order to validate the error bars? Why go into feedbacks?

      What, in your opinion, is causing the general behavior of the models to deviate from the observed in a 20-year sine wave? Why are 90% of the models out of phase in the 20s and 30s and then 90% in phase by 1970? Were the models tuned with aerosols and other changing variables through the time periods to attempt to track the observed record?

      • Posted Feb 22, 2015 at 11:55 AM | Permalink

        I don’t believe in “sine waves” any more than I believe in “trends”. Neither model should be fitted without a good excuse.

        Bear in mind the gaussian-derivative sigma=3y will take out most variability below 10y.

        What I think is seen in those plots, as I said, is over-reaction to volcanoes and thus over-reaction to the gaps. If a model dives too low in the time series (leading to a negative excursion in dT/dt) it will have to pop back up if it is to match overall, which they generally do, ie a +ve excursion in dT/dt.

        Both the dip and the bump are due to over-sensitivity to VF.

        If we try to take that out, by eye, what is left is a steady rise in dT/dt, ie an ever-increasing deviation from HadCRUT in T(t) and accelerated warming above and beyond HadCRUT.

        The most obvious cause of that would be over sensitivity to AGW.

        Taken together then, over-sensitivity to ALL radiative forcing.

        Bottom line: model are over-sensitive .

        To be sure, the models are tuned. There are a multitude of semi-free “parameters” that allow this to happen.

        Why they miss the early warming period is another question. Either there is some fundamental variability that the models have no knowledge of, or the surface data is in error. Both merit equal consideration.

        Now I raise that issue again because, as I pointed out several years ago in an article on Judith’s site, hadSST3 has a huge 0.5C step cooling stuffed into it on what I consider speculative grounds. An issue that our host did much to bring into the open a decade ago.

        Most other SST records have now adopted something similar. ICOADS SST shows a much more monotonic rise that does not fit the CO2 diatribe. This adjustment is at best speculative.

      • Posted Feb 22, 2015 at 11:57 AM | Permalink

        ” Are the adjusted forcings time series published for these models?”

        CO2 is reasonably well known, as is AOD, at least since 1980. The scale of those forcings is still very much up in the air.

    • Frank
      Posted Feb 22, 2015 at 11:41 AM | Permalink

      climategrog, I posted the link to the ECS, TCR of models: http://eprints.whiterose.ac.uk/76111/22/JGRA_PFjgrd50174%5B1%5D_with_coversheet.pdf .
      Of the models in your figure, HADGEM2 has the greatest TCR at 2.5; the smallest is GFDL-ESM2M at 1.3. Maybe you could also try inmcm4 at 1.3 and a high one: FGOALS-s2 at 2.4. What do you think of computing the sum of squared residuals against the observations for every model, so they can be compared with a single value?

      • Posted Feb 22, 2015 at 2:04 PM | Permalink

        Thanks Frank, good suggestions.

        I’m having trouble working out what is what in the four “ensembles” of the HadGEM CMIP5 file.

        Do you know where I can find doc of what they represent?

        • Posted Feb 22, 2015 at 5:09 PM | Permalink

          HadGEM2-ES has four realisations for its historical simulation: r1i1p1; r2i1p1; r3i1p1; and r4i1p1. ‘r’ is the run number, normally from 1 up; i is instance – it should be 1; p is physics – if p > 1 it uses a non-standard model parameterization. The 4 runs differ as they spawn off the preindustrial control run at different dates (shown, quite often wrongly, by the branch_time attribute of the historical run netcdf file). The trends can differ a fair bit between runs, as model internal variability is significant.

          Two or three models have 10 historical runs, many have 3, 4, 5 or 6, and some have only one.

          Ignore FGOALS-s2: its historical simulation was faulty and it was withdrawn from the main CMIP5 results.

        • Posted Feb 23, 2015 at 12:46 AM | Permalink

          Thanks for the help on this Nic.

          Per M&F, I have not been using the “historical” runs, which end earlier. They used rcp4.5, which extends the last few years of the data, without explaining why they chose that “pathway”.

          rcp4.5 corresponds to emissions peaking around 2040 and then being reduced. I am not aware of any reductions happening so far that would make that the best choice. I have been using rcp8.5, which seems a better choice for the “business as usual” that is the current reality.

          This only affects the last few years but it is just the section that is of key interest when arguing about the “pause” and its significance.

          Do you see anything wrong with that approach?

          This seems like just another of those little ‘tweaks’ that help them arrive at their conclusion.

        • Posted Feb 23, 2015 at 2:05 AM | Permalink

          re HadGEM2; can I then take all four runs as model output to assess sensitivity?

          It seems that the CSIRO model has some specific test runs, like volcano-only, 2xCO2, 4xCO2, etc., in the 10 runs they submitted to CMIP5.

          I’m still trying to work out which of those 10 is the actual model run with everything turned on. It seems to be run 10 but I’m not certain about that.

        • HAS
          Posted Feb 23, 2015 at 4:06 AM | Permalink

          climategrog, on the RCPs: I’d just been looking at them for another matter. http://link.springer.com/journal/10584/109/1/page/1 gives the background on each series, and http://link.springer.com/article/10.1007/s10584-011-0148-z an overview. I think RCP8.5 would be regarded as high – it assumes technology improves at half the rate it has been going at recently.

          There is a bit of discussion on BAU here http://climatechangenationalforum.org/what-is-business-as-usual/

        • Posted Feb 23, 2015 at 11:10 AM | Permalink

          The RCP runs start in 2006; up until then there is just the historical run, which the RCP runs spawn off; M&F stitched the RCP4.5 run onto the end of the historical run, which is standard practice. You’ll have to do the same if you use actual raw CMIP5 data.

          GHG concentrations (as opposed to emissions, in the case of CO2) have increased somewhere between RCP4.5 and RCP8.5 so far. RCP8.5 isn’t really ‘business-as-usual’. It is the 90th percentile of all scenarios seriously considered, for each forcing agent separately. Hence, e.g., its amazingly high CH4 rate of increase.

          Re HadGEM2-ES: the average of the 4 runs should give a better measure of sensitivity than any of them.

          All 10 CSIRO-Mk3-6-0 historical CMIP5 runs are ‘p1’ and have standard physics and forcings. Same for the 10 rcp4.5 runs.
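
          For anyone doing this with raw CMIP5 data, a rough R sketch of the stitching (my own illustration; synthetic vectors stand in for the annual GMST series you would read from, e.g., KNMI):

            # stand-ins: 'hist_run' for the historical run (to 2005) and
            # 'rcp45_run' for the RCP4.5 run that spawns off it in 2006
            set.seed(1)
            hist_run  <- setNames(cumsum(rnorm(106, 0.005, 0.1)), 1900:2005)
            rcp45_run <- setNames(hist_run[["2005"]] + cumsum(rnorm(95, 0.02, 0.1)),
                                  2006:2100)

            # stitch: historical up to 2005, then the RCP run from 2006 on
            stitch <- function(h, r) {
              stopifnot(max(as.numeric(names(h))) + 1 == min(as.numeric(names(r))))
              c(h, r)
            }
            gmst <- stitch(hist_run, rcp45_run)
            gmst_1900_2012 <- gmst[as.character(1900:2012)]  # M&F’s analysis period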

  138. Carrick
    Posted Feb 23, 2015 at 2:01 AM | Permalink

    Greg (aka Greg Goodman aka climategrog) has been commenting on the appropriateness of a running-trend filter. For example here he says:

    It can be seen that the negative inverting peak is almost 50% of the main peak that we are interested in.

    It’s worse than we thought (TM)!

    I had pointed out above that the spectrum for climate signals is 1/f to some power, so this attenuates the relative amount of signal you get in the first lobe as compared to the pass band of the filter.

    However, it’s worse than I thought too!

    Nick’s transfer function (as I just pointed out on his website) is not dimensionless:

    It is the ratio between output temperature trend and input temperature amplitude. That’s why there is a slope that is proportional to frequency. If you want to convert it to a dimensionless number you need to multiply by 1/f.

    So in practice, the issue that Greg is concerned about translates to only a few percent. As long as you don’t engage in wiggleology (that is, over-analysing small, non-robust features of the analysed signal), a running trend should be “okay”. (In practice, I’d suggest using a Hann-tapered window rather than a boxcar, aka rectangular, window.)
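
    To make that concrete, here is a small R sketch (mine, not M&F’s code) of the frequency response of a windowed OLS trend filter with boxcar versus Hann weights; note the gain has units of 1/time, per the dimensional point above:

      # kernel whose dot product with a series gives the (weighted) OLS slope
      trend_kernel <- function(N, taper = rep(1, N)) {
        t <- seq_len(N) - (N + 1) / 2    # centred time index
        w <- taper * t
        w / sum(w * t)                   # normalised: a unit ramp gives slope 1
      }
      # gain versus frequency, via a zero-padded FFT of the kernel
      freq_response <- function(kern, nfft = 4096) {
        H <- fft(c(kern, rep(0, nfft - length(kern))))
        f <- (0:(nfft - 1)) / nfft       # cycles per year for annual data
        data.frame(f = f[1:(nfft / 2)], gain = Mod(H)[1:(nfft / 2)])
      }
      box  <- freq_response(trend_kernel(15))   # boxcar weights
      hann <- freq_response(trend_kernel(15, 0.5 - 0.5 * cos(2 * pi * (0:14) / 14)))
      # plot(box$f, box$gain, type = "l"); lines(hann$f, hann$gain, col = 2)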

    • Posted Feb 23, 2015 at 3:08 AM | Permalink

      Yes, sorry about the name flip. I keep trying to post as Greg Goodman but every time I access my WP account to upload a graph it flips my ID here “climategrog”. Often I don’t notice until I see the comment posted. I’m sure everyone’s got the idea by now.

      There are some oddities about Nick’s formula which I have also pointed out over there. However, the response will be like his red line. It is basically the sinc function, which is the Fourier transform of the rectangular sliding window, multiplied by the linear ramp which is the response of the differencing operation.

      Your assumption about the spectrum of data is simplistic and inaccurate, IMO. It is not simply 1/f, which I guess you are deriving from the autocorrelated nature of random variations of temperature.

      One of the dominant features of the data and the modelled responses is the two massive volcanoes. They occurred about 11 years apart. With the successive recovery periods that makes at least 25y that is dominated by a circa 11y signal. That is close to the peak of the negative lobe in a 15y running average.

      I don’t see any reason not to use the 3y gaussian I chose above. The peak in the combined “smoothed” diff is about 20y and it attenuates 50% at about 10y ( without INVERTING the signal).

      There are obviously dozens of alternative filters but this should be analysed as being a low-pass filter, not a “windowing function” which is something else.

      This is not a finite series which we are further distorting to do an FFT. We are free to choose the extent (duration) of the window and choose a kernel that provides a suitable low-pass response.

      If you want to use a Hann bell instead of a Gaussian bell the result would not be much different. My 3-sigma, 3y Gaussian kernel is 18y wide.
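
      For reference, the kernel I am describing takes a few lines of R (a sketch under my stated choices: sigma = 3 years, truncated at 3 sigma):

        gauss_kernel <- function(sigma = 3, nsig = 3) {
          t <- seq(-nsig * sigma, nsig * sigma)  # 19 annual points, ~18y wide
          k <- exp(-0.5 * (t / sigma)^2)
          k / sum(k)                             # unit gain at DC
        }
        # smoothed rate of change of an annual series x (stand-in data here):
        x  <- cumsum(rnorm(160))
        dx <- stats::filter(diff(x), gauss_kernel(), sides = 2)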

      • Carrick
        Posted Feb 23, 2015 at 4:47 AM | Permalink

        Greg:

        Yes, sorry about the name flip. I keep trying to post as Greg Goodman but every time I access my WP account to upload a graph it flips my ID here “climategrog”. Often I don’t notice until I see the comment posted. I’m sure everyone’s got the idea by now.

        To be clear, I wasn’t “calling you out”. I was making sure people realized it was the same Greg with all three labels.

        Your assumption about the spectrum of data is simplistic and inaccurate, IMO. It is not simply 1/f, which I guess you are deriving from the autocorrelated nature of random variations of temperature.

        It’s 1/f to some power (the power being typically greater than one for the PSD). That there is less energy in higher frequency bands than lower ones is a well known result. One way to look at this is through spectral periodograms of course.

        I don’t see any reason not to use the 3y gaussian I chose above. The peak in the combined “smoothed” diff is about 20y and it attenuates 50% at about 10y ( without INVERTING the signal).

        My comment was restricted to the repercussions of the way they actually did it. What they did allowed an undesirable amount of noise to leak into the filtered signal, but I disagree that the leakage is 50% of the main signal (even 5% is an issue if you can readily do better).

        I’ll grant that you definitely can do it better than M&F did it. In particular, I don’t see any problem in using a variety of filters. Using rectangular window only is probably more risky than using Gaussian.

        I haven’t repeated M&F’s analysis, but in general there isn’t a huge difference between any of the tapered-window OLS filters, as long as you keep the width of the frequency-domain main lobe approximately constant. This includes truncated Gaussian as well as Blackman, Hann and Welch tapers.

        If you don’t compensate for the increase in the frequency domain main lobe size, as you increase the amount of taper, you reduce the amount of smoothing performed by the filter.

        As to why you might not want to use a Gaussian with too small of a sigma here: when you increase the amount of taper, you have to increase the width of the window to compensate for it. When you do this, you reduce the available range of points for the filtered data.

        So as you said on Nick’s blog, there are trade-offs.

        • Posted Feb 23, 2015 at 4:56 AM | Permalink

          Thanks Carrick, it seems we are pretty much in agreement.

          One part I don’t follow tho’:

          If you don’t compensate for the increase in the frequency domain main lobe size, as you increase the amount of taper, you reduce the amount of smoothing performed by the filter.

          Could you be clearer about the “amount” of smoothing? The attenuation varies across the band, and the roll-off is different for different filters. Or are you talking about the scaling of attenuation at the peak?

          I’m not clear what you want to compensate for or how you wish to do it.

        • Carrick
          Posted Feb 23, 2015 at 10:20 AM | Permalink

          Greg, what I was trying to do was fix the 3-dB knee point for each transfer function associated with the windowed LSF trend filter (polynomial order 1), by estimating the lobe width (in frequency) of the filter. I was also using HadCRUT4 temperature to experiment with, which has the high-frequency roll-off I was describing above.

          My point is that because the roll-off is steep enough for all of the filters plus the signal (except rectangular), and all of the filters are flat in the pass-band region, you effectively recover the same curve (within perhaps a few tenths of a dB).

        • Greg Goodman
          Posted Feb 23, 2015 at 11:46 AM | Permalink

          I’m not sure you can do this sort of thing. Half of that frequency response is the differencing operation; if you start poking it up and down to make it flat (which seems to be what you are describing) it won’t be the derivative of temperature any more.

          You have to decide what you need the filter to do then choose a filter which satisfies your criteria. Usually you have to accept some compromises from what filters are available and how much data you have etc.

          What you say about the spectrum may be reasonable over 120y but it is not true of the last 40. And that matters: you cannot afford to be inverting a major part of the signal. How the models respond to the volcanic forcing is a crucial factor, and I really don’t want it screwed up by the processing.

        • Carrick
          Posted Feb 23, 2015 at 1:04 PM | Permalink

          Greg, Actually it worked very well. The proof is in the pudding. I can post a figure later today if you want to see it.

  139. Posted Feb 23, 2015 at 2:27 AM | Permalink

    I’ve just been looking at their abstract again. In essence, the key result is contained in one sentence:

    For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations.

    Now if we are to believe that the “simulated climate feedback” their method extracts from the models is a reasonable representation of the way the models behave (and not a spurious artefact of their method), this means that over the entire hindcast period, plus the ten years or so since, the sensitivity of a model has no bearing on the accuracy of the numbers it produces. The errors in the trends of high-sensitivity models are indistinguishable from those of the low-sensitivity models. There are other apparently “random” factors which so outweigh the effect that a factor of two in TCS becomes undetectable in the results.

    However, the question of sensitivity of the climate is the key point. It is what makes the difference between it being a small beneficial warming and an OMG AlGorithmic 6m of sea level rise.

    What the authors have shown is that the whole range of models referred to by the IPCC are not fit for the primary purpose to which they are being applied.

    They tell us nothing, according to Marotzke and Forster, about the key question of the climate (non-)debate and cannot be used to predict whether future temperature changes in response to changes in emissions will be benign or catastrophic.

    • Posted Feb 23, 2015 at 5:54 AM | Permalink

      I agree completely with your last comments. Marotzke and Forster must be wishing they had never published their defective paper, because it’s done irreparable damage to the credibility of the models, and that’s definitely not going to endear them to the climate glitterati.

  140. Greg
    Posted Feb 23, 2015 at 6:22 AM | Permalink

    OK, I have a provisional grouping of high and low TCS models according to Forster et al. 2013. (In fact I think this basically indicates sensitivity to GHGs rather than sensitivity to all radiative forcing more generally.)

    low group: CMCC-CMS GFDL-ESM2M GFDL-ESM2G inmcm4

    hi group: ACCESS1-0 HadGEM2-ES GISS-E2-H_p1 GISS-E2-R_p1 CanESM2 CSIRO-Mk3-6-0_mem9

    To make it legible I have taken the mean of the low-pass filtered dT/dt deviations from HadCRUT4 for each group.

    So pre-WWII it’s pretty even, probably in agreement with M&F. Post-WWII hi group shows more variability, especially since Mt Pinatubo.

    Clearly the low sensitivity models come much closer to replicating the “pause” in HadCRUT4 since Y2K.

    As I have been saying all along, the high group seems too sensitive to both volcanic cooling and GHG warming.

    • Greg
      Posted Feb 23, 2015 at 6:28 AM | Permalink

      As a side note, all the models across the board seem to have the same massive negative deviation around 1960 and positive deviation centred on 1970, irrespective of sensitivity.

      Figure 2b in the paper also shows this to be a problematic period.

    • davideisenstadt
      Posted Feb 23, 2015 at 7:01 AM | Permalink

      Thank you for your work… I know it’s time and effort and a part of your life and whatnot…
      It looks like the models systematically miss the warming period of the ’20s and ’30s, as well as the cooling period of the ’60s and ’70s, so it seems that these models misestimate variance on a decadal scale?

      • Greg
        Posted Feb 23, 2015 at 7:28 AM | Permalink

        Indeed, as I commented above, what this paper is really pointing out is that the divergence problem is not a new one for the models: they’ve always been that inaccurate and unreliable.

        It’s just that 97% of climate scientists have completely misunderstood what the models show. 😉

    • R Graf
      Posted Feb 23, 2015 at 1:09 PM | Permalink

      Greg, I second David’s remark. And for all of us who are cheering you on, can you describe your test and what its result will determine?

      Have M&F supplied their data?

    • R Graf
      Posted Feb 24, 2015 at 8:53 PM | Permalink

      Greg, I re-read M&F tonight and found they do admit that the models all seem to overestimate volcanic effects.

  141. Kenneth Fritsch
    Posted Feb 23, 2015 at 10:30 AM | Permalink

    I attempted to find an alternative estimate of the deterministic trends in the CMIP5 models over the historical model period 1861-2005 and the extended period 2006-2014. I used the RCP4.5 CMIP5 models, which in turn used the historical inputs from 1861-2005 and the RCP4.5 scenario from 2006-2014. I used a spline smooth (SS) with df=9 and the default spar from the function smooth.spline in R to extract an estimated deterministic trend. I also used Singular Spectrum Analysis (SSA) to estimate a deterministic (or at least secular) trend from the CMIP5 RCP4.5 model series, using the first 2 principal components to represent the trend. In the end I used SS because it gave the better fit to the model Transient Climate Response (TCR) – details to follow. I used a start date of 1865 for my analysis in order to get a more equal division of the time periods used below. For comparison to the observed climate I used the Cowtan-Way infilled HadCRUT4 temperature series. All trends for a given time period were determined by subtracting the mean of the first 2 yearly temperatures from the mean of the last 2 yearly temperatures in the series.

    I make no claims for this exercise except to say that it is strictly empirical from the start and would require some later connections to physical attributes in order to gain credibility. It does have the advantage of not assuming linear relationships for the deterministic trends. I do not use overlapping trends, and so avoid all the statistical baggage that could entail. I also used only one run where a model had multiple runs, always arbitrarily selecting the first.

    The first 4 links below show the CMIP5 Models and CWHadCRUT4 series and SS derived trends for 1865-2014. There are some similarities in the series and trends but the late series upward swings show differences and the noise or natural variations vary over the models in both structure and magnitude. The observed CWHadCRUT4 series in the 4th link at the end can be used as a reference to the real world.

    The fifth link below shows histograms of the model and observed (red vertical line) SS-derived trends for the periods 1865-1919, 1920-1974 and 1975-2014. The latter period was selected because it should be the period of the greatest forcing due to GHGs, where the natural variations become a smaller part of the series. For the 1920-1974 period it can be seen that the observed trend is toward the middle, on the high side, while for 1975-2014 the observed trend has become one of the smallest. The trends of these series increase in the later periods, and the increased spread in trends shows dramatically in the 1975-2014 period. These tendencies support the argument that the models vary greatly in handling the GHG and/or aerosol forcings and are, on average, producing net responses different from the observed climate, increasingly so the further the climate proceeds into a GHG-forced time period.

    The sixth link below shows the progression of 15-year trends and reinforces what is seen in the longer trend periods, with the observed series very near the bottom of the trends in the 2000-2014 period. The 2 models with lower trends are MRI-CGCM3 and inmcm4, which, if you go back and look at the model series graphs, have very nondescript series showing a nearly straight-line trend over most of the 1865-2014 period.

    The seventh link below is a plot of the 1975-2014 CMIP5 model trends versus TCR, where the TCR values were available, and has a correlation of 0.63. Here is where I wanted to do a regression of trend versus TCR and some proxy for the aerosol forcing. The CMIP5 models have all used the same aerosol concentrations over the historical period, but the manifestation of those concentrations in the models’ aerosol optical depth, where that data is available for some part of the world, is quite different (see Figure 9.29 of Chapter 9, the climate model evaluation chapter of AR5). I ended this analysis without pursuing a proxy for aerosol forcing, but found that the simulated southern Indian Ocean sub-thermocline temperature might be a reasonable choice. Better still, I think one comes from troyca, who states at his blog: “This is because the difference in the NH/SH ratio in the historicalGHG simulation and that of the historical simulation implicitly combines the actual aerosol forcing and the “enhancement” of this forcing (rather than trying to estimate these highly uncertain values separately), which is even more directly relevant to the degree of TCR bias. Indeed, if we look here, there appears to be excellent correlation:” What I would hope to accomplish with such a regression is to improve the trend-versus-TCR relationship by accounting for the models’ differences in aerosol manifestations. This would in turn support my use of the spline smooth to estimate the deterministic model and observed trends.
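
    For anyone wanting to reproduce the mechanics, here is a condensed R sketch of the trend extraction just described (a synthetic series stands in for a CMIP5 model run):

      set.seed(42)
      yr <- 1865:2014
      x  <- 0.004 * (yr - 1865) + 0.3 * sin(2 * pi * (yr - 1865) / 60) +
            rnorm(length(yr), 0, 0.15)        # stand-in GMST series
      fit <- smooth.spline(yr, x, df = 9)     # spline smooth (SS)
      ss  <- predict(fit, yr)$y               # estimated deterministic trend
      # trend over a period: mean of last 2 years minus mean of first 2 years
      period_trend <- function(ss, yr, y1, y2) {
        v <- ss[yr >= y1 & yr <= y2]
        mean(tail(v, 2)) - mean(head(v, 2))
      }
      period_trend(ss, yr, 1975, 2014)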

    CMIP5 Models and CWHadCRUT4 Series and SS derived Trend for 1865-2014:




    Long Term Trends CMIP5 Models and CWHadCRUT4 from SS:

    15 Year Trends CMIP5 Models and CWHadCRUT4 from SS:

    Deterministic Trend versus TCR for Some CMIP5 Models:

    • Greg Goodman
      Posted Feb 23, 2015 at 2:35 PM | Permalink

      “I make no claims for this exercise except to say that it is strictly empirical from the start”

      Except that if you used Cowtan and Way it isn’t.

      • Greg Goodman
        Posted Feb 24, 2015 at 1:26 AM | Permalink

        Sorry, I think I misunderstood what you meant by empirical. I think you meant it was just as untested and home-spun as the M&F paper. At least you recognise the fact and say it needs more analysis. They published in a high-visibility journal.

        I’m doubtful that taking the first and last points does much for the signal/noise ratio, but at least you get a correlation that points to the dependency on TCS that most, even Pekka, seem to expect to be there.

        May be worth improving on.

        • Kenneth Fritsch
          Posted Feb 24, 2015 at 10:20 AM | Permalink

          That would be TCR.

          How would one give a measure of a non-linear trend? I used the mean of the last 2 points minus the mean of the first 2 points.

        • Kenneth Fritsch
          Posted Feb 24, 2015 at 11:32 AM | Permalink

          I should have added above that if we assume the deterministic trend has been extracted from the signal, and the extraction was done in a way that attempts to remove the noise, then where is the signal-to-noise problem?

        • Posted Feb 24, 2015 at 12:37 PM | Permalink

          What I meant about S/N is that using only the first and last points seems pretty crude and does not take advantage of the amount of data present to improve S/N.

          OLS seems appropriate; the problem is with using a sliding window as a frequency filter, which sucks big time.

          “The latter period was selected because it should be the period of the greatest forcing due to GHGs and where the natural variations are becoming a smaller part of the series.”

          That is exactly the kind of thinking that has got us into the current mess. It basically assumes, a priori, that that rise is due to AGW. In view of the early 20th c. rise that is totally unfounded.

          I’m not sure that you’ve shown anything more than that more sensitive models rise quicker, though I don’t have time to analyse your method in detail.

        • Kenneth Fritsch
          Posted Feb 24, 2015 at 8:08 PM | Permalink

          “That is exactly the kind of thinking that has got us into the current mess. It basically assumes, a priori, that that rise is due to AGW. In view of the early 20th c. rise that is totally unfounded.”

          Greg, there is certainly a physical basis connecting the measured increases in GHGs in the atmosphere during the period I noted and the corresponding increases in global mean temperature. That is hardly an a priori assumption. Please explain yourself.

    • Kenneth Fritsch
      Posted Feb 28, 2015 at 5:56 PM | Permalink

      In my latest attempts to decompose the CMIP5 model temperature series so that I can extract the deterministic trend, I have looked at regressions of the trends derived from the smooth.spline function in R versus the Transient Climate Response (TCR). TCR should relate, over some length of time period, to a deterministic trend in the climate model series. Recall that the spline smooth (SS) and Singular Spectrum Analysis (SSA) trend extractions from earlier posts gave similar results, but that SS correlated somewhat better with TCR over the 1975-2014 period. All trends were determined by subtracting the mean of the first 2 years of the SS series from the mean of the last 2 years. All model data were from the RCP4.5 CMIP5 models (33) that had published TCR values. The RCP4.5 series run under Historical model conditions from 1861-2005 and then under RCP4.5 scenario conditions from 2006-2100.

      The results of these regressions (correlations) are given below for the time period and length in years noted. The first correlation was calculated by using the shorter time period to determine the SS trend while the correlation in parenthesis was calculated by first extracting the trend for the entire 1861-2100 period and then calculating the trend for the given time period.

      240 years from 1861-2100: Correlation of SS derived Trends versus TCR = 0.84 (0.84).

      95 years from 2006-2100: Correlation of SS derived Trends versus TCR= 0.69 (0.73).

      40 years from 1975-2014: Correlation of SS derived Trends versus TCR = 0.63 (0.67).

      40 years from 2006-2045: Correlation of SS derived Trends versus TCR = 0.63 (0.77).

      15 years from 2000-2014: Correlation of SS derived Trends versus TCR = 0.60 (0.77).

      The correlation between the SS trends and TCR increases with the length of the time period when the shorter time periods were used to determine the trend, while when the trends were determined using the entire 1861-2100 period, the correlation is considerably less dependent on the period length.

      The question then is what do these results mean with regard to the method used in extracting a deterministic trend that is determined mainly by GHGs – as would be the assumed case for TCR? A more direct question would be: does the extracted deterministic (or at least secular) trend contain effects from natural variability, and is that what causes the correlation to be reduced (somewhat) when using shorter time periods? I think the latter method of extracting a deterministic trend, in that it is not very length-dependent, indicates that the trends are closing in on the value that would be predicted by the TCR value for the model, and are therefore a good measure of the same effects from which TCR is derived, i.e. GHGs in the atmosphere. I plan to go back and redo these calculations using both SSA and EMD (Empirical Mode Decomposition).
      Meanwhile, if this approach is reasonable, I think it shows that the M&F paper has assigned a considerable portion of the deterministic part of the CMIP5 model series to the natural variability part. Counter to what M&F found in their paper, the strong SS trend to TCR correlation in this exercise shows that the variation in TCR values for the individual CMIP5 models is a good predictor of the deterministic temperature trend even for periods as short as 15 years.

      A feature of the deterministic trend that I was not able to investigate at this time was accounting for the effects of aerosols on the SS trends and TCR values by adding an aerosol proxy for each model to the regression. All models used the same aerosol concentrations for the historical period, but the effects still vary considerably from model to model, as shown by comparing the aerosol optical depths of some models.

      • Kenneth Fritsch
        Posted Mar 1, 2015 at 12:08 PM | Permalink

        I went back and used the SSA (Singular Spectrum Analysis) principal components 1 and 2 to extract the trends for the CMIP5 models using the entire 1861-2100 period, and then used the difference between the means of the first and last 2 years in 5 periods to calculate trends for all 42 models. Those results are in the linked table below. Obviously the extracted trend is not linear; otherwise all trends for all time periods would be the same. I then regressed the trends versus the TCR values for the 33 CMIP5 models where that value was available – as I did when using smooth.spline (SS) in my previous post. The correlations are listed below and show as good or a better fit for SSA than SS, and less dependence on the length of the time period.

        SSA trend to TCR correlation for 1861-2100 =0.80
        SSA trend to TCR correlation for 2006-2100 =0.72
        SSA trend to TCR correlation for 2006-2045 =0.72
        SSA trend to TCR correlation for 1975-2014 =0.80
        SSA trend to TCR correlation for 2000-2014 =0.79

        By this analysis I believe I have shown that, with a much simpler and more direct analysis method, M&F would have reached very different conclusions.
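
        For completeness, the SSA step can be sketched in R, e.g. with the Rssa package (one implementation of the ssa function; a synthetic series again stands in for a model run):

          library(Rssa)
          set.seed(3)
          yr <- 1861:2100
          x  <- 0.005 * (yr - 1861) + 0.2 * sin(2 * pi * (yr - 1861) / 65) +
                rnorm(length(yr), 0, 0.12)
          s  <- ssa(x)                                   # default window L = N/2
          tr <- reconstruct(s, groups = list(Trend = 1:2))$Trend  # PCs 1 and 2
          # period trend: mean of last 2 years minus mean of first 2 years
          mean(tr[yr %in% 2013:2014]) - mean(tr[yr %in% 1975:1976])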

        • R Graf
          Posted Mar 1, 2015 at 11:14 PM | Permalink

          Ken, Would you then say that as a result of the above you have shown that there are no valid grounds for the assertions made in the paper that ‘For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends’ and that ‘The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded’?

          You did a commendable job and a decent amount of work, Ken. Is there any reason for not duplicating your method over M&F’s time period, 1900-2012? Do you have any opinion as to why M&F’s method could not correlate temperature trends to TCR? Any indication that M&F’s result was steered toward a pre-intended conclusion? Am I just missing something, or how could TCR not be directly related to the temperature trend if the forcing and climate resistance were derived from delta T via a linear equation? Do you plan to post your results and conclusions at CLB? I would, and I intend to compose my final thoughts.

        • Kenneth Fritsch
          Posted Mar 2, 2015 at 3:05 PM | Permalink

          R Graf, what I presented here is an alternative way of attempting to extract the deterministic trend from the temperature signals, which I think is more straightforward than what M&F attempted. I think many of those doing these investigations look for a method that can support an existing position on these matters. They are not dishonest, but they are willing to stop when the analysis results are favourable to their position, without looking further. How open the authors might be to alternative methods and results will come out in how they defend their work.

          In my case as a layperson I post these analyses looking for criticisms from those more knowledgeable and skilled in these matters. I do the analyses primarily out of my own curiosity. As for the SSA analysis, I need to go back now and look at different windows to make sure I can minimize end effects. None of these methods are surefire in separating the noise (natural variation) and deterministic trends. That is why I was looking for something physical, or at least independent, like TCR, to hang my hat on. I am not at all sure I have accomplished that at this point.

        • Kenneth Fritsch
          Posted Mar 3, 2015 at 4:35 PM | Permalink

          I have looked at 3 windows (L) using Singular Spectrum Analysis (SSA) to extract trends from the CMIP5 RCP4.5 models and 3 observed series of global mean temperature. Trends were calculated as noted previously and are assumed to be an estimate of each series’ deterministic trend. The trends for two time periods, 1975-2014 and 2000-2014, are compared using the three window values L = default (half the series length), L = 30 and L = 15. I also calculated the correlations of the trends versus the model TCRs for the 33 models with published TCR values, over five time periods: 1861-2100, 2006-2100, 2006-2045, 1975-2014 and 2000-2014.

          The results in the link below show that the default window setting in the R function ssa gives probably the best overall correlation of model trends to TCR values and the least dependence on the length of the time period. When the setting of L decreases, it can be seen that the observed trends proceed from the middle of the pack towards the bottom in the 2000-2014 period. Not shown here, but with the highest setting of L the observed temperature pause is mostly ignored as part of the extracted trend; at the lowest setting of L the pause shows as a plateau; and with the intermediate setting the pause is represented in the trend somewhere in between. On the other hand, over the longer 1975-2014 period the observed trends reside in the lowest trend area for all settings of L. A window of 30 years (L = 30) might be a reasonable setting since that is sometimes considered the minimum period of time to use in comparing climate results. That setting continues to provide reasonably high correlation of SSA trends to TCR values.

        • Posted Mar 3, 2015 at 5:40 PM | Permalink

          Ken

          Well done for performing this analysis and posting your results. I will study them with interest and comment further if I have any queries or points of interest to raise. I’m not an expert on SSA, but I have read some criticism of its use in Ring et al (2012): Causes of the global warming observed since the 19th century. I can’t recall what the objection to its use was.

        • R Graf
          Posted Mar 3, 2015 at 8:58 PM | Permalink

          It sure looks like TCR is a determinant of the temperature trend. Nice work.

          I have to ask again: how could TCR not affect the temperature trend? Aren’t the CMIP5 archive values for ERF, alpha and kappa (and TCR and ECS) the result of Forster et al.’s diagnosis based on analysis of the abruptly forced model runs? Didn’t they derive them all at the same time from the same equation? And is it just me, or do M&F in their paper go out of their way to describe how they obtained the values in a manner that makes it seem they come from the modellers? And why do they excuse themselves for not using ECS? What are they talking about when they say:

          “To avoid confounding the uncertainty in model response with the uncertainty from CO2 forcing, we use a (alpha) and not ECS to characterize model response.”

          Doesn’t ECS contain both alpha and kappa? Why does breaking the two apart reduce the uncertainty?

          If nobody answers I will keep reading as it seems like the earlier papers with Gregory and Taylor were much more willing to inform the reader.

        • Frank
          Posted Mar 4, 2015 at 3:06 AM | Permalink

          I tried another approach, similar result: http://kauls.selfhost.bz:9001/uploads/evaluation%20of%20CMIP5.pdf .

        • Kenneth Fritsch
          Posted Mar 4, 2015 at 8:42 PM | Permalink

          R Graf:

          ECS was previously obtained by replacing the 3D ocean component with a slab ocean in the models, in order to eliminate the slow temperature equilibration with the ocean. Otherwise the models have to run for thousands of years to reach equilibrium, which makes for very expensive computing time. The recently published ECS values come from the abrupt 4XCO2 experiment, by regressing the change in TOA net radiation against the change in temperature (after adjusting the temperatures and TOA net radiation using the pre-industrial control runs). The previous method in effect eliminated kappa by eliminating the slow ocean equilibration problem; the 4XCO2 experiment includes the ocean, and hence kappa, which is related to heat diffusion. The ECS values are in effect obtained by extrapolating the linear regression line to TOA net radiation = 0; alpha is (minus) the slope of the line and F is the intercept on the y axis.

          The paper describing the regression used to derive the ECS values for the CMIP5 models deals with alpha, not kappa, in the regression. That paper does however present an equation similar to that in the M&F paper and shows how temperature change correlates with alpha, kappa, climate resistance (alpha + kappa), adjusted forcing, and adjusted forcing divided by climate resistance, with the last variable having the best correlation.

          I do not believe ECS depends on kappa; rather, kappa only determines the time taken to reach equilibrium. TCR should be affected by kappa, and I would think kappa would be a determining factor in the difference between TCR and ECS.

          The Forster paper discussing the ECS regression and kappa is linked here:

          http://onlinelibrary.wiley.com/doi/10.1002/jgrd.50174/epdf
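
          As a minimal sketch of that regression (my illustration, with synthetic numbers in place of real abrupt 4XCO2 output):

            set.seed(5)
            alpha_true <- 1.2                    # W m-2 K-1, stand-in feedback
            F4x        <- 7.4                    # W m-2, stand-in 4xCO2 forcing
            dT  <- 5 * (1 - exp(-(1:150) / 30))  # warming approaching equilibrium
            N   <- F4x - alpha_true * dT + rnorm(150, 0, 0.3)  # N = F - alpha*T
            fit <- lm(N ~ dT)
            F_est     <- coef(fit)[[1]]          # y-intercept: adjusted forcing
            alpha_est <- -coef(fit)[[2]]         # minus the slope: feedback alpha
            ECS_est   <- F_est / alpha_est / 2   # halve: 4xCO2 forcing to 2xCO2 ECS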

        • R Graf
          Posted Mar 5, 2015 at 1:00 AM | Permalink

          Ken,

          Sorry I can’t answer your question, but thanks for clearing up one of mine. I guess my confusion about ECS stemmed from the analysis of the ocean equilibrium to find it, but now I understand they are just eliminating ocean effects in order to isolate the equilibrium atmospheric feedbacks, alpha. I have also read they can estimate ECS without the 1000-yr model settling; they label it ECS but it’s “effective climate sensitivity.” This does not clarify, though, why M&F bring up the point: “To avoid confounding the uncertainty in model response with the uncertainty from CO2 forcing, we use a [alpha] and not ECS to characterize model response.”

          To give M&F the benefit here, are they saying they adjusted alpha for each 15-year interval and did not use ECS, since ECS is based on a doubling of CO2 while the historical period was in fact slowly approaching 2X CO2 at a rate of a 1% increase per year, thus necessitating unique values of alpha for each period? If they made adjustments, it is not clear. If they did not, then their analysis would misinterpret the models, all of which were based on 1% growth in GHG. Do you concur?

        • Kenneth Fritsch
          Posted Mar 5, 2015 at 2:40 PM | Permalink

          R Graf: Alpha and kappa remain constant for an individual model throughout the time period covered in M&F, but those values do vary from model to model. They are listed in the link to the Forster paper I gave above.

          ECS is an emergent model parameter and is assumed to remain constant for the duration of the time period used in M&F. The historical CMIP5 model runs used in M&F all have the same GHG levels as inputs from observed measurements. Like aerosol forcing in the models, evidently the GHG forcing as manifested by individual models can vary even when the levels of aerosols or GHGs are the same. That would make alpha a more fundamental variable than ECS, I guess. Look at the Forster paper link again and at Table 1 where you will see that all the feedbacks that add to give alpha are listed. Those are Long Wavelength Clear Sky, Short Wavelength Clear Sky, and Cloud Radiative Effect (derived).

      • R Graf
        Posted Mar 6, 2015 at 1:34 AM | Permalink

        Ken,

        Yes, I thought M&F were using the table values from Forster 2013, and possibly a few others that have been contributed and accepted by CMIP5 as “official” model archive values, which was my point: there is some degree of circularity in Forster then borrowing back these values in 2015, obliquely stating they were obtained from the model archive, albeit with a footnote crediting his own 2013 paper. Perhaps this is what struck Nic originally too.

        I feel the paper is written in an unnecessarily abstruse fashion which, intentionally or not, may conceal basic flaws even from experts in the field. Ken, when you refer to aerosol forcing differences in models, I do not see the logical connection to M&F making alpha the preferred value over ECS, and needing to make a point of it “to eliminate uncertainty”. From my reading, aerosols’ radiative effects are combined with all other known radiative effects to define effective radiative forcing, ERF. I understand from Forster that the definition of adjusted forcings has evolved even in recent years toward considering all immediate effects as one bundle. The modellers, I believe, then treat increased vapour’s effect on cloud-formation feedback and the changes in lapse rate from increased convection efficiency as bundled into alpha. Finally, the imbalance between ocean temperature equilibrium and surface air temperature is treated as kappa. These three categories are separated because they are different mechanisms affecting system temperature, responding to time and temperature differently.

        In truth the models are much more complex, each being a unique recipe, a brew of guessed effect proportionalities. The models have to account for things like the difference in black-body radiation at noon in the tropics, with heated stratospheric ozone causing a massive temperature inversion at the TOA and making CO2 have negative radiative forcing due to increased emissivity.
        As the first sentences of Forster and Taylor 2006 admit: “With both the increase in computer power and a more complete representation of the many interactions in the climate system, climate models have become increasingly complex. Consequently, understanding their responses can often be just as difficult as understanding climate change in the real world.”

        I ask: if the whole object is understanding through simplification, why then can so few people understand M&F’s controls, tests and conclusions? After giving it our best, can anyone follow even the basic logic of this paper? Forster admitted in 2013 that the modellers in CMIP3 were simply adding reciprocal values of forcing to balance feedbacks, so as to duplicate the presumed accurately recorded 20th-century global temperature delta. I guess since that was a criticism of the past, Forster wanted to make it clear that CMIP5’s models showed little evidence of offsetting variables. This by simple logic explains why the models diverge. What, however, is purported to be proved (and by what logic) by bundling all the models back together and running them through a smoothing low-pass filter? I am eager for enlightenment.

        • Kenneth Fritsch
          Posted Mar 6, 2015 at 9:45 AM | Permalink

          Look at the part of the paragraph in M&F that precedes the quote you give. ECS = F2X/alpha, and F2X is often simply referenced to a value of 3.71 W/m2, but the authors are saying that for the CMIP5 models that value ranges from 2.6 to 4.3 W/m2 and thus adds uncertainty to the value of ECS.
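
          To put rough numbers on that (my arithmetic, with a hypothetical alpha of 1.1 W m-2 K-1):

          $\mathrm{ECS} = F_{2\times}/\alpha:\quad 2.6/1.1 \approx 2.4\ \mathrm{K} \quad\text{versus}\quad 4.3/1.1 \approx 3.9\ \mathrm{K},$

          so the F2X spread alone moves ECS by about 1.5 K before any difference in feedbacks enters.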

          I find it worthwhile that those using climate models attempt to extract from a seeming black box some variables that can be used to differentiate the individual models. The chaotic nature of climate leads to internal variability (noise) that makes this differentiation more problematic, i.e. multiple realizations of an individual model will lead to different temperature trends over shorter periods of time. Extracting values such as TCR and ECS should be seen as attempts to overcome the chaotic variations when comparing models. In my mind, if these values are appreciably different we have to say that the models are fundamentally different, and therefore one should no longer combine models in looking for quantities, as M&F did in attempting to apportion the variance between deterministic trends and noise, but should rather attempt to determine which models, if any, represent the observed climate reasonably well.

          I know from my studies that the noise in some CMIP5 models’ temperature series is very different from that in others, and further that the year-over-year variations in temperature can differ significantly from model to model. Unfortunately I see those working in the modelling area as more intent on avoiding showing fundamental differences between models than on narrowing the field to a few potentially valid ones. This may well have something to do with the black-box nature of the models, but when have you seen the output of these models compared with something like a Kolmogorov-Smirnov test?

        • R Graf
          Posted Mar 6, 2015 at 7:42 PM | Permalink

          Ken, thanks, I get it now: 2XCO2 forcing is normally referenced in climatology as 3.71 W/m2, but M&F are using each model’s directly diagnosed F2XCO2, the y-intercept from regressions of the abrupt CO2 forcing runs, rather than the common default value. It shows you are much more versed in climatology prose than I. Their bending over to pat themselves on the back for using values all derived from Forster 2013, rather than introducing oranges among their apples, is I guess what got me confused.

          On your thoughts on the worthiness of the model-analysis endeavours, I absolutely agree the object should be to compare each model to the observed record and score it accordingly. There is no sense in making each shotgun pellet’s trajectory equally important to study when you can focus on the character of the ones nearest the target. I think it is important to keep in perspective that chaos exists only to the extent that one lacks the understanding to predict events. Five-day weather forecasts would have been viewed as absurd in the 1930s. Internal variability is simply the amount of unexplained phenomena. If you dare to widen the field of view to 1, 5 or 20 ka, the variability problem grows at every scale. It’s not nature’s chaos, it’s our still-infantile understanding.

          It seems that M&F’s purpose was to circle the wagons against critics of the models rather than to learn from the models, because learning might mean somebody’s feelings getting hurt if they had to take their model out of the group. Everyone deserves a trophy today.

          A black box is what you get when you don’t clearly articulate output requirements to programmers.

        • davideisenstadt
          Posted Mar 6, 2015 at 9:48 PM | Permalink

          R Graf:
          Yes… it’s not “noise”, it’s “unexplained variance”; there is a big difference.

        • Kenneth Fritsch
          Posted Mar 8, 2015 at 8:35 PM | Permalink

          “I think it is important to keep in perspective that chaos exists only to the extent that one lacks the understanding to predict events. Five-day weather forecasts would have been viewed as absurd in the 1930s. Internal variability is simply the amount of unexplained phenomena. If you dare to widen the field of view to 1, 5 or 20 ka, the variability problem grows at every scale. It’s not nature’s chaos, it’s our still-infantile understanding.”

          I am not sure what you mean by these comments, but I think the chaotic behaviour in climate is here to stay and will require treating it as a stochastic phenomenon. That does not mean we cannot attempt to separate those effects from deterministic ones. Climate models with multiple runs do, and should, show variability, and that “noise” can be captured by an ARMA model. The problem with comparing climate model output (generally in the form of temperature trends) to observations and obtaining statistically significant differences is that an individual model run cannot time some of the noise excursions that can affect shorter-term temperature trends. Further, the observed results come from a single realization, and we do not have, nor can we obtain, other realizations of it without representing the observed temperature series as an ARMA model. I suspect a number of researchers in the climate field are hesitant to apply stochastic models to that part of a temperature series that remains as residuals after removing an estimate of the deterministic trend.
          In the introduction to this thread, Nic Lewis stated that he agreed with the authors that the internal variability, or that part of the temperature series that can be represented as noise, makes the comparison of model output to observations difficult over 15-year periods when attempting to show statistical significance. I agree with that position, and it is why I have become more interested in the areas of model-to-observed comparison that Nic Lewis has been studying, i.e. the deterministic values of ECS and TCR. After all, the main issue of AGW is that part of climate (temperature here) for which man is responsible, and that would be the forcing caused primarily by GHGs and aerosols. M&F made an attempt in the paper discussed here to separate the variability in the deterministic trends from the noise in these series, but in my layperson’s view the attempt was weak and pretty much failed.
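
          A small R sketch of the stochastic treatment I have in mind (my own illustration on a synthetic series, not a claim about any particular model):

            set.seed(11)
            yr  <- 1900:2012
            x   <- 0.007 * (yr - 1900) +
                   arima.sim(list(ar = 0.6), n = length(yr), sd = 0.1)
            res <- residuals(lm(x ~ yr))           # residuals after a trend estimate
            fit <- arima(res, order = c(1, 0, 1))  # candidate ARMA(1,1) noise model
            # surrogate realization of the noise, for trend-spread experiments
            sur <- arima.sim(list(ar = coef(fit)[["ar1"]], ma = coef(fit)[["ma1"]]),
                             n = length(yr), sd = sqrt(fit$sigma2))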

        • R Graf
          Posted Mar 9, 2015 at 8:50 AM | Permalink

          Ken,

          I think our only difference is in the definition of what is unknown and what is unknowable, the location of an electron for example. I am sure we agree that there is a big difference between unpredictable and undecipherable. In M&F I think it’s a reasonable suspicion of most skeptics that their method is transferring kappa effects to be interpreted as internal variability. Seen through a skeptic’s eye, the general bias of M&F is in finding importance and influence in the suspect that brought one into the investigation, AGHGs, and dismissing the significance or relevance of other factors. We are biased too in looking only for deterministic trends on the 15-to-100-year scale. This is why I believe the IPCC is failing to give adequate weight to investigating paleoclimate and, at the other extreme, annual variability. Who believes that we have a good grasp of the dynamics of either? And why should we believe there would not be clues within both?

          In my last comment, on Mar 7 below, I realize now that M&F’s criticism of Nic in their last paragraph at CLB was for not appreciating that their method handles the variability of alpha and kappa better than his (how is unspecified). It’s still unclear to me whether M&F used a delta T with pre-industrial as the zero point for every interval, or used the interval’s start year as zero. If the latter, their results are confounded by the fact that the diagnosis of all the variables applies to the pre-industrial response to CO2 at pre-industrial temperature. Climate resistance may very well increase with increasing temperature, as seen in paleo-plots of the interglacial temperature ceiling, which we are near.

          For kappa it only makes sense that the temperature delta should be relative to the pre-industrial value. Perhaps this is why, as M&F admitted, their method broke down on temperature down-swings, which they presumed to be from volcanic over-forcing in the models.

          Regardless of the above, M&F’s fundamental flaw IMO is attaching any relevance to model-to-model comparison, unless their paper is only drawing model-to-model conclusions, as in Forster 2013. If they are not claiming to draw conclusions about the observed forcing and resistance, then they did not do much to correct the misconception of the world press.

        • Kenneth Fritsch
          Posted Mar 9, 2015 at 9:49 AM | Permalink

          “This is why I believe that the IPCC is failing to give adequate weight into investigating paleoclimate, and other extreme, annual variability.”

          I believe that valid temperature reconstructions would provide a great testing ground for climate models. The currently used approach of selecting temperature proxies for reconstructions after the fact, based on how well they emulate the modern instrumental record, can readily be shown to be flawed and biased. The validity of the proxy response must be established prior to selection; then, when valid proxies are shown to exist, those responses are used regardless of how well they correlate with the modern temperature record. There is little effort made in this direction, or even, for that matter, toward obtaining out-of-sample tests of proxies already used in temperature reconstructions by updating the proxies.

          I think the poster called Frank does a good job of summarizing the problems with M&F here:
          http://rankexploits.com/musings/2015/new-thread/#comment-135623

  142. RomanM
    Posted Feb 23, 2015 at 1:38 PM | Permalink

    Another paper on models and “internal variability” (and the hiatus) in nature Climate Change.

    • R Graf
      Posted Feb 23, 2015 at 2:12 PM | Permalink

      Thanks Roman. Interesting: they determine that the models allow a 10-year hiatus a 10% chance and a 20-year hiatus a 1% chance, and this puts a 16-year hiatus just crossing the 95% mark. It looks like the next El Nino should tell the story. Exciting. Steady as she goes!

    • miker613
      Posted Feb 23, 2015 at 2:44 PM | Permalink

      I thought their comments on the possibility of the “hiatus” continuing for five more years were fascinating.
      “there is a non-negligible probability (that is, between 0 and 29% for an expected warming rate of 0.2 K per decade) of the current hiatus continuing for 5 more years. Failure to adequately communicate this possibility could lead to allegations of overconfidence in GCM projections, especially if the existing hiatus continues until 2020 and beyond.”

      Am I misunderstanding them? They seem to be saying: It is possible tho unlikely (~5%) that the hiatus continued this long. Once it did, it is quite possible tho unlikely (~25%) that it will continue five more years. Therefore, the combination should not be considered statistically significant either.

      • Greg Goodman
        Posted Feb 23, 2015 at 3:03 PM | Permalink

        Although the absolute probability of a 20-year hiatus is small, the probability that an existing 15-year hiatus will continue another five years is much higher (up to 25%).

        This sounds an awful lot like: having flipped four heads in a row, the chance of getting a fifth one is even higher. 😕

        • miker613
          Posted Feb 23, 2015 at 3:07 PM | Permalink

          No, they’re not saying that. But they are saying, the chance of getting a fifth one is 1/2, so be sure not to listen to people who point out that the overall chance of getting five heads in a row is only 3%.
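
          In symbols (my arithmetic, using round numbers consistent with the probabilities quoted above):

          $P(20\,\mathrm{yr} \mid 15\,\mathrm{yr}) = P(20\,\mathrm{yr})/P(15\,\mathrm{yr}) \approx 0.01/0.05 = 0.2,$

          so a conditional 20-25% chance of five more years is fully consistent with a ~1% unconditional chance of the whole 20-year run.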

        • Greg Goodman
          Posted Feb 23, 2015 at 3:26 PM | Permalink

          which means that they regard climate as a coin flip but are studying deterministic models to work out how we should ‘expect’ pennies to land.

          They know the game is up; they’re just stacking up their excuses to try to keep the gravy train running a few years longer.

          This sort of silliness getting published is becoming a sub-weekly event, and they have a multi-billion-dollar international industry set up to produce it.

          This thread is about M&F and probably should concentrate on that.

    • Greg Goodman
      Posted Feb 23, 2015 at 2:53 PM | Permalink

      Caption of fig 1a:

      a, GMST anomalies in observational data sets (red; see Methods for details), CMIP5 historical and RCP4.5 scenario ensemble members (grey) and single-model ensemble means smoothed with a 10-year low-pass filter (blue).

      Can anyone with access to the paper tell us what filter they used and how they abused it to get filtered results right up to the end of the data?!!

      • RomanM
        Posted Feb 23, 2015 at 3:19 PM | Permalink

        I don’t have access to that paper through my university, however there is a Supplement available at the NCC site which might be helpful.

        • Greg Goodman
          Posted Feb 23, 2015 at 3:55 PM | Permalink

          Thanks Roman.

          “…we account for the influence of spurious long-term drifts in energy content by
          applying a 100-year high-pass Butterworth filter to each time series.”

          OMG, this must be Mann’s reinvented, back and forth Butterworth.

          You can’t fit a 100y Butterworth kernel even once into 120y worth of data, so this must be a recursive IIR run both ways ( a la Mickey Math Method ).

          The trouble is these guys know as little about IIR filters as they do about FIR. A 100y recursive filter probably needs something >300y to converge to within 5% of a stable result.

          As a simple guide, if you can’t fit a convolution kernel to do your filter, the recursive filter will not have converged and none of the output is usable.
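
          A quick way to gauge the size of the problem in R, under assumed settings (an order-4 Butterworth with a 1/100 cycles-per-year cutoff, via the signal package; not necessarily what the authors used):

            library(signal)
            set.seed(9)
            x  <- cumsum(rnorm(120, 0, 0.1))          # 120 "years" of red noise
            bf <- butter(4, W = (1/100) / 0.5, type = "high")  # W in Nyquist units
            xh <- filtfilt(bf, x)                     # forward-backward IIR
            # same filter on a reflection-padded series, to gauge end effects
            xp <- filtfilt(bf, c(rev(x), x, rev(x)))[121:240]
            max(abs(xh - xp))                         # differences pile up at the ends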

          Now I just said I’d stick to M&F but this is the same damn problem. They have not the first idea about data processing but think themselves authorised to pull any method out of a hat and publish papers with spurious, unchecked results using it.

          ENOUGH.

        • Don Monfort
          Posted Feb 23, 2015 at 10:40 PM | Permalink

          Greg, please recruit some of the other math-stats dudes on this thread and write a paper exposing the silly math used by the so-called climate scientists. I don’t know maths doo-doo from stats Shinola, but this looks important.

  143. Greg Goodman
    Posted Feb 24, 2015 at 2:30 AM | Permalink

    It seems another major problem that all the models, low or high TCS, show is overdoing the post-WWII cooling and starting the warming trend too early.

    All the CMIP5 models, across the board, have a surge of warming in the ’70s which did not happen. That was the trough of the post-war cooling period.

    This indicates that they are fundamentally missing the mark and are likely just patching together an approximate wiggle match over a limited period while failing to capture the essentials of the system.

    Despite the technical faults of the paper, they have highlighted a fundamental issue:

    The models are worthless.

    That may not be the point they were trying to make, but it is also what is being said by the second paper, which is basically saying the models are so uncertain that it will take even longer to refute them.

    As we all know, in science, no hypothesis has any value unless it is falsifiable. What both these papers are saying is that the hypothesis that the models accurately represent the long-term warming of the climate cannot be falsified for maybe another 10 years.

    The proper scientific conclusion is that they currently have no scientific value in assessing or projecting the effects of AGW. The whole AGW bandwagon should then shut up and come back in ten years when they can tell us something about the validity of their models.

    • Posted Feb 25, 2015 at 10:00 AM | Permalink

      Coming back to the biggest problem for all the models, irrespective of TCS for CO2: the 1960-1970 divergence.

      1960, where they all under-estimate the rate of change, was the very strong solar cycle; 1970, where they to a man grossly over-estimate the rate of warming, was the following weak solar cycle.

      So the most likely reason for the biggest deviation of all the models seems to be under-estimation of the solar forcing.

      That again would contribute to their current problems.

      It seems that, had they done this study properly, they would have got some useful information about the models.

  144. Greg Goodman
    Posted Feb 24, 2015 at 12:16 PM | Permalink

    I’ve just calculated the impulse response and frequency response of (first difference + running mean) as a filter, and it’s worse than we thought (TM)!

    There is no roll-off attenuation at all. It appears there was a mistake on Nick’s plot, or it represents something else.

    In fact it has a series of zeroes in frequency at every sub-multiple of 15y (15, 7.5, 5…) and a 100%-strength inversion close to 10y.

    So the filtering just manages to prevent the blow-up of high-frequency noise that taking the rate of change causes; it does not progressively attenuate it. It is NOT acting as a low-pass filter. It selectively nips out some bands of variability while making a mess of the rest.

    Now the sliding “trend” is not identical, since the estimation of the slope is not identical to the mean of dT/dt, but the frequency dependency can’t be much different.

    Carrick has pointed out that since the data itself will have a kinda (1/f)-ish spectrum, the amount of high freq. in the result will die off because it was never in the signal (meagre consolation).

    It is clear that at least part of the reason they did not detect any correlation to sensitivity was because they had so badly mangled the data with their processing.

    Of course, if they had tested their “innovative” method before using it they would have realised that.

    • Posted Feb 24, 2015 at 12:22 PM | Permalink

      Greg –
      The application of an N-point running mean to the first difference produces the same answer as [x(t+N)-x(t)]/N. In other words, it’s drawing a line between the first point and the last.

      So yes, any signal with a period which is a sub-multiple of N will have a zero response.
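      A quick numeric confirmation of that identity (my own sketch, assuming numpy; the test series is arbitrary):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.standard_normal(200).cumsum()  # any series will do
        N = 15

        # N-point running mean of the first difference...
        lhs = np.convolve(np.diff(x), np.ones(N) / N, mode="valid")
        # ...equals the straight line between endpoints N apart.
        rhs = (x[N:] - x[:-N]) / N

        assert np.allclose(lhs, rhs)
        print("identity holds to machine precision")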

      • Posted Feb 24, 2015 at 12:55 PM | Permalink

        It’s what happens in between the zeroes that matters, especially between the first and second zero where there is still significant signal in the data and where it gets inverted.

    • R Graf
      Posted Feb 24, 2015 at 1:25 PM | Permalink

      Greg, are you working from M&F’s data? If so, post it where others can access it if they wish to run their own analysis. If not, have M&F responded in any way to Steve M’s request? If not, I think the request should be placed on CLB as unanswered, for the record.

      Don is right: if what you are saying (that their innovative method statistically produces nothing) can be verified by someone else, it can be published as a paper or as part of a book. For example, Ross M and others just collaborated with Mark Steyn to put out a soon-to-be-released book on climate issues.

      Did you understand my earlier comment about the equation used? Do you believe that N should be in the numerator and k dropped from the denominator and added as a separate factor to account for budget imbalance along with unknown imbalance e?


      If they derived the equation incorrectly or inconsistently with their diagnosed radiative forcing this is also very bad.

      If you have a link to the new hiatus paper please share. And, did they use adjusted forcings or radiative forcings like M&F?

      • Posted Feb 24, 2015 at 2:21 PM | Permalink

        I got HadCRUT and CMIP5 from KNMI. Anyone can do that. I just used 60S-60N; I have no interest in Cowtan & Way’s further screwing around with extrapolation and don’t think the HadCRUT map was worth bothering with for a basic test.

        I did not go into the latest paper, one per week is a full time job already. I just scanned the SI that Roman linked to.

        I’ve already posted the point that they should have tested their innovative method before using it, at CLB and it’s gone rather quiet now.

      • R Graf
        Posted Feb 24, 2015 at 2:52 PM | Permalink

        Wait, let’s rethink Forster’s 2013 equation for radiative forcing: aT = F – N, where (a) is radiative feedback. All of the alpha feedbacks, like cloud reflection and changes in lapse rate, change the need for warming at the surface by changing the atmosphere, thus affecting radiative forcing, and take only days to respond, according to Forster 2013. He also says ocean uptake, or evolving temperature, is the cause of the TOA imbalance. And variability is not mentioned (after all, why not count it as ocean uptake?). Kappa in M&F 2015 is the TOA imbalance (TOAI). But, is the equation correct?

        Starting with Forster’s equation:

        aT = F – N

        aT = F – k

        T = (F – k)/a

        T = F/a – k/a

        There is no place for variability, unless you simply want to split deep-ocean from surface variability since they behave on different time scales (one steady for hundreds of years, the others, like the PDO or ENSO, shorter), with annual variability as perhaps another factor. But they all need to be divided by alpha (the inverse of climate sensitivity) if you are using theoretical diagnosed radiative forcing.

  145. R Graf
    Posted Feb 24, 2015 at 9:00 PM | Permalink

    In re-reading Forster 2013 tonight I found that kappa is ocean uptake efficiency, whereas N is the energy imbalance due to ocean uptake. Does anyone know how these variables are defined relative to each other? I am having trouble understanding how both equations can be energy balances when they have seemingly identical variables placed differently.

    Forster 2013 sheds light on k at line 272, where he relates k to ΔT through the climate resistance:

    ΔT = F / ρ
    “where the climate resistance ρ = α + κ, κ being the ocean heat uptake efficiency.”

    This leads to ΔT = F /(α + κ)
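    To get a feel for the numbers, a tiny illustration (the values below are my own round, illustrative choices, not figures from Forster 2013 or M&F):

      # Illustrative numbers only: F ~ 3.7 W m-2 (2xCO2),
      # alpha ~ 1.3 W m-2 K-1, kappa ~ 0.7 W m-2 K-1.
      F, alpha, kappa = 3.7, 1.3, 0.7
      rho = alpha + kappa              # climate resistance
      dT_transient = F / rho           # response while ocean uptake is active
      dT_equilibrium = F / alpha       # response once uptake ceases (kappa -> 0)
      print(f"transient dT = {dT_transient:.2f} K, equilibrium dT = {dT_equilibrium:.2f} K")

    This shows why a nonzero kappa holds the transient response (here about 1.85 K) below the equilibrium response (about 2.85 K).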

    • R Graf
      Posted Feb 25, 2015 at 8:43 PM | Permalink

      Answering my own question I found the following in Forster and Taylor 2008:

      “When N = κΔT holds, we can write the heat balance of the climate system as F = ρΔT with ρ ≡ κ + α, which we call the ‘climate resistance,’ because it is the reciprocal of the climate response. Unlike the formulation with a thermal inertia, the relationship F = ρΔT has no timescale”

      The accuracy of the F, a and k values used is still going to be a question, as even M&F admitted. There has been an evolving definition of F, from pure radiative forcing to adjusted forcing, which has many flavors depending on what is considered an immediate response versus a transient response, the latter of which gets put into alpha. I am reading that kappa is considered to be on a slow enough timescale that it’s treated as a fixed resistance rather than an inertia. When they did their 4x CO2 abrupt forcing experiment to diagnose adjusted forcing and climate feedback, it seems they assumed that kappa would remain linear and constant. It’s not clear if the CMIP5 published values were used or diagnosed values, and if all assumptions were the same. But it certainly seems counter-intuitive that ocean uptake and increased cloud formation would have no impact on warming.

      • Posted Feb 26, 2015 at 12:19 AM | Permalink

        Good work, you’re getting close.

        As I pointed out above, in assuming that they can ignore the difference in phase between a radiative forcing and the result of the forcing, they are effectively assuming instant equilibration of the system. Thus all information about thermal inertia and the real climate sensitivity gets pushed into the deep ocean or dumped off into the supposedly “random” error term.

        The remaining alpha is a direct correlation of F(t) and T(t) that has lost all contact with the physical reality of the ocean system.

        This was the whole point of my articles on improper use of OLS and determination of tropical feedbacks.

        On inappropriate use of least squares regression

        On Determination of Tropical Feedbacks

        This is a large part of the reason for the failure of climate science so far: they are too lazy (or incompetent) to deal with even the simplest first-order linear ODE correctly. They think they can analyse everything with “trends” and running means.
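        For concreteness, here is a minimal sketch of the phase-lag point (the one-box model and all numbers are my own illustrative assumptions, nothing from M&F): integrate C dT/dt = F - rho*T under a sinusoidal forcing, then naively regress T on F. The lag drags the OLS slope well below the true steady-state sensitivity 1/rho.

          import numpy as np

          C, rho = 8.0, 2.0                  # heat capacity (W yr m-2 K-1), resistance (W m-2 K-1)
          dt, years = 0.1, 300.0
          t = np.arange(0.0, years, dt)
          F = np.sin(2 * np.pi * t / 20.0)   # 20-year cycle in forcing (W m-2)

          # Simple Euler integration of C dT/dt = F - rho*T
          T = np.zeros_like(t)
          for i in range(1, t.size):
              T[i] = T[i - 1] + dt * (F[i - 1] - rho * T[i - 1]) / C

          mask = t > 50                      # discard spin-up
          slope = np.polyfit(F[mask], T[mask], 1)[0]
          print(f"true 1/rho = {1 / rho:.3f} K per W m-2, naive OLS slope = {slope:.3f}")

        With these numbers the naive slope comes out around 0.2 rather than 0.5: ignoring the lag makes the system look far less sensitive than it actually is.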

  146. Greg Goodman
    Posted Feb 26, 2015 at 1:53 AM | Permalink

    Fig 3b from the paper:

    Their scale runs from 1900 to 1950 in 5y intervals but this is the _start_ date, so to centre it we need to mentally add 31y 😦

    We see that models are more consistent in the earlier 62y slots and then diverge. This is a little surprising, since the models are optimised to match the later period.

    The dark dot near the middle is centred on 1960-65. There is a brief spot of agreement but already a divergence and spread of results.

    By the end of the graph we see that results have separated into two distinct groups. My analysis shows that this is where the high and low sensitivity models diverge and TCS does make a difference. Low TCS models show less post 2000 divergence.

    This feature of their results was apparently not noticed by the authors and goes contrary to their conclusion.

    For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations. The claim that climate models systematically overestimate the response to radiative forcing from increasing greenhouse gas concentrations therefore seems to be unfounded.

    Oops!

    • kim
      Posted Feb 26, 2015 at 8:45 AM | Permalink

      Very nice, very visible.
      ======

    • R Graf
      Posted Feb 26, 2015 at 3:25 PM | Permalink

      Greg,

      Still trying to fully wrap my mind around this. Are you saying the authors just counted on poor printing reproduction of the graph figures to demonstrate their conclusions? These plots of the regressions basically flatten and average the 62-year trends, until at the right-hand edge you are seeing 1950-2012. We know the models were last tuned in the mid 2000s. So why again are these graphs even meaningful, except to show the models were successful in their obvious endeavor to find a theoretical basis for plotting a warming trend starting and ending near the observed temperatures?

      The fact that the low-TCS models, despite their likely higher offsetting forcing, under-performed the temperature rise is quite possibly a surprise to the authors and everyone else. It could also be an indication of error in M&F’s method. After all, the modelers’ intention was convergence with the observed temperature in the 2000s. And remember that residuals are high where variability is high, and variability is in any case ocean uptake, N, as represented mathematically by M&F through the ocean uptake resistance k. Any consistent imbalance weighting a model’s trend high or low could be legitimately plugged by modifying k. Right?

      I still think ANY conclusions from the models are circular, beyond being able to determine whether they can accurately track 112+ years without too much divergence at any one point from observed T, which only proves they were programmed correctly up to the point of their last tuning. The frequency and magnitude of their divergence should define statistical bars of expected certainty of prediction. As the models one by one inevitably fall outside these bars they should be discarded, right? Therefore the recent paper that Nature just published is the only one that is answering a legitimate question: predictability. Right? Please, anyone, play devil’s advocate to correct any of the above.

  147. David Stone
    Posted Feb 26, 2015 at 1:05 PM | Permalink

    I am disappointed as usual by the lack of statistical knowledge displayed in all these papers, and in the modelling used in the first place. Here we have a hugely complex system, of which we can only attempt to model or measure one or two terms. We have no way to remove the confounding factors which must therefore be present, and using techniques such as linear regression, any other interesting but minor effects are completely lost. There is an underlying assumption that the equation developed by the author is exactly correct, is exactly linear, and has no internal couplings which we cannot eliminate. No estimate of error terms is produced, and the result is taken as the truth! The criticism above is excellent and points at the major flaws; why were these not spotted in the alleged peer review? Because it appears that the process is much less rigorous than the journals would have us believe.

  148. R Graf
    Posted Feb 26, 2015 at 2:02 PM | Permalink

    Greg, nice work.

    Now going back to the scene, M&F’s sentences here for logic dissection:

    “The differences between simulated and observed trends are dominated by random internal variability over the shorter timescale and by variations in the radiative forcings used to drive models over the longer timescale. For either trend length, spread in simulated climate feedback leaves no traceable imprint on GMST trends or, consequently, on the difference between simulations and observations.”

    The first sentence compares all simulations to actual observed temperature trends and deduces that, since the models have no predictive value on the 15-year timescale, the observed behaviour must be “random internal variability.” That ignores the logic that everything is random until science finds a reproducible explanation. The valid sentence should read (in plain English) as follows: the models universally can maintain no statistical accuracy or ability to track 15-year actual trends in GMST through the 20th century to the present. Primary drivers of GMST are therefore unexplained and overpowering the models, obliterating any signal from 15-year changes in GHG concentration.

    The second sentence explains their results show that the models, no matter how long the simulation period, are not affected by the amount of feedback (alpha and kappa?) programmed in, only upon effective radiative forcing ERF. And, any divergence in long-term is due to ERF.

    This is what obviously caught Nic’s and others’ attention. It seems to defy logic: how can feedback, which by definition fractionalizes ERF, have no effect? Both cloud response and ocean uptake are assumed to be long-term resistances to change in T. The only thing the authors could have been referring to is “trend”, in that the ratio of ERF to climate resistance was solely determining trend. This again follows directly from whatever the models took as input, and from any variation in the time series of those values. The authors are therefore putting into the conclusion what was set as an assumption. This indeed is circular logic.

    Even before the paper analyzes the 72 runs for which data breakdowns were derived, it draws general conclusions from the overall temperature trends of the 114 runs. They said these all performed in line with the 20th-century observations in universally consistent fashion. Again, the models were programmed to do so. Such an analysis cannot be called upon for a conclusion, lest it supply a false impression of being meaningful. It’s circular logic again.

    I am going to continue to try to dig into the numbers of what M&F actually plugged into their equations and what they were outputting. Greg, I agree that the plotting of alpha and kappa seems to be nonsense unless they are plotting deltas of these values, in which case it makes sense that constant input would mean constant output.

    • Posted Feb 26, 2015 at 3:10 PM | Permalink

      Dissecting their claim is a good idea:

      “The differences between simulated and observed trends are dominated by ….”

      Now, as far as I can see they NEVER calculated the “difference”, so how would they know? They plot one squiggle on top of another and eye-ball it??

      How about they plot the difference of ensemble mean and HadCRUT? I looked at their plots and that was the first thing I wanted to see.

      One look at that spike in their 15y “smoothed” data and I knew they were using some kind of running mean and that they had significant signal in the negative lobe. I can spot this a mile off; I’ve seen it so often.

      Pekka said that Forster was an “expert” but this is decidedly amateurish data processing.

      The most favourable way I can see this is good old bias confirmation. They got the result they wanted and concluded it must be right.

      • R Graf
        Posted Feb 26, 2015 at 5:15 PM | Permalink

        Yes, again: that the models cannot follow the observed in the 15-year trend, unless there are eruptions or known events that are programmed in, just highlights the obvious lack of understanding of what drives short-term trends of GMST. Plotting the 62-yr trends and looking at them from far away, I guess the programmers were in the ballpark. So what, isn’t that an obvious expectation?

        Here is what’s eating me now: the kappa is a steady linear value tied to delta T. But this holds true only for delta T relative to the pre-industrial baseline temperature from which they calibrated it. After that, the delta T for a trend will have a direct relationship with the cloud response, alpha, but kappa remains related only to the delta T from pre-industrial T. Do you know if M&F kept delta T continually relative to pre-industrial T?

        • Posted Feb 27, 2015 at 2:27 AM | Permalink

          I think that after all the crude linearisations, and the total neglect of the temporal phase lags which contain the crucial information about the time-constants of the system response, the linearity or otherwise of kappa becomes irrelevant.

          The problem is kappa itself which is being used as a dumping ground for Trenberth’s “missing heat”.

        • Frank
          Posted Feb 27, 2015 at 6:41 PM | Permalink

          Try reading Isaac Held and references therein:

          http://www.gfdl.noaa.gov/blog/isaac-held/2011/03/11/3-transient-vs-equilibrium-climate-responses/

          I believe that alpha in M&F’s paper appears to be called beta by Isaac, and kappa is called gamma.

        • Greg Goodman
          Posted Feb 28, 2015 at 6:36 AM | Permalink

          The thing is, when there is a significant volcano the heat uptake will drop as the surface cools. Since their processing inverts the circa-10y variability, for the last few decades of the 20th century they are inverting the very thing they need to look at to determine kappa.

          Their conclusions related to 15y sliding trends are spurious.

          I’ve emailed the leading author, pointing this out but have not received a reply.

        • Don Monfort
          Posted Feb 28, 2015 at 2:34 PM | Permalink

          I am going to guess you will get the same response as Steve McI. got to his request for data, Greg. I am guessing that Steve McI. got no response. They don’t have to show you no steenking badges

  149. joe
    Posted Feb 27, 2015 at 4:13 PM | Permalink

    Another study by Mann attributes the faux pause to the AMO and PDO, while at the same time claiming the prior warming was not enhanced by the AMO and/or PDO on the front side of the natural cycle.

    http://www.sciencemag.org/content/347/6225/988.abstract

    Always have to respect a scientist who can make two divergent claims with the same study.

    • Greg Goodman
      Posted Feb 28, 2015 at 6:38 AM | Permalink

      My mother used to tell us that you can’t have your cake and eat it.

      But then she wasn’t a climatologist. They have lots more cake than most people so maybe it is possible.

      • Hector Pascal
        Posted Feb 28, 2015 at 1:44 PM | Permalink

        To have cake and then eat it is a trivial task. Most people can manage that. The really hard thing to do is to eat cake and then have it. I’m surprised that so many people can’t make the distinction.

        It’s right there with people believing that Canute tried to order the tide back rather than demonstrate that the power of kings had no control over nature.

  150. R Graf
    Posted Feb 28, 2015 at 5:52 PM | Permalink

    Greg,

    I am not sure if you were already aware that, according to M&F’s data table of models used, they elected not to use Forster 2013’s model IPSL-CM5B-LR despite having the values for F, a and k. Also, model NorESM1-M is missing from the list, but Nic thinks that an oversight. Lastly, I do not see diagnosed values for four models (bcc-csm1-1, bcc-csm1-1-m, GISS-E2-R, MIROC5) in the Forster 2013 data. Were these newly derived? If so, why?

    Here is a good reference paper on model use:

    taylor_et_al_2012.pdf

    In case you have not already read it: on page 494 they go over considerations for using CMIP5 data. They make the point that Nic did, that the historical runs, due to arbitrary placements of the PDO and other oscillations, are randomly squiggling relative to each other. They also point out that k drifts significantly even in the calibration period and will need adjustment.

    On my final thoughts on M&F: I first believe their evaluation of the 36 models’ 114 runs for variability follows circular logic, and it is a head-scratcher how any models could add more insight into the variability of the observed record than the observed record itself. M&F seem to ignore the fact that the modelers created their wiggles with the benefit of the same record that M&F were comparing them to. In that light, fancy sentences like “Our interpretation of Fig. 1 tacitly assumes that the simulated multimodel-ensemble spread accurately characterizes internal variability, an assumption shared with other interpretations of the position of observed trends relative to simulated trends (for example the reduction in Arctic summer sea ice)” are junky junk designed to obscure the truth (that they have nothing).

    In the 18 models’ 72 runs where they were evaluating dF, k, a and dT over intervals, their output is entirely dependent on their own inputs, all derived from dT by Forster’s diagnosis. The CMIP5 data, according to Taylor et al., must be used with caution. Ocean uptake, k, has a drifting starting value and must be adjusted by time interval. This only makes sense; the further the deep ocean and thermocline vary from equilibrium temperature relative to the atmosphere, the higher the efficiency with which they will take up or give off heat to the atmosphere. This means k is not a linear function of the atmosphere’s delta T with itself, but of the average delta T between oceans and atmosphere wherever they interface. This latter delta is constantly in flux due to high variability and constant drift. M&F assume it’s linear. Although I cannot find the actual data values M&F plugged in, they admit that their values are not valid following volcanic eruptions. This would indicate that k is not adjusted.

    The final two problems are not unique to M&F. The first is the assumption that climate sensitivity, of which alpha is assumed to be the inverse, is linear in delta T. But the paleo-temperature record suggests there is an upper limit to T, suggesting an increasing resistance to rising above our current range. This can be explained by a cloud relationship to ERF, and a cloud relationship to Clausius-Clapeyron. The second is the assumption that natural climate variability only exists on the sub-30-year time scale (that the paleo-temperature record is a hockey-stick handle). Wow.

    Due to my lack of statistics expertise I will leave criticism there to others, but I ask: is it kosher to do OLS on values that have already gone through a process of assumptions to linearize them? Kappa was derived in this manner from N. And F was determined through OLS previously, from its assumed relationship to dT, k and a over a test period of abrupt change in F. It seems like a lot of smoothing going on.

    • Greg Goodman
      Posted Mar 1, 2015 at 12:02 AM | Permalink

      As a shortcut, the IPSL model simulates volcanic forcing by manipulating the TOA solar input. This probably means M&F’s methods cannot be applied using IPSL TOA forcing.

      ===

      OLS is not a low-pass filter and should not be used as one. It is a valid way to minimise the effects of normally distributed errors, provided there is no significant systemic variability other than the presumed linear relationship.

      M&F are trying to use the sliding trend as a “smoother”. It is a misuse. They should have used a properly chosen filter.
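      To illustrate what a properly chosen filter buys (my own sketch; the Gaussian kernel below is just one reasonable choice, not a recommendation from any paper): compare the 15-point boxcar with a Gaussian of similar pass-band. The boxcar has exact nulls at 15, 7.5 and 5 years, but pays for them with sign-inverted leakage in between (about -0.22 at 10 years); the Gaussian rolls off monotonically with no sign flips.

        import numpy as np

        def response(kernel, offsets, period_years):
            # Real frequency response of a symmetric (zero-phase) kernel
            f = 1.0 / period_years
            return np.sum(kernel * np.cos(2 * np.pi * f * offsets))

        box_off = np.arange(-7, 8)           # 15-point boxcar, centred
        boxcar = np.ones(15) / 15.0

        g_off = np.arange(-15, 16)           # Gaussian, sigma = 5 samples,
        gauss = np.exp(-0.5 * (g_off / 5.0) ** 2)
        gauss /= gauss.sum()                 # truncated at 3 sigma, renormalised

        for period in (60.0, 30.0, 15.0, 10.0, 6.0):
            b = response(boxcar, box_off, period)
            g = response(gauss, g_off, period)
            print(f"{period:5.1f}y  boxcar {b:+.3f}   gaussian {g:+.3f}")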

      • Posted Mar 4, 2015 at 4:35 AM | Permalink

        To get an idea of the importance of the distortions of the 15y sliding trend technique here is the CMIP5 ‘tas’ from the model ensemble compared to its 15y running mean.

        [Figure: CMIP5 ensemble ‘tas’ compared with its 15-year (180-month) running mean]

        The frequency response of a running mean is a sinc function, and thus essentially the same as that of the sliding OLS trend.

        The El Chichon peak is neatly inverted and the Mt Agung one disappears!

        Around El Chichon – Mt Pinatubo period the variation in the mean is almost in anti-phase with the unfiltered model output.

        M&F’s failure to detect a “traceable imprint” of model sensitivity using this technique is thus unsurprising and their conclusions concerning 15y trends are unwarranted.

  151. Kenneth Fritsch
    Posted Mar 4, 2015 at 1:22 PM | Permalink

    Does it bother anyone else that when you take overlapping windows of data from a time series there is a great chance that the residuals of a regression of those data against time will have a very high degree of autocorrelation? When I calculated overlapping 15-year linear trends from 42 CMIP5 RCP4.5 models over 1861-2100 and then regressed the variances of the models for each year against those years, I obtained residuals that had an autocorrelation of 0.89.

    M&F regress the variances from the overlapped trends not against time but against the right-hand side of their regression equation. Since M&F do not provide that right-hand-side data, and while it can be calculated with a lot of downloading effort, we cannot conveniently look at their regression residuals. I am wondering: if the variables on both sides of a regression equation have time-series residuals with high autocorrelation, will that regression also have residuals with high autocorrelation? I did not find in my reading of the M&F paper any reference to testing the regression residuals for autocorrelation and making the appropriate adjustments.
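    To see how much autocorrelation the overlap alone manufactures, here is a minimal sketch (mine, using synthetic white noise rather than model data): consecutive 15-year windows share 14 of their 15 points, so even pure noise produces residual autocorrelation of roughly 0.8.

      import numpy as np

      rng = np.random.default_rng(42)
      y = rng.standard_normal(240)          # 240 "years" of pure white noise
      W = 15
      tw = np.arange(W, dtype=float)

      # All overlapping W-year OLS trends
      trends = np.array([np.polyfit(tw, y[i:i + W], 1)[0]
                         for i in range(y.size - W + 1)])

      # Regress the overlapping trends on time, then check the residuals
      x = np.arange(trends.size, dtype=float)
      slope, intercept = np.polyfit(x, trends, 1)
      resid = trends - (slope * x + intercept)
      r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
      print(f"lag-1 autocorrelation of residuals: {r1:.2f}")  # ~0.8 expected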

    • dfhunter
      Posted Mar 6, 2015 at 5:47 PM | Permalink

      This comment is not aimed at Ken, but I’m not sure where to add it!!

      @ Nicholas Lewis – after the discussions/comments above & at other blogs, do you stand by your post comment –

      “The paper is methodologically unsound and provides spurious results. No useful, valid inferences can be drawn from it. I believe that the authors should withdraw the paper.”

      not trying to be funny with this, but it would be useful to lurkers/interested readers at CA to know 🙂

    • davideisenstadt
      Posted Mar 6, 2015 at 6:11 PM | Permalink

      KF, your comments regarding autocorrelation are apt. I think it should be expected when one is looking at running fifteen-year averages… after all, each successive observation shares 14 of its 15 members with the ones immediately before and after it… a good reason to avoid the use of running averages in regressions… Briggs would spin in his blog.

    • Ron C.
      Posted Mar 7, 2015 at 2:41 PM | Permalink

      Early in this thread, I showed how one CMIP5 model produced historical temperature trends closely comparable to HADCRUT4. That same model, INMCM4, was also closest to Berkeley Earth and RSS series.

      Curious about what makes this model different from the others, I consulted several comparative surveys of CMIP5 models. There appear to be 3 features of INMCM4 that differentiate it from the others.

      1. INMCM4 has the lowest CO2 forcing response, at 4.1 K for 4xCO2. That is 37% lower than the multi-model mean.

      2. INMCM4 has by far the highest climate system inertia: deep-ocean heat capacity in INMCM4 is 317 W yr m-2 K-1, 200% of the mean (which excluded INMCM4 because it was such an outlier).

      3. INMCM4 exactly matches the observed atmospheric H2O content in the lower troposphere (215 hPa), and is biased low above that. Most others are biased high.

      So the model that most closely reproduces the temperature history has high inertia from ocean heat capacities, low forcing from CO2 and less water for feedback.

      I’m not fond of climate models, but I’m warming up to this one.
      (Oh wait, it’s by the Russians! Can you spell “Big Oil?” sarc/off)

      • kim
        Posted Mar 8, 2015 at 8:10 AM | Permalink

        Putin on the Shitz.
        ===========

        • Ron C.
          Posted Mar 8, 2015 at 10:09 AM | Permalink

          Actually, Putin doesn’t use the Internet very much.
          When he recently said, LOL, he meant Look Out Lativia!

        • Ron C.
          Posted Mar 8, 2015 at 10:12 AM | Permalink

          Heh, Look Out Latvia!, not Lativia

  152. r graf
    Posted Mar 7, 2015 at 3:13 PM | Permalink

    In the last paragraph of M&F’s paper they state that because they cannot detect impacts of alpha and kappa in models (with their self-diagnosed data and untested innovative statistical method), those parameters therefore have no impact on the models’ trends. And, if they have no impact in models, then they have no impact in real life. (Since the purpose of the paper is to help determine the models’ fitness, this is purely circular logic.) BTW, Ken Fritsch above and Pehr Björnbom (on CLB) show with M&F’s own figures that, as one would expect, GMST does correlate with the models’ TCR (or other sensitivity metrics).

    Then M&F, pointing to the 3-fold variance in alpha values from model to model (from Forster’s 2013 diagnosis), along with the fact that their (untested) method shows no variance from alpha in simulation, conclude that alpha and kappa are undetectable through their effect on temperature trend. Hence the conclusion: if you can’t find me, you can’t say there is too much or too little of me. At the same time they seem to conclude somehow that forcing (although it is the only value detectable in 62-year trends) cannot be criticized as being over-estimated. Then of course the press release translates this to: “Skeptics who still doubt anthropogenic climate change have now been stripped of one of their last-ditch arguments…”

    After a month on this I confess I still don’t get it.

    In defense of their conclusions, M&F in their CLB response attack Nic Lewis for using (in his past work) forcings and ECS values from what M&F consider less accurate historical model simulations and diagnostic methods than those M&F chose. When they get to defending how alpha and kappa could disappear from significance, they seem to be claiming that their own method is very inaccurate, and that it is these poorly fitting assumptions that led to their skewed analysis. Read for yourself M&F’s last paragraph on CLB: “At the 2014 AGU Fall Meeting it was shown independently by Kyle Armour (MIT), Drew Shindell (Duke University), and Piers Forster that over the historical period these quantities change over time. Hence, their diagnosis from historical simulations is highly uncertain. This also supports the physical explanation as to why α and κ have a small role in determining model spread that Lewis did not understand. The small spread supports the reasoning that unique values of α and κ do not well characterize 20th century trends.”

    I understand that alpha and kappa are “climate resistance” values and as such should be more constant than TOA imbalance, for example, but I don’t get what M&F are claiming here. Does anyone want to take a stab?

  153. Posted Mar 17, 2015 at 12:19 PM | Permalink

    I have redone the M&F analysis using an ensemble of 42 CMIP5 models here: Marotzke & Forster Revisited

    There is an inbuilt assumption in models that internal climate variability is quasi-random and it probably isn’t (see even Mann’s post on Realclimate).

    Marotzke & Forster (2015) found that 60-year trends in global surface temperatures are dominated by underlying climate physics. However, the data show that climate models overestimate such 60-year trends after 1940.

    The main reason for their controversial regression analysis is to separate forced temperature rise from ‘random’ natural variation in the models!

    • Kenneth Fritsch
      Posted Mar 27, 2015 at 9:18 AM | Permalink

      I was concerned about the autocorrelation of the regression residuals from the M&F regressions, and requested of and received from Jochem Marotzke the ERF (Effective Radiative Forcing) data that I required to replicate the M&F regressions. When I set to doing this I realized that my first idea of the authors’ approach was incorrect. In fact the time-invariant variables, alpha and kappa, used in the regressions limit the regression approach to what M&F published. As it turns out autocorrelation was not a problem, but in repeating the regression I found some rather problematic issues in the details of the regression results and in looking in detail at the ERF model time series.

      My analysis below does not take into consideration the problems connected to the circular nature of the regression. Rather I attempt to show that with a small adjustment to the regression the results change dramatically and would require a very different conclusion than the one acquired from M&F. My alternative approach here will have all the circularity problems of the M&F approach. My preferred approach was discussed at https://climateaudit.org/2015/02/05/marotzke-and-forsters-circular-attribution-of-cmip5-intermodel-warming-differences/#comment-753507 and shows for the period 1975-2014 that the GMST deterministic trends for the observed data sets are smaller than all but 2 or 3 of the CMIP5 models.

      I used the ERF, alpha and kappa data provided by Marotzke in combination with the RCP4.5 CMIP5 model GMST (Global Mean Surface Temperature) time series from KNMI. I have linked the data and results from an Excel file in a Dropbox link below. The data I used were from 61 model runs, whereas M&F used 75 model runs. The extra runs in M&F were the result of taking the historical runs and appending to them the years 2006-2012 from the RCP4.5 scenario series. RCP4.5 includes all the historical part up to 2005 and then from there the RCP4.5 scenario. In order to avoid having to join different series I simply used the 61 RCP4.5 model runs that were available. I did the 15- and 62-year trends as performed in M&F and obtained very similar results. The results are shown in the tables in the two links below. I show the regression coefficients and p.values and the overall regression p.value, the autocorrelation of the regression residuals, the standard deviation of the residuals and the standard deviation of what M&F termed the deterministic part of the regression, or regression error.

      It becomes obvious, looking at the p.values for the regression coefficients and the overall regression, that for the 15-year trends the regression model cannot be used for its implied intended purpose of using the model differences in the independent variables (ERF trends, alpha and kappa) to predict differences in temperature. It can also be seen, looking at the standard deviations of the residuals and the regression error, that the variation in regression residuals is large compared to that due to the regression – as was the conclusion in M&F. In contrast, for 62-year trends the ERF coefficient and overall regression p.values show statistical significance, while the coefficients for alpha and kappa are, in general, not different from zero. For the longer trend period the standard deviations for the residuals and regression are nearly equal for most of the regressions.

      The temperature series are different for each model run, but the 17 individual models used have only one corresponding realization of ERF (and of alpha and kappa, which are time invariant) per model. This situation might pass muster if one considers that ERF (and alpha and kappa) are deterministic for a given model and will thus not change much from run to run, while the temperature runs are influenced by chaotic effects and can change by goodly amounts. I nevertheless did the same regressions where I averaged the multiple temperature runs for a given model to match the 17 ERF series. I do not show the results, but the only major difference was that the fewer degrees of freedom reduced the frequency of obtaining coefficients different from zero and overall regression p.values less than 0.05.

      What was quite revealing to me (as a layperson relatively new to this area of climate science) is what is shown in the two links of plots of the 17 model temperature and ERF series. (The ERF series are always on top of the temperature series in these plots.) An overwhelming amount of year-to-year variation is in the ERF series compared to the GMST series. What is quite evident is that the temperature tracks ERF and that the problem with the regression is the noise in the deterministic ERF series. Recall that the trends that M&F reference in their paper are determined over the period by delta T and delta ERF, and evidently used, as I did in my regression trend calculations above, the last year in the trend period for T or ERF minus the first year in the trend period. In these linked plots I show a trend line in red for ERF and GMST that was derived from Singular Spectrum Analysis (SSA) using a window of L=67 and combining the first 2 principal components. I used these trend lines and reran the regressions for 15 and 62 years. Those results are linked in two tables below. Notice the dramatic improvement in the regression results, particularly where the series are expected to see the effects of GHG forcing. The regression residuals are much reduced. Notice also that the kappa coefficient does not have p.values indicating values different from zero for any of the regressions, for either time period, using either the M&F method or SSA trends, while alpha begins to consistently show some significance with the SSA trends for 62-year periods.
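      For readers who want to reproduce the trend extraction, here is a minimal generic SSA sketch (my own implementation, assuming numpy; the window L=67 and the two leading components follow the description above, not any code from me or from M&F that is published elsewhere):

        import numpy as np

        def ssa_trend(y, L=67, n_components=2):
            # Reconstruct a smooth trend from the leading SSA components.
            N = y.size
            K = N - L + 1
            # Trajectory (Hankel) matrix: column j holds y[j : j+L]
            X = np.column_stack([y[j:j + L] for j in range(K)])
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            # Rank-n reconstruction from the leading components
            Xr = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
            # Diagonal (Hankel) averaging back to a single series
            trend = np.zeros(N)
            counts = np.zeros(N)
            for j in range(K):
                trend[j:j + L] += Xr[:, j]
                counts[j:j + L] += 1
            return trend / counts

        # Example on a noisy ramp; any annual series of length >= L works
        t = np.arange(150)
        y = 0.01 * t + 0.3 * np.random.default_rng(1).standard_normal(150)
        smooth = ssa_trend(y)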

      In conclusion, looking at the details of M&F-type regressions as I have done here paints a very different picture than that provided in the M&F paper. The deterministic differences in the models are indeed evident in the GMST series.

      Two links to two tables showing regressions per M&F methods:


      Two links showing plots of model ERF and GMST series with SSA Trends:


      Two links to two tables showing regressions using SSA trends:


      Link to Excel file with data and results in Dropbox:

      https://www.dropbox.com/s/hzfqws74791p56e/M%26F_Regression_Using_MF_Data.xlsx?dl=0

  154. Kenneth Fritsch
    Posted Mar 27, 2015 at 9:20 AM | Permalink

    I have a rather lengthy post of my replication of the M&F regression and other details that I would like to get removed from moderation.

    Thanks.

    • kim
      Posted Mar 27, 2015 at 10:38 AM | Permalink

      I’m still moderating it. Thanks for the exhaustive search and the lucid language.
      =========

  155. Posted Jun 7, 2015 at 10:53 PM | Permalink

    Actually, I think the paper is even worse than this post suggests.

    • Paul_K
      Posted Jun 8, 2015 at 5:19 AM | Permalink

      I agree, Lucia. Nic is far too polite.
      When I first read the paper, I just thought it was a bad paper. However, after actually testing the magnitude of the authors’ model error, which is misinterpreted by the authors to be all “natural variability” within the GCMs, I came to the conclusion that it is much worse than just bad. Terrible tripe which leaves the reader diminished. So bad it’s not even wrong.
