There are some odd changes between the two studies.

I use http://www.nature.com/nature/journal/v430/n6995/extref/FigureData/nhmean.txt and http://holocene.meteo.psu.edu/shared/research/ONLINE-PREPRINTS/Millennium/DATA/RECONS/nhem-recon.dat

I find no differences in the NH mean, and without **P** I can replicate it better. Yet there is another version of MBH98 somewhere; see http://www.ncdc.noaa.gov/paleo/ei/ei_attriold.html

]]>The results for Figure 7 of MBH98, through an oversight, were based on the penultimate set of temperature reconstructions, and not repeated with the NH series resulting from the final set of pattern reconstructions used elsewhere in the manuscript (and shown in Figure 6).

I have a somewhat shorter question for Steve, namely after MBH have estimated the temperature PC’s U as in your second equation, how do they get from there to NH temperature T? This never seems to be discussed, and MBH 9x don’t offer much help. Is there a second CCE exercise where the U’s are calibrated to T, or are the U’s just averaged together somehow?

Reconstructed Us are multiplied by the original S and V^T and then re-scaled by tcpa-standard.txt; this is the grid-point reconstruction. RE’s and other stats are computed from cosine-weighted averages of this grid-point recon (NH, sparse, global, etc.). This is, of course, the wrong way to compute these stats.
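The two steps described above (rebuild the grid-point field from U, S and V^T, then take a cosine-weighted average) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MBH code: the function names `reconstruct_grid` and `cos_weighted_mean`, the toy numbers, and the list `grid_std` (standing in for the per-grid-point scaling in tcpa-standard.txt) are all hypothetical, and any cosine weighting applied before the SVD is not undone here.

```python
import math

def reconstruct_grid(U, s, Vt, grid_std):
    """Return U * diag(s) * Vt, re-scaled per grid point.

    U: one row per year, each holding k reconstructed PC values.
    s: k singular values (the original S).
    Vt: k rows of grid-point loadings (the original V^T).
    grid_std: per-grid-point scale factors (assumed role of the scaling file).
    """
    field = []
    for u_row in U:
        row = []
        for j in range(len(grid_std)):
            value = sum(u_row[k] * s[k] * Vt[k][j] for k in range(len(s)))
            row.append(value * grid_std[j])
        field.append(row)
    return field

def cos_weighted_mean(row, lats_deg):
    """Average one year of grid-point values with cos(latitude) weights."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * x for w, x in zip(weights, row)) / sum(weights)

# Toy example: 2 years, 2 PCs, 3 grid points (all numbers made up).
U = [[1.0, 0.0], [0.0, 1.0]]
s = [2.0, 1.0]
Vt = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
grid_std = [1.0, 2.0, 0.5]
lats = [0.0, 60.0, 0.0]

field = reconstruct_grid(U, s, Vt, grid_std)
nh = [cos_weighted_mean(row, lats) for row in field]
```

Verification statistics would then be computed on a series like `nh` rather than on the reconstructed Us themselves.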

I’m still puzzled by the P-weights; without these weights I can replicate the MBH99 results more accurately.

]]>Re #9

Robin,

I’ll try to be brief in my response to your request but I may be sniped for being off-topic. If so, contact me and we can continue the dialogue via e-mail.

As we both know, your so-called CUMSUM method is a simple way to find data trends and evaluate trend patterns. Permit me to define a few terms that I will use; you might have something else in mind for the same term.

First, I’m referring to an n×m data array in which the observations are indexed by n rows in m columns. The data in each column are normalized by transforming the raw data point in each cell into the difference between the data point and the column average (or arithmetic mean), divided by the STDEV of the column data. The normalized data in each cell are thus measured in STDEV units. After normalizing each data column, a new n×m array can be formed for the normalized data set. Of course, the normalized data in each column can be integrated to form a CUMSUM column to look for temporal changes in the form of trends.

Second, I am referring to persistent trends wherein the deviations in several adjacent cells are all either more or less than the column average. For example, if the row index is in years, decadal trends would be persistent.

Third, a graphical display of a single CUMSUM data column may show easily recognized V- and U-shaped trends and/or inverted V- and U-shaped trends. The V-shaped trends may be the most interesting and perplexing because they signify that an abrupt or stepwise change has occurred in the underlying data, one that changed the sign of the slope of the CUMSUM curve. Of course, a U-shaped trend is a steep negative slope followed by a near-zero slope followed by a steep positive slope. A quadratic-shaped or conic-section-shaped trend is also common and easily recognized. It is caused by a linear trend in the underlying data, where the curvature of the CUMSUM trend depends on the slope of the linear trend in the underlying data. If the CUMSUM trends for each column are significantly different in shape and phasing from other columns, different factors are influencing the underlying data.

Finally, I am referring to stationarity as the stability of the statistical distribution of the data in a column. I am not referring to the stability of a trend. The data in a column don’t have to be normally distributed to be stationary. In long data columns, it is quite possible that the data are normally distributed over a long string of cells but become skewed or non-normal in other strings of cells. These changes in distribution mean the data are non-stationary.
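The normalization and CUMSUM steps defined above can be sketched in a few lines of Python. This is a minimal illustration, not anyone's actual workflow: the function names and the toy column are made up, and the sample standard deviation is assumed to be what STDEV means here.

```python
from itertools import accumulate
from statistics import mean, stdev

def normalize(column):
    """Express each cell as its deviation from the column mean, in STDEV units."""
    m = mean(column)
    sd = stdev(column)  # assumption: sample standard deviation
    return [(x - m) / sd for x in column]

def cusum(column):
    """Running sum of the normalized column."""
    return list(accumulate(normalize(column)))

# Toy column with a pure linear trend (hypothetical data).
trend = [float(t) for t in range(11)]
trend_cusum = cusum(trend)
```

For this linear-trend column the CUMSUM falls while the data are below the column mean, bottoms out around the middle, rises again and finishes at zero, which is the quadratic bowl shape mentioned above.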

As I mentioned in #6, I was trying to understand the temporal and spatial variations in some USHCN surface temperature data. I chose several sites where monthly data were available from 1897 to 2005. Hence, my data array was 109×12 for each station. After normalizing the monthly temperature data in each column, I plotted the CUMSUM data. As you know, all CUMSUM data columns start at zero and return to zero. The difference in the trajectory of the CUMSUM curve over the 109 cells in a monthly column sets the trend pattern. I was struck by the differences in the trend patterns for each month. Although some monthly patterns were similar in some features, there were many significant differences. For example, at one site, there were essentially no persistent trends in November over the 109-yr interval, whereas large decadal trends were observed in the trend patterns for October and December. I observed the same perplexing similarities and differences in trend patterns for the other sites that I examined.

I’ll not take the time to describe the similarities and differences. Suffice to say, they involved V-, U-, and conic-shaped trends and phase delays, as well as non-normal and non-stationary intervals. All of this indicated that there were many other inexplicable non-linear factors influencing the temperature data. Some of the factors were undoubtedly due to weather, and others were related to longer-term climate changes.

Yes, it is possible to add another column to the data array and include the average of the 12 monthly data columns. The CUMSUM for the average annual data exhibits its own trend pattern, and it is different from any of the monthly patterns. That’s no surprise, since the averaging step is merely an equal weighting or blending of the monthly data. This obviously suppresses or obscures all but the large-amplitude trends. In #5, I understand that you averaged your 112-column data set (not the normalized data set) to form an equally-weighted “total” data column. If my understanding is correct, then I believe that most of the lesser trends are suppressed. Therefore, this is different from using a PC methodology to select a smaller number of unequally-weighted data columns.

So … this probably didn’t provide you with any new insights. My cautionary note would be: “ … be sure that you have an idea about what kind of trends you are looking for before using too highly aggregated data sets.”

Having uploaded your image somewhere, copy ONLY the direct URL of the image to your clipboard (i.e. without any enclosing BB or HTML code), click the [ Img ] button above the comment box and paste your URL in.

The image may not show in the Preview (that’s a software bug), but should be displayed when you submit the comment. A direct URL pasted in your comment may be a useful back up until you get used to how everything works.

]]>The reason for standardising each of the data columns was to avoid the spurious weighting that would occur if the “raw” values, as published by Mann et al., were used. The shape of the plot of each individual column against the time variable is of course totally unaffected by this transform. Each standardised value is simply a measure of how far the column value departed from its column mean, and thus I think it is reasonable to average across columns to provide an estimate of the consensus departure from the mean, for the data as a whole (or for a rational sub-sample of the data columns), for each data year. After all, it must be supposed that in any given year each data column would be reflecting the climate of the time; otherwise why bother with time series at all?

Using a restricted data set, the period from 1820 to 1980, all columns are complete with what one hopes are valid observational numbers. The standardisation is thus unlikely, I feel, to produce a biased mean.
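To make the weighting point concrete, here is a rough sketch of standardising each column and then averaging across columns for each year. The two toy columns, the function names, and the use of the population standard deviation are my own assumptions for illustration only.

```python
from statistics import mean, pstdev

def standardise(column):
    """Rescale a column to mean zero and variance one."""
    m, sd = mean(column), pstdev(column)
    return [(x - m) / sd for x in column]

def consensus(columns):
    """Average the standardised columns across each row (year)."""
    z = [standardise(c) for c in columns]
    return [mean(vals) for vals in zip(*z)]

# Two hypothetical columns with the same shape but very different units.
col_a = [1.0, 2.0, 3.0, 4.0]
col_b = [10.0, 20.0, 30.0, 40.0]

raw_average = [mean(vals) for vals in zip(col_a, col_b)]
std_average = consensus([col_a, col_b])
```

In `raw_average` the second column dominates simply because its numbers are larger, whereas in `std_average` both columns contribute equally.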

Of course, the data in the columns are /NOT/ stationary! They are climate data, and thus change due to a variety of factors, and without doubt the general change over this period has been upwards. That is what we are interested in, I think, and what I hope to do is to demonstrate the manner in which this upward trend takes (took) place. What I find is that much of the change took place over restricted periods of time, with generally stable periods between the change events or segments. Individual data columns also show this type of behaviour, and it is really interesting to look at the form of the cusums of the single columns. Some appear to be genuine un-tampered with data, but others show very strong signs of having been smoothed before being reported. Cusum plots demonstrate this very easily. For single site data, or assemblies of data of very similar origin such as the 13 temperature columns, transitions between stable regimes are generally very clear indeed, thus encouraging me to believe that the cusum method holds some promise in evaluation of assemblies of time series data.

If someone can tell me how to insert a GIF into this contribution (presuming it’s allowed) I’ll be very happy to show just what happens using these techniques.

It is very interesting to read that you have also used a technique similar to mine. I wonder exactly how you interpret the cusum patterns. I contemplate the very grand scale shape, and propose an hypothesis – “the region between 1830 and 1854 is stationary” for example – and then test the original data to see if this hypothesis holds up. Another hypothesis might be “In late 1922 a very abrupt change took place”. This would be signalled by something approaching an elbow in the cusum plot. It could be verified by a higher resolution analysis, using monthly rather than annual data, suitably de-seasonalised, and by examination of the apparently more stable segments on either side of the hypothesised elbow. Such “verification” of the import of cusum plots seems to work very well.
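As a rough sketch of the elbow idea, one can locate the largest excursion of the cusum and then compare the segments on either side of it. The series below is invented, and `elbow_index` is a hypothetical helper, not a formal change-point test.

```python
from itertools import accumulate
from statistics import mean

def cusum_of_deviations(series):
    """Running sum of departures from the overall mean."""
    m = mean(series)
    return list(accumulate(x - m for x in series))

def elbow_index(series):
    """Index of the largest absolute cusum excursion (candidate elbow)."""
    c = cusum_of_deviations(series)
    return max(range(len(c)), key=lambda i: abs(c[i]))

# Invented annual series with an abrupt upward shift after the sixth value.
series = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0,
          11.1, 10.9, 11.0, 11.2, 10.8, 11.0]

k = elbow_index(series)                  # last point before the change
before, after = series[:k + 1], series[k + 1:]
shift = mean(after) - mean(before)       # size of the step between segments
```

If the hypothesis of an abrupt change holds, `before` and `after` should each look stable on their own, with the difference between them captured by `shift`.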

I certainly am not a fan of hypothesising a simple linear model for climate data and working with the residuals. This seems to be a standard technique for inducing stationarity over a chosen period, but my knowledge of standard time series methods is slight. As for assessing the data for having a normal distribution, I really cannot be sure of how this might affect one’s deductions if it were found not to be the case. In my experience, attempting to disprove the hypothesis that a given collection of observed data is normally distributed is seldom successful, unless it is grossly and obviously non-normal, such as daily temperatures over a year at a non-tropical site. How would non-normality influence deductions regarding climate data? I’ve nothing to base any ideas on, so would welcome some instruction.

Robin

]]>1) My code downloads the temperature U-vectors directly. I’ve also tried to reproduce them via SVD: the monthly gridded temperature data need to be multiplied by cosine lat gridponts.tx and divided by tcpa-standard.txt, then passed through SVD, and then downsampled to annual values to get quite close to the archived U-vectors.

2) I don’t use P or L, but if I understood correctly, L cancels out. So the remaining difference between my implementation and Steve’s is P. And as Hu notes, P (S^-1 in my post) is not obtained by Mann in the conventional way. I tried the conventional P in my post, http://signals.auditblogs.com/files/2007/11/ad1600_rec_cce.png

]]>Is there really any point in bothering with the temperature PC’s if one is ultimately only interested in T? Just calibrating T directly to the Y’s would be a lot simpler.

]]>Robin,

You said:

“Thus I took the rather obvious step of standardising each column to mean zero, variance one. Neglecting the now hot topic of column weights one can average across the data rows to form some sort of estimate of what the “climate” was like during any given year, on a standardised scale having no practical units attached to it.”

I assume that you averaged the rows of deviations of the normalized column data. If so, it seems to me that this is OK if the data in the columns are normally distributed and stationary. If the column data are not normally distributed or stationary, the row average is still the row average but may be meaningless. For example, if you normalize such a row average, the cumsum plot may show well-defined trends that cannot be explained.

I have used your method when trying to understand local and regional temporal and spatial variations in USHCN surface temperature data. The U’s and V’s are eye-appealing, but the underlying non-linearities and non-stationarity are confounding.

]]>