I’ve been going through the process of reconciling gridded data and station data in one Russian gridcell. Most of my effort to date has been spent on creating tools for accessing and collating data archives into organized time series formats so that others don’t have to go through the same trials and tribulations of sorting out oddball data formats and can improve their analyses on their own gridcells if they want. I’ll post a variety of read scripts up in a day or two. I’ve been working with the gridcell 57.5N and 77.5E primarily because it happened to be adjacent to the Tarko-Sale gridcell, which I had posted on previously. It also appears to include only one station, which makes reconciliation easier, and to have several archives that can be cross-checked.
I report here on 4 archives with station data on Barabinsk (GHCN v2, GISS, GSN, meteo.ru), two of which have daily information (GSN, meteo.ru). There are some other archives – NDP040, NDP048, an early Jones version – which I’ll try to create tools for as well. I also report on 3 gridded versions – HadCRU3, HadCRU2, GISS. Other gridded versions include CRUTEM3 and CRUTEM2 and some early Jones versions.
There are puzzles surrounding not simply the gridded data, but even the GHCN v2 station data. The GHCN v2 archived version ends in 1989. The GSN version, which reconciles to the GHCN version during their overlap, continues to the present but is not incorporated in GHCN. The HadCRU3 gridded version ends in 1989, matching GHCN v2, except for 3 oddball monthly values in 2005; however, HadCRU2 went to 2002 (and started 10 years earlier than HadCRU3). HadCRU2 appears to use the updated Barabinsk information available at GSN, which is ignored in HadCRU3. The GISS gridded version diverges from both.
At this point, I have no particular conclusions about this aspect of temperature history; I am merely trying to look at available information in an organized way. I will observe that these gridded temperature calculations are as much an accounting exercise as anything else. In an accounting exercise, audit trails “matter”. Thus regardless of whether or not any of this “matters” for policy, it is hard to get a favorable impression of how the audit trails are organized.
As a start, here is a spaghetti plot of annualized information from the 7 sources that I’ve collated so far:
3 gridded versions of 57N, 77E : HadCRU3, HadCRU2 and GISS;
4 station versions: GHCN v2 and GISS monthly; GSN and meteo.ru (calculated monthly from daily data). (Adjusted GHCN not considered yet.)
Some of the series are not clearly visible in the spaghetti graph, because they are overplotted by later colors. Thus HadCRU3 is scarcely visible on the graph below, not because it isn’t there, but because it’s overwritten. The graph below is all in 1961-1990 anomaly format (which I’ve calculated from monthly data).
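For readers who want to try the anomaly step themselves, here is a minimal Python sketch of one common way to do it: compute a 1961-1990 normal for each calendar month, subtract it from every monthly value, then average the monthly anomalies by year. This is not the script behind the graph. The function names, the `(year, month)` dictionary layout, the rule that a year needs all 12 months, and the made-up demo numbers are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def monthly_normals(series, start=1961, end=1990):
    """series maps (year, month) -> deg C. Returns {month: base-period mean}."""
    by_month = defaultdict(list)
    for (year, month), value in series.items():
        if start <= year <= end and value is not None:
            by_month[month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def monthly_anomalies(series, normals):
    """Subtract the calendar-month normal from each non-missing value."""
    return {
        (year, month): value - normals[month]
        for (year, month), value in series.items()
        if value is not None and month in normals
    }


def annual_means(anomalies, min_months=12):
    """Average monthly anomalies by year; min_months is an assumed rule."""
    by_year = defaultdict(list)
    for (year, _month), value in anomalies.items():
        by_year[year].append(value)
    return {y: mean(v) for y, v in sorted(by_year.items()) if len(v) >= min_months}


# Synthetic demo data (not Barabinsk): a fixed seasonal cycle, with every
# month 1.0 deg C warmer after 1990.
seasonal = [-18, -16, -9, 1, 10, 16, 18, 15, 9, 1, -9, -15]
demo = {
    (y, m + 1): seasonal[m] + (1.0 if y > 1990 else 0.0)
    for y in range(1961, 1996)
    for m in range(12)
}
normals = monthly_normals(demo)
annual = annual_means(monthly_anomalies(demo, normals))
print(annual[1970], annual[1995])
```

Because each series is centered on its own 1961-1990 normals, any set of versions will tend to agree in the reference period by construction, which is worth keeping in mind when reading the spaghetti graph.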
As a quick point, one observes that all 7 versions match quite closely in the reference period (1961-1990) and that there is some divergence away from the reference period. As an exercise to ensure that I was on the right track (and this was important for debugging my read routines), I did a close-up of the raw monthly data for all 4 station archives before anomaly calculation for the period 1978-1980; in the two cases with daily data, I averaged the daily max and min. This yielded virtually identical results for all 4 versions, so I feel confident that the GHCN v2, GISS, GSN and meteo.ru data for Barabinsk are all from the same original source and that later data from meteo.ru or GSN can be legitimately added to the GHCN v2 record (ending in 1989).
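The daily-to-monthly step can be sketched in Python as follows. This is only an illustration of averaging daily max and min and then averaging the days in each month. The record layout `(year, month, day, tmin, tmax)`, the function name and the `min_days` threshold are assumptions, not the format of the GSN or meteo.ru files.

```python
from collections import defaultdict
from statistics import mean


def monthly_means_from_daily(records, min_days=20):
    """records: iterable of (year, month, day, tmin, tmax) in deg C.

    The daily mean is taken as (tmin + tmax) / 2. A month is kept only if
    it has at least min_days usable days (an assumed threshold).
    """
    by_month = defaultdict(list)
    for year, month, _day, tmin, tmax in records:
        if tmin is None or tmax is None:
            continue  # skip days with a missing reading
        by_month[(year, month)].append((tmin + tmax) / 2.0)
    return {
        key: mean(values)
        for key, values in sorted(by_month.items())
        if len(values) >= min_days
    }


# Synthetic demo data (not real observations): a full January and a
# February with only five reported days.
demo_daily = [(1979, 1, d, -25.0, -15.0) for d in range(1, 32)]
demo_daily += [(1979, 2, d, -22.0, -12.0) for d in range(1, 6)]
monthly = monthly_means_from_daily(demo_daily)
print(monthly)
```

In the demo, January comes out at -20.0 and the sparse February is dropped rather than averaged from five days.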
I calculated 1961-1990 normals for each of the 4 station versions separately. These are shown below and are consistent, adding further comfort to the identity of the underlying information in the 4 archives.
To isolate differences in station versions, I show two formats below: one in spaghetti graph format which is useful for identifying similarities or differences in scale, but has the disadvantage that later colors overwrite earlier colors when series are identical; and a panel (plot.ts) format, which focuses on differences in start and end dates. The spaghetti graph for the 4 stations replicates the first spaghetti plot, but with 3 fewer series to focus analysis a little. One sees again that the versions closely align in the reference period and have virtually identical scales. The panel version (see below) shows start and end better.
Here is the same information in panel (plot.ts) format. You see clearly that the start and end dates of the GISS and GHCN versions match, while the meteo.ru and especially GSN versions continue to the present. The GSN and meteo.ru versions commence in 1925, while the GISS and GHCN versions include some earlier data. I notice here that there is a hiatus of about 2 years around 1960 in the two archives based on daily information that is not present in GHCN. Re-examining GHCN, I find that it has three versions on file: 296120002 has the hiatus around 1960, while the 296120000 and 296120001 versions do not. I have not seen any information on the varying provenance of these three versions or on how the GHCN 0 and 1 versions have information not present in the meteo.ru or GSN archives.
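Start dates, end dates and gaps of this kind can also be listed directly rather than read off a plot. Here is a minimal Python sketch, assuming each version has been collated into a dictionary keyed by `(year, month)` with `None` for missing values. The function name and the demo series are hypothetical. The demo is synthetic and only mimics the shape described above (a start in 1925 and a two-year gap).

```python
def coverage(series):
    """Return (first, last, gaps) for a {(year, month): value} series.

    gaps is a list of (first_missing, last_missing) month pairs lying
    between two months that do have data.
    """
    present = sorted(k for k, v in series.items() if v is not None)
    if not present:
        return None, None, []

    def to_index(ym):
        # Count months on a single axis so neighbors differ by exactly 1.
        return ym[0] * 12 + (ym[1] - 1)

    def to_ym(index):
        return (index // 12, index % 12 + 1)

    gaps = []
    for a, b in zip(present, present[1:]):
        if to_index(b) - to_index(a) > 1:
            gaps.append((to_ym(to_index(a) + 1), to_ym(to_index(b) - 1)))
    return present[0], present[-1], gaps


# Synthetic demo series: monthly values 1925-1965 with 1959-1960 removed.
demo = {
    (y, m): 0.0
    for y in range(1925, 1966)
    for m in range(1, 13)
    if y not in (1959, 1960)
}
first, last, gaps = coverage(demo)
print(first, last, gaps)
```

Running the same function over each archive gives a small table of first month, last month and gaps that can be compared across versions.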
The excellent documentation for NDP040 records several station moves for Barabinsk as follows:
STATION  TYPE  YEAR  MO  DAY  DISTANCE  DIRECTION
29612    MOVE  1939   5   21         2  N
29612    MOVE  1949  10   20         4  NW
29612    PRCP  1955   9   19
29612    MOVE  1958  12    4         0  NE
Did any of this relate to the 1959 hiatus or contribute to a possible discontinuity around 1959?
Next, here is a comparison of the HadCRU2 and HadCRU3 versions. These are essentially identical over the period 1901-1989. However, HadCRU2 also has coverage from 1891-1901 and from 1990-2002 that is excluded from HadCRU3.
Figure 6. Comparison of HadCRU2 (red) and HadCRU3 (black)
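A comparison like this can be reduced to three numbers: the common period, the largest absolute difference within it, and the years that exist in only one version. Here is a minimal Python sketch, assuming annual series stored as `{year: value}` dictionaries. The function name and the demo values are hypothetical, and the demo merely mimics the coverage pattern described above with made-up numbers.

```python
def compare_versions(a, b):
    """Compare two {year: value} series on their common years."""
    common = sorted(set(a) & set(b))
    return {
        "common": (common[0], common[-1]) if common else None,
        "max_abs_diff": max(abs(a[y] - b[y]) for y in common) if common else None,
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }


# Synthetic demo series: an "older" version covering 1891-2002 and a
# "newer" one covering 1901-1989 with identical values where both exist.
older = {y: 0.01 * (y - 1950) for y in range(1891, 2003)}
newer = {y: older[y] for y in range(1901, 1990)}
report = compare_versions(older, newer)
print(report["common"], report["max_abs_diff"], len(report["only_in_a"]))
```

A zero (or rounding-level) maximum difference over the overlap supports reading the two versions as the same series with different start and end dates.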
This is particularly curious because the HadCRU2 extension appears to have been derived from the extended daily information from Barabinsk. Why wouldn’t this be used in HadCRU3? The figure below shows the HadCRU versions against Barabinsk GSN, with the segments before and after the 1960 hiatus colored separately. The third tranche from 1901-1925 is from the early GHCN tranche not present in the other archives, colored a different blue shade.
Figure 7. CRU versions overlaid against selected Barabinsk station data versions.
Finally, here is a comparison of the GISS gridded version for the 2×2 cell containing Barabinsk against the same station versions. There are puzzling differences from the corresponding CRU gridcell and from the underlying station data. Although the GISS gridded data matches Barabinsk closely through the 1961-1990 reference period, the two series have recently been “diverging”, with the “divergence factor” averaging over 1 deg C in the 2000s and the Hansen gridded version being the warmer. The Hansen gridded version is also warmer than the GSN version. I don’t know why at present; perhaps it is related to the GHCN adjustment, which I’ll look at on another occasion.
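One simple way to put a number on a divergence of this kind is to average the gridded-minus-station difference by decade over the years both series cover. Here is a minimal Python sketch of that idea, assuming annual anomalies in `{year: value}` dictionaries. The function name is hypothetical and the demo values are invented, not the GISS or Barabinsk numbers.

```python
from collections import defaultdict
from statistics import mean


def decadal_mean_difference(a, b):
    """Mean of (a - b) per decade, using only years present in both series."""
    by_decade = defaultdict(list)
    for year in sorted(set(a) & set(b)):
        by_decade[year // 10 * 10].append(a[year] - b[year])
    return {decade: mean(diffs) for decade, diffs in sorted(by_decade.items())}


# Synthetic demo series: the two agree through 1999, after which the
# "gridded" series runs 1.2 deg C warmer than the "station" series.
station = {y: 0.0 for y in range(1980, 2006)}
gridded = {y: (1.2 if y >= 2000 else 0.0) for y in range(1980, 2006)}
diffs = decadal_mean_difference(gridded, station)
print(diffs)
```

Applied to real series, a table like this shows at a glance whether the difference is confined to recent decades or is present throughout.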