Gavin Schmidt recently told Anthony Watts that worrying about station data quality was soooo last year. His position was a bit hard to follow but it seemed to be more or less as follows: that GISS didn’t use station data, but in the alternative, as defence lawyers like to say, if GISS did use station data (which they deny), de-contamination of station data would improve the fit of the GISS model. It reminds me of the textbook case where an alternative defence is not recommended: where the defendant argues that he did not kill the victim, but, if he did, it was self-defence. In such cases, picking one of the alternatives and sticking with it is considered the more prudent strategy.
In this particular case, I thought it would be interesting to plot up the relevant gridcell series from CRU and GISS and, needless to say, surprises were abundant.
First, here is a plot of gridded data – HadCRU3 (black), HadCRU2 (grey) and GISS (red) – for the 5×5 gridcell centered at 37.5N 122.5W, which is the CRU gridcell containing Marysville. GISS gridcells are 2×2 gridcells centered at odd longitudes and latitudes – so this will pick up GISS 37N, 123W (and similarly below). One of the most obvious puzzles in this plot is that the two gridded series are in fairly close agreement throughout the remote early parts of the 20th century and then diverge greatly in the last few years – with about a 0.5 deg C discrepancy between GISS and CRU developing after 2000 in what is presumably the best measured few years in world history. (I’ve adjusted GISS from a 1951-1980 base to a 1961-1990 base to match CRU.) Elsewhere, I’ve observed that GISS unaccountably (and without reporting) appears to have switched their input data from USHCN adjusted to USHCN raw in 2000 in a number of stations – their own Y2K problem, as it were. Perhaps this contributes to the discrepancy. However, the discrepancy appears to be real.
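For anyone wanting to replicate the two bookkeeping steps mentioned above (picking the GISS cell that contains a CRU cell center, and moving a series from a 1951-1980 base to a 1961-1990 base), here is a minimal Python sketch. It is not the script used for the plots. The function names and the dict-of-annual-values layout are my own assumptions, and with monthly data the re-basing would presumably be done month by month rather than on annual values.

```python
import math


def giss_cell_center(lat, lon):
    """Center of the 2x2 cell (centered on odd degrees) containing a point.

    Assumes cells span [center - 1, center + 1); a point exactly on an
    even-degree boundary is assigned to the cell above/east of it.
    """
    return (2 * math.floor(lat / 2) + 1, 2 * math.floor(lon / 2) + 1)


def rebaseline(anomalies, new_base=(1961, 1990)):
    """Shift anomalies so that their mean over the new base period is zero.

    anomalies: dict mapping year -> annual anomaly in deg C
               (years with missing data are simply absent).
    """
    start, end = new_base
    base_values = [v for y, v in anomalies.items() if start <= y <= end]
    if not base_values:
        raise ValueError("no data in the new base period")
    offset = sum(base_values) / len(base_values)
    return {y: v - offset for y, v in anomalies.items()}


# CRU cell center 37.5N, 122.5W falls in the GISS cell centered at 37N, 123W.
cell = giss_cell_center(37.5, -122.5)

# Synthetic straight-line series (NOT real GISS data), zero-mean on 1951-1980.
toy_giss = {year: 0.01 * (year - 1965.5) for year in range(1900, 2006)}
toy_giss_6190 = rebaseline(toy_giss)
```

Re-basing only subtracts a constant, so it changes the level of the series and leaves year-to-year differences and trends untouched.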
In this gridcell, the CRU series shows no trend whatsoever – even with the contaminated Marysville data. So whatever is done between the station data and the gridded data has in this case not resulted in any trend (though not in other gridcells). However, Schmidt’s claim that any de-contamination would improve fit seems implausible. If contamination is removed from the Marysville station, this will lower the CRU gridded data in recent years – perhaps not a lot, but it won’t increase the values. I presume that any further lowering of recent values will increase residuals and worsen fit.
I also plotted the same figure for the adjacent gridcell to the east. Once again the GISS and CRU versions match fairly closely through remote regions of the early 20th century and then diverge in the 21st century – this time the CRU series jumps ahead of GISS, opposite to before. Once again, even though the two data sets match almost precisely in the 1950s, differences of over 0.5 deg C have developed since Y2K.
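The comparison being made here (close agreement in the 1950s, a gap of several tenths after Y2K) amounts to averaging the GISS-minus-CRU difference over two windows. A minimal sketch of that calculation follows, assuming both series are already on the same base period and stored as year-to-anomaly dicts. The helper name and the toy numbers are mine, not actual gridcell values.

```python
def mean_difference(a, b, start, end):
    """Mean of (a - b) over the years in [start, end] present in both series."""
    diffs = [a[y] - b[y] for y in range(start, end + 1) if y in a and y in b]
    if not diffs:
        raise ValueError("no overlapping years in the requested window")
    return sum(diffs) / len(diffs)


# Synthetic illustration only: identical series until a 0.5 deg C step in 2000.
toy_cru = {year: 0.0 for year in range(1900, 2006)}
toy_giss = {year: (0.5 if year >= 2000 else 0.0) for year in range(1900, 2006)}

gap_1950s = mean_difference(toy_giss, toy_cru, 1950, 1959)
gap_post_y2k = mean_difference(toy_giss, toy_cru, 2000, 2005)
```

Running the same two calls on the real gridcell series would put a number on "match almost precisely" and on "over 0.5 deg C".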
Third, here is a similar plot for a western U.S. gridcell with a very pronounced 20th century trend in CRU. Once again, we see the same discrepancy developing in Y2K, with the discrepancy reaching over 1.5 deg C in 2005. Additionally, the GISS gridcell is almost 1 deg C warmer around 1900 than the CRU version. Curiously the HadCRU3 gridded version does not include some 19th century data used in HadCRU2 (grey). I don’t recall any discussion of such deletions in Brohan et al.
The second problematic box, centered near southern California (32.5N, 117.5W), has a GHCN-CRU trend difference of 0.796°C dec⁻¹. Although both analyses contain more than 20 stations in that box, the CRU network abruptly falls to 7 stations starting in 1997, a decline that corresponds to a sudden cooling (and negative CRU trend). In short, these grid-box examples indicate that many large discrepancies likely result from differences in the number of stations as well as data completeness. Consequently, it is recommended that caution be exercised when using only one analysis to assess trends at the gridbox.
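A trend difference of the kind quoted above can be reproduced in a few lines. This is a sketch under my own assumptions: that the trends are ordinary least-squares slopes fitted to annual values over the years the two series have in common, expressed in deg C per decade. I have not checked the exact fitting procedure or period used in the paper, and the function names are hypothetical.

```python
def trend_per_decade(series):
    """Ordinary least-squares slope of a year -> value dict, in units per decade."""
    years = sorted(series)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to fit a trend")
    year_mean = sum(years) / n
    value_mean = sum(series[y] for y in years) / n
    sxx = sum((y - year_mean) ** 2 for y in years)
    sxy = sum((y - year_mean) * (series[y] - value_mean) for y in years)
    return 10.0 * sxy / sxx  # slope per year times 10


def trend_difference(a, b):
    """Trend of a minus trend of b, fitted only over years present in both."""
    common = set(a) & set(b)
    return (trend_per_decade({y: a[y] for y in common})
            - trend_per_decade({y: b[y] for y in common}))


# Synthetic illustration only: one series warming at 0.02 deg C per year, one flat.
toy_a = {year: 0.02 * (year - 1950) for year in range(1950, 2006)}
toy_b = {year: 0.0 for year in range(1950, 2006)}
difference = trend_difference(toy_a, toy_b)  # about 0.2 deg C per decade
```

Restricting the fit to common years matters here, since a network that drops stations (or whole years) late in the record can move the slope on its own.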
Here’s a plot of CRU and GISS versions – in my opinion, the most curious feature is not the discrepancy of trends (although that is interesting), but the large discrepancy in post-Y2K results, which wasn’t pointed out in Vose et al. Vose et al recommend caution in the use of gridcell data. However, if they don’t understand the discrepancies on a gridcell scale, then how can they estimate errors?
Something interesting from Vose et al – Phil Jones provided them with the information that he’s refused to provide to Warwick Hughes and Willis Eschenbach, as Vose commented on differences in station data availability in the gridcells examined. Vose works for NOAA. So there’s Jones data floating around NOAA somewhere.
At this point, I haven’t figured out the adjustments made to go from raw station data to adjusted station data, much less to go from adjusted station data to gridded data. I haven’t worked with these data sets at length and maybe I’m missing something – but the match of the versions over so much of their history suggests that I’ve collated everything correctly. The gridcell definitions are different but the results track closely up to recent years. It seems odd that they can claim to know global temperature in (say) 1040 to within a couple of tenths of a deg C, when GISS and CRU gridded data in these gridcells disagree by over 0.5 deg in 2005.
I don’t recall Hansen reporting that the divergence between GISS and CRU gridded data in California is unprecedented in a hunnnnnn-dred years.