At Steve Mosher’s request, a thread for the revived effort to compile Gistemp.
GISTEMP on OS X Intel.
Instructions and code (http://dev.edgcm.columbia.edu/browser/StationData/) and website (http://dev.edgcm.columbia.edu/wiki/StationData)
Folks, let’s keep in mind that GISTEMP is doing something that is really very simple – taking station data and making area and zonal averages. It’s the sort of thing that’s very difficult to do “wrong” and the “Y2K” error was rather a surprise. There are some peculiar biases in the present program, but I don’t expect them to have too big an effect.
In my opinion, the main purpose of getting it to work is to see what it really does. Once that is done, it should be possible to make very simple emulations of GISTEMP in R or Matlab, as has proved the case with MBH once one waded through the nonsense. I’ve already worked through a couple of stages of GISTEMP in R and it would be nice to finish this.
After we’ve got complete control of what they actually do, then we can assess the impact of varying methodologies and versions.
My objective in this procedure is to have some benchmark data sets representing the interim results at each step so that emulations can be tested against this. Because GISS re-writes history, I think that we should establish a benchmark date and then freeze one version of the GISS data in a location under our control for benchmarking. I can re-do the scrapes as part of this process.
Another thing I would like to do with a working version of GISTEMP is to see what happens when stations that went “missing” after the change to MCDW are added back into the analysis. Take for example the following two images which were generated from the GISS website. Both represent the GHCN anomalies using a smoothing radius of 250 km. The first image was generated using data from April 2008, and the second using data from April 1978.
I would like to understand what really happened to April 2008 temperatures in the “holes” that did not exist in the April 1978 data. Specifically, Russia, China, Australia, and Canada. I would then like to compare that with the result GISS presents for April 2008 when smoothing over 1200 km:
In other words, with so much data missing in April 2008 relative to April 1978, is the 1200km smoothing a fair representation of global anomalies? I think this is a particularly important question given we calculate the anomaly by subtracting an average calculated using many annual data points (1951 – 1980) from an average calculated using significantly fewer annual data points (2008).
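As a toy illustration of the arithmetic in question (all numbers invented): the anomaly is the observed value minus the mean of the baseline values, so a single 2008 April ends up compared against a 30-point 1951–1980 average:

```python
def anomaly(observed, baseline):
    """Anomaly = observed value minus the mean of the baseline-period values.

    baseline: values for (say) April over 1951-1980; observed is a single
    April value. All numbers here are invented for illustration.
    """
    base_mean = sum(baseline) / len(baseline)
    return observed - base_mean

# 30 invented baseline Aprils vs. a single invented 2008 April:
baseline_aprils = [10.0 + 0.1 * i for i in range(30)]
print(anomaly(12.5, baseline_aprils))  # -> about 1.05
```

The asymmetry the comment points at is visible here: the baseline side averages away year-to-year noise over 30 points, while the recent side is a single noisy value.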
The 1200 km smoothing is not good. This is easily visible in the March 2008 data. March was slightly warmer than normal in southern Scandinavia and slightly colder than normal in the northern part. This is visible in the 250 km-smoothed version of GISS. Now for some extraordinary reason GISS does not have a single station in central Scandinavia (there are plenty, both in Sweden and Norway), so interpolation is used. The weird thing is that in the 1200 km-smoothed version all of Scandinavia is warmer than normal. In my opinion an interpolation routine should conform to observations where there are actual data, but apparently the GISS smoothing doesn't.
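As I understand it, GISS weights each station linearly with distance, w = 1 − d/R, dropping to zero at the smoothing radius R. A minimal sketch under that assumption, with made-up station positions and anomalies, shows why a point with no station within 250 km is a "hole" at one radius but picks up distant stations at 1200 km:

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def smoothed_anomaly(grid_lat, grid_lon, stations, radius_km):
    """Weighted mean of station anomalies within radius_km of a grid point,
    using the assumed linear weight w = 1 - d/radius."""
    wsum, wtot = 0.0, 0.0
    for lat, lon, anom in stations:
        d = distance_km(grid_lat, grid_lon, lat, lon)
        if d < radius_km:
            w = 1.0 - d / radius_km
            wsum += w * anom
            wtot += w
    return wsum / wtot if wtot > 0 else None  # None = a "hole" on the map

# Made-up stations: (lat, lon, anomaly in deg C), loosely Scandinavian
stations = [(55.0, 12.0, 0.8), (60.0, 10.0, 0.5), (68.0, 20.0, -0.6)]
# A central-Scandinavia grid point with no station within 250 km:
print(smoothed_anomaly(64.0, 15.0, stations, 250))   # -> None (a hole)
print(smoothed_anomaly(64.0, 15.0, stations, 1200))  # distant stations fill it in
```

Note that at the grid point itself the weighted value need not match any actual nearby observation, which is the behaviour complained about above.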
Sounds like a plan. A high level pseudo code would also be helpful.
What I would like to see someday is a recreation of the basic analysis, but eliminating the class 4 and 5 stations, whose bias makes them totally unusable, at least in my mind. I would like to eliminate the class 3s too, but then we have almost no temperature data. We've been dealing with GIGO (garbage in, garbage out) for too long, and it is time to start working only with the good data.
I think GISS should be forgotten, thrown away, whatever; just use satellite data and maybe the ocean buoys. HadCRUT land seems to be more reliable as well.
GISTEMP has value: it contains historical data collected before any satellite measurements were made. It's also valuable ground-truth data. The question is how to calibrate and weight it.
John @ #3
I can’t see your graphics.
#9: I guess the links on Picasa are not static. Here are my images once again:
April 1978 anomaly with 250km smoothing:
April 2008 anomaly with 250km smoothing:
April 2008 anomaly with 1200km smoothing:
Wow, look at that coverage in Antarctica.
Who’s volunteering for surfacestations.org down there?🙂
Can you show your GISTEMP results and how they square with the real GISTEMP?
@11 I just spent 3 months there. I’d go back in a second given the chance.
@12 Working on it… When visualization of the OS X port is ready it'll be announced on my page (http://dev.edgcm.columbia.edu/wiki/StationData). I'm starting at STEP0 now and putting everything (EVERY single data file) into KML. Then come maps. ETA for final product comparisons is on the order of months.
Asking for your forbearance since I am not a climate scientist and I am not familiar with the literature on this topic.
Is "global temperature" defined as something like

T_global = (1/Ae) × Σ(i=1..n) T_i × (Ae/n),

where Ae is the estimated surface area of the earth, T_i is the temperature associated with the i-th sub-area of size Ae/n, n is the number of sub-areas, and Ae = Σ(i=1..n) (Ae/n)? I've left out the fact that T is sampled across time for simplicity. If this is not a close approximation of the methodology, I apologize; don't read any more, because it will waste your time. However, if it's close, then read on for a few more brash statements.

If this sampling procedure were applied to a non-linear surface temperature field, where the spatial correlations and anti-correlations in temperature at various spatial and temporal scales are not expected to be contiguous (it's cold in the Pacific and warm a few thousand miles away, etc.), the resulting estimate of T_global would have an error greater than the sum of its parts if the variance in T is an inverse power function of the size of the measurement area Ae/n.

With all due respect, I'd like to posit that expending resources on estimating T_global by any variation of this method will not be useful for across-time comparisons. I'm not even sure that, even with more perfect measurement (Ae/n tending to zero), the results obtained at various time intervals could be directly compared if the power function in the variance varies with time, i.e. if the data exhibit multifractal properties at both time and spatial scales. Question: do we know whether variance increases as Ae/n decreases?
On the other hand, looking at individual station temperature data and modeling “locally” might result in at least comparable data for that particular station over a certain time, so that something useful results – a possibly predictive model for a local area. Comparing and contrasting parameters from such local models across the globe might give a better estimate of the direction and magnitude of trends at particular time scales. It might also be useful in spotting areas where temperature is more or less correlated across time.
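The equal-sub-area averaging written out above can be sketched numerically. One real-world wrinkle worth noting: on a regular lat/lon grid the cells are not equal in area, so gridded products typically weight each cell by cos(latitude); the values below are invented:

```python
import math

def global_mean(cells):
    """Area-weighted global mean temperature.

    cells: list of (lat_deg, temp) pairs for cells on a regular lat/lon grid.
    Cell area on a sphere is proportional to cos(latitude), so the equal-
    sub-area formula T_global = (1/Ae) * sum(T_i * Ae/n) generalizes to a
    cos(lat)-weighted mean. With truly equal sub-areas it reduces to a
    simple arithmetic mean of the T_i.
    """
    wsum = sum(math.cos(math.radians(lat)) * t for lat, t in cells)
    wtot = sum(math.cos(math.radians(lat)) for lat, _ in cells)
    return wsum / wtot

# Invented example: a warm equatorial cell outweighs a cold polar cell
cells = [(0.0, 2.0), (80.0, -2.0)]
print(global_mean(cells))  # -> about 1.41, dominated by the equatorial cell
```

The weighting choice is exactly where the "sum of its parts" question above bites: the error in the estimate depends on how variance behaves as the cell size shrinks.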
After running the GISTEMP package, I turned off the periurban adjustments and ran it a second time. It is interesting to see the effect on annual global averages. Here are the global mean temperature anomalies with and without adjustments, and the difference.
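A sketch of the comparison step (file names and layout are hypothetical; GISTEMP's actual output format differs, so the parser is an assumption):

```python
def read_series(path):
    """Parse 'year value' lines into {year: anomaly}. The file layout here
    (one year and one anomaly per line) is assumed, not GISTEMP's own."""
    series = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and parts[0].isdigit():
                series[int(parts[0])] = float(parts[1])
    return series

def difference(a, b):
    """Adjusted-minus-unadjusted anomaly for years both series cover."""
    return {yr: a[yr] - b[yr] for yr in sorted(a.keys() & b.keys())}

# Hypothetical output files from the two runs:
# adjusted = read_series("anomalies_with_periurban.txt")
# unadjusted = read_series("anomalies_no_periurban.txt")
# for yr, d in difference(adjusted, unadjusted).items():
#     print(yr, round(d, 3))
```

If the adjustments really are near-random in sign, this difference series should hover around zero in the global mean even when individual stations move a lot.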
Nice, Arthur. Now remove the class 5 sites.
#15. In a post last year, I observed that the GISS “urban” adjustments were as likely to be negative as positive – they were essentially random. To that extent it’s hardly surprising that the GISS urban adjustment procedure has a negligible impact on the final total.
The performance of the adjustment needs to be disaggregated though.
In the US, the GISS adjustment actually has some impact – thus 1934 and 1998 come out about the same in the US. But it looks to me like the adjustment in the ROW – where UHI impacts may well be more serious – has no impact and looks like little more than random adjustments. The fact that the GISS adjustment does something in the US has been used as a bit of a bait-and-switch in debate last fall to imply that it does something in the ROW, when there’s little evidence that the GISS urban adjustment in the ROW does anything much at all other than blend stations together.
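One simple way to check the "essentially random" claim is to tally the sign of the per-station urban adjustments; a sketch with invented adjustment slopes:

```python
def sign_tally(adjustments):
    """Count positive, negative, and zero adjustments (e.g. deg C/century).
    Near-equal positive and negative counts would suggest the adjustments
    largely cancel in the aggregate."""
    pos = sum(1 for a in adjustments if a > 0)
    neg = sum(1 for a in adjustments if a < 0)
    zero = len(adjustments) - pos - neg
    return pos, neg, zero

# Invented per-station adjustment slopes:
adjs = [0.12, -0.08, 0.0, -0.15, 0.3, -0.02]
print(sign_tally(adjs))  # -> (2, 3, 1)
```

Run separately on US and ROW stations, this kind of tally would show whether the ROW adjustments do anything systematic or merely shuffle stations about.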
#14. Yes, there are local issues. However, I don’t see “fractality” as being a material obstacle to the attempt to estimate averages. But take this issue to the Bulletin Board or Unthreaded.
Are there other things that would be useful to look at in the gistemp code? Or has it already been sufficiently investigated?
Arthur, tons of stuff:
The effect of using "nightlights" as opposed to population to determine rural status.
The effect of removing urban, periurban, and class 5 sites.
More when I get a chance.
Re Mosher (#19):
Is the GISS raw already subject to FILNET/SHAP corrections?
Steve: For the most part, though, these are only for the US; the ROW is a free-for-all. Also, USHCN has rational infilling of missing days. GISS strips out the rational infilling and replaces it with its own, less appropriate, infilling.