Detroit Lakes in BEST

In the 2007 analysis of the GISS dataset, Detroit Lakes was used as a test case. (See prior posts on this station here). I’ve revisited it in the BEST data set, comparing it to the older USHCN data that I have on hand from a few years ago.

First, here is a simple plot of USHCN raw and BEST versions. The BEST version is neither an anomaly series (like CRU) nor a temperature series (like USHCN). It is described as “seasonally adjusted”. The mechanism for seasonal adjustment is not described in the covering article. I presume that it’s somewhere in the archived code. The overall mean temperatures for USHCN raw and Berkeley are very close. The data availability matches in this case: same starting point and same gaps (at a quick look). So no infilling thus far.

Figure 1. Simple plot of USHCN Raw and BEST versions of Detroit Lakes

The Berkeley series is not, however, the overall average plus an anomaly as one might have guessed. Here is a barplot comparing monthly means of the two versions. While the Berkeley version obviously has much less variation than the observations, it isn’t constant either (as it would be if it were overall average plus monthly anomaly). I can’t figure out so far where the Berkeley monthly normals come from.

Figure 2. Monthly Averages of two versions.
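
To make the "overall average plus monthly anomaly" benchmark concrete, here is a minimal sketch on synthetic data (not the Detroit Lakes series; the seasonal cycle, noise level and variable names are all assumptions for illustration). If a series were built that way, its twelve monthly means would be identical, which is the property the Berkeley version fails to show.

```r
# Synthetic stand-in for a monthly station series (NOT the Detroit Lakes data)
set.seed(1)
n_years <- 30
month <- factor(rep(1:12, n_years))
cycle <- -15 * cos(2 * pi * (1:12 - 1) / 12) + 4   # hypothetical seasonal cycle, deg C
raw <- cycle[as.integer(month)] + rnorm(12 * n_years, sd = 2)

# "Overall average plus monthly anomaly":
# subtract each month's own mean, then add back the grand mean
anom <- raw - ave(raw, month)     # ave() returns the group mean at every observation
recon <- mean(raw) + anom

monthly_raw <- tapply(raw, month, mean)       # varies strongly through the year
monthly_recon <- tapply(recon, month, mean)   # constant across months

# Spread of the monthly means in each version
spread <- c(raw = diff(range(monthly_raw)), recon = diff(range(monthly_recon)))
print(spread)
```

The spread for `recon` is zero up to rounding, so any variation in the monthly means of a "seasonally adjusted" series means that something other than the station's own monthly means was removed.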

If one does a simple scatter plot of USHCN raw vs Berkeley, one gets a set of 12 straight lines with near identical slope, one line for each month:

Figure 3. Scatter plot of USHCN raw vs Berkeley
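
A pattern of twelve parallel lines is what one would expect if each month's values were shifted by its own constant. The sketch below illustrates this on synthetic data; the `offset` vector is a hypothetical stand-in for the unknown BEST normals, not a value taken from their files.

```r
# Synthetic illustration only: month-specific constant shifts give 12 parallel lines
set.seed(2)
n_years <- 30
month <- factor(rep(1:12, n_years))
cycle <- -15 * cos(2 * pi * (1:12 - 1) / 12) + 4   # hypothetical seasonal cycle, deg C
us <- cycle[as.integer(month)] + rnorm(12 * n_years, sd = 2)

# Hypothetical per-month offsets standing in for the unknown BEST normals
offset <- cycle - 4 + seq(-1, 1, length.out = 12)
best <- us - offset[as.integer(month)]

# Fit best ~ us separately for each calendar month
fits <- t(sapply(split(data.frame(us, best), month),
                 function(d) coef(lm(best ~ us, data = d))))
colnames(fits) <- c("intercept", "slope")
print(round(fits, 3))
```

Each month has slope 1 and an intercept equal to minus that month's offset, so with real data the twelve fitted intercepts would be one way to read off the implied normals.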

I then tried the following. I subtracted the Berkeley monthly average from each Berkeley data point and added back the USHCN monthly average. This yielded the following:

Figure 4. USHCN raw versus Berkeley (renormalized for each month)
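
The renormalization step can be sketched as follows. This is a self-contained illustration on synthetic data, assuming a data frame `X` with columns `us` and `best` and a shared pattern of gaps; the helper `mean_by_month` and the `offset` vector are hypothetical names introduced here.

```r
# Synthetic illustration of the month-by-month renormalization
set.seed(3)
n_years <- 30
month <- factor(rep(1:12, n_years))
cycle <- -15 * cos(2 * pi * (1:12 - 1) / 12) + 4   # hypothetical seasonal cycle, deg C
X <- data.frame(us = cycle[as.integer(month)] + rnorm(12 * n_years, sd = 2))

# Hypothetical per-month offsets standing in for the unknown BEST normals
offset <- cycle - 4 + seq(-1, 1, length.out = 12)
X$best <- X$us - offset[as.integer(month)]

# Same gaps in both versions, as observed for Detroit Lakes
X$us[c(5, 40)] <- NA
X$best[c(5, 40)] <- NA

# Monthly mean repeated at every observation, ignoring missing values
mean_by_month <- function(x) ave(x, month, FUN = function(v) mean(v, na.rm = TRUE))

# Subtract the BEST monthly average, add back the USHCN monthly average
X$fix <- X$best - mean_by_month(X$best) + mean_by_month(X$us)

max_diff <- max(abs(X$fix - X$us), na.rm = TRUE)
print(max_diff)
```

In this synthetic case the renormalized series reproduces the raw series up to rounding because the two differ only by month-specific constants. With real data, any residual scatter in the plot of `fix` against `us` measures what else was changed.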

The Berkeley data appear to be virtually identical to the USHCN raw data less a set of monthly normals plus the annual average, where those monthly normals differ from the normals of the USHCN raw data. The implied monthly averages in the BEST normalized data are shown below. The range of difference is from -2.27 to 1.41 deg C.

My original examination of Detroit Lakes and other stations was directed at whether NASA GISS had software to detect changes, a point that had then been raised in internet debates by Josh Halpern as a rebuttal to the nascent surface stations project. I used Detroit Lakes as one of a number of type cases to examine this, accidentally observing the Y2K discontinuity. One corollary was that GISS software did not, after all, have the capability of detecting the injected Y2K discontinuity.

It would be interesting to test the BEST algorithm against the dataset with the Y2K discontinuity to see if they can pick it up with their present methodology. At first blush, it looks as though USHCN data is used pretty much as is, other than the curious monthly normals.

[Update: it looks like this data is prior to homogenization.]


  1. Steve McIntyre
    Posted Oct 29, 2011 at 3:10 PM | Permalink
    	detailsb[grep("DETROIT L",detailsb$name),1:5]
    # 144289 144289 DETROIT LAKES(AWOS) 46.8290 -95.8830 425.500
    #144298 144298 DETROIT LAKES 1 NNE 46.8335 -95.8535 417.315
    	detailsu[grep("DETROIT L",detailsu$name),1:5]
    #       id   ghcnid               name state division
    #444 212142 72753004 DETROIT LAKES 1NNE    MN        1
    	trim=function(x) window(x, start=min(time(x)[!is.na(x)]), end=max(time(x)[!is.na(x)]) )
    	tsp(X) # 1895.000 2009.917 
    	month= factor(rep(1:12,nrow(X)/12) )
    	Avg=cbind( us=unlist(tapply(X[,1],month,mean,na.rm=T)),
    	  best=unlist(tapply(X[,2],month,mean,na.rm=T)) )
     	#          us     best
    	#1  -14.888840 4.369898
    	#2  -12.400299 4.459945
    	#3   -4.509250 3.857273
    	plot.ts(X[,1:2],main="Detroit Lakes")
    	barplot (t(Avg),beside=TRUE,col=c("mistyrose","violet") )
    	title("Monthly Avg: Detroit Lakes")
    	legend("topleft",fill= c("mistyrose","violet") ,legend=c("USHCN","BEST") )
    #Show comparison
    	levels(X$avgb)= Avg[,"best"]
    	for(i in 3:4) X[,i]=as.numeric(as.character(X[,i]))
    	plot(X$us,X$best,xlab="USHCN raw(deg C)", ylab="BEST (deg C)")
    	title("USHCN Raw v Best: Detroit Lakes")
    	plot(X$us,X$fix,xlab="USHCN raw (deg C)", ylab="BEST: Renormalized",col="grey80")
    	title("USHCN Raw v Best: Detroit Lakes")
    Avg=cbind(Avg, Avg[,1]- (Avg[,2]-4.13))
    	barplot (t(Avg[,c(1,3)]),beside=TRUE,col=c("mistyrose","violet") )
    	title("Monthly Avg: Detroit Lakes")
    	legend("topleft",fill= c("mistyrose","violet") ,legend=c("USHCN","BEST Implied") )
    • GaryW
      Posted Oct 29, 2011 at 3:32 PM | Permalink

      As near as I can tell, the BEST seasonal normalization contains a 4 month period oscillation. Third harmonic ringing in their low pass filter?

      Steve: no idea. I’m starting with their results and observing properties.

    • Posted Oct 29, 2011 at 4:59 PM | Permalink

      For anyone who has trouble on Windows…

      Lines that look like this — change the quotes…

      To this:
      Straight Upsy-downsy Quotes…

      There are quite a few to change — in the plot lines (which are thickening — sorry couldn’t resist…)

      I also moved the “details,tab” to the directory you referenced

      Presumably it was the same “” we downloaded previously…

      At least I duplicated your graphs… fwiw So I am guessing I was correct…


      • Posted Oct 29, 2011 at 5:00 PM | Permalink

        And now I see it is WordPress –FIXING the quotes… argghhh!

        Steve: There’s a command in WordPress to block off the text. Pete Holzman knows the command. I’ll try to locate it.

        • Posted Oct 29, 2011 at 6:03 PM | Permalink

          You could try <pre> </pre> … not sure how that will come through … but here’s a test (works in a post, so, at least in theory, it should work in a comment):

          title("USHCN Raw v Best: Detroit Lakes")

          Steve: Thanks, Hilary. That’s what I wanted.

        • Posted Oct 30, 2011 at 12:37 AM | Permalink

          Steve: Thanks, Hilary. That’s what I wanted

          You’re most welcome, Steve. Your usage of this Helpful Hint from Hilary™ gives me confidence to grant myself a brownie point for my (very minor) contribution to the advancement of understanding BEST 😉

  2. BillC
    Posted Oct 29, 2011 at 3:38 PM | Permalink

    I don’t have the USHCN data (yet, here) as I just started with BEST. The BEST data contains a record for #samples in each monthly record at each station. Generally these seem to be daily – e.g. the one for the Detroit Lakes station above averages just over 29 samples per record when the field isn’t null (-99). Is this typical of the underlying dataset (daily records)?

    Steve – I have an old but relevant USHCN collation online at

    • BillC
      Posted Oct 29, 2011 at 5:16 PM | Permalink

      thx got it

  3. Willis Eschenbach
    Posted Oct 29, 2011 at 7:18 PM | Permalink

    Steve, have they released the raw data (as opposed to the “seasonally adjusted” data) yet? I find it frustrating to use data that’s been pre-munged …


    Steve: there’s a very large file of original data. It was too large for my computer to read. Much of it will probably be the same as GHCN.

  4. Bruce
    Posted Oct 30, 2011 at 11:57 AM | Permalink

    What is the data (Steve supplied) from BEST actually describing? I looked at my home town (the years were right).

    The first 4 months:

    BEST: 8.892 8.109 8.241 7.421

    Environment Canada Mean: 2.6 2.3 4.5 6.4

    Environment Canada Max: 6.5 6.8 10.1 11.6

  5. GaryW
    Posted Oct 30, 2011 at 10:10 PM | Permalink

    If you don’t mind a little amateurish work, I have done a quick writeup of my examination of BEST data for my town. I was careful to use only the simplest manipulation of data so as to preserve as much of its original content as practical. I am not impressed with the quality of the work from BEST.

    Click to access BEST_data_for_Lebanon_Missouri.pdf

    Gary Wescom

  6. Posted Oct 31, 2011 at 10:31 AM | Permalink

    For those who want to follow the R Discussions but need a quick course or refresher… I am going to suggest the Andrew Robinson documents as they seem to bring you along fairly rapidly…

    Just click on this link:

    Then go to the R-Users group directory to download his notes and data. There are a few quirks to entering the listings, but that’s all.

    Start a new edit window, then just cut and paste from the book and remove the chevrons “>” and the pluses “+” from the beginning of each line when you paste. That should do it. Then “Select and Execute”…

    You can continue to add to a script and just execute the new part using the “Edit” menu — “Run Line or Selection”..

    Then save the script at appropriate points and carry on with the lessons.
