…-omatic Correlations

Update Mar 28: Here is Luboš's version, replacing my much less pretty monochrome version, showing the spatial decorrelation of the “Comiso” version of the data archived a couple of days ago by Steig.

Figure 1. Spatial Correlation for Sample of “Comiso 2009” Antarctic Gridcells

(Jeff Id has compared this to corresponding surface stations at his blog and is complimentary to the Comiso versions; I haven’t had an opportunity to check this.) Luboš also shows the following version of my monochromatic figure, showing the spatial decorrelation of the RegEM’ed version of the Comiso data (as reduced to 3 degrees of freedom).

Figure 2. Spatial Correlation for Sample of RegEM’ed Antarctic Gridcells

This shows the effect of the RegEM on spatial decorrelation rather nicely. See Luboš here.

Having said all that, the quantity of ultimate interest is an Antarctic average, which has only 1 degree of freedom.

Following is a post by Jeff C at Jeff Id’s blog that I’ve reproduced here. The first two graphics (as they observe) are from CA: scatter plots comparing station correlation to distance, showing the remarkable difference between the decay rate of correlation for actual surface stations and the Steig AVHRR recon. Jeff C has extended this to the UWisconsin AVHRR versions and to the just-released Steig “raw” data. Jeff C’s graphic shows negligible spatial decorrelation in the Steig “raw” data. Update (12.00 pm Eastern, 11.00 blog, Mar 27): I’ve re-done this with my own script and wasn’t able to replicate Jeff’s results. Here is my version and script.
[monochrome version replaced by Luboš version shown above]
Update: 1.15 pm Eastern 12.15 blog Mar 27. The two Jeffs have withdrawn their calculations – Jeff C failed to calculate anomalies prior to correlations and Jeff Id made a variable error. Errors can be made; that’s the purpose of due diligence.

Here is the script:

#STEIG AVHRR DATA #300 rows monthly from Jan 1982 to end 2006; 5509 columns
download.file("http://faculty.washington.edu/steig/nature09data/data/cloudmaskedAVHRR.txt","temp.dat",mode="wb")
grid=scan("temp.dat",n= -1) # 37800
length(grid)/5509 #300
avhrr=ts(t(array(grid,dim=c(5509,300))),start=c(1982,1),freq=12)
tsp(avhrr) #[1] 1982.000 2006.917 12.000
#save(avhrr,file="d:/climate/data/steig/avhrr.tab")
#load("d:/climate/data/steig/avhrr.tab")

calc.anom =function(tsdat) {
anom = tsdat
for (i in 1:12) { sequ = seq(i,nrow(tsdat),12)
anom[sequ,] = scale(tsdat[sequ,],scale=F) }
anom}

stav.anom = ts(calc.anom(avhrr),start=1982,freq=12)
#AVHRR network
#download AVHRR data

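### circledist
#calculates great circle distance with earth radius 6372.795 km (the default R below)
#from http://en.wikipedia.org/wiki/Great-circle_distance 2nd formula
#make data frame with from, to, correlation

circledist =function(x,R=6372.795) { #fromlat,fromlong, lat,long
pi180=pi/180;
y= abs(x[4] -x[2])
y[y>180]=360-y[y>180]
y[y<= -180]= 360+y[y<= -180]
delta= y *pi180
fromlat=x[1]*pi180; tolat=x[3]*pi180; tolong=x[4]*pi180
theta= 2* asin( sqrt( sin( (tolat- fromlat)/2 )^2 + cos(tolat)*cos(fromlat)* (sin(delta/2))^2 ))
circledist=R*theta
circledist
}

#[lines lost from the posted script here: the loading of grid.info and the definition of temp]
grid.info$long[temp]= -( 360- grid.info$long[temp])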

##TAKE SAMPLE
id=index=sort(sample(1:5509,300))
K=length(index);K #300
use0="pairwise.complete.obs"
info=grid.info[,c("id","lat","long")]

station= data.frame( from=gl(K,K,labels=id),to= rep(id,K))#,cor= c(corry) )
dim(station) #90000 2
station$from=as.numeric(as.character(station$from))
station$fromlat=info$lat[station$from]
station$fromlong=info$long[station$from]
station$tolat=info$lat[station$to]
station$tolong=info$long[station$to]
station$dist=apply(station[,3:6],1,circledist)
dim(station) #90000 8

corry=cor(stav.anom[,index]);dim(corry)
station$cor=c(corry) ;range(station$cor)# -0.2191475 1.0000000

#2. SCATTER PLOT
par(mar=c(4,4,2,1))
plot(station$dist,station$cor,xlab="Dist (km)",ylab="Correlation",col="grey70",ylim=c(-.5,1))
abline(h=0,lty=2)
title("Steig pre-RegEM AVHRR Scatterplot")
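
The reason the calc.anom step matters can be illustrated with synthetic data. The following is a minimal sketch, not part of the original script: the two "gridcells", the size of the annual cycle and the noise level are all made up, and monthly.anom is a hypothetical one-series stand-in for calc.anom. The two series share an annual cycle but have independent noise, so the raw correlation is dominated by the cycle while the anomaly correlation is not.

```r
# Synthetic illustration only: two hypothetical gridcells, 25 years of monthly data.
set.seed(1)
n.years <- 25
month <- rep(1:12, n.years)
seasonal <- -20 * cos(2 * pi * (month - 1) / 12)   # shared annual cycle (deg C), made up
cell.a <- seasonal + rnorm(12 * n.years, sd = 2)   # independent noise in each cell
cell.b <- seasonal + rnorm(12 * n.years, sd = 2)

# Subtract each calendar month's own mean (hypothetical helper; same idea as calc.anom)
monthly.anom <- function(x, month) x - ave(x, month)

cor.raw  <- cor(cell.a, cell.b)                    # dominated by the shared annual cycle
cor.anom <- cor(monthly.anom(cell.a, month), monthly.anom(cell.b, month))
print(c(raw = cor.raw, anomaly = cor.anom))
```

With these assumed numbers the raw correlation should come out close to 1 and the anomaly correlation close to 0, even though the two cells have nothing in common beyond the seasons.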

Jeff C writes at Jeff Id’s blog:

Previously we have looked at scatter plots to get an understanding of how well correlated the data appears over distance. Below is a scatter plot of the raw Antarctic surface data. This contains no infilling, just actual measured data from occupied surface stations.

This plot above was originally calculated by Steve McIntyre. Note how the correlation is virtually 1 at 0 km, with a gradual decay as distance increases. This is what we would expect to see as stations closer together should have better correlated climate than stations far apart.

This plot, also provided by Steve, is a distance correlation for the satellite era (1982-2006) of the Steig 3 PC reconstruction used in the Nature paper. Note how correlation remains at 1 for some cell pairs at distances out to 3000 km. This seemed suspicious and led many of us to believe that the reduction to 3 PCs had caused a spatial smearing of the data.

This plot above is of the NSIDC AVHRR data from the University of Wisconsin website that Jeff and I have been processing for the past few weeks. Note that the “cone” is quite a bit wider than the surface data, but the distance correlation looks reasonable. Some of the cell pairs are still rather well-correlated at long distances, but we don’t see the values of 1 we saw on the Steig reconstruction.

This is the shocker. Here is the distance correlation plot for the Steig cloud-masked data released today. This data set has been presented as the satellite data used as the input to the reconstruction. If it were truly “raw” (or minimally processed) satellite data, we would expect to see a plot similar to the NSIDC plot immediately above. Instead, we see that every single data pair has a correlation of greater than 0.5!! Data from the peninsula is highly correlated with data from the East Antarctica coast and the interior despite the surface data showing nothing of the sort.

Why would this data set have such a high cell to cell correlation? I’m speculating here, but Steig talks about “enhanced cloud masking” where daily data points that exceed the climatological mean by +/- 10 deg C. are considered cloud contaminated and discarded. From my experience with the NSIDC AVHRR data, a huge number of data points would be affected by this threshold, perhaps as much as 50% of all points. If a simplified infilling algorithm was used to replace those points, high correlation might result. Regardless, this plot appears to show that the cloud-masked data set is highly-processed and suspect.

When I first ran this plot I thought it must be in error. I checked my code line by line and have repeated the results multiple times. I still find it hard to believe.

——–

Jeff Id

I’ve spent several hours verifying this post and have independently verified the results using my own code. Jeff C’s code used a subset of every 5th value in the grid (due to matrix size R can’t handle the full matrix). My independently written version used a random subset method which was derived from SteveM’s original sat correlation.

What it means:

The concept of this paper was to use spatial information to ensure proper weighting and location of individual surface stations across the Antarctic. The surface stations are the lowest noise measurement of atmospheric temperature and show a particular correlation pattern which we can consider “natural” (the first graph). This is the pattern you would expect to see in any data representing Antarctic temperature. The 3rd graph is the NSIDC dataset and represents spatial correlation of the publicly available cloud masked data from the same instruments as processed by the NSIDC. There is a wider spread of the cone angle as compared to surface station data, which is expected due to the increased noise level in the dataset, but the key is that there still is spatial information available. The last graph, however, has correlations pegged at almost 1 for the full width of the dataset, independent of the distance, mountain ranges, peninsula, sea contaminated pixels and the rest.

From my other post, which derived the 3 PCs for the reconstruction dataset, this data doesn’t seem to be an exact copy of the original data but it is close. What’s more, we can now make sense of the second to last graph, which is derived from the full reconstruction using 3 PCs as presented by Steig. The data from graph 3 has almost a parallelogram shape because surface station data’s correlation vs distance is copied equally across the entire satellite dataset regardless of actual location.

If you take the surface station points (graph 1) and spread copies of the surface station data across the entire width of the Steig satellite data (graph 4), you get (graph 3).

I’m not in any way saying or in any way implying this was done intentionally but this is just about the perfect dataset to use if you want to weight every station equally and basically average the pre 1982 trends across the entire continent. I thought we were going to have to go through RegEM and do a lot of calculation to find if this was the case — not this time. This is the perfect scenario to blend the high concentration of known warming peninsula stations across an entire continent.

—–

A copy of Jeff C R code if you would like to verify the calculation:

#circledist function
#calculates great circle distance with earth radius 6372.795 km (the default R below)
circledist =function(x,R=6372.795) #fromlat,fromlong,lat,long
{
pi180=pi/180;
y=abs(x[4]-x[2])
y[y>180]=360-y[y>180]
y[y<= -180]= 360+y[y<= -180]
delta= y *pi180
fromlat=x[1]*pi180;tolat=x[3]*pi180;tolong=x[4]*pi180
theta=2*asin(sqrt(sin((tolat- fromlat)/2)^2+cos(tolat)*cos(fromlat)*(sin(delta/2))^2))
circledist=R*theta
circledist
}

#LOAD DATA
parse=5 #set every nth point to include, setting to 1 is very slow
#grid=scan("anom_5509.csv",n= -1,sep=",",skip=1) # use for UWisc AVHRR
grid=scan("cloudmaskedAVHRR.txt",n=-1) # Use for Steig recon or cloud masked
anom14_5509=t(array(grid,dim=c(5509,300)))
dimnames(anom14_5509)[[2]] <- 1:5509
anom14_5509=anom14_5509[,seq(1,5509,by=parse)] #parses to every nth column
rm(grid)
K=ncol(anom14_5509)

#Load Coordinates
#coord_5509=read.csv(file="coord_5509.csv",header=TRUE)
coord_5509=read.csv(file="sat_coord.csv",header=TRUE)
coord_5509=coord_5509[seq(1,5509,by=parse),] #parses to every nth row
long=coord_5509[,1]
lat=coord_5509[,2]

#make correlation matrix
use0=”pairwise.complete.obs”
corry=cor(anom14_5509,use=use0) #correlation coef calculation
dim(corry)<-c(K*K,1)
sum(!is.na(corry)) #1585
sum(!is.na(corry) &corry<0) # 658

#make lat-long matrices
tolat=array(lat,dim=c(K,K));fromlat=t(tolat)
tolong=array(long,dim=c(K,K));fromlong=t(tolong)
dim(fromlat)<-c(K*K,1);dim(tolat)<-c(K*K,1)
dim(fromlong)<-c(K*K,1);dim(tolong)<-c(K*K,1)

station_coord=cbind(fromlat,fromlong,tolat,tolong)

#calculate distances
stationdist=apply(station_coord,1,circledist)

#make ID matrices
id=1:K;toid=array(id,dim=c(K,K));fromid=t(toid)
dim(fromid)<-c(K*K,1);dim(toid)<-c(K*K,1)

station=cbind(fromid,toid,corry,stationdist)

#SCATTER PLOT
par(mar=c(4,4,2,1))
plot(station[,4],station[,3],xlab="Dist (km)",ylab="Correlation",col="grey70",xlim=c(0,6000),ylim=c(-.5,1))
# 0.4958,
a=seq(0,6000,100)
#lines(a,fm1$coef[1]*exp(-a/1000),col=2)
abline(h=0,lty=2)
title("Steig AVHRR cell distance correlation")
#text(4000,1,paste("cor=",round(fm1$coef[1],3),"*exp(-dist/1000)"),col=2,font=2)
abline(v=2500,col=2,lty=3)
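
A great-circle routine such as circledist is easiest to trust once it has been run on cases whose answers are known in advance. The sketch below is an illustration rather than part of either posted script: it restates the same haversine formula under a hypothetical name, haversine.km, with named arguments instead of a four-element vector, and the test coordinates are invented.

```r
# Same haversine formula as circledist, restated with named arguments.
# haversine.km is a hypothetical helper name, not part of the posted scripts.
haversine.km <- function(fromlat, fromlong, tolat, tolong, R = 6372.795) {
  rad <- pi / 180
  dlong <- abs(tolong - fromlong)
  dlong <- ifelse(dlong > 180, 360 - dlong, dlong)  # take the shorter way round
  a <- sin((tolat - fromlat) * rad / 2)^2 +
       cos(fromlat * rad) * cos(tolat * rad) * sin(dlong * rad / 2)^2
  2 * R * asin(sqrt(pmin(1, a)))                    # pmin guards against rounding above 1
}

R.km <- 6372.795
d.pole <- haversine.km(-90, 0, -80, 123)     # 10 degrees of latitude away from the pole
d.wrap <- haversine.km(-75, 179, -75, -179)  # 2 degrees of longitude across the date line
d.ref  <- haversine.km(-75, 0, -75, 2)       # the same 2 degrees without the wrap
d.same <- haversine.km(-75, 40, -75, 40)     # a cell paired with itself
print(c(pole = d.pole, wrap = d.wrap, ref = d.ref, same = d.same))
```

The pole case should equal R times ten degrees in radians whatever the longitudes are, the date-line pair should match the unwrapped pair, and a cell paired with itself should give zero. A function that fails any of these will distort the distance axis of the scatter plot.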

This entry was written by Stephen McIntyre, posted on Mar 27, 2009 at 9:34 AM, filed under General, Steig at al 2009 and tagged lubos, motl, Steig.

107 Comments

Dave Dardinger

Posted Mar 27, 2009 at 9:48 AM | Permalink

So does this basically mean the Steig paper is worthless? It makes little sense to say all of Antarctica is warming when what their process does is spread warming from the peninsula across the entire continent.
Genghis

Posted Mar 27, 2009 at 10:30 AM | Permalink

Yes Dave, that is what everyone is so politely not saying.
kim

Posted Mar 27, 2009 at 10:31 AM | Permalink

Loosed the hounds of stats,
Now the quarry’s gone to ground.
‘Hark For’ard’, and ‘Yoicks’!
===========================
Dave L

Posted Mar 27, 2009 at 10:32 AM | Permalink

If I remember correctly from “the Movie” that was previously posted, most of the coastal based stations demonstrated warming anomalies, which was assumed to represent ocean temperature contamination. So basically Steig’s paper represents warming of the Antarctic Ocean — No?
Shallow Climate

Posted Mar 27, 2009 at 10:34 AM | Permalink

Well, I liked the three dots in “…-omatic”. Man, that was a good one! Here we go again.
Eric J D

Posted Mar 27, 2009 at 10:35 AM | Permalink

So let me see if I can say this in layman terms that even I can understand.

They fashion a calculation which shows interior stations correlating well (extremely well) with peninsula stations then use that reasoning to attribute to the interior stations the same temperatures as the peninsula stations?
Edward

Posted Mar 27, 2009 at 10:39 AM | Permalink

How long before this Steig smearing technique gets used for smearing warming using other global temperature data sets?

Would it be possible to produce an alternative Antarctic study to smear the “cooler” parts of Antarctica across the peninsula?
Thanks
Ed
Patrick M.

Posted Mar 27, 2009 at 10:44 AM | Permalink

Jeffs:
A. Are you sure you have the numbers right??

B. How does this affect the conclusion of the paper? I’m sure we’ve all heard the, slight-miscalculation-but-it-doesn’t-affect-the-conclusion, response.

C. Nice work!
AnonyMoose

Posted Mar 27, 2009 at 10:45 AM | Permalink

It seems Steig’s data is lacking some explanation of what the data is. The data is making more sense as measurement of sizes of eggs in cartons labeled “Large” than of being some sort of raw data.
Jason

Posted Mar 27, 2009 at 10:49 AM | Permalink

Nature needs to publish a correction or otherwise withdraw this paper.

It is (for me) very unexpected that the masked AVHRR data would so quickly lead to such a dispositive result.

Steig’s failure to provide the masked AVHRR data suddenly morphs from being uncollegial behavior to being downright suspicious. If Steig was not aware of this problem (and I still think that is the most likely possibility), his decision to withhold this particular piece of data was truly bad luck on his part.

Steig is now the lead author of a Nature cover story that was based on faulty data that he initially attempted to hide from the people he knew were attempting to disprove his result.

Even if all of his actions were truly innocent, it looks REALLY bad.
Gary Hladik

Posted Mar 27, 2009 at 10:55 AM | Permalink

OK, I’ll bite. What does the “…” stand for? “ar”?
- Dave Dardinger
  
  Posted Mar 27, 2009 at 11:10 AM | Permalink
  
  Re: Gary Hladik (#11),
  
  OK, I’ll bite. What does the “…” stand for? “ar”?
  
  Steve called the faulty PC technique used in MBH 98 the “mannomatic”. Since Mann isn’t the lead author in this paper though he’s one of the co-authors, if I recall, the apparently faulty technique could be the Steigomatic or something else. The point is to indicate that the system used apparently will spread the heat regardless of the actual data, just as the mannomatic will produce hockeysticks even from random data.
  - Andrew
    
    Posted Mar 27, 2009 at 11:22 AM | Permalink
    
    Re: Dave Dardinger (#16), Maybe this should be “Steig-o-matic”?
  - Gary Hladik
    
    Posted Mar 27, 2009 at 4:57 PM | Permalink
    
    Re: Dave Dardinger (#16),
    Thanks, Dave. I now remember seeing the term before, but I needed the reminder.
Adam Gallon

Posted Mar 27, 2009 at 10:57 AM | Permalink

Am I right in thinking that the results of the paper are more a reflection on the maths used, than any actual warming of the interior of Antarctica?
anonymous

Posted Mar 27, 2009 at 11:03 AM | Permalink

No problem: prior to Steig the Antarctic cooling was consistent with the models, afterwards the warming was consistent with the global warming, now we’ll just be back to being consistent with the models again.
Mac

Posted Mar 27, 2009 at 11:06 AM | Permalink

So the stated warming in the Antarctic is just an artefact of the methodology employed. Shades of the “hockey stick” here.

Once is a mistake, twice is foolhardy, thrice is a conspiracy.
Steve McIntyre

Posted Mar 27, 2009 at 11:08 AM | Permalink

Everyone is piling on far too quickly and jumping to conclusions. I tried to re-do Jeff’s calculations and got a different result. So let’s first see what the difference is. I also wish that people would stop piling on with premature demands for Nature retraction; we’re just looking at things. There are some odd features to be sure, but we’re still trying to figure out what Steig did. Some self-discipline PLEASE.

Also please keep in mind that Steig’s paper could be goofy and Antarctica could still be warming. For example, let’s suppose that a simple average of the 15 or so stations that we have from the Antarctic show a warming. You don’t need 2 teraflops of operations to take an average and you wouldn’t get a Nature cover, but it wouldn’t mean that Antarctica was cooling. It seems reasonable to me that Antarctica is warming along with the rest of the world; that’s a different issue entirely from whether Steig’s RegEM plus preprocessing is a sensible way of handling data.
Jeff Id

Posted Mar 27, 2009 at 11:15 AM | Permalink

I looked back at my own code, which was written independently from Jeff and I typed the wrong variable in the anomaly section resulting in a correlation of temp rather than anomaly. You can actually see the problem in JeffC’s code above.

I’m sorry to everyone for my screw up. I explained above I didn’t see any intent either way. Blogs are real time science done in the open so these things will sometimes happen.
Jeff Id

Posted Mar 27, 2009 at 11:27 AM | Permalink

Sorry for the trouble Steve and others.

This is my code and the place where I made the error

#steveM download
#download.file("http://faculty.washington.edu/steig/nature09data/data/cloudmaskedAVHRR.txt","temp.dat",mode="wb")
grid=scan("temp.dat",n= -1) # 37800
length(grid)/5509 #300
avhrr=ts(t(array(grid,dim=c(5509,300))),start=c(1982,1),freq=12)
tsp(avhrr) #[1] 1982.000 2006.917 12.000

t=rowMeans(avhrr)

#romanM anomaly

calc.anom =function(tsdat) {
anom = tsdat
for (i in 1:12)
{
sequ = seq(i,nrow(tsdat),12)
anom[sequ,] = scale(tsdat[sequ,],scale=F) }
anom
}

stav.anom = ts(calc.anom(avhrr),start=1982,freq=12)

source("http://data.climateaudit.org/scripts/steig/collation.functions.txt")
### circledist
#calculates great circle distance with earth radius 6372.795 km (the default R below)
#from http://en.wikipedia.org/wiki/Great-circle_distance 2nd formula
circledist =function(x,R=6372.795) { #fromlat,fromlong, lat,long
x = as.numeric(x)
pi180=pi/180;
y= abs(x[4] -x[2])
y[y>180]=360-y[y>180]
y[y<= -180]= 360+y[y<= -180]
delta= y *pi180
fromlat=x[1]*pi180; tolat=x[3]*pi180; tolong=x[4]*pi180
theta= 2* asin( sqrt( sin( (tolat- fromlat)/2 )^2 + cos(tolat)*cos(fromlat)* (sin(delta/2))^2 ))
circledist=R*theta
circledist
}

mask=(rnorm(5509)< -1.3)
ll=sum(mask)
sur=avhrr[,mask] ##### I put the wrong value here; it should have been stav.anom[,mask], so we independently made the same error
Mitchel44

Posted Mar 27, 2009 at 11:30 AM | Permalink

One of the reasons I keep coming back here is exactly what just happened.

A possible error in the data/method used in a peer reviewed document was found, it was checked and rechecked, the initial look was pretty damning and people were beginning to draw conclusions. Then, after a call for cooler heads, a review of the checking method showed an error in the math, which was not only quickly recognized, fixed and posted, but even followed up with an apology for missing it in the first place.

You will not find that on most other blogs about this topic, in fact you would more likely be ridiculed or demeaned for questioning the paper in the first place, in at least one blog I could name.

Thanks for the hard work and keep it up.
Steve McIntyre

Posted Mar 27, 2009 at 11:40 AM | Permalink

The slight wrong turn here doesn’t mean that this calculation is out of the woods yet. It sure looks like there’s been quite a bit of massaging on the “raw” AVHRR data and this is still a black box. The scatter plot of reconstruction correlations looks different from the AVHRR correlations – why is that? What does it imply? We just got the AVHRR data a day or so ago. The possibility that the calculation is sensible should not be excluded out of hand either.
- Jeff Id
  
  Posted Mar 27, 2009 at 11:45 AM | Permalink
  
  Re: Steve McIntyre (#21),
  
  For sure. We know there’s some problem from the sat reconstruction correlation. (graph 2). Looking at Steve’s correlation plot the data shows what we would expect and require from the sat data to have any chance of achieving proper RegEM station weighting.
  
  In my mind it’s just a matter of which step did it occur at and how much effect it has on the result.
Jeff C.

Posted Mar 27, 2009 at 11:48 AM | Permalink

Folks, I owe a huge apology to you all, and particularly to Dr. Steig. I re-used code to process the scatter plot that I had previously used for the satellite reconstruction. I neglected to account for the fact that the recon were anomalies, the cloud-masked data set were temperatures. When I recalculated the scatter plot using anomalies, the familiar pattern re-emerged.

Thanks to Hu over in the previous thread for reviewing the code and pointing out this flaw. This mistake was entirely mine and I again apologize for jumping to conclusions.

Steve – thanks for keeping me honest and helping me become a bit more humble.
- Patrick M.
  
  Posted Mar 27, 2009 at 11:56 AM | Permalink
  
  Re: Jeff C. (#23),
  
  Sometimes you just have to eat the snake head:
  
  Circumstances arose one day which delayed preparation of the dinner of a Soto Zen master, Fugai, and his followers. In haste the cook went to the garden with his curved knife and cut off the tops of green vegetables, chopped them together and made soup, unaware that in his haste he had included a part of a snake in the vegetables.
  
  The followers of Fugai thought they had never tasted such good soup. But when the master himself found the snake’s head in his bowl, he summoned the cook. “What is this?” he demanded, holding up the head of the snake.
  
  “Oh, thank you, master,” replied the cook, taking the morsel and eating it quickly.
- MC
  
  Posted Mar 27, 2009 at 12:44 PM | Permalink
  
  Re: Jeff C. (#23), Jeff, does the NSIDC data come out in a similar way to the Steig AVHRR data using the code you have now updated?
Anthony Watts

Posted Mar 27, 2009 at 11:58 AM | Permalink

Well one thing is for sure, mistakes made are out in the open for all to see.

There’s no “CENSORED” folder here.
- MarkB
  
  Posted Mar 27, 2009 at 1:23 PM | Permalink
  
  Re: Anthony Watts (#25),
  
  Well one thing is for sure, mistakes made are out in the open for all to see.
  
  There’s no “CENSORED” folder here.
  
  That doesn’t excuse the demands for retractions and the rest of the huffing and puffing. And others need to look in the mirror before they claim that they can do a better job than The Team. People can be mistaken without being the enemy.
  - Jeff Alberts
    
    Posted Mar 27, 2009 at 9:25 PM | Permalink
    
    Re: MarkB (#31),
    
    You’re missing the point. Here they’re saying, this is the result, and this is how we got there. The Team says, this is the result, you figure out how we got there.
jorgekafkazar

Posted Mar 27, 2009 at 12:04 PM | Permalink

Well, transparency has its advantages and disadvantages. Peer review in public can make for red faces. Still, embarrassment is good for the complexion, and it’s far better than the other approach:

“Now, these are our secret files, where we keep all our secret stuff.”–Barney Fife
Lucy Skywalker

Posted Mar 27, 2009 at 12:25 PM | Permalink

And what I discovered because I happened to be searching for the article at the down time of your correcting it, was how fast you did it. Thanks.
Carrick

Posted Mar 27, 2009 at 12:30 PM | Permalink

This image is pretty interesting:

It is remarkably similar to correlational data in the boundary layer:

In both cases you end up with region of negative correlation, followed by a rebound.
Carrick

Posted Mar 27, 2009 at 12:33 PM | Permalink

Also should mention that if you were to split the data in to correlation “downwind” and “crosswind” you would likely end up with a similar pattern. What was found in the boundary layer studies was the scale-length for the correlation was different along the wind direction, compared to cross-wind direction.
Jeff Id

Posted Mar 27, 2009 at 1:43 PM | Permalink

I added this to my post

Thanks to Steve McIntyre at Climate Audit and comments from Hu McCulloch at Climate Audit for quickly spotting this error and bringing it to our attention. I think this points out pretty well that accusations of cherry picking or playing favorites on Climate Audit aren’t reasonable. Problems get chopped up and spit out regardless of the source or meaning.
Reid

Posted Mar 27, 2009 at 1:54 PM | Permalink

This is an example of real-time open peer review. Perhaps Climate Audit’s most lasting and memorable contribution to science will be the pioneering of open peer review blogging.

When will Nature and other prestigious journals adopt open peer review? Resistance is futile.
- Luis Dias
  
  Posted Mar 27, 2009 at 2:07 PM | Permalink
  
  Re: Reid (#32),
  
  When will Nature and other prestigious journals adopt open peer review? Resistance is futile.
  
  Sometimes, I’ve the nastiest of feelings that more and more we’re becoming like the Borg.
Allen63

Posted Mar 27, 2009 at 1:59 PM | Permalink

Mistakes are extremely common throughout science and engineering (some estimate 50 percent or more). The only thing to do is check one’s work, find them, admit them, fix them as soon as feasible. Even the fixes can be mistakes. Keep it up and eventually the final outcome will be a good one.

If one does that, then one loses no credibility in my eyes — rather respect.

I have a tough time respecting most politicians and some climate scientists.
Sinan Unur

Posted Mar 27, 2009 at 2:05 PM | Permalink

I hate the word anomaly.

Looking at the previous posts and Steig’s web page, I must be missing something because I cannot figure out how the anomalies are calculated.

And, I am not sure why calculating correlations between levels, versus between anomalies, versus between one series in levels and one series in anomalies, would matter.

After all, the correlation coefficient between two series is not affected by the subtraction of a constant from one or both or multiplication of one or both by a constant — any constant.

What am I missing?

— Sinan
- Steve McIntyre
  
  Posted Mar 27, 2009 at 2:12 PM | Permalink
  
  Re: Sinan Unur (#35),
  
  Sinan, you forgot about the nose on one’s face. The “anomalies” subtract a different monthly constant from each month. Otherwise the annual cycle dominates the correlations and you obviously get huge correlations.
  - Sinan Unur
    
    Posted Mar 27, 2009 at 2:31 PM | Permalink
    
    Re: Steve McIntyre (#38), yes, indeed I did. Thank you. Now, I’ll take a five minute break to go bang my head on the wall. 🙂
    
    — Sinan
Tolz

Posted Mar 27, 2009 at 2:10 PM | Permalink

Mistakes are MUCH more likely to occur in favor of a result you “want” or believe to be the case in the first place. That’s human nature, and NOT an indication of improper motivation. The lesson of course is to be MORE critical of your own hypothesis. Kudos to the Jeffs, as it has always been with others in the CA crowd, for being very upfront in acknowledging the error and seeking to correct it–get it right, first and foremost. We see that that behavior is certainly not universal.
- hswiseman
  
  Posted Mar 28, 2009 at 7:06 AM | Permalink
  
  Re: Tolz (#37),
  
  Exactly. While hiding mistakes may not be universal, confirmation bias probably is. It was certainly evident in the bloviation, chest pounding and piling on here after the initial publication of erroneous results. No need to beat Jeff and Jeff around the head and face (they are taking care of that job themselves). Still, I think we can lighten up on the self congratulations here. Science at the speed of blog is not an easy thing to do, and inevitably error will creep in. Keep in mind, however, that the world is watching what goes on here, and with that attention comes a responsibility to get it right. Steve, Ross, Spencer & Christy can attest that even a small error of a switched sign will become the bloody shirt of AGW agitprop. In the immortal words of the Sergeant on Hill Street Blues (god, I am old), “BE CAREFUL OUT THERE!”
vg

Posted Mar 27, 2009 at 2:23 PM | Permalink

So does analysis of this data then justify Steig’s conclusion? Is Antarctica warming?
- Jeff Id
  
  Posted Mar 27, 2009 at 3:07 PM | Permalink
  
  Re: vg (#39),
  
  It just looks like the spatial correlation of the processed data is reasonably normal for what would be expected, nothing else.
Hu McCulloch

Posted Mar 27, 2009 at 2:26 PM | Permalink

Thanks, Jeffs — I had a hunch that you had not removed the seasonal means before running the correlations, so that you were mostly picking up the high correlations of the annual seasonal cycle.

The corrected data shows the classic exponential decay of the correlation with distance assumed by Kriging. The high correlations at short distances mean that the values are a smooth function of location. So this raises the question again, of why Steig bothered to reduce the data to 3 PCs? Was it because the complete data set would choke up RegEM?

Perhaps we could have a new thread devoted to efforts to plug this file into RegEM together with the surface data, to see if anything like Steig’s recon comes out, and why it looks like it does. I’m not working on this but I assume Steve and others are.

This should be easy, in comparison to your efforts to reconstruct the new Steig file from the NSIDC raw data.
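
The exponential decay mentioned above has the same form, exp(-dist/L), as the commented-out curve in Jeff C's script, and a decay length L can be estimated by a simple regression. The following is a rough sketch on synthetic pairs only: the decay length of 1000 km, the noise level and the variable names are assumptions made for illustration, and nothing here is fitted to the Steig or NSIDC data.

```r
# Synthetic pairs only: correlation decaying as exp(-dist / L) with a made-up L.
set.seed(2)
true.L <- 1000                                   # assumed e-folding distance in km
dist.km <- runif(500, 50, 3000)
# multiplicative noise keeps the synthetic correlations positive
cor.obs <- exp(-dist.km / true.L) * exp(rnorm(500, sd = 0.05))

# log(cor) = -dist / L, so a no-intercept regression of log(cor) on dist estimates -1/L
fit <- lm(log(cor.obs) ~ 0 + dist.km)
est.L <- -1 / unname(coef(fit))
print(est.L)
```

The log transform only works for positive correlations, so with real gridcell pairs, where some correlations are zero or negative, a nonlinear fit or a binned summary would probably be a safer choice.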
Lucy Skywalker

Posted Mar 27, 2009 at 5:32 PM | Permalink

Now I don’t know if cloud masking is literally what it suggests, but if it is [infills for non-data under cloudy conditions], it DOES raise a point in which could lie a big source of errors. This is the posited overall warming effect of cloud cover over permanent icefields. If Svensmark’s hypothesis is correct, from about 1957 to 1977 there would have been more cloud cover over the planet which would have a cooling effect on the rest of the planet but a warming effect over Antarctica. Now if the cloudy records are omitted, the earlier Antarctica records might register too cold: thus a false amount of warming could appear to have happened from that time.

Just my thoughts and I could be wrong. See my notes on Steig
Jeff Id

Posted Mar 27, 2009 at 7:18 PM | Permalink

Ok,here’s a surprise in the other direction. At least for me.

Comiso’s Data

Comiso provided the data for this paper, and whatever math he did, the result looks really good to me! I would love to know how he did it.
David Smith

Posted Mar 27, 2009 at 10:01 PM | Permalink

The satellite data is a measure of land skin temperature. Skin temperature may be sensitive to slight changes in windspeed. The higher the windspeed, the greater the mixing of air, even in the lowest few cm. The greater the mixing, the warmer the skin temperature even though the total air-column temperature may be unchanged. Something to ponder a bit.
Luboš Motl

Posted Mar 28, 2009 at 3:39 AM | Permalink

I have qualitatively reproduced Steve’s chart, with additional colors and much more detailed information encoded in mine, and with a completely independent code in Mathematica. See

http://motls.blogspot.com/2009/03/spatial-correlations-in-antarctica.html
Nickardo

Posted Mar 28, 2009 at 3:49 AM | Permalink

Satellites work with radio signals, David; there is always some lag in it.
Hu McCulloch

Posted Mar 28, 2009 at 7:33 AM | Permalink

Re Luboš Motl #48:

Very informative (not to mention cool) use of color graphics, Luboš! Your graphs show that there is indeed a very disturbing difference between the correlation structure in the new cloudmaskedAVHRR.txt, versus the derived ant_recon.txt. This difference doesn’t show up well in monochrome plots like Steve’s #1 and 3 in the post above, except for a tendency of the ant_recon.txt correlations to stick to unity.

Your graphs show that a very useful way to summarize this mass of numbers would be with 3 lines, indicating the median and quartiles of the distribution of correlations at each distance. The mean is not as useful as the median here, since the distribution of correlations bounded by -1 and +1 can’t be symmetrical. The median is less sensitive to this asymmetry. The quartiles will then give a good sense of how well the median is doing as a summary of the relationship.

In order to compute the median and quartiles, you have to bin the distances somehow. For your purpose, 50km is fine, since that’s the resolution of the data, and you have a bazillion pairs in each bin. For comparison to the surface correlations as in Steve’s plot #2 above, however, bigger bins like 100 km might be necessary in order to have a good number (like the square root of the total) in each bin.
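
A minimal Python sketch of the binning described here, assuming the distance and correlation of each pair are already available as two parallel lists. The function name `binned_quartiles`, the toy numbers and the choice of the "inclusive" quantile method are illustrative assumptions; none of them comes from Steve's or Luboš's scripts.

```python
import math
import statistics
from collections import defaultdict

def binned_quartiles(distances_km, correlations, bin_width_km=50.0):
    """Group (distance, correlation) pairs into distance bins and return
    {bin_start_km: (lower quartile, median, upper quartile)}."""
    bins = defaultdict(list)
    for d, r in zip(distances_km, correlations):
        bin_start = math.floor(d / bin_width_km) * bin_width_km
        bins[bin_start].append(r)
    summary = {}
    for bin_start in sorted(bins):
        values = bins[bin_start]
        if len(values) < 2:
            continue  # statistics.quantiles needs at least two points
        q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[bin_start] = (q1, med, q3)
    return summary

# Made-up toy data, for illustration only (not Steig or Comiso values)
toy_dist = [10, 20, 30, 40, 60, 70, 80, 90]
toy_corr = [0.9, 0.8, 0.85, 0.95, 0.5, 0.6, 0.4, 0.7]
result = binned_quartiles(toy_dist, toy_corr, bin_width_km=50.0)
for start, (q1, med, q3) in result.items():
    print(f"{start:6.0f} km: q1={q1:.3f} median={med:.3f} q3={q3:.3f}")
```

Widening `bin_width_km` to 100 would be the natural change for the much sparser surface-station pairs.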

You have randomly sampled the pairs in order to reduce the number of correlations from 5509 × 5509 ≈ 30M down to a more manageable 0.5M. It surely doesn’t matter how this was done, but just to make the calculation more easily replicable (aka auditable), it might be better just to use only every k-th column of the matrix, starting with say column 1. Then anyone can double check that the starting column doesn’t matter. k = 5 or 6 would reduce the number of correlations by a factor of 25 or 36. These easily constructed sub-samples would be very representative of the full set of correlations.
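
The arithmetic behind the every-k-th-column suggestion can be checked with a short Python sketch. The helper `every_kth_column` is a hypothetical name, and the indices are 0-based here, whereas the comment counts columns from 1.

```python
def every_kth_column(n_columns, k, start=0):
    """0-based indices of every k-th column, beginning at `start`."""
    return list(range(start, n_columns, k))

N_GRIDCELLS = 5509  # number of AVHRR columns quoted in the thread

full_entries = N_GRIDCELLS ** 2  # size of the full correlation matrix
for k in (5, 6):
    cols = every_kth_column(N_GRIDCELLS, k)
    sub_entries = len(cols) ** 2
    reduction = full_entries / sub_entries
    print(f"k={k}: {len(cols)} columns, {sub_entries} correlations, "
          f"reduction factor about {reduction:.1f}")
```

Changing `start` gives the alternative subsamples that anyone could use to confirm that the starting column makes no difference.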
Hu McCulloch

Posted Mar 28, 2009 at 8:05 AM | Permalink

RE #51, While I was typing, Steve moved Luboš’s excellent graphs to the top of this thread.

It would be of interest also to construct Lubograms (and/or median plots) using only the first 1, 3, 5, 10, and 100 of the principal components of the anomalized cloudmaskedAVHRR.txt matrix. This would give some insight into whether the very different behavior of ant_recon.txt is due to its calibration to the surface data in RegEM, or if it’s just caused by its reduced matrix rank.
- Steve McIntyre
  
  Posted Mar 28, 2009 at 8:27 AM | Permalink
  
  Re: Hu McCulloch (#52), Steig et al stated:
  
  Although it has been suggested that such interpolation is unreliable owing to the distances involved [Turner et al 2005], large spatial scales are not inherently problematic if there is high spatial coherence, as is the case in continental Antarctica [Schneider et al 2004].
  
  I see negligible evidence from Antarctic spatial decorrelation for higher “spatial coherence” in Antarctica, as opposed to say Australia. The rank-3 version clearly builds in a lot of spurious correlation.
  
  At the end of the day, all that exists for the early portion is the 13-15 stations and it sure seems like they’re spending a lot of energy on weird multivariate methods without spending enough time on data QC.
Luboš Motl

Posted Mar 28, 2009 at 9:04 AM | Permalink

Dear Hu, that’s a good idea. I must do some shopping now, but when I return, it should be enough to add a few lines of code saying

“Truncate the 300 x 5509 numbers to the first N PCs”

and recalculate the graphs with them. Well, one may expect what will happen. They will start with an amplified level of Steig’s inaccuracy – 1 PC will have 100% correlation of everything 🙂 – and then they will slowly descend through Steig’s picture to the correct picture of the full reality. I shouldn’t have made this prediction because it has substantially reduced my eagerness to actually do it. 🙂
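
The one-PC limit of this prediction can be illustrated without any real data. The sketch below is not the actual truncation (which would need an SVD of the 300 x 5509 anomaly matrix); it only assumes a rank-1 field built from a hypothetical time pattern and hypothetical spatial loadings, and shows that every pair of grid cells then correlates at exactly +1 or -1, depending on the signs of the loadings.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

# Hypothetical numbers: one temporal pattern (a single "PC") and three loadings
time_pattern = [0.3, -1.2, 0.8, 0.5, -0.4]
loadings = [2.0, 0.5, -1.5]

# Rank-1 field: each grid cell's series is the same pattern scaled by its loading
series = [[w * t for t in time_pattern] for w in loadings]
corrs = [[pearson(a, b) for b in series] for a in series]
for row in corrs:
    print(["%+.3f" % r for r in row])
```

With more PCs retained, the off-diagonal correlations are free to move away from ±1, which is the gradual descent the comment anticipates.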
Steve McIntyre

Posted Mar 28, 2009 at 9:12 AM | Permalink

At the R graphic gallery, one of the 3 most popular graphic viewings was a 2-D histogram (hist2D), though not as well done as Luboš’s.
http://addictedtor.free.fr/graphiques/

It would be worth modifying hist2D to make it as pretty as Luboš.
Jeff Id

Posted Mar 28, 2009 at 9:41 AM | Permalink

seems like they’re spending a lot of energy on weird multivariate methods without spending enough time on data QC

I know it’s a common theme but in this case I think we’ll eventually have a different reconstruction (maybe tonight or tomorrow) with a different trend that has a high degree of high frequency correlation to local stations and an equally randomized low frequency trend.

I hope someone else has a good idea how to deal with that.
- Steve McIntyre
  
  Posted Mar 28, 2009 at 9:46 AM | Permalink
  
  Re: Jeff Id (#56), one of the aspects of Steig et al that’s not been covered so far is the change to the AVHRR trend in the new iteration of Comiso data. Perhaps the prior iterations match the surface stations as well as the new iteration does; this needs to be examined. To do this, one will need all the AVHRR versions plotted in the Steig SI, and we’ve only seen the most recent.
Craig Loehle

Posted Mar 28, 2009 at 10:23 AM | Permalink

Commercial, salesman with a blender: “Will it blend? Let’s try it!! Look look, it even blends Antarctica! Now only $54.99 plus S&H”
RomanM

Posted Mar 28, 2009 at 10:56 AM | Permalink

There has been a good deal of discussion on this thread regarding the correlation between temperatures at various locations throughout Antarctica. Several people have looked at the relationship between correlation and distance by creating graphs linking the two. IMO, one of the difficulties in interpreting these is that they are affected by a variety of factors, including the shape and topography of the continent and by the fact that the place is completely surrounded by a large pool of water.

I think that it is informative to pick several locations and to see how the AVHRR series at that location is related to all other locations. I selected two points: the tip of the peninsula (Steig series 1) and the obvious interior point: the South Pole (grid point 1970 is the closest).

For a selected site, after calculating the 5509 correlations, we graph them using a color scale to represent the correlation (as usual red is positive and blue is negative and white areas represent zero correlation). The location of the grid site is represented by a green +. Keep in mind that these are correlations measuring relationships between temperatures at the grid point, not positive or negative trends.

First, we take the latest revelation from Steig, the cloud masked AVHRR data. The grid point is on the tip of the peninsula.

Several things stand out in the graph. Obviously, the region immediately adjacent to the grid point is strongly correlated, but what is somewhat surprising is that the correlation drops off fairly quickly, becoming negative while still in the Western Antarctic area. The relatively low correlation continues to the rest of the continent.

Next, we take the original reconstruction: ant_recon.txt. This was supposedly reconstructed from the previous data using RegEM and the manned surface stations:

The correlation has strengthened dramatically in the Western Antarctic so that now the pattern exhibited by the reconstruction at the tip of the peninsula seems to be reflected by the entire west. As well, the Eastern portion has now become more strongly inversely correlated with the peninsula.

I have also looked at the two other reconstructions (detrended and PCA) created by Steig, as well as at the South Pole and how its temperatures correlate with the rest of the grid points. These can be found at my statpad site. The R script can be found in a Word document here.
- Jeff Id
  
  Posted Mar 28, 2009 at 11:14 AM | Permalink
  
  Re: RomanM (#59),
  
  That is a fantastic plot. I read your blog; this is a very clear way to show what’s happening. You can see the limitations of 3 PCs. Really nice.
  
  I wonder how it correlates on a low freq trend basis?
  - Steve McIntyre
    
    Posted Mar 28, 2009 at 11:15 AM | Permalink
    
    Re: Jeff Id (#61), I agree. I’ve posted it as a separate thread to draw attention to it.
- Kenneth Fritsch
  
  Posted Mar 28, 2009 at 11:31 AM | Permalink
  
  Re: RomanM (#59),
  
  If I understand what Roman has done here (need to read it twice –my problem, not his), I think it really cuts to the matter of the TIR reconstruction producing a West Antarctica that takes “heat” from the Peninsula.
  
  I may have read too much into the intentions of the authors of Steig et al., but a major premise seemed to me to be that the well-established Peninsula warming had spread to Western Antarctica (and by inference will continue to spread to mainland Antarctica, with all the problematic issues that could raise for a stable Antarctic ice shelf). Continuing this analysis of how sensitive that premise is to the methodology used is certainly in order.
  
  The AWS reconstruction does not show the warming of the West Antarctica that the TIR reconstruction does.
MrPete

Posted Mar 28, 2009 at 11:01 AM | Permalink

Suggestion for use of color: People tend to consider Green=go, Yellow=caution, Red=stop. Might use Green=+1, Red=-1, Yellow=0.
Layman Lurker

Posted Mar 28, 2009 at 11:51 AM | Permalink

Roman, excellent plots. Would it be possible for you to try another example with a coastal station like Casey, which is opposite the peninsula?
Steve McIntyre

Posted Mar 28, 2009 at 12:07 PM | Permalink

I felt obliged to produce a better scatter plot after being out-competed by Luboš 🙂 Here’s a quick rendering of an improved scatter plot using my methods.

library(gplots)  # provides hist2d()
# 'station' is assumed to hold the pairwise distances (dist) and correlations (cor) computed earlier
h2d <- hist2d(x=station$dist, y=station$cor, show=FALSE, same.scale=FALSE, nbins=c(50,50))
# 19 colors for the 19 intervals in levels; color 1 (black in the default palette) covers the three lowest bands
col1 <- c(rep(1,3),"#0000BB","#0014FF","#006BFF","#00C2FF","lightblue2","lightblue1","grey85",
"lightgoldenrodyellow", "lightgoldenrod","#FF8700", "#FF5B00","#D70000",rep("#800000",4))
par(mar=c(4,4,2,1))
filled.contour( h2d$x, h2d$y, h2d$counts, levels=seq(0,380,20),
col=col1,xlab="Distance (km)",ylab="Correlation" )
mtext(side=3,"Comiso Distance-Correlation",cex=1,font=2,line=0.5)
Luboš Motl

Posted Mar 28, 2009 at 12:28 PM | Permalink

You won, Steve! It’s pretty, the code is short (the proper labels on axes would be difficult for me) – I just don’t understand why you chose the places with densities below 50 to be almost invisible (black). Your colors make it look like there are no distances above 4,000 km and no correlations below 0.1. 😉
- Steve McIntyre
  
  Posted Mar 28, 2009 at 12:41 PM | Permalink
  
  Re: Luboš Motl (#66),
  
  Choosing colors can be time consuming. I had a nice set of 14 colors on hand which was a little short and so I agree that I made too much black, but I wanted to get something out. I’ll tidy my list of colors some time.
Luboš Motl

Posted Mar 28, 2009 at 12:36 PM | Permalink

Dear MrPete, in the density plots, the colors (e.g. green, yellow) are densities. On the other hand, Roman’s plots use colors as correlation, and it is pretty natural to have the “TemperatureMap” (blue-red: blue is cool, red is warm) color function.

Your traffic light color function seems a bit hard to remember. You say that green is go, yellow is cautious, red is stop. Well, red is surely stop, but I would be cautious with the greens, too. 😉 Before you invent a new color scheme, check whether it has already been invented e.g. here:

http://reference.wolfram.com/mathematica/guide/ColorSchemes.html
- MrPete
  
  Posted Mar 28, 2009 at 12:52 PM | Permalink
  
  Re: Luboš Motl (#67),
  I apologize; I shouldn’t have suggested it quite so casually 🙂
  
  As an architect of the first PC GIS system, I have a professional background in creating color themes to communicate quantitative information. I’m not simply inventing colors from thin air.
  
  The Mathematica color maps reference is interesting to me, mostly because so many of them are designed to look pretty, rather than communicate effectively.
  
  Rainbow mapping, such as done by Hu (#68), is valuable for distinguishing quantile bands. But other than that, they’re meaningless.
  
  As RomanM had already noted, we need to be careful in the climate context to avoid insinuations of temperature when such is not implied. A red-blue gradient commonly implies heat.
  
  My mapmaking and user interaction experience tells me that most people view the green-yellow-red gradient as good-ok-bad. Although everything is subject to exceptions of course 🙂
  - Steve McIntyre
    
    Posted Mar 28, 2009 at 12:55 PM | Permalink
    
    Re: MrPete (#71), Pete, in mineral exploration, “anomalies” are marked in loud reds, while subdued pastel greens and yellows tend to be used for background.
    - MrPete
      
      Posted Mar 28, 2009 at 12:57 PM | Permalink
      
      Re: Steve McIntyre (#74),
      Great example of how context “colors” the message 🙂
Hu McCulloch

Posted Mar 28, 2009 at 12:39 PM | Permalink

RE Mr. Pete #60:

Suggestion for use of color: People tend to consider Green=go, Yellow=caution, Red=stop. Might use Green=+1, Red=-1, Yellow=0.

I did something like what you suggest in the following graph of Steig ant_recon.txt means in a comment on the Steig Eigenvectors and Chladni Patterns #2 thread:

On a scale of -6 to +6, 0 is yellow, +4 is red, and -4 is blue. Green is at -2 and orange at +2. -6 is mock-ultraviolet, +6 is mock-infrared, and the other integers are fudged in non-linearly to keep yellow prominent. These 13 key colors are then interpolated to create a 121-color MATLAB colormap using colormap colors. For 13 bands, use colormap colors(1) instead.

function[colormap] = colors(discrete)
% colors(discrete)
% colors (with no argument) returns 121 interpolated colors
% colors(1) (or any other value) returns 13 discrete colors
% Based on 13 rainbow saturated crayons, with yellow at center.
% blue, yellow, red primaries
% green, orange secondaries
map = [.7 0 1; .5 .15 1 ; .3 .3 1; 0 .7 .6; 0 1 0 ; ...
.67 1 0; 1 1 0 ; 1 .85 0; 1 .65 0;1 .35 0; 1 0 0 ; .85 0 .15; .56 0 .24];
if nargin > 0
colormap = map;
return
end
map2 = zeros(121,3);
map2(1, :) = map(1, :);
th = (.9:-.1:0)';
for i = 2:13
map2(((i-2)*10+2):((i-1)*10+1), :) = th*map(i-1,:) + (1-th)*map(i,:);
end
colormap = map2;
- Steve McIntyre
  
  Posted Mar 28, 2009 at 12:52 PM | Permalink
  
  Re: Hu McCulloch (#68), we really need to figure out a way of placing things like color schemes in a retrievable location. They take a lot of effort to work out.
  - MrPete
    
    Posted Mar 28, 2009 at 12:56 PM | Permalink
    
    Re: Steve McIntyre (#72),
    Good idea. I’ll be happy to contribute to the archive when I get a chance.
    
    Right now, it’s time to get back to cleaning up my office mess (and taxes) 🙂
Hu McCulloch

Posted Mar 28, 2009 at 12:43 PM | Permalink

RE Matlabomatic-Emoticons in #68, read 🙂 as colon-right paren.
🙂
MrPete

Posted Mar 28, 2009 at 12:54 PM | Permalink

By the way, if you want to mathematically generate color ranges, convert to HSB or HSL color space before interpolating. You’ll get a much better result than staying in RGB space.
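
A small Python sketch of this suggestion, using the standard-library `colorsys` module (which calls the space HLS rather than HSL). The function names are illustrative, and taking the shorter way around the hue circle is an assumption about what "interpolating" should mean here.

```python
import colorsys

def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def interpolate_rgb(c0, c1, t):
    """Blend two RGB triples (components in 0..1) channel by channel."""
    return tuple(lerp(a, b, t) for a, b in zip(c0, c1))

def interpolate_hls(c0, c1, t):
    """Blend two RGB triples by interpolating hue, lightness and saturation."""
    h0, l0, s0 = colorsys.rgb_to_hls(*c0)
    h1, l1, s1 = colorsys.rgb_to_hls(*c1)
    dh = ((h1 - h0 + 0.5) % 1.0) - 0.5  # shorter way around the hue circle
    h = (h0 + dh * t) % 1.0
    return colorsys.hls_to_rgb(h, lerp(l0, l1, t), lerp(s0, s1, t))

red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)
mid_rgb = interpolate_rgb(red, green, 0.5)  # (0.5, 0.5, 0.0), a dull olive
mid_hls = interpolate_hls(red, green, 0.5)  # approximately (1.0, 1.0, 0.0), a saturated yellow
print(mid_rgb, mid_hls)
```

The RGB midpoint loses brightness, while the HLS midpoint stays fully saturated, which is the practical difference being pointed out.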
- Steve McIntyre
  
  Posted Mar 28, 2009 at 12:56 PM | Permalink
  
  Re: MrPete (#73), Pete, can you do a post on these color ranges? I’m quite interested in improving in this area.
  - MrPete
    
    Posted Mar 28, 2009 at 12:59 PM | Permalink
    
    Re: Steve McIntyre (#76),
    Yes, but probably not too soon 😦 … it will take some digging to pull up my info… and right now time is short.
    
    A fun little project though. I’d be happy to collect informative color themes and links from others, such as what Luboš supplied. Just email webbed dot pete at gmail…
Navy Bob

Posted Mar 28, 2009 at 2:53 PM | Permalink

“Color often generates graphical puzzles. Despite our experiences with the spectrum in science textbooks and rainbows, the mind’s eye does not readily give a visual ordering to colors….

“Because they do have a natural visual hierarchy, varying shades of gray show varying quantities better than color.”

From the master, Edward R. Tufte, “The Visual Display of Quantitative Information” p. 154.
- Dave Andrews
  
  Posted Mar 28, 2009 at 3:25 PM | Permalink
  
  Re: Navy Bob (#79),
  
  I agree. Remember how during the Cold War all NATO maps showed the Warsaw Pact countries as ‘menacing’ red whilst NATO countries were a nice ‘cool’ blue?
  
  Likewise GISS shows, for example, an eye catching red anomaly over Siberia, whilst in fact the temperatures are still way below zero.
- Sinan Unur
  
  Posted Mar 28, 2009 at 4:15 PM | Permalink
  
  Re: Navy Bob (#79), I did hate dot matrix printers but they were good for one thing: If a scatter-plot contained multiple observations at a given coordinate, the markers got darker and darker each time a point was printed whereas lone observations remained faint. The effect was especially supple when using worn out ribbons.
  
  I *hate* color in graphs unless the color is associated with a well defined physical meaning. On the other hand, when correlation coefficients are considered one simply has to use color. Just white + two cool colors would be my preference (I am speaking with no expertise in the topic at all other than thinking about what would help me actually understand the graphs posted here. They look very pretty but they are also confusing to me.)
  
  I think setting r = 0 to white is the right idea. Then setting r = -1 to dark blue (ocean) and r = 1 to dark green (land) or r = -1 to light green (land) and r = 1 to light blue (sky) makes sense to me.
  
  The moment there is a red in there, my brain screams TEMPERATURE!!!
  
  My 2 cents.
  
  — Sinan
  - MrPete
    
    Posted Mar 28, 2009 at 4:45 PM | Permalink
    
    Re: Sinan Unur (#82),
    This is a healthy conversation. What should +1 and -1 actually mean? I know it is pos/neg “correlation” but what does that represent in physical terms?
    - Sinan Unur
      
      Posted Mar 28, 2009 at 6:02 PM | Permalink
      
      Re: MrPete (#83), in this particular case, we are talking about the correlation in temperature anomalies between two points by distance to a reference station (as far as I can understand). So, if the association between anomalies at two locations is linear, -1 means the temperature anomaly at one location is smaller than average every time the temperature anomaly at the other location is greater than average. Vice versa for r = 1 between two locations.
      
      Could you please explain what you mean by your question?
      
      — Sinan
MrPete

Posted Mar 28, 2009 at 4:03 PM | Permalink

It all depends on what you are trying to communicate.

Minimizing the number of colors is often a very good thing.

First note: you don’t need to use gray — I like to vary the density/shade of a single color in some situations.

But in the case we’re dealing with here, if I understand correctly, positive, zero and negative correlation have different meanings. And thus, using two or three colors can make a lot of sense.
Hu McCulloch

Posted Mar 28, 2009 at 6:40 PM | Permalink

Re MrPete #83,

A perfect correlation of +1 is what you should get if you place two thermometers in the same well-stirred bucket of water and then read them both on various days. The readings don’t have to be identical — one can be in C and the other in F, but they must be a linear transform of one another.

A very high correlation is what you should get if you have two thermometers near one another that are measuring more or less the same thing, with minor differences due to siting or flaws in the thermometers.

A perfect negative correlation is hard to obtain except as a statistical artifact. If you have two thermometers whose readings are uncorrelated, but then measure them each day as differences from their average, the one will always be exactly as high above the average as the other is below the average, and the correlation will be -1, whether they are both measured in C or one in C and the other in F. This is why in the other thread I thought maybe Roman had subtracted out row averages as well as seasonal column averages before he computed his correlations. (He didn’t).
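
The artifact is easy to reproduce. The Python sketch below uses two simulated, independent "thermometers" (random numbers, not real readings), subtracts their pairwise average from each, and recomputes the correlation, once with both in C and once with the second converted to F.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def deviations_from_pair_average(x, y):
    """Express each reading as a difference from the average of the two."""
    avg = [(a + b) / 2 for a, b in zip(x, y)]
    return ([a - m for a, m in zip(x, avg)],
            [b - m for b, m in zip(y, avg)])

random.seed(0)  # simulated readings, for illustration only
temp_a = [random.gauss(0, 1) for _ in range(200)]
temp_b = [random.gauss(0, 1) for _ in range(200)]
temp_b_f = [1.8 * t + 32 for t in temp_b]  # second thermometer in Fahrenheit

r_raw = pearson(temp_a, temp_b)
r_dev = pearson(*deviations_from_pair_average(temp_a, temp_b))
r_dev_mixed = pearson(*deviations_from_pair_average(temp_a, temp_b_f))
print(r_raw, r_dev, r_dev_mixed)
```

After the subtraction each series is the exact negative of the other, so the correlation is -1 regardless of how weakly the raw readings were related.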

A less than perfect negative correlation would arise at a monthly frequency between two sites on opposite sides of the tropics, if they didn’t have their seasonal means subtracted. If they did, a weak negative correlation between site anomalies would still be possible, but it would take an unusual combination of ocean currents or some such.

In Luboš’s first graph above, the correlations are all positive at first, but then die out toward zero on average by 5000km. But then just by chance half are positive and half negative. These weak correlations are meaningless.

The cluster of negative correlations at around 5000km in Jeff Id’s graph of station correlations (the third plot in the post) may be meaningful, and probably has something to do with coastal currents affecting coastal stations on opposite sides of the continent differently. But generally you’d expect distant correlations between site anomalies to die toward zero.

I hope this helps.
- Jeff Id
  
  Posted Mar 28, 2009 at 7:48 PM | Permalink
  
  Re: Hu McCulloch (#85),
  
  If you don’t have thermometers you can use bristlecone pine trees as a substitute.
  
  Couldn’t resist.
- MrPete
  
  Posted Mar 28, 2009 at 9:45 PM | Permalink
  
  Re: Hu McCulloch (#85) and Jeff Id (#86) :-)…
  
  Thanks! OK, now I’m not flying blind. (Although I am flying seat of the pants. All of my reference tools and past work are packed away somewhere… but hey, this is live interaction with ability to be wrong at any point 🙂 )
  
  What I heard, in layman’s and then possibly visual terms:
  
  +1 == perfect teleconnection or perfect match. Both readings equally valid to represent measurement at the base site.
  0 == random (noise). No connection at all.
  -1 == perfect anti-match. For temp, this should be opposite seasonality or ???
  
  Or…
  
  +1 == Meaningful connection
  0 == Meaningless connection
  -1 == Meaningful (confusing in this case!) inverse connection
  And one more item that almost always needs to be represented: missing data.
  
  Is that about right?
  
  So, for this one, here’s what I would try, based on now being a bit better informed
  
  a) Set missing data to dark gray or black. Nice contrast and allows the real data to “pop”
  b) Set meaningless = white
  c) Set meaningful = something like #204896 , a deep color that blends well all the way to white, and that usually represents strength, etc. (This is a “steel blue”/grayed blue. It is not usually a “temperature”)
  d) Set anti-meaningful to a complementary color but easily distinguished, perhaps #482096
  
  You could pick any of a number of color pairs. It can be tricky to choose pairs that work well and that don’t immediately imply a different message.
  
  This is quite similar to what RomanM did, with specific modifications. (And this is completely untested; I’m not set up for graphing right now. Who knows, those colors might look Really Ugly when actually used! 🙂 )
  
  And unfortunately, schemes such as provided by Mathematica give very few compatible options. All of their two-color schemes with white in the middle are either explicitly temperature mappings, or go to black at the extremes. I don’t want +1 and -1 to look the same at all.
  
  As Steve M said, it really does take some consideration to develop a useful color scheme. And I’m probably not there yet on this one.
  - Greg F
    
    Posted Mar 28, 2009 at 9:59 PM | Permalink
    
    Re: MrPete (#87),
    
    -1 == perfect anti-match. For temp, this should be opposite seasonality or ???
    
    Naw … it’s an upside down thermometer.
  - Sinan Unur
    
    Posted Mar 29, 2009 at 7:38 AM | Permalink
    
    Re: MrPete (#87),
    
    Or…
    
    +1 == Meaningful connection
    0 == Meaningless connection
    -1 == Meaningful (confusing in this case!) inverse connection
    
    I beg of you and everyone else who might be lurking/watching not to ascribe meaning to a correlation coefficient based on its value any more than “two variables move in the same direction versus two variables move in opposite directions”. The closer |r| is to 1, the closer the points are to a line but that does not necessarily mean there is a linear association between the variables.
    
    The closer |r| is to 1, the stronger is the association between them, but the association need not be meaningful.
    
    The value of the correlation coefficient does not tell you if the relationship is linear or meaningful. Something other than the value of r tells you if an observed correlation is meaningful.
    
    — Sinan
    - Kenneth Fritsch
      
      Posted Mar 29, 2009 at 11:01 AM | Permalink
      
      Re: Sinan Unur (#93),
      
      Sinan, I believe in the Steig et al. case the spatial correlations are being used in the reconstruction process, and the question that arises, in this layperson’s mind anyway, is whether the effect of one region’s temperature anomalies on another region is real or an artifact of the process. In this case, the correlations are being addressed not so much as an indication of any relationship, but as an indicator of how the Steig reconstruction process works.
      
      Luboš Motl comments above on the “natural” thermodynamic tendency of one region’s temperature (changes) to affect a nearby region. I would guess that that argument would be in good agreement with what Steig et al. have perhaps anticipated and then concluded about the effect of a “hot” Antarctic Peninsula on the adjacent West Antarctica region with their TIR reconstruction.
      
      Since, in my view, over recent decades I see that effect in the TIR reconstruction and not in the AWS reconstruction or with raw temperature data, I have to wonder how much of what we see in the TIR reconstruction is an artifact of the methods used in the reconstruction.
      
      The other question that arises out of this discussion is whether a warming in the Peninsula and cooling or no trend in East and West Antarctica in recent decades would be unique to that region of the globe – and without visualizing a major yet to be discovered geothermal event under the Peninsula. I see a cooling in the SE US with warming elsewhere in the US. On a localized level in Illinois in the US, I see long term cooling and warming with large differences in magnitude from various stations around the state. Perhaps the station data are not valid or the physics on a local level are different than those on a more regional level.
      
      Another interesting point is that the area of a hot Antarctica peninsula is only a few percent of the total area of Antarctica.
Luboš Motl

Posted Mar 29, 2009 at 1:09 AM | Permalink

Dear MrPete, the correlation coefficient between +1 and -1 is computed from datasets {x_i, y_i} (for example, x_i are temperatures on one place at time i, while y_i on the other place), as follows:

Subtract the average of x from x_i, and the average of y from y_i. Now, the average (or sum) of x_i is zero, much like for y_i.

Now, calculate the sum of x_i.y_i. If x_i and y_i change independently, this should be close to the sum of x_i times the sum of y_i, divided by the number of points, which would be zero. However, if they don’t change independently, the sum of x_i.y_i will be nonzero.

But x_i.y_i have “units” – characteristic units of “x” times units of “y”. You need a unitless result between -1 and +1. So you must divide it by something with units of “x” and something with units of “y”. These two factors are the Euclidean lengths of the vector “x” and the vector “y”, according to the Pythagorean theorem (the square root of the sum of squares of x_i, and similarly for y_i).

If you imagine x_i and y_i are coordinates in an N-dimensional Euclidean space, the correlation coefficient is the scalar/inner product of the vectors x,y, divided by the length of x and length of y. This ratio is nothing else than cos(angle) where the angle is one between the vectors x,y.

If they’re pointing in the same direction, the angle is zero, and the correlation coefficient is +1. For opposite directions, it is -1. If they’re orthogonal, it’s zero.
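
The recipe above translates almost line for line into Python. This is only a sketch with made-up three-point vectors; the function name `correlation_as_cosine` is illustrative.

```python
import math

def correlation_as_cosine(x, y):
    """Pearson correlation computed as the cosine of the angle between
    the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cx = [v - mean_x for v in x]  # subtract the average of x
    cy = [v - mean_y for v in y]  # subtract the average of y
    inner = sum(a * b for a, b in zip(cx, cy))      # sum of x_i.y_i
    length_x = math.sqrt(sum(a * a for a in cx))    # Euclidean length of x
    length_y = math.sqrt(sum(b * b for b in cy))    # Euclidean length of y
    return inner / (length_x * length_y)

same = correlation_as_cosine([1, 2, 3], [2, 4, 6])        # same direction
opposite = correlation_as_cosine([1, 2, 3], [3, 2, 1])    # opposite direction
orthogonal = correlation_as_cosine([1, 2, 3], [1, -2, 1])  # orthogonal after centering
print(same, opposite, orthogonal)
```

The three cases come out at (to rounding) +1, -1 and 0, matching the angles of zero, 180 and 90 degrees.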

The discussion about color schemes is fun. I was just developing a color scheme that covers a lot of the RGB space (by “spirals”) but still allows you to distinguish everything. But for practical purposes, the discrete color schemes with jumps may be helpful, too.

When analyzing maps via Mathematica, I was irritated by a hidden detail about the ENSO maps:

The colors that appear on the map actually don’t precisely (RGB bytes) agree with any of the colors on the scale on the bottom (although they’re somewhat close to interpolations of the colors at the bottom). So irritating. So one can’t reverse-engineer the temperature anomalies of the ocean, not even the right intervals.
jeez

Posted Mar 29, 2009 at 3:42 AM | Permalink

Luboš, the mismatch is likely due to the fact that the original information would have been in 24-bit color and was then reduced to an 8-bit palette (palettized) when output to GIF, a format that supports only 8-bit color.
vg

Posted Mar 29, 2009 at 5:27 AM | Permalink

So what does it all mean? Is Steig vindicated, or is it now definite that there is no significant warming using the recent verified Steig data?
- Dave Dardinger
  
  Posted Mar 29, 2009 at 7:42 AM | Permalink
  
  Re: vg (#91),
  
  Ahhh! The +1 correlation question to my #1 -1 correlation question. I expect the answer is “white”! That is, proper answer is still hanging out there and will need more analysis to be determined. But I’m betting on the negative side since we know the stations themselves aren’t wildly warming. But it isn’t yet known exactly how Steig got his results.
Hu McCulloch

Posted Mar 29, 2009 at 7:36 AM | Permalink

RE #51,52, 65,
A very good use of shading and/or color here would be to illustrate the relative frequency distribution of correlations at each distance rather than their absolute number. That is, rather than using, say, a yellow-to-green scale to represent the absolute number of correlations as in Luboš’s original plots or (with a different scheme) in Steve’s version (#65), draw a median line, and then add several interquantile ranges (or even a continuum of them) using your color/shading scale. For example, the 50% range (i.e., the interquartile range) could be in, say, yellow, the 90% range (5% to 95% interval) in yellow-green, the 95% range in green, and the 99% range in pale green. This would be a lot more informative than just 3 lines, and a lot less clutter than a separate line for each quantile.

Anyway, I’m looking forward to learning how adding PC’s affects the correlation-distance relationship for the newly released, anomalized, cloudmaskedAVHRR.txt data.

RE MrPete #87, what scale of color gives 204896 for steel blue? Is this a hexadecimal code that just happens to have no digits above 9? MATLAB uses 0-1 scales for each color rather than 0-255, with the result that even if you try to hit a color on the head, rounding error could easily put you off by 1 click. This could be the source of the problem Luboš encountered in #89 above and that jeez mentions in #90, since I suspect most of the official maps are done in MATLAB.

BTW, Pete, is it just my imagination, or have you been laying low this past year since the famous Starbucks expedition? If so, welcome back!
MrPete

Posted Mar 29, 2009 at 8:08 AM | Permalink

Hu,
I’ve been laying low both before, during and after. Real Life is pretty distracting these days 🙂
The #rrggbb color spec is how you do color in HTML. Each character pair is a hex code; they just happened to all be in the 0-9 range in this example.
Would it mean anything to combine RomanM’s correlation plots with Luboš’ frequency (density) data? Seems that frequency is simply the number of identical points across the space we’re examining.

Sinan, that’s a good reminder. Although… while we know correlation doesn’t imply causation, doesn’t correlation generally imply *some* kind of meaning or significance? Even if we don’t know what it is?

Time to run.
Stu Miller

Posted Mar 29, 2009 at 9:22 AM | Permalink

Re: #87 Mr. Pete,

Just to add another complication to your task, somewhere between 5 and 10 percent of us males are color blind to some degree. For those of us who are color vision challenged, your choice of complementary colors may not work. As far as I can tell, the two color blocks you show are the same. Likewise I am usually unable to interpret spaghetti graphs. In the days of hand plotted curves, the use of various geometric shapes to represent the data points allowed interpretation of multiple black curves.
- curious
  
  Posted Mar 29, 2009 at 9:44 AM | Permalink
  
  Re: Stu Miller (#96), Seconded!
- MrPete
  
  Posted Mar 29, 2009 at 5:17 PM | Permalink
  
  Re: #97, #98, #99,
  
  Wow. Since the colors are shades of blue and purple, either we’re just dealing with too-small color patches, or badly adjusted monitors, or readers who are among the 1% of men who cannot see red colors. Various forms of this colorblindness exist…
  
  Bigger patches (using the same colors as above):
  This is the bluish patch, and
  This is the purplish patch.
  
  It’s pretty tough to compensate for colorblindness when selecting colors. Apparently, for people with this colorblindness, the rainbow:
  
  looks something like this:
  
  As you (with full color vision) can see, there are really only two colors left, in different shades. Ouch!
  - curious
    
    Posted Mar 29, 2009 at 5:42 PM | Permalink
    
    Re: MrPete (#102), Thanks Pete – the rainbow looks ok to me so visit to the optician can wait 🙂
    
    With the choice of blueish and purplish though they still look very close to me, but maybe it would look different if they were opposite ends of a scale going through white in the middle (I think this is what you are suggesting?). Maybe my monitor is not great either because I second Stu’s comments re: spaghetti graphs where many lines are often represented with only minor colour gradations between them.
  - Stu Miller
    
    Posted Mar 29, 2009 at 6:40 PM | Permalink
    
    Re: MrPete (#102),
    MrPete,
    Like curious, I see little difference between the larger bluish and purple swatches. On the other hand, my rainbow looks more like your first illustration than your second one. I am not sure there is a solution that will work for all, but I keep looking for a method by which I can extract data from color data plots.
    - Geoff Sherrington
      
      Posted Mar 29, 2009 at 7:12 PM | Permalink
      
      Re: Stu Miller (#104),
      
      If you use a graphics program like PhotoShop, Corel PhotoPaint, Photo Impact etc, you can select a colour properties window. By dragging the mouse over the map you can get numbers for RGB and CMYK schemes. You can also add transparency to colours with this type of package, for making maps. For colour blindness tests (more than 10% of men affected) try a Google on “Ishihara” such as
      http://www.toledo-bend.com/colorblind/Ishihara.asp
frost

Posted Mar 29, 2009 at 9:56 AM | Permalink

Re 96 & 97:

The color blocks in #87 look the same to me and I am not color blind.
Paul Penrose

Posted Mar 29, 2009 at 4:09 PM | Permalink

VG,
Please stop asking for conclusions when the analysis is not yet complete. Until the results can be replicated, however, I would be skeptical of any claims made.
frost

Posted Mar 29, 2009 at 8:11 PM | Permalink

A variable that is not being mentioned is the monitor that the color is being displayed on. I’m using an old, carbon belching CRT and I notice that many pictures I find on the internet seem dark. I also notice that such pictures do not look dark when displayed on an LCD monitor. Perhaps that is what is going on here.
- MrPete
  
  Posted Mar 30, 2009 at 12:37 PM | Permalink
  
  Re: frost (#106),
  That’s quite possibly part of the answer. Note too, the colors I provided will be the dark extremes (+1, -1) and most data will be much lighter.
  
  I picked adjacent colors on purpose, because the +1/-1 meanings are more similar than opposite in this case (vs temperature, where cooling and warming are opposite). (BTW, this entire discussion probably seems like picking nits to a casual observer… to me it’s a valuable interaction on how we represent information visually…)