Steig’s Secret Data

I checked in at Steig’s webpage to see if the long-awaited AVHRR had finally materialized.

Update (Mar 26, 2009, a.m. Eastern): This is now released.

It had the following new paragraph (without any change notice to show that this dataset had not been there from time immemorial):
cloudmaskedAVHRR.txt contains the monthly-averaged cloud-masked satellite data used in the reconstruction presented in the main text (and shown in Figures S1c and S1d). The 300 rows are months, starting with 1982 (January) at the top, and ending in 2006 (December) at the bottom.

I tried unsuccessfully to download the cloudmaskedAVHRR.txt data – access denied.

I guess you have to be the “right” sort of researcher to get access to the data. Maybe Steig makes “Special Decisions” for special researchers.

Or maybe Steig’s trying to make sure that he’s archiving the “right” data. You’d think that they wouldn’t need to be sorting this out for a Nature cover story – now over two months after the publication.


  1. bender
    Posted Mar 25, 2009 at 3:24 PM | Permalink | Reply

    Interesting. The other files in that directory are accessible to me.

  2. Raven
    Posted Mar 25, 2009 at 3:26 PM | Permalink | Reply

    The files in the folder:

    Are all accessible except for the cloudmaskedAVHRR.txt

    The readme was updated with the file and it suggests that the data should be available.

    It might be a simple goof up with file permissions.

    • Steve McIntyre
      Posted Mar 25, 2009 at 3:41 PM | Permalink | Reply

      Re: Raven (#2), the other files are the ones that we’ve been discussing e.g. the rank 3 data in the recon. The AVHRR file is the one that we’ve been asking for and it’s the only one that’s “protected”. As you say, it might be an accident. Perhaps that’s what they’ll say.

  3. Jason
    Posted Mar 25, 2009 at 3:27 PM | Permalink | Reply

    The directory says that it was put there at: 22-Mar-2009 16:39

    Even if this file was available, would it be sufficient?

  4. Posted Mar 25, 2009 at 3:41 PM | Permalink | Reply

    I can’t read it from work but I have access to many computers and IP addresses ;) Still it might be a screw up. I’m glad you’re back Steve, I probably wouldn’t have checked again for a long time – no experience.

    • Steve McIntyre
      Posted Mar 25, 2009 at 3:43 PM | Permalink | Reply

      Re: Jeff Id (#5), have you already forgotten all the changes at Mann’s SI after the fact? After being criticized, they put in some change notices (but then deleted some of the change notices as time passed. :) )

      • Posted Mar 25, 2009 at 3:53 PM | Permalink | Reply

        Re: Steve McIntyre (#6),

        Slow learner.

        I left a question on the RC thread but maybe I should have asked it here, it would probably get a faster response. I kept it simple so Steig wouldn’t tell me to take his Matlab class or something.

        “Gavin, Is there an intentional block to the AVHRR data on Dr. Steig’s site?”

  5. Keith W.
    Posted Mar 25, 2009 at 3:51 PM | Permalink | Reply

    I’ve got a neutral, probably unknown IP, and it comes up permission denied. But it doesn’t ask for any pass coding. Maybe you need to have a Washington.Edu listing.

  6. Posted Mar 25, 2009 at 4:02 PM | Permalink | Reply

    My rude and unprofessional question was deleted.

  7. Layman Lurker
    Posted Mar 25, 2009 at 4:15 PM | Permalink | Reply

    So the data is posted but access is restricted. Is that consistent with Nature’s policy?

  8. Dave Andrews
    Posted Mar 25, 2009 at 4:22 PM | Permalink | Reply

    Steig et al was submitted to Nature in Jan 2008.

    Do scientists normally wait until after their papers are accepted to archive their data, or do they not try to anticipate publication and have things in place beforehand? As a non scientist I would have thought that one would accumulate all the evidence up front.

  9. Posted Mar 25, 2009 at 4:33 PM | Permalink | Reply

    Dave Andrews
    I think originally, Steig said he couldn’t make the data available before publication because Nature wouldn’t permit it. Of course, that wouldn’t have prevented him placing the data on his server, blocking access to the directory and unblocking the directory on the date of publication.

    I haven’t posted since the time RC instituted the captcha. Do they no longer pre-emptively moderate and only do so afterwards?

    • Posted Mar 25, 2009 at 4:56 PM | Permalink | Reply

      Re: lucia (#12),

      It says your comment is awaiting moderation until they answer or clip it. When clipped the awaiting moderation disappears. It’s absolutely annoying that I’m unqualified to ask a simple question. They don’t understand that it just makes me more determined and it makes them look terrible.

  10. old construction worker
    Posted Mar 25, 2009 at 4:40 PM | Permalink | Reply

    Maybe it’s due to the new transparency policy?

  11. Posted Mar 25, 2009 at 5:13 PM | Permalink | Reply

    This looks more like a case where the file was copied to the directory, but the user id under which the web server is running does not have read access to the file.

    It is entirely possible that the person who copied the file was supposed to do a chmod +r filename on it, and forgot.

    It looks like they use dav to publish files to the web server. I have occasionally seen permissions being wrong when files were copied using the shared folders feature of Windows to a *nix server.
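
    A minimal Python sketch of the permission problem described above, assuming a POSIX file system: a file left without the world-read bit cannot be served by a web server running under a different user id, and setting that bit is the programmatic counterpart of chmod a+r. The helper names and the temporary file are hypothetical stand-ins; nothing here is taken from the UW server.

```python
import os
import stat
import tempfile


def is_world_readable(path):
    """Return True if the 'other' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def make_world_readable(path):
    """Add read permission for user, group and other, like `chmod a+r`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


if __name__ == "__main__":
    # Hypothetical stand-in for a freshly uploaded data file.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    os.chmod(path, 0o600)            # owner-only, as a careless upload might leave it
    print(is_world_readable(path))   # False
    make_world_readable(path)
    print(is_world_readable(path))   # True
    os.remove(path)
```

    A check like is_world_readable, run right after an upload, would catch this kind of slip before anyone tries to download the file.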

    OTOH, this is a peculiar coincidence.

    – Sinan

  12. Paul Penrose
    Posted Mar 25, 2009 at 5:37 PM | Permalink | Reply

    So, it’s either incompetence or intentional. Either way it does not look good.

  13. Posted Mar 25, 2009 at 6:19 PM | Permalink | Reply

    I had to add my own post to this one.

    No choice.

  14. Jason
    Posted Mar 25, 2009 at 6:39 PM | Permalink | Reply

    It certainly looks like a trivial unintentional error.

    What was intentional was their decision not to publicize the release of the data. Had they done so, this problem would have been caught rapidly, and they wouldn’t look stupid.

    • Steve McIntyre
      Posted Mar 25, 2009 at 8:33 PM | Permalink | Reply

      Re: Jason (#18), it’s a bit early to say whether it was an accident or not and we’ll never know.

      Also Steig did not email the various people who had outstanding requests for the data, which is something that most non-Team people would have done.

      • Jason
        Posted Mar 25, 2009 at 9:19 PM | Permalink | Reply

        Re: Steve McIntyre (#22),

        That’s my point. The simple act of emailing you would have made it look like they were trying to do the right thing (for once).

        Now, once they fix the permissions, even this small release of data will appear to be the result of your tooth pulling expedition.

        The team feels insulted that their results are treated with uncollegial suspicion. But actions like this (not notifying you about the data) almost demand such suspicion.

  15. Billy Ruff'n
    Posted Mar 25, 2009 at 7:53 PM | Permalink | Reply

    Whew! Steve, it doesn’t take you long to get back in the saddle. No jet lag. I’m impressed.

  16. Ryan O
    Posted Mar 25, 2009 at 8:19 PM | Permalink | Reply

    *itching to get those stupid permissions fixed . . . *

  17. rephelan
    Posted Mar 25, 2009 at 8:33 PM | Permalink | Reply

    Permissions fixed indeed. I just left a voicemail at UW IT demanding access to Professor Steig’s data. This is the url to their IT site.

    it’s 7:30 P.M. now. keep calling until someone answers. If anyone can find the 24 hour help desk, please post it.

  18. Jason
    Posted Mar 25, 2009 at 9:23 PM | Permalink | Reply

    A small thought to partially rebut my presumption that this was an innocent mistake:

    When they put the file there they set the last modified time on the file, effectively creating a record of when they updated the site.

    When they change the owner or group of the file, the last modified time will _not_ be updated.

    Somebody wishing to establish an early release date without actually releasing the data could do this deliberately.

    (But I still think it is much more likely that this was a stupid mistake… one that I have made many times)

    • Steve McIntyre
      Posted Mar 25, 2009 at 9:39 PM | Permalink | Reply

      Re: Jason (#24),

      that happened in 2003 with the Mann data. I had looked through his data directories and was unable to locate any proxy data. I emailed Mann who said that he had “forgotten” where the data was but that Rutherford would help me. Rutherford eventually directed me to a URL on Mann’s FTP site. We noticed defects in the data and asked Mann to confirm that this was the data actually used in MBH98. Mann said that he was much too busy to answer this question and not to communicate with him any further. So we wrote MM2003. This provoked a reaction with Mann saying that we had foolishly used the “wrong” data – climate scientists giggled that the maestro had taught us a lesson – and suddenly new data directories materialized timestamped much earlier which Mann said had always been available but we were too stupid to locate them. He deleted the “wrong” data without any change notice. He said that the errors occurred because we had asked to have the data spoonfed to us in Excel and the errors occurred in the spoonfeeding only – needless to say, I had not asked for data in Excel – even then I wasn’t using Excel – and obviously I wanted the data as used, not in an after-the-fact new version. The “community” was uninterested in the fact that we could document the emails showing that we had not asked for Excel. Also the idea of blaming the recipient for being sent the “wrong” data was a surprise to me. In business, the sender would have fallen over themselves apologizing, but not the Team. It was a real eye-opener for me and subsequently I’ve assumed nothing.

      The new data set proved very interesting as Mann inadvertently left a few scraps of Fortran code including the Fortran code showing the unreported short segment centering of the PCs, which has attracted so much attention. This was identified only through parsing the Fortran code and was nothing that one would have expected.

    • schnoerkelman
      Posted Mar 26, 2009 at 12:11 AM | Permalink | Reply

      Re: Jason (#24), Actually, one can sometimes see the time that the ownership was changed. Use ls -lc when getting the directory listing to see the change time on the inode. My ftp server on Solaris allows this option, others don’t.
      As a long time systems guy I would really give these guys the benefit of the doubt: it’s a very simple thing to forget and this kind of thing happens frequently.
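
      A small Python sketch of the distinction discussed above, assuming a POSIX system where st_ctime is the inode change time (on Windows it means something else): changing permissions leaves the modification time alone but updates the change time, which is what ls -lc reports. The function name and the three-day-old temporary file are made up for illustration.

```python
import os
import tempfile
import time


def timestamps_after_chmod(path, new_mode):
    """Return (mtime_before, mtime_after, ctime_after) around a chmod."""
    mtime_before = os.stat(path).st_mtime
    os.chmod(path, new_mode)
    after = os.stat(path)
    return mtime_before, after.st_mtime, after.st_ctime


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # Pretend the file contents were last written three days ago.
    three_days_ago = time.time() - 3 * 24 * 3600
    os.utime(path, (three_days_ago, three_days_ago))
    before, after, changed = timestamps_after_chmod(path, 0o644)
    print(before == after)   # modification time is untouched by chmod
    print(changed > after)   # inode change time is newer than the old mtime
    os.remove(path)
```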

      • ianl
        Posted Mar 26, 2009 at 2:32 AM | Permalink | Reply

        Re: schnoerkelman (#36),

        Then they’re really, really slow learners and I definitely do not want them re-organizing the world, thank you.

  19. rephelan
    Posted Mar 25, 2009 at 9:41 PM | Permalink | Reply

    OK, UofW is in Spring Break. There is no central operator after business hours and the University PD is unaware of any connection to IT services (nice guys, by the way, and they are REALLY looking forward to a quiet week!) There does not seem to be a 24 hour computer services hot-line.

    • Jobius
      Posted Mar 25, 2009 at 10:08 PM | Permalink | Reply

      Re: rephelan (#26),
      So the missing data was posted on a Sunday afternoon at the start of spring break, with the permissions set so downloads aren’t allowed, and no one available to fix the permissions. Requests for assistance from RC are sent down the bitbucket. I’m racking my brain trying to think how they could have made this look more suspicious. I mean, we wouldn’t know if they were wearing ski masks while uploading the data.

  20. Kazinski
    Posted Mar 25, 2009 at 10:37 PM | Permalink | Reply

    My son is a current student at UW. I had him logon to his account and attempt to get access. No dice. There was no request for a logon id. I was hoping it would check for a cookie showing he was currently logged in. He says they have a system that checks for enrollment in a specific set of classes that controls access to class websites, and data.

  21. Kazinski
    Posted Mar 25, 2009 at 10:57 PM | Permalink | Reply

    I should also note that Washington has some of the most liberal laws in the nation when it comes to FOIA requests. It has very strict timelines, and potential fines for stalling, for requests for data. The FOIA requests do not cover computer code, or methods. But data itself has to be delivered, forthwith, on request. And since the data is already compiled, they can’t claim it isn’t available.

    Here is a recent news story detailing some of the hot water public servants can get into not complying with FOIA requests:

    OLYMPIA — The man President Barack Obama selected to be his top deputy at the Department of Housing and Urban Development is leaving his last public job with a huge legal bill for violating Washington’s open government laws, in a case tied to Seattle’s taxpayer-financed NFL stadium and its biggest booster, billionaire Paul Allen.

  22. steven mosher
    Posted Mar 25, 2009 at 11:02 PM | Permalink | Reply

    As jeez points out, Steig just got back from Antarctica on 3/19. So if he posted the data on Sunday, within three days of returning from the cold of the South Pole, perhaps he had brain freeze.

  23. jeez
    Posted Mar 25, 2009 at 11:09 PM | Permalink | Reply

    I tried logging in as Legitimate/Researcher, but that didn’t work.

    • steven mosher
      Posted Mar 25, 2009 at 11:17 PM | Permalink | Reply

      Re: jeez (#33),

      perhaps if you spelled legitimate incorrectly it would work.

  24. rephelan
    Posted Mar 25, 2009 at 11:10 PM | Permalink | Reply

    Oh, he’s back? well, then… his home page, with phone numbers is

    give him a call. 206-685-3715 or the lab at 206-543-6327

    • RomanM
      Posted Mar 26, 2009 at 4:56 AM | Permalink | Reply

      Re: rephelan (#34),

      This pretty clearly appears to me to be a mistake in setting the permissions on the file. I don’t believe that it serves any purpose in encouraging harassment of anybody by posting phone numbers.

      • rephelan
        Posted Mar 26, 2009 at 8:01 PM | Permalink | Reply

        Re: RomanM (#40),

        You may be right and I may owe an apology… but the file was there for three days. Given a little interest, the permissions get adjusted. I’ve dealt with more than a few UNIX permission problems (and caused a number myself!)… but I quickly learned to be a lot more careful. I still make screw ups (as in perhaps even in this thread) and expect to get dinged. The phone numbers weren’t secret.

        By the way, I love your work. keep it up. You are teaching me something new with every post.

  25. Nemo
    Posted Mar 26, 2009 at 3:10 AM | Permalink | Reply

    I constantly must remind myself not to attribute to maliciousness that which can be explained by incompetence. But, as Paul (16) said, either way, it does not look good. Mr McIntyre, Thank you Sir, for all of your hard work. Nemo

  26. Posted Mar 26, 2009 at 4:42 AM | Permalink | Reply

    Sorry, all you need is an FTP client, which is what was presumably used to post the file, to change the permissions. All this talk of spring break and IT support is rubbish.

    I’m also a student at UW, taking an online class. I’ve never needed to call anyone to post and change a file on my section of the servers. Steig, being faculty, would always have access, as I imagine would any students assisting with the research. If this was an innocent mistake originally, it has now become blatant incompetence or petty smugness in the ants all running around trying to access the data.

  27. curious
    Posted Mar 26, 2009 at 5:07 AM | Permalink | Reply

    FWIW – I’d suggest one of the steps in posting data should be to check it is accessible. And echo the comments above re: courtesy notifications to those who have requested it.

  28. Steve McIntyre
    Posted Mar 26, 2009 at 6:59 AM | Permalink | Reply

    Regardless of the underlying reason for the data being password protected, let’s compare Gavin’s response in this instance to the Harry problem. At dinnertime on Superbowl Sunday, errors in the Harry data set were reported here. By the next morning, Gavin had taken steps to expunge the faulty Harry data from the READER data set, blaming me for failing to notify British Antarctic Survey on Sunday evening (though I was then preoccupied with other matters, not just football).

    Or Mann’s alterations of the Mann 2008 SI. Gavin coordinated matters with Mann with military precision. We raised problems with Mann’s SI here. I happened to check Mann’s SI at about 11.30 am one day. Mann changed the SI about 7-8 minutes later (by sheer coincidence) and Gavin appears to have been aware of the change by 12.15, less than 45 minutes later, though the change was not publicly announced and no change notice was issued.

    The lock on the Steig data has been more prominently reported here than the problems with Harry. Gavin is specifically aware of the lock as he’s deleted a Jeff inquiry on the matter from RC. For some reason, Gavin has not moved with the same speed to unlock the Steig data as he moved to expunge the “wrong” Harry data from the READER data set (an expungement, which, whatever the “good” reasons for doing so would have had the practical effect of making reconciliations much more difficult if, for some reason, I had failed to back up the “wrong” data.)

  29. Hu McCulloch
    Posted Mar 26, 2009 at 8:32 AM | Permalink | Reply

    The restricted access cloudmaskedAVHRR.txt file sounds like it would comply with item #5 of my 3/17 request to Steig and Comiso for their data and protocols, were it publicly available. I’m willing to assume that it was only accidentally blocked, and that Steig will correct this oversight in the very near future.

    However, in order for other researchers to be able to reconstruct the cloudmaskedAVHRR file itself, it is still necessary for Steig and Comiso to release their pre-cloud-masking AVHRR file. Jeff Id (I think) has pointed out that cloud masking has to be done on at least a daily basis, so this file would be considerably larger than I assumed in my e-mail, about 365×300 X 5509, or about 6 GB, even assuming it was done after spatial aggregation. I should therefore have asked for this larger file in my item #3.

    But is it possible that the cloud masking was done in even greater detail, on an even smaller grid than 50KM, perhaps using the raw data itself? And are there several observations per day that could be individually masked? Highly detailed cloudmasking could rescue days that are only partly cloudy, since then the daily average could be taken over the unmasked locations. We might be talking petabytes here!

  30. Raven
    Posted Mar 26, 2009 at 8:37 AM | Permalink | Reply

    I can get at the file now.

  31. Frenchie77
    Posted Mar 26, 2009 at 8:38 AM | Permalink | Reply

    Just checked site, data seems to be available to me (it’s a 25 meg file). Please leave another comment if still not accessible.

  32. Posted Mar 26, 2009 at 8:40 AM | Permalink | Reply

    Dr. Steig responded to my email and fixed the permissions.

  33. Luis Dias
    Posted Mar 26, 2009 at 8:49 AM | Permalink | Reply

    Don’t know who’s more conspiratorial: some commenters (and authors) at CA who apparently start writing suspense novels all of a sudden, or some scientists who try desperately to keep up the impression that all is perfect in scienceland by using childish tactics to keep inquirers from finding errors in their perfect theses.

    I really don’t. Perhaps it’s a human thing, and I’m ironically also being paranoid as well.

  34. Posted Mar 26, 2009 at 9:01 AM | Permalink | Reply

    RE #45,
    Thanks, Frenchie! The link works for me now as well.

  35. Steve McIntyre
    Posted Mar 26, 2009 at 9:13 AM | Permalink | Reply

    I’ve noted up in the head post that the data is now online. Here is a download script. There is no missing data in the file. It is expressed in deg K. monthly from 1982 to end 2006.

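    grid=scan("temp.dat",n= -1) # temp.dat: assumed local copy of cloudmaskedAVHRR.txt
    length(grid)/5509 #300
    avhrr=ts(matrix(grid,ncol=5509,byrow=TRUE),start=c(1982,1),freq=12) # 300 months x 5509 cells
    tsp(avhrr) #[1] 1982.000 2006.917 12.000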

    • Kenneth Fritsch
      Posted Mar 26, 2009 at 10:23 AM | Permalink | Reply

      Re: Steve McIntyre (#49),

      A little OT, but Steve M’s use of scan to read the data from a large file is something I learned by attempting a brute force read.table command for reading the Steig et al. 50-megabyte TIR file. Actually read.table works, but takes orders of magnitude longer than doing a scan and then putting the scanned data into the correct form.

      Not sure that anyone other than Kenneth will benefit from this experience, but confession is always good for the soul.

  36. RomanM
    Posted Mar 26, 2009 at 9:15 AM | Permalink | Reply

    Perhaps a new thread on this data set may be appropriate?

    This data is not in anomaly format (easy to fix):

    calc.anom =function(tsdat) {
    anom = tsdat
    for (i in 1:12) { sequ = seq(i,nrow(tsdat),12)
    anom[sequ,] = scale(tsdat[sequ,],scale=F) }
    anom }

    stav.anom = ts(calc.anom(avhrr),start=1982,freq=12)

    where avhrr would be the original 300 x 5509 masked data set from the site.
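
    For readers outside R, the same anomaly step can be sketched in plain Python, as an illustration rather than anyone's actual processing: each column has the mean of its own calendar month subtracted. The function name is hypothetical, and the input is assumed to be a list of monthly rows starting in a January, as in the 300 x 5509 layout.

```python
def monthly_anomalies(rows):
    """Subtract each calendar month's mean from every column.

    rows is a list of monthly records (lists of equal length), with the
    first record being a January, mirroring a 300 x 5509 layout.
    """
    n_cols = len(rows[0])
    anomalies = [list(r) for r in rows]
    for month in range(12):
        idx = range(month, len(rows), 12)   # all Januaries, all Februaries, ...
        if len(idx) == 0:                   # fewer than 12 rows supplied
            continue
        for col in range(n_cols):
            values = [rows[i][col] for i in idx]
            mean = sum(values) / len(values)
            for i in idx:
                anomalies[i][col] = rows[i][col] - mean
    return anomalies
```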

    Am looking at it now….

  37. Posted Mar 26, 2009 at 9:59 AM | Permalink | Reply

    RE RomanM, #50, this works in MATLAB:

    avhrr = dlmread('C:\\website\agw\steig09\cloudmaskedAVHRR.txt');
    smean = zeros(12, 5509);
    for m=1:12
      smean(m,:) = mean(avhrr(m:12:300,:));
    end

  38. Jeff C.
    Posted Mar 26, 2009 at 10:17 AM | Permalink | Reply

    Yes, it looks like the real thing. Here is the trend plot from 1982 to 2006 followed by the same from Figure S1d of the SI. I get a linear trend of 0.2609 deg C/decade. Looks like productivity will be shot today at my real job.
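
    How a per-decade trend of this kind can be obtained from a monthly series is sketched below in Python, assuming an ordinary least-squares fit against time in years. This is only an illustration with a hypothetical function name; the comment above does not say which fitting method was used, and the sketch does not reproduce the 0.2609 figure.

```python
def trend_per_decade(monthly_values):
    """Ordinary least-squares slope of a monthly series, in units per decade."""
    n = len(monthly_values)
    t = [i / 12.0 for i in range(n)]          # time in years from the first month
    t_mean = sum(t) / n
    y_mean = sum(monthly_values) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, monthly_values))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return 10.0 * sxy / sxx                   # slope per year times ten
```

    Applied to the mean of the 5509 columns for each of the 300 months, this would give a continental trend in deg C per decade.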

  39. romanm
    Posted Mar 26, 2009 at 11:15 AM | Permalink | Reply

    There are quite a few questions still to be answered here.

    How does their data relate to the 0200 and 1400 data sets? Relating their data to that from the Jeffs and Ryan is complicated by the fact that the gridding pattern isn’t the same (although most of them can be linked to a unique nearest neighbor).

    How did they infill the months where ALL of the satellite data was missing? One approach would have been to consider these months as part of the “reconstruction” process rather than infilling.

    How did they apply the process for removal of the +/- 10 extreme values? There are still about 106 values (87 + and 19 – ) exceeding that limit.

    My initial look at a three PC recon from the anomaly data did not duplicate their reconstruction post 1982. The three PCs account for 14% of the variability in the data with no obvious cutoff for why 3. There is small gap to the next eigenvalue but nothing to write home about.

    Lots of fun!

    • Posted Mar 26, 2009 at 11:33 AM | Permalink | Reply

      Re: romanm (#56),

      I’m not surprised, I’ve done it for the NSIDC version and the Eigenvalues are strong past even 10.

    • Ryan O
      Posted Mar 26, 2009 at 11:41 AM | Permalink | Reply

      Re: romanm (#56), Yep!

    • Jeff C.
      Posted Mar 26, 2009 at 11:42 AM | Permalink | Reply

      Re: romanm (#56),

      How does their data relate to the 0200 and 1400 data sets?

      Here is a comparison of the continental mean (all 5509 points) for the Steig AVHRR data vs. the 0200/1400 AVHRR data.

      They look similar, but the Steig version looks heavily filtered. We have been focusing on the smearing effect of using only 3 PCs. I think the +/- 10 deg C “enhanced cloud mask” is much more of a factor in this reconstruction than we had realized.

      • Posted Mar 26, 2009 at 12:02 PM | Permalink | Reply

        Re: Jeff C. (#60),

        The Steig version looks like data, our NSIDC data looks like a kid with a pen.

        I just did a post on a quick PCA analysis of the NSIDC data which shows strong contamination from ocean pixels. I had it mostly done this morning but didn’t get time to put it up. Here’s the link.

      • RomanM
        Posted Mar 26, 2009 at 12:19 PM | Permalink | Reply

        Re: Jeff C. (#60),

        I was able to do what you did, but that doesn’t give any information about the process by which they calculated the individual grid sequences. I intend to match some grid points and do a similar comparison at each point.

        I don’t know if you are aware of exactly how the two gridding patterns differ. I noticed it when I converted the coordinates to the “polar view” using a script I posted a while ago:

        trans.spole = function(lat,lon,R=1){
        crad = pi/180
        x = R*sin(crad*(90-lat))*sin(crad*lon)
        y = R*sin(crad*(90-lat))*cos(crad*lon)
        list(x = x, y = y)}

        The pattern used by Steig is in the order: start at the upper left corner and move (basically straight) down to the bottom, then step one step to the right and repeat, continuing to the last point in the lower right corner.

        Your pattern is to start at the top and go from left to right, then take one step down, go left to right continuing this way until the bottom right is reached as well.

        In the original coordinates, this isn’t obvious. You can see this by plotting a sequence of points with increasing size cex in both your and Steig’s orders. Because of minor differences the pairwise matchup isn’t quite exact.

        • Posted Mar 26, 2009 at 12:28 PM | Permalink

          Re: RomanM (#63),

          I did it wrong first, mixing the coords for NSIDC with the recon data. Not pretty, the trends get mashed all over the continent like a TV with a bad raster signal.

        • Jeff C.
          Posted Mar 26, 2009 at 1:37 PM | Permalink

          Re: RomanM (#63), I followed the same order as that in the AVHRR files, that is why across (x) is in the inner loop and down (y) is in the outer loop. I don’t know why Steig’s would be different.

          As you pointed out, the points don’t lie right on top of each other and the cell shape doesn’t exactly match either. I had to average four 25 x 25 km cells to get one 50 x 50 km cell and recalculate the lat-long. Here is an xy polar plot of the center of the grid for my coordinates and Steig’s.

          Here is the code I used, perhaps I made an error in the coordinate transforms. If you see a problem, let me know and I can easily re-run it.

          #### Calculate lat lon for 5509 cells ######

          #read in lat-lon and parse to continent/shelves only
          # cell size enlarged to 50 x 50 km


          #convert to polar xy coordinates for averaging

          #x average four points to one
          #doesn’t use last row or first column to match steig grid
          x_avg=(x1+x2+x3+x4)/4 #x in 160×160 grid
          rm(x,x1,x2,x3,x4) #remove variables from workspace

          #y average four points to one
          #doesn’t use last row or first column to match steig grid
          y_avg=(y1+y2+y3+y4)/4 #y in 160×160 grid
          rm(y,y1,y2,y3,y4) #remove variables from workspace

          #convert back to lat-lon, combine to one variable

        • RomanM
          Posted Mar 26, 2009 at 2:00 PM | Permalink

          Re: Jeff C. (#65),

          I don’t think that it is as much a problem that the cells don’t overlap exactly as it is that the order in which the grid values appear in the final product is different making it harder to identify which Steig grid record is related to which of your grid records.

          My point was that your order (looking at the picture in your comment) for the results of the 64 squares would be 1 – 8 is first row, 9 – 16 is second row, etc. Steig’s 1-8 is the first column, 9-16 is the second column, etc. If the entire array was rectangular, then it would be trivial to rearrange one or the other of the listings so that the same grid point result occupied the same position in both making comparisons easier.

          As it is, the array is ragged and matching is more difficult. Even finding the closest Steig neighbor for each of your gridpoints does not provide a one-to-one matching. I can still look at smaller groups to see what was used and how it compares. No need to spend more time on it than it is worth.

        • Jeff C.
          Posted Mar 26, 2009 at 2:29 PM | Permalink

          Re: RomanM (#67), Off the top of my head I don’t think I can easily re-arrange them to follow Steig’s pattern. Even if I could, since the grids don’t exactly match, the number of cells in a given row or column might be different from Steig’s and throw the whole sequence off. It might be easier to write some code to find the closest match; there will be some error, but the worst case would be off by a maximum of only 35 km.

          It does bother me that my grid looks different from Steig’s since they came from the same original source. I could understand a fixed offset in x or y, but I don’t understand why the cell shape would be distorted as seen in the plot in #65.
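
          The closest-match idea can be sketched in a few lines of Python, assuming both grids have already been converted to polar x-y coordinates in km. This is a brute-force illustration with a hypothetical function name, not the code either of us ran, and as noted above the result need not be one-to-one.

```python
import math


def nearest_matches(points_a, points_b):
    """For each (x, y) in points_a, return (index into points_b, distance)."""
    matches = []
    for ax, ay in points_a:
        best = min(range(len(points_b)),
                   key=lambda j: math.hypot(ax - points_b[j][0], ay - points_b[j][1]))
        bx, by = points_b[best]
        matches.append((best, math.hypot(ax - bx, ay - by)))
    return matches


if __name__ == "__main__":
    # Worst case for a 50 km square grid: a point at a cell corner is half
    # a diagonal from the nearest centre.
    print(round(25 * math.sqrt(2), 1))  # 35.4
```

          The 35 km worst case is just half the diagonal of a 50 x 50 km cell.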

        • RomanM
          Posted Mar 26, 2009 at 3:06 PM | Permalink

          Re: Jeff C. (#69),

          Not a problem. I already did the nearest match program.

          Who knows how Steig did it? I don’t think that the slight difference between the two is a problem.

  40. Patrick M.
    Posted Mar 26, 2009 at 11:33 AM | Permalink | Reply

    Thank you, Dr. Steig, for allowing the data to be accessed.

    (a little positive reinforcement never hurts)

  41. Posted Mar 26, 2009 at 12:02 PM | Permalink | Reply

    RE Jeff C, #52, note that the trends are naturally a smooth function of location. The same is true when you plot the Kelvin means and the standard deviations. It is therefore not clear what is gained by rank reduction, unless RegEM bogs down if you give it too big a matrix.

  42. Jeff C.
    Posted Mar 26, 2009 at 1:41 PM | Permalink | Reply

    Ooops, I got my labeling wrong in the last plot. My coordinates are in blue, Steig’s are in red.

  43. Ryan O
    Posted Mar 26, 2009 at 2:02 PM | Permalink | Reply

    Here are the average trends for the cloudmasked data, with the satellites used overlaid.

    There are some oddities:
    1. The AVHRR data starts in July, 1981. Steig’s data doesn’t start until 1982.
    2. The AVHRR data from NSIDC ends on June 30th, 2005 – not Dec 2006. The 25-km data, which is simply reprocessed 5-km data, ends in 2000. My guess is the U of Wisc data set started as 5-km data, was processed through CASPR, and saved as 25-km data to minimize the size of the set. So this leaves the question: Where did the data past the purple line in the graph above come from?
    3. The transition between NOAA-11 and NOAA-14 just looks weird. In fact, all of NOAA-11 looks weird.

    • Posted Mar 26, 2009 at 3:12 PM | Permalink | Reply

      Re: Ryan O (#68),

      Thanks, I wondered where the satellites overlapped. There were also several months missing in the NSIDC data that look to be infilled.

      • Ryan O
        Posted Mar 26, 2009 at 3:59 PM | Permalink | Reply

        Re: Jeff Id (#73), Not only that, as Roman commented, there’s a heck of a lot of infilling. This data is post-mask. Lots of data was thrown away by masking – yet each series is complete.
        And I’ll be damned if I can figure out how they can cloud mask 300 rows of data when the raw AVHRR data starts with only 282 (using the 1982 start date).

        • Jeff C.
          Posted Mar 26, 2009 at 4:40 PM | Permalink

          Re: Ryan O (#76),

          Lots of data was thrown away by masking – yet each series is complete.

          Keep in mind that the masking was daily, not monthly. They could have thrown away huge amounts of daily data and still calculated a monthly value. My guess is that this is exactly what they did and the infilling was limited to where no data exists (e.g. late 1994).

          I have been playing around with masking the NSIDC (UWisc) data. I only have the monthly data so I can’t apply a daily mask as Steig did. We know that the daily mask is +/- 10 deg C. I think this means that the monthly values shouldn’t exceed +/- 10 deg C. If every day was right at 10, the average would be 10. If a few exceeded 10, they would be thrown out and the average would still be 10. In reality, some are less than 10, some are more than 10. Those above 10 are thrown out, those below 10 are kept and the average is less than 10. Using that convoluted thinking, I tried various threshold levels to see what I got. Using +/- 6 deg C gives something that looks relatively close to Steig’s AVHRR data.

          I think Ryan is right, huge amounts of daily data were thrown out (>25% of the data set) due to the +/- 10 deg C daily threshold. However, they could still calculate monthly means using the days remaining.
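
          A minimal sketch of that reasoning, using synthetic daily anomalies rather than any real AVHRR values. The function name and the numbers are made up for illustration; the only thing taken from the discussion is the +/- 10 deg C daily threshold:

```python
import random

def masked_monthly_mean(daily_anomalies, threshold=10.0):
    """Drop days whose anomaly exceeds +/- threshold, then average what is left.

    Returns (mean, fraction_discarded); mean is None if every day was masked.
    """
    kept = [a for a in daily_anomalies if abs(a) <= threshold]
    discarded = 1.0 - len(kept) / len(daily_anomalies)
    if not kept:
        return None, discarded
    return sum(kept) / len(kept), discarded

random.seed(0)
# Synthetic month: a warm spell (mean +8 deg C) with large day-to-day scatter.
days = [random.gauss(8.0, 6.0) for _ in range(30)]
mean, frac = masked_monthly_mean(days)
print(mean, frac)
```

          Whatever the daily scatter, the monthly mean of the surviving days cannot exceed the threshold, and a month still gets a value as long as at least one day survives the mask.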

          Throwing out daily data that exceeds +/- 10 deg C seems problematic to me. I live in a mild climate in coastal California and it is not that uncommon to have daily temps exceed the average by +/-10 deg C. Doesn’t this happen in Antarctica also?

        • Ryan O
          Posted Mar 26, 2009 at 5:04 PM | Permalink

          Re: Jeff C. (#77), Cloud cover regularly exceeds 75% of the time over parts of West Antarctica and the coasts – so I guess I was expecting to see gaps.

          I wouldn’t think someone would throw away that much of the data and still calculate a monthly mean. To be honest, I was expecting the “raw” data to be in a daily format because of that.

        • Ryan O
          Posted Mar 26, 2009 at 7:08 PM | Permalink

          Re: Jeff C. (#77), Hey Jeff . . . does UWisc have Arctic data? We could use that to see if there are instrumentation issues. If the Arctic data shows the same irregularities, that would help confirm instrumental drift/calibration errors.

        • Jeff C.
          Posted Mar 26, 2009 at 11:49 PM | Permalink

          Re: Ryan O (#81),

          does UWisc have Arctic data?

          Yes, they do have the Arctic data in addition to the Antarctic data. We can use many of the same scripts we used for the Antarctic set to process it. I’ll look into pulling it together.

    • Geoff Sherrington
      Posted Mar 26, 2009 at 6:19 PM | Permalink | Reply

      Re: Ryan O (#68),

      More potential problems:

      The radiometers are designed to operate within specification for a period of three years in orbit.

      NOAA 11 looks consistent with a decline in performance, though we don’t know what adjustment was done. If it did decline, then splicing to NOAA 14 in 1996 would explain the sudden jump.

      The three channels operating entirely within the infrared band are used to detect the heat radiation from and hence, the temperature of land, water, sea surfaces, and the clouds above them.

      What treatment is given to areas below cloud? For that matter, is it known in the Antarctic if the surface temperature is higher or lower under cloud? That is, if one ignores cloud covered areas in calculating an average temperature, is that average sufficiently accurate or is it biased?

      AVHRR3 used channel 1 for daytime cloud and ice mapping. It had a signal:noise ratio of 9:1. This is not good and it places a separate limit on interpretation and final error calculation. There is an additional subjective error correction when a pixel is for example, part cloud and part ice.

      The errors of discrimination and temperature estimation are different for flat as opposed to mountainous terrain, including effects such as albedo in shadows.

      A cloud-free mosaic map was made. The persistence of cloud required many scenes to make a composite of the Antarctic. It has to be assumed that the temperature did not stay constant from one pass to another.

      “The satellite images used in the mosaic were acquired by the Advanced Very High Resolution Radiometer (AVHRR) sensors on the National Oceanic and Atmospheric Administration (NOAA) satellites; the images used in the mosaic were collected during the period 1980 to 1994. Although the AVHRR scans a 2400 km-wide swath and can image nearly half of the continent of Antarctica on a single orbit, 63 sections of 38 scenes were needed to compile the nearly cloud-free digital mosaic.”

      These URLs give information on map projections as well. It is possible that Dr Steig used the same projection as NOAA, which is already described.

      Even to get a composite look at the Continent, post-signal processing was done:

      When the entire mosaic was completed band 1 (visible, 0.58-0.68 micrometers) and band 2 (near-infrared, 0.725-1.100 micrometers) were averaged, and the entire digital mosaic was enhanced using a 91 by 91 pixel spatial filter; these results were then contrast stretched

      While this might be OK to make a pretty picture, it urges caution in catching “adjustments” (how I hate that word) to the temperature proxy data.

      One could nit-pick all day.

    • Steve McIntyre
      Posted Mar 27, 2009 at 11:50 AM | Permalink | Reply

      Re: Ryan O (#68),

      Ryan O, there are some interesting discussions of NOAA-11 in connection with MSU data. I wonder how and whether the AVHRR dealt with that sort of issue.

      • Ryan O
        Posted Mar 27, 2009 at 1:40 PM | Permalink | Reply

        Re: Steve McIntyre (#98), and Re: curious (#97),
        Steve: I don’t know if the root cause of the MSU issue with NOAA-11 could cause problems with the AVHRR instrument – but I also don’t know that it doesn’t. As you know, NOAA-11 had several issues (so did NOAA-9). I’m trying to slog through the documentation on the processing/calibration to see if it could have caused something to go awry with the AVHRR. So far, nothing conclusive.
        Curious: Ch 1 and 2 are not used at all for the reconstruction. Steig used Ch 3, 4, and 5 which have an entirely different calibration method. So while Ch 1 and 2 are interesting – I had no idea how rapidly the instrumentation could degrade – they don’t really have anything to do with Steig.
        On another note, I checked the satellite temps (raw, not anomalies) vs. the AWS and manned station temps. Here’s what that looks like for all 100 stations with data post-1982:

        I’d bet dollars to donuts that the ~ 4-5 different slopes contained in the overall picture have geographical significance. In fact, I know at least one of them does. The group of points extending down to 200K with the steeper slope correspond to the Antarctic interior (Amundsen-Scott, et al.).
        Now I need to 1) separate the geographical areas and look for systematic differences in the temperature relationship; 2) do the same thing by time/satellite; 3) do the same thing by season. All of this will help point to the reason why the satellite data and ground data differ so much.

  44. OldUnixHead
    Posted Mar 26, 2009 at 2:34 PM | Permalink | Reply

    Did anyone happen to capture a web-page view of the ‘data’ directory listing before the ‘cloudmaskedAVHRR.txt’ file was unlocked? I may be just getting foggy in my dotage, but I thought that the file size was 50MB when I was looking ~0850 EDT. I note that it is currently 25MB. Just wanted to confirm one way or the other. Thx

    Steve: Jeff had a screenshot here and it was 25 MB. So you can relax on this.

  45. cba
    Posted Mar 26, 2009 at 3:13 PM | Permalink | Reply

    Steve, et al, I was able to download the cloudmaskedavhrr file this afternoon. It was 25.8 MB in size – about 300 rows by over the excel column limit. This was on the work computer here which has an edu address. I didn’t see if the comments here indicated the file is now accessible but my avail. time here is often quite limited.

    If the file is still not accessible, I can probably zip it and email it to someone from my house but I’ve got limited ability for email size and can’t guarantee that this will work or if I can do it more than once or twice.

    • Steve McIntyre
      Posted Mar 26, 2009 at 3:55 PM | Permalink | Reply

      Re: cba (#74), if you read the above comments, we’ve all got it now.

  46. Posted Mar 26, 2009 at 6:33 PM | Permalink | Reply

    RE 65, 66, 72,
    Steig’s grid in fact has a spacing of about 50.5km. I don’t know where he got this, or if it derives from the AVHRR grid, but it’s definitely not 50km. It’s not exactly 50.5, but using this value gives nice integers after rounding, while 50 doesn’t.
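
    One way to check a candidate spacing is to divide the projected coordinates by it and see how far the results sit from whole numbers. A minimal sketch under stated assumptions: the coordinates below are a synthetic stand-in (not Steig's grid), and `integer_misfit` is a hypothetical helper:

```python
def integer_misfit(coords_km, spacing_km):
    """Mean absolute distance from the nearest integer after dividing by the spacing."""
    scaled = [c / spacing_km for c in coords_km]
    return sum(abs(s - round(s)) for s in scaled) / len(scaled)

# Synthetic stand-in for projected grid coordinates (km); NOT Steig's actual grid.
true_spacing = 50.5
coords = [i * true_spacing + 0.3 * ((-1) ** i) for i in range(-40, 41)]

for candidate in (50.0, 50.5):
    print(candidate, integer_misfit(coords, candidate))
```

    The spacing with the smaller misfit is the one that "gives nice integers after rounding".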

  47. Posted Mar 26, 2009 at 8:07 PM | Permalink | Reply

    I’ve just done a PCA analysis, eigenvectors and a comparison to the reconstructed sat data.

    I don’t think this is the actual data used for the paper. It’s close but it doesn’t seem to be the original.

  48. Hu McCulloch
    Posted Mar 26, 2009 at 9:52 PM | Permalink | Reply

    RE Jeff ID #83 –

    See my comment on your site.

  49. Posted Mar 26, 2009 at 10:24 PM | Permalink | Reply

    Thanks much Hu, I made the changes. It was very confusing to me to see all the last 12 weights at zero. I’ve been thinking a lot about the third PC in RegEM where it drops to near zero at the reconstruction point. It seems like 1 deg of freedom is lost somewhere as well.

    What bugs me is that many of the PCA runs I have done seem to have 0 for the last value by itself and it seems like they might be related problems. The reconstruction claims 3 PCs but really only has two.

  50. VG
    Posted Mar 26, 2009 at 11:44 PM | Permalink | Reply

    Yes, thank you Dr Steig for this. Although I (the writer) am rapidly becoming a complete denier of AGW, we hope this may re-establish the credibility and standing of your paper, whatever the outcome after thorough analysis of the missing data.

  51. VG
    Posted Mar 26, 2009 at 11:54 PM | Permalink | Reply

    I give up, after eyeballing the above charts, it seems there is no meaningful data after 2000-2005? What’s the point if Antarctica has been cooling since then?

  52. Hu McCulloch
    Posted Mar 27, 2009 at 7:59 AM | Permalink | Reply

    RE RyanO #68, Geoff S #79,

    Didn’t they maintain a period of overlap every time they changed satellites, in order to recalibrate the deteriorating old satellite? This sounds as bad as the NWS replacing Stevenson screens with MMTS back in the 80s without establishing a good set of overlapping observations.

    • Ryan O
      Posted Mar 27, 2009 at 9:45 AM | Permalink | Reply

      Re: Hu McCulloch (#89), No, that’s not how the AVHRR instruments are calibrated. The only overlap calibration appears to be done on Ch 1 and 2, which are not useful for temperature analysis (but are used for albedo measurements). The description is below:

      The post-launch degradation of the visible (channel 1 : 0.58-0.68 μm) and near-infrared (channel 2 : 0.72-1.1 μm) channels of the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7, -9, and -11 Polar-orbiting Operational Environmental Satellites (POES) was estimated using the south-eastern part of the Libyan desert as a radiometrically stable calibration target. The relative annual degradation rates, in per cent, for the two channels are, respectively : 3.6 and 4.3 (NOAA-7) ; 5.9 and 3.5 (NOAA-9) ; and 1.2 and 2.0 (NOAA-11). Using the relative degradation rates thus determined, in conjunction with absolute calibrations based on congruent path aircraft/satellite radiance measurements over White Sands, New Mexico (U.S.A.), the variation in time of the absolute gain or ‘slope’ of the AVHRR on NOAA-9 was evaluated. Inter-satellite calibration linkages were established, using the AVHRR on NOAA-9 as a normalization standard. Formulae for the calculation of calibrated radiances and albedos (AVHRR usage), based on these interlinkages, are given for the three AVHRRs.

      For Ch 3, 4, and 5 – which are the channels of interest – the calibration appears to be solely pre-launch:

      The pre-launch calibration relates the AVHRR’s output, in digital counts, to the radiance of the scene. (In pre-launch tests, the scene is represented by the laboratory blackbody.) The calibration relationship is a function of channel and baseplate temperature. For channel 3, which uses an InSb detector, the calibration is highly linear. However, as channels 4 and 5 use HgCdTe detectors, their calibrations are slightly nonlinear.
      To characterize the calibration when the AVHRR is in orbit, the only data available are those acquired when the AVHRR views space and the internal blackbody. This gives two points on the calibration curve, sufficient to determine only a straight-line approximation to the calibration. The linear approximation is what is applied to determine scene radiances. Scene brightness temperatures are then derived via the temperature-to-nonlinearity look-up table described in Appendix A. The methods for handling the nonlinearity will be discussed later in this section.
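
      The two-point scheme in that quote amounts to drawing a straight line through the space view and the internal blackbody view. A minimal sketch, with made-up counts and radiances that are not real AVHRR calibration values (function names are hypothetical as well):

```python
def two_point_calibration(counts_space, counts_blackbody,
                          radiance_space, radiance_blackbody):
    """Return (gain, offset) of the straight line through the two in-orbit points."""
    gain = (radiance_blackbody - radiance_space) / (counts_blackbody - counts_space)
    offset = radiance_space - gain * counts_space
    return gain, offset

def counts_to_linear_radiance(counts, gain, offset):
    """Apply the straight-line approximation to a scene measurement."""
    return gain * counts + offset

# Hypothetical numbers for illustration only; NOT real AVHRR calibration values.
gain, offset = two_point_calibration(
    counts_space=990.0, counts_blackbody=400.0,
    radiance_space=0.0, radiance_blackbody=95.0,
)
scene_radiance = counts_to_linear_radiance(600.0, gain, offset)
print(gain, offset, scene_radiance)
```

      Everything between and beyond the two anchor points is inferred from the line, which is why a separate nonlinearity correction is needed for Ch 4 and 5.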

      A description of the non-linearity correction for Ch 4 & 5 is here:

      With the launch of NOAA-13, NESDIS changed its derivation of the non-linearity correction in the calibration of AVHRR Channels 4 and 5. The linear calibration now uses a negative, non-zero value for the radiance of space, instead of the former value of zero. This method makes the dependence of the correction terms on the internal calibration target negligible.
      NESDIS continues to supply tables of brightness temperature correction terms for the non-linearity. These correction terms are valid only when applied to “linear” brightness temperatures based on the negative radiance of space. Since the correction terms no longer vary with the internal calibration target temperature, the user does not need to interpolate on the internal calibration target temperature. Otherwise, the user applies the non-linearity corrections as before.
      NESDIS also supplies an alternate method of handling the non-linearity which can be applied to radiances instead of brightness temperatures. For each instrument and for each channel, three coefficients (A, B, and D) of a quadratic equation are supplied in Section 1.4 for all spacecraft from NOAA-13 on. The following quadratic equation can be used to compute the corrected radiance, RAD from the “linear” radiance, Rlin:
      $RAD = A \cdot R_{lin} + B \cdot R_{lin}^2 + D$
      This new treatment of the non-linearity plot corrections should be an improvement over the previous method because 1) it is less sensitive to noise in the thermal/vacuum test data, 2) it gives the user a choice of correcting either the radiance or the brightness temperatures, and 3) it is being applied retrospectively in the NOAA/NASA Pathfinder program (see URL: for more information) to generate a consistent time series of AVHRR radiances from 1981 to the present for use in studies of climate change. Making the same method operational at NESDIS will eliminate a source of inconsistency between the Pathfinder dataset and future observations.
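
      The quadratic correction quoted above is easy to write down. A minimal sketch; the coefficients here are placeholders chosen for illustration and are not values from the NESDIS tables:

```python
def corrected_radiance(r_lin, a, b, d):
    """RAD = A * R_lin + B * R_lin**2 + D, as in the quoted NESDIS text."""
    return a * r_lin + b * r_lin ** 2 + d

# Placeholder coefficients, NOT values from the NESDIS tables.
A, B, D = 0.92, 0.0005, 3.0

for r_lin in (20.0, 60.0, 100.0):
    print(r_lin, corrected_radiance(r_lin, A, B, D))
```

      The real A, B and D differ by instrument and channel, so a correction like this would have to be applied satellite by satellite.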

      This had been applied retroactively to AVHRR data prior to NOAA-13 in about the 2003 timeframe.
      The basic point, though, is that the thermal infrared channels have only a single post-launch calibration method: taking 2 points on the calibration curve by viewing an internal blackbody and then viewing space. The rest of the calibration curve is inferred from these 2 points. Because the satellites cannot look at each other’s internal blackbodies, there is no overlap.

      • curious
        Posted Mar 27, 2009 at 11:02 AM | Permalink | Reply

        Re: Ryan O (#93), Hi Ryan – this might be a basic question or have been covered elsewhere but is there any systematic checking of the satellite temp. reading against a surface measure at some known location on earth? The info. in your post mentions the Libyan desert and White Sands for channel one and two – presumably it would not be too demanding to do something similar relative to a suitably located surface station? I’m struck by the relatively high % degradation rates quoted for channel 1 and 2 and think the impact of these on temp. readings would be significant in terms of the scale of temp. trends being discussed. Thanks for any info and sorry if this has been covered elsewhere and I’ve missed it.

  53. Posted Mar 27, 2009 at 8:49 AM | Permalink | Reply

    Jeff C ran a correlation vs distance plot on our new data. It’s basically over 0.7 for the entire continent.

    It guarantees that surface station information will be blended across the entire continent.
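
    For anyone wanting to repeat the check, a correlation vs distance calculation can be sketched as follows. This is not Jeff C's or SteveM's code: the haversine formula stands in for a great-circle distance routine, and the three grid cells are toy data:

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_vs_distance(cells):
    """cells: list of (lat, lon, anomaly_series). Returns (distance_km, r) per pair."""
    pairs = []
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            lat1, lon1, s1 = cells[i]
            lat2, lon2, s2 = cells[j]
            pairs.append((great_circle_km(lat1, lon1, lat2, lon2), pearson(s1, s2)))
    return pairs

# Toy anomaly series at three locations; NOT the cloudmasked AVHRR data.
cells = [
    (-90.0, 0.0, [0.1, -0.2, 0.3, 0.0]),
    (-80.0, 0.0, [0.2, -0.1, 0.4, 0.1]),
    (-75.0, 120.0, [-0.3, 0.2, 0.0, 0.1]),
]
for dist, r in correlation_vs_distance(cells):
    print(round(dist), round(r, 3))
```

    With 5509 grid cells the full pair list is large, so sampling random pairs (as described in the replies) keeps the run time manageable.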

    • Ryan O
      Posted Mar 27, 2009 at 9:54 AM | Permalink | Reply

      Re: Jeff Id (#90), I saw that and choked. It was truly a WTF moment. If I may be permitted a snark, I shall say auto-mannic-correlation.

  54. Hu McCulloch
    Posted Mar 27, 2009 at 9:21 AM | Permalink | Reply

    RE #90–
    A true “shocker,” as you put it, Jeffs! It does look like this data has been heavily smoothed somewhere along the line, even if it is of full rank.
    I couldn’t follow all the R calculations. Are you sure you removed the seasonal means first?

    • Posted Mar 27, 2009 at 9:43 AM | Permalink | Reply

      Re: Hu McCulloch (#91),

      I independently used Roman’s anomaly code, SteveM’s down loader, my own correlation work and SteveM’s circdist algorithm using random samples of the values and got the same result.

      So many people involved now it’s hard to figure out who did what. I didn’t run JeffC’s algorithm but the one I ran had seasonal removed. It was the same data you noticed was missing the last 12 singular values from removal of the monthly data.
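
      The 12 missing singular values follow directly from the anomaly step. A minimal sketch with a synthetic 300-month series (not the AVHRR data) showing that removing each calendar month's mean forces 12 sums to zero:

```python
import random

def monthly_anomalies(series):
    """Subtract each calendar month's mean from a series that starts in January."""
    means = []
    for m in range(12):
        vals = series[m::12]
        means.append(sum(vals) / len(vals))
    return [v - means[i % 12] for i, v in enumerate(series)]

random.seed(1)
n_months = 300  # 25 years, matching the row count of the cloudmasked file
series = [random.gauss(-30.0, 5.0) for _ in range(n_months)]
anom = monthly_anomalies(series)

# Each calendar month's anomalies sum to zero. That is 12 independent linear
# constraints on the 300 rows, so a 300-row anomaly matrix has rank <= 288.
month_sums = [sum(anom[m::12]) for m in range(12)]
print(max(abs(s) for s in month_sums))
```

      Every column of the anomaly matrix obeys the same 12 constraints, so the last 12 singular values vanish regardless of what the data look like.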

  55. bugs
    Posted Mar 27, 2009 at 9:57 AM | Permalink | Reply

    “Whew! Steve, it doesn’t take you long to get back in the saddle. No jet lag. I’m impressed.”

    Yes, more snide insinuations, accusations, smears, attacks. Anything at all, just as long as it’s mud and it sticks. Didn’t take long at all.

  56. Layman Lurker
    Posted Mar 27, 2009 at 2:09 PM | Permalink | Reply

    #97 & 99

    This is a paper from JOC 1996: Cloud radiative properties over the South Pole from AVHRR Infrared Data.

    Ryan, very similar graphs to yours on page 3407. Interesting and extensive discussion on discrepancies between surface station and raw AVHRR temp data. Not sure how much these errors are currently corrected or adjusted.

    some of the highlights I noted:
    1. Seasonal differences between winter and summer, where winter months have a greater proportion of temperature inversion (neg lapse rate) and summer is more isothermal. The implication is that winter AVHRR temps are warmer relative to the surface station, while in summer the differences are smaller.
    2. Seasonal and geographical differences in cloud microphysics (like water vapor vs. ice), which in turn affect cloud emissivity.
    3. Volcanic aerosols
    4. The effect of polar stratospheric clouds (PSCs) in winter, where it is suggested that PSCs may be picked up as clouds when there is only starlight to work with.

  57. curious
    Posted Mar 27, 2009 at 3:45 PM | Permalink | Reply

    #99 and 101 Ryan and Layman:

    Thanks – also found this ref which looks promising:

    “Trends and uncertainties in thermal calibration of AVHRR radiometers onboard NOAA-9 to NOAA-16”
    Alexander P. Trishchenko et al.

    Abstract suggests it is concentrating on ch3b,4 and 5

    and states

    “Systematic degradation of the radiometric sensitivity of the IR detectors was observed during the lifetime of a radiometer, although the annual rate of degradation is rather small (typically below 1% per year)”.

    Worth reading the abstract for the comments on range of errors associated with each satellite – they have reduced over time and suggest 0.5K order of magnitude for the most recent.

  58. Hu McCulloch
    Posted Mar 28, 2009 at 6:43 AM | Permalink | Reply

    RE Ryan O, #68, 93,
    Thanks, Ryan, for the explanation. While I appreciate that they calibrate the sensors in the lab before launch, and then don’t go up and recalibrate them in orbit, wouldn’t it make sense as a double check to leave the old satellite in operation for a while — perhaps even a year — while the new one is operating to see if they give comparable readings? They don’t pass over at the same moment, so this would just be an average over time, but still wouldn’t it be a very useful double check?

    BTW, your very interesting graph in #68 is no longer functional. (Likewise #99). Perhaps you could upload these to CA’s new server for a more stable URL?

    You quote NESDIS:

    Since the correction terms no longer vary with the internal calibration target temperature, the user does not need to interpolate on the internal calibration target temperature. Otherwise, the user applies the non-linearity corrections as before.
    NESDIS also supplies an alternate method of handling the non-linearity which can be applied to radiances instead of brightness temperatures. For each instrument and for each channel, three coefficients (A, B, and D) of a quadratic equation are supplied in Section 1.4 for all spacecraft from NOAA-13 on. The following quadratic equation can be used to compute the corrected radiance, RAD from the “linear” radiance, Rlin:
    $RAD = A \cdot R_{lin} + B \cdot R_{lin}^2 + D$

    Who’s the “user” in question? Do people like you who are trying to use this data to replicate Steig have to do this calibration, or has this been done for you already?

    • Ryan O
      Posted Mar 28, 2009 at 8:47 AM | Permalink | Reply

      Re: Hu McCulloch (#104), The NSIDC data I am getting (up through 2002 so far) already has the nonlinear correction applied. Alternate forms are provided if you want to do them (as you can back out the previous calculations – NSIDC provides tables of the quantities). There are a couple of papers (NOAA releases) that use different methods for nonlinear corrections, so if you wanted to, you could try different ones.
      The primary reason for the wording above was that up until 2003, NSIDC had not back-corrected NOAA-11 and earlier with this method. So if you wanted to use the entire data set, you had to do this correction yourself. But after 2003, all the previous satellite data had been back-corrected so that the calibration method is the same throughout the entire series of AVHRR data. It’s confusing because NSIDC doesn’t update their documentation very often, so some of the information is out-of-date.
      AFA satellite operation goes – yes, there’s overlap in the satellites – but not in the AVHRR data. NSIDC does not provide or archive overlap periods of AVHRR data. Someone else might, but I haven’t looked for any overlaps yet.
      Also, overlaps might not help as much as we might like because of the equatorial crossing time drift. The satellites will not be looking at the same parcel of Antarctica at the same time. You don’t have a whole globe’s worth of observations to average over, like in the MSU case. All you have is a small swath of land. This is why they picked the Libyan desert as a calibration target for Ch 1 and 2. The temperature change with time-of-observation is well-behaved. So it provides a good way to do such an overlap. But from what I gather, Antarctica is not as well-behaved. For example, changes in cloud cover in the 4-hour time lapse from one satellite to the next could render such an overlap useless. My guess is that NOAA felt that the internal blackbody provides a much more consistent way of ensuring the satellites are providing comparable measurements.
      For the pictures, yes, I will rehost later. I’ve been kind of busy this week (I have to go to Europe today for a business trip) so I have been somewhat disorganized. I will be able to repost tomorrow – and I will also post the data set I used so there’s no confusion. I apologize to everyone that I have been unable to do so up to this point.

  59. Ryan O
    Posted Apr 5, 2009 at 3:56 PM | Permalink | Reply

    As I had mentioned before, there appear to be unaccounted-for offsets between the different satellites that make up the Comiso AVHRR cloudmasked data. I have spent a while trying to determine first if the offsets actually exist; and, second what the result of correcting for them would be. The R script and two supplemental data files you will need to be able to replicate this are:
    R Script:
    Station Information:
    Updated READER Temps:

    The first thing to note when plotting the AVHRR data against the ground temperatures (all manned and AWS stations) is that it appears to contain multiple populations that are not all equally correlated to ground temperatures. This could cause several problems when trying to determine satellite offsets, such as:
    1. The multiple populations increase the data scatter, which decreases the ability to identify offsets.
    2. A poorly correlated population with data concentrated only in certain times could cause mistaken identification of an offset.
    3. Some of the populations are not related linearly with ground temperatures, which would exaggerate or suppress the magnitude of a calculated offset.
    So the first thing we would need to do is identify the populations. This proved to be a somewhat challenging proposition, as they are all intermixed in the higher temperature range. After much trial and error organizing groups, plotting, reorganizing groups, replotting, etc., five distinct groups emerged:

    In the R script, I retained and documented the plotting functions I used to help do the grouping. Function plt.stn() allows you to plot a particular station vs. the groups. If you want, you can go through it just to verify that there are, indeed, five separate groups and that I have the correct stations assigned.
    After identifying the groups, we should check to see if there may be some physical reason that the AVHRR temperatures at different station locations would behave differently. The first thing would be to look for geographical significance (NOTE: The colors DO NOT match the above plot).

    The main group, which was the long, skinny group in red on the scatter plot, corresponds to the Antarctic interior. The other groups are coastal. If I had to venture a guess, I would say that the difference in the shape of the curves is due to reflectivity differences between water, snow (which also changes with grain size), and ice. If so, this effect was also described (for 37GHz measurements) in Shuman (2001) linked by Roman earlier:
    Now that we’ve identified our groups, we need to calibrate them to the ground temperatures. Doing this on a station-by-station basis would be suspect since many of the stations have a small number of points. Within a group, however, we have a much larger number of points, so we can be more certain of our transforms.
    The process of doing the calibration (after trying lots of things) ended up being fairly simple:
    1. Bias correction
    2. Nonlinearity correction
    3. Fine bias correction
    4. Fine nonlinearity correction
    The result:

    The next step is to convert to anomalies. Care has to be taken here. Remember that the purpose is to try to determine offsets between satellites. Because the ground data is discontinuous with large chunks missing, simply making the base periods the same when converting to anomalies is not enough. Instead, we will convert to anomalies using the entire time frame (1982-2006) and using ONLY months for which there is corresponding ground data. This makes sure that the comparison between the calibrated anomalies and the ground anomalies is an apples-to-apples comparison.
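
    A minimal sketch of that anomaly step, separate from the posted R script. It assumes both series start in January and that missing ground months are marked `None`; the function name and toy data are made up for illustration:

```python
def paired_anomalies(sat, ground):
    """Convert both series to anomalies using only months where both exist.

    sat, ground: equal-length monthly lists starting in January; missing
    values are None. Returns two lists with None where a month is unpaired.
    """
    sat_anom = [None] * len(sat)
    gnd_anom = [None] * len(ground)
    for m in range(12):
        idx = [i for i in range(m, len(sat), 12)
               if sat[i] is not None and ground[i] is not None]
        if not idx:
            continue
        sat_mean = sum(sat[i] for i in idx) / len(idx)
        gnd_mean = sum(ground[i] for i in idx) / len(idx)
        for i in idx:
            sat_anom[i] = sat[i] - sat_mean
            gnd_anom[i] = ground[i] - gnd_mean
    return sat_anom, gnd_anom

# Toy two-year example with two missing ground months.
sat = [float(i) for i in range(24)]
ground = [float(i) + 1.0 for i in range(24)]
ground[0] = None
ground[13] = None
sat_anom, gnd_anom = paired_anomalies(sat, ground)
print(sat_anom[:3], gnd_anom[:3])
```

    Because the satellite climatology is built from exactly the same months as the ground climatology, gaps in the station record cannot shift one baseline relative to the other.
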
    After converting to anomalies, we need to find some test to determine if there are statistically significant offsets between the satellites. For this we will use a paired Wilcoxon test (since the residuals are non-normal – I checked) with a 24-month range. The estimate of the difference in means will be normalized to the 95% confidence interval to allow continuous plotting of the points as we move through all 300 rows of the data sets. If there is a statistically significant offset between satellites, we will see a peak, approximately in the center of the satellite coverage period, that exceeds 1.0:
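
    The sliding test can be sketched roughly as below. This is an approximation and not the posted R script: it uses the Hodges-Lehmann pseudo-median of the paired differences with a normal-approximation confidence interval from the ordered Walsh averages, and it assumes the "normalization" means dividing the estimate by the half-width of the 95% interval. All data are synthetic.

```python
import math
import random
from statistics import NormalDist, median

def hodges_lehmann_ci(diffs, conf=0.95):
    """Pseudo-median of paired differences with an approximate CI.

    The CI comes from the ordered Walsh averages, using the normal
    approximation to the signed-rank statistic.
    """
    n = len(diffs)
    walsh = sorted((diffs[i] + diffs[j]) / 2.0 for i in range(n) for j in range(i, n))
    m = len(walsh)  # n * (n + 1) / 2
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    k = max(int(math.floor(m / 2.0 - z * sd)), 0)  # averages trimmed from each end
    return median(walsh), walsh[k], walsh[m - 1 - k]

def normalized_offset(sat_anom, gnd_anom, center, window=24):
    """Sat-minus-ground offset near `center`, divided by its 95% CI half-width.

    Values beyond +/- 1 suggest a statistically significant offset.
    """
    lo = max(0, center - window // 2)
    hi = min(len(sat_anom), center + window // 2)
    diffs = [s - g for s, g in zip(sat_anom[lo:hi], gnd_anom[lo:hi])
             if s is not None and g is not None]
    if len(diffs) < 6:
        return None
    est, ci_lo, ci_hi = hodges_lehmann_ci(diffs)
    half_width = (ci_hi - ci_lo) / 2.0
    return est / half_width if half_width > 0 else None

# Synthetic anomalies with a +0.3 offset injected in months 100-159.
random.seed(2)
ground = [random.gauss(0.0, 1.0) for _ in range(300)]
sat = [g + random.gauss(0.0, 0.1) + (0.3 if 100 <= i < 160 else 0.0)
       for i, g in enumerate(ground)]
print(normalized_offset(sat, ground, center=130))
print(normalized_offset(sat, ground, center=30))
```

    Evaluating `normalized_offset` at every row from 0 to 299 gives a continuous curve of the kind plotted below.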

    The biggest feature is the huge spike with NOAA-14. Without a doubt, there is a statistically significant offset with NOAA-14. NOAA-7 and -9 are also low; NOAA-11 looks generally okay except for the massive dip at the end (which I have not come up with a satisfactory way of handling yet); and NOAA-16 and 17 also look okay.
    Now that we’ve convinced ourselves that the offsets are real, it is time to calculate them. We obtain:

    -0.136315035 -0.217185496 -0.097247497 0.215448620 -0.006319678 -0.195520508

    It’s pretty obvious that these factors will reduce the trends. We get a continent-wide trend of 0.074 +/- 0.158 (compared to 0.187 +/- 0.151 from the Comiso data).
    However, had we simply calculated offsets without going through the above calibration, we would have gotten:

    -0.14944679 -0.16962065 -0.04562425 0.33161241 0.09104473 -0.05796546

    Note that this would have even further decreased the trends – to the tune of a continent-wide average of 0.032 +/- 0.149.
    Original Comiso trends (deg C/decade and 95% CI):

    Region             Trend           95% CI
    Peninsula           0.406552585    0.1925186
    West Antarctica     0.411971312    0.2217650
    Ross Ice Shelf     -0.104963672    0.2206460
    East Antarctica     0.225650354    0.1942384
    All                 0.187422507    0.1510635

    Calibrated trends (deg C/decade and 95% CI):

    Region             Trend           95% CI
    Peninsula           0.29165083     0.2282490
    West Antarctica     0.26177232     0.2502632
    Ross Ice Shelf     -0.12965218     0.2714344
    East Antarctica     0.06001991     0.2033417
    All                 0.07399090     0.1572431
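
    For reference, a trend in deg C/decade with a 95% interval can be sketched as below. This is a plain OLS fit with a naive 1.96 x standard error half-width, assuming uncorrelated residuals; the posted R script may compute its intervals differently (for example, with an autocorrelation adjustment), so this will not necessarily reproduce the numbers above:

```python
import math

def trend_per_decade(monthly_anomalies):
    """OLS slope in units per decade and a naive 95% half-width (1.96 * SE).

    Assumes evenly spaced monthly data and uncorrelated residuals.
    """
    n = len(monthly_anomalies)
    t = [i / 120.0 for i in range(n)]  # time in decades
    t_mean = sum(t) / n
    y_mean = sum(monthly_anomalies) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, monthly_anomalies))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((yi - (intercept + slope * ti)) ** 2
              for ti, yi in zip(t, monthly_anomalies))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, 1.96 * se

# Synthetic check: a noise-free series warming at 0.2 deg C/decade.
series = [0.2 * i / 120.0 for i in range(300)]
slope, half_width = trend_per_decade(series)
print(slope, half_width)
```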

    Here’s a plot showing my geographical groupings:

    Trends about halved – very similar to what the Jeffs got by regridding. Common theme, maybe? Suspiciouser and suspiciouser . . . but that’s enough for now. There’s a lot more in the script I posted – you can compare the main, PCA, and AWS recons as well. There’s also some singular value decomposition at the end which isn’t finished yet and will be the subject of another post. Until next time, however, I will leave you with this curious plot:

    The blue line is simply a slope of 1, provided for scale.
    Unlabeled, one might have mistaken this for the Small Magellanic Cloud:

    • Layman Lurker
      Posted Apr 5, 2009 at 9:30 PM | Permalink | Reply

      Re: Ryan O (#106),

      Ryan, very impressive post. Thanks for all the work. Regarding NOAA 11, is it possible that Pinatubo interfered with the signal recognition at the tail end of this period?

  60. Jeff C.
    Posted Apr 5, 2009 at 11:16 PM | Permalink | Reply

    Great post Ryan. I got all your code and am going to try some similar tests to the U Wisconsin dataset. In that data there are also oddities as you transition from one spacecraft to another. Funny thing, unlike what you show here, in that dataset NOAA-14 looks okay and NOAA-16 is the oddball.

    When you performed your calibration, did you apply the same correction to each station within each of the five groups? I got that impression from reading the text but wanted to make sure. I think you need to do it that way as opposed to each station getting its own tweak.

    To me, it looks like the geographical breakdown is something like this:
    East Coast/Peninsula (red)
    East and West Interior (black)
    Ice shelves (blue)
    Ross Sea Coast (green)
    West coast (light blue) – this one is iffy, but there aren’t many points

    These breakdowns make sense to me. It is not so much the region as the commonality of the physical environment.

    One last point and this is totally off the wall. I realize they are completely different things, but did you notice how much your Wilcoxon test plot looks like the MSU temp anomaly plot? Coincidence? Probably, but they are quite similar. In fact, when I was just scanning the post I thought that was what it was.

  61. Geoff Sherrington
    Posted Apr 18, 2009 at 6:12 AM | Permalink | Reply

    Found some answers to some cloud discrimination questions arising from Steig’s work. Don’t know if I’m going over old ground, but here are some quotes from

    What CASPR Does Not Do
    Temperature and humidity profiles are not retrieved, they are input.
    Calibration of AVHRR raw data.
    Navigation and registration of raw data are not performed.
    Retrievals of some parameters for large solar zenith angles.
    Retrievals outside of the polar regions, although the primary limitation is the surface temperature retrieval, which could easily be expanded to include lower latitude oceans and land.
    Stratospheric clouds are not retrieved; all clouds are restricted to the troposphere.

    CASPR is research code. It does a lot of things well but doesn’t do anything perfectly. Many of the algorithms have been validated, some have not. They are all detailed in the Reference Guide. There are three broad problems worth mentioning at the outset. First, everything depends on cloud detection, which sometimes borders on being as much an art as a science when working in the polar regions with the AVHRR. We are, indeed, trying to squeeze water from a stone. We do not claim to have solved the cloud detection problem, but rather provide methods that work reasonably well most of the time.

    Third, the retrieval of cloudy sky parameters requires temperature and reflectance values underneath the clouds. CASPR interpolates clear sky values to cloudy areas. This generally works but can result in large uncertainties in very cloudy areas. Cautionary notes are given throughout the Reference Guide. Please do not ignore them!

    This answers some of my initial queries about how satellite-derived surface temperatures are measured under cloud. They are not. Thus, the reconstruction of a temperature map of the Antarctic at a given time includes factors that are stated to be unreliable, yet we end up with claims of 0.1 degrees C discrimination.

    I am unsure of the measurements that go into the statement I bolded “This generally works, but …..”

    Maybe there has been an advance that I am unaware of, but to me it seems that a good guess (or a bad one) has made the cover of “Nature”.

  62. Ryan O
    Posted Apr 21, 2009 at 10:54 AM | Permalink | Reply

    Just as an FYI, NSIDC has completed collating the AVHRR data for me and I am now in possession of the entire 5-km gridded archive. There is one problem, however: The NSIDC archive does not include 2006. When I inquired about post-2005 data, they stated that they were unaware of any archives past that date (except for L1D data, which doesn’t help). They were very helpful and cordial throughout the process, but the simple fact is that the post-2005 data used in the Steig paper does not appear to be publicly available.
    With that in mind, I sent the following email to Steig and copied Comiso as well:

    Dr. Steig,

    I have a request concerning your 2009 paper in Nature in which you present a reconstruction of Antarctic temperatures from 1957 to December, 2006. On your webpage (, you provide the cloudmasked AVHRR data set. You also provide the link to NSIDC for obtaining the raw AVHRR data. However, NSIDC does not have AVHRR data archived past 2005.

    NSIDC has been kind enough to supply me with the 5-km gridded data from 1981-2005, but they were not able to supply the remainder of the data as used in your paper. Because I am unaware of any public source for the remainder of the data, I respectfully request that you supply the missing data, or, if a public source exists, that you direct me to that source. Your assistance is greatly appreciated.

    Best regards,
    Ryan O’Donnell

    I’ll let you guys know how it turns out.

  63. Ryan O
    Posted Apr 21, 2009 at 1:58 PM | Permalink | Reply

    Steig’s reply:


    The original data is a huge huge data set, and I don’t actually have the
    original data myself. Co-author Joey Comiso is working to put all those
    data on line at NASA, but is still working on securing the server space
    for it.

    In any case, 1981-2005 ought to be sufficient for reproducing our
    results, if that’s what you’re interested in looking at.


    In this case, I fully understand why Steig would not, himself, have the original data (it’s about 1.2 terabytes big). So no big deal there. I’m hopeful that I will receive a response from Comiso, since I had copied him on the request as well. Anyway, this is what I sent back to Steig:

    Dr. Steig,

    Thank you for the quick response. And yes, the original data is quite massive. As you personally do not have the data, if it is acceptable to you, I will defer further questions about the remainder of the data to Dr. Comiso.

    Again, thank you for the reply.

    Best regards,

    Hopefully Comiso really is in the process of finding server space for the data. I do know that NSIDC doesn’t presently have enough space, which is why they had to collate it for me in bunches (2-3 years at a time), host it for a few days to allow me to download it, then delete it and replace it with another batch.

    • Mike B
      Posted Apr 21, 2009 at 2:49 PM | Permalink | Reply

      Re: Ryan O (#112),

      Good grief. Maybe I’m just cranky lately because April has been so darn cold here in the Midwest.

      But a 2 terabyte hard drive can be had for about $300 these days. Maybe we should contribute to the tip jar to buy Joey one.

      • Ryan O
        Posted Apr 21, 2009 at 4:25 PM | Permalink | Reply

        Re: Mike B (#113), While true, I doubt NASA’s IS department will allow Comiso to walk in with his 2 TB Maxtor and plug it in to their server. I spent some time in government service (military, actually) and the depth of bureaucracy can be truly unfathomable for even the simplest of tasks.
        I think it was inappropriate for the data not to have been collated prior to publication, but being upset about that is beating a dead horse. In this particular case, it’s much more important to me to obtain the data than it is to make a point. I’m content with the response for now (Steig and I have traded a couple additional, very cordial emails besides the ones posted here) and I will follow up with Comiso in about a month to see if any progress has been made.
        I am not unwilling to file an FOI request if nothing happens, but to be quite honest, I have several other projects on my plate right now and I wouldn’t be able to get to the data even if they gave it to me today.

        • Mike B
          Posted Apr 22, 2009 at 8:55 AM | Permalink

          Re: Ryan O (#114),

          Fair enough all the way around Ryan. You’ve certainly done more than your fair share of work here. It’s just become a pet peeve of mine whenever people use the cost of hard disk space (not that you were doing it here, just suggesting that others might) as an excuse for not keeping data.

          I realize it’s probably not even Joey’s fault, but rather the NASA data center types, who display the famous MASH supply depot sergeant mentality: “I’ve got three incubators I don’t need. But if I gave you one, then I’d only have two, and two is not as good as three.”

          Enough of my ranting. Carry on. Sorry.
