NASA GISS Withdraws Access Blocking

Earlier today, I reported that I had been unable to access their server using R even to get a tiny data set. (I haven’t been working on NASA GISS data and haven’t downloaded anything much from them for months.) The following simple script failed for me (and for Roman in New Brunswick, Canada), but, strangely enough, not for other R users. Very odd.

url="http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt" #monthly glb land-ocean
working=readLines(url)

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : cannot open: HTTP status was '403 Forbidden'

Annoyed by this, I sent the following email to NASA GISS employee Gavin Schmidt, who occasionally acts as a spokesman on GISTEMP matters, sending a copy to an eminent climate scientist who doesn’t necessarily agree with me on many things, but who disdains the undignified behavior that is all too prevalent in the field.

Dear Dr Schmidt,

My IP address has been blocked by NASA GISS from downloading GISS data using a computer script. This is undignified and petty behavior and I request that you take immediate steps to remove the block.

Yours truly, Stephen McIntyre

I received the following answer from Robert Schmunk of NASA GISS (who had been involved in a prior blocking of my access discussed here). I copy this email on the basis that it is official correspondence from a federal employee and not “personal” communication:

Stephen,

Please do not write to Dr. Schmidt on issues related to GISS website management as he has essentially nothing to do with that topic.

Although a few IP numbers have been barred from accessing the GISS website(s) in the last couple weeks, none of them should specifically be a machine that you might be using. (I say that based on the assumption that the blocked IP addresses are not, as best I can tell, Canadian.) However, I can only confirm this if you will inform me what IP number you might be using, or if you might be assigned dynamic IP numbers, then the domain name of your ISP.

I did configure one of our webservers yesterday to bar access by the user agent “R”, as we have recently had problems with two locations attempting massive data scrapes and who identified themselves with that user agent. As someone at one of those locations has since contacted me and discussed the matter, I have now lifted that block.

If you were using software which identified itself to our servers as user agent “R”, then you can try accessing them again now and see if you are able to get through. If not, then as I indicated above, please let me know what IP and/or ISP numbers you may be coming from.

rbs

The script works again.

While I’ve “scraped” their site in the past (because they refused to provide organized data), a scrape which led to the identification of their “Y2K” problem, I had done no such scrape in months.

Schmunk’s explanation doesn’t make sense as it stands. He says that he blocked access from R, but then why were some people able to access the site using R? He must have done something else as well.

102 Comments

  1. Wansbeck
    Posted Jan 16, 2009 at 3:57 PM | Permalink

    Would the server know that the script was generated by ‘user agent “R” ‘?
    Perhaps your script identifies itself or you are on a list of known users.

  2. Steve McIntyre
    Posted Jan 16, 2009 at 3:59 PM | Permalink

    #1. Dunno. Maybe one of the computer experts can comment on this.

    Doncha like the term “user agent R” though. Sounds like M in James Bond.

  3. Posted Jan 16, 2009 at 4:00 PM | Permalink

    Having performed analyses on database logs in the past to ferret out misbehavior, I can see how recent evidence of dubious activity might lead someone to conclude that similar behavior had occurred in the past and ban all ‘offenders.’

    They may not have much of a data archival policy (it’s amazing how prevalent this is in practice) and they may be keeping data in a database that is archived only when it presents a problem (running out of space or the need to make room for something new).

    The pointless reproach at the beginning was pretty rude. All he had to say was “Please contact me for these kinds of requests” instead and the whole tone of the e-mail would have changed.

  4. Steve McIntyre
    Posted Jan 16, 2009 at 4:03 PM | Permalink

    http://tolstoy.newcastle.edu.au/R/devel/06/07/6311.html has some info on R and user agent.
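    For readers unfamiliar with how a client announces itself, here is a minimal sketch in Python (for illustration; the thread's scripts are in R). The User-Agent string shown is an assumption modeled on the "R (version platform ...)" format discussed in the linked thread, not a string captured from any actual request; no network connection is made.

    ```python
    # Illustrative sketch: how a client attaches a User-Agent header to a request.
    # The UA string below is a guess at the R 2.4.0-on-Windows format, for illustration only.
    import urllib.request

    url = "http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt"

    # Build the request with an explicit User-Agent, but do not send it.
    req = urllib.request.Request(
        url, headers={"User-Agent": "R (2.4.0 i386-pc-mingw32)"}
    )

    # Inspect what the server would see (urllib stores header names title-cased).
    print(req.get_header("User-agent"))
    ```

    The point is that the version and platform details ride along with the "R" identifier, which matters for the filtering discussion below.
    
    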

  5. Steve McIntyre
    Posted Jan 16, 2009 at 4:06 PM | Permalink

    The lack of politeness is a top-down policy. See discussion here

    See the above post for an interesting conference presentation on name usage by Knowles et al 2007.

    • Posted Jan 16, 2009 at 4:46 PM | Permalink

      Re: Steve McIntyre (#5),

      I think that Hansen’s statements are just evidence of insecurity, a common human failing. Science is littered with examples of this, some of them very famous. Physics, for example, has the famous $100 bet between Stephen Hawking and Peter Higgs over the existence of the Higgs boson. It’s, figuratively, a thrown gauntlet. The answer is very important to science, but sometimes personalities just get in the way (but make attractive gossip for those of us on the sidelines). Both are equally entrenched in their own position and seem to be insecure about the criticisms of the other.

      The difference there is that both are Goliaths in their field and Hansen is unlikely to ever extend that kind of professional courtesy to your work. The fact that it’s had such an effect on him is rather telling. The science will speak for itself. Perhaps that’s what he’s most afraid of.

  6. Wansbeck
    Posted Jan 16, 2009 at 4:22 PM | Permalink

    It looks as though you have indulged in “reasonable behavior” by adding a user agent header and been blocked as a result.
    Oh the irony!

  7. Pedro S
    Posted Jan 16, 2009 at 4:30 PM | Permalink

    It does make some sense.

    They may have identified, from going through their logs, IPs where huge data scrapes with a user string of ‘R’ caused them some problem. You may have fallen onto that list because of large data scrapes in the past.

    But now they know who you are and put you on the allowed list.

    They are probably not malicious, just suffering from noble cause corruption.

    Steve: They knew who I was and my IP address and R access methods already. I was blocked in 2007 and had access restored after publicizing it. This is a new incident out of the blue.

  8. Steve McIntyre
    Posted Jan 16, 2009 at 4:35 PM | Permalink

    I sent the following email to NASA asking for an explanation as to why some users were blocked and not others. I wrote:

    Access has now been restored. I’m puzzled by one part of your explanation: while I was unable to access your site using R, some of my readers in other parts of the world (UK, US) were able to access your site using R, while another Canadian reader could access your site using Matlab, but not R. As you’ve explained it, all R users should have been unable to access your site, but this isn’t what happened. Are you able to clarify why some R users, but not all R users, were blocked?

    The answer:

    The block was not up that long, less than 24 hours.

    Also, the block phrasing included a component which was OS dependent, so e.g., someone using R on a Mac did access the site without trouble a few hours ago.

    Yes, the block wasn’t up that long, but that is unresponsive. It also doesn’t explain the block in the first place.

    As to the second part of the answer, he blocked access by R for people using my operating system (garden variety Windows.) And the “reason” is that an R user with the same operating system was supposedly doing a large data scrape using R. Hmmm ….

  9. Rusty Scott
    Posted Jan 16, 2009 at 4:41 PM | Permalink

    It looks like R adds some extra information about machine and operating system to its user agent information. It’s possible that Schmunk et al. used more than just ‘agent=R’ as their filter criteria, which could explain why some machines worked and some didn’t. E.g., they may have used ‘agent=R (2.4.0’ and the others who were successful had a different R version. Either way, NASA GISS needs to understand that the user agent ‘R’ is not a unique identifier and shouldn’t be used to block access.

  10. Wansbeck
    Posted Jan 16, 2009 at 4:45 PM | Permalink

    I think that the link in #4 gives the answer.
    If you don’t identify yourself as an R user the server wouldn’t know to block you.

  11. Steve McIntyre
    Posted Jan 16, 2009 at 4:54 PM | Permalink

    #9. The problem isn’t that their attempt to block “Agent R” catches people other than me. The problem is the validity of their motives for blocking “Agent R” in the first place. I’m unconvinced by their explanation.

    • Rusty Scott
      Posted Jan 16, 2009 at 5:26 PM | Permalink

      Re: Steve McIntyre (#12), I find it a reasonable explanation that they have recently had two performance hits due to data-scraping and wanted to block the offenders. Someone latched onto the “Correlation” in the user agent field and selected it as an appropriate filter. (Sound familiar?) Given that you only identified an IP address block, their divulging of the “agent R” filter was entirely voluntary and actually led to an identification of the problem you encountered. They also stated in that paragraph that if you were using R then the problem should already be resolved. Of note is that the block was lifted because they were contacted by one of the offending sites. I’m not sure how they could have been more cooperative, but I agree their correspondence could use a friendlier tone. (IT guys aren’t always the best at communication.)

      • Not sure
        Posted Jan 16, 2009 at 5:32 PM | Permalink

        Re: Rusty Scott (#18), Sorry, but blocking all users of R on Windows is a ridiculously crude way of dealing with this problem. It does not reflect well upon this person’s competence.

        Furthermore, it’s obnoxious as a matter of policy. If there are some limits they want to enforce on how much of this supposedly public data is scraped at a time, they should publicly and clearly state what is and is not acceptable use of their website. As it is, there’s an invisible tripwire that triggers ridiculously broad consequences for a large number of potential users of this public resource.

  12. Luis Dias
    Posted Jan 16, 2009 at 4:56 PM | Permalink

    Mr Steve, I understand the past history you have with GISS and Gavin Schmidt, but I think you’re taking this to the point of pettiness. They did respond to you quite fast, albeit not in the perfect “tone”, and explained the subject to you as best they could.

    I’d note that the “tone” was started by Mr Steve himself, by accusing Gavin’s own employees of “undignified and petty behaviour” without waiting for more explanations. If the reply was valid, he wasn’t even responsible for anything about it, and I can pretty easily imagine him being mad about it and transferring his irritation to the employee that replied to Mr Steve.

    I think their story may hold up to scrutiny: after that last chapter in skeptics’ blogs about the GISS “data manipulation”, perhaps many indeed used R to fetch their data, and perhaps there was someone who abused (willingly or in technical ignorance) the system, and the blockade ensued.

    Either way, I think many people lately are being ultrasensitive to anything that appears on their radar. Fortunately the people that monitored atomic missiles weren’t that nervous.

    Chill?

    Steve: I took no particular offence to the tone of the email (that was a reader who was perhaps a little sensitive on my behalf). What annoys me is the act itself: that they had blocked me once again. And yes, there’s history: I’ve been blocked at U of Virginia (Mann), Roger Williams U (Rutherford) and U of Arizona (Hughes). So these guys have actually gone to the trouble of sending my IP address to one another for the express purpose of creating a roadblock. The term “undignified and petty behavior” is quite appropriate under the circumstances. IMO one of the reasons for the prompt response is that I copied an important and very senior climate scientist who has no patience for this sort of shenanigans and who has asked to be informed of such incidents.

  13. insurgent
    Posted Jan 16, 2009 at 5:03 PM | Permalink

    The “user agent” is a header string that all HTTP protocol programs should provide to the server they are connecting to. It allows the server to serve different content depending on the type of client connecting. It can also be used for statistical purposes, and to track down misbehaving versions of software (browsers, bots, etc).
    There are pages you can hit to see the user agent your browser sends.

    IMHO, blocking R’s user agent instead of IPs is kinda like doing brain surgery with a 12-gauge shotgun, even if it was a single version’s user agent string.
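    To make the discussion concrete, here is a hypothetical sketch of the kind of OS-dependent "block phrasing" Schmunk's email describes. This is not NASA's actual rule, which was never published; the function name and the matched substrings are assumptions chosen to reproduce the reported behavior (R on Windows refused, R on a Mac allowed).

    ```python
    # Hypothetical server-side filter, for illustration only.
    def blocked(user_agent: str) -> bool:
        """Return True if the request should be refused with 403 Forbidden."""
        # Match only R clients that also report a Windows (mingw32) build,
        # which would explain why R on a Mac got through while R on XP did not.
        return user_agent.startswith("R (") and "mingw32" in user_agent

    print(blocked("R (2.4.0 i386-pc-mingw32)"))           # Windows R user
    print(blocked("R (2.4.0 powerpc-apple-darwin8.8.0)")) # Mac R user
    print(blocked("Mozilla/5.0 (Windows NT 5.1)"))        # ordinary browser
    ```

    A rule of this shape would block every Windows R user, innocent or not, which is why several commenters call it crude.
    
    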

  14. Steve McIntyre
    Posted Jan 16, 2009 at 5:07 PM | Permalink

    #14. Yes, it blocked Roman as well as me. Is that the problem? :)

  15. Raven
    Posted Jan 16, 2009 at 5:15 PM | Permalink

    Steve,

    I have a set up that allows me to change my IP addresses any time I want. It depends, in part, on the type of ISP and the network hardware you have. I can give you more detailed instructions if you send me an e-mail.

    • Luis Dias
      Posted Jan 16, 2009 at 5:25 PM | Permalink

      Re: Raven (#16),

      Fascinating. But it wouldn’t have solved this particular problem, because they blocked “agent R”, independently of IP.

      Perhaps a Linux distro with R, or even a Mac laptop in hand, might solve these problems.

      • Raven
        Posted Jan 16, 2009 at 5:33 PM | Permalink

        Re: Luis Dias (#17)
        Yes – not in this case. But it would allow Steve to quickly exclude IP blocking as the source of the problem.

  16. Raven
    Posted Jan 16, 2009 at 5:27 PM | Permalink

    BTW – There is nothing nefarious in it. It is simply a feature I discovered by accident in the network hardware I happened to buy.

  17. Not sure
    Posted Jan 16, 2009 at 5:27 PM | Permalink

    Or maybe Steve should have free and clear access to these supposedly public archives, without having to resort to any of this cloak-and-dagger stuff.

  18. Neil West
    Posted Jan 16, 2009 at 5:53 PM | Permalink

    You should specify how the script failed. It could be as simple as a routing table problem or a bad router between you and the data source.

    Neil

    • Rusty Scott
      Posted Jan 16, 2009 at 5:57 PM | Permalink

      Re: Neil West (#26), Did you miss the HTTP server error message “403 Forbidden” which indicates a refusal by the server to serve the data?

  19. Neil West
    Posted Jan 16, 2009 at 5:59 PM | Permalink

    re #27:

    Yes I did. I guess the credentials were blocked.

    Neil

  20. crosspatch
    Posted Jan 16, 2009 at 6:02 PM | Permalink

    Since R is available as open source, one should be able to hack the User-Agent: header value to say whatever one wants it to say. One could build a custom version that uses their own name in the User-Agent: header if they wished.

    I noticed the email said one of the web servers had been changed. It is possible that content is served by more than one server. It is quite common to see a pair of (or more) servers “load balanced” behind a single IP address for redundancy / load reduction. A change on only one of them might result in random blocking. A quick glance at their DNS response shows at the moment data.giss.nasa.gov is aliased to web2.giss.nasa.gov and the lifetime of that query response is 900 seconds. This means that any time I go to data.giss in the next 15 minutes, I will always get the IP address for web2 since it is cached in my computer and my DNS server for at least that long. If you query again after 15 minutes, you have a 50/50 shot of getting either web1 or web2 (supposing they are balancing and equally weighted in the balancing scheme). So it might be possible for one person to get web2 every time and be blocked, one person to alternate between, and a third person to get web1 every time, so you could have three different people with three completely different experiences.

    I waited the 900 seconds and did another query and got web2 again, but that doesn’t mean they aren’t balancing as I had a 50/50 shot at getting web2 again anyway.
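    The 50/50 reasoning above can be sketched numerically. Assuming, as crosspatch does, that each fresh DNS resolution independently returns web1 or web2 with equal weight, the chance of one client hitting web2 on every lookup shrinks geometrically with the number of fresh lookups, yet stays high over a short session because the 900 s TTL means few re-resolutions.

    ```python
    # Probability a client lands on web2 on every one of n independent,
    # equally weighted DNS lookups (an assumption, not observed behavior).
    def p_always_web2(n_lookups: int) -> float:
        return 0.5 ** n_lookups

    print(p_always_web2(1))  # one lookup per 15-minute TTL window: a coin flip
    print(p_always_web2(4))  # four fresh lookups in a row all hitting web2
    ```

    So observing web2 twice in a row, as crosspatch did, is entirely consistent with either balancing or no balancing at all.
    
    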

  21. tarpon
    Posted Jan 16, 2009 at 6:12 PM | Permalink

    I find the script works.

    Perhaps you could post the entire program you are trying to run, might speed debugging. Might even be possible for others to add improvements. Open science anyone?

  22. TerryS
    Posted Jan 16, 2009 at 7:45 PM | Permalink

    Re: Steve McIntyre:

    As to the second part of the answer, he blocked access by R for people using my operating system (garden variety Windows.) And the “reason” is that an R user with the same operating system was supposedly doing a large data scrape using R. Hmmm ….

    Apart from the issue of whether they should or should not have blocked, this sounds reasonable. They have obviously decided that some but not all accesses by agent “R” are a problem, so they came up with the narrowest set of rules that encompassed the problem accesses. In this way they would bar as few innocent users as possible.
    For example they could have:
    Bar all agents: This would in effect bar everyone.
    Bar all “Garden variety Windows”: Again, the scope is too wide.
    Bar all agent “R”: Not all agent R were causing a problem.
    Bar all agent “R” AND “Garden Variety Windows”: Hits the problem users and a minimum of innocents.

    • Not sure
      Posted Jan 16, 2009 at 7:59 PM | Permalink

      Re: TerryS (#31),

      Bar all agent “R”: Not all agent R were causing a problem
      Bar all agent “R” AND “Garden Variety Windows”: Hits the problem users and minimum innocents.

      Actually these two are equivalent, as the vast majority of people use Windows. And it’s absolutely not reasonable to block people without justification. How’s one supposed to avoid bad behavior if bad behavior is not clearly defined?

  23. kuhnkat
    Posted Jan 16, 2009 at 9:25 PM | Permalink

    Terry S,

    they also have the option of blocking IP by region, ISP, actual address, address block, timing out after a certain amount of data is extracted…

    They are claiming they used a 1000 pound bomb when a grenade would have been quite appropriate.

  24. Steve McIntyre
    Posted Jan 16, 2009 at 9:57 PM | Permalink

    I note with slightly raised eyebrow that the “incident” described by NASA is somewhat similar to my previous encounter with them in May 2007 described here – in which I was scraping from their dataset using R. I contacted them notifying them that I was downloading data for research purposes and, if they wished, I would prefer to download an organized data set (the ultimate data set was about 10MB but it took about 8 hours to scrape one page at a time.) They did not restore access upon learning this information. I wrote a post about it here; a nice aspect of having a popular blog is that you can turn the tables on this sort of stunt. Once in the sunlight, they recanted and they told me to keep on scraping (rather than giving me an organized data set.) I was falsely criticized around the blogosphere for supposedly launching a denial-of-service attack on NASA. People were a little quieter after the “Y2K” issue popped up.

    I dealt with Schmunk at the time. So they knew exactly how I got data from them.

    People have observed that blocking R plus one OS would be one way of blocking supposed scraping. But in the previous incident, NASA GISS easily identified the inquiry as coming from the cable.rogers.com network and simply blocked the specific user. So why wouldn’t they have done the same thing in the present case: block the supposedly offending IP address or network rather than access under R in a Windows (XP?) environment. Maybe a computer specialist can explain why they didn’t do the same thing this time, which they could have done effortlessly.

    Alternatively consider this possibility. We know that Hansen’s on record as refusing even to “say my name”. Hansen’s in the news again with some damning aspersions against him (not in this case by me). I’ll bet that they didn’t bother finding out that I actually defended Hansen against what I felt were unwarranted attacks. Within a day or two after new publicity, they set up a blocking scheme against the very way that I operate: R under Windows (XP). Let’s suppose that they got mad and decided to block me, but wanted to do so without using my IP address (so that they’d have some plausible deniability.) Blocking Agent R would be a way of doing this.

    On the other hand, coincidences are always possible: who would have guessed last September that Mann would change his Supplementary Information about 7 minutes after I re-inspected his website and that Gavin Schmidt of NASA would have picked up the change in time for a blog inline comment 15 minutes later. So maybe there really was someone scraping GISS data using methods and language identical to mine on the day after there is bad publicity in blog-world about Hansen.

    Maybe there was, maybe there wasn’t. On the present information, I have no way of knowing. Y’see, I don’t believe everything that people say. (I can’t help it. I’ve been in the mining business and know better. And that was before I even heard of the Team.) Just because they say that someone was scraping (coincidentally using methods and operating systems identical to mine) doesn’t mean that there actually was someone else. I’m not making any accusations because I don’t know that there wasn’t someone else. It doesn’t matter enough to FOI.

    • SidViscous
      Posted Jan 19, 2009 at 10:06 AM | Permalink

      Re: Steve McIntyre (#34),

      Hansen’s in the news again with some damning aspersions against him (not in this case by me).

      In searching for these aspersions that you refer to, I did a quick Google news search. I did not find the aspersions, but there was a link that caught my eye.

      Passion for hockey still burns in Hansen

      The Hansen in this case is hockey player Rich Hansen. But the link obviously stood out to me.

  25. jae
    Posted Jan 16, 2009 at 10:10 PM | Permalink

    Steve: welcome to 21st century America, where we-the-people don’t even have to “say your name,” as though you did not exist. Maybe that is already a snip, but the follow-up would definitely be a snip…

  26. Steve McIntyre
    Posted Jan 16, 2009 at 10:17 PM | Permalink

    Speaking of saying my name, the conference presentation by Knowles et al 2007, cited previously on another thread, deals nicely with Hansen’s problem, and, in addition, is a welcome antidote to Ooga Chaka:

    • Ron Cram
      Posted Jan 16, 2009 at 11:44 PM | Permalink

      Re: Steve McIntyre (#36),

      Hilarious. I was never a big fan of this song. I prefer lyrics with more than six words. But your use of the video is terrific.

  27. Alan S. Blue
    Posted Jan 16, 2009 at 11:03 PM | Permalink

    Can you mention the size of the files of interest to you in this particular instance?

  28. Steve McIntyre
    Posted Jan 16, 2009 at 11:20 PM | Permalink

    #37. Fig.D.txt was a massive 4K in size. NASA’s computers must have shuddered when access was attempted.

    • Jeff Alberts
      Posted Jan 20, 2009 at 2:43 PM | Permalink

      Re: Steve McIntyre (#38),

      In other words, it takes much more computing power for someone to simply pull up the main NASA web page than to grab that file.

  29. Alan S. Blue
    Posted Jan 16, 2009 at 11:57 PM | Permalink

    Steve McIntyre (#38), That’s the kind of numbers I thought I remembered. Is the data hosted on the same machines that have – and advertise – the graphical anomaly graphs, seminar power points, the animated day-to-day comparisons, and other large files?

  30. Peter Ashwood-Smith
    Posted Jan 17, 2009 at 12:15 AM | Permalink

    I had similar problems with a scraping program I wrote ages ago to pick up aviation weather. I wrote it in C and directly used open/close/read/write socket level I/O. I noticed that after using it for a while it would stop responding (sometimes refuse, sometimes hang) but I still had access to the website using the browser. I did some experiments with changing some of the non critical parts of the POST commands and it started working again. I eventually wound up copying exactly the same commands as the browser sent so that my scrape was indistinguishable from Internet Explorer opening the page. I noticed that this was quite repeatable so I came to the conclusion that non browser scrapes had some kind of rate limit which automatically kicked in but required operator intervention to turn back off.
    Given that the WWW is a pretty hostile place, that’s not a silly policy. It’s often quite interesting to watch the attacks that occur on an IP address left wide open (it can be many per second). Server administrators really do have to be strict sometimes. Anyway, not sure what the exact circumstances were here, but it did not sound nefarious to me, at least without the context of history.

  31. J.Hansford.
    Posted Jan 17, 2009 at 12:57 AM | Permalink

    Without the open exchange of data and methods, scientific understanding would become an imposing task. You are right Mr McIntyre to take exception to this deliberate snubbing of your professional standing.

    I am appalled that Universities and their professionals would indulge in behaviour reminiscent of Soviet practices. The banning of select people from information…. The behaviour may start out petty, but in the end it is all the same.

    Hopefully the matter is settled and an enjoyable pursuit of science might now ensue. :-)

  32. ScotchTapeSmell
    Posted Jan 17, 2009 at 1:47 AM | Permalink

    All in all just another day on the AGW merry-go-round

  33. VG
    Posted Jan 17, 2009 at 2:09 AM | Permalink

    Is it not a coincidence that the recent post by Lubos re possible problems with NASA temp data coincides with Steve’s current problem?

  34. Alan Wilkinson
    Posted Jan 17, 2009 at 2:42 AM | Permalink

    With all due respect, Steve, I think your email to Schmidt rather set the exasperated tone for the reply.

    I agree, it is puzzling that the block was so wide unless it was just fired off in a hurry to fix a panic and not refined later. But something important may have intervened – like lunch. I incline to the cock-up theory rather than conspiracy in this instance.

    You could ask for guidelines on acceptable rates of access per time of day and undertake to stay within those in return for explicit exclusion of your IP address from any blocks.

  35. Dr Virtanen 2nd
    Posted Jan 17, 2009 at 3:18 AM | Permalink

    It is just a technical error that NASA has now corrected. Google “403 access denied” and you’ll find explanations. The following could be one:

    http://www.cooper.edu/~lent/random/Explaination_of_403_Access_Denied_(one_reason).html

  36. Bill Jamison
    Posted Jan 17, 2009 at 3:22 AM | Permalink

    It seems like a simple test would have been to try to access the data from your browser after the R script failed. If you could access http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt from IE or Firefox then you would know that your IP address wasn’t blocked.

    IMO their explanation sounds reasonable although you can easily argue that it was a heavy-handed approach to the data scraping.

    Steve: I did access through my browser and noted in comments above that the block was R-specific. The explanation is not reasonable because, as I’ve now learned, the scraping was almost identical to the scraping that I’d done in 2007 and which they agreed to.

  37. Joseph Koss
    Posted Jan 17, 2009 at 5:41 AM | Permalink

    It could very well be possible that they did try to target the offending scraper’s IP address but were met with attempts to circumvent this blockage (proxy servers, wingates, etc.).

    If their servers were indeed adversely affected by the scraping, then they HAD to do SOMETHING that would block such a scraper, and do so immediately. R version X and OS version Y fits the bill for a quick fix while a more refined solution could be sorted out.

    Still further this blockage could have been performed by a junior in the IT department because things began when the man-in-charge was home sleeping during his off-hours.

    Reviewing web server logs also isn’t exactly an efficient thing to do. Imagine a site serving 100,000 requests a day, which would easily boil down to a multi-megabyte log file to sift through, even filtering specifically for the day in question.

    With the ‘quick fix’ in place, the man-in-charge would take his time to make sure that the replacement for that fix didn’t need to be re-visited. It would only be when complaints started pouring in that he would escalate the timetable for the replacement, which seems to be exactly what happened here.

    There could still be innocents blocked while he formulates a ‘final’ solution.

    I don’t see anything wrong here. If they wanted to block you, you would still be blocked. You aren’t blocked, hence the explanation for what has transpired has very little to do with you.

    • Rich
      Posted Jan 17, 2009 at 7:10 AM | Permalink

      Re: Joseph Koss (#48),

      I don’t see anything wrong here. If they wanted to block you, you would still be blocked. You aren’t blocked, hence the explanation for what has transpired has very little to do with you.

      Myself, I incline to this view too. I’ve been in the situation where a network is suffering a problem and the first requirement is to get it fixed and working again. I’ve had managers say to me, “I know techies want to spend hours tracking down the exact problem but this is a live network. Do the quick fix.”
      And you can’t know it wouldn’t have been fixed just as quickly without the Cc because you didn’t try it. I fully understand that you’d get tetchy. I just wish you hadn’t.

  38. Geoff Sherrington
    Posted Jan 17, 2009 at 6:10 AM | Permalink

    “So let us begin anew – remembering on both sides that civility is not a sign of weakness, and sincerity is always subject to proof. Let us never negotiate out of fear; but let us never fear to negotiate.

    “Let both sides explore what problems unite us instead of belabouring those problems which divide us.

    “Let both sides seek to invoke the wonders of science instead of its terrors. Together let us explore the stars, conquer the deserts, eradicate disease, tap the ocean depths, and encourage the arts and commerce.”

    J F Kennedy, inaugural speech, Jan 20, 1961.

    I have read JH and he is no Kennedy.

  39. David
    Posted Jan 17, 2009 at 6:26 AM | Permalink

    Steve,

    Can I suggest the following?

    1. Re your IP – do you use cable/adsl? Do you have a fixed IP or a dynamic one? If the former (fixed) maybe you could ask your ISP for a dynamic one? That way your modem will get a new IP every time it reconnects.
    2. There are proxies you can run on your local PC that will strip off the user agent when requests are made – perhaps you could use one of those with R on your PC?
    3. Maybe you could sign up to one of the public anonymiser type services and point the proxy settings at R through that?
    4. (more important than 1, 2, or 3) I would get some test subjects (people) together and next time this happens actually get other people to test the blocking, so that you would know whether only your PC is blocked while others are not. It would have been interesting to know, for example, if R on XP from an IP other than yours would work at the same time as you were blocked, or if you were able to browse the pages for which R was being rejected.

    The fact that you have to screen-scrape at all to get a piddling 10MB data set over 8 hours is insane. The amount of work their back-end systems would be doing to generate the pages you’re scraping would be huge compared to just giving you a CSV/XML/whatever.

  40. Spence_UK
    Posted Jan 17, 2009 at 7:07 AM | Permalink

    Just a quick note!

    I accessed the site from the UK, using R, exact same files and script, within minutes of Steve trying – and I’m pretty sure I tried, successfully, around ten minutes prior to that as I was coincidentally playing with Hansen’s digits at the time (oo-er missus etc).

    I might be running an old version of R (I don’t update very often). Versions are R version 2.4.0 and OS is MS Windows XP Home.

  41. Steve McIntyre
    Posted Jan 17, 2009 at 8:04 AM | Permalink

    If they wanted to block you, you would still be blocked.

    Yes and no. On an earlier occasion, they definitely blocked me intentionally and then changed their minds after I publicized the blocking on Climate Audit. Perhaps they would have done so anyway; it’s impossible to say. My personal opinion is that NASA GISS is more sensitive to this sort of adverse publicity than CRU and that you can change their minds. But reasonable people can disagree. Without prior experience with Team blocking, I would not have publicized it so promptly, but I do have the prior experience and now deal with each incident promptly as it occurs.

    • Rich
      Posted Jan 17, 2009 at 8:45 AM | Permalink

      Re: Steve McIntyre (#53),
      So would it be fair to summarize your view as: they created the climate in which you suspect them of blocking you deliberately, and if, on another occasion, they do it accidentally or innocently and you respond sharply, they have only themselves to blame?

      If so, I think it’s entirely reasonable. Sad, but reasonable.

      Back in the days when I read Real Climate I thought Gavin Schmidt was someone who really knew what he was talking about and who was good at explaining things. Nothing changed my mind about that but all the sneering and sarcasm got tiring and I gave up on it. I’d still like to give him and his friends the benefit of the doubt but there it is. We have to live in the world as it is.

      Steve: Well, right now, we have a situation where Santer’s refused data, Briffa’s refused data, Thompson’s refused data…. All intentionally. If they block me unintentionally while meaning to block someone else (who is entitled to data), I’m not sure that I understand why that shouldn’t concern me. After all, I’m in a position where I speak up for the third party, who might himself have no recourse.

  42. Steve McIntyre
    Posted Jan 17, 2009 at 8:58 AM | Permalink

    Update: I’ve been contacted by a third party who’s been scraping data from GISS (off hours) using R.

    The circumstances seem pretty similar to my scraping data in May 2007. At that time, NASA GISS relented on their blocking and permitted me to continue scraping in exactly the same way that I’d been doing (I’d inserted a Sys.sleep pause in my script). So they had already established a precedent in which they’d permitted scraping of station data by an R user.

    Now someone else comes along and does something almost identical. They block him. On what basis? It doesn’t sound like he did anything that they’d not already agreed to in May 2007. What justified their blocking this new guy? (Even though Gavin Schmidt says that they only have 0.25 person-years tied up in GISTEMP, they seem to be aware of every ripple on the pond with their server usage.)

    The explanation will make Steve Mosher roll on the floor. They say that they want the new guy to use GISTEMP to download the data. They say that that was an important reason in their releasing their source code. Memo to NASA: no one should have to compile GISTEMP to download your data. Just gzip the station data and let people download it – as I suggested two years ago. They blocked R in order to enforce GISTEMP usage. ROTFLOL.

    Maybe they intended this to apply to me as well, maybe not. Maybe they blocked me unintentionally while blocking someone else intentionally. The problem remains that the new guy who was blocked intentionally had every bit as much right to the data as me or anyone else. And as far as I’m concerned, NASA has no business blocking the new guy.

    While I may have been unintentionally caught in their net, ironically NASA’s also caught a bit in their own net. Because they caught me in the sweep, now there’s a bunch of bad publicity and, in the wake of the Climate Audit publicity, it looks like they’ve backed down on the new guy.

    The Team really are quite a comedy.

  43. Jeremy, Alabama
    Posted Jan 17, 2009 at 9:25 AM | Permalink

    1. This is science. It should be considered a good thing people want to view/check your data and results.

    2. If they provided organized datasets scraping wouldn’t be necessary.

    3. For any single occurrence, anybody can make a genuine mistake. Repeated incidents are malicious.

    Since this appears to be deliberate, perhaps they plan to redact the other occurrences and build a story for their fan-boys that you are being petty. In this formulation, it is never beneficial to cop a tude even though it feels good at the time.

  44. mick
    Posted Jan 17, 2009 at 9:45 AM | Permalink

    A sufficiently broad net catches many things; and providing an IP as an antidote also happens to leave one open to being personally identified.

  45. ScotchTapeSmell
    Posted Jan 17, 2009 at 10:30 AM | Permalink

    So these are the hoops you must jump through : first do A, then B, then C, …. then Z, then AA, then BB,… then ZZ, then AAA,… etc, etc, then you finally get NASA/GISS to capitulate.

    How about this instead : NASA just openly offers all that information up immediately to any and all before the any and all even ask. Because as it is now all this smoke and mirrors continues to dishonor the surpassing reputation that NASA had when I was a kid.

  46. Kenneth Fritsch
    Posted Jan 17, 2009 at 11:15 AM | Permalink

    Is not the basic problem here one of GISS not having an integrated data base of station temperatures (like USHCN has for convenient downloading) that can all be downloaded together instead of one at a time? Does not the question then become one of why is not an integrated data base available for researcher convenience?

    As I recall from Steve M’s first experience with scraping GISS data, the GISS reply indicated that they did not want people/researchers to use their station data as the GISS main purpose is to provide some smoothed grid data with temperature anomalies that are derived from the station data. I think they even expressed reservations about posting the individual station data at all and in any form.

    It would appear what they are saying, in effect, is we do not want you looking at our intermediate data (and any potential associated problems I would presume) but to use our grid data – no questions asked.

    As I also recall it is not a straight- forward process to convert the GISS provided data into grids. All this makes me think that GISS pays most of its attention to the zonal temperature anomalies that it produces and much less to the details of how it is generated from station data. The GISS web site comments, in effect, that they would prefer the researcher use the USHCN station data when one wants to look at local phenomena.

    In my view all these GISS reactions should tend to make the curious researcher even more curious about the GISS station data and one would think that should be the case for even the climate scientists who use the grid and zonal data and stake their reputations on conclusions drawn using that data.

    I would hope that if these scraping efforts of GISS station data are successful, the results are made readily available to the public.

  47. Steve McIntyre
    Posted Jan 17, 2009 at 11:19 AM | Permalink

    #58. Since their Y2K embarrassment, NASA’s made a pretty decent effort to put their code and intermediates into public view. The main problem is that the code is execrably written and documented, so that it’s needlessly hard to figure out what it’s doing (as has been documented here in the past). But they’ve put it out there, and let’s not criticize them for things that they’ve tried, however grudgingly, to deal with.

    These little blocking escapades are a bit puzzling in that context. They do not amount to much in the scheme of things and result in pointless embarrassment.

    Let’s say that any of us had been in Hansen’s shoes and Schmunk comes to you saying that someone is scraping station data using R. The first question I’d ask: is it affecting the system in any noticeable way? If it wasn’t, I’d tell him to forget about it. Even if it was affecting the system by 25% or 50% (which it WASN’T in this case), I’d probably tell him not to do anything for a while, see how we do.

    Because they know or ought to know that if they block someone, in an internet world, it’s going to come right back in their face. So I’d have let it ride. Why create a fuss over something pointless?

    • ScotchTapeSmell
      Posted Jan 18, 2009 at 12:39 AM | Permalink

      Re: Steve McIntyre (#60),

      No one can ever say you don’t bend over backwards in relation to GISS. Do you ever feel the back of your head ‘scrape’ on the ground in the process? That’s meant as a joke….. but do you?

  48. Steve McIntyre
    Posted Jan 17, 2009 at 11:24 AM | Permalink

    #59. Kenneth, they’ve made a lot of intermediates available. I’d managed to emulate things through most of STEP 3 before I took a break from this. But it should be possible to emulate the thing end to end in a fairly sane way. THEN you can start talking about what they are doing. Their smoothing and splicing methods are pretty weird, but they probably don’t do a lot of harm; the operations are so trivial that it hardly matters even if they’re done in a goofy way. My take on GISTEMP was that it was like Peter, Paul and Mary’s pointless toy: zip, zap and whirrr.

    • Kenneth Fritsch
      Posted Jan 17, 2009 at 1:27 PM | Permalink

      Re: Steve McIntyre (#61),

      Let me know when the GISS station data is available in integrated form so that I can do a complete differencing analysis of GISS and USHCN temperature versions station by station. I have some questions about why the differences noted for individual stations are as large as they are between series, even for those pristine rural stations.

  49. W F Lenihan
    Posted Jan 17, 2009 at 11:37 AM | Permalink

    Hansen, Mann, Schmidt et al claim to be paragons of virtue, that is, transparent and open to all constructive input and comments from others in their field. So, why haven’t they created a script that recognizes Steve M and gives him a pass into the GISS computers? Stated another way, what is GISS hiding?

    Steve: Once again folks, please don’t go a bridge too far. GISS is grudgingly doing a decent job of providing intermediate data as I’ve said MANY times.

    The puzzle is really why they got involved in this last incident, but it’s a totally different issue.

  50. Ben
    Posted Jan 17, 2009 at 12:16 PM | Permalink

    snip – you’ve used language that is against blog policies.

    • Kenneth Fritsch
      Posted Jan 17, 2009 at 1:32 PM | Permalink

      Re: Ben (#63),

      No conspiracy here, Ben, and if you were attempting to analyze the data and its adjustments I suspect you might use the term frustration in place of conspiracy.

      Steve:
      please don’t even debate such talk.

  51. Steve McIntyre
    Posted Jan 17, 2009 at 1:34 PM | Permalink

    #64. GISS dset1 and GISS dset2 are available at CA in integrated form as scraped in Feb 2008. (I need to update but you can experiment with these.) They are R-lists of length 7364, the name of which is the station id. Much easier to work with than goofy GISS binaries.

    • Kenneth Fritsch
      Posted Jan 19, 2009 at 5:11 PM | Permalink

      Re: Steve McIntyre (#66),

      #64. GISS dset1 and GISS dset2 are available at CA in integrated form as scraped in Feb 2008. (I need to update but you can experiment with these.) They are R-lists of length 7364, the name of which is the station id. Much easier to work with than goofy GISS binaries.

      Steve M, thanks for the links to the GISS scraped station data. I have now updated myself with what I judge is a reasonably complete background on the process that was used. I thought the giss.dset2.tab file would open with Notepad, but the file I downloaded is unreadable. What am I doing wrong?

      • RomanM
        Posted Jan 19, 2009 at 6:06 PM | Permalink

        Re: Kenneth Fritsch (#90),
        Steve’s “tab” files are not simple text files. They are created using R and can be read directly into R by using the “load” command.

        Steve: Or use download.file(url, "temp.dat", mode="wb"); load("temp.dat").

  52. Marvin
    Posted Jan 17, 2009 at 1:35 PM | Permalink

    There is the possibility that this was looked at by the webmasters, who saw a load on the system from one location and started worrying that if multiple users all started scraping, the system load or bandwidth would become a major IT problem. Looking at it that way, I could see an IT guy writing a quick script that looked at what was immediately causing the load and blocking it. (I’m not saying that this is what happened, or that it would have been the smart thing to do, but simply that the person doing it might have thought they were doing a good thing at the time.)

    IT and MIS departments are notoriously protective of their domains, and given that having your webservers crash under load is much worse than simply denying _some_ users access, I can see them opting for whatever seemed likely to give them the fewest problems affecting their next evaluation.

    • Neil Fisher
      Posted Jan 17, 2009 at 6:37 PM | Permalink

      Re: Marvin (#67),

      given that having your webservers crash under load is much worse than simply denying _some_ users access

      True, but then again, rate limiting connections on a particular port is fairly trivial to implement and doesn’t deny *anybody* access.
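
      Rate limiting of the sort Neil describes really is a few lines of code. As an illustration only (this has nothing to do with NASA’s actual setup, and the capacity and refill numbers are invented), a per-client token bucket throttles bursts without banning anyone; a Python sketch:

      ```python
      import time

      class TokenBucket:
          """Per-client token bucket: allow a burst of `capacity` requests,
          then refill at `rate` tokens per second. Excess requests are
          delayed or retried, never banned outright."""

          def __init__(self, capacity, rate):
              self.capacity = capacity   # maximum burst size
              self.rate = rate           # tokens added per second
              self.tokens = capacity
              self.last = time.monotonic()

          def allow(self):
              now = time.monotonic()
              # Refill proportionally to elapsed time, capped at capacity.
              self.tokens = min(self.capacity,
                                self.tokens + (now - self.last) * self.rate)
              self.last = now
              if self.tokens >= 1:
                  self.tokens -= 1
                  return True
              return False   # caller should wait and retry, not be blocked

      # A bucket of 5 serves a burst of 5 immediately, then throttles.
      bucket = TokenBucket(capacity=5, rate=1.0)
      results = [bucket.allow() for _ in range(6)]
      print(results)  # first five True, sixth False
      ```

      The point is exactly Neil’s: everyone still gets served, just not all at once.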

  53. Posted Jan 17, 2009 at 1:55 PM | Permalink

    After all your requests, Hansen finally gave you a digit; too bad it’s the middle one.

  54. Tony
    Posted Jan 17, 2009 at 3:59 PM | Permalink

    Just in case you have not noticed: there is a massive ‘war’ going on on the internet. Your ‘little’ blog might be a bit out of harm’s way, but even here I can read:

    This blog is protected by Spam Karma 2: 110060 Spams eaten and counting

    Now, run something bigger – a big company’s website or, let’s say, NASA’s – and your web admin (actually we are talking about a couple of people, with considerable fluctuation, and many servers) can get a bit trigger-happy when faced with spammers, scrapers, hackers and other vicious folk, plus botnets the size of little countries – and simply go nuclear.

    Heck, I do block any user-agent I don’t know or don’t want crawling my blog. And then there is the school of thought that blocks *any* unknown user-agents and only allows user-agents on a (finely tuned) white-list.

    And even while it may only be a 4K-file, do you know how often R requests this file, when you run your script?

    Yet, I must admit, this whole blocking thing is not handled very well by NASA (as in: amateurish) – I guess they have to cut corners somewhere.

  55. Alan Wilkinson
    Posted Jan 17, 2009 at 6:15 PM | Permalink

    Perhaps the real question is: “What is on their website they don’t want anyone unauthorised to be able to find?”

    Is security the main reason they push GISTEMP and are sensitive about programs like R fossicking around?

  56. Steve Reynolds
    Posted Jan 17, 2009 at 7:11 PM | Permalink

    Steve: “My take on GISTEMP was that it was like Peter, Paul and Mary’s pointless toy: Zip, zap and whirrr.”

    Let’s hope it does not evolve the way the song has to The Murderous Toy:

  57. John Norris
    Posted Jan 17, 2009 at 9:29 PM | Permalink

    Every medium to large organization has an IT and/or security group whose job it is to keep the systems up and running, and protect them from mal-intended access, of which there is plenty. The IT security group can be well distanced from the primary users of the system. It is certainly plausible that when someone in that group sees some significant external access they get suspicious and shut down access to that user. I am not saying that there isn’t a paranoid fear within NASA about CA, and I agree it certainly is plausible that that fear is behind the blocking, but it is certainly plausible that these have just been security related, and completely separate from the team.

  58. Steve McIntyre
    Posted Jan 17, 2009 at 10:54 PM | Permalink

    #73. Sure, but there are other things going on here. There is no evidence to my knowledge that they believed that the R user was malware. There is also no evidence that the scraping caused any performance problems. They blocked R usage after they knew that the new guy was not malware, but a valid scientific data inquiry. They clearly don’t like people scraping data from their site. The problem is easily remedied as I said before. All they have to do is make gzip files like GHCN does. End of story.

    Surely NASA GISS can figure out how to make a gzip file. So why haven’t they?
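
    The remedy Steve describes is a one-liner in most languages. As a hedged sketch in Python (the station records below are fabricated placeholders, not real GISS data), the round trip through gzip is lossless and leaves a single static file to serve:

    ```python
    import gzip

    # Fabricated stand-in for the full station data set; in reality this
    # would be every station series concatenated into one text file.
    station_data = b"425003026090 1880 -12 -8 3 11 ...\n" * 1000

    # One compress on the server side...
    compressed = gzip.compress(station_data)

    # ...one decompress on the user side, no scraping required.
    restored = gzip.decompress(compressed)

    assert restored == station_data  # round trip is lossless
    print(f"{len(station_data)} bytes -> {len(compressed)} bytes gzipped")
    ```

    One request per user instead of thousands of page scrapes, which is presumably also lighter on their servers than the scraping they object to.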

    • Kenneth Fritsch
      Posted Jan 17, 2009 at 11:18 PM | Permalink

      Re: Steve McIntyre (#74),

      Surely NASA GISS can figure out how to make a gzip file. So why haven’t they?

      That to me is the question, and it overrides all the other conjectures and rationalizations. It is my opinion that GISS does not want these data used/analyzed and is more interested in researchers simply looking at the end product.

    • Patrick M.
      Posted Jan 18, 2009 at 7:02 AM | Permalink

      Re: Steve McIntyre (#74),

      Perhaps someone could offer to mirror the data so one server isn’t getting hit hard?

  59. Posted Jan 17, 2009 at 11:00 PM | Permalink

    I haven’t run across your site in a while, Mr. McIntyre, but I was a big fan of the “Y2K bug” work you did. Though virtually unacknowledged by the alarmist community, that stands as one of the biggest blows to mmgw, IMO.

    Steve: snip – we have blog rules against imputing improper motives to others. Please observe them. Also, I make no large claims on behalf of my work. I’m working on small problems and puzzles because I find them interesting, not because I’m trying to change anyone’s mind about world policy.

    • Posted Jan 18, 2009 at 12:23 AM | Permalink

      Re: digitalcameras (#75),

      My apologies, Mr. McIntyre. I’m honestly not even sure exactly what I said that violated the rules (though I trust your judgment that I did). I wrote an emotional, off-the-top-of-my-head comment, rather than a thought-out one, on my feelings towards Mr. Hansen and the mmgw community as a whole, because of the threat I personally feel it poses to the world. Again, I apologize if I’ve impugned the integrity of your excellent site and the work you do. I just sort of view your site as a David among a world of dishonest Goliaths, and perhaps I was a tad overzealous in my comments because of recent developments and awards granted to Mr. Hansen.

      At any rate, keep up the excellent work my friend. Whether you want to acknowledge it or not, the “small problems and puzzles” you work on have much larger implications, in my opinion, and the opinion of many others, and I’m thankful that you and a few others are out there doing the work.

  60. VG
    Posted Jan 18, 2009 at 1:07 AM | Permalink

    No doubt in my mind this NASA event is related to Lubos’s post on the possible lack of randomness in the GISS temperature data. Also, can anyone explain why NCEP/NCAR has deleted all reference to past snow data of late (now 3 weeks, since the massive increase in NH snow)?

    http://moe.met.fsu.edu/snow/

  61. Richard Hill
    Posted Jan 18, 2009 at 3:18 AM | Permalink

    One point is worth noting: at least the USA government scientific bodies do make historical climate data available to the public.
    Contrast this with Steve Mc’s problems trying to get data from P Jones.
    Are you interested in the Cape Grim (Australia) CO2 history? Gathered at some public expense by the CSIRO. As far as I can find out, it will only be released to “genuine scientific investigators”…
    The last thing we want is for USA bodies to stop publishing the data.

  62. Dave Dardinger
    Posted Jan 18, 2009 at 7:46 AM | Permalink

    One thing that should be considered by GISS is that anyone scraping data is not doing it to try to “cheat” GISS of credit. The whole point is to have traceability for where the data came from. Thus, however the data is used, the user wants to be able to say, “Here’s the temp data from Jan 15, 2009 from GISS.”

    At the same time, GISS or their data suppliers want to be able to correct errors. This means that it’s important to have backups of old data and change logs, so that people wanting to compare data from the past with current data can do so. This is what’s lacking in many cases, as I understand the situation, and it needs to change.

  63. Bob B
    Posted Jan 18, 2009 at 8:06 AM | Permalink

    Steve, Hansen has too many things to do and cannot be bothered with helping you. After all he has to save the world in 4 years:

    http://www.guardian.co.uk/environment/2009/jan/18/jim-hansen-obama

  64. Michael Terry
    Posted Jan 18, 2009 at 11:49 AM | Permalink

    It’s easy to get around these blocks, anyway. Presumably R has some way of doing user-agent spoofing or you can download the data to disk with another user-agent that does. And, you can use TOR to anonymously download so they can’t block your IP.

  65. Steve McIntyre
    Posted Jan 18, 2009 at 2:22 PM | Permalink

    #84. Getting around these stupid blocks isn’t the answer. By publicizing each such incident, maybe the bad behavior by NASA GISS and worse behaviour by others can be changed.

  66. BarryW
    Posted Jan 18, 2009 at 3:11 PM | Permalink

    FYI I did some browsing in the R documentation and I think you can change your user agent value. If you type the following you can see what the user agent is set to.

    options("HTTPUserAgent")
    for me this returns

    $HTTPUserAgent
    [1] "R (2.7.1 i386-apple-darwin8.10.1 i386 darwin8.10.1)"

    to change it

    options(HTTPUserAgent = "New value string")

    So you might try that if there is blocking.

  67. Rusty Scott
    Posted Jan 19, 2009 at 12:25 AM | Permalink

    I’m not sure why I find myself reading all the comments on this thread. Perhaps it’s because I’m bored and there isn’t a new fascinating post here yet. However, I read this and just had to throw out another two pennies.

    Steve: snip – we have blog rules against imputing improper motives to others. Please observe them.

    Correct me if I have it wrong Steve, but 1) the problem (according to Schmunk) was resolved before your e-mail and before this post through the actions of the “new guy” who was blocked, and 2) even if the blockage was entirely aimed at making your life a little more difficult, the problem was solved after your e-mail without argument or further discussion on NASA GISS’ part. I think you are attributing too much to some IT person who had probably never heard of the R language before. I program in many different languages, including MATLAB, a lot. What I do doesn’t involve a lot of statistical modeling, so I hadn’t even heard of R until I started following your blog here. Hopefully, something new and more interesting will elicit a blog post soon and this thread can die a slow, tortured death.

  68. Steve McIntyre
    Posted Jan 19, 2009 at 12:40 AM | Permalink

    #87. This thread is not an earthshaking issue, but I think that your understanding is incorrect. The block does seem to have been intended for the “new guy”, and I was caught in the same net. However, when I sent the email to Gavin Schmidt, cc’ing the eminent climate scientist who disdains nonsense, the situation was definitely not resolved for the new guy; his most recent email from NASA GISS told him to use GISTEMP.

    If they’d already decided to restore access prior to my email and post, they hadn’t told anyone; certainly not the new guy. They relented on R access only after my email and post.

    My present diagnosis is that my intervention was helpful in getting the block lifted for the new guy. While they may not have intended to block me, I upped the ante on the situation in a way that the new guy couldn’t – both via the eminent and sympathetic scientist and via blog publicity.

    NASA can say that they planned to restore R access anyway, but there’s no public evidence of that intent.

    I’m working on some other things. Because I do a lot of analysis on some posts, they take a lot of time. I don’t spend a lot of time on posts like this, but it is important to me to document blocking incidents: this one being different obviously than Santer.

    • Rusty Scott
      Posted Jan 20, 2009 at 4:10 PM | Permalink

      Re: Steve McIntyre (#88), If my understanding is incorrect, then it is because you are operating on information which you haven’t updated to the post. In short the post can be summarized as:

      SM: My IP address is being blocked.

      NASA: We don’t think we have your IP address blocked, but we did block someone else using R and they contacted us about it. That should be fixed now. Otherwise, tell us what your IP is so we can dig deeper.

      SM: The script now works.

      Everything else, that I can see, goes to discussion about whether you believe their explanation or not, or imputing motive to their actions. You seem to have further information that hasn’t been updated to this post. I am glad to know that these threads are not where you spend a majority of your time. Keep up the good work.

  69. Steve McIntyre
    Posted Jan 19, 2009 at 9:43 PM | Permalink

    #90. Kenneth, there’s far too much data in this file to use Notepad. Arggggh. I’ve spent time organizing it in a sensible way.

    • Kenneth Fritsch
      Posted Jan 20, 2009 at 12:59 PM | Permalink

      Re: Steve McIntyre (#92),

      Steve M, I attempted a download into R per your prescribed R instructions. I keep getting an error message that the url has an unexpected “/” in it. I pasted the url as:

      http://data.climateaudit.org/data/giss/giss.dset2.tab

      I think the Notepad problem is one of not recognizing the format and not of too much data – but that is water over the dam.

      Steve: Kenneth, I promise you that R can do whatever it is that you want to do 100 times more easily than however you’re trying to do it. Plus you end up with replicable scripts that someone else can use to see what you’ve done. If readers posted such scripts with their analyses, it would make it easier for others to see what they’ve done.

  70. Anthony Watts
    Posted Jan 19, 2009 at 10:29 PM | Permalink

    Steve,

    I suggest setting the user-agent header to “jester”; that way they’ll have no trouble keeping track of you, and thus any time in the future you become blocked, you’ll be able to help them immediately identify you.

    “Jester” is also OS independent. It is an equal opportunity label.

    I had to do something similar with NOAA once with radar data and an application my company used that looked for all practical purposes like a web browser. Haven’t had any trouble since.

    • jeez
      Posted Jan 20, 2009 at 11:51 AM | Permalink

      Re: Anthony Watts (#93),

      Very good suggestion, but if I may modify slightly.

      user agent=”Usufruct” goes to the heart of the matter ever so delicately.

  71. Jim Pacheco
    Posted Jan 20, 2009 at 10:12 AM | Permalink

    Steve, you have probably already done this, so I am interested in their reply. With an FOI request they should have given you the 10 MB file. Why didn’t they do that?

  72. johnl
    Posted Jan 21, 2009 at 1:14 AM | Permalink

    Hi. R is a programming language, and, while it’s easier to learn than C – and even because it’s easier to learn than C – it’s easy to imagine a new user hammering some server with a junk script. And the typical web server administrator is going to respond with pure dumbth. Pretty much everyone blocks Python users. That’s why it’s important for everyone – R, SAS, Python users on Linux or Mac or whatever – to study the help for their HTTP interface and figure out how to tell everyone that they are using IE6 for Windows. Challenge them to block that. Especially if you have some loop that runs forever.
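
    For what it’s worth, here is how that looks in Python’s standard library (the URL is a placeholder, the IE6 string is just an illustrative value, and whether impersonating a browser is wise is a separate question):

    ```python
    import urllib.request

    # Placeholder URL; only the header matters for this sketch.
    url = "http://example.com/data.txt"

    # Replace the default "Python-urllib/3.x" user agent, which some
    # servers block on sight, with a browser-style string.
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
    )

    # urllib normalizes stored header names to capitalized form.
    print(req.get_header("User-agent"))
    ```

    The R equivalent is the options(HTTPUserAgent = ...) call mentioned earlier in this thread.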

  73. ChrisC
    Posted Jan 21, 2009 at 1:54 PM | Permalink

    Steve, I’d like to offer what I hope is the feedback of a disinterested observer. I visit your blog every few months because I like to read scientifically credible criticisms of the mainstream science. I have always thought of you as “the loyal opposition”, carrying out careful checks on the science, keeping other scientists on their toes. However, I was greatly disappointed by your behavior in this topic. Your opening email to NASA was unnecessarily confrontational and you continue to assume malicious intent without supporting information. I realize that you have had a long history of conflicts with these people, but that doesn’t justify assuming malicious intent in their behavior, and their quick response belies insinuations to that effect.

    I urge you to get ahold of yourself. You are the best opposition to the general attitude of the scientific community. Science needs people like you nipping at its heels, exposing mistakes. However, you now appear to be emotionally prejudiced against these people, and when that prejudice becomes apparent to independent bystanders, you lose credibility with them. Step back from this issue, think long and hard about your long-term goals rather than your immediate irritation, and act accordingly.

    I offer this advice with the best of intentions, but I will understand if you choose to reject it. In any event, I wish you well in your enterprise.

  74. Steve McIntyre
    Posted Jan 21, 2009 at 10:31 PM | Permalink

    #99,101. Yes, I do have other non-public information on this in connection with how NASA handled the “new guy”. I remain of the view that their disposition of the matter at various stages could be fairly described as “undignified and petty”, though the matter was eventually resolved (even if the behavior was not specifically directed at me). It seems possible (or even probable) that the final disposition was influenced by publicity here as opposed to GISS lifting the block through their own good graces, but reasonable people can differ. I’ve said on many occasions that NASA GISS is making a much better effort on data availability than other agencies.

    As you observe, this thread is not something that takes much time, so let’s leave the matter where it is.

One Trackback

  1. [...] too high. Similar issues plagued Steve McIntyre when he went to fetch a large amount of data once, the Gavinator of GISS blocked him. So unless NOAA/NCDC decides to pull the CRN data from the sat feed that services NWS WSFO’s [...]
