Earlier today, I reported that I had been unable to access the NASA GISS server using R, even to get a tiny data set. (I haven’t been working on NASA GISS data and haven’t downloaded much of anything from them for months.) The following simple script failed for me (and for Roman in New Brunswick, Canada) but, strangely enough, not for other R users. Very odd.
url="http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt" #monthly glb land-ocean
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : cannot open: HTTP status was '403 Forbidden'
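A failure like this is easier to report if the script captures the message rather than simply stopping. The following is a minimal sketch, not the script I ran: `fetch_lines` is a hypothetical helper name, and it assumes that `readLines` is the call that opens the connection. It is exercised on local temporary files with made-up contents so that it runs without touching any server.

```r
# Hypothetical helper (not part of the original script): read lines from a
# URL or file path and return the diagnostic instead of halting the script.
# Note: any warning, not only a failed connection, is reported as a failure.
fetch_lines <- function(src) {
  tryCatch(
    list(ok = TRUE, lines = readLines(src), message = NA_character_),
    warning = function(w) {
      list(ok = FALSE, lines = character(0), message = conditionMessage(w))
    },
    error = function(e) {
      list(ok = FALSE, lines = character(0), message = conditionMessage(e))
    }
  )
}

# Local demonstration: one readable file and one path that does not exist.
good_path <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two"), good_path)  # placeholder contents
missing_path <- tempfile(fileext = ".txt")        # never created

good <- fetch_lines(good_path)
bad <- fetch_lines(missing_path)

print(good$ok)
print(bad$ok)
print(bad$message)  # whatever R reported for the failed open
```

Calling `fetch_lines(url)` on the address above should, under the same assumption, put the server's status text in `message` when the request is refused.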
Annoyed by this, I sent the following email to Gavin Schmidt, a NASA GISS employee who occasionally acts as a spokesman on GISTEMP matters. I sent a copy to an eminent climate scientist who doesn’t necessarily agree with me on many things, but who disdains the undignified behavior that is all too prevalent in the field.
Dear Dr Schmidt,
My IP address has been blocked by NASA GISS from downloading GISS data using a computer script. This is undignified and petty behavior and I request that you take immediate steps to remove the block.
Yours truly, Stephen McIntyre
I received the following answer from Robert Schmunk of NASA GISS (who had been involved in a prior blocking of my access discussed here). I copy this email on the basis that it is official correspondence from a federal employee and not “personal” communication:
Please do not write to Dr. Schmidt on issues related to GISS website management as he has essentially nothing to do with that topic.
Although a few IP numbers have been barred from accessing the GISS website(s) in the last couple weeks, none of them should specifically be a machine that you might be using. (I say that based on the assumption that the blocked IP addresses are not, as best I can tell, Canadian.) However, I can only confirm this if you will inform me what IP number you might be using, or if you might be assigned dynamic IP numbers, then the domain name of your ISP.
I did configure one of our webservers yesterday to bar access by the user agent “R”, as we have recently had problems with two locations attempting massive data scrapes and who identified themselves with that user agent. As someone at one of those locations has since contacted me and discussed the matter, I have now lifted that block.
If you were using software which identified itself to our servers as user agent “R”, then you can try accessing them again now and see if you are able to get through. If not, then as I indicated above, please let me know what IP and/or ISP numbers you may be coming from.
The script works again.
While I’ve “scraped” their site in the past (because they refused to provide organized data), which led to the identification of their “Y2K” problem, I had done no such scrape in months.
Schmunk’s explanation doesn’t make sense as it stands. He says that he blocked access from R, but then why were some people able to access the site using R? He must have done something else as well.
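For readers who want to see what their own installation announces to a server, R keeps the user-agent string for its built-in URL connections in the session option `HTTPUserAgent`. The following is a minimal sketch that only inspects and temporarily changes that option. The replacement string is made up for illustration, and nothing here shows what the GISS server actually matched on.

```r
# Inspect the user-agent string this R session sends with HTTP requests.
# In a stock R session it typically begins with "R" followed by version and
# platform details, but front ends and user profiles can change it.
ua_default <- getOption("HTTPUserAgent")
print(ua_default)

# Hypothetical illustration: set a different string for this session.
# options() returns the previous value so that it can be restored later.
old <- options(HTTPUserAgent = "my-script/0.1 (contact: me@example.org)")
ua_custom <- getOption("HTTPUserAgent")
print(ua_custom)

# Restore the original setting.
options(old)
```

Since the string is a per-session setting, two R users need not present identical strings to the same server, although whether that accounts for the difference here is only a guess.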