Google and Climateaudit

I’ve noted from time to time that climateaudit.org ranked extremely high on many google searches. One of the ways to find articles here has been to simply use google. I often do it. Today when I googled “climateaudit curry”, I found no link to climateaudit.

I did other experiments with terms that I’ve documented here – “jacoby climate”, “briffa climate” – and got no links to blog posts or comments, although links to PDFs at the blog and a link to the RSS feed have survived further down the page. A direct search for “mcintyre climateaudit” returns “le blog de s mcintyre”. Quixotic googles like “preisendorfer autocorrelation” return a couple of PDFs but not the site.

I was only able to find one google search that returned a climateaudit link – ironically “full true plain disclosure”, where we still rank first (no doubt temporarily) even at google.

Google’s policies on censorship state

Does Google censor search results?

It is Google’s policy not to censor search results. However, in response to local laws, regulations, or policies, we may do so. When we remove search results for these reasons, we display a notice on our search results pages.

Update: As noted below, we blocked robots about a month ago when we were trying various measures to keep the site from crashing and this may be the problem, although you’d think that there would still be search information from before then. We are unblocking the robots and we’ll see whether we get restored to google listings.
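
For readers wondering what blocking robots means mechanically, here is a minimal sketch using Python's standard urllib.robotparser. The two robots.txt bodies are assumptions for illustration only (the actual file used on this site is not reproduced in this post), and the /archives/ path and the can_crawl helper are hypothetical. It contrasts a blanket block with a selective one that keeps crawlers out of archive listings only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the whole site.
# (Assumed for illustration; the site's real file is not shown in the post.)
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical selective alternative: individual posts stay crawlable,
# while an assumed /archives/ path is kept out of reach of crawlers.
SELECTIVE = """\
User-agent: *
Disallow: /archives/
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


post = "http://www.climateaudit.org/?p=521"             # post URL quoted in the comments
archive = "http://www.climateaudit.org/archives/2007/"  # hypothetical archive URL

print("block all, post:    ", can_crawl(BLOCK_ALL, post))
print("selective, post:    ", can_crawl(SELECTIVE, post))
print("selective, archive: ", can_crawl(SELECTIVE, archive))
```

This only shows what a well-behaved crawler is permitted to fetch; how quickly a search engine drops or restores pages after such a change is a separate matter that the sketch does not model.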

56 Comments

  1. John A
    Posted Mar 15, 2007 at 8:00 AM | Permalink

    Time to write a nice letter to Mr Schmidt, methinks.

  2. Jaye Bass
    Posted Mar 15, 2007 at 8:12 AM | Permalink

    When I enter similar searches in google climateaudit.org is the top line but with the little bit of french added to the end. This is what I get on google when searching for “climateaudit” => “climate audit le blog de s mcintyre”. On yahoo and msn the top link is simply “climateaudit”. I don’t suspect foul play in this case.

  3. Posted Mar 15, 2007 at 8:16 AM | Permalink

    Aha! I thought something was odd when I was looking for info from your site. If you search for say

    volcano site:climateaudit.org

    you only get rss feeds as results. It’s the same for

    heat island site:climateaudit.org

  4. L Nettles
    Posted Mar 15, 2007 at 8:17 AM | Permalink

    I ran McIntyre barabinsk through google and got sites that linked here but no returns to this site. weird.

    running Mcintyre “hockey stick” and

    http://www.uoguelph.ca/~rmckitri/research/MM-W05-background.pdf

    comes up before climateaudit.org

    http://www.climateaudit.org/pdf/mcintyre.grl.2005.pdf shows up on page 4 of the search

  5. Hans Erren
    Posted Mar 15, 2007 at 8:21 AM | Permalink

    Are you banning robots?

  6. Paul
    Posted Mar 15, 2007 at 8:27 AM | Permalink

    Very strange indeed. Here are the first results from my search:

    climate audit le blog de s mcintyre
    http://www.climateaudit.org/ – Similar pages

    CCNet-08-02-2006
    Steve McIntyre, Climate Audit, 7 February 2006. http://www.climateaudit.org/?p=521. The National Research Council of the National Academies has empanelled a …
    http://www.staff.livjm.ac.uk/spsbpeis/CCNet-08-02-06.htm – 111k – Cached – Similar pages

    Climate Audit – Wikipedia, the free encyclopedia
    A number of the scientists whose work is discussed at Climate Audit contribute to another blog, RealClimate. McIntyre frequently refers to these bloggers …
    en.wikipedia.org/wiki/Climate_Audit – 17k – Cached – Similar pages

    Stephen McIntyre – Wikipedia, the free encyclopedia
    McIntyre has stated [1] that he started Climate Audit so that he could defend … McIntyre’s websites and publications. ClimateAudit – McIntyre’s blog …
    en.wikipedia.org/wiki/Stephen_McIntyre – 21k – Cached – Similar pages

    Deltoid » Climate Audit follies
    After making this howler about entropy, John A fled from any discussion about his error to the safety of Climate Audit where McIntyre covers for his mistake …
    timlambert.org/2005/08/climate-audiot2/ – 40k – Cached – Similar pages

    M&M Project Page
    Steve McIntyre (assisted by John A.) now presents an ongoing blog at CLIMATE AUDIT. Current topics of discussion include Mann’s newly-released fortran code, …
    http://www.uoguelph.ca/~rmckitri/research/trc.html – 13k – Cached – Similar pages

    KEY URLS
    Supplementary Information to McIntyre and McKitrick (2003) … Oct 2006: This webpage has been substantially superceded by my blog at http://www.climateaudit.org. …
    http://www.climate2003.com/ – 18k – Cached – Similar pages

    A very odd mix of results, considering the activity on the blog. I wonder if something’s changed with how Google crawls your site? While Google is probably not censoring your site, you never know…Google’s gone green… It might not be “official” policy, but the work of some sub-level Believer with an ax to grind. But I doubt it.

  7. Steve McIntyre
    Posted Mar 15, 2007 at 8:30 AM | Permalink

    At Yahoo search, the top listing for “briffa climate” was climate2003.com (http://www.climate2003.com/blog/briffa.mxd.htm). That doesn’t make a lot of sense, since climate2003 is inactive and CA gets vastly more traffic. No listing for climateaudit at Yahoo for “climate jacoby tree”.

  8. Steve McIntyre
    Posted Mar 15, 2007 at 8:34 AM | Permalink

    OK, then someone explain to me why “climateaudit curry” can’t find this site (it finds Margo’s Truth or Truthiness.)

  9. John A
    Posted Mar 15, 2007 at 8:40 AM | Permalink

    Hans,

    Yes, we’re banning robots.

  10. Hans Erren
    Posted Mar 15, 2007 at 8:41 AM | Permalink

    google uses robots for their indexes

    QED

  11. David Smith
    Posted Mar 15, 2007 at 8:41 AM | Permalink

    If you separate climateaudit into climate and audit, do the results improve?

  12. MarkW
    Posted Mar 15, 2007 at 8:42 AM | Permalink

    ask.com seems to be working fine.

  13. Posted Mar 15, 2007 at 8:49 AM | Permalink

    Hi John and Steve:

    You really should not ban robots if you want the site to be indexed by Google. Also, look into getting a Google Webmaster account and submitting sitemaps.

    Banning robots would not be effective if someone really wanted to bring the site to a crawl but Google won’t index a site its robots cannot crawl.

    My 2 cents.

    Sinan

  14. Steve McIntyre
    Posted Mar 15, 2007 at 8:49 AM | Permalink

    Hans, you could be right. However, the robot blocking is very recent and was only done about a month ago when we were struggling to keep the site up. There would have been a lot of previous robot searches and I can’t see how they would be wiped out. It might be a combination of things, but I’d be surprised if it was just the robots. John A, let’s experiment with unblocking and see what happens.

  15. Steve McIntyre
    Posted Mar 15, 2007 at 8:50 AM | Permalink

    OK, we’re unblocking the robots and we’ll see what happens.

  16. richardT
    Posted Mar 15, 2007 at 8:52 AM | Permalink

    http://www.google.no/search?q=climateaudit+curry still brings up lots of results from climateaudit

  17. Steve McIntyre
    Posted Mar 15, 2007 at 8:55 AM | Permalink

    #16. Richard, when I run it, I get lots of results, but none that link here. I only get links to other sites.
    Are you getting direct links to CA or only indirect references e.g. Eli Rabett, Truth or Truthiness, ..

  18. Larry Huldén
    Posted Mar 15, 2007 at 9:01 AM | Permalink

    Real Climate wrote:

    24 Oct 2006
    New Google search function
    Filed under:

    * Climate Science
    * RC Forum

    – group @ 5:05 am

    It can be easy to find climate science information on the web, but that information ranges from the excellent to the atrocious – and it can often be hard to tell them apart without some prior expertise. Wouldn’t it be great if someone could vet the information beforehand so that you had some confidence that it wasn’t completely bogus? Well, you need wait no longer!

    Some of you may have already noticed that we have updated our search facility to use a new service from Google Co-op which is being launched today. The idea is that the search is restricted to domains and pages that have passed some kind of quality control. RealClimate is one of the demo sites of the new technology and we have started off with a selection of sites (IPCC, government labs, research institutes etc. – as well as RealClimate itself of course!) that we know provide quality information about climate science. As we get used to this service, we will be adding sites and pages that we feel are up to the mark. Suggestions for sites that we might not yet have found or have overlooked, will of course be welcome.

    Eventually, we hope to have a service that could be an essential resource for the interested public, journalists, and possibly even scientists, that would give a higher quality level of information than is possible now. Let us know if this ends up being useful to you and if you have any suggestions for improving the service.

  19. rhneumannn
    Posted Mar 15, 2007 at 9:07 AM | Permalink

    Hi Steve

    In regard to Google and site indexing.

    You might want to check out Google Co-op.

  20. Dave Dardinger
    Posted Mar 15, 2007 at 9:10 AM | Permalink

    Ok, here’s something weird. I clicked my browser which I’d left sitting on Climate Audit a half hour or so ago as I often do and before clicking Home to refresh things I sat pondering for a minute or two and my eyes focused on the Google box. I started thinking that perhaps I should start doing my google searches from the CA box rather than from my usual bookmark. Well then I clicked on Home and what to my wondering eyes should appear but “Google and Climateaudit”! Talk about a doubletake. I wondered where you’d gotten the mind-reading add-in from?

  21. Posted Mar 15, 2007 at 9:11 AM | Permalink

    Dear Steve,
    you should try to identify the status of all your pages and indexes and robots and pagerank and availability by Google Services for Webmasters.

    https://www.google.com/webmasters/tools/siteoverview

    You probably need a Gmail account, a very fast registration for the services, and inclusion of your website to the list. It tells you a lot.

    Your PageRank etc. seems nonzero, see other services at

    http://www.iwebtool.com/

    Best wishes
    Lubos

  22. Steve McIntyre
    Posted Mar 15, 2007 at 9:12 AM | Permalink

    #18. Larry, good point. I googled “google co-op realclimate” and realclimate featured in the Google press release announcing specialized search functions.

    http://www.google.com/intl/en/press/annc/custom_search.html contains the following statement:

    “RealClimate.org is a site that tries to give credible expert opinion on the science of climate change. Unfortunately, since this topical subject has become rather politicized, the quality of information available on the web is very variable, ranging from the excellent to the atrocious. With the Custom Google Search facility, we are able to create a searchable subset of the web that in our expert judgment provides solid and reliable information. Hopefully, it will allow users to get to the good stuff faster, without some of the confusion that currently occurs.” – Gavin Schmidt, RealClimate

    http://www.google.com/coop/docs/cse/cse_file.html shows script for the realclimate search function. So there is a specific connection between realclimate and google and it’s not impossible that they might have implemented a customized search. We’ll see what happens with robot unblocking.

  23. Posted Mar 15, 2007 at 9:15 AM | Permalink

    If you ban bots, do it selectively. Be very careful about banning the google-bot. You could set things to permit it to crawl individual pages but not archives. That sort of banning is actually a good thing because people want to find the specific blog post, not a month’s worth of archives.

  24. Posted Mar 15, 2007 at 9:18 AM | Permalink

    Steve;

    I use google alerts for “global warming,” and “cosmic rays.” I never understood why I get links to real climate but none for climate audit from my alerts. This was the case before and after the problems. I cannot recall ever getting a link to climate audit on a google alert. I was going to query Google, but did not get to it.

  25. Steve Sadlov
    Posted Mar 15, 2007 at 9:33 AM | Permalink

    A relative of mine has worked in the following cottage industry, namely, doing things to clients’ web sites to make them show up in the top 5 or 10 in Google search results. There is a whole black art to achieving that, which I don’t pretend to understand the first thing about. Some of the things the aforementioned cottage industry does are very, very subtle and most of us would never think of them. Has to do with the way the HTML, XML and .php work, from what I understand (which is nearly nil …) 😉

  26. richardT
    Posted Mar 15, 2007 at 9:39 AM | Permalink

    #17
    I’m getting two direct links to climateaudit.org for most of the search terms you give above. None of the results I get link to recent pages.
    It may be that I’m searching on a google server in Europe that hasn’t been updated to the latest catalogue yet.

  27. Stan Palmer
    Posted Mar 15, 2007 at 10:08 AM | Permalink

    Try the Google search

    site:www.climateaudit.org

    The “site:” operator restricts the search to a specific site. So with no other search terms as restrictions, all indexed pages are brought up. There seem to be quite a few.

    Restricting the search with Curry, as in “site:www.climateaudit.org curry”, brings up postings.

    So it looks to me as if the issue is not with the indexing but with the relevance that Google is assigning to the results. It may be that results from sites that ban bots are marked lower but there may be other less sanguine reasons.

  28. Stan Palmer
    Posted Mar 15, 2007 at 10:10 AM | Permalink

    http://www.google.com/search?hl=en&safe=off&q=site%3Awww.climateaudit.org++curry&btnG=Search

    The above is the Google search URL for the “site:www.climateaudit.org curry” search

  29. Posted Mar 15, 2007 at 10:12 AM | Permalink

    Steve, Google updates their tables very dynamically because things disappear from the web so often; I’m actually surprised that it took a month for your google hits to drop.

  30. Mike Coffin
    Posted Mar 15, 2007 at 10:18 AM | Permalink

    Regarding #14, you might want to read this document from Google. Google does not interpret a robots.txt file as merely “don’t crawl”. It also means “remove blocked content from the index ASAP.” Any content that you block will be actively removed from the index the next time Google attempts to crawl.

  31. Steve McIntyre
    Posted Mar 15, 2007 at 10:18 AM | Permalink

    #28 Stan, if you look at that list – everything is marked as a Supplemental Result. Also some of the links do not even mention “curry”.

  32. Steve McIntyre
    Posted Mar 15, 2007 at 10:20 AM | Permalink

    I’ve signed on to the google webmaster service and it reports that google was blocked by our robots.txt, which was changed about a month ago and has now been changed back. So I guess we’ll have to see what happens with the robots.txt restoration before we jump to conclusions that google has adopted Gavin’s search function.

  33. Steve Sadlov
    Posted Mar 15, 2007 at 10:57 AM | Permalink

    RE: #31 – Steve M – I think you’ve discovered one of the subtleties I was alluding to. There are all sorts of ways to tweak your site to be in Google’s top search outputs. Again, it’s way over my head, but I do know the techniques certainly exist and are exploited extensively by media outfits and web retailers.

  34. Posted Mar 15, 2007 at 11:15 AM | Permalink

    Dear Steve and others,

    I would tend to discourage you from conspiracy theories. Things sometimes jump at Google – it’s a tax for other huge advantages of this search engine.

    During the years, many servers and pages disappeared and reappeared, including some of the alarmist climate blogs. 😉 I guess that Co-Op is completely free of any skeptics, supporting 100% alarmist sources and fulfilling Gavin’s dreams completely, but on the other hand, I guess that Co-op is a joke anyway.

    It’s hypothetically directed at people who admit to themselves that they’re not capable of choosing trustworthy sources themselves and who want to be led by someone else and controlled by censorship. My guess is that no one I know – regardless of scientific or political opinions – would deliberately include herself or himself in this category, which is why I am very skeptical about the viability of the Co-op concept.

    Best wishes
    Lubos

  35. Steve McIntyre
    Posted Mar 15, 2007 at 11:31 AM | Permalink

    #34. Lubos, as I noted in #32, because we changed robots.txt, I’m not jumping to any conclusions.

  36. Dave Dardinger
    Posted Mar 15, 2007 at 11:44 AM | Permalink

    A message I just posted on another site reminded me of the question I had about why there were so few trolls around here lately. It may have been a pleasant side-effect of banning the bots. Perhaps trolls rely on google searches and the like to find new discussions they can stick their noses into. It would also explain why so often they show up quickly but still don’t seem to have learned anything from what discussion has gone on. Perhaps they’ve at best just skimmed the earlier messages before replying to the one which drew them.

  37. Bochko
    Posted Mar 15, 2007 at 3:05 PM | Permalink

    Google does allow political influence in its search mechanism, for whatever reason. Better to learn how to work the system rather than to dwell on the biases that are built in. No need to call it a conspiracy. Think of it instead as the way things are.

  38. Hans Erren
    Posted Mar 15, 2007 at 3:15 PM | Permalink

    keep track with googlefight
    http://googlefight.com/index.php?lang=en_GB&word1=climateaudit&word2=realclimate

  39. Henry
    Posted Mar 15, 2007 at 6:13 PM | Permalink

    Google shows climateaudit+curry to get 94 hits on the site

  40. JerryB
    Posted Mar 15, 2007 at 7:34 PM | Permalink

    Thanks Henry.

    Steve, and especially John A, forgive me for saying it, but
    I can’t believe that you blocked robots, and then were surprised
    that search engines responded in accord with your presumed
    preferences. Live and learn, as has been said once or twice.

  41. Posted Mar 15, 2007 at 8:10 PM | Permalink

    Re: 40. You know, mistakes happen, things get forgotten when someone is trying to keep a high traffic site going on a shoestring budget. Presumably, this will be fixed soon. An email to Google explaining the situation might accelerate the process. – Sinan

  42. johnmccall
    Posted Mar 15, 2007 at 8:18 PM | Permalink

    Wait until the weekend — a lot of search caching occurs Friday and Saturday.

  43. TAC
    Posted Mar 15, 2007 at 8:24 PM | Permalink

    Sinan (#41) makes an important point about

    trying to keep a high traffic site going on a shoestring budget

    I would remind everyone that the “CA Tip Jar” on top of the left-hand column is there for a reason.

  44. Steve McIntyre
    Posted Mar 15, 2007 at 8:42 PM | Permalink

    The problem that we faced was that the site kept crashing. I spent a lot of time re-booting as did John A. It was hard to say exactly what the problem was – it sometimes seemed like we were being attacked, but I guess the problem was just a big site. One of our readers suggested blocking robots through robots.txt and so I asked John A to do this, neither of us thinking at the time about Google. At the time, I was trying to avoid the dedicated server route, though that’s what’s been done. Live and learn. We’ll see how long it takes us to recover our google rankings.

    #39. Henry, none of the links go to pages here. They go to RSS feeds and are supplemental results. A while ago, one would have got page references to the site.

  45. John A
    Posted Mar 16, 2007 at 2:31 AM | Permalink

    I’ve added a WordPress Site Map Generator as well. The plug-in is Google XML Sitemaps.

    The sitemap in XML can be seen at http://www.climateaudit.org/sitemap.xml

  46. PeterW
    Posted Mar 16, 2007 at 4:54 AM | Permalink

    I’m sure it’s just a co-incidence. And in another amazing co-incidence, “The Great Global Warming Swindle” appears to have been removed from Google video.

  47. PeterW
    Posted Mar 16, 2007 at 4:59 AM | Permalink

    Ooops – sorry! They didn’t – they just moved it.

  48. Steve McIntyre
    Posted Mar 16, 2007 at 6:07 PM | Permalink

    Does anyone have any theories on why the google for “climateaudit” returns “le blog de s mcintyre” – why would this return in French?

  49. Posted Mar 16, 2007 at 7:28 PM | Permalink

    Re: 48

    Presumably Google’s cache has a record of some link to climateaudit with that title. It takes a while for the database to catch up with the crawled links. Funny things have been happening to my girlfriend’s web site too. *Sigh* I still think this was a combination of both the robots being disabled and some wholesale update in Google’s database.

    Sinan

  50. crosspatch
    Posted Mar 17, 2007 at 1:35 AM | Permalink

    “Does anyone have any theories on why the google for “climateaudit” returns “le blog de s mcintyre” – why would this return in French?”

    Because you are in Canada? Google returns different results depending on the geographical location of the requester. Because you are in Canada, Google might have determined that it can return a response in either French or English, and it is probably more politically correct to inconvenience an English speaker than to “offend” a French-speaking Canadian. If in doubt, send them French.

  51. Steve McIntyre
    Posted Mar 17, 2007 at 11:00 AM | Permalink

    On March 15, we fixed the robots.txt command. I signed on to Google (thanks Lubos for the directions) and it says that our robots.txt is fine. Google has a log of attempts to crawl pages (about 200 different pages were tried every day) and the log reports many read attempts for different pages between March 1 and March 14. On March 15 and after, there has not been a single attempt to crawl a single climateaudit page, even though our robots.txt is now fine. Does anyone have any bright ideas?

  52. JerryB
    Posted Mar 17, 2007 at 8:19 PM | Permalink

    Steve,

    It appears that Google has rediscovered climateaudit.org.

    YMMD

  53. JerryB
    Posted Mar 20, 2007 at 5:57 AM | Permalink

    It seems that Google is rebuilding some of its data for CA from
    scratch. A search for mcintyre jones mann on CA gets 498
    hits, but if I do not limit the search to CA, the CA hits
    are mostly way down the list.

  54. Steve McIntyre
    Posted Mar 20, 2007 at 8:28 AM | Permalink

    I’ve noticed that too. If you use climate rather than climateaudit as a limiter, CA still is up the lists.

  55. JerryB
    Posted Mar 20, 2007 at 9:22 AM | Permalink

    The way I specify the limiter is with the site: option, as in
    mcintyre jones mann site:climateaudit.org
    Using that method, site:climate would preclude CA. The site
    name needs to be spelled in full (excluding www.).

  56. Posted Mar 21, 2007 at 11:36 PM | Permalink

    I googled some well-known statistical methods:

    “variance adjustment” – 2nd hit CA
    “evolving multivariate regression” – 4th hit CA
    “robustly estimated median” – 4th hit CA
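
The site: searches described in #27, #28 and #55 all follow the same pattern, which can be sketched in a few lines of Python with the standard urllib.parse module. This is only an illustration under stated assumptions: the helper name site_search_url is hypothetical, and the base URL is taken from the search link in #28 with its extra parameters (hl, safe, btnG) left out.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base search URL as it appears in comment #28 (extra parameters omitted).
GOOGLE_SEARCH = "http://www.google.com/search"


def site_search_url(site: str, terms: str = "") -> str:
    """Build a search URL restricted to `site` with the site: operator.

    `site_search_url` is a hypothetical helper written for this sketch.
    With no extra terms, the query is just "site:<site>", which lists
    whatever pages of that site are in the index.
    """
    query = f"site:{site} {terms}".strip()
    return GOOGLE_SEARCH + "?" + urlencode({"q": query})


# Per #55, the site name is spelled in full, without the leading "www.".
url = site_search_url("climateaudit.org", "curry")
print(url)

# Decoding the URL again recovers the query exactly as typed in the search box.
print(parse_qs(urlparse(url).query)["q"][0])
```

Calling site_search_url("climateaudit.org") with no terms gives the bare site: query from #27, a quick way to see whether any pages are indexed at all before worrying about where they rank.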