I’ve noted from time to time that climateaudit.org ranked extremely high on many google searches. One of the ways to find articles here has been to simply use google. I often do it. Today when I googled “climateaudit curry”, I found no link to climateaudit.
I did other experiments with terms that I’ve documented here – “jacoby climate” “briffa climate” – no links to blog comments, although links to pdfs at the blog and a link to the RSS feed have survived down the page. A direct search of “mcintyre climateaudit” returns “le blog de s mcintyre”. Quixotic googles like “preisendorfer autocorrelation” return a couple of pdfs but not the site.
I was only able to find one google search that returned a climateaudit link – ironically “full true plain disclosure”, where we still rank first (no doubt temporarily) even at google.
Google’s policies on censorship state
Does Google censor search results?
It is Google’s policy not to censor search results. However, in response to local laws, regulations, or policies, we may do so. When we remove search results for these reasons, we display a notice on our search results pages.
Update: As noted below, we blocked robots about a month ago when we were trying various measures to keep the site from crashing and this may be the problem, although you’d think that there would still be search information from before then. We are unblocking the robots and we’ll see whether we get restored to google listings.
56 Comments
Time to write a nice letter to Mr Schmidt, methinks.
When I enter similar searches in google, climateaudit.org is the top line but with a little bit of French added to the end. This is what I get on google when searching for “climateaudit” => “climate audit le blog de s mcintyre”. On yahoo and msn the top link is simply “climateaudit”. I don’t suspect foul play in this case.
Aha! I thought something was odd when I was looking for info from your site. If you search for say
volcano site:climateaudit.org
you only get rss feeds as results. It’s the same for
heat island site:climateaudit.org
I ran McIntyre barabinsk through google and got sites that linked here but no returns to this site. weird.
running Mcintyre “hockey stick” and
http://www.uoguelph.ca/~rmckitri/research/MM-W05-background.pdf
comes up before climateaudit.org
http://www.climateaudit.org/pdf/mcintyre.grl.2005.pdf shows up on page 4 of the search
Are you banning robots?
Very strange indeed. Here are the first results from my search:
A very odd mix of results, considering the activity on the blog. I wonder if something’s changed with how Google crawls your site? While Google is probably not censoring your site, you never know…Google’s gone green… It might not be “official” policy, but the work of some sub-level Believer with an ax to grind. But I doubt it.
At Yahoo search, the top listing for “briffa climate” was climate2003.com http://www.climate2003.com/blog/briffa.mxd.htm . Doesn’t make a lot of sense, since climate2003 is inactive and CA gets vastly more traffic. No listing for climateaudit at Yahoo for “climate jacoby tree”.
OK, then someone explain to me why “climateaudit curry” can’t find this site (it finds Margo’s Truth or Truthiness.)
Hans,
Yes, we’re banning robots.
google uses robots for their indexes
QED
If you separate climateaudit into climate and audit, do the results improve?
ask.com seems to be working fine.
Hi John and Steve:
You really should not ban robots if you want the site to be indexed by Google. Also, look into getting a Google Webmaster account and submitting sitemaps.
Banning robots would not be effective if someone really wanted to bring the site to a crawl but Google won’t index a site its robots cannot crawl.
My 2 cents.
Sinan
Hans, you could be right. However, the robot blocking is very recent and was only done about a month ago when we were struggling to keep the site up. There would have been a lot of previous robot searches and I can’t see how they would be wiped out. It might be a combination of things, but I’d be surprised if it was just the robots. John A, let’s experiment with unblocking and see what happens.
OK, we’re unblocking the robots and we’ll see what happens.
http://www.google.no/search?q=climateaudit+curry still brings up lots of results from climateaudit
#16. Richard, when I run it, I get lots of results, but none that link here. I only get links to other sites.
Are you getting direct links to CA or only indirect references e.g. Eli Rabett, Truth or Truthiness, ..
Real Climate wrote:
24 Oct 2006
New Google search function
Filed under:
* Climate Science
* RC Forum
– group @ 5:05 am
It can be easy to find climate science information on the web, but that information ranges from the excellent to the atrocious – and it can often be hard to tell them apart without some prior expertise. Wouldn’t it be great if someone could vet the information beforehand so that you had some confidence that it wasn’t completely bogus? Well, you need wait no longer!
Some of you may have already noticed that we have updated our search facility to use a new service from Google Co-op which is being launched today. The idea is that the search is restricted to domains and pages that have passed some kind of quality control. RealClimate is one of the demo sites of the new technology and we have started off with a selection of sites (IPCC, government labs, research institutes etc. – as well as RealClimate itself of course!) that we know provide quality information about climate science. As we get used to this service, we will be adding sites and pages that we feel are up to the mark. Suggestions for sites that we might not yet have found or have overlooked, will of course be welcome.
Eventually, we hope to have a service that could be an essential resource for the interested public, journalists, and possibly even scientists, that would give a higher quality level of information than is possible now. Let us know if this ends up being useful to you and if you have any suggestions for improving the service.
Hi Steve
In regard to Google and site indexing.
You might want to check out Google Co-op.
Ok, here’s something weird. I clicked my browser which I’d left sitting on Climate Audit a half hour or so ago as I often do and before clicking Home to refresh things I sat pondering for a minute or two and my eyes focused on the Google box. I started thinking that perhaps I should start doing my google searches from the CA box rather than from my usual bookmark. Well then I clicked on Home and what to my wondering eyes should appear but “Google and Climaudit”! Talk about a doubletake. I wondered where you’d gotten the mind-reading add-in from?
Dear Steve,
you should try to identify the status of all your pages and indexes and robots and pagerank and availability by Google Services for Webmasters.
https://www.google.com/webmasters/tools/siteoverview
You probably need a Gmail account, a very fast registration for the services, and inclusion of your website to the list. It tells you a lot.
Your PageRank etc. seems nonzero, see other services at
http://www.iwebtool.com/
Best wishes
Lubos
#18. Larry, good point. I googled “google co-op realclimate” and realclimate featured in the Google press release announcing specialized search functions.
http://www.google.com/intl/en/press/annc/custom_search.html contains the following statement:
http://www.google.com/coop/docs/cse/cse_file.html shows script for the realclimate search function. So there is a specific connection between realclimate and google and it’s not impossible that they might have implemented a customized search. We’ll see what happens with robot unblocking.
If you ban bots, do it selectively. Be very careful about banning the google-bot. You could set things to permit it to crawl individual pages but not archives. That sort of banning is actually a good thing because people want to find the specific blog, not a month’s worth of archives.
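As a rough illustration of selective banning, here is a minimal Python sketch that checks a robots.txt against a couple of URLs using the standard library's `urllib.robotparser`. The rules and the paths (`/archives/`, `/feed/`, `/some-post/`) are assumptions made up for the example; they are not the site's actual layout or its actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of archive and feed listings,
# but leave individual posts crawlable. Paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
Disallow: /feed/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.climateaudit.org/some-post/",          # hypothetical post URL
        "http://www.climateaudit.org/archives/2007/03/",   # hypothetical archive URL
    ):
        print(url, allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that the standard-library parser does simple prefix matching on paths, so this only approximates how any particular search engine interprets the file.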
Steve;
I use google alerts for “global warming,” and “cosmic rays.” I never understood why I get links to real climate but none for climate audit from my alerts. This was the case before and after the problems. I cannot recall ever getting a link to climate audit on a google alert. I was going to query Google, but did not get to it.
A relative of mine has worked in the following cottage industry, namely, doing things to clients’ web sites to make them show up in top 5 or 10 in Google search results. There is a whole black art to achieving that, which I don’t pretend to understand the first thing about. Some of the things the aforementioned cottage industry do are very, very subtle and most of us would never think of them. Has to do with the way the HTML, XML and .php work, from what I understand (which is nearly nil …) 😉
#17
I’m getting at least two direct links to climateaudit.org for most of the search terms you give above. None of the results I get link to recent pages.
It may be that I’m searching on a google server in Europe that hasn’t been updated to the latest catalogue yet.
Try the Google search
site:www.climateaudit.org
The “site:” operator restricts the search to a specific URL. So with no other search terms as restrictions, all indexed pages are brought up. There seem to be quite a few.
Restricting the search with Curry as in “site:www.climateaudit.org curry” brings up postings
So it looks to me as if the issue is not with the indexing but with the relevance that Google is assigning to the results. It may be that results from sites that ban bots are marked lower but there may be other less sanguine reasons.
http://www.google.com/search?hl=en&safe=off&q=site%3Awww.climateaudit.org++curry&btnG=Search
The above is the Google search URL for the “site:www.climateaudit.org curry” search
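For anyone who wants to generate such site-restricted search URLs rather than type them, a minimal Python sketch follows. The helper name `site_search_url` is made up for this example, and it only builds the `q` parameter; the extra parameters in the URL quoted above (`hl`, `safe`, `btnG`) are left out.

```python
from urllib.parse import urlencode


def site_search_url(site: str, terms: str = "",
                    base: str = "http://www.google.com/search") -> str:
    """Build a search URL whose query is restricted to `site` via the site: operator."""
    # With no extra terms the query is just "site:<host>", which lists indexed pages.
    query = f"site:{site} {terms}".strip()
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(site_search_url("www.climateaudit.org", "curry"))
    print(site_search_url("www.climateaudit.org"))
```

Leaving `terms` empty gives the bare `site:` query, which is the quickest way to see roughly how many pages of a site are indexed.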
Steve, Google updates their tables very dynamically because things disappear from the web so often; I’m actually surprised that it took a month for your google hits to drop.
Regarding #14, you might want to read this document from Google. Google does not interpret a robots.txt file as merely “don’t crawl”. It also means “remove blocked content from the index ASAP.” Any content that you block will be actively removed from the index the next time Google attempts to crawl.
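To see what a blanket ban looks like from a crawler's side, here is a small Python sketch using the standard library's `urllib.robotparser`. Both robots.txt texts are assumed reconstructions, since the site's actual file is not shown in this thread, and the agent names are only examples. The sketch covers the crawl permission only; it says nothing about how quickly a search engine drops pages from its index.

```python
from urllib.robotparser import RobotFileParser

# Assumed form of a "ban all robots" file (not the site's actual file).
BLANKET_BAN = "User-agent: *\nDisallow: /\n"
# An empty Disallow value means nothing is disallowed.
UNBLOCKED = "User-agent: *\nDisallow:\n"


def blocked_agents(robots_txt, agents, url="http://www.climateaudit.org/"):
    """Return the agents from `agents` that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    crawlers = ["Googlebot", "Slurp"]  # example crawler names
    print("blanket ban blocks:", blocked_agents(BLANKET_BAN, crawlers))
    print("unblocked file blocks:", blocked_agents(UNBLOCKED, crawlers))
```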
#28 Stan, if you look at that list – everything is marked as a Supplemental Result. Also some of the links do not even mention “curry”.
I’ve signed onto the google webmaster and it reports being blocked by our robots.txt command changed about a month ago and now changed back. So I guess we’ll have to see what happens with the robots.txt restoration before we jump to conclusions that google has adopted Gavin’s search function.
RE: #31 – Steve M – I think you’ve discovered one of the subtleties I was alluding to. There are all sorts of ways to tweak your site to be in Google’s top search outputs. Again, it’s way over my head, but I do know the techniques certainly exist and are exploited extensively by media outfits and web retailers.
Dear Steve and others,
I would tend to discourage you from conspiracy theories. Things sometimes jump at Google – it’s a tax for other huge advantages of this search engine.
During the years, many servers and pages disappeared and reappeared, including some of the alarmist climate blogs. 😉 I guess that Co-Op is completely free of any skeptics, supporting 100% alarmist sources and fulfilling Gavin’s dreams completely, but on the other hand, I guess that Co-op is a joke anyway.
It’s hypothetically directed to the people who admit to themselves that they’re not capable to choose the trustworthy sources themselves and they want to be led by someone else and controlled by censorship. My guess is that no one I know – regardless of scientific or political opinions – would deliberately include herself or himself into this category, which is why I am very skeptical about the viability of the Co-op concept.
Best wishes
Lubos
#33. Lubos, as I noted in #31, because we changed robots.txt, I’m not jumping to any conclusions.
A message I just posted on another site reminded me of the question I had about why there were so few trolls around here lately. It may have been a pleasant side-effect of banning the bots. Perhaps trolls rely on google searches and the like to find new discussions they can stick their noses into. It would also explain why so often they show up quickly but still don’t seem to have learned anything from what discussion has gone on. Perhaps they’ve at best just skimmed the earlier messages before replying to the one which drew them.
Google does allow political influence in its search mechanism, for whatever reason. Better to learn how to work the system rather than to dwell on the biases that are built in. No need to call it a conspiracy. Think of it instead as the way things are.
keep track with googlefight
http://googlefight.com/index.php?lang=en_GB&word1=climateaudit&word2=realclimate
Google shows climateaudit+curry to get 94 hits on the site
Thanks Henry.
Steve, and especially John A, forgive me for saying it, but
I ‘can’t believe’ that you blocked robots, and then were surprised
that search engines responded in accord with your presumed
preferences. Live and learn, as has been said once or twice.
Re: 40. You know, mistakes happen, things get forgotten when someone is trying to keep a high traffic site going on shoestring budget. Presumably, this will be fixed soon. An email to Google explaining the situation might accelerate the process. — Sinan
Wait until the weekend — a lot of search caching occurs Friday and Saturday.
Sinan (#41) makes an important point about
I would remind everyone that the “CA Tip Jar” on top of the left-hand column is there for a reason.
The problem that we faced was that the site kept crashing. I spent a lot of time re-booting as did John A. It was hard to say exactly what the problem was – it sometimes seemed like we were being attacked, but I guess the problem was just a big site. One of our readers suggested blocking robots through robots.txt and so I asked John A to do this, neither of us thinking at the time about Google. At the time, I was trying to avoid the dedicated server route, though that’s what’s been done. Live and learn. We’ll see how long it takes us to recover our google rankings.
#39. Henry, none of the links go to pages here. They are to RSS feeds and are supplemental results. A while ago, one would have got page references to the site.
I’ve added a WordPress Site Map Generator as well. The plug-in is Google XML Sitemaps.
The sitemap in XML can be seen at http://www.climateaudit.org/sitemap.xml
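For readers curious what such a file contains, here is a minimal Python sketch that writes a sitemap in the sitemaps.org 0.9 format. It is not the plug-in's code or its exact output, and the post URL and date in the example are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_date) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. YYYY-MM-DD
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical entry; a real sitemap would list every post.
    print(build_sitemap([("http://www.climateaudit.org/some-post/", "2007-03-15")]))
```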
I’m sure it’s just a co-incidence. And in another amazing co-incidence, “The Great Global Warming Swindle” appears to have been removed from Google video.
Ooops – sorry! They didn’t – they just moved it.
Does anyone have any theories on why the google for “climateaudit” returns “le blog de s mcintyre” – why would this return in French?
Re: 48
Presumably Google’s cache has a record of some link to climateaudit with that title. It takes a while for the database to catch up with the crawled links. Funny things have been happening to my girlfriend’s web site too. *Sigh* I still think this was a combination of both the robots being disabled and some wholesale update in Google’s database.
Sinan
“Does anyone have any theories on why the google for “climateaudit” returns “le blog de s mcintyre” – why would this return in French?”
Because you are in Canada? Google returns different results depending on the geographical location of the requester. Because you are in Canada, Google might have determined that they can return a response in either French or English and it is probably more politically correct to inconvenience an English speaker than to “offend” a French-speaking Canadian. If in doubt, send them French.
On March 15, we fixed the robots.txt command. I signed on to Google (thanks Lubos for the directions) and it says that our robots.txt is fine. Google has a log of attempts to crawl pages (about 200 different pages were tried every day) and the log reports many read attempts for different pages between March 1 and March 14. On March 15 and after, there has not been a single attempt to crawl a single climateaudit page, even though our robots.txt is now fine. Does anyone have any bright ideas?
Steve,
It appears that Google has rediscovered climateaudit.org.
YMMD
It seems that Google is rebuilding some of its data for CA from
scratch. A search for mcintyre jones mann on CA gets 498
hits, but if I do not limit the search to CA, the CA hits
are mostly way down the list.
I’ve noticed that too. If you use climate rather than climateaudit as a limiter, CA still is up the lists.
The way I specify the limiter is with the site: option, as in
mcintyre jones mann site:climateaudit.org
Using that method, site:climate would preclude CA. The site
name needs to be spelled in full (excluding www.).
I googled some well-known statistical methods:
“variance adjustment” – 2nd hit CA
“evolving multivariate regression” – 4th hit CA
“robustly estimated median” – 4th hit CA