On Sep 27, the UEA carried out a wildcard search of the entire CRUBACK3 server for the Wahl Attachments, reporting on Sep 28 that the search had been unsuccessful. (See here for most recent previous status report). They refused to provide some requested crosschecking information, e.g. whether the emails to which the Wahl Attachments were attached were themselves on the CRUBACK3 server and whether there were backups prior to August 2, 2009 (the earliest identified Briffa backup date) for other CRU employees, e.g. Jones and Osborn. I need to revert to the Information Tribunal today and have been working on this file over the weekend.
I have a quick question for the technically-inclined about backup protocols. I had asked UEA the following question:
4. You stated that the earliest backup of Briffa’s computer that the university located was on August 2, 2009. I must confess to being completely astonished at this information, particularly since the Climategate dossier included Briffa emails from 2006 that were said to have been deleted.
To provide reassurance on this point, can you explain whether this late date of earliest backup also applied to other CRU computers e.g. it is my understanding that CRUBACK3 contained backups of four of Phil Jones’ computers, with a total of 22 individual backups. Did any of these backups date prior to July 2009? What was this earliest date? If there were earlier backups for other computers, why was the earliest backup of Briffa’s computer so late? Is there perhaps another machine attributable to Briffa that needs to be searched?
UEA replied as follows:
It is right to say that the earliest backup that is held for Professor Briffa’s work PC is the 2 August 2009 backup. However, that is not to say that that backup does not store emails dating back to a period before 2 August 2009. It is merely to say that there are no earlier backups. UEA’s position is that the 2 August 2009 backup would have included copies of all emails and attachments stored on Professor Briffa’s PC as at 2 August 2009 and this could easily have included documents and emails dating back to 2005/2006. You should in any event note that the backup server had an automated function that operated so as to remove older backups on a rolling basis. It is possible that the hacker who obtained and disclosed the emails to which you refer had access to the server for a number of months and that he or she obtained the emails from a backup that is no longer on the server.
Obviously, their reply is unresponsive to my question, which asked for actual dates. But some aspects of the backup don’t make obvious sense to me. (They appear to have used BackupPC). Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”? Wouldn’t it be standard practice to periodically preserve some of the older backups?
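To make the question concrete, here is a minimal Python sketch of the kind of retention policy I have in mind: everything inside a rolling window is kept, and the oldest backup of each earlier month is preserved as well. This is purely illustrative. The function name, the 90-day window and the one-per-month rule are my own assumptions and do not describe BackupPC's actual behaviour or UEA's configuration.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, rolling_days=90):
    """Return the set of backup dates retained under a hypothetical policy.

    Assumed policy (not UEA's): keep every backup taken within the last
    `rolling_days` days, plus the oldest backup of each earlier month.
    """
    cutoff = today - timedelta(days=rolling_days)
    keep = set()
    oldest_in_month = {}
    for d in sorted(backup_dates):
        if d >= cutoff:
            # Inside the rolling window: always retained.
            keep.add(d)
        else:
            # Outside the window: retain only the first backup of its month.
            oldest_in_month.setdefault((d.year, d.month), d)
    keep.update(oldest_in_month.values())
    return keep


# Hypothetical backup dates, for illustration only.
example_dates = [
    date(2009, 1, 5),
    date(2009, 1, 20),
    date(2009, 2, 3),
    date(2009, 8, 2),
    date(2009, 9, 1),
]
kept = backups_to_keep(example_dates, today=date(2009, 9, 30))
```

Under a pure rolling policy with the same window, only the August and September backups in this made-up example would survive. With the monthly rule added, one January and one February backup are preserved as well.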
I note that the police report indicated that access to the CRU backup was not established until September 2009, so the presence of emails in the Climategate dossier that cannot be located on the CRUBACK3 server would require a different explanation than the one proffered here by the UEA.
One aspect of UEA’s computer setup that puzzled CA readers was whether it contained backups of the email server, or merely client machines. One CA reader phrased the issue as follows:
I think this backup server may contain backups of the mail server as well, not just the individual client machines, which would be a much more complete archive than someone’s personal folders. The most appropriate way to obtain the attachments for your 2011 FOI request would have been to retrieve them from backup or archival copies of the e-mail server as opposed to the PC of the recipient.
UEA has now stated that there was no backup of an “email server”, only of individual client machines:
You query why there has been no search of any ‘email server’. For your information, CRUBACK3 was a backup server that backed up information held on local workstations in CRU. CRU had briefly piloted its own email service in the early 1990s but ultimately preferred the centrally provided solution and so the email service was removed. The server that was used to pilot the email service remained in use in CRU providing other services and was decommissioned around 2000. The pilot CRU email server was never backed up to CRUBACK3. With the exception of the pilot period, email for CRU was delivered from the central UEA mail servers. Staff within CRU used the Eudora mail client to access their email and this was configured to copy emails and attachments from the central email server to the local workstation, once copied to the local workstation the emails were removed from the central email server. As has previously been explained, Eudora stores emails and attachments separately. It follows that searching for emails in the present case will not actually assist on the question of whether documents 5-8 are contained on the server.
Unless readers have any comments, this seems to resolve that issue.
A More Extended Search
The first UEA search was limited to “the subdirectory containing backups of attachments which have been stored on Professor Briffa’s PC”. This procedure was sharply criticized by many readers, who also proposed additional wildcards.
Readers also observed that it is not uncommon for someone to save an email attachment into a topical directory, sometimes changing the name. I suggested the following to UEA:
Other Climate Audit readers have observed that it is not uncommon for someone to save an email attachment into a topical directory, sometimes changing the name. A search limited to only the subdirectory containing email attachments would be insufficient to locate a document that had been moved to a topical subdirectory even if the name of the document were unchanged. If someone were actually trying to locate a document not found in an initial search, they would first search all of Briffa’s directories using contractions and, if this was unsuccessful, then use meta-data searches (timestamps of July 2006, author – Eugene Wahl) to look for specific documents in all locations in the backup of the Briffa computer. The additional wildcards “*ERW*” were suggested.
I had also noted two Wahl attachments referred to in the Climategate dossier (CG2-1464 “c:\eudora\attach\AR4SOR_BatchAB_Ch06-KRB-r-look.doc”; and CG2-30 “c:\eudora\attach\AR4SOR_BatchAB_Ch06-KRB-r-look1.doc”) and asked that they be included in the search.
The UEA reported on Sep 28 that, on Sep 27, they had carried out a search of the entire server, identifying 74,061,135 files. They then carried out case-insensitive wildcard searches using an expanded list of substrings (editorial, _editorial, aw_editorial, _erw, erw_ and others), visually examining results with ~100 or fewer returns. For example, one of the Wahl attachments was named CH06_SOD_Text_FINAL_2000_12jul06_ERW_suggestions.doc. They returned 8 documents with “CH06_SOD_Text_FINAL_2000_12jul06”, all of which were visually examined and said not to be the requested document.
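For readers who want to picture what a case-insensitive wildcard search of this kind involves, here is a minimal Python sketch. It is not UEA's procedure, which they have not described in detail. The function names are hypothetical, and it assumes the backup is available as an ordinary directory tree with readable file names.

```python
import fnmatch
import os


def match_names(names, substrings):
    """Map each substring to the names that contain it, ignoring case.

    Each substring is wrapped as the wildcard pattern *substring*.
    Substrings are assumed to contain no wildcard characters themselves.
    """
    hits = {s: [] for s in substrings}
    for name in names:
        lowered = name.lower()
        for s in substrings:
            if fnmatch.fnmatchcase(lowered, "*" + s.lower() + "*"):
                hits[s].append(name)
    return hits


def search_tree(root, substrings):
    """Apply the same match to every file name below `root`; return full paths."""
    hits = {s: [] for s in substrings}
    for dirpath, _dirnames, filenames in os.walk(root):
        for s, matched in match_names(filenames, substrings).items():
            hits[s].extend(os.path.join(dirpath, m) for m in matched)
    return hits


# File names for illustration; only the first is taken from the request.
sample = [
    "CH06_SOD_Text_FINAL_2000_12jul06_ERW_suggestions.doc",
    "AW_Editorial_Suggestions.doc",
    "notes.txt",
]
found = match_names(sample, ["_erw", "editorial"])
```

A search of this kind only looks at file names, so a renamed document would not be found unless the new name happened to contain one of the substrings.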
The requested Wahl documents were delivered on three dates in July/August 2006. UEA refused to look at potentially renamed documents on the specified dates for the following reason:
4.4 You have queried whether it may be that documents 5-8 are stored on the server under names other than the names identified in your original information request. The difficulty with this query is that, if in fact the names are different, UEA would not know where to begin in terms of searching for the documents. In effect, UEA would be looking for a needle within a vast information haystack with little or no indication as to the names it should be searching for. In the circumstances, we do not consider that UEA can reasonably be expected to search for documents 5-8 on the basis that they may be stored under different file names. In any event, you should note that:
4.4.1 timestamps are unlikely to be reliable as the act of copying a file would be sufficient to change the timestamp;
4.4.2 metadata such as author information is not held as part of the file system and in order to access such information UEA would need to decompress each file before undertaking the searches. This would be enormously and grossly disproportionately time consuming;
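As an illustration of the timestamp search that I had asked about, here is a minimal Python sketch that lists files whose modification time falls in a given window, regardless of name. It assumes the backup can be read as an ordinary directory tree and that modification times were preserved when the files were backed up. UEA's stated view is that timestamps are unlikely to be reliable, and whether they survive depends on how the files were copied. The function name and the window used in the example are my own assumptions.

```python
import os
from datetime import datetime


def files_modified_between(root, start, end):
    """Return sorted paths under `root` with modification time in [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # Skip files that cannot be read (e.g. broken links).
                continue
            if lo <= mtime < hi:
                hits.append(path)
    return sorted(hits)


# Hypothetical usage: candidates from July and August 2006.
# candidates = files_modified_between(
#     "/path/to/briffa_backup", datetime(2006, 7, 1), datetime(2006, 9, 1)
# )
```

The result would be a list of candidates for visual examination, in the same way as the wildcard results, rather than a definitive answer.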
There are other issues which I may report on.