Update on the FOI for the Wahl Attachments

On Sep 27, the UEA carried out a wildcard search of the entire CRUBACK3 server for the Wahl Attachments, reporting on Sep 28 that the search had been unsuccessful. (See here for most recent previous status report.) They refused to provide some requested crosschecking information, e.g. whether the emails to which the Wahl Attachments were attached were themselves on the CRUBACK3 server and whether there were backups prior to August 2, 2009 (the earliest identified Briffa backup date) for other CRU employees, e.g. Jones, Osborn. I need to revert to the Information Tribunal today and have been working on this file over the weekend.

I have a quick question for the technically-inclined about backup protocols. I had asked UEA the following question:

4. You stated that the earliest backup of Briffa’s computer that the university located was on August 2, 2009. I must confess to being completely astonished at this information, particularly since the Climategate dossier included Briffa emails from 2006 that were said to have been deleted.

To provide reassurance on this point, can you explain whether this late date of earliest backup also applied to other CRU computers e.g. it is my understanding that CRUBACK3 contained backups of four of Phil Jones’ computers, with a total of 22 individual backups. Did any of these backups date prior to July 2009? What was this earliest date? If there were earlier backups for other computers, why was the earliest backup of Briffa’s computer so late? Is there perhaps another machine attributable to Briffa that needs to be searched?

UEA replied as follows:

It is right to say that the earliest backup that is held for Professor Briffa’s work PC is the 2 August 2009 backup. However, that is not to say that that backup does not store emails dating back to a period before 2 August 2009. It is merely to say that there are no earlier backups. UEA’s position is that the 2 August 2009 backup would have included copies of all emails and attachments stored on Professor Briffa’s PC as at 2 August 2009 and this could easily have included documents and emails dating back to 2005/2006. You should in any event note that the backup server had an automated function that operated so as to remove older backups on a rolling basis. It is possible that the hacker who obtained and disclosed the emails to which you refer had access to the server for a number of months and that he or she obtained the emails from a backup that is no longer on the server.

Obviously, their reply is unresponsive to my question for actual dates. But some aspects of the backup don’t make obvious sense to me. (They appear to have used BackupPC.) Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”? Wouldn’t it be standard practice to periodically preserve some of the older backups?

I note that the police report indicated that access to the CRU backup was not established until September 2009, so that the presence of emails in the Climategate dossier that cannot be located on the CRUBACK3 server would require a different explanation than the one proffered here by the UEA.

CRUBACK3
One aspect of UEA’s computer setup that puzzled CA readers was whether it contained backups of the email server, or merely client machines. One CA reader phrased the issue as follows:

I think this backup server may contain backups of the mail server as well, not just the individual client machines, which would be a much more complete archive than someone’s personal folders. The most appropriate way to obtain the attachments for your 2011 FOI request would have been to retrieve them from backup or archival copies of the e-mail server as opposed to the PC of the recipient.

UEA has now stated that there was no backup of an “email server”, only of individual client machines:

You query why there has been no search of any ‘email server’. For your information, CRUBACK3 was a backup server that backed up information held on local workstations in CRU. CRU had briefly piloted its own email service in the early 1990s but ultimately preferred the centrally provided solution and so the email service was removed. The server that was used to pilot the email service remained in use in CRU providing other services and was decommissioned around 2000. The pilot CRU email server was never backed up to CRUBACK3. With the exception of the pilot period, email for CRU was delivered from the central UEA mail servers. Staff within CRU used the Eudora mail client to access their email and this was configured to copy emails and attachments from the central email server to the local workstation, once copied to the local workstation the emails were removed from the central email server. As has previously been explained, Eudora stores emails and attachments separately. It follows that searching for emails in the present case will not actually assist on the question of whether documents 5-8 are contained on the server.

Unless readers have any comments, this seems to resolve that issue.

A More Extended Search
The first UEA search was limited to “the subdirectory containing backups of attachments which have been stored on Professor Briffa’s PC”. This procedure was sharply criticized by many readers, who, in addition, proposed additional wildcards.

Readers also observed that it is not uncommon for someone to save an email attachment into a topical directory, sometimes changing the name. I suggested the following to UEA:

Other Climate Audit readers have observed that it is not uncommon for someone to save an email attachment into a topical directory, sometimes changing the name. A search limited to only the subdirectory containing email attachments would be insufficient to locate a document that had been moved to a topical subdirectory even if the name of the document were unchanged. If someone were actually trying to locate a document not found in an initial search, they would first search all of Briffa’s directories using contractions and, if this was unsuccessful, then use meta-data searches (timestamps of July 2006, author – Eugene Wahl) to look for specific documents in all locations in the backup of the Briffa computer. The additional wildcards “*ERW*” were suggested.

I had also noted two Wahl attachments referred to in the Climategate dossier (CG2-1464 “c:\eudora\attach\AR4SOR_BatchAB_Ch06-KRB-r-look.doc”; and CG2-30 “c:\eudora\attach\AR4SOR_BatchAB_Ch06-KRB-r-look1.doc”) and asked that they be included in the search.

The UEA reported on Sep 28 that, on Sep 27, they had carried out a search of the entire server, identifying 74,061,135 files; that they then carried out case-insensitive wildcard searches using an expanded list of substrings: editorial, _editorial, aw_editorial, _erw, erw_ (and others), visually examining results with ~100 or fewer returns. For example, one of the Wahl attachments was named CH06_SOD_Text_FINAL_2000_12jul06_ERW_suggestions.doc. They returned 8 documents with “CH06_SOD_Text_FINAL_2000_12jul06”, all of which were visually examined and said not to be the requested document.
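
For readers unfamiliar with this kind of search, a case-insensitive wildcard match over a file listing can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `find_candidates` and the sample listing are hypothetical, and nothing here reflects the tooling UEA actually used.

```python
from fnmatch import fnmatchcase

def find_candidates(filenames, substrings):
    """Return names containing any substring, ignoring case (like *sub*)."""
    patterns = [f"*{s.lower()}*" for s in substrings]
    return [
        name for name in filenames
        if any(fnmatchcase(name.lower(), p) for p in patterns)
    ]

# Hypothetical listing; the first name is the Wahl attachment named in the post.
listing = [
    "CH06_SOD_Text_FINAL_2000_12jul06_ERW_suggestions.doc",
    "AR4SOR_BatchAB_Ch06-KRB-r-look.doc",
    "holiday_photo.jpg",
]
hits = find_candidates(listing, ["_erw", "erw_", "editorial"])
```

A search of this kind only sees file names, so it cannot find a document that was saved under an unrelated name.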

The requested Wahl documents were delivered on three dates in July/August 2006. UEA refused to look at potentially renamed documents on the specified dates for the following reason:

4.4 You have queried whether it may be that documents 5-8 are stored on the server under names other than the names identified in your original information request. The difficulty with this query is that, if in fact the names are different, UEA would not know where to begin in terms of searching for the documents. In effect, UEA would be looking for a needle within a vast information haystack with little or no indication as to the names it should be searching for. In the circumstances, we do not consider that UEA can reasonably be expected to search for documents 5-8 on the basis that they may be stored under different file names. In any event, you should note that:
4.4.1 timestamps are unlikely to be reliable as the act of copying a file would be sufficient to change the timestamp;
4.4.2 metadata such as author information is not held as part of the file system and in order to access such information UEA would need to decompress each file before undertaking the searches. This would be enormously and grossly disproportionately time consuming;
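
To make the timestamp point concrete, here is a minimal sketch of a date-window search. It assumes the backup has been restored to an ordinary directory tree, which is an assumption and not something UEA has said it did; `files_modified_between` is a hypothetical helper. It filters on modification time, which some copy tools preserve and others reset, so a hit is a hint rather than proof.

```python
import os
from datetime import datetime

def files_modified_between(root, start, end):
    """Yield paths under root whose modification time falls in [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # getmtime returns seconds since the epoch for the last modification.
            if lo <= os.path.getmtime(path) < hi:
                yield path
```

For the Wahl attachments, the window would be July and August 2006, possibly widened by a couple of weeks.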

There are other issues which I may report on.

61 Comments

  1. Anthony Watts
    Posted Oct 7, 2012 at 10:04 AM | Permalink | Reply

    Backups of operating system active drives typically use rolling backups…because why would you need a backup from 3 years prior if your intent is simply to recover the operational state of the machine?

    In my server room we keep current backups for operational recovery, but not old backups unless that old backup has some particular configuration of value, like only running on specific older hardware that we may have to revert to.

    For a mailserver, one labeled CRUBACK3, the question then becomes, what is the purpose of that server?

    1. Is it a server that acts as a failover for the main mail server?

    …or…

    2. Is it an archiving server?

    If the latter, then there would be absolutely no reason to use a rolling backup, and in fact it would be contrary to the archival mission. The fact that that same server had emails on it from 2006 suggests its mission was archival.

    Archival servers typically have removable storage, so that you can put years of data/correspondence on the shelf. The FOI request may be too narrow in stating that the specific server be searched. I would restate it to include removable storage, including media such as magnetic tape, DVDs, CD-ROMs, removable hard drives, and network-attached storage drives that were used on CRUBACK3.

    You might also ask what happened to CRUBACK1 and CRUBACK2 servers.

  2. mt
    Posted Oct 7, 2012 at 10:39 AM | Permalink | Reply

    Removing older backups on a rolling basis is standard practice; it’s the retention policy on the backups. That’s going to define how far back files can be recovered “easily”. Having long-term storage of backup sets isn’t standard; that’s going to depend on the value of the data. Additionally, BackupPC seems to be primarily for backing up to a server’s disk. So preserving a full backup set would be accomplished by backing up the backup server. So you may want to ask if there are tapes or other removable media of the system.

    I’d guess CRUBACK3 is a local backup server for the CRU group only. So, for looking for comprehensive email backups, you should ask if there’s a backup or other archiving system for the central mail server. UEA’s current policy suggests there’s no comprehensive email backup.

    Steve: CRU did not participate in the UEA central email server other than jitneying. The CRU backup seems to be all that there is.

    • P. Solar
      Posted Oct 7, 2012 at 3:30 PM | Permalink | Reply

      Steve: “CRU did not participate in the UEA central email server other than jitneying. The CRU backup seems to be all that there is.”

      Could you clarify? Are you saying that CRU had their own domain name, separate from UEA.ac.uk or whatever?

      There seems to be a lot of, probably deliberate, fudge in all these questions of email backups. Whether the final recipient reads his email via webmail or an email client on his PC seems beside the point. That is just him making a local copy or reading an html page served up with a copy of the email.

      The obligation of archiving must lie with domain admin, not with the end user.

      For example, say I have a domain name. Web hosting and email services are purchased from a hosting company. If the authorities decide I’m a villain and they need to check all emails I have received and sent in the last two years, they will not come knocking on my door and sniff around in my hard disk in the hope that I have not deleted any incriminating evidence before they called.

      They will go, quietly, to my hosting provider and ask for a copy of all my email correspondence for the last two years, copies which they are bound by UK law to retain.

      That is because they run the POP3 server not me.

      Now, the limited searches that they are “assuming” would find anything if it existed smell just a little like trying to look like they are doing a search, whilst carefully crafting it so as to avoid finding what is there to be found. As in Don’s case.

      However, if there really is nothing on CRUBACK3, the definitive place to look is UEA computer services backups, not the backups of the client PCs.

      i.e. ask who is running the email services for the domain name in question and where the backups for those servers are.

      Most email client software (e.g. Thunderbird) has an option like “delete email from server when read”. However, in reality this can only be implemented as “flag email as deleted when read” since LEGALLY the domain host is required by UK law to retain a copy.

      For police needs, this is only required for two years. However, in the case of FOIA there is no two year limit.

  3. Jeff Alberts
    Posted Oct 7, 2012 at 10:41 AM | Permalink | Reply

    At this point we’re simply asking the fox if he has any hens in his den. A third party needs to be doing this.

    Steve: Duh. But the ball has to be played where it lies.

    • Jeff Alberts
      Posted Oct 7, 2012 at 10:51 AM | Permalink | Reply

      Steve: Duh. But the ball has to be played where it lies.

      Not if the ref can place it in a more favorable position, due to unfair play.

  4. Posted Oct 7, 2012 at 10:41 AM | Permalink | Reply

    UEA: It is possible that the hacker who obtained and disclosed the emails to which you refer had access to the server for a number of months and that he or she obtained the emails from a backup that is no longer on the server.

    Isn’t it intriguing that the “sophisticated” team
    Of hackers that “conspired” across “countries” now just seem
    To be reduced to just one person, and that “he or she”
    Might just have worked at CRU? It’s no surprise to me.

    ===|==============/ Keith DeHavelle

    • Posted Oct 7, 2012 at 12:02 PM | Permalink | Reply

      Nor me. Team projection meet individual initiative and integrity. We always doubted you’d understand.

  5. Posted Oct 7, 2012 at 10:55 AM | Permalink | Reply

    WRT 4.4.2, it may be that the people undertaking the task are uncertain how to do a search in compressed files efficiently, but that’s a very different matter to it being “enormously and grossly disproportionately time consuming”.

    Take for instance zgrep:

    http://linux.about.com/library/cmd/blcmdl1_zgrep.htm

    or register a copy of wingrep:

    http://www.wingrep.com/features.htm

    or, well, a load of other trivially scripted solutions.
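
    A minimal Python sketch of the same idea, assuming for illustration that the files are ordinary gzip streams (BackupPC's own compressed pool format may differ, in which case its own tools would be needed to read them). `gz_contains` is a hypothetical helper name.

```python
import gzip

def gz_contains(path, needle, chunk_size=1 << 20):
    """Return True if the decompressed contents of a gzip file contain needle (bytes)."""
    overlap = len(needle) - 1
    tail = b""
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last bytes so a match spanning two chunks is not missed.
            tail = window[-overlap:] if overlap > 0 else b""
```

    The file is decompressed in memory a chunk at a time, so the cost is machine time and nothing needs to be unpacked to disk first.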

  6. Don Keiller
    Posted Oct 7, 2012 at 11:49 AM | Permalink | Reply

    Steve, as you know I asked for an email with a defined date-
    15 January 2009.
    UEA approached an “Independent” Firm to perform a “wildcard search” of the backup server, using a parameter I felt to be inappropriate. I complained to the Information Commissioner’s Office (ICO) about this who, for reasons I am yet to fathom, supported UEA’s stance.
    The search was conducted, not surprisingly, without success.
    I complained to the ICO again stating that the search was flawed.
    Again the ICO came down on the side of UEA and that the search had been completed to the ICO’s satisfaction.
    It was only when I stated that I would be seeking a judicial review of the process that a final search, with more appropriate parameters, was made.
    This time the email was found. Not only the email I asked for but a chain of correspondence going back to 2008.

    I would bet my shirt that the Wahl attachment is there; it is just that UEA are up to their usual obstructive tricks.

    And why not? They usually get away with it.

    • Steve McIntyre
      Posted Oct 7, 2012 at 12:43 PM | Permalink | Reply

      For the benefit of CA readers, the denouement of Don’s FOI request occurred on October 1, 2012. UEA had refused to provide the covering email from Phil Jones to Georgia Tech researchers Jun Jian and Peter Webster. They made absurd and outrageous estimates of the cost; they then carried out a search using an inappropriate wildcard. The ICO supported them. Don re-iterated demands, eventually threatening judicial review. There must be five letters from UEA solicitors and the ICO all guaranteeing that the search was fine.

      Grudgingly, UEA agreed to carry out a search using the wildcard requested by Don. Needless to say, the email had been there all along.

      One curiosity: the search was to be carried out on the Jones backup closest in date after January 19, 2009. Why would there be a Jones backup near this date and not a Briffa backup?

      • Don Keiller
        Posted Oct 7, 2012 at 2:21 PM | Permalink | Reply

        A curiosity indeed!

      • mt
        Posted Oct 8, 2012 at 7:41 AM | Permalink | Reply

        It looks like BackupPC allows for per-host retention policies, so it’s possible that different machines have more or less backup than others. However, “closest in date after January 19, 2009” can still be Aug 2, 2009.
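
        To illustrate that last point with a small sketch: given a sorted list of retained backup dates for a host, the backup “closest in date after” a target is simply the first one on or after it, however far away that is. The list below is hypothetical; only 2 August 2009 comes from this thread.

```python
from bisect import bisect_left
from datetime import date

def first_backup_on_or_after(backup_dates, target):
    """Return the earliest backup date >= target, or None if there is none."""
    ordered = sorted(backup_dates)
    i = bisect_left(ordered, target)
    return ordered[i] if i < len(ordered) else None

# Hypothetical retained backups for one host; only 2 Aug 2009 comes from the thread.
retained = [date(2009, 8, 2), date(2009, 9, 6), date(2009, 10, 4)]
chosen = first_backup_on_or_after(retained, date(2009, 1, 19))
```

        With this made-up list, `chosen` is 2 August 2009, more than six months after the target date.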

  7. David Holland
    Posted Oct 7, 2012 at 12:25 PM | Permalink | Reply

    CG2 2094.txt – [my bold]

    cc: “Mcgarvie Michael Mr \(ACAD\)”
    date: Mon, 14 Jul 2008 10:25:17 +0100
    from: Keith Briffa
    subject: Re: FW: Freedom of Information request (FOI_08-23) – Appeal
    to: Tim Osborn ,Phil Jones , “Palmer Dave Mr \(LIB\)”

    Dave
    and others

    we discussed these points here last week , so not surprisingly we have a general consensus here – to be clear :
    The first two arguements (and hence the letter as it stands) adequately represent my opinion and I am happy for this to stand as our response.

    Like Phil and Tim, I would be loathe to see UEA arguing that we (Tim, Phil,myself and other IPCC contributors) were acting in a personal capacity. Indeed, if this ever comes to the courts, I would hope UEA would support us with legal representation. While I believe UEA should not be in any way responsible for our academic opinions, it should take responsibility for our right to academic freedom. This is why I am arguing that we (UEA and authors) should not release our emails – regardless of whether they are held at UEA, in principal or in substance. Incidentally, UEA does not hold the very vast majority of mine anyway which I copied onto private storage after the completion of the IPCC task.

    To reiterate Tim’s remarks , UEA and ENV does have a great deal of interest in our work for the IPCC , but I consider even though it does not have a direct interest in the detailed correspondence necessitated by this work, it would be unwise to follow this line of argument. At the least it would lead us open to accusations of hypocrisy.

    Thanks again to all for your continuing efforts. I now hear that John Mitchell is faced with questions to Holland’s MP, so no doubt more to come!

    cheers
    Keith

    • Steve McIntyre
      Posted Oct 7, 2012 at 1:11 PM | Permalink | Reply

      An undiscussed comment from that period. The Met Office and UEA sent one another information on FOI requests. Jones observed:

      If we both respond in this way, CA will claim we have colluded!

      The claim would, of course, have been correct.

  8. Don Keiller
    Posted Oct 7, 2012 at 12:25 PM | Permalink | Reply

    Steve, as a result of a further FOI request that I made for correspondence between UEA and the Independent Search Contractor regarding the email I requested, some interesting stuff came up.

    Not least that researchers at CRU sometimes had multiple machines which were backed up to the Backup3 server. Thus UEA stated:
    “We are aware from your preliminary work that there are back-ups of around 60 computers on CRUBACK3. Those which should be searched are machines which we can reasonably anticipate were used by Professor Phil Jones. As you know, the identifier “pdj” was used to identify any machines owned by Phil Jones”

    The number of backups per machine is indicated below:
    m-crupdj 3 backups
    m-crupdj2 15 backups
    m-crupdj4 2 backups
    m-crupdj5 2 backups

    Hope this helps,Steve.
    Don

    • Steve McIntyre
      Posted Oct 7, 2012 at 1:17 PM | Permalink | Reply

      Don, I remember that there had been multiple backups of the Phil Jones computers. However, I take it that there is no information on how far back they go.

      • Don Keiller
        Posted Oct 7, 2012 at 2:19 PM | Permalink | Reply

        Steve, on the basis of the dates of some of the emails released in Climategate 1 and 2, back to the late 1990s.

  9. P. Solar
    Posted Oct 7, 2012 at 12:39 PM | Permalink | Reply

    Backups are either a total snapshot or incremental.

    In the latter case all earlier backups and the baseline archive are needed to restore or find any particular item. Deleting ‘old’ backups is out of the question since the incrementals alone will not allow full recovery, only recovery of an arbitrary selection that was changed since a particular date.

    In terms of an email backup the contents would presumably never be changed (only newer files added), so all must be kept.

    If full snapshots are taken, any files that are deleted either accidentally or by design will only be present in the ‘older’ backups. So if the purpose of the backup is to fulfil legal requirements or archiving for FOIA, for example, again all copies MUST by law be retained.

    So it would seem that either way if they have not kept all ‘older’ backups they have deleted (apparently intentionally) information that they had a legal requirement NOT to delete.

    BTW watch out for the old three month trick ;) Argue the toss for three months, then say it’s too late for the ICO to do anything about it even if there was a criminal offence.

    Steve: I’d prefer remarks to stay technical, without drifting into editorial comments about legal obligations. Presume that I’m familiar with that.

    • Steve McIntyre
      Posted Oct 7, 2012 at 1:16 PM | Permalink | Reply

      If “Deleting ‘old’ backups is out of the question”, then how do rolling systems work?

      • Posted Oct 7, 2012 at 5:58 PM | Permalink | Reply

        It depends how you define rolling systems, and what you consider the primary purpose of your backup regime is.

        If the purpose is to restore a snapshot of the systems as they were at an arbitrary point in time, then you’d be right to observe that deleting old backups isn’t congruent with that aim; every backup removed, deleted or overwritten is a loss of state as at a certain point in time.

        If your purpose is to enable your system to be restored to the latest version with the minimum of delay and maximising available storage, then regularly deleting backups is a legitimate strategy.

        Look at the steps involved in a grandfather/ father / son strategy for instance:

        http://en.wikipedia.org/wiki/Backup_rotation_scheme

        or Tower of Hanoi rotation:

        http://www.computer-repair.com/Backup.htm

        both rely on regular deletion of existing backups, sacrificing the number of intermediate states for the ability to allow the most up-to-date possible state to be restored with relatively restricted resources.

        In the best run organisations, backups are not deleted, but are archived on some slower / cheaper / more capacious medium to allow an arbitrary number of snapshots to be restored in dire circumstances.

        But the fact that someone *doesn’t* archive intermediate states rather than deleting them doesn’t necessarily indicate mischief or incompetence; it’s a series of legitimate trade-offs.
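
        A rough sketch of how a grandfather/father/son style retention rule ends up deleting older backups. The policy numbers and the `keep_set` helper are hypothetical, chosen only for illustration, and are not BackupPC's actual configuration or algorithm.

```python
from datetime import date, timedelta

def keep_set(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the dates a simple GFS-style policy would retain."""
    ordered = sorted(d for d in backup_dates if d <= today)
    keep = set(ordered[-daily:])  # "sons": the most recent daily backups
    # "fathers": the latest backup in each of the most recent ISO weeks
    by_week = {}
    for d in ordered:
        by_week[tuple(d.isocalendar())[:2]] = d
    keep.update(sorted(by_week.values())[-weekly:])
    # "grandfathers": the latest backup in each of the most recent months
    by_month = {}
    for d in ordered:
        by_month[(d.year, d.month)] = d
    keep.update(sorted(by_month.values())[-monthly:])
    return keep

# One year of hypothetical daily backups ending on an arbitrary date.
today = date(2009, 11, 17)
dailies = [today - timedelta(days=n) for n in range(365)]
kept = keep_set(dailies, today)
pruned = sorted(set(dailies) - kept)
```

        With these made-up settings, a year of daily backups shrinks to about a week of daily detail, a few week-end copies and six month-end copies; everything older is pruned.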

  10. nearwalden
    Posted Oct 7, 2012 at 12:47 PM | Permalink | Reply

    “…ultimately preferred the centrally provided solution and so the email service was removed.”

    A centrally provided email solution is very likely to have been backed up due to its “central service” nature, especially if it is supporting IMAP in addition to POP protocols. Where does this service live?

    • Steve McIntyre
      Posted Oct 7, 2012 at 1:13 PM | Permalink | Reply

      They stated:

      You query why there has been no search of any ‘email server’. For your information, CRUBACK3 was a backup server that backed up information held on local workstations in CRU. CRU had briefly piloted its own email service in the early 1990s but ultimately preferred the centrally provided solution and so the email service was removed. The server that was used to pilot the email service remained in use in CRU providing other services and was decommissioned around 2000. The pilot CRU email server was never backed up to CRUBACK3. With the exception of the pilot period, email for CRU was delivered from the central UEA mail servers. Staff within CRU used the Eudora mail client to access their email and this was configured to copy emails and attachments from the central email server to the local workstation, once copied to the local workstation the emails were removed from the central email server. As has previously been explained, Eudora stores emails and attachments separately. It follows that searching for emails in the present case will not actually assist on the question of whether documents 5-8 are contained on the server.

      • Tony Mach
        Posted Oct 8, 2012 at 2:15 AM | Permalink | Reply

        FOI the “central UEA mail servers” then?

  11. Steven Mosher
    Posted Oct 7, 2012 at 2:09 PM | Permalink | Reply

    they specify his WORK PC (see Gavin’s dodge)

    Steve Mc: which comment are you responding to?

    • Steven Mosher
      Posted Oct 7, 2012 at 6:58 PM | Permalink | Reply

      See this : “It is right to say that the earliest backup that is held for Professor Briffa’s work PC is the 2 August 2009 backup. ”

      Note the specificity in using the word “work”. Why, unless he has another PC? It’s a weird little detail that kinda sticks out in my mind. Why the qualifier “work”? Perhaps he has another PC that he considers “personal”.

      Now read Briffa’s mail

      “Like Phil and Tim, I would be loathe to see UEA arguing that we (Tim, Phil,myself and other IPCC contributors) were acting in a personal capacity. Indeed, if this ever comes to the courts, I would hope UEA would support us with legal representation. While I believe UEA should not be in any way responsible for our academic opinions, it should take responsibility for our right to academic freedom. This is why I am arguing that we (UEA and authors) should not release our emails – regardless of whether they are held at UEA, in principal or in substance. Incidentally, UEA does not hold the very vast majority of mine anyway which I copied onto private storage after the completion of the IPCC task.”

      And we know that one individual managed his own back ups.

      odd.

  12. Don Keiller
    Posted Oct 7, 2012 at 2:17 PM | Permalink | Reply

    Steve UEA stated in their letter to you:
    “You should in any event note that the backup server had an automated function that operated so as to remove older backups on a rolling basis.”

    Note that this was not an argument that they used in their Tribunal case against me. No mention. Nada.

    This is what the Tribunal said in Decision Notice No: FER0280033

    “We also considered that there was no persuasive evidence before us that gave any indication that the email in question had been deleted from the CRU’s back-up server prior to its being retained by the police. In particular we noted the complete lack of evidence about anything resembling a coherent deletion/retention policy for emails.”

    There was no “automated function that operated so as to remove older backups on a rolling basis”.

    This was also demonstrated in the otherwise useless Muir Russell Review where he found
    “a lack of understanding within University central functions of the presence of extensive, and long duration, backups of e-mail and other materials, despite these being on a server housed within the central Information Technology (IT) facilities”.

    UEA is trying to airbrush out and ignore past decisions and reports that went against them and which clearly demonstrate that the Backup server holds the relevant information.

    UEA are trying to get you to go through this whole pantomime, once again, from square one.

    These people have no shame.

  13. ThomasL
    Posted Oct 7, 2012 at 2:34 PM | Permalink | Reply

    Except for regulatory compliance reasons, it is unusual in my experience for any specific backup copy to be kept more than a couple of years.

    However, I will say, as backup systems get upgraded, many times the old backups get orphaned and hang around more or less forever.

    Imagine using tapes, then, say, 500GB external drives, and later 2TB drives.

    It wouldn’t be uncommon that those old tapes and 500GB drives were around there somewhere, even though they were not part of the current backup system.

    It might not hurt to ask what backup systems they have used over the relevant time period.

    • ThomasL
      Posted Oct 7, 2012 at 2:37 PM | Permalink | Reply

      PS, I am not saying it is /proper/ for the old backups to hang around, just that it is not uncommon in practice.

  14. Posted Oct 7, 2012 at 2:37 PM | Permalink | Reply

    Removing the oldest backups to make room for new ones is not the best practice, but it is common.

    On the other hand, although timestamps are indeed “unreliable”, they are usually a good hint: someone will read an email, and copy an attachment to a permanent place at around the same time he reads it. So searching for files dated with the same date as the email is quite a good start; to cast a wider net, one might examine files dated in the subsequent two weeks. And although decompressing files before searching them is indeed “time consuming”, when done right it is machine time that is at issue, not human time. Either the search program decompresses them itself, or the decompression can be done as a batch process.

    My guess is that the “compression” referred to is part of the backup process, not compression that was already in the files being backed up. Thus the way to do batch decompression would be to do a full restore of the backup into some temporary location; then searches could be done on that restored data. The full restore process might take a while, but could be left to run overnight.

    As for searching files by “author”, there’s a possible trap here, which is that what a search tool considers to be an “author” might not be the actual person who wrote the document. But I don’t know enough about Windows search / Microsoft Word to know what the risks of this are. In any case, a search engine would have to look inside the file, and to some extent understand its format, to tell who the authors are.

    If my above guess is correct, what they’re telling you is that they couldn’t be bothered to do a full restore and use a powerful search engine, so they only used the dinky search engine built into the backup program. Powerful search engines index the whole content, so that searches inside files are not only available but are nearly instantaneous. Windows 7 includes such a search engine, although I am not necessarily recommending it in particular.

    • ThomasL
      Posted Oct 7, 2012 at 2:43 PM | Permalink | Reply

      If it were me, on Windows my first step would be FileLocator Pro.

      • Steve McIntyre
        Posted Oct 7, 2012 at 3:03 PM | Permalink | Reply

        what’s relevant to me today is information on backup practices and, in particular, what gets saved and what gets deleted in the backup system as we presently understand it.

        • ThomasL
          Posted Oct 8, 2012 at 2:43 PM | Permalink

          Yeah, I get that. For that, look at my previous comment.

          This was just a reply to Norman’s comments on how a search might be conducted.

  15. Steve McIntyre
    Posted Oct 7, 2012 at 3:02 PM | Permalink | Reply

    Minutes of the IT interview with Muir Russell include the following:

    Backup server does not contain all of the data. Hard discs are used for storage. Hard disc may be at home and at work. So may well not have been backed up as part of the CRU back up regime.

    Configuration of back-up server was unfortunate as it did not remove deleted emails. Centrally, UEA emails are held for only a month and then deleted permanently. Not the case on the CRU backup server.

    If this evidence is correct and deleted emails were not removed, why would attachments get lost?

    • Eric Barnes
      Posted Oct 7, 2012 at 9:00 PM | Permalink | Reply

      I could imagine that they possibly were running low on disk space on the backup server. Deleting attachments is a good way to free a lot of disk space and still keep a record of the emails themselves.
      Since the attachments are stored separately, it would seem that an automated script or simple command could have easily deleted all attachments older than date x.
      The setup sounds somewhat like Gavin Schmidt’s setup.
      http://wattsupwiththat.com/2012/10/04/the-cyber-bonfire-of-gisss-vanities/

      Emails are only held centrally until read by the recipient, and then deleted. The backup of the central server gets the mail that hadn’t been read.

      Backups are only intended to back up mail that hadn’t been read (if there was a failure overnight, say).

      my $0.02

    • Eric Barnes
      Posted Oct 7, 2012 at 9:25 PM | Permalink | Reply

      Another reason is that the attachment may not exist as a separate file, but rather is embedded in another file using MIME (http://en.wikipedia.org/wiki/MIME). The file doesn’t exist, but that doesn’t mean the attachment doesn’t exist. It would be good to learn what software their email server is using or used, so that a more specific request could be made.
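
      To make the MIME point concrete, here is a minimal sketch using Python's standard email package. It assumes you already have the raw bytes of one message (for example, cut out of a mailbox file); the function name list_attachments is made up for the example. It reports attachments that exist only as encoded parts inside the message, with no separate file on disk.

```python
from email import message_from_bytes, policy


def list_attachments(raw_message: bytes):
    """Return (filename, content_type, size_in_bytes) for every
    attachment part embedded in a raw RFC 5322 / MIME message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        # decode=True undoes the base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        found.append((part.get_filename(), part.get_content_type(), len(payload)))
    return found
```

      A filename search of a file system would never see such an attachment; only a search that parses the mail file itself would.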

    • Bobl
      Posted Oct 8, 2012 at 6:58 AM | Permalink | Reply

      The backups are of client machines, like your own PC. Eudora used to automatically decode attachments (attachments are ASCII-encoded into e-mails because the transport is not 8-bit clean) and drop them into an attachment directory. If the attachments arrived and then were deleted out of the attachment directory between backups, then the backup server will never see them.

      This is why FOI of the central server might be more fruitful. The attachment at this point is still encoded into the e-mail. Many of the databases related to mail servers would implement deletion simply by setting an “I am deleted” flag in the database. This allows such data to be recovered. Infrastructure services like e-mail would also very likely have a daily backup regime, but for e-mail servers in particular the backups may not be retained very long.

      It may also be worth examining in more detail how Eudora handles attachments. It’s possible that, despite decoding the attachment, Eudora keeps the encoded version in the mail file; someone out there might know.

      Steve: their evidence is that the central server did not retain CRU emails. What would be accomplished by asking for a search of the central server?

  16. Kan
    Posted Oct 7, 2012 at 3:18 PM | Permalink | Reply

    To answer your question “Is this common practice: “the backup server had an automated function that operated so as to remove older backups on a rolling basis”.”

    Yes, this is common practice when the purpose of the backup is to be able to restore a computer to a particular state. If the purpose is record retention, then the deletion policy will match the record retention policy.

    However, the CRU IT did not have BackupPC configured to delete backups prior to Aug 2, 2009. That is why the emails of Phil Jones etc. are available today.

    Was there a search of Tom Melvin’s laptop backups?

    Steve: they say that they searched the entire server.

    • Kan
      Posted Oct 7, 2012 at 6:05 PM | Permalink | Reply

      The way the attachments can be removed from the backup server is if the file gets removed from the original PC, and a backup gets created afterwards. Then that version of the backup will not have the file in it.

      The email could be still there because it is in a file (the email inbox file) that is never deleted (and the email was never deleted).

      If the system is indeed configured to delete previous backups (defined by date, by backup sequence number, etc), then the file will be removed from the backup server when the defined retention period has passed.

      But again, the CRU CRUBACK3 BackupPC system was misconfigured and did not delete any backups. This statement contradicts other information:

      “You should in any event note that the backup server had an automated function that operated so as to remove older backups on a rolling basis.”

      I will look harder for the reference for this. It is in an email, or proposal from Qinetiq, that describes why the CRU emails/documents are available at all from the CRUBACK3 server.
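
      The mechanism described above can be put as a toy model (a sketch only; it does not reproduce BackupPC's actual storage or expiry logic, and the dates and filenames are invented). Each backup is a snapshot of what was on the PC at that moment, and a rolling retention rule keeps only the most recent few snapshots.

```python
def backups_containing(backups, filename, keep_last=None):
    """`backups` is a chronological list of (label, set_of_filenames).
    If `keep_last` is given, only the most recent `keep_last` backups
    are assumed to survive on the server (rolling deletion)."""
    if keep_last is None:
        retained = backups
    else:
        retained = backups[max(len(backups) - keep_last, 0):]
    return [label for label, files in retained if filename in files]


# Hypothetical history: the attachment is deleted from the PC after the
# first backup, while the inbox file is never deleted.
history = [
    ("2006-08-01", {"In.mbx", "attach/draft.doc"}),
    ("2008-01-15", {"In.mbx"}),
    ("2009-08-02", {"In.mbx"}),
]
```

      With no expiry, backups_containing(history, "attach/draft.doc") still finds the 2006 snapshot. With keep_last=2 the attachment is gone from the server, while "In.mbx" (and the email inside it) is still present in both retained snapshots.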

  17. kuhnkat
    Posted Oct 7, 2012 at 6:10 PM | Permalink | Reply

    “The pilot CRU email server was never backed up to CRUBACK3.”

    As Anthony mentions, CRUBACK1, CRUBACK2, CRUBACKx could have been used to back up the pilot CRU server. As it was only a pilot, it would be expected that the servers FEEDING it would be backed up in case it had configuration issues which caused the loss or misdirection of mail. Additionally, the Central Servers should have been backed up regularly anyway until the new configuration was verified.

    While Anthony does not see the purpose of having long term backups, there are actually LEGAL requirements for some organizations to ensure that they maintain certain records for lengthy periods. I have no idea if CRU would be under any of these restrictions.

    Some companies who are not worried about having “DISCOVERABLE” information in their records also keep historic information, as users should not be responsible for maintaining very old files on their local workstations. It is also inefficient to maintain large files on individual PCs rather than central servers.

    Unless you can’t afford the space or have something to hide, keeping long term backups is de rigueur. You simply never know what may happen or what you will wish you kept at some future date. It might even be something in YOUR FAVOR for a court case!!!!

    Anthony’s suggestion to query for other high density removable storage is very good. IT installations need offsite backup for disaster recovery and often keep archival storage for long periods on removable media.

  18. Duke C.
    Posted Oct 7, 2012 at 6:29 PM | Permalink | Reply

    Steve asked UEA “Is there perhaps another machine attributable to Briffa that needs to be searched?”

    Interestingly, UEA never answered the question. The way the reply is worded, in fact, it almost seems as if they are dodging the answer.

    Phil Jones had 4 host machines backed up.
    Tim Osborn was having trouble with BackupPC on his cruto4 (Windows) machine back in August 2009, according to CG2 email #0626, which would imply he had perhaps 4 or more host machines being backed up.

    Are we to believe that Briffa had only one work PC?

    • Steven Mosher
      Posted Oct 7, 2012 at 6:50 PM | Permalink | Reply

      If he travels to conferences you can assume that he has a laptop. The question is this: is that computer his private property? There are various bits and pieces that lead to the possibility that he may have had his own system and done his own backups.

      • dougieh
        Posted Oct 8, 2012 at 7:22 PM | Permalink | Reply

        O/T a bit, but as S. Mosher implies, it seems obvious to me from the various emails etc. that Briffa has a copy(?) of all the emails on his “own private laptop”, which is not for sharing, maybe for good reasons from his point of view.

  19. dixonstalbert
    Posted Oct 7, 2012 at 6:33 PM | Permalink | Reply

    (sorry if this has been asked before…)
    Instead of making an FOI request for a specific file, couldn’t you make an FOI request for the directory listing of every backup file, and the directory listing of every ‘zip’ or ‘tar’ inside those backups that they have?

    You could then do your own file name or date search and then request a copy of the file that you know they must have because it was in the directory data they gave you.
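
    Producing such a listing need not be much work on their side. The following is a sketch under the assumption that the backups have been restored to an ordinary directory tree; the function names are invented for the example. It lists every file and, for zip or tar archives, every member inside them, without extracting anything.

```python
import os
import tarfile
import zipfile


def list_archive_members(path):
    """Return (member_name, size) pairs for a zip or tar archive,
    or an empty list if `path` is neither."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return [(info.filename, info.file_size) for info in zf.infolist()]
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            return [(m.name, m.size) for m in tf.getmembers() if m.isfile()]
    return []


def directory_listing(root):
    """Yield (file_path, member) for every file under `root`;
    `member` is None for the file itself, or the name of a file
    stored inside it when the file is a zip or tar archive."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            yield path, None
            for member, _size in list_archive_members(path):
                yield path, member
```

    Writing the output of directory_listing to a text file would give a list of names only, with no file contents disclosed, which could then be searched by the requester.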

  20. Scott
    Posted Oct 8, 2012 at 12:29 AM | Permalink | Reply

    While I admire Steve’s persistence in wading through the detritus of backup and retention policies, at the end of the day it is a fool’s errand.

    It sounds to me that the email administrators have been pretty thorough in their attempts to get the info.

    As an email administrator myself who has responded to FOI and legal requests in the private sector, it’s not that easy to go back in time and find specific emails. If it’s not in the primary backup, the chances of finding the doc are remote.

    But the most likely cause of not finding email is not conspiracy, but more likely a stuff-up on the part of the backup software being used, or user interaction (the user getting the email and deleting the attachment on the same day, before the backup software runs). But without specialist forensic software expertise, it is difficult to get any traction on this.

    In my opinion, let it go and focus on more important things.

  21. clivere
    Posted Oct 8, 2012 at 6:02 AM | Permalink | Reply

    “They returned 8 documents with “CH06_SOD_Text_FINAL_2000_12jul06″, all of which were visually examined and said not to be the requested document.”

    Ask for the 8 documents!

  22. Not Sure
    Posted Oct 8, 2012 at 1:55 PM | Permalink | Reply

    I think I can answer how a backup from 2009 could contain emails from much earlier.

    Eudora uses a modified version of the venerable mbox format to store email. All the messages in a folder are stored in a single file. In plain vanilla mbox, the INBOX folder will be called “mbox”. Suppose you don’t clean your inbox very often. The file for your inbox could contain some very old messages. Suppose your PC gets backed up at that point. That file, with all your old and new messages, will be copied as it is at that point in time. Suppose that some time later you decide to do some cleaning of your inbox. The next backup of your PC will copy your inbox file as it is now, without the older messages.

    The mbox format is very well understood. There are many, many tools out there for dealing with this type of file. It should be trivial to digest these mbox files and index them for easy searching.

    Uncompressing and indexing 74,000,000 files doesn’t strike me as particularly onerous either. It could be that CRU does not have the expertise on staff to carry this out.
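
    To show how little is involved, here is a minimal sketch using the mailbox module from Python's standard library. It assumes the file is in plain mbox format; Eudora's modified variant may need converting first, and I have not tested that. The function name search_mbox is made up for the example.

```python
import mailbox


def search_mbox(path, needle):
    """Return (key, date, subject) for each message in the mbox file at
    `path` whose From/To/Cc/Subject headers contain `needle`
    (case-insensitive)."""
    needle = needle.lower()
    hits = []
    box = mailbox.mbox(path, create=False)
    try:
        for key in box.keys():
            msg = box[key]
            headers = " ".join(
                str(msg.get(name, "")) for name in ("From", "To", "Cc", "Subject")
            )
            if needle in headers.lower():
                hits.append((key, msg.get("Date"), msg.get("Subject")))
    finally:
        box.close()
    return hits
```

    Running something like this over the inbox file from each retained backup would show which old messages survive in which backup.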

  23. compguy77
    Posted Oct 8, 2012 at 2:34 PM | Permalink | Reply

    Steve,

    Let me try this set of statements, based on CRU’s reply:
    - The backup program they refer to only backs up directories on users’ PCs that are configured to be backed up, and only when the PC is running and on the network at the time the backup function runs.
      o Therefore, many days of backup could be missed, especially if someone was working from home for a while and was not ‘on the network’.
      o Emails and attachments deleted during this ‘missed’ period would not get backed up.
      o From what I’ve seen (reading between the lines), this scenario where messages may be missing would apply mostly to Briffa, and not so much to Jones.
      o It is not surprising that the backup software recycles and removes older backups. The target of the backup (likely a disk system) has limited capacity, and recycling was common then.
    - The email client removes messages from the central email server. So it is unlikely that any central email server, or any backup of that server, has a copy today.
    - The email client saves email and attachments separately. Since both the backup and the email client are configurable, it is entirely possible that the attachments were NOT backed up. For example, if the attachments were configured to be stored in C:\temp, then I doubt they would get backed up.
    - Bottom line, the messages that are in CRUBACK3 will vary quite a bit, both in terms of dates and in terms of whose mail is in there and whose is not.
    - When doing legal discovery, the first and most important step is to determine (and usually agree to) relevancy criteria, search terms and processes. They can be constructed to create any result desired.
    - I can’t tell from their reply, but they almost certainly would have to restore the backup file before searching (particularly during Sept 27-28), if any useful search is to be performed. That would be a good follow-up question.
    - Finding a particular attachment document reliably would require some knowledge of its contents. All the metadata (name, author, date, etc.) could have been modified, or might not mean what the searcher thinks.
    - This is especially true of author; that term is very ambiguous when discussing legal discovery. The normal term is ‘custodian’, or the person who owns the resource (the mail in this case). This concept is what they refer to in the phrase ‘backup that is held for Professor Briffa’s work PC’.
    - If this were a US legal discovery, first you would establish relevancy criteria (attachment in email between X and Y between these date ranges), then you would agree to search terms (e.g. containing X AND Y AND is attachment). Then some lawyer/legal aide would review each matching document and determine whether the document was relevant to the inquiry. There is not yet a fully automated alternative, although some Technology Assisted Review (TAR) tools are available and could be applied here.
    - CRU may or may not have legal obligations to archive email. It would be an interesting question to ask them.
    - If they have or had an obligation to archive, even indirectly (e.g. as a recipient of UK or US grants), and didn’t, that would be embarrassing, although not especially uncommon.
    - Based on their replies here and in other places, they have no formal archive system or process covering that period. They may now.

  24. eternaloptimist
    Posted Oct 8, 2012 at 5:50 PM | Permalink | Reply

    Steve,
    I don’t know if this has already been mentioned.
    If one of the players lost an email, and wanted it to be restored, they would contact IT support, give some details then wait to hear back from them.

    If IT support could oblige by restoring the email, that’s good,
    and that’s where the FOI should focus.

    • Steve McIntyre
      Posted Oct 9, 2012 at 10:22 AM | Permalink | Reply

      If one of the players lost an email, and wanted it to be restored, they would contact IT support, give some details then wait to hear back from them.

      what is the basis for saying this? is it in the Climategate emails or is this just a surmise?

      • eternaloptimist
        Posted Oct 14, 2012 at 1:58 PM | Permalink | Reply

        I was talking about FOI requests generally, from an IT point of view. There are procedures for recovering from backups. There are no procedures for looking around dusty back rooms hoping for a lucky find of an old CD.
        Now you are a great detective, and you may be able to track down some missing or mislaid or deliberately hidden stuff, and you have, and that is fantastic.
        But the IT depts around the world must follow their procedures. If they don’t stack up to the FOI legislation, then the FOI custodians should try to force them or even prosecute.
        If the FOI legislation is not up to scratch, well, let’s try to get it changed. Don’t expect the IT departments to become super sleuths or jump through hoops.
        I am a big fan by the way. Keep up the good work.

    • Duke C.
      Posted Oct 9, 2012 at 1:07 PM | Permalink | Reply

      Here are the results I get using Eudora 7.1.0.9

      Attachments are stripped from incoming emails and deposited in a folder along the following default path:

      Attachment Converted: “c:\users\*******\appdata\roaming\qualcomm\eudora\attach\**********.doc”

      The attached doc remains in the file under the above path after the original email is deleted inside Eudora.

      Deleting the attachment from the above folder and clicking on the attachment link within the email returns a “file not found” error

      A preliminary check suggests that all emails in CG1 containing attachments have a reconfigured path:

      Attachment Converted: “c:\eudora\attach\[filename]”

      With the exception of Tim Osborn, who opted for the default path:

      Attachment Converted: “c:\documents and settings\tim osborn\my documents\eudora\attach\[filename]”

      Here’s my (possibly incorrect) analysis:

      Eudora was configured to save attachments in a different location (c:\eudora\attach\), and therefore the attachments would NOT be in the default subdirectory associated with Keith Briffa’s Eudora email account.
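
      For anyone who wants to repeat the check, here is roughly how the paths can be pulled out of the email text. This is a sketch: the regular expression is my own and assumes the “Attachment Converted:” lines look like the ones quoted above, with either straight or curly quotes.

```python
import re
from collections import Counter
from ntpath import dirname  # parses Windows-style paths on any platform

# Matches: Attachment Converted: "c:\some\dir\file.doc"
_CONVERTED = re.compile(r'Attachment Converted:\s*["\u201c]([^"\u201d]+)["\u201d]')


def attachment_paths(text):
    """Return every path recorded on an 'Attachment Converted:' line."""
    return _CONVERTED.findall(text)


def attachment_dirs(texts):
    """Count, across many email texts, the directories that attachments
    were saved to (lower-cased, since Windows paths ignore case)."""
    return Counter(
        dirname(path).lower() for text in texts for path in attachment_paths(text)
    )
```

      Feeding attachment_dirs the text of every email in the CG1 set gives a tally of attachment directories, which is also a list of directories worth asking UEA about by name.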

      • Steve McIntyre
        Posted Oct 9, 2012 at 1:15 PM | Permalink | Reply

        Why wouldn’t the document show up in a search of the entire server?

        A question that I still don’t understand. Let’s say that there is a c:/eudora/… file that contains attachments. Would this be incremental or mirror, i.e. in a more recent backup, does the later backup merely add to the documents, or would it check against the current directory and delete documents no longer existing in the current directory?

        • clivere
          Posted Oct 9, 2012 at 4:35 PM | Permalink

          Steve – in your earlier post I did put up this link

          http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/backuppc-21/search-for-file-113520/

          “Also, you need to be careful about incrementals vs. fulls since incrementals
          will include only the most recently changed files while fulls might
          not include the latest version if there are subsequent incrementals. ”

          This suggests that the latest version of a file could appear on an incremental backup but not on a full backup.

        • Duke C.
          Posted Oct 9, 2012 at 9:04 PM | Permalink

          This may seem a bit pedantic, but would FOIing the corresponding Briffa emails (even though they’re already public) accomplish anything? Getting UEA to just confirm or deny whether or not they reside on the server would be helpful.

          Steve: I’d had the same thought. It’s frustrating that they’ve decided to take everything to the distance.

  25. Dave
    Posted Oct 9, 2012 at 7:47 AM | Permalink | Reply

    In answer to the question of whether something is ‘standard practice’ in IT, there is good practice out there, but it’s rare as hens’ teeth. Many IT departments are run more along the lines of climate science than mining engineering, I’m afraid. The appearance is that CRU was a typical IT cluster-f*ck, so it’s entirely possible that they did anything they claim, however stupid – their systems are clearly run by incompetents.

    There are endless examples like this one:

    http://thedailywtf.com/Articles/Finish-the-Finnish-Audit.aspx

  26. Shevva
    Posted Oct 10, 2012 at 7:17 AM | Permalink | Reply

    A couple of helpful links:

    http://www.ico.gov.uk/foikb/PolicyLines/FOIPolicyDeletedelectronicinformation.htm

    http://www.ico.gov.uk/global/faqs/freedom_of_information_for_organisations.aspx

    ‘If you do not hold the information requested you must confirm this in writing within 20 working days. This is not technically classified as a ‘refusal notice’, but simply follows your duty to confirm or deny whether you hold it. When appropriate it is also useful to give an explanation of why you do not hold the information, particularly in cases where the information has been deleted in line with a disposal schedule.’

  27. Duke C.
    Posted Oct 10, 2012 at 12:28 PM | Permalink | Reply

    It should be noted that there is a similarly named file in the CG1 \documents directory, from the time period in question:

    AR4SOR_BatchAB_Ch06-KRB-1stAug.doc

    UEA didn’t specify whether that specific file was contained within the ~100 hits.

  28. Ben
    Posted Oct 16, 2012 at 4:56 PM | Permalink | Reply

    “Wouldn’t it be standard practice to periodically preserve some of the older backups?”

    No. It would be almost unheard of to retain backups for more than seven years without a legal justification.

    There are both legislation and other legal requirements which mean that certain things must be kept; for example, keeping tax records for seven years is statutory. And there are legislation and legal requirements which mean that certain data must be deleted.

    In particular, any “personal data” which is not required for a lawful purpose must be deleted. Any random PC backup will inevitably contain personal data and so falls under this. The lawful purposes can be quite broad, including academic, statistical or historic purposes as a general catch-all. So anything currently kept on purpose will probably be fine so long as it was obtained in a lawful way for a lawful purpose and is still being used for that purpose. But if the personal data is not, as a matter of actual fact, required, it is unlawful to keep it. This will be the case if it has been deleted from the researcher’s PC; clearly he has decided he doesn’t need it.

    The general backup schedule after that will run from about 2 weeks to 7 years depending on the legal environment, but if complete deletion never occurs that is a problem.

    A data retention policy and backup policy have to help solve two of the problems in the CIA acronym: Confidentiality, Integrity, Availability. The purposes of backups are to support availability – that data which is required should be available, in the face of hardware failure or accident. The purpose of the deletion schedule, including backup rotation, is to support confidentiality. When data is supposed to be deleted to comply with data protection law or other requirements, failure to delete it is an information security problem with possible legal consequences.

    Finally, if a backup of PC1 is taken in 2009, when at that time PC1 contains files from 2006, that backup will contain those files. I am not familiar with BackupPC, but with traditional backup programs such as NTBackup, you can delete the whole backup or keep it. There is no option to remove files from the backup which meet certain criteria.

    In short, the explanation makes perfect sense to me. I don’t think there is a mystery here.

  29. Duke C.
    Posted Nov 12, 2012 at 4:20 PM | Permalink | Reply

    Steve-

    More information received today. Check your email subject line for “Briffa-Wahl”.

  30. Duke C.
    Posted Nov 30, 2012 at 11:34 AM | Permalink | Reply

    Here’s the current status of my Oct. 12, 2012 FOI request:

    Bear in mind that this request relates to the emails, not the attachments. However, there is additional content in this release, not found in the public (CG1,CG2) versions. Have not parsed them completely, but no smoking gun so far…
    ———————————————————————————————————
    30 November 2012

    FREEDOM OF INFORMATION ACT 2000 – INFORMATION REQUEST
    (Our file: FOI_12-139)

    Further to our response in this matter of 12 November 2012 and your subsequent email of
    20 November 2012, I am writing to update you on this request and to provide further
    information.
    As noted in my email of 22 November, I have forwarded your concerns regarding the extent
    of our searching to the appropriate technical staff within the University and can report that in
    response to your question; no, we did not initially search the numbered mbx backup files.
    Having concluded that the requested information could, in fact, be held within these files, we
    conducted further searches late last week. Specifically, we identified three files which were
    not previously retrieved or searched: In.mbx.001, In.mbx.002 and Out.mbx.001. There was
    no Out.mbx.002 file. All three files were decompressed and searched for any references to
    ‘wahl’ and all search matches were checked to determine whether they related to emails
    exchanged between Keith Briffa and Eugene Wahl between 1 July 2006 and 31 August
    2006.
    As a result of this revised search, we not only discovered the two emails previously reported
    as ‘not held’ but nine (9) other emails that fall within the scope of your request of 11 October
    2012. These are contained within the attached document entitled ‘Appendix A_Additional
    material.pdf’.
    I would like to take this opportunity to apologise for not providing this information at first
    instance. We do take our obligations under the Act very seriously and are very sorry that our
    original search strategy overlooked this information. I would like to thank you for bringing this
    to our attention so that we can fulfil your request in its entirety.
    However, I must report that, in accordance with section 17 of the Freedom of Information Act
    2000 I am not obliged to supply all of the requested information. The exemptions are clearly
    indicated within the attached document and the reasons for exemption are as stated below:

    Exemption Reason
    s.40(2), Personal information Disclosure of information would contravene
    one of the data protection principles

    We would also hold that a small amount of the requested information contains information
    that meets the definition of ‘personal information’ as defined by section 1(1) of the UK Data
    Protection Act 1998 (DPA). Specifically, we believe that some of the information within the
    emails from Eugene Wahl to Keith Briffa is clearly personal data of both Dr Wahl and
    members of his family whose release would contravene the Act.
    Specifically, the disclosure of this information would be contrary to the first data protection
    principle under the DPA; namely that information be processed in a fair and lawful fashion
    and that the processing also meets at least one of the conditions set out in Schedule 2 of the
    Act. We do not have consent for the release of this information identifying the interviewees,
    nor are there any conditions present that would allow us to release under any of the other
    provisions of Schedule 2 of the DPA.
    As before, I would also add that any material released over which UEA has copyright is
    released subject to the understanding that you will comply with all relevant copyright rules
    regarding reproduction and/or transmission of the information released.
    You have the right of appeal against this decision. If you wish to appeal, please set out in
    writing your grounds of appeal and send to me at:
    University of East Anglia
    Norwich Research Park
    Norwich
    NR4 7TJ
    Telephone: 01603 593523
    E-mail:
    You must appeal our decision within 60 calendar days of the date of this letter. Any appeal
    received after that date will not be considered nor acknowledged. This policy has been
    reviewed and approved by the Information Commissioner’s Office.
    You also have a subsequent right of appeal to the Information Commissioner at:
    Information Commissioner’s Office
    Wycliffe House
    Water Lane
    Wilmslow, Cheshire
    SK9 5AF
    Telephone: 0303 123 1113
    http://www.ico.gov.uk
    Please quote our reference given at the head of this letter in all correspondence.
    Yours sincerely
    David Palmer
    Information Policy and Compliance Manager
    University of East Anglia

    • Brandon Shollenberger
      Posted Dec 2, 2012 at 1:45 AM | Permalink | Reply

      Duke C:

      Bear in mind that this request relates to the emails, not the attachments. However, there is additional content in this release, not found in the public (CG1,CG2) versions. Have not parsed them completely, but no smoking gun so far…

      As a person who deals with various IT issues, one of the more annoying things I see is the treating of “e-mails” and “attachments” as separate things. That’s a fiction. An attachment is, by definition, part of the e-mail it is “attached” to.

      The idea that it is a separate entity is just an illusion created by e-mail clients. In those, the user sees his “e-mail” and attachment as separate. But as soon as he or she clicks Send, they are combined into a single entity, an e-mail.

      I get that the incorrect usage has entered common parlance, but I don’t understand why so many IT people use it. Any IT person should realize it’s a bogus distinction.
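
      A quick way to see this for yourself is the following sketch, using Python's standard email package; the addresses, subject and filename are made up. The body and the attachment end up in one serialized message, which is the only thing that actually travels over the wire.

```python
from email.message import EmailMessage


def build_message(body_text, filename, data):
    """Build one email with a text body and one binary attachment and
    return it as the single byte string that would be transmitted."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"       # placeholder address
    msg["To"] = "recipient@example.org"      # placeholder address
    msg["Subject"] = "Draft attached"
    msg.set_content(body_text)
    # The attachment becomes a base64-encoded MIME part of the same message.
    msg.add_attachment(
        data, maintype="application", subtype="octet-stream", filename=filename
    )
    return msg.as_bytes()
```

      The returned bytes contain the headers, the body text and the encoded attachment together. Splitting them into an “email” and a separate file is something the receiving client does afterwards.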

One Trackback

  1. [...] Over at Climate Audit, Steve reports on the Update for the FOI for the Wahl Attachments [...]
