Radiosonde trends are back in the news. A few days ago, on May 24, 2008, Realclimate reviewed three recent papers: Lanzante and Free (J Clim 2008), Haimberger et al (J Clim 2008) and Sherwood et al 2008, adding a note with the even more recent Allen and Sherwood (2008). Peter Thorne of the Hadley Centre stated of the Allen and Sherwood study:
The new study “provides … long-awaited experimental verification of model predictions,” Thorne wrote.
We discussed radiosonde data in connection with Douglass et al 2007, in particular Gavin Schmidt’s purported excoriation of that paper, in which Schmidt took particular umbrage at Douglass et al’s use of Raobcore v1.2, a study published in April 2007, one month before the submission of Douglass et al 2007 in May 2007. We are used to climate scientists “moving on”, but the speed of decampment in this instance seems particularly rapid. Schmidt’s implication is that Haimberger had repudiated Raobcore v1.2 before it was even published (it is explicitly repudiated in Haimberger et al 2008); however, rather than criticizing Haimberger for his failure to withdraw his then-unpublished, now-repudiated results, he criticized Douglass et al for using the most recently published results (results that were a hoary one month of age at the time of the submission) rather than attempting to anticipate the results of future climate nomad migrations.
Overlooked in this particular exchange was exactly what prompted Haimberger’s rapid abandonment of the Raobcore v1.2 camp site. I thought that it would be interesting to examine the reasons for this abandonment and will do so today, as this is another interesting case of data adjustments by climate scientists – a topic not unfamiliar to CA readers.
Haimberger’s study, like the rest of the recent flurry, turns to a considerable degree on a statistical issue previously discussed at CA in connection with surface stations – the use of home-made breakpoint algorithms, developed by climate scientists and unstudied in the general statistical literature, to adjust (or attempt to adjust) for inhomogeneities and discontinuities in a data set with poor quality control and untrustworthy metadata. Indeed, some radiosonde literature cites adjustment (“homogenization”) techniques developed by USHCN, if you can imagine that.
If you browse the Surface Record category, you will see some discussion of these changepoint algorithms, although the issue has been noted, rather than run to ground. We spent a little extra time on the case of Lampasas TX, a case which also attracted the interest of Atmoz, where an obvious and easily observed discontinuity was missed by the USHCN changepoint algorithm. On other occasions, I’ve noted my sense that these changepoint algorithms in practice merely seem to end up blending good and bad data and, in particular, seem very vulnerable to contamination. Anthony Watts has a recent post referring to planned USHCN changepoint analysis.
Radiosonde adjusters take adjustment to extremes not contemplated in the surface record – ultimately even changing the sign of the trend. Sort of like Hansen on steroids.
The underlying difficulty for present-day scientists trying to extract information from the historical radiosonde data is that the problems with quality control and metadata in the radiosonde network appear far more severe than in the surface station record, which is disappointing, given that the radiosonde data was not collected by USHCN volunteers, but by trained climate professionals, and that much of the data was collected during the IPCC era. Here’s a statement by Sherwood at realclimate in 2005 summarizing the compromising of the radiosonde record. Many other issues are identified in the specialist literature, with problems already being identified in the early 1990s.
Few if any sites have used exactly the same technology for the entire length of their record, and large artifacts have been identified in association with changes from one manufacturer to another or design upgrades by the same manufacturer. Artifacts have even been caused by changing software and bug fixes, balloon technology, and tether lengths. Alas, many changes over time have not been recorded, and consistent corrections have proven elusive even for recorded changes. While all commonly used radiosondes have nominal temperature accuracy of 0.1 or 0.2 K, these accuracies are verified only in highly idealized laboratory conditions. Much larger errors are known to be possible in the real world. The most egregious example is when the temperature sensor becomes coated with ice in a rain cloud, in which case upper tropospheric temperatures can be as much as 20 C too warm. This particular scenario is fairly easy to spot and such soundings can be removed, but one can see the potential problems if many, less obvious errors are present or if the sensor had only a little bit of ice on it! Another potential problem is pressure readings; if these are off, the reported temperature will have been measured at the wrong level.
Sherwood et al 2007 forcefully criticized prior changepoint analysis by other scientists as follows:
A considerable climatic and statistical literature exists on the problem of detecting undocumented “change points,” or discontinuities in the statistics of a time series (see Menne and Williams 2005, and references therein). Climate relevant changes are usually modeled as simple step discontinuities in observing bias, due e.g., to changed sensor design, relocation of the sensor, etc., and are thereby distinguishable—at least in principle—from the relatively smooth variation of the underlying observable. Detection of the change point is followed by estimation (and ultimately, removal) of its associated level shift.
These studies have left key issues unresolved. First, detection methods typically assume that the observations possess little or no serial correlation, but real climate records contain variability on all time scales. This makes false detections more likely since the natural variability begins to resemble the artifacts. Second, the goal is usually not detection per se but accurate climate signals, yet previous studies have not carefully investigated to what extent that actually occurs. A tendency has been noted for radiosonde temperature trends to disappear upon homogenization (Free et al. 2002). Finally, while the value of using data from neighboring sites is well-recognized for level-shift estimation (e.g. Karl and Williams 1987), detection studies have dwelled on the case of an isolated time series; the use of neighbor information remains ad hoc in practice, and its efficacy untested.
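Sherwood’s first point, that serial correlation inflates false detections, can be illustrated with a toy Monte Carlo. This is only a sketch under stated assumptions and is not any published homogenization algorithm: the detector is a plain two-segment mean-difference statistic, the series are 120 points long, the red noise is AR(1) with a coefficient of 0.6, and every function name is hypothetical. The detection threshold is calibrated on white noise so that about 5% of artifact-free series are flagged, then applied unchanged to artifact-free red noise.

```python
import numpy as np

def max_step_statistic(x, min_seg=10):
    """Largest t-like statistic for a mean shift over all candidate split points."""
    n = len(x)
    best = 0.0
    for k in range(min_seg, n - min_seg):
        left, right = x[:k], x[k:]
        # Pooled variance; the standard error below assumes independent samples,
        # which is exactly the assumption that fails for autocorrelated data.
        pooled_var = (left.var(ddof=1) * (k - 1)
                      + right.var(ddof=1) * (n - k - 1)) / (n - 2)
        se = np.sqrt(pooled_var * (1.0 / k + 1.0 / (n - k)))
        best = max(best, abs(right.mean() - left.mean()) / se)
    return best

def ar1(n, phi, rng):
    """AR(1) series with unit marginal variance and lag-1 autocorrelation phi."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    innov_sd = np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + innov_sd * rng.standard_normal()
    return x

def false_detection_rate(phi, threshold, n=120, trials=200, seed=0):
    """Fraction of artifact-free series in which a 'break' is nonetheless flagged."""
    rng = np.random.default_rng(seed)
    hits = sum(max_step_statistic(ar1(n, phi, rng)) > threshold
               for _ in range(trials))
    return hits / trials

# Calibrate the threshold on white noise (roughly 5% false detections).
calib_rng = np.random.default_rng(1)
null_stats = [max_step_statistic(calib_rng.standard_normal(120)) for _ in range(200)]
threshold = float(np.quantile(null_stats, 0.95))

rate_white = false_detection_rate(0.0, threshold)  # no serial correlation
rate_red = false_detection_rate(0.6, threshold)    # assumed AR(1) coefficient
print(f"white noise: {rate_white:.2f}   AR(1), phi=0.6: {rate_red:.2f}")
```

None of these series contains an artifact, so every detection is false. The choice of 0.6 is arbitrary; the qualitative point is only that a threshold tuned for independent data flags red noise far more often.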
This criticism was re-iterated in Sherwood et al 2008 :
A detailed exploration by Sherwood (2007, hereafter S07) using statistical simulations revealed that standard methods were often unable to estimate trends reliably. Three problems were identified. Even with liberal detection criteria not all change points are found: the “missed artifact” problem. On the other hand, even with very strict criteria, false change point detections are unavoidable when time series have realistic serial correlation. Subsequent adjustment of the time series tended to eliminate trends (or, in the case where a satellite reference is used, trends in the sonde-satellite difference): the “greedy artifact” problem. Finally, when reference information from nearby stations was used, artifacts at neighbor stations tend to cause adjustment errors: the “bad neighbor” problem. In this case, after adjustment, climate signals became more similar at nearby stations even when the average bias over the whole network was not reduced.
Sherwood’s last sentence here is very reminiscent of a phenomenon that I’d noted in connection with USHCN adjustments.
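The “greedy artifact” problem in the passage quoted above is easy to reproduce in a toy setting. The sketch below is an illustration only, under loudly stated assumptions: the series has a genuine linear trend and no artifact at all, the slope and noise level are arbitrary, and the “adjustment” simply removes the single most step-like split without any significance test. The function names are hypothetical.

```python
import numpy as np

def remove_largest_step(x, min_seg=12):
    """Find the split that best explains x as a level shift, then remove that shift."""
    n = len(x)
    ks = np.arange(min_seg, n - min_seg)
    # Reduction in sum of squares obtained by fitting a step at each split k.
    gains = [k * (n - k) / n * (x[k:].mean() - x[:k].mean()) ** 2 for k in ks]
    k = int(ks[int(np.argmax(gains))])
    shift = x[k:].mean() - x[:k].mean()
    adjusted = x.copy()
    adjusted[k:] -= shift  # "homogenize" by aligning the two segment means
    return adjusted, k, shift

def ols_slope(x):
    """Ordinary least squares trend per time step."""
    return np.polyfit(np.arange(len(x)), x, 1)[0]

rng = np.random.default_rng(0)
n = 240                # e.g. 20 years of monthly values (assumed)
true_slope = 0.002     # arbitrary units per month (assumed)
series = true_slope * np.arange(n) + 0.2 * rng.standard_normal(n)

adjusted, k, shift = remove_largest_step(series)
slope_before = ols_slope(series)
slope_after = ols_slope(adjusted)
print(f"break placed at index {k}, shift removed {shift:.3f}")
print(f"trend before: {slope_before:.5f}   after: {slope_after:.5f}")
```

Because a steady trend makes the second half of the record warmer than the first, a step-change model “explains” most of it, and removing that step removes most of the real trend. A single mid-record step of this kind accounts for roughly three quarters of a linear trend, which is why the adjusted slope collapses.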
Both Sherwood and Haimberger have quick surveys of prior adjustment efforts, which are worth reading. My take is that there seem to be two approaches to the “homogenization” problem. One approach is what seems to be the approach of Angell – search for the best quality stations and use them, even if the network is only a subset of the original network. Angell ended up with 62 stations, but had trends inconsistent with model expectations.
[Update: As Peter Thorne observes in a comment below, radiosonde scientists have made substantial efforts to make their data publicly available. I commend them for this. I was able to quickly locate and download relevant information on the Angell, RATPAC, HadAT2, Raobcore v1.4 data sets. Some of the data sets are very large and represent considerable effort. I corresponded once with Leopold Haimberger and promptly received directions to a url that I had been unable to locate. In the Raobcore data set that is the primary topic here, I was able to locate data representing 4 satellite levels, but, even after seeking Haimberger's assistance, could not locate data at the pressure levels portrayed in their web visualization.]
Lanzante and associates (Free, Seidel) also attempted to identify a QC-ed network (87 stations), but concluded that even this network (RATPAC-B) had large inhomogeneities and accordingly they developed adjustments based on metadata. After these adjustments, their trends were still inconsistent with model expectations. Sherwood deprecated this procedure as “subjective” and Haimberger deprecated it as “laborious”.
The Raobcore approach is diametrically opposite. They make no attempt whatever at prior quality control – that would be “subjective”. They essentially dump all data into their network, regardless of quality or inhomogeneity, and rely on automated adjustments through changepoint analysis to sort out the mess. My entire instinct is against this sort of approach – which reminds me all too much of Mannian analysis of the North American tree ring network. Instead of trying to work with a controlled network of properly QCed stations, Haimberger constructed a network of 2881 records, of which 1536 were taken from the IGRA list of radiosonde stations and the other 1355 (including many very short records) from data that was not included in IGRA but was in the ERA-40 reanalysis project. Homogeneity adjustments were applied to stations with records longer than 180 days (of which there were only 1184 stations). One might well question the inclusion of 1697 stations with records of less than 180 days in a study purporting to understand 30-year trends.
The algorithm in Haimberger et al 2007 added a novel tweak to changepoint methods – a tweak that should not be accepted as a proven methodology merely because it’s been published in a journal with weak statistical refereeing (Journal of Climate):
This paper introduces a new technique that uses time series of temperature differences between the original radiosonde observations (obs) and background forecasts (bg) of an atmospheric climate data assimilation system for homogenization.
One wonders at the real statistical properties of this “new” technique. To what extent does this technique merely imprint trends in ERA-40 onto the radiosonde data? What if some other history had been used as a target? Would that have led to a different history? If that’s the case, what, if anything is proven by the Raobcore exercise?
In this case, it seems to me that important light is shed on this question by a very curious change between Raobcore v1.2 and Raobcore v1.4 – which indicates that the target model has a substantial impact on the analysis results.
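The imprinting question raised above can be made concrete with a toy calculation. To be clear, this is not the Raobcore algorithm. It is a minimal sketch assuming that break points are already given (two real artifacts plus three false detections), that each segment’s mean obs-minus-bg difference is simply aligned to that of the most recent segment, and that the “true” climate has no trend. All names, magnitudes and break positions are invented for illustration. The same observations are adjusted against two references: one with a spurious drift of 0.15 K/decade and one without.

```python
import numpy as np

def slope_per_decade(x):
    """OLS trend in units per decade for a monthly series."""
    t_decades = np.arange(len(x)) / 120.0
    return np.polyfit(t_decades, x, 1)[0]

def adjust_against_reference(obs, bg, breaks):
    """Align each segment's mean (obs - bg) to that of the most recent segment."""
    diff = obs - bg
    edges = [0, *breaks, len(obs)]
    anchor = diff[edges[-2]:edges[-1]].mean()  # latest segment taken as 'correct'
    adjusted = obs.copy()
    for start, stop in zip(edges[:-1], edges[1:]):
        adjusted[start:stop] -= diff[start:stop].mean() - anchor
    return adjusted

rng = np.random.default_rng(0)
n = 360                                   # 30 years of monthly data (assumed)
t_decades = np.arange(n) / 120.0
truth = 0.3 * rng.standard_normal(n)      # trendless 'true' anomalies (assumed)

obs = truth + 0.1 * rng.standard_normal(n)
obs[:100] += 0.5                          # invented instrument artifact
obs[100:220] -= 0.3                       # second invented artifact
breaks = [60, 100, 160, 220, 300]         # two real breaks, three false ones

bg_drifting = truth + 0.15 * t_decades + 0.1 * rng.standard_normal(n)
bg_flat = truth + 0.1 * rng.standard_normal(n)

trend_raw = slope_per_decade(obs)
trend_vs_drifting = slope_per_decade(adjust_against_reference(obs, bg_drifting, breaks))
trend_vs_flat = slope_per_decade(adjust_against_reference(obs, bg_flat, breaks))
print(f"raw obs trend:                  {trend_raw:+.3f} K/decade")
print(f"adjusted vs drifting reference: {trend_vs_drifting:+.3f} K/decade")
print(f"adjusted vs flat reference:     {trend_vs_flat:+.3f} K/decade")
```

In this toy setup the adjusted series inherits nearly all of the reference’s spurious drift, because each segment mean is pinned to the reference. How closely a real scheme behaves like this depends on how many breaks it accepts and how it estimates the shifts, which is precisely what would need to be demonstrated.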
Adjusting the Target Model
Raobcore (both v1.2 and v1.4) did not merely adjust the radiosonde data. They adjusted the target ERA-40 model as well, with the adjustments to the target ERA-40 model being more extensive in v1.2 than in v1.4.
Haimberger 2007 described the v1.2 adjustments to the target ERA-40 model as follows:
Although ERA-40 used a frozen data assimilation system, the time series of the background forecasts contains some breaks as well, mainly due to changes in the satellite observing system. It has been necessary to adjust the global mean background forecast temperatures before the radiosonde homogenization. After this step, homogeneity adjustments, which can be added to existing raw radiosonde observations, have been calculated for 1184 radiosonde records….
It is essential to be aware of any inhomogeneities of the ERA-40 bg since these reduce the applicability of the ERA-40 bg as a reference. Inhomogeneities in the bg time series may be introduced by changes in the ERA-40 observation coverage, in the observation biases correction and in the overall observation quality. Apart from radiosondes mainly the satellite data are affected by changing biases
The most prominent breaks evident in Figure 8 occured in January 1975, September 1976 and April 1986 are related to problems with the NOAA-4 and NOAA-9 satellites. Jumps in 1995/1997 coincide with end of NOAA-11, start/end of NOAA-14 (see also Christy and Norris 2006). At high altitudes the effects of insufficient bias correction of radiances from the stratospheric sounding unit (SSU), particularly in the early 1980s, are noticeable (see Haimberger 2005; Uppala et al. 2006). Trenberth and Smith (2006) have recently diagnosed a spurious break in ERA-40 temperature analyses related to the assimilation of MSU-3 radiances at the end of the NOAA-9 period.
The principal changes in Raobcore v1.4 were changed adjustments to the target model, described on the Raobcore website as follows:
Version 1.4 of RAOBCORE contains 2 major improvements compared to the versions 1.2, 1.3 described in Haimberger (2007) (J. Climate, in press). These improvements are: …
2) The ERA-40 background modification described in Haimberger (2007) is only applied between Jan 1972 and Dec 1986. It has turned out that the ERA-40/ECMWF bg forecast time series are quite consistent with recent versions of the RSS and UAH satellite datasets, so that a modification of the ERA-40 bg is not necessary. Between 1972 and 1986, modifications of the bg are unavoidable. The bg is modified more strongly in the tropics in v1.4 compared to the modification applied in version 1.2. The differences between 1.2, 1.3 and 1.4 can be examined using the web visualization tool.
So Raobcore v1.2 argued in a peer reviewed journal that there were post-1986 inhomogeneities in the ERA-40 model that required adjustment, giving a list of such inhomogeneities. Raobcore v1.4 decided that adjustments to ERA-40 after 1986 were not required after all. One would have thought that Journal of Climate would have required a detailed explanation of why Haimberger et al had changed their views so quickly and a detailed analysis of each post-1986 adjustment that was no longer deemed pertinent. A year earlier, Haimberger expressed concern about “jumps” at the end of NOAA-14. A year later, he was no longer concerned. Why? Is there any such analysis in Haimberger et al 2008? Nope.
The impact of removing the post-1986 ERA-40 model adjustments is not small. Douglass et al 2007 argued that the Raobcore v1.2 trends were inconsistent with models, while Schmidt argued that Raobcore v1.4 was consistent with models.
The graphic below compares Raobcore v1.2 (blue) and v1.4 (green) tropical trends by altitude. One version of v1.2 is as reported in Douglass et al 2007 and one is manually estimated from the Raobcore “web visualization tool”; one version of v1.4 is manually estimated from a figure in Haimberger et al 2008 and one version is manually estimated from the Raobcore web visualization tool.
There’s another issue here. I wasn’t able to replicate the Raobcore v1.4 diagram from archived data. Raobcore data is not archived apples to apples with the “web visualization” tool. They’ve archived data blended to TLT, TMT, TTS and TLS levels, which I’ve plotted in red below. This should reconcile to the v1.4 data at the different altitudes, but doesn’t appear to. I don’t exclude the possibility that I’ve been wrongfooted somewhere along the way. I’m careful about these things, but I’m not familiar with this subfield. So there may be still another issue here.
Raobcore Tropical Trends. blue – v1.2; green – v1.4; red – from archived v1.4 data.
For the purposes of trying to say whether or not the radiosonde “data” is consistent or inconsistent with the models, the underlying problem, as noted above, is that the radiosonde network is thoroughly contaminated by inhomogeneities, worsened by defective quality control by observers. Different qualified observers have extracted radically different trends from the radiosonde data.
In the surface stations situation, one way of trying to find solid ground in a sea of adjustments was locating “crn 1” stations with long histories of consistent observation and metadata, as Anthony Watts has been trying to do. In the surface stations example, it’s hard to see any value in blending such data with compromised data from sites like the University of Arizona parking lot. It seems highly probable that temperatures in the 2000s are warmer than the 1930s, but I’d like to see this established from “crn 1” stations. This seems like an elementary form of quality control.
Raobcore v1.4 is hardly the last word in radiosonde adjustments. Leopold Haimberger (and I intend no slight by the title of the post, I just liked the sound of it) has already moved on to yet another adjustment system (“RICH”); Allen and Sherwood 2008 have used Iterative Universal Kriging, adding wind information into their adjustment brew. What each of these studies has in common is that none of them are new “experimental verification”; they are merely adjustments of ever increasing magnitude.
I noted above that Sherwood had cogently criticized changepoint analysis as carried out in every predecessor adjustment. I think that, at this point, it’s quite reasonable to stipulate Sherwood’s criticisms. An immediate result is that neither of the Raobcore versions constitutes “data” that could possibly confirm or reject a model, nor for that matter would any of the other adjustment systems.
The only alternative in the radiosonde field to these after-the-fact adjustments was Angell’s effort to create a small network of “good” sites without adjustment. If specialists have concluded – and this appears to be common ground – that Angell’s network was also compromised, then I think it’s possible that the field may simply have reached a stalemate in terms of trying to determine whether the models are consistent with the “data” – either to claim vindication as Thorne has recently done or claim inconsistency (Douglass et al 2007). This does not imply anything one way or the other in respect to the ability to draw conclusions from the satellite record, which, among other things, has the advantage of involving only a limited number of instruments, whose properties have been studied in detail.
Allen and Sherwood 2008 try a different tack – they try to create a homogenized wind data series on the basis that the radiosonde wind data is much less screwed up. They then argue that the trends in wind are consistent with tropical troposphere warming. They use this as evidence for the side of the argument that the UAH satellite temperature trends in the tropics are incorrect. I guess that we’ll see more about tropospheric wind data in the next while.
Allen, R. J., and S. C. Sherwood. 2008. Warming maximum in the tropical upper troposphere deduced from thermal winds. Nature Geoscience (May 25). http://www.nature.com/ngeo/journal/vaop/ncurrent/abs/ngeo208.html.
Douglass, D. H., J. R. Christy, B. D. Pearson, and S. F. Singer. 2007. A comparison of tropical temperature trends with model predictions. Intl J Climatology (Royal Meteorol Soc).
Haimberger, L., C. Tavolato, and S. Sperka. 2008. Towards elimination of the warm bias in historic radiosonde temperature records - some new results from a comprehensive intercomparison of upper air data. Journal of Climate (under review). ftp://srvx6.img.univie.ac.at/pub/haimbergeret_al08.pdf.
Lanzante, J, and M. Free. 2008. Comparison of Radiosonde and GCM Vertical Temperature Trend Profiles: Effects of Dataset Choice and Data Homogenization. Journal of Climate. http://ams.allenpress.com/perlserv/?request=get-abstract&doi=10.1175%2F2008JCLI2287.1.
Sherwood, S. C., C. L. Meyer, R. J. Allen, and Holly A. Titchner. 2008. Robust tropospheric warming revealed by iteratively homogenized radiosonde data. Journal of Climate. http://ams.allenpress.com/perlserv/?request=get-abstract&doi=10.1175%2F2008JCLI2320.1.