Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Thu, 06 Dec 2007 13:04:20 -0500
Cc: Phil Jones <firstname.lastname@example.org>, carl mears <email@example.com>, Karl Taylor <firstname.lastname@example.org>, Tom Wigley <email@example.com>, Tom Wigley <firstname.lastname@example.org>, "Thorne, Peter" <email@example.com>, Steven Sherwood <Steven.Sherwood@yale.edu>, John Lanzante <John.Lanzante@noaa.gov>, Melissa Free <Melissa.Free@noaa.gov>, Frank Wentz <firstname.lastname@example.org>, Steve Klein <email@example.com>, Leopold Haimberger <firstname.lastname@example.org>, peter gleckler <email@example.com>
Hello Ben and Colleagues,
I've been following these exchanges with interest. One particular point
in your message below is a little puzzling to me. That's the issue of
trying to avoid circularity in the culling of models for any given D&A study.
Two potential problems occur to me. One is that choosing models on the
basis of their fidelity to observed regional and short term variability
may not be completely orthogonal to choosing based on long-term trend.
That's because those smaller scale changes may contribute to the trends
and their patterns. Second, choosing a different set of models for one
variable (temperature) than for another (humidity) seems highly
problematic. If we are interested in projections of other variables,
e.g. storm tracks or cloud cover, for which D&A has not been done, which
group of models would we then deem to be most credible? I don't have a
good alternative to propose, but, in light of these considerations,
maybe one-model-one-vote doesn't appear so unreasonable after all.
Ben Santer wrote:
> Dear Phil,
> Just a quick response to the issue of "model weighting" which you and
> Carl raised in your emails.
> We recently published a paper dealing with the identification of an
> anthropogenic fingerprint in SSM/I-based estimates of total column
> water vapor changes. This was a true multi-model detection and
> attribution ("D&A") study, which made use of results from 22 different
> A/OGCMs for fingerprint and noise estimation. Together with Peter
> Gleckler and Karl Taylor, I'm now in the process of repeating our
> water vapor D&A study using a subset of the original 22 models. This
> subset will comprise 10-12 models which are demonstrably more
> successful in capturing features of the observed mean state and
> variability of water vapor and SST - particularly features crucial to
> the D&A problem (such as the low-frequency variability). We've had fun
> computing a whole range of metrics that might be used to define such a
> subset of "better" models. The ultimate goal is to determine the
> sensitivity of our water vapor D&A results to model quality. I think
> that this kind of analysis will be unavoidable in the multi-model
> world in which we now live. Given substantial inter-model differences
> in simulation quality, "one model, one vote" is probably not the best
> policy for D&A work!
> Once we've used Carl's method to calculate synthetic MSU temperatures
> from the IPCC AR4 20c3m data (as described in my previous email), it
> should be relatively easy to do a similar "model culling" exercise
> with MSU T2, T4, and TLT. In fact, this is what we had already planned
> to do in collaboration with Carl and Frank.
> One key point in any model weighting or selection strategy is to avoid
> circularity. In the D&A context, it would be impermissible to include
> information on trend behavior as a criterion used for selecting
> "better" models. Likewise, if our interest is in assessing the
> statistical significance of model-versus-observed trend differences,
> we can't use model performance in simulating "observed" tropospheric
> or stratospheric trends (whatever those might be!) as a means of
> identifying more credible models.
> A further issue, of course, is that we are relying on results from
> fully coupled A/OGCMs, and are making trend comparisons over
> relatively short periods (several decades). On these short timescales,
> estimates of the "true" trend in response to the applied 20c3m
> forcings are quite sensitive to natural variability noise (as Peter
> Thorne's 2007 GRL paper clearly illustrates). Because of such chaotic
> variability, even a hypothetical model with perfect physics and
> forcings would yield a distribution of tropospheric temperature trends
> over 1979 to 1999, some of which would show larger or smaller warming
> than observed. This is why it's illogical to stratify model results
> according to correspondence between modeled and observed surface
> warming - something which John Christy is very fond of doing.
> What we've done (in the new water vapor work described above) is to
> evaluate the fidelity with which the AR4 models simulate the observed
> mean state and variability of precipitable water and SST - not the
> trends in these quantities. We've looked at model performance in a
> variety of different regions, and on multiple timescales. The results
> are fascinating, and show (at least for water vapor and SST) that
> every model has its own individual strengths and weaknesses. It is
> difficult to identify a subset of models that CONSISTENTLY does well
> in many different regions and over a range of different timescales.
> My guess is that we would obtain somewhat different results for MSU
> temperatures - particularly for comparisons involving variability.
> Clearly, the absence of volcanic forcing in roughly half of the 20c3m
> experiments will have a large impact on the estimated variability of
> synthetic T4 temperatures (and perhaps even on T2), and hence on
> model-versus-data variability comparisons. It's also quite possible
> that the inclusion or absence of volcanic forcing has an impact not
> only on the amplitude of the variability of global-mean T4 anomalies,
> but also on the pattern of T4 variability. So model ranking exercises
> based on performance in simulating the mean state and variability of
> T4 and T2 may show some connection to the presence or absence of
> volcanic/ozone forcing.
> The sad thing is we are being distracted from doing this fun stuff by
> the need to respond to Douglass et al. That's a real shame.
> With best regards,
> Phil Jones wrote:
>> IJC do publish comments but only very rarely. I see little point in
>> doing this, as there is likely to be a word limit, and if the system
>> works properly Douglass et al would get the final say. There is also a
>> large backlog of papers waiting to appear, so even if the comment were
>> accepted it would be some time after Douglass et al that it appeared.
>> Better would be a submission to another journal (JGR?) which
>> would be quicker. This could go in before Douglass et al appeared in
>> print - it should be in the IJC early online view fairly soon based on
>> recent experiences.
>> A paper pointing out the issues of trying to weight models in some way
>> would be very beneficial to the community. AR5 will have to go down this
>> route at some point. How models simulate the recent trends at the
>> surface and in the troposphere/stratosphere, and how they might be
>> ranked, is a possibility. This could bring in the new work Peter alludes
>> to with the sondes.
>> There are also some aspects of recent surface T changes that could be
>> discussed as well. These relate to the growing dominance of buoy SSTs
>> (now 70% of the total) vs conventional ships. There is a paper in J.
>> accepted from Smith/Reynolds et al at NCDC, which shows that buoys
>> could conceivably be cooler than ship-based SSTs by about 0.1C, meaning
>> that the last 5-10 years are being gradually underestimated over the
>> oceans. The overlap is still too short to be confident about this, but
>> it highlights a major systematic change occurring in surface ocean
>> measurements. As the buoys are presumably better for absolute SSTs,
>> models driven with fixed SSTs should be using fields that are
>> marginally cooler.
>> And then there is the continual reference to Kalnay and Cai, when
>> Simmons et al (2004) have shown the problems with NCEP. It is possible
>> to add in the ERA-Interim analyses and operational analyses to
>> bring results from ERA-40 up to date.
>> At 23:40 04/12/2007, carl mears wrote:
>>> Karl -- thanks for clarifying what I was trying to say.
>>> Some further comments.....
>>> At 02:53 PM 12/4/2007, Karl Taylor wrote:
>>>> Dear all,
>>>> 2) unforced variability hasn't dominated the observations.
>>> But on this short time scale, we strongly suspect that it has
>>> dominated. For example, the 2-sigma error bars from Table 3.4 of the
>>> CCSP report for satellite TLT are 0.18 (UAH) or 0.19 (RSS), larger
>>> than either group's trends (0.05, 0.15) for 1979-2004. These were
>>> calculated using a "goodness of linear fit" criterion, corrected for
>>> autocorrelation. This is probably a reasonable estimate of the
>>> contribution of unforced variability to trend uncertainty.
>>>> Douglass et al. have *not* shown that every individual model is in
>>>> fact inconsistent with the observations. If the spread of
>>>> individual model results is large enough and at least 1 model
>>>> overlaps the observations, then one cannot claim that all models
>>>> are wrong, just that the mean is biased.
>>> Given the magnitude of the unforced variability, I would say "the
>>> mean *may* be biased." You can't prove this with only one universe,
>>> as Tom alluded to. All we can say is that the observed trend cannot be
>>> proven to be inconsistent with the model results, since it is inside
>>> their range. It will be interesting to see if we can say anything
>>> more when we start culling out the less realistic models, as Ben has
>>> suggested.
>> Prof. Phil Jones
>> Climatic Research Unit Telephone +44 (0) 1603 592090
>> School of Environmental Sciences Fax +44 (0) 1603 507784
>> University of East Anglia
>> Norwich Email firstname.lastname@example.org
>> NR4 7TJ
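Ben's point that even a model with perfect physics and forcings would yield a spread of 1979-1999 trends, and Carl's autocorrelation-corrected 2-sigma error bars, can both be illustrated with a short Python sketch. It assumes annual-mean anomalies with AR(1) noise and uses the common effective-sample-size adjustment n_eff = n(1 - r1)/(1 + r1). That adjustment is a standard choice, not necessarily the exact procedure behind CCSP Table 3.4, and all parameter values and function names here are hypothetical.

```python
"""Sketch: trend uncertainty from unforced variability.

Assumptions (not from the thread): annual-mean anomalies, AR(1) noise,
illustrative parameter values, and the common effective-sample-size
adjustment, which may differ from the exact CCSP procedure.
"""
import math
import random


def ols_trend(y):
    """Least-squares slope of y against time steps 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    return sxy / sxx


def trend_with_ar1_error(y):
    """Return (slope, 2-sigma half-width).

    The residual variance uses n_eff = n * (1 - r1) / (1 + r1), where r1
    is the lag-1 autocorrelation of the regression residuals.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    b = ols_trend(y)
    a = sum(y) / n - b * t_mean
    resid = [v - (a + b * t) for t, v in enumerate(y)]
    ss = sum(e * e for e in resid)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    if ss == 0.0:
        return b, 0.0  # a perfect straight line has no residual noise
    r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / ss
    n_eff = n * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        return b, math.inf  # too few effective samples to estimate an error
    return b, 2 * math.sqrt(ss / (n_eff - 2) / sxx)


def synthetic_series(forced_trend, sigma, phi, n_years, rng):
    """Forced linear signal plus AR(1) noise (noise starts at zero)."""
    noise, series = 0.0, []
    for t in range(n_years):
        noise = phi * noise + rng.gauss(0.0, sigma)
        series.append(forced_trend * t + noise)
    return series


def perfect_model_trends(forced_trend, sigma, phi, n_years, n_runs, seed=0):
    """Trends from n_runs realizations sharing the same forced signal."""
    rng = random.Random(seed)
    return [ols_trend(synthetic_series(forced_trend, sigma, phi, n_years, rng))
            for _ in range(n_runs)]


if __name__ == "__main__":
    # Hypothetical parameters: 21 annual values (1979-1999), a forced
    # trend of 0.02 K/yr, AR(1) noise with sigma = 0.15 K and phi = 0.5.
    trends = perfect_model_trends(0.02, 0.15, 0.5, 21, 1000)
    print(f"trend range: {min(trends):.3f} to {max(trends):.3f} K/yr")
    one_run = synthetic_series(0.02, 0.15, 0.5, 21, random.Random(1))
    slope, two_sigma = trend_with_ar1_error(one_run)
    print(f"single run: {slope:.3f} +/- {two_sigma:.3f} K/yr (2 sigma)")
```

Every realization shares the same forced trend, so the spread in the printed range comes entirely from the AR(1) noise, which is the "one universe" problem Carl and Tom describe.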
Dian J. Seidel
NOAA Air Resources Laboratory (R/ARL)
1315 East West Highway
Silver Spring, MD 20910