To: Phil Jones <email@example.com>
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Wed, 05 Dec 2007 14:19:17 -0800
Cc: carl mears <firstname.lastname@example.org>, Karl Taylor <email@example.com>, Tom Wigley <firstname.lastname@example.org>, Tom Wigley <email@example.com>, "Thorne, Peter" <firstname.lastname@example.org>, Steven Sherwood <Steven.Sherwood@yale.edu>, John Lanzante <John.Lanzante@noaa.gov>, "'Dian J. Seidel'" <email@example.com>, Melissa Free <Melissa.Free@noaa.gov>, Frank Wentz <firstname.lastname@example.org>, Steve Klein <email@example.com>, Leopold Haimberger <firstname.lastname@example.org>, peter gleckler <email@example.com>
Just a quick response to the issue of "model weighting" which you and
Carl raised in your emails.
We recently published a paper dealing with the identification of an
anthropogenic fingerprint in SSM/I-based estimates of total column water
vapor changes. This was a true multi-model detection and attribution
("D&A") study, which made use of results from 22 different A/OGCMs for
fingerprint and noise estimation. Together with Peter Gleckler and Karl
Taylor, I'm now in the process of repeating our water vapor D&A study
using a subset of the original 22 models. This subset will comprise
10-12 models which are demonstrably more successful in capturing
features of the observed mean state and variability of water vapor and
SST - particularly features crucial to the D&A problem (such as the
low-frequency variability). We've had fun computing a whole range of
metrics that might be used to define such a subset of "better" models.
The ultimate goal is to determine the sensitivity of our water vapor D&A
results to model quality. I think that this kind of analysis will be
unavoidable in the multi-model world in which we now live. Given
substantial inter-model differences in simulation quality, "one model,
one vote" is probably not the best policy for D&A work!
Once we've used Carl's method to calculate synthetic MSU temperatures
from the IPCC AR4 20c3m data (as described in my previous email), it
should be relatively easy to do a similar "model culling" exercise with
MSU T2, T4, and TLT. In fact, this is what we had already planned to do
in collaboration with Carl and Frank.
One key point in any model weighting or selection strategy is to avoid
circularity. In the D&A context, it would be impermissible to include
information on trend behavior as a criterion used for selecting "better"
models. Likewise, if our interest is in assessing the statistical
significance of model-versus-observed trend differences, we can't use
model performance in simulating "observed" tropospheric or stratospheric
trends (whatever those might be!) as a means of identifying more
A further issue, of course, is that we are relying on results from fully
coupled A/OGCMs, and are making trend comparisons over relatively short
periods (several decades). On these short timescales, estimates of the
"true" trend in response to the applied 20c3m forcings are quite
sensitive to natural variability noise (as Peter Thorne's 2007 GRL paper
clearly illustrates). Because of such chaotic variability, even a
hypothetical model with perfect physics and forcings would yield a
distribution of tropospheric temperature trends over 1979 to 1999, some
of which would show larger or smaller cooling than observed. This is why
it's illogical to stratify model results according to correspondence
between modeled and observed surface warming - something which John
Christy is very fond of doing.
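
The point about chaotic variability can be illustrated with a toy Monte Carlo sketch. This is not an A/OGCM result, and every number in it is an assumption chosen only for illustration: a single "true" forced trend of 0.2 C/decade, AR(1) noise with lag-1 autocorrelation 0.6 and innovation standard deviation 0.1 C, and 21 annual means standing in for 1979 to 1999. The function name simulate_trends is hypothetical. Each realization shares identical "physics and forcings", yet the fitted least-squares trends still spread around the forced value.

```python
import numpy as np

def simulate_trends(n_realizations=1000, n_years=21, forced_trend=0.02,
                    ar1=0.6, noise_sd=0.1, seed=0):
    """Return OLS trends (deg C per year) from realizations that all share
    the same forced trend but have independent AR(1) noise.
    All parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    trends = np.empty(n_realizations)
    for i in range(n_realizations):
        noise = np.empty(n_years)
        # Draw the first value from the stationary AR(1) distribution.
        noise[0] = rng.normal(0.0, noise_sd / np.sqrt(1.0 - ar1 ** 2))
        for t in range(1, n_years):
            noise[t] = ar1 * noise[t - 1] + rng.normal(0.0, noise_sd)
        series = forced_trend * years + noise
        # Slope of the least-squares linear fit.
        trends[i] = np.polyfit(years, series, 1)[0]
    return trends

trends = simulate_trends()
low, high = np.percentile(trends, [5, 95])
print("forced trend (C/decade): 0.200")
print(f"mean fitted trend (C/decade): {10 * trends.mean():.3f}")
print(f"5-95% range (C/decade): {10 * low:.3f} to {10 * high:.3f}")
```

Under these assumptions a single observed trend is just one draw from such a distribution, so agreement or disagreement with any one realization says little about the quality of the underlying model.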
What we've done (in the new water vapor work described above) is to
evaluate the fidelity with which the AR4 models simulate the observed
mean state and variability of precipitable water and SST - not the
trends in these quantities. We've looked at model performance in a
variety of different regions, and on multiple timescales. The results
are fascinating, and show (at least for water vapor and SST) that every
model has its own individual strengths and weaknesses. It is difficult
to identify a subset of models that CONSISTENTLY does well in many
different regions and over a range of different timescales.
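
As a rough illustration of selecting models without using trend information, here is a minimal sketch. It is not the procedure used in the water vapor work; the function names (detrend, score_models), the two metrics, and the equal-weight rank averaging are all assumptions made for this example. It works on a single time series per model, whereas the analysis described above spans many regions and timescales. The linear trend is removed before variability is computed, so a model's trend cannot influence its variability score.

```python
import numpy as np

def detrend(x):
    """Remove the least-squares linear trend from a 1-D series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def score_models(obs, models):
    """Rank models by mean-state and variability errors (hypothetical metrics).

    obs    : 1-D observed series
    models : dict mapping model name -> 1-D simulated series
    Returns a list of (name, average_rank), best (lowest) first.
    """
    obs = np.asarray(obs, dtype=float)
    obs_sd = detrend(obs).std(ddof=1)
    names = list(models)
    # Mean-state error: absolute difference of time means.
    mean_err = np.array([abs(np.mean(models[n]) - obs.mean()) for n in names])
    # Variability error: absolute log ratio of detrended standard deviations.
    var_err = np.array([
        abs(np.log(detrend(models[n]).std(ddof=1) / obs_sd)) for n in names
    ])
    # Convert errors to ranks (0 = best) and average them with equal weight.
    mean_rank = mean_err.argsort().argsort()
    var_rank = var_err.argsort().argsort()
    combined = (mean_rank + var_rank) / 2.0
    order = np.argsort(combined, kind="stable")
    return [(names[i], float(combined[i])) for i in order]

# Synthetic demonstration data (not real observations or model output).
rng = np.random.default_rng(1)
obs_demo = 25.0 + rng.normal(0.0, 1.0, 240)
models_demo = {
    "model_A": 25.2 + rng.normal(0.0, 1.1, 240),
    "model_B": 27.0 + rng.normal(0.0, 2.5, 240),
    "model_C": 24.0 + rng.normal(0.0, 0.4, 240),
}
for name, rank in score_models(obs_demo, models_demo):
    print(name, rank)
```

A subset of "better" models could then be taken from the top of such a ranking, with the D&A analysis repeated on that subset to test sensitivity to model quality.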
My guess is that we would obtain somewhat different results for MSU
temperatures - particularly for comparisons involving variability.
Clearly, the absence of volcanic forcing in roughly half of the 20c3m
experiments will have a large impact on the estimated variability of
synthetic T4 temperatures (and perhaps even on T2), and hence on
model-versus-data variability comparisons. It's also quite possible that
the inclusion or absence of volcanic forcing has an impact not only on
the amplitude of the variability of global-mean T4 anomalies, but also
on the pattern of T4 variability. So model ranking exercises based on
performance in simulating the mean state and variability of T4 and T2
may show some connection to the presence or absence of volcanic/ozone
The sad thing is we are being distracted from doing this fun stuff by
the need to respond to Douglass et al. That's a real shame.
With best regards,
Phil Jones wrote:
> IJC do have comments but only very rarely. I see little point in
> doing this
> as there is likely to be a word limit, and if the system works properly
> Douglass et al would get the final say. There is also a large backlog in
> papers waiting to appear, so even if the comment were accepted it would
> be some time after Douglass et al that it would appear.
> Better would be a submission to another journal (JGR?) which
> would be quicker. This could go in before Douglass et al appeared in
> print - it should be in the IJC early online view fairly soon based on
> recent experiences.
> A paper pointing out the issues of trying to weight models in some way
> would be very beneficial to the community. AR5 will have to go down this
> route at some point. How models simulate the
> recent trends at the surface and in the troposphere/stratosphere and
> how they might be ranked is a possibility. This could bring in the
> new work Peter alludes to with the sondes.
> There are also some aspects of recent surface T changes that could be
> discussed as well. These relate to the growing dominance of buoy SSTs
> (now 70% of the total) vs conventional ships. There is a paper in J.
> accepted from Smith/Reynolds et al at NCDC, which shows that buoys
> could conceivably be cooler than ship-based SST by about 0.1C - meaning
> that the last 5-10 years are being gradually underestimated over the
> Overlap is still too short to be confident about this, but it highlights a
> major systematic change occurring in surface ocean measurements. As the
> buoys are presumably better for absolute SSTs, this means models
> driven with fixed SSTs should be using fields that are marginally cooler.
> And then there is the continual reference to Kalnay and Cai, when
> Simmons et al (2004) have shown the problems with NCEP. It is possible
> to add in the ERA-Interim analyses and operational analyses to
> bring results from ERA-40 up to date.
> At 23:40 04/12/2007, carl mears wrote:
>> Karl -- thanks for clarifying what I was trying to say
>> Some further comments.....
>> At 02:53 PM 12/4/2007, Karl Taylor wrote:
>>> Dear all,
>>> 2) unforced variability hasn't dominated the observations.
>> But on this short time scale, we strongly suspect that it has
>> dominated. For example, the
>> 2 sigma error bars from table 3.4, CCSP for satellite TLT are 0.18
>> (UAH) or 0.19 (RSS), larger
>> than either group's trends (0.05, 0.15) for 1979-2004. These were
>> calculated using a "goodness
>> of linear fit" criterion, corrected for autocorrelation. This is
>> probably a reasonable
>> estimate of the contribution of unforced variability to trend
>>> Douglass et al. have *not* shown that every individual model is in
>>> fact inconsistent with the observations. If the spread of individual
>>> model results is large enough and at least 1 model overlaps the
>>> observations, then one cannot claim that all models are wrong, just
>>> that the mean is biased.
>> Given the magnitude of the unforced variability, I would say "the mean
>> *may* be biased." You can't prove this
>> with only one universe, as Tom alluded. All we can say is that the
>> observed trend cannot be proven to
>> be inconsistent with the model results, since it is inside their range.
>> It will be interesting to see if we can say anything more when we start
>> culling out the less realistic models,
>> as Ben has suggested.
> Prof. Phil Jones
> Climatic Research Unit Telephone +44 (0) 1603 592090
> School of Environmental Sciences Fax +44 (0) 1603 507784
> University of East Anglia
> Norwich Email firstname.lastname@example.org
> NR4 7TJ
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675