To: email@example.com, John Lanzante <John.Lanzante@noaa.gov>
Subject: Re: Updated Figures
Date: Sat, 12 Jan 2008 13:20:26 -0500
Cc: Melissa Free <Melissa.Free@noaa.gov>, Peter Thorne <firstname.lastname@example.org>, Dian Seidel <email@example.com>, Tom Wigley <firstname.lastname@example.org>, Karl Taylor <email@example.com>, Thomas R Karl <Thomas.R.Karl@noaa.gov>, Carl Mears <firstname.lastname@example.org>, "David C. Bader" <email@example.com>, "'Francis W. Zwiers'" <firstname.lastname@example.org>, Frank Wentz <email@example.com>, Leopold Haimberger <firstname.lastname@example.org>, "Michael C. MacCracken" <email@example.com>, Phil Jones <firstname.lastname@example.org>, Steve Sherwood <Steven.Sherwood@yale.edu>, Steve Klein <email@example.com>, Susan Solomon <Susan.Solomon@noaa.gov>, Tim Osborn <firstname.lastname@example.org>, Gavin Schmidt <email@example.com>, "Hack, James J." <firstname.lastname@example.org>
Dear Ben and All,
After returning to the office earlier in the week after a couple of weeks
off during the holidays, I had the best of intentions of responding to
some of the earlier emails. Unfortunately it has taken the better part of
the week for me to shovel out my avalanche of email. [This has a lot to
do with the remarkable progress that has been made -- kudos to Ben and others
who have made this possible]. At this point I'd like to add my 2 cents worth
(although with the declining dollar I'm not sure it's worth that much any more)
on several issues, some from earlier email and some from the last day or two.
I had given some thought as to where this article might be submitted.
Although that issue has been settled (IJC) I'd like to add a few related
thoughts regarding the focus of the paper. I think Ben has brokered the
best possible deal, an expedited paper in IJC, that is not treated as a
comment. But I'm a little confused as to whether our paper will be titled
"Comments on ... by Douglass et al." or whether we have a bit more latitude.
While I'm not suggesting anything beyond a short paper, it might be possible
to "spin" this in more general terms as a brief update, while at the same
time addressing Douglass et al. as part of this. We could begin in the
introduction by saying that this general topic has been much studied and
debated in the recent past [e.g. NRC (2000), the Science (2005) papers, and
CCSP (2006)] but that new developments since these works warrant revisiting
the issue. We could consider Douglass et al. as one of several new
developments. We could perhaps title the paper something like "Revisiting
temperature trends in the atmosphere". The main conclusion will be that, in
stark contrast to Douglass et al., the new evidence from the last couple of
years has strengthened the conclusion of CCSP (2006) that there is no
meaningful discrepancy between models and observations.
In an earlier email Ben suggested an outline for the paper:
1) Point out flaws in the statistical approach used by Douglass et al.
2) Show results from significance testing done properly.
3) Show a figure with different estimates of radiosonde temperature trends
illustrating the structural uncertainty.
4) Discuss complementary evidence supporting the finding that the tropical
lower troposphere has warmed over the satellite era.
I think this is fine but I'd like to suggest a couple of other items. First,
some mention could be made regarding the structural uncertainty in satellite
datasets. We could have 3a) for sondes and 3b) for satellite data. The
satellite issue could be handled as briefly as in a paragraph or, with a
bit more work and discussion, in a figure or table (with some trends). The main
point to get across is that it's not just UAH vs. RSS (with an implied edge
to UAH because its trends agree better with sondes); it's actually UAH vs.
all others (RSS, UMD and Zou et al.). There are complications in adding UMD
and Zou et al. to the discussion, but these can be handled either
qualitatively or quantitatively. The complication with UMD is that it only
exists for T2, which has stratospheric influences (and UMD does not have a
corresponding measure for T4 which could be used to remove the stratospheric
effects). The complication with Zou et al. is that the data begin in 1987,
rather than 1979 (as for the other satellite products).
It would be possible to use the Fu method to remove the stratospheric
influences from UMD using T4 measures from either or both UAH and RSS. It
would be possible to directly compare trends from Zou et al. with UAH, RSS
& UMD for a time period starting in 1987. So, in theory we could include
some trend estimates from all 4 satellite datasets in apples vs. apples
comparisons. But perhaps this is more work than is warranted for this project.
Then at the very least we can mention that in apples vs. apples comparisons made
in CCSP (2006) UMD showed more tropospheric warming than both UAH and RSS,
and in comparisons made by Zou et al. their dataset showed more warming than
both UAH and RSS. Taken together this evidence leaves UAH as the "outlier"
compared to the other 3 datasets. Furthermore, better trend agreement between
UAH and some sonde data is not necessarily "good" since the sonde data in
question are likely to be afflicted with considerable spurious cooling biases.
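For concreteness, a Fu-style adjustment of the kind mentioned above can be
sketched as a weighted combination of a T2 series and a T4 series. This is
only a sketch under stated assumptions: the function name is hypothetical, the
weights a2 and a4 in the usage lines are placeholders chosen for illustration
(not published coefficients), and the anomaly values are made up.

```python
def remove_stratospheric_influence(t2, t4, a2, a4):
    """Combine T2 and T4 anomalies as a2*T2 + a4*T4.

    a4 is expected to be negative so that the stratospheric signal
    carried by T4 is subtracted from T2. Both weights must be supplied
    by the caller; none are assumed here.
    """
    if len(t2) != len(t4):
        raise ValueError("T2 and T4 series must have the same length")
    return [a2 * x + a4 * y for x, y in zip(t2, t4)]


# Illustrative usage with made-up anomalies and placeholder weights.
t2_umd = [0.10, 0.15, 0.05]      # hypothetical T2 anomalies (e.g. UMD)
t4_other = [-0.30, -0.20, -0.40]  # hypothetical T4 anomalies (e.g. UAH or RSS)
adjusted = remove_stratospheric_influence(t2_umd, t4_other, a2=1.1, a4=-0.1)
```

Because T4 would come from a different group than T2, any such combination
inherits the structural uncertainty of both products.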
The second item that I'd suggest be added to Ben's earlier outline (perhaps
as item 5) is a discussion of the issues that Susan raised in earlier emails.
The main point is that there is now some evidence that inadequacies in the
AR4 model formulations pertaining to the treatment of stratospheric ozone may
contribute to spurious cooling trends in the troposphere.
Regarding Ben's Fig. 1 -- this is a very nice graphical presentation of the
differences in methodology between the current work and Douglass et al.
However, I would suggest a cautionary statement to the effect that, following
Lanzante (2005), while error bars are useful for illustrative purposes,
overlapping error bars are not advocated as a test of the statistical
significance of the difference between two variables.
Lanzante, J. R., 2005: A cautionary note on the use of error bars.
Journal of Climate, 18(17), 3699-3703.
This is also motivation for application of the two-sample test that Ben has
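The caution about overlapping error bars can be illustrated with a small
sketch. It assumes two independent, normally distributed estimates with known
standard errors and uses a generic two-sample z statistic, which is not
necessarily the exact test used for this paper. All names and numbers are
hypothetical.

```python
import math


def bars_overlap(b1, se1, b2, se2, k=1.96):
    """True if the +/- k*SE error bars of the two estimates overlap."""
    return abs(b1 - b2) <= k * (se1 + se2)


def two_sample_z(b1, se1, b2, se2):
    """z statistic for the difference of two independent estimates."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Made-up example: two estimates that differ by 3.2 with unit standard errors.
overlap = bars_overlap(0.0, 1.0, 3.2, 1.0)        # bars span 3.92 in total
z = two_sample_z(0.0, 1.0, 3.2, 1.0)              # |z| is about 2.26
significant = abs(z) > 1.96                        # two-sided 5% test
```

In this example the bars overlap even though the two-sample test rejects
equality at the 5% level, so reading significance off overlapping bars is
too conservative.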
> So why is there a small positive bias in the empirically-determined
> rejection rates? Karl believes that the answer may be partly linked to
> the skewness of the empirically-determined rejection rate distributions.
[NB: this is in regard to Ben's Fig. 3 which shows that the rejection rate
in simulations using synthetic data appears to be slightly positively biased
compared to the nominal (expected) rate].
I would note that the distribution of rejection rates is like the distribution
of precipitation in that it is bounded by zero. A quick-and-dirty way to
explore this possibility using a "trick" used with precipitation data is to
apply a square root transformation to the rejection rates, average these, then
reverse transform the average. The square root transformation should yield
data that are more nearly Gaussian than the untransformed data.
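A minimal sketch of the square root "trick" described above, assuming the
rejection rates are available as a list of non-negative fractions. The
function name and the sample values are hypothetical.

```python
import math


def sqrt_transformed_mean(rates):
    """Average in square-root space, then square to transform back."""
    if not rates:
        raise ValueError("need at least one rejection rate")
    if any(r < 0 for r in rates):
        raise ValueError("rejection rates must be non-negative")
    mean_root = sum(math.sqrt(r) for r in rates) / len(rates)
    return mean_root ** 2


# Made-up, positively skewed rejection rates.
rates = [0.0, 0.04, 0.16]
plain_mean = sum(rates) / len(rates)          # ordinary arithmetic mean
robust_mean = sqrt_transformed_mean(rates)    # back-transformed mean
```

The back-transformed mean can never exceed the arithmetic mean, so comparing
the two gives a rough indication of how much of the apparent positive bias
could be due to skewness.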
> Figure 3: As Mike suggested, I've removed the legend from the interior
> of the Figure (it's now below the Figure), and have added arrows to
> indicate the theoretically-expected rejection rates for 5%, 10%, and
> 20% tests. As Dian suggested, I've changed the colors and thicknesses
> of the lines indicating results for the "paired trends". Visually,
> attention is now drawn to the results we think are most reasonable -
> the results for the paired trend tests with standard errors adjusted
> for temporal autocorrelation effects.
I actually liked the earlier version of Fig. 3 better in some regards.
The labeling is now rather busy. How about going back to dotted, thin
and thick curves to designate 5%, 10%, and 20%, and also placing labels
(5%/10%/20%) on or near each curve? Then, using just three colors to
differentiate between Douglass, paired/no_SE_adj, and paired/with_SE_adj,
it will only be necessary to have three legend entries, one for each color.
This would eliminate most of the legends.
Another topic of recent discussion is what radiosonde datasets to include
in the trend figure. My own personal preference would be to have all available
datasets shown in the figure. However, I would defer to the individual
dataset creators if they feel uncomfortable about including sets that are
not yet published.
Peter also raised the point about trends being derived differently for
different datasets. To the extent possible it would be desirable to
have things done the same for all datasets. This is especially true for
using the same time period and the same method to perform the regression.
Another issue is the conversion of station data to area-averaged data. It's
usually easier to ensure consistency if one person computes the trends
from the raw data using the same procedures rather than having several
people provide the trend estimates.
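The idea of one person applying one procedure to every dataset can be
sketched as follows. This assumes each dataset is a list of (time, value)
pairs and uses ordinary least squares as the common regression method. The
function names, dataset labels and values are hypothetical, and the choice
of OLS is an assumption rather than an agreed method.

```python
def ols_trend(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two or more matching time/value points")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def trends_common_period(datasets, start, end):
    """Apply the same period and the same regression to every dataset."""
    trends = {}
    for name, series in datasets.items():
        points = [(t, v) for t, v in series if start <= t <= end]
        trends[name] = ols_trend([t for t, _ in points],
                                 [v for _, v in points])
    return trends


# Made-up annual series: one starts in 1979, the other in 1987.
datasets = {
    "A": [(y, 0.02 * (y - 1979)) for y in range(1979, 2001)],
    "B": [(y, 0.01 * (y - 1987)) for y in range(1987, 2001)],
}
trends = trends_common_period(datasets, start=1987, end=2000)
```

With time in years the slopes are per year; multiply by 10 for trends per
decade.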
Karl Taylor wrote:
> The lower panel <of Figure 2> ...
> ... By chance the mean of the results is displaced negatively ...
> ... I contend that the likelihood of getting a difference of x is equal
> to the likelihood of getting a difference of -x ...
> ... I would like to see each difference plotted twice, once with a positive
> sign and again with a negative sign ...
> ... One of the unfortunate problems with the asymmetry of the current figure
> is that to a casual reader it might suggest a consistency between the
> intra-ensemble distributions and the model-obs distributions that is not real
> Ben and I have already discussed this point, and I think we're both
> still a bit unsure on what's the best thing to do here. Perhaps others
> can provide convincing arguments for keeping the figure as is or making
> it symmetric as I suggest.
I agree with Karl in regard to both his concern for misinterpretation as
well as his suggested solution. In the limit as N goes to infinity we
expect the distribution to be symmetric since we're comparing the model data
with itself. The problem we are encountering is due to finite sample effects.
For simplicity Ben used a limited number of unique combinations -- using
full bootstrapping the problem should go away. Karl's suggestion seems like
a simple and effective way around the problem.
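Karl's suggestion amounts to mirroring the sample of differences before
building the histogram. A minimal sketch, with a hypothetical function name
and made-up differences:

```python
def symmetrize(differences):
    """Return each difference twice: once as given, once with sign flipped."""
    diffs = list(differences)
    return sorted(diffs + [-d for d in diffs])


# Made-up intra-ensemble trend differences from a small sample.
raw = [0.1, -0.3, 0.2]
mirrored = symmetrize(raw)   # twice as many values, symmetric about zero
```

The mirrored sample has a mean of zero by construction, which matches the
symmetry expected in the large-sample limit, but it does not add any new
information about the width of the distribution.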
Karl Taylor wrote:
> It would appear that if we believe FGOALS or MIROC, then the
> differences between many of the model runs and obs are not likely to be
> due to chance alone, but indicate a real discrepancy ... This would seem
> to indicate that our conclusion depends on which model ensembles we have
> most confidence in.
Given the tiny sample sizes, I'm not sure one can make any meaningful
statements regarding differences between models, particularly with regard to
some measure of variability such as is implied by the width of a distribution.
This raises another issue regarding Fig. 2 -- why show the results separately
for each model? This does not seem to be relevant to this project. Our
objective is to show that the models as a collection are not inconsistent
with the observations -- not that any particular model is more or less
consistent with the observations. Furthermore, showing results for different
models tempts the reader to make such comparisons. Why not just aggregate the
results over all models and produce a histogram? This would also simplify