To: "Thomas.R.Karl" <Thomas.R.Karl@noaa.gov>

Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]

Date: Fri, 14 Dec 2007 14:31:15 -0800

Reply-to: santer1@llnl.gov

Cc: carl mears <mears@remss.com>, SHERWOOD Steven <steven.sherwood@yale.edu>, Tom Wigley <wigley@cgd.ucar.edu>, Frank Wentz <frank.wentz@remss.com>, "'Philip D. Jones'" <p.jones@uea.ac.uk>, Karl Taylor <taylor13@llnl.gov>, Steve Klein <klein21@mail.llnl.gov>, John Lanzante <John.Lanzante@noaa.gov>, "Thorne, Peter" <peter.thorne@metoffice.gov.uk>, "'Dian J. Seidel'" <dian.seidel@noaa.gov>, Melissa Free <Melissa.Free@noaa.gov>, Leopold Haimberger <leopold.haimberger@univie.ac.at>, "'Francis W. Zwiers'" <francis.zwiers@ec.gc.ca>, "Michael C. MacCracken" <mmaccrac@comcast.net>, Tim Osborn <t.osborn@uea.ac.uk>, "David C. Bader" <bader2@llnl.gov>, 'Susan Solomon' <ssolomon@al.noaa.gov>

<x-flowed>

Dear Tom,

As promised, I've now repeated all of the significance testing involving

model-versus-observed trend differences, but this time using

spatially-averaged T2 and T2LT changes that are not "masked out" over

tropical land areas. As I mentioned this morning, the use of non-masked

data facilitates a direct comparison with Douglass et al.

The results for combined changes over tropical land and ocean are very

similar to those I sent out yesterday, which were for T2 and T2LT

changes over tropical oceans only:

COMBINED LAND/OCEAN RESULTS (WITH STANDARD ERRORS ADJUSTED FOR TEMPORAL

AUTOCORRELATION EFFECTS; SPATIAL AVERAGES OVER 20N-20S; ANALYSIS PERIOD

1979 TO 1999)

T2LT tests, RSS observational data: 0 out of 49 model-versus-observed

trend differences are significant at the 5% level.

T2LT tests, UAH observational data: 1 out of 49 model-versus-observed

trend differences are significant at the 5% level.

T2 tests, RSS observational data: 1 out of 49 model-versus-observed

trend differences are significant at the 5% level.

T2 tests, UAH observational data: 1 out of 49 model-versus-observed

trend differences are significant at the 5% level.

So our conclusion - that model tropical T2 and T2LT trends are, in

virtually all realizations and models, not significantly different from

either RSS or UAH trends - is not sensitive to whether we do the

significance testing with "ocean only" or combined "land+ocean"

temperature changes.

With best regards, and happy holidays to all!

Ben

Thomas.R.Karl wrote:

> Ben,

>

> This is very informative. One question I raise is whether the results

> would have been at all different if you had not masked the land. I

> doubt it, but it would be nice to know.

>

> Tom

>

> Ben Santer said the following on 12/13/2007 9:58 PM:

>> Dear folks,

>>

>> I've been doing some calculations to address one of the statistical

>> issues raised by the Douglass et al. paper in the International

>> Journal of Climatology. Here are some of my results.

>>

>> Recall that Douglass et al. calculated synthetic T2LT and T2

>> temperatures from the CMIP-3 archive of 20th century simulations

>> ("20c3m" runs). They used a total of 67 20c3m realizations, performed

>> with 22 different models. In calculating the statistical uncertainty

>> of the model trends, they introduced sigma{SE}, an "estimate of the

>> uncertainty of the mean of the predictions of the trends". They defined

>> sigma{SE} as follows:

>>

>> sigma{SE} = sigma / sqrt(N - 1), where

>>

>> "N = 22 is the number of independent models".

>>

>> As we've discussed in our previous correspondence, this definition has

>> serious problems (see comments from Carl and Steve below), and allows

>> Douglass et al. to reach the erroneous conclusion that modeled T2LT

>> and T2 trends are significantly different from the observed T2LT and

>> T2 trends in both the RSS and UAH datasets. This comparison of

>> simulated and observed T2LT and T2 trends is given in Table III of

>> Douglass et al.

>> [As an amusing aside, I note that the RSS datasets are referred to as

>> "RSS" in this table, while UAH results are designated as "MSU". I

>> guess there's only one true "MSU" dataset...]

>>

>> I decided to take a quick look at the issue of the statistical

>> significance of differences between simulated and observed

>> tropospheric temperature trends. My first cut at this "quick look"

>> involves only UAH and RSS observational data - I have not yet done any

>> tests with radiosonde data, UMD T2 data, or satellite results from

>> Zou et al.

>>

>> I operated on the same 49 realizations of the 20c3m experiment that we

>> used in Chapter 5 of CCSP 1.1. As in our previous work, all model

>> results are synthetic T2LT and T2 temperatures that I calculated using

>> a static weighting function approach. I have not yet implemented

>> Carl's more sophisticated method of estimating synthetic MSU

>> temperatures from model data (which accounts for effects of topography

>> and land/ocean differences). However, for the current application, the

>> simple static weighting function approach is more than adequate, since

>> we are focusing on T2LT and T2 changes over tropical oceans only - so

>> topographic and land-ocean differences are unimportant. Note that I

>> still need to calculate synthetic MSU temperatures from about 18-20

>> 20c3m realizations which were not in the CMIP-3 database at the time

>> we were working on the CCSP report. For the full response to Douglass

>> et al., we should use the same 67 20c3m realizations that they employed.

>>

>> For each of the 49 realizations that I processed, I first masked out

>> all tropical land areas, and then calculated the spatial averages of

>> monthly-mean, gridded T2LT and T2 data over tropical oceans (20N-20S).

>> All model and observational results are for the common 252-month

>> period from January 1979 to December 1999 - the longest period of

>> overlap between the RSS and UAH MSU data and the bulk of the 20c3m

>> runs. The simulated trends given by Douglass et al. are calculated

>> over the same 1979 to 1999 period; however, they use a longer period

>> (1979 to 2004) for calculating observational trends - so there is an

>> inconsistency between their model and observational analysis periods,

>> which they do not explain. This difference in analysis periods is a

>> little puzzling given that we are dealing with relatively short

>> observational record lengths, resulting in some sensitivity to

>> end-point effects.
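The masking and spatial-averaging step can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name `tropical_ocean_mean`, the toy grid, and the cos(latitude) area weighting are all assumptions that do not appear in the message.

```python
import math

def tropical_ocean_mean(field, lats, is_land, lat_limit=20.0):
    """Area-weighted mean of one gridded monthly field over unmasked points
    between -lat_limit and +lat_limit degrees latitude.

    field[i][j]   : value at latitude row i, longitude column j
    lats[i]       : latitude (degrees) of the centre of row i
    is_land[i][j] : True where the grid box is land and should be masked out
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for i, lat in enumerate(lats):
        if abs(lat) > lat_limit:
            continue  # outside the tropical band
        weight = math.cos(math.radians(lat))  # assumed area weight
        for j, value in enumerate(field[i]):
            if is_land[i][j]:
                continue  # "mask out" land
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no unmasked points inside the latitude band")
    return weighted_sum / weight_total

# Toy 4 x 3 grid with made-up values (not real MSU data).
lats = [-30.0, -10.0, 10.0, 30.0]
field = [[9.0, 9.0, 9.0],
         [1.0, 2.0, 3.0],
         [1.0, 2.0, 3.0],
         [9.0, 9.0, 9.0]]
land_mask = [[False, False, False],
             [False, False, True],
             [False, False, True],
             [False, False, False]]
no_mask = [[False] * 3 for _ in range(4)]

ocean_only = tropical_ocean_mean(field, lats, land_mask)
land_and_ocean = tropical_ocean_mean(field, lats, no_mask)
print(ocean_only, land_and_ocean)
```

Passing an all-False mask gives the combined land+ocean average, so the same routine covers both variants of the comparison.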

>>

>> I then calculated anomalies of the spatially-averaged T2LT and T2 data

>> (w.r.t. climatological monthly-means over 1979-1999), and fit

>> least-squares linear trends to model and observational time series.

>> The standard errors of the trends were adjusted for temporal

>> autocorrelation of the regression residuals, as described in Santer et

>> al. (2000) ["Statistical significance of trends and trend differences

>> in layer-average atmospheric temperature time series"; JGR 105,

>> 7337-7356.]
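The anomaly, trend, and adjusted-standard-error steps might look like the sketch below. It assumes the adjustment takes the common AR(1) form, in which the sample size n is replaced by an effective sample size n_eff = n (1 - r1) / (1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. The function names and the synthetic series are illustrative only.

```python
import math
import random

def monthly_anomalies(series):
    """Remove the climatological mean of each calendar month.

    Assumes the series starts in January and spans whole years.
    """
    climatology = [sum(series[m::12]) / len(series[m::12]) for m in range(12)]
    return [x - climatology[i % 12] for i, x in enumerate(series)]

def trend_with_adjusted_se(y):
    """Return (slope, naive_se, adjusted_se, r1, n_eff) for a time series."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y[t] - y_mean) for t in range(n)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [y[t] - (intercept + slope * t) for t in range(n)]
    sse = sum(e * e for e in resid)
    # Lag-1 autocorrelation of the regression residuals (their mean is zero).
    r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / sse
    # Assumed AR(1) effective sample size.
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for a standard error")
    naive_se = math.sqrt(sse / (n - 2) / sxx)
    adjusted_se = math.sqrt(sse / (n_eff - 2.0) / sxx)
    return slope, naive_se, adjusted_se, r1, n_eff

# Synthetic 252-month series: seasonal cycle + small trend + AR(1) noise.
random.seed(0)
noise = 0.0
series = []
for month in range(252):
    noise = 0.8 * noise + random.gauss(0.0, 0.1)
    seasonal = math.cos(2.0 * math.pi * (month % 12) / 12.0)
    series.append(seasonal + 0.002 * month + noise)

anoms = monthly_anomalies(series)
slope, naive_se, adjusted_se, r1, n_eff = trend_with_adjusted_se(anoms)
print(f"trend per decade: {120 * slope:.3f}")
print(f"naive SE: {120 * naive_se:.3f}, adjusted SE: {120 * adjusted_se:.3f}")
print(f"r1: {r1:.2f}, effective sample size: {n_eff:.1f}")
```

With positively autocorrelated residuals the effective sample size is much smaller than 252, so the adjusted standard error is larger than the naive one.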

>>

>> Consider first panel A of the attached plot. This shows the simulated

>> and observed T2LT trends over 1979 to 1999 (again, over 20N-20S,

>> oceans only) with their adjusted 1-sigma confidence intervals. For

>> the UAH and RSS data, it was possible to check against the adjusted

>> confidence intervals independently calculated by Dian during the

>> course of work on the CCSP report. Our adjusted confidence intervals

>> are in good agreement. The grey shaded envelope in panel A denotes the

>> 1-sigma standard error for the RSS T2LT trend.

>>

>> There are 49 pairs of UAH-minus-model trend differences and 49 pairs

>> of RSS-minus-model trend differences. We can therefore test - for each

>> model and each 20c3m realization - whether there is a statistically

>> significant difference between the observed and simulated trends.

>>

>> Let bx and by represent any single pair of modeled and observed

>> trends, with adjusted standard errors s{bx} and s{by}. As in our

>> previous work (and as in related work by John Lanzante), we define the

>> normalized trend difference d as:

>>

>> d = (bx - by) / sqrt[ (s{bx})**2 + (s{by})**2 ]

>>

>> Under the assumption that d is normally distributed, values of d >

>> +1.96 or < -1.96 indicate observed-minus-model trend differences that

>> are significant at the 5% level. We are performing a two-tailed test

>> here, since we have no information a priori about the "direction" of

>> the model trend (i.e., whether we expect the simulated trend to be

>> significantly larger or smaller than observed).
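The paired test above reduces to a few lines of Python. The sketch below uses hypothetical trend values and standard errors (in degrees C per decade) purely for illustration; they are not results from the analysis, and the function names are invented here.

```python
import math

def normalized_trend_difference(bx, s_bx, by, s_by):
    """d = (bx - by) / sqrt(s{bx}**2 + s{by}**2)."""
    return (bx - by) / math.sqrt(s_bx ** 2 + s_by ** 2)

def is_significant(d, critical=1.96):
    """Two-tailed test at the 5% level, assuming d is normally distributed."""
    return abs(d) > critical

def count_significant(model_trends, obs_trend, obs_se):
    """Count model realizations whose trend differs significantly from the
    observed one. model_trends is a list of (trend, adjusted_se) pairs."""
    return sum(
        is_significant(normalized_trend_difference(b, s, obs_trend, obs_se))
        for b, s in model_trends
    )

# Hypothetical numbers, chosen only to exercise both outcomes.
d_small = normalized_trend_difference(0.20, 0.08, 0.10, 0.07)
d_large = normalized_trend_difference(0.40, 0.05, 0.05, 0.06)
print(round(d_small, 3), is_significant(d_small))
print(round(d_large, 3), is_significant(d_large))
print(count_significant([(0.20, 0.08), (0.40, 0.05)], 0.10, 0.07))
```

Because the denominator combines both standard errors, a large uncertainty in either trend makes it harder to reject H0.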

>>

>> Panel C shows values of the normalized trend difference for T2LT trends.

>> The grey shaded area spans the range +1.96 to -1.96, and identifies

>> the region where we fail to reject the null hypothesis (H0) of no

>> significant difference between observed and simulated trends.

>>

>> Consider the solid symbols first, which give results for tests

>> involving RSS data. We would reject H0 in only one out of 49 cases

>> (for the CCCma-CGCM3.1(T47) model). The open symbols indicate results

>> for tests involving UAH data. Somewhat surprisingly, we get the same

>> qualitative outcome that we obtained for tests involving RSS data:

>> only one of the UAH-model trend pairs yields a difference that is

>> statistically significant at the 5% level.

>>

>> Panels B and D provide results for T2 trends. Results are very similar

>> to those achieved with T2LT trends. Irrespective of whether RSS or UAH

>> T2 data are used, significant trend differences occur in only one of

>> 49 cases.

>>

>> Bottom line: Douglass et al. claim that "In all cases UAH and RSS

>> satellite trends are inconsistent with model trends." (page 6, lines

>> 61-62). This claim is categorically wrong. In fact, based on our

>> results, one could justifiably claim that THERE IS ONLY ONE CASE in

>> which model T2LT and T2 trends are inconsistent with UAH and RSS

>> results! These guys screwed up big time.

>>

>> SENSITIVITY TESTS

>>

>> QUESTION 1: Some of the model-data trend comparisons made by Douglass

>> et al. used temperatures averaged over 30N-30S rather than 20N-20S.

>> What happens if we repeat our simple trend significance analysis using

>> T2LT and T2 data averaged over ocean areas between 30N-30S?

>>

>> ANSWER 1: Very little. The results described above for ocean areas

>> between 20N-20S are virtually unchanged.

>>

>> QUESTION 2: Even though it's clearly inappropriate to estimate the

>> standard errors of the linear trends WITHOUT accounting for temporal

>> autocorrelation effects (the 252 time samples are clearly not

>> independent; effective sample sizes typically range from 6 to 56),

>> someone is bound to ask what the outcome is when one repeats the

>> paired trend tests with non-adjusted standard errors. So here are the

>> results:

>>

>> T2LT tests, RSS observational data: 19 out of 49 trend differences are

>> significant at the 5% level.

>> T2LT tests, UAH observational data: 34 out of 49 trend differences are

>> significant at the 5% level.

>>

>> T2 tests, RSS observational data: 16 out of 49 trend differences are

>> significant at the 5% level.

>> T2 tests, UAH observational data: 35 out of 49 trend differences are

>> significant at the 5% level.

>>

>> So even under the naive (and incorrect) assumption that each model and

>> observational time series contains 252 independent time samples, we

>> STILL find no support for Douglass et al.'s assertion that: "In all

>> cases UAH and RSS satellite trends are inconsistent with model trends."

>> Q.E.D.
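To see why the unadjusted tests flag so many more differences, it helps to ask how much the quoted effective sample sizes (6 to 56 out of 252) inflate a trend's standard error. The sketch below assumes the AR(1) relation n_eff = n (1 - r1) / (1 + r1) and a standard error that scales as 1 / sqrt(n_eff - 2); both are assumptions about the form of the adjustment, and the function names are illustrative.

```python
import math

def implied_lag1_autocorrelation(n, n_eff):
    """Invert n_eff = n * (1 - r1) / (1 + r1) for r1."""
    return (n - n_eff) / (n + n_eff)

def se_inflation(n, n_eff):
    """Ratio of adjusted to naive trend standard error for the same residuals."""
    return math.sqrt((n - 2) / (n_eff - 2))

n = 252
for n_eff in (6, 56):
    r1 = implied_lag1_autocorrelation(n, n_eff)
    factor = se_inflation(n, n_eff)
    print(f"n_eff = {n_eff:2d}: r1 = {r1:.3f}, SE inflated by x{factor:.2f}")
```

Under these assumptions the adjusted standard errors are roughly 2 to 8 times larger than the naive ones, which shrinks the normalized trend difference d by a comparable factor.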

>>

>> If Leo is agreeable, I'm hopeful that we'll be able to perform a

>> similar trend comparison using synthetic MSU T2LT and T2 temperatures

>> calculated from the RAOBCORE radiosonde data - all versions, not just

>> v1.2!

>>

>> As you can see from the email list, I've expanded our "focus group" a

>> little bit, since a number of you have written to me about this issue.

>>

>> I am leaving for Miami on Monday, Dec. 17th. My Mom is having cataract

>> surgery, and I'd like to be around to provide her with moral and

>> practical support. I'm not exactly sure when I'll be returning to

>> PCMDI - although I hope I won't be gone longer than a week. As soon as

>> I get back, I'll try to make some more progress with this stuff. Any

>> suggestions or comments on what I've done so far would be greatly

>> appreciated. And for the time being, I think we should not alert

>> Douglass et al. to our results.

>>

>> With best regards, and happy holidays! May all your "Singers" be carol

>> singers, and not of the S. Fred variety...

>>

>> Ben

>>

>> (P.S.: I noticed one unfortunate typo in Table II of Douglass et al.

>> The MIROC3.2 (medres) model is referred to as "MIROC3.2_Merdes"....)

>>

>> carl mears wrote:

>>> Hi Steve

>>>

>>> I'd say it's the equivalent of rolling a 6-sided die a hundred times,

>>> and finding a mean value of ~3.5 and a standard deviation of ~1.7, and

>>> calculating the standard error of the mean to be ~0.17 (so far so

>>> good). And then rolling the die one more time, getting a 2, and

>>> claiming that the die is no longer 6 sided because the new measurement

>>> is more than 2 standard errors from the mean.
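The die analogy can be checked with exact values for a fair die. This is a small sketch; the variable names are illustrative, and it uses the theoretical mean and standard deviation rather than simulated rolls.

```python
import math

faces = [1, 2, 3, 4, 5, 6]
n_rolls = 100

mean = sum(faces) / len(faces)                                    # 3.5
sd = math.sqrt(sum((f - mean) ** 2 for f in faces) / len(faces))  # ~1.71
se_mean = sd / math.sqrt(n_rolls)                                 # ~0.17

new_roll = 2
distance_in_se = abs(new_roll - mean) / se_mean  # far more than 2
distance_in_sd = abs(new_roll - mean) / sd       # less than 1

# Which single rolls would be "rejected" under each yardstick?
rejected_using_se = [f for f in faces if abs(f - mean) > 2 * se_mean]
rejected_using_sd = [f for f in faces if abs(f - mean) > 2 * sd]

print(f"mean = {mean}, sd = {sd:.2f}, standard error of mean = {se_mean:.2f}")
print(f"a roll of {new_roll} is {distance_in_se:.1f} standard errors "
      f"but only {distance_in_sd:.2f} standard deviations from the mean")
print("rejected using the standard error:", rejected_using_se)
print("rejected using the standard deviation:", rejected_using_sd)
```

Measured against the standard error of the mean, every face of the die looks inconsistent with the die. Measured against the spread of individual rolls, none does.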

>>>

>>> In my view, this problem trumps the other problems in the paper.

>>> I can't believe Douglass is a fellow of the American Physical Society.

>>>

>>> -Carl

>>>

>>>

>>> At 02:07 AM 12/6/2007, you wrote:

>>>> If I understand correctly, what Douglass et al. did makes the

>>>> stronger assumption that unforced variability is *insignificant*.

>>>> Their statistical test is logically equivalent to falsifying a

>>>> climate model because it did not consistently predict a particular

>>>> storm on a particular day two years from now.

>>>

>>>

>>> Dr. Carl Mears

>>> Remote Sensing Systems

>>> 438 First Street, Suite 200, Santa Rosa, CA 95401

>>> mears@remss.com

>>> 707-545-2904 x21

>>> 707-545-2906 (fax)

>>

>>

>

> --

>

> *Dr. Thomas R. Karl, L.H.D.*

>

> */Director/*

>

> NOAA's National Climatic Data Center

>

> Veach-Baley Federal Building

>

> 151 Patton Avenue

>

> Asheville, NC 28801-5001

>

> Tel: (828) 271-4476

>

> Fax: (828) 271-4246

>

> Thomas.R.Karl@noaa.gov <mailto:Thomas.R.Karl@noaa.gov>

>

--

----------------------------------------------------------------------------

Benjamin D. Santer

Program for Climate Model Diagnosis and Intercomparison

Lawrence Livermore National Laboratory

P.O. Box 808, Mail Stop L-103

Livermore, CA 94550, U.S.A.

Tel: (925) 422-2486

FAX: (925) 422-7675

email: santer1@llnl.gov

----------------------------------------------------------------------------

</x-flowed>
