To: Tim Osborn <t.osborn@uea.ac.uk>

Subject: Re: N(eff) and practicality

Date: Tue, 24 Jul 2001 08:49:14 -0400

Cc: Phil Jones <p.jones@uea.ac.uk>, Keith Briffa <k.briffa@uea.ac.uk>

Hi Tim,

Thanks for the remarks. We can certainly spend some time talking through

some of the points raised. I guess I am still finding it difficult to

believe that an rbar of 0.05 has any operational significance in estimating

Neff. It is kind of like doing correlations between tree rings and climate:

a correlation of 0.10 may be statistically significant, but have no

practical value at all for reconstruction. The same goes for an rbar of

0.05 in my mind. I agree that what I suggested (i.e. testing the individual

correlations for significance and only using those above some chosen

significance level for estimating rbar) is somewhat ad hoc and not

theoretically pleasing. However, it is also true that correlations below

the chosen significance threshold are "not significantly different from

zero" and could be ignored in principle, just as we would do in testing

variables for entry into a regression model. This would clearly muddy (a

nice choice of words!) the rbar waters, I admit.

In terms of the problem I am working on (computing bootstrap confidence

limits on annual values of 1205 RCS-detrended tree-ring series from 14

sites), it is hard to know what to do. Certainly, using Neff will result in

almost none of the annual means being statistically significant over the

past 1200 years. I don't believe that this is "true". Other highly

conservative methods of testing significance result in a very high

frequency of similarly negative results, e.g. the test of significance in

spectral analysis that takes into account the multiplicity effect of

testing all frequencies in an a posteriori way (see Mitchell et al. 1966,

Climatic Change, pg. 41). If you use this correction, virtually no

"significant" band-limited signals will ever be identified in

paleoclimatological spectra. So, this test has very low statistical power.

I think that this is the crux issue: Type-1 vs. Type-2 error in statistical

hypothesis testing. The Neff correction greatly increases the probability

of Type-2 error, while virtually eliminating Type-1 error. So, truth or

dare.

Consider one last "thought experiment". Suppose you came to Earth from

another planet to study its climate. You put out 1,000 randomly distributed

recording thermometers and measure daily temperatures for 1 Earth year. You

then pick up the thermometers and return to your planet where you estimate

the mean annual temperature of the Earth for that one year. How many

degrees of freedom do you have? Presumably, 999. Now, suppose that you

leave those same recording thermometers in place for 20 years and calculate

20 annual means. From these 20-year records, you also calculate an rbar of

0.10. How many degrees of freedom per year do you have now? 999 or 9.9?

What has changed? Certainly not the observation network. Does this mean

that we can just as accurately measure the Earth's mean annual temperature

with only 10 randomly placed thermometers if they provide temperature

records with an rbar of 0.00 over a 20-year period? I wouldn't bet on it,

but your theory implies it to be so. Surely, one would have more confidence

(i.e. smaller confidence intervals) in mean annual temperatures estimated

from a 1000-station network.
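A minimal numerical sketch of the arithmetic in this thought experiment (the station counts and rbar values are the hypothetical ones from the text, plugged into the formula Tim quotes below):

```python
# Effective sample size from the mean inter-series correlation (rbar),
# using the formula quoted in the reply below: neff = n / (1 + (n-1)*rbar).
def neff(n, rbar):
    return n / (1.0 + (n - 1) * rbar)

# 1000 thermometers whose 20-year records give rbar = 0.10:
print(round(neff(1000, 0.10), 1))   # 9.9 "effective" stations
# 10 thermometers with completely uncorrelated records (rbar = 0.00):
print(neff(10, 0.0))                # 10.0
```

By the formula, the 1000-station network with rbar = 0.10 and a 10-station network of uncorrelated records carry essentially the same number of degrees of freedom, which is exactly the equivalence questioned above.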

Cheers,

Ed

>Ed,

>

>re. your recent questions about Neff and rbar etc...

>

>I've thought a bit about these kind of questions over the past few years,

>but have never completely got my head around it all in a satisfactory way.

>I agree with what Phil said in his reply to you. Also, your idea of

>subsampling 40% of the cores at a time sounds reasonable, though I don't

>think it would be possible to write a very elegant statistical

>justification! Anyway, I just wanted to add a couple of points to what

>Phil said:

>

>(1) Even for very low rbar, the formula certainly works for

>idealised/synthetic cases (i.e. with similar standard deviations and

>inter-series correlations etc.). For example, I just generated 1000 random

>time series (each 500 elements long) with a very weak common signal,

>resulting in rbar=0.047. n=1000 was the closest I could get to n=infinity

>without waiting for ages for the correlation matrix to be computed! The

>formula:

>

>neff = n / (1 + (n-1)*rbar)

>

>which reduces to neff = 1 / rbar for n=infinity, gives neff = 20.83. For

>such a low rbar, neff seems rather few? The mean of the variances of the

>1000 series was 1.04677. If I took the "global-mean" timeseries (i.e. the

>mean of the 1000 series), then its variance was 0.05041. The ratio of

>these variances is 20.77 - almost the same as neff! If our expectation

>that neff should be higher than 20.83 was true, then the variance of the

>mean series should have been much lower than it was. It should be easy to

>try out similar synthetic tests with various options (e.g. shorter time

>series, sets of series with differing variances, subsets with higher common

>signal (within-site) combined with subsets with weaker common signal

>(distant sites) etc.) to test the formula further.
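Tim's synthetic experiment in point (1) can be reproduced in a few lines. The seed and the signal weight (0.22, chosen only to land near rbar ≈ 0.05) are assumptions here, not values from the original test:

```python
import numpy as np

# 1000 synthetic series, 500 elements long, sharing a very weak common
# signal. The 0.22 weight is an arbitrary choice that yields rbar ~ 0.05.
rng = np.random.default_rng(0)
n, T = 1000, 500
common = rng.standard_normal(T)
series = 0.22 * common + rng.standard_normal((n, T))

# rbar: mean of all pairwise inter-series correlations (off-diagonal mean)
corr = np.corrcoef(series)
rbar = (corr.sum() - n) / (n * (n - 1))

neff = n / (1 + (n - 1) * rbar)

# Ratio of the mean per-series variance to the variance of the mean series:
# by the argument above, this should roughly match neff.
ratio = series.var(axis=1, ddof=1).mean() / series.mean(axis=0).var(ddof=1)
print(rbar, neff, ratio)
```

With these (assumed) settings the variance ratio and neff agree closely, mirroring the 20.77 vs. 20.83 result reported above.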

>

>(2) I agree that rbar is computed from sample correlations rather than true

>(population) correlations.

>(a) For short overlaps, the individual correlations will rarely be

>significant. But the true correlations could be higher as well as lower,

>so rbar could be an underestimate and neff could be an overestimate! Maybe

>you have even fewer than 20 degrees of freedom!

>(b) I did wonder whether the sample rbar might be a biased estimate of the

>population rbar, given that the uncertainty ranges surrounding individual

>correlations are asymmetric (with a wider range on the lower side than the

>higher side). But I've checked this out with synthetic data and the rbar

>computed from short samples is uncertain but not biased.

>(c) Just because rbar is only 0.05 does not mean that you need series 1500

>elements long to be significant - that would be the case for testing a

>single correlation coefficient. But rbar is the mean of many coefficients

>(not all independent though!) so it is much easier to obtain significance.

>Not sure how you'd test for this theoretically, but a Monte Carlo test

>would work, given some assumptions about the core data. For 100 cores,

>each just 20 years long, a quick Monte Carlo test indicates that an rbar of

>0.05 is indeed significant - therefore rbar=0.05 in your case with > 100

>cores, many of which will be > 20 years long, should certainly be significant.
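The quick Monte Carlo test described in (c) might look like the following sketch (the trial count and seed are arbitrary choices, not from the original test). It estimates the null distribution of rbar for 100 independent cores of length 20 and compares its 95th percentile to 0.05:

```python
import numpy as np

# Null distribution of rbar: 100 independent "cores", each 20 years long,
# with no common signal at all.
rng = np.random.default_rng(42)   # arbitrary seed
n_cores, years, trials = 100, 20, 200

null_rbars = []
for _ in range(trials):
    x = rng.standard_normal((n_cores, years))
    c = np.corrcoef(x)
    # mean of the off-diagonal (pairwise) correlations
    null_rbars.append((c.sum() - n_cores) / (n_cores * (n_cores - 1)))

p95 = np.percentile(null_rbars, 95)
# An observed rbar of 0.05 lies far beyond this null 95th percentile,
# so it is significant even for such short series.
print(p95)
```

Because rbar averages 4,950 pairwise coefficients, its null distribution is far tighter than that of a single correlation, which is why rbar = 0.05 clears the threshold easily.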

>

>Looking forward to your visit! We can discuss this some more.

>

>Tim

>

>

>Dr Timothy J Osborn | phone: +44 1603 592089

>Senior Research Associate | fax: +44 1603 507784

>Climatic Research Unit | e-mail: t.osborn@uea.ac.uk

>School of Environmental Sciences | web-site: http://www.cru.uea.ac.uk/~timo/

>University of East Anglia | sunclock: http://www.cru.uea.ac.uk/~timo/sunclock.htm

>Norwich NR4 7TJ, UK

==================================

Dr. Edward R. Cook

Doherty Senior Scholar

Tree-Ring Laboratory

Lamont-Doherty Earth Observatory

Palisades, New York 10964 USA

Phone: 1-845-365-8618

Fax: 1-845-365-8152

Email: drdendro@ldeo.columbia.edu

==================================
