Alleged CRU Emails - 25 results below


The emails below are part of a series of alleged emails from the Climatic Research Unit at the University of East Anglia, released on 20 November 2009.

Original Filename: 1197507092.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Tim Osborn <t.osborn@xxxxxxxxx.xxx>
Subject: Re: Douglass paper
Date: Wed, 12 Dec 2007 19:51:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: Phil Jones <p.jones@xxxxxxxxx.xxx>, Keith Briffa <k.briffa@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>

<x-flowed>
Dear Tim,

Thanks for the "heads up". As Phil mentioned, I was already aware of
this. The Douglass et al. paper was rejected twice before it was finally
accepted by IJC. I think this paper is a real embarrassment for the IJC.
It has serious scientific flaws. I'm already working on a response.

Phil can tell you about some of the other sordid details of Douglass et
al. These guys ignored information from radiosonde datasets that did not
support their "models are wrong" argument (even though they had these
datasets in their possession). Pretty deplorable behaviour...

Douglass is the guy who famously concluded (after examining the
temperature response to Pinatubo) that the climate system has negative
sensitivity. Amazingly, he managed to publish that crap in GRL. Christy
sure does manage to pick some brilliant scientific collaborators...

With best regards,

Ben

Tim Osborn wrote:
> Hi Ben,
>
> I guess it's likely that you're aware of the Douglass paper that's just
> come out in IJC, but in case you aren't then a reprint is attached.
> They are somewhat critical of your 2005 paper, though I recall that some
> (most?) of Douglass' previous papers -- and papers that he's tried to
> get through the review process -- appear to have serious problems.
>
> cc Phil & Keith for your interest too!
>
> Cheers
>
> Tim
> Dr Timothy J Osborn, Academic Fellow
> Climatic Research Unit
> School of Environmental Sciences
> University of East Anglia
> Norwich NR4 7TJ, UK
>
> e-mail: t.osborn@xxxxxxxxx.xxx
> phone: xxx xxxx xxxx
> fax: xxx xxxx xxxx
> web: http://www.cru.uea.ac.uk/~timo/
> sunclock: http://www.cru.uea.ac.uk/~timo/sunclock.htm
>


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1197590292.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: carl mears <mears@xxxxxxxxx.xxx>
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Thu, 13 Dec 2007 18:58:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: SHERWOOD Steven <steven.sherwood@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>

<x-flowed>
Dear folks,

I've been doing some calculations to address one of the statistical
issues raised by the Douglass et al. paper in the International Journal
of Climatology. Here are some of my results.

Recall that Douglass et al. calculated synthetic T2LT and T2
temperatures from the CMIP-3 archive of 20th century simulations
("20c3m" runs). They used a total of 67 20c3m realizations, performed
with 22 different models. In calculating the statistical uncertainty of
the model trends, they introduced sigma{SE}, an "estimate of the
uncertainty of the mean of the predictions of the trends". They defined
sigma{SE} as follows:

sigma{SE} = sigma / sqrt(N - 1), where

"N = 22 is the number of independent models".

As we've discussed in our previous correspondence, this definition has
serious problems (see comments from Carl and Steve below), and allows
Douglass et al. to reach the erroneous conclusion that modeled T2LT and
T2 trends are significantly different from the observed T2LT and T2
trends in both the RSS and UAH datasets. This comparison of simulated
and observed T2LT and T2 trends is given in Table III of Douglass et al.
[As an amusing aside, I note that the RSS datasets are referred to as
"RSS" in this table, while UAH results are designated as "MSU". I guess
there's only one true "MSU" dataset...]
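
Editorial illustration (not part of the original email): the problem with using sigma{SE} as a yardstick for individual values can be seen in a short calculation. The sketch below works through the six-sided-die analogy quoted from Carl further down, using exact population values for a fair die rather than a simulation. The function name and the choice of N = 100 rolls are illustrative assumptions.

```python
import math

def die_statistics(n_rolls=100, sides=6):
    """Exact mean, standard deviation, and standard error for a fair die."""
    faces = range(1, sides + 1)
    mean = sum(faces) / sides
    variance = sum((f - mean) ** 2 for f in faces) / sides
    sigma = math.sqrt(variance)
    # sigma{SE} as defined in the text: sigma / sqrt(N - 1)
    sigma_se = sigma / math.sqrt(n_rolls - 1)
    return mean, sigma, sigma_se

mean, sigma, sigma_se = die_statistics()
new_roll = 2  # a single additional roll, as in the analogy

# Distance of the new roll from the mean, in units of each measure.
in_se_units = abs(new_roll - mean) / sigma_se
in_sd_units = abs(new_roll - mean) / sigma

print(f"mean={mean:.2f} sigma={sigma:.2f} sigma_SE={sigma_se:.2f}")
print(f"new roll is {in_se_units:.1f} standard errors from the mean")
print(f"new roll is {in_sd_units:.1f} standard deviations from the mean")
```

A single roll of 2 lies many standard errors from the mean but well within two standard deviations, so sigma{SE} describes how well the mean is known, not how far an individual value may plausibly fall from it.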

I decided to take a quick look at the issue of the statistical
significance of differences between simulated and observed tropospheric
temperature trends. My first cut at this "quick look" involves only UAH
and RSS observational data - I have not yet done any tests with
radiosonde data, UMD T2 data, or satellite results from Zou et al.

I operated on the same 49 realizations of the 20c3m experiment that we
used in Chapter 5 of CCSP 1.1. As in our previous work, all model
results are synthetic T2LT and T2 temperatures that I calculated using a
static weighting function approach. I have not yet implemented Carl's
more sophisticated method of estimating synthetic MSU temperatures from
model data (which accounts for effects of topography and land/ocean
differences). However, for the current application, the simple static
weighting function approach is more than adequate, since we are focusing
on T2LT and T2 changes over tropical oceans only - so topographic and
land-ocean differences are unimportant. Note that I still need to
calculate synthetic MSU temperatures from about xxx xxxx xxxxc3m realizations
which were not in the CMIP-3 database at the time we were working on the
CCSP report. For the full response to Douglass et al., we should use the
same 67 20c3m realizations that they employed.

For each of the 49 realizations that I processed, I first masked out all
tropical land areas, and then calculated the spatial averages of
monthly-mean, gridded T2LT and T2 data over tropical oceans (20N-20S).
All model and observational results are for the common 252-month period
from January 1979 to December 1999 - the longest period of overlap
between the RSS and UAH MSU data and the bulk of the 20c3m runs. The
simulated trends given by Douglass et al. are calculated over the same
1979 to 1999 period; however, they use a longer period (1979 to 2004)
for calculating observational trends - so there is an inconsistency
between their model and observational analysis periods, which they do
not explain. This difference in analysis periods is a little puzzling
given that we are dealing with relatively short observational record
lengths, resulting in some sensitivity to end-point effects.
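
Editorial illustration (not part of the original email): a minimal sketch of the masking and spatial-averaging step, assuming a regular latitude-longitude grid and cosine-of-latitude area weights. The email does not say how the averages were weighted, so the weighting, the function name `tropical_ocean_mean`, and the toy grid are all assumptions.

```python
import numpy as np

def tropical_ocean_mean(field, lats, land_mask, lat_limit=20.0):
    """Area-weighted mean of a (time, lat, lon) field over ocean cells
    within +/- lat_limit degrees of the equator.

    land_mask has shape (lat, lon) and is True where a cell is land.
    """
    in_band = np.abs(lats) <= lat_limit              # shape (lat,)
    weights = np.cos(np.deg2rad(lats))[:, None]      # shape (lat, 1)
    # Zero the weight of cells outside the band or over land.
    weights = weights * in_band[:, None] * ~land_mask
    return (field * weights).sum(axis=(1, 2)) / weights.sum()

# Toy grid: four latitude rows, three longitudes, two time steps.
lats = np.array([-30.0, -10.0, 10.0, 30.0])
land_mask = np.zeros((4, 3), dtype=bool)
land_mask[1, 0] = True           # one land cell inside the band

field = np.ones((2, 4, 3))
field[:, 1, 0] = 100.0           # land value that should be ignored
field[:, 0, :] = -50.0           # row outside 20N-20S, should be ignored

series = tropical_ocean_mean(field, lats, land_mask)
print(series)
```

Passing `lat_limit=30.0` gives the 30N-30S variant, and an all-False `land_mask` gives a combined land and ocean average.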

I then calculated anomalies of the spatially-averaged T2LT and T2 data
(w.r.t. climatological monthly-means over 1xxx xxxx xxxx), and fit
least-squares linear trends to model and observational time series. The
standard errors of the trends were adjusted for temporal autocorrelation
of the regression residuals, as described in Santer et al. (2000)
["Statistical significance of trends and trend differences in
layer-average atmospheric temperature time series"; JGR 105, 7xxx xxxx xxxx.]
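
Editorial illustration (not part of the original email): a minimal sketch of the anomaly and trend calculation. It assumes the adjustment takes the common lag-1 form, in which the effective sample size is n_eff = n (1 - r1) / (1 + r1) with r1 the lag-1 autocorrelation of the regression residuals; the email itself only cites Santer et al. (2000) for the details. Because the climatology base period is not legible in this copy, the sketch takes anomalies relative to the full series. The function names and the synthetic AR(1) test series are assumptions.

```python
import numpy as np

def monthly_anomalies(series):
    """Subtract the climatological mean of each calendar month.
    Assumes the series starts in January and spans whole years."""
    x = np.asarray(series, dtype=float).reshape(-1, 12)
    return (x - x.mean(axis=0)).ravel()

def trend_with_adjusted_se(y):
    """Least-squares trend (per time step) and its standard error,
    adjusted for lag-1 autocorrelation of the residuals."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    n_eff = n * (1.0 - r1) / (1.0 + r1)          # effective sample size
    s_e2 = np.sum(resid ** 2) / (n_eff - 2.0)    # residual variance
    se = np.sqrt(s_e2 / np.sum((t - t.mean()) ** 2))
    return slope, se, n_eff

# Synthetic 252-month series: linear trend plus autocorrelated noise.
rng = np.random.default_rng(0)
n_months = 252
noise = np.zeros(n_months)
for i in range(1, n_months):
    noise[i] = 0.8 * noise[i - 1] + rng.normal(scale=0.1)
raw = 0.002 * np.arange(n_months) + noise

anoms = monthly_anomalies(raw)
slope, se, n_eff = trend_with_adjusted_se(anoms)
print(f"trend={slope:.4f} per month, adjusted SE={se:.4f}, n_eff={n_eff:.1f}")
```

With positively autocorrelated residuals, n_eff is smaller than the number of months and the adjusted standard error is correspondingly larger than the unadjusted one.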

Consider first panel A of the attached plot. This shows the simulated
and observed T2LT trends over 1979 to 1999 (again, over 20N-20S, oceans
only) with their adjusted 1-sigma confidence intervals. For the UAH and
RSS data, it was possible to check against the adjusted confidence
intervals independently calculated by Dian during the course of work on
the CCSP report. Our adjusted confidence intervals are in good
agreement. The grey shaded envelope in panel A denotes the 1-sigma
standard error for the RSS T2LT trend.

There are 49 pairs of UAH-minus-model trend differences and 49 pairs of
RSS-minus-model trend differences. We can therefore test - for each
model and each 20c3m realization - whether there is a statistically
significant difference between the observed and simulated trends.

Let bx and by represent any single pair of modeled and observed trends,
with adjusted standard errors s{bx} and s{by}. As in our previous work
(and as in related work by John Lanzante), we define the normalized
trend difference d as:

d = (bx - by) / sqrt[ (s{bx})**2 + (s{by})**2 ]

Under the assumption that d is normally distributed, values of d > +1.96
or < -1.96 indicate observed-minus-model trend differences that are
significant at the 5% level. We are performing a two-tailed test here,
since we have no information a priori about the "direction" of the model
trend (i.e., whether we expect the simulated trend to be significantly
larger or smaller than observed).
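
Editorial illustration (not part of the original email): the test statistic defined above translates directly into a few lines of code. The trend values and standard errors in the example call are made-up numbers for illustration only; they are not results from the analysis.

```python
import math

def normalized_trend_difference(b_x, s_bx, b_y, s_by):
    """d = (bx - by) / sqrt(s{bx}**2 + s{by}**2)."""
    return (b_x - b_y) / math.sqrt(s_bx ** 2 + s_by ** 2)

def is_significant(d, critical=1.96):
    """Two-tailed test at the 5% level, assuming d is normally distributed."""
    return abs(d) > critical

# Hypothetical trend pair (units arbitrary, e.g. degrees C per decade).
d = normalized_trend_difference(b_x=0.20, s_bx=0.10, b_y=0.10, s_by=0.08)
print(f"d = {d:.2f}, reject H0: {is_significant(d)}")
```

Because both standard errors enter the denominator, a large uncertainty in either trend makes it harder to reject H0.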

Panel C shows values of the normalized trend difference for T2LT trends.
The grey shaded area spans the range +1.96 to -1.96, and identifies the
region where we fail to reject the null hypothesis (H0) of no
significant difference between observed and simulated trends.

Consider the solid symbols first, which give results for tests involving
RSS data. We would reject H0 in only one out of 49 cases (for the
CCCma-CGCM3.1(T47) model). The open symbols indicate results for tests
involving UAH data. Somewhat surprisingly, we get the same qualitative
outcome that we obtained for tests involving RSS data: only one of the
UAH-model trend pairs yields a difference that is statistically
significant at the 5% level.

Panels B and D provide results for T2 trends. Results are very similar
to those achieved with T2LT trends. Irrespective of whether RSS or UAH
T2 data are used, significant trend differences occur in only one of 49
cases.

Bottom line: Douglass et al. claim that "In all cases UAH and RSS
satellite trends are inconsistent with model trends." (page 6, lines
61-62). This claim is categorically wrong. In fact, based on our
results, one could justifiably claim that THERE IS ONLY ONE CASE in
which model T2LT and T2 trends are inconsistent with UAH and RSS
results! These guys screwed up big time.

SENSITIVITY TESTS

QUESTION 1: Some of the model-data trend comparisons made by Douglass et
al. used temperatures averaged over 30N-30S rather than 20N-20S. What
happens if we repeat our simple trend significance analysis using T2LT
and T2 data averaged over ocean areas between 30N-30S?

ANSWER 1: Very little. The results described above for ocean areas
between 20N-20S are virtually unchanged.

QUESTION 2: Even though it's clearly inappropriate to estimate the
standard errors of the linear trends WITHOUT accounting for temporal
autocorrelation effects (the 252 time samples are clearly not
independent; effective sample sizes typically range from 6 to 56),
someone is bound to ask what the outcome is when one repeats the paired
trend tests with non-adjusted standard errors. So here are the results:

T2LT tests, RSS observational data: 19 out of 49 trend differences are
significant at the 5% level.
T2LT tests, UAH observational data: 34 out of 49 trend differences are
significant at the 5% level.

T2 tests, RSS observational data: 16 out of 49 trend differences are
significant at the 5% level.
T2 tests, UAH observational data: 35 out of 49 trend differences are
significant at the 5% level.

So even under the naive (and incorrect) assumption that each model and
observational time series contains 252 independent time samples, we
STILL find no support for Douglass et al.'s assertion that: "In all
cases UAH and RSS satellite trends are inconsistent with model trends."
Q.E.D.
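
Editorial illustration (not part of the original email): a short calculation of what the quoted effective sample sizes (6 to 56, out of 252 months) would imply. It assumes the lag-1 form n_eff = n (1 - r1) / (1 + r1), which the email does not spell out, and a standard error that scales as 1 / sqrt(n_eff - 2). The function names are illustrative.

```python
import math

def implied_lag1(n, n_eff):
    """Invert n_eff = n (1 - r1) / (1 + r1) to recover r1."""
    return (n - n_eff) / (n + n_eff)

def se_inflation(n, n_eff):
    """Ratio of adjusted to unadjusted trend standard error."""
    return math.sqrt((n - 2) / (n_eff - 2))

for n_eff in (6, 56):
    r1 = implied_lag1(252, n_eff)
    factor = se_inflation(252, n_eff)
    print(f"n_eff={n_eff:2d}  implied r1={r1:.2f}  SE inflation={factor:.1f}x")
```

Under these assumptions the unadjusted standard errors are too small by a factor of roughly two to eight, which is consistent with far more trend differences appearing significant when the adjustment is omitted.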

If Leo is agreeable, I'm hopeful that we'll be able to perform a similar
trend comparison using synthetic MSU T2LT and T2 temperatures calculated
from the RAOBCORE radiosonde data - all versions, not just v1.2!

As you can see from the email list, I've expanded our "focus group" a
little bit, since a number of you have written to me about this issue.

I am leaving for Miami on Monday, Dec. 17th. My Mom is having cataract
surgery, and I'd like to be around to provide her with moral and
practical support. I'm not exactly sure when I'll be returning to PCMDI
- although I hope I won't be gone longer than a week. As soon as I get
back, I'll try to make some more progress with this stuff. Any
suggestions or comments on what I've done so far would be greatly
appreciated. And for the time being, I think we should not alert
Douglass et al. to our results.

With best regards, and happy holidays! May all your "Singers" be carol
singers, and not of the S. Fred variety...

Ben

(P.S.: I noticed one unfortunate typo in Table II of Douglass et al. The
MIROC3.2 (medres) model is referred to as "MIROC3.2_Merdes"....)

carl mears wrote:
> Hi Steve
>
> I'd say it's the equivalent of rolling a 6-sided die a hundred times, and
> finding a mean value of ~3.5 and a standard deviation of ~1.7, and
> calculating the standard error of the mean to be ~0.17 (so far so
> good). And then rolling the die one more time, getting a 2, and
> claiming that the die is no longer 6 sided because the new measurement
> is more than 2 standard errors from the mean.
>
> In my view, this problem trumps the other problems in the paper.
> I can't believe Douglass is a fellow of the American Physical Society.
>
> -Carl
>
>
> At 02:07 AM 12/6/2007, you wrote:
>> If I understand correctly, what Douglass et al. did makes the stronger
>> assumption that unforced variability is *insignificant*. Their
>> statistical test is logically equivalent to falsifying a climate model
>> because it did not consistently predict a particular storm on a
>> particular day two years from now.
>
>
> Dr. Carl Mears
> Remote Sensing Systems
> 438 First Street, Suite 200, Santa Rosa, CA 95401
> mears@xxxxxxxxx.xxx
> xxx xxxx xxxxx21
> xxx xxxx xxxx(fax))


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------


</x-flowed>

Attachment Converted: "c:\eudora\attach\douglass_reply1.pdf"

Original Filename: 1197660675.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: "Thomas.R.Karl" <Thomas.R.Karl@xxxxxxxxx.xxx>
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Fri, 14 Dec 2007 14:31:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: carl mears <mears@xxxxxxxxx.xxx>, SHERWOOD Steven <steven.sherwood@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>

<x-flowed>
Dear Tom,

As promised, I've now repeated all of the significance testing involving
model-versus-observed trend differences, but this time using
spatially-averaged T2 and T2LT changes that are not "masked out" over
tropical land areas. As I mentioned this morning, the use of non-masked
data facilitates a direct comparison with Douglass et al.

The results for combined changes over tropical land and ocean are very
similar to those I sent out yesterday, which were for T2 and T2LT
changes over tropical oceans only:

COMBINED LAND/OCEAN RESULTS (WITH STANDARD ERRORS ADJUSTED FOR TEMPORAL
AUTOCORRELATION EFFECTS; SPATIAL AVERAGES OVER 20N-20S; ANALYSIS PERIOD
1979 TO 1999)

T2LT tests, RSS observational data: 0 out of 49 model-versus-observed
trend differences are significant at the 5% level.
T2LT tests, UAH observational data: 1 out of 49 model-versus-observed
trend differences are significant at the 5% level.

T2 tests, RSS observational data: 1 out of 49 model-versus-observed
trend differences are significant at the 5% level.
T2 tests, UAH observational data: 1 out of 49 model-versus-observed
trend differences are significant at the 5% level.

So our conclusion - that model tropical T2 and T2LT trends are, in
virtually all realizations and models, not significantly different from
either RSS or UAH trends - is not sensitive to whether we do the
significance testing with "ocean only" or combined "land+ocean"
temperature changes.
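
Editorial illustration (not part of the original email): for context on counts such as "1 out of 49", the sketch below computes how many rejections a 5% test would produce by chance alone over 49 tests. It treats the 49 tests as independent, which is an assumption that does not hold exactly here, since every test in a set shares the same observational trend.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_tests, alpha = 49, 0.05

# Expected number of rejections if H0 were true in every case.
expected_false_rejections = n_tests * alpha

# Probability of seeing at most one rejection under the same assumption.
p_at_most_one = sum(binomial_pmf(k, n_tests, alpha) for k in (0, 1))

print(f"expected rejections by chance: {expected_false_rejections:.2f}")
print(f"P(at most one rejection): {p_at_most_one:.2f}")
```

Under that independence assumption, zero or one rejection out of 49 is no more than would be expected by chance at the 5% level.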

With best regards, and happy holidays to all!

Ben

Thomas.R.Karl wrote:
> Ben,
>
> This is very informative. One question I raise is whether the results
> would have been at all different if you had not masked the land. I
> doubt it, but it would be nice to know.
>
> Tom
>
> Ben Santer said the following on 12/13/2007 9:58 PM:
>> Dear folks,
>>
>> I've been doing some calculations to address one of the statistical
>> issues raised by the Douglass et al. paper in the International
>> Journal of Climatology. Here are some of my results.
>>
>> Recall that Douglass et al. calculated synthetic T2LT and T2
>> temperatures from the CMIP-3 archive of 20th century simulations
>> ("20c3m" runs). They used a total of 67 20c3m realizations, performed
>> with 22 different models. In calculating the statistical uncertainty
>> of the model trends, they introduced sigma{SE}, an "estimate of the
>> uncertainty of the mean of the predictions of the trends". They defined
>> sigma{SE} as follows:
>>
>> sigma{SE} = sigma / sqrt(N - 1), where
>>
>> "N = 22 is the number of independent models".
>>
>> As we've discussed in our previous correspondence, this definition has
>> serious problems (see comments from Carl and Steve below), and allows
>> Douglass et al. to reach the erroneous conclusion that modeled T2LT
>> and T2 trends are significantly different from the observed T2LT and
>> T2 trends in both the RSS and UAH datasets. This comparison of
>> simulated and observed T2LT and T2 trends is given in Table III of
>> Douglass et al.
>> [As an amusing aside, I note that the RSS datasets are referred to as
>> "RSS" in this table, while UAH results are designated as "MSU". I
>> guess there's only one true "MSU" dataset...]
>>
>> I decided to take a quick look at the issue of the statistical
>> significance of differences between simulated and observed
>> tropospheric temperature trends. My first cut at this "quick look"
>> involves only UAH and RSS observational data - I have not yet done any
>> tests with radiosonde data, UMD T2 data, or satellite results from
>> Zou et al.
>>
>> I operated on the same 49 realizations of the 20c3m experiment that we
>> used in Chapter 5 of CCSP 1.1. As in our previous work, all model
>> results are synthetic T2LT and T2 temperatures that I calculated using
>> a static weighting function approach. I have not yet implemented
>> Carl's more sophisticated method of estimating synthetic MSU
>> temperatures from model data (which accounts for effects of topography
>> and land/ocean differences). However, for the current application, the
>> simple static weighting function approach is more than adequate, since
>> we are focusing on T2LT and T2 changes over tropical oceans only - so
>> topographic and land-ocean differences are unimportant. Note that I
>> still need to calculate synthetic MSU temperatures from about xxx xxxx xxxx
>> 20c3m realizations which were not in the CMIP-3 database at the time
>> we were working on the CCSP report. For the full response to Douglass
>> et al., we should use the same 67 20c3m realizations that they employed.
>>
>> For each of the 49 realizations that I processed, I first masked out
>> all tropical land areas, and then calculated the spatial averages of
>> monthly-mean, gridded T2LT and T2 data over tropical oceans (20N-20S).
>> All model and observational results are for the common 252-month
>> period from January 1979 to December 1999 - the longest period of
>> overlap between the RSS and UAH MSU data and the bulk of the 20c3m
>> runs. The simulated trends given by Douglass et al. are calculated
>> over the same 1979 to 1999 period; however, they use a longer period
>> (1979 to 2004) for calculating observational trends - so there is an
>> inconsistency between their model and observational analysis periods,
>> which they do not explain. This difference in analysis periods is a
>> little puzzling given that we are dealing with relatively short
>> observational record lengths, resulting in some sensitivity to
>> end-point effects.
>>
>> I then calculated anomalies of the spatially-averaged T2LT and T2 data
>> (w.r.t. climatological monthly-means over 1xxx xxxx xxxx), and fit
>> least-squares linear trends to model and observational time series.
>> The standard errors of the trends were adjusted for temporal
>> autocorrelation of the regression residuals, as described in Santer et
>> al. (2000) ["Statistical significance of trends and trend differences
>> in layer-average atmospheric temperature time series"; JGR 105,
>> 7xxx xxxx xxxx.]
>>
>> Consider first panel A of the attached plot. This shows the simulated
>> and observed T2LT trends over 1979 to 1999 (again, over 20N-20S,
>> oceans only) with their adjusted 1-sigma confidence intervals. For
>> the UAH and RSS data, it was possible to check against the adjusted
>> confidence intervals independently calculated by Dian during the
>> course of work on the CCSP report. Our adjusted confidence intervals
>> are in good agreement. The grey shaded envelope in panel A denotes the
>> 1-sigma standard error for the RSS T2LT trend.
>>
>> There are 49 pairs of UAH-minus-model trend differences and 49 pairs
>> of RSS-minus-model trend differences. We can therefore test - for each
>> model and each 20c3m realization - whether there is a statistically
>> significant difference between the observed and simulated trends.
>>
>> Let bx and by represent any single pair of modeled and observed
>> trends, with adjusted standard errors s{bx} and s{by}. As in our
>> previous work (and as in related work by John Lanzante), we define the
>> normalized trend difference d as:
>>
>> d = (bx - by) / sqrt[ (s{bx})**2 + (s{by})**2 ]
>>
>> Under the assumption that d is normally distributed, values of d >
>> +1.96 or < -1.96 indicate observed-minus-model trend differences that
>> are significant at the 5% level. We are performing a two-tailed test
>> here, since we have no information a priori about the "direction" of
>> the model trend (i.e., whether we expect the simulated trend to be
>> significantly larger or smaller than observed).
>>
>> Panel c shows values of the normalized trend difference for T2LT trends.
>> The grey shaded area spans the range +1.96 to -1.96, and identifies
>> the region where we fail to reject the null hypothesis (H0) of no
>> significant difference between observed and simulated trends.
>>
>> Consider the solid symbols first, which give results for tests
>> involving RSS data. We would reject H0 in only one out of 49 cases
>> (for the CCCma-CGCM3.1(T47) model). The open symbols indicate results
>> for tests involving UAH data. Somewhat surprisingly, we get the same
>> qualitative outcome that we obtained for tests involving RSS data:
>> only one of the UAH-model trend pairs yields a difference that is
>> statistically significant at the 5% level.
>>
>> Panels b and d provide results for T2 trends. Results are very similar
>> to those achieved with T2LT trends. Irrespective of whether RSS or UAH
>> T2 data are used, significant trend differences occur in only one of
>> 49 cases.
>>
>> Bottom line: Douglass et al. claim that "In all cases UAH and RSS
>> satellite trends are inconsistent with model trends." (page 6, lines
>> 61-62). This claim is categorically wrong. In fact, based on our
>> results, one could justifiably claim that THERE IS ONLY ONE CASE in
>> which model T2LT and T2 trends are inconsistent with UAH and RSS
>> results! These guys screwed up big time.
>>
>> SENSITIVITY TESTS
>>
>> QUESTION 1: Some of the model-data trend comparisons made by Douglass
>> et al. used temperatures averaged over 30N-30S rather than 20N-20S.
>> What happens if we repeat our simple trend significance analysis using
>> T2LT and T2 data averaged over ocean areas between 30N-30S?
>>
>> ANSWER 1: Very little. The results described above for ocean areas
>> between 20N-20S are virtually unchanged.
>>
>> QUESTION 2: Even though it's clearly inappropriate to estimate the
>> standard errors of the linear trends WITHOUT accounting for temporal
>> autocorrelation effects (the 252 time samples are clearly not
>> independent; effective sample sizes typically range from 6 to 56),
>> someone is bound to ask what the outcome is when one repeats the
>> paired trend tests with non-adjusted standard errors. So here are the
>> results:
>>
>> T2LT tests, RSS observational data: 19 out of 49 trend differences are
>> significant at the 5% level.
>> T2LT tests, UAH observational data: 34 out of 49 trend differences are
>> significant at the 5% level.
>>
>> T2 tests, RSS observational data: 16 out of 49 trend differences are
>> significant at the 5% level.
>> T2 tests, UAH observational data: 35 out of 49 trend differences are
>> significant at the 5% level.
>>
>> So even under the naive (and incorrect) assumption that each model and
>> observational time series contains 252 independent time samples, we
>> STILL find no support for Douglass et al.'s assertion that: "In all
>> cases UAH and RSS satellite trends are inconsistent with model trends."
>> Q.E.D.
>>
>> If Leo is agreeable, I'm hopeful that we'll be able to perform a
>> similar trend comparison using synthetic MSU T2LT and T2 temperatures
>> calculated from the RAOBCORE radiosonde data - all versions, not just
>> v1.2!
>>
>> As you can see from the email list, I've expanded our "focus group" a
>> little bit, since a number of you have written to me about this issue.
>>
>> I am leaving for Miami on Monday, Dec. 17th. My Mom is having cataract
>> surgery, and I'd like to be around to provide her with moral and
>> practical support. I'm not exactly sure when I'll be returning to
>> PCMDI - although I hope I won't be gone longer than a week. As soon as
>> I get back, I'll try to make some more progress with this stuff. Any
>> suggestions or comments on what I've done so far would be greatly
>> appreciated. And for the time being, I think we should not alert
>> Douglass et al. to our results.
>>
>> With best regards, and happy holidays! May all your "Singers" be carol
>> singers, and not of the S. Fred variety...
>>
>> Ben
>>
>> (P.S.: I noticed one unfortunate typo in Table II of Douglass et al.
>> The MIROC3.2 (medres) model is referred to as "MIROC3.2_Merdes"....)
>>
>> carl mears wrote:
>>> Hi Steve
>>>
>>> I'd say it's the equivalent of rolling a 6-sided die a hundred times,
>>> and
>>> finding a mean value of ~3.5 and a standard deviation of ~1.7, and
>>> calculating the standard error of the mean to be ~0.17 (so far so
>>> good). And then rolling the die one more time, getting a 2, and
>>> claiming that the die is no longer 6 sided because the new measurement
>>> is more than 2 standard errors from the mean.
>>>
>>> In my view, this problem trumps the other problems in the paper.
>>> I can't believe Douglass is a fellow of the American Physical Society.
>>>
>>> -Carl
>>>
>>>
>>> At 02:07 AM 12/6/2007, you wrote:
>>>> If I understand correctly, what Douglass et al. did makes the
>>>> stronger assumption that unforced variability is *insignificant*.
>>>> Their statistical test is logically equivalent to falsifying a
>>>> climate model because it did not consistently predict a particular
>>>> storm on a particular day two years from now.
>>>
>>>
>>> Dr. Carl Mears
>>> Remote Sensing Systems
>>> 438 First Street, Suite 200, Santa Rosa, CA 95401
>>> mears@xxxxxxxxx.xxx
>>> xxx xxxx xxxxx21
>>> xxx xxxx xxxx(fax))
>>
>>
>
> --
>
> *Dr. Thomas R. Karl, L.H.D.*
>
> */Director/*
>
> NOAA

Original Filename: 1197739308.txt

From: "Thomas.R.Karl" <Thomas.R.Karl@xxxxxxxxx.xxx>
To: santer1@xxxxxxxxx.xxx
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Sat, 15 Dec 2007 12:21:xxx xxxx xxxx
Cc: carl mears <mears@xxxxxxxxx.xxx>, SHERWOOD Steven <steven.sherwood@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>

Thanks Ben,
You have the makings of a nice article.
I note that we would expect to see 10 cases that are significantly different by chance (based
on the 196 tests at the .05 sig level). You found 3. With appropriately corrected Leopold
I suspect you will find there is indeed stat sig. similar trends incl. amplification.
Setting up the statistical testing should be interesting with this many combinations.
Regards, Tom
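
The arithmetic behind Tom's remark can be sketched as follows. This is only a rough illustration: it treats the 196 tests (4 sets of 49) as independent trials, which they are not, since the same observational series is reused in every test and the model realizations are related to one another. The names `n_tests`, `alpha`, `hits_found` and `binom_cdf` are labels chosen here, not anything from the original analysis.

```python
from math import comb

n_tests = 196     # 4 sets of 49 model-versus-observed tests
alpha = 0.05      # stipulated significance level
hits_found = 3    # significant differences in the land+ocean results (0 + 1 + 1 + 1)

# Expected number of rejections if every null hypothesis were true.
expected_hits = n_tests * alpha   # 9.8, which Tom rounds to 10


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Assumes independent tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Chance of seeing this few rejections under the (unrealistic) independence assumption.
p_at_most_found = binom_cdf(hits_found, n_tests, alpha)

print(f"expected hits by chance: {expected_hits:.1f}")
print(f"P(hits <= {hits_found}) under independence: {p_at_most_found:.4f}")
```

Because the tests are not independent, the binomial probability should be read as a loose baseline rather than a significance level.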
Ben Santer said the following on 12/14/2007 5:31 PM:

Dear Tom,
As promised, I've now repeated all of the significance testing involving
model-versus-observed trend differences, but this time using spatially-averaged T2 and
T2LT changes that are not "masked out" over tropical land areas. As I mentioned this
morning, the use of non-masked data facilitates a direct comparison with Douglass et al.
The results for combined changes over tropical land and ocean are very similar to those
I sent out yesterday, which were for T2 and T2LT changes over tropical oceans only:
COMBINED LAND/OCEAN RESULTS (WITH STANDARD ERRORS ADJUSTED FOR TEMPORAL AUTOCORRELATION
EFFECTS; SPATIAL AVERAGES OVER 20N-20S; ANALYSIS PERIOD 1979 TO 1999)

T2LT tests, RSS observational data: 0 out of 49 model-versus-observed trend differences
are significant at the 5% level.
T2LT tests, UAH observational data: 1 out of 49 model-versus-observed trend differences
are significant at the 5% level.
T2 tests, RSS observational data: 1 out of 49 model-versus-observed trend differences
are significant at the 5% level.
T2 tests, UAH observational data: 1 out of 49 model-versus-observed trend differences
are significant at the 5% level.

So our conclusion - that model tropical T2 and T2LT trends are, in virtually all
realizations and models, not significantly different from either RSS or UAH trends - is
not sensitive to whether we do the significance testing with "ocean only" or combined
"land+ocean" temperature changes.
With best regards, and happy holidays to all!
Ben
Thomas.R.Karl wrote:

Ben,
This is very informative. One question I raise is whether the results would have been
at all different if you had not masked the land. I doubt it, but it would be nice to
know.
Tom
Ben Santer said the following on 12/13/2007 9:58 PM:

Dear folks,
I've been doing some calculations to address one of the statistical issues raised by the
Douglass et al. paper in the International Journal of Climatology. Here are some of my
results.
Recall that Douglass et al. calculated synthetic T2LT and T2 temperatures from the
CMIP-3 archive of 20th century simulations ("20c3m" runs). They used a total of 67 20c3m
realizations, performed with 22 different models. In calculating the statistical
uncertainty of the model trends, they introduced sigma{SE}, an "estimate of the
uncertainty of the mean of the predictions of the trends". They defined
sigma{SE} as follows:
sigma{SE} = sigma / sqrt(N - 1), where
"N = 22 is the number of independent models".
As we've discussed in our previous correspondence, this definition has serious problems
(see comments from Carl and Steve below), and allows Douglass et al. to reach the
erroneous conclusion that modeled T2LT and T2 trends are significantly different from
the observed T2LT and T2 trends in both the RSS and UAH datasets. This comparison of
simulated and observed T2LT and T2 trends is given in Table III of Douglass et al.
[As an amusing aside, I note that the RSS datasets are referred to as "RSS" in this
table, while UAH results are designated as "MSU". I guess there's only one true "MSU"
dataset...]
I decided to take a quick look at the issue of the statistical significance of
differences between simulated and observed tropospheric temperature trends. My first cut
at this "quick look" involves only UAH and RSS observational data - I have not yet done
any tests with radiosonde data, UMD T2 data, or satellite results from Zou et al.
I operated on the same 49 realizations of the 20c3m experiment that we used in Chapter 5
of CCSP 1.1. As in our previous work, all model results are synthetic T2LT and T2
temperatures that I calculated using a static weighting function approach. I have not
yet implemented Carl's more sophisticated method of estimating synthetic MSU
temperatures from model data (which accounts for effects of topography and land/ocean
differences). However, for the current application, the simple static weighting function
approach is more than adequate, since we are focusing on T2LT and T2 changes over
tropical oceans only - so topographic and land-ocean differences are unimportant. Note
that I still need to calculate synthetic MSU temperatures from about xxx xxxx xxxx 20c3m
realizations which were not in the CMIP-3 database at the time we were working on the
CCSP report. For the full response to Douglass et al., we should use the same 67 20c3m
realizations that they employed.
For each of the 49 realizations that I processed, I first masked out all tropical land
areas, and then calculated the spatial averages of monthly-mean, gridded T2LT and T2
data over tropical oceans (20N-20S). All model and observational results are for the
common 252-month period from January 1979 to December 1999 - the longest period of
overlap between the RSS and UAH MSU data and the bulk of the 20c3m runs. The simulated
trends given by Douglass et al. are calculated over the same 1979 to 1999 period;
however, they use a longer period (1979 to 2004) for calculating observational trends -
so there is an inconsistency between their model and observational analysis periods,
which they do not explain. This difference in analysis periods is a little puzzling
given that we are dealing with relatively short observational record lengths, resulting
in some sensitivity to end-point effects.
I then calculated anomalies of the spatially-averaged T2LT and T2 data (w.r.t.
climatological monthly-means over 1xxx xxxx xxxx), and fit least-squares linear trends to
model and observational time series. The standard errors of the trends were adjusted for
temporal autocorrelation of the regression residuals, as described in Santer et al.
(2000) ["Statistical significance of trends and trend differences in layer-average
atmospheric temperature time series"; JGR 105, 7xxx xxxx xxxx.]
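
A minimal sketch of the kind of adjustment described above, assuming the commonly cited form in which the sample size n is replaced by an effective sample size n_eff = n (1 - r1) / (1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. This is not the code used for the results in this email; the function names and the synthetic AR(1) test series are hypothetical.

```python
import math
import random


def ols_trend(y):
    """Least-squares slope, residuals and Sxx for y against t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    resid = [v - (intercept + slope * t) for t, v in enumerate(y)]
    return slope, resid, sxx


def lag1_autocorr(e):
    """Lag-1 autocorrelation coefficient of a series."""
    m = sum(e) / len(e)
    num = sum((e[i] - m) * (e[i + 1] - m) for i in range(len(e) - 1))
    den = sum((v - m) ** 2 for v in e)
    return num / den


def adjusted_trend_se(y):
    """Return slope, naive SE, autocorrelation-adjusted SE and n_eff."""
    n = len(y)
    slope, resid, sxx = ols_trend(y)
    r1 = lag1_autocorr(resid)
    n_eff = n * (1.0 - r1) / (1.0 + r1)   # assumed form of the adjustment
    if n_eff <= 2:
        raise ValueError("effective sample size too small for a trend SE")
    sse = sum(e * e for e in resid)
    se_naive = math.sqrt(sse / (n - 2) / sxx)
    se_adj = math.sqrt(sse / (n_eff - 2) / sxx)
    return slope, se_naive, se_adj, n_eff


# Hypothetical 252-month series: small linear trend plus AR(1) noise.
rng = random.Random(0)
y, noise = [], 0.0
for t in range(252):
    noise = 0.8 * noise + rng.gauss(0.0, 0.1)
    y.append(0.0015 * t + noise)

slope, se_naive, se_adj, n_eff = adjusted_trend_se(y)
print(f"n_eff = {n_eff:.1f}, naive SE = {se_naive:.5f}, adjusted SE = {se_adj:.5f}")
```

With positively autocorrelated residuals the effective sample size falls well below 252, so the adjusted standard error is larger than the naive one and fewer trend differences test as significant.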
Consider first panel A of the attached plot. This shows the simulated and observed T2LT
trends over 1979 to 1999 (again, over 20N-20S, oceans only) with their adjusted 1-sigma
confidence intervals. For the UAH and RSS data, it was possible to check against the
adjusted confidence intervals independently calculated by Dian during the course of work
on the CCSP report. Our adjusted confidence intervals are in good agreement. The grey
shaded envelope in panel A denotes the 1-sigma standard error for the RSS T2LT trend.
There are 49 pairs of UAH-minus-model trend differences and 49 pairs of RSS-minus-model
trend differences. We can therefore test - for each model and each 20c3m realization -
whether there is a statistically significant difference between the observed and
simulated trends.
Let bx and by represent any single pair of modeled and observed trends, with adjusted
standard errors s{bx} and s{by}. As in our previous work (and as in related work by John
Lanzante), we define the normalized trend difference d as:
d = (bx - by) / sqrt[ (s{bx})**2 + (s{by})**2 ]
Under the assumption that d is normally distributed, values of d > +1.96 or < -1.96
indicate observed-minus-model trend differences that are significant at the 5% level. We
are performing a two-tailed test here, since we have no information a priori about the
"direction" of the model trend (i.e., whether we expect the simulated trend to be
significantly larger or smaller than observed).
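
The paired test defined above can be sketched in a few lines. The trend values and standard errors below are hypothetical numbers chosen for illustration (nominally in K/decade), and the function names are not from the original analysis.

```python
import math


def normalized_trend_difference(bx, s_bx, by, s_by):
    """d = (bx - by) / sqrt(s_bx**2 + s_by**2), as defined in the text."""
    return (bx - by) / math.sqrt(s_bx**2 + s_by**2)


def is_significant(d, critical=1.96):
    """Two-tailed test at the 5% level, assuming d is normally distributed."""
    return abs(d) > critical


# Hypothetical modeled trend (bx) and observed trend (by) with adjusted standard errors.
d = normalized_trend_difference(bx=0.20, s_bx=0.08, by=0.05, s_by=0.06)
print(f"d = {d:.2f}, reject H0: {is_significant(d)}")
```

In this made-up case the trends differ by a factor of four, yet d is about 1.5 and H0 is not rejected, because the combined standard error (0.1) is large relative to the difference (0.15).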
Panel c shows values of the normalized trend difference for T2LT trends.
The grey shaded area spans the range +1.96 to -1.96, and identifies the region where we
fail to reject the null hypothesis (H0) of no significant difference between observed
and simulated trends.
Consider the solid symbols first, which give results for tests involving RSS data. We
would reject H0 in only one out of 49 cases (for the CCCma-CGCM3.1(T47) model). The open
symbols indicate results for tests involving UAH data. Somewhat surprisingly, we get the
same qualitative outcome that we obtained for tests involving RSS data: only one of the
UAH-model trend pairs yields a difference that is statistically significant at the 5%
level.
Panels b and d provide results for T2 trends. Results are very similar to those achieved
with T2LT trends. Irrespective of whether RSS or UAH T2 data are used, significant trend
differences occur in only one of 49 cases.
Bottom line: Douglass et al. claim that "In all cases UAH and RSS satellite trends are
inconsistent with model trends." (page 6, lines 61-62). This claim is categorically
wrong. In fact, based on our results, one could justifiably claim that THERE IS ONLY ONE
CASE in which model T2LT and T2 trends are inconsistent with UAH and RSS results! These
guys screwed up big time.
SENSITIVITY TESTS
QUESTION 1: Some of the model-data trend comparisons made by Douglass et al. used
temperatures averaged over 30N-30S rather than 20N-20S. What happens if we repeat our
simple trend significance analysis using T2LT and T2 data averaged over ocean areas
between 30N-30S?
ANSWER 1: Very little. The results described above for ocean areas between 20N-20S are
virtually unchanged.
QUESTION 2: Even though it's clearly inappropriate to estimate the standard errors of
the linear trends WITHOUT accounting for temporal autocorrelation effects (the 252 time
samples are clearly not independent; effective sample sizes typically range from 6 to
56), someone is bound to ask what the outcome is when one repeats the paired trend tests
with non-adjusted standard errors. So here are the results:
T2LT tests, RSS observational data: 19 out of 49 trend differences are significant at
the 5% level.
T2LT tests, UAH observational data: 34 out of 49 trend differences are significant at
the 5% level.
T2 tests, RSS observational data: 16 out of 49 trend differences are significant at the
5% level.
T2 tests, UAH observational data: 35 out of 49 trend differences are significant at the
5% level.
So even under the naive (and incorrect) assumption that each model and observational
time series contains 252 independent time samples, we STILL find no support for Douglass
et al.'s assertion that: "In all cases UAH and RSS satellite trends are inconsistent
with model trends."
Q.E.D.
If Leo is agreeable, I'm hopeful that we'll be able to perform a similar trend
comparison using synthetic MSU T2LT and T2 temperatures calculated from the RAOBCORE
radiosonde data - all versions, not just v1.2!
As you can see from the email list, I've expanded our "focus group" a little bit, since
a number of you have written to me about this issue.
I am leaving for Miami on Monday, Dec. 17th. My Mom is having cataract surgery, and I'd
like to be around to provide her with moral and practical support. I'm not exactly sure
when I'll be returning to PCMDI - although I hope I won't be gone longer than a week. As
soon as I get back, I'll try to make some more progress with this stuff. Any suggestions
or comments on what I've done so far would be greatly appreciated. And for the time
being, I think we should not alert Douglass et al. to our results.
With best regards, and happy holidays! May all your "Singers" be carol singers, and not
of the S. Fred variety...
Ben
(P.S.: I noticed one unfortunate typo in Table II of Douglass et al. The MIROC3.2
(medres) model is referred to as "MIROC3.2_Merdes"....)
carl mears wrote:

Hi Steve
I'd say it's the equivalent of rolling a 6-sided die a hundred times, and
finding a mean value of ~3.5 and a standard deviation of ~1.7, and
calculating the standard error of the mean to be ~0.17 (so far so
good). And then rolling the die one more time, getting a 2, and
claiming that the die is no longer 6 sided because the new measurement
is more than 2 standard errors from the mean.
In my view, this problem trumps the other problems in the paper.
I can't believe Douglass is a fellow of the American Physical Society.
-Carl
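
Carl's die analogy can be worked through with exact values for a fair die rather than simulated rolls. This is a sketch for illustration; the variable names are chosen here.

```python
import math

faces = [1, 2, 3, 4, 5, 6]
n_rolls = 100

# Population mean and standard deviation of a fair six-sided die.
mean = sum(faces) / len(faces)                                    # 3.5
sd = math.sqrt(sum((f - mean) ** 2 for f in faces) / len(faces))  # about 1.7
se_mean = sd / math.sqrt(n_rolls)                                 # about 0.17

new_roll = 2
outside_2se = abs(new_roll - mean) > 2 * se_mean   # the flawed criterion
outside_2sd = abs(new_roll - mean) > 2 * sd        # compares against the spread

# Faces that the flawed criterion would treat as inconsistent with the die.
rejected_faces = [f for f in faces if abs(f - mean) > 2 * se_mean]

print(f"sd = {sd:.2f}, standard error of the mean = {se_mean:.2f}")
print(f"roll of {new_roll}: outside 2 SE? {outside_2se}; outside 2 SD? {outside_2sd}")
print(f"faces rejected by the 2 SE criterion: {rejected_faces}")
```

The standard error of the mean describes how well the mean is known, not how far a single new roll can fall from it, so under the 2-standard-error criterion every possible face of the die would be rejected.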
At 02:07 AM 12/6/2007, you wrote:

If I understand correctly, what Douglass et al. did makes the stronger assumption that
unforced variability is *insignificant*. Their statistical test is logically equivalent
to falsifying a climate model because it did not consistently predict a particular storm
on a particular day two years from now.

Dr. Carl Mears
Remote Sensing Systems
438 First Street, Suite 200, Santa Rosa, CA 95401
[1]mears@xxxxxxxxx.xxx
xxx xxxx xxxxx21
xxx xxxx xxxx(fax))

--
*Dr. Thomas R. Karl, L.H.D.*
*/Director/*
NOAA's National Climatic Data Center
Veach-Baley Federal Building
151 Patton Avenue
Asheville, NC 28xxx xxxx xxxx
Tel: (8xxx xxxx xxxx
Fax: (8xxx xxxx xxxx
[2]Thomas.R.Karl@xxxxxxxxx.xxx [3]<mailto:Thomas.R.Karl@xxxxxxxxx.xxx>


References

1. mailto:mears@xxxxxxxxx.xxx
2. mailto:Thomas.R.Karl@xxxxxxxxx.xxx
3. mailto:Thomas.R.Karl@xxxxxxxxx.xxx
4. mailto:Thomas.R.Karl@xxxxxxxxx.xxx

Original Filename: 1198443017.txt

From: Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>
To: John.Lanzante@xxxxxxxxx.xxx
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Sun, 23 Dec 2007 15:50:17 +0100
Cc: "Thomas.R.Karl" <Thomas.R.Karl@xxxxxxxxx.xxx>, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, santer1@xxxxxxxxx.xxx, Sherwood Steven <steven.sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <susan.solomon@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>

<x-flowed>
Dear all,

I have attached a plot which summarizes the recent developments
concerning tropical radiosonde temperature datasets and which could be
a candidate to be included in a reply to Douglass et al.
It contains trend profiles from unadjusted radiosondes, HadAT2-adjusted
radiosondes, RAOBCORE (versions 1.2-1.4) adjusted radiosondes
and from radiosondes adjusted with a neighbor composite method (RICH)
that uses the break dates detected with RAOBCORE (v1.4) as metadata.
RAOBCORE v1.2,v1.3 are documented in Haimberger (2007), RAOBCORE v1.4
and RICH are discussed in the manuscript I mentioned in my previous email.
Latitude range is 20S-20N, only time series with less than 24 months of
missing data are included. Spatial sampling of all curves is the same
except HadAT, which contains fewer stations that meet the 24-month
criterion. Sampling uncertainty of the trend curves is ca.
+/-0.1K/decade (95% percentiles estimated with bootstrap method).

RAOBCORE v1.3,1.4 and RICH are results from ongoing research and warming
trends from radiosondes may still be underestimated.
The upper tropospheric warming maxima from RICH are even larger (up to
0.35K/decade, not shown), if only radiosondes within the tropics
(20N-20S) are allowed as reference for adjustment of tropical radiosonde
temperatures. The pink/blue curves in the attached plot should therefore
not be regarded as upper bound of what may be achieved with plausible
choices of reference series for homogenization.

Please let me know your comments.

I wish you a merry Christmas.

With best regards

Leo

John Lanzante wrote:
> Ben,
>
> Perhaps a resampling test would be appropriate. The tests you have performed
> consist of pairing an observed time series (UAH or RSS MSU) with each one
> of 49 GCM time series from your "ensemble of opportunity". Significance
> of the difference between each pair of obs/GCM trends yields a certain
> number of "hits".
>
> To determine a baseline for judging how likely it would be to obtain the
> given number of hits one could perform a set of resampling trials by
> treating one of the ensemble members as a surrogate observation. For each
> trial, select at random one of the 49 GCM members to be the "observation".
> From the remaining 48 members draw a bootstrap sample of 49, and perform
> 49 tests, yielding a certain number of "hits". Repeat this many times to
> generate a distribution of "hits".
>
> The actual number of hits, based on the real observations could then be
> referenced to the Monte Carlo distribution to yield a probability that this
> could have occurred by chance. The basic idea is to see if the observed
> trend is inconsistent with the GCM ensemble of trends.
>
> There are a couple of additional tweaks that could be applied to your method.
> You are currently computing trends for each of the two time series in the
> pair and assessing the significance of their differences. Why not first
> create a difference time series and assess the significance of its trend?
> The advantage of this is that you would reduce somewhat the autocorrelation
> in the time series and hence the effect of the "degrees of freedom"
> adjustment. Since the GCM runs are based on coupled model runs this
> differencing would help remove the common externally forced variability,
> but not internally forced variability, so the adjustment would still be
> needed.
>
> Another tweak would be to alter the significance level used to assess
> differences in trends. Currently you are using the 5% level, which yields
> only a small number of hits. If you made this less stringent you would get
> potentially more weaker hits. But it would all come out in the wash so to
> speak since the number of hits in the Monte Carlo simulations would increase
> as well. I suspect that increasing the number of expected hits would make the
> whole procedure more powerful/efficient in a statistical sense since you
> would no longer be dealing with a "rare event". In the current scheme, using
> a 5% level with 49 pairings you have an expected hit rate of 0.05 X 49 = 2.45.
> For example, if instead you used a 20% significance level you would have an
> expected hit rate of 0.20 X 49 = 9.8.
>
> I hope this helps.
>
> On an unrelated matter, I'm wondering a bit about the different versions of
> Leo's new radiosonde dataset (RAOBCORE). I was surprised to see that the
> latest version has considerably more tropospheric warming than I recalled
> from an earlier version that was written up in JCLI in 2007. I have a
> couple of questions that I'd like to ask Leo. One concern is that if we use
> the latest version of RAOBCORE is there a paper that we can reference --
> if this is not in a peer-reviewed journal is there a paper in submission?
> The other question is: could you briefly comment on the differences in
> methodology used to generate the latest version of RAOBCORE as compared to
> the version used in JCLI 2007, and what/when/where did changes occur to
> yield a stronger warming trend?
>
> Best regards,
>
> ______John
>
>
>
> On Saturday 15 December 2007 12:21 pm, Thomas.R.Karl wrote:
>
>> Thanks Ben,
>>
>> You have the makings of a nice article.
>>
>> I note that we would expect to see 10 cases that are significantly different
>> by chance (based on the 196 tests at the .05 sig level). You found 3.
>> With appropriately corrected Leopold I suspect you will find there is
>> indeed stat sig. similar trends incl. amplification. Setting up the
>> statistical testing should be interesting with this many combinations.
>>
>> Regards, Tom
>>
>
>
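
The resampling test John describes in the quoted message above might be sketched as follows. Everything here is hypothetical: the 49 (trend, standard error) pairs are synthetic stand-ins for the model results, `actual_hits` is a placeholder, and the function names are chosen for illustration.

```python
import math
import random


def count_hits(obs, members, critical=1.96):
    """Number of members whose trend differs significantly from obs."""
    b_o, s_o = obs
    return sum(
        1
        for b, s in members
        if abs((b - b_o) / math.sqrt(s * s + s_o * s_o)) > critical
    )


def resampled_hit_distribution(members, n_trials=2000, seed=0):
    """Monte Carlo distribution of hits using surrogate observations."""
    rng = random.Random(seed)
    n = len(members)
    hits = []
    for _ in range(n_trials):
        i = rng.randrange(n)                 # pick one member as the "observation"
        surrogate = members[i]
        rest = members[:i] + members[i + 1:]
        # Bootstrap sample of n drawn (with replacement) from the remaining n - 1.
        sample = [rng.choice(rest) for _ in range(n)]
        hits.append(count_hits(surrogate, sample))
    return hits


# Synthetic ensemble: 49 hypothetical trends (K/decade) with a common standard error.
data_rng = random.Random(1)
members = [(data_rng.gauss(0.2, 0.08), 0.08) for _ in range(49)]

dist = resampled_hit_distribution(members)
actual_hits = 1   # placeholder for the hit count obtained with real observations
p_value = sum(h >= actual_hits for h in dist) / len(dist)
print(f"fraction of trials with at least {actual_hits} hit(s): {p_value:.3f}")
```

The fraction printed at the end is the reference against which the real hit count would be judged; raising the significance level, as John suggests, would simply change `critical` in both the real and the resampled tests.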

--
Ao. Univ. Prof. Dr. Leopold Haimberger
Institut für Meteorologie und Geophysik, Universität Wien
Althanstraße 14, A - 1090 Wien
Tel.: xxx xxxx xxxx
Fax.: xxx xxxx xxxx
http://mailbox.univie.ac.at/~haimbel7/


</x-flowed>

Attachment Converted: "c:documents and settingstim osbornmy documentseudoraattacht00_trendbeltbg_Tropics_1xxx xxxx xxxx_1.4.eps"

Original Filename: 1198790779.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Steven Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>
Subject: More significance testing
Date: Thu, 27 Dec 2007 16:26:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx

<x-flowed>
Dear folks,

This email briefly summarizes the trend significance test results. As I
mentioned in yesterday's email, I've added a new case (referred to as
"TYPE3" below). I've also added results for tests with a stipulated 10%
significance level. Here is the explanation of the four different types
of trend test:

1. "OBS-vs-MODEL": Observed MSU trends in RSS and UAH are tested against
trends in synthetic MSU data in 49 realizations of the 20c3m experiment.
Results from RSS and UAH are pooled, yielding a total of 98 tests for T2
trends and 98 tests for T2LT trends.

2. "MODEL-vs-MODEL (TYPE1)": Involves model data only. Trend in
synthetic MSU data in each of 49 20c3m realizations is tested against
each trend in the remaining 48 realizations (i.e., no trend tests
involving identical data). Yields a total of 49 x 48 = 2352 tests. The
significance of trend differences is a function of BOTH inter-model
differences (in climate sensitivity, applied 20c3m forcings, and the
amplitude of variability) AND "within-model" effects (i.e., is related
to the different manifestations of natural internal variability
superimposed on the underlying forced response).

3. "MODEL-vs-MODEL (TYPE2)": Involves model data only. Limited to the M
models with multiple realizations of the 20c3m experiment. For each of
these M models, the number of unique combinations C of N 20c3m
realizations into R trend pairs is determined. For example, in the case
of N = 5, C = N! / [ R!(N-R)! ] = 10. The significance of trend
differences is solely a function of "within-model" effects (i.e., is
related to the different manifestations of natural internal variability
superimposed on the underlying forced response). There are a total of 62
tests (not 124, as I erroneously reported yesterday!)

4. "MODEL-vs-MODEL (TYPE3)": Involves model data only. For each of the
19 models, only the first 20c3m realization is used. The trend in each
model's first 20c3m realization is tested against each trend in the
first 20c3m realization of the remaining 18 models. Yields a total of 19
x 18 = 342 tests. The significance of trend differences is solely a
function of inter-model differences (in climate sensitivity, applied
20c3m forcings, and the amplitude of variability).
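
The test counts quoted in the four descriptions above can be checked with a short sketch. It assumes R = 2 (trends are compared in pairs), which the N = 5 example implies but the email does not state outright. The TYPE2 total of 62 is not reproduced, because the ensemble size of each model is not listed here.

```python
from itertools import permutations
from math import comb

N_REALIZATIONS = 49  # 20c3m realizations (from the email)
N_MODELS = 19        # models with a first realization (from the email)
N_OBS = 2            # RSS and UAH

# OBS-vs-MODEL: each observed dataset is tested against each realization.
obs_vs_model = N_OBS * N_REALIZATIONS

# TYPE1: every realization against each of the other 48 (ordered pairs).
type1 = sum(1 for _ in permutations(range(N_REALIZATIONS), 2))

# TYPE2, for one model with N = 5 realizations: unordered pairs (R = 2 assumed).
type2_one_model = comb(5, 2)

# TYPE3: first realization of every model against the other 18 (ordered pairs).
type3 = sum(1 for _ in permutations(range(N_MODELS), 2))

print(obs_vs_model, type1, type2_one_model, type3)
```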

REJECTION RATES FOR STIPULATED 5% SIGNIFICANCE LEVEL
Test type No. of tests T2 "Hits" T2LT "Hits"
1. OBS-vs-MODEL 49 x xxx xxxx xxxx(xxx xxxx xxxx(2.04%xxx xxxx xxxx(1.02%)
2. MODEL-vs-MODEL (TYPExxx xxxx xxxxx 48 (23xxx xxxx xxxx(2.47%xxx xxxx xxxx(1.36%)
3. MODEL-vs-MODEL (TYPExxx xxxx xxxx(xxx xxxx xxxx(0.00%xxx xxxx xxxx(0.00%)
4. MODEL-vs-MODEL (TYPExxx xxxx xxxxx 18 (3xxx xxxx xxxx(6.43%xxx xxxx xxxx(4.09%)

REJECTION RATES FOR STIPULATED 10% SIGNIFICANCE LEVEL
Test type No. of tests T2 "Hits" T2LT "Hits"
1. OBS-vs-MODEL 49 x xxx xxxx xxxx(xxx xxxx xxxx(4.08%xxx xxxx xxxx(2.04%)
2. MODEL-vs-MODEL (TYPExxx xxxx xxxxx 48 (23xxx xxxx xxxx(3.40%xxx xxxx xxxx(1.96%)
3. MODEL-vs-MODEL (TYPExxx xxxx xxxx(xxx xxxx xxxx(1.61%xxx xxxx xxxx(0.00%)
4. MODEL-vs-MODEL (TYPExxx xxxx xxxxx 18 (3xxx xxxx xxxx(8.19%xxx xxxx xxxx(5.85%)

REJECTION RATES FOR STIPULATED 20% SIGNIFICANCE LEVEL
Test type No. of tests T2 "Hits" T2LT "Hits"
1. OBS-vs-MODEL 49 x xxx xxxx xxxx(xxx xxxx xxxx(7.14%xxx xxxx xxxx(5.10%)
2. MODEL-vs-MODEL (TYPExxx xxxx xxxxx 48 (23xxx xxxx xxxx(7.48%xxx xxxx xxxx(4.25%)
3. MODEL-vs-MODEL (TYPExxx xxxx xxxx(xxx xxxx xxxx(6.45%xxx xxxx xxxx(4.84%)
4. MODEL-vs-MODEL (TYPExxx xxxx xxxxx 18 (3xxx xxxx xxxx(12.28%xxx xxxx xxxx(8.19%)

Features of interest:

A) As you might expect, for each of the three significance levels, TYPE3
tests yield the highest rejection rates of the null hypothesis of "No
significant difference in trend". TYPE2 tests yield the lowest rejection
rates. This is simply telling us that the inter-model differences in
trends tend to be larger than the "between-realization" differences in
trends in any individual model.

B) Rejection rates for the model-versus-observed trend tests are
consistently LOWER than for the model-versus-model (TYPE3) tests. On
average, therefore, the tropospheric trend differences between the
observational datasets used here (RSS and UAH) and the synthetic MSU
temperatures calculated from 19 CMIP-3 models are actually LESS
SIGNIFICANT than the inter-model trend differences arising from
differences in sensitivity, 20c3m forcings, and levels of variability.

I also thought that it would be fun to use the model data to explore the
implications of Douglass et al.'s flawed statistical procedure. Recall
that Douglass et al. compare (in their Table III) the observed T2 and
T2LT trends in RSS and UAH with the overall means of the multi-model
distributions of T2 and T2LT trends. Their standard error, sigma{SE}, is
meant to represent an "estimate of the uncertainty of the mean" (i.e.,
the mean trend). sigma{SE} is given as:

sigma{SE} = sigma / sqrt{N - 1}

where sigma is the standard deviation of the model trends, and N is "the
number of independent models" (22 in their case). Douglass et al.
apparently estimate sigma using ensemble-mean trends for each model (if
20c3m ensembles are available).
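
As a worked illustration of how strongly this formula shrinks the spread, the value N = 22 quoted above can be substituted directly (plain arithmetic, with no further assumptions):

```latex
\sigma_{SE} = \frac{\sigma}{\sqrt{N-1}}, \qquad
N = 22 \;\Rightarrow\; \sigma_{SE} = \frac{\sigma}{\sqrt{21}} \approx 0.218\,\sigma
```

So the uncertainty assigned to the multi-model mean is roughly one fifth of the standard deviation of the individual model trends.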

So what happens if we apply this procedure using model data only? This
is rather easy to do. As above (in the TYPE1, TYPE2, and TYPE3 tests), I
simply used the synthetic MSU trends from the 19 CMIP-3 models employed
in our CCSP Report and in Santer et al. 2005 (so N = 19). For each
model, I calculated the ensemble-mean 20c3m trend over 1979 to 1999
(where multiple 20c3m realizations were available). Let's call these
mean trends b{j}, where j (the index over models) = 1, 2, .. 19.
Further, let's regard b{1} as the surrogate observations, and then use
Douglass et al.'s approach to test whether b{1} is significantly
different from the overall mean of the remaining 18 members of b{j}.
Then repeat with b{2} as surrogate observations, etc. For each
layer-averaged temperature series, this yields 19 tests of the
significance of differences in mean trends.

To give you a feel for this stuff, I've reproduced below the results for
tests involving T2LT trends. The "OBS" column is the ensemble-mean T2LT
trend in the surrogate observations. "MODAVE" is the overall mean trend
in the 18 remaining members of the distribution, and "SIGMA" is the
1-sigma standard deviation of these trends. "SIGMA{SE}" is 1 x
SIGMA{SE} (note that Douglass et al. give 2 x SIGMA{SE} in their Table
III; multiplying our SIGMA{SE} results by two gives values similar to
theirs). "NORMD" is simply the normalized difference (OBS-MODAVE) /
SIGMA{SE}, and "P-VALUE" is the p-value for the normalized difference,
assuming that this difference is approximately normally distributed.

MODEL "OBS" MODAVE SIGMA SIGMA{SE} NORMD P-VALUE

CCSM3.xxx xxxx xxxx.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.7xxx xxxx xxxx.0052

GFDL2.xxx xxxx xxxx.2xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.0359

GFDL2.xxx xxxx xxxx.3xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.4xxx xxxx xxxx.0000

GISS_EH 0.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.3xxx xxxx xxxx.0009

GISS_ER 0.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.3075
MIROC3.2_Txxx xxxx xxxx.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.3xxx xxxx xxxx.0000
MIROC3.2_T106 0.2xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.7xxx xxxx xxxx.4651
MRI2.3.2a 0.2xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.2xxx xxxx xxxx.0013

PCM 0.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.2xxx xxxx xxxx.0013

HADCMxxx xxxx xxxx.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.3018

HADGEMxxx xxxx xxxx.3xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.7xxx xxxx xxxx.0000

CCCMA3.xxx xxxx xxxx.4xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.1xxx xxxx xxxx.0000

CNRM3.xxx xxxx xxxx.2xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.2xxx xxxx xxxx.2019

CSIRO3.xxx xxxx xxxx.2xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.1xxx xxxx xxxx.0018
ECHAMxxx xxxx xxxx.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.4xxx xxxx xxxx.0000
IAP_FGOALS1.0 0.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.5xxx xxxx xxxx.1257
GISS_AOM 0.1xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.7xxx xxxx xxxx.0788
INMCM3.xxx xxxx xxxx.0xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.0000
IPSL_CMxxx xxxx xxxx.2xxx xxxx xxxx.2xxx xxxx xxxx.0xxx xxxx xxxx.0xxx xxxx xxxx.5xxx xxxx xxxx.5920

T2LT: No. of p-values .le. 0.05: 12. Rejection rate: 63.16%
T2LT: No. of p-values .le. 0.10: 13. Rejection rate: 68.42%
T2LT: No. of p-values .le. 0.20: 14. Rejection rate: 73.68%

The corresponding rejection rates for the tests involving T2 data are:

T2: No. of p-values .le. 0.05: 12. Rejection rate: 63.16%
T2: No. of p-values .le. 0.10: 13. Rejection rate: 68.42%
T2: No. of p-values .le. 0.20: 15. Rejection rate: 78.95%
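
The leave-one-out procedure and the rejection-rate tally described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the email's results: it assumes the sample standard deviation for SIGMA, takes N as the number of remaining models, and uses a two-sided normal p-value. The email does not specify any of these choices. The function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def douglass_style_test(trends, index):
    """Treat trends[index] as surrogate observations; test it against the rest."""
    obs = trends[index]
    rest = trends[:index] + trends[index + 1:]
    modave = mean(rest)
    sigma = stdev(rest)                     # assumption: sample standard deviation
    sigma_se = sigma / sqrt(len(rest) - 1)  # assumption: N = number of remaining models
    normd = (obs - modave) / sigma_se
    p_value = erfc(abs(normd) / sqrt(2))    # assumption: two-sided normal p-value
    return modave, sigma, sigma_se, normd, p_value


def rejection_rate(trends, alpha):
    """Count p-values at or below alpha; return the count and the rate in percent."""
    hits = sum(
        douglass_style_test(trends, j)[4] <= alpha for j in range(len(trends))
    )
    return hits, 100.0 * hits / len(trends)
```

Called with a list of 19 ensemble-mean trends, `rejection_rate(trends, 0.05)` returns the number of "hits" and the percentage. For reference, 12 hits out of 19 tests corresponds to the 63.16% quoted above.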

Bottom line: If we applied Douglass et al.'s ridiculous test of
difference in mean trends to model data only - in fact, to virtually the
same model data they used in their paper - one would conclude that
nearly two-thirds of the individual models had trends that were
significantly different from the multi-model mean trend! To follow
Douglass et al.'s flawed logic, this would mean that two-thirds of the
models really aren't models after all...

Happy New Year to all of you!

With best regards,

Ben
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1198984230.txt

From: Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>
To: santer1@xxxxxxxxx.xxx
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Sat, 29 Dec 2007 22:10:30 +0100
Cc: John.Lanzante@xxxxxxxxx.xxx, "Thomas.R.Karl" <Thomas.R.Karl@xxxxxxxxx.xxx>, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Sherwood Steven <steven.sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <susan.solomon@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>

<x-flowed>
Ben,

I have attached the tropical mean trend profiles, now for the period
1xxx xxxx xxxx.

RAOBCORE versions show much more upper tropospheric heating for this
period, RICH shows slightly more heating.
Note also stronger cooling of unadjusted radiosondes in stratospheric
layers compared to 1xxx xxxx xxxx.

Just for information I have included also zonal mean trend plots for the
unadjusted radiosondes (tm), RAOBCORE v1.4 (tmcorr) and RICH (rgmra).
I do not suggest that these plots should be included, but some of you
may want to know about the spatial coherence
of the zonal mean trends. It is interesting to see the lower
tropospheric warming minimum in the tropics in all three plots,
which I cannot explain. I believe it is spurious but it is remarkably
robust against my adjustment efforts.

Meridional resolution is 10 degrees.
As you can imagine, the tropical upper tropospheric heating maximum at
5S and the cooling in the unadjusted radiosondes at 5N are
based on very few long records in these belts. 2-3 in 5S, about 5 in 5N.

Best regards and I wish you all a happy new year.

Leo


Ben Santer wrote:
> Dear Leo,
>
> The Figure that you sent is extremely informative, and would be great
> to include in a response to Douglass et al. The Figure clearly
> illustrates that the "structural uncertainties" inherent in
> radiosonde-based estimates of tropospheric temperature change are much
> larger than Douglass et al. have claimed. This is an important point
> to make.
>
> Would it be possible to produce a version of this Figure showing
> results for the period 1979 to 1999 (the period that I've used for
> testing the significance of model-versus-observed trend differences)
> instead of 1979 to 2004?
>
> With best regards, and frohes Neues Jahr!
>
> Ben
> Leopold Haimberger wrote:
>> Dear all,
>>
>> I have attached a plot which summarizes the recent developments
>> concerning tropical radiosonde temperature datasets and which could
>> be a candidate to be included in a reply to Douglass et al.
>> It contains trend profiles from unadjusted radiosondes,
>> HadAT2-adjusted radiosondes, RAOBCORE (versions 1.2-1.4) adjusted
>> radiosondes
>> and from radiosondes adjusted with a neighbor composite method (RICH)
>> that uses the break dates detected with RAOBCORE (v1.4) as metadata.
>> RAOBCORE v1.2,v1.3 are documented in Haimberger (2007), RAOBCORE v1.4
>> and RICH are discussed in the manuscript I mentioned in my previous
>> email.
>> Latitude range is 20S-20N, only time series with less than 24 months
>> of missing data are included. Spatial sampling of all curves is the
>> same except HadAT, which contains fewer stations that meet the 24-month
>> criterion. Sampling uncertainty of the trend curves is ca.
>> +/-0.1K/decade (95% percentiles estimated with bootstrap method).
>>
>> RAOBCORE v1.3,1.4 and RICH are results from ongoing research and
>> warming trends from radiosondes may still be underestimated.
>> The upper tropospheric warming maxima from RICH are even larger (up
>> to 0.35K/decade, not shown), if only radiosondes within the tropics
>> (20N-20S) are allowed as reference for adjustment of tropical
>> radiosonde temperatures. The pink/blue curves in the attached plot
>> should therefore not be regarded as upper bound of what may be
>> achieved with plausible choices of reference series for homogenization.
>> Please let me know your comments.
>>
>> I wish you a merry Christmas.
>>
>> With best regards
>>
>> Leo
>>
>> John Lanzante wrote:
>>> Ben,
>>>
>>> Perhaps a resampling test would be appropriate. The tests you have
>>> performed
>>> consist of pairing an observed time series (UAH or RSS MSU) with
>>> each one
>>> of 49 GCM time series from your "ensemble of opportunity".
>>> Significance
>>> of the difference between each pair of obs/GCM trends yields a certain
>>> number of "hits".
>>>
>>> To determine a baseline for judging how likely it would be to obtain
>>> the
>>> given number of hits one could perform a set of resampling trials by
>>> treating one of the ensemble members as a surrogate observation. For
>>> each
>>> trial, select at random one of the 49 GCM members to be the
>>> "observation".
>>> From the remaining 48 members draw a bootstrap sample of 49, and
>>> perform
>>> 49 tests, yielding a certain number of "hits". Repeat this many
>>> times to
>>> generate a distribution of "hits".
>>>
>>> The actual number of hits, based on the real observations could then be
>>> referenced to the Monte Carlo distribution to yield a probability
>>> that this
>>> could have occurred by chance. The basic idea is to see if the observed
>>> trend is inconsistent with the GCM ensemble of trends.
>>>
>>> There are a couple of additional tweaks that could be applied to
>>> your method.
>>> You are currently computing trends for each of the two time series
>>> in the
>>> pair and assessing the significance of their differences. Why not first
>>> create a difference time series and assess the significance of its
>>> trend?
>>> The advantage of this is that you would reduce somewhat the
>>> autocorrelation
>>> in the time series and hence the effect of the "degrees of freedom"
>>> adjustment. Since the GCM runs are based on coupled model runs this
>>> differencing would help remove the common externally forced
>>> variability,
>>> but not internally forced variability, so the adjustment would still be
>>> needed.
>>>
>>> Another tweak would be to alter the significance level used to assess
>>> differences in trends. Currently you are using the 5% level, which
>>> yields
>>> only a small number of hits. If you made this less stringent you
>>> would get
>>> potentially more, though weaker, hits. But it would all come out in the wash
>>> so to
>>> speak since the number of hits in the Monte Carlo simulations would
>>> increase
>>> as well. I suspect that increasing the number of expected hits would
>>> make the
>>> whole procedure more powerful/efficient in a statistical sense since
>>> you
>>> would no longer be dealing with a "rare event". In the current
>>> scheme, using
>>> a 5% level with 49 pairings you have an expected hit rate of 0.05 X
>>> 49 = 2.45.
>>> For example, if instead you used a 20% significance level you would
>>> have an
>>> expected hit rate of 0.20 X 49 = 9.8.
>>>
>>> I hope this helps.
>>>
>>> On an unrelated matter, I'm wondering a bit about the different
>>> versions of
>>> Leo's new radiosonde dataset (RAOBCORE). I was surprised to see that
>>> the
>>> latest version has considerably more tropospheric warming than I
>>> recalled
>>> from an earlier version that was written up in JCLI in 2007. I have a
>>> couple of questions that I'd like to ask Leo. One concern is that if
>>> we use
>>> the latest version of RAOBCORE is there a paper that we can
>>> reference --
>>> if this is not in a peer-reviewed journal is there a paper in
>>> submission?
>>> The other question is: could you briefly comment on the differences
>>> in methodology used to generate the latest version of RAOBCORE as
>>> compared to the version used in JCLI 2007, and what/when/where did
>>> changes occur to
>>> yield a stronger warming trend?
>>>
>>> Best regards,
>>>
>>> ______John
>>>
>>>
>>>
>>> On Saturday 15 December 2007 12:21 pm, Thomas.R.Karl wrote:
>>>
>>>> Thanks Ben,
>>>>
>>>> You have the makings of a nice article.
>>>>
>>>> I note that we would expect 10 cases that are significantly
>>>> different by chance (based on the 196 tests at the .05 sig level).
>>>> You found 3. With appropriately corrected Leopold I suspect you
>>>> will find there is indeed stat sig. similar trends incl.
>>>> amplification. Setting up the statistical testing should be
>>>> interesting with this many combinations.
>>>>
>>>> Regards, Tom
>>>>
>>>
>>>
>>
>
>
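
The resampling scheme proposed in John Lanzante's quoted message above (pick one ensemble member as the surrogate observation, bootstrap from the rest, count "hits", repeat) can be sketched as below. This is only an illustration. The pairwise test `is_hit` is a hypothetical stand-in that compares two trends using their standard errors; the trend test actually used in this thread is not specified here. Each ensemble member is assumed to be a `(trend, standard_error)` tuple.

```python
import random


def is_hit(trend_a, se_a, trend_b, se_b, z_crit=1.96):
    """Hypothetical pairwise test: do two trends differ by more than z_crit combined SEs?"""
    return abs(trend_a - trend_b) > z_crit * (se_a ** 2 + se_b ** 2) ** 0.5


def hit_distribution(members, n_trials=1000, seed=0):
    """Monte Carlo distribution of hit counts with a surrogate observation."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        # Select one member at random to play the role of the observation.
        obs_idx = rng.randrange(len(members))
        obs = members[obs_idx]
        rest = members[:obs_idx] + members[obs_idx + 1:]
        # Bootstrap sample (with replacement) of the original ensemble size.
        sample = rng.choices(rest, k=len(members))
        counts.append(sum(is_hit(*obs, *m) for m in sample))
    return counts


def chance_probability(actual_hits, counts):
    """Fraction of trials with at least as many hits as actually observed."""
    return sum(c >= actual_hits for c in counts) / len(counts)
```

With 49 members, each trial performs 49 tests, as in the proposal. `chance_probability` then places the actual number of hits within the Monte Carlo distribution.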

--
Ao. Univ. Prof. Dr. Leopold Haimberger
Institut für Meteorologie und Geophysik, Universität Wien
Althanstraße 14, A - 1090 Wien
Tel.: xxx xxxx xxxx
Fax.: xxx xxxx xxxx
http://mailbox.univie.ac.at/~haimbel7/


</x-flowed>

Attachment Converted: "c:documents and settingstim osbornmy documentseudoraattacht00_trendbeltbg_Tropics_1xxx xxxx xxxx_v1_4.eps"

Attachment Converted: "c:documents and settingstim osbornmy documentseudoraattacht00_trendzonalGlobe_tmcorr_1xxx xxxx xxxx.ps"

Attachment Converted: "c:documents and settingstim osbornmy documentseudoraattacht00_trendzonalGlobe_rgmra_1xxx xxxx xxxx.ps"

Attachment Converted: "c:documents and settingstim osbornmy documentseudoraattacht00_trendzonalGlobe_tm_1xxx xxxx xxxx.ps"

Original Filename: 1199027884.txt

From: Susan Solomon <Susan.Solomon@xxxxxxxxx.xxx>
To: Tom Wigley <wigley@xxxxxxxxx.xxx>, "Thomas.R.Karl" <Thomas.R.Karl@xxxxxxxxx.xxx>
Subject: Re: Douglass et al. paper
Date: Sun, 30 Dec 2007 10:18:xxx xxxx xxxx
Cc: John.Lanzante@xxxxxxxxx.xxx, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, santer1@xxxxxxxxx.xxx, Sherwood Steven <steven.sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, myles <m.allen1@xxxxxxxxx.xxx>, Bill Fulkerson <wfulk@xxxxxxxxx.xxx>

<x-flowed>
Dear All,

Thanks very much for the helpful discussion on these issues.

I write to make a point that may not be well
recognized regarding the character of the
temperature trends in the lowermost
stratosphere/upper troposphere. I have already
discussed this with Ben but want to share with
others since I believe it is relevant to this
controversy at least at some altitudes. The
question I want to raise is not related to the
very important dialogue on how to handle the
errors and the statistics, but rather how to
think about the models.

The attached paper by Forster et al. appeared
recently in GRL. It taught me something I
didn't realize, namely that ozone losses and
accompanying temperature trends at higher
altitudes can strongly affect lower altitudes,
through the influence of downwelling longwave.
There is now much evidence that ozone has
decreased significantly in the tropics near 70
mbar. What we show in the attached paper by
Forster et al is that ozone depletion near 70
mbar affects temperatures not only at that level,
but also down to lower altitudes. I think this
is bound to be important to the tropical
temperature trends at least in the xxx xxxx xxxxmbar
height range, possibly lower down as well,
depending upon the degree to which there is a
'substratosphere' that is more radiatively
influenced than the rest of the troposphere.
Whether it can have an influence as low as 200
mbar - I don't know. But note that having an
influence could mean reducing the warming there,
not necessarily flipping it over to a net
cooling. This 'long-distance' physics, whereby
ozone depletion and associated cooling up high
can affect the thermal structure lower down, is
not a point I had understood despite many years
of studying the problem so I thought it
worthwhile to point it out to you here. It has
often been said (I probably said it myself five
years ago) that ozone losses and associated
cooling can't happen or aren't important in this
region - but that is wrong.

Further, the fundamental point made in the paper
of Thompson and Solomon a few years back remains
worth noting, and is, I believe, now resolved in
the more recent Forster et al paper: that the
broad structure of the temperature trends, with
quite large cooling in the lowermost stratosphere
in the tropics, comparable to that seen at higher
latitudes, is a feature NOT explained by e.g. CO2
cooling, but now can be explained by the observed
ozone losses. Exactly how big the tropical
cooling is, and exactly how low down it goes,
remains open to quantitative question and
improvement of radiosonde datasets. But I
believe the fundamental point we made in 2005
remains true: the temperature trends in the
lower stratosphere in the tropics are, even with
corrections, quite comparable to those seen at
other latitudes. We can now say it is surely
linked to the now-well-observed trends in ozone
there. The new paper further shows that you
don't have to have ozone trends at 100 mbar to
have a cooling there, due to down-welling
longwave, possibly lower down still. Whether
enhanced upwelling is a factor is a central
question.

No global general circulation model can possibly
be expected to simulate this correctly unless it
has interactive ozone, or prescribes an observed
tropical ozone trend. The AR4 models did not
include this, and any 'discrepancies' are not
relevant at all to the issue of the fidelity of
those models for global warming. So in closing
let me just say that just how low down this
effect goes needs more study, but that it does
happen and is relevant to the key problem of
tropical temperature trends is a point that I hope
this email has clarified.

Happy new year,
Susan


At 6:13 PM -0700 12/29/07, Tom Wigley wrote:
>Tom,
>
>Yes -- I had this in an earlier version, but I did not want to
>overwhelm people with the myriad errors in the D et al. paper.
>
>I liked the attached item -- also in an earlier version.
>
>Tom.
>
>+++++++++++++
>
>Thomas.R.Karl wrote:
>
>>Tom,
>>
>>This is a very nice set of slides clearly
>>showing the problem with the Douglass et al
>>paper. One other aspect of this issue that
>>John L has mentioned and we discussed when we
>>were doing SAP 1.1 relates to difference
>>series. I am not sure whether Ben was
>>calculating the significance of the difference
>>series between sets of observations and model
>>simulations (annually). This would help offset
>>the effects of El-Nino and Volcanoes on the
>>trends.
>>
>>Tom K.
>>
>>Tom Wigley said the following on 12/29/2007 1:05 PM:
>>
>>>Dear all,
>>>
>>>I was recently at a meeting in Rome where Fred Singer was a participant.
>>>He was not on the speaker list, but, in
>>>advance of the meeting, I had thought
>>>he might raise the issue of the Douglass et
>>>al. paper. I therefore prepared the
>>>attached power point -- modified slightly since returning from Rome. As it
>>>happened, Singer did not raise the Douglass et al. issue, so I did not use
>>>the ppt. Still, it may be useful for members
>>>of this group so I am sending it
>>>to you all.
>>>
>>>Please keep this in confidence. I do not want
>>>it to get back to Singer or any
>>>of the Douglass et al. co-authors -- at least
>>>not at this stage while Ben is still
>>>working on a paper to rebut the Douglass et al. claims.
>>>
>>>On slide 6 I have attributed the die tossing
>>>argument to Carl Mears -- but, in
>>>looking back at my emails I can't find the
>>>original. If I've got this attribution
>>>wrong, please let me know.
>>>
>>>Other comments are welcome. Mike MacCracken and Ben helped in putting
>>>this together -- thanks to both.
>>>
>>>Tom.
>>>
>>>++++++++++++++++++++++++++++++++++++++++
>>
>>
>>--
>>
>>*Dr. Thomas R. Karl, L.H.D.*
>>
>>*/Director/*//
>>
>>NOAA's National Climatic Data Center
>>
>>Veach-Baley Federal Building
>>
>>151 Patton Avenue
>>
>>Asheville, NC 28xxx xxxx xxxx
>>
>>Tel: (8xxx xxxx xxxx
>>
>>Fax: (8xxx xxxx xxxx
>>
>>Thomas.R.Karl@xxxxxxxxx.xxx <mailto:Thomas.R.Karl@xxxxxxxxx.xxx>
>>
>
>
>
>Attachment converted: Junior:Comment on Douglass.ppt (SLD3/

Original Filename: 1199286511.txt

From: Peter Thorne <peter.thorne@xxxxxxxxx.xxx>
To: Susan Solomon <Susan.Solomon@xxxxxxxxx.xxx>
Subject: Re: Douglass et al. paper
Date: Wed, 02 Jan 2008 10:08:31 +0000
Cc: Tom Wigley <wigley@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, Carl Mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, Dian Seidel <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <melissa.free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Phil Jones <p.jones@xxxxxxxxx.xxx>, Ben Santer <santer1@xxxxxxxxx.xxx>, Steve Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Myles Allen <m.allen1@xxxxxxxxx.xxx>, Bill Fulkerson <wfulk@xxxxxxxxx.xxx>

Susan et al.,

I had also seen the Forster et al paper and was glad to see he had
followed up on work and ideas we had discussed some years ago when he
was at Reading and from the Exeter workshop. At the time I had done some
simple research on whether the stratosphere could affect the tropical
troposphere - possibly through convection modification or radiative
cooling. I'd done a simple timeseries regression of T2LT=a*Tsurf+b*T4+c
and got some regression coefficients out that suggested an influence.
Now, this was with old and now discredited data and the Fu et al.
technique has since superseded it to some extent (or at least cast
considerable doubt upon its efficacy) ... it would certainly be hard to
prove in a regression what was cause and effect with such broad
weighting functions even using T2LT which still isn't *really*
independent from T4.
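
For readers unfamiliar with the regression form mentioned above, T2LT = a*Tsurf + b*T4 + c, the following is a minimal least-squares sketch using the normal equations. It is not the original analysis: the function name and any input series are hypothetical. As the paragraph above notes, the fitted coefficients alone cannot establish cause and effect.

```python
def fit_two_predictor_regression(tsurf, t4, t2lt):
    """Least-squares fit of t2lt = a*tsurf + b*t4 + c via the normal equations."""
    rows = [(x1, x2, 1.0) for x1, x2 in zip(tsurf, t4)]
    # Augmented 3x4 matrix for (X^T X) beta = X^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, t2lt))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    # Fails with ZeroDivisionError if the predictors are exactly collinear.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(3):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    a, b, c = (aug[i][3] / aug[i][i] for i in range(3))
    return a, b, c
```

The three inputs are equal-length time series. A nonzero `b` only indicates a statistical association between T2LT and T4 once Tsurf is accounted for.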

But one thing I did do to try to "prove" the regression result was real
is take the composite differences between QBO phases on 45 years of
detrended (can't remember exactly how but I think I took differences
from decadally filtered data) data from radiosondes (HadAT1 at the
time). This showed a really very interesting result and suggested that
this communication, if it was real, went quite far down into the
troposphere and was statistically significant, particularly in those
seasons when the ITCZ and QBO were geographically coincident. I attach
the slide for interest. I think this is the only scientifically valid
part of the analysis that I would stand by today given the rather
massive developments since. I doubt that raobs inhomogeneities could
explain the plot result as they project much more onto the trend than
they would onto this type of analysis.

The cooling stratosphere may really have an influence even quite low
down if this QBO composite technique is a good analogue for a cooling
stratosphere's impact, and timeseries regression analysis supports it in
some obs (it would be interesting to repeat such an analysis with the
newer obs but I don't have time). A counter, however, is that surely the
models do radiation so those with ozone loss should do a good job of
this effect. This could be checked in Ben's ensemble in a poor man's
sense at least because some have ozone depletion and some don't.

The only way this could be a real factor not picked up by the models, I
concluded at the time, is if models are far too keen to trigger
convection and that any real-world increased radiative cooling
efficiency effect is masked in the models because they convect far too
often and regain CAPE closure as a condition.

On another matter, we seem to be concentrating entirely on layer-average
temperatures. This is fine, but we know from CCSP these show little in
the way of differences. The key, and much harder test is to capture the
differences in behaviour between layers / levels - the "amplification"
behaviour. This was the focus of Santer et al. and I still believe is
the key scientific question given that each model realisation is
inherently so different but that we believe the physics determining the
temperature profile to be the key test that has to be answered. Maybe we
need to step back and rephrase the question in terms of the physics
rather than aiming solely to rebut Douglass et al? In this case the key
physical questions in my view would be:

1. Why is there such strong evidence from sondes for a minimum at c. 500?
Is this because it is near the triple point of water in the tropics? Or
at the top of the shallow convection? Or simply an artefact? [I don't
have any good ideas how we would answer the first two of these
questions]

2. Is there really a stratospheric radiative influence? If so, how low
does it go? What is the cause? Are the numbers consistent with the
underlying governing physics or simply an artefact of residual obs
errors?

3. Can any models show trend behaviour that deviates from a SALR on
multi-decadal timescales? If so, what is it about the model that causes
this effect? Physics? Forcings? Phasing of natural variability? Is it
also true on shorter timescales in this model?

It seems to me that trying to do an analysis based upon such physical
understanding / questions will clarify things far better than simply
doing another set of statistical analysis. I'm still particularly
interested if #2 is really true in the raobs (it's not possible to do
with satellites I suspect, but if it is true it means we need to
massively rethink Fu et al. type analysis at least in the tropics) and
would be interested in helping someone follow up on that ... I think in
the future the Forster et al paper may be seen as the more
scientifically significant result when Douglass et al is no longer cared
about ...

Happy new year to you all.

Peter
--
Peter Thorne Climate Research Scientist
Met Office Hadley Centre, FitzRoy Road, Exeter, EX1 3PB
tel. xxx xxxx xxxx fax xxx xxxx xxxx
www.metoffice.gov.uk/hadobs


Attachment Converted: "c:eudoraattachqbo_slide.ppt"

Original Filename: 1199303943.txt

From: Susan Solomon <Susan.Solomon@xxxxxxxxx.xxx>
To: P.Jones@xxxxxxxxx.xxx, Kevin Trenberth <trenbert@xxxxxxxxx.xxx>
Subject: Re: urban stuff
Date: Wed, 02 Jan 2008 14:59:xxx xxxx xxxx
Cc: Phil Jones <p.jones@xxxxxxxxx.xxx>

<x-flowed>
Phil
Thanks for the Benestad reference, which I hadn't seen and will read
with interest.

Please keep me in the loop on your reprints.

I'm aware of the work with Dave Thompson, which is very interesting.

Happy new year to you too.

We can all look back on 2007 as a year in which we, the scientists,
did a fantastic job.
best
Susan



At 8:59 PM +0000 1/2/08, P.Jones@xxxxxxxxx.xxx wrote:
> Kevin, Susan,
> Working on several things at the moment, so won't
> have much time for a few weeks. Rasmus Benestad of
> the Norwegian Met Service wrote a paper on a very similar
> earlier version of this McKitrick/Michaels paper (both
> were in Climate Research). There is nothing new in this
> paper in JGR.
> The only thing new in both this JGR paper and the
> Douglass et al one in IJC is the awful reviewing!!!!
> Rebuttals help, but often the damage is done once the
> paper comes out. The MM paper is bad, but the reviewing
> is even worse. Why did MM refer to an erratum on their
> paper which is essentially the same? Any reviewer worth
> any salt should have spotted that and then they would have
> seen the Benestad comment, which MM surprisingly don't refer to.
>
> I'm hoping to submit a paper on urbanization soon -
> based on work with Chinese series - this relates to the
> fraud allegation against Wei-Chyung Wang that Kevin knows
> about.
>
> Also should be a press release tomorrow or Friday about
> the forecast for 2008 temperatures. La Nina looks like making
> it coolish - cooler just than all years since 2001 (including
> 2001) and 1998. Pointing out that 2xxx xxxx xxxxis 0.21 warmer
> than 1xxx xxxx xxxxwhich is exactly as it should be with ghg-related
> warming of 0.2 per decade.
>
> [Also working on something with Dave Thompson (Dave's leading)
> that will have an ENSO-factored out (and COWL) global T series.]
>
>
> We're (with the Met Office) extending the press release
> due to the silly coverage in mid-December about global warming
> ending, as all years since 1998 are cooler than it. Mostly this
> was by people just parroting the same message from the same
> people. It is a case of people who should know better (and check
> their sources) just copying from people who don't know any
> better.
>
> Oh - forgot - Happy New Year!
>
> Any pictures on the IPCC web site of Oslo on Dec 10 !
>
> Patchy is on the front cover of the last issue of 2007 in Nature.
>
> Cheers
> Phil
>
>
> Susan
>> Not me. Phil has been involved in various stuff related to this but I
>> am not up to speed. I'll cc him.
>> I recall some exchanges a while ago now.
>> Kevin
>>
>> Susan Solomon wrote:
>>> Kevin
>>> Happy new year to you. All's well here. Have you or other
>>> colleagues organized a rebuttal to the McKitrick and Michaels JGR 2007
>>> material on urbanization? It's getting exposure, along with the
>>> Douglass et al. paper. On the latter, you probably know Ben Santer is
>>> preparing one.
>>> best
>>> Susan
>>
>> --
>> ****************
>> Kevin E. Trenberth e-mail: trenbert@xxxxxxxxx.xxx
>> Climate Analysis Section, www.cgd.ucar.edu/cas/trenbert.html
>> NCAR
>> P. O. Box 3000, (3xxx xxxx xxxx
>> Boulder, CO 80xxx xxxx xxxx (3xxx xxxx xxxx(fax)
>>
>> Street address: 1850 Table Mesa Drive, Boulder, CO 80305
>>
>>
>>

</x-flowed>

Original Filename: 1199325151.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: "Thomas.R.Karl" <Thomas.R.Karl@xxxxxxxxx.xxx>
Subject: Re: More significance testing stuff
Date: Wed, 02 Jan 2008 20:52:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: John.Lanzante@xxxxxxxxx.xxx, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Sherwood Steven <steven.sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <Susan.Solomon@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>

<x-flowed>
Dear Tom,

In the end, I decided to test the significance of trends in the O(t)
minus M(t) difference time series, as you and John Lanzante have
suggested. I still think that this "difference series test" is more
appropriate when one is operating on a pair of time series with
correlated variability (for example, if you wished to test whether an
observed tropical T2LT trend was significantly different from the T2LT
trend simulated in an AMIP experiment). But you and John convinced me
that our response to Douglass et al. would be strengthened by using
several different approaches to address the statistical significance of
differences between modeled and observed temperature trends.

The Tables given below show the results from two different types of
test. You've already seen the "TYPE1" or "PAIRED TREND" results. These
involve b{O} and b{M}, which represent any single pair of Observed and
Modeled trends, with standard errors s{bO} and s{bM} (which are adjusted
for temporal autocorrelation effects). As in our previous work (and as
in related work by John Lanzante), we define the normalized trend
difference d1 as:

d1 = (b{O} - b{M}) / sqrt[ (s{bO})**2 + (s{bM})**2 ]

Under the assumption that d1 is normally distributed, values of d1 >
+1.96 or < -1.96 indicate observed-minus-model trend differences that
are significant at the 5% level, and one can easily calculate a p-value
for each value of d1. These p-values for the 98 pairs of trend tests (49
involving UAH data and 49 involving RSS data) are what we use for
determining the total number of "hits", or rejections of the null
hypothesis of no significant difference between modeled and observed
trends. I note that each test is two-tailed, since we have no
information a priori about the "direction" of the model trend (i.e.,
whether we expect the simulated trend to be significantly larger or
smaller than observed).
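
As a minimal sketch of the "PAIRED TREND" test described above, the following Python computes d1 and its two-tailed p-value under the normality assumption. The function name and the numerical inputs are hypothetical and chosen only for illustration; they are not values from the analysis.

```python
import math

def paired_trend_test(b_obs, b_mod, s_obs, s_mod):
    """Return (d1, p) for one observed/modeled trend pair.

    s_obs and s_mod are assumed to be standard errors that have already
    been adjusted for temporal autocorrelation.
    """
    d1 = (b_obs - b_mod) / math.sqrt(s_obs**2 + s_mod**2)
    # Two-tailed p-value for a standard normal variate:
    # p = 2 * (1 - Phi(|d1|)) = erfc(|d1| / sqrt(2))
    p = math.erfc(abs(d1) / math.sqrt(2.0))
    return d1, p

# Illustrative (made-up) trends and standard errors, degrees C per decade.
d1, p = paired_trend_test(b_obs=0.06, b_mod=0.20, s_obs=0.10, s_mod=0.08)
reject_at_5pct = p < 0.05  # a "hit" at the stipulated 5% level
```

Counting the pairs for which `p` falls below the stipulated level, and dividing by the number of tests, would give a rejection rate of the kind tabulated below.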

The "TYPE2" results are the "DIFFERENCE SERIES" tests. These involve
O(t) and M(t), which represent any single pair of modeled and observed
layer-averaged temperature time series. One first defines the difference
time series D(t) = O(t) - M(t), and then calculates the trend b{D} in
D(t) and its adjusted standard error, s{bD}. The test statistic is then
simply d2 = b{D} / s{bD}. As in the case of the "PAIRED TREND" tests, we
assume that d2 is normally distributed, and then calculate p-values for
the 98 pairs of difference series tests.
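
The "DIFFERENCE SERIES" test can be sketched in the same way. The message does not spell out how the standard error is adjusted, so the code below assumes one common choice: an effective sample size based on the lag-1 autocorrelation of the regression residuals. All function names and the synthetic input series are hypothetical, and trends are expressed per time step.

```python
import math

def trend_and_adjusted_se(y):
    """Least-squares trend of y (per time step) and its standard error.

    Assumed adjustment: the sample size n is replaced by an effective
    sample size n_eff = n * (1 - r1) / (1 + r1), where r1 is the lag-1
    autocorrelation of the residuals. Requires n_eff > 2.
    """
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    resid = [v - y_mean - b * (t - t_mean) for t, v in enumerate(y)]
    sse = sum(e * e for e in resid)
    # Lag-1 autocorrelation of the residuals (their mean is zero).
    r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / sse
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    s_e2 = sse / (n_eff - 2.0)
    return b, math.sqrt(s_e2 / sxx)

def difference_series_test(obs, mod):
    """Return (d2, p) for the trend in D(t) = O(t) - M(t)."""
    d = [o - m for o, m in zip(obs, mod)]
    b_d, s_bd = trend_and_adjusted_se(d)
    d2 = b_d / s_bd
    p = math.erfc(abs(d2) / math.sqrt(2.0))  # two-tailed, normal assumption
    return d2, p

# Synthetic stand-in series: a trending "observation", a trendless "model".
n = 60
obs = [0.1 * t + 0.5 * (-1) ** t for t in range(n)]
mod = [0.1 * ((7 * t) % 5 - 2) for t in range(n)]
d2, p = difference_series_test(obs, mod)
```

With strongly positive residual autocorrelation, `n_eff` can approach or fall below 2, in which case this simple adjustment breaks down and the sketch would need a guard.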

As I mentioned in a previous email, the interpretation of the
"DIFFERENCE SERIES" tests is a little complicated. Over half (35) of the
49 model simulations examined in the CCSP report include some form of
volcanic forcing. In these 35 cases, differencing the O(t) and M(t) time
series reduces the amplitude of this externally-forced component in
D(t). This will tend to reduce the overall temporal variability of D(t),
and hence reduce s{bD}, the standard error of the trend in D(t). Such
noise reduction should make it easier to identify true differences in
the anthropogenically-forced components of b{O} and b{M}. But since the
internally-generated variability in O(t) and M(t) is uncorrelated,
differencing O(t) and M(t) has the opposite effect of amplifying the
noise, thus inflating s{bD} and making it more difficult to identify
model-versus-observed trend differences.
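
The two competing effects can be made explicit with a simple variance decomposition. As a sketch, assume (this notation is introduced here for illustration only) that each series is the sum of a forced component and internal noise, O(t) = F_O(t) + N_O(t) and M(t) = F_M(t) + N_M(t), with noise variances sigma_O^2 and sigma_M^2 and noise correlation rho:

```latex
\begin{align}
D(t) &= O(t) - M(t)
      = \bigl[F_O(t) - F_M(t)\bigr] + \bigl[N_O(t) - N_M(t)\bigr], \\
\operatorname{Var}\bigl[N_O - N_M\bigr]
     &= \sigma_O^2 + \sigma_M^2 - 2\rho\,\sigma_O\,\sigma_M .
\end{align}
```

With uncorrelated internal variability (rho = 0) the noise variance of D(t) is the sum of the two individual variances, and so is larger than either alone, while any forced variability common to both series (such as a shared volcanic response) cancels in F_O(t) - F_M(t).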

The results given below show that the "PAIRED TREND" and "DIFFERENCE
SERIES" tests yield very similar rejection rates of the null hypothesis.
The bottom line is that, regardless of which test we use, which
significance level we stipulate, which observational dataset we use, or
which atmospheric layer we focus on, there is no evidence to support
Douglass et al.'s assertion that all "UAH and RSS satellite trends are
inconsistent with model results".

REJECTION RATES FOR STIPULATED 5% SIGNIFICANCE LEVEL
Test type                   No. of tests   T2 "Hits"   T2LT "Hits"
1. OBS-vs-MODEL (TYPE1)     98             (2.04%)     (1.02%)
2. OBS-vs-MODEL (TYPE2)     98             (2.04%)     (2.04%)

REJECTION RATES FOR STIPULATED 10% SIGNIFICANCE LEVEL
Test type                   No. of tests   T2 "Hits"   T2LT "Hits"
1. OBS-vs-MODEL (TYPE1)     98             (4.08%)     (2.04%)
2. OBS-vs-MODEL (TYPE2)     98             (3.06%)     (3.06%)

REJECTION RATES FOR STIPULATED 20% SIGNIFICANCE LEVEL
Test type                   No. of tests   T2 "Hits"   T2LT "Hits"
1. OBS-vs-MODEL (TYPE1)     98             (7.14%)     (5.10%)
2. OBS-vs-MODEL (TYPE2)     98             (10.20%)    (7.14%)

As I've mentioned in previous emails, I think it's a little tricky to
figure out the null distribution of rejection rates - i.e., the
distribution that might be expected by chance alone. My gut feeling is
that this is easiest to do by generating distributions of the d1 and d2
statistics using model control run data only. Use of Monte Carlo
procedures gets into issues of whether one should use "block
resampling", and attempt to preserve the characteristic decorrelation
times of the model and observational data being tested, etc., etc.
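
For reference, "block resampling" can be sketched as follows. This is one simple variant (a moving-block bootstrap with overlapping candidate blocks), not a description of any procedure actually used here; the function name, block length, and stand-in data are all hypothetical.

```python
import random

def block_resample(series, block_len, rng):
    """Build a surrogate of the same length as `series` by concatenating
    randomly chosen contiguous blocks, drawn with replacement.

    Keeping blocks intact preserves autocorrelation on time scales
    shorter than block_len, which plain resampling would destroy.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]  # trim the final block if it overshoots

rng = random.Random(0)  # fixed seed so the example is reproducible
control = [0.1 * ((7 * t) % 11 - 5) for t in range(120)]  # stand-in data
surrogate = block_resample(control, block_len=12, rng=rng)
```

The block length would normally be chosen with the decorrelation time of the data in mind, which is exactly the judgment call raised in the paragraph above.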

Thanks very much to all of you for your advice and comments. I still
believe that there is considerable merit in a brief response to Douglass
et al. I think this could be done relatively quickly. From my
perspective, this response should highlight four issues:

1) It should identify the flaws in the statistical approach used by
Douglass et al. to compare modeled and observed trends.

2) It should do the significance testing properly, and report on the
results of "PAIRED TREND" and "DIFFERENCE SERIES" tests.

3) It should show something similar to the figure that Leo recently
distributed (i.e., zonal-mean trend profiles in various versions of the
RAOBCORE data), and highlight the fact that the structural uncertainty
in sonde-based estimates of tropospheric temperature change is much
larger than was claimed in Douglass et al.

4) It should note and discuss the considerable body of "complementary
evidence" supporting the finding that the tropical lower troposphere has
warmed over the satellite era.

With best regards,

Ben




Thomas.R.Karl wrote:
> Thanks Ben,
>
> You have been busy! I sent Tom an email before reading the last
> paragraph of this note. Recognizing the "random" placement of ENSO in
> the models and volcanic effects (in a few) and the known impact of the
> occurrence of these events on the trends, I think it is appropriate that
> the noise and related uncertainty about the trend differences be
> increased. Amplifying the noise could be argued as an appropriate
> conservative approach, since we know that these events are confounding
> our efforts to see differences between models and obs w/r to greenhouse
> forcing.
>
> I know it is more work, but I think it does make sense to calculate
> O(1)-M(1), O(2)-M(2) .... O(n)-M(n) for all combinations of observed
> data sets and model simulations. You could test for significance by
> using a Monte Carlo bootstrap approach by randomizing the years for both
> models and data.
>
> Regards, Tom
>
>
> Ben Santer said the following on 12/26/2007 9:50 PM:
>> Dear John,
>>
>> Thanks for your email. As usual, your comments were constructive and
>> thought-provoking. I've tried to do some of the additional tests that
>> you suggested, and will report on the results below.
>>
>> But first, let's have a brief recap. As discussed in my previous
>> emails, I've tested the significance of differences between trends in
>> observed MSU time series and the trends in synthetic MSU temperatures
>> in a multi-model "ensemble of opportunity". The "ensemble of
>> opportunity" comprises results from 49 realizations of the CMIP-3
>> "20c3m" experiment, performed with 19 different A/OGCMs. This is the
>> same ensemble that was analyzed in Chapter 5 of the CCSP Synthesis and
>> Assessment Product 1.1.
>> I've used observational results from two different groups (RSS and
>> UAH). From each group, we have results for both T2 and T2LT. This
>> yields a total of 196 different tests of the significance of
>> observed-versus-model trend differences (2 observational datasets x 2
>> layer-averaged temperatures x 49 realizations of the 20c3m
>> experiment). Thus far, I've tested the significance of trend
>> differences using T2 and T2LT data spatially averaged over oceans only
>> (both 20N-20S and 30N-30S), as well as over land and ocean (20N-20S).
>> All results described below focus on the land and ocean results, which
>> facilitates a direct comparison with Douglass et al.
>>
>> Here was the information that I sent you on Dec. 14th:
>>
>> COMBINED LAND/OCEAN RESULTS (WITH STANDARD ERRORS ADJUSTED FOR
>> TEMPORAL AUTOCORRELATION EFFECTS; SPATIAL AVERAGES OVER 20N-20S;
>> ANALYSIS PERIOD 1979 TO 1999)
>>
>> T2LT tests, RSS observational data: 0 out of 49 model-versus-observed
>> trend differences are significant at the 5% level.
>> T2LT tests, UAH observational data: 1 out of 49 model-versus-observed
>> trend differences are significant at the 5% level.
>>
>> T2 tests, RSS observational data: 1 out of 49 model-versus-observed
>> trend differences are significant at the 5% level.
>> T2 tests, UAH observational data: 1 out of 49 model-versus-observed
>> trend differences are significant at the 5% level.
>>
>> In other words, at a stipulated significance level of 5% (for a
>> two-tailed test), we rejected the null hypothesis of "No significant
>> difference between observed and simulated tropospheric temperature
>> trends" in only 1 out of 98 cases (1.02%) for T2LT and 2 out of 98
>> cases (2.04%) for T2.
>>
>> You asked, John, how we might determine a baseline for judging the
>> likelihood of obtaining the 'observed' rejection rate by chance alone.
>> You suggested use of a bootstrap procedure involving the model data
>> only. In this procedure, one of the 49 20c3m realizations would be
>> selected at random, and would constitute the "surrogate observations".
>> The remaining 48 members would be randomly sampled (with replacement)
>> 49 times. The significance of the difference between the surrogate
>> "observed" trend and the 49 simulated trends would then be assessed.
>> This procedure would be repeated many times, yielding a distribution
>> of rejection rates of the null hypothesis.
>>
>> As you stated in your email, "The actual number of hits, based on the
>> real observations could then be referenced to the Monte Carlo
>> distribution to yield a probability that this could have occurred by
>> chance."
>>
>> One slight problem with your suggested bootstrap approach is that it
>> convolves the trend differences due to internally-generated
>> variability with trend differences arising from inter-model
>> differences in both climate sensitivity and in the forcings applied in
>> the 20c3m experiment. So the distribution of "hits" (as you call it;
>> or "rejection rates" in my terminology) is not the distribution that
>> one might expect due to chance alone.
>>
>> Nevertheless, I thought it would be interesting to generate a
>> distribution of "rejection rates" based on model data only. Rather
>> than implementing the resampling approach that you suggested, I
>> considered all possible combinations of trend pairs involving model
>> data, and performed the paired difference test between the trend in
>> each 20c3m realization and in each of the other 48 realizations. This
>> yields a total of 2352 (49 x 48) non-identical pairs of trend tests
>> (for each layer-averaged temperature time series).
>>
>> Here are the results:
>>
>> T2: At a stipulated 5% significance level, 58 out of 2352 tests
>> involving model data only (2.47%) yielded rejection of the null
>> hypothesis of no significant difference in trend.
>>
>> T2LT: At a stipulated 5% significance level, 32 out of 2352 tests
>> involving model data only (1.36%) yielded rejection of the null
>> hypothesis of no significant difference in trend.
>>
>> For both layer-averaged temperatures, these numbers are slightly
>> larger than the "observed" rejection rates (2.04% for T2 and 1.02% for
>> T2LT). I would conclude from this that the statistical significance of
>> the differences between the observed and simulated MSU tropospheric
>> temperature trends is comparable to the significance of the
>> differences between the simulated 20c3m trends from any two CMIP-3
>> models (with the proviso that the simulated trend differences arise
>> not only from internal variability, but also from inter-model
>> differences in sensitivity and 20th century forcings).
>>
>> Since I was curious, I thought it would be fun to do something a
>> little closer to what you were advocating, John - i.e., to use model
>> data to look at the statistical significance of trend differences that
>> are NOT related to inter-model differences in the 20c3m forcings or in
>> climate sensitivity. I did this in the following way. For each model
>> with multiple 20c3m realizations, I tested each realization against
>> all other (non-identical) realizations of that model - e.g., for a
>> model with a 20c3m ensemble size of 5, there are 20 paired trend
>> tests involving non-identical data. I repeated this procedure for the
>> next model with multiple 20c3m realizations, etc., and accumulated
>> results. In our CCSP report, we had access to 11 models with multiple
>> 20c3m realizations. This yields a total of 124 paired trend tests for
>> each layer-averaged temperature time series of interest.
>>
>> For both T2 and T2LT, NONE of the 124 paired trend tests yielded
>> rejection of the null hypothesis of no significant difference in trend
>> (at a stipulated 5% significance level).
>>
>> You wanted to know, John, whether these rejection rates are sensitive
>> to the stipulated significance level. As per your suggestion, I also
>> calculated rejection rates for a 20% significance level. Below, I've
>> tabulated a comparison of the rejection rates for tests with 5% and
>> 20% significance levels. The two "rows" of "MODEL-vs-MODEL" results
>> correspond to the two cases I've considered above - i.e., tests
>> involving 2352 trend pairs (Row 2) and 124 trend pairs (Row 3). Note
>> that the "OBSERVED-vs-MODEL" row (Row 1) is the combined number of
>> "hits" for 49 tests involving RSS data and 49 tests involving UAH data:
>>
>> REJECTION RATES FOR STIPULATED 5% SIGNIFICANCE LEVEL:
>> Test type                   No. of tests   T2 "Hits"   T2LT "Hits"
>>
>> Row 1. OBSERVED-vs-MODEL    49 x 2         (2.04%)     (1.02%)
>> Row 2. MODEL-vs-MODEL       2352           (2.47%)     (1.36%)
>> Row 3. MODEL-vs-MODEL       124            (0.00%)     (0.00%)
>>
>> REJECTION RATES FOR STIPULATED 20% SIGNIFICANCE LEVEL:
>> Test type                   No. of tests   T2 "Hits"   T2LT "Hits"
>>
>> Row 1. OBSERVED-vs-MODEL    49 x 2         (7.14%)     (5.10%)
>> Row 2. MODEL-vs-MODEL       2352           (7.48%)     (4.25%)
>> Row 3. MODEL-vs-MODEL       124            (6.45%)     (4.84%)
>>
>> So what can we conclude from this?
>>
>> 1) Irrespective of the stipulated significance level (5% or 20%), the
>> differences between the observed and simulated MSU trends are, on
>> average, substantially smaller than we might expect if we were
>> conducting these tests with trends selected from a purely random
>> distribution (i.e., for the "Row 1" results, 2.04 and 1.02% << 5%, and
>> 7.14% and 5.10% << 20%).
>>
>> 2) Why are the rejection rates for the "Row 3" results substantially
>> lower than 5% and 20%? Shouldn't we expect - if we are only testing
>> trend differences between multiple realizations of the same model,
>> rather than trend differences between models - to obtain rejection
>> rates of roughly 5% for the 5% significance tests and 20% for the 20%
>> tests? The answer is clearly "no". The "Row 3" results do not involve
>> tests between samples drawn from a population of randomly-distributed
>> trends! If we were conducting this paired test using randomly-sampled
>> trends from a long control simulation, we would expect (given a
>> sufficiently large sample size) to eventually obtain rejection rates
>> of 5% and 20%. But our "Row 3" results are based on paired samples
>> from individual members of a given model's 20c3m experiment, and thus
>> represent both signal (response to the imposed forcing changes) and
>> noise - not noise alone. The common signal component makes it more
>> difficult to reject the null hypothesis of no significant difference
>> in trend.
>>
>> 3) Your point about sensitivity to the choice of stipulated
>> significance level was well-taken. This is obvious by comparing "Row
>> 3" results in the 5% and 20% test cases.
>>
>> 4) In both the 5% and 20% cases, the rejection rate for paired tests
>> involving model-versus-observed trend differences ("Row 1") is
>> comparable to the rejection rate for tests involving inter-model trend
>> differences ("Row 2") arising from the combined effects of differences
>> in internal variability, sensitivity, and applied forcings. On
>> average, therefore, model versus observed trend differences are not
>> noticeably more significant than the trends between any given pair of
>> CMIP-3 models. [N.B.: This inference is not entirely justified, since
>> "Row 2" convolves the effects of both inter-model differences and
>> "within model" differences arising from the different manifestations
>> of natural variability superimposed on the signal. We would need a
>> "Row 4", which involves 19 x 18 paired tests of model results, using
>> only one 20c3m realization from each model. I'll generate "Row 4"
>> tomorrow.]
>>
>> John, you also suggested that we might want to look at the statistical
>> significance of trends in time series of differences - e.g., in O(t)
>> minus M(t), or in M1(t) minus M2(t), where "O" denotes observations,
>> and "M" denotes model, and t is an index of time in months. While I've
>> done this in previous work (for example in the Santer et al. 2000 JGR
>> paper, where we were looking at the statistical significance of trend
>> differences between multiple observational upper air temperature
>> datasets), I don't think it's advisable in this particular case. As
>> your email notes, we are dealing here with A/OGCM results in which the
>> phasing of El Ninos and La Ninas (and the effects of ENSO variability
>> on T2 and T2LT) differs from the phasing in the real world. So
>> differencing M(t) from O(t), or M2(t) from M1(t), probably actually
>> amplifies rather than damps noise, particularly in the tropics, where
>> the externally-forced component of M(t) or O(t) over 1979 to 1999 is
>> only a relatively small fraction of the overall variance of the time
>> series. I think this amplification of noise is a disadvantage in
>> assessing whether trends in O(t) and M(t) are significantly different.
>>
>> Anyway, thanks again for your comments and suggestions, John. They
>> gave me a great opportunity to ignore the hundreds of emails that
>> accumulated in my absence, and instead do some science!
>>
>> With best regards,
>>
>> Ben
>>
>> John Lanzante wrote:
>>> Ben,
>>>
>>> Perhaps a resampling test would be appropriate. The tests you have
>>> performed
>>> consist of pairing an observed time series (UAH or RSS MSU) with each
>>> one
>>> of 49 GCM time series from your "ensemble of opportunity". Significance
>>> of the difference between each pair of obs/GCM trends yields a certain
>>> number of "hits".
>>>
>>> To determine a baseline for judging how likely it would be to obtain the
>>> given number of hits one could perform a set of resampling trials by
>>> treating one of the ensemble members as a surrogate observation. For
>>> each
>>> trial, select at random one of the 49 GCM members to be the
>>> "observation".
>>> From the remaining 48 members draw a bootstrap sample of 49, and
>>> perform
>>> 49 tests, yielding a certain number of "hits". Repeat this many times to
>>> generate a distribution of "hits".
>>>
>>> The actual number of hits, based on the real observations could then be
>>> referenced to the Monte Carlo distribution to yield a probability
>>> that this
>>> could have occurred by chance. The basic idea is to see if the observed
>>> trend is inconsistent with the GCM ensemble of trends.
>>>
>>> There are a couple of additional tweaks that could be applied to your
>>> method.
>>> You are currently computing trends for each of the two time series in
>>> the
>>> pair and assessing the significance of their differences. Why not first
>>> create a difference time series and assess the significance of its
>>> trend?
>>> The advantage of this is that you would reduce somewhat the
>>> autocorrelation
>>> in the time series and hence the effect of the "degrees of freedom"
>>> adjustment. Since the GCM runs are based on coupled model runs this
>>> differencing would help remove the common externally forced variability,
>>> but not internally forced variability, so the adjustment would still be
>>> needed.
>>>
>>> Another tweak would be to alter the significance level used to assess
>>> differences in trends. Currently you are using the 5% level, which
>>> yields
>>> only a small number of hits. If you made this less stringent you
>>> would get
>>> potentially more, weaker hits. But it would all come out in the wash
>>> so to
>>> speak since the number of hits in the Monte Carlo simulations would
>>> increase
>>> as well. I suspect that increasing the number of expected hits would
>>> make the
>>> whole procedure more powerful/efficient in a statistical sense since you
>>> would no longer be dealing with a "rare event". In the current
>>> scheme, using
>>> a 5% level with 49 pairings you have an expected hit rate of 0.05 X
>>> 49 = 2.45.
>>> For example, if instead you used a 20% significance level you would
>>> have an
>>> expected hit rate of 0.20 X 49 = 9.8.
>>>
>>> I hope this helps.
>>>
>>> On an unrelated matter, I'm wondering a bit about the different
>>> versions of
>>> Leo's new radiosonde dataset (RAOBCORE). I was surprised to see that the
>>> latest version has considerably more tropospheric warming than I
>>> recalled
>>> from an earlier version that was written up in JCLI in 2007. I have a
>>> couple of questions that I'd like to ask Leo. One concern is that if
>>> we use
>>> the latest version of RAOBCORE is there a paper that we can reference --
>>> if this is not in a peer-reviewed journal is there a paper in
>>> submission?
>>> The other question is: could you briefly comment on the differences
>>> in methodology used to generate the latest version of RAOBCORE as
>>> compared to the version used in JCLI 2007, and what/when/where did
>>> changes occur to
>>> yield a stronger warming trend?
>>>
>>> Best regards,
>>>
>>> ______John
>>>
>>>
>>>
>>> On Saturday 15 December 2007 12:21 pm, Thomas.R.Karl wrote:
>>>> Thanks Ben,
>>>>
>>>> You have the makings of a nice article.
>>>>
>>>> I note that we would expect 10 cases that are significantly
>>>> different by chance (based on the 196 tests at the .05 sig level).
>>>> You found 3. With appropriately corrected Leopold I suspect you
>>>> will find there is indeed stat sig. similar trends incl.
>>>> amplification. Setting up the statistical testing should be
>>>> interesting with this many combinations.
>>>>
>>>> Regards, Tom
>>>
>>
>>
>
> --
>
> *Dr. Thomas R. Karl, L.H.D.*
>
> *Director*
>
> NOAA’s National Climatic Data Center
>
> Veach-Baley Federal Building
>
> 151 Patton Avenue
>
> Asheville, NC 28xxx xxxx xxxx
>
> Tel: (8xxx xxxx xxxx
>
> Fax: (8xxx xxxx xxxx
>
> Thomas.R.Karl@xxxxxxxxx.xxx <mailto:Thomas.R.Karl@xxxxxxxxx.xxx>
>


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1199458641.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Phil Jones <p.jones@xxxxxxxxx.xxx>
Subject: Re: Thanks for the photos of Nick !
Date: Fri, 04 Jan 2008 09:57:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx

<x-flowed>
Dear Phil,

I was very sorry to hear of Hannah's health problems. I hope she makes a
speedy recovery. Please give her my best wishes, and tell her that there
is life and love after divorce!

My Mom's cataract surgery did not go very well, and it looks like she
won't be able to drive any longer. Nick and I are best placed to take
care of her, so I'm trying to persuade her to move to California. So
there could be some big changes in our lives in 2008.

Nick has turned into a fine young man. It's going to be tough to see him
leave for college in three and a half years.

I share your frustration about having to devote valuable time to the
rebuttal of crappy papers. Douglass et al. is truly awful. It should
never have been published. Any residual respect I might have had for
John Christy has now vanished. I can't believe that he's a coauthor on
this garbage.

Best wishes to all of you from rainy Livermore,

Ben
Phil Jones wrote:
>
>> Ben,
> Thanks for the card and photos of Nick and your caving exploits
> with Tom and Karl !
> Had a quiet Christmas and New Year. We did get to see Poppy
> at Hannah's house in Deal in Kent. Matthew and Miranda came as well
> along with Ruth's mum - so she saw her great granddaughter.
> We were there as Hannah had to have another cyst removed from around
> her ovary - all is well and she's recovering. Ruth has been with her since
> mid-December. Hannah had an earlier cyst when she was 12, but this time
> they managed to save the ovary. She still needs to see a gynaecologist to
> see if the ovary is still working OK.
> 2007 hasn't been a great year for Hannah, as she has started divorce
> proceedings from her husband (Gordon). They only married in 2005. He
> seemed fine initially, but has had at least 2 affairs.
>
> Keep up the good work on the Douglass et al comment. I'm trying to
> finish
> a few things in the next couple of months. I will comment on drafts if
> you want.
> Susan Solomon is trying to encourage me to respond to this piece of
> rubbish. I'll try and encourage Rasmus Benestad of DNMI to respond. He did
> so last time to a very similar paper in Climate Research. MM don't
> refer to
> that and MM don't use RSS data! Their analysis is flawed anyway, but it
> would
> all go away if they had used RSS instead of UAH!
>
> What gets me is who are the reviewers of these two awful papers. I know
> editors have a hard time finding reviewers, but they must have known that
> both papers were likely awful. It seems that editors (even of these
> two used-to-be OK
> journals) just want more papers.
>
> Sad day - coming in to hear of Bert Bolin's death.
>
> Cheers
> Phil
>
>
>
>
> Prof. Phil Jones
> Climatic Research Unit Telephone +44 xxx xxxx xxxx
> School of Environmental Sciences Fax +44 xxx xxxx xxxx
> University of East Anglia
> Norwich Email p.jones@xxxxxxxxx.xxx
> NR4 7TJ
> UK
> ----------------------------------------------------------------------------
>


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1199466465.txt

From: Phil Jones <p.jones@xxxxxxxxx.xxx>
To: "Humphrey, Kathryn (CEOSA)" <kathryn.humphrey@xxxxxxxxx.xxx>, "Stephens, A (Ag)" <A.Stephens@xxxxxxxxx.xxx>
Subject: RE: Questions on the weather generator
Date: Fri Jan 4 12:07:xxx xxxx xxxx
Cc: "David Sexton" <david.sexton@xxxxxxxxx.xxx>, <C.G.Kilsby@xxxxxxxxx.xxx>, "Jenkins, Geoff" <geoff.jenkins@xxxxxxxxx.xxx>

Kathryn,
I did talk to the Metro yesterday - no idea what they used. Maybe a few will
have read it - before copies are tossed around on the tube!
Added Geoff on this email.

Ag has answered the second question. I may come back to that after
trying to answer the first part.
There are two aspects to the WG work we're doing. The first, which I've mentioned
on a number of occasions, is to prove that the perturbation process used with the WG
works. Colin Harpham sent around a load of plots to Chris/Ag/David/Geoff just before
Christmas. I have a rough draft of a paper on this which I sent to Chris yesterday. This
involves the UKCIP08 WG, but is totally independent of the change factors David is
developing for UKCIP08. This uses some earlier HadRM3 model runs. The WG is fit to
10 grid box series across the UK and then perturbed according to the differences between
the future model integrations and the control runs. We then generate future weather and
show that its characteristics are similar to what HadRM3 got directly. This has used
the same change factors (same variables) but from a different set of RCM runs.
The whole purpose of this exercise is to show that the perturbation process works.
The only way we can test this is to use RCM model runs - because they have future
runs with a big climate change. We can't use past weather data as it doesn't have
enough of a climate change. This is validation of the perturbation process.
We can additionally validate the WG using observational data - which we've done
earlier.
Return to Q2. Ag has said how the model variants get chosen. There are several ways
to choose them. Let's say we start with the 50th percentile
for rainfall. We select all model variants between 45 and 55%. Then we want temperature
at the 90th percentile. We then do a second selection of the variants already selected
that have temperature changes between 85 and 95%. As we had initially 10,000
variants, the first selection reduced this to 1,000 (as we chose 10% of them). The
second selection reduced this to 100 (as we've again chosen only 10% of them).
Now with these 100 variants, most users will average the change factors (from David)
across these 100. These average change factors (which will approximately be
at the 50% and 90% value for precipitation and temperature respectively) get passed
to the WG. The WG then simulates 100 runs of 30 years - for the already
pre-selected location (small area) and future period.
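
The two-stage selection described above can be sketched in a few lines. This is only an illustration, not the UKCIP08 code: the function names, the random stand-in change factors, and the choice to take the temperature band within the already-selected subset (rather than across all 10,000 variants) are assumptions made here so that the counts come out as 1,000 and then 100.

```python
import numpy as np

def band_indices(values, centre_pct, half_width=5.0):
    """Indices of entries lying within centre_pct +/- half_width percentiles."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [centre_pct - half_width, centre_pct + half_width])
    return np.flatnonzero((values >= lo) & (values <= hi))

def select_variants(precip_change, temp_change, precip_pct=50.0, temp_pct=90.0):
    """Two-stage selection: precipitation band first, then temperature band."""
    precip_change = np.asarray(precip_change, dtype=float)
    temp_change = np.asarray(temp_change, dtype=float)
    # First selection: variants whose precipitation change is in the 45-55% band.
    first = band_indices(precip_change, precip_pct)
    # Second selection (assumption): temperature band taken within that subset.
    second = first[band_indices(temp_change[first], temp_pct)]
    return first, second

# Hypothetical stand-in data: 10,000 model variants with made-up change factors.
rng = np.random.default_rng(0)
precip = rng.normal(size=10_000)
temp = rng.normal(size=10_000)

first, second = select_variants(precip, temp)

# Average change factors over the final set; these would be passed to the WG.
mean_precip_factor = precip[second].mean()
mean_temp_factor = temp[second].mean()
```

Averaging over the final set is the first option described; picking a single entry of `second` at random would correspond to the single-variant option.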
There are obviously loads of permutations as we will be allowing users to select all
percentile levels (singly for temperature or precipitation) or jointly for both from
5 to 95% in steps of 5.
The percentile levels can be chosen based on seasons (4) and years (1). If you
select summer say, users will also get the rest of the year - using the change factors
that go along with those for the selected model variants.
Another possibility is to select one model variant within the chosen percentile bands
and pass these change factors to the WG.
There are other possibilities, but I think we've limited the choices to these two.
The other possibility was a variant (can't think of a better word here - but not
related to the model variants) to the first. As you have 100 chosen model variants
in this example, you could choose one at random or allow each of the 100 WG
integrations to be based on a different one of the model variants. These generated
sequences will likely have greater variability than that based on the average of the
100 or that based on the single model variant.
I think this may open up a can of worms with Ag when he reads it !


Whichever of these is chosen, the user should still run the WG for
xxx xxxx xxxxyear sequences.
I think I've made the last bit on model variant selection complicated
and haven't gone back to look at what Ag has written in the User Guidance.
It ought to tell you how the change factors that the WG needs will get selected.
Cheers
Phil

At 10:07 04/01/2008, Humphrey, Kathryn (CEOSA) wrote:

Hi Ag,

Yes that makes perfect sense in terms of selecting one/several model variant/s, thanks.
I'm still a bit confused about the utility of random sampling though as this won't give
you results for a particular probability level (will it?). I think Phil was going to
get back to me on this as well as the change factors question.

Phil, I liked your quote in the Metro this morning!

Kathryn
___________________________________________________________________________________

From: Stephens, A (Ag) [[1]mailto:A.Stephens@xxxxxxxxx.xxx]
Sent: 04 January 2008 08:56
To: Humphrey, Kathryn (CEOSA)
Cc: Phil Jones; David Sexton; C.G.Kilsby@xxxxxxxxx.xxx
Subject: RE: Questions on the weather generator
Hi Kathryn,

I can comment on your second question. Here is my understanding:

Firstly, users must run a minimum of 100 WG runs regardless of which ones they run. This
is to enforce the use of a "probabilistic" approach.

Selection by model variant will only make sense once a user has produced some runs.
After any run they will have access to the model variant IDs that were used. The use
case that gave rise to us including "selection by model variant ID" was as follows:

1. Person X does some WG runs (sampling by whatever method she chooses).
2. She uses/analyses a set of runs to produce some interesting results.
3. She is keen to do more/different analyses using the model variants that represented
that part of parameter space.
4. She has the list of model variant IDs so she can publish these so that others can use
them or she can re-use them herself in other experiments.
5. Person Y can read about what Person X did and reproduce her results exactly, or use
the same set of interesting model variants for some other experiments.

Does that make sense?

Cheers,

Ag
___________________________________________________________________________________

From: Humphrey, Kathryn (CEOSA) [[2]mailto:kathryn.humphrey@xxxxxxxxx.xxx]
Sent: 03 January 2008 16:58
To: Stephens, A (Ag)
Subject: FW: Questions on the weather generator
______________________________________________
From: Humphrey, Kathryn (CEOSA)
Sent: 03 January 2008 16:55
To: 'Phil Jones'; 'Chris Kilsby'; 'Stephens, Ag'
Subject: Questions on the weather generator
Phil/Chris/Ag,
I'm putting together a "quick and easy" presentation on the UKCIP08 methodology for
Defra officials to give them some idea of how it's all done so they can better
appreciate what its potential uses may, and may not, be.
However I'm getting stuck still on some of the WG methodology! Can you help? (I'm not
planning on telling them this level of detail about the WG but am just bothered by the
issues below).
I'm firstly confused about the RCM change factors; are you using these to validate the
WG runs (which I do understand) or to generate them (which I don't as I thought they
were being generated using the data in final PDFs themselves)?
And I'm still confused about the reasons for allowing users to select runs by model
variant. I think by model variant you mean each perturbed version of HadCM3 or other
single model run or emulator result that creates a point in parameter space. Is this
right? If so then I understand why you can't run your WG on all model variants (too
many) so selecting a random sample is a representation of parameter space. But my
initial understanding of how the WG works is that you pick a point on the PDF (say 50th
percentile) with a given probability and run the WG for that point. But this doesn't
make sense if you are allowing users to select random/single model variants, seasons
etc. because these won't reflect a particular percentile. Maybe it's the case that you
don't need a particular percentile for whatever use the WG data is for, but if you don't
know, how do you know how likely your WG output is and therefore what to do with the
result in terms of planning?
Apologies for my ignorance and assistance would be gratefully received!
Kind Regards,
Kathryn
Kathryn Humphrey
Climate Change Impacts and Adaptation Team, Defra
Zone 3F Ergon House, Horseferry Road, London, SW1P 3JR
tel 0xxx xxxx xxxxfax 0xxx xxxx xxxx
Department for Environment, Food and Rural Affairs (Defra)

Prof. Phil Jones
Climatic Research Unit Telephone +44 xxx xxxx xxxx
School of Environmental Sciences Fax +44 xxx xxxx xxxx
University of East Anglia
Norwich Email p.jones@xxxxxxxxx.xxx
NR4 7TJ
UK
----------------------------------------------------------------------------

References

1. mailto:A.Stephens@xxxxxxxxx.xxx
2. mailto:kathryn.humphrey@xxxxxxxxx.xxx

Original Filename: 1199926335.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Tom Wigley <wigley@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Dian J. Seidel'" <dian.seidel@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Steven Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>, "Hack, James J." <jhack@xxxxxxxxx.xxx>
Subject: Update on response to Douglass et al.
Date: Wed, 09 Jan 2008 19:52:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx

<x-flowed>
Dear folks,

I just wanted to update you on my progress in formulating a response to
the Douglass et al. paper in the International Journal of Climatology
(IJC). There have been several developments.

First, I contacted Science to gauge their level of interest in
publishing a response to Douglass et al. I thought it was worthwhile to
"test the water" before devoting a lot of time to the preparation of a
manuscript for submission to Science. I spoke with Jesse Smith, who
handles most of the climate-related papers at Science magazine.

The bottom line is that, while Science is interested in this issue
(particularly since Douglass et al. are casting doubt on the findings of
the 2005 Santer et al. Science paper), Jesse Smith thought it was highly
unlikely that Science would carry a rebuttal of work published in a
different journal (IJC). Regretfully, I agree. Our response to Douglass
et al. does not contain any fundamentally new science - although it does
contain some new and interesting work (see below).

It's an unfortunate situation. Singer is promoting the Douglass et al.
paper as startling "new scientific evidence", which undercuts the key
conclusions of the IPCC and CCSP Reports. Christy is using the Douglass
et al. paper to argue that his UAH group is uniquely positioned to
perform "hard-nosed" and objective evaluation of model performance, and
that it's dangerous to leave model evaluation in the hands of biased
modelers. Much as I would like to see a high-profile rebuttal of
Douglass et al. in a journal like Science or Nature, it's unlikely that
either journal will publish such a rebuttal.

So what are our options? Personally, I'd vote for GRL. I think that it
is important to publish an expeditious response to the statistical flaws
in Douglass et al. In theory, GRL should be able to give us the desired
fast turnaround time. Would GRL accept our contribution, given that the
Douglass et al. paper was published in IJC? I think they would - we've
done a substantial amount of new work (see below), and can argue, with
some justification, that our contribution is more than just a rebuttal
of Douglass et al.

Why not go for publication of a response in IJC? According to Phil, this
option would probably take too long. I'd be interested to hear any other
thoughts you might have on publication options.

Now to the science (with a lower-case "s"). I'm appending three
candidate Figures for a GRL paper. The first Figure was motivated by
discussions I've had with Karl Taylor and Tom Wigley. It's an attempt to
convey the differences between our method of comparing observed and
simulated trends (panel A) and the approach used by Douglass et al.
(panel B).

In our method, we account for both statistical uncertainties in fitting
least-squares linear trends to noisy, temporally-autocorrelated data and
for the effects of internally-generated variability. As I've described
in previous emails, we compare each of the 49 simulated T2 and T2LT
trends (i.e., the same multi-model ensemble used in our 2005 Science
paper and in the 2006 CCSP Report) with observed T2 and T2LT trends
obtained from the RSS and UAH groups. Our 2-sigma confidence intervals
on the model and observed trends are estimated as in Santer et al.
(2000). [Santer, B.D., T.M.L. Wigley, J.S. Boyle, D.J. Gaffen, J.J.
Hnilo, D. Nychka, D.E. Parker, and K.E. Taylor, 2000: Statistical
significance of trends and trend differences in layer-average
atmospheric temperature time series, J. Geophys. Res., 105, 7xxx xxxx xxxx]

The method that Santer et al. (2000) used to compute "adjusted" trend
confidence intervals accounts for the fact that, after fitting a trend
to T2 or T2LT data, the regression residuals are typically highly
autocorrelated. If this autocorrelation is not accounted for, one could
easily reach incorrect decisions on whether the trend in an individual
time series is significantly different from zero, or whether two time
series have significantly different trends. Santer et al. (2000)
accounted for temporal autocorrelation effects by estimating r{1}, the
lag-1 autocorrelation of the regression residuals, using r{1} to
calculate an effective sample size n{e}, and then using n{e} to
determine an adjusted standard error of the least-squares linear trend.
Panel A of Figure 1 shows the 2-sigma "adjusted" standard errors for
each individual trend. Models with excessively large tropical
variability (like FGOALS-g1.0 and GFDL-CM2.1) have large adjusted
standard errors. Models with coarse-resolution OGCMs and low-amplitude
ENSO variability (like the GISS-AOM) have smaller than observed adjusted
standard errors. Neglect of volcanic forcing (i.e., absence of El
Chichon and Pinatubo-induced temperature variability) can also
contribute to smaller than observed standard errors, as in
CCCma-CGCM3.1(T47).
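
The adjustment described above can be sketched as follows. The email does not spell out the formulas, so this assumes the commonly used form n{e} = n (1 - r{1}) / (1 + r{1}), with the residual variance divided by n{e} - 2 instead of n - 2. The function name and the synthetic input series are hypothetical.

```python
import numpy as np

def trend_with_adjusted_se(y):
    """Least-squares trend with naive and autocorrelation-adjusted standard errors."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    t_dev = t - t.mean()
    sxx = np.sum(t_dev ** 2)
    b = np.sum(t_dev * (y - y.mean())) / sxx           # trend per time step
    resid = (y - y.mean()) - b * t_dev                 # regression residuals
    ss_resid = np.sum(resid ** 2)
    r1 = np.sum(resid[:-1] * resid[1:]) / ss_resid     # lag-1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)                # effective sample size (assumed form)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for an adjusted error")
    s_b_naive = np.sqrt(ss_resid / (n - 2) / sxx)      # ignores autocorrelation
    s_b_adj = np.sqrt(ss_resid / (n_eff - 2.0) / sxx)  # uses n_eff instead of n
    return b, s_b_naive, s_b_adj, r1, n_eff

# Hypothetical input: 252 months of strongly autocorrelated noise.
rng = np.random.default_rng(1)
x = np.zeros(252)
for i in range(1, x.size):
    x[i] = 0.86 * x[i - 1] + rng.normal()

b, s_b_naive, s_b_adj, r1, n_eff = trend_with_adjusted_se(x)
```

With positively autocorrelated residuals, n{e} is smaller than n, so the adjusted standard error is larger than the naive one.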

The dark and light grey bars in Panel A show (respectively) the 1- and
2-sigma standard errors for the RSS T2LT trend. As is visually obvious,
36 of the 49 model trends are within 1 standard error of the RSS trend,
and 47 of the 49 model trends are within 2 standard errors of the RSS
trend.

I've already explained our "paired trend test" procedure for calculating
the statistical significance of the model-versus-observed trend
differences. This involves the normalized trend difference d1:

d1 = (b{O} - b{M}) / sqrt[ (s{bO})**2 + (s{bM})**2 ]

where b{O} and b{M} represent any single pair of Observed and Modeled
trends, with adjusted standard errors s{bO} and s{bM}.

Under the assumption that d1 is normally distributed, values of d1 >
+1.96 or < -1.96 indicate observed-minus-model trend differences that
are significant at the 5% level (two-tailed test), and one can
easily calculate a p-value for each value of d1. These p-values for the
98 pairs of trend tests (49 involving UAH data and 49 involving RSS
data) are what we use for determining the total number of "hits", or
rejections of the null hypothesis of no significant difference between
modeled and observed trends. I note that each test is two-tailed, since
we have no information a priori about the "direction" of the model trend
(i.e., whether we expect the simulated trend to be significantly larger
or smaller than observed).
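
As a minimal sketch of the d1 statistic and its two-tailed p-value under the normality assumption stated above (the function name and the example numbers are hypothetical, not values from the analysis):

```python
import math

def paired_trend_test(b_obs, s_obs, b_mod, s_mod):
    """Return (d1, two-tailed p-value) for one observed/modeled trend pair."""
    d1 = (b_obs - b_mod) / math.sqrt(s_obs ** 2 + s_mod ** 2)
    # For a standard normal variable, P(|Z| > |d1|) = erfc(|d1| / sqrt(2)).
    p_value = math.erfc(abs(d1) / math.sqrt(2.0))
    return d1, p_value

# Made-up trends (degrees C/decade) with made-up adjusted standard errors.
d1_example, p_example = paired_trend_test(0.10, 0.30, 0.20, 0.40)

# A pair sitting right at the two-tailed 5% threshold (d1 = 1.96).
d1_edge, p_edge = paired_trend_test(1.18, 0.30, 0.20, 0.40)
```

A "hit" at a stipulated significance level is then simply a pair whose p-value falls below that level.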

REJECTION RATES FOR "PAIRED TREND TESTS", OBS-vs-MODEL
Stipulated sign. level   No. of tests   T2 "Hits"     T2LT "Hits"
5%                       49 x xxx       xxx (2.04%)   xxx (1.02%)
10%                      49 x xxx       xxx (4.08%)   xxx (2.04%)
15%                      49 x xxx       xxx (7.14%)   xxx (5.10%)

Now consider Panel B of Figure 1. It helps to clarify the differences
between the Douglass et al. comparison of model and observed trends and
our own comparison. The black horizontal line ("Multi-model mean trend")
is the T2LT trend in the 19-model ensemble, calculated from model
ensemble mean trends (the colored symbols). Douglass et al.'s
"consistency criterion", sigma{SE}, is given by:

sigma{SE} = sigma / sqrt(N - 1)

where sigma is the standard deviation of the 19 ensemble-mean trends,
and N is 19. The orange and yellow envelopes denote the 1- and
2-sigma{SE} regions.

Douglass et al. use sigma{SE} to decide whether the multi-model mean
trend is consistent with either of the observed trends. They conclude
that the RSS and UAH trends lie outside of the yellow envelope (the
2-sigma{SE} region), and interpret this as evidence of a fundamental
inconsistency between modeled and observed trends. As noted previously,
Douglass et al. obtain this result because they fail to account for
statistical uncertainty in the estimation of the RSS and UAH trends.
They ignore the statistical error bars on the RSS and UAH trends (which
are shown in Panel A). As is clear from Panel A, the statistical error
bars on the RSS and UAH trends overlap with the Douglass et al.
2-sigma{SE} region. Had Douglass et al. accounted for statistical
uncertainty in estimation of the observed trends, they would have been
unable to conclude that all "UAH and RSS satellite trends are
inconsistent with model trends".

The second Figure plots values of our test statistic (d1) for the
"paired trend test". The grey histogram is based on the values of d1 for
the 49 tests involving the RSS T2LT trend and the simulated T2LT trends
from 20c3m runs. The green histogram is for the 49 paired trend tests
involving model 20c3m data and the UAH T2LT trend. Note that the d1
distribution obtained with the UAH data is negatively skewed. This is
because the numerator of the d1 test statistic is b{O} - b{M}, and the
UAH tropical T2LT trend over 1xxx xxxx xxxxis smaller than most of the model
trends (see Figure 1, panel A).

The colored dots are values of the d1 test statistic for what I referred
to previously as "TYPE2" tests. These tests are limited to the M models
with multiple realizations of the 20c3m experiment. Here, M = 11. For
each of these M models, I performed paired trend tests for all C unique
combinations of trend pairs. For example, for a model with 5
realizations of the 20c3m experiment, like GISS-EH, C = 10. The
significance of trend differences is solely a function of "within-model"
effects (i.e., is related to the different manifestations of natural
internal variability superimposed on the underlying forced response).
There are a total of 62 paired trend tests. Note that the separation of
the colored symbols on the y-axis is for visual display purposes only,
and facilitates the identification of results for individual models.

The clear message from Figure 2 is that the values of d1 arising from
internal variability alone are typically as large as the d1 values
obtained by testing model trends against observational data. The two
negative "outlier" values of d1 for the model-versus-observed trend
tests involve the large positive trend in CCCma-CGCM3.1(T47). If you
have keen eagle eyes, you'll note that the distribution of colored
symbols is slightly skewed to the negative side. If you look at Panel A
of Figure 1, you'll see that this skewness arises from the relatively
small ensemble sizes. Consider results for the 5-member ensemble of
20c3m trends from the MRI-CGCM2.3.2. The trend in realization 1 is close
to zero; trends in realizations 2, 3, 4, and 5 are large, positive, and
vary between 0.27 and 0.37 degrees C/decade. So d1 is markedly negative
for tests involving realization 1 versus realizations 2, 3, 4, and 5. If
we showed non-unique combinations of trend pairs (e.g., realization 2
versus realization 1, as well as 1 versus 2), the distribution of
colored symbols would be symmetric. But I was concerned that we might be
accused of "double counting" if we did this....

The third Figure is the most interesting one. You have not seen this
yet. I decided to examine how the Douglass et al. "consistency test"
behaves with synthetic data. I did this as a function of sample size N,
for N values ranging from 19 (the number of models we used in the CCSP
report) to 100. Consider the N = 19 case first. I generated 19 synthetic
time series using an AR-1 model of the form:

xt(i) = a1 * (xt(i-1) - am) + zt(i) + am

where a1 is the coefficient of the AR-1 model, zt(i) is a
randomly-generated noise term, and am is a mean (set to zero here).
Here, I set a1 to 0.86, close to the lag-1 autocorrelation of the UAH
T2LT anomaly data. The other free parameter is a scaling term which
controls the amplitude of zt(i). I chose this scaling term to yield a
temporal standard deviation of xt(i) that was close to the temporal
standard deviation of the monthly-mean UAH T2LT anomaly data. The
synthetic time series had the same length as the observational and model
data (252 months), and monthly-mean anomalies were calculated in the
same way as we did for observations and models.
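
A sketch of the AR-1 generator described above. The Gaussian noise, the burn-in period, and the target standard deviation of 0.3 are assumptions for illustration; the email only says the scaling was tuned to match the UAH data. For a stationary AR-1 process the noise amplitude follows from the target standard deviation as noise_sd = target_sd * sqrt(1 - a1**2).

```python
import numpy as np

def ar1_series(n, a1, noise_sd, am=0.0, rng=None, burn_in=200):
    """Generate xt(i) = a1 * (xt(i-1) - am) + zt(i) + am."""
    rng = np.random.default_rng() if rng is None else rng
    total = n + burn_in
    zt = rng.normal(0.0, noise_sd, size=total)   # assumption: Gaussian noise
    xt = np.empty(total)
    xt[0] = am + zt[0]
    for i in range(1, total):
        xt[i] = a1 * (xt[i - 1] - am) + zt[i] + am
    return xt[burn_in:]                          # drop the spin-up values

a1 = 0.86
target_sd = 0.3                                  # hypothetical target
noise_sd = target_sd * np.sqrt(1.0 - a1 ** 2)
series = ar1_series(252, a1, noise_sd, rng=np.random.default_rng(2))

# With no noise at all, the recursion simply stays at the mean.
flat = ar1_series(10, a1, 0.0, am=1.5)
```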

For each of these 19 synthetic time series, I first calculated
least-squares linear trends and adjusted standard errors, and then
performed the "paired trends" test. The test involves all 171 unique pairs of
trends: b{1} versus b{2}, b{1} versus b{3},... b{1} versus b{19}, b{2}
versus b{3}, etc. I then calculate the rejection rates of the null
hypothesis of "no significant difference in trend", for stipulated
significance levels of 5%, 10%, and 20%. This procedure is repeated 1000
times, with 1000 different realizations of 19 synthetic time series. We
can therefore build up a distribution of rejection rates for N = 19, and
then do the same for N = 20, etc.
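
One realization of this experiment (N = 19, 5% tests) might look like the sketch below. It is not the code used for Figure 3: the unit-variance Gaussian noise, the lack of a spin-up period, the effective sample size formula n{e} = n (1 - r{1}) / (1 + r{1}), and all function names are assumptions. Repeating it 1000 times with fresh series would build up the distribution of rejection rates.

```python
import itertools
import math
import numpy as np

def trend_and_adjusted_se(y):
    """Least-squares trend and its autocorrelation-adjusted standard error."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t_dev = np.arange(n, dtype=float) - (n - 1) / 2.0
    sxx = np.sum(t_dev ** 2)
    b = np.sum(t_dev * (y - y.mean())) / sxx
    resid = (y - y.mean()) - b * t_dev
    ss_resid = np.sum(resid ** 2)
    r1 = np.sum(resid[:-1] * resid[1:]) / ss_resid
    n_eff = n * (1.0 - r1) / (1.0 + r1)          # assumed form of n{e}
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small")
    return b, math.sqrt(ss_resid / (n_eff - 2.0) / sxx)

def paired_rejection_rate(series, z_crit=1.96):
    """Fraction of unique trend pairs with |d1| above z_crit, plus the pair count."""
    stats = [trend_and_adjusted_se(y) for y in series]
    d1 = [(b1 - b2) / math.hypot(s1, s2)
          for (b1, s1), (b2, s2) in itertools.combinations(stats, 2)]
    hits = sum(1 for d in d1 if abs(d) > z_crit)
    return hits / len(d1), len(d1)

# One realization: 19 synthetic AR-1 series of 252 months each.
rng = np.random.default_rng(3)
a1, n_series, n_time = 0.86, 19, 252
noise = rng.normal(size=(n_series, n_time))
series = np.empty((n_series, n_time))
series[:, 0] = noise[:, 0]
for i in range(1, n_time):
    series[:, i] = a1 * series[:, i - 1] + noise[:, i]

rate, n_pairs = paired_rejection_rate(series)
```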

The "paired trend" results are plotted as the blue lines in Figure 3.
Encouragingly, the percentage rejections of the null hypothesis are
close to the theoretical expectations. The 5% significance tests yield a
rejection rate of a little over 6%; 10% tests have a rejection rate of
over 11%, and 20% tests have a rejection rate of 21%. I'm not quite sure
why this slight positive bias arises. This bias does show some small
sensitivity (1-2%) to choice of the a1 parameter and the scaling term.
Different choices of these parameters can give rejection rates that are
closer to the theoretical expectation. But my parameter choices for the
AR-1 model were guided by the goal of generating synthetic data with
roughly the same autocorrelation and variance properties as the UAH
data, and not by a desire to get as close as I possibly could to the
theoretical rejection rates.

So why is there a small positive bias in the empirically-determined
rejection rates? Perhaps Francis can provide us with some guidance here.
Karl believes that the answer may be partly linked to the skewness of
the empirically-determined rejection rate distributions. For example,
for the N = 19 case, and for 5% tests, values of rejection rates in the
1000-member distribution range from a minimum of 0 to a maximum of 24%,
with a mean value of 6.7% and a median of 6.4%. Clearly, the minimum
value is bounded by zero, but the maximum is not bounded, and in rare
cases, rejection rates can be quite large, and influence the mean. This
inherent skewness must make some contribution to the small positive bias
in rejection rates in the "paired trends" test.

What happens if we naively perform the paired trends test WITHOUT
adjusting the standard errors of the trends for temporal autocorrelation
effects? Results are shown by the black lines in Figure 3. If we ignore
temporal autocorrelation, we get the wrong answer. Rejection rates for
5% tests are 60%!

We did not publish results from any of these synthetic data experiments
in our 2000 JGR paper. In retrospect, this is a bit of a shame, since
Figure 3 nicely shows that the adjustment for temporal autocorrelation
effects works reasonably well, while failure to adjust yields completely
erroneous results.

Now consider the red lines in Figure 3. These are the results of
applying the Douglass et al. "consistency test" to synthetic data.
Again, let's consider the N = 19 case first. I calculate the trends in
all 19 synthetic time series. Let's consider the first of these 19 time
series as the surrogate observations. The trend in this time series,
b{1}, is compared with the mean trend, b{Synth}, computed from the
remaining 18 synthetic time series. The Douglass sigma{SE} is also
computed from these 18 remaining trends. We then form a test statistic
d2 = (b{1} - b{Synth}) / sigma{SE}, and calculate rejection rates for
the null hypothesis of no significant difference between the mean trend
and the trend in the surrogate observations. This procedure is then
repeated with the trend in time series 2 as the surrogate observations,
and b{Synth} and sigma{SE} calculated from time series 1, 3, 4,..19.
This yields 19 different tests of the null hypothesis. Repeat 1,000
times, and build up a distribution of rejection rates, as in the "paired
trends" test.
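
The leave-one-out procedure above can be sketched as follows. The function name is hypothetical, and the use of the sample standard deviation (ddof=1) for sigma is an assumption, since the text does not say which estimator was used. The input is simply a list of trends; in the experiment described these would be the least-squares trends of the synthetic series.

```python
import numpy as np

def consistency_rejection_rate(trends, z_crit=1.96):
    """Apply the sigma{SE} consistency test with each trend as surrogate observations."""
    trends = np.asarray(trends, dtype=float)
    hits = 0
    for k in range(trends.size):
        rest = np.delete(trends, k)               # the remaining N - 1 trends
        # sigma{SE} = sigma / sqrt(N - 1), with N the number of remaining trends.
        sigma_se = rest.std(ddof=1) / np.sqrt(rest.size - 1)
        d2 = (trends[k] - rest.mean()) / sigma_se
        hits += int(abs(d2) > z_crit)
    return hits / trends.size

# Deterministic toy input: 19 evenly spaced "trends" (not real data).
toy_rate = consistency_rejection_rate(np.arange(19.0))
```

Because sigma{SE} describes the uncertainty of the mean of the remaining trends rather than the spread of individual trends, even evenly spaced toy values are rejected most of the time.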

The results are truly alarming. Application of the Douglass et al.
"consistency test" to synthetic data - data generated with the same
underlying AR-1 model! - leads to rejection of the above-stated null
hypothesis at least 65% of the time (for N = 19, 5% significance tests).
As expected, rejection rates for the Douglass consistency test rise as
N increases. For N = 100, rejection rates for 5% tests are nearly 85%.
As my colleague Jim Boyle succinctly put it when he looked at these
results, "This is a pretty hard test to pass".

I think this nicely illustrates the problems with the statistical
approach used by Douglass et al. If you want to demonstrate that modeled
and observed temperature trends are fundamentally inconsistent, you
devise a fundamentally flawed test that is very difficult to pass.

I hope to have a first draft of this stuff written up by the end of next
week. If Leo is agreeable, Figure 4 of this GRL paper would show the
vertical profiles of tropical temperature trends in the various versions
of the RAOBCORE data, plus model results.

Sorry to bore you with all the gory details. But as we've seen from
Douglass et al., details matter.

With best regards,

Ben
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------


</x-flowed>

Attachment Converted: "c:\eudora\attach\santer_fig01.pdf"

Attachment Converted: "c:\eudora\attach\santer_fig02.pdf"

Attachment Converted: "c:\eudora\attach\santer_fig03.pdf"

Original Filename: 1199972428.txt

From: dian.seidel@xxxxxxxxx.xxx
To: santer1@xxxxxxxxx.xxx
Subject: Re: Update on response to Douglass et al.
Date: Thu, 10 Jan 2008 08:40:xxx xxxx xxxx
Cc: Tom Wigley <wigley@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, carl mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <Melissa.Free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>, Steven Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>, "Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>, "Hack, James J." <jhack@xxxxxxxxx.xxx>

Dear Ben,

Thank you for this detailed update of your work. A few thoughts for
your consideration ...

Where to submit this: Although I understand your and Phil's
reluctance to try IJC, it seems to me that, despite the new work
presented, this is really a comment on Douglass et al. and so rightly
belongs in IJC. If you suspect the review and publication process
there is unacceptably long, perhaps this should be confirmed by
inquiring with the editor, as a professional courtesy. Decide in
advance what you'd consider a reasonable turn-around time, and if the
editor says it will take longer, going with another journal makes
sense.

Figures: They look great. As usual, you've done a super job telling
the story in pictures. One suggestion would be to indicate in Fig. 3
which test, or trio of tests, is the most appropriate. Now it is shown
as the blue curves, but I'd suggest making these black (and the black
ones blue) and thicker than the rest. That way those readers who just
skim the paper and look at the figures will get the message quickly.

Observations: Have you considered including results from HadAT and
RATPAC as well as RAOBCORE? For even greater completeness, a version
of RATPAC pared down based on the results of Randel and Wu could be
added, as could Steve Sherwood's adjusted radiosonde data. I'd
suggest adding results from these datasets to your Fig. 1, not the
planned Fig 4, which I gather is meant to show the differences in
versions of RAOBCORE and the impact of Douglass et al.'s choice to use
an early version.

With best wishes,
Dian

----- Original Message -----
From: Ben Santer <santer1@xxxxxxxxx.xxx>
Date: Wednesday, January 9, 2008 10:52 pm
Subject: Update on response to Douglass et al.

> Dear folks,
>
> I just wanted to update you on my progress in formulating a
> response to
> the Douglass et al. paper in the International Journal of
> Climatology
> (IJC). There have been several developments.
>
> First, I contacted Science to gauge their level of interest in
> publishing a response to Douglass et al. I thought it was
> worthwhile to
> "test the water" before devoting a lot of time to the preparation
> of a
> manuscript for submission to Science. I spoke with Jesse Smith,
> who
> handles most of the climate-related papers at Science magazine.
>
> The bottom line is that, while Science is interested in this issue
> (particularly since Douglass et al. are casting doubt on the
> findings of
> the 2005 Santer et al. Science paper), Jesse Smith thought it was
> highly
> unlikely that Science would carry a rebuttal of work published in
> a
> different journal (IJC). Regretfully, I agree. Our response to
> Douglass
> et al. does not contain any fundamentally new science - although
> it does
> contain some new and interesting work (see below).
>
> It's an unfortunate situation. Singer is promoting the Douglass et
> al.
> paper as startling "new scientific evidence", which undercuts
> the key
> conclusions of the IPCC and CCSP Reports. Christy is using the
> Douglass
> et al. paper to argue that his UAH group is uniquely positioned to
> perform "hard-nosed" and objective evaluation of model
> performance, and
> that it's dangerous to leave model evaluation in the hands of
> biased
> modelers. Much as I would like to see a high-profile rebuttal of
> Douglass et al. in a journal like Science or Nature, it's unlikely
> that
> either journal will publish such a rebuttal.
>
> So what are our options? Personally, I'd vote for GRL. I think
> that it
> is important to publish an expeditious response to the statistical
> flaws
> in Douglass et al. In theory, GRL should be able to give us the
> desired
> fast turnaround time. Would GRL accept our contribution, given
> that the
> Douglass et al. paper was published in IJC? I think they would -
> we've
> done a substantial amount of new work (see below), and can argue,
> with
> some justification, that our contribution is more than just a
> rebuttal
> of Douglass et al.
>
> Why not go for publication of a response in IJC? According to
> Phil, this
> option would probably take too long. I'd be interested to hear any
> other
> thoughts you might have on publication options.
>
> Now to the science (with a lower-case "s"). I'm appending three
> candidate Figures for a GRL paper. The first Figure was motivated
> by
> discussions I've had with Karl Taylor and Tom Wigley. It's an
> attempt to
> convey the differences between our method of comparing observed
> and
> simulated trends (panel A) and the approach used by Douglass et
> al.
> (panel B).
>
> In our method, we account for both statistical uncertainties in
> fitting
> least-squares linear trends to noisy, temporally-autocorrelated
> data and
> for the effects of internally-generated variability. As I've
> described
> in previous emails, we compare each of the 49 simulated T2 and
> T2LT
> trends (i.e., the same multi-model ensemble used in our 2005
> Science
> paper and in the 2006 CCSP Report) with observed T2 and T2LT
> trends
> obtained from the RSS and UAH groups. Our 2-sigma confidence
> intervals
> on the model and observed trends are estimated as in Santer et al.
> (2000). [Santer, B.D., T.M.L. Wigley, J.S. Boyle, D.J. Gaffen,
> J.J.
> Hnilo, D. Nychka, D.E. Parker, and K.E. Taylor, 2000: Statistical
> significance of trends and trend differences in layer-average
> atmospheric temperature time series, J. Geophys. Res., 105,
> 7337-7356]
>
> The method that Santer et al. (2000) used to compute "adjusted"
> trend
> confidence intervals accounts for the fact that, after fitting a
> trend
> to T2 or T2LT data, the regression residuals are typically highly
> autocorrelated. If this autocorrelation is not accounted for, one
> could
> easily reach incorrect decisions on whether the trend in an
> individual
> time series is significantly different from zero, or whether two
> time
> series have significantly different trends. Santer et al. (2000)
> accounted for temporal autocorrelation effects by estimating r{1},
> the
> lag-1 autocorrelation of the regression residuals, using r{1} to
> calculate an effective sample size n{e}, and then using n{e} to
> determine an adjusted standard error of the least-squares linear
> trend.
> Panel A of Figure 1 shows the 2-sigma "adjusted" standard errors
> for
> each individual trend. Models with excessively large tropical
> variability (like FGOALS-g1.0 and GFDL-CM2.1) have large adjusted
> standard errors. Models with coarse-resolution OGCMs and low-
> amplitude
> ENSO variability (like the GISS-AOM) have smaller than observed
> adjusted
> standard errors. Neglect of volcanic forcing (i.e., absence of El
> Chichon and Pinatubo-induced temperature variability) can also
> contribute to smaller than observed standard errors, as in
> CCCma-CGCM3.1(T47).
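
A minimal Python sketch of the adjustment described above, added for illustration. The email does not spell out the formulas, so the effective sample size n{e} = n (1 - r{1}) / (1 + r{1}) and the use of n{e} - 2 degrees of freedom for the residual variance are assumptions based on the commonly used lag-1 correction. The function name adjusted_trend is hypothetical.

```python
import numpy as np

def adjusted_trend(y):
    """Return (trend, adjusted standard error, r1, n_eff) for a 1-D series."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    # Ordinary least-squares linear fit; the slope is the trend per time step.
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    # r{1}: lag-1 autocorrelation of the regression residuals.
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # n{e}: effective sample size (assumed form, see the note above).
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for a trend test")
    # Residual variance with n_eff - 2 degrees of freedom (assumption).
    resid_var = np.sum(resid**2) / (n_eff - 2.0)
    se_adj = np.sqrt(resid_var / np.sum((t - t.mean())**2))
    return slope, se_adj, r1, n_eff
```

For strongly autocorrelated residuals n_eff is much smaller than n, so the adjusted standard error is correspondingly larger than the unadjusted one.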
>
> The dark and light grey bars in Panel A show (respectively) the 1-
> and
> 2-sigma standard errors for the RSS T2LT trend. As is visually
> obvious,
> 36 of the 49 model trends are within 1 standard error of the RSS
> trend,
> and 47 of the 49 model trends are within 2 standard errors of the
> RSS
> trend.
>
> I've already explained our "paired trend test" procedure for
> calculating
> the statistical significance of the model-versus-observed trend
> differences. This involves the normalized trend difference d1:
>
> d1 = (b{O} - b{M}) / sqrt[ (s{bO})**2 + (s{bM})**2 ]
>
> where b{O} and b{M} represent any single pair of Observed and
> Modeled
> trends, with adjusted standard errors s{bO} and s{bM}.
>
> Under the assumption that d1 is normally distributed, values of d1
> greater than +1.96 or less than -1.96 indicate observed-minus-model
> trend differences that are significant at the 5% level (two-tailed),
> and one can
> easily calculate a p-value for each value of d1. These p-values
> for the
> 98 pairs of trend tests (49 involving UAH data and 49 involving
> RSS
> data) are what we use for determining the total number of "hits",
> or
> rejections of the null hypothesis of no significant difference
> between
> modeled and observed trends. I note that each test is two-tailed,
> since
> we have no information a priori about the "direction" of the model
> trend
> (i.e., whether we expect the simulated trend to be significantly
> larger
> or smaller than observed).
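
The d1 statistic and its two-tailed p-value can be sketched in a few lines of Python. This is an illustration only: the function name paired_trend_test and the numbers in the usage line are hypothetical, not values from the analysis. The p-value follows from the normality assumption stated above.

```python
import math

def paired_trend_test(b_obs, b_mod, s_obs, s_mod):
    """Return (d1, two-tailed p-value) for one observed/modeled trend pair."""
    # Normalized trend difference, as defined in the email.
    d1 = (b_obs - b_mod) / math.sqrt(s_obs**2 + s_mod**2)
    # Two-tailed p-value for a standard normal variate: 2 * (1 - Phi(|d1|)).
    p_value = math.erfc(abs(d1) / math.sqrt(2.0))
    return d1, p_value

# Hypothetical trends and adjusted standard errors (degrees C/decade).
d1, p = paired_trend_test(0.10, 0.20, 0.03, 0.04)
```

A "hit" at a stipulated significance level alpha is then simply p < alpha.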
>
> REJECTION RATES FOR "PAIRED TREND TESTS", OBS-vs-MODEL
>
> Stipulated sign. level   No. of tests   T2 "Hits"     T2LT "Hits"
> 5%                       49 x xxx       xxx (2.04%)   1 (1.02%)
> 10%                      49 x xxx       xxx (4.08%)   xxx (2.04%)
> 15%                      49 x xxx       xxx (7.14%)   5 (5.10%)
>
> Now consider Panel B of Figure 1. It helps to clarify the
> differences
> between the Douglass et al. comparison of model and observed
> trends and
> our own comparison. The black horizontal line ("Multi-model mean
> trend")
> is the T2LT trend in the 19-model ensemble, calculated from model
> ensemble mean trends (the colored symbols). Douglass et al.'s
> "consistency criterion", sigma{SE}, is given by:
>
> sigma{SE} = sigma / sqrt(N - 1)
>
> where sigma is the standard deviation of the 19 ensemble-mean
> trends,
> and N is 19. The orange and yellow envelopes denote the 1- and
> 2-sigma{SE} regions.
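
For reference, the sigma{SE} formula above is short enough to sketch directly in Python. The function name douglass_sigma_se is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption, since the email does not say which normalization was used.

```python
import numpy as np

def douglass_sigma_se(ensemble_mean_trends):
    """sigma{SE} = sigma / sqrt(N - 1) for N ensemble-mean trends."""
    trends = np.asarray(ensemble_mean_trends, dtype=float)
    n = trends.size
    # Assumption: sigma is the sample standard deviation of the N trends.
    sigma = trends.std(ddof=1)
    return sigma / np.sqrt(n - 1)
```

Because of the division by sqrt(N - 1), the envelope narrows as more models are added, even though the spread of the individual model trends does not change.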
>
> Douglass et al. use sigma{SE} to decide whether the multi-model
> mean
> trend is consistent with either of the observed trends. They
> conclude
> that the RSS and UAH trends lie outside of the yellow envelope
> (the
> 2-sigma{SE} region), and interpret this as evidence of a
> fundamental
> inconsistency between modeled and observed trends. As noted
> previously,
> Douglass et al. obtain this result because they fail to account
> for
> statistical uncertainty in the estimation of the RSS and UAH
> trends.
> They ignore the statistical error bars on the RSS and UAH trends
> (which
> are shown in Panel A). As is clear from Panel A, the statistical
> error
> bars on the RSS and UAH trends overlap with the Douglass et al.
> 2-sigma{SE} region. Had Douglass et al. accounted for statistical
> uncertainty in estimation of the observed trends, they would have
> been
> unable to conclude that all "UAH and RSS satellite trends are
> inconsistent with model trends".
>
> The second Figure plots values of our test statistic (d1) for the
> "paired trend test". The grey histogram is based on the values of
> d1 for
> the 49 tests involving the RSS T2LT trend and the simulated T2LT
> trends
> from 20c3m runs. The green histogram is for the 49 paired trend
> tests
> involving model 20c3m data and the UAH T2LT trend. Note that the
> d1
> distribution obtained with the UAH data is negatively skewed. This
> is
> because the numerator of the d1 test statistic is b{O} - b{M}, and
> the
> UAH tropical T2LT trend over 1xxx xxxx xxxx is smaller than most of the
> model
> trends (see Figure 1, panel A).
>
> The colored dots are values of the d1 test statistic for what I
> referred
> to previously as "TYPE2" tests. These tests are limited to the M
> models
> with multiple realizations of the 20c3m experiment. Here, M = 11.
> For
> each of these M models, I performed paired trend tests for all C
> unique
> combinations of trend pairs. For example, for a model with 5
> realizations of the 20c3m experiment, like GISS-EH, C = 10. The
> significance of trend differences is solely a function of "within-
> model"
> effects (i.e., is related to the different manifestations of
> natural
> internal variability superimposed on the underlying forced
> response).
> There are a total of 62 paired trend tests. Note that the
> separation of
> the colored symbols on the y-axis is for visual display purposes
> only,
> and facilitates the identification of results for individual models.
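
The count of unique pairs C is the binomial coefficient "n choose 2". A short Python sketch, with the hypothetical helper name unique_pairs, enumerates them for a 5-member ensemble:

```python
from itertools import combinations
from math import comb

def unique_pairs(n_realizations):
    """All unique (i, j) realization pairs with i < j, numbered from 1."""
    return list(combinations(range(1, n_realizations + 1), 2))

pairs = unique_pairs(5)
# len(pairs) equals comb(5, 2); realization 2 vs 1 is not listed separately.
print(len(pairs), comb(5, 2))
```

Listing each pair once is what avoids the "double counting" concern: the pair (1, 2) appears, but (2, 1) does not.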
>
> The clear message from Figure 2 is that the values of d1 arising
> from
> internal variability alone are typically as large as the d1 values
> obtained by testing model trends against observational data. The
> two
> negative "outlier" values of d1 for the model-versus-observed
> trend
> tests involve the large positive trend in CCCma-CGCM3.1(T47). If
> you
> have keen eagle eyes, you'll note that the distribution of colored
> symbols is slightly skewed to the negative side. If you look at
> Panel A
> of Figure 1, you'll see that this skewness arises from the
> relatively
> small ensemble sizes. Consider results for the 5-member ensemble
> of
> 20c3m trends from the MRI-CGCM2.3.2. The trend in realization 1 is
> close
> to zero; trends in realizations 2, 3, 4, and 5 are large,
> positive, and
> vary between 0.27 and 0.37 degrees C/decade. So d1 is markedly
> negative
> for tests involving realization 1 versus realizations 2, 3, 4, and
> 5. If
> we showed non-unique combinations of trend pairs (e.g.,
> realization 2
> versus realization 1, as well as 1 versus 2), the distribution of
> colored symbols would be symmetric. But I was concerned that we
> might be
> accused of "double counting" if we did this....
>
> The third Figure is the most interesting one. You have not seen
> this
> yet. I decided to examine how the Douglass et al. "consistency
> test"
> behaves with synthetic data. I did this as a function of sample
> size N,
> for N values ranging from 19 (the number of models we used in the
> CCSP
> report) to 100. Consider the N = 19 case first. I generated 19
> synthetic
> time series using an AR-1 model of the form:
>
> xt(i) = a1 * (xt(i-1) - am) + zt(i) + am
>
> where a1 is the coefficient of the AR-1 model, zt(i) is a
> randomly-generated noise term, and am is a mean (set to zero
> here).
> Here, I set a1 to 0.86, close to the lag-1 autocorrelation of the
> UAH
> T2LT anomaly data. The other free parameter is a scaling term
> which
> controls the amplitude of zt(i). I chose this scaling term to
> yield a
> temporal standard deviation of xt(i) that was close to the
> temporal
> standard deviation of the monthly-mean UAH T2LT anomaly data. The
> synthetic time series had the same length as the observational and
> model
> data (252 months), and monthly-mean anomalies were calculated in
> the
> same way as we did for observations and models.
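
A minimal Python sketch of the AR-1 generator described above. Several details are assumptions, not taken from the email: the noise zt(i) is drawn as Gaussian, the series is started from its stationary distribution, and the scaling term is derived from the stationary-variance relation var(xt) = var(zt) / (1 - a1**2). The target standard deviation used in the usage line is a placeholder, not the actual UAH value.

```python
import numpy as np

def noise_sd_for_target(target_sd, a1):
    """Noise amplitude giving a stationary AR-1 standard deviation of target_sd."""
    return target_sd * np.sqrt(1.0 - a1**2)

def ar1_series(n, a1, noise_sd, am=0.0, rng=None):
    """Generate xt(i) = a1 * (xt(i-1) - am) + zt(i) + am for i = 0..n-1."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, noise_sd, size=n)  # assumption: Gaussian noise
    x = np.empty(n)
    # Assumption: first value drawn from the stationary distribution.
    x[0] = am + rng.normal(0.0, noise_sd / np.sqrt(1.0 - a1**2))
    for i in range(1, n):
        x[i] = a1 * (x[i - 1] - am) + z[i] + am
    return x

# 252 months, a1 = 0.86 as in the email; 0.3 is a placeholder target sd.
series = ar1_series(252, 0.86, noise_sd_for_target(0.3, 0.86),
                    rng=np.random.default_rng(0))
```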
>
> For each of these 19 synthetic time series, I first calculated
> least-squares linear trends and adjusted standard errors, and then
> performed the "paired trends" test. The test involves all 171 unique
> pairs of
> trends: b{1} versus b{2}, b{1} versus b{3},... b{1} versus b{19},
> b{2}
> versus b{3}, etc. I then calculate the rejection rates of the null
> hypothesis of "no significant difference in trend", for stipulated
> significance levels of 5%, 10%, and 20%. This procedure is
> repeated 1000
> times, with 1000 different realizations of 19 synthetic time
> series. We
> can therefore build up a distribution of rejection rates for N =
> 19, and
> then do the same for N = 20, etc.
>
> The "paired trend" results are plotted as the blue lines in Figure
> 3.
> Encouragingly, the percentage rejections of the null hypothesis
> are
> close to the theoretical expectations. The 5% significance tests
> yield a
> rejection rate of a little over 6%; 10% tests have a rejection
> rate of
> over 11%, and 20% tests have a rejection rate of 21%. I'm not
> quite sure
> why this slight positive bias arises. This bias does show some
> small
> sensitivity (1-2%) to choice of the a1 parameter and the scaling
> term.
> Different choices of these parameters can give rejection rates
> that are
> closer to the theoretical expectation. But my parameter choices
> for the
> AR-1 model were guided by the goal of generating synthetic data
> with
> roughly the same autocorrelation and variance properties as the
> UAH
> data, and not by a desire to get as close as I possibly could to
> the
> theoretical rejection rates.
>
> So why is there a small positive bias in the empirically-
> determined
> rejection rates? Perhaps Francis can provide us with some guidance
> here.
> Karl believes that the answer may be partly linked to the skewness
> of
> the empirically-determined rejection rate distributions. For
> example,
> for the N = 19 case, and for 5% tests, values of rejection rates
> in the
> 1000-member distribution range from a minimum of 0 to a maximum of
> 24%,
> with a mean value of 6.7% and a median of 6.4%. Clearly, the
> minimum
> value is bounded by zero, but the maximum is not bounded, and in
> rare
> cases, rejection rates can be quite large, and influence the
> mean. This
> inherent skewness must make some contribution to the small
> positive bias
> in rejection rates in the "paired trends" test.
>
> What happens if we naively perform the paired trends test WITHOUT
> adjusting the standard errors of the trends for temporal
> autocorrelation
> effects? Results are shown by the black lines in Figure 3. If we
> ignore
> temporal autocorrelation, we get the wrong answer. Rejection rates
> for
> 5% tests are 60%!
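
The synthetic-data experiment for the "paired trends" test, with and without the autocorrelation adjustment, can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for Figure 3: the noise is Gaussian with unit variance (the test statistic does not depend on the overall scale), the effective sample size uses the assumed form n (1 - r1) / (1 + r1) with a floor of 3, and the default of 50 realizations is far smaller than the 1000 used in the email. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def ar1(n, coef, rng):
    """Zero-mean AR-1 series with unit-variance Gaussian innovations."""
    z = rng.normal(size=n)
    x = np.empty(n)
    x[0] = z[0] / np.sqrt(1.0 - coef**2)  # start in the stationary state
    for i in range(1, n):
        x[i] = coef * x[i - 1] + z[i]
    return x

def fit(y):
    """Return (trend, unadjusted SE, adjusted SE) for one series."""
    n = y.size
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # Assumed effective sample size; floored at 3 so n_eff - 2 stays positive.
    n_eff = max(n * (1.0 - r1) / (1.0 + r1), 3.0)
    ss, sxx = np.sum(resid**2), np.sum((t - t.mean())**2)
    return slope, np.sqrt(ss / (n - 2.0) / sxx), np.sqrt(ss / (n_eff - 2.0) / sxx)

def rejection_rates(n_series=19, n_months=252, coef=0.86,
                    n_real=50, crit=1.96, seed=0):
    """Mean rejection rates (unadjusted, adjusted) over n_real realizations."""
    rng = np.random.default_rng(seed)
    raw_rates, adj_rates = [], []
    for _ in range(n_real):
        fits = [fit(ar1(n_months, coef, rng)) for _ in range(n_series)]
        raw_hits = adj_hits = n_pairs = 0
        # All unique pairs of trends within this realization.
        for (b1, raw1, adj1), (b2, raw2, adj2) in combinations(fits, 2):
            diff = abs(b1 - b2)
            raw_hits += int(diff / np.hypot(raw1, raw2) > crit)
            adj_hits += int(diff / np.hypot(adj1, adj2) > crit)
            n_pairs += 1
        raw_rates.append(raw_hits / n_pairs)
        adj_rates.append(adj_hits / n_pairs)
    return float(np.mean(raw_rates)), float(np.mean(adj_rates))

if __name__ == "__main__":
    raw, adj = rejection_rates()
    print(f"unadjusted: {raw:.1%}  adjusted: {adj:.1%}")
```

The two printed rates can be compared against the blue (adjusted) and black (unadjusted) results quoted above for 5% tests.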
>
> We did not publish results from any of these synthetic data
> experiments
> in our 2000 JGR paper. In retrospect, this is a bit of a shame,
> since
> Figure 3 nicely shows that the adjustment for temporal
> autocorrelation
> effects works reasonably well, while failure to adjust yields
> completely
> erroneous results.
>
> Now consider the red lines in Figure 3. These are the results of
> applying the Douglass et al. "consistency test" to synthetic data.
> Again, let's consider the N = 19 case first. I calculate the
> trends in
> all 19 synthetic time series. Let's consider the first of these 19
> time
> series as the surrogate observations. The trend in this time
> series,
> b{1}, is compared with the mean trend, b{Synth}, computed from the
> remaining 18 synthetic time series. The Douglass sigma{SE} is also
> computed from these 18 remaining trends. We then form a test
> statistic
> d2 = (b{1} - b{Synth}) / sigma{SE}, and calculate rejection rates
> for
> the null hypothesis of no significant difference between the mean
> trend
> and the trend in the surrogate observations. This procedure is
> then
> repeated with the trend in time series 2 as the surrogate
> observations,
> and b{Synth} and sigma{SE} calculated from time series 1, 3,
> 4,..19.
> This yields 19 different tests of the null hypothesis. Repeat
> 1,000
> times, and build up a distribution of rejection rates, as in the
> "paired
> trends" test.
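
The leave-one-out procedure just described can be sketched in Python. As before, this is an illustration and not the code behind Figure 3. Assumptions: Gaussian unit-variance noise, sigma taken as the sample standard deviation (ddof=1) of the remaining trends, a critical value of 1.96 for the 5% test, and a default of 200 realizations instead of 1000. Function names are hypothetical.

```python
import numpy as np

def ar1(n, coef, rng):
    """Zero-mean AR-1 series with unit-variance Gaussian innovations."""
    z = rng.normal(size=n)
    x = np.empty(n)
    x[0] = z[0] / np.sqrt(1.0 - coef**2)  # start in the stationary state
    for i in range(1, n):
        x[i] = coef * x[i - 1] + z[i]
    return x

def trend(y):
    """Least-squares linear trend per time step."""
    return np.polyfit(np.arange(y.size, dtype=float), y, 1)[0]

def douglass_rejection_rate(n_series=19, n_months=252, coef=0.86,
                            n_real=200, crit=1.96, seed=0):
    """Mean rejection rate of the d2 test over n_real realizations."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_real):
        b = np.array([trend(ar1(n_months, coef, rng)) for _ in range(n_series)])
        hits = 0
        for k in range(n_series):
            # Series k plays the surrogate observations; the rest are "models".
            others = np.delete(b, k)
            sigma_se = others.std(ddof=1) / np.sqrt(others.size - 1)
            d2 = (b[k] - others.mean()) / sigma_se
            hits += int(abs(d2) > crit)
        rates.append(hits / n_series)
    return float(np.mean(rates))

if __name__ == "__main__":
    print(f"rejection rate: {douglass_rejection_rate():.1%}")
```

Note that d2 divides by the standard error of the mean of the remaining trends, and ignores the uncertainty in the surrogate observed trend itself, which is why a trend drawn from the same process is so often flagged as inconsistent.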
>
> The results are truly alarming. Application of the Douglass et al.
> "consistency test" to synthetic data - data generated with the
> same
> underlying AR-1 model! - leads to rejection of the above-stated
> null
> hypothesis at least 65% of the time (for N = 19, 5% significance
> tests).
> As expected, rejection rates for the Douglass consistency test
> rise as
> N increases. For N = 100, rejection rates for 5% tests are nearly
> 85%.
> As my colleague Jim Boyle succinctly put it when he looked at
> these
> results, "This is a pretty hard test to pass".
>
> I think this nicely illustrates the problems with the statistical
> approach used by Douglass et al. If you want to demonstrate that
> modeled
> and observed temperature trends are fundamentally inconsistent,
> you
> devise a fundamentally flawed test that is very difficult to pass.
>
> I hope to have a first draft of this stuff written up by the end
> of next
> week. If Leo is agreeable, Figure 4 of this GRL paper would show
> the
> vertical profiles of tropical temperature trends in the various
> versions
> of the RAOBCORE data, plus model results.
>
> Sorry to bore you with all the gory details. But as we've seen
> from
> Douglass et al., details matter.
>
> With best regards,
>
> Ben
> -------------------------------------------------------------------
> ---------
> Benjamin D. Santer
> Program for Climate Model Diagnosis and Intercomparison
> Lawrence Livermore National Laboratory
> P.O. Box 808, Mail Stop L-103
> Livermore, CA 94550, U.S.A.
> Tel: (9xxx xxxx xxxx
> FAX: (9xxx xxxx xxxx
> email: santer1@xxxxxxxxx.xxx
> -------------------------------------------------------------------
> ---------
>
>

Original Filename: 1199984805.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Phil Jones <p.jones@xxxxxxxxx.xxx>
Subject: Re: [Fwd: Re: John Christy's latest ideas]
Date: Thu, 10 Jan 2008 12:06:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx

<x-flowed>
Dear Phil,

If you get a chance, could you call me up at work xxx xxxx xxxx) to
talk about the "IJC publication" option? I'd really like to discuss that
with you.

With best regards,

Ben
Phil Jones wrote:
>
> Ben,
> Almost said something about this in the main email about the diagrams!
> Other emails and a couple of phone calls distracting me - have to make
> sure
> I'm sending the right email to the right list/person!
> He's clearly biased, but he gets an audience unfortunately. There are
> enough people out there who think we're wrong to cause me to worry at
> times.
> I'd like the world to warm up quicker, but if it did, I know that
> the sensitivity
> is much higher and humanity would be in a real mess!
>
> I'm getting people misinterpreting my comment that went along with
> Chris Folland's press release about the 2008 forecast. It says we're
> warming at 0.2 degC/decade and that is exactly what we should be.
> The individual years don't matter.
>
> CA are now to send out FOIA requests for the Review Editor comments
> on the AR4 Chapters. For some reason they think they exist!
>
> Cheers
> Phil
>
>
> At 16:52 09/01/2008, you wrote:
>> Dear Phil,
>>
>> I can't believe John is now arguing that he's the only guy who can
>> provide unbiased assessments of model performance. After all the
>> mistakes he's made with MSU, and after the Douglass et al. fiasco, he
>> should have acquired a little humility. But I guess "humility" isn't
>> in his dictionary...
>>
>> With best regards,
>>
>> Ben
>> Phil Jones wrote:
>>> Ben,
>>> I'll give up on trying to catch him on the road to Damascus -
>>> he's beyond redemption.
>>> Glad to see that someone's rejected something he's written.
>>> Jim Hack's good, so I'm confident he won't be fooled.
>>> Cheers
>>> Phil
>>>
>>> At 17:28 07/01/2008, you wrote:
>>>> Dear Phil,
>>>>
>>>> More Christy stuff... The guy is just incredible...
>>>>
>>>> With best regards,
>>>>
>>>> Ben
>>>> ----------------------------------------------------------------------------
>>>>
>>>> Benjamin D. Santer
>>>> Program for Climate Model Diagnosis and Intercomparison
>>>> Lawrence Livermore National Laboratory
>>>> P.O. Box 808, Mail Stop L-103
>>>> Livermore, CA 94550, U.S.A.
>>>> Tel: (9xxx xxxx xxxx
>>>> FAX: (9xxx xxxx xxxx
>>>> email: santer1@xxxxxxxxx.xxx
>>>> ----------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> X-Account-Key: account1
>>>> Return-Path: <santer1@xxxxxxxxx.xxx>
>>>> Received: from mail-2.llnl.gov ([unix socket])
>>>> by mail-2.llnl.gov (Cyrus v2.2.12) with LMTPA;
>>>> Mon, 07 Jan 2008 09:00:xxx xxxx xxxx
>>>> Received: from nspiron-2.llnl.gov (nspiron-2.llnl.gov [128.115.41.82])
>>>> by mail-2.llnl.gov (8.13.1/8.12.3/LLNL evision: 1.6 $) with
>>>> ESMTP id m07H0edp031523;
>>>> Mon, 7 Jan 2008 09:00:xxx xxxx xxxx
>>>> X-Attachments: None
>>>> X-IronPort-AV: E=McAfee;i="5100,188,5200"; a="5944377"
>>>> X-IronPort-AV: E=Sophos;i="4.24,254,1196668800";
>>>> d="scan'208";a="5944377"
>>>> Received: from dione.llnl.gov (HELO [128.115.57.29]) ([128.115.57.29])
>>>> by nspiron-2.llnl.gov with ESMTP; 07 Jan 2008 09:00:xxx xxxx xxxx
>>>> Message-ID: <47825AB8.5000608@xxxxxxxxx.xxx>
>>>> Date: Mon, 07 Jan 2008 09:00:xxx xxxx xxxx
>>>> From: Ben Santer <santer1@xxxxxxxxx.xxx>
>>>> Reply-To: santer1@xxxxxxxxx.xxx
>>>> Organization: LLNL
>>>> User-Agent: Thunderbird 1.5.0.12 (X11/20070529)
>>>> MIME-Version: 1.0
>>>> To: "Hack, James J." <jhack@xxxxxxxxx.xxx>
>>>> Subject: Re: John Christy's latest ideas
>>>> References:
>>>> <537C6C0940C6C143AA46A88946B854170B9FAF74@xxxxxxxxx.xxx>
>>>> In-Reply-To:
>>>> <537C6C0940C6C143AA46A88946B854170B9FAF74@xxxxxxxxx.xxx>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>> Content-Transfer-Encoding: 7bit
>>>>
>>>> Dear Jim,
>>>>
>>>> I'm well aware of this paper, and am currently preparing a reply
>>>> (together with many others who were involved in the first CCSP
>>>> report). To put it bluntly, the Douglass paper is a piece of
>>>> worthless garbage. It has serious statistical flaws. Christy should
>>>> be ashamed that he's a co-author on this. His letter to Dr. Strayer
>>>> is deplorable and offensive. For over a decade, Christy has
>>>> portrayed himself as the only guy who is smart enough to develop
>>>> climate-quality data records from MSU. Recently, he's also portrayed
>>>> himself as the only guy who's smart enough to develop
>>>> climate-quality data records from radiosonde data. And now he's the
>>>> only scientist who is capable of performing "hard-nosed",
>>>> independent assessments of climate model performance.
>>>>
>>>> John Christy has made a scientific career out of being wrong. He's
>>>> not even a third-rate scientist. I'd be happy to discuss Christy's
>>>> "unique ways of validating climate models" with you.
>>>>
>>>> With best regards,
>>>>
>>>> Ben
>>>> Hack, James J. wrote:
>>>>> Dear Ben,
>>>>>
>>>>> Happy New Year. Hope all is well. I was wondering if you're
>>>>> familiar with the attached paper? I thought that you had recently
>>>>> published something that concludes something quite different. Is
>>>>> that right? If yes, could you forward me a copy? And, any
>>>>> comments are also welcome.
>>>>> He's coming to ORNL next week under the premise that he has some
>>>>> unique ways to validate climate models (this time with regard to
>>>>> the lower thermodynamic structure). I'd be happy to chat with you
>>>>> about this as well if you would like. I'm appending what I know to
>>>>> the bottom of this note.
>>>>>
>>>>> Best regards ...
>>>>>
>>>>> Jim
>>>>>
>>>>> James J. Hack Director, National Center for Computational Sciences
>>>>> Oak Ridge National Laboratory
>>>>> One Bethel Valley Road
>>>>> P.O. Box 2008, MS-6008
>>>>> Oak Ridge, TN 37xxx xxxx xxxx
>>>>>
>>>>> email: jhack@xxxxxxxxx.xxx <mailto:jhack@xxxxxxxxx.xxx>
>>>>> voice: xxx xxxx xxxx
>>>>> fax: xxx xxxx xxxx
>>>>> cell: xxx xxxx xxxx
>>>>>
>>>>>
>>>>>> >> -----Original Message-----
>>>>>> >> From: John Christy [_mailto:john.christy@xxxxxxxxx.xxx_]
>>>>>> >> Sent: Tuesday, October 23, 2007 9:16 AM
>>>>>> >> To: Strayer, Michael
>>>>>> >> Cc: Salmon, Jeffrey
>>>>>> >> Subject: Climate Model Evaluation
>>>>>> >>
>>>>>> >> Dr. Strayer:
>>>>>> >>
>>>>>> >> Jeff Salmon is aware of a project we at UAHuntsville believe is
>>>>>> >> vital and that you may provide a way to see it accomplished.
>>>>>> As you
>>>>>> >> know, our nation's energy and climate change policies are being
>>>>>> >> driven by output from global climate models. However, there has
>>>>>> >> never been a true "red team" assessment of these model
>>>>>> projections
>>>>>> >> in the way other government programs are subjected to hard-nosed,
>>>>>> >> independent evaluations. To date, most of the "evaluation" of
>>>>>> these
>>>>>> >> models has been left in the hands of the climate modelers
>>>>>> >> themselves. This has the potential of biasing the entire process.
>>>>>> >>
>>>>>> >> It is often a climate modeler's claim (and promoted in IPCC
>>>>>> >> documents - see attached) that the models must be correct because
>>>>>> >> the global surface
>>>>>> >> temperature variations since 1850 are reproduced (somewhat) by
>>>>>> the
>>>>>> >> models when run in hindcast mode. However, this is not a
>>>>>> scientific
>>>>>> >> experiment for the simple reason that every climate modeler
>>>>>> saw the
>>>>>> >> answer ahead of time. It is terribly easy to get the right answer
>>>>>> >> for the wrong reason, especially if you already know the answer.
>>>>>> >>
>>>>>> >> A legitimate experiment is to test the models' output against
>>>>>> >> variables to which modelers did not have access ... a true blind
>>>>>> >> test of the models.
>>>>>> >>
>>>>>> >> I have proposed and have had rejected a model evaluation
>>>>>> project to
>>>>>> >> DOE based on the utilization of global datasets we build here at
>>>>>> >> UAH. We have published many of these datasets (most are
>>>>>> >> satellite-based) which document the complexity of the climate
>>>>>> >> system and which we think models should replicate in some way,
>>>>>> and
>>>>>> >> to aid in model development where shortcomings are found.
>>>>>> These are
>>>>>> >> datasets of quantities that modelers in general were not aware of
>>>>>> >> when doing model testing. We have performed
>>>>>> >> a few of these tests and have found models reveal serious
>>>>>> >> shortcomings in some of the most fundamental aspects of energy
>>>>>> >> distribution. We believe a rigorous test of climate models is in
>>>>>> >> order as the congress starts considering energy reduction
>>>>>> >> strategies which can have significant consequences on our
>>>>>> economy.
>>>>>> >> Below is an abstract of a retooled proposal I am working on.
>>>>>> >>
>>>>>> >> If you see a possible avenue for research along these lines,
>>>>>> please
>>>>>> >> let me know. Too, we have been considering some type of
>>>>>> partnership
>>>>>> >> with Oak Ridge since the facility is nearby, and this may be a way
>>>>>> >> to do that.
>>>>>> >>
>>>>>> >> John C.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> Understanding the vertical energy distribution of the Earth's
>>>>> atmosphere
>>>>>> >> and its expression in global climate model simulations
>>>>>> >>
>>>>>> >> John R. Christy, P.I., University of Alabama in Huntsville
>>>>>> >>
>>>>>> >> Abstract
>>>>>> >>
>>>>>> >> Sets of independent observations indicate, unexpectedly, that the
>>>>>> >> warming of the tropical atmosphere since 1978 is proceeding at a
>>>>>> >> rate much less than that anticipated from climate model
>>>>>> simulations.
>>>>>> >> Specifically, while the surface has warmed, the lower troposphere
>>>>>> >> has experienced less warming. In contrast, all climate models we
>>>>>> >> and others have examined indicate the lower tropical atmosphere
>>>>>> >> should be warming at a rate 1.2 to 1.5 times greater than the
>>>>>> >> surface when forced with increasing greenhouse gases within the
>>>>>> >> context of other observed forcings (the so-called "negative lapse
>>>>>> >> rate feedback".) We propose to diagnose this curious phenomenon
>>>>>> >> with several satellite-based datasets to document its relation to
>>>>>> >> other climate variables. We shall do the same for climate model
>>>>>> >> output of the same simulated variables. This will
>>>>>> >> enable us to propose an integrated conceptual framework of the
>>>>>> >> phenomenon for further testing. Tied in with this research are
>>>>> potential
>>>>>> >> answers to fundamental questions such as the following: (1) In
>>>>>> >> response to increasing surface temperatures, is the lower
>>>>>> >> atmosphere reconfiguring the way heat energy is transported which
>>>>>> >> allows for an increasing amount of heat to more freely escape to
>>>>>> >> space? (2) Could there be a natural thermostatic effect in the
>>>>>> >> climate system which acts in a different way than parameterized
>>>>>> >> convective-adjustment schemes dependent upon current
>>>>>> assumptions of
>>>>>> >> heat deposition and retention? (3)
>>>>>> >> If observed atmospheric heat retention is considerably less than
>>>>>> >> model projections, what impact will lower retention rates have on
>>>>>> >> anticipated increases in surface temperatures in the 21st
>>>>>> century?
>>>>>> >>
>>>>
>>>>
>>>> --
>>>> ----------------------------------------------------------------------------
>>>>
>>>> Benjamin D. Santer
>>>> Program for Climate Model Diagnosis and Intercomparison
>>>> Lawrence Livermore National Laboratory
>>>> P.O. Box 808, Mail Stop L-103
>>>> Livermore, CA 94550, U.S.A.
>>>> Tel: (9xxx xxxx xxxx
>>>> FAX: (9xxx xxxx xxxx
>>>> email: santer1@xxxxxxxxx.xxx
>>>> ----------------------------------------------------------------------------
>>>>
>>> Prof. Phil Jones
>>> Climatic Research Unit Telephone +44 xxx xxxx xxxx
>>> School of Environmental Sciences Fax +44 xxx xxxx xxxx
>>> University of East Anglia
>>> Norwich Email p.jones@xxxxxxxxx.xxx
>>> NR4 7TJ
>>> UK
>>> ----------------------------------------------------------------------------
>>>
>>
>>
>> --
>> ----------------------------------------------------------------------------
>>
>> Benjamin D. Santer
>> Program for Climate Model Diagnosis and Intercomparison
>> Lawrence Livermore National Laboratory
>> P.O. Box 808, Mail Stop L-103
>> Livermore, CA 94550, U.S.A.
>> Tel: (9xxx xxxx xxxx
>> FAX: (9xxx xxxx xxxx
>> email: santer1@xxxxxxxxx.xxx
>> ----------------------------------------------------------------------------
>>
>
> Prof. Phil Jones
> Climatic Research Unit Telephone +44 xxx xxxx xxxx
> School of Environmental Sciences Fax +44 xxx xxxx xxxx
> University of East Anglia
> Norwich Email p.jones@xxxxxxxxx.xxx
> NR4 7TJ
> UK
> ----------------------------------------------------------------------------
>


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1199988028.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Tim Osborn <t.osborn@xxxxxxxxx.xxx>
Subject: Re: Update on response to Douglass et al.
Date: Thu, 10 Jan 2008 13:00:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>

<x-flowed>
Dear Tim,

Thanks very much for your email. I greatly appreciate the additional
information that you've given me. I am a bit conflicted about what we
should do.

IJC published a paper with egregious statistical errors. Douglass et al.
was essentially a commentary on work by myself and colleagues - work
that had been previously published in Science in 2005 and in Chapter 5
of the first U.S. CCSP Report in 2006. To my knowledge, none of the
authors or co-authors of the Santer et al. Science paper or of CCSP 1.1
Chapter 5 were used as reviewers of Douglass et al. I am assuming that,
when he submitted his paper to IJC, Douglass specifically requested that
certain scientists should be excluded from the review process. Such an
approach is not defensible for a paper which is largely a comment on
previously-published work.

It would be fair and reasonable to give IJC the opportunity to "set the
record straight", and correct the harm they have done by publication of
Douglass et al. I use the word "harm" advisedly. The author and
coauthors of the Douglass et al. IJC paper are using this paper to argue
that "Nature, not CO2, rules the climate", and that the findings of
Douglass et al. invalidate the "discernible human influence" conclusions
of previous national and international scientific assessments.

Quick publication of a response to Douglass et al. in IJC would go some
way towards setting the record straight. I am troubled, however, by the
very real possibility that Douglass et al. will have the last word on
this subject. In my opinion (based on many years of interaction with
these guys), none of Douglass, Christy or Singer is capable of
admitting that their paper contained serious scientific errors. Their
"last word" will be an attempt to obfuscate rather than illuminate. They
are not interested in improving our scientific understanding of the
nature and causes of recent changes in atmospheric temperature. They are
solely interested in advancing their own agendas. It is telling and
troubling that Douglass et al. ignored radiosonde data showing
substantial warming of the tropical troposphere - data that were in
accord with model results - even though such data were in their
possession. Such behaviour constitutes intellectual dishonesty. I
strongly believe that leaving these guys the last word is inherently unfair.

If IJC are interested in publishing our contribution, I believe it's
fair to ask for the following:

1) Our paper should be regarded as an independent contribution, not as a
comment on Douglass et al. This seems reasonable given i) The
substantial amount of new work that we have done; and ii) The fact that
the Douglass et al. paper was not regarded as a comment on Santer et al.
(2005), or on Chapter 5 of the 2006 CCSP Report - even though Douglass
et al. clearly WAS a comment on these two publications.

2) If IJC agrees to 1), then Douglass et al. should have the opportunity
to respond to our contribution, and we should be given the chance to
reply. Any response and reply should be published side-by-side, in the
same issue of IJC.

I'd be grateful if you and Phil could provide me with some guidance on
1) and 2), and on whether you think we should submit to IJC. Feel free
to forward my email to Glenn McGregor.

With best regards,

Ben
Tim Osborn wrote:
> At 03:52 10/01/2008, Ben Santer wrote:
>> ...Much as I would like to see a high-profile rebuttal of Douglass et
>> al. in a journal like Science or Nature, it's unlikely that either
>> journal will publish such a rebuttal.
>>
>> So what are our options? Personally, I'd vote for GRL. I think that it
>> is important to publish an expeditious response to the statistical
>> flaws in Douglass et al. In theory, GRL should be able to give us the
>> desired fast turnaround time...
>>
>> Why not go for publication of a response in IJC? According to Phil,
>> this option would probably take too long. I'd be interested to hear
>> any other thoughts you might have on publication options.
>
> Hi Ben and Phil,
>
> as you may know (Phil certainly knows), I'm on the editorial board of
> IJC. Phil is right that it can be rather slow (though faster than
> certain other climate journals!). Nevertheless, IJC really is the
> preferred place to publish (though a downside is that Douglass et al.
> may have the opportunity to have a response considered to accompany any
> comment).
>
> I just contacted the editor, Glenn McGregor, to see what he can do. He
> promises to do everything he can to achieve a quick turn-around time (he
> didn't quantify this) and he will also "ask (the publishers) for
> priority in terms of getting the paper online asap after the authors
> have received proofs". He genuinely seems keen to correct the
> scientific record as quickly as possible.
>
> He also said (and please treat this in confidence, which is why I
> emailed to you and Phil only) that he may be able to hold back the
> hardcopy (i.e. the print/paper version) appearance of Douglass et al.,
> possibly so that any accepted Santer et al. comment could appear
> alongside it. Presumably depends on speed of the review process.
>
> If this does persuade you to go with IJC, Glenn suggested that I could
> help (because he is in Kathmandu at present) with achieving the quick
> turn-around time by identifying in advance reviewers who are both
> suitable and available. Obviously one reviewer could be someone who is
> already familiar with this discussion, because that would enable a fast
> review - i.e., someone on the email list you've been using - though I
> don't know which of these people you will be asking to be co-authors and
> hence which won't be available as possible reviewers. For objectivity
> the other reviewer would need to be independent, but you could still
> suggest suitable names.
>
> Well, that's my thoughts... let me know what you decide.
>
> Cheers
>
> Tim
>
>
> Dr Timothy J Osborn, Academic Fellow
> Climatic Research Unit
> School of Environmental Sciences
> University of East Anglia
> Norwich NR4 7TJ, UK
>
> e-mail: t.osborn@xxxxxxxxx.xxx
> phone: xxx xxxx xxxx
> fax: xxx xxxx xxxx
> web: http://www.cru.uea.ac.uk/~timo/
> sunclock: http://www.cru.uea.ac.uk/~timo/sunclock.htm
>


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1199994210.txt

From: Peter Thorne <peter.thorne@xxxxxxxxx.xxx>
To: Dian Seidel <dian.seidel@xxxxxxxxx.xxx>
Subject: Dian, something like this?
Date: Thu, 10 Jan 2008 14:43:30 +0000
Cc: Ben Santer <santer1@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, Carl Mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <melissa.free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Phil Jones <p.jones@xxxxxxxxx.xxx>, Steve Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>, "Hack, James J." <jhack@xxxxxxxxx.xxx>

All,

as it happens I am preparing a figure precisely as Dian suggested. This
has only been possible due to substantial efforts by Leo in particular,
but all the other dataset providers also. I wanted to give a feel for
where we are at although I want to tidy this substantially if we were to
use it. To do this I've taken every single scrap of info I have in my
possession that has a status of at least submitted to a journal. I have
considered the common period of 1xxx xxxx xxxx. So, assuming you are all
sitting comfortably:

Grey shading is a little cheat from Santer et al using a trusty ruler.
See Figure 3.B in this paper, take the absolute range of model scaling
factors at each of the heights on the y-axis and apply this scaling to
HadCRUT3 tropical mean trend denoted by the star at the surface. So, if
we assume HadCRUT3 is correct then we are aiming for the grey shading or
not depending upon one's pre-conceived notion as to whether the models
are correct.
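
[Editorial sketch, not part of the original email.] The "trusty ruler" construction described above can be written out as a few lines of Python. This is only an illustration under stated assumptions: the function name, the array layout (one row per model, one column per height) and every number below are hypothetical placeholders, not values read from Santer et al. Figure 3.B or from HadCRUT3.

```python
import numpy as np


def scaled_trend_envelope(surface_trend, scaling_factors):
    """Return (lower, upper) trend bounds at each height.

    scaling_factors has shape (n_models, n_levels); each entry is a
    model's ratio of the trend aloft to the surface trend.
    """
    factors = np.asarray(scaling_factors, dtype=float)
    # Scale the single observed surface trend by every model's factor,
    # then take the full range across models at each height.
    candidates = factors * surface_trend
    return candidates.min(axis=0), candidates.max(axis=0)


# Hypothetical placeholder inputs (NOT real data): a surface trend in
# K/decade and scaling factors for two models at three heights.
surface_trend = 0.10
scaling_factors = [
    [1.0, 1.2, 1.5],
    [1.1, 1.4, 2.0],
]
lower, upper = scaled_trend_envelope(surface_trend, scaling_factors)
```

The band between `lower` and `upper` plays the role of the grey shading: where the models would place the trend at each height if the surface trend is taken as correct.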

Red is HadAT2 dataset.

black dashed is the raw data used in Titchner et al. submitted (all
tropical stations with a xxx xxxx xxxxclimatology)

Black whiskers are median, inter-quartile range and max / min from
Titchner et al. submission. We know, from complex error-world
assessments, that the median under-cooks the required adjustment here
and that the truth may conceivably lie (well) outside the upper limit.

Bright green is RATPAC

Then, and the averaging and trend calculation has been done by Leo here
and not me so any final version I'd want to get the raw gridded data and
do it exactly the same way. But for the raw raobs data that Leo provided
as a sanity check it seems to make a minuscule (<0.05K/decade even at
height) difference:

Lime green: RICH (RAOBCORE 1.4 breaks, neighbour based adjustment
estimates)

Solid purple: RAOBCORE 1.2
Dotted purple: RAOBCORE 1.3
Dashed purple: RAOBCORE 1.4

I am also in possession of Steve's submitted IUK dataset and will be
adding this trend line shortly.

I'll be adding a legend in the large white space bottom left.

My take home is that all datasets are heading the right way and that
this reduces the probability of a discrepancy. Compare this with Santer
et al. Figure 3.B.

I'll be using this in an internal report anyway but am quite happy for
it to be used in this context too if that is the general feeling. Or for
Leo's to be used. Whatever people prefer.

Peter
--
Peter Thorne Climate Research Scientist
Met Office Hadley Centre, FitzRoy Road, Exeter, EX1 3PB
tel. xxx xxxx xxxx fax xxx xxxx xxxx
www.metoffice.gov.uk/hadobs


Attachment Converted: "c:\eudora\attach\trend_profiles_dogs_dinner.png"

Original Filename: 1199999668.txt

From: Phil Jones <p.jones@xxxxxxxxx.xxx>
To: santer1@xxxxxxxxx.xxx
Subject: An issue/problem with Tim's idea !!!!!!!
Date: Thu Jan 10 16:14:xxx xxxx xxxx

Ben,
Tim's idea is a possibility. I've not always got on that well
with Glenn McGregor, but Tim seems to have a reasonable rapport
with him. Dian has suggested that this would be the best route - it
is the logical one. I also think that Glenn would get quick reviews, as
Tim thinks he realises he's made a mistake.
Tim has let me in on part of a secret. Glenn said the paper had two
reviews - one positive, the other said it wasn't great, but would leave it
up to the editor's discretion. This is why Glenn knows he made the wrong
choice.
The problem !! The person who said they would leave it to the editor's
discretion is on your email list! I don't know who it is - Tim does -
maybe they have told you? I don't want to put pressure on Tim. He
doesn't know I'm sending this. It isn't me by the way - nor Tim !
Tim said it was someone who hasn't contributed to the discussion -
which does narrow the possibilities down!
Tim/Glenn discussed getting quick reviews. Whoever this person
is they could be the familiar reviewer - and we could then come up
with another reasonable name (Kevin - he does everything at the
speed of light) as the two reviewers.
Colour in IJC costs a bit, but I'm sure we can lean on Glenn.
Also we can just have colour in the pdf.
I'll now send a few thoughts on the figures!
Cheers
Phil
Tom Wigley <wigley@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>,
Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>,
John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, carl mears <mears@xxxxxxxxx.xxx>,
"David C. Bader" <bader2@xxxxxxxxx.xxx>,
"'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>,
Frank Wentz <frank.wentz@xxxxxxxxx.xxx>,
Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>,
Melissa Free <Melissa.Free@xxxxxxxxx.xxx>,
"Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>,
"'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>,
Steven Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>,
Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>,
"Thorne, Peter" <peter.thorne@xxxxxxxxx.xxx>,
Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>,
"Hack, James J." <jhack@xxxxxxxxx.xxx>

X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Thu, 10 Jan 2008 13:00:39 +0000
To: santer1@xxxxxxxxx.xxx,"'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>
From: Tim Osborn <t.osborn@xxxxxxxxx.xxx>
Subject: Re: Update on response to Douglass et al.
At 03:52 10/01/2008, Ben Santer wrote:

...Much as I would like to see a high-profile rebuttal of Douglass et al. in a journal
like Science or Nature, it's unlikely that either journal will publish such a rebuttal.
So what are our options? Personally, I'd vote for GRL. I think that it is important to
publish an expeditious response to the statistical flaws in Douglass et al. In theory,
GRL should be able to give us the desired fast turnaround time...
Why not go for publication of a response in IJC? According to Phil, this option would
probably take too long. I'd be interested to hear any other thoughts you might have on
publication options.

Hi Ben and Phil,
as you may know (Phil certainly knows), I'm on the editorial board of IJC. Phil is
right that it can be rather slow (though faster than certain other climate journals!).
Nevertheless, IJC really is the preferred place to publish (though a downside is that
Douglass et al. may have the opportunity to have a response considered to accompany any
comment).
I just contacted the editor, Glenn McGregor, to see what he can do. He promises to do
everything he can to achieve a quick turn-around time (he didn't quantify this) and he
will also "ask (the publishers) for priority in terms of getting the paper online asap
after the authors have received proofs". He genuinely seems keen to correct the
scientific record as quickly as possible.
He also said (and please treat this in confidence, which is why I emailed to you and
Phil only) that he may be able to hold back the hardcopy (i.e. the print/paper version)
appearance of Douglass et al., possibly so that any accepted Santer et al. comment could
appear alongside it. Presumably depends on speed of the review process.
If this does persuade you to go with IJC, Glenn suggested that I could help (because he
is in Kathmandu at present) with achieving the quick turn-around time by identifying in
advance reviewers who are both suitable and available. Obviously one reviewer could be
someone who is already familiar with this discussion, because that would enable a fast
review - i.e., someone on the email list you've been using - though I don't know which
of these people you will be asking to be co-authors and hence which won't be available
as possible reviewers. For objectivity the other reviewer would need to be independent,
but you could still suggest suitable names.
Well, that's my thoughts... let me know what you decide.
Cheers
Tim
Dr Timothy J Osborn, Academic Fellow
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich NR4 7TJ, UK
e-mail: t.osborn@xxxxxxxxx.xxx
phone: xxx xxxx xxxx
fax: xxx xxxx xxxx
web: http://www.cru.uea.ac.uk/~timo/
sunclock: http://www.cru.uea.ac.uk/~timo/sunclock.htm

Prof. Phil Jones
Climatic Research Unit Telephone +44 xxx xxxx xxxx
School of Environmental Sciences Fax +44 xxx xxxx xxxx
University of East Anglia
Norwich Email p.jones@xxxxxxxxx.xxx
NR4 7TJ
UK
----------------------------------------------------------------------------

Original Filename: 1200003656.txt

From: Phil Jones <p.jones@xxxxxxxxx.xxx>
To: Peter Thorne <peter.thorne@xxxxxxxxx.xxx>, Dian Seidel <dian.seidel@xxxxxxxxx.xxx>
Subject: Re: Dian, something like this?
Date: Thu Jan 10 17:20:xxx xxxx xxxx
Cc: Ben Santer <santer1@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, Carl Mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>, Melissa Free <melissa.free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Steve Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>, "Hack, James J." <jhack@xxxxxxxxx.xxx>

Ben et al,
As Dian has said Ben's diagrams are as usual great! I also like the one
that Peter has just sent around as that illustrates the issue with the
various RAOBCORE versions. Although I still think they should have used
HadCRUT3v for the surface, I know HadCRUT2v shows much the same.
What this figure shows is the differences between the various sonde
datasets. Dian/Peter also make the point that there are other new datasets
to be added - so the sondes are very much still work in progress. I know
you will point out all the analytical/statistical issues, but seeing the series
brings home the issues better. I know you could add the values to
your Fig. 1, but a plot like this is much better.
In the email Ben, you seem to have written much of the response!
Whichever route you go down (GRL/IJC) the text can't be too long.
I would favour copious captions, and even an Appendix, to get the
main points across quickly.
Cheers
Phil

At 14:43 10/01/2008, Peter Thorne wrote:

All,
as it happens I am preparing a figure precisely as Dian suggested. This
has only been possible due to substantial efforts by Leo in particular,
but all the other dataset providers also. I wanted to give a feel for
where we are at although I want to tidy this substantially if we were to
use it. To do this I've taken every single scrap of info I have in my
possession that has a status of at least submitted to a journal. I have
considered the common period of 1xxx xxxx xxxx. So, assuming you are all
sitting comfortably:
Grey shading is a little cheat from Santer et al using a trusty ruler.
See Figure 3.B in this paper, take the absolute range of model scaling
factors at each of the heights on the y-axis and apply this scaling to
HadCRUT3 tropical mean trend denoted by the star at the surface. So, if
we assume HadCRUT3 is correct then we are aiming for the grey shading or
not depending upon one's pre-conceived notion as to whether the models
are correct.
Red is HadAT2 dataset.
black dashed is the raw data used in Titchner et al. submitted (all
tropical stations with a xxx xxxx xxxxclimatology)
Black whiskers are median, inter-quartile range and max / min from
Titchner et al. submission. We know, from complex error-world
assessments, that the median under-cooks the required adjustment here
and that the truth may conceivably lie (well) outside the upper limit.
Bright green is RATPAC
Then, and the averaging and trend calculation has been done by Leo here
and not me so any final version I'd want to get the raw gridded data and
do it exactly the same way. But for the raw raobs data that Leo provided
as a sanity check it seems to make a minuscule (<0.05K/decade even at
height) difference:
Lime green: RICH (RAOBCORE 1.4 breaks, neighbour based adjustment
estimates)
Solid purple: RAOBCORE 1.2
Dotted purple: RAOBCORE 1.3
Dashed purple: RAOBCORE 1.4
I am also in possession of Steve's submitted IUK dataset and will be
adding this trend line shortly.
I'll be adding a legend in the large white space bottom left.
My take home is that all datasets are heading the right way and that
this reduces the probability of a discrepancy. Compare this with Santer
et al. Figure 3.B.
I'll be using this in an internal report anyway but am quite happy for
it to be used in this context too if that is the general feeling. Or for
Leo's to be used. Whatever people prefer.
Peter
--
Peter Thorne Climate Research Scientist
Met Office Hadley Centre, FitzRoy Road, Exeter, EX1 3PB
tel. xxx xxxx xxxx fax xxx xxxx xxxx
www.metoffice.gov.uk/hadobs

Prof. Phil Jones
Climatic Research Unit Telephone +44 xxx xxxx xxxx
School of Environmental Sciences Fax +44 xxx xxxx xxxx
University of East Anglia
Norwich Email p.jones@xxxxxxxxx.xxx
NR4 7TJ
UK
----------------------------------------------------------------------------

Original Filename: 1200010023.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Leopold Haimberger <leopold.haimberger@xxxxxxxxx.xxx>
Subject: Re: Update on response to Douglass et al., Dian, something like this?
Date: Thu, 10 Jan 2008 19:07:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: Peter Thorne <peter.thorne@xxxxxxxxx.xxx>, Dian Seidel <dian.seidel@xxxxxxxxx.xxx>, Tom Wigley <wigley@xxxxxxxxx.xxx>, Karl Taylor <taylor13@xxxxxxxxx.xxx>, Thomas R Karl <Thomas.R.Karl@xxxxxxxxx.xxx>, John Lanzante <John.Lanzante@xxxxxxxxx.xxx>, Carl Mears <mears@xxxxxxxxx.xxx>, "David C. Bader" <bader2@xxxxxxxxx.xxx>, "'Francis W. Zwiers'" <francis.zwiers@xxxxxxxxx.xxx>, Frank Wentz <frank.wentz@xxxxxxxxx.xxx>, Melissa Free <melissa.free@xxxxxxxxx.xxx>, "Michael C. MacCracken" <mmaccrac@xxxxxxxxx.xxx>, Phil Jones <p.jones@xxxxxxxxx.xxx>, Steve Sherwood <Steven.Sherwood@xxxxxxxxx.xxx>, Steve Klein <klein21@xxxxxxxxx.xxx>, 'Susan Solomon' <ssolomon@xxxxxxxxx.xxx>, Tim Osborn <t.osborn@xxxxxxxxx.xxx>, Gavin Schmidt <gschmidt@xxxxxxxxx.xxx>, "Hack, James J." <jhack@xxxxxxxxx.xxx>

<x-flowed>
Dear Leo,

Thanks very much for your email. I can easily make the observations a
bit more prominent in Figure 1. As you can see from today's
(voluminous!) email traffic, I've received lots of helpful suggestions
regarding improvements to the Figures. I'll try to produce revised
versions of the Figures tomorrow.

On the autocorrelation issue: The models have a much larger range of
lag-1 autocorrelation coefficients (0.66 to 0.95 for T2LT, and 0.69 to
0.95 for T2) than the UAH or RSS data (which range from 0.87 to 0.89). I
was concerned that if we used the model lag-1 autocorrelations to guide
the choice of AR-1 parameter in the synthetic data analysis, Douglass
and colleagues would have an easy opening for criticising us ("Aha!
Santer et al. are using model results to guide them in their selection
of the coefficients for their AR-1 model!") I felt that it was much more
difficult for Douglass et al. to criticize what we've done if we used
UAH data to dictate our choice of the AR-1 parameter and the "scaling
factor" for the amplitude of the temporal variability.
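
[Editorial sketch, not part of the original email.] The approach described above, taking the lag-1 autocorrelation and the amplitude of variability from an observed series and using them to generate synthetic AR-1 data, might look roughly like the following in Python. The function names are hypothetical, the amplitude value is a placeholder, and this is not the code used by the authors; only the coefficient 0.87 comes from the range quoted in the email.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    a = np.asarray(x, dtype=float)
    a = a - a.mean()
    return float(np.sum(a[:-1] * a[1:]) / np.sum(a * a))


def synthetic_ar1(n, phi, target_std, rng):
    """Generate x[t] = phi * x[t-1] + e[t] with stationary std target_std."""
    # For a stationary AR-1 process, var(x) = var(e) / (1 - phi**2),
    # so the noise is scaled to give the requested overall amplitude.
    noise_std = target_std * np.sqrt(1.0 - phi ** 2)
    x = np.empty(n)
    x[0] = rng.normal(0.0, target_std)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, noise_std)
    return x


rng = np.random.default_rng(0)
# phi = 0.87 is the lower end of the UAH/RSS range quoted in the email;
# target_std = 0.3 is a hypothetical placeholder for the "scaling factor".
series = synthetic_ar1(n=20000, phi=0.87, target_std=0.3, rng=rng)
estimated_phi = lag1_autocorr(series)
```

Fixing `phi` from the observations rather than from the models keeps the synthetic-data test independent of the model results whose consistency is being assessed, which is the concern raised in the paragraph above.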

As you know, my personal preference would be to include in our response
to Douglass et al. something like the Figure 4 that Peter has produced.
While inclusion of a Figure 4 is not essential for the purpose of
illuminating the statistical flaws in the Douglass et al. "consistency
test", such a Figure would clearly show the (currently large) structural
uncertainties in radiosonde-based estimates of the vertical profile of
atmospheric temperature changes. I think this is an important point,
particularly in view of the fact that Douglass et al. failed to discuss
versions 1.3 and 1.4 of your RAOBCORE data - even though they had
information from those datasets in their possession.

However, I fully agree with Tom's comment that we don't want to do
anything to "steal the thunder" from ongoing efforts to improve
sonde-based estimates of atmospheric temperature change, and to better
quantify structural uncertainties in those estimates. Your group,
together with the groups at the Hadley Centre, Yale, NOAA ARL and NOAA
GFDL, deserve great credit for making significant progress on a
difficult, time-consuming, yet important problem.

I guess the best solution is to leave this decision up to all of you
(the radiosonde dataset developers). I'm perfectly happy to include a
version of Figure 4 in our response to Douglass et al. If we do go with
inclusion of a Figure 4, you, Peter, Dian, Melissa, Steve Sherwood and
John should decide whether you feel comfortable providing radiosonde
data for such a Figure. I will gladly abide by your decisions. As you
note in your email, our use of a Figure 4 would not preclude a more
detailed and thorough comparison of simulated and observed amplification
in some later publication.

Once again, thanks for all your help with this project, Leo.

With best regards,

Ben
Leopold Haimberger wrote:
> All,
>
> These three figures are really very clear and leave no doubts that the
> Douglass et al analysis is flawed. This is true especially for Fig. 1.
> In Fig. 1 one has to look carefully to find the RSS and UAH "observed"
> trends to the right of all the model trends. Maybe one can make their
> symbols more prominent.
>
> Concerning Fig. 3 I wonder whether the UAH autocorrelation is the lowest
> of all available data. .86 is quite substantial autocorrelation. Maybe
> it is a good idea to be on the safe side and use the lowest
> autocorrelation of all datasets (models, RSS, UAH) for this analysis.
>
> Concerning Fig. 4, I like Peter's and Dian's idea to include RAOBCORE,
> HadAT2, RATPAC and Steve's data and compare it in one plot with model
> output. While I agree that the first three figures and the corresponding
> text are already sufficient for the reply, they mainly target the
> right panel of Fig. 1 in Douglass et al's paper. The trend profile plot
> of Fig. 4 is complementary as a counterpart to the left panel of their
> plot. To see the trend amplification in some of the vertical profiles
> is much more suggestive than seeing the LT trends being larger than
> surface trends, at least for me. Showing all available profiles adds
> value beyond the RAOBCORE v1.2 vs RAOBCORE v1.4 issue. Yes, it is work
> in progress and such a plot as drafted by Peter makes that very clear.
> In this paper it is sufficient to show that the uncertainty of
> radiosonde trends is much larger than suggested by Douglass et al. and
> we do not need to have the final answer yet. I have nothing against
> Peter doing the drawing of the figure, since he has most of the
> necessary data. The plot would be needed for 1xxx xxxx xxxx, however. Peter,
> I will send you the trend profiles for this period a bit later.
>
> Publishing the reply in either IJC or GRL including Fig. 4 is fine for me.
> When we first discussed a follow up of the Santer et al paper in
> October, we had in mind to publish post-FAR climate model data up to
> present (not just 1999) and also new radiosonde data up to present in a
> highest ranking journal. I am confident that this is still possible even
> if some of the new material planned for such a paper is submitted
> already now. What do you think?
>
> With best Regards,
>
> Leo
>
> Peter Thorne wrote:
>> All,
>>
>> as it happens I am preparing a figure precisely as Dian suggested. This
>> has only been possible due to substantial efforts by Leo in particular,
>> but all the other dataset providers also. I wanted to give a feel for
>> where we are at although I want to tidy this substantially if we were to
>> use it. To do this I've taken every single scrap of info I have in my
>> possession that has a status of at least submitted to a journal. I have
>> considered the common period of 1xxx xxxx xxxx. So, assuming you are all
>> sitting comfortably:
>>
>> Grey shading is a little cheat from Santer et al using a trusty ruler.
>> See Figure 3.B in this paper, take the absolute range of model scaling
>> factors at each of the heights on the y-axis and apply this scaling to
>> HadCRUT3 tropical mean trend denoted by the star at the surface. So, if
>> we assume HadCRUT3 is correct then we are aiming for the grey shading or
>> not depending upon one's pre-conceived notion as to whether the models
>> are correct.
>>
>> Red is HadAT2 dataset.
>>
>> black dashed is the raw data used in Titchner et al. submitted (all
>> tropical stations with a xxx xxxx xxxxclimatology)
>>
>> Black whiskers are median, inter-quartile range and max / min from
>> Titchner et al. submission. We know, from complex error-world
>> assessments, that the median under-cooks the required adjustment here
>> and that the truth may conceivably lie (well) outside the upper limit.
>>
>> Bright green is RATPAC
>>
>> Then, and the averaging and trend calculation has been done by Leo here
>> and not me so any final version I'd want to get the raw gridded data and
>> do it exactly the same way. But for the raw raobs data that Leo provided
>> as a sanity check it seems to make a minuscule (<0.05K/decade even at
>> height) difference:
>>
>> Lime green: RICH (RAOBCORE 1.4 breaks, neighbour based adjustment
>> estimates)
>>
>> Solid purple: RAOBCORE 1.2
>> Dotted purple: RAOBCORE 1.3
>> Dashed purple: RAOBCORE 1.4
>>
>> I am also in possession of Steve's submitted IUK dataset and will be
>> adding this trend line shortly.
>>
>> I'll be adding a legend in the large white space bottom left.
>>
>> My take home is that all datasets are heading the right way and that
>> this reduces the probability of a discrepancy. Compare this with Santer
>> et al. Figure 3.B.
>>
>> I'll be using this in an internal report anyway but am quite happy for
>> it to be used in this context too if that is the general feeling. Or for
>> Leo's to be used. Whatever people prefer.
>>
>> Peter
>>
>>
>> ------------------------------------------------------------------------
>>
>


--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1200059003.txt

From: Ben Santer <santer1@xxxxxxxxx.xxx>
To: Tim Osborn <t.osborn@xxxxxxxxx.xxx>
Subject: Potential reviewers
Date: Fri, 11 Jan 2008 08:43:xxx xxxx xxxx
Reply-to: santer1@xxxxxxxxx.xxx
Cc: "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>

<x-flowed>
Dear Tim,

Here are some suggestions for potential reviewers of a Santer et al.
IJoC submission on issues related to the consistency between modeled and
observed atmospheric temperature trends. None of the suggested reviewers
have been involved in the recent "focus group" that has discussed
problems with the Douglass et al. IJoC paper.

1. Mike Wallace, University of Washington. U.S. National Academy member.
Expert on atmospheric dynamics. Chair of National Academy of Sciences
committee on "Reconciling observations of global temperature change"
(2000). Email: wallace@xxxxxxxxx.xxx

2. Qiang Fu, University of Washington. Expert on atmospheric radiation,
dynamics, radiosonde and satellite data. Published 2004 Nature paper and
2005 GRL paper dealing with issues related to global and tropical
temperature trends. Email: qfu@xxxxxxxxx.xxx

3. Gabi Hegerl, University of Edinburgh. Expert on detection and
attribution of externally-forced climate change. Co-Convening Lead
Author of "Understanding and Attributing Climate Change" chapter of IPCC
Fourth Assessment Report. Email: Gabi.Hegerl@xxxxxxxxx.xxx

4. Jim Hurrell, National Center for Atmospheric Research (NCAR). Former
Director of Climate and Global Dynamics division at NCAR. Expert on
climate modeling, observational data. Published a number of papers on
MSU-related issues. Email: jhurrell@xxxxxxxxx.xxx

5. Myles Allen, Oxford University. Expert in Climate Dynamics, detection
and attribution, application of statistical methods in climatology.
Email: allen@xxxxxxxxx.xxx

6. Peter Stott, Hadley Centre for Climate Prediction and Research.
Expert in climate modeling, detection and attribution. Email:
peter.stott@xxxxxxxxx.xxx

With best regards,

Ben
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------
</x-flowed>

Original Filename: 1200076878.txt

From: Tim Osborn <t.osborn@xxxxxxxxx.xxx>
To: santer1@xxxxxxxxx.xxx
Subject: Re: Update on response to Douglass et al.
Date: Fri, 11 Jan 2008 13:41:18 +0000
Cc: "'Philip D. Jones'" <p.jones@xxxxxxxxx.xxx>

<x-flowed>
Hi Ben (cc Phil),

just heard back from Glenn. He's prepared to treat it as a new
submission rather than a comment on Douglass et al. and he also
reiterates that "Needless to say my offer of a quick turn around time
etc still stands".

So basically this makes the IJC option more attractive than if it
were treated as a comment. But whether IJC is still a less
attractive option than GRL is up to you to decide :-) (or feel free
to canvass your potential co-authors [the only thing I didn't want to
make more generally known was the suggestion that print publication
of Douglass et al. might be delayed... all other aspects of this
discussion are unrestricted]).

Cheers

Tim

At 21:00 10/01/2008, Ben Santer wrote:
>Dear Tim,
>
>Thanks very much for your email. I greatly appreciate the additional
>information that you've given me. I am a bit conflicted about what
>we should do.
>
>IJC published a paper with egregious statistical errors. Douglass et
>al. was essentially a commentary on work by myself and colleagues -
>work that had been previously published in Science in 2005 and in
>Chapter 5 of the first U.S. CCSP Report in 2006. To my knowledge,
>none of the authors or co-authors of the Santer et al. Science paper
>or of CCSP 1.1 Chapter 5 were used as reviewers of Douglass et al. I
>am assuming that, when he submitted his paper to IJC, Douglass
>specifically requested that certain scientists should be excluded
>from the review process. Such an approach is not defensible for a
>paper which is largely a comment on previously-published work.
>
>It would be fair and reasonable to give IJC the opportunity to "set
>the record straight", and correct the harm they have done by
>publication of Douglass et al. I use the word "harm" advisedly. The
>author and coauthors of the Douglass et al. IJC paper are using this
>paper to argue that "Nature, not CO2, rules the climate", and that
>the findings of Douglass et al. invalidate the "discernible human
>influence" conclusions of previous national and international
>scientific assessments.
>
>Quick publication of a response to Douglass et al. in IJC would go
>some way towards setting the record straight. I am troubled,
>however, by the very real possibility that Douglass et al. will have
>the last word on this subject. In my opinion (based on many years of
>interaction with these guys), neither Douglass, Christy nor Singer
>are capable of admitting that their paper contained serious
>scientific errors. Their "last word" will be an attempt to obfuscate
>rather than illuminate. They are not interested in improving our
>scientific understanding of the nature and causes of recent changes
>in atmospheric temperature. They are solely interested in advancing
>their own agendas. It is telling and troubling that Douglass et al.
>ignored radiosonde data showing substantial warming of the tropical
>troposphere - data that were in accord with model results - even
>though such data were in their possession. Such behaviour
>constitutes intellectual dishonesty. I strongly believe that leaving
>these guys the last word is inherently unfair.
>
>If IJC are interested in publishing our contribution, I believe it's
>fair to ask for the following:
>
>1) Our paper should be regarded as an independent contribution, not
>as a comment on Douglass et al. This seems reasonable given i) The
>substantial amount of new work that we have done; and ii) The fact
>that the Douglass et al. paper was not regarded as a comment on
>Santer et al. (2005), or on Chapter 5 of the 2006 CCSP Report - even
>though Douglass et al. clearly WAS a comment on these two publications.
>
>2) If IJC agrees to 1), then Douglass et al. should have the
>opportunity to respond to our contribution, and we should be given
>the chance to reply. Any response and reply should be published
>side-by-side, in the same issue of IJC.
>
>I'd be grateful if you and Phil could provide me with some guidance
>on 1) and 2), and on whether you think we should submit to IJC. Feel
>free to forward my email to Glenn McGregor.
>
>With best regards,
>
>Ben
>Tim Osborn wrote:
>>At 03:52 10/01/2008, Ben Santer wrote:
>>>...Much as I would like to see a high-profile rebuttal of Douglass
>>>et al. in a journal like Science or Nature, it's unlikely that
>>>either journal will publish such a rebuttal.
>>>
>>>So what are our options? Personally, I'd vote for GRL. I think
>>>that it is important to publish an expeditious response to the
>>>statistical flaws in Douglass et al. In theory, GRL should be able
>>>to give us the desired fast turnaround time...
>>>
>>>Why not go for publication of a response in IJC? According to
>>>Phil, this option would probably take too long. I'd be interested
>>>to hear any other thoughts you might have on publication options.
>>Hi Ben and Phil,
>>as you may know (Phil certainly knows), I'm on the editorial board
>>of IJC. Phil is right that it can be rather slow (though faster
>>than certain other climate journals!). Nevertheless, IJC really is
>>the preferred place to publish (though a downside is that Douglass
>>et al. may have the opportunity to have a response considered to
>>accompany any comment).
>>I just contacted the editor, Glenn McGregor, to see what he can
>>do. He promises to do everything he can to achieve a quick
>>turn-around time (he didn't quantify this) and he will also "ask
>>(the publishers) for priority in terms of getting the paper online
>>asap after the authors have received proofs". He genuinely seems
>>keen to correct the scientific record as quickly as possible.
>>He also said (and please treat this in confidence, which is why I
>>emailed to you and Phil only) that he may be able to hold back the
>>hardcopy (i.e. the print/paper version) appearance of Douglass et
>>al., possibly so that any accepted Santer et al. comment could
>>appear alongside it. Presumably depends on speed of the review process.
>>If this does persuade you to go with IJC, Glenn suggested that I
>>could help (because he is in Kathmandu at present) with achieving
>>the quick turn-around time by identifying in advance reviewers who
>>are both suitable and available. Obviously one reviewer could be
>>someone who is already familiar with this discussion, because that
>>would enable a fast review - i.e., someone on the email list you've
>>been using - though I don't know which of these people you will be
>>asking to be co-authors and hence which won't be available as
>>possible reviewers. For objectivity the other reviewer would need
>>to be independent, but you could still suggest suitable names.
>>Well, that's my thoughts... let me know what you decide.
>>Cheers
>>Tim
>>
>>Dr Timothy J Osborn, Academic Fellow
>>Climatic Research Unit
>>School of Environmental Sciences
>>University of East Anglia
>>Norwich NR4 7TJ, UK
>>e-mail: t.osborn@xxxxxxxxx.xxx
>>phone: xxx xxxx xxxx
>>fax: xxx xxxx xxxx
>>web: http://www.cru.uea.ac.uk/~timo/
>>sunclock: http://www.cru.uea.ac.uk/~timo/sunclock.htm
>
>
>--
>----------------------------------------------------------------------------
>Benjamin D. Santer
>Program for Climate Model Diagnosis and Intercomparison
>Lawrence Livermore National Laboratory
>P.O. Box 808, Mail Stop L-103
>Livermore, CA 94550, U.S.A.
>Tel: (9xxx xxxx xxxx
>FAX: (9xxx xxxx xxxx
>email: santer1@xxxxxxxxx.xxx
>----------------------------------------------------------------------------

Dr Timothy J Osborn, Academic Fellow
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich NR4 7TJ, UK

e-mail: t.osborn@xxxxxxxxx.xxx
phone: xxx xxxx xxxx
fax: xxx xxxx xxxx
web: http://www.cru.uea.ac.uk/~timo/
sunclock: http://www.cru.uea.ac.uk/~timo/sunclock.htm


</x-flowed>

Original Filename: 1200090166.txt

From: Tim Osborn <t.osborn@xxxxxxxxx.xxx>
To: santer1@xxxxxxxxx.xxx
Subject: Re: Potential reviewers
Date: Fri Jan 11 17:22:xxx xxxx xxxx

I didn't know about the link between John and Kevin. Sounds like Qiang or Myles, plus
Francis, would be the best combination of expertise and speediness.
By the way, for online submission you'll just need to convert the Latex to a PDF file and
submit that.
Have a good weekend,
Tim
At 17:07 11/01/2008, you wrote:

Dear Phil and Tim,
I did leave Kevin's name off because of concerns that he might be extremely upset by
Christy's involvement in Douglass et al. I guess you know that John was a Ph.D. student
of Kevin's. It must be tough to have a student who's the antithesis of everything you
stand for and care about - careful, thorough science.
Qiang Fu would be great, since he's so knowledgeable about MSU-related issues. I think he
would be fast, too. Myles reviewed one of the GRL versions of Douglass et al., so he's
very familiar with this territory.
With best regards,
Ben
Phil Jones wrote:

Ben,
I briefly discussed this with Tim a few minutes ago.
With IDAG coming up, it is probably best not to use Gabi and Myles.
I also suggested that Mike Wallace might be slow - as Myles would
have been. Peter S might not be right for the IDAG reason and he
does work for the HC - where Peter T does.
If Jim is back working he would be good. So would Fu. If Tim can
just persuade them to do it - and quickly.
I did suggest Kevin - he would do it quickly - but it may be a red rag
to a bull with John Christy on the other paper.
Glad to see you've gone down his route!
Have a good weekend!
Ruth says hello!
Cheers
Phil
At 16:43 11/01/2008, Ben Santer wrote:

Dear Tim,
Here are some suggestions for potential reviewers of a Santer et al. IJoC submission on
issues related to the consistency between modeled and observed atmospheric temperature
trends. None of the suggested reviewers have been involved in the recent "focus group"
that has discussed problems with the Douglass et al. IJoC paper.
1. Mike Wallace, University of Washington. U.S. National Academy member. Expert on
atmospheric dynamics. Chair of National Academy of Sciences committee on "Reconciling
observations of global temperature change" (2000). Email: wallace@xxxxxxxxx.xxx
2. Qiang Fu, University of Washington. Expert on atmospheric radiation, dynamics,
radiosonde and satellite data. Published 2004 Nature paper and 2005 GRL paper dealing
with issues related to global and tropical temperature trends. Email:
qfu@xxxxxxxxx.xxx
3. Gabi Hegerl, University of Edinburgh. Expert on detection and attribution of
externally-forced climate change. Co-Convening Lead Author of "Understanding and
Attributing Climate Change" chapter of IPCC Fourth Assessment Report. Email:
Gabi.Hegerl@xxxxxxxxx.xxx
4. Jim Hurrell, National Center for Atmospheric Research (NCAR). Former Director of
Climate and Global Dynamics division at NCAR. Expert on climate modeling, observational
data. Published a number of papers on MSU-related issues. Email: jhurrell@xxxxxxxxx.xxx
5. Myles Allen, Oxford University. Expert in Climate Dynamics, detection and
attribution, application of statistical methods in climatology. Email:
allen@xxxxxxxxx.xxx
6. Peter Stott, Hadley Centre for Climate Prediction and Research. Expert in climate
modeling, detection and attribution. Email: peter.stott@xxxxxxxxx.xxx
With best regards,
Ben
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------

Prof. Phil Jones
Climatic Research Unit Telephone +44 xxx xxxx xxxx
School of Environmental Sciences Fax +44 xxx xxxx xxxx
University of East Anglia
Norwich Email p.jones@xxxxxxxxx.xxx
NR4 7TJ
UK
----------------------------------------------------------------------------

--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (9xxx xxxx xxxx
FAX: (9xxx xxxx xxxx
email: santer1@xxxxxxxxx.xxx
----------------------------------------------------------------------------