Replies

Now you're asking me to "data-dredge." Data-dredging is a known problem when the researcher finds his original data set does not support his original hypothesis. But without rejecting the null hypothesis, he cannot publish. So he then falls subject to the sin of data-dredging by slicing up his data in a bunch of different ways. If he slices it enough different ways, one or more of the slices will show a statistically significant relationship BY RANDOM CHANCE.

If he then reports the statistically significant slice without performing what is called the Bonferroni adjustment, he is in a state of statistical sin.
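For anyone who hasn't met the Bonferroni adjustment: it simply divides your significance level by the number of tests you ran, so each individual slice has to clear a much stricter bar. Here is a minimal sketch in Python. The function names and the p-values are made up for illustration; they are not from the author's data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the overall (family-wise) error rate at or below alpha."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values survive the Bonferroni adjustment."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five different "slices" of one data set.
example_p_values = [0.001, 0.02, 0.04, 0.30, 0.75]

# Unadjusted, three of these slices look significant at 0.05.
# Adjusted, each must beat 0.05 / 5 = 0.01.
flags = significant_after_bonferroni(example_p_values)
```

With 125 slices instead of five, the per-slice threshold drops to 0.05 / 125 = 0.0004, which is why a slice that just squeaks under 0.05 does not count for much.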

In the data at hand, what you have just done is data-dredging. You have picked, by eye, the period most likely to show a relationship, and you want to know the numbers for that time slice. Because you did a pretty good job of picking one of the periods most favorable to your hypothesis, that is the same as if you had run the numbers on all possible time periods for all possible series.

Off the top of my head, there are 5x5x5 possible series of five or more points to report in the author's data. That's 125 different "slices" you could test (I limited it to five to help your cause; you are unlikely to get a statistically significant regression out of fewer than five). You just picked one of the most favorable of those 125 slices. But if I ran all 125 slices at the 95% confidence level and there were no real trend at all, about 6 of them (125 x 0.05) would still show a statistically significant relationship purely by random chance.
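The arithmetic behind that "about 6" is short enough to spell out. This is only a sketch: the expected count holds whether or not the slices overlap, but the "at least one false positive" figure assumes the 125 tests are independent, which overlapping time slices are not.

```python
n_slices = 5 * 5 * 5   # my off-the-top-of-my-head count of slices
alpha = 0.05           # 95% confidence level

# Expected number of slices that look significant when there is no real trend.
expected_false_positives = n_slices * alpha

# Chance that at least one slice looks significant by luck alone.
# ASSUMPTION: tests treated as independent, which is a simplification here.
prob_at_least_one = 1 - (1 - alpha) ** n_slices
```

Under that simplification, finding at least one "significant" slice somewhere in the 125 is close to a sure thing even if nothing is going on.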

The question at hand is not, "can I slice the data so I can report a statistically significant relationship that is consistent with my hypothesis." It is, "does my original data support my original hypothesis at my originally chosen confidence level?" That's why you define your test, your significance level, and the data in advance. It avoids the sin of data dredging.

The scope of my response was limited to the author's choice of data and his hypothesis (note, I couldn't use his significance level because he didn't report it). That let me avoid dredging the data and other related sins such as adjusting your significance level downward once you see the data.

So, with that caveat, I ran things somewhat sloppily, but I'm pretty sure the results are: you can find a statistically significant upward trend in only one of the three series, NASA GISS, and only on a few of the 125 slices. I'm pretty sure that out of the 125 total slices, you have 4 slices (one of them the one you requested) that are statistically significant at the 95% level. Compare that to the expectation that 6 of those 125 slices will show a statistically significant relationship by random chance.
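If you want to put a rough number on that comparison, you can ask how often pure chance would produce 4 or more significant slices out of 125. The sketch below uses a plain binomial model, which assumes the slices are independent; they overlap, so treat the output as a ballpark rather than a proper test. The helper names are mine, and the 4 and 125 are my own sloppy counts from above.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n_slices = 125   # slices tested
alpha = 0.05     # chance a no-trend slice looks significant
observed = 4     # slices I found significant (a rough count)

# Chance of seeing this many or more by luck alone.
prob_at_least_observed = 1 - binom_cdf(observed - 1, n_slices, alpha)

# Chance of seeing this many or fewer by luck alone.
prob_at_most_observed = binom_cdf(observed, n_slices, alpha)
```

Under this model, getting 4 or more hits with no trend at all is the likely outcome, not a surprising one.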

So even done your way, the overall data set is consistent ONLY with the hypothesis of NO TREND.