To: ThunderSleeps

You’re also wrong about a single data point not making much of a difference in OLS regression with a modest number of data points: consider the two data sets

(1, 1.1), (2, 0.9), (3, 1.0), (4, 1.0), (5, 0.9), (6, 1.1)

and

(0, 0.0), (1, 1.1), (2, 0.9), (3, 1.0), (4, 1.0), (5, 0.9), (6, 1.1)

In the first, the regression line is y = 1.

In the second, it has positive slope and a y-intercept less than 1.
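For anyone who wants to check the arithmetic, here is a quick sketch (in Python with numpy; the choice of tool is mine, not part of the original argument):

    import numpy as np

    # First data set: six points hovering around y = 1
    x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
    y1 = np.array([1.1, 0.9, 1.0, 1.0, 0.9, 1.1])

    # Second data set: the same points plus the errant (0, 0.0)
    x2 = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
    y2 = np.array([0.0, 1.1, 0.9, 1.0, 1.0, 0.9, 1.1])

    slope1, intercept1 = np.polyfit(x1, y1, 1)  # slope 0, intercept 1: the line y = 1
    slope2, intercept2 = np.polyfit(x2, y2, 1)  # slope 3/28 (about 0.107), intercept 15/28 (about 0.536)

    print(slope1, intercept1)
    print(slope2, intercept2)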

In fact, this is a baby example of the phenomenon for which Briggs gave a more realistic example.

And where the errant point sits does matter: if (3.5, 0.0) had been inserted into the first data set instead of (0, 0.0),
the regression slope would still have been 0, and only the intercept (or rather the height of the whole horizontal regression line) would have changed.
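The same kind of sketch (again numpy, again my addition) bears that out:

    import numpy as np

    # First data set with (3.5, 0.0) inserted instead of (0, 0.0)
    x3 = np.array([1, 2, 3, 3.5, 4, 5, 6])
    y3 = np.array([1.1, 0.9, 1.0, 0.0, 1.0, 0.9, 1.1])

    slope3, intercept3 = np.polyfit(x3, y3, 1)
    print(slope3, intercept3)  # slope ~ 0, intercept ~ 6/7: the horizontal line just drops

Since 3.5 is exactly the mean of the x-values, the bad point contributes nothing to the slope estimate; it only pulls the whole line down.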

Of course, for less modest numbers of points the outlier would need to be more extreme to have the same effect.


15 posted on 02/01/2012 9:38:16 PM PST by The_Reader_David (And when they behead your own people in the wars which are to come, then you will know. . .)


To: The_Reader_David
Sorry, I should have been more specific. To me a "modest number" of data points is several hundred, maybe a thousand. (I routinely deal with data sets in the tens of thousands, pushing seven figures.)

In the case of a hundred or so data points, for any reasonably self-consistent data set (e.g., surface temperature) that isn't likely to contain wild data points, no single point has much chance of drastically influencing the slope of the line.
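A quick simulation (my own sketch in Python/numpy, with made-up numbers rather than real surface-temperature data) illustrates that intuition:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    x = np.arange(n, dtype=float)
    y = 1.0 + rng.normal(0.0, 0.1, size=n)  # 100 self-consistent readings near 1.0

    slope_clean = np.polyfit(x, y, 1)[0]

    y_bad = y.copy()
    y_bad[0] = 0.0                          # one wild reading at the start of the record
    slope_bad = np.polyfit(x, y_bad, 1)[0]

    # The single bad point shifts the fitted slope by (mean(x) - x[0]) / sum((x - mean(x))**2)
    # times the size of the error, roughly 0.0006 per x unit here; nothing like the
    # ~0.1 swing that one point produced in the 7-point example above.
    print(slope_clean, slope_bad)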

17 posted on 02/02/2012 4:59:44 AM PST by ThunderSleeps (Stop obama now! Stop the hussein - insane agenda!)
