Posted on 11/01/2004 5:33:35 PM PST by IowaHawk
Ever wonder where that stupid "margin of error" thingy comes from? Here's a quick primer for the curious and the masochistic:
The infamous "margin" comes directly from the formula for the 95% confidence interval for a sample proportion. Before getting to that, let's define a couple of terms:
P => the REAL population proportion
p => the observed proportion in a random sample of the population
n => the sample size
The margin of error arises from the fact that a limited random sample may not reflect reality. To illustrate, think about a simple experiment: Flip a fair coin (where the real probability of heads = P = 0.5) 100 times. Odds are that you won't end up with 50 heads / 50 tails in your sample. In fact, the probability of ending up with exactly 50 heads (p=0.5) is only around 8%; the probability of getting exactly 49 heads is almost as high (7.8%).
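For anyone who wants to check those coin-flip numbers, here is a minimal sketch using the binomial formula. The function name prob_exact_heads is made up for this illustration.

```python
from math import comb

def prob_exact_heads(k, n=100, p_true=0.5):
    """Binomial probability of exactly k heads in n flips when P(heads) = p_true."""
    return comb(n, k) * p_true**k * (1 - p_true) ** (n - k)

p50 = prob_exact_heads(50)
p49 = prob_exact_heads(49)
print(f"P(exactly 50 heads) = {p50:.4f}")
print(f"P(exactly 49 heads) = {p49:.4f}")
```

Both values land just under 8%, matching the figures quoted above.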
Nevertheless, if you don't know the TRUE proportion (P), your sample proportion (p) is the best estimate you've got. On the other hand, it makes sense to build in a "fudge factor" to reflect how much confidence you have in the estimate. That's where confidence intervals come in. When you're estimating a proportion with a random sample, the 95% confidence interval is
p +/- 1.96 * sqrt[p * (1-p) / (n-1)]
or, simply,
p +/- Margin of Error
The n in the formula is the sample size (number of coin flips). That weird "1.96" comes from integrating a Normal, or Gaussian, distribution -- the formal version of the famous 'bell curve.' The correct interpretation of the confidence interval is this: assuming a random sample, there is a 95% chance the REAL population proportion (P) is somewhere in the range of the estimate (p) plus or minus the Margin of Error.
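Here is a rough sketch of the interval formula in Python, including where the 1.96 comes from. The sample of 55 heads in 100 flips is a made-up example, and confidence_interval_95 is just an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

# 95% of a standard Normal lies within +/- z of the mean, so z sits at
# the 97.5th percentile (2.5% is left over in each tail).
z = NormalDist().inv_cdf(0.975)

def confidence_interval_95(p, n):
    """Return (low, high) for sample proportion p and sample size n."""
    margin_of_error = z * sqrt(p * (1 - p) / (n - 1))
    return p - margin_of_error, p + margin_of_error

# Hypothetical sample: 55 heads in 100 flips.
low, high = confidence_interval_95(0.55, 100)
print(f"z = {z:.2f}")
print(f"95% CI: {low:.3f} to {high:.3f}")
```

With these made-up numbers the margin comes out to about 0.098, so the interval runs from roughly 45% to 65% heads.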
A couple of things worth noting: starting with
Margin of Error = 1.96 * sqrt[p * (1-p) / (n-1)]
Let's conservatively assume p = .5, approximate n ~= (n-1), and round 1.96 to 2. The formula can be simplified to
Margin of Error ~= sqrt[1/n]
For example, the MOE for a poll with a sample size of 1000 is approximately sqrt(1/1000) = 0.032, or 3.2%; a poll of 600 has an MOE of sqrt(1/600) = 0.041, or 4.1%. This formula also means that in order to cut your MOE in half, you have to quadruple the sample size. In short, this formula is why you invariably get an MOE of "plus or minus 3 percent" in political polls where the sample size is about 1000.
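To see how close the shortcut is to the full formula, here is a quick side-by-side sketch. The function names are invented for this example, and p = 0.5 is the same conservative assumption used above.

```python
from math import sqrt

def full_moe(n, p=0.5):
    """Margin of error from the full formula: 1.96 * sqrt[p * (1-p) / (n-1)]."""
    return 1.96 * sqrt(p * (1 - p) / (n - 1))

def rule_of_thumb_moe(n):
    """Simplified margin of error: sqrt[1/n]."""
    return sqrt(1 / n)

# 4000 is four times 1000, so its MOE should be half as large.
for n in (600, 1000, 4000):
    print(f"n={n}: full={full_moe(n):.1%}, rule of thumb={rule_of_thumb_moe(n):.1%}")
```

The shortcut always runs slightly high because it rounds 1.96 up to 2, which errs on the cautious side.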
All that said, the formulas above rest on a critical assumption: that the estimate was based on a RANDOM SAMPLE -- e.g., seed plots, product defects, coin flips. But let's say in order to flip the coin, you have to...
1. Make sure the coin has not enrolled in the national 'do not flip' list
2. Catch the coin when he is at home (no cell phones)
3. Make it past the coin's caller-ID
4. Persuade the coin to let you flip him.
And suppose even after you get to this point, 60%-75% of the coins tell you to go get screwed. And let's further suppose that 3%-5% of the oddball coins that actually agree to let you flip them will change their mind by tomorrow. Now what's the margin of error?
p +/- 666.666 * log(length of Zogby's goat entrails in furlongs) + arcsin(loudness of Frank Luntz's farts in decibels).
Moral to the story: Whether they're unfavorable or favorable to your candidate, modern political polls are almost entirely meaningless despite the "scientific" patina of error margins. Ignore the polls and go vote for your candidate as if the election were tied.
Flip-Ping.
Glad to see you post this, IowaHawk.
Thank you for this info. Quick question: does the margin of error mean that I go either side of the stated value, or is it the total range of error? For example, if a candidate polls at 50% with a 3% margin of error, does that mean that 95% of the time the actual value will be between 47% and 53%, or between 48.5% and 51.5%?
It's the wider range. There's a 95% chance that the real population proportion is between 47% and 53%.
To be clear, the MOE formula only considers SAMPLING error. For political polls, sampling error is swamped by NON-SAMPLING error. And there ain't no formula for that.
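A toy simulation makes the point concrete. Everything in it is an assumption for illustration: a population split exactly 50/50, and supporters of candidate A agreeing to be polled slightly more often than supporters of B (30% versus 25%, both inside the refusal range mentioned above).

```python
import random
from math import sqrt

random.seed(2004)  # fixed seed so the run is repeatable

TRUE_SHARE_A = 0.50                      # assumed true support for A
RESPONSE_RATE = {"A": 0.30, "B": 0.25}   # hypothetical response rates
CONTACTS = 200_000                       # people the pollster tries to reach

responses = []
for _ in range(CONTACTS):
    supporter = "A" if random.random() < TRUE_SHARE_A else "B"
    if random.random() < RESPONSE_RATE[supporter]:
        responses.append(supporter)

n = len(responses)
p = responses.count("A") / n
moe = 1.96 * sqrt(p * (1 - p) / (n - 1))
print(f"respondents: {n}")
print(f"poll says A = {p:.1%} +/- {moe:.1%}, truth is {TRUE_SHARE_A:.0%}")
```

Under these assumptions the poll settles around 54% to 55% for A with a tiny reported MOE, so the truth sits far outside the stated interval. The formula has no way of knowing about the people who hung up.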
Since this is a nice probability primer, I would like to add some other statistical issues with sampled polls: The big problem with the MSM journalists is that they have never had a math course and do not understand sampled probability systems. For example, they talk about MOE as if it were binary: inside is statistical dead heat, outside is an incontrovertible fact. The reality is, of course, that the polls are samples of a population and are only approximate. Practically: if you sample the same population many times, you will get different answers and those answers will vary according to a distribution. Therefore, you should expect sample variations.
1. The MSM problem is that they see a change and assume it is real. They are responding to their own statistics.
2. Usually, they quote one poll. By contrast, the many, excellent analyses done here on FR usually consider the average of multiple polls.
3. The excellent FR analyses I have seen here consider poll biases as well as the precisions. Most polls assume a zero-bias.
4. The bias effects can be mitigated by considering trend lines over some period and multiple polls.
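The point about repeated sampling can be seen in a small simulation. This is only a sketch under ideal conditions: it assumes a true proportion of exactly 0.5, perfectly random samples of 1000, and no bias of any kind.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

TRUE_P = 0.5        # assumed true population proportion
SAMPLE_SIZE = 1000  # respondents per poll
NUM_POLLS = 1000    # how many times the same poll is repeated

results = []
covered = 0
for _ in range(NUM_POLLS):
    hits = sum(random.random() < TRUE_P for _ in range(SAMPLE_SIZE))
    p = hits / SAMPLE_SIZE
    moe = 1.96 * sqrt(p * (1 - p) / (SAMPLE_SIZE - 1))
    results.append(p)
    if abs(p - TRUE_P) <= moe:
        covered += 1  # this poll's interval contains the truth

coverage = covered / NUM_POLLS
print(f"poll results ranged from {min(results):.1%} to {max(results):.1%}")
print(f"share of polls whose interval contained the truth: {coverage:.1%}")
```

Nothing changes in the population between polls, yet the results move around by several points, and roughly 1 poll in 20 misses the truth by more than its own MOE.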
Let's say 5 polls with a MOE of 3% have B 49 K 46, and 2 polls have K 49 B 46: do President Bush's chances of winning increase statistically? I hope you understand the question; I don't exactly know how to put it in words, but do "averages" like RCP mean anything?
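One rough way to look at this question, counting sampling error only: treat the 7 polls as one big pooled sample. The sketch below assumes the polls are independent, unbiased, and equal in size, and it backs the sample size out of the 3% MOE with the sqrt(1/n) shortcut. None of that is guaranteed for real polls.

```python
from math import sqrt

# (Bush %, Kerry %) for each poll in the question above
polls = [(49, 46)] * 5 + [(46, 49)] * 2
MOE_EACH = 0.03

# Invert MOE ~= sqrt(1/n) to estimate each poll's sample size (assumption).
n_each = round(1 / MOE_EACH**2)

avg_bush = sum(bush for bush, _ in polls) / len(polls)
avg_kerry = sum(kerry for _, kerry in polls) / len(polls)

# Pooling k equal polls multiplies n by k, so the MOE shrinks by sqrt(k).
pooled_moe = sqrt(1 / (n_each * len(polls)))

print(f"average: Bush {avg_bush:.1f}, Kerry {avg_kerry:.1f}")
print(f"pooled sampling MOE: +/- {pooled_moe:.1%}")
```

Under those assumptions the average is about Bush 48.1, Kerry 46.9 with a pooled MOE near 1.1%, so averaging does tighten the sampling error. It does nothing for a bias shared by all the polls.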
I could not agree more. What is truly amazing is the self-delusion of the pollsters and their willingness to be knowingly deceitful - reminds me of Kerry!
Let's vote and get it over with.
read later with coffee
From the Harris Poll link on RealClearPolitics
Harris Poll
In theory, with a probability sample of this size [1092], one can say with 95 percent certainty that the results have a statistical precision of plus or minus 3 percentage points of what they would be if the entire U.S. adult population of likely voters had been polled with complete accuracy. Unfortunately, there are several other possible sources of error in all polls or surveys that are probably more serious than theoretical calculations of sampling error. They include refusals to be interviewed (nonresponse), question wording and question order, interviewer bias, weighting by demographic control data and screening (e.g., for likely voters). It is impossible to quantify the errors that may result from these factors.
So take their Poll Result:
First, there are three SEPARATE (important) results, 1 for Bush (49), 1 for Kerry (46), 1 for Nader (2).
Read them thus:
Bush 49 +/- 3 = range 46 - 52
Kerry 46 +/- 3 = range 43 - 49
Nader 2 +/- 3 = range 0 - 5 (the low end stops at zero, since a share can't be negative).
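The blanket "plus or minus 3" is the worst case (p near 0.5). As a sketch, plugging each candidate's own share into the formula from the original post, with the Harris sample size of 1092, gives the ranges below. The helper name is made up, and the Normal approximation is shaky for a share as small as 2%, so treat the Nader line as rough.

```python
from math import sqrt

SAMPLE_SIZE = 1092  # from the Harris quote
shares = {"Bush": 0.49, "Kerry": 0.46, "Nader": 0.02}

def moe_for_share(p, n=SAMPLE_SIZE):
    """Margin of error using the candidate's own observed share p."""
    return 1.96 * sqrt(p * (1 - p) / (n - 1))

intervals = {}
for name, p in shares.items():
    moe = moe_for_share(p)
    low = max(0.0, p - moe)  # a share can't go below zero
    intervals[name] = (low, p + moe)
    print(f"{name}: {p:.0%} +/- {moe:.1%} = range {low:.1%} - {p + moe:.1%}")
```

Bush and Kerry come out just under 3 points each, while the sampling margin on Nader's 2% is under 1 point.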
Do not attempt to link an increase for one candidate with a decrease for another - they are not statistically related (remember, this is statistics, not rational logic). It is perfectly legitimate to focus only on the Bush sample and ignore the others.
Harris "expects" (statistical term) the most likely outcome for Bush to be 49% of the national Popular Vote, and "expects" that 95% of the possible outcomes should fall between 46% and 52% of the Popular Vote.
If something odd like Bush getting 61% of the Popular Vote should occur, that is accounted for as a 3-standard-deviation event, and it is also "expected." Honest pollsters would not be surprised by such an outcome, other than any difficulty explaining it rationally.
I'm 95% certain that you are now more uncertain than you were before what all this means <|;>)
Another source of error is introduced when the pollsters "adjust" and "weight" the responses for demographic, sex, and other factors.
Even a margin of error of +/-4%, if it were a truly accurate MOE with a 95% confidence interval, means that a 52B to 48K vote might actually be a 52K to 48B vote! But with the numerous other errors described in your post, I would put the REAL MOE at more like +/- 8%. Ronald Reagan's landslide win his second time around showed just how bad polls can be: he beat some of them by 10%!
The accuracy of the polls increases with sample size, and the best bet is to trend the poll and look at the trends of numerous polls. When you do that, Bush wins :)
I followed most of this, but what is the "sqrt" value?
sqrt = square root.
duh...Thanks.