"Margin of error" for Dummies (Ignore the polls)
Principles of Statistics | 11/01/04 | Iowahawk

Posted on 11/01/2004 5:33:35 PM PST by IowaHawk

Ever wonder where that stupid "margin of error" thingy comes from? Here's a quick primer for the curious and the masochistic:

The infamous "margin" comes directly from the formula for the 95% confidence interval for a sample proportion. Before getting to that, let's define a couple of terms:

P => the REAL population proportion
p => the observed proportion in a random sample of the population
n => the sample size

The margin of error arises from the fact that a limited random sample may not reflect reality. To illustrate, think about a simple experiment: Flip a fair coin (where the real probability of heads = P = 0.5) 100 times. Odds are that you won't end up with 50 heads / 50 tails in your sample. In fact, the probability of ending up with exactly 50 heads (p=0.5) is only around 8%; the probability of getting exactly 49 heads is almost as high (7.8%).
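The coin-flip figures can be checked straight from the binomial formula. Here is a minimal Python sketch; the function name prob_exact_heads is made up for illustration and is not from the original post.

```python
from math import comb

def prob_exact_heads(k: int, n: int = 100, p: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n flips, with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 50 heads comes out near 8%, exactly 49 heads near 7.8%.
for k in (50, 49):
    print(f"P(exactly {k} heads in 100 flips) = {prob_exact_heads(k):.3f}")
```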

Nevertheless, if you don't know the TRUE proportion (P), your sample proportion (p) is the best estimate you've got. On the other hand, it makes sense to build in a "fudge factor" to reflect how much confidence you have in the estimate. That's where confidence intervals come in. When you're estimating a proportion with a random sample, the 95% confidence interval is

p +/- 1.96 * sqrt[p * (1-p) / (n-1)]

or, simply,

p +/- Margin of Error

The n in the formula is the sample size (number of coin flips). That weird "1.96" comes from integrating a Normal, or Gaussian, distribution -- the formal version of the famous 'bell curve.' The correct interpretation of the confidence interval is this: assuming a random sample, there is a 95% chance the REAL population proportion (P) is somewhere in the range of the estimate (p) plus or minus the Margin of Error.
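For anyone who wants to see the formula run, here is a rough Python sketch of the interval above. It keeps the (n-1) denominator used in this post and recovers the 1.96 from the standard normal distribution. The function name and the example of 52 heads in 100 flips are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(p_hat: float, n: int, level: float = 0.95):
    """Return (low, high, margin_of_error) for a sample proportion p_hat."""
    # z is the normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    moe = z * sqrt(p_hat * (1 - p_hat) / (n - 1))
    return p_hat - moe, p_hat + moe, moe

z_95 = NormalDist().inv_cdf(0.975)
print(f"z for 95% confidence = {z_95:.2f}")

# Hypothetical sample: 52 heads in 100 flips.
low, high, moe = confidence_interval(0.52, 100)
print(f"p = 0.52, MOE = {moe:.3f}, interval = ({low:.3f}, {high:.3f})")
```

With only 100 flips the margin is close to ten percentage points, which is why small samples say so little.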

A couple of things worth noting: starting with

Margin of Error = 1.96 * sqrt[p * (1-p) / (n-1)]

Let's conservatively assume p = .5, approximate n ~= (n-1), and round 1.96 to 2. The formula can be simplified to

Margin of Error ~= sqrt[1/n]

For example, the MOE for a poll with a sample size of 1000 is approximately sqrt(1/1000) = 0.032, or 3.2%; a poll of 600 has a MOE of sqrt(1/600) = 0.041, or 4.1%. This formula also means that in order to cut your MOE in half, you have to quadruple the sample size. In short, this formula is why you invariably get a MOE of "plus or minus 3 percent" in political polls where the sample size is about 1000.
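The shortcut is easy to compare against the full formula. The sketch below is in Python with made-up function names; it uses p = 0.5 as in the text and shows that quadrupling n halves the shortcut MOE.

```python
from math import sqrt

def moe_exact(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error from the full formula in this post (n-1 denominator)."""
    return z * sqrt(p * (1 - p) / (n - 1))

def moe_rule_of_thumb(n: int) -> float:
    """Simplified margin of error: sqrt(1/n)."""
    return sqrt(1 / n)

for n in (600, 1000, 4000):
    print(f"n = {n:5d}  exact = {moe_exact(n):.4f}  shortcut = {moe_rule_of_thumb(n):.4f}")
```

The shortcut runs slightly high because it rounds 1.96 up to 2, so it errs on the conservative side.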

All that said, the formulas above rest on a critical assumption: that the estimate was based on a RANDOM SAMPLE -- e.g., seed plots, product defects, coin flips. But let's say in order to flip the coin, you have to...

1. Make sure the coin has not enrolled in the national 'do not flip' list
2. Catch the coin when he is at home (no cell phones)
3. Make it past the coin's caller-ID
4. Persuade the coin to let you flip him.

And suppose even after you get to this point, 60%-75% of the coins tell you to go get screwed. And let's further suppose that 3%-5% of the oddball coins that actually agree to let you flip them will change their mind by tomorrow. Now what's the margin of error?

p +/- 666.666 * log(length of Zogby's goat entrails in furlongs) + arcsin(loudness of Frank Luntz's farts in decibels).

Moral to the story: Whether they're unfavorable or favorable to your candidate, modern political polls are almost entirely meaningless despite the "scientific" patina of error margins. Ignore the polls and go vote for your candidate as if the election were tied.

TOPICS: Miscellaneous
KEYWORDS: error; math; polls

1 posted on 11/01/2004 5:33:36 PM PST by IowaHawk

To: RhoTheta

Flip-Ping.

2 posted on 11/01/2004 5:37:52 PM PST by Egon (If Kerry had been right about screwed-up returning vets, he wouldn't have lived to see 1975!)

To: IowaHawk

Glad to see you post this, IowaHawk.

3 posted on 11/01/2004 5:39:28 PM PST by Ole Okie

To: IowaHawk

Thank you for this info. Quick question: does the margin of error mean that I go either side of the stated value, or is it the total range of error? For example, if a candidate polls at 50% with a 3% margin of error, does that mean that 95% of the time the actual value will be between 47% and 53%, or between 48.5% and 51.5%?

4 posted on 11/01/2004 5:50:21 PM PST by RabbitMan

To: RabbitMan

It's the wider range. There's a 95% chance that the real population proportion is between 47% and 53%.

To be clear, the MOE formula only considers SAMPLING error. For political polls, sampling error is swamped by NON-SAMPLING error. And there ain't no formula for that.
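To put a number on how non-sampling error can swamp the MOE, here is a toy Python simulation. Every figure in it is an assumption chosen for illustration, not real polling data: the true split is 50/50, candidate A's supporters agree to answer 30% of the time, and everyone else answers 40% of the time.

```python
import random
from math import sqrt

def simulate_poll(true_p, respond_a, respond_b, n_respondents, rng):
    """Call random voters until n_respondents answer; return A's observed share."""
    a_count = 0
    answered = 0
    while answered < n_respondents:
        supports_a = rng.random() < true_p
        respond_prob = respond_a if supports_a else respond_b
        if rng.random() < respond_prob:  # this voter agrees to be polled
            answered += 1
            a_count += supports_a
    return a_count / n_respondents

rng = random.Random(2004)  # fixed seed so the run is repeatable
p_hat = simulate_poll(0.5, 0.30, 0.40, 1000, rng)
moe = 1.96 * sqrt(p_hat * (1 - p_hat) / (1000 - 1))

# Share of A supporters among people who respond, by simple arithmetic.
expected_share = 0.5 * 0.30 / (0.5 * 0.30 + 0.5 * 0.40)

print(f"true share = 0.500, polled share = {p_hat:.3f}, reported MOE = {moe:.3f}")
print(f"long-run polled share under these response rates = {expected_share:.3f}")
```

Under these made-up response rates the poll settles around 43% for a candidate who really has 50%, while the reported MOE stays near 3 points. The formula has no way to see the gap.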

5 posted on 11/01/2004 5:56:43 PM PST by IowaHawk

To: IowaHawk

Since this is a nice probability primer, I would like to add some other statistical issues with sampled polls: The big problem with the MSM journalists is that they have never had a math course and do not understand sampled probability systems. For example, they talk about MOE as if it were binary: inside is statistical dead heat, outside is an incontrovertible fact. The reality is, of course, that the polls are samples of a population and are only approximate. Practically: if you sample the same population many times, you will get different answers and those answers will vary according to a distribution. Therefore, you should expect sample variations.

1. The MSM problem is that they see a change and assume it is real. They are responding to their own statistics.

2. Usually, they quote one poll. By contrast, the many, excellent analyses done here on FR usually consider the average of multiple polls.

3. The excellent FR analyses I have seen here consider poll biases as well as the precisions. Most polls assume a zero-bias.

4. The bias effects can be mitigated by considering trend lines over some period and multiple polls.
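The points about sample variation and averaging can be sketched with a small Python simulation. The true share of 49%, the poll size of 1000, and the 7 polls per average are all hypothetical values picked for illustration.

```python
import random
from statistics import mean, pstdev

def run_poll(true_p: float, n: int, rng: random.Random) -> float:
    """Simulate one perfectly random poll of n voters; return the observed share."""
    return sum(rng.random() < true_p for _ in range(n)) / n

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_P, N, POLLS_PER_AVG, TRIALS = 0.49, 1000, 7, 200

# Spread of individual polls taken from the same unchanged population.
singles = [run_poll(TRUE_P, N, rng) for _ in range(TRIALS)]

# Spread of averages, each one combining POLLS_PER_AVG independent polls.
averages = [
    mean(run_poll(TRUE_P, N, rng) for _ in range(POLLS_PER_AVG))
    for _ in range(TRIALS)
]

print(f"single polls:    mean = {mean(singles):.3f}, std dev = {pstdev(singles):.4f}")
print(f"7-poll averages: mean = {mean(averages):.3f}, std dev = {pstdev(averages):.4f}")
```

Single polls bounce around by a point or two even though nothing in the population changed, and the averages bounce much less. Note that averaging only shrinks sampling error; a bias shared by all the polls passes straight through the average.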

6 posted on 11/01/2004 6:02:46 PM PST by 2ndreconmarine

To: IowaHawk

When we see the majority of polls within the MOE yet still with Bush leading, does that mean anything statistically?

Let's say 5 polls with a MOE of 3% have B 49 K 46, and 2 polls with K 49 B 46, does President Bush's chances of winning increase statistically? I hope you understand the question, I don't exactly know how to put it in words, but do "averages" like RCP mean anything?

7 posted on 11/01/2004 6:03:07 PM PST by codercpc

To: IowaHawk

I could not agree more. What is truly amazing is the self-delusion of the pollsters and their willingness to be knowingly deceitful - reminds me of Kerry!
Let's vote and get it over with.

8 posted on 11/01/2004 6:04:24 PM PST by bjc (Attachments?)

To: sauropod

read later with coffee

9 posted on 11/01/2004 6:06:32 PM PST by sauropod (Hitlary: "We're going to take things away from you on behalf of the common good.")

To: IowaHawk

Please note that the "length of Zogby's goat entrails in furlongs" is accepted datum in French mathematics as represented by the symbol (Zg~~...)

10 posted on 11/01/2004 6:06:44 PM PST by lunarville (memo to Dan)

To: RabbitMan

Quick question: does the margin of error mean that I go either side of the stated value, or is it the total range of error? For example, if a candidate polls at 50% with a 3% margin of error, does that mean that 95% of the time the actual value will be between 47% and 53%, or between 48.5% and 51.5%?

From the Harris Poll link on RealClearPolitics
Harris Poll

In theory, with a probability sample of this size [1092], one can say with 95 percent certainty that the results have a statistical precision of plus or minus 3 percentage points of what they would be if the entire U.S. adult population of likely voters had been polled with complete accuracy. Unfortunately, there are several other possible sources of error in all polls or surveys that are probably more serious than theoretical calculations of sampling error. They include refusals to be interviewed (nonresponse), question wording and question order, interviewer bias, weighting by demographic control data and screening (e.g., for likely voters). It is impossible to quantify the errors that may result from these factors.

So take their Poll Result:

First, there are three SEPARATE (important) results, 1 for Bush (49), 1 for Kerry (46), 1 for Nader (2).

Read them thus:

Bush 49 +/- 3 = range 46 - 52
Kerry 46 +/- 3 = range 43 - 49
Nader 2 +/- 3 = range 0 - 5 (a share cannot go below zero, so the lower end stops there).
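As a side check, the quoted "plus or minus 3" is the worst-case figure for a share near 50%. Plugging each candidate's share into the formula from the original post, with the Harris sample size of 1092, gives per-candidate margins. This is a rough Python sketch; the function name is made up, and it counts sampling error only.

```python
from math import sqrt

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for an observed proportion p from a sample of size n."""
    return z * sqrt(p * (1 - p) / (n - 1))

N = 1092  # sample size quoted in the Harris note
shares = {"Bush": 0.49, "Kerry": 0.46, "Nader": 0.02}

for name, p in shares.items():
    moe = margin_of_error(p, N)
    low = max(0.0, p - moe)  # a vote share cannot be negative
    print(f"{name:5s} {p:.0%} +/- {moe:.1%}  (range {low:.1%} to {p + moe:.1%})")
```

The margin for the two front-runners comes out at about 3 points, but for a 2% candidate the same formula gives less than 1 point.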

Do not attempt to link an increase for one candidate with a decrease for another - they are not statistically related (remember, this is statistics, not rational logic). It is perfectly legitimate to focus only on the Bush sample and ignore the others.

Harris "expects" (statistical term) the most likely outcome for Bush to be 49% of the national Popular Vote, and "expects" that 95% of the possible outcomes should fall between 46% and 52% of the Popular Vote.

If something odd like Bush getting 61% of the Popular Vote should occur, that is accounted for as a 3-standard-deviation event, and it is also "expected." Honest pollsters would not be surprised by such an outcome, other than any difficulty explaining it rationally.

I'm 95% certain that you are now more uncertain than you were before what all this means <|;>)

11 posted on 11/01/2004 6:23:40 PM PST by 1stMarylandRegiment (Conserve Liberty)

To: IowaHawk

This is probably the best post I have ever seen stating why the "margin of error" in the polls is baloney - the poll accuracy (margin of error) assumes random samples that ACCURATELY reflect the populace, and you give some of the reasons they do not. The proof of this is quite evident when you notice that sometimes different polls taken over the same time span vary by a huge margin, outside of the MOE.

Another source of error is introduced when the pollsters "adjust" and "weight" the responses for demographic, sex, and other factors.

Even a margin of error of +/-4%, if it were a truly accurate MOE with a 95% confidence interval, means that a 52B to 48K vote might actually be a 52K to 48B vote! But with the numerous other errors as described in your post, I would put the REAL MOE more like +/- 8%. Ronald Reagan's landslide win his second time around showed just how bad polls can be; he beat some of them by 10%!

The accuracy of the polls increases with sample size, and the best bet is to trend the poll and look at the trends of numerous polls. When you do that, Bush wins :)

12 posted on 11/01/2004 6:30:47 PM PST by Enlightiator

To: IowaHawk

I followed most of this, but what is the "sqrt" value?

13 posted on 11/01/2004 6:47:09 PM PST by Ilya Mourometz

To: Ilya Mourometz

sqrt = square root.

14 posted on 11/01/2004 6:54:31 PM PST by IowaHawk

To: IowaHawk

duh...Thanks.

15 posted on 11/01/2004 7:04:34 PM PST by Ilya Mourometz

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.
