Posted on 09/03/2004 2:37:14 PM PDT by dvwjr
Here are the latest political affiliation breakdowns for many of the previous American Research Group, CBS News, Fox News/Opinion Dynamics, Gallup, Los Angeles Times, and Newsweek/PSRAI polls. If the poll result percentages in the table shown below are rounded off so as not to have a fractional component, the results should match the published polls for each polling organization. The fractional differences in the table poll results come from the calculations done to distribute the sample number of Registered voter responses, which was necessary to avoid any great inaccuracies due to the rounding already performed by the polling organizations.
Now for a description of my "R/D/I" breakdown methodology. The first thing necessary of course is the data published by the polling organization with their nation-wide poll results based on at least ~750 Registered poll respondents. They will give the current three-way candidate results, to include other minor candidates and/or 'non-voting' or 'do not know' positions. The next set of necessary information is what the "R/D/I" breakdowns for a particular poll are according to the polling organization itself. These are not usually provided. However, the previous presidential preference poll conducted by the American Research Group, dated August 2nd, 2004 (the current one is for September), does have the "R/D/I" partisan affiliation breakdowns published, so no calculations are required. This appears to be an exception, as it is the first American Research Group presidential preference poll which had the "R/D/I" breakdowns published along with the nation-wide results and party candidate preference breakdowns per presidential candidate.
August 2 | Bush | Kerry | Nader | Undecided |
All voters | 45% | 49% | 2% | 4% |
Republicans (35%) | 87% | 7% | 1% | 5% |
Democrats (37%) | 9% | 85% | 2% | 4% |
Independents (28%) | 40% | 53% | 3% | 4% |
Jul 2004 | 44% | 47% | 3% | 6% |
Jun 2004 | 45% | 46% | 3% | 6% |
May 2004 | 44% | 45% | 4% | 7% |
Apr 2004 | 43% | 48% | 2% | 7% |
Mar 2004 | 42% | 48% | 2% | 8% |
Ok, if the polling organization does not provide published "R/D/I" breakdowns, then they may be calculated from the candidate preferences by party with respect to the overall national preference. Using the August 2nd, 2004 ARG presidential poll as an example allows the breakdown calculations to be checked against the actual ARG published "R/D/I" results. The first thing to do is to put the data into a spreadsheet which uses matrix inversion to solve the "three equations/three unknowns" system to get a 'ball-park' estimate.
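For readers without the spreadsheet, here is a minimal Python sketch of that 'ball-park' step. It assumes the three equations are the Bush, Kerry and Nader rows of the August 2nd table; that choice of rows, and every name in the code, is an assumption for illustration and is not taken from the spreadsheet. It uses Cramer's rule with exact fractions rather than a spreadsheet matrix inversion.

```python
from fractions import Fraction

# Candidate preference by party, in percent, from the August 2nd ARG table.
# Columns: Republicans, Democrats, Independents.
# ASSUMPTION: these three rows are the "three equations".
ROWS = {
    "Bush": (87, 9, 40),
    "Kerry": (7, 85, 53),
    "Nader": (1, 2, 3),
}
# Overall (all voters) result for each candidate, in percent.
OVERALL = {"Bush": 45, "Kerry": 49, "Nader": 2}


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_shares(rows, overall):
    """Solve rows * shares = overall for the R, D, I shares (Cramer's rule)."""
    names = list(rows)
    a = [rows[n] for n in names]
    b = [overall[n] for n in names]
    d = det3(a)
    shares = []
    for col in range(3):
        # Replace one column of the coefficient matrix with the right-hand side.
        replaced = [
            tuple(b[r] if c == col else a[r][c] for c in range(3))
            for r in range(3)
        ]
        shares.append(Fraction(det3(replaced), d))
    return shares


r, d, i = solve_shares(ROWS, OVERALL)
for label, share in zip("RDI", (r, d, i)):
    print(f"{label}: {float(100 * share):.2f}%")
print(f"sum: {float(100 * (r + d + i)):.2f}%")
```

Nothing in these three equations forces the shares to add up to 100%, so the total is free to drift when the inputs are rounded.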
Well, this first estimate of 33.20% (R), 34.65% (D), 32.50% (I) does not match up to the ARG published 35% (R), 37% (D), 28% (I) results, and those three derived numbers add up to 100.3% when the total should be just 100%. Again, the linear equation solution is just a first cut at the "R/D/I" analysis. So now the next step in the numerical analysis is to define the inputs for further analysis. The poll numbers published by the various polling organizations are typically rounded to the 'ones' place for easy reading and comparison in news articles in both magazines and newspapers. Given the usual MoE of ±3.0% that goes along with the published poll numbers, this degree of numerical accuracy is fine. However, the lack of decimal point accuracy makes it harder to 'back into' the "R/D/I" breakdowns that are sought.
The ARG poll for August 2nd, 2004 consisted of responses from 776 Registered voters. This information combined with the candidate preferences detailed by political affiliation and the overall poll results shown in the above table are what are needed to solve the problem. The other factor is that of the 'rounding' which is done when the poll numbers are published. Given that there is no 'decimal place' accuracy, it implies that the published whole poll numbers might vary by ±0.49 due to common mathematical rounding rules. That is to say, if Bush has a 43% preference number in a published political poll, it could actually be a number from 42.51% to 43.49% that when rounded would be reported as just 43%.
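To make the rounding range concrete, the short Python sketch below lists the whole respondent counts that would all be published as the same whole percentage. The function names are hypothetical, and ordinary round-half-up is assumed for how the pollster rounds.

```python
def rounded_pct(count, total):
    """Percentage of count out of total, rounded half-up to a whole number."""
    # Integer form of floor(100 * count / total + 0.5), so no float surprises.
    return (200 * count + total) // (2 * total)


def counts_matching(pct, total):
    """All whole respondent counts that would be published as pct percent."""
    return [n for n in range(total + 1) if rounded_pct(n, total) == pct]


SAMPLE = 776  # registered voters in the August 2nd ARG poll
bush = counts_matching(45, SAMPLE)
print(f"45% of {SAMPLE} could be anywhere from {bush[0]} to {bush[-1]} respondents")
```

So a published 45% on a 776-person sample hides a spread of eight possible counts, and the same is true of every other number in the table.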
This explains why trying to arrive at the solution via linear equations solved via matrix inversion does not give the correct answers. This solution method would work if the input poll data was exactly as is displayed in the table directly below, being actual numbers which were not rounded in any way. Not very likely...
However, given the use of 'rounding' explained above, the data that is actually available looks like that in the table displayed directly below. This explains the failure of the obvious direct use of linear equations and matrix math to arrive at the "R/D/I" breakdowns. Solution via linear equations/matrix inversion depends on more exact numbers, not numerical ranges. If the above listed numbers were accurate to a single decimal place, then this method would suffice for extracting the desired "R/D/I" breakdowns.
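One way to see the effect of rounding is to run the calculation forwards. The Python sketch below takes the ARG published 35/37/28 breakdown and the by-party preferences, and computes the overall results they imply before rounding. The names are hypothetical, and treating the published by-party percentages as exact is itself an assumption, since they are rounded too.

```python
# Published ARG party shares, in percent.
SHARES = {"R": 35, "D": 37, "I": 28}
# Candidate preference within each party, in percent (August 2nd ARG table).
BY_PARTY = {
    "R": {"Bush": 87, "Kerry": 7, "Nader": 1, "Undecided": 5},
    "D": {"Bush": 9, "Kerry": 85, "Nader": 2, "Undecided": 4},
    "I": {"Bush": 40, "Kerry": 53, "Nader": 3, "Undecided": 4},
}


def implied_overall(shares, by_party):
    """Overall percent per candidate implied by the party shares."""
    candidates = next(iter(by_party.values()))
    return {
        cand: sum(shares[p] * by_party[p][cand] for p in shares) / 100
        for cand in candidates
    }


overall = implied_overall(SHARES, BY_PARTY)
for cand, pct in overall.items():
    print(f"{cand}: {pct:.2f}% -> published as {round(pct)}%")
```

The implied values (44.98, 48.74, 1.93 and 4.35) round to the published 45/49/2/4, but they are not equal to them, and that small gap is enough to throw off an exact three-equation solve.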
So the next way to attempt the solution is with the solver/optimizer engine which is included with Microsoft's Excel spreadsheet (from Excel 97 onwards) and the published polling data combined with the published number of registered voter poll respondents. The solver/optimizer engine combined with the necessary constraints should allow the proper solution set to be determined. This will be a non-linear problem due to the fact that actual whole numbers will be manipulated, representing poll responses. Since a poll respondent cannot fractionally assign their vote, the system of equations becomes a non-linear problem.
So below is the spreadsheet setup for the use of the solver engine in Excel 97. Subject to the constraints, we hold the yellow cell with 776 constant while allowing the solver engine to vary the whole numbers in the gray colored cells. This is the solution mechanism I used in my previous method of solving for the "R/D/I" breakdowns. However, this methodology even with the constraints allowed for "R/D/I" solutions which had the "I"ndependent component as high as 40% of the respondent sample, clearly too high as poster Torie pointed out. This is because the first solution of a set of solutions which mathematically meets the requirements of the constraints is given as the solution; there might be other equally mathematically valid solutions which are better representations of the "R/D/I" breakdowns actually used by the polling organization.
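The rounding constraints can also be sketched outside of Excel. The Python below is not the Solver setup, only a hedged illustration with hypothetical names: for a proposed split of the 776 respondents into whole-number R/D/I groups, it checks whether the published by-party percentages could add up to the published overall percentages once rounding ranges are allowed for. It is a necessary condition only (it does not check that the counts inside each party add up to the party size), and round-half-up is assumed.

```python
BY_PARTY = {
    "R": {"Bush": 87, "Kerry": 7, "Nader": 1, "Undecided": 5},
    "D": {"Bush": 9, "Kerry": 85, "Nader": 2, "Undecided": 4},
    "I": {"Bush": 40, "Kerry": 53, "Nader": 3, "Undecided": 4},
}
OVERALL = {"Bush": 45, "Kerry": 49, "Nader": 2, "Undecided": 4}


def rounded_pct(count, total):
    """Percentage of count out of total, rounded half-up to a whole number."""
    return (200 * count + total) // (2 * total)


def count_range(pct, total):
    """Smallest and largest whole count published as pct percent, or None."""
    if total <= 0:
        return None
    counts = [n for n in range(total + 1) if rounded_pct(n, total) == pct]
    return (counts[0], counts[-1]) if counts else None


def passes_rounding_check(sizes, by_party, overall):
    """True if the party sizes are compatible with every published number."""
    sample = sum(sizes.values())
    for cand, overall_pct in overall.items():
        target = count_range(overall_pct, sample)
        low = high = 0
        for party, size in sizes.items():
            rng = count_range(by_party[party][cand], size)
            if rng is None:
                return False
            low += rng[0]
            high += rng[1]
        # The summed party ranges must overlap the overall range.
        if target is None or high < target[0] or low > target[1]:
            return False
    return True


# 272/288/216 is an assumed split of 776 that rounds to 35/37/28.
print(passes_rounding_check({"R": 272, "D": 288, "I": 216}, BY_PARTY, OVERALL))
print(passes_rounding_check({"R": 100, "D": 576, "I": 100}, BY_PARTY, OVERALL))
```

Because the check works with ranges, more than one split can pass it, which is the same reason the first solution a solver returns is not necessarily the one the polling organization used.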
The simple solution was to use an additional constraint which required the percentage of Democrats in the mix to be maximized. Since in the real world the Democrats have a 2% to 4% advantage over the Republicans among "Registered" voters according to most polling organizations, this simple additional constraint ensures that no anomalies such as "I"ndependents with 40% solutions are permitted, unless that is actually the number which was used by the polling organization. Also, with the percentage of Democrats in the sample mix maximized, there can be no more of a "worst case" solution possible from a Republican point of view. Having such a constraint does not mean that the percentage of Democrats will always be greater than the percentage of Republicans; it just means that there cannot be any higher number of Democrats. I think that it is a realistic constraint given the reality of the number of "Registered" Democrats versus Republicans nationwide.
Notice that the "R/D/I" solver solution for the ARG August 2nd, 2004 poll shows political affiliation numbers of 35.05% (R), 37.11% (D), 27.84% (I) which when rounded to no decimal places matches up exactly with political affiliation breakdowns of 35% (R), 37% (D), 28% (I) published by ARG in its August 2nd, 2004 poll.
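Those percentages can be cross-checked with a few lines of Python. The whole-number counts used here (272, 288 and 216 out of 776) are inferred from the quoted percentages and the sample size; they are an assumption for illustration, not values read off the spreadsheet.

```python
SAMPLE = 776  # registered voters in the August 2nd ARG poll
# ASSUMPTION: counts inferred from 35.05% / 37.11% / 27.84% of 776.
COUNTS = {"R": 272, "D": 288, "I": 216}

# The three groups must account for every respondent.
assert sum(COUNTS.values()) == SAMPLE

shares = {party: 100 * n / SAMPLE for party, n in COUNTS.items()}
for party, pct in shares.items():
    print(f"{party}: {COUNTS[party]} of {SAMPLE} = {pct:.2f}% -> rounds to {round(pct)}%")
```

Working in whole respondents is what produces the two-decimal figures, and rounding them gives back 35% (R), 37% (D), 28% (I).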
The constraint listed in the window above, "$C$44 >= $E$44" is the one used to maximize the percentage of Democrats in the sample mix. Cell $E$44 is blank in the above listed table, but during 'solver runs' contains a number which is used to force the maximizing of the percentage of Democrats in the solution set.
Any comments on any additional errors in this methodology are welcome.
I have created a temporary 'ping' list to many who had expressed interest in these R/D/I poll breakdowns. If you wish to be added (or removed) from this 'ping' list, please place a reply on this thread and I will attempt to clean-up this list so as to not bother those who do not wish to read this type of poll analysis.
Many thanks to FR poster RWR8189 for his generous contribution of the 'Gallup' Organization internal poll information from his paid subscription, without which the bench-mark Gallup poll could not have been analyzed. Thanks again...
dvwjr
FYI.
dvwjr
Great job thank you...
Thanks for the ping - I'm very interested in this, and I'm glad I'm on your list.
Now I have to save it and study it. I'm in awe of your research!
If only someone could figure out how to reach those "Margin of Error" voters. I hate that these people, these Americans, are continually marginalized. And what errors did they make? No one will say.
Well if you were a margin of error voter, wouldn't you want to be marginalized???
This looks awesome, except I can only decipher about 25% of the information because the colors are too dark. Why don't you just do a normal white/black table? We can understand how to follow column labels. We're not Democrats! Thanks so much for doing this.
Could you please add me to your ping list? TIA
FWIW, latest from RealClearPolitics -
RealClearPolitics Poll Averages:
3-Way: Bush 48.3, Kerry 43.7, Nader 3.0
Head-to-Head: Bush 47.7, Kerry 45.3
Bush JA: 49.9 Approve/46.9 Disapprove
The "Electoral Vote Predictor" 2004 is not current --- it's all pre-swiftboat/convention.
Yes, but there is no demographic information available right now with two of those 'popular' vote polls, just results. There is no way to check the political affiliation breakdowns for them, and thus they do not fit into the purpose of this post.
Thanks anyway...
dvwjr
Thanks for this. I think polls are excellent indicators of TRENDS, if not precise numbers. The "polls don't matter" phrase is another way of saying "We're losing but won't admit it."
I expect to hear this from Kerry supporters like Bob Beckel soon.
You know, I got bored one day and solved the equation after our discussion last month. Instead of using Excel, you can actually solve for the R/D/I breakdown algebraically. 3 variables, 3 unknowns, 3 equations. If you are interested, I can send you the formula or the spreadsheet.
The problem is that because these polling firms don't post poll numbers to 3-4 digits, you get errors introduced, and that causes the R/D/I to fluctuate greatly.
You hit it on the head, that is why my 'three-equations' and 'three-unknown' approach does not work when you do not have at least single decimal point accuracy. When a number or poll percentage of "X%" is reported, it can be ± 0.50% which renders the algebraic method almost useless until a more accurate set of inputs can be deduced.
My example uses Excel with the solution set derived with matrix algebra for the 'rough cut' analysis you propose. If you would like my spreadsheet I can make it available.
dvwjr
Do you have a blog set up? You should think about it. You do good analysis. If you want to email me your spreadsheet, that would be great. I am at cableguy@gmail.com
Excellent analysis!!! Thanks for sharing.
Thank you. I am glad I asked, and that you took the time to answer so completely. I recognized the nature of the problem, and my quick once over allows that your methods look fine. I'll study it at length soon.
George W. Bush will be reelected by a margin of at least ten per cent
Thanks for the ping.