Benford's law for fraud detection
Posted on 11/05/2020 11:17:39 AM PST by Truthsearcher
To: wintertime
Why is the zero digit not listed?
Because the only counting number that starts with a '0' is 0. Once you start counting, every number you count to starts with 1-9 (assuming base ten system).
To: SelfhatingMillennial
To see a graphical presentation of data and say the original meaning of it is unimportant is horrifying. This is how the media bamboozles data-ignorant America with plots, charts, and other half-baked statistics (exhibit A: everything related to COVID). Understanding WHAT is being captured in any graphic is extremely important, both for validating the legitimacy of the graphic (and its conclusions) and for being able to explain it to others.
The original meaning of the data is unimportant for the analysis - it's simply the likelihood of your counting numbers starting with a particular digit once you stop counting. Sure, original meaning is important for the importance of the analysis and determining how the analysis applies to the world, but the analysis/formula itself only requires counting numbers from any source. As to the "control graph", that is based on nothing concrete, it's a theoretical formula that only deals with the mathematics. My post #59 walks someone else through it decently.
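For anyone who wants to see where the "control graph" comes from, here is a minimal sketch. It assumes the control curve is the standard Benford first-digit formula, P(d) = log10(1 + 1/d), in base ten. The function name `benford_first_digit` is made up for illustration.

```python
import math

def benford_first_digit(d: int) -> float:
    """Theoretical probability that the leading digit is d (1-9), base ten."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be 1-9")
    return math.log10(1 + 1 / d)

# Expected share for each possible leading digit; no real-world data involved.
expected = {d: benford_first_digit(d) for d in range(1, 10)}
for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine values sum to 1, and the share for '1' comes out at roughly 30%.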
I now understand what is going on here, and the results are extremely compelling. I would like to know how many total data points there are, which I think would equal the total number of precincts/wards in each of the three cities. Curious if we're talking tens or hundreds (probably not thousands) of individual election results.
I assume you're responding to the set of six graphs posted above? That's Philly, Milwaukee, and Detroit.
Philly seems to have 718 poll locations.
Milwaukee looks like 478 of them.
And Detroit appears to have a total of 503 locations within the city proper, but I don't know if there are more locations outside the city (within the county) that are still counted as "Detroit" for the purposes of those graphs above. So 503 minimum.
To: Svartalfiar; William Tell
and 0 '4'-'9's.
That's wrong, I meant to say 11 of each. Single digits and tens, but nothing from the 100s because we stopped counting in the 300s.
To: freeandfreezing
Wow! I’m not a math girl, but I grasp the concept.
64 posted on 11/05/2020 9:41:00 PM PST by SE Mom (Screaming Eagle mom)
To: Truthsearcher
I accept this about the numbers, but WHY?...why would number “1” have a 30% chance of occurring over the other numbers.
65 posted on 11/05/2020 9:46:11 PM PST by cherry
To: cherry
There is a complicated mathematical proof for this.
But a simpler way of thinking about it is that higher counts are always less likely than lower counts. And the lower the count, the more likely it is to start with 1.
For example, if the count is anywhere from 1 to 99, the chances are equal for the leading digit to be 1 through 9. But if the count is anywhere from 1 to 199, then 1 suddenly becomes by far the most likely. 2 doesn’t match 1 until the range is 1-299, and so on and so forth; 9 doesn’t get back to even odds until the range is 1-999. Then 1 flips the odds in its favor again for counts from 1-1999, and so on and so forth.
So 1 is always the most likely first digit, and each higher digit is less likely in turn.
Unless people are making up numbers: when they do, all digits come out more equally likely, and you get a deviation from Benford’s law.
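The counting argument above can be checked directly. This is a rough sketch that counts leading digits over the whole numbers 1 through some upper limit. The helper name `leading_digit_counts` is made up for illustration, and it only covers the "every count in the range is equally likely" case.

```python
from collections import Counter

def leading_digit_counts(upper: int) -> Counter:
    """Count how often each leading digit occurs among the integers 1..upper."""
    return Counter(int(str(n)[0]) for n in range(1, upper + 1))

# Upper limits taken from the ranges discussed in the post.
for upper in (99, 199, 299, 999, 1999):
    counts = leading_digit_counts(upper)
    share_of_1 = counts[1] / upper
    print(upper, dict(sorted(counts.items())), f"share of 1: {share_of_1:.0%}")
```

At 99 and 999 every digit is tied. In between, the lower digits pull ahead first, and 1 pulls ahead again as soon as the range passes 1000.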
To: Truthsearcher
This video also shows it.
Look at the graphs: ONLY Joe Biden’s vote in those swing-state major Dem counties violates this law. Trump’s and Jorgensen’s numbers always follow this law all over the country. And Biden’s in other counties also follow this law.
Only Joe Biden’s vote tallies in big Dem counties of swing states don’t.
To: freeandfreezing
Would it be possible to post the data set that these charts were drawn from?
68 posted on 11/06/2020 3:50:52 PM PST by q49s
To: Svartalfiar
Yes, the principle is that the leading digits of a large collection of counted numbers from a valid, representative population will follow a logarithmic distribution. The orange control graph IS that ideal logarithmic (Benford) distribution of leading digits; this is what you'd expect Trump's and Biden's respective vote totals from the many precincts to track. Biden's clearly does not, for ANY of the three cities, defying laws of probability. His votes were taken from a non-representative population. Now how would that happen...
Thanks for the precinct estimates. That many precincts makes for a healthy population from which to expect to see a Benford distribution. Had it been 25% as many, these plots would be far less damning.
To: Truthsearcher
70 posted on 11/06/2020 4:00:39 PM PST by mewzilla (Break out the mustard seeds.)
To: q49s
I did not do the original analysis, so I don’t have links to the data readily available. You can usually get election data from the websites of each state’s election offices at the precinct level as a spreadsheet file. From those files you just do a histogram of the first digit of the results.
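The histogram step might look something like the sketch below. It is not the original analysis. The function name `first_digit_histogram` is made up, and the sample is the first 100 powers of two, used only as stand-in data that is known to track Benford closely. Loading a real precinct spreadsheet is left out because the file layout would be a guess.

```python
import math
from collections import Counter

def first_digit_histogram(values):
    """Return {digit: observed share} for the leading digits of positive integers."""
    # Zero has no leading digit in 1-9, so zero counts are skipped.
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    total = len(digits)
    return {d: counts[d] / total for d in range(1, 10)}

# Stand-in data only: powers of two, NOT election results.
sample = [2 ** n for n in range(100)]
observed = first_digit_histogram(sample)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)  # Benford's first-digit formula
    print(f"{d}: observed {observed[d]:.1%}, Benford {expected:.1%}")
```

With real data you would pass in the per-precinct totals for one candidate as a list of integers and compare the two columns.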
To: Svartalfiar
"Except these numbers aren't randomly pulled from a list, they count, and they count up. " Just for clarification, I was talking about the last two digits not the first digit. A random sample taken from the range 0 to 999,999 will have around 90% six digit numbers. A Benford analysis for the least significant digits will be different than for the first digit.
To: William Tell
Just for clarification, I was talking about the last two digits not the first digit. A random sample taken from the range 0 to 999,999 will have around 90% six digit numbers. A Benford analysis for the least significant digits will be different than for the first digit.
A Benford analysis is not used to look at the ending digits. It does extrapolate into digits beyond the first, but by the time you reach the 4th digit you're at a near-even 10% across the board: at that point '0' is hitting 10.0176% of the time, and '9' is at 9.9824%.
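The 4th-digit figures can be reproduced with the generalized Benford formula, which sums the first-digit style term over every possible run of preceding digits. This is a sketch of that standard formula. The function name `benford_digit_probability` is made up for illustration.

```python
import math

def benford_digit_probability(digit: int, position: int) -> float:
    """Benford probability that the digit at `position` (1 = leading) equals `digit`."""
    if position < 1:
        raise ValueError("position starts at 1")
    if position == 1:
        if not 1 <= digit <= 9:
            raise ValueError("leading digit must be 1-9")
        return math.log10(1 + 1 / digit)
    if not 0 <= digit <= 9:
        raise ValueError("digit must be 0-9")
    # k runs over every possible block of digits that can precede this position.
    start = 10 ** (position - 2)
    stop = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + digit)) for k in range(start, stop))

for d in (0, 9):
    print(f"4th digit {d}: {benford_digit_probability(d, 4):.4%}")
```

By the fourth position the spread between '0' and '9' is only a few hundredths of a percentage point, which is why later digits are close to uniform.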
Disclaimer:
Opinions posted on Free Republic are those of the individual
posters and do not necessarily represent the opinion of Free Republic or its
management. All materials posted herein are protected by copyright law and the
exemption for fair use of copyrighted works.