Free Republic
Browse · Search
General/Chat
Topics · Post Article

Benford's law for fraud detection

Posted on 11/05/2020 11:17:39 AM PST by Truthsearcher

To: wintertime
Why is the zero digit not listed?

Because the only counting number that starts with a '0' is 0. Once you start counting, every number you count to starts with 1-9 (assuming base ten system).
61 posted on 11/05/2020 9:12:56 PM PST by Svartalfiar
[ Post Reply | Private Reply | To 45 | View Replies]

To: SelfhatingMillennial
To see a graphical presentation of data and say the original meaning of it is unimportant is horrifying. This is how the media bamboozles data-ignorant America with plots, charts, and other half-baked statistics (exhibit A: everything related to COVID). Understanding WHAT is being captured in any graphic is extremely important, both for validating the legitimacy of the graphic (and its conclusions) and for being able to explain it to others.

The original meaning of the data is unimportant for the analysis - it's simply the likelihood of your counting numbers starting with a particular digit once you stop counting. Sure, the original meaning matters for judging how significant the analysis is and how it applies to the world, but the analysis/formula itself only requires counting numbers from any source. As to the "control graph", it isn't based on any concrete data; it's a theoretical formula that only deals with the mathematics. My post #59 walks someone else through it decently.


I now understand what is going on here, and the results are extremely compelling. I would like to know how many total data points there are, which I think would equal the total number of precincts/wards in each of the three cities. Curious if we’re talking tens or hundreds (probably not thousands) of individual election results

I assume you're responding to the set of six graphs posted above? That's Philly, Milwaukee, and Detroit.
Philly seems to have 718 poll locations.
Milwaukee looks like 478 of them.
And Detroit appears to have a total of 503 locations within the city proper, but I don't know if there are more locations outside the city (within the county) that are still counted as "Detroit" for the purposes of those graphs above. So 503 minimum.
62 posted on 11/05/2020 9:37:08 PM PST by Svartalfiar
[ Post Reply | Private Reply | To 58 | View Replies]

To: Svartalfiar; William Tell
and 0 '4'-'9's.

That's wrong, I meant to say 11 of each. Single digits and tens, but nothing from the 100s because we stopped counting in the 300s.
63 posted on 11/05/2020 9:40:13 PM PST by Svartalfiar
[ Post Reply | Private Reply | To 59 | View Replies]

To: freeandfreezing

Wow! I’m not a math girl, but I grasp the concept.


64 posted on 11/05/2020 9:41:00 PM PST by SE Mom (Screaming Eagle mom)
[ Post Reply | Private Reply | To 18 | View Replies]

To: Truthsearcher

I accept this about the numbers, but WHY? Why would the number “1” have a 30% chance of occurring, compared to the other numbers?


65 posted on 11/05/2020 9:46:11 PM PST by cherry
[ Post Reply | Private Reply | To 5 | View Replies]

To: cherry

There is a complicated mathematical proof for this.
But a simpler way of thinking about it is that higher counts are always less likely than lower counts. And the lower the count, the more likely it is to start with 1.

For example, if the count is anywhere between 1 and 99, then the chances are equal for the leading digit to be 1 thru 9. But if the count is anywhere between 1 and 199, then 1 suddenly becomes by far the most likely. 2 doesn’t match 1 until the count is anywhere between 1 and 299, and so on and so forth; 9 doesn’t get to even its odds until the count is from 1-999. And 1 gets to flip the odds in its favor again on all counts from 1-1999, and so on and so forth.

So 1 is always the most likely first digit, with each higher digit decreasingly likely.

Unless people are making up numbers, because when they do, all digits are more equally likely and you get a deviation from Benford’s law.
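
The range argument above can be checked by brute force. Here is a minimal Python sketch (the function name is made up for illustration) that counts the leading digits of every whole number from 1 up to a chosen upper limit:

```python
from collections import Counter

def leading_digit_counts(upper):
    """Count the leading digits of the integers 1..upper (inclusive).

    Returns a list of nine counts, for leading digits 1 through 9.
    """
    counts = Counter(int(str(n)[0]) for n in range(1, upper + 1))
    return [counts[d] for d in range(1, 10)]

# Upper limits chosen to match the ranges discussed in the post.
for upper in (99, 199, 299, 999, 1999):
    print(upper, leading_digit_counts(upper))
```

At 99 and 999 every digit ties; at any upper limit in between, the lower digits are ahead. That only illustrates the counting intuition and is not a proof of Benford's law.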


66 posted on 11/06/2020 7:42:18 AM PST by Truthsearcher
[ Post Reply | Private Reply | To 65 | View Replies]

To: Truthsearcher

This video also shows it.

Look at the graphs: ONLY Joe Biden’s vote tallies in the major Dem counties of those swing states violate this law. Trump’s and Jorgensen’s numbers always follow this law all over the country, and Biden’s in other counties also follow it.

Only Joe Biden’s vote tallies in big Dem counties of swing states don’t.


67 posted on 11/06/2020 7:48:17 AM PST by Truthsearcher
[ Post Reply | Private Reply | To 1 | View Replies]

To: freeandfreezing

Would it be possible to post the data set that these charts were drawn from?


68 posted on 11/06/2020 3:50:52 PM PST by q49s
[ Post Reply | Private Reply | To 17 | View Replies]

To: Svartalfiar

Yes, the principle is that in a large collection of counted numbers from a valid, representative population of numbers, the leading digits will follow Benford's logarithmic distribution. The orange control graph IS the “perfect” Benford distribution of leading digits for vote totals taken from a representative population; this is what you’d expect Trump and Biden’s respective vote totals from the many precincts to track. Biden’s clearly does not, for ANY of the three cities, defying laws of probability. His votes were taken from a non-representative population. Now how would that happen...

Thanks for the precinct estimates. That many precincts makes for a healthy population from which to expect to see a Benford distribution. Had it been 25% as many, these plots would be far less damning.


69 posted on 11/06/2020 3:58:39 PM PST by SelfhatingMillennial
[ Post Reply | Private Reply | To 62 | View Replies]

To: Truthsearcher

BTTT.


70 posted on 11/06/2020 4:00:39 PM PST by mewzilla (Break out the mustard seeds.)
[ Post Reply | Private Reply | To 1 | View Replies]

To: q49s

I did not do the original analysis, so I don’t have links to the data readily available. You can usually get election data from the websites of each state’s election offices at the precinct level as a spreadsheet file. From those files you just do a histogram of the first digit of the results.
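
For anyone who wants to try the histogram step, here is a rough Python sketch under stated assumptions. The file name and column name in the usage comment are hypothetical (every state's spreadsheet is laid out differently), and the helper names are made up for illustration. The expected counts use the standard Benford formula, total * log10(1 + 1/d).

```python
import csv
import math
from collections import Counter

def load_column(path, column):
    """Read one column of whole numbers from a CSV file.

    Blank or non-numeric cells are skipped; thousands separators are removed.
    """
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cell = (row.get(column) or "").replace(",", "").strip()
            if cell.isdigit():
                values.append(int(cell))
    return values

def first_digit_histogram(values):
    """Return {digit: count} for the leading digit of each positive value."""
    counts = Counter(int(str(v)[0]) for v in values if v > 0)
    return {d: counts[d] for d in range(1, 10)}

def benford_expected(total):
    """Expected count for each leading digit under Benford's law."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}

# Usage (hypothetical file and column names, adjust to the real spreadsheet):
#   votes = load_column("precinct_results.csv", "candidate_votes")
#   observed = first_digit_histogram(votes)
#   expected = benford_expected(sum(observed.values()))
#   for d in range(1, 10):
#       print(d, observed[d], round(expected[d], 1))
```

Zero totals are dropped because they have no leading digit from 1 to 9. This only produces the two columns to compare; it does not say whether any difference is statistically meaningful.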


71 posted on 11/07/2020 6:54:10 AM PST by freeandfreezing
[ Post Reply | Private Reply | To 68 | View Replies]

To: Svartalfiar
"Except these numbers aren't randomly pulled from a list, they count, and they count up. "

Just for clarification, I was talking about the last two digits not the first digit. A random sample taken from the range 0 to 999,999 will have around 90% six digit numbers. A Benford analysis for the least significant digits will be different than for the first digit.
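
The 90% figure is plain counting, and a small Python sketch makes it easy to check (the function name is made up for illustration, and it assumes a digit count of at least 2 so that 0 is not an issue):

```python
def share_with_digits(num_digits, upper):
    """Fraction of the integers 0..upper (inclusive) with exactly num_digits digits.

    Assumes num_digits >= 2.
    """
    low = 10 ** (num_digits - 1)       # smallest number with that many digits
    high = 10 ** num_digits - 1        # largest number with that many digits
    hits = max(0, min(high, upper) - low + 1)
    return hits / (upper + 1)

print(share_with_digits(6, 999_999))   # 900,000 out of 1,000,000
```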

72 posted on 11/07/2020 9:21:20 AM PST by William Tell
[ Post Reply | Private Reply | To 59 | View Replies]

To: William Tell
Just for clarification, I was talking about the last two digits not the first digit. A random sample taken from the range 0 to 999,999 will have around 90% six digit numbers. A Benford analysis for the least significant digits will be different than for the first digit.

A Benford analysis is not used to look at the ending digits. It does extrapolate into digits beyond the first, but by the time you reach the 4th digit you're at a near-even 10% across the board: at that point '0' is hitting 10.0176% of the time, and '9' is at 9.9824%.
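
For reference, the later-digit figures come from the generalized form of Benford's law, where you sum log10(1 + 1/(10k + d)) over every possible run of preceding digits k. A minimal Python sketch of that calculation (the function name is made up for illustration):

```python
import math

def benford_nth_digit(position):
    """Benford probabilities for the digit at `position` (1 = leading digit).

    For position 1 the digits are 1-9; for later positions they are 0-9.
    """
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    # k ranges over every possible block of digits preceding this position,
    # e.g. 100..999 for the 4th digit.
    start = 10 ** (position - 2)
    stop = 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(start, stop))
        for d in range(10)
    }

fourth = benford_nth_digit(4)
for d in (0, 9):
    print(d, f"{fourth[d]:.4%}")
```

The printed values for '0' and '9' should land very close to the percentages quoted above, which is why the 4th digit is nearly flat.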
73 posted on 11/08/2020 11:25:46 AM PST by Svartalfiar
[ Post Reply | Private Reply | To 72 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson