I still remember my prof in Statistics 101:
Statistics NEVER give you an answer; at best, they give you another question.
This example is why so many ‘scientist study shows’ papers cannot be reproduced. The data is ‘interpreted’ to get the result that the researcher wants. Further, often the question is framed so that the results are predetermined prior to the data being obtained.
Raw data that may invalidate the results need not apply.
I remember Global Warmists argued that hurricanes were getting worse because of the number of people impacted by them. But they ignored the fact that one hurricane hitting a major population center will show more people impacted than several WORSE hurricanes in unpopulated areas.
The dots seem to form a kitty, Doctor.
Plus the internet is full of shitposters who are deliberately flooding Big Data with garbage, sarcasm, trolling, and other things not easily detected by algorithm.
State universities LIVE on this. Only an average of 18 students per class... but 90% of your classes are lecture-center classes with 300 students!
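Here is a rough sketch of how both numbers can be true at once. The enrollment figures are made up for illustration (47 small sections of 12 plus one 300-seat lecture), not taken from any real university. The average over classes is what goes in the brochure; the average over students is what you actually sit in.

```python
# Hypothetical enrollment: 47 small sections of 12 students plus one 300-seat lecture.
class_sizes = [12] * 47 + [300]

# The brochure number: average taken over classes.
avg_per_class = sum(class_sizes) / len(class_sizes)

# What students experience: average taken over seats,
# so each class is counted once for every student sitting in it.
total_seats = sum(class_sizes)
avg_per_student = sum(size * size for size in class_sizes) / total_seats

# Share of all seats that are in the big lecture.
share_in_lecture = 300 / total_seats

print(f"Average class size (per class):   {avg_per_class:.0f}")
print(f"Average class size (per student): {avg_per_student:.0f}")
print(f"Share of seats in the lecture:    {share_in_lecture:.0%}")
```

With these made-up numbers the per-class average comes out to 18 while the typical seat is in a class of 112.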
I did a research paper for my MBA on cluster analysis. I took all privately owned sports teams and clustered them. I used Yes/No criteria to minimize subjectivity and bias. The twenty-one variables included whether the owner bought or inherited the team, whether the team had won a championship, whether the owner's wealth came from manufacturing or service industries, etc.
I got six clusters that indicated that owners who got their wealth from manufacturing and bought the team were most successful.
A similar study for bank profitability showed little clustering based on bank deposits, office space, age or other variables.
Good example:
My daughter is a Registered Nurse. She works in a hospital.
While a teenager she worked at a local restaurant called Eat n’ Park. That is a Pittsburgh area chain. The company that owns it is called The Eat n’ Park Hospitality Group.
My daughter has not lived at home in a decade. Despite this I get fistfuls of junk mail trying to sell computers and things addressed to my daughter, at the “Eat n’ Park Hospital Group”, which is apparently headquartered in my modest house in suburban Pittsburgh.
Data mining and AI obviously put some pieces together in a very incorrect way.
Reminds me of Jim Bouton in Ball Four, when he was in contract negotiations and management was listing all of the “bad” statistics from the prior year. His response was “Tell your statistics to shut up!”
Always loved that one!
It always gets back to the base rate...
If you divide data into arbitrary groups, you can get whatever interpretation you wish.
Why didn’t he continue with the baseball references? You can outscore the other team 50-7 over a five-game series. But you still lose the series if the games go like this (their score first):
- 1-0
- 0-30
- 0-17
- 4-2
- 2-1
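
A quick check of that arithmetic, assuming each score above is read as the other team's runs first and yours second (that reading is my assumption; it is the only one that gives 50-7):

```python
# Scores as listed above, read opponent-first: (their runs, your runs).
games = [(1, 0), (0, 30), (0, 17), (4, 2), (2, 1)]

# Total runs for each side across the series.
your_runs = sum(yours for _, yours in games)
their_runs = sum(theirs for theirs, _ in games)

# Games won by each side.
your_wins = sum(yours > theirs for theirs, yours in games)
their_wins = sum(theirs > yours for theirs, yours in games)

print(f"Runs: you {your_runs}, them {their_runs}")
print(f"Wins: you {your_wins}, them {their_wins}")
```

Run totals say you dominated 50-7; the win column says you lost the series 2-3. Same data, different summary.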