Posted on 11/28/2017 1:27:41 PM PST by nickcarraway
As debate rumbles on about how and how much poor statistics is to blame for poor reproducibility, Nature asked influential statisticians to recommend one change to improve science. The common theme? The problem is not our maths, but ourselves.
To use statistics well, researchers must study how scientists analyse and interpret data and then apply that information to prevent cognitive mistakes.
In the past couple of decades, many fields have shifted from data sets with a dozen measurements to data sets with millions. Methods that were developed for a world with sparse and hard-to-collect information have been jury-rigged to handle bigger, more-diverse and more-complex data sets. No wonder the literature is now full of papers that use outdated statistics, misapply statistical tests and misinterpret results. The application of P values to determine whether an analysis is interesting is just one of the most visible of many shortcomings.
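(Not part of the Nature excerpt: a minimal Python sketch, assuming numpy and scipy, of why screening analyses by P values alone misleads. Run many tests on pure noise and a steady trickle of them come out "significant".)

import numpy as np
from scipy import stats

# Hypothetical illustration: t-tests on pure noise still clear P < 0.05 about 5% of the time.
rng = np.random.default_rng(0)
n_tests, n_per_group = 1000, 30

false_positives = 0
for _ in range(n_tests):
    a = rng.normal(size=n_per_group)   # group A: noise only
    b = rng.normal(size=n_per_group)   # group B: noise only, same distribution
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} null comparisons were 'significant' at P < 0.05")
# Expect roughly 50, even though no real effect exists in any of them.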
It's not enough to blame a surfeit of data and a lack of training in analysis [1]. It's also impractical to say that statistical metrics such as P values should not be used to make decisions. Sometimes a decision (editorial or funding, say) must be made, and clear guidelines are useful.
The root problem is that we know very little about how people analyse and process information. An illustrative exception is graphs. Experiments show that people struggle to compare angles in pie charts yet breeze through comparative lengths and heights in bar charts [2]. The move from pies to bars has brought better understanding.
We need to appreciate that data analysis is not purely computational and algorithmic: it is a human behaviour. In this case, the behaviour is made worse by training that was developed for a data-poor era. This framing will enable us to address practical problems. For instance, how
(Excerpt) Read more at nature.com ...
In The Signal and the Noise, Nate Silver admits the truth: BIAS.
Statistics are based on a population, a sample, on collected data. There is bias in which data to collect and which data to ignore. There is bias in the weight given to each piece of data collected. There is bias in refusing to admit/recognize the bias. There is bias in refusing to admit what you do not know. There is bias in refusing to admit that you don’t know what you don’t know.
Then there is bias in believing the data. A famous artificial intelligence company did a study of immunizations. It believed in advance that immunizations were useful. When accurate math did not support that preconceived belief, they adjusted the denominator to make it fit their bias. They did not do this to intentionally lie; they did it because they knew the correct answer could not possibly be correct, since everybody knew immunizations were good.
They then recommended more immunizations based on their failure of 5th grade math.
Their original math was correct, but their understanding of the raw data was seriously flawed. Sick people go to the doctor more often than healthy people. When people go to the doctor, the doctor always pushes a flu shot or whatever immunization is available. So invariably sick people get more shots than healthy people. Naturally, sick people also get sick more often than healthy people from the very thing they got a shot for, despite the shot.
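(A toy simulation of that confounding, with made-up numbers and assuming numpy; it is not the company's actual data or analysis. Frail people both get more shots and get sick more often, so the naive comparison makes the shot look harmful even though it halves the risk here.)

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
frail = rng.random(n) < 0.3                      # 30% of people are frail/sickly

# Frail people see the doctor more, so they are far more likely to get the shot.
vaccinated = rng.random(n) < np.where(frail, 0.8, 0.3)

# Baseline illness risk is much higher for frail people; the shot halves whatever risk you have.
base_risk = np.where(frail, 0.40, 0.05)
sick = rng.random(n) < np.where(vaccinated, base_risk * 0.5, base_risk)

print("illness rate among vaccinated:  ", round(sick[vaccinated].mean(), 3))
print("illness rate among unvaccinated:", round(sick[~vaccinated].mean(), 3))
# The vaccinated group still looks sicker overall, because it contains more frail
# people: the confounding described above, not a failure of the shot.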
But highly paid AI gurus with PhDs don’t know what your uncle knows.
They drink objectivity from a chalice sent by Congress.
Beat me to it. That’s my favorite stats book of all time. Unfortunately, academia and the media treat it as a guidebook rather than a warning.
I’m guessing that the misinformation created by either deliberate or ignorant misuse of statistics is greater in volume than the accurate information produced.
Academia has had decades and decades of experience manipulating data in order to get government grants.
Their expertise in this field is nonpareil.....................
I think the most common thing I am seeing is applying statistical analysis to a dataset and then applying statistical tests to the processed data rather than to the original dataset. Of course it is going to have a positive result for whatever you are trying to prove or disprove.
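(A toy example of that mistake, assuming Python with numpy and scipy; this is a sketch, not the poster's actual case. Smooth two completely unrelated noise series and then test the smoothed data: spurious "significant" correlations show up far more often than the nominal 5%.)

import numpy as np
from scipy import stats

def moving_average(x, window=10):
    # The "processing" step: a simple smoothing filter applied before testing.
    return np.convolve(x, np.ones(window) / window, mode="valid")

rng = np.random.default_rng(2)
raw_hits = smoothed_hits = 0
for _ in range(500):
    x = rng.normal(size=200)            # two independent noise series
    y = rng.normal(size=200)
    if stats.pearsonr(x, y)[1] < 0.05:
        raw_hits += 1                   # testing the original data: about 5% false positives
    if stats.pearsonr(moving_average(x), moving_average(y))[1] < 0.05:
        smoothed_hits += 1              # testing the processed data: far more false positives

print(f"raw data:      {raw_hits}/500 'significant'")
print(f"smoothed data: {smoothed_hits}/500 'significant'")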