Posted on 03/20/2015 11:36:31 AM PDT by E. Pluribus Unum
Dr Megan Head in her evolutionary biology lab at the Research School of Biology.
A new study has found some scientists are unknowingly tweaking experiments and analysis methods to increase their chances of getting results that are easily published.
The study conducted by ANU scientists is the most comprehensive investigation into a type of publication bias called p-hacking.
P-hacking happens when researchers either consciously or unconsciously analyse their data multiple times or in multiple ways until they get a desired result. If p-hacking is common, the exaggerated results could lead to misleading conclusions, even when evidence comes from multiple studies.
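As a rough illustration of the mechanism (a simulation sketched for this article, not taken from the study itself), measuring several outcome variables and reporting only whichever one comes out significant inflates the false-positive rate well past the nominal 5 per cent, even when no real effect exists:

import numpy as np
from scipy import stats

# Simulate experiments where the null is true (no real effect), but several
# outcomes are tested and only the smallest p-value is kept.
rng = np.random.default_rng(0)
n_experiments, n_per_group, n_analyses = 5000, 30, 5

planned_hits = 0   # significant on the single pre-planned outcome
hacked_hits = 0    # significant on at least one of the outcomes

for _ in range(n_experiments):
    control = rng.normal(size=(n_analyses, n_per_group))
    treated = rng.normal(size=(n_analyses, n_per_group))
    pvals = [stats.ttest_ind(control[i], treated[i]).pvalue
             for i in range(n_analyses)]
    planned_hits += pvals[0] < 0.05
    hacked_hits += min(pvals) < 0.05

print("pre-planned analysis :", planned_hits / n_experiments)   # about 0.05
print("best of five analyses:", hacked_hits / n_experiments)    # roughly 0.2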
"We found evidence that p-hacking is happening throughout the life sciences," said lead author Dr Megan Head from the ANU Research School of Biology.
The study used text mining to extract p-values - numbers indicating how likely it is that a result at least as extreme would occur by chance alone - from more than 100,000 research papers published around the world, spanning many scientific disciplines, including medicine, biology and psychology.
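The extraction step itself is conceptually simple; a minimal sketch (not the authors' actual code) might pull reported p-values out of paper text with a regular expression:

import re

# Match expressions such as "P = 0.048", "p < 0.05" or "p > 0.10"
P_VALUE_RE = re.compile(r"p\s*([<=>])\s*(0?\.\d+)", re.IGNORECASE)

def extract_p_values(text):
    """Return (comparator, value) pairs, e.g. ('=', 0.048)."""
    return [(op, float(val)) for op, val in P_VALUE_RE.findall(text)]

sample = ("The treatment effect was significant (t = 2.1, P = 0.048), "
          "unlike the interaction term (p > 0.10).")
print(extract_p_values(sample))   # [('=', 0.048), ('>', 0.1)]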
"Many researchers are not aware that certain methods could make some results seem more important than they are. They are just genuinely excited about finding something new and interesting," Dr Head said.
"I think that pressure to publish is one factor driving this bias. As scientists we are judged by how many publications we have and the quality of the scientific journals they go in.
"Journals, especially the top journals, are more likely to publish experiments with new, interesting results, creating incentive to produce results on demand."
Dr Head said the study found a high number of p-values that only just cleared the traditional threshold most scientists use to call a result statistically significant.
"This suggests that some scientists adjust their experimental design, datasets or statistical methods until they get a result that crosses the significance threshold," she said.
"They might look at their results before an experiment is finished, or explore their data with lots of different statistical methods, without realising that this can lead to bias."
The concern with p-hacking is that it could get in the way of forming accurate scientific conclusions, even when scientists review the evidence by combining results from multiple studies.
For example, if some studies show a particular drug is effective in treating hypertension, but other studies find it is not effective, scientists would analyse all the data to reach an overall conclusion. But if enough results have been p-hacked, the drug would look more effective than it is.
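One common way that pooling is done is an inverse-variance meta-analysis; in a minimal sketch (hypothetical numbers, not from any real trial), each study contributes an effect estimate weighted by its precision, so a handful of p-hacked, inflated estimates drag the pooled answer with them:

import numpy as np

# (effect estimate, standard error) for hypothetical hypertension trials,
# e.g. change in systolic blood pressure in mmHg
studies = [(-4.0, 2.0), (-1.0, 1.5), (-5.5, 2.5), (0.5, 1.8)]

effects = np.array([e for e, _ in studies])
weights = 1.0 / np.array([se for _, se in studies]) ** 2   # inverse-variance weights

pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled effect: {pooled:.2f} mmHg (SE {pooled_se:.2f})")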
"We looked at the likelihood of this bias occurring in our own specialty, evolutionary biology, and although p-hacking was happening it wasn't common enough to drastically alter general conclusions that could be made from the research," she said.
"But greater awareness of p-hacking and its dangers is important because the implications of p-hacking may be different depending on the question you are asking."
The research is published in PLOS Biology.
Michael Mann, you are being paged......
This applies to many things...like it’s OK to lie to the public if you’re a politician as long as it’s “unknowingly” done...
Better page Marie Harf too.
Unknowingly?
I doubt it.......................
Why is P-hacking the fault of scientists?
Seems more likely something that would be directly caused as a result of Anthropomorphic Glowbull Warming.
How do we know this study wasn't tweaked to increase its chances of getting results that are easily published?
These are NOT scientists. They are just the more educated members of the “Gimme Dat” crowd pushing in for their piece of the federal pie.
Unknowingly? Oh, puleez. My Aunt Fanny.
That is not unknowingly.
That is outright fraud. That is why there are lies, damn lies and statistics.
You run the numbers several different ways and then pick out the most common result, the worst-case scenario and the best-case scenario.
You present all three.
I’m not so certain about this.
In my field, results are usually validated with either a Student's t-test or an ANOVA.
You can set up an experiment with all of the proper controls, and when you graph the results at the end, they look wonderful. The graph bars are different heights, the standard deviations are fairly small. But then, you do the t-test and get a p-value of 0.051... which just barely misses the threshold. A repeat of the experiment gives a p-value of 0.048. Another repeat gives a p-value of 0.050.
An honest scientist would report that as seeing a difference that was not statistically significant. In other words, more study needs to be done to answer the question.
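For anyone who wants to see what that borderline case looks like in practice, here is a quick sketch (made-up numbers, not my actual data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two groups with a modest real difference; with samples this small the
# Student's t-test often lands right around the 0.05 boundary.
control = rng.normal(loc=10.0, scale=2.0, size=12)
treated = rng.normal(loc=11.6, scale=2.0, size=12)

t_stat, p_value = stats.ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# Whether a given run comes out at 0.048 or 0.051, the honest report is the
# same: borderline evidence that needs replication, not a firm conclusion.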
When you have the option of designing a study so that the only variable in the study is the one you are trying to manipulate in order to test the hypothesis, you can set the validation standards quite high.
In some studies, such as drug studies in large populations, it becomes quite difficult to discern the effects of the drug versus other variables that cannot be controlled. Human beings are notoriously difficult to standardize, and we can’t establish a population of genetically identical humans for research the way we can with mice or rabbits. So, then, it takes some really heavy-duty statistics to make sense of the data, and different statistical tests can give different statistical significance.
I think the issue is not so much bias, but the difficulty of interpreting study results in a highly variable background.
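To make that last point concrete, the same messy data can land on different sides of the 0.05 line depending on which test you reach for (made-up skewed data below):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Skewed, outlier-prone responses, typical of uncontrolled human data
group_a = rng.lognormal(mean=0.0, sigma=0.8, size=25)
group_b = rng.lognormal(mean=0.5, sigma=0.8, size=25)

print("Student's t :", stats.ttest_ind(group_a, group_b).pvalue)
print("Welch's t   :", stats.ttest_ind(group_a, group_b, equal_var=False).pvalue)
print("Mann-Whitney:", stats.mannwhitneyu(group_a, group_b).pvalue)
# None of these choices is dishonest by itself; the bias creeps in when the
# test is chosen after seeing which p-value looks best.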
My statistics and design prof referred to it as data massaging.
Statistics is one of the easiest branches of mathematics. However, even if you do all the calculations correctly, your answer can still be garbage. Statistics is also the most misapplied branch of mathematics. Most scientists do not have a clue about properly designing a statistical experiment. It requires a significant amount of thought in selecting an appropriate P-value (nerds will get this pun). Often the value is arbitrarily picked as p = 0.05 (the most common p-value used). Rarely do scientists consider the ramifications of a Type I or Type II statistical error in their research. Many scientists do not clearly understand statistics; they tend to mimic the statistics that they have seen in the past. I have had six different courses just in statistical design of experiments (5 A's and 1 B) and I still get it wrong sometimes, usually because I lack a proper understanding of the application. I have seen good scientists unknowingly get it wrong.
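A quick simulation of what Type I and Type II errors actually mean in practice (my own toy numbers):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n, effect, runs = 0.05, 20, 0.5, 4000

type1 = type2 = 0
for _ in range(runs):
    # Null true: groups identical, so any "significant" result is a Type I error
    a, b = rng.normal(size=n), rng.normal(size=n)
    type1 += stats.ttest_ind(a, b).pvalue < alpha

    # Real effect of 0.5 SD present, so a non-significant result is a Type II error
    c, d = rng.normal(size=n), rng.normal(loc=effect, size=n)
    type2 += stats.ttest_ind(c, d).pvalue >= alpha

print("Type I rate :", type1 / runs)   # sits near alpha by construction
print("Type II rate:", type2 / runs)   # driven by sample size and effect size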
“You can set up an experiment with all of the proper controls, and when you graph the results at the end, they look wonderful. The graph bars are different heights, the standard deviations are fairly small. But then, you do the t-test and get a p-value of 0.051... which just barely misses the threshold. A repeat of the experiment gives a p-value of 0.048. Another repeat gives a p-value of 0.050.”
My attitude has always been that if you need statistics to prove the results are significant, they probably aren't. As we know, the p = 0.05 standard is an arbitrary cutoff indicating there is roughly a 1-in-20 chance of seeing a result like this by chance alone when no real effect exists. That's hardly definitive.
There's an interesting history behind the choice of 0.05 as the threshold of statistical significance. Before the advent of electronic computers, tables of statistics were calculated, literally, by hand, possibly using a mechanical calculator for the actual arithmetic. Most published tables, for no reason better than tradition, included a column for the 5% value of the distribution. If a researcher didn't want to compute a new set of tables for himself, it was convenient to use the published tables and choose 5% as the cutoff, because that value was in the table. Now it's possible to compute the actual probability of the result you obtained, rather than just observing whether it's over or under the 5% limit. However, few researchers bother to do that.
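Computing the actual probability is a one-liner these days (a generic example, not tied to any particular study in this thread):

from scipy import stats

t_statistic, df = 2.10, 18          # hypothetical observed t value and degrees of freedom
p_exact = 2 * stats.t.sf(abs(t_statistic), df)   # exact two-sided tail probability
print(f"p = {p_exact:.4f}")         # report this number, not just "p < 0.05" or "n.s."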
Model selection, or even the choice of nonparametric modeling, is also not cut and dried. People can have honest differences of opinion on these, although I take the article to say that there is a bias in favor of justifying whichever model gets you published (e.g., Hey, look, p = .048 under this model!!).
First thing my stats prof in undergrad said was “Figures lie and liars figure.” The second thing he said is that “if you have a pre-conceived notion, you can prove anything you want with statistics.”
In grad school, my stats class proved to me that most people who took undergrad stats really don't know how to properly apply statistics. This is now being exacerbated by "Visualization Technology," where raw data goes in and, instead of plain data outputs, everything is visualized into some creative graph that lets people who don't have a clue what they are looking at go "ooh" and "ahh" and "oh, it's obvious that (fill in the blank) is happening." Visualizations are stats for those who don't understand stats.
I have a copy of this book from the early 50's. It's still in print!...............
One of the other things that people need to be aware of is that the conclusions can be the direct opposite of the data.
I wish I could find it, but there was a medical study published that had some inflammatory conclusion a few years back. I was curious enough to actually read the damn thing.
To my horror, I found that the data said the exact opposite of the conclusion. There was a paragraph within the study that dismissed the researchers' own data: they explained away why their data was supposedly wrong and pressed on to the conclusion anyway.
It kills me that I can’t find it.
The peer review process has totally broken down. There’s too much trust, not enough confirmation, and not enough rigorous examination. This is how we end up with decades of ‘don’t eat cholesterol’ being foisted onto people by their doctors and public policy.
In the field of biochemistry, we cannot get published without showing significance through the P value. Typically, we repeat identical experiments three times, with the P < 0.05 each time, before we even accept our result.
I know that the P value is somewhat arbitrary, but it is a good tool for discarding results that are absolute junk. The cut-off has to be somewhere.
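If you want to fold the three replicates into a single number, Fisher's method is one standard way to combine independent p-values (not saying that's what our field requires, just an illustration):

from scipy import stats

replicate_pvalues = [0.048, 0.032, 0.041]   # hypothetical replicate results
chi2_stat, combined_p = stats.combine_pvalues(replicate_pvalues, method="fisher")
print(f"combined p = {combined_p:.4f}")     # far smaller than any single replicate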