Bad Ethics: Data Dredging in Research

Note: The information provided on this site is intended for your general knowledge only and is not a substitute for professional medical advice.

When researchers conduct experiments, they often attempt to determine whether or not their results are “statistically significant.” What exactly does this mean?

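In statistics, a number called the p-value is used to quantify the significance of results. It is the probability of seeing results at least as extreme as those observed if there were actually no real effect. A p-value always lies between 0 and 1, and by convention a p-value less than 0.05 is considered statistically significant.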

Sometimes, results-driven researchers engage in a questionable practice called data dredging or p-hacking. In data dredging, researchers run lots of analyses in the hope that 1 or 2 of them will yield a statistically significant result, and then report only those.

Say, for example, that a researcher is studying the effects of a hypothetical drug. The researcher might give the drug to 100 groups of 10 people each, and measure changes in various biological indicators over time (things like blood pressure, cholesterol level, white blood cell count, etc.).

Let’s assume that the drug has no real effect on patients in the trial. However, there’s a very high chance (about 99%, if the groups are independent) that for at least one of the groups, the p-value will be less than 0.05, simply because of the sheer number of groups tested: each test on its own has a 5% chance of producing a false positive. Even if 99 of the groups show a statistically insignificant result, the ethically challenged researcher might claim that the drug has “statistically significant” results, touting the results of the 1 group with a p-value less than 0.05.
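
The “very high chance” in this example can be checked with a few lines of Python. This is only a rough sketch under simplifying assumptions that are not part of the example itself: one test per group, groups independent of one another, and p-values that are evenly spread between 0 and 1 when the drug does nothing. The function names are made up for illustration.

```python
import random


def chance_of_false_positive(n_groups, alpha=0.05):
    """Chance that at least one of n_groups independent tests gives
    p < alpha even though the drug has no effect at all."""
    # Each test avoids a false positive with probability (1 - alpha),
    # so all n_groups avoid one with probability (1 - alpha) ** n_groups.
    return 1 - (1 - alpha) ** n_groups


def simulate_trials(n_groups, alpha=0.05, n_trials=10_000, seed=0):
    """Estimate the same chance by simulating many no-effect trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Assumption: with no real effect, each group's p-value is
        # uniform on [0, 1), so rng.random() stands in for it.
        if any(rng.random() < alpha for _ in range(n_groups)):
            hits += 1
    return hits / n_trials


print(round(chance_of_false_positive(100), 3))  # 0.994
print(simulate_trials(100))  # simulated estimate of the same chance
```

In other words, with 100 groups and a drug that does nothing, a “significant” group turns up in roughly 99 out of every 100 such trials. Measuring several indicators per group would only push that chance higher.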

This is an example of data dredging. If you ever conduct medical research, be sure to avoid it: treating patients on the basis of a spurious result may end up endangering their lives.

