Six Ways Statistics Are Misused: Debunking Myths About Organic Food, Autism, Nicolas Cage Movies, and Drowning Risk

Back in the 1940s, before the polio vaccine existed, the disease struck fear into parents of young children. What could you do to reduce the chance of your child catching this dreaded illness? Some misguided public health officials apparently recommended avoiding ice cream, based on a study showing a link between ice cream consumption and polio outbreaks. Fortunately, that study was flawed. Yes, ice cream consumption and polio outbreaks were linked, but only because both peak during the summer months. The study's authors confused correlation (ice cream consumption and polio rising at the same time) with causation (ice cream increasing your risk of getting sick).

Medical researchers frequently comb through data sets looking for environmental factors that contribute to chronic disease. Unfortunately, these studies sometimes make the same mistakes as the ice cream and polio research. Dr. John Ioannidis stirred up considerable controversy when he claimed in 2005 that "most published research findings are false." While his central claim may be debatable, he was absolutely right to point out serious problems with how statistics are used, problems that leave some medical studies misleading or simply wrong. Popular science articles in the media and online make matters worse by ignoring the limitations of the studies they cover. Thankfully, you don't need to be a math whiz to spot these problems; a bit of basic critical thinking will do. Below are six ways statistics are sometimes misused and how to recognize them.

1) Equating correlation with causation.

Does allocating resources to science lead to suicides? Clearly, it does. Just look at the figures!

And what about the astonishing correlation between pool drownings and Nicolas Cage films? Come now. Surely you can see that Nicolas Cage films are responsible for drownings?

And my personal favorite: obviously, the true cause of autism is the heightened consumption of organic foods! Look folks, the data confirms it! It’s science!

As these examples illustrate, just because two variables are correlated doesn't mean one causes the other. There may be another factor at work. Think back to the ice cream and polio study. Ice cream and polio were linked because of a hidden third factor the study ignored (summer) that affected both. Statisticians call this hidden third variable a confounding variable, or confounder. Alternatively, you might see a correlation like the one between Nicolas Cage and drownings purely by luck: random chance.
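
To see how a confounder can manufacture a correlation out of thin air, here is a minimal Python sketch with entirely made-up numbers (none of this comes from the actual polio data): a hidden "summer" variable drives both ice cream sales and polio cases, so the two look strongly correlated overall, yet the correlation vanishes once you compare months within the same season.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly data: "summer" is the hidden confounder.
months = 120
summer = np.tile([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0], 10)  # warm months flagged as 1

# Ice cream sales and polio cases both rise in summer,
# but neither has any effect on the other.
ice_cream = 50 + 30 * summer + rng.normal(0, 5, months)
polio_cases = 5 + 10 * summer + rng.normal(0, 2, months)

# The raw correlation looks alarming...
print("overall correlation:", np.corrcoef(ice_cream, polio_cases)[0, 1])

# ...but it disappears once the confounder is held fixed
# (compare months within the same season).
warm = summer == 1
print("correlation within summer months:",
      np.corrcoef(ice_cream[warm], polio_cases[warm])[0, 1])
print("correlation within non-summer months:",
      np.corrcoef(ice_cream[~warm], polio_cases[~warm])[0, 1])
```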

2) Data dredging.

Data dredging is a real problem in medical research. Let me construct a completely hypothetical scenario to show how it works. (Fair warning: I'm going to make this as absurd as possible.)

Imagine you randomly select a sample of one thousand people and ask them two survey questions: 1) have you watched a Nicolas Cage film in the past year, and 2) right now, on a scale from one to twenty, how strong is your desire to drown yourself in a freshwater pool? (The freshwater part is important, I think.) Suppose the average drowning desire among Nicolas Cage watchers was 12, while the average among non-Cage watchers was 10. So people who watched Nick Cage were 1.2 times as inclined to want to drown themselves! OMG! But hold on… I only took one random sample of a thousand people. If I had drawn a different sample, would I get a different result? How can I be sure this outcome isn't just the product of random chance?
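
Since the whole scenario is invented anyway, here is a quick Python sketch of the "different sample, different result" worry. It assumes, purely for illustration, that Cage-watching has no effect at all and that everyone's drowning desire comes from the same bell curve with mean 10; even so, each random sample of a thousand people produces slightly different group averages just by chance.

```python
import numpy as np

rng = np.random.default_rng(42)

def run_survey(sample_size=1000):
    """Simulate one survey in a world where Cage films have NO effect."""
    watched_cage = rng.random(sample_size) < 0.5             # who saw a Cage film
    desire = np.clip(rng.normal(10, 3, sample_size), 1, 20)  # same bell curve for everyone
    return desire[watched_cage].mean(), desire[~watched_cage].mean()

# Different random samples give different group means purely by chance.
for i in range(5):
    cage_mean, other_mean = run_survey()
    print(f"sample {i + 1}: Cage watchers {cage_mean:.2f}, non-watchers {other_mean:.2f}")
```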

What many medical researchers will do is compute a p-value. The easiest way to explain this is with a picture.

In most human populations, traits like this will follow a bell-curve distribution like the one above. In our Nicolas Cage scenario, put the desire to drown on the x-axis and the number of people reporting that desire on the y-axis. We're assuming that if you surveyed the entire population and plotted how many people had a drowning desire of, say, 8, 9, 10, and so on, you'd get a bell-shaped curve like the one shown, with the same mean we observed in our non-Cage-watching group. The p-value then asks: if there is really no difference between Cage and non-Cage watchers, if watching Cage genuinely has no effect, what is the probability of accidentally drawing a sample whose average drowning desire is 12 or higher? In other words, could we have just happened to pick a sample sitting out near the extreme end of the bell curve?
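
To make that last question concrete, here is a small simulation-style sketch of a one-sided p-value. Everything in it is an assumption for illustration: I invent a standard deviation of 5 and shrink the Cage-watching group to 30 people so that chance flukes are large enough to see; a real study would more likely run a t-test on the actual data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Null hypothesis: watching Cage has no effect, so a group of Cage watchers
# is just a random sample from the same bell curve as everyone else.
null_mean, null_sd = 10, 5   # invented population parameters
group_size = 30              # invented group size (small enough for flukes to show)
observed_mean = 12           # the suspicious average we measured

# Simulated p-value: how often does a random sample from the null bell curve
# produce an average drowning desire of 12 or higher purely by accident?
simulations = 100_000
sample_means = rng.normal(null_mean, null_sd, (simulations, group_size)).mean(axis=1)
p_value = (sample_means >= observed_mean).mean()

print(f"simulated one-sided p-value: {p_value:.4f}")
```

With these invented numbers the simulated p-value comes out small, on the order of a percent or so: in that null world, a sample average of 12 would be a fairly rare fluke.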

Medical researchers have somewhat arbitrarily settled on a p-value cutoff of 5%, or 0.05. In essence, if the difference between group A (people who eat meat, people who watch Nicolas Cage films, people who handle chemical Y, etc.) and group B is large enough, the p-value becomes less than