#SOCRMx week 6 – Statistical Significance

This week in SOCRMx we’ve been looking at quantitative data analysis. While I’m not afraid of the maths, it’s a very new area for me, and all the terms can be a little overwhelming. The MOOC has a really nice approach, though, which integrates a historical perspective. From Guinness brewers to statistician wars and blanket bans on methods, it has made for very interesting reading.

The latest area I’ve been looking into is that of statistical significance. We’re prompted:


We can imagine how an over-reliance on p=0.05, or even on other values of p in significance testing, could lead researchers to claim strong support for a particular hypothesis when the evidence is actually weak. Do you know of any examples of such claims? What are the arguments in favour of significance testing? What are the arguments against? How might this debate influence your own research designs?

Here goes...

A definition:

“The p value is the probability to obtain an effect equal to or more extreme than the one observed presuming the null hypothesis of no effect is true” (Biau, Jolles & Porcher, 2010).
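As a rough illustration of that definition, here is a minimal sketch of a permutation test in Python. It is not taken from the MOOC or from any of the papers cited here: the function name `permutation_p_value` and the two sets of scores are made up for the example. If the null hypothesis of no effect is true, the group labels are interchangeable, so the sketch shuffles them many times and counts how often the shuffled difference in means is at least as extreme as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so any reshuffling of them is as likely as the real split.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical scores, invented purely for illustration.
control = [12, 15, 11, 14, 13, 16, 12, 15]
treated = [16, 18, 14, 17, 19, 15, 18, 17]

p = permutation_p_value(control, treated)
print(f"estimated p-value: {p:.4f}")
```

The number printed is the share of shuffles that produced an effect "equal to or more extreme than the one observed", which is the quantity the definition describes. It says nothing directly about how likely the hypothesis itself is.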






The main advantage of using p-values seems to be that they are simple:

Of primary importance, the test of a null hypothesis is conducted in the context of a simple decision rule and provides a dichotomous outcome (Greenwald et al. 1996, 177). While critics would argue that hypothesis tests provide less information compared to alternative techniques, supporters argue that the binary decisions nevertheless enable scholarly progress and theory testing, which “requires nothing more than a binary decision about the relation between two variables” (Chow 1988, 105; Wainer 1999) (cited in Iacobucci, 2005)

It is also suggested that they can enable researchers to have greater confidence with smaller sample sizes, which might be required in preliminary research (Novella, 2015).


The disadvantages include:

– they are not reliable, since they are highly variable when tests are replicated (Cumming, 2009 [a cracking video – well worth a watch]; Novella, 2015).

– even when your concern is not replicability, “If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time” (Colquhoun, 2014) / “a P value of 0.05 raises that chance to at least 29%” (Nuzzo, 2014)

– The plausibility of the hypothesis has a greater impact than the p-value on whether an exciting finding is a false alarm, so a Bayesian framework may be more effective (Nuzzo, 2014).

– P values can “deflect attention from the actual size of an effect”. According to Geoff Cumming, an emeritus psychologist at La Trobe University in Melbourne, Australia, “We should be asking, ‘How much of an effect is there?’, not ‘Is there an effect?’” (Nuzzo, 2014)

– the widespread use of (and demand for) p-values can encourage ‘p-hacking’, which Uri Simonsohn (of the University of Pennsylvania) describes as “trying multiple things until you get the desired result”, even unconsciously. This can diminish the criticality and skepticism with which exploratory research should be approached (Nuzzo, 2014).
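The first point in this list, that p-values vary a lot when a test is replicated, can be sketched with a small simulation. This is only an illustration under stated assumptions and not a reconstruction of Cumming's own demonstration: the effect size (0.5 standard deviations), the group size (32), the number of replications (25) and the function names are all chosen for the example, and the test is a simple z-test that treats the standard deviation as known.

```python
import random
from statistics import NormalDist, mean


def two_sample_z_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replicate_p_values(n_replications=25, n_per_group=32, true_effect=0.5, seed=1):
    """Run the same hypothetical experiment repeatedly and collect the p-values."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_replications):
        # Every replication samples from the same two populations,
        # so the true effect is identical each time.
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        p_values.append(two_sample_z_p_value(treated, control))
    return p_values


p_values = replicate_p_values()
print(f"smallest p = {min(p_values):.4f}, largest p = {max(p_values):.4f}")
print(f"replications with p < 0.05: {sum(p < 0.05 for p in p_values)} of {len(p_values)}")
```

Nothing about the underlying effect changes between runs, so any spread in the printed p-values comes from sampling variation alone.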


How might my research be influenced by the debate on significance testing?

– In the presentation of results, I might be inclined to present everything:

  • Simonsohn advises researchers to “report how we determined our sample size, all data exclusions (if any), all manipulations and all measures in the study” (Nuzzo, 2014)
  • “We also encourage the presentation of frequency or distributional data when this is feasible” (Trafimow & Marks, 2015)
  • Cumming [from La Trobe University in Melbourne] recommends reporting effect sizes and confidence intervals (Cumming, 2009; Nuzzo, 2014).

– I’m encouraged to “try multiple methods on the same data set”, and if “the various methods come up with different answers”, to attempt to work out why (Nuzzo, 2014)

– Rather than getting caught up on statistical significance, I might hold on to notions of practical significance:

Many statisticians also advocate replacing the P value with methods that take advantage of Bayes’ rule: an eighteenth-century theorem that describes how to think about probability as the plausibility of an outcome, rather than as the potential frequency of that outcome. This entails a certain subjectivity — something that the statistical pioneers were trying to avoid. But the Bayesian framework makes it comparatively easy for observers to incorporate what they know about the world into their conclusions, and to calculate how probabilities change as new evidence arises (Nuzzo, 2014)
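To see how prior plausibility feeds into a conclusion, here is a minimal sketch of Bayes' rule applied to a "significant" result. It is a simplification and not the calculation used in any of the papers cited here: the function name, the assumed power of 0.8 and the three prior values are illustrative choices, and it treats the evidence as simply "the test came out below the 0.05 threshold".

```python
def probability_effect_is_real(prior, power=0.8, alpha=0.05):
    """Posterior probability that an effect is real, given a 'significant' test.

    prior: assumed plausibility of the hypothesis before the study
    power: assumed chance of a significant result when the effect is real
    alpha: chance of a significant result when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative priors: a toss-up, a long shot, and a very long shot.
for prior in (0.5, 0.1, 0.01):
    posterior = probability_effect_is_real(prior)
    print(f"prior plausibility {prior:.2f} -> posterior {posterior:.2f}")
```

Under these assumptions, a hypothesis that started with a 10% chance of being true ends up at 0.64 after a significant result, so roughly a third of such "discoveries" would be false alarms. That is in the same region as the figures quoted above, though the cited authors reach theirs by more careful routes.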

I’d be very interested to hear how others with more experience of working with quantitative data see the implications of the debate.