Even if you’re not a professional researcher or scientist, you might at some point have heard about a study or research finding on the news that claims something is statistically “significant.” You might have wondered what this means, and why it matters. If you’re a researcher, you might have wondered why the idea of statistical significance has recently become a topic of some debate.
Here’s a simple (and still pretty accurate) way to think about p-values:
You claim, based on the data of a study, that the effect being studied is “real” and not just a fluke. The p-value is the probability that you’re wrong.
A slightly longer explanation:
A p-value is a probability used in hypothesis testing. In hypothesis testing, you compare your interesting “alternative hypothesis” about how nature works against some boring “null hypothesis.” Say I hypothesized that there was a difference in income between men and women. The null hypothesis would be, “there is no difference,” and the alternative would be, “there is a difference.” No matter how carefully we design our experiments, the universe is a chaotic place, and random chance and factors we didn’t anticipate still affect the results.
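To see what “random chance” means here, consider a minimal sketch (the income figures, group sizes, and distribution below are made up purely for illustration): even when two groups are drawn from the exact same distribution, so the null hypothesis is true by construction, their sample averages won’t match exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: simulate incomes for two groups drawn from the SAME
# distribution, i.e. the null hypothesis ("no difference") is true by construction.
group_a = rng.normal(loc=50_000, scale=10_000, size=30)
group_b = rng.normal(loc=50_000, scale=10_000, size=30)

# Even so, the sample means differ -- random chance alone
# produces an apparent "difference."
print(group_a.mean() - group_b.mean())
```

Run this a few times with different seeds and the apparent “difference” bounces around, sometimes positive, sometimes negative, even though nothing real is going on.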
Therefore, it’s useful to have a statistic that tells us how likely our results would be if the null hypothesis were actually true. Then we could say to everyone, “it’s so unlikely to see these results if the null hypothesis were true that we must reject it in favor of my super awesome and interesting alternative.” This is the role of the p-value: it gives the probability of obtaining a result at least as extreme as the one you observed, given that the null hypothesis is correct (i.e., that there is no trend/difference). As p gets lower, the pattern in the data becomes increasingly unlikely under the assumption that nothing is happening, and the more plausible explanation becomes the alternative: something interesting is happening.
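For the income example, a small simulation makes that definition concrete. The sketch below uses a permutation test, which mirrors the definition directly: if the null hypothesis were true, the group labels would be arbitrary, so we shuffle them many times and count how often chance alone produces a difference at least as extreme as the one we observed. (The income numbers are invented for illustration; in practice you might instead reach for something like scipy.stats.ttest_ind.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical income samples for two groups, in $1000s (made-up numbers).
men = np.array([52, 61, 48, 75, 58, 66, 49, 71, 55, 63], dtype=float)
women = np.array([47, 55, 44, 68, 51, 59, 45, 64, 50, 57], dtype=float)

observed = abs(men.mean() - women.mean())

# Permutation test: under the null hypothesis the labels "men"/"women" are
# arbitrary, so shuffle the pooled data and see how often a difference at
# least as extreme as the observed one arises by chance.
pooled = np.concatenate([men, women])
n_more_extreme = 0
n_shuffles = 10_000
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    diff = abs(pooled[:len(men)].mean() - pooled[len(men):].mean())
    if diff >= observed:
        n_more_extreme += 1

p_value = n_more_extreme / n_shuffles
print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")
```

The p-value printed at the end is exactly the quantity described above: the fraction of “nothing is happening” worlds that still produce a result at least as extreme as ours.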