
• 13. Statistical Hypotheses
Motivating Scenario: You are beginning you journey into the world of null hypothesis significance testing. Wait… what even is a null hypothesis?
Learning Goals: By the end of this section, you should be able to:
- Explain why we create null models and what makes a good one.
- Differentiate between the null and alternative hypothesis.
- Differentiate between biological and statistical hypotheses.

Scientific hypotheses are exciting. As scientists, we ask interesting questions. For example, throughout this book, we are asking if parviflora flowers have evolved in ways to make them less likely to make hybrids with their close relative, xantiana. Other scientific questions include: do vaccines cause autism, does a novel drug have its claimed effect, etc etc… . These are our scientific hypotheses. They are meaningful, and grounded in our understanding of the biological world. They are the reason we do science.
As scientists, we’re usually trying to evaluate support for a scientific hypothesis. But in the null hypothesis significance testing framework of frequentist statistics (which we follow for most of this book), we do this in a somewhat backwards way. We evaluate the plausibility of a boring statistical hypothesis, known as the null hypothesis. If our observations are inconsistent with the null, we conclude that there is likely something else going on.
The null hypothesis
The Null Hypothesis (\(H_0\)) is the ultimate skeptic. It argues that any pattern you see in your data is just an illusion created by random chance or sampling error. It’s the voice that says, “nothing interesting is happening here” (Figure 2). This is a very specific claim so the null model is very specific.

The Alternative Hypothesis (\(H_A\) or \(H_1\)) is the simple opposite. It just claims that the null hypothesis is wrong… i.e. that something other than chance sampling error is likely responsible for the pattern in the data. This is a vague claim so the alternative hypothesis is not specific.
For our flower example, the null and alternative hypotheses are:
- \(H_0\): The proportion of moms with at least one hybrid seed does not differ between white and pink flowered plants.
- \(H_A\): The proportion of moms with at least one hybrid seed does differ between white and pink flowered plants.
The null hypothesis doesn’t care about your theories, it does not evaluate effect size, and has no sense of biological relevance.
Properties of good null hypotheses
Notice that we chose to compare proportions, not the raw counts of plants with hybrids. This is a crucial feature of a good hypothesis test: it must make a fair comparison. Because our sample sizes for pink (56) and white (58) flowers were unequal, comparing raw counts would be misleading and biologically uninteresting. More generally, because the null hypothesis is a skeptic that doesn’t understand biology, it’s our job to design studies where its rejection is both interesting and informative.
Good nulls are non-trivial: Testing the null that white flowers have zero hybrids is lame. If we see at least one hybrid then we couldn’t have gotten such a result by sampling error from a population with no hybrids. Similarly, the null hypothesis that mean petal length is zero mm squared should not be tested!
Good nulls represent a fair comparison: As stated above, we compared the proportion white and pink flowered plants with at least one hybrid seed, not the raw counts to avoid bias.When you design your studies make sure the comparison is fair!.
Great nulls isolate the effect of interest: A great null model creates a world where “all else is equal” (ceteris paribus). For example, the best test would ensure that other covariates, that differ (e.g. differences in petal length) between our explanatory variable (e.g. flower color morph), aren’t the real cause of a difference in our response variable.
tl/dr: It is our responsible to design studies with a clean link between our exciting scientific question and the rigid world of statistics.

A Technical Note: One-Tailed vs. Two-Tailed Tests
Notice that our \(H_A\) above says the proportions “differ” between white and pink morphs. It did not specify a direction. This is a two-tailed test, and it’s standard practice. We are open to the effect going in either direction. A one-tailed test is when we only care about a specific direction (e.g., \(H_A\): pink flowers have a higher proportion of hybrids). In practice, one-tailed tests are rare and often inappropriate because we’d almost always want to know about a strong effect in the unexpected direction. Additionally, one-tailed tests often breed distrust in your audience – they signal that you are trying to pull a fast one.
Rare cases when a one-tailed test is appropriate occur when both extremes of the outcome are on the same side of the null distribution. For instance, if I were studying the absolute value of something, the null hypothesis would be that it’s zero, and the alternative would be that it’s greater than zero. We’ll see that some test statistics, like the \(F\) statistic and (often) the \(\chi^2\) statistic, only have one relevant tail.