17. A binary explanatory variable

One of the most common questions is “What’s the difference?” We often want to know:

What’s the difference between the vaccinated and the unvaccinated?
What’s the difference in children’s mental health before and after the advent of social media?
What’s the difference in hybridization rate between pink and white plants?

NOTE: We usually want to know cause (e.g., “Will a vaccine help or hurt?”), but we’re often stuck with associations (e.g., “What’s the difference between vaccinated and unvaccinated people?”). Causal claims require careful experiments or other formal approaches to causal inference.

The t-distribution

We can use the t-distribution to test the null hypothesis that two samples come from the same statistical population, that is, that the populations they represent have the same mean.

Paired comparisons: When data are naturally paired, we can use a one-sample t-test on the differences between members of each pair. This design gives us greater statistical power to reject false null hypotheses because pairing “soaks up” variability unrelated to the treatment, making it easier to detect real effects.
Unpaired comparisons: In many cases, natural pairing is impractical or impossible. For example, there is no natural “pairing” in comparisons between pink and white parviflora RILs. In these situations, we test whether the means of two independent samples differ.
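To make the contrast between the two designs concrete, here is a minimal sketch in Python using only the standard library. The data are made up for illustration, the function names `paired_t` and `pooled_t` are hypothetical, and the unpaired version assumes equal variances in the two groups (a pooled-variance t statistic); an analysis that does not assume equal variances would differ.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(x, y):
    """One-sample t statistic on within-pair differences (y - x); df = n - 1."""
    diffs = [b - a for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se


def pooled_t(x, y):
    """Two-sample t statistic (y minus x), assuming equal variances; df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference in means
    return (mean(y) - mean(x)) / se


# Made-up measurements on five units, before and after a treatment.
before = [10, 20, 30, 40, 50]
after = [12, 23, 31, 44, 52]

t_paired = paired_t(before, after)      # treats each unit as its own control
t_unpaired = pooled_t(before, after)    # ignores the pairing
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

In these made-up data the units differ a lot from one another but shift by a similar amount, so the paired statistic is far larger than the unpaired one: the pairing removes the unit-to-unit variability from the comparison.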

Our Path

We will return to our Clarkia RILs and compare pollinator visitation to pink- and white-flowered RILs at the Sawmill Road site. We might think that pink flowers attract more pollinator visits than white flowers (our scientific hypothesis). Recasting this idea as (two-tailed) statistical hypotheses:

Null hypothesis: The mean number of pollinator visits is the same for pink- and white-flowered parviflora RILs.
Alternative hypothesis: The mean number of pollinator visits differs between pink- and white-flowered parviflora RILs.
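
The same pair of hypotheses can be written in symbols. The notation below is an assumption made for illustration, not taken from the text: the two symbols stand for the population mean number of pollinator visits to pink- and white-flowered RILs, respectively.

```latex
H_0: \mu_{\text{pink}} = \mu_{\text{white}}
\qquad
H_A: \mu_{\text{pink}} \neq \mu_{\text{white}}
```

Because the alternative is two-tailed, a large difference in either direction (more visits to pink or more visits to white) counts as evidence against the null.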