14-B. \(\chi^2\)
Motivating Scenario:
We want more practice with permutation, and we want to connect null distributions generated by simulation (in general) and by permutation (specifically) to those derived from mathematical formulae.
Learning Goals: By the end of this chapter, you should be able to:
- Know that computational tools like R can be used to generate null distributions. Specifically, we can:
  - Simulate (with sample()) to generate the null distribution of expected counts to evaluate the “goodness of fit”.
  - Permute (with infer) to generate the null distribution for associations between categorical variables.
- Describe how these computationally generated nulls match the theoretical \(\chi^2\) distribution, and why that’s useful.
- Use R to perform and interpret \(\chi^2\) tests.
Fundamentally, a p-value quantifies the idea of some outcome being “unexpected.” P-values work by comparing our observed value of some test statistic to its expected distribution under the null hypothesis.
In the previous chapter, we generated a null sampling distribution by shuffling (or “permuting”) the relationship between the explanatory and response variables. Here, we introduce both how to simulate a null sampling distribution of “expected counts” and how to permute to generate a null for associations between categorical variables.
We begin by introducing the \(\chi^2\) statistic itself, a test statistic that quantifies the difference between expected and observed counts. We work through this concept with a “goodness of fit” example, in which we ask whether the observed counts in each category are consistent with the counts expected under the null. We then use the familiar permutation approach from the previous chapter to explore the distribution of this test statistic under the null.
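As a quick reference, the standard form of the \(\chi^2\) statistic can be written as below. The symbols are notation assumed here rather than taken from the chapter: \(O_i\) is the observed count in category \(i\), \(E_i\) is the count expected under the null, and \(k\) is the number of categories.

\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\]

The statistic is zero when every observed count equals its expected count, and it grows as observations depart from expectations in either direction.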
We then show that these computationally derived distributions match the mathematical \(\chi^2\) distribution. We can therefore use this distribution to test whether count data are truly “unexpected” under the null hypothesis without simulating or permuting the data! This section bridges our permutation-based intuition for a null distribution to analytical shortcuts used to generate null distributions with less computational effort.
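A minimal sketch of this idea in R follows. The setup is hypothetical and not the chapter's data: we assume 100 picks among 10 equally likely categories, and the names `k`, `n`, `n_reps`, and `chi2_stat()` are introduced only for illustration. It simulates the null with sample() and compares the upper tail of the simulated statistics to the theoretical \(\chi^2\) distribution.

```r
# Hypothetical setup (an assumption, not the chapter's data):
# n picks among k equally likely categories
set.seed(1)
k <- 10        # number of categories
n <- 100       # number of picks per simulated data set
n_reps <- 5000 # number of simulated data sets

# Expected count in each category under the null
expected <- rep(n / k, k)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi2_stat <- function(observed, expected) {
  sum((observed - expected)^2 / expected)
}

# Simulate the null: draw n random picks, count them per category,
# and compute the statistic for each simulated data set
null_chi2 <- replicate(n_reps, {
  picks <- sample(1:k, size = n, replace = TRUE)
  observed <- tabulate(picks, nbins = k)
  chi2_stat(observed, expected)
})

# Compare the simulated 95th percentile to the theoretical one (df = k - 1)
sim_q95 <- unname(quantile(null_chi2, 0.95))
theory_q95 <- qchisq(0.95, df = k - 1)
c(simulated = sim_q95, theoretical = theory_q95)
```

If the two values are close, the theoretical distribution is a reasonable stand-in for the simulated null, which is what lets us skip the simulation.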
We’ll use this test to see if we can reject the idea that students in class picked numbers at random, and to evaluate whether pink and white flowers are equally likely to receive zero pollinator visits. The flower comparison gets at the motivating biological question: does petal color influence pollinator visitation (and perhaps hybridization rates)?