• 14B. \(\chi^2\) summary

Chapter summary

The χ² test statistic quantifies the difference between observed and expected counts of categorical variables. By comparing an observed \(\chi^2\) value to its null distribution (generated by simulation, permutation, or math) we can calculate p-values and conduct null hypothesis significance testing (NHST).

Chatbot tutor

Please interact with this custom chatbot (link here), which I made to help you with this chapter. I suggest at least ten back-and-forths to ramp up, then stopping when you feel you have gotten what you needed from it.

Practice Questions

Try these questions! By using the R environment you can work without leaving this “book”. I even pre-loaded all the packages you need!


SETUP: Female Latrodectus hasselti (redback spiders) sometimes eat their mates. Is there anything in it for the males? Maydianne Andrade tested the idea that eating a male might prevent the female from re-mating with a second male; that is, a female who is preoccupied with eating and digesting her mate is not looking to mate again. She observed whether a female accepted a second male after the first male either escaped or was eaten. link.

I have these data:

  • In long format with three columns (first_male, second_male, and count) and four rows, in a tibble called long_spider.

  • In wide format as a contingency table with three columns (first_male, Accepted, and Rejected; the last two refer to the second male) and one row of counts each for eaten and escaped first males, in a tibble called wide_spider.
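To make the two layouts concrete, here is a minimal sketch in base R. It uses a plain data frame and `xtabs()` rather than the tibbles pre-loaded in the book. The counts are an assumption: I reconstructed them from the margins used in Q3 (9 of 32 first males eaten, 25 of 32 second males accepted) and from the test output later in this section, so check them against `long_spider` before relying on them. The names `long_sketch` and `wide_sketch` are made up for illustration.

```r
# Long format: one row per combination of the two variables (assumed counts).
long_sketch <- data.frame(
  first_male  = c("eaten", "eaten", "escaped", "escaped"),
  second_male = c("Accepted", "Rejected", "Accepted", "Rejected"),
  count       = c(3, 6, 22, 1)
)

# Wide format: a 2 x 2 contingency table with first_male as rows
# and second_male as columns.
wide_sketch <- xtabs(count ~ first_male + second_male, data = long_sketch)
wide_sketch
```

Both objects hold the same information; the long layout is usually handier for plotting, and the wide layout is what `chisq.test()` expects.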

Q1) This study is


Q2) Plot the data. Which trend is apparent?


Q3) Assuming independence, how many cases do you expect when the first male is eaten and the second male is accepted?

  • \(P_{\text{first male eaten}} = 9/32\).
  • \(P_{\text{second male accepted}} = 25/32\).

\(\frac{9}{32} \times \frac{25}{32} \times 32 = 7.03\)
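The same arithmetic can be done for all four cells at once. Here is a minimal sketch in base R. The cell counts are an assumption: they are reconstructed from the margins above and from the test output later in this section, so check them against `wide_spider`. The name `spider_counts` is made up for illustration.

```r
# Assumed 2 x 2 counts (rows: fate of first male; columns: fate of second male).
spider_counts <- matrix(
  c( 3, 6,
    22, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(first_male  = c("eaten", "escaped"),
                  second_male = c("Accepted", "Rejected"))
)

n        <- sum(spider_counts)              # total number of females (32)
p_first  <- rowSums(spider_counts) / n      # P(eaten), P(escaped)
p_second <- colSums(spider_counts) / n      # P(Accepted), P(Rejected)

# Expected count under independence = P(row) * P(column) * n
expected <- outer(p_first, p_second) * n
expected["eaten", "Accepted"]               # 225 / 32, about 7.03
```

Note that the expected counts keep the same row and column totals as the observed counts; only the cells change.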


Q4) What is the (two-tailed) null hypothesis?

Q5) What is the (two-tailed) alternative hypothesis?

wide_spider |>
  select(-1) |> # remove the label column, which does not contain counts
  chisq.test()
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect

    Pearson's Chi-squared test with Yates' continuity correction

data:  select(wide_spider, -1)
X-squared = 11.28, df = 1, p-value = 0.0007836
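The warning above typically appears when some expected counts are small (below about 5), which makes the mathematical χ² distribution a rougher approximation. One hedged way to follow up is to look at the expected counts and to ask `chisq.test()` for a simulation-based p-value. This is a sketch in base R, not necessarily what the chapter intends; the counts are an assumption (reconstructed from the margins in Q3 and the output above), and `spider_counts` is a made-up name.

```r
# Assumed 2 x 2 counts (rows: first male; columns: second male).
spider_counts <- matrix(
  c( 3, 6,
    22, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(first_male  = c("eaten", "escaped"),
                  second_male = c("Accepted", "Rejected"))
)

# Same test as above; suppressWarnings() only hides the warning we already saw.
test_math <- suppressWarnings(chisq.test(spider_counts))
test_math$expected        # expected counts under independence; look for small cells

# Null distribution by simulation instead of the chi-squared curve.
# No continuity correction is applied in this case.
set.seed(1)               # arbitrary seed so the sketch is reproducible
test_sim <- chisq.test(spider_counts, simulate.p.value = TRUE, B = 10000)
test_sim$p.value
```

If the simulation-based and math-based p-values lead to the same decision, the warning is less of a worry for the conclusion.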

Q6) Given our test,

📊 Glossary of Terms

Glossary: χ² (Chi-Squared) Tests

  • Chi-Squared (χ²) Statistic: A measure of how far observed counts differ from expected counts under a null model. Larger χ² values indicate greater deviation from expectation.

  • Chi-Squared Distribution: A probability distribution that describes the sampling distribution of the χ² statistic when the null hypothesis is true. It depends only on the number of degrees of freedom.

  • Degrees of Freedom (df): The number of independent values that can vary in a dataset after certain constraints are applied. For a χ² test, this typically equals the number of categories minus one (for goodness-of-fit) or (rows − 1) × (columns − 1) for contingency tables.

  • Goodness-of-Fit Test: A χ² test used to assess whether observed frequencies across categories differ significantly from expected frequencies based on a specific theoretical distribution.

  • Contingency (or Independence) Test: A χ² test used to evaluate whether two categorical variables are independent. Observed frequencies in a contingency table are compared to the frequencies expected if there were no association.


R Packages Introduced

There are no new packages, but we continue to use infer, this time for permutation.


πŸ› οΈ Key R Functions

Here’s the matching summary for the chi-square functions:


  • chisq.test(): Performs a chi-square test (either a goodness-of-fit test or a contingency test). Returns the χ² statistic, degrees of freedom, and p-value. Example:
chisq.test(table(species, visited), correct = FALSE)

  • pchisq(): Looks up p-values or cumulative probabilities from the theoretical chi-square distribution. Used to compute a p-value from a \(\chi^2\) statistic. Example:
pchisq(chi2_stat, df = 1, lower.tail = FALSE)
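In the `pchisq()` example above, `chi2_stat` has to come from somewhere. Here is a minimal sketch of computing it by hand for a 2 × 2 table and then looking up the p-value. The counts are an assumption (the spider data as I reconstructed them from the practice questions), and the names `observed`, `expected`, and `p_value` are made up for illustration.

```r
# Assumed counts (rows: first male eaten / escaped;
# columns: second male Accepted / Rejected).
observed <- matrix(c( 3, 6,
                     22, 1), nrow = 2, byrow = TRUE)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi2_stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail area of the chi-squared distribution is the p-value
p_value <- pchisq(chi2_stat, df = df, lower.tail = FALSE)
p_value
```

This hand calculation has no continuity correction, so it should agree with `chisq.test(observed, correct = FALSE)` rather than with the default Yates-corrected output.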

Additional resources