• 16. One sample t-test in R

Motivating Example: You understand the t-distribution, and using it for estimating uncertainty and testing the null that a mean equals \(\mu_0\). But it’s a lot of work to do this. If only there were an easier way 🤔. Here, we show how R can do this for us.

Learning Goals: By the end of this section, you will be able to:

Perform a one-sample t-test using the t.test() function.
Use the broom package (tidy(), augment()) to generate clean, predictable output from model objects.
Frame a one-sample t-test as a simple linear model using the lm() function.
Interpret the output of summary() for a simple linear model.

Code to load and process data

library(tidyr)
library(dplyr)
range_shift_file <- "https://whitlockschluter3e.zoology.ubc.ca/Data/chapter11/chap11q01RangeShiftsWithClimateChange.csv"
range_shift <- read_csv(range_shift_file) |>
  mutate(uphill = elevationalRangeShift > 0)|>
  separate(taxonAndLocation, into = c("taxon", "location"), sep = "_")

Stats in R

“What does a one-sample t-test in R mean”, you ask, realizing we just did that. Let me explain…

A poster saying "Easy peasy lemon squeezy." — Figure 1: R can make things easy! Image made with ChatGPT.

We just worked through the calculations for a confidence interval and a hypothesis test step by step. But R can do this for us in a single line. Now that you know how it works and what R is doing, I can show you how to have R do this work for you.

t.test()

The t.test() function is the easiest way to run a one-sample t-test in R. There are two relevant arguments:

x: A vector of observations.
mu: \(\mu_0\) – i.e. the expected value of x under the null hypothesis.

Note: Because x must be a vector, I pull() it out of its tibble.

t.test(x = pull(range_shift,elevationalRangeShift) , mu = 0)


    One Sample t-test

data:  pull(range_shift, elevationalRangeShift)
t = 7.1413, df = 30, p-value = 0.00000006057
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 28.08171 50.57635
sample estimates:
mean of x 
 39.32903

Tidying up R model outputs

The output of t-test() includes:

The test statistic, t.
The degrees of freedom.
The p-value.
The 95% confidence interval.
The sample estimate (in this case the mean).

While all of these results match our calculations (and are faster and easier to do), there is one problem – they are a mess. That is, they are stored in a format that you can read, but it’s not easy to process with a computer. The tidy() function in the broom package offers a convenient format:

library(broom)
t.test(x = pull(range_shift,elevationalRangeShift) , mu = 0) |>
  tidy()

t.test() output piped through broom's tidy() function
estimate	statistic	p.value	parameter	conf.low	conf.high	method	alternative
39.33	7.14	6.06e-08	30	28.08	50.58	One Sample t-test	two.sided

The statistic here is the t-value. The statistic column will contain the value of whichever test statistic your analysis produces.

The general linear model framework in R

An alternative approach to the t.test() function is to use the R’s framework for linear models. Remember that a linear model predicts the value of the response variable of the \(i^{th}\) observation \(\hat{Y_i}\) as a function of things in our model. For a one sample t-test, our model only contains the mean, which we represent as the “intercept”, \(a\):

\[\hat{Y_i} = a\]

Of course, actual observations \(Y_i\), will deviate from model predictions. This deviation, known as the “residual” is shown as \(e_i\):

\[Y_i = \hat{Y_i} + e_i\] \[Y_i = a+e_i\]

The lm() function in R builds such linear models. If we are only modelling the mean, the syntax is:

lm(y ~ 1) if y is a vector, or
lm(y ~ 1, data = dataset) if y is a column in a tibble (or dataframe)

Here the 1 ells R to fit the simplest possible model: an intercept-only model. The estimate for this intercept will be the overall mean of our response variable:

lm(elevationalRangeShift ~ 1, data = range_shift)


Call:
lm(formula = elevationalRangeShift ~ 1, data = range_shift)

Coefficients:
(Intercept)  
      39.33

This may seem like overkill for a one-sample t-test, but the linear model framework generalizes to regression, ANOVA, and more. Seeing a one-sample t-test as a linear model prepares us for the models we’ll meet next.

summary() provides p-values and test statistics

lm() just returns the mean, but it actually calculates so much more! To access information like the standard error, t-value, p value (reported as Pr(>|t|)), and degrees of freedom, pipe the output to summary():

lm(elevationalRangeShift ~ 1, data = range_shift) |>
  summary()


Call:
lm(formula = elevationalRangeShift ~ 1, data = range_shift)

Residuals:
    Min      1Q  Median      3Q     Max 
-58.629 -23.379  -3.529  24.121  69.271 

Coefficients:
            Estimate Std. Error t value     Pr(>|t|)    
(Intercept)   39.329      5.507   7.141 0.0000000606 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30.66 on 30 degrees of freedom

tidy() cleans up the output

As above, we can use broom’s tidy() function to turn this output into a more convenient format:

library(broom)
lm(elevationalRangeShift ~ 1, data = range_shift) |>
  tidy()

# A tibble: 1 × 5
  term        estimate std.error statistic      p.value
  <chr>          <dbl>     <dbl>     <dbl>        <dbl>
1 (Intercept)     39.3      5.51      7.14 0.0000000606

Fitted values and residuals from augment()
elevationalRangeShift	.fitted	.resid
58.9	39.329	19.571
7.8	39.329	-31.529
108.6	39.329	69.271
44.8	39.329	5.471
11.1	39.329	-28.229

augment() shows residuals (and more)

broom’s augment() function shows each observation’s the actual value, predicted value (.fitted) and residual (.resid). I’m only showing the first three columns because they are the only ones we’ve covered so far, and I only show the first five rows to save space.

library(broom)
lm(elevationalRangeShift ~ 1, data = range_shift) |>
  augment()|>
  select(1:3)|>
  slice_head(n = 5)