Motivating Example: You understand the t-distribution, and using it for estimating uncertainty and testing the null that a mean equals \(\mu_0\). But it’s a lot of work to do this. If only there were an easier way 🤔. Here, we show how R can do this for us.
Learning Goals: By the end of this section, you will be able to:
Perform a one-sample t-test using the t.test() function.
Use the broom package (tidy(), augment()) to generate clean, predictable output from model objects.
Frame a one-sample t-test as a simple linear model using the lm() function.
Interpret the output of summary() for a simple linear model.
“What does a one-sample t-test in R mean”, you ask, realizing we just did that. Let me explain…
Figure 1: R can make things easy! Image made with ChatGPT.
We just worked through the calculations for a confidence interval and a hypothesis test step by step. But R can do this for us in a single line. Now that you know how it works and what R is doing, I can show you how to have R do this work for you.
t.test()
The t.test() function is the easiest way to run a one-sample t-test in R. There are two relevant arguments:
x: A vector of observations.
mu: \(\mu_0\) – i.e. the expected value of x under the null hypothesis.
Note: Because x must be a vector, I pull() it out of its tibble.
t.test(x =pull(range_shift,elevationalRangeShift) , mu =0)
One Sample t-test
data: pull(range_shift, elevationalRangeShift)
t = 7.1413, df = 30, p-value = 0.00000006057
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
28.08171 50.57635
sample estimates:
mean of x
39.32903
Tidying up R model outputs
The output of t-test() includes:
The test statistic, t.
The degrees of freedom.
The p-value.
The 95% confidence interval.
The sample estimate (in this case the mean).
While all of these results match our calculations (and are faster and easier to do), there is one problem – they are a mess. That is, they are stored in a format that you can read, but it’s not easy to process with a computer. The tidy() function in the broom package offers a convenient format:
library(broom)t.test(x =pull(range_shift,elevationalRangeShift) , mu =0) |>tidy()
t.test() output piped through broom's tidy() function
estimate
statistic
p.value
parameter
conf.low
conf.high
method
alternative
39.33
7.14
6.06e-08
30
28.08
50.58
One Sample t-test
two.sided
The statistic here is the t-value. The statistic column will contain the value of whichever test statistic your analysis produces.
The general linear model framework in R
An alternative approach to the t.test() function is to use the R’s framework for linear models. Remember that a linear model predicts the value of the response variable of the \(i^{th}\) observation \(\hat{Y_i}\) as a function of things in our model. For a one sample t-test, our model only contains the mean, which we represent as the “intercept”, \(a\):
\[\hat{Y_i} = a\]
Of course, actual observations \(Y_i\), will deviate from model predictions. This deviation, known as the “residual” is shown as \(e_i\):
\[Y_i = \hat{Y_i} + e_i\]\[Y_i = a+e_i\]
The lm() function in R builds such linear models. If we are only modelling the mean, the syntax is:
lm(y ~ 1) if y is a vector, or
lm(y ~ 1, data = dataset) if y is a column in a tibble (or dataframe)
Here the 1 ells R to fit the simplest possible model: an intercept-only model. The estimate for this intercept will be the overall mean of our response variable:
This may seem like overkill for a one-sample t-test, but the linear model framework generalizes to regression, ANOVA, and more. Seeing a one-sample t-test as a linear model prepares us for the models we’ll meet next.
summary() provides p-values and test statistics
lm() just returns the mean, but it actually calculates so much more! To access information like the standard error, t-value, p value (reported as Pr(>|t|)), and degrees of freedom, pipe the output to summary():
lm(elevationalRangeShift ~1, data = range_shift) |>summary()
Call:
lm(formula = elevationalRangeShift ~ 1, data = range_shift)
Residuals:
Min 1Q Median 3Q Max
-58.629 -23.379 -3.529 24.121 69.271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.329 5.507 7.141 0.0000000606 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 30.66 on 30 degrees of freedom
tidy() cleans up the output
As above, we can use broom’s tidy() function to turn this output into a more convenient format:
library(broom)lm(elevationalRangeShift ~1, data = range_shift) |>tidy()
broom’s augment() function shows each observation’s the actual value, predicted value (.fitted) and residual (.resid). I’m only showing the first three columns because they are the only ones we’ve covered so far, and I only show the first five rows to save space.
library(broom)lm(elevationalRangeShift ~1, data = range_shift) |>augment()|>select(1:3)|>slice_head(n =5)