In the previous section, we calculated a difference in \(\text{log}(\text{visits} + 0.2)\) between pink and white flowers at site SR of \(0.366 - (-0.445) = 0.811\). We also quantified uncertainty about this estimate as a standard error of 0.161. Recalling the equation for t:
\[t = \frac{\bar{x}-\mu_0}{s_{\bar{x}}} = \frac{(\bar{x}_1-\bar{x}_2)-0}{s_{\bar{x}_1-\bar{x}_2}}\]
In this case,
\[t = \frac{0.366 - (-0.445)}{0.161} = 5.04\]
Since we had 105 degrees of freedom, our critical t is going to be close to 1.96. Because our observed t-value of \(\approx 5\) is way bigger than our critical t-value of \(\approx 2\), we strongly reject the null hypothesis that pink- and white-flowered Clarkia xantiana subspecies parviflora RILs at site SR are visited equally by pollinators.
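To make that comparison concrete, here is a minimal sketch of the calculation in R, plugging in the estimate, standard error, and degrees of freedom from above:

# Plug in the summaries from the previous section
diff_means <- 0.366 - (-0.445)               # estimated difference in log(visits + 0.2)
se_diff    <- 0.161                          # standard error of that difference
df_t       <- 105                            # degrees of freedom

t_obs  <- diff_means / se_diff               # observed t, about 5.04
t_crit <- qt(p = 0.975, df = df_t)           # critical t for a two-sided test at alpha = 0.05, about 1.98
p_val  <- 2 * pt(q = -abs(t_obs), df = df_t) # two-tailed p-value
c(t = t_obs, critical_t = t_crit, p = p_val)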
We can use the “formula” syntax in the t.test() function to have R test this null for us. Note that we set var.equal = TRUE. We see that this provides p-values and 95% confidence intervals identical to what we calculated ourselves:
t.test(log_visits ~ petal_color, data = SR_rils, var.equal = TRUE)
Two Sample t-test
data: log_visits by petal_color
t = 5.0312, df = 105, p-value = 0.000002024
alternative hypothesis: true difference in means between group pink and group white is not equal to 0
95 percent confidence interval:
0.4916664 1.1312798
sample estimates:
mean in group pink mean in group white
0.3661857 -0.4452874
Again, we can make this output tidy / easier to process in R with the tidy() function in the broom package:
For some reason that I don’t understand, tidy() labels the column with the “degrees of freedom” as “parameter”. smdh.
library(broom)
t.test(log_visits ~ petal_color, data = SR_rils, var.equal = TRUE) |> tidy()
estimate  estimate1  estimate2  statistic  p.value  parameter  conf.low  conf.high  method             alternative
0.811     0.366      -0.445     5.031      2.0e-6   105        0.492     1.131      Two Sample t-test  two.sided
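Because tidy() returns a regular data frame, we can pull out just the numbers we want downstream. A small sketch, assuming dplyr is also loaded:

library(dplyr)
tidy_t <- t.test(log_visits ~ petal_color, data = SR_rils, var.equal = TRUE) |> tidy()
tidy_t |> select(estimate, conf.low, conf.high, p.value)  # the difference, its 95% CI, and the p-value
tidy_t |> pull(parameter)                                 # the oddly named degrees-of-freedom column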
If our data were untransformed, or if the transformation led to a clean biological interpretation, we would be done. Our (transformed) data met test assumptions, we got interesting results, etc. However, I have no idea how to interpret \(\text{log}(\text{visits} + 0.2)\). So, for example, the standard error and 95% confidence interval around our estimated mean difference are not easy to interpret. In such cases we have to use our understanding of biology, the value we place on clear communication, and our understanding of statistics and statistical assumptions to present a responsible and interpretable analysis. Below, I provide one potential route, which builds on our bootstrap results.
A two-sample t-test is often robust
Now that we know we can reject the null when the data are transformed to meet the assumptions of the two-sample t-test, and we have an estimate of the bootstrap 95% confidence interval, we could complement these analyses with an analysis of the untransformed data. This step is not always necessary or reliable, but here we are using our brains to tell a coherent story.
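As a reminder of where that bootstrap interval comes from, here is a minimal sketch of one way to bootstrap the untransformed difference in means (assuming dplyr is loaded; the earlier section's implementation may differ in its details):

# Resample RILs within each petal color and recompute the difference in means many times
set.seed(1)
boot_diffs <- replicate(1000, {
  boot_sample <- SR_rils |>
    group_by(petal_color) |>
    slice_sample(prop = 1, replace = TRUE) |>
    ungroup()
  boot_means <- boot_sample |>
    group_by(petal_color) |>
    summarise(m = mean(mean_visits, na.rm = TRUE), .groups = "drop")
  boot_means$m[boot_means$petal_color == "pink"] -
    boot_means$m[boot_means$petal_color == "white"]
})
quantile(boot_diffs, probs = c(0.025, 0.975))  # bootstrap 95% CI for the difference in visits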
Now let’s use R to test the null hypothesis and estimate confidence intervals. For fun, let’s compare results from an analysis assuming equal variance to one that does not:
bind_rows(
  t.test(mean_visits ~ petal_color, data = SR_rils, var.equal = TRUE)  |> tidy(),
  t.test(mean_visits ~ petal_color, data = SR_rils, var.equal = FALSE) |> tidy()
)
estimate  estimate1  estimate2  statistic  p.value  parameter  conf.low  conf.high  method                   alternative
1.026     1.759      0.733      3.908      1.65e-4  105.0      0.505     1.546      Two Sample t-test        two.sided
1.026     1.759      0.733      4.047      1.61e-4  89.6       0.522     1.529      Welch Two Sample t-test  two.sided
Welch’s two-sample t-test does not assume equal variance, and in practice it is universally better than the standard two-sample t-test (that’s why R uses it as the default).
However, the standard two-sample t-test is often good enough. It also has the benefit of closely resembling other linear modelling approaches, and it is simpler mathematically, so we usually teach the standard two-sample t-test. In practice the difference between the two rarely matters, except when the variances of the groups are massively different.
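Concretely, the two tests differ mainly in their degrees of freedom: the standard test uses \(n_1 + n_2 - 2\) (105 here), while Welch’s test approximates the degrees of freedom with the Welch–Satterthwaite equation, which is where the 89.6 in the output above comes from:

\[\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}\]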
Writing up results
Now we can write up our results. Note this takes some thinking because we had to make some decisions. Here’s my attempt:
At the Sawmill Road site, pink-flowered Clarkia xantiana ssp. parviflora RILs received, on average, more pollinator visits during a 15-minute observation than white-flowered RILs (mean pink = 1.76, mean white = 0.73). The mean difference of 1.03 visits was statistically significant (Welch’s t = 4.05, df = 89.6, p = 0.00016) and associated with a moderate-to-large effect size (Cohen’s D = 0.75). Results were robust to the right-skew in the data: a log(x + 0.2)-transformed analysis yielded an even stronger signal (t = 5.03, df = 105, p = 0.000002), and bootstrap confidence intervals closely matched analytic ones. We reject the null and conclude that pink-flowered plants attract more pollinators than white-flowered plants at this site.
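For reference, the effect size quoted above can be recovered from the untransformed data with a pooled-standard-deviation calculation like the sketch below (assuming dplyr is loaded; the earlier chapter may have used a dedicated function, but the arithmetic is the same idea):

# Cohen's d: difference in means divided by the pooled standard deviation
group_stats <- SR_rils |>
  group_by(petal_color) |>
  summarise(m = mean(mean_visits, na.rm = TRUE),
            s = sd(mean_visits, na.rm = TRUE),
            n = n(),
            .groups = "drop")

pooled_sd <- with(group_stats,
                  sqrt(((n[1] - 1) * s[1]^2 + (n[2] - 1) * s[2]^2) / (sum(n) - 2)))
with(group_stats, (m[1] - m[2]) / pooled_sd)  # pink minus white ("pink" sorts before "white")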