• 19. Significance groups

Motivating Scenario: You have conducted your post-hoc tests. So, you know if a given pair differ significantly from one another. Now, you want to effectively summarize all these pairwise differences. Our goal here is to simplify a complex table of pairwise comparisons into a compact, easy-to-read and interpret summary.

Learning Goals: By the end of this subchapter, you should be able to:

  • Derive and label significance groups (“a”, “b”, etc.) from pairwise comparison results.
  • Explain the meaning of overlapping letters (e.g., “a,b”) in significance group displays.
  • Visualize significance groups on plots to clearly communicate post-hoc results with ggplot.

comparison p.adj significant
S22-SR 0.0000 TRUE
SM-SR 0.3378 FALSE
S6-SR 0.0000 TRUE
SM-S22 0.0000 TRUE
S6-S22 0.4859 FALSE
S6-SM 0.0000 TRUE

So we now know which pairs differ from one another! But, there is a real mental burden in trying to make sense of results from many pairwise tests. Defining “significance groups” helps us present results of a post-hoc test. Here, we assign the same letter to groups that do not significantly differ, and different letters to groups that do. In our case we see that:

These letter-based summaries are handy for communicating complex post-hoc results. But remember, they are shorthand for results of your post-hoc tests, not new analyses. We can communicate these groups by adding this information to a plot:

library(ggforce)
hz_summary <-clarkia_hz |>
  group_by(site)|>
  summarise(mean_admix_proportion = mean(admix_proportion ),
            mean_plus_two_sd = mean_admix_proportion + 2*sd(admix_proportion))|>
  mutate(sig_group = case_when(site %in% c("SR","SM")~"a",
                               site %in% c("S22","S6")~"b"))

clarkia_hz |>
  ggplot(aes(x = site, y = admix_proportion , color = site))+
  geom_boxplot(outliers = FALSE)+
  geom_sina(size =5, alpha = .64)+
  geom_text(data = hz_summary, aes(y = .05, label = sig_group),
            size = 8, position = position_nudge(x = .3))+
  theme(legend.position = "none")

Tricky cases

comparison significant
X-Y TRUE
X-Z FALSE
Y-Z FALSE

Consider the hypothetical (and not uncommon) outcome of a post-hoc test, displayed in the right margin. Here:

  • Groups X and Y significantly differ. So we put them in different significance groups.
    • We’ll assign X to group “a”.
    • We’ll assign Y to group “b”.
  • But Z differs from neither group. So,
    • We’ll assign Z to groups “a” and “b” because it does not significantly differ from either group. In a plot we would show this as “a,b”.

When you see a category with two letters (like ‘a,b’), it means the sample is not significantly different from categories in group a and in group b, but all categories uniquely assigned to a are significantly different from all categories uniquely assigned to group group b.

R automation

As you might imagine, logic-ing this all out can get messy. Luckily, you can pipe your glht() output into the cld() function which will assign samples to “significance groups”:

library(multcomp)
lm(admix_proportion ~ site, data = clarkia_hz)|>
    glht(linfct = mcp(site = "Tukey"))|>
    cld()
 SR S22  SM  S6 
"a" "b" "a" "b"