3. Two categorical variables

Motivating scenario: You want to explore how two categorical variables are associated.

Learning goals: By the end of this sub-chapter you should be able to

  1. Make barplots with geom_bar() and geom_col().
  2. Make stacked and grouped barplots.
  3. Know when to use geom_bar() and when to use geom_col().

Categorical explanatory and response variables

Above, we saw that most plants received no visits, so we might prefer to compare the proportion of plants that did and did not receive a visit from a pollinator by some explanatory variable (e.g. petal color or location). Recall that we have added the logical variable, visited, by typing mutate(visited = mean_visits > 0).

Making bar plots: A step-by-step guide

There are two main geoms for making bar plots, depending on the structure of our data:

  • If we have raw data (i.e. a dataset with one row for each observation), use geom_bar(), which counts the rows for us.
  • If we have aggregated data (i.e. a summary of a dataset, with a count for each combination of variables), use geom_col().
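To make the distinction concrete, here is a minimal sketch using a tiny made-up dataset. The objects raw_obs and aggregated_obs and their values are hypothetical and are not part of ril_data. It shows that count() turns raw data into aggregated data, and that each geom is matched to one of the two shapes.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical toy data (not from ril_data): one row per plant.
raw_obs <- tibble(
  petal_color = c("pink", "pink", "pink", "white", "white"),
  visited     = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# count() collapses the raw rows into one row per combination,
# storing the tally in a column called n.
aggregated_obs <- raw_obs |>
  count(petal_color, visited)

# Raw data: geom_bar() counts the rows for us, so we do not map y.
p_raw <- ggplot(raw_obs, aes(x = petal_color, fill = visited)) +
  geom_bar()

# Aggregated data: geom_col() draws the counts we supply through y.
p_agg <- ggplot(aggregated_obs, aes(x = petal_color, y = n, fill = visited)) +
  geom_col()
```

Printing p_raw and p_agg should give bars of the same heights, because geom_bar() does the counting that count() did by hand for geom_col().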

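Note: Here we map petal color onto the x-axis, and visited (TRUE / FALSE) onto the fill aesthetic. The first plot uses the default position, which stacks the visited and unvisited counts within each bar. The second sets position = "dodge", which places them side by side as grouped bars.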

# Stacked barplot (the default position)
ril_data |>
  filter(!is.na(petal_color), !is.na(mean_visits)) |>
  mutate(visited = mean_visits > 0) |>
  ggplot(aes(x = petal_color, fill = visited)) +
  geom_bar()

# Grouped barplot (bars side by side)
ril_data |>
  filter(!is.na(petal_color), !is.na(mean_visits)) |>
  mutate(visited = mean_visits > 0) |>
  ggplot(aes(x = petal_color, fill = visited)) +
  geom_bar(position = "dodge")

If you had aggregated data, like that below, you would need to plot these data somewhat differently. There are two key differences:

  1. We map our count (in this case n) onto the y aesthetic.
  2. We use geom_col() instead of geom_bar().

location  petal_color  visited   n
GC        pink         FALSE    32
GC        pink         TRUE     23
GC        white        FALSE    46
GC        white        TRUE      2
SR        pink         FALSE     1
SR        pink         TRUE     56
SR        white        FALSE    11
SR        white        TRUE     39

ggplot(data = aggregated_pollinator_obs,
       aes(x = petal_color, y = n, fill = visited)) +
  geom_col()

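Interpretation: We see that a greater proportion of pink-flowered plants than white-flowered plants receive visits (79 of 112 pink-flowered plants versus 41 of 98 white-flowered plants in the table above, pooling both locations).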
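Because stacked counts can make proportions hard to judge by eye, it may help to compute them directly. The sketch below is one possible approach, not necessarily how the figures in this chapter were made. It types in the aggregated table by hand, and the names visit_summary, n_plants, n_visited, prop_visited and p_fill are made up for illustration.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# The aggregated counts from the table above, entered by hand.
aggregated_pollinator_obs <- tribble(
  ~location, ~petal_color, ~visited, ~n,
  "GC",      "pink",       FALSE,    32,
  "GC",      "pink",       TRUE,     23,
  "GC",      "white",      FALSE,    46,
  "GC",      "white",      TRUE,      2,
  "SR",      "pink",       FALSE,     1,
  "SR",      "pink",       TRUE,     56,
  "SR",      "white",      FALSE,    11,
  "SR",      "white",      TRUE,     39
)

# Proportion of plants visited for each petal color, pooling both locations.
visit_summary <- aggregated_pollinator_obs |>
  group_by(petal_color) |>
  summarise(
    n_plants     = sum(n),           # all plants of this color
    n_visited    = sum(n[visited]),  # only rows where visited is TRUE
    prop_visited = n_visited / n_plants
  )

# position = "fill" rescales every bar to a total height of 1,
# so the fill segments show proportions rather than counts.
p_fill <- ggplot(aggregated_pollinator_obs,
                 aes(x = petal_color, y = n, fill = visited)) +
  geom_col(position = "fill") +
  labs(y = "Proportion of plants")
```

With these counts, prop_visited is 79/112 (about 0.71) for pink and 41/98 (about 0.42) for white. Adding location to group_by() would give the same comparison separately for GC and SR.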