• 10. Better ggplots summary

Links to: Summary. Chatbot tutor. Questions. Glossary. R functions. R packages. More resources.

Chapter summary

Polishing a ggplot plot is not about running a single command, but is an iterative process of refinement, moving from a default chart to a polished, explanatory figure.

We do this by layering components. We start with a basic plot and add functions to clarify labels (with the labs() function and the ggtext package) and control aesthetics like color and shape (with scale_*_manual()). We then adjust non-data elements with the theme() function, arrange categories logically with the forcats package, and combine plots into larger narratives with patchwork.

Throughout this process, we also make sure to consider the presentation format, tailoring our choices for each specific medium. A working understaing of ggplot is essential, but luckily you don’t need to have all this memorized—knowing how to use books, friends, chatbots, GUIs like ggThemeAssist, and other resources can help!

Chatbot tutor

Please interact with this custom chatbot (link here) I have made to help you with this chapter. I suggest interacting with at least ten back-and-forths to ramp up and then stopping when you feel like you got what you needed from it.


Practice Questions

The following questions will walk you through the iterative process of refining a plot, from a messy default to a polished, clear visualization.

To start, let’s create a basic plot from the palmerpenguins dataset. It shows the distribution of flipper lengths for each species. As you can see, it has several problems we need to fix!

library(ggplot2)
library(palmerpenguins)

# The base plot we will improve upon
base_plot <- ggplot(penguins, aes(x = species, y = flipper_length_mm)) +
  geom_point()

base_plot
Figure 1: Our starting point: a messy plot with overlapping points.

Q1) The plot above suffers from severe overplotting, making it hard to see the distribution of points. Which geom_* function is specifically designed to fix this by adding a small amount of random noise to the points’ positions?


Q2) Let’s fix the overplotting. In the R chunk below, replace geom_point() with the correct function from the previous question. To keep the data honest, make sure all y-values stay the same, and that x values are clearly associated with a category.

After running the corrected code, what is the best description of the resulting plot?

In the geom_*_ Set the height argument to 0 and providing a small width (e.g., 0.2)


Q3) Great! Now look at the x-axis. ggplot2 defaults to alphabetical order (Adelie, Chinstrap, Gentoo). Use fct_reorder() from the forcats package to reorder the species factor from largest to smallest median flipper length. Now which species now appears first (leftmost) on the x-axis?

Add a mutate() call before ggplot()

Add a .na_rm =TRUE to ignore NA values and .desc = TRUE to go from greatest to smallest.

library(forcats)

penguins |>
  mutate(species = fct_reorder(species, flipper_length_mm, median, .na_rm =TRUE, .desc = TRUE))|> 
  ggplot(aes(x = species, y = flipper_length_mm)) +
  geom_jitter(width = 0.2, height = 0)


Q4) Our plot is now well-organized, but the labels are not publication-ready. Add a labs() layer to the good_start plot above to achieve the following:

  • Set the title to “Penguin Flipper Lengths”
  • Set the x axis label to “Species”
  • Set the y axis label to “Flipper Length (mm)”

The labs() function is used to change titles and axis labels. If you also wanted to change the title of the color legend, which argument would you add inside labs()?


Q5) Now, imagine you need to put this plot on a slide for a presentation. The text is far too small. Add a theme() layer to the code below to make the axis titles (axis.title) size 20.

Inside theme(), the element_text() function has many arguments besides size. Which argument would you use to change the font from normal to bold?


Q6) Now you want the species names (e.g. Gentoo, Chinstrap, etc) and flipper lengths (e.g. 170, 210, etc.) to be large (size = 16). What would you add to the theme to do this?

Q7) Now you want the species names (e.g. Gentoo, Chinstrap, etc) but not flipper lengths (e.g. 170, 210, etc.) to be italicized. What would you add to the theme to do this? (NOTE we did this a different ay in the chapter)

Q8) The plot above shows the raw data well. Now, let’s add a summary statistic. In the webr chunk below, add a stat_summary() layer to display the mean value for each species as a large, black point (size = 5, color = "black"). Hint: You’ll need to specify fun = "mean" and geom = "point" inside stat_summary().

After successfully adding the summary layer, what new visual element appears on your plot?

Q9) Faceting is a powerful way to create “small multiples.” In the chunk above, add a facet_wrap() layer to the scatter plot to create separate panels for each island.

After adding the facet layer correctly, which Islands have Chinstrap penguins? (select all correct)


📊 Glossary of Terms

🎨 1. Core Visualization Concepts

  • Aesthetic Mapping: The process of connecting variables in the data to visual properties (aesthetics) of the plot, such as x/y position, color, shape, or size.
  • Geom (Geometric Object): The visual shape used to represent data, such as a point, a bar, or a line.
  • Layering: The process of building a plot by starting with a base and sequentially adding new visual elements on top of each other.
  • Cognitive Load: The mental effort required to interpret a plot. A well-designed figure reduces cognitive load by being clear and intuitive (e.g., using direct labels instead of a legend), allowing the reader to focus their brainpower on the data’s story, not on trying to figure out the plot itself.

📊 2. Representing and Arranging Data

  • Jittering: Adding a small amount of random noise to the position of data points to prevent them from overlapping perfectly, making it easier to see the distribution of dense data.
  • Data Summary: A statistical value (like a mean or confidence interval) calculated from raw data and added to a plot to help guide the reader’s eye to a key pattern.
  • Categorical Ordering: The deliberate arrangement of categorical data on an axis in a logical way (e.g., by size, by geographic location) to make trends more obvious, rather than using the default alphabetical order.
  • Redundant Coding: The practice of mapping a single variable to multiple aesthetics (e.g., mapping a category to both color and shape). This improves clarity and accessibility.

📝 3. Annotation and Polishing

  • Direct Labeling: The practice of placing text labels directly on or next to data elements, rather than in a separate legend, to make a plot easier to read.
  • Mathematical Notation: The use of specially formatted text in labels to represent mathematical symbols, such as superscripts (for units like mm²), subscripts, or Greek letters.
  • Plot Theme: The collection of all non-data elements of a plot that control its overall look and feel, such as the background color, grid lines, and font styles.
  • Color Palette: A set of colors chosen to represent data in a plot. Palettes can be chosen for clarity (qualitative), to show a gradient (sequential), or for aesthetic style.

🖼️ 4. Advanced Figure Types

  • Small Multiples (Faceting): A series of small plots that use the same scales and axes, with each plot showing a different subset of the data. This technique is used to make comparisons across groups.
  • Multi-Panel Figure: A single figure that combines several individual plots into a larger, organized layout, often used in scientific papers to tell a complex story in a compact space.
  • Isotype Plot: A type of chart that uses a repeated icon or image to represent quantity, often used in posters and infographics to be more engaging than a standard bar chart.
  • Infographic: A visual representation of information that blends data visualization, graphic design, and text to tell a compelling narrative, typically for a general audience.
  • Interactive Plot: A plot, typically for a digital medium, that allows the user to engage with the data by hovering, clicking, zooming, or filtering to explore the data themselves.

5. Principles & Philosophy

  • The “Brown M&M” Principle: A reference to the band Van Halen, who used a “no brown M&Ms” clause in their contract as a quick test for attention to detail. In data visualization, it refers to a small flaw in a plot (like a typo or misalignment) that signals a potential lack of care in the more critical underlying analysis, eroding audience trust.

Key R Functions

*All functions are in the ggplot2 package unless otherwise stated.

  • geom_image() ([ggimage]): Adds images to a plot, often used to represent data points or summaries.

📝 1. Labels & Annotations

  • labs(): The primary function for setting the plot’s title, subtitle, caption, and the labels for each axis and legend.

  • geom_label() / geom_text(): Adds text-based labels to a plot, mapping data variables to the label aesthetic. geom_label() adds a background box to the text.

  • annotate(): Adds a single, “one-off” annotation (like a piece of text or a rectangle) to a plot at specific, manually-defined coordinates.

  • element_markdown() In the ggtext package: Used inside theme() to render plot text (like axis labels or facet titles) that contains Markdown or HTML for styling.


🧮 2. Summaries & Ordering

  • stat_summary(): Calculates summary statistics (like means or confidence intervals) on the fly and adds them to the plot as a new layer (e.g., as bars or errorbars). This in gplot2, but required the Hmisc package.

  • fct_reorder() In the forcats package: Reorders the levels of a categorical variable (a factor) based on a summary of another variable (e.g., order sites by their mean petal area).

  • fct_relevel() In the forcats package: Reorders the levels of a factor “by hand” into a specific, manually-defined order.


🎨 3. Controlling Aesthetics (Colors, Shapes, etc.)


🖼️ 4. Arranging & Combining Plots

  • facet_wrap(): Creates “small multiples” by splitting a plot into a series of panels based on the levels of a categorical variable.

  • plot_annotation() In the forcats package: Adds overall titles and panel tags (e.g., A, B, C) to a combined figure.

  • plot_layout() In the forcats package: Controls the layout of a combined figure, such as collecting all legends into a single area.


⚙️ 5. Theming & Final Touches

  • theme(): The master function for modifying all non-data elements of the plot, such as backgrounds, grid lines, and text styles.

  • element_text(): Used inside theme() to specify the properties of text elements, like size, color, and face (e.g., “bold”).

  • theme_classic() / theme_bw(): Applies a complete, pre-made theme to a plot with a single command, often for a cleaner, publication-ready look.


R Packages Introduced

📦 Core Tidyverse & Plotting.

  • ggplot2: The core package we use for all plotting, based on the Grammar of Graphics.

  • dplyr: Used for data manipulation and wrangling, like filter() and mutate().

  • forcats: The essential tool for working with categorical variables (factors), especially for reordering them in a logical way.


🖼️ Arranging, Annotating & Polishing Plots.

  • patchwork: A tool for combining separate ggplot objects into a single, multi-panel figure.

  • ggtext: A powerful package that allows you to use rich text formatting (like Markdown and HTML) in plot labels for effects like italics and custom colors.

  • Hmisc: A general-purpose package that contains many useful functions, including mean_cl_normal for calculating confidence intervals in stat_summary().

  • scales: Provides tools for controlling the formatting of numbers and labels on plot axes and legends.


AFlair (Images, Animations & Themes)

  • ggimage: Used to add images as data points or layers in your plots.

  • ggtextures: Allows you to create isotype plots (bar graphs made of images).

  • gganimate: Brings your static ggplots to life by creating animations.

  • [ggthemes]https://github.com/jrnold/ggthemes): Provides a collection of additional plot themes, including styles from publications like The Economist and The Wall Street Journal.


🖱️ Interactivity

  • plotly: A powerful package for creating interactive web-based graphics. Its ggplotly() function can make almost any ggplot interactive with one line of code.

  • highcharter: Another popular and powerful package for creating a wide variety of interactive charts.

  • Shiny: R’s framework for building full, interactive web applications and dashboards directly from your R code.


🎨 Themed Color Palettes

  • MetBrewer: Provides beautiful and accessible color palettes inspired by artworks from the Metropolitan Museum of Art.

  • wesanderson: A fun package that provides color palettes inspired by the films of director Wes Anderson.


🛠️ Helper Tools

  • ggThemeAssist: An RStudio add-in that provides a graphical user interface (GUI) for editing theme() elements, helping you learn how to customize your plot’s appearance.

Additional Resources

Interactive Cookbooks & Galleries:

  • The R Graph Gallery: An extensive, searchable gallery of almost every chart type imaginable, created with ggplot2 and other R tools. Each example comes with the full, reproducible R code.

  • From Data to Viz: A fantastic tool that helps you choose the right type of plot for your data. It provides a decision tree that leads to ggplot2 code for each chart.

  • The ggplot2 Extensions Gallery: The official showcase of packages that extend the power of ggplot2 with new geoms, themes, and capabilities. A goodt place to discover new visualization tools in the R ecosystem.

  • A ggplot2 Tutorial for Beautiful Plotting in R Many fantastic examples. This is great! Have a look!

Online Books:

Videos & Community: