9. Dataviz Summary

Links to: Summary. Chatbot tutor. Questions. Glossary. R functions. R packages. More resources.

Chapter Summary

"By the year 2019, all information will be communicated in this format." The comic is filled with exaggerated visual cues including: a graph predicting "all information" converging to a point in 2019, a pie chart labeled "WILL BE" with a large arrow pointing to "6 years from now (72 months)," stick figures excitedly saying "Yes!" beneath the word "COMMUNICATED." Below, the phrase "IN THIS" appears in giant letters, followed by a Venn diagram of "CLEAR" and “CONCISE,” and a bar chart labeled "FORMAT," where the tallest bar is marked "THIS." The comic parodies overly designed and stylized infographics.

“Tall Infographics” cartoon from xkcd. Rollover text says: “Big data” does not just mean increasing font size. Explanation here.

An effective visualization allows you to rapidly communicate your key findings to your audience. The best visualizations are honest, transparent, clear, and accessible. They highlight important patterns while minimizing confusion or distraction, and they are thoughtfully tailored to both the audience and the format. Great figures avoid misleading elements and use design choices (e.g., captions, color, and labels) to guide interpretation.

Chatbot tutor

Please interact with this custom chatbot (link here), which I made to help you with this chapter. I suggest at least ten back-and-forth exchanges to ramp up, then stopping once you feel you have gotten what you need from it.

Practice Questions

Try these questions!

A four-panel figure comparing hemoglobin levels across four populations: Andes (red), Ethiopia (green), Tibet (cyan), and USA (purple). *Panel a:* shows overlapping density plots for all four populations using only color to distinguish groups. *Panel b:* is similar to panel a but includes population names ("USA" in purple, "Andes" in red) directly labeled on the plot. *Panel c:* is a sina plot with population (Andes, Ethiopia, Tibet, USA) on the x-axis and labeled in color. *Panel d:* uses "small multiples" via R's `facet_wrap()` function to create four vertically stacked individual density plots, each labeled with the population name (Andes, Ethiopia, Tibet, USA), with each density plot filled by the appropriate color.
Figure 1: Hemoglobin levels of people native to different countries.

Q1) Which of the plots in Figure 1 keep all information even if printed in black and white?

Q2) Which of the plots in Figure 1 is still somewhat useful but loses some information?

Two bar plots compare the proportion of offspring cannibalized by dominant male fish across three mating scenarios: "one father," "one sneaker," and "multiple sneakers." *Panel A* displays the bars in the order: multiple sneakers, one father, one sneaker. *Panel B* displays the same data but reorders the x-axis categories to: one father, one sneaker, multiple sneakers — aligning the categories with increasing risk of cuckoldry and making the pattern easier to interpret. In both panels, the proportion cannibalized increases with more sneaker males.
Figure 2: Cannibalistic dads.

Q3) Which plot in Figure 2 is better?

Q4) You chose your answer above because the better plot

A grouped bar chart titled "C) Dominant male fish are more likely to cannibalize their brood as risk of cuckoldry increases." The x-axis shows three mating conditions: "one father," "one sneaker," and "multiple sneakers." The y-axis represents count (frequency). For each mating type, two bars are shown: red for "no cannibalism" and teal for "yes cannibalism." The "one father" group shows a high number of "no" cases and moderate number of "yes" cases. The "one sneaker" and "multiple sneakers" groups have far fewer total cases, but a greater proportion of them result in cannibalism.
Figure 3: Cannibalistic dads, revisited.

Q5) Which feature of Figure 3 is better than Figure 2?

Q6) Which feature of Figure 2 is better than Figure 3?

A bar plot titled "D) More males acting as the only father cannibalize their brood than do males with sneakers." The x-axis shows three mating conditions: "one father," "one sneaker," and "multiple sneakers." The y-axis shows the count of males who cannibalized their brood. The tallest bar corresponds to "one father" (~60 males), while the other two bars ("one sneaker" and "multiple sneakers") are much shorter (~18 males each). However, this plot shows only the absolute number of cannibalism cases and does not account for differing total group sizes or proportions.
Figure 4: Cannibalistic dads, revisited (again).

Q7) What is the biggest problem with Figure 4?

The y-axis is labeled "count," and the plot shows only the number of cannibalism cases, not the proportion. Figure 2 plots proportions directly, and Figure 3 shows both outcomes so that proportions can be judged, but Figure 4 gives no sense of group size. As a result, a reader might misinterpret Figure 4 and incorrectly conclude that broods with only one father are the most susceptible to cannibalism.
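
To see how counts alone can mislead, here is a minimal sketch in base R. The numbers of cannibalizing males roughly match the bar heights described for Figure 4, but the group totals (`total_males`) are hypothetical values invented purely for illustration; they are not data from the study.

```r
# Number of males that cannibalized their brood
# (approximate bar heights described for Figure 4)
cannibalized <- c("one father" = 60, "one sneaker" = 18, "multiple sneakers" = 18)

# HYPOTHETICAL group sizes, invented for illustration only
total_males <- c("one father" = 300, "one sneaker" = 45, "multiple sneakers" = 30)

# Proportion of males in each group that cannibalized their brood
prop_cannibalized <- cannibalized / total_males

# Compare which group "wins" under each summary
names(which.max(cannibalized))       # group with the largest count
names(which.max(prop_cannibalized))  # group with the largest proportion
```

Under these assumed totals, "one father" has the largest count while "multiple sneakers" has the largest proportion, which is why a plot of counts alone can point a reader toward the wrong conclusion.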

Q8) Which of the figures above do you like best, and why?

A promotional graphic challenges viewers with the question: “Can you spot what’s wrong with this chart?” followed by the reassurance, "Its OK, most people miss it too..." The chart below shows quarterly bar heights labeled Q1 to Q1P, increasing progressively. A curved green arrow is superimposed over the bars. The vertical axis ranges from $10M to $275M with evenly spaced labels at $10M, $50M, $200M, $250M, and $275M. At the bottom, the graphic reads: "Time to level-up your data skills – TAKE THE FREE ASSESSMENT."
Figure 5: Data viz test from an online advertisement.

Q9) I stole Figure 5 from a company selling a data viz class. Examine their plot and find at least three bad data viz practices. Then say which one you think is the worst and why.

📊 Glossary of Terms

🏷️ 1. Figure Elements & Interpretation

  • Legend: A guide that explains the meaning of colors, symbols, or line types in a plot. Helpful when symbols are ambiguous, but often unnecessary when direct labeling is used.

  • Caption: Text beneath a figure that highlights the main point and guides the reader’s interpretation. A good caption doesn’t just restate what’s shown—it helps make sense of it.

  • Direct Labeling: Placing labels directly on or near data elements (e.g., lines, points, bars), so viewers don’t have to cross-reference with a legend. Especially useful in talks and posters.

  • Redundant Coding: Encoding the same variable multiple ways (e.g., using both color and shape for species). Can increase accessibility but should be used carefully to avoid clutter.
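
As a rough illustration of redundant coding, the sketch below maps one variable to both color and point shape. It assumes the ggplot2 package is installed and uses R's built-in `iris` dataset purely for convenience; neither the dataset nor the object name `p_redundant` comes from this chapter.

```r
library(ggplot2)

# Redundant coding: Species is mapped to BOTH color and point shape,
# so the groups stay distinguishable even without color.
# (iris is a built-in R dataset, used here only as a stand-in.)
p_redundant <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                                color = Species, shape = Species)) +
  geom_point(size = 2) +
  labs(x = "Sepal length (cm)", y = "Sepal width (cm)")

p_redundant
```

Because both aesthetics point to the same variable, ggplot2 merges them into a single legend, which keeps the extra encoding from adding clutter.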

2. Accessibility & Universal Design

  • Alt Text: A textual description of a figure, written for people who cannot see it. Good alt text conveys the message of the figure, not just its parts.

  • Accessibility: Designing figures so they can be understood by people with diverse abilities (e.g., colorblindness, low vision, screen reader users). Often overlaps with universal design.

  • Colorblindness: A common visual condition that affects how people perceive color. Plots should use color palettes and redundancy (e.g., line types) to remain interpretable without relying on color alone.

  • Universal Design: The principle of creating products and experiences—like data visualizations—that work well for as many people as possible, regardless of ability.

💥 3. Visual Clarity & Distraction

  • Overplotting: When data points are so densely packed they obscure patterns or hide important features. Common with large datasets; solutions include transparency, jittering, or summarizing.

  • Chartjunk: Any visual element in a plot that doesn’t help convey the data—like heavy gridlines, excessive shading, or unnecessary 3D effects. Coined by Edward Tufte.

  • Data Viz “Duck”: A graphic whose unnecessary visual decoration prioritizes aesthetics or novelty over clarity. Named after a duck-shaped building on Long Island.

  • Cognitive Burden: The mental effort required to interpret a figure. Good visualizations reduce cognitive burden by being clear, consistent, and well-structured.
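
Two of the overplotting remedies mentioned above, transparency and jittering, can be sketched as follows. This assumes ggplot2 is installed and uses its bundled `mpg` dataset as a stand-in; the jitter amounts and alpha value are arbitrary choices for illustration, not recommendations from this chapter.

```r
library(ggplot2)

# mpg ships with ggplot2; many cars share the same (cty, hwy) values,
# so plain points stack on top of one another and hide the data density.
p_overplotted <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# One possible remedy: nudge points by a small random amount (jitter)
# and make them semi-transparent (alpha) so dense regions look darker.
p_jittered <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3, alpha = 0.4) +
  labs(x = "City miles per gallon", y = "Highway miles per gallon")

p_jittered
```

Jittering adds random noise to the plotted positions, so it suits exploratory figures better than cases where readers need exact values from the plot.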

Key R Functions

This section focused on data visualization concepts rather than on R functions.

R Packages Introduced

This section focused on data visualization concepts rather than on R packages.

Additional resources

Other web resources:

Videos:

Podcasts:

Social: