Section II: Summarizing data

The major goals of statistics are to (1) summarize data, (2) estimate uncertainty, (3) test hypotheses, (4) infer cause, and (5) build models. Now that we can do things in R, we are ready to begin our journey through these goals. In this section, we focus on the first goal: summarizing data.


It is somewhat weird to start with summarizing data without also describing uncertainty because, in the real world, data summaries should always be accompanied by some estimate of uncertainty. However, biting off both challenges at once is too much. So for now, as we move forward in summarizing data, remember that a summary is just the beginning and is inadequate on its own.

Even though we aren’t tackling uncertainty yet, summarizing data on its own is already incredibly useful. Understanding and interpreting summaries helps us find patterns, spot errors, and build towards deeper statistical analysis.

Why summarize data?

Summarizing data serves several purposes:

Figure 1: A pretty scene of Clarkia’s home showing the world we get to summarize.

In this section, we’ll not only learn how to compute summaries but also how to think about them in a meaningful way. That means: