11. Sampling Error

Motivating Scenario:

You’ve collected data, avoided sampling bias, and now have a sample estimate, but you know it’s not the whole story. Because of sampling error, your estimate is almost certainly not the exact true value, and a different sample would have given a different estimate. How should you think about and quantify this uncertainty?

Learning Goals: By the end of this chapter, you should be able to:

  1. Define sampling error and explain why it is an unavoidable aspect of working with samples.
  2. Describe the sampling distribution and explain its role as the foundation for quantifying uncertainty.
  3. Understand the most common measure of uncertainty, the standard error.
  4. Explain how sample size affects sampling error and the precision of estimates.
  5. Describe the “file drawer problem” and explain why small sample sizes can lead to misleading or overestimated results.
Figure 1: Comparing parameters from our ‘population’ to estimates from our ‘sample’ for numerous traits.

We concluded the previous subsection on sampling by taking a sample from what we pretended was an entire population. We then compared the estimates from this sample to the true parameter. Repeating this exercise (Figure 1) reveals that sample estimates differ not just from population parameters, but also from each other.

Sampling error is unavoidable

Estimates from samples (e.g. the opaque bars in Figure 1) will differ from population parameters (e.g. the semi-transparent bars in Figure 1) due to chance. This chance deviation is called sampling error. Make no mistake: sampling error cannot be avoided.

Larger samples and more precise measurements can reduce sampling error, but it will always exist because we take our samples by chance. In fact, I would call it the rule of sampling rather than sampling error.

Much of the material in this chapter, and about half of the content for the rest of this term, focuses on how to deal with sampling error; it is the central concern of many statistical methods.

The interactive document below walks us through the idea of sampling and sampling error. Throughout, we pretend that the “sawmill” hybrid zone data represent a complete census of a population, and then “sample” from it for illustrative purposes.


Step 1: Sample from a population

Let’s take a sample from our population. How big should this sample be? It should be the same size as the sample in your study. Remember, we are simulating the process of sampling by taking a sample from our population. This is what we do in statistics!

The code below takes a random sample of ten plants from our population.

Run the code to see our sample!
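If you are following along outside the interactive document, the chunk below is a minimal sketch of this step. The data frame name (`sawmill_population`) and trait column (`petal_area`) are placeholders; swap in whatever the live exercise actually calls them.

```r
library(dplyr)

set.seed(42)            # make the "random" sample reproducible

sample_size <- 10       # one plant per row; we want ten of them

# slice_sample() picks sample_size rows at random, without replacement,
# mimicking the act of going out and sampling ten plants.
my_sample <- sawmill_population |>
  slice_sample(n = sample_size)

my_sample
```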


Step 2: Estimate

From our sample we make an estimate. The code below shows a histogram of our sample and marks the estimated means (blue lines).

Run the code to see our estimates!
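Here is a hedged sketch of what that estimation step might look like, again using the placeholder names `my_sample` and `petal_area` introduced above.

```r
library(ggplot2)

# Our estimate: the mean of the trait among the ten sampled plants
sample_mean <- mean(my_sample$petal_area)

# A histogram of the sample, with the estimated mean as a dashed blue line
ggplot(my_sample, aes(x = petal_area)) +
  geom_histogram(bins = 8) +
  geom_vline(xintercept = sample_mean, color = "blue", linetype = "dashed")
```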


Sample estimates differ from population parameters

We know we sampled randomly and without bias (because we used a computer to sample for us). Still, our estimates will differ from the population parameters simply by chance, because sampling is random. This random difference between estimates and parameters is called sampling error.

Run the code below to visualize sampling error as the difference between the estimates from our sample and the true parameters of our population.
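In code, sampling error is just a subtraction (placeholder names as above):

```r
# The parameter comes from the full (pretend) population;
# the estimate comes from our sample of ten plants.
population_mean <- mean(sawmill_population$petal_area)
sample_mean     <- mean(my_sample$petal_area)

# Sampling error: the chance difference between estimate and parameter
sample_mean - population_mean
```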


The sampling distribution

Grasping the concept of the sampling distribution is critical to understanding our goals this term. It requires imagination and creativity because we almost never have or can create an actual sampling distribution (since we don’t have access to the full population). Instead, we have to imagine what it would look like under some model given our single sample. That is, we recognize that we only have one sample and will not take another, but we can imagine what estimates from another round of sampling might look like. Above we took one sample of size ten from a population. Figure 2 builds the distribution of estimates we would get by repeatedly taking many samples of size ten.

Figure 2: An animation illustrating the creation of a sampling distribution. The left panel shows a histogram of each new random sample of 10 Clarkia flowers sampled from our sawmill population, with the sample’s mean (the estimate) shown as a dashed blue line and the true population parameter in red. The right panel builds the sampling distribution cumulatively, adding a new blue point for the mean of each new sample. By the end we have 500 samples, a good approximation of the sampling distribution. Note that we do not show every sample, as that would be too slow and boring.
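A sketch like the one below could generate the 500 estimates behind Figure 2; as before, `sawmill_population` and `petal_area` stand in for the real object and column names.

```r
library(ggplot2)

set.seed(42)
n_reps      <- 500
sample_size <- 10

# For each replicate: draw a fresh sample of ten plants and record its mean.
# Together, the 500 means approximate the sampling distribution.
sampling_dist <- replicate(
  n_reps,
  mean(sample(sawmill_population$petal_area, size = sample_size))
)

# The histogram of estimates, with the true parameter as a red line
ggplot(data.frame(estimate = sampling_dist), aes(x = estimate)) +
  geom_histogram(bins = 30) +
  geom_vline(xintercept = mean(sawmill_population$petal_area), color = "red")
```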

“Insanity is doing the same thing over and over and expecting different results.”

— Attributed to Albert Einstein

“The sampling distribution is doing the same thing over and over and expecting different results.”

— Me

In case you haven’t noticed, I think that understanding the sampling distribution is fundamental to understanding any bit of statistics. Not only is the sampling distribution conceptually key, but we put it to work constantly:

  • First, when we make an estimate from a sample, we build a sampling distribution around this estimate to describe the uncertainty in our estimate (see the upcoming section on Uncertainty).

  • Second, in null hypothesis significance testing (see the upcoming section on Hypothesis Testing), we compare our statistics to their sampling distribution under the null hypothesis to assess how easily our results could have arisen from sampling error alone.

Thus, the sampling distribution plays a key role in two of the major goals of statistics — estimation and hypothesis testing. Below, I introduce how we quantify uncertainty in relation to the sampling distribution.

But before doing so, I insist that you watch the first five minutes of the video below for the best explanation of the sampling distribution I’ve come across. I am making this point in so many ways because it is that important.

Quantifying uncertainty due to sampling error

The most common summary of uncertainty is the standard error.

  • The standard error quantifies the expected variability in estimates as the standard deviation of the sampling distribution. If we had a sampling distribution in hand we could find this in R as sd(my_sampling_dist).

We almost never have the population fully characterized (after all, that’s why we are sampling), so we never know the true sampling distribution. In the real world, we use mathematical or computational tricks to approximate a sampling distribution from the distribution of values in our one sample.
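For the mean, the standard mathematical trick is worth seeing explicitly: the standard error of a sample mean is the sample standard deviation divided by the square root of the sample size. Here is a small sketch using the placeholder names from the earlier chunks:

```r
# If we actually had a sampling distribution in hand (e.g. the 500
# simulated estimates behind Figure 2), its standard deviation is
# the standard error:
sd(sampling_dist)

# In the real world we have only one sample, so we approximate the
# standard error of the mean from that single sample:
sd(my_sample$petal_area) / sqrt(nrow(my_sample))
```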

Minimizing sampling error

We cannot eliminate sampling error, but we can do things to decrease it. Here are two ways we can reduce sampling error:

  1. Decrease the standard deviation in a sample. We only have so much control over this, because nature is variable, but more precise measurements, more homogeneous experimental conditions, and the like can decrease the variability in a sample.

  2. Increase the sample size. As the sample size increases, our sample estimate gets closer and closer to the true population parameter. This is known as the law of large numbers. Remember that changing the sample size will not decrease the variability in our sample; it will simply decrease the expected difference between the sample estimate and the population mean.

Return to our web exercise to explore how sample size (\(n\)) influences the extent of sampling error. To do so, simply change sample_size to a small number (e.g. 3) and a large number (e.g. 30) and compare the differences between estimates and parameters. You will need to rerun all three R code chunks, in order, and it’s probably best to do so a handful of times.
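If you would rather see the effect of sample size in one self-contained simulation (same placeholder names as before), something like this works:

```r
set.seed(42)

# Standard error of the mean for a given sample size, found by brute force:
# simulate many samples of size n and take the sd of their means.
se_for_n <- function(n, n_reps = 1000) {
  estimates <- replicate(
    n_reps,
    mean(sample(sawmill_population$petal_area, size = n))
  )
  sd(estimates)
}

se_for_n(3)    # small sample: estimates swing widely around the parameter
se_for_n(30)   # larger sample: estimates cluster tightly around it
```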

Be Wary of Exceptional Results from Small Samples

Because sampling error is most pronounced in small samples, estimates from small samples can easily mislead us. Figure 3 compares the sampling distributions for the proportion of Chinstrap penguins in samples of size five, thirty, and one hundred. About one-third of samples of size five have exactly zero Chinstrap penguins. Seeing no Chinstrap penguins in such a sample would be unsurprising but could lead to misinterpretation. Imagine the headlines:

“Chinstrap penguins have disappeared, and may be extinct!…”
— Some unscrupulous newspaper, probably.

The very same sampling procedure from that same population (with a sample size of five) could occasionally result in an extreme case where more than half the penguins are Chinstrap penguins (this happens in about 6% of samples of size five). Such a result would yield a quite different headline:

“Chinstrap penguins on the rise — could they be replacing other penguin species?”
— Some unscrupulous newspaper, probably.

A sample of size thirty is much less likely to mislead; it will contain zero Chinstrap penguins, or a majority of them, only about once in a thousand samples.
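You can check numbers like these yourself with the binomial distribution. This treats each penguin in a sample as an independent draw with a 0.198 chance of being a Chinstrap, which slightly idealizes sampling from a finite population but gives essentially the same answers:

```r
p_chinstrap <- 0.198   # true proportion of Chinstrap penguins

# Chance a sample of five contains zero Chinstrap penguins (~ one third)
dbinom(0, size = 5, prob = p_chinstrap)

# Chance more than half of a sample of five are Chinstraps (~ 6%)
1 - pbinom(2, size = 5, prob = p_chinstrap)

# Chance a sample of thirty has zero OR a majority of Chinstraps (~ 1 in 1000)
dbinom(0, size = 30, prob = p_chinstrap) +
  (1 - pbinom(15, size = 30, prob = p_chinstrap))
```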

The numbers I provided above are correct and somewhat alarming. But it gets worse—since unremarkable numbers are hardly worth reporting (illustrated by the light grey coloring of unremarkable values in Figure 3), we’ll rarely see accurate headlines like this:

“A survey of penguins shows an unremarkable proportion of three well-studied penguin species…”
— A responsible, but quite boring newspaper.
Figure 3: Comparing the sampling distribution of chinstrap penguins in samples of size five, thirty, and one hundred. The true population proportion is 0.198. Bars are colored by whether they are likely to be reported (less than 5% or more than 39%), with unremarkable observations in dark green.

In summary – whenever you see an exceptional claim, be sure to look at the sample size and measures of uncertainty. For a deeper dive into this issue, check out this optional reading: The Most Dangerous Equation (Wainer, 2007).

Small Samples, Overestimation, and the File Drawer Problem

Let’s say you have a new and exciting idea—maybe a pharmaceutical intervention to cure a deadly cancer. Before you commit to a large-scale study, you might do a small pilot project with a limited sample size. This is a necessary step before getting the funding, permits, and time needed for a bigger study.

  • What if you found an amazing result? The drug worked even better than you expected! You would likely shout it from the rooftops—issue a press release, etc.
  • What if you found something subtle? The drug might have helped, but the result is inconclusive. You might keep working on it, but more likely, you’d move on to a more promising target.

After reading this section, you know that both of these outcomes could happen for two drugs with the exact same effect (see Figure 3). This combination of sampling and human nature has the unfortunate consequence that reported results are often biased toward extreme outcomes.

This issue, known as the file drawer problem (because underwhelming results are kept in a drawer somewhere, waiting for a mythical day when we have time to publish them), means that reported results are often overestimated, modest effects are under-reported, and follow-up studies tend to show weaker effects than the original studies. Importantly, this happens even when experiments are performed without bias, and insisting on statistical significance doesn’t solve the problem. It is therefore exceptionally important to report all results—even boring, negative ones.