• 11. Sampling Better

Motivating Scenario: We’ve seen that, even at best, we cannot avoid sampling error, because chance is inherent in sampling. We’ve also seen that in the real world, sampling bias and non-independence can further complicate our aims of estimation and hypothesis testing. In this section I go over sampling strategies to generate precise and accurate estimates.

Learning Goals: By the end of this chapter, you should be able to:

  1. Identify the two key features of an ideal sample (large and random) and explain what problems they solve.
  2. Know how to generate a random sample.
  3. Have some familiarity with good-enough sampling approaches when a random sample cannot be obtained.
  4. List the key principles of robust experimental design, including blinding and avoiding pseudoreplication, and what problems they solve.

Later, we will spend time considering experimental design, but given that we just worked through how sampling can go wrong, let’s consider how sampling and experimental design can go right.

The best sample is a large random sample.
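As a minimal sketch of what drawing a random sample can look like in practice, the Python snippet below draws a simple random sample from a sampling frame (a list of every unit in the population). The plant ID tags, the frame size of 500, and the helper name `simple_random_sample` are hypothetical and exist only for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance of being chosen.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: ID tags for 500 plants on a hillside
frame = [f"plant_{i:03d}" for i in range(1, 501)]

chosen = simple_random_sample(frame, 20, seed=1)
print(sorted(chosen))
```

The key requirement is a complete list of the population to draw from. When no such list exists, a truly random sample is hard to obtain.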

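According to the law of large numbers, the mean of a random sample converges to the true population mean as the sample size grows. A large sample therefore reduces sampling error, while random sampling guards against sampling bias and non-independence.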
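A small simulation can make the law of large numbers concrete. The sketch below is only illustrative: the population (100,000 made-up plant heights drawn from a normal distribution), the sample sizes, and the function name `sample_mean_errors` are all assumptions, not data from this chapter.

```python
import random
import statistics


def sample_mean_errors(population, sample_sizes, seed=0):
    """Return {n: |sample mean - population mean|} for random samples of size n."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = {}
    for n in sample_sizes:
        sample = rng.sample(population, n)  # random sample without replacement
        errors[n] = abs(statistics.fmean(sample) - true_mean)
    return errors


# Hypothetical population: 100,000 made-up plant heights (mean 50, sd 10)
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(100_000)]

errors = sample_mean_errors(population, [10, 100, 1_000, 10_000, 100_000])
for n, err in errors.items():
    print(f"n = {n:>6}: |sample mean - population mean| = {err:.4f}")
```

Because each sample is itself random, the error need not shrink at every single step in one run, but it tends toward zero as n grows, and it vanishes when the sample is the whole population.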

Of course, while a random sample is ideal, obtaining one is not always feasible. Your best alternatives include systematic sampling, e.g. stretching a 100-foot transect line across the hillside and, every 5 feet, sampling the single plant closest to the line (and within x square feet). This disciplined approach prevents you from sampling only the convenient spots or the big, showy patches of flowers.
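The transect example can be sketched in a few lines of Python. The 100-foot length and 5-foot interval come from the example above. The function name `systematic_positions` and the optional random starting point are assumptions added for illustration (a random start is one common way to avoid always sampling the same spots).

```python
import random


def systematic_positions(transect_length, interval, start=0.0):
    """Return sampling positions along a transect, one every `interval` units."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= start <= transect_length:
        raise ValueError("start must lie on the transect")
    n_stops = int((transect_length - start) // interval) + 1
    return [start + k * interval for k in range(n_stops)]


# Stops every 5 feet along a 100-foot transect, starting at the 0-foot mark
stops = systematic_positions(100, 5)
print(len(stops), stops[:4], stops[-1])

# Assumed variation: begin at a random point within the first interval
rng = random.Random(7)
random_start_stops = systematic_positions(100, 5, start=rng.uniform(0, 5))
```

At each stop, the field rule from the example still applies: sample the single plant closest to the line, not the most eye-catching one.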

Best practice for experiments

Figure 1

There are also things we can do to minimize bias and non-independence in experiments. As stated previously, when conducting human trials, a double-blind study is best. Similarly, when working with non-humans (plants, animals, bacteria, etc.), it’s best if the observer is “blind” to the treatment. However, in some cases “blinding” is impossible or impractical: for example, how can you watch pollinators and not see petal color? (See also Figure 1.)

Finally, it is best to avoid pseudoreplication by spreading treatments across conditions; however, this is not always possible. For example, because pesticides often “drift”, we usually need to put distance between a pesticide treatment and a no-pesticide treatment. Later in the term we will learn how to model such non-independence with random effect models.
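When treatments can be spread across conditions, one way to do so is to randomize treatments separately within each condition (often called a randomized block design). The sketch below is an assumption about how that might be set up, not a method prescribed in this chapter. The block names, plot IDs, treatment labels, and the function name `randomize_within_blocks` are all hypothetical.

```python
import random
from collections import Counter


def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments to plots, balanced within every block.

    blocks: dict mapping block name -> list of plot IDs.
    Each block must hold a multiple of len(treatments) plots.
    Returns a dict mapping plot ID -> (block name, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, plots in blocks.items():
        if len(plots) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be balanced")
        labels = list(treatments) * (len(plots) // len(treatments))
        rng.shuffle(labels)  # random order of treatments within this block
        for plot, label in zip(plots, labels):
            assignment[plot] = (block, label)
    return assignment


# Hypothetical layout: three field locations with four plots each
blocks = {
    "north_slope": ["N1", "N2", "N3", "N4"],
    "valley": ["V1", "V2", "V3", "V4"],
    "south_slope": ["S1", "S2", "S3", "S4"],
}
assignment = randomize_within_blocks(blocks, ["treated", "control"], seed=3)

# Count how many plots received each treatment in each block
counts = Counter(assignment.values())
for (block, treatment), n in sorted(counts.items()):
    print(block, treatment, n)
```

Because every block contains both treatments, a difference between locations cannot masquerade as a treatment effect. This does not address cases like pesticide drift, where treatments must be kept physically apart.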