Which brand of chocolate chip makes the best-tasting cookies? Is the tree outside your window causing your runny nose? Why won’t your car start? If you want to answer questions like these, you’ll probably need to do some testing. But not all tests are created equal. To figure out the real answers to such questions, you’ll need to test your ideas in a fair way.
The considerations that go into making “everyday” tests fair are the same ones that scientists consider when they test their ideas using experiments and other methods. Whether one wants to optimize a chocolate chip cookie recipe, develop effective treatments for Alzheimer disease, learn more about how mass extinctions work, or investigate the workings of gravity, the components of a fair test are the same:
- Comparing outcomes. To be confident in test results, it’s generally important to have something to compare them to. So, for example, in your cookie test, you’d want to actually compare batches of cookies made with different brands of chocolate chips. You might also want to make a batch without any chocolate chips at all — just to make sure that the chocolate chips are really making a difference in the cookies’ taste. Making just one batch of cookies with one brand of chocolate and seeing how they taste wouldn’t help answer your question. In experiments, whatever you are comparing your test results to is sometimes called the control group or control treatment. But don’t confuse the control group with …
- Controlling variables. In most tests, we want to be confident in the relationship between cause and effect. Is it really the chocolate chip brand, and not the baking temperature, that makes one cookie taste better than another? To be able to make a strong statement about cause and effect, you’ll need to control variables: that is, try to keep everything about the test comparisons the same, except for the variables you’re interested in. So in the cookie case, this would mean standardizing, for every batch, the dough recipe, the method for mixing and baking the dough, and the procedure for tasting and rating the cookies. The only element that should vary across batches is the one variable you’re interested in: brand of chocolate.
- Avoiding bias. No matter how hard we humans try to be objective, bias can sneak into our observations and judgments. In a sense, bias occurs because it’s very difficult to “control” variables associated with human judgments. For example, your cookie tasters might be hungry, and so the first cookie they eat could seem tastier to them than the rest. To avoid this potential source of bias, you’d want to set up the test so that different testers taste the cookies in different orders. And if testers knew which cookies were made with which brands of chocolate, they might be subconsciously biased towards more expensive chocolate brands. To avoid this, you could label your cookie batches with letters instead of brand names. It’s even possible that you, the cookie baker, would give subtle clues to your tasters if you knew that Cookie B was made with your personal favorite brand of chocolate. So, you might want to arrange to stay out of the room while the tasting is going on.
- Distinguishing chance from real differences. All sorts of subtle things that you either don’t or cannot control can affect the outcome of a test. Some cookies in a batch might have wound up with a few fewer chocolate chips than others. The oven might have heated unevenly and burnt a few cookies. One taster might have been distracted during the test and not given careful ratings. All of these random factors will affect the outcome of the test, but in small ways. So how do you know if the difference between a cookie with an average rating of 4.1 and one with an average rating of 4.25 is due to random factors or a real difference in chocolate brand? First, sample size is important. Cookies from each batch should be rated by many different people. The larger your sample size, the more likely it is that these random factors will cancel each other out and that real differences (if they exist) can be detected statistically. That leads to our second point: statistics can be used to analyze your raw data. The purpose of conducting such statistical tests is to tell you how likely it is that a difference in rating like the one that you observed is actually due to random factors.
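One way to make the last point concrete is a permutation test, sketched below in Python. The article does not name a particular statistical test, so this is one reasonable choice rather than the method. The ratings are made up for illustration (20 tasters per batch, chosen so the averages come out to 4.1 and 4.25), and the function name `permutation_test` is hypothetical. The idea: if brand made no difference, the brand labels on the ratings would be interchangeable, so we shuffle the ratings many times and count how often chance alone produces a gap at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_test(ratings_a, ratings_b, n_shuffles=10_000, seed=0):
    """Return the observed gap in mean rating and the share of random
    relabelings that produce a gap at least that large."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(ratings_a) - mean(ratings_b))
    pooled = list(ratings_a) + list(ratings_b)
    n_a = len(ratings_a)

    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the brand labels were assigned at random
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed - 1e-9:  # small tolerance for float rounding
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles


# Made-up ratings on a 1-5 scale (assumption, not data from the article).
brand_a = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 4, 4, 5, 3, 4, 5, 4, 4, 4, 4]  # mean 4.1
brand_b = [4, 5, 5, 4, 4, 4, 5, 4, 3, 5, 4, 5, 4, 4, 5, 4, 3, 5, 4, 4]  # mean 4.25

observed_gap, p_value = permutation_test(brand_a, brand_b)
print(f"observed gap: {observed_gap:.2f}, share of shuffles at least as large: {p_value:.2f}")
```

A large share means that random factors alone often produce a gap this size, so these ratings would not be strong evidence of a real difference between brands. A very small share would suggest the difference is unlikely to be due to chance.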
DETECTING THE DIFFERENCES: STATISTICS AND SAMPLE SIZE
You might be wondering, what counts as a “large” sample size? Twenty, 200, or 2000 chocolate chip cookies? Well, it depends on how small a difference between groups you want to be able to detect. If you are interested in very tiny differences (e.g., subtle differences between chocolate chip brands), you need a very large sample size, and if you only care about pretty big differences (e.g., the difference between yummy and disgusting), you can get away with a smaller sample size. The appropriate sample size depends on the statistical tests you want to run and the sorts of differences you want to detect.
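The trade-off between sample size and the size of the difference can be sketched with a common textbook approximation for comparing two group averages. Everything specific here is an assumption for illustration: the function name `tasters_per_batch` is hypothetical, the typical spread of individual ratings (0.65 points) is invented, and the conventional choices of a 5% false-alarm rate and an 80% chance of detecting a real difference are defaults, not values from the article.

```python
from math import ceil
from statistics import NormalDist


def tasters_per_batch(difference, spread, alpha=0.05, power=0.80):
    """Approximate number of ratings needed per batch to detect a given
    difference in average rating (normal approximation, two-sided test)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # false-alarm threshold
    z_power = standard_normal.inv_cdf(power)          # chance of catching a real effect
    return ceil(2 * ((z_alpha + z_power) * spread / difference) ** 2)


# Assumed spread of individual ratings: 0.65 points on a 1-5 scale.
subtle = tasters_per_batch(difference=0.15, spread=0.65)   # 4.1 vs. 4.25
obvious = tasters_per_batch(difference=1.0, spread=0.65)   # a full point apart

print(f"subtle difference: about {subtle} tasters per batch")
print(f"obvious difference: about {obvious} tasters per batch")
```

Under these assumptions, detecting the subtle 0.15-point difference calls for a few hundred tasters per batch, while a full-point difference needs fewer than ten. The approximation is rough for very small groups, but it shows why the answer to "how large is large?" depends on the difference you hope to detect.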
Take a sidetrip
Advanced: Visit the Visionlearning website to learn more about the role of statistics in science.
It is often impossible to make a test perfectly fair, and each issue listed above may be more or less important for a particular test — but by considering each of these factors in how your test is designed, you can maximize the amount of useful information you get from the test.
Above, we gave an example of testing in everyday life, but the same set of considerations can be applied to tests in more traditionally scientific realms — and to tests that don’t involve experiments. To see real-life examples of fair test design in science, follow the links below:
- Fair tests in the field of medicine: Aiding Alzheimer patients
- Fair tests in the fossil record: Avoiding extinction
- Fair tests in physics: Examining eclipses
Advanced: Visit the Visionlearning website to learn more about the history of experimentation in science and experimental design.
Now it’s your turn. Test your knowledge by applying what you’ve learned about fair tests to these situations: