Chapter 7.4 Experimental Validity
If a study is valid, then it truly represents what it was intended to represent. Experimental validity concerns the variables that can influence both the results of the research and the generalizability of those results to the population at large. It is broken down into two types: (1) Internal Validity and (2) External Validity.
Internal validity refers to a study’s ability to determine whether a causal relationship exists between one or more independent variables and one or more dependent variables. In other words, can we be reasonably sure that the change (or lack of change) was caused by the treatment? Researchers must be aware of factors that may reduce the internal validity of a study and do whatever they can to control for these threats. If ignored, these threats can reduce validity to the point that the results are meaningless, rendering the entire study invalid. There are eight major threats to internal validity, discussed below and summarized in Table 7.1.
History. History refers to any event outside of the research study that can alter or affect subjects’ performance. Because research does not occur in a vacuum, subjects often experience environmental events that differ from one subject to the next. These events can influence their performance and must therefore be addressed. One way to ensure that these events do not affect the study is to control them, that is, to make everyone’s experience identical except for the independent variable(s). Since this is often impossible, randomization procedures are frequently used to minimize the risk, making it likely that outside events that occur in one group also occur in the other.
Maturation. While not a major concern in very short studies such as surveys, maturation can play a major role in longer-term studies. Maturation refers to the natural physiological or psychological changes that take place as we age. It is an especially important concern in studies of children and must be addressed through subject matching or randomization. For instance, the symptoms of a major depressive episode typically decrease significantly within six months, even without treatment. Imagine we tested a new medication designed to treat depression. If subjects who took this medication showed a significant decrease in depressive symptoms within six months, could we truly say that the medication caused the decrease? Probably not, since maturation alone could have produced similar results.
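The reasoning in the depression example can be made concrete by comparing the change in a treated group with the change in an untreated control group. This is a minimal Python sketch, not a full statistical analysis; the function name `treatment_effect` and all scores are hypothetical depression ratings where lower means fewer symptoms.

```python
from statistics import mean


def treatment_effect(treat_pre, treat_post, control_pre, control_post):
    """Return the mean change in the treated group minus the mean change
    in the control group.

    Change that both groups share (such as maturation or natural recovery)
    cancels out, leaving the change attributable to the treatment.
    """
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Hypothetical depression scores (lower = fewer symptoms).
medication_pre, medication_post = [30, 28, 32, 30], [18, 16, 20, 18]
control_pre, control_post = [29, 31, 30, 30], [20, 22, 21, 21]

naive_change = mean(medication_post) - mean(medication_pre)
effect = treatment_effect(medication_pre, medication_post, control_pre, control_post)
# naive change is -12 points; estimated treatment effect is -3 points
print(naive_change, effect)
```

In this made-up data, the naive pre/post drop of 12 points overstates the medication's effect fourfold, because the untreated group also improved by 9 points.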
Testing. People tend to perform better at an activity the more they are exposed to it, and testing is no exception. When subjects, especially in single-group studies, are given a test as a pretest and then the same test as a posttest, the possibility that they will perform better the second time merely through practice is a concern. For this reason, two-group studies with a control group are recommended: because the control group takes the same tests, any practice effect shows up in both groups and can be separated from the effect of the treatment.
Statistical Regression. Statistical regression, or regression to the mean, is a particular concern in studies involving extreme scores. It refers to the tendency of subjects who score very high or very low to score closer to the mean on subsequent testing. This happens because an extreme score partly reflects chance factors, such as a lucky guess or a bad night's sleep, that are unlikely to recur on the next test. If you score 99% on a test, for instance, the odds that your score will be lower the second time are much greater than the odds that it will increase.
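Regression to the mean can be demonstrated with a short simulation. This Python sketch assumes a simple model in which each observed score is a stable true ability plus independent random measurement error; the function name and all parameters (mean 70, ability spread 8, error spread 6, top 5% cutoff) are arbitrary choices for illustration.

```python
import random
from statistics import mean


def simulate_regression(n=10_000, seed=1):
    """Return (first-test mean, second-test mean) for the top 5% of scorers
    on the first test.

    Assumed model: observed score = stable true ability + independent error.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(70, 8) for _ in range(n)]
    test1 = [a + rng.gauss(0, 6) for a in ability]
    test2 = [a + rng.gauss(0, 6) for a in ability]

    # Select subjects with extreme (top 5%) scores on the first test only.
    ranked = sorted(range(n), key=lambda i: test1[i], reverse=True)
    top = ranked[: n // 20]

    return mean(test1[i] for i in top), mean(test2[i] for i in top)


first, second = simulate_regression()
print(f"top scorers: first test {first:.1f}, retest {second:.1f}")
```

Nothing about the simulated subjects changes between the two tests; their retest average falls back toward the population mean only because they were selected on a score that contained favorable error. Their retest average still stays above the population mean, since part of their high first score reflected genuine ability.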
Instrumentation. If the measurement device(s) used in your study change during the course of the study, changes in scores may be due to the instrument rather than the independent variable. For instance, if your pretest and posttest differ, the change in scores may result from the second test being easier than the first rather than from the teaching method employed. For this reason, it is recommended that pre- and posttests be identical or, if alternate forms are used, highly correlated.
Selection. Selection refers to the manner in which subjects are chosen to participate in a study and the manner in which they are assigned to groups. If the groups differ before the study takes place, these differences will persist throughout the study and may be mistaken for a treatment effect in the statistical analysis. Addressing these differences through subject matching or randomization is highly recommended.
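The two remedies named for selection, randomization and subject matching, can be sketched in a few lines of Python. The function names, subject ids, and pretest scores are hypothetical, and pairing subjects on a single pretest score is only one simple way to match them.

```python
import random


def random_assignment(subjects, seed=None):
    """Shuffle the subjects, then deal them alternately into two groups."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    return pool[0::2], pool[1::2]  # (treatment, control)


def matched_assignment(pretest, seed=None):
    """Pair subjects with adjacent pretest scores, then randomly split each pair.

    pretest maps a subject id to a pretest score. With an odd number of
    subjects, the lowest-scoring subject is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest, key=pretest.get, reverse=True)
    treatment, control = [], []
    for a, b in zip(ranked[0::2], ranked[1::2]):
        if rng.random() < 0.5:  # coin flip decides which member gets the treatment
            a, b = b, a
        treatment.append(a)
        control.append(b)
    return treatment, control


scores = {"s1": 88, "s2": 52, "s3": 71, "s4": 90, "s5": 69, "s6": 55}
treatment, control = matched_assignment(scores, seed=42)
print(treatment, control)
```

Matching makes the groups similar on the matched variable by design, while the coin flip within each pair keeps the researcher's choices out of group membership and balances unmeasured differences on average.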
Experimenter Bias. We engage in research in order to learn something new or to support a belief or theory. Therefore, we as researchers may be biased toward the results we want. This bias can affect our observations and may even produce outright errors that skew the study in the direction we want. Using an experimenter who does not know which subjects received which treatment works best to control for this bias; when the subjects are also unaware of which treatment they received, the design is called a double-blind study.
Mortality. Mortality, or subject dropout, is always a concern to researchers. Dropout can drastically affect the results when the dropout rate, or the kind of subjects who drop out, differs between groups. Imagine a study of work experience and college grades in which many highly motivated students dropped out of one group due to illness and many less motivated students dropped out of the other group due to personal factors. By the end of the study, the two groups would differ in motivation, which could invalidate the results.
Table 7.1: Controlling for Threats to Internal Validity
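| Threat to Internal Validity | Means of Control |
| --- | --- |
| History | Random selection, random assignment |
| Maturation | Subject matching, randomization |
| Testing | Two-group design with a control group |
| Statistical Regression | Omit extreme scores, randomization |
| Instrumentation | Instrument consistency, alternate-form reliability |
| Selection | Random selection, random assignment |
| Experimenter Bias | Double-blind study |
| Mortality | Subject matching and omission |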
External validity refers to the generalizability of a study. In other words, can we be reasonably sure that the results obtained from a sample truly represent the entire population? Threats to external validity can produce results that are significant within the sample but cannot be generalized to the population at large. Four of these threats are discussed below and summarized in Table 7.2.
Demand Characteristics. Subjects are often given cues about the anticipated results of a study. When asked a series of questions about depression, for instance, subjects may guess the hypothesis that certain treatments work better than others in treating mental illness. When subjects become aware of the anticipated results, they can begin to behave in the way they believe is expected of them, a response closely related to the placebo effect. Keeping subjects unaware of the anticipated outcomes (referred to as a blind study) reduces the possibility of this threat.
Hawthorne Effects. Research has found that, somewhat like a placebo, the mere awareness of being watched can change a person's performance. If this change is significant, can we be reasonably sure that it will also occur when no one is watching? Addressing this issue can be tricky, but a control group that receives no treatment can be very helpful in measuring the Hawthorne effect. Because the control group is also being observed, it should show changes in behavior similar to those of the experimental group, so comparing the two groups cancels out the Hawthorne effect.
Order Effects (or Carryover Effects). Order effects arise from the order in which treatments are administered and can be a major threat to external validity when multiple treatments are used. If subjects are given medication for two months, therapy for another two months, and no treatment for a final two months, it would be possible, and even likely, that depression would be lowest after the final no-treatment phase. Does this mean that no treatment is better than the other two treatments? More likely, the benefits of the first two treatments carried over into the last phase, artificially inflating the apparent success of the no-treatment phase.
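Counterbalancing, the control listed for order effects, gives different groups the treatments in different orders. This Python sketch shows two common ways to generate the orders for the three treatments in the example: every possible order, and a cyclic Latin square. The function names are hypothetical.

```python
from itertools import permutations

TREATMENTS = ["medication", "therapy", "no treatment"]


def full_counterbalance(treatments):
    """Every possible order (n! orders); practical only for a few treatments."""
    return [list(order) for order in permutations(treatments)]


def latin_square(treatments):
    """n cyclic orders in which each treatment appears once in every position."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def position_counts(orders):
    """Count how often each treatment occupies each position across all orders."""
    counts = {}
    for order in orders:
        for position, treatment in enumerate(order):
            key = (treatment, position)
            counts[key] = counts.get(key, 0) + 1
    return counts


# Each row is the sequence given to one group of subjects.
for order in latin_square(TREATMENTS):
    print(" -> ".join(order))
# medication -> therapy -> no treatment
# therapy -> no treatment -> medication
# no treatment -> medication -> therapy
```

The cyclic square balances position only: "no treatment" is always preceded by therapy whenever it is not first, so carryover from therapy is not spread evenly. With three treatments, using all six orders also balances which treatment immediately precedes which. In either case, counterbalancing spreads carryover across treatments rather than eliminating it.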
Treatment Interaction Effects. The term interaction refers to the fact that a treatment can affect people differently depending on their characteristics. Potential threats to external validity include the interaction between treatment and any of the following: selection, history, and testing. As an example, assume a group of subjects volunteer for a study on work experience and college grades. One group agrees to find part-time work the summer before starting their freshman year, and the other group agrees to join a softball league over the summer. The group that agreed to work is likely inherently different from the group that agreed to play softball: the self-selection may have placed more motivated subjects in one group and less motivated subjects in the other. If the work group earns higher grades in the first semester, can we truly say this was caused by the work experience? It is likely that motivation caused both the work experience and the higher grades.
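The work-experience example can be simulated to show how self-selection alone can produce a grade gap. This Python sketch assumes a hypothetical model in which motivation raises grades and work experience has no effect at all by construction; the coefficients, the logistic rule for choosing work, and the function name are arbitrary illustrations, and the GPA values are not capped at 4.0.

```python
import math
import random
from statistics import mean


def simulate_grades(n=5000, randomize=False, seed=7):
    """Return (mean GPA of the work group, mean GPA of the softball group).

    Hypothetical model: motivation raises grades; work has NO effect on grades.
    """
    rng = random.Random(seed)
    work, softball = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        if randomize:
            # Random assignment: a coin flip, unrelated to motivation.
            chooses_work = rng.random() < 0.5
        else:
            # Self-selection: more motivated volunteers are more likely to choose work.
            chooses_work = rng.random() < 1 / (1 + math.exp(-2 * motivation))
        gpa = 2.8 + 0.4 * motivation + rng.gauss(0, 0.3)
        (work if chooses_work else softball).append(gpa)
    return mean(work), mean(softball)


for label, randomize in [("self-selected", False), ("randomized", True)]:
    w, s = simulate_grades(randomize=randomize)
    print(f"{label}: work {w:.2f}, softball {s:.2f}, gap {w - s:+.2f}")
```

Under self-selection, the work group earns noticeably higher average grades even though work does nothing in this model; when the same volunteers are assigned by coin flip, the gap shrinks to roughly zero.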
Table 7.2: Controlling for Threats to External Validity
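| Threat to External Validity | Means of Control |
| --- | --- |
| Demand Characteristics | Blind study |
| Hawthorne Effects | Control group |
| Order Effects | Counterbalancing treatment order, multiple groups |
| Treatment Interaction Effects | Subject matching, naturalistic observation |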