Experimental Validity
If a study is valid, then it truly represents what it was intended to represent.
Experimental validity refers to the manner in which variables influence both the results of the research and the generalizability of those results to the population at large.
It is broken down into two groups: (1)
Internal Validity and (2) External Validity.
Internal Validity.
Internal validity refers to a study’s
ability to determine if a causal relationship exists
between one or more independent variables and one or
more dependent variables.
In other words, can we be reasonably sure
that the change (or lack of change) was caused by
the treatment?
Researchers must be aware of aspects that may
reduce the internal validity of a study and do
whatever they can to control for these threats.
These threats, if ignored, can reduce validity to the point that any results are meaningless, rendering the entire study invalid.
There are eight major threats to internal
validity that are discussed below and summarized in
Table 7.1.
History.
History refers to any event outside of the research study that can alter or affect subjects’ performance.
Since research does not occur within a vacuum, subjects often experience environmental events that differ from one another.
These events can play a role in their
performance and must therefore be addressed.
One way to ensure that these events do not affect the study is to control them, making everyone’s experience identical except for the independent variable(s).
Since this is often impossible, randomization procedures can minimize the risk by making outside events that occur in one group just as likely to occur in the other.
Maturation.
While not a major concern in very short
studies such as a survey study, maturation can play
a major role in longer-term studies.
Maturation refers to the natural
physiological or psychological changes that take
place as we age.
This is especially important in childhood and
must be addressed through subject matching or
randomization.
For instance, an episode of major depression
typically decreases significantly within a six-month
period even without treatment.
Imagine we tested a new medication designed
to treat depression.
If our results showed that subjects who took this medication had a significant decrease in depressive symptoms within six months, could we truly say that the medication caused the decrease in symptoms?
Probably not, especially since maturation alone would likely have produced similar results.
Testing.
People tend to perform better at any activity
the more they are exposed to that activity.
Testing is no exception.
When subjects, especially in single-group studies, are given a test as a pretest and then the same test as a posttest, they may perform better the second time merely because of practice.
For this reason, two-group studies with a control group are recommended.
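To see why a control group helps here, the sketch below subtracts the control group's pretest-to-posttest gain (which reflects practice alone) from the treated group's gain. It is only an illustration under simple assumptions: the function names `mean_gain` and `gain_beyond_control` are hypothetical, and the scores are invented for the example.

```python
from statistics import mean

def mean_gain(pre, post):
    """Average posttest-minus-pretest change for one group."""
    return mean(b - a for a, b in zip(pre, post))

def gain_beyond_control(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Treated-group gain minus control-group gain.

    Both groups take the same pretest and posttest, so the control
    group's gain serves as an estimate of the shared practice effect.
    """
    return mean_gain(treat_pre, treat_post) - mean_gain(ctrl_pre, ctrl_post)

# Hypothetical scores, invented for illustration only.
treat_pre, treat_post = [60, 65, 70, 75], [72, 75, 80, 85]
ctrl_pre, ctrl_post = [61, 64, 71, 74], [65, 68, 75, 78]

print("Treated gain:", mean_gain(treat_pre, treat_post))
print("Control gain (practice only):", mean_gain(ctrl_pre, ctrl_post))
print("Gain beyond practice:",
      gain_beyond_control(treat_pre, treat_post, ctrl_pre, ctrl_post))
```

In a single-group design only the first number would be available, and there would be no way to tell how much of it reflects practice.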
Statistical Regression.
Statistical regression, or regression to the
mean, is a concern especially in studies with
extreme scores.
It refers to the tendency for subjects who
score very high or very low to score more toward the
mean on subsequent testing.
If you get a 99% on a test, for instance, the
odds that your score will be lower the second time
are much greater than the odds of increasing your
score.
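A small simulation can make this tendency concrete. The sketch below assumes a simple model that is not taken from the text: each observed score is a stable "true ability" plus independent random noise on each testing occasion. The function name `simulate_retest` and all numeric settings are hypothetical.

```python
import random
from statistics import mean

def simulate_retest(n_subjects=10_000, top_fraction=0.05, seed=0):
    """Return (top group's mean on test 1, top group's mean on test 2,
    overall mean on test 1) under an assumed ability-plus-noise model."""
    rng = random.Random(seed)
    # Assumed model: observed score = stable ability + test-day noise.
    ability = [rng.gauss(70, 10) for _ in range(n_subjects)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]
    # Select the subjects with the most extreme (highest) first scores.
    cutoff = sorted(test1)[int(n_subjects * (1 - top_fraction))]
    top = [i for i in range(n_subjects) if test1[i] >= cutoff]
    return (mean(test1[i] for i in top),
            mean(test2[i] for i in top),
            mean(test1))

first, second, overall = simulate_retest()
print(f"Top scorers, first test:  {first:.1f}")
print(f"Top scorers, second test: {second:.1f}")
print(f"Everyone, first test:     {overall:.1f}")
```

Under this model no treatment is applied between the two tests, yet the top scorers' average moves back toward the overall mean on the second test. A study that selected only extreme scorers could mistake this shift for a treatment effect.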
Instrumentation.
If the measurement device(s) used in your study change during the course of the study, changes in scores may be related to the instrument rather than the independent variable.
For instance, if your pretest and posttest
are different, the change in scores may be a result
of the second test being easier than the first
rather than the teaching method employed.
For this reason, it is recommended that pre-
and posttests be identical or at least highly
correlated.
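One way to check whether two forms of a test are "highly correlated" is to give both forms to the same pilot group and compute the Pearson correlation between the two sets of scores. The sketch below is a minimal illustration; the helper name `pearson_r` and the pilot scores are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical pilot scores: the same six subjects take both forms.
form_a = [55, 62, 70, 78, 85, 91]
form_b = [58, 60, 73, 75, 88, 90]

r = pearson_r(form_a, form_b)
print(f"Correlation between forms: {r:.2f}")
print(f"Mean of form A: {mean(form_a):.1f}, mean of form B: {mean(form_b):.1f}")
```

A high correlation shows only that the two forms rank subjects similarly. It does not show that they are equally difficult, so comparing the means of the two forms is a useful additional check.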
Selection.
Selection refers to the manner in which
subjects are selected to participate in a study and
the manner in which they are assigned to groups.
If there are differences between the groups prior to the study taking place, these differences will continue throughout the study and may appear as a treatment effect in the statistical analysis.
Addressing these differences through subject
matching or randomization is highly recommended.
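Random assignment can be sketched in a few lines. The example below is one simple approach, not a prescribed procedure: it shuffles the subject list and deals subjects out to groups in turn. The function name `randomly_assign`, the subject IDs, and the seed are hypothetical.

```python
import random

def randomly_assign(subjects, n_groups=2, seed=None):
    """Shuffle subjects and split them into n_groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    # Deal subjects out in turn so group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical subject IDs.
subjects = [f"S{i:02d}" for i in range(1, 11)]
treatment, control = randomly_assign(subjects, seed=42)

print("Treatment group:", treatment)
print("Control group:  ", control)
```

Because chance alone decides group membership, pre-existing differences such as motivation are expected to be spread across groups rather than concentrated in one. With small samples the groups can still differ by chance, which is one reason subject matching is sometimes used as well.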
Experimenter Bias.
We engage in research in order to learn something new or to support a belief or theory.
Therefore, we as researchers may be biased toward the results we want.
This bias can affect our observations and possibly even result in blatant research errors that skew the study in the direction we want.
Using an experimenter who is unaware of the anticipated results (usually called a double-blind study when the subjects are also unaware, because both the tester and the subjects are blind to the expected outcome) works best to control for this bias.
Mortality.
Mortality, or subject dropout, is always a concern to researchers.
It can drastically affect the results when the dropout rate, or the kind of subject who drops out, differs between groups.
Imagine, in a study of work experience and college grades, that many highly motivated students dropped out of one group due to illness and many less motivated students dropped out of the other group due to personal factors.
The result would be a difference in motivation between the two groups at the end, which could invalidate the results.
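A first check for this threat is to compare dropout rates across groups. The sketch below uses invented enrollment and completion counts, and the names `dropout_rate` and `groups` are hypothetical. It addresses only the rate of dropout, not which kind of subject dropped out.

```python
def dropout_rate(enrolled, completed):
    """Fraction of enrolled subjects who did not complete the study."""
    return (enrolled - completed) / enrolled

# Hypothetical counts: (enrolled, completed) for each group.
groups = {"work": (50, 38), "softball": (50, 47)}

rates = {name: dropout_rate(e, c) for name, (e, c) in groups.items()}
differential = abs(rates["work"] - rates["softball"])

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} dropped out")
print(f"Difference between groups: {differential:.0%}")
```

Even when the rates are similar, the groups can still be distorted if different kinds of subjects leave each group, so comparing the pretest characteristics of those who dropped out with those who remained is also worthwhile.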
Table 7.1: Controlling for Threats to Internal Validity
| Threat to Internal Validity | Controlling Threat |
| History | Random selection, random assignment |
| Maturation | Subject matching, randomization |
| Testing | Control group |
| Statistical Regression | Omit extreme scores, randomization |
| Instrumentation | Instrument consistency, ensure alternate-form reliability |
| Selection | Random selection, random assignment |
| Experimenter Bias | Double-blind study |
| Mortality | Subject matching and omission |
External Validity.
External validity refers to the generalizability of a study.
In other words, can we be reasonably sure that the results of our study, which is based on a sample of the population, truly represent the entire population?
Threats to external validity can produce significant results within a sample that cannot be generalized to the population at large.
Four of these threats are discussed below and summarized in Table 7.2.
Demand Characteristics.
Subjects are often provided with cues to the anticipated results of a study.
When asked a series of questions about depression, for instance, subjects may become wise to the hypothesis that certain treatments work better in treating mental illness.
When subjects become wise to anticipated results (an effect often compared to a placebo effect), they can begin to exhibit the performance that they believe is expected of them.
Making sure that subjects are not aware of anticipated outcomes (referred to as a blind study) reduces the possibility of this threat.
Hawthorne Effects.
Similar to a placebo effect, the mere presence of others watching your performance has been found to cause a change in that performance.
If this change is significant, can we be reasonably sure that it will also occur when no one is watching?
Addressing this issue can be tricky, but employing a control group to measure the Hawthorne effect in those not receiving any treatment can be very helpful.
In this sense, the control group is also being observed and will exhibit changes in behavior similar to those of the experimental group, so the Hawthorne effect cancels out when the two groups are compared.
Order Effects (or Carryover Effects).
Order effects refer to the order in which treatment is administered and can be a major threat to external validity if multiple treatments are used.
If subjects are given medication for two months, therapy for another two months, and no treatment for another two months, it would be possible, and even likely, that the level of depression would be lowest after the final no-treatment phase.
Does this mean that no treatment is better than the other two treatments?
More likely, it means that the benefits of the first two treatments have carried over to the last phase, artificially inflating the apparent success of no treatment.
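A common control for this threat is to counterbalance the order of treatments so that each treatment appears in each position equally often. The sketch below shows full counterbalancing with every possible order. The function name `counterbalanced_orders` and the subject IDs are hypothetical, and the three conditions are taken from the example above.

```python
from itertools import permutations

def counterbalanced_orders(treatments, subjects):
    """Map each subject to one of the possible treatment orders.

    Orders are handed out in rotation, so each order is used equally
    often when the number of subjects is a multiple of the number of orders.
    """
    orders = list(permutations(treatments))
    return {s: orders[i % len(orders)] for i, s in enumerate(subjects)}

treatments = ["medication", "therapy", "no treatment"]
subjects = [f"S{i:02d}" for i in range(1, 13)]   # hypothetical IDs

schedule = counterbalanced_orders(treatments, subjects)
for subject, order in schedule.items():
    print(subject, "->", ", ".join(order))
```

With three treatments there are six possible orders, so twelve subjects give two subjects per order. In practice the subject list would typically be shuffled first so that which subject receives which order is left to chance. The number of orders grows quickly as treatments are added, which is one reason designs with multiple groups are sometimes used instead.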
Treatment Interaction Effects.
The term interaction refers to the fact that treatment can affect people differently depending on the subject’s characteristics.
Potential threats to external validity include the interaction between treatment and any of the following: selection, history, and testing.
As an example, assume a group of subjects volunteer for a study on work experience and college grades.
One group agrees to find part-time work the summer before starting their freshman year, and the other group agrees to join a softball league over the summer.
The group that agreed to work is likely inherently different from the group that agreed to play softball.
The selection itself may have placed more highly motivated subjects in one group and less motivated subjects in the other.
If the work group earns higher grades in the first semester, can we truly say it was caused by the work experience?
It is likely that motivation caused both the choice of work experience and the higher grades.
Table 7.2: Controlling for Threats to External Validity
| Threat to External Validity | Controlling Threat |
| Demand Characteristics | Blind study, control group |
| Hawthorne Effect | Control group |
| Order Effects | Counterbalancing treatment order, multiple groups |
| Treatment Interaction Effects | Subject matching, naturalistic observation |