Chapter 9.2 Inferential Procedures
The specific procedures used to make inferences about an unknown population or score vary with the type of data and the purpose of the inference. This chapter discusses five main categories of inferential procedures: the t-test, ANOVA, factor analysis, regression analysis, and meta-analysis.
A t-test is perhaps the simplest of the inferential statistics. Its purpose is to determine whether a difference exists between the means of two groups (think ‘t’ for two). For example, to determine whether the GPAs of students with prior work experience differ from the GPAs of students without such experience, we would use a t-test to compare the mean GPAs of the two groups.
To compare these groups, the t-test formula uses the mean, standard deviation, and number of subjects for each group. Each of these values can be obtained with the descriptive statistics discussed in the previous chapter, so a t-test can be computed by hand fairly quickly, depending on the number of subjects in each group.
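The t statistic described above can be sketched in Python directly from the three summary values per group. This is a minimal sketch assuming the common pooled-variance (Student's) form of the independent-samples t-test, which presumes the two groups have similar variances; the function name `pooled_t_test` and the GPA summary numbers are hypothetical, not taken from the text.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t statistic from summary statistics.

    Assumes roughly equal variances in the two groups, so the variances
    are pooled. Returns (t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, df


# Hypothetical summary statistics (not from the text):
# students with work experience vs. students without it
t, df = pooled_t_test(mean1=3.2, sd1=0.5, n1=30, mean2=3.0, sd2=0.5, n2=30)
print(f"t = {t:.2f}, df = {df}")  # prints: t = 1.55, df = 58
```

The resulting t is then compared with a critical value from a t table for the given degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind_from_stats` computes the same statistic along with a p-value.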
The term ANOVA is short for analysis of variance. It compares the means of two or more groups and can handle one or more independent variables at once; the version that analyzes two or more dependent variables together is called multivariate analysis of variance (MANOVA). If we were to study the effect of work experience on college grades, we would have one independent and one dependent variable, and a simple t-test would suffice. But what if we also wanted to understand the effects of age, race, and economic background on college grades? Using t-tests alone, we would have to perform a separate test for every pairing of an independent variable with the dependent variable: work and grades, age and grades, race and grades, and income and grades, for a total of four separate statistical procedures. Add a second dependent variable, such as the length of time it takes to graduate, and the number of procedures doubles to eight.
We could run eight t-tests, or we could simply run a single analysis of variance (in its multivariate form, since there are two dependent variables), which analyzes all eight sets of data at once. The ANOVA is superior for complex analyses for two reasons. First, it combines complex data into a single statistical procedure. Second, unlike a simple t-test, it can detect what are called interaction effects. With a t-test we could determine whether the mean grades of older and younger students differ (referred to as a main effect). We could also determine whether the mean grades of whites and blacks differ (also a main effect), but we could not determine how these two variables (age and race) interact with each other. Consider the data in Table 9.1, which lists one hypothetical value for each of the eight combinations of levels in a study with just three independent variables (each with only two levels) and two dependent variables.
If you look at the data closely, you may notice that the mean GPA for blacks is 3.0 and the mean GPA for whites is also 3.0. A simple t-test comparing blacks and whites would therefore find no difference. However, when age is taken into account as well, the data look completely different. The mean GPA is 2.5 for older blacks, 3.5 for older whites, 3.5 for younger blacks, and 2.5 for younger whites. Now we can see that the difference between blacks and whites depends on age: (1) older whites have higher GPAs than older blacks, and (2) younger blacks have higher GPAs than younger whites. This is an interaction effect of race and age that a simple t-test would not have detected.
Table 9.1: Hypothetical Three-Way Analysis of Variance with Two Dependent Variables
Independent Variables        Dependent Variables
Work    Age        Race      GPA    Time (years)
Yes     Older      Black     3.0    12
No      Older      Black     2.0    8
Yes     Older      White     4.0    12
No      Older      White     3.0    8
Yes     Younger    Black     3.0    4
No      Younger    Black     4.0    8
Yes     Younger    White     2.0    4
No      Younger    White     3.0    8
Looking at work experience and length of time to graduation also reveals interesting results. For students with work experience, the mean time to graduation was eight years. For students without work experience, the mean time to graduation was also eight years. But this main effect does not tell the whole story. See if you can identify any interaction effects that play a role in the length of time to graduation.
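The marginal means and cell means discussed for Table 9.1 can be checked with a short Python sketch. It only averages the table's values within each combination of levels; it does not compute F statistics or significance tests, which a real ANOVA would. The helper name `means_by` and the column labels are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean

# The eight hypothetical cells of Table 9.1
COLUMNS = ("Work", "Age", "Race", "GPA", "Time")
TABLE_9_1 = [
    ("Yes", "Older", "Black", 3.0, 12),
    ("No", "Older", "Black", 2.0, 8),
    ("Yes", "Older", "White", 4.0, 12),
    ("No", "Older", "White", 3.0, 8),
    ("Yes", "Younger", "Black", 3.0, 4),
    ("No", "Younger", "Black", 4.0, 8),
    ("Yes", "Younger", "White", 2.0, 4),
    ("No", "Younger", "White", 3.0, 8),
]


def means_by(factors, dv, rows=TABLE_9_1):
    """Average the dependent variable `dv` within each combination of `factors`."""
    groups = defaultdict(list)
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[f] for f in factors)
        groups[key].append(record[dv])
    return {key: mean(values) for key, values in groups.items()}


# Main effect of race on GPA: the two marginal means
print(means_by(("Race",), "GPA"))
# Age-by-race cell means, where the interaction shows up
for cell, value in means_by(("Age", "Race"), "GPA").items():
    print(cell, value)
```

To check your answer to the time-to-graduation question, try `means_by(("Work", "Age"), "Time")` and compare the cell means with the marginal means.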
A factor analysis is used to break a large set of variables, such as the questions on a test or survey, down into smaller subgroups called factors. Using a fairly complex procedure that is typically performed with specialized software, a factor analysis examines how responses to each question correlate with responses to the other questions, to determine which questions cluster together.
If we gave a class a test on basic mathematics and then performed a factor analysis on the results, for example, we would likely find that scores on the addition questions tend to rise and fall together, and scores on the subtraction questions tend to rise and fall together. In other words, students who are good at addition would do well on most addition questions, and students who are poor at addition would score poorly on most addition questions. A math test consisting of addition and subtraction questions would therefore likely have two factors.
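A full factor analysis extracts factors with specialized software, as noted above. As a sketch of its starting point only, the Python below builds the correlation matrix among test items for invented scores (eight hypothetical students, two addition items and two subtraction items). Items that belong to the same factor should correlate strongly with each other and weakly with the rest; the item names and scores are assumptions for illustration.

```python
import math

# Hypothetical scores (0-10) for eight students on four items:
# two addition items and two subtraction items
ITEMS = {
    "add1": [7, 7, 3, 3, 7, 7, 3, 3],
    "add2": [8, 6, 3, 3, 7, 7, 3, 3],
    "sub1": [7, 3, 7, 3, 7, 3, 7, 3],
    "sub2": [7, 3, 7, 3, 7, 3, 8, 2],
}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(items):
    """Correlation of every item with every other item, keyed by (item, item)."""
    names = list(items)
    return {(a, b): pearson_r(items[a], items[b]) for a in names for b in names}


matrix = correlation_matrix(ITEMS)
names = list(ITEMS)
print("      " + "  ".join(f"{n:>5}" for n in names))
for a in names:
    print(f"{a:>5} " + "  ".join(f"{matrix[(a, b)]:5.2f}" for b in names))
```

In the printed matrix, the two addition items correlate highly with each other, as do the two subtraction items, while correlations across the two groups stay near zero: the pattern a factor analysis would summarize as two factors.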
A correlation tells us the strength and direction of the relationship between two variables. If the correlation between a midterm and a final exam were +.95, we could say that the two tests are strongly and positively related. In other words, a student who scored high on one would likely score high on the other.
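A correlation coefficient like the +.95 above can be computed as the sum of the products of the two variables' z-scores divided by n - 1. The Python sketch below uses that form of the Pearson correlation; the midterm and final scores are hypothetical, and the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Pearson r: sum of products of z-scores divided by n - 1."""
    n = len(x)
    mx, sx = mean(x), stdev(x)
    my, sy = mean(y), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)


# Hypothetical midterm and final scores for five students (not from the text)
midterm = [70, 85, 60, 90, 75]
final = [72, 88, 65, 94, 74]
r = correlation(midterm, final)
print(f"r = {r:+.2f}")  # prints: r = +0.98
```

The sign gives the direction of the relationship and the distance from zero gives its strength: values near +1 or -1 indicate a strong relationship.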
Regression analysis takes this a step further. By creating a regression formula from the known data, we can predict a student’s score on the final (for example) merely by knowing her score on the midterm. If two variables were perfectly correlated (+1.0 or –1.0), the prediction would be exact. If the correlation coefficient were +0.9 or –0.9, the prediction would be good but less accurate than with a perfect correlation. The closer the correlation is to zero, the less accurate the prediction. Take a look at the perfectly correlated scores for the first five students below and see if you can predict the final exam score for the sixth student from her score on the midterm.
Table 9.2: Hypothetical Test Scores
Student    Midterm    Final
Bob        80         88
Sue        50         55
Ling       60         66
Frank      80         88
Henry      90         99
Lisa       70         ??
When the data set is much larger and the correlation less than perfect, making a prediction requires statistical regression, which finds the straight line that best fits the data and uses it to estimate one variable from another. For the data above, the formula would be ‘Final = 1.1 × Midterm.’ Did you predict Lisa’s score on the final correctly? (1.1 × 70 = 77.)
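The text does not say how the regression line is found; the standard method is ordinary least squares, which the Python sketch below uses to recover the formula from the five complete rows of Table 9.2 and to predict Lisa's final. The function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# The five complete rows of Table 9.2 (Bob, Sue, Ling, Frank, Henry)
midterm = [80, 50, 60, 80, 90]
final = [88, 55, 66, 88, 99]

slope, intercept = fit_line(midterm, final)  # intercept is essentially 0 here
lisa_prediction = intercept + slope * 70
print(f"Slope: {slope:.2f}")
print(f"Predicted final for Lisa: {lisa_prediction:.0f}")
```

With real, imperfectly correlated data the same function still returns the best-fitting line, but individual scores will scatter around it, so predictions become less accurate as the correlation moves toward zero.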
A meta-analysis combines the results of numerous studies into one larger study. In its simplest form, each study becomes one subject in the new meta-study. For instance, combining 12 studies on work experience and college grades would produce a meta-study with 12 subjects. Although the actual process is somewhat more complex, a meta-analysis essentially combines many studies to determine whether their results, taken as a whole, are significant. It is especially helpful when related studies conducted in the past have reached different conclusions.
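One common way of combining studies, sketched below in Python, is the fixed-effect inverse-variance method: each study's effect size is weighted by 1 / SE², so larger, more precise studies count more, and the pooled effect is tested with a z statistic. This is an assumption about technique, not necessarily the procedure the chapter has in mind; random-effects models are another common choice when studies differ substantially. The effect sizes and standard errors are invented for illustration.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate.

    Each study's effect size is weighted by 1 / SE^2, so more precise
    studies count more. Returns (pooled_effect, pooled_se, z).
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical effect sizes (standardized GPA differences, work experience
# minus no work experience) and their standard errors for four studies
effects = [0.30, -0.05, 0.20, 0.15]
std_errors = [0.15, 0.20, 0.10, 0.12]

pooled, se, z = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.2f} (SE {se:.2f}), z = {z:.2f}")
```

Even though one hypothetical study points the other way, the pooled estimate summarizes all four; an absolute z greater than about 1.96 would be significant at the .05 level (two-tailed).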