Chapter 2.7: Analyzing Assessment Techniques

How Accurate are Personality Tests?

Judging the accuracy of an assessment measure can be quite complicated, with some tests requiring hundreds of statistical analyses just to give you a few simple numbers. There are two statistics, however, that every professionally published assessment technique must provide in order to be accepted as a ‘good’ test: reliability and validity.

Reliability

Reliability refers to a test’s ability to yield similar results each time it is taken. It is best to see reliability as synonymous with consistency. When measuring personality traits, we would expect results to be similar each time the test is taken because personality is relatively stable. For example, if you scored high on a test of extroversion today, you would expect to score high on the same test next week or even next year.

Suppose, however, that you scored high today and scored low next week. How would you know your true score? A test measuring a stable trait must yield stable results in order to be reliable. As you can see from this example, an unreliable test is worthless as a measuring device. For a reliable assessment, on the other hand, you are more likely to get similar results each time you take it.

There are two major ways to determine the reliability of a test. The first is called test-retest reliability. To determine this statistic, the developers of an assessment technique would administer it to a group of individuals and then administer it again to the same people under the same circumstances at some time in the future. The correlation between the two sets of scores would then be calculated and, knowing what you now know about reliability, you would expect the two sets of scores to be positively correlated. In other words, a test with high test-retest reliability is one where the scores from the two administrations are strongly related in a positive manner.
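The test-retest procedure described above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, the helper name pearson_r is our own, and the Pearson correlation is assumed here as the correlation statistic, since the text does not name a specific one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical extroversion scores for the same six people,
# tested twice (numbers invented for illustration only).
first_administration = [42, 35, 28, 50, 31, 45]
second_administration = [40, 37, 27, 49, 33, 44]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {test_retest_r:.2f}")
```

A value close to +1 would indicate that people who scored high the first time also scored high the second time, which is what a reliable test of a stable trait should show.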

Another technique to determine reliability is called internal consistency. Basically, a new assessment technique would be divided in two: the first half of the test versus the second half, for example, or the odd-numbered questions versus the even-numbered questions. The scores on the two halves should be positively correlated if the test is truly a reliable technique. The benefit of splitting the test is that it shows whether the test items themselves are consistent with one another.
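As a rough sketch of the odd-versus-even split, the Python below scores each half of a short, made-up questionnaire and correlates the two halves. The responses, the six-item length, and the helper name pearson_r are all assumptions for illustration, not data from a real test.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical responses: one row per person, one column per item,
# each item rated 1-5 (invented for illustration only).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

# Total each person's score on the odd-numbered items (1, 3, 5)
# and on the even-numbered items (2, 4, 6).
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]

split_half_r = pearson_r(odd_half, even_half)
print(f"split-half r = {split_half_r:.2f}")
```

If the two halves are measuring the same thing, the correlation between them should be strongly positive; a value near zero would suggest the items are not consistent with one another.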

A 100-item test designed to measure assertiveness may have high test-retest reliability, but what if the first 50 questions are not correlated with the last 50? This test would have low internal consistency, which suggests that some of the questions are not measuring what they are intended to measure. This brings us to the next assessment statistic.

Validity

Simply put, a valid assessment is one that measures what it is intended to measure. Imagine taking your first test on the material you are learning here. As you sit down and the test is handed out, you look down and see only one question: 4 + 6 = ___. While this test may be very reliable since you are likely to answer “10” every time you take the test, it is not a valid measurement of your knowledge of personality theory. There are basically four different types of validity that we will discuss: face validity, predictive validity, congruent validity, and discriminant validity.

Face Validity. The easiest type of validity to determine is face validity because it basically asks ‘does the test look like it measures what it is intended to measure?’ The example above would have very low face validity because the question 4 + 6 = ___ obviously has little to do with psychology. However, a test of extroversion that asks questions such as “Do you enjoy group activities?” would have high face validity.

Predictive Validity. If you recall the five goals of psychology, you’ll remember that making predictions is an important aspect of reaching the ultimate goal of improving lives. Predictive validity refers to an assessment’s ability to do this. A valid test of relationship skills, for example, might predict an individual’s ease of making friends, comfort in group settings, or ability to effectively communicate.

Congruent Validity. Suppose you want to get an idea of a person’s intelligence but do not have the time to administer the more commonly used assessment techniques. You may want to use a less expensive or quicker measurement. If the test has high congruent validity, it would be a valid substitution. Congruent validity refers to a test’s congruency or relationship with a known valid and reliable measure of the same construct. In other words, a test that is positively correlated with a previously validated test is said to have high congruent validity with that test.

Discriminant Validity. Discriminant validity is just the opposite of congruent validity. If we want to validate our measurement of extroversion and we know of a valid test of introversion, we could give both tests to a group and expect the results to be opposite. Those who score high on the introversion test should score low on the extroversion test; they should be negatively correlated.
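The two correlation-based checks described above, congruent and discriminant validity, can be illustrated with a small Python sketch. All three sets of scores are invented, the helper name pearson_r is our own, and the Pearson correlation is assumed as the statistic; the point is only the sign of each result, following the definitions given in this chapter.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six people on three tests
# (all numbers invented for illustration only).
new_extroversion = [42, 35, 28, 50, 31, 45]          # the test being validated
established_extroversion = [80, 70, 55, 95, 60, 88]  # a previously validated test
established_introversion = [20, 30, 41, 12, 35, 18]  # a valid test of the opposite trait

# Congruent validity: expect a positive correlation with the established test.
congruent_r = pearson_r(new_extroversion, established_extroversion)

# Discriminant validity (as defined in this chapter): expect a negative
# correlation with the introversion test.
discriminant_r = pearson_r(new_extroversion, established_introversion)

print(f"congruent r = {congruent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

In this made-up example, people who score high on the new extroversion test also score high on the established one and low on the introversion test, which is the pattern a valid new measure would be expected to show.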

Specific Tests of Personality

As we progress through the text, we will discuss specific tests related to each theory. They will vary in terms of their validity and reliability as well as their approach, as no test has been shown to be perfect. In general, the higher the validity and the higher the reliability, the better the test. Understanding these concepts, the different types of assessment, as well as the basics of research will help you analyze the theories and assessment approaches that will be discussed throughout the rest of the text.