Introduction to Research Methods in Political Science:
X. STANDARD SCORES AND THE NORMAL DISTRIBUTION
There is an old saying that “you can’t compare apples and oranges.” Fortunately, this is not always the case. Any interval or ratio variable can be converted to a standard unit of measurement called a standard score. This is especially convenient whenever variables are normally distributed.
Examine the variable descriptions in the codebook for the countries data. Notice that different variables are measured in radically different units of measurement including, among other things, dollars, people, kilowatt hours, and percentages. Any of these can easily be transformed into a standard (z) score which, by definition, will have a mean of 0 and a standard deviation of 1 regardless of the original unit of measurement. The formula for converting raw scores on any variable X to z scores is:
$$z_i = \frac{X_i - \mu}{\sigma}$$
where:
$z_i$ is the standard score for case $i$,
$X_i$ is the raw score for case $i$,
μ is the mean of the variable, and
σ is the standard deviation of the variable.
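The conversion defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw scores are made up (they are not taken from countries.sav), the helper name `z_scores` is hypothetical, and the population standard deviation (dividing by N) is used to match σ in the formula. Statistical packages often divide by N − 1 instead, so their z scores may differ slightly.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Convert raw scores to z scores: z_i = (X_i - mu) / sigma."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population SD (divides by N), matching sigma
    return [(x - mu) / sigma for x in values]


# Hypothetical raw scores in two very different units (not from countries.sav).
income_dollars = [1200, 4500, 9800, 23000, 41000]
literacy_percent = [48, 67, 81, 95, 99]

income_z = z_scores(income_dollars)
literacy_z = z_scores(literacy_percent)

# Whatever the original unit, the z scores have mean 0 and standard
# deviation 1 (up to floating-point rounding).
for name, z in (("income", income_z), ("literacy", literacy_z)):
    print(name, round(fmean(z), 6), round(pstdev(z), 6))
```

Because both variables end up on the same scale, a case's position on one can be compared directly with its position on the other.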
Many variables are normally distributed. A typical curve of the normal distribution is shown in the figure below.[1] (A normal curve is sometimes called a “bell-shaped” curve.) Normal curves have certain defining characteristics. The most frequent values are found in the middle of the distribution, and frequencies taper off the farther one moves from the middle. The distribution is symmetric, meaning that the upper half of the distribution is a mirror image of the lower half. As a result, the mean, median, and mode are all the same.
While many variables are normally distributed, many are not. An easy way to tell whether a variable is at least more or less normally distributed is to construct a histogram of the variable and compare the result to a normal curve. The next figure describes an index of the self-identified ideology (liberal, moderate, conservative) of residents of different states derived from analysis of CBS/New York Times polls by Gerald C. Wright et al.[1] The index ranges from -100 (all residents of a state identifying as conservative) to 100 (all residents identifying as liberal).
Consider, on the other hand, the distribution of voting records in the U.S. Senate. The following histogram uses DW-NOMINATE scores produced for members of the U.S. Senate by Jeff Lewis and Keith Poole. This index ranges from -1 (most liberal) to 1 (most conservative).[2] The distribution is not even close to forming a normal curve. While the two measures are not directly comparable, senators would seem to be far more polarized than their constituents.
There are other ways to examine a variable in order to determine whether it is normally distributed. A boxplot provides another tool. If a variable is normally distributed, the median (the 50th percentile) will be midway in the inter-quartile range (the range between the 25th and 75th percentiles), the top and bottom “whiskers” above and below the box will be about the same length, and there will be few if any outliers or extreme values beyond the whiskers.
There are also a couple of descriptive statistics that help measure departures from the normal distribution. Skewness measures departures from normality due to the impact of very high or very low values. In a perfectly normal distribution, it will have the value 0. If the mean is higher than the median (because the mean is inflated by some very high values), a distribution will have a positive skew. If the reverse is the case (due to some extremely low values), the skew will be negative. Kurtosis measures “peakedness,” the tendency of values to cluster near the middle of the distribution. In a perfectly normal distribution, it will have the value 0. A positive kurtosis indicates that values are more closely clustered toward the middle than would be the case in a normal distribution, while a negative kurtosis indicates that values are more spread out.
Many statistical techniques that require at least interval-level measurement also require that variables be normally distributed. It is a good idea, therefore, to begin data analysis with some exploratory research into the distribution of the variables in the dataset. Considerable caution should be exercised in analyzing variables with markedly non-normal distributions.
If variables are normally distributed, standard scores become extremely useful. In a normal distribution, about 68 percent of cases will be within one standard deviation of the mean (that is, will have a z score within the range of ±1), about 95 percent will be within two standard deviations of the mean, and about 99.7 percent will be within three standard deviations of the mean. In fact, if a variable is normally distributed, you can, by converting raw scores to z scores, determine what percentage of cases fall above, below, or between any given values. Most statistics texts include a “table of the normal distribution” for these purposes. There are also “applets” (little applications) on the Internet that do the same thing.
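The percentages quoted above, along with the skewness and kurtosis statistics described earlier, can also be sketched in Python instead of being looked up in a table. This is a rough illustration under stated assumptions: it relies on `statistics.NormalDist` (Python 3.8 or later), the function names and the example scores are hypothetical, and skewness and kurtosis are computed as simple averages of cubed and fourth-power z scores. Packages such as SPSS typically apply small-sample adjustments, so their reported values will not match these exactly.

```python
from statistics import NormalDist, fmean, pstdev

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def share_above(raw, mu, sigma):
    """Proportion of a normal distribution above a raw score, via its z score."""
    z = (raw - mu) / sigma
    return 1.0 - STANDARD_NORMAL.cdf(z)


def _z_scores(values):
    """Standardize values using the population SD (divides by N)."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return [(x - mu) / sigma for x in values]


def skewness(values):
    """Moment-based skewness: the mean of the cubed z scores (0 if symmetric)."""
    return fmean([z ** 3 for z in _z_scores(values)])


def excess_kurtosis(values):
    """Mean of the fourth-power z scores minus 3, so a normal curve scores 0."""
    return fmean([z ** 4 for z in _z_scores(values)]) - 3.0


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

# Hypothetical variable: mean 100, standard deviation 15.
print(f"above 115: {share_above(115, 100, 15):.4f}")

# Hypothetical scores with one very high value, giving a positive skew.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]
print(round(skewness(scores), 3), round(excess_kurtosis(scores), 3))
```

Subtracting 3 in `excess_kurtosis` is what makes a normal distribution score 0, in line with the convention used in the text.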
1. Start SPSS, and open countries.sav. Look at the countries codebook. Calculate the means and standard deviations for any two interval or ratio variables. Now compute two new variables by converting each of the original variables to z scores. For the new variables, calculate means and standard deviations.
2. Pick several variables in the countries file, and obtain histograms, comparing the results to a normal distribution. Are the variables at least roughly normally distributed? Why or why not?
3. Open senate.sav. Look at the senate codebook. The file includes several measures of the voting behavior of Senate members. Using Explore, examine the distributions of each of these measures of senators’ voting records. (Ask for histograms rather than stem-and-leaf plots.) Are these variables normally distributed? Repeat the analysis, this time using party as a “factor.” What do these distributions look like?
4. Repeat, but using house.sav.
5. Open states.sav. Look at the states codebook. Obtain the means and standard deviations for several interval or ratio level variables. In “Data View,” find the scores on these variables for your state. Convert these scores to z scores. On which variables is your state least typical?
6. Open house.sav. Look at the house codebook. Repeat exercise 5 for your congressional district. (If you are not sure which district you are in, go to http://www.vote-smart.org/.)
7. Go to http://psych.colorado.edu/%7Emcclella/java/normal/normz.html or to http://faculty.vassar.edu/lowry/tabs.html#z and, using the applet found there, answer the following questions about a normally distributed variable with a mean of 50 and a standard deviation of 10:
   a. What is the z score for a raw score of 72?
   b. What percent of cases will have scores over 72?
   c. What percent of cases will have scores between 28 and 72?

normal distribution
kurtosis
skewness
standard (z) score
Brown, James Dean, “Skewness and Kurtosis,” The JALT Testing & Evaluation SIG Newsletter. April 1997. http://www.jalt.org/test/bro_1.htm. Accessed November 23, 2003.
Lane, David M., “What is a Normal Distribution?” Hyperstat. http://davidmlane.com/hyperstat/normal_distribution.html.
[1] http://php.indiana.edu/~wright1/. Accessed July 3, 2007.
[2] Royce Carroll, et al., “DW-NOMINATE Scores with Bootstrapped Standard Errors,” VoteView. http://voteview.com/. Updated: May 27, 2007. Accessed June 19, 2007.
Except where indicated, © 2003-2008 John L. Korey. Last updated July 11, 2008