
Introduction to Research Methods in Political Science:
The POWERMUTT* Project
(for use with SPSS)

*Politically-Oriented Web-Enhanced Research Methods for Undergraduates — Topics & Tools
Resources for introductory research methods courses in political science and related disciplines


X. STANDARD SCORES AND THE NORMAL DISTRIBUTION


Introduction

There is an old saying that “you can’t compare apples and oranges.”  Fortunately, this is not always the case.  Any interval or ratio variable can be converted to a standard unit of measurement called a standard score.  This is especially convenient whenever variables are normally distributed.


Standard (z) scores

Examine the variable descriptions in the codebook for the countries data.  Notice that different variables are measured in radically different units of measurement including, among other things, dollars, people, and percentages.  Any of these can easily be transformed into a standard (z) score which, by definition, will have a mean of 0 and a standard deviation of 1 regardless of the original unit of measurement.  The formula for converting raw scores on any interval or ratio variable X to z scores is:

zi = (Xi − μ) / σ, where:

zi is the standard score for case i,

Xi is the raw score for case i,

μ is the mean of the variable, and

σ is the standard deviation of the variable.

You can use the compute tool to convert any interval or ratio variable to z scores once you know its mean and standard deviation. An easier way is to let the descriptives procedure do the work for you: just check the box to "Save standardized values as variables."
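For readers who want to see the arithmetic outside SPSS, the conversion can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS workflow: the function name z_scores and the two small lists of raw scores are hypothetical, not values from the countries data. By default the sketch divides by N, matching σ in the formula above; SPSS's saved z scores may instead be based on the sample (N − 1) standard deviation, so small differences are possible.

```python
from statistics import mean, pstdev, stdev

def z_scores(values, sample=False):
    """Convert raw scores to z scores: z_i = (X_i - mean) / sd."""
    m = mean(values)
    # pstdev divides by N (the population sigma in the formula above);
    # stdev divides by N - 1 (the sample estimate).
    sd = stdev(values) if sample else pstdev(values)
    return [(x - m) / sd for x in values]

# Hypothetical raw scores measured in two very different units
gdp_per_capita = [1200.0, 8500.0, 23000.0, 41000.0, 56000.0]  # dollars
literacy_rate = [48.0, 71.0, 88.0, 96.0, 99.0]                # percent

for name, raw in [("gdp", gdp_per_capita), ("literacy", literacy_rate)]:
    z = z_scores(raw)
    # Whatever the original unit, the z scores have mean 0 and sd 1
    # (up to floating-point rounding).
    print(name, [round(v, 2) for v in z],
          "mean:", round(mean(z), 6), "sd:", round(pstdev(z), 6))
```

Once both variables are expressed as z scores, a country's standing on one can be compared directly with its standing on the other.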


The Normal Distribution

Many variables are normally distributed. A typical curve of the normal distribution is shown in the figure below. (A normal curve is sometimes called a “bell-shaped” curve.) Normal curves have certain defining characteristics. The most frequent values are found in the middle of the distribution, and frequencies taper off the farther one goes from the middle. The distribution is symmetric, meaning that the upper half of the distribution is a mirror image of the lower half. Taken together, the result is that the mean, median, and mode are all the same.

Graph of normal distribution

While many variables are normally distributed, many are not. An easy way to tell if a variable is at least more or less normally distributed is to construct a histogram of the variable, and compare the result to a normal curve. The next figure describes an index of the self-identified ideology (liberal, moderate, conservative) of residents of different states derived from analysis of CBS/New York Times polls by Gerald C. Wright et al.[1] The index ranges from -100 (all residents of a state identifying as conservative) to 100 (all residents identifying as liberal). The distribution is approximately bell shaped. In other words, there are a few very conservative and a few very liberal states, but more are toward the middle of the road.


Figure 1: Distribution of Ideology in the American States
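
The idea of comparing a histogram with a normal curve can also be sketched numerically. The Python example below is a rough illustration only: it sorts a set of scores into equal-width bins and, for each bin, reports how many cases a normal curve with the same mean and standard deviation would be expected to put there. The function name observed_vs_normal and the scores in ideology_index are hypothetical; they are not the actual state ideology data.

```python
from statistics import NormalDist, mean, pstdev

def observed_vs_normal(values, n_bins=6):
    """Return (low, high, observed, expected) for equal-width bins."""
    normal = NormalDist(mean(values), pstdev(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    rows = []
    for b in range(n_bins):
        low = lo + b * width
        high = lo + (b + 1) * width
        if b == n_bins - 1:
            # The last bin also takes the maximum value.
            observed = sum(1 for x in values if x >= low)
        else:
            observed = sum(1 for x in values if low <= x < high)
        # Area under the normal curve over this bin, times N.
        expected = len(values) * (normal.cdf(high) - normal.cdf(low))
        rows.append((low, high, observed, expected))
    return rows

# Hypothetical index scores (NOT the actual state data)
ideology_index = [-38, -30, -27, -25, -22, -20, -19, -18, -16, -15, -14,
                  -13, -12, -11, -10, -9, -8, -6, -4, -2, 1, 5, 9]

for low, high, observed, expected in observed_vs_normal(ideology_index):
    print(f"{low:7.1f} to {high:7.1f}  observed {observed:2d}  "
          f"expected {expected:5.1f}  {'#' * observed}")
```

If the observed counts track the expected counts reasonably closely, the variable is at least roughly normal; large gaps in the middle or piles at the extremes suggest it is not.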

Consider, on the other hand, the distribution of voting records in the U.S. Senate. The following histogram uses DW-NOMINATE scores produced for members of the U.S. Senate by Jeff Lewis and Keith Poole. This index ranges from about -1 (most liberal) to about 1 (most conservative).[2] The distribution is not even close to forming a normal curve. While the two measures are not directly comparable, senators would seem to be far more polarized than their constituents.


Figure 2: Distribution of Ideology in the American Senate

There are other ways to examine a variable in order to determine whether it is normally distributed. A boxplot provides another tool. If a variable is at least approximately normally distributed, the median (the 50th percentile) will be midway in the inter-quartile range (the range between the 25th and 75th percentiles), the length of the top and bottom “whiskers” above and below the box will be about the same, and there will be few if any outliers or extreme values beyond the whiskers. There are also a couple of descriptive statistics that help measure departures from the normal distribution. Skewness measures asymmetry, that is, departures from normality due to the impact of very high or very low values. In a perfectly normal distribution, it will have the value 0. If the mean is higher than the median (because the mean is inflated by some very high values), a distribution will have a positive skew. If the reverse is the case (due to some extremely low values), the skew will be negative. Kurtosis measures “peakedness,” the tendency of values to cluster near the middle of the distribution. In a perfectly normal distribution, kurtosis as SPSS reports it will also have the value 0. A positive kurtosis indicates that values are more closely clustered toward the middle than would be the case in a normal distribution, while a negative kurtosis indicates that values are more spread out.
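
To make the two statistics described above more concrete, here is a minimal Python sketch using the simple moment-based definitions. This is an assumption made for illustration: SPSS reports bias-adjusted versions of both statistics, so its values will differ somewhat, especially in small samples. The function names and the two small datasets are hypothetical.

```python
from statistics import mean, median

def central_moment(values, k):
    """Average of (x - mean)**k over all cases."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical data: one symmetric set, one with a single very high value
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 2, 2, 3, 4, 20]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name,
          "mean:", round(mean(data), 2),
          "median:", median(data),
          "skewness:", round(skewness(data), 2),
          "kurtosis:", round(excess_kurtosis(data), 2))
```

In the second dataset the single very high value pulls the mean above the median, and the skewness comes out positive, as described above.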

Many statistical techniques that require at least interval level measurement also require that variables be normally distributed.   It is a good idea, therefore, to begin data analysis with some exploratory research into the distribution of the variables in the dataset.  Considerable caution should be exercised in analyzing variables with markedly non-normal distributions. 

If variables are normally distributed, standard scores become extremely useful. It turns out that, in a normal distribution, about 68 percent of cases will be within one standard deviation of the mean (that is, will have a z score within the range of ±1), about 95 percent will be within two standard deviations of the mean, and about 99.7 percent will be within three standard deviations of the mean. In fact, if a variable is normally distributed, you can, by converting raw scores to z scores, determine what percent of cases fall above or below any given score, or between any two scores.

Most statistics texts include a “table of the normal distribution” for these purposes.  There are also “applets” (little applications) on the Internet that do the same thing (see exercise #6 below). 
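
The same lookups can be sketched in Python, whose standard library includes a NormalDist class (Python 3.8 or later). The example is illustrative only: the helper share_within and the numbers used (a mean of 100, a standard deviation of 15, and a raw score of 130) are hypothetical and deliberately different from those in the exercises.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the standard normal: mean 0, sd 1

def share_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Check the rule of thumb for 1, 2, and 3 standard deviations
for k in (1, 2, 3):
    print(f"within {k} sd: {100 * share_within(k):.2f} percent")

# Hypothetical variable: normally distributed, mean 100, sd 15
mu, sigma = 100.0, 15.0
raw = 130.0
z = (raw - mu) / sigma                    # convert the raw score to z
above = 1.0 - std_normal.cdf(z)           # share of cases above the score
between = std_normal.cdf(z) - std_normal.cdf(-z)  # share between -z and +z
print(f"z = {z:.2f}, above: {100 * above:.2f} percent, "
      f"between: {100 * between:.2f} percent")
```

The cdf method returns the proportion of cases at or below a given z score, which is the same information a printed table of the normal distribution provides.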


Key Concepts

histogram
normal distribution
kurtosis
skewness
standard (z) score


Exercises 

 

Note: In SPSS, histograms can be produced using either the frequencies or the explore procedure.   There is also a separate procedure specifically designed to produce histograms.  Except for explore, these procedures include the option of superimposing a normal curve on the histogram.  Skewness and kurtosis can be produced with frequencies, descriptives, or explore.   Z-scores can be produced with descriptives or compute.

 1.  Start SPSS, and open countries.sav.  Look at the countries codebook.  Calculate the means and standard deviations for any two interval or ratio variables.  Now compute two new variables by converting each of the original variables to z scores.  For the new variables, calculate means and standard deviations. Use descriptives to accomplish the same purpose.

2.  Pick several variables in the Countries file, and obtain histograms, comparing the results to a normal distribution.  Are the variables at least roughly normally distributed?  Why or why not?

3.  Open senate.sav.  Look at the senate codebook.  The file includes several measures of the voting behavior of senate members:

Using explore, examine the distributions of each of these measures.  (Ask for histograms rather than stem and leaf plots.)  Are these variables normally distributed?  Repeat the analysis, this time using party as a “factor.”  What do these distributions look like?

As a test of validity, use boxplots to compare the distributions of acu, ada, and dwnom scores, broken down by party. (Note: Bernie Sanders of Vermont is coded as an independent for the party variable. To treat party as a dummy variable, either 1) recode to treat Sanders as a Democrat (since he caucuses with the Democratic Party), 2) go to SPSS Variable View and make “3” a missing value for this variable, or 3) use select cases to exclude Sanders from your analysis.) Does the distribution for dwnom look anything like those for acu and ada? (Why not?) Now convert all three variables to standard scores and run the boxplots again. You should notice a dramatic difference.

As a test of reliability, make the same comparisons between the acu, ada, and unity measures for 2011 and their respective counterparts for 2012.

5. Open states.sav.  Look at the states codebook.  Obtain the means and standard deviations for several interval or ratio level variables.  In “Data View,” find the scores on these variables for your state.  Convert these scores to z scores.  On which variables is your state least typical? 

6.  Go to http://psych.colorado.edu/%7Emcclella/java/normal/normz.html or to http://faculty.vassar.edu/lowry/tabs.html#z and, using the applet found there, answer the following questions about a normally distributed variable with a mean of 50 and a standard deviation of 10:

a.      What is the z score for a raw score of 72?

b.      What percent of cases will have scores over 72?

c.      What percent of cases will have scores between 28 and 72?


For Further Study

Brown, James Dean, “Skewness and Kurtosis,” The JALT Testing & Evaluation SIG Newsletter.  April 1997. http://www.jalt.org/test/bro_1.htm. Accessed November 23, 2003.

Lane, David M., “What is a Normal Distribution?” Hyperstat.  http://davidmlane.com/hyperstat/normal_distribution.html. 


[1] http://php.indiana.edu/~wright1/. Accessed July 3, 2007.

[2] Royce Carroll, et al., “DW-NOMINATE Scores with Bootstrapped Standard Errors,” VoteView. http://voteview.com/. Accessed February 25, 2013.

 


Last updated April 30, 2013.
© 2003-2013 John L. Korey. Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.