
Introduction to Research Methods in Political Science:
The POWERMUTT* Project
(SPSS Version)

*Politically-Oriented Web-Enhanced Research Methods for Undergraduates — Topics and Tools
Resources for introductory research methods courses in political science and related disciplines


IV. DISPLAYING CATEGORICAL DATA  



Introduction

A picture is said to be worth a thousand words.  Tables and graphs, properly designed, can provide clear pictures of patterns contained in many thousands of pieces of information.   In this topic, we will describe several ways of displaying information about categorical variables in tabular and graphic form.  In later topics, ways of displaying information about continuous variables will be explained.


Frequency Tables

A frequency table (or frequency distribution) displays numbers and percentages for each value of a variable.  It is useful for categorical variables (that is, those with values falling into a relatively small number of discrete categories, such as party affiliation, religious affiliation, or region of a country) rather than for continuous variables (such as age in years or income in dollars). 

The following frequency distribution shows the number of seats in the U.S. House of Representatives by region of the country:


SPSS frequency distribution: region of country

The first column in the table provides a label for each category of the variable. The second and third columns show, respectively, the number and the percentage of all cases that fall in each category. The fourth column shows the percentage in each category after eliminating cases for which we do not have information (missing data). Since we know the location of every congressional district, the fourth column is identical to the third in this case. The last column shows the cumulative percentages as one goes from the first to the last category. This column makes sense only if the values of the variable can be meaningfully ranked; in other words, cumulative percentages assume at least ordinal-level measurement. Region is a nominal variable, so the numbers in this column are meaningless here: it wouldn't make sense to say that 42.1 percent of seats "are held by the Midwest or less."
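The arithmetic behind these columns can be sketched in a few lines of Python. This is a minimal illustration, not how SPSS itself computes its output. The helper name `frequency_table` and the use of `None` to mark missing data are assumptions made for the example. The list of cases simply repeats each region label as many times as the seat counts reported in Table 1 below.

```python
from collections import Counter


def frequency_table(values, order, missing=(None,)):
    """Return (label, count, percent, valid percent, cumulative percent) rows.

    Hypothetical helper for illustration. `order` fixes the row order;
    cumulative percentages depend on it and are only meaningful when that
    order reflects a real ranking (ordinal data). Values listed in `missing`
    count toward the total but not toward the valid percentages.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = sum(n for value, n in counts.items() if value not in missing)
    rows = []
    cumulative = 0.0
    for label in order:
        n = counts[label]
        valid_pct = 100 * n / valid_total
        cumulative += valid_pct  # running total of unrounded valid percents
        rows.append((label, n, round(100 * n / total, 1),
                     round(valid_pct, 1), round(cumulative, 1)))
    return rows


# One entry per House seat, using the regional seat counts in Table 1.
seats = {"Northeast": 83, "Midwest": 100, "South": 154, "West": 98}
cases = [region for region, n in seats.items() for _ in range(n)]

for row in frequency_table(cases, list(seats)):
    print("{:<10} {:>4} {:>6} {:>6} {:>6}".format(*row))
```

Because no region is missing, the percent and valid-percent columns match, as in the SPSS output, and the Midwest row carries the meaningless cumulative figure of 42.1 discussed above.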


Contingency Tables

A contingency table (also called a crosstabulation, or crosstab for short) displays the relationship between one categorical variable and another.  It is called a “contingency table” because it allows us to examine a hypothesis that the values of one variable are contingent (dependent) upon those of another.

The following crosstabulation shows the relationship between region and party affiliation in the House of Representatives:


 

SPSS crosstab: party by region

There are several important things to notice about the way in which the table has been set up:

Do not let the trees get in the way of seeing the forest. In interpreting a crosstab, it is crucial to look first for the overall pattern and not to get bogged down in the pursuit of trivia. In this case, the table shows substantial regional differences in party strength: Democrats held about three quarters of the seats in the Northeast but less than half of those in the South.
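A crosstab can be computed the same way as a frequency table: count each combination of the two variables, then percentage within each category of the variable treated as independent (here region, the column variable), so that each column sums to 100 percent as in Table 2 below. The Python sketch is illustrative only. The helper name `column_percent_crosstab` is hypothetical, and the cell counts are reconstructed from Table 2's column percentages and Ns rather than read from the SPSS data file.

```python
from collections import Counter


def column_percent_crosstab(pairs, row_order, col_order):
    """Cross-tabulate (row, column) pairs and percentage within each column.

    Hypothetical helper: the column variable is treated as the independent
    variable, so each column of percentages sums to (about) 100.
    """
    counts = Counter(pairs)
    col_totals = {c: sum(counts[(r, c)] for r in row_order) for c in col_order}
    percents = {
        r: {c: round(100 * counts[(r, c)] / col_totals[c], 1) for c in col_order}
        for r in row_order
    }
    return percents, col_totals


# (party, region) cell counts reconstructed from Table 2's percentages and Ns.
cells = {
    ("Democrat", "Northeast"): 62, ("Republican", "Northeast"): 21,
    ("Democrat", "Midwest"): 49, ("Republican", "Midwest"): 51,
    ("Democrat", "South"): 66, ("Republican", "South"): 88,
    ("Democrat", "West"): 57, ("Republican", "West"): 41,
}
pairs = [cell for cell, n in cells.items() for _ in range(n)]
parties = ["Democrat", "Republican"]
regions = ["Northeast", "Midwest", "South", "West"]

percents, n_by_region = column_percent_crosstab(pairs, parties, regions)
for party in parties:
    print(party.ljust(11), *(f"{percents[party][r]:6.1f}" for r in regions))
print("N".ljust(11), *(f"{n_by_region[r]:6d}" for r in regions))
```

Reading across a row (for Democrats, 74.7 in the Northeast against 42.9 in the South) is the overall comparison the paragraph above recommends making first.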


Making Tables Presentable

The frequency distribution and crosstab above are presented just as SPSS generated them. This is how tables will normally be presented in POWERMUTT, so that you can run your own analyses and compare your results with those shown here. For a term paper, however, you will probably want more polished tables. The tables below are more presentable and also contain a bit more information: 1) a title that briefly describes what the table is about, and 2) the source of the data used to generate the table. At the same time, much extraneous information has been omitted. Tables usually present information only for valid cases. Cumulative percentages are omitted from Table 1, since region is only a nominal variable. Individual cell counts and row totals are omitted from Table 2, since they can be reconstructed if needed from the column percentages and the N for each column. Ask your instructor whether it will be sufficient to copy and paste tables from SPSS into your word processor or whether you will need more formal tables like those shown here.

Table 1:
Regional Distribution of Seats
in the U.S. House of Representatives

Region         Seats    Percent
Northeast         83       19.1
Midwest          100       23.0
South            154       35.4
West              98       22.5
Totals           435      100.0

Source: Office of the Clerk, U.S. House of Representatives, Statistics of the Congressional Election of November 7, 2006.
http://clerk.house.gov/member_info/electionInfo/2006election.pdf
Accessed August 10, 2007.

 

Table 2:
Party Distribution of Seats
in the U.S. House of Representatives
by Region

                                Region
Party         Northeast    Midwest      South       West
Democrat          74.7%      49.0%      42.9%      58.2%
Republican        25.3       51.0       57.1       41.8
Totals           100.0%     100.0%     100.0%     100.0%
N                   83        100        154         98

Source: Office of the Clerk, U.S. House of Representatives, Statistics of the Congressional Election of November 7, 2006.
http://clerk.house.gov/member_info/electionInfo/2006election.pdf
Accessed August 10, 2007.
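Table 2 omits cell counts because they can be recovered: multiply each column percentage by that column's N and round to the nearest whole seat. A minimal Python sketch of that arithmetic follows; the dictionary layout and variable names are assumptions for illustration.

```python
# Column percentages and column Ns as printed in Table 2.
percents = {
    "Democrat":   {"Northeast": 74.7, "Midwest": 49.0, "South": 42.9, "West": 58.2},
    "Republican": {"Northeast": 25.3, "Midwest": 51.0, "South": 57.1, "West": 41.8},
}
n_by_region = {"Northeast": 83, "Midwest": 100, "South": 154, "West": 98}

# Cell count = column percent x column N / 100, rounded to a whole seat.
counts = {
    party: {region: round(pct * n_by_region[region] / 100)
            for region, pct in row.items()}
    for party, row in percents.items()
}
# Row totals, the other figure omitted from Table 2.
row_totals = {party: sum(row.values()) for party, row in counts.items()}

print(counts)
print(row_totals)
```

Rounding recovers the exact counts here because the Ns are small. With Ns in the thousands, percentages given to one decimal place may no longer pin down the exact counts.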


Pie Charts

A pie chart is a simple way to show the distribution of a variable that has a relatively small number of values, or categories.  This figure, for example, is a pie chart showing the number of seats held in the U.S. House of Representatives by region:


Figure 1:
Regional Distribution of Seats in the U.S. House of Representatives
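Each slice of a pie chart is drawn with an angle proportional to its category's share of the total, so a category's angle is 360 degrees times its proportion. The Python sketch below computes the slice angles for the regional seat counts in Table 1. It shows only the geometry, not how SPSS draws the chart, and the function name is hypothetical.

```python
def pie_slice_angles(counts):
    """Return each category's slice angle in degrees (share of total x 360)."""
    total = sum(counts.values())
    return {label: 360 * n / total for label, n in counts.items()}


seats = {"Northeast": 83, "Midwest": 100, "South": 154, "West": 98}
for region, angle in pie_slice_angles(seats).items():
    print(f"{region:<10} {angle:6.1f} degrees")
```

The South's 154 seats, for instance, get a slice of about 127 degrees, just over a third of the circle.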


Bar Charts

Another way to portray this kind of information is with a bar chart, as shown in this figure:  


Figure 2:
Regional Distribution of Seats in the U.S. House of Representatives
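In a bar chart, the length of each bar is proportional to the count (or percentage) in its category, measured from a common baseline of zero. A rough text-only Python sketch makes the idea concrete. The `#` characters and the 40-character maximum width are arbitrary choices for illustration, not anything SPSS uses, and the function name is hypothetical.

```python
def text_bar_chart(counts, width=40):
    """Draw one line per category, scaling bar lengths so the largest is `width`."""
    label_width = max(len(label) for label in counts)
    largest = max(counts.values())
    lines = []
    for label, n in counts.items():
        bar = "#" * round(width * n / largest)  # length proportional to the count
        lines.append(f"{label:<{label_width}} | {bar} {n}")
    return lines


seats = {"Northeast": 83, "Midwest": 100, "South": 154, "West": 98}
print("\n".join(text_bar_chart(seats)))
```

Because every bar starts from the same baseline, categories can be compared by bar length alone.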

 


Key Concepts

bar chart
contingency table
crosstab
crosstabulation
frequency distribution
frequency table
pie chart


Exercises

1.  Start SPSS. Open the house.sav file. For the following variables, prepare frequency tables, pie charts, and bar charts: rc5 (Pork Barrel Spending), rc13 (United Nations Headquarters), and rc20 (Voter Identification). Crosstabulate each of these votes with the member's party. Which of the votes was close to a straight party-line vote? Which was bipartisan (substantial majorities of both parties on the same side)? Which was a "cross-partisan" vote (unified or nearly unified support from one party, with the other party much more divided)? Can you find other examples of partisan, bipartisan, and cross-partisan votes in this dataset?

For one of these votes, convert your frequency and contingency tables into presentation-ready form.   
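Once a vote has been crosstabulated with party, the three categories in this exercise can be checked from the percentage of each party voting yes. The Python sketch below is one possible way to make the definitions precise, not a standard one: the 90 percent cutoff for a "unified" party and the two-thirds cutoff for a "substantial majority" are hypothetical thresholds chosen only for illustration, and the example percentages are invented rather than taken from house.sav.

```python
def classify_vote(dem_yes, rep_yes, unified=90.0, substantial=200 / 3):
    """Label a roll call from the percent of each party voting yes.

    The `unified` and `substantial` cutoffs are hypothetical, for illustration.
    """
    # A party's cohesion is the share voting with its own majority, yes or no.
    dem_cohesion = max(dem_yes, 100 - dem_yes)
    rep_cohesion = max(rep_yes, 100 - rep_yes)
    same_side = (dem_yes > 50) == (rep_yes > 50)
    low, high = sorted((dem_cohesion, rep_cohesion))

    if not same_side and low >= unified:
        return "party-line"
    if same_side and low >= substantial:
        return "bipartisan"
    if high >= unified and low < substantial:
        return "cross-partisan"
    return "other"


# Invented percentages, not real roll calls.
for dem, rep in [(97.0, 3.0), (85.0, 80.0), (98.0, 55.0), (70.0, 30.0)]:
    print(dem, rep, classify_vote(dem, rep))
```

Votes that fall between the cutoffs come back as "other"; adjusting the thresholds changes where those boundary cases land, which is itself worth discussing when you answer the exercise.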

2.  In exercise 1 of the “Political Science as a Social Science” topic, you were asked to come up with hypotheses that might help explain party identification. Open the anes04s.sav file.  Open the 2004 American National Election Study Subset codebook.  Using partyid3 as the dependent variable, construct contingency tables to test the following hypotheses:

  • Men are more likely than women to identify with the Republican Party.
  • Southerners and midwesterners are more likely than others to identify with the Republican Party; northeasterners and westerners are more likely than others to identify with the Democratic Party.
  • The more education people have, the more likely they are to identify with the Democratic Party.
  • Married respondents are more likely than others to identify with the Republican Party.
  • The more regularly people attend religious services, the more likely they are to identify with the Republican Party.
  • Whites are less likely than others to identify with the Democratic Party.

Come up with and test additional hypotheses.


For Further Study

Energy Information Administration, “Graphs and Charts,” Official Energy Statistics From the Government. http://www.eia.doe.gov/pub/oil_gas/petroleum/analysis_publications/oil_market_basics/graphs_and_charts.htm.

Harris, Andy, “Graphs and Charts,” Syllabus of CSCI 100. http://klingon.cs.iupui.edu/~aharris/mmcc/mod6/abss8.html.

Lane, David M., “Describing Univariate Data,” Hyperstat Online. http://davidmlane.com/hyperstat/desc_univ.html.

Math League Multimedia, “Using Data and Statistics,” The Math League. http://www.mathleague.com/help/data/data.htm.

Rosenberg, Scott, “The Data Artist,” Salon.com, March 10, 1997. http://www.salon.com/march97/tufte970310.html.


[1] A more systematic method for assessing the reliability of percentages in a crosstab is discussed under the topic of contingency table analysis.

 

 


Except where indicated, © 2003-2008 John L. Korey.  Last updated September 9, 2008