
Introduction to Research Methods in Political Science:
The POWERMUTT* Project
(SPSS Version)

*Politically-Oriented Web-Enhanced Research Methods for Undergraduates — Topics and Tools
Resources for introductory research methods courses in political science and related disciplines


VIII. CONTROL VARIABLES  


 


Introduction

This topic discusses the use of control variables in analyzing contingency tables.  The process of introducing one or more control variables into such analysis is sometimes called elaboration because it allows us to “elaborate,” or expand upon, the relationship between two variables by investigating how that relationship is influenced by other variables.

The fact that two variables in a table are related does not necessarily mean that one is a cause of the other, even if the relationship is statistically significant and we are willing to reject the notion that the relationship is due to chance.  Broadly speaking, there are four possible patterns that can result when a third variable is introduced into a relationship between two other variables: replication, explanation, interpretation, and specification.  (Since the examples we will use in this topic to illustrate the elaboration model employ real data, they will not fit any one pattern in pure form.)

To introduce a control variable into a relationship displayed in a contingency table, the original table is broken down into two or more subtables, one for each value of the control variable.  For example, if we control (as we will below) for region in examining the relationship between party identification and vote, we will have one subtable for each region.  For each subtable, as well as for the original table, we will want to test for statistical significance and for the strength of the relationships.  (Note: Because we are breaking one table down into two or more subtables, the number of cases in each subtable will be smaller than in the original table, and the relationships will tend to be less significant even when the degree of association is unchanged.  If we have too few cases in some categories of the control variable, introducing a control variable may have little effect on the strength of the relationship, but cause the relationship to become statistically insignificant.  If this happens, consider recoding the control variable into fewer categories.)
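The note about sample size can be illustrated with a minimal Python sketch.  It is not part of the SPSS procedure, and the counts are hypothetical (they are not taken from the ANES data).  The function name chi_square is an assumption made for this illustration.  The two tables have identical percentages, so the degree of association is the same, but the smaller one no longer reaches significance at the .05 level (critical value 3.84 for one degree of freedom).

```python
def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are vote, columns are party identification.
full_table = [[120, 60], [60, 120]]   # 360 cases
# A subtable with the same percentages but only 24 cases.
subtable = [[8, 4], [4, 8]]

print(round(chi_square(full_table), 2))  # 40.0, well above 3.84
print(round(chi_square(subtable), 2))    # 2.67, below 3.84
```

Because chi-square is proportional to the number of cases when the percentages are held constant, splitting a table into subtables lowers it even if the pattern within each subtable is unchanged.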

It is possible to control for two or more variables simultaneously.  For example, we could control for both region and religion.  This would result in a separate subtable for each combination of values of the control variables (Southern Protestants, Southern Catholics, etc.).  The problem with doing this, in addition to complexity, is that for at least some of the subtables, there will likely not be enough cases to permit reliable analysis.
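As a rough sketch of why this happens, the following Python fragment counts how many cases fall into each combination of two control variables.  The respondents listed are hypothetical, and the helper names (cases_per_subtable, sparse_subtables) are assumptions introduced only for this illustration.

```python
from collections import Counter

# Hypothetical respondents as (region, religion) pairs; not ANES data.
respondents = [
    ("South", "Protestant"), ("South", "Protestant"), ("South", "Catholic"),
    ("Northeast", "Catholic"), ("Northeast", "Catholic"),
    ("Northeast", "Protestant"), ("West", "Protestant"),
    ("Midwest", "Protestant"), ("Midwest", "Catholic"),
]


def cases_per_subtable(records):
    """Count the cases available for each combination of control values."""
    return Counter(records)


def sparse_subtables(records, minimum):
    """List the combinations that have fewer than `minimum` cases.

    Combinations with no cases at all do not appear in the data and so
    are not listed here.
    """
    counts = cases_per_subtable(records)
    return sorted(combo for combo, n in counts.items() if n < minimum)


for combo, n in sorted(cases_per_subtable(respondents).items()):
    print(combo, n)
print(sparse_subtables(respondents, minimum=2))
```

Even with only two control variables, the cases are spread over as many subtables as there are combinations of values, so several of them end up nearly empty.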


Replication

The following table shows the relationship between voting in the 2004 presidential election and party identification.  (The data, from the 2004 American National Election Study, are weighted by "weight.")  Not surprisingly, there is a very strong relationship between the two.


Crosstab of presidential vote by party ID

Statistics

The next table breaks this same relationship down by region.  Introducing a control for region has little effect on the relationship between the two variables.  The overall pattern is “replicated” within each region of the country.


Crosstab of presidential vote by party ID, controlling for region



Statistics


Explanation

In 1996, Thomas Friedman proposed the “Golden Arches Theory of Conflict Prevention,” noting that “no two countries that both have a McDonald's have ever fought a war against each other.”[1]  Friedman was not really suggesting that universal peace could be achieved simply by placing McDonald’s franchises in every country, but rather was arguing that economic development encourages both peace and the creation of establishments such as McDonald’s.  In other words, he was hypothesizing that the independent variable (the presence or absence of McDonald’s) and the dependent variable (war or peace) are spuriously related  —  that one does not cause the other, but that both are products of economic development, and that the control variable, economic development, “explains” their relationship.

In the 2004 American National Election Study, respondents were asked how likely they thought it was that recent immigrants to the U.S. would take away jobs from those already here.  When this variable was crosstabulated with income, the results (with respondents again weighted by "weight") were as follows:


Crosstab of immigrants' perceived impact on jobs by income

Statistics

The results indicate a fairly strong and statistically significant relationship, with concern about loss of jobs to immigrants much lower among higher income respondents.  One explanation for this pattern might be that those with higher incomes are less likely to themselves face competition from immigrants for jobs.  Another explanation, however, might be that higher education is associated with both higher income and with more pro-immigrant attitudes.  In that case, the relationship between immigration attitudes and income might be spurious.

To test this, we can introduce a control for education.  If we do, we obtain these results: 


Crosstab of immigrants' perceived impact on jobs by income, controlling for education

 

Statistics

We can see that, while some differences remain, the relationship between immigration attitude and income, within each category of education, is much weaker (with the Kendall's tau-c statistic reduced from .208 to a range of .072 to .129), with none of the relationships being statistically significant.  The original relationship, in other words, is at least mostly spurious, and can be mostly explained by education level.
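For readers curious about what lies behind the Kendall's tau-c figures, here is a minimal Python sketch using the standard (Stuart) formula, 2m(P - Q) / (n²(m - 1)), where P and Q are the numbers of concordant and discordant pairs, n is the number of cases, and m is the smaller of the number of rows and columns.  It assumes unweighted counts and that both variables are coded from low to high.  Whether SPSS arrives at its reported values in exactly this way, particularly with weighted data, is an assumption, and the function name kendall_tau_c is hypothetical.

```python
def kendall_tau_c(table):
    """Kendall's tau-c (Stuart's formula) for an ordinal-by-ordinal table.

    `table` is a list of rows of counts, with both rows and columns
    ordered from the lowest category to the highest.
    """
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for r1 in range(n_rows):
        for c1 in range(n_cols):
            # Compare each cell with every cell in a lower row.
            for r2 in range(r1 + 1, n_rows):
                for c2 in range(n_cols):
                    pairs = table[r1][c1] * table[r2][c2]
                    if c2 > c1:
                        concordant += pairs   # higher on both variables
                    elif c2 < c1:
                        discordant += pairs   # higher on one, lower on the other
    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n ** 2 * (m - 1))


# Hypothetical tables, not ANES results.
print(kendall_tau_c([[10, 0], [0, 10]]))   # perfect positive relationship
print(kendall_tau_c([[5, 5], [5, 5]]))     # no relationship
```

A value near zero within each category of the control variable, as in the education subtables above, indicates that little of the original relationship survives the control.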


Interpretation

The tables below show the relationship (using data from house.sav) between a vote in the House of Representatives on a bill extending cuts on taxes paid mainly by the economically well off and the percentage of the population in each member's district living below the poverty line.  The bill was supported by the American Conservative Union.  Not surprisingly, there is a fairly strong tendency of representatives from districts with high levels of poverty to be less likely to vote for this measure.


Crosstabulation of Vote on Tax Cuts with Poverty Level in District

Statistical Table Showing Cramer's V of .274 for Relationship

If we control for members’ political party, however, we see that the relationship all but disappears.  This is a "textbook" illustration of interpretation.  Members from poorer districts were more liberal on this issue because they tend to be Democrats.  Democrats were about equally liberal, and Republicans uniformly conservative, regardless of the level of poverty in their districts.


Crosstabulation of Vote on Tax Cuts with Poverty Level in District, Controlling for Party

Statistical Table Showing Cramer's V of .067 for Relationship Among Democrats, and None at all Among Republicans
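The way an association can vanish within each party can be sketched in Python.  Cramér's V is the square root of chi-square divided by n(m - 1), where n is the number of cases and m is the smaller of the number of rows and columns.  The counts below are hypothetical and deliberately idealized; they are not the values in house.sav, and the function names are assumptions made for this illustration.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )


def cramers_v(table):
    """Cramér's V: sqrt(chi-square / (n * (m - 1)))."""
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (m - 1)))


# Hypothetical counts.  Rows: voted yes, voted no.
# Columns: low-poverty district, high-poverty district.
republicans = [[90, 18], [10, 2]]   # 90% yes at both poverty levels
democrats = [[2, 8], [18, 72]]      # 10% yes at both poverty levels

# The original (uncontrolled) table is the cell-by-cell sum of the subtables.
combined = [
    [r + d for r, d in zip(rep_row, dem_row)]
    for rep_row, dem_row in zip(republicans, democrats)
]

print(round(cramers_v(combined), 3))
print(round(cramers_v(republicans), 3))
print(round(cramers_v(democrats), 3))
```

In this idealized case the combined table shows a substantial association only because Democrats are concentrated in high-poverty districts and Republicans in low-poverty ones; within each party, poverty makes no difference.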

In both explanation and interpretation, introducing a control variable reduces or eliminates the association between the independent and dependent variables.  The difference between explanation and interpretation has to do with the sequencing of the independent and control variables.  In the former case, the control variable is antecedent to (that is, comes before) the independent variable.  The independent and dependent variables are related because both are dependent on the control variable, not because either one is a cause of the other.  In the latter, the control variable is an intervening variable (that is, one that comes between the independent and dependent variables in a causal sequence).  The independent variable does have an effect on the dependent variable, but does so through the control variable.


Specification

Sometimes the relationship between an independent and a dependent variable will depend on the value of the control variable.  Consider, for example, the relationship between voting in the 2004 presidential election and ideology.  The following table (with data taken from the 2004 American National Election Study and weighted by "weight") shows a strong relationship.


Crosstab of presidential vote by ideology

Statistics

The relationship is, however, very different for African American and for white respondents.  Among whites, ideology is a good predictor of how a respondent voted. Among African Americans, however, it is not.  In other words, one needs to specify race in order to understand the relationship between ideology and vote.  (Note that, though they roughly approximate their percentage of the population, African Americans in the sample are few in absolute numbers.  There were even fewer respondents in other racial/ethnic groups, and so they were not included in the analysis.) 


Crosstab of presidential vote by ideology, controlling for race

Statistics

A Note Regarding Statistical Measures

In choosing measures of association and significance in conjunction with a crosstabulation using a control variable, what counts is the level of measurement of the independent and dependent variables, not that of the control variable.  For example, if you are crosstabulating two ordinal variables and using a nominal level control variable, choose Kendall's tau.


Key Concepts

antecedent variable
control variable
elaboration
explanation
interpretation
intervening variable
replication
specification
spurious


Exercises 

For each of the following exercises, describe and interpret the results.  In each case, do the resulting patterns more closely resemble replication, explanation, interpretation, or specification?

Start SPSS.  For exercises 1 through 4, open anes04s.sav and the 2004 American National Election Study Subset codebook.  

1. In Rage of a Privileged Class (New York: Harper Collins, 1995), journalist Ellis Cose argues that, among African Americans, higher socio-economic status serves to make people more rather than less aware of racial prejudice.  Compare the relationship between income and party identification among black respondents with the same relationship among whites.  What about the relationship between income and party identification?  Make these same comparisons, but substitute education for income.  Because of limitations in sample size, use Select Cases to limit your analysis to black and white respondents.  You will also need to recode income and education into no more than three categories each.

2.  Is the relationship between attitude toward government funding of abortions and attendance at religious services different for Protestants than it is for Catholics?  (Notes: there aren't enough people of other religions, or of church-going atheists, in the sample to permit reliable analysis.  Use Select Cases to limit your analysis to Protestant and Catholic respondents.  You will also need to recode both your independent and dependent variables into fewer categories.)

3.  Is the relationship between voting in the 2004 presidential election and attitude toward spending on social security different for respondents in different age categories?  (Note: you will need to recode age before doing this analysis.)

4. Does a person's level of education influence the strength of the relationship between ideology and party identification?  (You will need to recode ideology and education into fewer categories.)  If so, why?

5.  Open house.sav and the House codebook.  Does a member's gender have an impact on how he or she votes on the roll calls included in the data?  Does this relationship hold up if you introduce a control for the member's party?


For Further Study

Nelson, Elizabeth N., and Edward E. Nelson, “Introducing a Control Variable (Multivariate Analysis),” California Opinions on Women's Issues -- 1985-1995.  http://www.csubak.edu/ssric-trd/modules/cowi/4.htm.  August 15, 1998.  Accessed November 25, 2003.

Shaffer, Richard, “The Elaboration Model,” Soc 355: Social Data Collection and Analysis.  http://cla.calpoly.edu/~rshaffer/Soc355/PowerPoint.355/Soc355_Pt2w_files/frame.htm.  Accessed November 27, 2003.  (A PowerPoint slide show.)

Vasu, Michael L., “The Elaboration Model,” PS 471: Public Opinion Research Methodology.  http://www2.chass.ncsu.edu/mlvasu/ps471/D12.htm.  Accessed November 25, 2003.


[1]  Thomas L. Friedman, “Foreign Affairs Big Mac I,” New York Times, December 8, 1996.  Lexis-Nexis Academic Universe.  Critics of Friedman have pointed out that, when war broke out between Serbia and NATO forces in 1999, McDonald’s outlets in Belgrade were among the casualties.

 

 


Except where indicated, © 2003-2008 John L. Korey.  Last updated September 9, 2008