Introduction to Research Methods in Political Science:
VIII. CONTROL VARIABLES
This topic discusses the use of control variables in analyzing contingency tables. The process of introducing one or more control variables into such analysis is sometimes called elaboration because it allows us to “elaborate,” or expand upon, the relationship between two variables by investigating how that relationship is influenced by other variables.
The fact that two variables in a table are related does not necessarily mean that one is a cause of the other, even if the relationship is statistically significant and we are willing to reject the notion that the relationship is due to chance. Broadly speaking, there are four possible patterns that can result when a third variable is introduced into a relationship between two other variables. (Since the examples we will use in this topic to illustrate the elaboration model employ real data, they will not fit any one pattern in pure form.)
To introduce a control variable into a relationship displayed in a contingency table, the original table is broken down into two or more subtables, one for each value of the control variable. For example, if we control (as we will below) for region in examining the relationship between party identification and vote, we will have one subtable for each region. For each subtable, as well as for the original table, we will want to test for statistical significance and for the strength of the relationships. (Note: Because we are breaking one table down into two or more subtables, the number of cases in each subtable will be smaller than in the original table, and the relationships will tend to be less significant even when the degree of association is unchanged. If we have too few cases in some categories of the control variable, introducing a control variable may have little effect on the strength of the relationship, but cause the relationship to become statistically insignificant. If this happens, consider recoding the control variable into fewer categories.)
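The note above about sample size can be illustrated with a little arithmetic. The following is a minimal sketch in Python (the course itself uses SPSS), with hypothetical counts that are not taken from the ANES data. It compares a table with a subtable that has exactly the same percentages but half as many cases. Cramér's V is used here only as a convenient chi-square-based measure of strength; it is an assumption of this sketch, not a measure prescribed by the text.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V: chi-square rescaled so that it does not grow with N."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical counts (rows = categories of one variable, columns = the other).
full_table = [[58, 42], [42, 58]]   # 200 cases
half_table = [[29, 21], [21, 29]]   # same percentages, 100 cases

# Critical value of chi-square at the .05 level with 1 degree of freedom.
CRITICAL_05_DF1 = 3.841

chi2_full = chi_square(full_table)
chi2_half = chi_square(half_table)

print(round(chi2_full, 2), chi2_full > CRITICAL_05_DF1)
print(round(chi2_half, 2), chi2_half > CRITICAL_05_DF1)
print(round(cramers_v(full_table), 2), round(cramers_v(half_table), 2))
```

With identical percentages, chi-square is cut in half along with the number of cases, so the smaller table falls below the critical value even though the measure of strength is unchanged.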
It is possible to control for two or more variables simultaneously. For example, we could control for both region and religion. This would result in a separate subtable for each combination of values of the control variables (Southern Protestants, Southern Catholics, etc.) The problem with doing this, in addition to complexity, is that for at least some of the subtables, there will likely not be enough cases to permit reliable analysis.
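To see how quickly the cases thin out, here is a rough sketch in Python. The category labels, the sample size, and the 3 x 3 shape of each subtable are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical categories for two control variables.
regions = ["Northeast", "Midwest", "South", "West"]
religions = ["Protestant", "Catholic", "Jewish", "Other", "None"]

# Hypothetical sample size and subtable shape (3 rows by 3 columns).
sample_size = 1200
cells_per_subtable = 3 * 3

# One subtable is needed for every combination of control values.
combinations = list(product(regions, religions))
average_cases = sample_size / len(combinations)
average_per_cell = average_cases / cells_per_subtable

print(len(combinations))           # number of subtables
print(average_cases)               # average cases per subtable
print(round(average_per_cell, 1))  # average cases per cell
```

These are only averages. Because respondents are not spread evenly across combinations, some subtables would have far fewer cases than this.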
The following table shows the relationship between voting in the 2004 presidential election and party identification. (The data, from the 2004 American National Election Study, are weighted by "weight.") Not surprisingly, there is a very strong relationship between the two.

The next table breaks this same relationship down by region. Introducing a control for region has little effect on the relationship between the two variables. The overall pattern is “replicated” within each region of the country.
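The mechanics of breaking a table down by a control variable can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPSS procedure used in this course. The records below are hypothetical, and the function names crosstab and subtables are invented for this sketch.

```python
from collections import Counter, defaultdict

def crosstab(records, row_var, col_var):
    """Count cases for each (row value, column value) pair."""
    return Counter((r[row_var], r[col_var]) for r in records)

def subtables(records, row_var, col_var, control_var):
    """Build one crosstab for each value of the control variable."""
    groups = defaultdict(list)
    for r in records:
        groups[r[control_var]].append(r)
    return {value: crosstab(group, row_var, col_var)
            for value, group in groups.items()}

# Hypothetical respondents (not ANES data).
records = [
    {"party": "Democrat", "vote": "Kerry", "region": "South"},
    {"party": "Democrat", "vote": "Kerry", "region": "Northeast"},
    {"party": "Democrat", "vote": "Bush", "region": "South"},
    {"party": "Republican", "vote": "Bush", "region": "South"},
    {"party": "Republican", "vote": "Bush", "region": "Northeast"},
    {"party": "Republican", "vote": "Kerry", "region": "Northeast"},
    {"party": "Democrat", "vote": "Kerry", "region": "South"},
    {"party": "Republican", "vote": "Bush", "region": "South"},
]

overall = crosstab(records, "party", "vote")
by_region = subtables(records, "party", "vote", "region")

print(overall)
for region, table in by_region.items():
    print(region, table)
```

Every case in the original table falls into exactly one subtable, so adding the subtables back together reproduces the original table.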

In 1996, Thomas Friedman proposed the “Golden Arches Theory of Conflict Prevention,” noting that “no two countries that both have a McDonald's have ever fought a war against each other.”[1] Friedman was not really suggesting that universal peace could be achieved simply by placing McDonald’s franchises in every country, but rather was arguing that economic development encourages both peace and the creation of establishments such as McDonald’s. In other words, he was hypothesizing that the independent variable (the presence or absence of McDonald’s) and the dependent variable (war or peace) are spuriously related — that one does not cause the other, but that both are products of economic development, and that the control variable, economic development, “explains” their relationship.
In the 2004 American National Election Study, respondents were asked how likely they thought it was that recent immigrants to the U.S. would take away jobs from those already here. When this variable was crosstabulated with income, the results (with respondents again weighted by "weight") were as follows:


The results indicate a fairly strong and statistically significant relationship, with concern about loss of jobs to immigrants much lower among higher income respondents. One explanation for this pattern might be that those with higher incomes are less likely to themselves face competition from immigrants for jobs. Another explanation, however, might be that higher education is associated with both higher income and with more pro-immigrant attitudes. In that case, the relationship between immigration attitudes and income might be spurious.
To test this, we can introduce a control for education. If we do, we obtain these results:


We can see that, while some differences remain, the relationship between immigration attitude and income, within each category of education, is much weaker (with the Kendall's tau-c statistic reduced from .208 to a range of .072 to .129), with none of the relationships being statistically significant. The original relationship, in other words, is at least mostly spurious, and can be mostly explained by education level.
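For readers curious about what lies behind the tau-c figures, the statistic can be computed directly from the counts in a table. The following Python function is a minimal sketch using the standard formula tau-c = 2m(P - Q) / (n²(m - 1)), where P and Q are the numbers of concordant and discordant pairs, n is the number of cases, and m is the smaller of the number of rows and columns. The example counts are hypothetical and are not the ANES results reported above.

```python
def kendall_tau_c(table):
    """Kendall's tau-c for a table of counts.

    Rows and columns must both be listed in order from low to high.
    """
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell with every cell in a lower row.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        # Higher on both variables: concordant pairs.
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        # Higher on one, lower on the other: discordant pairs.
                        discordant += table[i][j] * table[k][l]
    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n ** 2 * (m - 1))

# Hypothetical 2 x 2 table of counts.
example = [[30, 10],
           [10, 30]]
tau_c = kendall_tau_c(example)
print(tau_c)  # 0.5
```

Applying the same function to the original table and then to each subtable is what allows the comparison of the overall value with the range of values found within categories of the control variable.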
The tables below show the relationship (using data from house.sav) between a vote in the House of Representatives on a bill extending cuts on taxes paid mainly by the economically well off and the percentage of the population in each member's district living below the poverty line. The bill was supported by the American Conservative Union. Not surprisingly, there is a fairly strong tendency of representatives from districts with high levels of poverty to be less likely to vote for this measure.


If we control for members’ political party, however, we see that the relationship all but disappears. This is a "textbook" illustration of interpretation. Members from poorer districts were more liberal on this issue because they tend to be Democrats. Democrats were about equally liberal, and Republicans uniformly conservative, regardless of the level of poverty in their districts.


In both explanation and interpretation, introducing a control variable reduces or eliminates the association between the independent and dependent variables. The difference between explanation and interpretation has to do with the sequencing of the independent and control variables. In the former case, the control variable is antecedent to (that is, comes before) the independent variable. The independent and dependent variables are related because both are dependent on the control variable, not because either one is a cause of the other. In the latter, the control variable is an intervening variable (that is, one that comes between the independent and dependent variables in a causal sequence). The independent variable does have an effect on the dependent variable, but does so through the control variable.
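The arithmetic by which an association can vanish once a control is introduced is the same in both cases, and can be sketched in Python with deliberately artificial numbers. The counts below are hypothetical, constructed so that the pattern appears in pure form, and the variable names simply echo the immigration example. Nothing in the calculation tells us whether the control variable is antecedent or intervening; that judgment rests on the causal ordering of the variables, not on the table.

```python
# counts[education][income] = (concerned, not concerned); hypothetical counts.
counts = {
    "low education": {"low income": (48, 32), "high income": (12, 8)},
    "high education": {"low income": (6, 14), "high income": (24, 56)},
}

def percent_concerned(concerned, not_concerned):
    """Percentage of cases in the 'concerned' category."""
    return 100 * concerned / (concerned + not_concerned)

def income_gap(table):
    """Percentage-point difference between low and high income groups."""
    return (percent_concerned(*table["low income"])
            - percent_concerned(*table["high income"]))

# Pool the two education groups to recover the original (uncontrolled) table.
pooled = {}
for income in ("low income", "high income"):
    concerned = sum(counts[edu][income][0] for edu in counts)
    not_concerned = sum(counts[edu][income][1] for edu in counts)
    pooled[income] = (concerned, not_concerned)

pooled_gap = income_gap(pooled)
gaps_within = {edu: income_gap(table) for edu, table in counts.items()}

print(pooled_gap)   # 18.0
print(gaps_within)  # 0.0 within each education group
```

In the pooled table, low income respondents are 18 percentage points more likely to be concerned, yet within each education group the difference is zero. With real data, as in the examples in this topic, the difference would shrink rather than vanish completely.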
Sometimes the relationship between an independent and a dependent variable will depend on the value of the control variable. Consider, for example, the relationship between voting in the 2004 presidential election and ideology. The following table (with data taken from the 2004 American National Election Study and weighted by "weight") shows a strong relationship.

The relationship is, however, very different for African American and for white respondents. Among whites, ideology is a good predictor of how a respondent voted. Among African Americans, however, it is not. In other words, one needs to specify race in order to understand the relationship between ideology and vote. (Note that, though they roughly approximate their percentage of the population, African Americans in the sample are few in absolute numbers. There were even fewer respondents in other racial/ethnic groups, and so they were not included in the analysis.)


A Note Regarding Statistical Measures
In choosing measures of association and significance in conjunction with a crosstabulation using a control variable, what counts is the level of measurement of the independent and dependent variables, not that of the control variable. For example, if you are crosstabulating two ordinal variables and using a nominal level control variable, choose Kendall's tau.
antecedent variable
control variable
elaboration
explanation
interpretation
intervening variable
replication
specification
spurious
For each of the following exercises, describe and interpret the results. In each case, do the resulting patterns more closely resemble replication, explanation, interpretation, or specification?
Start SPSS. For exercises 1 through 4, open anes04s.sav and the 2004 American National Election Study Subset codebook.
1. In Rage of a Privileged Class (New York: Harper Collins, 1995), journalist Ellis Cose argues that, among African Americans, higher socio-economic status serves to make people more rather than less aware of racial prejudice. Compare the relationship between income and party identification among black respondents with the same relationship among whites. What about the relationship between income and party identification? Make these same comparisons, but substitute education for income. Because of limitations in sample size, use Select Cases to limit your analysis to black and white respondents. You will also need to recode income and education into no more than three categories each.
2. Is the relationship between attitude toward government funding of abortions and attendance at religious services different for Protestants than it is for Catholics? (Notes: there aren't enough people of other religions, or of church-going atheists, in the sample to permit reliable analysis. Use Select Cases to limit your analysis to Protestant and Catholic respondents. You will also need to recode both your independent and dependent variables into fewer categories.)
3. Is the relationship between voting in the 2004 presidential election and attitude toward spending on social security different for respondents in different age categories? (Note: you will need to recode age before doing this analysis.)
4. Does a person's level of education influence the strength of the relationship between ideology and party identification? (You will need to recode ideology and education into fewer categories.) If so, why?
5. Open house.sav and the House codebook. Does a member's gender have an impact on how he or she votes on the roll calls included in the data? Does this relationship hold up if you introduce a control for the member's party?
Nelson, Elizabeth N., and Edward E. Nelson, “Introducing a Control Variable (Multivariate Analysis),” California Opinions on Women's Issues -- 1985-1995 http://www.csubak.edu/ssric-trd/modules/cowi/4.htm. August 15, 1998. Accessed November 25, 2003.
Shaffer, Richard, “The Elaboration Model,” Soc 355: Social Data Collection and Analysis. http://cla.calpoly.edu/~rshaffer/Soc355/PowerPoint.355/Soc355_Pt2w_files/frame.htm. Accessed
Vasu, Michael L., “The Elaboration Model,” PS 471: Public Opinion Research Methodology. http://www2.chass.ncsu.edu/mlvasu/ps471/D12.htm. Accessed
[1] Thomas L. Friedman, “Foreign Affairs Big Mac I,” New York Times,
Except where indicated, © 2003-2008 John L. Korey. Last updated September 9, 2008