Introduction to Research Methods in Political Science:
VII. CONTINGENCY TABLE ANALYSIS
There’s a lot that a
contingency table can tell you, if you know the right questions to ask. How
strong is the relationship shown in the table? What are the odds that the
relationship might have occurred just by chance? We'll take up the second
question first.
Doing empirical research involves
testing hypotheses suggesting that the value of one variable is related to that
of another variable. If we are working with sample data, we may find that
there is a relationship between two variables in our sample, and we wish to
know how confident we can be that the relationship is not simply due to chance
(or what we call “random sampling error”) but instead reflects a
relationship in the population from which the sample was drawn. How do we
go about doing this?
If we flip a balanced quarter, the
probability of it coming down heads is .5 (or p = .5). The probability of
it coming down heads twice in a row is .5 squared (p = .5² = .25). For
ten heads in a row, p = .5¹⁰ = .0009765, or less
than one chance in a thousand. If we do get ten heads in a row, we will
probably begin to suspect that our growing familiarity with the Father of Our Country
isn’t just a coincidence, and that there is something wrong with the
coin. Notice what we are doing here. We aren’t directly
testing the idea that the coin is unbalanced. Instead, we start off with
the working assumption that it is balanced. If our tests show that this
is very unlikely, we will reject this assumption.
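The arithmetic behind this example is easy to reproduce. Here is a tiny Python snippet (Python is used in this topic only for illustration; it is not part of the SPSS workflow):

```python
# Probability of k heads in a row with a balanced coin: .5 raised to the k-th power
print(0.5 ** 2)    # two heads in a row: 0.25
print(0.5 ** 10)   # ten heads in a row: about 0.00098, less than one in a thousand
```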
Similarly, when we wish to test a
hypothesis stating that the value of one variable is related to that of
another, we begin with the working assumption that the variables are not
related. This working assumption is called the null hypothesis (H₀,
pronounced “H sub-naught”). We then employ techniques (one of
which is described below) that tell us the probability that the relationship in
our sample has occurred by chance. If that probability is sufficiently
low, we “reject the null hypothesis” and risk concluding that our
original hypothesis, oddly referred to by statisticians as the alternative
hypothesis (Hₐ, pronounced "H sub-a"), is
supported by the data. In doing so, we have concluded that the
relationship is statistically significant,
and make a statistical
inference about the population based on the data in the sample.
If, on the other hand, the probability (risk) that the null hypothesis is true
is too great, we conclude that the relationship is not statistically
significant and “fail to reject” the null hypothesis. This
isn’t the same thing as saying that the null hypothesis is true. It
may simply be that we don’t have enough data on which to base a reliable
conclusion.
The language in the preceding
paragraph probably seems rather convoluted. You’ll get used to it.
How low does the probability of
the null hypothesis being true have to be before we reject it? By convention,
a null hypothesis is not rejected unless the odds of it being true are less
than one in twenty (p < .05). Of course, it would be even better if
the risk were even less, and the odds were less than one in a hundred (p <
.01, or “significant at the .01 level”) or one in a thousand (p
< .001, or “significant at the .001 level”).
Tests for statistical
significance assume a simple random sample. While you will rarely be able
to work with pure simple random samples, carefully designed studies like the American
National Election Study or the General Social Survey come close enough to make
the use of such tests reasonable. Ivory, however, doesn't come from a
rat’s mouth. If you have a non-probability-based sample (such as
those discussed in the Data
Collection topic), tests for statistical significance won’t bail you
out.
There is some debate as to
whether tests for statistical significance are necessary or even appropriate
when you are working, not with a sample, but with population data (which would
include the data files included in POWERMUTT for the House of Representatives
and the Senate, the American states, and the countries of the world).
After all, any relationship you find in your data, insofar as it is
otherwise valid, necessarily applies to the population [1].
In the examples used in POWERMUTT, we'll calculate measures of statistical
significance even when using population data, but you should be aware that
doing so is controversial.
Finally, the fact that a
relationship is statistically significant only means that you’ve
concluded that there is some relationship between two variables.
It does not necessarily mean that the relationship is a strong
one. To assess that, you need measures of association, an idea to which
we will return later in this topic.
There are a number of tests for
statistical significance that are used for various specific purposes (t-tests,
z-tests, F-ratios). Here we discuss chi-square, a widely used measure of
the statistical significance of a relationship between two variables displayed
in a crosstabulation.
Chi-square (χ²)
(There are actually several versions of chi-square. The most common, and the one we'll be using, is Pearson's chi-square.)
Chi-square should be employed
when one or both variables are nominal. If both variables are ordinal or
higher, other more powerful tests are appropriate.
Chi-square is used to calculate
the probability that a relationship found in a sample between two variables is
due to chance (random sampling error). It does this by measuring the
difference between the actual frequencies in each cell of a table and the
frequencies one would expect to find if there were no relationship between the
variables in the population from which the (random) sample has been
drawn. The larger these differences are, the less likely it is that they
occurred by chance.
Sometimes, in addition to finding out where people stand on an issue, it’s also important to know how important (or “salient”) they think it is. Consider the following table,
which shows opinions on the importance of controlling illegal immigration broken down by
region of the country as measured in the 2008 American National Election Study
(with data weighted using the "weight" variable). While our dependent variable
(attitude toward the importance of controlling illegal immigration) is ordinal, our independent variable
(region) is only nominal.
[Table: importance of controlling illegal immigration, by region (2008 American National Election Study, weighted)]
The first number in each cell of
the table indicates the count,
also called the observed
frequency because it is the actual number of cases observed in the
sample for that cell. The second number in each cell is the cell count
as a percentage of the total number of cases in the column. We can see
that, in the sample, there are some regional differences. People in the
West are most likely to think that the issue is very important,
while people in the South are most likely to say that it is not important at all. The question is, what are the odds that we would find
differences as large as these just by chance, that is, if no regional
differences existed in the general population from which the sample was taken
(in this case, American adults 18 and older)?
The next table is presented to illustrate the process of calculating chi-square, and would not normally be included in an actual research report. It is the same as the first except that in each cell we have added the expected count, or expected frequency, which represents the number of cases we would expect to find in the cell if there were no regional differences. We can see from the row totals of either this or the preceding table that, for the entire sample, 57.4% of all respondents thought that the issue was very important, 35.8% thought it somewhat important, and 6.8% thought it not important at all. If we apply these percentages within each region, we will (except for rounding error) produce the expected frequencies that make up the second numbers in each cell of the new table. (In row 1, column 3, for example, 57.4% of 844 equals about 484.5. In other words, if the South were just like the rest of the country, we would expect about 484.5 southern respondents to think that controlling illegal immigration is very important. In fact, only 456 did so.)
[Table: the same crosstabulation, with the expected count added to each cell]
In general, if we compare observed and expected frequencies we will notice
that in some cells there are more cases than the null hypothesis would have
led us to expect, while in other cells there are fewer. Chi-square
provides a summary measure of these differences. In the calculation of
chi-square, there are several steps involved [2].
Differences between observed and expected frequencies (called
“residuals”) must be squared (otherwise, they would always add up
to zero), then “standardized” to take into account the fact that
some cells have larger expected frequencies than others. For a more
detailed explanation of how chi-square is calculated, visit http://davidmlane.com/hyperstat/chi_square.html.
For this table, chi-square = 17.458.
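For readers who want to see the mechanics, here is a minimal Python sketch of the calculation just described. The cell counts below are hypothetical (they are not the ANES immigration table), and SciPy's chi2_contingency function is included only as a cross-check; in practice, SPSS does all of this for you.

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi_square(observed):
    """Pearson chi-square computed from a table of observed counts."""
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    # expected count for each cell = (row total x column total) / grand total
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    return ((observed - expected) ** 2 / expected).sum()

obs = [[30, 25, 10],          # hypothetical 2 x 3 table of counts
       [20, 25, 40]]
print(chi_square(obs))

stat, p, dof, expected = chi2_contingency(obs)   # SciPy's version, for comparison
print(stat, p, dof)
```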
We next need to adjust for the
fact that some tables have more cells than others. We do this by
calculating the degrees
of freedom (d.f.) for the table, which equal (the number of rows minus 1)
times (the number of columns minus 1). In this case, since the
table has three rows and four columns, d.f. = (3 – 1)(4 – 1) = 6.
Once we’ve calculated the
value of chi-square and determined the degrees of freedom, we can look up the
probability that the differences in the sample are due to chance by referring
to a table of “critical values of chi-square” found in the
appendices of most statistics texts. Better yet, we can let the computer
figure it out for us. In this case, a chi-square of 17.458 in a table
with 6 degrees of freedom would occur by chance 8 times in a thousand (which
we would write as “p = .008”). The relationship is indeed
statistically significant.
(In SPSS, “Asymp.
Sig. (2-sided)” is equivalent to “p.” If “Asymp.
Sig. (2-sided)” is shown as “.000,” this really means
“<.0005,” which SPSS rounds to the nearest thousandth.)
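If you want to verify the figure reported above without a printed table of critical values, the survival function of the chi-square distribution gives the same answer. A Python/SciPy illustration, not part of the SPSS output:

```python
from scipy.stats import chi2

# Probability of a chi-square of 17.458 or larger with 6 degrees of freedom,
# assuming the null hypothesis is true
print(chi2.sf(17.458, df=6))   # approximately 0.008
```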
In this topic, we will deal with
measures of association between two variables
calculated in conjunction with a contingency table. Elsewhere, we discuss
measures of association used when comparing
means and in doing regression
analysis.
As already noted, statistical
significance means only that we can confidently infer that there is some
degree of relationship between our variables in the population from which the
sample was drawn. We would also like to know how strong the
relationship is in the sample. Assessing the strength of a relationship
is where measures of association come in.
The best way to determine whether
a relationship in a table is strong or weak is to examine the table
itself. If the percentage differences among categories of the independent
variable seem important, they probably are. Just how big the differences
need to be to be considered important will vary with the research questions you
are asking. A difference of 10 percentage points is not very dramatic,
but in a two-candidate political campaign it could spell the difference between
a comfortable 55 to 45 percent win and a decidedly uncomfortable defeat by the
same margin.
Still, measures of association
between variables are a useful way to summarize the strength of a
relationship. This is especially true if you are running a large number
of crosstabulations and need a convenient way of sorting out the results to
determine which relationships are most important.
In general, measures of
association range in value from 0 (indicating no relationship) to ±1
(indicating a perfect relationship). Some measures appropriate for use
with nominal data range from 0 to a number approaching, but never reaching,
1. Measures appropriate with nominal data are always positive (since
direction has no meaning with nominal data), while those appropriate for use
with ordinal data or higher may be either positive or negative. The
strength of the relationship is indicated by the absolute value of the measure,
not its sign. An association of -.7 is much stronger than one of
+.2. Moreover, the sign of the relationship may simply be an artifact of
the way we have coded the data. For example, if ideology is coded on a
scale from 1 to 5, it is entirely arbitrary whether the higher numbers are
associated with liberalism or with conservatism.
Most measures of association are
“symmetric.” This means that the value of the measure is the
same whichever variable is considered to be the dependent variable. Some
measures are “asymmetric.” The measure will have one value if
the row variable is the dependent variable, and another if the column variable
is dependent.
Some, but not all, measures of
association have a Proportional Reduction in
Error (PRE) interpretation. Basically, such measures
indicate how much knowing the value of the independent variable improves our
ability to guess the value of the dependent variable. More formally, PRE
measures employ the following general formula:
PRE = (E1 – E2) / E1
where E1 represents the
errors we will make guessing the value of the dependent variable if we do not
know the value of the independent variable, and E2 represents the
errors we will make guessing the value of the dependent variable if we do know
the value of the independent variable. If two variables are completely
unrelated, then E2 will be no less than E1 and the PRE
will be 0. If two variables are perfectly related, then E2
will be 0, and the PRE will be 1. A PRE of, for instance, .25 would
indicate a 25 percent reduction in error.
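As a quick illustration of the formula (the numbers here are made up rather than taken from any of the tables in this topic), a short Python sketch:

```python
def pre(e1, e2):
    """Proportional reduction in error: (E1 - E2) / E1."""
    return (e1 - e2) / e1

print(pre(100, 75))    # 0.25, i.e., a 25 percent reduction in error
print(pre(100, 100))   # 0.0, knowing the independent variable doesn't help at all
print(pre(100, 0))     # 1.0, a perfect relationship
```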
A non-PRE measure has no such
interpretation. All we can say about its value is that the further away
from zero (in either direction) it is, the stronger the relationship.
Some texts suggest various rules
of thumb for thresholds between weak, moderate, and strong relationships.
All such rules are fairly arbitrary. In general, you can expect that
measures of association will tend to be higher when higher levels of
measurement are used. In addition, individual data (such as from an
opinion survey) will tend to produce measures of association with more modest
values than those obtained from aggregate data (such as data from a census, or
from election returns by county). This is because individual data will
often contain a lot of “noise” variance (for example, even though
party identification is generally a good predictor of how people vote, you may
vote for a member of another party because the two of you went to the same high
school) that tends to be filtered out when data are aggregated.
The most important factor in
deciding which of the many available measures of association to use is the
level of measurement of your variables. In the remainder of this topic,
we will briefly describe some of the various measures of association produced by the
SPSS Crosstabs procedure for nominal and ordinal data. In order to use
any of the ordinal measures, both variables must be at least ordinal
level. If either or both are nominal, you will have to use a nominal
measure.
One measure of
association that can be used when one or both variables are nominal level is
lambda (λ). This measure has,
as we will see, some severe limitations, but is easy to compute, and is useful
for illustrating how PRE measures work.
In the American National Election
Study’s 2008 postelection survey, respondents were asked how they had
voted in the contests for seats in the U.S. House of Representatives. The next table compares the responses
of Democrats, Independents, and Republicans in order to test the
hypothesis that the way people vote is influenced by their party
identification. (Data are again weighted by the "weight" variable. Votes
for minor party candidates have been excluded.)
[Table: vote for U.S. House of Representatives, by party identification (2008 American National Election Study, weighted)]
While party identification might
be considered an ordinal measure, voting choice is nominal, and so a
nominal measure of association is required. If you knew nothing about a
person other than that he or she was one of the respondents included in the
table, your best guess as to how the respondent voted would be made by picking
the response with the highest frequency (called the mode). In other words, you would guess that the respondent
voted for a Democratic candidate. You would be correct 696 times, but would be
in error 600 times. Let us call this number of errors E1.
If you know the person’s
party identification, you will still guess that the respondent voted for
a Democratic candidate if he or she was a Democrat or an independent, but for Republicans, your
best guess would be that the respondent voted for a Republican. In other words, you would guess the modal
value of the dependent variable within each category of the independent
variable. You would now make 302 (55+188+59) errors. Let us call
this number of errors E2.
Entering E1 and E2
into the general PRE formula, we obtain:
λ = (600-302)/600 = .497
(Lambda is an asymmetric measure.
Since SPSS does not know which variable we wish to treat as dependent, it
calculates the measure both ways. Be sure to use the measure
appropriate to your hypothesis. In this instance, we pick the middle
number (.497), since "housevote" is the dependent variable.
Ignore the "symmetric" lambda. Also ignore Goodman and Kruskal's tau, which we are not covering.)
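The same logic can be written out in a few lines of Python. The function below treats the row variable as dependent. Most of the cell counts are reconstructed from the numbers reported above, but the split of Democratic votes between Democrats and independents (400 and 237) is an assumption made only so that the example adds up, so treat the table as illustrative rather than as the actual ANES results.

```python
import numpy as np

def lambda_pre(table):
    """Lambda with the row variable treated as dependent.
    E1 = errors made guessing the overall modal row for every case;
    E2 = errors made guessing the modal row within each column."""
    t = np.asarray(table)
    n = t.sum()
    e1 = n - t.sum(axis=1).max()   # errors ignoring the independent variable
    e2 = n - t.max(axis=0).sum()   # errors using the column-by-column modes
    return (e1 - e2) / e1

# Rows: vote for Democratic / Republican House candidate
# Columns: Democrats, independents, Republicans
# 400 and 237 are assumed; 59, 55, 188, and 357 follow from the counts in the text
votes = [[400, 237,  59],
         [ 55, 188, 357]]
print(lambda_pre(votes))   # about .497, matching the calculation above
```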
Often lambda severely understates
the strength of a relationship. Consider the relationship discussed earlier between attitude toward the importance of controlling illegal immigration and region. Since the most common response in each region was that controlling illegal immigration is "very important," we know, without even having to calculate it, that the value of lambda for that relationship is 0. Consider also the relationship between attitude toward capital punishment and gender, shown in the next table, which again uses the 2008 American National Election Study subset (and the same weight variable) to compare the attitudes of men and women.
[Table: attitude toward capital punishment, by gender (2008 American National Election Study, weighted)]
There is a substantial difference
between the opinions of men and women on this issue. While most women,
like most men, favored capital punishment, women were less likely than men to do
so. However, since the modal (i.e., most common) choice of both
men and women was to strongly favor the death penalty, knowing a
respondent’s gender would not help you guess his or her position –
in either case "favor strongly" would be the best guess. The
value of lambda for this table is therefore 0. Lambda has, in other
words, failed to capture the substantial difference shown in the table.
The Goodman and Kruskal tau
(τ) is generally similar to lambda, and
suffers from the same tendency to understate the strength of relationships,
though not to the same degree.
Another approach to measuring the
strength of a relationship with nominal data is to standardize chi-square so
that, regardless of the sample size, it ranges from 0 to a number approaching
1. Cramer’s V, called phi (φ) in the case of 2 X 2 tables, and the contingency coefficient
(available in SPSS but not covered here) are both chi-square based measures. The maximum possible value of the
contingency coefficient depends on the number of cells in the table, which
makes it difficult to compare results for tables of different size.
Remember that these chi-square based measures do not have a PRE
interpretation. All you can say in interpreting your results is that the
higher the value, the stronger the relationship. In this case, Cramer's V
is .131. This is a bit
stronger than, say, .10 but a little weaker than .15.
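Cramer's V is simple to compute once you have chi-square: V is the square root of chi-square divided by n times one less than the smaller of the number of rows and columns. A minimal Python sketch of that standard formula follows; the counts are again hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V (equal to phi for a 2 x 2 table)."""
    t = np.asarray(table)
    # Yates' correction turned off so this is the ordinary Pearson chi-square
    chi2, _, _, _ = chi2_contingency(t, correction=False)
    n = t.sum()
    k = min(t.shape) - 1
    return np.sqrt(chi2 / (n * k))

hypothetical = [[120,  80],    # hypothetical 2 x 2 table of counts
                [ 90, 110]]
print(cramers_v(hypothetical))
```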
A measure of association that can
be used when both variables are ordinal level is gamma
(γ). The basic notion behind gamma is that, in a
contingency table, if one case has a higher value than another on one variable,
it will have a higher value on the other if there is a positive relationship
between the variables, and will have a lower value on the other if there is a
negative relationship.
When one case has a higher value
than another case on both variables, the cases are said to form a concordant pair.
(Concordant literally means “singing together.”) When one
case has a higher value than another on one variable, but a lower value on
another, the cases are said to form a discordant pair. The
formula for gamma is:
γ = (C – D) / (C + D)
where C is the total number of
concordant pairs, and D is the total number of discordant pairs.
If all pairs are concordant, then
gamma will equal 1; if they are all discordant, it will equal -1; if there are
equal numbers of concordant and discordant pairs, it will equal 0.
Consider the relationship between
opinion on the war in Iraq and party identification. You hypothesize that Democrats will be more likely than Republicans to think that the war was not worth the cost, with independents somewhere in between. In other words, we are treating party identification as an ordinal variable. We are also treating attitude toward the war as an ordinal variable, since there are presumably different degrees of support and opposition even though our measure is dichotomous (with respondents forced to choose between "worth it" and "not worth it"). The following table
shows the relationship between these two variables for respondents to the 2008 American National Election
Study. (Data are, once again, weighted by "weight.")
Each of the 57 cases in row 1,
column 1 forms a concordant pair with each case in the four cells below and to
the right of that cell. Similarly, each of the 151 cases in row 1, column
2 forms a concordant pair with each case in the four cells below and to the right
of it. The total number of concordant pairs is calculated by taking each
cell, multiplying the number of cases in the cell by the total number of
cases, if any, in cells below and to the right of the cell, and then summing
the results. Discordant pairs are calculated in a similar manner, except
that each cell is paired with cells below and to its left.
Completing this admittedly tedious process (were it done manually) produces the
following results:
Concordant pairs:
57 X (646 + 234) = 50,160
151 X 234 = 35,334
Total: 85,494
Discordant pairs:
151 X 639 = 96,489
294 X (639 + 646) = 377,790
Total: 474,279
γ = (85,494 - 474,279)/(85,494 + 474,279) = (-388,785)/(559,773) = -.695
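The counting just described is tedious by hand but short in code. The following Python sketch uses the cell counts from the worked example (the rows are the two responses on the war, with the "worth it" response on top, and the columns are Democrats, independents, and Republicans, ordered as in the table):

```python
import numpy as np

def gamma(table):
    """Goodman-Kruskal gamma: (C - D) / (C + D)."""
    t = np.asarray(table)
    concordant = discordant = 0
    for i in range(t.shape[0]):
        for j in range(t.shape[1]):
            concordant += t[i, j] * t[i + 1:, j + 1:].sum()   # cells below and to the right
            discordant += t[i, j] * t[i + 1:, :j].sum()       # cells below and to the left
    return (concordant - discordant) / (concordant + discordant)

war = [[ 57, 151, 294],
       [639, 646, 234]]
print(gamma(war))   # about -.695
```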
The absolute value (that is,
ignoring the sign [3] ) of the
coefficient has a PRE interpretation. It tells us, as a proportion of the
total number of pairs, how many more correct than incorrect guesses we make in
guessing whether a pair of cases is concordant or discordant if we know
which case has the "higher" value for the independent variable. Our
hypothesis leads us to guess
that if person A is in a column to the left of person B when it comes to party identification, he or she will also be more likely to think that the war has not been worth the cost. In other words, we guess that pairs will be
discordant. There are, in fact, 388,785 more discordant than concordant
pairs, which is about 69.5 percent of the total. (Of course, this assumes
that you started out with the correct hypothesis. If you had hypothesized
that Republicans were more opposed to the war than Democrats, you would
have predicted that pairs would be concordant, and would have made 388,785 more
incorrect than correct guesses.)
Just as lambda has some
shortcomings, so too does gamma. Notice that it ignores ties. The
57 cases in row 1, column 1, for example, are tied with each other on both
variables, with the other cases in column 1 on party identification, and with other cases
in row 1 on attitude toward the war. Gamma simply ignores these. Other measures have
been devised which correct for ties. One is Kendall’s tau
(τ). There are two versions of this measure. Tau-b (pronounced “tau
sub b”) is used when there are an equal number of rows and columns in the
table. Tau-c (pronounced “tau sub c”) is used when the numbers of rows and columns are not
the same. A similar measure, though probably less widely used than
Kendall's tau, is called Somers' D.
In picking the best measure of
association to use, first ask yourself what level of measurement you’re
dealing with. If one or both variables are nominal, use Cramer’s V
unless your instructor tells you otherwise. If both are ordinal (again,
unless directed differently by your instructor) use Kendall’s tau-b if
the table contains equal numbers of rows and columns, tau-c otherwise.
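If you are working outside SPSS, one way to obtain these coefficients is to expand the contingency table back into individual cases and use scipy.stats.kendalltau. This is only a sketch, and it assumes SciPy 1.7 or later, which added the "variant" argument for tau-b and tau-c.

```python
import numpy as np
from scipy.stats import kendalltau

def tau_from_table(table, variant="b"):
    """Kendall's tau-b or tau-c from a contingency table, by expanding
    the cell counts into one (row code, column code) pair per case."""
    t = np.asarray(table)
    rows, cols = np.indices(t.shape)
    x = np.repeat(rows.ravel(), t.ravel())   # row codes, one per case
    y = np.repeat(cols.ravel(), t.ravel())   # column codes, one per case
    return kendalltau(x, y, variant=variant)

war = [[ 57, 151, 294],     # the Iraq war table used above (2 rows, 3 columns)
       [639, 646, 234]]
print(tau_from_table(war, variant="c"))   # tau-c, since the table is not square
```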
(In SPSS, if you ask for
Cramer’s V, you automatically get the “Approx. Sig.”
Notice that, if you ask for both Cramer’s V and Chi-square, the
“Approx. Sig.” for Cramer’s V appears to be identical to the “Asymp.
Sig.” for the Pearson Chi-square. They are identical, so you
really don’t need to ask for chi-square.)
If you ask for Kendall’s
tau, you automatically get another measure of statistical significance, called
the t test. The "Approx. Sig." for
this test is the significance of t, that is, the probability (p) that the
relationship could have occurred by chance.
alternative
hypothesis
chi-square
concordant pair
count
Cramer's V
degrees of freedom
discordant pair
expected frequency
gamma
Kendall's tau
lambda
measures of association
null hypothesis
observed frequency
Pearson's chi-square
Proportional Reduction in Error
statistical inference
statistically significant
tau-b
tau-c
Start SPSS. Repeat the
crosstabulation exercises in the More About
Measurements topic but, in addition, ask for appropriate measures of
association. SPSS will automatically calculate the statistical
significance of the association.
Note: In some cases, you
may need to combine categories of one or more variables before running
crosstabulations. See recode
and compute for more
information.
Which relationships are statistically significant at the .05 level or
better? Which are at least relatively strong?
Becker, Lee A., “CROSSTABS: Measures for Nominal Data,” http://www.uccs.edu/~faculty/lbecker/spss80/ctabs1.htm.
Becker, Lee A., “CROSSTABS: Measures for Ordinal Data.”
Creative Research Systems, “Statistical Significance,” The Survey Research System: Your Complete Software Solution for Survey Research.
Lane, David M., “Chapter
16: Chi-Square,” HyperStat Textbook Online. http://davidmlane.com/hyperstat/chi_square.html.
[1]
James Neill, "Why Use Effect Sizes Instead of Significance Testing in
Program Evaluation?" http://wilderdom.com/research/effectsizes.html.
Last updated: September 11, 2008. Accessed August 30, 2010. For a
different perspective on this issue, see W. Phillips Shively, The Craft of
Political Research (6th edition). Upper Saddle River, NJ: Prentice
Hall, 2004: 160-161.
[2] The formula for computing chi-square is:
χ² = Σ[(fo – fe)²/fe]
where:
fo = the observed frequency in each cell, and
fe = the expected frequency in each cell.
[3] If we had listed either (but not both) of the variables
in the opposite order, the sign of gamma would have been +.695.
Except where indicated, © 2003-2012 John L. Korey Last Updated:
December 18, 2012