Introduction to Research Methods in Political Science: The POWERMUTT* Project (for use with SPSS)
*Politically-Oriented Web-Enhanced Research Methods for Undergraduates – Topics & Tools
Resources for introductory research methods courses in political science and related disciplines

VII. CONTINGENCY TABLE ANALYSIS


Introduction

There’s a lot that a contingency table can tell you, if you know the right questions to ask.  How strong is the relationship shown in the table?  What are the odds that the relationship might have occurred just by chance?  We'll take up the second question first.

Statistical Significance

Doing empirical research involves testing hypotheses suggesting that the value of one variable is related to that of another variable.  If we are working with sample data, we may find that there is a relationship between two variables in our sample, and we wish to know how confident we can be that the relationship is not simply due to chance (or what we call “random sampling error”) but instead reflects a relationship in the population from which the sample was drawn.  How do we go about doing this?

If we flip a balanced quarter, the probability of it coming down heads is .5 (or p = .5).  The probability of it coming down heads twice in a row is .5 squared (p = .5² = .25).  For ten heads in a row, p = .5¹⁰ ≈ .00098, or less than one chance in a thousand.  If we do get ten heads in a row, we will probably begin to suspect that our growing familiarity with the Father of Our Country isn’t just a coincidence, and that there is something wrong with the coin.  Notice what we are doing here.  We aren’t directly testing the idea that the coin is unbalanced.  Instead, we start off with the working assumption that it is balanced.  If our tests show that this is very unlikely, we will reject this assumption.
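The coin arithmetic is easy to check for yourself.  Here is a minimal sketch in Python (chosen only for illustration; POWERMUTT itself uses SPSS, and the function name is our own invention):

```python
def prob_all_heads(flips, p_heads=0.5):
    """Probability that every one of `flips` independent tosses comes up heads."""
    return p_heads ** flips

# One toss, two tosses in a row, and ten tosses in a row.
for n in (1, 2, 10):
    print(n, prob_all_heads(n))
```

Ten heads in a row comes out a little under one chance in a thousand, which is why we would start to doubt the working assumption that the coin is balanced.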

Similarly, when we wish to test a hypothesis stating that the value of one variable is related to that of another, we begin with the working assumption that the variables are not related.  This working assumption is called the null hypothesis (H₀, pronounced “H sub-naught”).  We then employ techniques (one of which is described below) that tell us the probability that the relationship in our sample has occurred by chance.  If that probability is sufficiently low, we “reject the null hypothesis” and risk concluding that our original hypothesis, oddly referred to by statisticians as the alternative hypothesis (Hₐ, pronounced “H sub-a”), is supported by the data.  In doing so, we have concluded that the relationship is statistically significant, and we make a statistical inference about the population based on the data in the sample.  If, on the other hand, the probability (risk) that the null hypothesis is true is too great, we conclude that the relationship is not statistically significant; we “fail to reject” the null hypothesis.  This isn’t the same thing as saying that the null hypothesis is true.  It may simply be that we don’t have enough data on which to base a reliable conclusion.

The language in the preceding paragraph probably seems rather convoluted.  You’ll get used to it.

How low does the probability of the null hypothesis being true have to be before we reject it?  By convention, a null hypothesis is not rejected unless the odds of it being true are less than one in twenty (p < .05).  Of course, we could be even more confident in rejecting the null hypothesis if the risk were even smaller, and the odds were less than one in a hundred (p < .01, or “significant at the .01 level”) or one in a thousand (p < .001, or “significant at the .001 level”).

Tests for statistical significance assume a simple random sample.  While you will rarely be able to work with pure simple random samples, carefully designed studies like the American National Election Study or the General Social Survey come close enough to make the use of such tests reasonable.  Ivory, however, doesn't come from a rat’s mouth.  If you have a non-probability based sample (such as those discussed in the Varieties of Data topic), tests for statistical significance won’t bail you out.

There is some debate as to whether tests for statistical significance are necessary or even appropriate when you are working, not with a sample, but with population data (which would include the data files included in POWERMUTT for the Senate, the American states, and the countries of the world).  After all, any relationship you find in your data, insofar as it is otherwise valid, necessarily applies to the population.[1]  In the examples used in POWERMUTT, we'll calculate measures of statistical significance even when using population data, but you should be aware that doing so is controversial.

Finally, the fact that a relationship is statistically significant only means that you’ve concluded that there is some relationship between two variables.  It does not necessarily mean that the relationship is a strong one.  To assess that, you need measures of association, an idea to which we will return later in this topic.

There are a number of tests for statistical significance that are used for various specific purposes (t-tests, z-tests, F-ratios).  Here we discuss chi-square, a widely used measure of the statistical significance of a relationship between two variables displayed in a crosstabulation.

Chi-square (χ²)  (There are actually several versions of chi-square.  The most common, and the one we'll be using, is Pearson's chi-square.)

Chi-square should be employed when one or both variables are nominal.  If both variables are ordinal or higher, other more powerful tests are appropriate.

Chi-square is used to calculate the probability that a relationship found in a sample between two variables is due to chance (random sampling error).  It does this by measuring the difference between the actual frequencies in each cell of a table and the frequencies one would expect to find if there were no relationship between the variables in the population from which the (random) sample has been drawn.  The larger these differences are, the less likely it is that they occurred by chance.

Sometimes, in addition to finding out where people stand on an issue, it’s also important to know how important (or “salient”) they think it is.  In the following table, using data from the 2008 American National Election Study, opinion on the importance of the issue of controlling illegal immigration is broken down by region of the country (with data weighted using the "weight" variable).  While our dependent variable (attitude toward the importance of the issue) is ordinal, our independent variable (region) is only nominal.

The first number in each cell of the table indicates the count, also called the observed frequency because it is the actual number of cases observed in the sample for that cell.   The second number in each cell is the cell count as a percentage of the total number of cases in the column.  We can see that, in the sample, there are some regional differences.  People in the West are most likely to think that the issue is very important, while people in the South are most likely to say that it is not important at all.  The question is, what are the odds that we would find differences as large as these just by chance, that is, if no regional differences existed in the general population from which the sample was taken (in this case, American adults 18 and older)?

The next table is presented to illustrate the process of calculating chi-square, and would not normally be included in an actual research report.  It is the same as the first except that in each cell we have added the expected count, or expected frequency, which represents the number of cases we would expect to find in the cell if there were no regional differences.  We can see from the row totals of either this or the preceding table that, for the entire sample, 57.4% of all respondents thought that the issue was very important, 35.8% thought it somewhat important, and 6.8% thought it not important at all.  If we apply these percentages within each region, we will (except for rounding error) produce the expected frequencies that make up the second numbers in each cell of the new table.  (In row 1, column 3, for example, 57.4% of 844 equals about 484.5.  In other words, if the South were just like the rest of the country, we would expect about 484.5 southern respondents to think that the issue of controlling illegal immigration is very important.  In fact, only 456 did so.)
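The expected-frequency arithmetic can be sketched in a few lines of Python.  The function name is ours, and the only figures used are the two quoted in the text (57.4% and 844); the general rule in the function (row total times column total, divided by the grand total) is the same calculation written without percentages.

```python
def expected_count(row_total, column_total, grand_total):
    """Cases expected in a cell if the row and column variables were unrelated."""
    return row_total * column_total / grand_total

# The text's shortcut: apply the overall row percentage to the column total.
south_total = 844             # respondents in the South (from the text)
share_very_important = 0.574  # 57.4% of the whole sample (from the text)
print(round(share_very_important * south_total, 1))  # 484.5
```

Because 57.4% is itself a rounded figure, a result computed from the exact row total would differ slightly; that is the rounding error the text mentions.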

In general, if we compare observed and expected frequencies we will notice that in some cells there are more cases than the null hypothesis would have led us to expect, while in other cells there are fewer.  Chi-square provides a summary measure of these differences.  In the calculation of chi-square, there are several steps involved.[2]  Differences between observed and expected frequencies (called “residuals”) must be squared (otherwise, they would always add up to zero), then “standardized” to take into account the fact that some cells have larger expected frequencies than others.  For a more detailed explanation of how chi-square is calculated, visit http://davidmlane.com/hyperstat/chi_square.html.  For this table, chi-square = 17.458.
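To make the steps concrete, here is a minimal Python sketch of Pearson's chi-square for any table of observed counts.  The small table at the bottom is made up purely for illustration; it is not the immigration table, whose full cell counts are not reproduced in this text.

```python
def chi_square(table):
    """Pearson chi-square for a list of rows of observed counts.

    Assumes every row and column total is greater than zero, so that
    no expected frequency is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected frequency
            residual = observed - expected                # observed minus expected
            total += residual ** 2 / expected             # square, then standardize
    return total

# Hypothetical 2 x 2 table: every expected frequency is 25.
toy_table = [[30, 20],
             [20, 30]]
print(chi_square(toy_table))
```

In the toy table each residual is 5 or -5, so each cell contributes 25/25 = 1 and the four cells sum to a chi-square of 4.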

We next need to adjust for the fact that some tables have more cells than others.  We do this by calculating the degrees of freedom (d.f.) for the table, which are equal to the number of rows minus 1 times the number of columns minus 1.  In this case, since the table has three rows and four columns, d.f. = (3 – 1)(4 – 1) = 6.

Once we’ve calculated the value of chi-square and determined the degrees of freedom, we can look up the probability that the differences in the sample are due to chance by referring to a table of “critical values of chi-square” found in the appendices of most statistics texts.  Better yet, we can let the computer figure it out for us.  In this case, a chi-square of 17.458 in a table with 6 degrees of freedom would occur by chance 8 times in a thousand (which we would write as “p = .008”).  The relationship is indeed statistically significant.
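If you are curious how the computer arrives at p, the following Python sketch reproduces the figures for this table.  It is an illustration under a stated assumption: the shortcut formula used here works only when the degrees of freedom are a positive even number (as they are in this case), and the function names are ours.  Statistical packages use a more general routine.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """(rows - 1) times (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi_square_p_value_even_df(chi2, df):
    """Probability of a chi-square at least this large under the null hypothesis.

    Uses the closed form that holds when df is even:
    exp(-x/2) * sum over k < df/2 of (x/2)**k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even df")
    half = chi2 / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

df = degrees_of_freedom(3, 4)               # three rows, four columns
p = chi_square_p_value_even_df(17.458, df)  # chi-square reported in the text
print(df, round(p, 3))                      # 6 0.008
```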

(In SPSS, “Asymp. Sig. (2-sided) ” is equivalent to “p.”  If “Asymp. Sig. (2-sided)” is shown as “.000,” this really means “<.0005,” which SPSS rounds to the nearest thousandth.)

Measures of Association

In this topic, we will deal with measures of association between two variables calculated in conjunction with a contingency table.  Elsewhere, we discuss measures of association used when comparing means and in doing regression analysis.

As already noted, statistical significance means only that we can confidently infer that there is some degree of relationship between our variables in the population from which the sample was drawn.  We would also like to know how strong the relationship is in the sample.  Assessing the strength of a relationship is where measures of association come in.

The best way to determine whether a relationship in a table is strong or weak is to examine the table itself.  If the percentage differences among categories of the independent variable seem important, they probably are.  Just how big the differences need to be to be considered important will vary with the research questions you are asking.  A difference of 10 percentage points is not very dramatic, but in a two-candidate political campaign it could spell the difference between a comfortable 55 to 45 percent win and a decidedly uncomfortable defeat by the same margin.

Still, measures of association between variables are a useful way to summarize the strength of a relationship.  This is especially true if you are running a large number of crosstabulations and need a convenient way of sorting out the results to determine which relationships are most important.

In general, measures of association range in value from 0 (indicating no relationship), to ±1 (indicating a perfect relationship).  Some measures appropriate for use with nominal data range from 0 to a number approaching, but never reaching, 1.  Measures appropriate with nominal data are always positive (since direction has no meaning with nominal data), while those appropriate for use with ordinal data or higher may be either positive or negative.  The strength of the relationship is indicated by the absolute value of the measure, not its sign.  An association of -.7 is much stronger than one of +.2.  Moreover, the sign of the relationship may simply be an artifact of the way we have coded the data.  For example, if ideology is coded on a scale from 1 to 5, it is entirely arbitrary whether the higher numbers are associated with liberalism or with conservatism.

Most measures of association are “symmetric.”  This means that the value of the measure is the same whichever variable is considered to be the dependent variable.  Some measures are “asymmetric.”  The measure will have one value if the row variable is the dependent variable, and another if the column variable is dependent.

Some, but not all, measures of association have a Proportional Reduction in Error (PRE) interpretation.  Basically, such measures indicate how much knowing the value of the independent variable improves our ability to guess the value of the dependent variable.  More formally, PRE measures employ the following general formula:

PRE = (E1 – E2) / E1

where E1 represents the errors we will make guessing the value of the dependent variable if we do not know the value of the independent variable, and E2 represents the errors we will make guessing the value of the dependent variable if we do know the value of the independent variable.  If two variables are completely unrelated, then E2 will equal E1 and the PRE will be 0.  If two variables are perfectly related, then E2 will be 0, and the PRE will be 1.  A PRE of, for instance, .25 would indicate a 25 percent reduction in error.

A non-PRE measure has no such interpretation.  All we can say about its value is that the further away from zero (in either direction) it is, the stronger the relationship.

Some texts suggest various rules of thumb for thresholds between weak, moderate, and strong relationships.  All such rules are fairly arbitrary.  In general, you can expect that measures of association will tend to be higher when higher levels of measurement are used.  In addition, individual data (such as from an opinion survey) will tend to produce measures of association with more modest values than those obtained from aggregate data (such as data from a census, or from election returns by county).  This is because individual data will often contain a lot of “noise” variance (for example, even though party identification is generally a good predictor of how people vote, you may vote for a member of another party because the two of you went to the same high school) that tends to be filtered out when data are aggregated.

The most important factor in deciding which of the many available measures of association to use is the level of measurement of your variables.  In the remainder of this topic, we will briefly describe some of the various measures of association produced by the SPSS Crosstabs procedure for nominal and ordinal data.  In order to use any of the ordinal measures, both variables must be at least ordinal level.  If either or both are nominal, you will have to use a nominal measure.

Nominal Measures

One measure of association that can be used when one or both variables are nominal level is lambda (λ).  This measure has, as we will see, some severe limitations, but is easy to compute, and is useful for illustrating how PRE measures work.

In the American National Election Study’s 2008 postelection survey, respondents were asked how they had voted in the contests for seats in the U.S. House of Representatives.  The next table compares the responses of Democrats, Independents, and Republicans in order to test the hypothesis that the way people vote is influenced by their party identification.  (Data are again weighted by the "weight" variable.  Votes for minor party candidates have been excluded.)

While party identification might be considered an ordinal measure, voting choice is nominal, and so a nominal measure of association is required.  If you knew nothing about a person other than that he or she was one of the respondents included in the table, your best guess as to how the respondent voted would be made by picking the response with the highest frequency (called the mode).  In other words, you would guess that the respondent voted for a Democratic candidate.  You would be correct 696 times, but would be in error 600 times.  Let us call this number of errors E1.

If you know the person’s party identification, you will still guess that the respondent voted for a Democratic candidate if he or she was a Democrat or an independent, but for Republicans, your best guess would be that the respondent voted for a Republican.  In other words, you would guess the modal value of the dependent variable within each category of the independent variable.  You would now make 302 (55+188+59) errors.  Let us call this number of errors E2.

Entering E1 and E2 into the general PRE formula, we obtain:

λ = (600-302)/600 = .497

In other words, by knowing a respondent’s party identification, we are able to realize a proportional reduction in error of 49.7 percent.
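The same calculation, written as a small Python function (the function name is ours; the error counts are the ones given above):

```python
def proportional_reduction_in_error(e1, e2):
    """PRE = (E1 - E2) / E1.

    e1: errors made guessing without knowing the independent variable
    e2: errors made guessing when the independent variable is known
    """
    if e1 == 0:
        raise ValueError("E1 is zero: there are no errors to reduce")
    return (e1 - e2) / e1

e1 = 600            # errors when always guessing the overall mode
e2 = 55 + 188 + 59  # errors when guessing the mode within each party group
print(round(proportional_reduction_in_error(e1, e2), 3))  # 0.497
```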

(Lambda is an asymmetric measure.  Since SPSS does not know which variable we wish to treat as dependent, it calculates the measure both ways.  Be sure to use the measure appropriate to your hypothesis.  In this instance, we pick the middle number (.497), since "housevote" is the dependent variable.  Ignore the "symmetric" lambda. Also ignore Goodman and Kruskal's tau, which we are not covering.)

Often lambda severely understates the strength of a relationship.  Consider the relationship discussed earlier between attitude toward the importance of controlling illegal immigration and region.  Since the most common response in each region was that the issue is "very important," we know, without even having to calculate it, that the value of lambda for the relationship is 0.  Consider also the relationship between attitude toward capital punishment and gender.  The next table, again using the 2008 American National Election Study Subset (and the same weight variable), compares the attitudes toward capital punishment of men and women.

There is a substantial difference between the opinions of men and women on this issue.  While most women, like most men, favored capital punishment, women were less likely than men to do so.  However, since the modal (i.e., most common) choice of both men and women was to strongly favor the death penalty, knowing a respondent’s gender would not help you guess his or her position – in either case "favor strongly" would be the best guess.  The value of lambda for this table is therefore 0.  Lambda has, in other words, failed to capture the substantial difference shown in the table.
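A short Python sketch shows why this happens.  Both tables below are made up for illustration (they are not the capital punishment table), and the function name is ours.  Rows are categories of the dependent variable; columns are categories of the independent variable.

```python
def lambda_from_table(table):
    """Lambda with the row variable treated as dependent."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    e1 = n - max(row_totals)  # errors when guessing the overall mode
    # Errors when guessing the mode within each column.
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    if e1 == 0:
        raise ValueError("E1 is zero: there are no errors to reduce")
    return (e1 - e2) / e1

# Hypothetical table: the first row is the mode in BOTH columns,
# even though the column percentages differ (60% versus 45%).
same_mode = [[60, 45],
             [25, 30],
             [15, 25]]
print(lambda_from_table(same_mode))       # 0.0

# Hypothetical table: the mode differs from one column to the other.
different_modes = [[40, 10],
                   [10, 40]]
print(lambda_from_table(different_modes))  # 0.6
```

In the first table, knowing the column never changes your best guess, so no errors are saved and lambda is 0 despite a 15 percentage point difference.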

The Goodman and Kruskal tau (τ) is generally similar to lambda, and suffers from the same tendency to understate the strength of relationships, though not to the same degree.

Another approach to measuring the strength of a relationship with nominal data is to standardize chi-square so that, regardless of the sample size, it ranges from 0 to a number approaching 1.  Cramer’s V, called phi (φ) in the case of 2 X 2 tables, and the contingency coefficient (available in SPSS but not covered here) are both chi-square based measures.  (A disadvantage of the contingency coefficient is that its maximum possible value depends on the number of cells in the table, which makes it difficult to compare results for tables of different size.)  Remember that these chi-square based measures do not have a PRE interpretation.  All you can say in interpreting your results is that the higher the value, the stronger the relationship.  In this case, Cramer's V is .131.  This is a bit stronger than, say, .10 but a little weaker than .15.
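The text above does not spell out how chi-square is standardized.  The usual textbook definition of Cramer's V divides chi-square by the number of cases times one less than the smaller of the number of rows and columns, then takes the square root.  The Python sketch below assumes that definition; the function name and the example figures are ours and are hypothetical.

```python
import math

def cramers_v(chi2, n_cases, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    if k < 2:
        raise ValueError("the table needs at least two rows and two columns")
    return math.sqrt(chi2 / (n_cases * (k - 1)))

# Hypothetical example: chi-square of 4.0 from a 2 x 2 table of 100 cases.
print(cramers_v(4.0, 100, 2, 2))
```

Dividing by the number of cases is what removes the effect of sample size: doubling every cell in a table doubles chi-square but leaves V unchanged.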

Ordinal Measures

A measure of association that can be used when both variables are ordinal level is gamma (γ).  The basic notion behind gamma is that, in a contingency table, if one case has a higher value than another on one variable, it will have a higher value on the other if there is a positive relationship between the variables, and will have a lower value on the other if there is a negative relationship.

When one case has a higher value than another case on both variables, the cases are said to form a concordant pair.  (Concordant literally means “singing together.”)  When one case has a higher value than another on one variable, but a lower value on the other, the cases are said to form a discordant pair.  The formula for gamma is:

γ = (C – D) / (C + D)

where C is the total number of concordant pairs, and D is the total number of discordant pairs.

If all pairs are concordant, then gamma will equal 1; if they are all discordant, it will equal -1; if there are equal numbers of concordant and discordant pairs, it will equal 0.

Consider the relationship between opinion on the war in Iraq and party identification.  You hypothesize that Democrats will be more likely than Republicans to think that the war was not worth the cost, with independents somewhere in between. In other words, we are treating party identification as an ordinal variable. We are also treating attitude toward the war as an ordinal variable, since there are presumably different degrees of support and opposition even though our measure is dichotomous (with respondents forced to choose between "worth it" and "not worth it"). The following table shows the relationship between these two variables for respondents to the 2008 American National Election Study.  (Data are, once again, weighted by "weight.")

Each of the 57 cases in row 1, column 1 forms a concordant pair with each case in the two cells below and to the right of that cell.  Similarly, each of the 151 cases in row 1, column 2 forms a concordant pair with each case in the one cell below and to the right of it.  The total number of concordant pairs is calculated by taking each cell, multiplying the number of cases in the cell by the total number of cases, if any, in cells below and to the right of the cell, and then summing the results.  Discordant pairs are calculated in a similar manner, except that each cell is paired with cells below and to its left.  Completing this admittedly tedious process (were it done manually) produces the following results:

Concordant pairs:

57  X (646+234) =  50,160
151 X (234)     =  35,334
85,494

Discordant pairs:

151 X (639)     =   96,489
294 X (639+646) =  377,790
474,279

γ = (85,494 - 474,279)/(85,494 + 474,279) = (-388,785)/(559,773) = -.695

The absolute value (that is, ignoring the sign[3]) of the coefficient has a PRE interpretation.  It tells us, as a proportion of the total number of pairs, how many more correct than incorrect guesses we make in guessing whether a pair of cases is concordant or discordant if we know which case has the "higher" value for the independent variable.  Our hypothesis leads us to guess that if person A is in a column to the left of person B when it comes to party identification, he or she will also be more likely to think that the war has not been worth the cost.  In other words, we guess that pairs will be discordant.  There are, in fact, 388,785 more discordant than concordant pairs, which is about 69.5 percent of the total.  (Of course, this assumes that you started out with the correct hypothesis.  If you had hypothesized that Republicans were more opposed to the war than Democrats, you would have predicted that pairs would be concordant, and would have made 388,785 more incorrect than correct guesses.)
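The pair counting is tedious by hand but short in code.  The Python sketch below uses the six cell counts that appear in the arithmetic above, arranged in the row and column order that arithmetic implies; the function names are ours.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in a table of cell counts."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for i2 in range(i + 1, n_rows):  # only cells in rows below
                for j2 in range(n_cols):
                    if j2 > j:    # below and to the right
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:  # below and to the left
                        discordant += table[i][j] * table[i2][j2]
    return concordant, discordant

def gamma(table):
    """Gamma = (C - D) / (C + D); pairs tied on either variable are ignored."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)

iraq_table = [[57, 151, 294],
              [639, 646, 234]]
print(concordant_discordant(iraq_table))  # (85494, 474279)
print(round(gamma(iraq_table), 3))        # -0.695
```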

Just as lambda has some shortcomings, so too does gamma.  Notice that it ignores ties.  The 57 cases in row 1, column 1, for example, are tied with each other on both variables, with the other cases in column 1 on party identification, and with other cases in row 1 on attitude toward the war.  Gamma simply ignores these.  Other measures have been devised which correct for ties.  One is Kendall’s tau (τ).  There are two versions of this measure.  Tau-b (pronounced “tau sub b”) is used when there are an equal number of rows and columns in the table.  Tau-c (“tau sub c”) is used when the numbers of rows and columns are not the same.  A similar measure, though probably less widely used than Kendall's tau, is called Somers' D.
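The text does not give formulas for Kendall's tau.  As a rough illustration only, the Python sketch below assumes the usual textbook definition of tau-c (sometimes called Stuart's tau-c): 2m(C – D) divided by N²(m – 1), where m is the smaller of the number of rows and columns and N is the number of cases.  The function name is ours.  The pair counts and cell counts come from the Iraq war table worked through above; because that table uses weighted data, the figure SPSS reports may differ slightly.

```python
def kendall_tau_c(concordant, discordant, n_cases, n_rows, n_cols):
    """Tau-c = 2m(C - D) / (N**2 * (m - 1)), with m = min(rows, columns)."""
    m = min(n_rows, n_cols)
    if m < 2:
        raise ValueError("the table needs at least two rows and two columns")
    return 2 * m * (concordant - discordant) / (n_cases ** 2 * (m - 1))

n_cases = 57 + 151 + 294 + 639 + 646 + 234  # all six cells of the table
tau_c = kendall_tau_c(85494, 474279, n_cases, n_rows=2, n_cols=3)
print(round(tau_c, 3))
```

Because tied pairs now count in the denominator, the result is noticeably closer to zero than gamma for the same table.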

Which Tests Should You Use?

In picking the best measure of association to use, first ask yourself what level of measurement you’re dealing with.  If one or both variables are nominal, use Cramer’s V unless your instructor tells you otherwise.  If both are ordinal (again, unless directed differently by your instructor) use Kendall’s tau-b if the table contains equal numbers of rows and columns, tau-c otherwise.

(In SPSS, if you ask for Cramer’s V, you automatically get the “Approx. Sig.”  Notice that, if you ask for both Cramer’s V and Chi-square, the “Approx. Sig.” for Cramer’s V appears to be identical to the “Asymp. Sig.” for the Pearson Chi-square.  They are identical, so you really don’t need to ask for chi-square.)

If you ask for Kendall’s tau, you automatically get another measure of statistical significance, called the t test.  The "Approx. Sig." for this test is the significance of t, that is, the probability (p) that the relationship could have occurred by chance.

Key Concepts

Exercises:

Start SPSS.  Repeat the crosstabulation exercises in the More About Measurements topic but, in addition, ask for appropriate measures of association.  SPSS will automatically calculate the statistical significance of the association.

Note:  In some cases, you may need to combine categories of one or more variables before running crosstabulations.  See recode and compute for more information.

Which relationships are statistically significant at the .05 level or better?  Which are at least relatively strong?

For Further Study

Becker, Lee A., “CROSSTABS: Measures for Nominal Data,” http://www.uccs.edu/~faculty/lbecker/spss80/ctabs1.htm.

Becker, Lee A., “CROSSTABS: Measures for Ordinal Data,” http://www.uccs.edu/~faculty/lbecker/spss80/ctabs2.htm.

Creative Research Systems, “Significance in Statistics and Surveys,” The Survey System: Customize Your Surveys with Our Packages. http://www.surveysystem.com/signif.htm.

Lane, David M., “Chapter 16: Chi-Square,” HyperStat Textbook Online, http://davidmlane.com/hyperstat/chi_square.html.

[1] James Neill, "Why Use Effect Sizes Instead of Significance Testing in Program Evaluation?" http://wilderdom.com/research/effectsizes.html.  Last updated: September 11, 2008.  Accessed February 25, 2013.  For a different perspective on this issue, see W. Phillips Shively, The Craft of Political Research (6th edition).  Upper Saddle River, NJ: Prentice Hall, 2004: 160-161.

[2] The formula for computing chi-square is:
χ² = Σ[(fo – fe)² / fe] where:
fo = the observed frequency in each cell, and
fe = the expected frequency in each cell.

Last updated April 28, 2013.
© 2003–2013 John L. Korey.  Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.