CHI-SQUARE INDEPENDENCE TEST

Name:
    CHI-SQUARE INDEPENDENCE TEST
A common question with regard to a two-way contingency table is whether the row and column variables are independent. By independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the row variable will not help us predict the value of the column variable and, likewise, knowing the value of the column variable will not help us predict the value of the row variable). A more technical definition for independence is that

    $$P(\text{row } i, \text{column } j) = P(\text{row } i)\,P(\text{column } j) \quad \text{for all } i, j$$
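The technical definition can be checked directly on a table of cell probabilities. The Python sketch below compares each joint probability with the product of its row and column marginals; the function name `is_independent` and the tolerance are illustrative assumptions, not part of Dataplot.

```python
def is_independent(joint, tol=1e-12):
    """Return True if P(row i, column j) == P(row i) * P(column j) for every cell.

    `joint` is a list of rows of cell probabilities summing to 1.
    """
    row_p = [sum(row) for row in joint]        # marginal P(row i)
    col_p = [sum(col) for col in zip(*joint)]  # marginal P(column j)
    return all(
        abs(joint[i][j] - row_p[i] * col_p[j]) <= tol
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )


# A joint table built as the outer product of its margins is independent.
row_margin, col_margin = [0.2, 0.8], [0.5, 0.3, 0.2]
product_table = [[r * c for c in col_margin] for r in row_margin]
print(is_independent(product_table))            # True

# Shifting probability toward the diagonal creates association.
print(is_independent([[0.3, 0.2], [0.2, 0.3]]))  # False
```

The tolerance only absorbs floating-point rounding; sample counts will essentially never satisfy the equality exactly, which is why a formal test is needed.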
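There are a number of tests for independence. One such test is the chi-square test for independence, whose test statistic is

    $$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where r and c are the numbers of rows and columns, Oij is the observed frequency in row i, column j, and Eij is the expected frequency under independence,

    $$E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

with n_{i.} the total for row i, n_{.j} the total for column j, and n the total number of observations. Under the null hypothesis of independence, the statistic approximately follows a chi-square distribution with (r - 1)(c - 1) degrees of freedom.

This test statistic can also be formulated as

    $$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} d_{ij}^2$$

where

    $$d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

The dij are referred to as the standardized residuals, and they show the contribution of each cell to the chi-square test statistic.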
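A minimal Python sketch of the statistic, assuming the usual Pearson form with expected frequencies Eij = (row i total * column j total) / n and standardized residuals dij = (Oij - Eij) / sqrt(Eij). The function name `chi_square_independence` is hypothetical; the table is the hair/eye color matrix from the Friendly example used at the end of this page.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and standardized residuals.

    `table` is a list of rows of observed counts Oij.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Expected frequency under independence: Eij = row total * column total / n
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    # Standardized residual: dij = (Oij - Eij) / sqrt(Eij)
    resid = [
        [(o - e) / e ** 0.5 for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    stat = sum(d * d for row in resid for d in row)  # chi-square = sum of dij^2
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, resid


m = [[5, 29, 14, 16],
     [15, 54, 14, 10],
     [20, 84, 17, 94],
     [68, 119, 26, 7]]
stat, df, resid = chi_square_independence(m)
print(round(stat, 4), df)  # compare with 138.2898 and 9 in the example output

# The cell with the largest |dij| contributes most to the statistic.
_, i_max, j_max = max((abs(d), i, j) for i, row in enumerate(resid)
                      for j, d in enumerate(row))
print("largest contribution: row", i_max + 1, "column", j_max + 1)
```

Here the dominant cell is row 3, column 4 (observed 94 against an expected count of about 46), which is the kind of pattern the standardized residuals are meant to expose.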
Syntax 1:
    CHI-SQUARE INDEPENDENCE TEST <y1> <y2> <SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
      <y2> is the second response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where you have raw data (i.e., the data have not yet been cross-tabulated into a two-way table).
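With raw data, the two response variables are paired observation by observation and cross-tabulated before the statistic is computed. The Python sketch below shows that step; the name `cross_tabulate` and the choice to order levels by sorted value are assumptions for illustration, not a description of Dataplot's internals.

```python
from collections import Counter


def cross_tabulate(y1, y2):
    """Cross-tabulate two paired response variables into a two-way table.

    Rows follow the sorted distinct values of y1, columns those of y2.
    """
    if len(y1) != len(y2):
        raise ValueError("y1 and y2 must have the same length")
    row_levels = sorted(set(y1))
    col_levels = sorted(set(y2))
    counts = Counter(zip(y1, y2))  # frequency of each (y1, y2) pair
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


y1 = [1, 1, 2, 2, 2, 1, 2, 1]
y2 = [1, 2, 2, 2, 1, 1, 2, 2]
print(cross_tabulate(y1, y2))  # [[2, 2], [1, 3]]
```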
Syntax 2:
    CHI-SQUARE INDEPENDENCE TEST <m> <SUBSET/EXCEPT/FOR qualification>
where <m> is a matrix containing the two-way table;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where the data have already been cross-tabulated into a two-way contingency table.
Syntax 3:
    CHI-SQUARE INDEPENDENCE TEST <n11> <n12> <n21> <n22>
where <n11> is a parameter containing the value for row 1, column 1 of a 2x2 table;
      <n12> is a parameter containing the value for row 1, column 2 of a 2x2 table;
      <n21> is a parameter containing the value for row 2, column 1 of a 2x2 table;
and   <n22> is a parameter containing the value for row 2, column 2 of a 2x2 table.

This syntax is used for the special case of a 2x2 table. In this case, you can enter the 4 values directly, although you need to be careful to enter the parameters in the order shown above.
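For a 2x2 table, the double sum collapses to the well-known closed form n(n11 n22 - n12 n21)^2 / (r1 r2 c1 c2), where r1, r2 are the row totals and c1, c2 the column totals. The Python sketch below (function names hypothetical) checks that shortcut against the general sum and shows why the argument order matters.

```python
def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square for a 2x2 table using the closed-form shortcut."""
    n = n11 + n12 + n21 + n22
    r1, r2 = n11 + n12, n21 + n22  # row totals
    c1, c2 = n11 + n21, n12 + n22  # column totals
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)


def chi_square_general(table):
    """The same statistic from the general sum of (Oij - Eij)**2 / Eij."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )


print(round(chi_square_2x2(10, 20, 30, 40), 4))            # 0.7937
print(round(chi_square_general([[10, 20], [30, 40]]), 4))  # 0.7937
# Entering n12 before n11 describes a different table and changes the result.
print(round(chi_square_2x2(20, 10, 30, 40), 4))            # 4.7619
```

Swapping n12 and n21 only transposes the table and leaves the statistic unchanged, but other mix-ups (such as the one above) silently test a different table.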
Examples:
    CHI-SQUARE INDEPENDENCE TEST M
    CHI-SQUARE INDEPENDENCE TEST N11 N12 N21 N22
The chi-square distribution is only an approximation to the distribution of the test statistic, and it requires the expected frequencies to be reasonably large. Cochran suggests that if the minimum expected frequency is less than 1, or if more than 20% of the expected frequencies are less than 5, the approximation may be poor. However, Conover suggests that this is probably too conservative, particularly if r and c are not too small. He suggests that the minimum expected frequency should be at least 0.5 and that at least half of the expected frequencies should be greater than 1. In any event, if there are too many low expected frequencies, you can do one of the following:
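Both rules of thumb can be checked mechanically from the expected frequencies. In this Python sketch the function names are hypothetical, and the exact boundary handling (strict versus non-strict inequalities) is an assumption, since the rules are stated only informally.

```python
def expected_frequencies(table):
    """Expected counts Eij = row total * column total / n, flattened to one list."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [r * c / n for r in rows for c in cols]


def cochran_ok(expected):
    """Cochran: no expected count below 1 and no more than 20% of them below 5."""
    below_five = sum(e < 5 for e in expected)
    return min(expected) >= 1 and below_five <= 0.2 * len(expected)


def conover_ok(expected):
    """Conover: minimum expected count at least 0.5 and at least half of them above 1."""
    above_one = sum(e > 1 for e in expected)
    return min(expected) >= 0.5 and above_one >= 0.5 * len(expected)


# Expected counts 10, 1, 10, 1: fails Cochran's rule but passes Conover's.
e = expected_frequencies([[10, 1], [10, 1]])
print(e, cochran_ok(e), conover_ok(e))  # [10.0, 1.0, 10.0, 1.0] False True
```

The example illustrates Conover's point: a table that Cochran's rule flags may still be acceptable under the less conservative criterion.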
Note that in all three sampling cases (neither the row nor the column totals fixed, either the row or the column totals fixed, or both fixed), the test statistic and the chi-square approximation are the same. What differs is the exact distribution of the test statistic: when the row or column totals (or both) are fixed, the number of possible contingency tables is reduced. As long as the expected frequencies are sufficiently large, the chi-square approximation should be adequate for practical purposes.
    Column 1 - row id
    Column 2 - column id
    Column 3 - row total
    Column 4 - column total
    Column 5 - expected frequency (Eij)
    Column 6 - observed frequency (Oij)

To read this information into Dataplot, enter
READ DPST1F.DAT ROWID COLID ROWTOT COLTOT ... EXPFREQ OBSFREQ
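The six columns amount to one record per cell of the table. The Python sketch below builds equivalent records; the function name `independence_cells`, the 1-based ids, and the row-major cell order are assumptions, since the exact record order and number formatting of DPST1F.DAT are not documented here.

```python
def independence_cells(table):
    """One record per cell: (row id, column id, row total, column total, Eij, Oij).

    Cells are visited row by row (row-major order), with 1-based ids.
    """
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [
        (i + 1, j + 1, rows[i], cols[j], rows[i] * cols[j] / n, table[i][j])
        for i in range(len(table))
        for j in range(len(table[0]))
    ]


for rec in independence_cells([[10, 20], [30, 40]]):
    print("%d %d %d %d %.4f %d" % rec)
```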
The ODDS RATIO INDEPENDENCE TEST is an alternative test for independence based on the logarithm of the odds ratio.
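A test based on the log odds ratio is commonly computed as a Wald z-test, using the standard error sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22). Whether Dataplot's ODDS RATIO INDEPENDENCE TEST uses exactly this form is an assumption here; the Python sketch (function name hypothetical) only illustrates the general idea for a 2x2 table with no zero cells.

```python
import math


def log_odds_ratio_test(n11, n12, n21, n22):
    """Wald z-test of log(odds ratio) = 0 for a 2x2 table.

    Assumes all four counts are positive; zero cells make the log undefined.
    Returns (log odds ratio, standard error, z, two-sided p-value).
    """
    log_or = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for standard normal Z
    return log_or, se, z, p_two_sided


print(log_odds_ratio_test(10, 10, 10, 10))  # log odds ratio 0, so z = 0 and p = 1
print(log_odds_ratio_test(10, 20, 30, 40))
```

Under independence the odds ratio is 1, so its logarithm is 0; swapping the two rows flips the sign of the log odds ratio but not the p-value.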
Friendly (2000), "Visualizing Categorical Data", SAS Institute Inc., p. 90.
Cochran (1952), "The Chi-Square Test of Goodness of Fit", Annals of Mathematical Statistics, 23, pp. 315-345.
    . Example from page 61 of Friendly
    read matrix m
    5 29 14 16
    15 54 14 10
    20 84 17 94
    68 119 26 7
    end of data
    .
    chi-square independence test m

The following output is generated:

                 CHI-SQUARE TEST FOR INDEPENDENCE (RXC TABLE)

    NULL HYPOTHESIS:
        THE TWO VARIABLES ARE INDEPENDENT
    ALTERNATIVE HYPOTHESIS:
        THE TWO VARIABLES ARE NOT INDEPENDENT

    SAMPLE 1:
        NUMBER OF OBSERVATIONS           =      592
        NUMBER OF LEVELS (ROWS)          =        4

    SAMPLE 2:
        NUMBER OF OBSERVATIONS           =      592
        NUMBER OF LEVELS (COLUMNS)       =        4

    WITHOUT YATES CONTINUITY CORRECTION:
        CHI-SQUARE TEST STATISTIC        = 138.2898
        DEGREES OF FREEDOM               =        9
        CDF VALUE OF TEST STATISTIC      = 1.000000

    WITH YATES CONTINUITY CORRECTION:
        CHI-SQUARE TEST STATISTIC        = 132.0374
        DEGREES OF FREEDOM               =        9
        CDF VALUE OF TEST STATISTIC      = 1.000000

    WITHOUT YATES CONTINUITY CORRECTION

                     CONFIDENCE  CRITICAL  NULL HYPOTHESIS      NULL HYPOTHESIS
    NULL HYPOTHESIS  LEVEL       VALUE     ACCEPTANCE INTERVAL  CONCLUSION
    ==========================================================================
    INDEPENDENT      50.0%        8.34     (0,0.500)            REJECT
    INDEPENDENT      80.0%       12.24     (0,0.800)            REJECT
    INDEPENDENT      90.0%       14.68     (0,0.900)            REJECT
    INDEPENDENT      95.0%       16.92     (0,0.950)            REJECT
    INDEPENDENT      97.5%       19.02     (0,0.975)            REJECT
    INDEPENDENT      99.0%       21.67     (0,0.990)            REJECT

    WITH YATES CONTINUITY CORRECTION

                     CONFIDENCE  CRITICAL  NULL HYPOTHESIS      NULL HYPOTHESIS
    NULL HYPOTHESIS  LEVEL       VALUE     ACCEPTANCE INTERVAL  CONCLUSION
    ==========================================================================
    INDEPENDENT      50.0%        8.34     (0,0.500)            REJECT
    INDEPENDENT      80.0%       12.24     (0,0.800)            REJECT
    INDEPENDENT      90.0%       14.68     (0,0.900)            REJECT
    INDEPENDENT      95.0%       16.92     (0,0.950)            REJECT
    INDEPENDENT      97.5%       19.02     (0,0.975)            REJECT
    INDEPENDENT      99.0%       21.67     (0,0.990)            REJECT
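The CDF value and critical values in the output come from the chi-square distribution with 9 degrees of freedom. As a cross-check, the Python sketch below computes that CDF with only the standard library, via the series for the regularized lower incomplete gamma function; the function name `chi2_cdf` and the series truncation rule are assumptions chosen for illustration, not Dataplot's implementation.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF, computed as the regularized lower incomplete gamma P(df/2, x/2).

    Uses the series P(s, t) = t**s * exp(-t) * sum_k t**k / Gamma(s + k + 1),
    evaluating each term in log space to avoid overflow and underflow.
    """
    if x <= 0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    n_terms = int(t + 20 * math.sqrt(t) + 50)  # terms are negligible well past k = t
    total = sum(
        math.exp((s + k) * math.log(t) - t - math.lgamma(s + k + 1))
        for k in range(n_terms)
    )
    return min(total, 1.0)


# Statistic and degrees of freedom copied from the example output above.
stat, df = 138.2898, 9
print("CDF VALUE OF TEST STATISTIC = %.6f" % chi2_cdf(stat, df))
for level, crit in [(0.50, 8.34), (0.80, 12.24), (0.90, 14.68),
                    (0.95, 16.92), (0.975, 19.02), (0.99, 21.67)]:
    # Each printed critical value should sit at (about) its confidence level.
    conclusion = "REJECT" if stat > crit else "ACCEPT"
    print("%5.1f%%  %6.2f  CDF=%.4f  %s"
          % (100 * level, crit, chi2_cdf(crit, df), conclusion))
```

Because the statistic lies far beyond even the 99% critical value, independence is rejected at every listed level, matching the REJECT conclusions in the output.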
Date created: 07/25/2007
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.