
CRAMER CONTINGENCY COEFFICIENT

Name:
    CRAMER CONTINGENCY COEFFICIENT (LET)
Type:
    Let Subcommand
Purpose:
    Compute Cramer's contingency coefficient for an RxC contingency table.
Description:
    If we have N observations with two variables where each observation can be classified into one of R mutually exclusive categories for variable one and one of C mutually exclusive categories for variable two, then a cross-tabulation of the data results in a two-way contingency table (also referred to as an RxC contingency table). The resulting contingency table has R rows and C columns.

    A common question with regard to a two-way contingency table is whether we have independence. By independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the row variable will not help us predict the value of the column variable and, likewise, knowing the value of the column variable will not help us predict the value of the row variable).

    A more technical definition for independence is that

      P(row i, column j) = P(row i)*P(column j)       for all i,j

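    The standard test statistic for determining independence is the chi-square test statistic:

      \( T = \sum_{i=1}^{r}{\sum_{j=1}^{c}{\frac{(O_{ij} - E_{ij})^{2}} {E_{ij}}}} \)

    where O_ij and E_ij are the observed and expected counts for the cell in row i and column j.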
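
    As an illustration outside of Dataplot, the chi-square statistic can be sketched in Python. This is a minimal sketch, assuming the usual expected count under independence, E_ij = (row i total)(column j total)/N, which is not spelled out above, and assuming every row and column has at least one observation. The function name and the small tables are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic T for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count for cell (i, j) if rows and columns are independent
            # (assumed definition: row total * column total / N).
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    return t


# Hypothetical 2x2 tables, invented for illustration.
associated = [[10, 20], [20, 10]]
proportional = [[10, 20], [20, 40]]  # rows are proportional, so T should be 0

print(chi_square_statistic(associated))
print(chi_square_statistic(proportional))
```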

    One criticism of this statistic is that it does not give a meaningful description of the degree of dependence (or strength of association). That is, it is useful for determining whether there is dependence. However, since the strength of the association depends on the degrees of freedom as well as on the value of the test statistic, it is not easy to interpret the strength of association from the statistic alone.

    Cramer's contingency coefficient is one way to provide an easier-to-interpret measure of the strength of association. Specifically, it is:

      \( \mbox{Cramer's Coefficient} = \sqrt{\frac{T}{N(q - 1)}} \)

    where

      T = the chi-square test statistic given above
      N = the total sample size
      q = minimum(number of rows,number of columns)

    This statistic is based on the fact that the maximum value of T is:

      N (q - 1)

    So this statistic rescales the chi-square statistic to a value between 0 (no association) and 1 (maximum association). It has the desirable property of scale invariance. That is, if the sample size increases, the value of Cramer's contingency coefficient does not change as long as the cell counts keep the same proportions relative to each other.
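
    The bounds and the scale invariance can be checked numerically. The following Python sketch is independent of Dataplot and makes the same assumption about expected counts, E_ij = (row total)(column total)/N. The function name and all four tables are hypothetical, chosen only to exercise the properties described above.

```python
import math


def cramer_coefficient(table):
    """Cramer's coefficient sqrt(T / (N (q - 1))) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Chi-square statistic T, with assumed E_ij = row total * column total / N.
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    q = min(len(table), len(table[0]))
    return math.sqrt(t / (n * (q - 1)))


# Hypothetical tables, invented for illustration.
base = [[12, 3], [4, 11]]
scaled = [[10 * count for count in row] for row in base]  # same proportions, 10x the sample
perfect = [[25, 0], [0, 15]]         # the row fully determines the column
proportional = [[10, 20], [20, 40]]  # rows are proportional (no association)

print(cramer_coefficient(base), cramer_coefficient(scaled))
print(cramer_coefficient(perfect), cramer_coefficient(proportional))
```

    In this sketch the base and scaled tables give the same coefficient, the perfectly associated table reaches T = N(q - 1) and hence a coefficient of 1, and the proportional table gives 0.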

    The data for the contingency table can be specified in either of the following two ways:

    1. raw data

      In this case, you will have two variables. The first will contain r distinct values and the second will contain c distinct values. Dataplot will automatically perform the cross-tabulation to obtain the counts for each cell. Although the distinct values will typically be integers, this is not strictly required.

    2. table data

      If you only have the resulting contingency table (i.e., the counts for each cell), then you can use the READ MATRIX (or CREATE MATRIX) command to create a matrix with the data. This is demonstrated in the example program below.

      In this case, your data should contain non-negative integers since they represent the counts for each cell.
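
    To make the relationship between the two input forms concrete, here is a small Python sketch of the cross-tabulation step that turns raw data into a table of counts. It is not Dataplot's implementation. The function name and sample values are hypothetical, and ordering the categories by sorted value is an assumption of this sketch.

```python
from collections import Counter


def cross_tabulate(y1, y2):
    """Count how often each (y1, y2) pair occurs and return an R x C table."""
    if len(y1) != len(y2):
        raise ValueError("the two variables must have the same number of elements")
    row_levels = sorted(set(y1))  # R distinct values of the first variable
    col_levels = sorted(set(y2))  # C distinct values of the second variable
    counts = Counter(zip(y1, y2))
    # A Counter returns 0 for pairs that never occur, giving empty cells a count of 0.
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical raw data: six observations on two categorical variables.
y1 = [1, 1, 2, 2, 2, 3]
y2 = [1, 2, 1, 1, 2, 2]
print(cross_tabulate(y1, y2))
```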

Syntax 1:
    LET <par> = CRAMER CONTINGENCY COEFFICIENT <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed Cramer contingency coefficient is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    Use this syntax for raw data.

Syntax 2:
    LET <p> = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT <m>
                            <SUBSET/EXCEPT/FOR qualification>
    where <m> is a matrix containing the contingency table;
                <p> is a parameter where the computed Cramer contingency coefficient is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    Use this syntax if your data is a contingency table.

Examples:
    LET A = CRAMER CONTINGENCY COEFFICIENT Y1 Y2
    LET A = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT M
Note:
    For the raw data case, the two variables should have the same number of elements.
Note:
    Dataplot statistics can be used in a number of commands. For details, enter

    Note that these commands are only available if you have raw data.

Default:
    None
Synonyms:
    None
Related Commands:
Reference:
    Conover (1999), "Practical Nonparametric Statistics", Third Edition, Wiley, pp. 229-230.

    Friendly (2000), "Visualizing Categorical Data", SAS Institute Inc., p. 61.

Applications:
    Categorical Data Analysis
Implementation Date:
    2007/5
Program:
     
    . Example from page 61 of Friendly
    read matrix m
     5  29 14 16
    15  54 14 10
    20  84 17 94
    68 119 26  7
    end of data
    .
    let a = matrix grand cramer contingency coefficient m
        
    The result is 0.279.
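
    As a cross-check outside of Dataplot, the same table can be run through the formula in a few lines of Python. This is a sketch rather than Dataplot's implementation. It assumes the usual expected counts E_ij = (row total)(column total)/N, and the function name is hypothetical.

```python
import math


def cramer_from_table(table):
    """Return (T, N, q, coefficient) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Chi-square statistic T, with assumed E_ij = row total * column total / N.
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    q = min(len(table), len(table[0]))
    return t, n, q, math.sqrt(t / (n * (q - 1)))


# The 4x4 table from the example program above.
friendly = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]

t, n, q, coefficient = cramer_from_table(friendly)
print(f"T = {t:.2f}, N = {n}, q = {q}, coefficient = {coefficient:.3f}")
```

    Here N = 592 and q = 4, and the coefficient rounds to 0.279, in agreement with the Dataplot result.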


Date created: 07/24/2007
Last updated: 11/02/2015

Please email comments on this WWW page to alan.heckert@nist.gov.