Section 6 Comparing categorical data

6.1 Contingency tables

To test whether or not categorical values of a variable are random, we can compare the counts of categories. This gives us contingency tables where cells indicate how many times each value appears in a categorical variable. When we compare more than one variable, contingency tables have more than one dimension. Usually we tabulate two variables against each other. If one variable has \(m\) and another variable has \(n\) unique values (categories), then we obtain a 2-dimensional \(m*n\) contingency table.

6.2 \(\chi^2\)-test

6.2.1 Goodness of fit \(\chi^2\)-test

For goodness of fit \(\chi^2\)-test see Navarro and Foxcroft (2018) section 10.1.3.

We test if frequencies of categories are different from some expected frequencies for a single categorical variable:

\(H_0\): Frequencies of categories are as expected.
\(H_1\): Frequencies of categories are different from what is expected.

The test statistic is calculated as

\[\chi^2 = \sum{^k_{i=1}{\frac{(O_i-E_i)^2}{E_i}}},\]

where O is the observed (empirical) frequency and E the expected (theoretical) frequency of \(i\)th category (\(k\)).

In Jamovi: Frequencies > N Outcomes
In R: chisq.test(x)

6.2.2 \(\chi^2\)-test of independence

For \(\chi^2\)-test of independence see Navarro and Foxcroft (2018) section 10.2.

In this case we test if two categorical variables are associated:

\(H_0\): Variables are independent.
\(H_1\): Variables are associated.

The test statistic is calculated as

\[\chi^2 = \sum{^r_{i=1}\sum{^c_{j=1}}{\frac{(O_{ij}-E_{ij})^2}{E_{ij}}}},\]

where O is the observed (empirical) frequency and E the expected (theoretical) frequency in \(i\)th row (\(r\)) and \(j\)th column (\(c\)).

An assumption of \(\chi^2\)-test is that there are \(>5\) observations in each cell of a contingency table. If there are less, then the test results are not accurate.

In Jamovi: Frequencies > Independent Samples
In R: chisq.test(x, y)

6.3 Degrees of freedom

See Navarro and Foxcroft (2018) section 10.1.5.

Degrees of freedom (DoF) indicates the number of values that are free to vary. This is relevant for evaluating the \(\chi^2\)-test statistic as it indicates the critical value on \(\chi^2\)-distribution.

For goodness of fit \(\chi^2\)-test DOF is calculated simply as \(df = k - 1\) where \(k\) is the number of categories.

For \(\chi^2\)-test of independence DoF is calculated as

\[df = (r - 1)(c - 1),\]

where \(r\) is the number of rows and \(c\) the number of columns in a contingency table.

6.3.1 Interpretation of test results

Suppose that a \(\chi^2\)-test indicates that p-value is \(\le \alpha\). There are several ways in which we could interpret this result:

  • at least one of the frequencies is different from what is expected,
  • there is a statistically significant relationship between the two variables (\(\chi^2\)-test of independence), or
  • one variable has a statistically significant effect on another (based on theory).

References

Navarro, Danielle J, and David R Foxcroft. 2018. Learning Statistics with Jamovi: A Tutorial for Psychology Students and Other Beginners. Danielle J. Navarro; David R. Foxcroft. https://doi.org/10.24384/HGC3-7P15.