Section 3 Data management

3.1 Data structure

Data is usually stored in two-dimensional tables, i.e. it has rows and columns. To apply statistical procedures, most software requires data to be formatted in a conventional way. In this respect it’s useful to follow the “tidy data” principles (Wickham 2014):

  1. Every column is a variable.
  2. Every row is an observation.
  3. Every cell is a single value.

This is how data should be formatted before running most statistical procedures. One way to tidy a data table is to use the Pivot table feature in a spreadsheet app.
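As an illustration of the three principles, the following is a minimal Python sketch of reshaping a "wide" table into a tidy one. The data, the column names and the helper `to_tidy` are hypothetical and only serve the example. Whether a wide layout counts as untidy depends on what is taken to be an observation; here it is assumed that each test result is one observation.

```python
# Hypothetical "wide" table: one row per student, one column per test.
# The column headers test1/test2 are really values of a variable "test".
wide = [
    {"student": "A", "test1": 7, "test2": 9},
    {"student": "B", "test1": 5, "test2": 6},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that every output row is a single observation."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy_rows


tidy = to_tidy(wide, "student", "test", "score")
for row in tidy:
    print(row)
```

In the result every column (student, test, score) is a variable, every row is one test result and every cell holds a single value.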

3.2 Scales of measurement

The scale of measurement of a variable determines which procedures can be applied to the data.

3.2.1 Stevens’s operational theory of measurement

The following typology was first published by Stevens (1946). See Navarro and Foxcroft (2018), section 2.2, for a more detailed explanation. Each scale in the table also includes the properties, operations and measures of the scales on the previous rows.

Scale     Property        Operations  Central tendency
Nominal   Classification  =, ≠        Mode
Ordinal   Level           >, <        Median
Interval  Difference      +, −        Arithmetic mean
Ratio     Magnitude       ×, /        Geometric mean, harmonic mean
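The measures of central tendency in the table can be sketched with Python's standard `statistics` module (Python 3.8 or later is assumed for `geometric_mean`). All data and variable names below are hypothetical; the ordinal median is found by ranking the levels first, which is one possible approach rather than the only one.

```python
import statistics

# Hypothetical example data, one list per scale of measurement.
eye_colour = ["brown", "blue", "brown", "green"]             # nominal
levels = ["low", "medium", "high"]                           # ordered levels
satisfaction = ["low", "medium", "medium", "high", "high"]   # ordinal
temperature_c = [5.0, 15.0, 10.0]                            # interval
growth_factor = [2.0, 8.0]                                   # ratio

# Nominal: only counting and the mode are meaningful.
mode_colour = statistics.mode(eye_colour)

# Ordinal: the median needs only the order of the levels, so rank them first.
ranks = [levels.index(value) for value in satisfaction]
median_level = levels[statistics.median_low(ranks)]

# Interval: differences are meaningful, so the arithmetic mean is as well.
mean_temperature = statistics.mean(temperature_c)

# Ratio: a natural zero makes the geometric and harmonic means meaningful too.
geometric = statistics.geometric_mean(growth_factor)
harmonic = statistics.harmonic_mean(growth_factor)

print(mode_colour, median_level, mean_temperature, geometric, harmonic)
```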

3.2.1.1 Nominal scale

Nominal variables contain names (i.e. characters, factors or strings) that do not have a natural order. Such data can be summarized only by counting values or determining the mode.

3.2.1.2 Ordinal scale

Ordinal variables have the characteristics of nominal variables with the added possibility of naturally ordering these names. As such, ordinal variables can be said to have levels.

It’s important to note that the step from one level to the next is not necessarily the same size as the step between another pair of adjacent levels. Thus, it is not always meaningful to convert these levels to numbers and do calculations on them.
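The following sketch shows why calculations on numerically coded levels can mislead. It assumes two hypothetical codings of the same three levels, both of which preserve the order, and two hypothetical groups of responses. The comparison of group means depends on the coding chosen, while the median level does not.

```python
import statistics

levels = ["never", "sometimes", "often"]

# Two hypothetical numeric codings; both respect the order of the levels.
coding_a = {"never": 1, "sometimes": 2, "often": 3}
coding_b = {"never": 1, "sometimes": 2, "often": 10}

group_1 = ["sometimes", "sometimes", "sometimes"]
group_2 = ["never", "never", "often"]


def mean_under(coding, responses):
    """Arithmetic mean of the responses after applying a numeric coding."""
    return statistics.mean([coding[r] for r in responses])


def median_level(responses):
    """Median level, computed from ranks only (no distances assumed)."""
    ranks = [levels.index(r) for r in responses]
    return levels[statistics.median_low(ranks)]


# Under coding A group 1 has the higher mean, under coding B group 2 does.
print(mean_under(coding_a, group_1), mean_under(coding_a, group_2))
print(mean_under(coding_b, group_1), mean_under(coding_b, group_2))

# The median level is the same whichever coding is used.
print(median_level(group_1), median_level(group_2))
```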

3.2.1.3 Interval scale

Interval variables are expressed numerically. While differences between numbers on an interval scale are meaningful, there is no natural zero value. Thus, calculations on interval variables are limited to addition and subtraction (e.g. finding differences); multiplying or dividing such values is not meaningful.

A classic example of an interval variable is temperature measured in degrees Celsius. The difference between 5°C and 15°C is 10°C, but 15°C is not 3 times warmer than 5°C.
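The temperature example can be checked numerically. The sketch below converts both values to kelvins, a temperature scale that does have a natural zero; the function name `celsius_to_kelvin` is chosen for this illustration. The difference is the same on both scales, but the ratio is not, which is why a ratio of Celsius values carries no physical meaning.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return celsius + 273.15


cold_c, warm_c = 5.0, 15.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales (up to floating-point rounding).
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios do not agree: 3.0 in Celsius, but only about 1.036 in kelvins.
ratio_c = warm_c / cold_c
ratio_k = warm_k / cold_k

print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 3))
```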

3.2.1.4 Ratio scale

Ratio variables are also numeric but have a natural zero value. This means that division and multiplication are meaningful: for example, a mass of 10 kg is twice a mass of 5 kg.

3.2.2 Numeric and categorical variables

Variables expressed as numbers are sometimes also referred to as numeric variables. These can be measured on an interval or a ratio scale, rarely on an ordinal scale and never on a nominal scale.

Categorical variables are usually measured on a nominal scale.

3.2.3 Binary scale

Binary (or dichotomous) variables do not constitute a distinct scale but can be considered as a subgroup of nominal, ordinal or interval variables. Binary variables can only take two values. Many statistical procedures convert or require the conversion of nominal variables to binary for technical reasons (e.g. “dummy” variables). Binary variables are often coded as 0 for false and 1 for true. As such, calculating the sum or mean of binary variables conveys useful information (think why that is!).
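A small sketch of both ideas, using hypothetical data: a nominal variable is converted to a 0/1 "dummy" variable, and its sum and mean are then calculated. Comparing the two results with the raw data is one way to work out what they convey.

```python
# Hypothetical nominal variable.
colour = ["red", "blue", "red", "green", "red", "blue", "red", "green"]

# Dummy coding: 1 if the observation is red, 0 otherwise.
is_red = [1 if value == "red" else 0 for value in colour]

total = sum(is_red)                   # sum of the 0/1 values
average = sum(is_red) / len(is_red)   # mean of the 0/1 values

print(is_red)
print(total, colour.count("red"))     # compare the sum with a direct count
print(average)                        # compare the mean with total / n
```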

3.2.4 Continuous and discrete variables

In addition to the previous typology, we can also distinguish between continuous and discrete variables. These are well defined by Navarro and Foxcroft (2018, 20):

  • “A continuous variable is one in which, for any two values that you can think of, it’s always logically possible to have another value in between.
  • A discrete variable is, in effect, a variable that isn’t continuous. For a discrete variable it’s sometimes the case that there’s nothing in the middle.”

3.2.5 Scales in statistical software

Much statistical software (e.g. Jamovi) distinguishes between numbers, text and ordered text. The difference between interval and ratio variables is a theoretical one: statistical software stores all numbers as integers or floating-point numbers and does not distinguish between the two scales. When binary variables are coded so that they only take the values 0 and 1, they are also treated as numeric.

References

Navarro, Danielle J, and David R Foxcroft. 2018. Learning Statistics with Jamovi: A Tutorial for Psychology Students and Other Beginners. Danielle J. Navarro; David R. Foxcroft. https://doi.org/10.24384/HGC3-7P15.
Stevens, S. S. 1946. “On the Theory of Scales of Measurement.” Science 103 (2684): 677–80. https://doi.org/10.1126/science.103.2684.677.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23. https://doi.org/10.18637/jss.v059.i10.