Section 3 Data management
3.1 Data structure
Usually data is stored in 2-dimensonal tables, i.e. data has rows and columns. To apply statistical procedures, most software requires data to be formatted in a conventional way. In this respect it’s useful to follow “tidy data” principle (Wickham 2014):
- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.
This is how data should be formatted before running most statistical procedures. One way to tidy a data table is to use the Pivot table feature in a spreadsheet app.
3.2 Scales of measurement
Scale of measurement determines the procedures that can be applied to the data.
3.2.1 Steven’s operational theory of measurement
The following typology was first published by Stevens (1946). See Navarro and Foxcroft (2018) section 2.2 for a more detailed explanation. Each scale in the table includes also the properties, operations and measures a on previous rows.
Scale | Property | Operations | Central tendency |
---|---|---|---|
Nominal | Classification | =, ≠ | Mode |
Ordinal | Level | >, < | Median |
Interval | Difference | +, − | Arithmetic mean |
Ratio | Magnitude | ×, / | Geometric mean, harmonic mean |
3.2.1.1 Nominal scale
Nominal variables contain names (i.e. characters, factors or strings) that do not have a natural order. Such data can be summarized only by counting values or determining the mode.
3.2.1.2 Ordinal scale
Ordinal variables have the characteristics of nominal variables with the added possibility of naturally ordering these names. As such, ordinal variables can be said to have levels.
It’s important to note that an increase of one level to the next is not necessarily numerically equivalent to an increase of another level to the next. Thus, it is not always meaningful to convert these levels to numbers and do calculations on them.
3.2.1.3 Interval scale
Interval variables are expressed numerically. While differences between numbers on interval scale are meaningful, there isn’t a natural zero value. Thus, calculations on interval variables is limited to finding differences and division or multiplication of such values is not reasonable.
A classic example of an interval variable is temperature. Difference between 5°C and 15°C is 10°C but 15°C is not 3 times warmer than 5°C.
3.2.2 Numeric and categorical variables
Sometimes variables expressed as numbers are also referred to as numeric variables. These can be variables measured on either interval or ratio scale, rarely on ordinal scale and never on nominal scale.
Categorical variables are usually measured on a nominal scale.
3.2.3 Binary scale
Binary (or dichotomous) variables do not constitute a distinct scale but can be considered as a subgroup of nominal, ordinal or interval variables. Binary variables can only take two values. Many statistical procedures convert or require the conversion of nominal variables to binary for technical reasons (e.g. “dummy” variables). Binary variables are often coded as 0 for false and 1 for true. As such, calculating the sum or mean of binary variables conveys useful information (think why that is!).
3.2.4 Continuous and discrete variables
In addition to the previous typology, we can also distinguish between continuous and discrete variables. These are well defined by Navarro and Foxcroft (2018, 20):
- “A continuous variable is one in which, for any two values that you can think of, it’s always logically possible to have another value in between.
- A discrete variable is, in effect, a variable that isn’t continuous. For a discrete variable it’s sometimes the case that there’s nothing in the middle.”
3.2.5 Scales in statistical software
A lot of statistical software (e.g. Jamovi) distinguishes between numbers, text and ordered text. The difference between ratio and interval variables is only theoretical. Statistical software treats all numbers as integers or floating-point numbers and does not distinguish between interval and ratio variables. When binary variables are coded so that they only have values 0 and 1, these are also considered numerical.