Agency Statistical Consulting

Christopher W. Ryan, MD, MS, MSPH

Helping those in public service get the most from their data

Muppets, statistics, and levels of measurement (28 March 2023)

Consider the following table of data:

Muppet measurements (with some liberties taken)
Name           Color    Size     Height (cm)
Kermit         green    medium   20
Big Bird       yellow   large    250
Elmo           red      small    15
Fozzie Bear    brown    medium   30
Snuffleupagus  brown    large    180

What is the average color of this sample of Muppets? Average size? Average height? Each of these measurements requires a different type of "average," because each is made at a different level of measurement.

Nominal variables

In this sample, color is a nominal variable, sometimes also called "categorical." Its values are names, specifically the names of colors. You can't do math with names. Even if you replace the names with numbers, they are still just labels. The best you can do is determine the modal category, the most common one. In this sample, that is brown, which appears twice. Better yet, show a table with the frequency of each color.
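As a minimal sketch of that idea, the Python snippet below builds a frequency table for the colors in the Muppet table and picks out the modal category. The variable names are illustrative choices, not part of the original data.

```python
from collections import Counter

# Colors in the order they appear in the Muppet table.
colors = ["green", "yellow", "red", "brown", "brown"]

# Count how often each label occurs; no arithmetic on the labels themselves.
color_counts = Counter(colors)

# most_common(1) returns a list holding the single (label, count) pair
# with the highest count.
modal_color, modal_count = color_counts.most_common(1)[0]

# Frequency table, most frequent color first.
for color, count in color_counts.most_common():
    print(f"{color:<8}{count}")

print("Modal color:", modal_color)
```

Counting is the only operation used here, which is why it remains valid for labels that have no order and no size.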

Ordinal variables

Size here is an ordinal variable. It consists of names or words, true, but they are more than just categories; they imply a certain order. The words convey how much size that Muppet has. We know that small Muppets "have less size" than medium Muppets, which in turn "have less size" than large Muppets. But we don't know how much bigger medium is than small, or how much bigger large is than medium. So an ordinal variable contains more information than a nominal variable, but still not enough to do arithmetic with it. The best you can do is line up all the sizes in order and pick the middle one as the "average" size. That's called the median; in this sample it is medium. The important thing to understand here is that slapping on number labels, like 1 for small, 2 for medium, and 3 for large, does not change anything. They are still just labels, and it is still not valid to do arithmetic with them.
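The "line them up and pick the middle one" procedure can be sketched in Python as follows. The list SIZE_ORDER is an assumption introduced here to encode the ordering small < medium < large. Its positions are used only to rank the values, never to add or average them.

```python
# Assumed ordering of the size categories, smallest to largest.
SIZE_ORDER = ["small", "medium", "large"]

# Sizes in the order they appear in the Muppet table.
sizes = ["medium", "large", "small", "medium", "large"]

# Sort by position in SIZE_ORDER; the positions serve only as ranks.
sorted_sizes = sorted(sizes, key=SIZE_ORDER.index)

# With an odd number of observations there is a single middle value.
median_size = sorted_sizes[len(sorted_sizes) // 2]

print(sorted_sizes)
print("Median size:", median_size)
```

With an even number of observations there would be two middle values, and if they differed, the median would fall between two categories rather than on one.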

Interval/Ratio variables

Interval/ratio variables contain even more information. Height here is an interval variable, and beyond that it is a ratio variable. With interval/ratio variables, the difference between adjacent "notches" or values always represents the same amount. So 4 cm is 1 cm more than 3 cm, just as 400 cm is 1 cm more than 399 cm. This is different from an ordinal variable, where we have no reason to assume that the difference between adjacent levels is the same.

Interval variables have no meaningful zero. Take temperature, for example. We commonly think of it as measuring how hot something is, but technically it reflects how fast the molecules are moving (their average kinetic energy); they move faster when they are warmer. The Celsius scale sets its zero at the freezing point of water, but those water molecules are still moving; they don't have "zero motion" at that point. The Fahrenheit scale sets its zero 32 °F below the freezing point of water. So the zero point is arbitrary.

Ratio variables contain even more information than interval variables. A ratio variable is an interval variable with a zero point that means the absence of the thing that is being measured. The zero on the Kelvin temperature scale (absolute zero) means, at least in the classical picture, the complete absence of molecular motion, the thing that temperature measures. So temperature in kelvins is a ratio variable.
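One practical consequence of a true zero is that ratios become meaningful. The sketch below uses two illustrative temperatures (20 °C and 10 °C, chosen here as an assumption, not taken from the article). It shows that the difference between them is the same on both scales, but the "twice as warm" ratio seen in Celsius disappears in kelvins.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to kelvins."""
    return celsius + 273.15

# Illustrative values only.
warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences agree on both scales: this is what "interval" guarantees.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios do not agree: only the scale with a true zero gives a meaningful one.
ratio_c = warm_c / cool_c   # 2.0, an artifact of where Celsius puts its zero
ratio_k = warm_k / cool_k   # roughly 1.035

print(f"Difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"Ratio: {ratio_c:.3f} in Celsius, {ratio_k:.3f} in kelvins")
```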

Sometimes the distinction between a ratio variable and an interval variable is subtle. Fortunately it is rarely necessary to distinguish them. They are similar enough that they are often considered together.

And finally we can do math! The mean is a valid summary measure of an interval/ratio variable. So we can calculate the mean height of the Muppets in this sample, by adding all the heights and dividing by the number of Muppets in our table.
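As a short worked example, the mean height for the five Muppets in the table can be computed in Python exactly as described: add the heights and divide by the number of Muppets. The variable names are illustrative.

```python
# Heights in cm, in the order they appear in the Muppet table.
heights_cm = [20, 250, 15, 30, 180]

total_height = sum(heights_cm)             # 495
mean_height = total_height / len(heights_cm)

print(f"Mean height: {mean_height} cm")    # 495 / 5 = 99.0
```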

Summary: Levels of measurement
Level     Information content  Examples
Ratio     most                 systolic BP, heart rate, vehicle speed
Interval                       Celsius or Fahrenheit temperature
Ordinal                        the classic mild/moderate/severe; letter grades
Nominal   least                occupation; census tract; religious affiliation

Why this matters

You can always go "down" in levels of measurement. For example, if you have recorded for a sample of students the number of questions on an exam that they answered correctly (a ratio variable), you could categorize them into letter grades (ordinal). But you cannot go "up": if all you have recorded are letter grades, you cannot re-create the number of correct answers. Whenever possible, collect and record your data at the highest level of measurement that's feasible. And during analysis, it is generally best to avoid categorizing interval/ratio variables into ordinal or nominal by setting cut-points, especially if those cut-points are arbitrary (made up). You lose information that way. There may be exceptions when a cut-point has some substantive meaning, like the social/legal meaning our society gives to being less than or greater than 18 (or 21) years old.
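The one-way nature of going "down" can be illustrated with a small Python sketch. The scores and the grade cut-points below are hypothetical, invented purely for illustration. They assume a 100-question exam with the familiar 90/80/70/60 boundaries.

```python
def letter_grade(num_correct: int) -> str:
    """Map a count of correct answers (out of 100) to a letter grade.

    The cut-points are hypothetical, chosen only for this example.
    """
    if num_correct >= 90:
        return "A"
    if num_correct >= 80:
        return "B"
    if num_correct >= 70:
        return "C"
    if num_correct >= 60:
        return "D"
    return "F"

# Hypothetical numbers of correct answers for five students.
scores = [91, 99, 80, 89, 72]
grades = [letter_grade(s) for s in scores]

print(grades)

# Going "down" is easy, but different scores collapse into the same grade,
# so the original scores cannot be recovered from the grades alone.
print(letter_grade(91) == letter_grade(99))
```

Scores of 91 and 99 both become an A, and 80 and 89 both become a B. Once only the letters are stored, those distinctions are gone.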



PO Box 181

Johnson City, NY 13790

cwr@agencystatistical.com