Data is one of the most powerful forces in the world today. When used correctly, it has the potential to bring about revolutionary changes. Hence, there is a pressing need to understand what data is and what types of data exist.
In the simplest terms possible, data is any kind of information. Yes, that’s it. Data does not have to be useful to us. You could just write down the times at which your pressure cooker’s whistle goes off, and that would be considered data. It’s that simple!
Hierarchical view of Types of Data
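Now comes the fun part. Take a look at the hierarchy below.
- Data
  - Qualitative (categorical) data
    - Nominal
    - Ordinal
  - Quantitative (numeric) data
    - Discrete
    - Continuous
      - Interval
      - Ratio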
Seems a little complex? Guess what? It’s not! Let me show you how. Every possible kind of data can be classified under one of the nodes in this hierarchy. Let us now go through each of these nodes:
Quantitative Data vs Qualitative Data
An easy trick I use to remember the difference between these two is to keep only the first two syllables of each word. That leaves us with “Quali” and “Quanti”. “Quali” suggests quality: qualitative data deals with types and categories. Categorical data answering questions like “how?” and “why?” falls under this category. “Quanti” suggests quantity: quantitative data deals with numbers. Numeric data answering questions like “how many?” and “how much?” falls under this category.
Types of Qualitative Data
Nominal data is categorical data that cannot be compared and has no order. For example, consider gender. Gender is categorical data, and Female, Male, and Transgender are its categories. These categories cannot be compared to each other: no category is greater than another. Another good example is the jersey numbers on the backs of cricketers. They only serve to distinguish between players and in no way reflect their skill in the game.
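The “no order” property can be seen in a minimal Python sketch, assuming the gender categories above are modelled as a standard-library `Enum` (the class name `Gender` and the helper `can_order` are hypothetical, chosen only for illustration). Enum members can be checked for equality, but Python refuses to order them:

```python
from enum import Enum


class Gender(Enum):
    """A nominal variable: distinct labels with no built-in order."""
    FEMALE = "Female"
    MALE = "Male"
    TRANSGENDER = "Transgender"


def can_order(a, b) -> bool:
    """Return True if Python allows `a < b` for these two values."""
    try:
        _ = a < b
        return True
    except TypeError:
        return False


# Equality is meaningful for nominal data...
print(Gender.FEMALE == Gender.FEMALE)         # True
print(Gender.FEMALE == Gender.MALE)           # False
# ...but ordering is not: plain Enum members do not support "<".
print(can_order(Gender.FEMALE, Gender.MALE))  # False
```

Jersey numbers are a subtler case: if they are stored as integers, Python will happily compare them, even though the result means nothing. The data type alone does not make a variable nominal; how the values are interpreted does.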
Ordinal data is categorical data that can be compared and ordered in a meaningful way. For example, consider grades. Grades are categorical data, and A, B, C, D are its categories. Assuming the conventional notation, these grades can be ordered meaningfully: A > B > C > D.
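A minimal Python sketch of ordinal data, assuming the conventional A to D grade scale above. The ordering has to be supplied explicitly (here as a hypothetical `RANK` mapping), because it comes from the meaning of the grades, not from the labels themselves:

```python
# Grades from lowest to highest under the conventional notation.
GRADE_ORDER = ["D", "C", "B", "A"]
# Map each grade to its position so grades can be compared and sorted.
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}


def is_better(a: str, b: str) -> bool:
    """Return True if grade `a` ranks above grade `b`."""
    return RANK[a] > RANK[b]


grades = ["B", "D", "A", "C", "A"]
print(sorted(grades, key=RANK.__getitem__, reverse=True))  # ['A', 'A', 'B', 'C', 'D']
print(is_better("A", "B"))  # True
print(is_better("D", "C"))  # False
```

Comparing the letters directly would not work: in Python `"A" < "B"` is `True`, the opposite of the grade order. Note also that the ranks only encode order; subtracting them would assume equal gaps between grades, which ordinal data does not guarantee.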
Types of Quantitative Data
Any numeric variable whose values are discrete (distinct and separate) and can be counted is called Discrete data. For example, a variable that can take only the integer values from 1 to 10 is discrete: its possible values can be listed and counted, and there are exactly 10 of them.
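The counting in this example can be checked with a short Python sketch, assuming the variable can take only the integers 1 to 10 (the name `possible_values` is just for illustration):

```python
# All values a variable restricted to the integers 1..10 can take.
# range() excludes its end point, so the end must be 11 to include 10.
possible_values = list(range(1, 11))

print(possible_values)         # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(possible_values))    # 10: a finite set that can be counted
print(5 in possible_values)    # True
print(5.5 in possible_values)  # False: values between the integers are not allowed
```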
Any numeric variable that is continuous and can be measured, rather than counted, is called Continuous data. For example, a variable that can take any real value between 1 and 10 is continuous: there are infinitely many possible values (1.23, 5.989, and so on), so they cannot be listed and counted, only measured.
Types of Continuous Data
Just like ordinal data, interval data is ordered. However, unlike ordinal data, interval data requires equal intervals between adjacent values. For example, consider temperature in degrees Celsius. The difference between 5 degrees and 10 degrees is the same as the difference between 105 degrees and 110 degrees.
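A small worked example in Python, assuming temperatures in degrees Celsius and the standard conversion F = C × 9/5 + 32 (the helper name `celsius_to_fahrenheit` is hypothetical). It shows that differences carry over between temperature units, while ratios do not, because neither scale’s zero is a true zero:

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Standard conversion from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Equal intervals: a 5-degree step means the same anywhere on the scale.
print(10 - 5 == 110 - 105)  # True

# Differences survive a change of units: both 5 °C gaps become 9 °F gaps.
print(celsius_to_fahrenheit(10) - celsius_to_fahrenheit(5))     # 9.0
print(celsius_to_fahrenheit(110) - celsius_to_fahrenheit(105))  # 9.0

# Ratios do not: "20 °C is twice 10 °C" turns into 68 °F versus 50 °F.
print(20 / 10)                                                # 2.0
print(celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10))  # 1.36
```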
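Ratio data has all the features of interval data, plus something called a true zero point. For example, consider weight. The difference between 5 kg and 10 kg is the same as the difference between 105 kg and 110 kg. Moreover, 0 kg means an absence of weight. In contrast, 0 degrees Celsius does not mean there is no heat. Because of the true zero point, ratios are meaningful for ratio data (10 kg is twice as heavy as 5 kg), but not for Celsius temperatures (20 degrees is not twice as hot as 10 degrees).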
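For comparison, here is the same kind of check for weight: a minimal Python sketch assuming a kilogram-to-pound factor of about 2.20462 (the constant `KG_TO_LB` and the helper `kg_to_lb` are hypothetical names). Because 0 kg and 0 lb both mean “no weight”, zero maps to zero and ratios survive the change of units:

```python
KG_TO_LB = 2.20462  # approximate number of pounds in one kilogram


def kg_to_lb(kg: float) -> float:
    """Convert a weight in kilograms to pounds."""
    return kg * KG_TO_LB


# A true zero: no weight in kilograms is no weight in pounds.
print(kg_to_lb(0))  # 0.0

# Ratios are preserved: 10 kg is twice 5 kg, and the same holds in pounds.
print(10 / 5)                      # 2.0
print(kg_to_lb(10) / kg_to_lb(5))  # 2.0
```

Contrast this with Celsius, where 0 °C corresponds to 32 °F: the zero point moves when the unit changes, which is a sign that it is not a true zero.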
Let’s see how well I trained you! Categorize each of the following variables as Nominal, Ordinal, Interval, or Ratio.
(Answers are given at the end of this article)
- Marital status
- The positions in which players finish a 100-meter sprint
- IQ test scores
- The age of a person
- The year
Statistical Tests and Measures
For those of you who want to learn more, this section is for you. Below are brief definitions of the statistical tests and measures associated with these types of variables, along with sources where you can learn more about them.
ANOVA, short for Analysis of Variance, is a significance testing method used to see how a nominal variable affects a quantitative variable, by comparing the means of the quantitative variable across the groups defined by the nominal one. A good place to dive a little deeper into it is here.
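To make the idea concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. It assumes a nominal variable that splits the observations into groups (the three groups below are made-up toy numbers, not real measurements) and compares the spread between group means with the spread within groups; the function name `one_way_anova_f` is hypothetical:

```python
from statistics import mean


def one_way_anova_f(*groups: list[float]) -> float:
    """F statistic of a one-way ANOVA: between-group / within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    k = len(groups)      # number of categories of the nominal variable
    n = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Toy data: one quantitative measurement for three categories of a nominal variable.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
group_c = [7, 8, 9]
print(one_way_anova_f(group_a, group_b, group_c))  # 27.0
```

A large F means the group means differ much more than the variation within groups would suggest. Turning F into a p-value needs the F distribution with k - 1 and n - k degrees of freedom; if SciPy is available, `scipy.stats.f_oneway` computes both the statistic and the p-value.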
The Chi-Square test is also a significance testing method. It uses the difference between the observed frequencies and the frequencies expected under the null hypothesis to draw conclusions when the variables at hand are qualitative. A good place to learn more about it is here.
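A minimal sketch of Pearson’s chi-square statistic for a contingency table, in plain Python. The expected count for each cell is assumed to be computed the usual way for a test of independence (row total × column total / grand total); the function name `chi_square_statistic` and the 2 × 2 table of counts are made up for illustration:

```python
def chi_square_statistic(observed: list[list[float]]) -> float:
    """Pearson's chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Toy counts: rows are categories of one qualitative variable, columns of another.
table = [[10, 20],
         [20, 10]]
print(round(chi_square_statistic(table), 3))  # 6.667
```

The statistic is compared with a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom, and the approximation is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per cell). SciPy’s `scipy.stats.chi2_contingency` performs the full test, but by default it applies Yates’ continuity correction to 2 × 2 tables, so its statistic for this table will differ from the uncorrected value above.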
Correlation is a statistical measure of how strongly two variables are related and in which direction. Keep in mind that it describes association, not whether one variable causes changes in the other. It can be used to understand the relationship between interval and ratio variables. For a more mathematical understanding of it, refer here.
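As an illustration, here is Pearson’s correlation coefficient, the most common correlation measure for interval and ratio data, written out in plain Python. This is one specific choice of correlation measure, and the function name `pearson_r` and the sample values are hypothetical:

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations (range -1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # 1.0: perfect positive linear relationship
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # -1.0: perfect negative linear relationship
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient. A value near 0 means no linear relationship, and even a value near ±1 says nothing about cause and effect.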
Answers
- Marital status can be Married, Single, and so on, and these categories can’t be ordered. So it is a Nominal variable.
- This is a tricky one. The positions would be something like 1st, 2nd, 3rd. We see numbers, but does that always mean the data is quantitative? Let’s look at it this way: do these numbers indicate categories or numeric values? Notice the word “positions”. It implies both ordering and categorization. This is a classic example of an Ordinal variable.
- IQ test results might range from 0 to 150 or 160; the exact range doesn’t matter here. Some example scores are 95, 102.5, and 140. We cannot treat each test score as a category. The scores could be grouped into ranges that act as categories, but since that isn’t explicitly mentioned, we consider this to be quantitative data. Since there are infinitely many possible test scores, it is Continuous data. That leaves us with two options: ratio or interval. So we look at the one distinction between them: the true zero point. Does an IQ of 0 mean an absence of Intelligence Quotient (IQ)? No, it does not! The person still has some intelligence; the test simply rated them as 0. So IQ falls under Interval data.
- Age is a number with equal intervals, so it is continuous data. Once again, it comes down to the true zero point. When we say a person’s age is 0, does that mean the absence of age? Yes, I know it’s a weirdly phrased question, but think about it and compare it with the IQ example above. An IQ of 0 does not mean no IQ, but an age of 0 really does mean zero age: no time at all has passed since the person was born. So there is a true zero point, and hence age is a Ratio variable.
- A year is again a number with equal intervals, so it is continuous data, and yet again it all comes down to the true zero point. Does year 0 mean the absence of years, i.e., the beginning of time? No! The starting point of the BC (Before Christ) / AD (Anno Domini) calendar is a convention, not an absence of time, and the BC/AD system does not even have a year 0 (1 BC is followed directly by AD 1). So there is no true zero point, and that makes the year an Interval variable.
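The reasoning in these answers boils down to three yes/no questions that walk down the hierarchy: are the values numeric quantities rather than categories, are they ordered, and is there a true zero point? Here is a minimal Python sketch of that decision procedure (the function `level_of_measurement` and its parameter names are hypothetical), applied to the five quiz variables with the flags taken from the answers:

```python
def level_of_measurement(is_quantity: bool, is_ordered: bool, has_true_zero: bool = False) -> str:
    """Classify a variable as Nominal, Ordinal, Interval, or Ratio."""
    if not is_quantity:
        # Qualitative data: the only remaining question is whether order matters.
        return "Ordinal" if is_ordered else "Nominal"
    # Quantitative (here, continuous) data: decided by the true zero point.
    return "Ratio" if has_true_zero else "Interval"


# (is_quantity, is_ordered, has_true_zero) for each quiz variable, per the answers.
quiz = {
    "Marital status": (False, False, False),
    "Sprint finishing position": (False, True, False),
    "IQ score": (True, True, False),
    "Age": (True, True, True),
    "Year": (True, True, False),
}

for variable, flags in quiz.items():
    print(f"{variable}: {level_of_measurement(*flags)}")
# Marital status: Nominal
# Sprint finishing position: Ordinal
# IQ score: Interval
# Age: Ratio
# Year: Interval
```

Discrete versus continuous is left out of the function because, in this article’s hierarchy, interval and ratio data sit under continuous data.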
Full disclosure: I work for https://www.ml-concepts.com/ and this article was first published there