Types of Missing Data
Below are the three types of missing data generally found in machine learning problems:
1. MCAR (Missing completely at random)
The probability that a value is missing does not depend on any feature, whether observed or unobserved. In this case, there is no pattern as to why a column’s data is missing. For example, survey data is missing because someone could not make it to an appointment, or an administrator misplaces the test results they were supposed to enter into the computer. The reason for the missing values is unrelated to the data in the dataset.
2. MAR (Missing at random)
The probability that a value is missing depends on other, observed features. In this scenario, the reason the data is missing in a column can be explained by the data in other columns. For example, suppose a school assigns a grade only to students who score above a cutoff. A missing grade can then be explained by the score column: the student scored below the cutoff. Once the other columns are taken into account, the missingness no longer depends on the missing value itself.
3. MNAR (Missing not at random)
The probability that a value is missing is related to the missing value itself. For example, higher-income people may be less likely to disclose their incomes. Here, there is a correlation between whether the income is missing and the actual income. This missingness cannot be fully explained by the other variables in the dataset.
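The three mechanisms can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: the synthetic age and income columns, the missingness probabilities (0.3, 0.6, 0.1), the age threshold of 45, and the helper names simulate and observed_mean are all hypothetical and not taken from any real dataset or library. It uses only the Python standard library.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Build synthetic (age, income) data and mask income three ways."""
    rng = random.Random(seed)
    ages = [rng.uniform(20, 65) for _ in range(n)]
    # Assumption: income rises with age, plus Gaussian noise.
    incomes = [20000 + 1000 * a + rng.gauss(0, 5000) for a in ages]
    median_income = statistics.median(incomes)

    def mask(prob_missing):
        # Copy of incomes with None wherever the value is "missing".
        return [
            None if rng.random() < prob_missing(a, y) else y
            for a, y in zip(ages, incomes)
        ]

    masked = {
        # MCAR: the same probability for every row, unrelated to any column.
        "MCAR": mask(lambda a, y: 0.3),
        # MAR: probability depends only on the observed age column.
        "MAR": mask(lambda a, y: 0.6 if a > 45 else 0.1),
        # MNAR: probability depends on the income value that goes missing.
        "MNAR": mask(lambda a, y: 0.6 if y > median_income else 0.1),
    }
    return ages, incomes, masked


def observed_mean(values):
    """Mean of the values that are not missing."""
    return statistics.mean(v for v in values if v is not None)


ages, incomes, masked = simulate()
print("true mean income:", round(statistics.mean(incomes)))
for name, column in masked.items():
    print(name, "observed mean income:", round(observed_mean(column)))
```

Under these assumptions, the MCAR column's observed mean should stay close to the true mean, while the MAR and MNAR columns should both underestimate it, because older or higher-income rows are dropped more often. The practical difference is that the MAR bias can in principle be corrected using the observed age column, whereas the MNAR bias depends on values that were never recorded.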