This is another statistical method that’s commonly used for testing relationships between categorical variables. It is therefore suited to categorical features and a categorical target (binary in the example here), and the feature values should be non-negative, typically Booleans, frequencies, or counts.
It compares the observed distribution of each feature's categories across the target classes with the distribution we would expect if the feature and the target were unrelated.
How do you calculate the chi-square statistic?
Let’s learn the use of chi-square with an intuitive example using the Titanic dataset.
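1 — Count the male and female passengers in the Survived and Not Survived categories.
The expected frequency is based on the combined total of male and female passengers in each category.
2 — Convert the counts to relative frequencies by dividing each observation count by the total of its column.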
3 — The expected frequency is shown in green, and we can clearly see that the observed female and male frequencies do not match it.
This suggests that the hypothesis that males and females had equal survival rates does not hold.
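4 — For each cell, compute (observed − expected)² / expected and sum the results, e.g. (0.19 − 0.38)² / 0.38 + (0.81 − 0.62)² / 0.62 + … across all n cells. Because these are relative frequencies rather than counts, each group's terms need to be multiplied by the number of passengers in that group to give the usual chi-square statistic.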
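5 — Once you have this statistic, compare it with the chi-square distribution for the appropriate degrees of freedom, (rows − 1) × (columns − 1), to get a p-value. A small p-value means the observed frequencies are unlikely if the feature and the target were independent.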
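The five steps above can be sketched in Python as follows. This is a minimal sketch, not the original computation behind the example. The counts are an assumption: they are the commonly cited sex-by-survival counts from the Kaggle Titanic training set, chosen because they reproduce the 0.19/0.81 and 0.38/0.62 frequencies quoted in step 4. All variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# ASSUMPTION: counts from the Kaggle Titanic training split, not from the text.
# Rows: Survived, Not Survived. Columns: Male, Female.
observed = np.array([[109, 233],
                     [468,  81]])

# Step 1: totals per sex (columns), per outcome (rows), and overall.
col_totals = observed.sum(axis=0)
row_totals = observed.sum(axis=1)
n = observed.sum()

# Step 2: relative frequencies within each column (observations / column total).
observed_freq = observed / col_totals

# Step 3: expected frequency = outcome share with both sexes combined.
expected_freq = row_totals / n

# Step 4, on counts: sum of (observed - expected)^2 / expected over all cells.
expected = np.outer(row_totals, col_totals) / n
stat = ((observed - expected) ** 2 / expected).sum()

# Step 4, on relative frequencies: same sum, scaled by each group's size.
freq_terms = (observed_freq - expected_freq[:, None]) ** 2 / expected_freq[:, None]
stat_from_freq = (col_totals * freq_terms).sum()

# Step 5: compare with the chi-square distribution.
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)

# Cross-check against SciPy (continuity correction off to match the manual sum).
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed, correction=False)

print("observed frequencies:\n", observed_freq.round(2))
print("expected frequencies:", expected_freq.round(2))
print("chi-square:", stat, "dof:", dof, "p-value:", p_value)
```

The manual statistic should agree with `chi2_contingency` only when `correction=False` is passed, because SciPy applies a continuity correction by default for 2×2 tables.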
Best used for – categorical features expressed as non-negative values: Booleans, frequencies, or counts.
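As a rough illustration of how this test might be applied to several features at once, the sketch below assumes scikit-learn is available and uses its `chi2` scorer with `SelectKBest`. The data is a hypothetical toy example made up for illustration (it is not the Titanic dataset), and `k=1` is an arbitrary choice.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# HYPOTHETICAL toy data: 8 samples, 3 non-negative features.
# Column 0 is a Boolean that mirrors the target, column 1 is a Boolean
# unrelated to it, and column 2 is a small count.
X = np.array([
    [1, 0, 3],
    [1, 1, 1],
    [1, 0, 2],
    [1, 1, 3],
    [0, 0, 2],
    [0, 1, 1],
    [0, 0, 3],
    [0, 1, 2],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Chi-square score and p-value for each feature against the target.
scores, p_values = chi2(X, y)

# Keep the single highest-scoring feature (k=1 is an arbitrary choice).
selector = SelectKBest(score_func=chi2, k=1)
X_selected = selector.fit_transform(X, y)
selected = selector.get_support(indices=True)

print("scores:", scores)
print("selected feature indices:", selected)
```

scikit-learn's `chi2` raises an error on negative inputs, which is the practical reason for the non-negativity requirement.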