As humans, we have questions about almost everything, whether it is buying a house, buying a car or simply eating ice cream. We wonder whether chocolate ice cream is better than vanilla, or whether a sports car is better than a simple pickup van. Similarly, businesses and researchers have questions such as whether a new methodology is better than the existing one, whether a new product will generate more revenue than the existing one, and so on…
To answer such questions, we need to translate them into a hypothesis. Then we collect data through experiments, surveys or other processes. Finally, after performing the appropriate statistical tests, we can answer the questions we posed.
Hypothesis testing: It is a statistical method for drawing conclusions about a population from sample data, where the conclusions are expressed in terms of probability. For instance, if we make the statement “Dhoni is the best Indian captain ever,” we are making an assumption based on the wins and losses the team had under his captaincy. We can test this statement using all the match data.
Now let us go step by step…
Example: Let us consider an example that makes things easier to understand. A company, ABC, wants to know whether the new design of its welcome web page results in more website subscriptions than the old design.
So, let us consider the following notation.
N_new = Average number of users subscribing to the website after seeing the new design of the welcome web page
N_old = Average number of users subscribing to the website after seeing the old design of the welcome web page
Hypothesis tests use sample data to draw conclusions about population parameters such as the mean and standard deviation.
Step 1: Translate the Question into the Hypothesis
A hypothesis is a claim made as a basis for research…
The question to be answered is translated into two competing, non-overlapping hypotheses:
a) H₀ : Null Hypothesis
- This is the claim we assume to be true before we collect any data. The null hypothesis states that there is “no relationship”, “no association” or “no difference” between two variables.
- For example, chocolate ice cream is as tasty as vanilla, or a new methodology performs no better than the existing one.
- Hence, the null hypothesis always includes equality, using one of the operators =, ≤ or ≥.
In our example, H₀: N_new ≤ N_old
b) H₁ : Alternative Hypothesis
- This is the claim we would like to find evidence for. The alternative hypothesis is the opposite of the null hypothesis: it states that there is a real difference or relationship between two variables.
- For example, chocolate ice cream tastes better than vanilla, or a new methodology generates better results than the existing one.
- Hence, the alternative hypothesis never includes equality; it uses one of the operators ≠, < or >.
In our example, H₁: N_new > N_old
- If the alternative hypothesis allows for a difference in both directions (less than or greater than) from the parameter value specified in the null hypothesis, the test is called a two-tailed test.
- If the alternative hypothesis allows for a difference in only one direction (either less than or greater than) from the parameter value specified in the null hypothesis, the test is called a one-tailed test.
- >: a right-tailed (upper-tailed) test;
- <: a left-tailed (lower-tailed) test;
- ≠: a two-tailed test.
Since our H₁ is N_new > N_old, our example is a right-tailed test.
Step 2: Determine the Significance Level
Significance level: The significance level is the probability of rejecting the null hypothesis when it is actually true (a Type I error). It is denoted by alpha (α) and is chosen before conducting the experiment. Common choices are α = 0.05 (5%) and α = 0.01 (1%); the right level depends on the business problem and on how costly a false rejection would be.
Confidence level: It equals 1 − α and shows how confident you are about your conclusion. In our example, with α = 0.05, there is at most a 5% probability of concluding that N_new > N_old when in reality N_new ≤ N_old.
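To see how α turns into a concrete decision threshold, here is a minimal sketch that assumes the test statistic is a z statistic, i.e. standard normal when H₀ holds (an assumption; other tests use other distributions). It uses Python's standard-library `statistics.NormalDist`; the function name `critical_values` and the tail labels are hypothetical, chosen for illustration.

```python
from statistics import NormalDist

# Assumption: under H0 the test statistic is a z statistic (standard normal).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_values(alpha: float, tail: str) -> tuple:
    """Return the rejection threshold(s) of a z-test at significance level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if tail == "right":
        # Reject H0 when z >= this value; all of alpha sits in the upper tail
        return (STANDARD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "left":
        # Reject H0 when z <= this value; all of alpha sits in the lower tail
        return (STANDARD_NORMAL.inv_cdf(alpha),)
    if tail == "two":
        # Split alpha equally between both tails; reject H0 when |z| >= upper
        upper = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left' or 'two'")


for alpha in (0.05, 0.01):
    right = critical_values(alpha, "right")[0]
    _, high = critical_values(alpha, "two")
    print(f"alpha={alpha}: right-tailed z >= {right:.3f}, two-tailed |z| >= {high:.3f}")
```

At α = 0.05 this gives roughly 1.645 for a right-tailed test and ±1.96 for a two-tailed test; the two-tailed threshold is further out because α is split between both tails.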
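The “5% probability” can also be checked empirically. The sketch below is a Monte Carlo simulation under hypothetical assumptions: daily subscription counts for both designs are drawn from the same normal distribution (mean 100, standard deviation 15, 200 days per design), so H₀ is true by construction. It runs a large-sample right-tailed z-test many times and reports how often H₀ is wrongly rejected; the function names are illustrative.

```python
import random
from statistics import NormalDist


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def false_rejection_rate(alpha=0.05, days=200, trials=2000, seed=42):
    """Fraction of simulated experiments in which a true H0 is rejected.

    Hypothetical setup: old and new designs draw daily subscriptions from the
    SAME normal distribution (mean 100, sd 15), so N_new = N_old by construction.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed threshold for H1: N_new > N_old
    rejections = 0
    for _ in range(trials):
        old = [rng.gauss(100, 15) for _ in range(days)]
        new = [rng.gauss(100, 15) for _ in range(days)]
        m_old, v_old = mean_and_variance(old)
        m_new, v_new = mean_and_variance(new)
        # Large-sample z statistic for the difference between two means
        z = (m_new - m_old) / ((v_old / days + v_new / days) ** 0.5)
        if z >= z_crit:
            rejections += 1  # H0 is true here, so every rejection is a Type I error
    return rejections / trials


print(false_rejection_rate())
```

The returned rate should land close to α, though not exactly on it, because of random variation in the simulation.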
Step 3: Calculate the p-Value
The p-value is the probability of observing a result at least as extreme as the one in our sample, assuming the null hypothesis is true. It is calculated from the sample data and measures how compatible the data are with the null hypothesis. The significance level is the threshold against which the p-value is compared.
A low p-value means the sample data provide strong evidence against the null hypothesis.
A high p-value means the sample data are consistent with the null hypothesis. In other words, there is not enough evidence in the sample to reject the null hypothesis; this does not prove that the null hypothesis is true.
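One way to make this definition concrete is a permutation test, which we swap in here because it computes the p-value directly from its definition without assuming any distribution. The sketch assumes hypothetical daily subscription counts for the ABC example (made-up numbers, not real data) and a right-tailed H₁: N_new > N_old. If H₀ holds, the design labels are interchangeable, so we shuffle them and count how often a shuffled difference in means is at least as large as the observed one. The function name is illustrative.

```python
import random


def permutation_p_value(old, new, n_permutations=10_000, seed=0):
    """Right-tailed permutation-test p-value for H1: mean(new) > mean(old)."""
    rng = random.Random(seed)
    n_new = len(new)
    observed = sum(new) / n_new - sum(old) / len(old)
    pooled = list(old) + list(new)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the design labels carry no information, so reassign them at random
        rng.shuffle(pooled)
        perm_new, perm_old = pooled[:n_new], pooled[n_new:]
        diff = sum(perm_new) / n_new - sum(perm_old) / len(perm_old)
        if diff >= observed:
            at_least_as_extreme += 1
    # Count the observed labelling itself as one of the permutations
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical daily subscription counts for the ABC example (made-up numbers)
old_design = [52, 48, 50, 47, 53, 49, 51, 50]
new_design = [58, 55, 60, 54, 57, 59, 56, 58]
print(permutation_p_value(old_design, new_design))
```

The +1 in the numerator and denominator counts the observed arrangement itself, which keeps the estimated p-value from ever being exactly 0.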
Make a decision: To decide between the two hypotheses, compare the p-value with the significance level α.
- If p-value ≤ α, we reject the null hypothesis.
- If p-value > α, we fail to reject the null hypothesis.
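The decision rule is a single comparison. This minimal sketch (the function name `decide` is hypothetical) treats a p-value exactly equal to α as a rejection, which is a common convention.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when the p-value does not exceed alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be a probability between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.003))             # reject H0
print(decide(0.20))              # fail to reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0 at the stricter 1% level
```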
The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed test, upper-tailed test, or two-sided test).
The p-value for:
- a lower-tailed test is specified by: $p\text{-value} = P(TS \le ts \mid H_0 \text{ is true}) = \mathrm{cdf}(ts)$
- an upper-tailed test is specified by: $p\text{-value} = P(TS \ge ts \mid H_0 \text{ is true}) = 1 - \mathrm{cdf}(ts)$
- assuming that the distribution of the test statistic under H₀ is symmetric about 0, a two-sided test is specified by: $p\text{-value} = 2 \cdot P(TS \ge |ts| \mid H_0 \text{ is true}) = 2\,(1 - \mathrm{cdf}(|ts|))$
P – probability of an event
ts – the observed value of the test statistic, calculated from your sample
cdf() – cumulative distribution function of the test statistic (TS) under the null hypothesis
TS – the test statistic, treated as a random variable under the null hypothesis
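The three p-value formulas can be written as one small function. This sketch assumes a z-test, so the distribution of TS under H₀ is the standard normal, supplied via Python's standard-library `statistics.NormalDist`; for a t-test or another test you would substitute that distribution's cdf. The function name, tail labels and the value ts = 1.8 are illustrative.

```python
from statistics import NormalDist


def p_value(ts: float, tail: str, dist: NormalDist = NormalDist()) -> float:
    """p-value for an observed test statistic ts.

    dist is the distribution of TS under H0; the default standard normal
    corresponds to a z-test (an assumption for this sketch).
    """
    cdf = dist.cdf
    if tail == "lower":
        return cdf(ts)  # P(TS <= ts | H0 is true)
    if tail == "upper":
        return 1 - cdf(ts)  # P(TS >= ts | H0 is true)
    if tail == "two-sided":
        # Relies on dist being symmetric about 0, as the formula assumes
        return 2 * (1 - cdf(abs(ts)))  # 2 * P(TS >= |ts| | H0 is true)
    raise ValueError("tail must be 'lower', 'upper' or 'two-sided'")


ts = 1.8  # hypothetical observed test statistic
for tail in ("lower", "upper", "two-sided"):
    print(f"{tail}: {p_value(ts, tail):.4f}")
```

With ts = 1.8, the upper-tailed p-value is about 0.036, below α = 0.05, while the two-sided p-value is about 0.072, above it: the same data can lead to different decisions depending on whether H₁ is one-sided or two-sided.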