How to Write Null Hypothesis and Alternative Hypothesis
Null and Alternate Hypothesis
… in simple plain English! Let's get the basics clear.
Statistics sounds intimidating because most of the text we read defines terms using difficult statistical words. 😩 When we start reading a concept with its definition we hardly visualize it in our mind and without a visual picture, we always remain confused. 🙇🏻 This often leads us to google the same concept multiple times. 😰
In this article, I will try to explain the concept of null and alternate hypothesis in a simple way. So, let's get started. 🏄
Suppose we have two drugs, Drug A and Drug B, to treat flu. 💊 We give Drug A to 6 people with flu and Drug B to 6 different people with flu, and measure the time (in days) each person takes to recover. 🤕 Since every person is different, not everyone will recover in the same number of days. Because of this, we look at the average recovery time rather than individual times.
Here are our statistics from the test:
It is quite easy to find the mean (average) value for each drug. After doing our calculation, we found that people who took Drug A took an average of 3 days to recover whereas people who took Drug B took an average of 5.67 days to recover. 💪🏼
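As a small sketch of that calculation: the original table of recovery times is not reproduced here, so the numbers below are hypothetical values picked only so that the two averages come out to the 3 and 5.67 days quoted above. The variable names are illustrative too.

```python
from statistics import mean

# HYPOTHETICAL recovery times in days (not the article's original table).
# They are chosen only so that the group means match 3 and 5.67.
drug_a_days = [2, 3, 4, 3, 2, 4]
drug_b_days = [5, 6, 7, 5, 6, 5]

mean_a = mean(drug_a_days)
mean_b = mean(drug_b_days)
difference = mean_b - mean_a  # how much longer recovery took with Drug B

print(f"Drug A mean: {mean_a:.2f} days")
print(f"Drug B mean: {mean_b:.2f} days")
print(f"Difference:  {difference:.2f} days")
```

Any other six values per group with the same totals (18 and 34 days) would give the same averages.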
We can clearly see a difference of 2.67 days in the recovery time, and that is huge. Although each person has a different immunity and lifestyle (eating, exercising, work, etc.), which could also produce such results, we are tempted to say that Drug A is better than Drug B because it helped people recover faster. As a result, we might start taking this drug the next time we get flu. 🤒
This data leads us to form a hypothesis that the recovery time with Drug A is 2.67 days less than the recovery time with Drug B.
Note that this is just a toy example and in reality, we test a lot more people before forming any hypothesis.
Now suppose we test these two drugs on a different set of people and find that the recovery time with Drug B is 1.5 days less than the recovery time with Drug A. Well, this is entirely different from our earlier hypothesis. So, we go ahead and try these drugs on another set of people. This time we find that the recovery time with Drug A is 4 days less than the recovery time with Drug B. If we continue this exercise, we might get similar results, or we might quite possibly get entirely different ones.
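To get a feel for how much the difference can move from one small group to the next, here is a rough simulation. It rests on assumptions that are not in the example above: both drugs are given the same made-up recovery-time distribution (roughly normal, mean 4 days, standard deviation 1.5 days, never below 1 day), and `sample_recovery_days` is a hypothetical helper. Any difference it prints therefore comes from sampling alone.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable


def sample_recovery_days(n):
    """Draw n made-up recovery times (in days).

    ASSUMPTION: recovery time is roughly normal with mean 4 and
    standard deviation 1.5, clipped so nobody recovers in under 1 day.
    """
    return [max(1.0, random.gauss(4.0, 1.5)) for _ in range(n)]


differences = []
for trial in range(5):
    # Both groups come from the SAME distribution: no real drug effect.
    drug_a = sample_recovery_days(6)
    drug_b = sample_recovery_days(6)
    diff = mean(drug_b) - mean(drug_a)
    differences.append(diff)
    print(f"Trial {trial + 1}: B - A = {diff:+.2f} days")
```

With only 6 people per group, the printed differences change from trial to trial even though the two simulated drugs are identical by construction.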
At this point, statisticians take an important step, and instead of testing the formed hypothesis again and again on a different set of people, they actually design a new hypothesis.
In this case, the new hypothesis will be:
There is no difference in recovery time between Drug A and Drug B.
They call such a hypothesis a null hypothesis, and the test then determines whether there is evidence of a difference.
The null hypothesis is tested against the hypothesis we formed from the data, which in this case is that the recovery time with Drug A is 2.67 days less than the recovery time with Drug B. This hypothesis is called the alternate hypothesis.
To do our hypothesis testing, we sample a lot of people. Now suppose we find that there is actually a difference in the recovery time between the two drugs. (There can be many hidden reasons behind this result, but for now we will not consider them.) In such a case, we reject our null hypothesis. 🙅
Let us see what we would do if the results were the opposite. Suppose we sample a lot of people and find no difference in the recovery time between the two drugs. In this case, we would fail to reject the null hypothesis. 🙆
A good question to ask at this point is why we do not accept the null hypothesis based on our observation. 🙋
Null hypotheses are never accepted. We either reject them or fail to reject them. The reason is simple. There is always randomness in the sample we take. We never test on the entire population and always take a small sample from the population to form our hypothesis.
So, if our randomly selected sample shows results different from the null hypothesis, we are all set and things get easy for us. We simply reject the null hypothesis.
However, if we get results similar to our null hypothesis, we fail to reject the null hypothesis, which indicates that there is not enough evidence to suggest that the null hypothesis is false. We do this because there is still a lot of data about the population that we do not have.
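One way to make "reject" versus "fail to reject" concrete is a permutation test, sketched below. This is just one possible test, not necessarily the one a statistician would choose here. The data are hypothetical (picked to match the 3 and 5.67 day averages from the example), `permutation_test` is an illustrative helper, and the 0.05 cutoff is an assumed convention. The idea: if the null hypothesis were true, the drug labels would not matter, so we try every way of splitting the 12 people into two groups of 6 and count how often the difference in means is at least as large as the one we observed. That fraction plays the role of a p-value.

```python
from itertools import combinations
from statistics import mean

# HYPOTHETICAL recovery times in days, chosen to match the quoted means.
drug_a_days = [2, 3, 4, 3, 2, 4]
drug_b_days = [5, 6, 7, 5, 6, 5]


def permutation_test(group_a, group_b):
    """Fraction of relabelings whose absolute mean difference is at
    least as large as the observed one (an exact two-sided test)."""
    pooled = group_a + group_b
    indices = range(len(pooled))
    observed = abs(mean(group_b) - mean(group_a))

    total = 0
    at_least_as_extreme = 0
    for a_idx in combinations(indices, len(group_a)):
        chosen = set(a_idx)
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in indices if i not in chosen]
        diff = abs(mean(new_b) - mean(new_a))
        total += 1
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


ALPHA = 0.05  # ASSUMED cutoff; a common convention, not a rule

fraction = permutation_test(drug_a_days, drug_b_days)
decision = "reject H0" if fraction < ALPHA else "fail to reject H0"
print(f"Fraction at least as extreme: {fraction:.4f} -> {decision}")
```

Note that the only two outcomes are "reject H0" and "fail to reject H0"; there is no branch that accepts the null hypothesis.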
The null and alternate hypotheses are opposites of each other. The main purpose of hypothesis testing is to reject or fail to reject the null hypothesis. Statisticians never accept it, even if the sample they took favors the null hypothesis and opposes the alternate hypothesis.
For any statistical test, we need data, a null hypothesis, and an alternate hypothesis.
We generally denote null hypothesis by H₀ and alternate hypothesis by Hₐ.
How to state the null hypothesis? ✍️
We generally convert a word problem into a statistical hypothesis test by defining the null and alternate hypotheses.
Let us use our example above to write both hypotheses.
The alternate hypothesis is extracted from the word problem and then written in mathematical form.
Alternate hypothesis:
The recovery time with Drug A is 2.67 days less than the recovery time with Drug B.
Let us denote this difference in the average recovery time using μ.
Hₐ: μ = 2.67
Null hypothesis:
Now, in order to write the null hypothesis, we state what happens if the alternate hypothesis does not hold. Based on our alternate hypothesis, we have two possibilities:
μ > 2.67 or μ < 2.67
Either of these possibilities could form a null hypothesis. However, since statisticians do not know the truth in advance, they generally frame it by saying that there is no difference in the recovery time between Drug A and Drug B. In mathematical form:
H₀: μ = 0
What is 'null' in the null hypothesis? 🤔
As far as I understand, the null hypothesis doesn't mean that the hypothesis is null; rather, it is called null because statisticians work to nullify it. It is quite possible that there is some other reason behind the name.
The concept of the null and alternate hypothesis is not only applicable in the field of statistics but in other fields as well. Researchers and scientists form these hypotheses before conducting any statistical test. 🔭 🔬 💊 💉 🌡
I hope this gave you an idea about the null and alternate hypotheses. There are related concepts, the confidence interval and the p-value, but I didn't dive deep into them in this article. My main aim was to help you understand the concept of null and alternate hypotheses in easy words and not in statistical terms. 😊
References:
https://www.youtube.com/playlist?list=PLblh5JKOoLUK0FLuzwntyYI10UQFUhsY9
Source: https://towardsdatascience.com/null-and-alternate-hypothesis-c5b6ae9d7845