Statistical Testing Methods in Machine Learning

Statistical testing methods are used in machine learning to check whether a finding is statistically significant. Hypothesis testing, more commonly called statistical testing, evaluates two mutually exclusive statements about a population and determines which of the two is better supported by the sample (experimental) data. For example, it can determine whether two samples differ significantly or follow the same distribution.

Parameters in Hypothesis/Statistical Testing

Null Hypothesis (H0)

The null hypothesis is a default position, often formed from domain knowledge: a general statement that no relationship or difference exists between two measured phenomena.

Example: There is no difference in the productivity of employees based on gender.

Alternate Hypothesis (Ha)

The alternate hypothesis is the assumption made in contrast to the null hypothesis, and it directly contradicts it. For the example above, the alternate hypothesis is: there is a difference in the productivity of employees based on gender.

The test measures how likely the observed data would be if the null hypothesis were true. Based on the result, we decide whether to reject the null hypothesis in favor of the alternate hypothesis or to retain it.

Critical Value

The critical value is the point on the scale of the test statistic beyond which the null hypothesis is rejected. The further the test statistic lies beyond the critical value, the less likely it is that the data arose under the null hypothesis, for example that the two samples belong to the same distribution.
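As an illustration, the sketch below computes the critical value for a test statistic that follows a standard normal distribution under the null hypothesis (as in the Z-test) and applies the "beyond the critical value" rule. It uses Python's standard-library `statistics.NormalDist`; the helper names `z_critical_value` and `reject_null` are hypothetical, and other tests need their own reference distribution (t, F, chi-square).

```python
from statistics import NormalDist

def z_critical_value(alpha: float = 0.05, two_sided: bool = True) -> float:
    """Critical value of a standard normal test statistic at significance level alpha."""
    # A two-sided test splits alpha equally between the two tails
    tail = alpha / 2 if two_sided else alpha
    # inv_cdf returns the point below which (1 - tail) of the N(0, 1) mass lies
    return NormalDist().inv_cdf(1 - tail)

def reject_null(test_statistic: float, critical_value: float, two_sided: bool = True) -> bool:
    """Reject H0 when the statistic falls beyond the critical value."""
    stat = abs(test_statistic) if two_sided else test_statistic
    return stat > critical_value

crit = z_critical_value(0.05)  # about 1.96 for a two-sided test
print(round(crit, 2), reject_null(2.5, crit), reject_null(1.2, crit))
```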

Two quantities closely tied to critical values in hypothesis testing are the p-value and the degrees of freedom.


Probability value or p-value reveals how likely that an outcome occurred by chance alone. The smaller values of the p-value can be considered as a reason to reject the null hypothesis. If the p-value is greater than 0.05, we accept the null hypothesis, else consider the alternate hypothesis.
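A minimal sketch of the p-value decision rule, assuming the test statistic is a z-score that follows a standard normal distribution under the null hypothesis. The function names are hypothetical; only `statistics.NormalDist` from the Python standard library is used.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability, under H0, of a standard normal statistic at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the significance level alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

for z in (1.0, 2.5):
    p = two_sided_p_value(z)
    print(f"z={z}: p={p:.4f} -> {decide(p)}")
```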

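Degrees of Freedom

The degrees of freedom are the number of independent values that are free to vary when a statistic is estimated. For example, once the mean of n observations is fixed, only n - 1 of them can vary freely, so estimating the variance from a sample leaves n - 1 degrees of freedom.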
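The n - 1 idea can be checked numerically: once the sample mean is known, the last observation is fully determined by the others. The data below are hypothetical.

```python
from statistics import fmean

# Hypothetical sample of n = 4 observations
data = [4.0, 7.0, 9.0, 10.0]
n = len(data)
mean = fmean(data)

# The first n - 1 values could be anything, but the last one is forced:
# it must make the total equal n * mean.
free_values = data[:-1]
forced_last = n * mean - sum(free_values)

print(forced_last)  # 10.0, the last value is recovered, not free
print("degrees of freedom:", n - 1)
```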

Statistical Tests to Analyze a Hypothesis


In the Z test, the mean of two samples and standard deviation are used to infer whether the samples belong to the same distribution or not. Z test cannot be followed in cases where the sample size is less than 30. Also, the data has to be independent of each other and randomly get picked from a population so that there is equal probability for each data point to get selected. Z test prefers equal sample size in case possible.

Z test

Infographics of statistical testing method of Z test in machine learning ( Source:

In the method, the sample is assumed to be normally distributed ( as shown in Fig. 1 ) and the z score is calculated from the statistical parameters such as population mean and population standard deviation.

The z-score is calculated with the formula

$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}},$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, $n$ is the sample size, and $\sigma/\sqrt{n}$ is the standard error of the mean.

The z-score is used to check whether the sample belongs to the same population distribution. The null hypothesis and the alternate hypothesis are as follows:

Null Hypothesis: The sample mean and the population mean are the same.

Alternate Hypothesis: The sample mean and the population mean are not the same.

If the absolute value of the z-score is greater than the critical value (about 1.96 for a two-sided test at the 0.05 significance level), we reject the null hypothesis in favor of the alternate hypothesis.
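A minimal sketch of a two-sided one-sample Z-test, assuming the population standard deviation is known. The function name `one_sample_z_test` and all numbers in the example are hypothetical; only the Python standard library is used.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, population_mean, population_sd, n, alpha=0.05):
    """Two-sided one-sample Z-test with a known population standard deviation."""
    standard_error = population_sd / math.sqrt(n)          # sigma / sqrt(n)
    z = (sample_mean - population_mean) / standard_error    # z-score
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # two-sided p-value
    critical = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    return z, p_value, abs(z) > critical

# Hypothetical numbers: n = 36 observations with mean 52.5, tested against
# a population with mean 50 and known standard deviation 6.
z, p, reject = one_sample_z_test(52.5, 50.0, 6.0, 36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here the standard error is 6 / 6 = 1, so z = 2.5, which lies beyond the two-sided critical value.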


The T-test is another statistical test used for hypothesis testing in machine learning. Similar to Z-test, the T-test also derives information from the normal distribution of samples. Unlike, Z-test, here the test statistics like mean and standard deviation are unknown. The score $t$ for t-test statistics is calculated by,

$$t = \frac{(x_1-x_2)}{\frac{\sigma}{\sqrt{n_1}}+\frac{\sigma}{\sqrt{n_2}}}$$,

Where, $x_1$ and $x_2$ are mean of sample 1 and sample 2 respectively. $\sigma$ is the standard deviation and $n_1$ and $n2$ are the size of the corresponding samples. 

There are three kinds of T-test:

  1. The one-sample T-test, which compares the mean of a sample with a known mean
  2. The independent two-sample T-test, which compares the means of two independent samples from different groups
  3. The paired T-test, which compares the means of the same group or samples measured at different times
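The independent two-sample case can be sketched with the pooled-variance (equal-variance) form of the t statistic: the difference of the sample means divided by the pooled standard deviation times sqrt(1/n1 + 1/n2), with n1 + n2 - 2 degrees of freedom. The function name and data are hypothetical, and the critical value 2.306 is the standard two-sided t-table value for 8 degrees of freedom at the 0.05 level.

```python
import math
from statistics import mean, variance

def pooled_two_sample_t(sample1, sample2):
    """Independent two-sample t statistic assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    # variance() is the sample variance with an n - 1 denominator
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, n1 + n2 - 2  # statistic and degrees of freedom

group_a = [1, 2, 3, 4, 5]  # hypothetical data
group_b = [3, 4, 5, 6, 7]
t, df = pooled_two_sample_t(group_a, group_b)
# Two-sided critical value of Student's t for df = 8 at alpha = 0.05 (from a t table)
t_crit = 2.306
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```

With these samples |t| = 2 stays below 2.306, so the null hypothesis of equal means is retained. For unequal variances, Welch's version of the test would be the usual alternative.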


F-test is a statistical method used to compare two variances but its usage expands to a wide variety of tests including regression analysis such as Scheffler’s test and Chow test.

Methodology of F-test to compare two variances

The F-test uses the $F$ statistic to compare two variances $v_1$ and $v_2$:

$$F = \frac{v_1}{v_2}.$$

If the two samples have the same variance, $F$ equals 1.

The null hypothesis of the F-test is that the population variances are equal. To perform an F-test, the populations must follow a normal distribution and the samples must be independent.

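In general, an F-test follows these steps:

  1. Compute the F statistic. For two variances this is $v_1/v_2$; in ANOVA it is the variance between the sample means divided by the variance within the samples
  2. Find the critical value from the F-distribution for the chosen significance level and the degrees of freedom of the numerator and denominator
  3. Reject the null hypothesis if the F statistic exceeds the critical value; otherwise retain it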
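A minimal sketch of the first step for two samples, assuming the common convention of placing the larger sample variance in the numerator so that F is at least 1. The resulting F would then be compared with an F-table critical value for (df_num, df_den); for a two-sided test with this convention, the upper alpha/2 tail is used. The function name and data are hypothetical.

```python
from statistics import variance

def f_test_statistic(sample1, sample2):
    """F statistic for comparing two sample variances.

    The larger variance goes in the numerator so F >= 1; each sample
    contributes n - 1 degrees of freedom.
    """
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, (len(sample1) - 1, len(sample2) - 1)
    return v2 / v1, (len(sample2) - 1, len(sample1) - 1)

a = [1, 2, 3, 4, 5]   # hypothetical data, sample variance 2.5
b = [2, 4, 6, 8, 10]  # hypothetical data, sample variance 10
F, (df_num, df_den) = f_test_statistic(a, b)
print(F, df_num, df_den)
```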

Chi-square Test

The chi-square test is used to analyze the independence of two categorical variables. More generally, it is a statistical procedure for measuring the difference between observed and expected outcomes. It lets us infer whether that difference arose by chance or because of an association between the categorical variables under consideration.

The formula used for the chi-square test is

$$\chi_c^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$

where $c$ is the degrees of freedom, the number of values that are free to vary in the estimation (for a contingency table with $r$ rows and $k$ columns, $c = (r-1)(k-1)$), $O_i$ is the observed value and $E_i$ is the expected value.

The null hypothesis and alternate hypothesis are as follows,

Null Hypothesis: The two categorical variables are independent

Alternate Hypothesis: The two categorical variables are not independent

A contingency table of the observed counts for the two variables under study is created, and the expected count for each cell is computed from the row and column totals under the assumption of independence. A small chi-square value indicates that the observed counts fit the expected counts well, and a large chi-square value indicates a poor fit. The critical value is obtained from the chi-square distribution for the chosen significance level and degrees of freedom. If the chi-square statistic exceeds the critical value, we reject the null hypothesis in favor of the alternate hypothesis; otherwise we retain it.
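A minimal sketch of the chi-square test of independence on a hypothetical 2 x 2 table of counts. Expected counts come from row total times column total divided by the grand total. The critical value 3.841 is the standard table value for 1 degree of freedom at the 0.05 level. The p-value shortcut `erfc(sqrt(chi2 / 2))` is valid only for 1 degree of freedom, where the chi-square variable is the square of a standard normal; larger tables need a general chi-square distribution function.

```python
import math

def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table (rows: group A/B, columns: outcome yes/no)
observed = [[20, 30],
            [30, 20]]
chi2, df = chi_square_independence(observed)
critical = 3.841  # chi-square critical value for df = 1 at alpha = 0.05
# Valid for df = 1 only: chi-square(1) is the square of a standard normal
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
print(chi2, df, chi2 > critical, p_value)
```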

Analysis of Variance (ANOVA)

ANOVA is a statistical method used to compare the means of multiple samples (groups) in a single test. The hypotheses tested in ANOVA are:

Null Hypothesis: All group means are equal

Alternate Hypothesis: At least one group mean differs from the others

Two common forms are one-way ANOVA and MANOVA (multivariate analysis of variance). In one-way ANOVA, the means of three or more groups, defined by a single categorical independent variable, are compared on one dependent variable. MANOVA extends this to two or more dependent variables measured at the same time. The test statistic depends on the design: one-way ANOVA uses the F statistic (between-group variance divided by within-group variance), while MANOVA uses multivariate statistics such as Wilks' lambda.
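A minimal sketch of the one-way ANOVA F statistic: the between-group mean square (spread of the group means around the grand mean) divided by the within-group mean square (spread of observations around their own group mean), with k - 1 and N - k degrees of freedom for k groups and N observations. The function name and groups are hypothetical; the resulting F would be compared with an F-table critical value for those degrees of freedom.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group variance / within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```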

For all the hypothesis testing methods discussed in this article, the test to perform is chosen based on the number of samples or groups being compared, whether statistical parameters such as the mean and standard deviation are known, and the type of variable. In each method, the test statistic is compared with the critical value to decide whether to reject the null hypothesis in favor of the alternate hypothesis or to retain it.
