The **Shapiro-Wilk test** is used to test whether a random sample of data comes from a normal distribution. Normality is a common assumption in many statistical procedures, including regression, ANOVA, and the t-test.

The **Shapiro-Wilk test** was proposed in 1965 by Samuel Sanford Shapiro and Martin Wilk.

It is widely regarded as a reliable statistical test of normality.

In this article, we will discuss how to perform a **Shapiro-Wilk test in Python** with different examples.

## Scipy for Shapiro-Wilk test

We will use the `scipy` library in Python to perform the Shapiro-Wilk test.

If you don't have the `scipy` package installed, run the command below at a command prompt (for example, the Windows command prompt) to install it.

```
pip install scipy
```
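To confirm that the installation worked, a quick check along these lines can be run. It only imports the library and prints its version; the exact version string depends on your environment.

```python
# Confirm that scipy is importable and report which version is installed.
import scipy

print("scipy version:", scipy.__version__)
```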

## How to Perform Shapiro-Wilk test in Python?

The `scipy` library provides the `scipy.stats.shapiro()` function to perform the **Shapiro-Wilk test**.

**scipy.stats.shapiro(x)**

where:

**x**: an array-like object containing the sample data.

This function returns a test statistic and a corresponding p-value. We will determine the result by using the below decision rule.
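As a minimal sketch of what the return value looks like, the snippet below runs the test on a small made-up list of numbers (the values in `sample` are illustrative only and do not come from a real dataset). In recent SciPy versions the result can be unpacked like a tuple or read through its `statistic` and `pvalue` attributes.

```python
from scipy.stats import shapiro

# Illustrative values only; shapiro() needs at least 3 observations.
sample = [2.1, 2.5, 2.8, 3.0, 3.1, 3.3, 3.6, 3.9, 4.2, 4.8]

result = shapiro(sample)

# The result can be unpacked like a tuple...
stat, p = result

# ...or read by attribute name.
print("W statistic:", result.statistic)
print("p-value:", result.pvalue)
```

Both access styles give the same two numbers, so the choice is mostly a matter of readability.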

**Decision Rule**:

The null hypothesis of the test is that the sample comes from a normal distribution.

- If the p-value ≤ **α**, we reject the null hypothesis, i.e. we conclude that the data are not normally distributed (Gaussian).
- If the p-value > **α**, we fail to reject the null hypothesis, i.e. we do not have sufficient evidence that the data depart from a normal distribution.

where **α** is the significance level.
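The decision rule can be sketched as a small helper function. The name `interpret_p_value` and the wording of the returned messages are assumptions made for illustration; they are not part of `scipy`.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 (normality) when p <= alpha."""
    if p_value <= alpha:
        return "reject H0: the sample does not look normally distributed"
    return "fail to reject H0: no evidence against normality"


# Two hypothetical p-values, one on each side of alpha = 0.05.
print(interpret_p_value(0.345))
print(interpret_p_value(0.026))
```

Keeping `alpha` as a parameter makes it easy to apply a stricter or looser significance level without changing the rule itself.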

Let’s look at some examples of *performing the Shapiro-Wilk test in Python*.

## Example 1: **Shapiro-Wilk Test on Normal Data**

In this example, we will generate a random, normally distributed dataset and run the test on it using the Python code below.

```python
# import modules
import numpy as np
from scipy.stats import shapiro

# Seed the random number generator so the same values are generated on every run
np.random.seed(0)

# generate a sample of 150 values that follow a normal distribution
# with mean = 0 and standard deviation = 1
mean1 = 0
sd1 = 1
data = np.random.normal(mean1, sd1, 150)

# perform the Shapiro-Wilk test
stat, p = shapiro(data)

print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f" % (stat, p))
```

In the above code, we first import the `numpy` package and use its `random.normal()` function to generate a normally distributed array.

The `shapiro()` function from the `scipy` library then performs the Shapiro-Wilk test on the data. It returns the test statistic and the corresponding p-value.

Here we assume a significance level of 0.05 (i.e. a 95% confidence level).

The output of the above code is shown below:

```
The Test-Statistic and p-value are as follows:
Test-Statistic = 0.990 , p-value = 0.345
```

Since the p-value of 0.345 is greater than 0.05, we fail to reject the null hypothesis, i.e. we do not have sufficient evidence to say that the sample does not come from a normal distribution.

This is what we expect, since we generated the sample with the `normal()` function from the `numpy` library.

Now, let’s take a look at a visual representation of the above dataset using the following code.

```python
# import modules
import numpy as np
from scipy.stats import shapiro
import matplotlib.pyplot as plt

# Seed the random number generator so the same values are generated on every run
np.random.seed(0)

# generate a sample of 150 values that follow a normal distribution
# with mean = 0 and standard deviation = 1
mean1 = 0
sd1 = 1
data = np.random.normal(mean1, sd1, 150)

# plot the histogram with 10 bins
count, bins, ignored = plt.hist(data, 10)
plt.show()
```

We use the `matplotlib` package to visualize the dataset as a histogram.

The `hist()` function from `matplotlib.pyplot` plots the histogram of the sample values.

The histogram also shows that the distribution is fairly bell-shaped with one peak in the center of the distribution, which is typical of data that is normally distributed.

## Example 2: **Shapiro-Wilk Test on Non-Normal Data**

In this example, we will generate a random sample from the Poisson distribution and run the test on it using the Python code below.

```python
# import modules
import numpy as np
from scipy.stats import shapiro

# Seed the random number generator so the same values are generated on every run
np.random.seed(1)

# generate a sample of 100 values that follow a Poisson distribution with mean = 6
mean1 = 6
data = np.random.poisson(mean1, 100)

# perform the Shapiro-Wilk test
stat, p = shapiro(data)

print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f" % (stat, p))
```

In the above code, we import the `numpy` package and use its `random.poisson()` function to generate a Poisson-distributed dataset.

The `shapiro()` function from the `scipy` library then performs the Shapiro-Wilk test on the data. It returns the test statistic and the corresponding p-value.

Here we assume a significance level of 0.05 (i.e. a 95% confidence level).

The output of the above code is shown below:

```
The Test-Statistic and p-value are as follows:
Test-Statistic = 0.971 , p-value = 0.026
```

Since the p-value of 0.026 is less than 0.05, we reject the null hypothesis, i.e. we have sufficient evidence to say that the sample does not come from a normal distribution.

This is what we expect, since we generated the sample from a Poisson distribution with the `poisson()` function from the `numpy` library.

Now, let’s take a look at a visual representation for the above dataset using the following code.

```python
# import modules
import numpy as np
from scipy.stats import shapiro
import matplotlib.pyplot as plt

# Seed the random number generator so the same values are generated on every run
np.random.seed(1)

# generate a sample of 100 values that follow a Poisson distribution with mean = 6
mean1 = 6
data = np.random.poisson(mean1, 100)

# plot the histogram with 30 bins
count, bins, ignored = plt.hist(data, 30)
plt.show()
```

We use the `matplotlib` package to visualize the dataset as a histogram.

The `hist()` function from `matplotlib.pyplot` plots the histogram of the sample values.

The histogram shows that the distribution is not bell-shaped.

It is right-skewed. The histogram agrees with the result of the **Shapiro-Wilk test**, which indicates that the sample does not come from a normal distribution.
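To compare the two examples side by side, the test can be wrapped in a small helper that runs it on several named samples. This is only a sketch: the function name `run_shapiro` and the dictionary layout of its results are assumptions made for illustration, and the samples are regenerated with the same seeds and parameters used in the examples.

```python
import numpy as np
from scipy.stats import shapiro


def run_shapiro(samples, alpha=0.05):
    """Run the Shapiro-Wilk test on each named sample and collect the results."""
    results = {}
    for name, values in samples.items():
        stat, p = shapiro(values)
        results[name] = {
            "statistic": float(stat),
            "p_value": float(p),
            "reject_h0": bool(p <= alpha),  # decision rule: reject when p <= alpha
        }
    return results


# Recreate the two samples from the examples with the same seeds.
np.random.seed(0)
normal_data = np.random.normal(0, 1, 150)
np.random.seed(1)
poisson_data = np.random.poisson(6, 100)

results = run_shapiro({"normal": normal_data, "poisson": poisson_data})
for name, r in results.items():
    print(f"{name}: W = {r['statistic']:.3f}, p = {r['p_value']:.3f}, reject H0 = {r['reject_h0']}")
```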

## Conclusion

I hope you found this step-by-step tutorial on **how to perform a Shapiro-Wilk test in Python** educational and helpful.