
How to Perform a Shapiro-Wilk Test in Python

The Shapiro-Wilk test is used to test whether a random sample of data comes from a normal distribution. Normality is a common assumption in many statistical methods, including regression, ANOVA, and the t-test.

The Shapiro-Wilk test was proposed in 1965 by Samuel Sanford Shapiro and Martin Wilk.

It is widely regarded as one of the more powerful statistical tests of normality.

In this article, we will discuss how to perform a Shapiro-Wilk test in Python with several examples.

SciPy for Shapiro-Wilk Test

We will use the SciPy library to perform the Shapiro-Wilk test in Python.

If you don't have the SciPy package installed, run the following command in a terminal (for example, the Windows command prompt) to install it:

pip install scipy
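To confirm that the installation worked, a quick check like the following (a minimal sketch) imports SciPy, prints its version, and verifies that the shapiro() function can be imported.

```python
# Confirm that SciPy is installed and that shapiro() can be imported.
import scipy
from scipy.stats import shapiro

print("SciPy version:", scipy.__version__)
print("shapiro available:", callable(shapiro))
```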

How to Perform a Shapiro-Wilk Test in Python?

The SciPy library provides the scipy.stats.shapiro() function to perform the Shapiro-Wilk test. Its basic usage is:

scipy.stats.shapiro(x)

where:

x: an array-like object (for example, a list or NumPy array) containing the sample data.

This function returns the test statistic (W) and the corresponding p-value. The null hypothesis of the test is that the sample was drawn from a normal distribution. We will interpret the result using the following decision rule.

Decision Rule:

  • If the p-value ≤ α, we reject the null hypothesis, i.e. we conclude that the distribution of our variable is not normal (Gaussian).
  • If the p-value > α, we fail to reject the null hypothesis, i.e. we do not have sufficient evidence to conclude that the distribution of our variable is not normal.

where α is the significance level (commonly 0.05).
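The decision rule can be wrapped in a small function. This is a minimal sketch: check_normality() is a hypothetical helper, not part of SciPy, and it assumes a recent SciPy version in which shapiro() returns a result object with statistic and pvalue attributes. The exponential sample is an illustrative assumption, chosen because it is clearly non-normal.

```python
import numpy as np
from scipy.stats import shapiro


def check_normality(x, alpha=0.05):
    """Run a Shapiro-Wilk test and apply the decision rule.

    Hypothetical helper for illustration; it is not part of SciPy.
    Returns (statistic, p_value, reject_null).
    """
    result = shapiro(x)
    # Reject H0 (the data are normal) when the p-value is at or below alpha.
    reject_null = bool(result.pvalue <= alpha)
    return result.statistic, result.pvalue, reject_null


rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0, scale=1, size=200)
skewed_sample = rng.exponential(scale=1, size=200)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    stat, p, reject = check_normality(sample)
    verdict = "reject H0 (not normal)" if reject else "fail to reject H0"
    print(f"{name}: W = {stat:.3f}, p = {p:.3f} -> {verdict}")
```

Returning a plain boolean alongside the statistic and p-value keeps the decision explicit while still letting you report both numbers.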

Let's walk through some examples of performing the Shapiro-Wilk test in Python.

Example 1: Shapiro-Wilk Test on Normal Data

In this example, we will generate a random, normally distributed dataset and perform the test on it using the following Python code.

# import modules
import numpy as np
from scipy.stats import shapiro

# seed the random number generator so the same sample is produced on every run
np.random.seed(0)

# generate a sample of 150 values from a normal distribution with mean = 0 and standard deviation = 1
mean1 = 0
sd1 = 1

data = np.random.normal(mean1, sd1, 150)

# perform the Shapiro-Wilk test
stat, p = shapiro(data)

print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f" % (stat, p))

In the above code, we first import the numpy package and use its random.normal() function to generate a normally distributed array.

The shapiro() function from the SciPy library performs the Shapiro-Wilk test on the data and returns the test statistic and the corresponding p-value.

Here we use a significance level of α = 0.05 (i.e. a 95% confidence level).

The output of the above code is as follows:

The Test-Statistic and p-value are as follows:
Test-Statistic = 0.990 , p-value = 0.345

Since the p-value (0.345) is greater than 0.05, we fail to reject the null hypothesis, i.e. we do not have sufficient evidence to say that the sample does not come from a normal distribution.

This matches our expectation, since we generated the sample with numpy's normal() function.
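Failing to reject the null hypothesis does not prove that data are normal, and a rejection can also happen by chance. The following sketch (an illustrative simulation, not part of the original tutorial) draws many samples that truly are normal and counts how often the test rejects normality at α = 0.05; the rejection rate should come out close to α itself.

```python
import numpy as np
from scipy.stats import shapiro

# Draw many samples that truly are normal and count how often the
# Shapiro-Wilk test (wrongly) rejects normality at alpha = 0.05.
rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 1000
rejections = 0
for _ in range(n_trials):
    sample = rng.normal(loc=0, scale=1, size=150)
    if shapiro(sample).pvalue <= alpha:
        rejections += 1

rejection_rate = rejections / n_trials
print(f"Rejected normality in {rejection_rate:.1%} of {n_trials} normal samples")
```

This is why the decision rule speaks of "insufficient evidence" rather than proof: α is the long-run rate of false rejections you accept when the data really are normal.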

Now, let's visualize the Example 1 dataset using the following code.

# import modules
import numpy as np
import matplotlib.pyplot as plt

# seed the random number generator so the same sample is produced on every run
np.random.seed(0)

# generate a sample of 150 values from a normal distribution with mean = 0 and standard deviation = 1
mean1 = 0
sd1 = 1
data = np.random.normal(mean1, sd1, 150)

# plot the histogram with 10 bins
count, bins, ignored = plt.hist(data, 10)
plt.show()

We use the matplotlib.pyplot module to visualize the generated data values; its hist() function draws a histogram of the sample, here with 10 bins.

Shapiro-Wilk test on Normal Data

The histogram also shows that the distribution is fairly bell-shaped, with a single peak in the center, which is typical of normally distributed data.
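The visual impression can be backed up with two summary numbers. As a sketch (an addition to the tutorial, using scipy.stats.skew() and scipy.stats.kurtosis()), the code below regenerates the Example 1 sample and computes its skewness and excess kurtosis; both are 0 for a normal distribution, so values near 0 are consistent with the bell shape.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Regenerate the same sample as in Example 1.
np.random.seed(0)
data = np.random.normal(0, 1, 150)

# A normal distribution has skewness 0 and excess kurtosis 0.
# scipy's kurtosis() reports excess kurtosis by default (fisher=True).
sample_skew = skew(data)
sample_kurt = kurtosis(data)
print(f"skewness = {sample_skew:.3f}, excess kurtosis = {sample_kurt:.3f}")
```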

Example 2: Shapiro-Wilk Test on Non-Normal Data

In this example, we will generate a random sample from a Poisson distribution and perform the test using the following Python code.

# import modules
import numpy as np
from scipy.stats import shapiro

# seed the random number generator so the same sample is produced on every run
np.random.seed(1)

# generate a sample of 100 values from a Poisson distribution with mean = 6
mean1 = 6

data = np.random.poisson(mean1, 100)

# perform the Shapiro-Wilk test
stat, p = shapiro(data)

print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f" % (stat, p))

In the above code, we import the numpy package and use its random.poisson() function to generate a Poisson-distributed dataset.

The shapiro() function from the SciPy library performs the Shapiro-Wilk test on the data and returns the test statistic and the corresponding p-value.

Here we use a significance level of α = 0.05 (i.e. a 95% confidence level).

The output of the above code is as follows:

The Test-Statistic and p-value are as follows:
Test-Statistic = 0.971 , p-value = 0.026

Since the p-value (0.026) is less than 0.05, we reject the null hypothesis, i.e. we have sufficient evidence to say that the sample does not come from a normal distribution.

This matches our expectation, since we generated the sample from a Poisson distribution with numpy's poisson() function.
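How clearly the test detects non-normality also depends on the sample size. The sketch below (an illustrative addition; the sample sizes 20, 100, and 1000 are arbitrary choices) draws Poisson(6) samples of increasing size and runs the test on each. Larger samples give the test more power, so the p-value for the largest sample is expected to be far below 0.05.

```python
import numpy as np
from scipy.stats import shapiro

# Poisson(6) is never exactly normal; larger samples give the
# Shapiro-Wilk test more power to detect the departure from normality.
rng = np.random.default_rng(1)
p_values = {}
for n in (20, 100, 1000):
    data = rng.poisson(lam=6, size=n)
    stat, p = shapiro(data)
    p_values[n] = p
    print(f"n = {n:4d}: W = {stat:.3f}, p = {p:.4f}")
```

Keep this in mind when reading results: a small sample may fail to reject normality for clearly non-normal data, while a very large sample may reject it for minor departures.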

Now, let's visualize the Example 2 dataset using the following code.

# import modules
import numpy as np
import matplotlib.pyplot as plt

# seed the random number generator so the same sample is produced on every run
np.random.seed(1)

# generate a sample of 100 values from a Poisson distribution with mean = 6
mean1 = 6

data = np.random.poisson(mean1, 100)

# plot the histogram with 30 bins
count, bins, ignored = plt.hist(data, 30)
plt.show()

We use the matplotlib.pyplot module to visualize the generated data values; its hist() function draws a histogram of the sample, here with 30 bins.

Shapiro-Wilk test on Non-Normal Data

The histogram shows that the distribution is not symmetric and bell-shaped.

It is right-skewed. This agrees with the result of the Shapiro-Wilk test, which indicates that the sample data does not come from a normal distribution.
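The right skew can also be quantified. As a sketch (an addition, using scipy.stats.poisson.stats() and scipy.stats.skew()), the code below compares the theoretical skewness of a Poisson(6) distribution, which equals 1/√6 ≈ 0.408, with the sample skewness of the Example 2 data. A normal distribution has skewness 0, and a positive value indicates a longer right tail.

```python
import numpy as np
from scipy.stats import poisson, skew

# Regenerate the same sample as in Example 2.
np.random.seed(1)
data = np.random.poisson(6, 100)

# Theoretical skewness of a Poisson(mu) distribution is 1 / sqrt(mu) > 0,
# i.e. the distribution is right-skewed; a normal distribution has skewness 0.
theoretical_skew = float(poisson.stats(6, moments="s"))
sample_skew = skew(data)
print(f"theoretical skewness = {theoretical_skew:.3f}")
print(f"sample skewness      = {sample_skew:.3f}")
```

The sample value will not match the theoretical one exactly, since it is estimated from only 100 observations.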

Conclusion

I hope you found this step-by-step tutorial on performing a Shapiro-Wilk test in Python educational and helpful.
