How to Calculate Z-Score in Python

In statistics, a z-score indicates how many standard deviations an element is from the mean.

Use the zscore() function from the scipy.stats module to calculate z-scores in Python.

Z-Score Formula

The following formula is used to calculate a z-score:

z=(X-µ)/σ 

where,

z = calculated z-score

X = value of an element

µ = population mean

σ  = population standard deviation
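
As a quick illustration of the formula, the z-scores can be computed by hand with NumPy. This is only a minimal sketch: the sample values below are made up for illustration, and it relies on np.std using ddof=0 by default, which corresponds to the population standard deviation σ in the formula.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()    # population mean (5.0 for these values)
sigma = x.std()  # population standard deviation, ddof=0 (2.0 for these values)

# Apply z = (X - mu) / sigma to every element at once
z = (x - mu) / sigma
print(z)  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
```

Each value in z is the distance of the corresponding element from the mean, measured in standard deviations.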

In this article, we will discuss how to calculate the z-score in Python.

Scipy for Z-Score

We will use the SciPy library to calculate z-scores in Python.

If SciPy is not installed, run the following command in a terminal or command prompt:

pip install scipy

How to Calculate Z-Score?

The SciPy library provides the scipy.stats.zscore function to calculate z-scores.

scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')

where:

Parameters:

a: an array-like object containing the sample data.

axis: the axis along which to calculate the z-score. The default is 0.

ddof: the degrees-of-freedom correction used when calculating the standard deviation. The default is 0, which gives the population standard deviation.

nan_policy: defines how NaN values in the input are handled. The default, 'propagate', returns NaN. 'raise' throws an error, and 'omit' performs the calculation ignoring NaN values.

Returns: an array of z-scores, standardized by the mean and standard deviation of the input array a.
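
The effect of ddof and nan_policy is easier to see on a small input. The following is a rough sketch rather than a complete tour of the options; the arrays are hypothetical values picked so the results are easy to check by hand.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical values with mean 5 and population standard deviation 2
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

z_population = zscore(values)      # ddof=0: divide by the population standard deviation
z_sample = zscore(values, ddof=1)  # ddof=1: divide by the sample standard deviation

print(z_population)  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
print(z_sample)      # smaller in magnitude, because the sample std is larger

# Hypothetical input containing a NaN
with_nan = np.array([1.0, 2.0, np.nan, 3.0])

z_propagate = zscore(with_nan)                  # default: every result is nan
z_omit = zscore(with_nan, nan_policy='omit')    # nan is ignored in the mean and std

print(z_propagate)
print(z_omit)  # the nan position stays nan; the others are about -1.2247, 0, 1.2247
```

With 'omit', the NaN still appears in the output at its original position, but it no longer affects the z-scores of the other elements.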

Example 1: How to calculate the z-score for a one-dimensional array in Python

Let's assume the data is as follows:

data = [7,4,8,9,6,11,16,17,19,12,12]

The z-scores for this one-dimensional array can be calculated using the Python code below.

#import modules
import numpy as np
from scipy.stats import zscore

#create one-dimensional array
data = np.array([7,4,8,9,6,11,16,17,19,12,12])

#calculate z-score
result = zscore(data)

#Print the result
print("Z-score array: ",result)

In the above code, we import the zscore function from the scipy.stats module.

Using zscore(), we get an array in which each value indicates how many standard deviations an individual element is from the mean. The output of the above code is shown below.

#Output
Z-score array:  [-0.87476705 -1.53084234 -0.65607529 -0.43738352 -1.09345881  0.  1.09345881  1.31215057  1.7495341   0.21869176  0.21869176]

Interpret the Z-score:

The first element, 7, has a z-score of -0.874767, so it lies 0.874767 standard deviations below the mean.

The sixth element, 11, has a z-score of 0, so it is equal to the mean.

The seventh element, 16, has a z-score of 1.093458, so it lies 1.093458 standard deviations above the mean.

Example 2: How to calculate the z-score for a multi-dimensional array in Python

In this example, we create a multi-dimensional array of size (3, 3) using the NumPy package and calculate the z-score for all elements in the array using the Python code below. Because axis defaults to 0, each z-score is computed relative to the element's column.

#import modules
import numpy as np
from scipy.stats import zscore

# Using seed function to generate the same random number every time with the given seed value 
np.random.seed(4)

#create multi-dimensional array
data = np.random.randint(0, 10, size=(3, 3))

#calculate z-score
result = zscore(data)

#Print the array
print('The array elements are as follows:\n',data)

#Print the result
print("Z-score array: \n",result)

In the above code, print() displays the array.

zscore() calculates the z-score for every element of the array. The output of the above code is shown below.

#Output
The array elements are as follows:
 [[7 5 1]
 [8 7 8]
 [2 9 7]]
Z-score array: 
 [[ 0.50800051 -1.22474487 -1.40182605]
 [ 0.88900089  0.          0.86266219]
 [-1.3970014   1.22474487  0.53916387]]

Interpret the Z-score:

Each z-score is calculated relative to the column the element is in, because axis defaults to 0. For example:

The first element of the first row, 7, has a z-score of 0.50800, so it lies 0.50800 standard deviations above the mean of its column.

The second element of the second row, 7, has a z-score of 0, so it is equal to the mean of its column.

The first element of the third row, 2, has a z-score of -1.39700, so it lies 1.39700 standard deviations below the mean of its column.
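
To see how the axis parameter changes the result, the same 3x3 values can be standardized by column, by row, or over the whole array. This is a small sketch with the array written out explicitly instead of generated from a seed; the variable names are only illustrative.

```python
import numpy as np
from scipy.stats import zscore

# The 3x3 values from this example, written out explicitly
data = np.array([[7, 5, 1],
                 [8, 7, 8],
                 [2, 9, 7]])

z_cols = zscore(data, axis=0)    # default: each column has mean 0 and std 1
z_rows = zscore(data, axis=1)    # each row has mean 0 and std 1
z_all = zscore(data, axis=None)  # one mean and std for all nine elements

print("By column:\n", z_cols)
print("By row:\n", z_rows)
print("Whole array:\n", z_all)
```

Which axis to use depends on what the rows and columns represent; when each column is a separate variable, the default axis=0 is usually the one you want.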

Example 3: How to calculate the z-score for a DataFrame in Python

In this example, we create a random DataFrame of dimension 5x5 and calculate the z-score for the whole DataFrame using the Python code below.

#import modules
import numpy as np
from scipy.stats import zscore
import pandas as pd 

# Using seed function to generate the same random number every time with the given seed value 
np.random.seed(4)

#create a random 5*5 dataframe using pandas module
data = pd.DataFrame(np.random.randint(0, 10, size=(5, 5)), columns=['A', 'B', 'C','D','E'])

#calculate z-score of the above dataframe 
result = zscore(data)

#Print the dataframe
print("Created DataFrame is: \n",data)

#Print the calculated z-scores
print("Z-score array: \n",result)

In the above code, print() displays the DataFrame.

zscore() calculates the z-score for every element of the DataFrame, column by column.

#Output
Created DataFrame is: 
    A  B  C  D  E
0  7  5  1  8  7
1  8  2  9  7  7
2  7  9  8  4  2
3  6  4  3  0  7
4  5  5  9  6  6
Z-score array: 
 [[ 0.39223227  0.         -1.49403576  1.06066017  0.61885275]
 [ 1.37281295 -1.31558703  0.89642146  0.70710678  0.61885275]
 [ 0.39223227  1.75411604  0.5976143  -0.35355339 -1.95970037]
 [-0.58834841 -0.43852901 -0.89642146 -1.76776695  0.61885275]
 [-1.56892908  0.          0.89642146  0.35355339  0.10314212]]

Interpret the Z-score:

Each z-score is calculated relative to the column the element is in. For example:

The first element of the first row, 7, has a z-score of 0.39223, so it lies 0.39223 standard deviations above the mean of its column.

The second element of the first row, 5, has a z-score of 0, so it is equal to the mean of its column.

The first element of the fourth row, 6, has a z-score of -0.58834, so it lies 0.58834 standard deviations below the mean of its column.
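
If you would like the result to keep the DataFrame's column labels and index, one option is to apply the z-score formula with pandas directly. This is a sketch, not part of the SciPy API: the two-column DataFrame reuses columns A and B from the example, and ddof=0 is passed explicitly because pandas .std() defaults to ddof=1, unlike zscore().

```python
import pandas as pd

# Columns A and B from the example DataFrame, written out explicitly
df = pd.DataFrame({'A': [7, 8, 7, 6, 5],
                   'B': [5, 2, 9, 4, 5]})

# Column-wise z-scores; ddof=0 matches the default used by scipy.stats.zscore
z_df = (df - df.mean()) / df.std(ddof=0)

print(z_df)
```

The values agree with the first two columns of the z-score array above, and the result is a DataFrame with the same labels as the input.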

Conclusion

I hope you found this step-by-step walkthrough of calculating z-scores in Python with the SciPy library educational and helpful.