In statistics, a z-score indicates how many standard deviations an element is from the mean.
Use the zscore() function from the scipy.stats module to calculate z-scores in Python.
Z-Score Formula
The following formula is used to calculate a z-score:
z=(X-µ)/σ
where,
z = calculated z-score
X = value of an element
µ = population mean
σ = population standard deviation
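As a quick illustration of the formula, the sketch below applies it by hand with NumPy. The sample values are made up for this example, and np.std is left at its default ddof=0 so that σ is the population standard deviation.

```python
import numpy as np

# Hypothetical sample values, chosen so the mean and standard deviation are round numbers
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mu = x.mean()      # population mean: 5.0
sigma = x.std()    # population standard deviation (ddof=0): 2.0

# Apply z = (X - µ) / σ to every element
z = (x - mu) / sigma
print(z)  # values: -1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0
```

An element equal to the mean gets a z-score of 0, and the sign shows whether a value lies below or above the mean.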
In this article, we will discuss how to calculate the z-score in Python.
Scipy for Z-Score
We will use the SciPy library for Python to calculate the z-score.
If you don't have the SciPy library installed, run the command below at the command prompt to install it:
pip install scipy
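To confirm that the installation worked, one option is to import the package and print its version. This is only a sanity check, and the version string you see depends on your environment.

```python
import scipy

# Prints the installed SciPy version string
print(scipy.__version__)
```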
How to Calculate Z-Score?
The SciPy library provides the scipy.stats.zscore function to calculate z-scores.
scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')
where:
Parameters:
a: an array-like object containing the sample data.
axis: the axis along which to calculate the z-score. Its default value is 0.
ddof: degrees of freedom correction used in the calculation of the standard deviation. Its default value is 0.
nan_policy: defines how nan values in the input are handled. The default is 'propagate', which returns nan. With 'raise', an error is thrown. With 'omit', the calculation is performed ignoring nan values.
Returns: an array of z-scores, standardized by the mean and standard deviation of the input array a.
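The effect of ddof and nan_policy is easier to see on small inputs. The following is a minimal sketch; the arrays a and b are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical sample values used only for illustration
a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# ddof=0 (default) divides by N; ddof=1 divides by N - 1
z_population = zscore(a)
z_sample = zscore(a, ddof=1)

# A second hypothetical array with one missing value
b = np.array([1.0, 2.0, np.nan, 3.0])

# 'propagate' (default): the nan spreads to every z-score in the result
z_propagate = zscore(b)

# 'omit': the nan is ignored when computing the mean and standard deviation,
# but its own position in the output is still nan
z_omit = zscore(b, nan_policy='omit')

# 'raise': a ValueError is raised because the input contains nan
try:
    zscore(b, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(z_population)
print(z_sample)
print(z_propagate)
print(z_omit)
print(raised)
```

With ddof=1 the standard deviation is slightly larger, so the z-scores are slightly smaller in magnitude than with the default ddof=0.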
Example 1: How to calculate the z-score for a one-dimensional array in Python
Let's assume the given data is as follows:
data = [7,4,8,9,6,11,16,17,19,12,12]
The z-score can be calculated for this one-dimensional array using the Python code below.
#import modules
import numpy as np
from scipy.stats import zscore

#create one-dimensional array
data = np.array([7,4,8,9,6,11,16,17,19,12,12])

#calculate z-score
result = zscore(data)

#Print the result
print("Z-score array: ",result)
In the above code, we import the zscore function from the scipy.stats module.
Using zscore(), we get the z-score array, in which each value indicates how many standard deviations an individual element is from the mean. The output of the above code is shown below.
#Output
Z-score array: [-0.87476705 -1.53084234 -0.65607529 -0.43738352 -1.09345881 0. 1.09345881 1.31215057 1.7495341 0.21869176 0.21869176]
Interpret the Z-score:
The first element, 7, has a z-score of -0.874767, so it lies 0.874767 standard deviations below the mean.
The sixth element, 11, has a z-score of 0, so it is equal to the mean.
The seventh element, 16, has a z-score of 1.093458, so it lies 1.093458 standard deviations above the mean.
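Another way to read these interpretations is that a z-score can be converted back into the original value with X = µ + z * σ. The sketch below checks this on a small made-up sample (not the data from the example); all variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical sample values used only for illustration
sample = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
z = zscore(sample)

mu = sample.mean()     # population mean: 30.0
sigma = sample.std()   # population standard deviation (ddof=0)

# Negative z-scores belong to values below the mean, positive ones to values above it
below_mean = sample[z < 0]
above_mean = sample[z > 0]

# X = µ + z * σ recovers the original values
recovered = mu + z * sigma

print(below_mean)
print(above_mean)
print(np.allclose(recovered, sample))
```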
Example 2: How to calculate the z-score for a multi-dimensional array in Python
In this example, we create a multi-dimensional array of shape (3, 3) using the NumPy package and calculate the z-score for all elements in the array using the Python code below.
#import modules
import numpy as np
from scipy.stats import zscore

# Using seed function to generate the same random number every time with the given seed value
np.random.seed(4)

#create multi-dimensional array
data = np.random.randint(0, 10, size=(3, 3))

#calculate z-score
result = zscore(data)

#Print the array
print('The array elements are as follows:\n',data)

#Print the result
print("Z-score array: \n",result)
In the above code, we use print() to display the array.
Using zscore(), we calculate the z-score for every element of the array. With the default axis=0, each z-score is computed relative to the column the element is in. The output of the above code is shown below.
#Output
The array elements are as follows:
 [[7 5 1]
 [8 7 8]
 [2 9 7]]
Z-score array:
 [[ 0.50800051 -1.22474487 -1.40182605]
 [ 0.88900089  0.          0.86266219]
 [-1.3970014   1.22474487  0.53916387]]
Interpret the Z-score:
Because the default is axis=0, the z-score of every element is computed relative to the column it is in. For example:
The first element of the first row, 7, has a z-score of 0.50800, so it lies 0.50800 standard deviations above the mean of its column.
The second element of the second row, 7, has a z-score of 0, so it is equal to the mean of its column.
The first element of the third row, 2, has a z-score of -1.39700, so it lies 1.39700 standard deviations below the mean of its column.
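To see how the axis parameter changes the result, the sketch below standardizes the same 3x3 array by column, by row, and over the whole array. The array is written out explicitly rather than generated from the seed, and the variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import zscore

# The 3x3 array shown in the output above, written out explicitly
data = np.array([[7, 5, 1],
                 [8, 7, 8],
                 [2, 9, 7]])

z_cols = zscore(data)             # axis=0 (default): each column is standardized
z_rows = zscore(data, axis=1)     # axis=1: each row is standardized
z_all = zscore(data, axis=None)   # axis=None: the whole array is standardized at once

print(z_cols)
print(z_rows)
print(z_all)
```

Each choice uses a different mean and standard deviation, so the same element (for example the 7 in the first row) gets a different z-score in each result.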
Example 3: How to calculate the z-score for a DataFrame in Python
In this example, we create a random DataFrame of dimension 5*5 and calculate the z-score for the whole DataFrame using the Python code below.
#import modules
import numpy as np
from scipy.stats import zscore
import pandas as pd

# Using seed function to generate the same random number every time with the given seed value
np.random.seed(4)

#create a random 5*5 dataframe using pandas module
data = pd.DataFrame(np.random.randint(0, 10, size=(5, 5)), columns=['A', 'B', 'C','D','E'])

#calculate z-score of the above dataframe
result = zscore(data)

#Print the dataframe
print("Created DataFrame is: \n",data)

#Print the calculated z-score
print("Z-score array: \n",result)
In the above code, we use print() to display our DataFrame.
Using zscore(), we calculate the z-score for all the elements of the DataFrame, column by column.
#Output
Created DataFrame is:
    A  B  C  D  E
0  7  5  1  8  7
1  8  2  9  7  7
2  7  9  8  4  2
3  6  4  3  0  7
4  5  5  9  6  6
Z-score array:
 [[ 0.39223227  0.         -1.49403576  1.06066017  0.61885275]
 [ 1.37281295 -1.31558703  0.89642146  0.70710678  0.61885275]
 [ 0.39223227  1.75411604  0.5976143  -0.35355339 -1.95970037]
 [-0.58834841 -0.43852901 -0.89642146 -1.76776695  0.61885275]
 [-1.56892908  0.          0.89642146  0.35355339  0.10314212]]
Interpret the Z-score:
The z-score of every element is computed relative to the column it is in. For example:
The first element of the first row, 7, has a z-score of 0.39223, so it lies 0.39223 standard deviations above the mean of its column.
The second element of the first row, 5, has a z-score of 0, so it is equal to the mean of its column.
The first element of the fourth row, 6, has a z-score of -0.58834, so it lies 0.58834 standard deviations below the mean of its column.
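If you want to keep the row and column labels, one possible approach is to compute the same column-wise z-scores with pandas arithmetic. This is a sketch, not part of the example above: the DataFrame is written out explicitly instead of being generated from the seed, and the names df, z_df, and z_np are chosen for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# The DataFrame shown in the output above, written out explicitly
df = pd.DataFrame(
    [[7, 5, 1, 8, 7],
     [8, 2, 9, 7, 7],
     [7, 9, 8, 4, 2],
     [6, 4, 3, 0, 7],
     [5, 5, 9, 6, 6]],
    columns=['A', 'B', 'C', 'D', 'E'],
)

# Column-wise z-scores computed with pandas, so the result is a labelled DataFrame.
# pandas' std() defaults to ddof=1, so ddof=0 is passed to match zscore's default.
z_df = (df - df.mean()) / df.std(ddof=0)

# Reference result from SciPy on the underlying NumPy array
z_np = zscore(df.to_numpy())

print(z_df.round(5))
print(np.allclose(z_df.to_numpy(), z_np))
```

Forgetting ddof=0 on the pandas side is a common source of small mismatches between the two results.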
Conclusion
I hope you found this step-by-step guide to calculating the z-score in Python with the SciPy library educational and helpful.