The Standard Error of the Mean (SEM) measures how much a sample mean is expected to vary from the true population mean from one sample to the next.
It is commonly used to construct approximate confidence intervals for the mean.
In this tutorial, we will discuss two methods for calculating the Standard Error of the Mean in Python, with step-by-step examples.
Standard Error of the Mean Formula
The standard error of the mean for a sample is calculated using the formula below:
Standard error of the mean (SEM) = s / √n
where:
s : sample standard deviation
n : sample size
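As a quick check of the formula, the following minimal sketch computes the SEM by hand using only Python's standard library. The function name sem_by_formula and the four sample values are illustrative assumptions, not part of any library.

```python
import math
import statistics


def sem_by_formula(values):
    """Return s / sqrt(n), where s is the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a sample standard deviation")
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)


# Hypothetical values chosen so the arithmetic is easy to follow:
# mean = 4, s = sqrt(8 / 3), n = 4
sample = [2, 4, 4, 6]
print(round(sem_by_formula(sample), 3))  # prints 0.816
```

Here s = √(8/3) ≈ 1.633 and √n = 2, so the SEM is about 0.816.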
Method 1: Use NumPy
We will use the NumPy library for Python. It provides the std() function, which calculates the standard deviation. Dividing that result by the square root of the sample size gives the standard error of the mean.
If you don't have the numpy package installed, install it by running the command below in a terminal or command prompt.
pip install numpy
Example 1: How to calculate SEM in Python
The Python code below calculates the standard error of the mean (SEM) for a small dataset.
#import modules
import numpy as np

#define dataset
data = np.array([4,7,3,9,12,8,14,10,12,12])

#calculate standard error of the mean
result = np.std(data, ddof=1) / np.sqrt(np.size(data))

#print the result
print("The Standard error of the mean : %.3f"%result)
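In the above code, we import the numpy library and define the dataset as a NumPy array.
We then divide the sample standard deviation returned by the std() function by the square root of the sample size to get the standard error of the mean.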
Note that we must pass ddof=1 to the std() function to calculate the sample standard deviation. By default, std() uses ddof=0 and returns the population standard deviation.
The output of the above code is shown below.
#Output
The Standard error of the mean : 1.149
The standard error of the mean is 1.149.
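To see why ddof=1 matters, the short sketch below compares the SEM obtained with NumPy's default ddof=0 (population standard deviation) against ddof=1 (sample standard deviation) on the same dataset. The variable names sem_population and sem_sample are illustrative.

```python
import numpy as np

data = np.array([4, 7, 3, 9, 12, 8, 14, 10, 12, 12])
n = data.size

# ddof=0 (NumPy's default) divides by n: population standard deviation
sem_population = np.std(data) / np.sqrt(n)

# ddof=1 divides by n - 1: sample standard deviation, as the SEM formula requires
sem_sample = np.std(data, ddof=1) / np.sqrt(n)

print(f"ddof=0: {sem_population:.3f}")  # prints ddof=0: 1.090
print(f"ddof=1: {sem_sample:.3f}")      # prints ddof=1: 1.149
```

Leaving out ddof=1 understates the SEM, and the gap grows as the sample gets smaller.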
Method 2: Use SciPy
We will use the SciPy library for Python. Its scipy.stats module provides the sem() function, which calculates the standard error of the mean directly.
If you don't have the scipy library installed, install it by running the command below in a terminal or command prompt.
pip install scipy
Example 2: How to calculate SEM in Python
Let's assume we have the following dataset:
data = [4,7,3,9,12,8,14,10,12,12]
We can calculate the standard error of the mean using the Python code below.
#import modules
import scipy.stats as stat

#define dataset
data = [4,7,3,9,12,8,14,10,12,12]

#calculate standard error of the mean
result = stat.sem(data)

#print the result
print("The Standard error of the mean : %.3f"%result)
In the above code, we import the scipy.stats module and define the dataset as a plain Python list.
Using the sem() function, we calculate the standard error of the mean in a single call.
The output of the above code is shown below.
#Output
The Standard error of the mean : 1.149
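Both methods give the same result because sem() uses ddof=1 by default. The following sketch cross-checks the two calculations on the same dataset. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

data = [4, 7, 3, 9, 12, 8, 14, 10, 12, 12]

# Method 1: sample standard deviation divided by sqrt(n)
sem_numpy = np.std(data, ddof=1) / np.sqrt(len(data))

# Method 2: scipy.stats.sem, whose ddof parameter defaults to 1
sem_scipy = stats.sem(data)

print(np.isclose(sem_numpy, sem_scipy))  # prints True
```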
How to Interpret the Standard Error of the Mean
There are two important factors to keep in mind when interpreting the SEM:
1. Sample size: As the sample size increases, the standard error of the mean tends to decrease.
Let's see this with the example below:
#import modules
import scipy.stats as stat

#define dataset 1
data1 = [4,7,3,9,12,8,14,10,12,12]

#define dataset 2 by repeating the first dataset twice
data2 = [4,7,3,9,12,8,14,10,12,12,4,7,3,9,12,8,14,10,12,12]

#calculate standard error of the mean
result1 = stat.sem(data1)
result2 = stat.sem(data2)

#print the results
print("The Standard error of the mean for the original dataset: %.3f"%result1)
print("The Standard error of the mean for the repeated dataset : %.3f"%result2)
In the above example, we created two datasets, data1 and data2, where data2 is simply data1 repeated twice.
The output of the above code is shown below:
#Output
The Standard error of the mean for the original dataset: 1.149
The Standard error of the mean for the repeated dataset : 0.791
We see that the SEM is 1.149 for data1 and 0.791 for data2.
This shows that the SEM decreases as the sample size increases.
The values in data2 are not less spread out than those in data1: both datasets have the same mean and almost the same standard deviation. The SEM is smaller because that standard deviation is divided by the square root of a larger sample size.
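The effect of sample size can be sketched more generally by repeating the dataset several times and recomputing the SEM each time. This sketch uses only the standard library. The helper name sem and the repeat counts are illustrative assumptions, and repeating a dataset is only a device for illustration, not a way of collecting more data.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


data = [4, 7, 3, 9, 12, 8, 14, 10, 12, 12]

# Map each sample size to its SEM
results = {}
for repeats in (1, 2, 4, 8):
    sample = data * repeats
    results[len(sample)] = sem(sample)
    print(f"n = {len(sample):3d}  SEM = {results[len(sample)]:.3f}")
```

Each doubling of n shrinks the SEM by roughly a factor of √2, because the standard deviation stays almost constant while √n grows.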
2. The value of the SEM: For a given sample size, a larger SEM indicates that the values are more spread out around the mean.
Let's look at this with the example below:
#import modules
import scipy.stats as stat

#define dataset 1
data1 = [4,7,3,9,12,8,14,10,12,12]

#define dataset 2 by replacing the last value with 120
data2 = [4,7,3,9,12,8,14,10,12,120]

#calculate standard error of the mean
result1 = stat.sem(data1)
result2 = stat.sem(data2)

#print the results
print("The Standard error of the mean for the original dataset: %.3f"%result1)
print("The Standard error of the mean for the modified dataset : %.3f"%result2)
In the above example, we created two datasets, data1 and data2, where data2 is created by replacing the last value of data1 with 120.
The output of the above code is shown below:
#Output
The Standard error of the mean for the original dataset: 1.149
The Standard error of the mean for the modified dataset : 11.177
We see that the SEM is 1.149 for data1 and 11.177 for data2.
The SEM for data2 is much larger than the SEM for data1.
Because both datasets have the same size, this means the values of data2 are more spread out around the mean than the values of data1.
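To make the link between spread and SEM explicit, the sketch below prints the mean, the sample standard deviation, and the SEM for both datasets using only the standard library. The summary dictionary and its keys are illustrative names.

```python
import math
import statistics

data1 = [4, 7, 3, 9, 12, 8, 14, 10, 12, 12]
data2 = [4, 7, 3, 9, 12, 8, 14, 10, 12, 120]  # last value replaced by an outlier

summary = {}
for name, values in (("data1", data1), ("data2", data2)):
    s = statistics.stdev(values)       # sample standard deviation
    sem = s / math.sqrt(len(values))   # standard error of the mean
    summary[name] = {"mean": statistics.mean(values), "s": s, "sem": sem}
    print(f"{name}: mean = {summary[name]['mean']:.1f}, s = {s:.3f}, SEM = {sem:.3f}")
```

Since n is 10 in both cases, the whole difference in the SEM comes from the standard deviation, which the single outlier inflates sharply.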
Conclusion
I hope you found this tutorial on calculating the Standard Error of the Mean in Python, with its step-by-step examples, educational and helpful.