A confidence interval estimates a population parameter with a range of values and a stated degree of confidence. Using sample data, it gives a lower and an upper bound for the parameter at a specified confidence level. One parameter that is commonly estimated this way is the population mean.
In this tutorial, we will discuss how to calculate confidence intervals in Python, with step-by-step examples.
Confidence Interval for Mean
Confidence Interval = x̄ ± (t * standard error)
Where:
x̄ = sample mean
t = t-multiplier, determined by the degrees of freedom (n - 1) and the desired confidence level
standard error = s/√n, where s is the sample standard deviation
n = sample size
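The t-multiplier is a quantile of the t-distribution: for a two-sided interval at confidence level C, it is the (1 + C)/2 quantile with n - 1 degrees of freedom. The sketch below is a minimal illustration that relies only on scipy.stats.t.ppf and scipy.stats.norm.ppf; it prints the 95% t-multiplier for a few sample sizes next to the normal (z) multiplier, so you can see the t value shrink toward about 1.96 as n grows.

```python
# Sketch: how the t-multiplier depends on the degrees of freedom (n - 1)
# and how it approaches the normal (z) multiplier as n grows.
import scipy.stats as st

confidence_level = 0.95
# Two-sided interval: (1 - confidence_level) / 2 goes in each tail,
# so we need the (1 + confidence_level) / 2 quantile (0.975 for 95%).
upper_tail_prob = (1 + confidence_level) / 2

z_multiplier = st.norm.ppf(upper_tail_prob)
print(f"z multiplier: {z_multiplier:.4f}")

t_multipliers = {}
for n in [5, 10, 15, 30, 100, 1000]:
    t_multipliers[n] = st.t.ppf(upper_tail_prob, df=n - 1)
    print(f"n = {n:4d}  df = {n - 1:3d}  t multiplier = {t_multipliers[n]:.4f}")
```

For small samples the t-multiplier is noticeably larger than the z value, which widens the interval to account for the extra uncertainty of estimating the standard deviation from only a few observations.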
Note:
1. If the sample size is small (n < 30), we use the t-distribution to calculate the confidence interval for the mean.
2. If the sample size is large (n ≥ 30), we can use the normal distribution instead, because by the central limit theorem the sample mean is approximately normally distributed. The cutoff of 30 is a rule of thumb: as n grows, the t-distribution becomes very close to the normal distribution.
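To see note 2 in action, here is a small simulation sketch. It assumes a skewed exponential population with mean 10 (chosen purely for illustration, not taken from the examples in this tutorial), draws many samples of size 100, and checks that roughly 95% of the sample means fall within 1.96 standard errors of the true mean, as a normal distribution would predict.

```python
# Sketch: the central limit theorem in action.
# Assumption: a skewed exponential population with mean 10, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
population_mean = 10.0
population_sd = 10.0   # for an exponential distribution, the sd equals the mean
n = 100                # sample size
num_samples = 5000     # number of repeated samples

# Draw num_samples samples of size n and compute the mean of each one.
samples = rng.exponential(scale=population_mean, size=(num_samples, n))
sample_means = samples.mean(axis=1)

# If the sample mean is approximately normal with standard error sd / sqrt(n),
# about 95% of the sample means should lie within 1.96 standard errors of the true mean.
standard_error = population_sd / np.sqrt(n)
within = np.abs(sample_means - population_mean) <= 1.96 * standard_error
fraction_within = within.mean()
print(f"Fraction of sample means within 1.96 SE: {fraction_within:.3f}")
```

The individual values are far from normal, yet their averages behave almost like a normal variable, which is what justifies the normal-based interval for large samples.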
How to Interpret Confidence Intervals
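Assume our confidence level is 95%.
It can be interpreted as follows: if we repeated the sampling process many times and calculated an interval each time, about 95% of those intervals would contain the true population mean.
In other words, the method misses the true mean about 5% of the time. Strictly speaking, any single calculated interval either contains the true mean or it does not; the 95% describes the long-run success rate of the procedure, not a probability about one particular interval.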
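The repeated-sampling interpretation can be checked with a short simulation. This sketch assumes a normal population with mean 70 and standard deviation 15 (hypothetical values), builds a 95% t-interval from each of 2,000 samples of size 10, and counts how often the interval contains the true mean. The result should be close to, but usually not exactly, 95%.

```python
# Sketch: estimate the coverage of 95% t-intervals by simulation.
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
true_mean, true_sd = 70.0, 15.0   # hypothetical population, for illustration only
n, num_repeats, confidence_level = 10, 2000, 0.95

hits = 0
for _ in range(num_repeats):
    sample = rng.normal(true_mean, true_sd, size=n)
    # Confidence level passed positionally; df = n - 1 is the t shape parameter.
    lower, upper = st.t.interval(confidence_level, n - 1,
                                 loc=sample.mean(), scale=st.sem(sample))
    if lower <= true_mean <= upper:
        hits += 1

coverage = hits / num_repeats
print(f"{coverage:.1%} of the intervals contain the true mean")
```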
Scipy for Confidence Interval
We will use the scipy library in Python to calculate confidence intervals.
If you don't have the scipy library installed, run the command below in a Windows command prompt (or any terminal) to install it:
pip install scipy
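Once it is installed, a quick check like the one below confirms that SciPy imports correctly and shows its version. It also calls norm.interval() with the confidence level passed positionally: SciPy 1.9 renamed this first parameter from alpha to confidence, so passing it positionally is a simple way to write code that works with both older and newer releases.

```python
# Quick check that SciPy is installed and importable.
import scipy
import scipy.stats as st

print("SciPy version:", scipy.__version__)

# Standard normal 95% interval, with the confidence level passed positionally
# (the parameter is named `alpha` in older SciPy and `confidence` since SciPy 1.9).
lower, upper = st.norm.interval(0.95)
print(lower, upper)
```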
Python – Confidence interval for mean
Let's work through an example of calculating a confidence interval for the mean using the t-distribution in Python.
Let's assume we have the data given below:
data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]
In this example, we calculate the 95% confidence interval for the mean using the below python code.
# import modules
import numpy as np
import scipy.stats as st

# define given sample data
data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]

# calculate the sample parameters
confidenceLevel = 0.95  # 95% CI given
degrees_freedom = len(data) - 1  # degrees of freedom = sample size - 1
sampleMean = np.mean(data)  # sample mean
sampleStandardError = st.sem(data)  # standard error of the mean

# create the 95% confidence interval for the population mean
# (confidence level passed positionally; named `alpha` in older SciPy, `confidence` since SciPy 1.9)
confidenceInterval = st.t.interval(confidenceLevel, df=degrees_freedom, loc=sampleMean, scale=sampleStandardError)

# print the 95% confidence interval for the population mean
print('The 95% confidence interval for the population mean :', confidenceInterval)
In the above example, the sample size (10) is less than 30, so we use the t-distribution.
We import the scipy.stats library and then calculate the sample parameters required by the formula above: the degrees of freedom, the sample mean and the standard error.
The scipy.stats.t.interval() function takes the confidence level, the degrees of freedom (df), the sample mean (loc) and the standard error (scale) as input parameters, and returns the confidence interval as a (lower, upper) tuple. The output of the above python code is shown below.
#Output
The 95% confidence interval for the population mean : (58.00052174294386, 83.99947825705614)
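As a cross-check, the same interval can be computed directly from the formula x̄ ± (t * standard error). This sketch uses Python's statistics module for the mean and the sample standard deviation, and scipy.stats.t.ppf for the t-multiplier; the bounds should match the t.interval() output above up to floating-point rounding.

```python
# Sketch: the 95% t-interval computed term by term from the formula.
import math
import statistics
import scipy.stats as st

data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]
confidence_level = 0.95

n = len(data)
sample_mean = statistics.mean(data)          # x̄
sample_sd = statistics.stdev(data)           # s, with n - 1 in the denominator
standard_error = sample_sd / math.sqrt(n)    # s / √n, the same value st.sem(data) returns
t_multiplier = st.t.ppf((1 + confidence_level) / 2, df=n - 1)

margin_of_error = t_multiplier * standard_error
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"x̄ = {sample_mean}, SE = {standard_error:.4f}, t = {t_multiplier:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```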
Let's look at how to calculate a confidence interval in Python with a real-world example, given below.
Python – Confidence interval for mean
Let's work through the example below to calculate a confidence interval for the mean using the t-distribution.
To estimate the average weight of corona patients who visited over one week, data on 15 patients was collected from a district. Construct a 98% confidence interval for the true mean weight of all patients.
The weights of patients are 87,80,68,72,56,58,60,63,82,70,58,55,48,50,77
Solution:
In the above example:
Sample size = 15
Confidence level = 98%
Since the sample size is less than 30, we use the t-distribution and calculate the confidence interval with the Python code below.
# import modules
import numpy as np
import scipy.stats as st

# define given sample data
data = [87, 80, 68, 72, 56, 58, 60, 63, 82, 70, 58, 55, 48, 50, 77]

# calculate the sample parameters
confidenceLevel = 0.98  # 98% CI given
degrees_freedom = len(data) - 1  # degrees of freedom = sample size - 1
sampleMean = np.mean(data)  # sample mean
sampleStandardError = st.sem(data)  # standard error of the mean

# create the 98% confidence interval for the population mean
# (confidence level passed positionally; named `alpha` in older SciPy, `confidence` since SciPy 1.9)
confidenceInterval = st.t.interval(confidenceLevel, df=degrees_freedom, loc=sampleMean, scale=sampleStandardError)

# print the 98% confidence interval for the population mean
print('The 98% confidence interval for the population mean weight :', confidenceInterval)
In the above code, we use the scipy.stats.t.interval() function to calculate the 98% confidence interval for the population mean weight.
The output of the above python code is shown below.
#Output
The 98% confidence interval for the population mean weight : (57.41683559393023, 73.78316440606976)
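Where does the 98% come in? A two-sided 98% interval leaves 1% in each tail, so the t-multiplier is the 0.99 quantile of the t-distribution with 15 - 1 = 14 degrees of freedom. This sketch computes that multiplier and the margin of error explicitly; the mean minus and plus the margin should reproduce the bounds printed above.

```python
# Sketch: the critical value and margin of error behind the 98% interval.
import numpy as np
import scipy.stats as st

weights = [87, 80, 68, 72, 56, 58, 60, 63, 82, 70, 58, 55, 48, 50, 77]
confidence_level = 0.98

# 98% two-sided leaves 1% in each tail, so we need the 0.99 quantile
# of the t-distribution with n - 1 = 14 degrees of freedom.
upper_tail_prob = (1 + confidence_level) / 2
t_multiplier = st.t.ppf(upper_tail_prob, df=len(weights) - 1)

sample_mean = np.mean(weights)
margin_of_error = t_multiplier * st.sem(weights)
print(f"t multiplier: {t_multiplier:.3f}")
print(f"mean weight: {sample_mean:.1f} ± {margin_of_error:.2f}")
```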
Confidence interval for mean using normal distribution
Let's look at an example of a confidence interval for the mean using the normal distribution.
Let's generate random sample data of 100 integers from 50 up to (but not including) 100.
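One detail worth knowing about the data generation: np.random.randint() treats its upper bound as exclusive, so randint(50, 100, 100) draws integers from 50 to 99. The sketch below checks this and also shows the newer Generator API (np.random.default_rng) as an alternative. Note that the Generator produces a different sequence of numbers, so it would not reproduce the exact output shown in this section.

```python
# Sketch: the range of values produced by the legacy and Generator APIs.
import numpy as np

# Legacy API, as used in the example: the upper bound 100 is exclusive.
np.random.seed(1)
legacy_data = np.random.randint(50, 100, 100)

# Newer Generator API: also exclusive by default, but a different random sequence.
rng = np.random.default_rng(1)
generator_data = rng.integers(50, 100, size=100)

print(legacy_data.min(), legacy_data.max(), len(legacy_data))
print(generator_data.min(), generator_data.max(), len(generator_data))
```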
In this example, we calculate the 95% and 99% confidence intervals for the mean using the below python code.
import numpy as np
import scipy.stats as st

# seed the generator so the same random numbers are produced every time
np.random.seed(1)

# create a random array of 100 integers from 50 up to (but not including) 100
data = np.random.randint(50, 100, 100)

# calculate the sample parameters
confidenceLevel_1 = 0.95  # 95% CI given
confidenceLevel_2 = 0.99  # 99% CI given
sampleMean = np.mean(data)  # sample mean
sampleStandardError = st.sem(data)  # standard error of the mean

# create the 95% confidence interval for the population mean
# (confidence level passed positionally; named `alpha` in older SciPy, `confidence` since SciPy 1.9)
confidenceInterval_1 = st.norm.interval(confidenceLevel_1, loc=sampleMean, scale=sampleStandardError)

# create the 99% confidence interval for the population mean
confidenceInterval_2 = st.norm.interval(confidenceLevel_2, loc=sampleMean, scale=sampleStandardError)

# print the 95% and 99% confidence intervals for the population mean
print('The 95% confidence interval for the population mean weight :', confidenceInterval_1)
print('The 99% confidence interval for the population mean weight :', confidenceInterval_2)
In the above example, the sample size (100) is greater than 30, so we rely on the central limit theorem: the sample mean is approximately normally distributed, which lets us use the normal distribution.
We import the scipy.stats library and calculate the sample parameters required by the formula above.
The norm.interval() function takes the confidence level, the sample mean (loc) and the standard error (scale) as input parameters, and returns the confidence interval as a (lower, upper) tuple. Unlike t.interval(), it does not need the degrees of freedom.
The output of the above python code is shown below.
#Output
The 95% confidence interval for the population mean weight : (69.27695945565647, 74.86304054434352)
The 99% confidence interval for the population mean weight : (68.39932250956205, 75.74067749043793)
In the above output, we can see that a higher confidence level produces a wider confidence interval. Rounded to two decimal places:
The 95% confidence interval for the population mean weight : (69.28, 74.86)
The 99% confidence interval for the population mean weight : (68.40, 75.74)
The 99% CI gives a wider range for the true population mean: to be more confident that the interval contains the mean, we have to accept a less precise estimate.
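Because both intervals share the same sample mean and standard error, their widths differ only through the z-multiplier: about 1.96 for 95% and about 2.576 for 99%. This sketch uses Python's built-in statistics.NormalDist (a stdlib alternative to scipy.stats.norm) to compute both multipliers, and compares their ratio with the ratio of the interval widths printed in the output above.

```python
# Sketch: the interval width scales with the z-multiplier.
from statistics import NormalDist

# Two-sided z multipliers for the two confidence levels used above.
z_95 = NormalDist().inv_cdf((1 + 0.95) / 2)   # about 1.96
z_99 = NormalDist().inv_cdf((1 + 0.99) / 2)   # about 2.576

# Widths of the two intervals, taken from the output above.
width_95 = 74.86304054434352 - 69.27695945565647
width_99 = 75.74067749043793 - 68.39932250956205

print(f"multiplier ratio: {z_99 / z_95:.4f}")
print(f"width ratio from the output: {width_99 / width_95:.4f}")
```

Both ratios come out at roughly 1.314, so moving from 95% to 99% confidence makes the interval about 31% wider for the same data.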
Conclusion
I hope you found the above article on how to calculate confidence intervals in Python useful and educational.