Calculate Confidence Interval in Python (With Examples)

A confidence interval estimates a population parameter as a range of values, together with a stated degree of confidence. From sample data, it gives a lower and an upper bound for the population value of a statistic at a specified confidence level. One parameter that can be estimated this way is the population mean.

In this tutorial, we will discuss how to calculate a confidence interval in Python with step-by-step examples.

Confidence Interval for Mean

Confidence Interval = x̄ ± (t * standard error)

Where:

x̄ = sample mean

t = t-multiplier, calculated from the degrees of freedom (n - 1) and the desired confidence level

standard error = sample standard deviation / √n

n = sample size
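
As a minimal sketch, the formula can be worked through step by step. The sample values and variable names here are made up for illustration only; scipy.stats.t.ppf supplies the t-multiplier for a two-sided interval.

```python
import numpy as np
import scipy.stats as st

# Hypothetical sample, used only to illustrate the formula
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])

confidence_level = 0.95
n = len(sample)
mean = np.mean(sample)

# Sample standard deviation (ddof=1) divided by sqrt(n) gives the standard error
standard_error = np.std(sample, ddof=1) / np.sqrt(n)

# Two-sided t-multiplier with n - 1 degrees of freedom
t_multiplier = st.t.ppf((1 + confidence_level) / 2, df=n - 1)

margin_of_error = t_multiplier * standard_error
lower, upper = mean - margin_of_error, mean + margin_of_error
print(f"mean = {mean:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The standard error computed this way is the same quantity that scipy.stats.sem() returns with its default settings.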

Note:

1. If the sample size is small (n < 30), we use the t-distribution to calculate the confidence interval for the mean.

2. If the sample size is large (n ≥ 30, a common rule of thumb), we can use the normal distribution instead, because the central limit theorem tells us that the sample mean is approximately normally distributed.
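
To see why the choice of distribution matters less for large samples, the short sketch below compares the t-multiplier with the z-multiplier of the normal distribution for a 95% interval. The sample sizes are arbitrary illustrative values.

```python
import scipy.stats as st

confidence_level = 0.95
upper_tail = (1 + confidence_level) / 2  # 0.975 for a two-sided 95% interval

# z-multiplier from the standard normal distribution
z_multiplier = st.norm.ppf(upper_tail)
print(f"z-multiplier: {z_multiplier:.3f}")

# t-multipliers shrink toward the z-multiplier as the sample size grows
sample_sizes = (5, 10, 30, 100, 1000)
t_multipliers = {n: st.t.ppf(upper_tail, df=n - 1) for n in sample_sizes}
for n, t_multiplier in t_multipliers.items():
    print(f"n = {n:4d}  t-multiplier: {t_multiplier:.3f}")
```

For small samples the t-multiplier is noticeably larger, so the t-based interval is wider; for large samples the two are nearly the same.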

How to Interpret Confidence Intervals

Assume our confidence level is 95%.

It can be interpreted as follows: if we repeated the sampling process many times and calculated an interval each time, about 95% of those intervals would contain the true population mean.

Another way of saying this is that about 5% of such intervals would miss the true mean, which would then lie either below the lower bound or above the upper bound.
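
This interpretation can be checked with a small simulation. The sketch below assumes a normally distributed population with a known mean (the population values, sample size, and number of repetitions are all made up for illustration), draws many samples, and counts how often the 95% interval contains the true mean.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)

# Assumed population for the simulation: normal with a known true mean
true_mean, true_sd = 70.0, 12.0
sample_size, repetitions = 15, 2000

covered = 0
for _ in range(repetitions):
    sample = rng.normal(true_mean, true_sd, sample_size)
    lower, upper = st.t.interval(
        0.95,
        df=sample_size - 1,
        loc=np.mean(sample),
        scale=st.sem(sample),
    )
    # Count the intervals that contain the true population mean
    if lower <= true_mean <= upper:
        covered += 1

coverage = covered / repetitions
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95, although it will vary a little with the random seed.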

Scipy for Confidence Interval

We will use the SciPy library in Python to calculate the confidence interval.

If you don't have SciPy installed, run the command below at the command prompt to install it.

pip install scipy
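
After installing, a quick check confirms that the library can be imported and shows which version is in use. This is worth knowing because the keyword for the confidence level in the interval() methods was renamed from alpha to confidence in newer SciPy releases.

```python
import scipy

# Print the installed SciPy version
print("SciPy version:", scipy.__version__)
```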

Python – Confidence interval for mean

Let's understand with an example how to calculate the confidence interval for the mean using the t-distribution in Python.

Let's assume we have the data given below:

data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]

In this example, we calculate the 95% confidence interval for the mean using the Python code below.

#import modules
import numpy as np
import scipy.stats as st

#define given sample data
data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]

#Calculate the sample parameters
confidenceLevel = 0.95   # 95% CI given
degrees_freedom = len(data)-1  #degree of freedom = sample size-1
sampleMean = np.mean(data)    #sample mean
sampleStandardError = st.sem(data)  #sample standard error

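#create 95% confidence interval for the population mean
#(the confidence level is passed positionally: its keyword is 'alpha' in older SciPy and 'confidence' in newer releases)
confidenceInterval = st.t.interval(confidenceLevel, df=degrees_freedom, loc=sampleMean, scale=sampleStandardError)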

#print the 95% confidence interval for the population mean
print('The 95% confidence interval for the population mean :',confidenceInterval)

In the above example, the sample size is less than 30, so we use the t-distribution.

We import the scipy.stats library and then calculate all the sample parameters required for the calculation mentioned above.

The scipy.stats.t.interval() function accepts the confidence level, degrees of freedom, sample mean, and sample standard error as input parameters and returns the confidence interval as the result. The output of the above Python code is shown below.

#Output 
The 95% confidence interval for the population mean : (58.00052174294386, 83.99947825705614)
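
Since the same steps repeat for every data set, they can be wrapped in a small helper. The function name mean_confidence_interval is hypothetical (it is not part of SciPy); this is just one way to sketch it.

```python
import numpy as np
import scipy.stats as st


def mean_confidence_interval(data, confidence_level=0.95):
    """Return (lower, upper) t-based confidence bounds for the mean of data."""
    values = np.asarray(data, dtype=float)
    return st.t.interval(
        confidence_level,  # passed positionally so it works across SciPy versions
        df=len(values) - 1,
        loc=np.mean(values),
        scale=st.sem(values),
    )


data = [45, 55, 67, 45, 68, 79, 98, 87, 84, 82]
lower, upper = mean_confidence_interval(data, 0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Called with the sample data above, the helper reproduces the same interval, rounded to two decimals.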

Let's now look at the calculation of a confidence interval in Python using a real-world example.

Python – Confidence interval for mean (real-world example)

Let's understand, with the example given below, how to calculate the confidence interval for the mean using the t-distribution.

To estimate the average weight of corona patients who visited over one week, data on 15 patients was collected from a district. Construct a 98% confidence interval for the true mean weight of all patients.

The weights of the patients are 87, 80, 68, 72, 56, 58, 60, 63, 82, 70, 58, 55, 48, 50, 77

Solution:

In the above example:

Sample size = 15

Confidence level = 98%

Since the sample size is less than 30, we use the t-distribution to calculate the confidence interval with the Python code below.

#import modules
import numpy as np
import scipy.stats as st

#define given sample data
data = [87,80,68,72,56,58,60,63,82,70,58,55,48,50,77]

#Calculate the sample parameters
confidenceLevel = 0.98             #98% CI given
degrees_freedom = len(data)-1      #degree of freedom = sample size-1
sampleMean = np.mean(data)          #sample mean
sampleStandardError = st.sem(data)   #sample standard error

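#create 98% confidence interval for the population mean
#(the confidence level is passed positionally: its keyword is 'alpha' in older SciPy and 'confidence' in newer releases)
confidenceInterval = st.t.interval(confidenceLevel, df=degrees_freedom, loc=sampleMean, scale=sampleStandardError)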

#print the 98% confidence interval for the population mean
print('The 98% confidence interval for the population mean weight :',confidenceInterval)

In the above code, we use the scipy.stats.t.interval() function to calculate the 98% confidence interval for the population mean weight.

The output of the above Python code is shown below.

#Output
The 98% confidence interval for the population mean weight : (57.41683559393023, 73.78316440606976)

Confidence interval for mean using normal distribution

Let's understand with an example how to calculate confidence intervals for the mean using the normal distribution.

Let's generate random sample data of 100 integer values from 50 to 99 (np.random.randint excludes the upper limit, 100).

In this example, we calculate the 95% and 99% confidence intervals for the mean using the Python code below.

import numpy as np
import scipy.stats as st

# Using seed function to generate the same random number every time with the same seed value
np.random.seed(1)

# Create a random array of 100 integers from 50 to 99 (the upper limit, 100, is excluded)
data = np.random.randint(50,100,100)

#Calculate the sample parameters
confidenceLevel_1 = 0.95           #95% CI given
confidenceLevel_2 = 0.99           #99% CI given
sampleMean = np.mean(data)         #sample mean
sampleStandardError = st.sem(data) #sample standard error

#create 95% confidence interval for the population mean
#(the confidence level is passed positionally: its keyword is 'alpha' in older SciPy and 'confidence' in newer releases)
confidenceInterval_1 = st.norm.interval(confidenceLevel_1,loc=sampleMean,scale=sampleStandardError)

#create 99% confidence interval for the population mean
confidenceInterval_2 = st.norm.interval(confidenceLevel_2,loc=sampleMean,scale=sampleStandardError)

#print the 95% confidence interval for the population mean
print('The 95% confidence interval for the population mean weight :',confidenceInterval_1)

#print the 99% confidence interval for the population mean
print('The 99% confidence interval for the population mean weight :',confidenceInterval_2)

In the above example, the sample size is greater than 30, so we assume that the sample mean is approximately normally distributed, based on the central limit theorem.

We import the scipy.stats library and calculate all the sample parameters required for the calculation mentioned above.

The norm.interval() function accepts the confidence level, sample mean, and sample standard error as input parameters and returns the confidence interval as the result. Unlike t.interval(), it does not take the degrees of freedom.

The output of the above Python code is shown below.

#Output
The 95% confidence interval for the population mean weight : (69.27695945565647, 74.86304054434352)
The 99% confidence interval for the population mean weight : (68.39932250956205, 75.74067749043793)

In the above output, we observe that the confidence interval gets wider as the confidence level increases.

Rounded to two decimals, the 95% confidence interval is (69.28, 74.86) and the 99% confidence interval is (68.40, 75.74).

Clearly, the 99% CI gives a wider range for the true population mean.
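
The same effect can be seen across more confidence levels. The sketch below regenerates the random sample with the same seed and prints the width of the normal-based interval for several levels; the extra levels (80% and 90%) are added here only for illustration.

```python
import numpy as np
import scipy.stats as st

# Regenerate the same random sample of 100 integers from 50 to 99
np.random.seed(1)
data = np.random.randint(50, 100, 100)

sample_mean = np.mean(data)
standard_error = st.sem(data)

# Width of the normal-based interval for each confidence level
widths = {}
for level in (0.80, 0.90, 0.95, 0.99):
    lower, upper = st.norm.interval(level, loc=sample_mean, scale=standard_error)
    widths[level] = upper - lower
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})  width = {widths[level]:.2f}")
```

A higher confidence level needs a larger multiplier, so the interval has to cover more values to be that confident of containing the true mean.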

Conclusion

I hope you found the above article on how to calculate confidence intervals in Python useful and educational.