
How to Calculate Autocorrelation in Python

Autocorrelation measures the degree of correlation between a time series and a lagged version of itself, that is, between the values of the same variable at different points in time.

Autocorrelation is also referred to as serial correlation or lagged correlation. It shows how strongly the lagged values of a variable are related to its current values in a time series.

Autocorrelation can be positive or negative, with values ranging from -1 to 1. A value of -1 represents perfect negative autocorrelation, a value of 1 represents perfect positive autocorrelation, and a value near 0 indicates little or no autocorrelation.
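To make the definition concrete, here is a minimal sketch of the usual sample autocorrelation formula in plain Python. The function name `autocorrelation` and the `example` series are made up for illustration. The estimator shown (deviations from the overall mean, divided by the total squared deviation) is assumed to match what most libraries compute by default.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (0 <= lag < len(series))."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    # Denominator: total squared deviation from the mean (the lag-0 term)
    denominator = sum(d * d for d in deviations)
    # Numerator: products of each deviation with the one `lag` steps earlier
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator


# Hypothetical series used only to illustrate the function
example = [1, 2, 3, 4, 5]
for k in range(3):
    print(k, round(autocorrelation(example, k), 4))
```

At lag 0 the series is compared with itself, so the result is always 1.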

The Durbin-Watson statistic is used to test for autocorrelation, typically in the residuals of a regression. The statistic ranges from 0 to 4. A value near 0 indicates strong positive autocorrelation, a value near 4 indicates strong negative autocorrelation, and a value near 2 indicates little or no autocorrelation.
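As a rough illustration of how this statistic behaves, the sketch below computes it by hand as the sum of squared successive differences divided by the sum of squares. The two residual series are hypothetical and chosen only to show the two extremes. In practice you would more likely call `statsmodels.stats.stattools.durbin_watson` on the residuals of a fitted model.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over sum of squares."""
    squared_diffs = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    sum_of_squares = sum(e * e for e in residuals)
    return squared_diffs / sum_of_squares


# Hypothetical residual series, chosen only to show the two extremes
trending = [-2, -1, 0, 1, 2]       # neighbours are similar: positive autocorrelation
alternating = [1, -1, 1, -1, 1]    # neighbours flip sign: negative autocorrelation

print(durbin_watson(trending))     # falls below 2
print(durbin_watson(alternating))  # falls above 2
```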

In this tutorial, we will discuss how to calculate autocorrelation in Python with step-by-step examples.

We will need the statsmodels library and the numpy package to calculate autocorrelation in Python, and the matplotlib library to visualize the data on a chart.

pip install numpy

If you don't have the numpy package installed on your system, run the command below in a command prompt:

pip install numpy

pip install statsmodels

statsmodels provides many classes and functions for conducting statistical tests and estimating statistical models in Python.

We will use the statsmodels API to calculate the autocorrelation and its graphics module to plot the positive and negative autocorrelation of a given time series.

pip install statsmodels

pip install matplotlib

We will use the matplotlib library to visualize the autocorrelation data.

pip install matplotlib

How to Calculate Autocorrelation in Python

Let's understand the autocorrelation calculation with the help of an example.

Let's assume we have temperatures recorded on different days of a month. We will calculate the autocorrelation and see at which lags it is positive and at which it is negative.

temps = [68.2,65.6,67.2,67.8,66.1,66.5,68.2,67.8,68.4,68.6,68.3,69,68.7,68.9,69,69.5,69.7]

Using the Python code below, we will calculate the autocorrelation for lags up to 10:

import statsmodels.api as spi
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics import tsaplots

# Create temperature data array

temps = np.array([68.2,65.6,67.2,67.8,66.1,66.5,68.2,67.8,68.4,68.6,68.3,69,68.7,68.9,69,69.5,69.7])

# Calculate Autocorrelations

res = spi.tsa.acf(temps,nlags = 10,fft = False)

# Print the autocorrelations for lags 0 through 10
print(res)

# Plot Autocorrelation observation on chart
acr = tsaplots.plot_acf(temps,lags = 10)

plt.show()

In the above example, we have 17 days of temperature data:

temps = [68.2,65.6,67.2,67.8,66.1,66.5,68.2,67.8,68.4,68.6,68.3,69,68.7,68.9,69,69.5,69.7]

We have used the numpy array function to create an array of the 17 daily temperatures.

We have imported the statsmodels.api module, which provides the tsa.acf() function to calculate autocorrelation.

The tsa.acf() function takes the temperature time series and the number of lags as parameters. We have specified nlags = 10 to get the autocorrelations for lags 0 through 10, and fft = False to compute them directly rather than through a fast Fourier transform.

It returns the output as an array:

[ 1. 0.53602483 0.35338064 0.47522452 0.23967742 0.03610916 -0.01787539 -0.14130361 -0.18734858 -0.22611561 -0.323244 ]

Based on the output, we can interpret it as follows:

  • The autocorrelation at lag = 0 is 1
  • The autocorrelation at lag = 1 is 0.53602483
  • The autocorrelation at lag = 2 is 0.35338064

and so on for the remaining lags up to 10.
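One rough way to judge which of these values are worth attention is to compare them with an approximate 95% bound of 1.96 divided by the square root of the number of observations. The sketch below applies that rule of thumb to the output above. The bound assumes the series has no autocorrelation, which is an assumption of this sketch rather than something the output itself establishes, and the names `acf_values`, `bound`, and `notable_lags` are illustrative.

```python
import math

# Autocorrelations for lags 0..10, copied from the output above
acf_values = [1.0, 0.53602483, 0.35338064, 0.47522452, 0.23967742, 0.03610916,
              -0.01787539, -0.14130361, -0.18734858, -0.22611561, -0.323244]
n_observations = 17

# Approximate 95% bound under the assumption of no autocorrelation
bound = 1.96 / math.sqrt(n_observations)

# Skip lag 0, which is always 1
notable_lags = [lag for lag, r in enumerate(acf_values) if lag > 0 and abs(r) > bound]

print(round(bound, 4))
print(notable_lags)
```

With only 17 observations the bound is wide, so most lags fall inside it. A confidence band drawn on a chart may be computed differently and need not match this simple bound.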

Plot Autocorrelation in Python

We have imported the statsmodels.graphics module, which provides the tsaplots.plot_acf() function to plot the autocorrelation function of the given temperature time series.

In the above example, we used acr = tsaplots.plot_acf(temps,lags = 10) to plot the autocorrelation for lags 0 through 10 on a chart.

Autocorrelation Chart

The chart makes it easy to see how the temperatures on the given days of the month are autocorrelated. Autocorrelation ranges from -1 to 1. The X axis shows the lag number and the Y axis shows the autocorrelation at that lag.

The positive autocorrelation at small lags means that when the temperature is above average on one day, it tends to be above average on the next day as well, and when it is below average, it tends to stay below average on the next day.

Conclusion

I hope you found the above tutorial on how to calculate autocorrelation in Python educational and helpful.