Cosine similarity measures the similarity between two vectors of an inner product space by calculating the cosine of the angle between the two vectors.
Cosine similarity is one of the most widely used similarity measures.
Use the dot() and norm() functions of the NumPy package to calculate cosine similarity in Python.
Cosine Similarity Formula
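For two vectors, A and B, each with n components, the cosine similarity is calculated as:
$$\text{Cosine Similarity} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$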
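To make the formula concrete, here is a minimal sketch that evaluates it term by term using only the Python standard library. The function name cosine_similarity_plain is an illustrative choice, and the sketch assumes both inputs have the same length and that neither is all zeros.

```python
import math

def cosine_similarity_plain(a, b):
    """Evaluate the cosine similarity formula term by term (hypothetical helper)."""
    # Numerator: sum of the element-wise products A_i * B_i
    dot_product = sum(ai * bi for ai, bi in zip(a, b))
    # Denominator: product of the two Euclidean norms
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot_product / (norm_a * norm_b)

print(cosine_similarity_plain([3, 4], [4, 3]))  # 0.96
print(cosine_similarity_plain([1, 0], [0, 1]))  # 0.0 (perpendicular vectors)
```

For [3, 4] and [4, 3] the numerator is 3*4 + 4*3 = 24 and each norm is 5, so the result is 24 / 25 = 0.96.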
In this article, we will discuss how to calculate cosine similarity in Python and work through several examples.
Using Numpy for Cosine Similarity
We will use the NumPy library to calculate the cosine similarity between two vectors.
If you don’t have NumPy installed, run the command below in a command prompt or terminal to install it:
pip install numpy
Let’s go through examples of how to calculate cosine similarity in Python using the code below.
Calculate Cosine Similarity in Python
Let's assume we have the data below:
x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [0, 0, 1, 1, 1, 1, 0, 0, 0]
Using the numpy.array() function, we create the arrays x and y, which have the same length.
# import modules
import numpy as np
from numpy import dot
from numpy.linalg import norm
# define arrays
x = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0])
y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0])
# calculate cosine similarity
result = dot(x, y) / (norm(x) * norm(y))
print("The Cosine Similarity between two vectors is: ", result)
In the above code, we import the numpy package to use its dot() and norm() functions.
The expression dot(x, y) / (norm(x) * norm(y)) divides the dot product of x and y by the product of their Euclidean norms, which gives the cosine similarity between the two vectors. Here the dot product is 2 and each norm is 2, so the result is 2 / 4.
The output of the above code is:
# Output
The Cosine Similarity between two vectors is:  0.5
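If you need this calculation in more than one place, it can help to wrap it in a function. The following is a minimal sketch, not a definitive implementation: the name cosine_similarity and the decision to raise ValueError for an all-zero vector (where the formula would divide by zero) are assumptions made for this illustration.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity of two 1-D vectors (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denominator = np.linalg.norm(x) * np.linalg.norm(y)
    # The formula is undefined when either vector has zero length
    if denominator == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(x, y) / denominator)

x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [0, 0, 1, 1, 1, 1, 0, 0, 0]
print(cosine_similarity(x, y))  # 0.5
```

Converting the inputs with np.asarray() lets the function accept plain Python lists as well as NumPy arrays.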
Calculate Cosine Similarity between arrays of same length in Python
In this example, we will calculate the cosine similarity between two randomly generated arrays of the same length using the code below.
# import modules
import numpy as np
from numpy import dot
from numpy.linalg import norm
# define arrays
x = np.random.randint(10, size=100)
y = np.random.randint(10, size=100)
# calculate cosine similarity
result = dot(x, y) / (norm(x) * norm(y))
print("The Cosine Similarity between two vectors is: ", result)
In the above code, numpy.random.randint() creates two arrays of 100 random integers, each in the range 0 to 9.
Using dot(x, y) / (norm(x) * norm(y)), we calculate the cosine similarity between the two vectors x and y.
A sample output of the above code is shown below. Because the arrays are random, the exact value differs on each run.
# Output
The Cosine Similarity between two vectors is:  0.6373168018459651
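If you want the same result on every run, one option is a seeded random generator. This is a minimal sketch that assumes your NumPy version provides np.random.default_rng(); the function name random_pair_similarity and the seed value 42 are arbitrary choices for illustration, and the value it prints will not match the sample output above.

```python
import numpy as np
from numpy.linalg import norm

def random_pair_similarity(seed):
    """Cosine similarity of two seeded random integer vectors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Integers from 0 to 9, matching the range used with randint(10, ...)
    x = rng.integers(0, 10, size=100)
    y = rng.integers(0, 10, size=100)
    return float(np.dot(x, y) / (norm(x) * norm(y)))

# The same seed always produces the same pair of vectors and the same result
print(random_pair_similarity(42))
print(random_pair_similarity(42))
```

Since every element is non-negative, the result always falls between 0 and 1.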
Calculate Cosine Similarity between arrays of different length in Python
In this example, we will try to calculate the cosine similarity between two randomly generated arrays of different lengths using the code below.
# import modules
import numpy as np
from numpy import dot
from numpy.linalg import norm
# define arrays
x = np.random.randint(10, size=90)   # length = 90
y = np.random.randint(10, size=100)  # length = 100
# calculate cosine similarity
result = dot(x, y) / (norm(x) * norm(y))
print("The Cosine Similarity between two vectors is: ", result)
The above code raises a ValueError because dot() requires both arrays to have the same length.
Note: Cosine similarity is only defined for vectors of the same length, so it cannot be calculated for arrays of different lengths.
The error raised by the above code is shown below.
# Error
ValueError: shapes (90,) and (100,) not aligned: 90 (dim 0) != 100 (dim 0)
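One way to avoid this error is to compare the array shapes before calling dot(). The sketch below is only one possible approach: the function name cosine_similarity_or_none and the choice to return None for mismatched inputs (rather than raising an exception) are assumptions made for this example.

```python
import numpy as np
from numpy.linalg import norm

def cosine_similarity_or_none(x, y):
    """Return the cosine similarity, or None if the inputs are not
    1-D arrays of equal length (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Check the shapes first instead of letting dot() raise ValueError
    if x.ndim != 1 or x.shape != y.shape:
        return None
    return float(np.dot(x, y) / (norm(x) * norm(y)))

x = np.random.randint(10, size=90)   # length = 90
y = np.random.randint(10, size=100)  # length = 100
print(cosine_similarity_or_none(x, y))  # None
```

Returning None keeps a batch job running when some pairs are malformed, but raising a descriptive exception may be the better choice when mismatched lengths indicate a bug in the calling code.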
Conclusion
I hope you found this step-by-step tutorial on calculating cosine similarity in Python, along with its examples, educational and helpful.
Using the NumPy package in Python, cosine similarity can be calculated with the dot() and norm() functions, provided both vectors have the same length.