function calculate_similarity
Computes the cosine similarity between two embedding vectors. The result is a score in [-1, 1] that measures their directional alignment, and it falls in [0, 1] when both vectors have only non-negative components.
File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py
Lines: 6-21
Complexity: simple
Purpose
This function calculates the cosine similarity between two numerical vectors. The metric is common in machine learning and NLP, where it is used to measure semantic similarity between embeddings, compare document representations, or find nearest neighbors in vector spaces. Cosine similarity is the cosine of the angle between the two vectors: 1 means they point in the same direction, 0 means they are orthogonal (unrelated), and -1 means they point in opposite directions.
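Written out, the definition is cos(θ) = (a · b) / (‖a‖ ‖b‖). The minimal sketch below computes that formula directly with NumPy, without scikit-learn, so the full -1 to 1 range is easy to see. The helper name `cosine_similarity_numpy` is hypothetical and is not part of `similarity_utils.py`. Returning 0.0 for a zero vector is an assumption made to match scikit-learn's behavior.

```python
import numpy as np
from typing import List


def cosine_similarity_numpy(vec1: List[float], vec2: List[float]) -> float:
    """Hypothetical helper: cosine similarity from its definition,
    dot(a, b) / (||a|| * ||b||). Not part of similarity_utils.py."""
    a = np.asarray(vec1, dtype=float)
    b = np.asarray(vec2, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: treat a zero vector as having similarity 0, as sklearn does
        return 0.0
    return float(np.dot(a, b) / denom)


print(cosine_similarity_numpy([1.0, 0.0], [0.0, 1.0]))   # 0.0 (orthogonal)
print(cosine_similarity_numpy([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite direction)
```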
Source Code
```python
def calculate_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Calculate cosine similarity between two embedding vectors.

    Args:
        vec1: First embedding vector
        vec2: Second embedding vector

    Returns:
        Cosine similarity score between -1 and 1 (between 0 and 1 when
        both vectors have only non-negative components)
    """
    # Reshape vectors for sklearn's cosine_similarity (one row per vector)
    v1 = np.array(vec1).reshape(1, -1)
    v2 = np.array(vec2).reshape(1, -1)
    return float(cosine_similarity(v1, v2)[0][0])
```
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| vec1 | List[float] | - | positional_or_keyword |
| vec2 | List[float] | - | positional_or_keyword |
Parameter Details
vec1: First embedding vector as a list of floating-point numbers. Must be a non-empty list with the same dimensionality as vec2. Typically represents a numerical embedding from a machine learning model (e.g., word embeddings, sentence embeddings, or feature vectors).
vec2: Second embedding vector as a list of floating-point numbers. Must be a non-empty list with the same dimensionality as vec1. Should represent the same type of embedding as vec1 for meaningful comparison.
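The function itself does not check these requirements and leaves scikit-learn to reject bad input. The sketch below shows one way to enforce the contract up front with explicit error messages. The wrapper name `calculate_similarity_checked` is hypothetical, and the messages are illustrative rather than taken from the source.

```python
import numpy as np
from typing import List
from sklearn.metrics.pairwise import cosine_similarity


def calculate_similarity_checked(vec1: List[float], vec2: List[float]) -> float:
    """Hypothetical wrapper: validates the input contract (non-empty,
    equal length) before running the documented computation."""
    if len(vec1) == 0 or len(vec2) == 0:
        raise ValueError("Both vectors must be non-empty")
    if len(vec1) != len(vec2):
        raise ValueError(
            f"Dimension mismatch: len(vec1)={len(vec1)}, len(vec2)={len(vec2)}"
        )
    # Same computation as calculate_similarity: one single-row 2D array per vector
    v1 = np.array(vec1, dtype=float).reshape(1, -1)
    v2 = np.array(vec2, dtype=float).reshape(1, -1)
    return float(cosine_similarity(v1, v2)[0][0])


try:
    calculate_similarity_checked([1.0, 2.0], [1.0, 2.0, 3.0])
except ValueError as exc:
    print(exc)  # Dimension mismatch: len(vec1)=2, len(vec2)=3
```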
Return Value
Type: float
Returns the cosine similarity of the two input vectors as a Python float. The value ranges from -1 to 1: 1 means the vectors point in exactly the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. The docstring states a range of 0 to 1, but that holds only when both vectors have non-negative components (for example, count or TF-IDF vectors). Embeddings produced by neural models often contain negative components, so negative scores are possible. If either vector is all zeros, scikit-learn returns 0.0 rather than raising an error.
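The sketch below uses the same logic as the documented function to show that opposite vectors score -1.0, outside the 0 to 1 range the docstring implies. If a caller genuinely needs a 0 to 1 scale, one option is the affine rescaling (s + 1) / 2. The helper `to_unit_interval` is hypothetical and is not part of the source module.

```python
import numpy as np
from typing import List
from sklearn.metrics.pairwise import cosine_similarity


def calculate_similarity(vec1: List[float], vec2: List[float]) -> float:
    # Same logic as the documented function
    v1 = np.array(vec1).reshape(1, -1)
    v2 = np.array(vec2).reshape(1, -1)
    return float(cosine_similarity(v1, v2)[0][0])


def to_unit_interval(score: float) -> float:
    """Hypothetical helper: map a cosine score from [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0


opposite = calculate_similarity([1.0, 0.0], [-1.0, 0.0])
print(opposite)                    # -1.0
print(to_unit_interval(opposite))  # 0.0
```

The rescaled value keeps the ranking order unchanged, but 0.5 (not 0) then corresponds to orthogonal vectors.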
Dependencies
numpy, scikit-learn, typing (standard library)
Required Imports
```python
import numpy as np
from typing import List
from sklearn.metrics.pairwise import cosine_similarity
```
Usage Example
```python
import numpy as np
from typing import List
from sklearn.metrics.pairwise import cosine_similarity


def calculate_similarity(vec1: List[float], vec2: List[float]) -> float:
    # Reshape each vector into a single-row 2D array, as sklearn expects
    v1 = np.array(vec1).reshape(1, -1)
    v2 = np.array(vec2).reshape(1, -1)
    return float(cosine_similarity(v1, v2)[0][0])


# Example usage: parallel vectors
vector1 = [1.0, 2.0, 3.0, 4.0]
vector2 = [2.0, 4.0, 6.0, 8.0]
similarity_score = calculate_similarity(vector1, vector2)
print(f"Cosine similarity: {similarity_score}")
# Output: approximately 1.0 (same direction; the last digit may differ due to floating-point rounding)

# Compare orthogonal vectors
vector3 = [1.0, 0.0, 0.0, 0.0]
vector4 = [0.0, 1.0, 0.0, 0.0]
similarity_score2 = calculate_similarity(vector3, vector4)
print(f"Cosine similarity: {similarity_score2}")
# Output: Cosine similarity: 0.0 (vectors are orthogonal)
```
Best Practices
- Ensure both input vectors have the same dimensionality; mismatched dimensions will cause numpy/sklearn errors
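- Input vectors must be non-empty: an empty list becomes a (1, 0) array, which scikit-learn's input validation rejects with a ValueError. A non-empty all-zero vector does not cause a division by zero; scikit-learn returns 0.0 for it
- For large-scale similarity computations, consider batch processing using sklearn's cosine_similarity directly with 2D arrays instead of calling this function repeatedly
- Be aware that cosine similarity is scale-invariant (it considers only direction, not magnitude), so vectors [1,2,3] and [2,4,6] have a similarity of 1.0, up to floating-point rounding
- For sparse or very high-dimensional data, pass scipy.sparse matrices directly to sklearn's cosine_similarity, which accepts them; this function always converts its inputs to dense NumPy arrays, so it gains no memory benefit from sparsity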
- The function converts the result to float explicitly, which is useful for JSON serialization or database storage
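The batch-processing and sparse-input practices can be sketched as follows: one call to `cosine_similarity` scores a query against every row of a document matrix, and the same call accepts `scipy.sparse` input without densifying it first. The query and document vectors here are made-up illustration data, not values from the source project.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: one query embedding and three document embeddings
query = np.array([[1.0, 0.0, 1.0]])  # shape (1, d)
documents = np.array([
    [1.0, 0.0, 1.0],    # same direction as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # opposite direction
])  # shape (n, d)

# One call scores the query against every document; result shape is (1, n)
scores = cosine_similarity(query, documents)[0]
best = int(np.argmax(scores))  # index of the most similar document
print(scores, best)

# Sparse inputs are accepted directly; with the default dense_output=True
# the result is still a dense ndarray
sparse_scores = cosine_similarity(sparse.csr_matrix(query), sparse.csr_matrix(documents))
print(sparse_scores)
```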
Similar Components
Components with related functionality, ranked by AI-powered semantic similarity:
- function build_similarity_matrix (63.2% similar)
- function find_similar_documents (46.9% similar)
- function calculate_cv_v1 (46.5% similar)
- function calculate_cv (43.8% similar)
- function calculate_cv_v2 (42.1% similar)