
function setup_similarity_cleaner

Maturity: 20

A pytest fixture that creates and returns a configured SimilarityCleaner instance with a threshold of 0.8 for use in test cases.

File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py
Lines: 5 - 7
Complexity: simple

Purpose

This fixture gives tests a reusable way to obtain a SimilarityCleaner. It initializes the cleaner with a predefined similarity threshold of 0.8, so every test function that requests the fixture receives an identically configured cleaner without repeating the setup code. If the fixture uses pytest's default function scope, as in the usage example, a fresh instance is created for each test rather than one instance being shared. The SimilarityCleaner is likely used to identify and remove similar or duplicate items based on a similarity metric.

Source Code

def setup_similarity_cleaner():
    cleaner = SimilarityCleaner(threshold=0.8)  # Example threshold for similarity
    return cleaner

Return Value

Returns a SimilarityCleaner instance configured with a similarity threshold of 0.8. This object can be used to perform similarity-based cleaning operations on data, identifying items that exceed the 0.8 similarity threshold.
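
To make the role of the threshold concrete, here is a minimal sketch of threshold-based deduplication. It is not the project's SimilarityCleaner: the drop_similar function is hypothetical, it uses the standard library's difflib.SequenceMatcher as a stand-in similarity metric, and it assumes that a score at or above the threshold counts as a duplicate.

```python
from difflib import SequenceMatcher


def drop_similar(texts, threshold=0.8):
    """Keep each text unless it is too similar to one already kept.

    Hypothetical illustration only; the real SimilarityCleaner may use a
    different metric (for example, embedding distance) and comparison rule.
    """
    kept = []
    for text in texts:
        # SequenceMatcher.ratio() returns a similarity score in [0, 1].
        is_duplicate = any(
            SequenceMatcher(None, text, earlier).ratio() >= threshold
            for earlier in kept
        )
        if not is_duplicate:
            kept.append(text)
    return kept


docs = [
    "the quick brown fox",
    "the quick brown fox",    # identical to the first entry
    "an unrelated sentence",
]
print(drop_similar(docs, threshold=0.8))
# ['the quick brown fox', 'an unrelated sentence']
```

Under this rule, raising the threshold keeps more texts, because fewer pairs score high enough to count as duplicates.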

Dependencies

  • pytest
  • src.cleaners.similarity_cleaner

Required Imports

import pytest
from src.cleaners.similarity_cleaner import SimilarityCleaner

Usage Example

import pytest
from src.cleaners.similarity_cleaner import SimilarityCleaner

@pytest.fixture
def setup_similarity_cleaner():
    cleaner = SimilarityCleaner(threshold=0.8)
    return cleaner

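def test_similarity_cleaning(setup_similarity_cleaner):
    cleaner = setup_similarity_cleaner
    # Placeholder input; the exact input type expected by clean() is an assumption
    some_data = ["first document", "first document", "a different document"]
    result = cleaner.clean(some_data)
    assert result is not None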

Best Practices

  • This fixture should only be used in test files, not in production code
  • The threshold value of 0.8 is hardcoded; consider parameterizing it if different thresholds are needed for different tests
  • Ensure the SimilarityCleaner class is properly implemented before using this fixture
  • Use this fixture by including it as a parameter in test functions that need a SimilarityCleaner instance
  • Consider a wider scope, such as @pytest.fixture(scope='module') or @pytest.fixture(scope='session'), if the cleaner holds no per-test state and is expensive to build; the default function scope creates a new instance for every test
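
The parameterization and scoping suggestions above can be sketched as follows. This is an illustration under stated assumptions, not the project's test code: the SimilarityCleaner class here is a stand-in so the block runs on its own, its threshold attribute is assumed, and make_cleaner, cleaner_with_threshold, and shared_cleaner are hypothetical names.

```python
import pytest


class SimilarityCleaner:
    """Stand-in so this sketch runs on its own. In the real test module,
    import the class from src.cleaners.similarity_cleaner instead."""

    def __init__(self, threshold):
        # Assumption: the real class also exposes its threshold.
        self.threshold = threshold


def make_cleaner(threshold=0.8):
    """Hypothetical helper that builds a cleaner for a given threshold."""
    return SimilarityCleaner(threshold=threshold)


# params makes pytest run each dependent test once per threshold value.
@pytest.fixture(params=[0.5, 0.8, 0.95])
def cleaner_with_threshold(request):
    return make_cleaner(request.param)


# scope="module" builds the object once per test module, not once per test.
@pytest.fixture(scope="module")
def shared_cleaner():
    return make_cleaner()


def test_threshold_is_applied(cleaner_with_threshold):
    assert cleaner_with_threshold.threshold in (0.5, 0.8, 0.95)
```

A module-scoped fixture is only safe if tests do not mutate the shared object; otherwise the default function scope avoids state leaking between tests.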

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_similarity_threshold_effect 80.1% similar

    A pytest test function that validates the behavior of SimilarityCleaner with different similarity threshold values, ensuring that higher thresholds retain more texts while lower thresholds are more aggressive in removing similar content.

    From: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py
  • function test_nearly_similar_text_handling 73.0% similar

    A pytest test function that verifies the SimilarityCleaner's ability to identify and remove nearly similar text entries while preserving distinct ones.

    From: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py
  • function test_identical_text_removal 67.4% similar

    A pytest test function that verifies the SimilarityCleaner's ability to remove identical duplicate text entries from a list while preserving unique documents.

    From: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py
  • function test_single_text_input 64.1% similar

    A pytest test function that verifies the SimilarityCleaner correctly handles a single text document by returning it unchanged.

    From: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py
  • function test_empty_input 63.0% similar

    A pytest test function that verifies the SimilarityCleaner correctly handles empty input by returning an empty list.

    From: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py