class TestCombinedCleaner
A unittest test class that validates the functionality of the CombinedCleaner class, testing its ability to remove duplicate and similar texts from collections.
/tf/active/vicechatdev/chromadb-cleanup/tests/test_combined_cleaner.py
6 - 47
simple
Purpose
This test class unit-tests the CombinedCleaner component through its clean() method. It covers three scenarios: (1) removal of identical duplicate texts, (2) similarity screening, in which a text that is similar but not identical to another ("This is a similar test." versus "This is a test.") is retained under the default settings, and (3) a combined case with repeated exact duplicates, a similar text, and a unique text. Together the tests check that the CombinedCleaner deduplicates a collection while preserving unique and sufficiently different texts. None of the three tests asserts that a near-duplicate is actually removed.
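To make the three scenarios concrete, here is a minimal stand-in for the kind of cleaner these tests exercise. It is a sketch only: `SketchCleaner`, its `threshold` parameter, and the use of `hashlib` and `difflib` are assumptions chosen for illustration, not the actual CombinedCleaner implementation or the project's `hash_text` and `calculate_similarity` helpers.

```python
import hashlib
from difflib import SequenceMatcher


class SketchCleaner:
    """Hypothetical stand-in; NOT the real CombinedCleaner."""

    def __init__(self, threshold=0.9):
        # Assumed cut-off: pairs scoring at or above it count as near-duplicates.
        self.threshold = threshold

    def clean(self, texts):
        seen_hashes = set()
        kept = []
        for text in texts:
            # Step 1: drop exact duplicates by content hash.
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in seen_hashes:
                continue
            seen_hashes.add(digest)
            # Step 2: drop texts too similar to one that was already kept.
            if any(
                SequenceMatcher(None, text, k).ratio() >= self.threshold
                for k in kept
            ):
                continue
            kept.append(text)
        return kept


# Same input as test_combined_functionality.
cleaned = SketchCleaner().clean([
    "This is a test.",
    "This is a test.",
    "This is a similar test.",
    "This is a test.",
    "Another unique text.",
])
print(cleaned)
```

With the assumed threshold of 0.9, the sketch keeps one copy of the repeated text plus the similar and the unique text, which matches the count of 3 the combined test expects.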
Source Code
class TestCombinedCleaner(unittest.TestCase):

    def setUp(self):
        self.cleaner = CombinedCleaner()

    def test_identical_text_removal(self):
        texts = [
            "This is a test.",
            "This is a test.",
            "This is another test."
        ]
        cleaned_texts = self.cleaner.clean(texts)
        self.assertEqual(len(cleaned_texts), 2)
        self.assertIn("This is a test.", cleaned_texts)
        self.assertIn("This is another test.", cleaned_texts)

    def test_similarity_screening(self):
        texts = [
            "This is a test.",
            "This is a test.",
            "This is a similar test.",
            "Completely different text."
        ]
        cleaned_texts = self.cleaner.clean(texts)
        self.assertEqual(len(cleaned_texts), 3)
        self.assertIn("This is a test.", cleaned_texts)
        self.assertIn("This is a similar test.", cleaned_texts)
        self.assertIn("Completely different text.", cleaned_texts)

    def test_combined_functionality(self):
        texts = [
            "This is a test.",
            "This is a test.",
            "This is a similar test.",
            "This is a test.",
            "Another unique text."
        ]
        cleaned_texts = self.cleaner.clean(texts)
        self.assertEqual(len(cleaned_texts), 3)
        self.assertIn("This is a test.", cleaned_texts)
        self.assertIn("This is a similar test.", cleaned_texts)
        self.assertIn("Another unique text.", cleaned_texts)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | unittest.TestCase | - | |
Parameter Details
bases: Inherits from unittest.TestCase, which provides the testing framework infrastructure including assertion methods and test execution capabilities
Return Value
As a test class, it does not return values directly. When instantiated and run by a test runner, it produces test results (pass/fail) for each test method. Individual test methods use assertions to validate expected behavior and raise AssertionError on failure.
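The pass/fail results mentioned above can be inspected programmatically through a `unittest.TestResult` object. The sketch below uses a hypothetical `_StubCleaner` and `TestStubCleaner` so that it runs without the project installed; with the real code, `TestCombinedCleaner` would take the place of the stub test class.

```python
import unittest


class _StubCleaner:
    """Hypothetical stand-in so the example runs without the project."""

    def clean(self, texts):
        # Order-preserving exact de-duplication only.
        return list(dict.fromkeys(texts))


class TestStubCleaner(unittest.TestCase):
    def setUp(self):
        self.cleaner = _StubCleaner()

    def test_identical_text_removal(self):
        cleaned = self.cleaner.clean(["This is a test.", "This is a test."])
        self.assertEqual(len(cleaned), 1)


# Collect the class's tests and run them into a result object.
suite = unittest.TestLoader().loadTestsFromTestCase(TestStubCleaner)
result = unittest.TestResult()
suite.run(result)

# testsRun counts executed tests; failures and errors hold (test, traceback) pairs.
print(result.testsRun, result.wasSuccessful(), len(result.failures))
```

A failed assertion would appear in `result.failures`, while an unexpected exception (for example, an import problem inside `setUp`) would appear in `result.errors`.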
Class Interface
Methods
setUp(self) -> None
Purpose: Initializes test fixtures before each test method runs, creating a fresh CombinedCleaner instance
Returns: None - sets up instance attributes for use in test methods
test_identical_text_removal(self) -> None
Purpose: Tests that the CombinedCleaner correctly removes exact duplicate texts from a list, keeping only unique entries
Returns: None - raises AssertionError if test fails
test_similarity_screening(self) -> None
Purpose: Tests a list containing one exact duplicate, one similar text, and one unrelated text; the duplicate is removed while the similar and the unrelated texts are both kept
Returns: None - raises AssertionError if test fails
test_combined_functionality(self) -> None
Purpose: Tests the complete functionality of CombinedCleaner with a complex scenario involving multiple identical duplicates, similar texts, and unique texts
Returns: None - raises AssertionError if test fails
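Whether "This is a similar test." survives in the tests above depends on how the cleaner scores similarity and where its cut-off lies, neither of which is shown on this page. As a rough illustration, the sketch below scores the pair with `difflib.SequenceMatcher` from the standard library. Both the metric and the candidate thresholds are assumptions, not the project's `calculate_similarity` function or its defaults.

```python
from difflib import SequenceMatcher

base = "This is a test."
similar = "This is a similar test."

# Stand-in metric: difflib's ratio, 2 * matched_chars / total_chars.
score = SequenceMatcher(None, base, similar).ratio()
print(f"similarity = {score:.3f}")

# Both texts survive only if the score stays below the assumed cut-off.
for threshold in (0.7, 0.8, 0.9):
    keeps_both = score < threshold
    print(f"threshold={threshold}: keeps both -> {keeps_both}")
```

Under this stand-in metric the pair scores just under 0.8, so the assertions in test_similarity_screening and test_combined_functionality would hold for a cut-off of 0.8 or higher and fail for a stricter one such as 0.7.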
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| cleaner | CombinedCleaner | Instance of CombinedCleaner being tested, initialized fresh before each test method in setUp | instance |
Dependencies
- unittest
- src.cleaners.combined_cleaner
- src.utils.hash_utils
- src.utils.similarity_utils
Required Imports
import unittest
from src.cleaners.combined_cleaner import CombinedCleaner
from src.utils.hash_utils import hash_text
from src.utils.similarity_utils import calculate_similarity
Usage Example
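import unittest
# Assumes the project root is on sys.path so the tests package is importable
from tests.test_combined_cleaner import TestCombinedCleaner

# Run a single test method
suite = unittest.TestSuite([TestCombinedCleaner("test_identical_text_removal")])
unittest.TextTestRunner().run(suite)

# Run every test in the class
suite = unittest.TestLoader().loadTestsFromTestCase(TestCombinedCleaner)
unittest.TextTestRunner().run(suite)

# Or, inside the test module itself, run all tests (unittest.main() exits the interpreter when done)
if __name__ == '__main__':
    unittest.main()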
Best Practices
- The setUp method is called before each test method, ensuring a fresh CombinedCleaner instance for each test to avoid state pollution
- Tests are independent and can be run in any order without affecting each other
- Each test method focuses on a specific aspect of functionality (single responsibility)
- Test method names clearly describe what is being tested
- Use unittest.main() to run all tests or unittest.TestLoader() for selective test execution
- Assertions verify both the count of results and the presence of expected items
- Tests cover edge cases like multiple identical duplicates and combinations of duplicates with similar texts
Similar Components
AI-powered semantic similarity - components with related functionality:
- function test_identical_text_removal 72.3% similar
- function test_nearly_similar_text_handling 72.0% similar
- class CombinedCleaner 70.3% similar
- function test_identical_chunks_with_different_cases 68.3% similar
- function test_similarity_threshold_effect 68.0% similar