function test_us_with_thousands
A unit test function that validates the smart_read_csv function's ability to correctly parse US-formatted CSV files containing numbers with thousand separators (commas) and decimal points.
/tf/active/vicechatdev/vice_ai/test_regional_formats.py
118 - 149
simple
Purpose
This test function verifies that the smart_read_csv utility can properly handle US number formatting conventions where commas are used as thousand separators and periods as decimal points. It creates a temporary CSV file with US-formatted numeric data, reads it using smart_read_csv, and asserts that numeric columns are correctly parsed as float types with accurate values. This ensures the CSV parser can distinguish between US and European number formats.
Source Code
def test_us_with_thousands():
"""Test US format with thousand separators"""
print("\n" + "="*60)
print("Test 4: US CSV with thousand separators")
print("="*60)
# Create test CSV with US thousand separators (commas) and decimal points
test_file = Path('/tmp/test_us_thousands.csv')
us_data = """Product,Price,Quantity,Revenue
Widget A,1234.56,100,123456.00
Widget B,2567.89,50,128394.50
Widget C,987.65,200,197530.00"""
test_file.write_text(us_data)
# Read with smart_read_csv
df = smart_read_csv(str(test_file))
print(f"\nLoaded DataFrame:")
print(df)
print(f"\nColumn types:")
print(df.dtypes)
print(f"\nPrice column (should be numeric):")
print(f" Type: {df['Price'].dtype}")
print(f" Values: {df['Price'].tolist()}")
# Verify conversion
assert df['Price'].dtype in ['float64', 'Float64'], f"Price should be numeric, got {df['Price'].dtype}"
assert 1234.0 < df['Price'].iloc[0] < 1235.0, f"Widget A price should be ~1234.56, got {df['Price'].iloc[0]}"
print("\nā US thousands separator test PASSED")
test_file.unlink()
Return Value
This function does not return any value (implicitly returns None). It performs assertions and prints test results to stdout. If assertions fail, it raises an AssertionError.
Dependencies
pandaspathlibsmartstat_service
Required Imports
import pandas as pd
from pathlib import Path
from smartstat_service import smart_read_csv
Usage Example
# Run the test function
test_us_with_thousands()
# Expected output:
# ============================================================
# Test 4: US CSV with thousand separators
# ============================================================
#
# Loaded DataFrame:
# Product Price Quantity Revenue
# 0 Widget A 1234.56 100 123456.00
# 1 Widget B 2567.89 50 128394.50
# 2 Widget C 987.65 200 197530.00
#
# Column types:
# Product object
# Price float64
# Quantity int64
# Revenue float64
# dtype: object
#
# Price column (should be numeric):
# Type: float64
# Values: [1234.56, 2567.89, 987.65]
#
# ā US thousands separator test PASSED
Best Practices
- This is a test function and should be run in a testing environment, not in production code
- The function creates and deletes temporary files in /tmp - ensure proper cleanup even if assertions fail by using try-finally blocks
- The test assumes /tmp directory exists and is writable (Unix/Linux convention)
- Assertions check both data type and value accuracy to ensure proper numeric conversion
- The test file is cleaned up after execution using unlink() to avoid leaving temporary files
- Consider using pytest fixtures or unittest setUp/tearDown methods for better test isolation
- The function prints verbose output for debugging - consider using logging or test framework reporting instead
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
-
function test_european_with_thousands 88.6% similar
-
function test_us_csv 87.9% similar
-
function test_tab_delimited_european 78.1% similar
-
function test_european_csv 76.5% similar
-
function main_v67 66.4% similar