raw_cleanup_database - Code Extractor

function raw_cleanup_database

Maturity: 36

Performs raw database cleanup on a SQLite database to identify and fix corrupted chat_session_id values in the text_sections table by converting invalid string representations ('{}', '[]', 'null', '') to NULL.

File:
/tf/active/vicechatdev/vice_ai/raw_database_cleanup.py

Lines:
8 - 90

Complexity:
moderate

Purpose

This function directly accesses a SQLite database to diagnose and repair data corruption issues with chat_session_id fields. It bypasses ORM models to inspect raw stored values, identifies corrupted entries (empty objects, arrays, or string 'null'), displays diagnostic information about the corruption, and updates corrupted values to proper NULL. This is useful for database maintenance and data integrity verification when ORM-level cleaning might mask underlying storage issues.

Source Code

def raw_cleanup_database():
    db_path = '/tf/active/vice_ai/complex_documents.db'
    
    if not os.path.exists(db_path):
        print(f"❌ Database not found at: {db_path}")
        return
    
    print("🧹 Starting RAW database cleanup for corrupted chat_session_id values...")
    print("📍 This bypasses models.py cleaning to see actual stored values")
    
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    try:
        # First, let's see ALL chat_session_id values to understand what's actually stored
        cursor.execute("""
            SELECT id, chat_session_id, LENGTH(COALESCE(chat_session_id, '')) as len
            FROM text_sections 
            ORDER BY id
            LIMIT 20
        """)
        
        all_rows = cursor.fetchall()
        print(f"📊 Sample of stored chat_session_id values (first 20 rows):")
        
        corrupted_ids = []
        for section_id, chat_session_id, length in all_rows:
            if chat_session_id in ['{}', '[]', 'null', '']:
                print(f"   🔴 CORRUPTED: {section_id}: '{chat_session_id}' (len: {length})")
                corrupted_ids.append(section_id)
            elif chat_session_id is None:
                print(f"   ⚪ NULL: {section_id}: NULL")
            else:
                print(f"   ✅ VALID: {section_id}: '{chat_session_id}' (len: {length})")
        
        # Now check ALL rows for corrupted values
        cursor.execute("""
            SELECT COUNT(*), GROUP_CONCAT(id) as section_ids
            FROM text_sections 
            WHERE chat_session_id IN ('{}', '[]', 'null', '')
        """)
        
        result = cursor.fetchone()
        corrupted_count = result[0] if result[0] else 0
        section_ids = result[1] if result[1] else ""
        
        print(f"\n📈 Total corrupted chat_session_id values: {corrupted_count}")
        if corrupted_count > 0:
            print(f"📋 Corrupted section IDs: {section_ids}")
            
            # Fix the corrupted values
            print(f"\n🔧 Fixing {corrupted_count} corrupted values...")
            cursor.execute("""
                UPDATE text_sections 
                SET chat_session_id = NULL 
                WHERE chat_session_id IN ('{}', '[]', 'null', '')
            """)
            
            rows_updated = cursor.rowcount
            conn.commit()
            
            print(f"✅ Updated {rows_updated} rows - set corrupted values to NULL")
            
            # Verify the fix
            cursor.execute("""
                SELECT COUNT(*) 
                FROM text_sections 
                WHERE chat_session_id IN ('{}', '[]', 'null', '')
            """)
            
            remaining_corrupted = cursor.fetchone()[0]
            if remaining_corrupted == 0:
                print("✅ Verification passed - no corrupted values remain")
            else:
                print(f"⚠️ Warning: {remaining_corrupted} corrupted values still remain")
        else:
            print("✅ No corrupted values found - database is clean!")
        
    except Exception as e:
        print(f"❌ Error during cleanup: {e}")
        conn.rollback()
    finally:
        conn.close()

Return Value

This function returns None (implicitly). It performs side effects by printing diagnostic information to stdout and modifying the database. The function prints status messages including: sample data inspection, count of corrupted values, section IDs affected, update results, and verification status.

Dependencies

sqlite3
os

Required Imports

import sqlite3
import os

Usage Example

import sqlite3
import os

# Ensure database exists at expected path
db_path = '/tf/active/vice_ai/complex_documents.db'
if os.path.exists(db_path):
    # Run the cleanup function
    raw_cleanup_database()
    # Output will be printed to console showing:
    # - Sample of stored values
    # - Count of corrupted entries
    # - Update results
    # - Verification status
else:
    print('Database not found')

Best Practices

Always backup the database before running this cleanup function as it performs UPDATE operations
This function is hardcoded to a specific database path (/tf/active/vice_ai/complex_documents.db) - ensure this path is correct for your environment
The function commits changes automatically - there is no dry-run mode
Review the printed diagnostic output before running in production to understand the scope of corruption
The function uses GROUP_CONCAT which may have length limits for very large datasets
Consider running during maintenance windows as it may lock the database table during updates
The function handles exceptions and performs rollback on errors, but connection is always closed in finally block
This is a diagnostic/repair tool and should not be part of regular application flow

Similar Components

AI-powered semantic similarity - components with related functionality:

function cleanup_corrupted_chat_session_ids 84.2% similar

Database maintenance function that identifies and fixes corrupted chat_session_id values in the text_sections table by replacing invalid string representations with NULL.
From: /tf/active/vicechatdev/vice_ai/cleanup_database.py
function direct_fix 83.2% similar

A database maintenance function that detects and fixes corrupted chat_session_id values in a SQLite database's text_sections table by identifying invalid patterns and setting them to NULL.
From: /tf/active/vicechatdev/vice_ai/direct_sqlite_fix.py
function check_and_fix_corruption 80.6% similar

Scans a SQLite database for corrupted chat_session_id values in the text_sections table and automatically fixes them by setting invalid entries to NULL.
From: /tf/active/vicechatdev/vice_ai/direct_corruption_checker.py
function check_specific_corruption 79.0% similar

Detects and fixes specific corruption patterns in the chat_session_id column of a SQLite database's text_sections table, replacing invalid values with NULL.
From: /tf/active/vicechatdev/vice_ai/check_specific_corruption.py
function inspect_database 73.5% similar

Inspects a SQLite database at a hardcoded path, examining table structure, row counts, and identifying potentially corrupted chat_session_id values in the text_sections table.
From: /tf/active/vicechatdev/vice_ai/database_inspector.py

← Back to Browse

Assistant

Hi! I can help improve this code. Tell me what you'd like to enhance (e.g., "add error handling", "optimize performance", "improve readability", "add type hints").

Code Comparison

Original Code

                            def raw_cleanup_database():
    db_path = '/tf/active/vice_ai/complex_documents.db'
    
    if not os.path.exists(db_path):
        print(f"❌ Database not found at: {db_path}")
        return
    
    print("🧹 Starting RAW database cleanup for corrupted chat_session_id values...")
    print("📍 This bypasses models.py cleaning to see actual stored values")
    
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    
    try:
        # First, let's see ALL chat_session_id values to understand what's actually stored
        cursor.execute("""
            SELECT id, chat_session_id, LENGTH(COALESCE(chat_session_id, '')) as len
            FROM text_sections 
            ORDER BY id
            LIMIT 20
        """)
        
        all_rows = cursor.fetchall()
        print(f"📊 Sample of stored chat_session_id values (first 20 rows):")
        
        corrupted_ids = []
        for section_id, chat_session_id, length in all_rows:
            if chat_session_id in ['{}', '[]', 'null', '']:
                print(f"   🔴 CORRUPTED: {section_id}: '{chat_session_id}' (len: {length})")
                corrupted_ids.append(section_id)
            elif chat_session_id is None:
                print(f"   ⚪ NULL: {section_id}: NULL")
            else:
                print(f"   ✅ VALID: {section_id}: '{chat_session_id}' (len: {length})")
        
        # Now check ALL rows for corrupted values
        cursor.execute("""
            SELECT COUNT(*), GROUP_CONCAT(id) as section_ids
            FROM text_sections 
            WHERE chat_session_id IN ('{}', '[]', 'null', '')
        """)
        
        result = cursor.fetchone()
        corrupted_count = result[0] if result[0] else 0
        section_ids = result[1] if result[1] else ""
        
        print(f"\n📈 Total corrupted chat_session_id values: {corrupted_count}")
        if corrupted_count > 0:
            print(f"📋 Corrupted section IDs: {section_ids}")
            
            # Fix the corrupted values
            print(f"\n🔧 Fixing {corrupted_count} corrupted values...")
            cursor.execute("""
                UPDATE text_sections 
                SET chat_session_id = NULL 
                WHERE chat_session_id IN ('{}', '[]', 'null', '')
            """)
            
            rows_updated = cursor.rowcount
            conn.commit()
            
            print(f"✅ Updated {rows_updated} rows - set corrupted values to NULL")
            
            # Verify the fix
            cursor.execute("""
                SELECT COUNT(*) 
                FROM text_sections 
                WHERE chat_session_id IN ('{}', '[]', 'null', '')
            """)
            
            remaining_corrupted = cursor.fetchone()[0]
            if remaining_corrupted == 0:
                print("✅ Verification passed - no corrupted values remain")
            else:
                print(f"⚠️ Warning: {remaining_corrupted} corrupted values still remain")
        else:
            print("✅ No corrupted values found - database is clean!")
        
    except Exception as e:
        print(f"❌ Error during cleanup: {e}")
        conn.rollback()
    finally:
        conn.close()
                        

Improved Code

🔍 Code Extractor

function raw_cleanup_database

Purpose

Source Code

Return Value

Dependencies

Required Imports

Usage Example

Best Practices

Tags

Similar Components

function cleanup_corrupted_chat_session_ids 84.2% similar

function direct_fix 83.2% similar

function check_and_fix_corruption 80.6% similar

function check_specific_corruption 79.0% similar

function inspect_database 73.5% similar

function raw_cleanup_database

Purpose

Source Code

Return Value

Dependencies

Required Imports

Usage Example

Best Practices

Tags

Similar Components

function cleanup_corrupted_chat_session_ids 84.2% similar

function direct_fix 83.2% similar

function check_and_fix_corruption 80.6% similar

function check_specific_corruption 79.0% similar

function inspect_database 73.5% similar

✨ Improve Code: raw_cleanup_database

Code Comparison