function raw_cleanup_database
Performs raw database cleanup on a SQLite database to identify and fix corrupted chat_session_id values in the text_sections table by converting invalid string representations ('{}', '[]', 'null', '') to NULL.
/tf/active/vicechatdev/vice_ai/raw_database_cleanup.py
8 - 90
moderate
Purpose
This function directly accesses a SQLite database to diagnose and repair data corruption issues with chat_session_id fields. It bypasses ORM models to inspect raw stored values, identifies corrupted entries (empty objects, arrays, or string 'null'), displays diagnostic information about the corruption, and updates corrupted values to proper NULL. This is useful for database maintenance and data integrity verification when ORM-level cleaning might mask underlying storage issues.
Source Code
def raw_cleanup_database():
db_path = '/tf/active/vice_ai/complex_documents.db'
if not os.path.exists(db_path):
print(f"โ Database not found at: {db_path}")
return
print("๐งน Starting RAW database cleanup for corrupted chat_session_id values...")
print("๐ This bypasses models.py cleaning to see actual stored values")
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
try:
# First, let's see ALL chat_session_id values to understand what's actually stored
cursor.execute("""
SELECT id, chat_session_id, LENGTH(COALESCE(chat_session_id, '')) as len
FROM text_sections
ORDER BY id
LIMIT 20
""")
all_rows = cursor.fetchall()
print(f"๐ Sample of stored chat_session_id values (first 20 rows):")
corrupted_ids = []
for section_id, chat_session_id, length in all_rows:
if chat_session_id in ['{}', '[]', 'null', '']:
print(f" ๐ด CORRUPTED: {section_id}: '{chat_session_id}' (len: {length})")
corrupted_ids.append(section_id)
elif chat_session_id is None:
print(f" โช NULL: {section_id}: NULL")
else:
print(f" โ
VALID: {section_id}: '{chat_session_id}' (len: {length})")
# Now check ALL rows for corrupted values
cursor.execute("""
SELECT COUNT(*), GROUP_CONCAT(id) as section_ids
FROM text_sections
WHERE chat_session_id IN ('{}', '[]', 'null', '')
""")
result = cursor.fetchone()
corrupted_count = result[0] if result[0] else 0
section_ids = result[1] if result[1] else ""
print(f"\n๐ Total corrupted chat_session_id values: {corrupted_count}")
if corrupted_count > 0:
print(f"๐ Corrupted section IDs: {section_ids}")
# Fix the corrupted values
print(f"\n๐ง Fixing {corrupted_count} corrupted values...")
cursor.execute("""
UPDATE text_sections
SET chat_session_id = NULL
WHERE chat_session_id IN ('{}', '[]', 'null', '')
""")
rows_updated = cursor.rowcount
conn.commit()
print(f"โ
Updated {rows_updated} rows - set corrupted values to NULL")
# Verify the fix
cursor.execute("""
SELECT COUNT(*)
FROM text_sections
WHERE chat_session_id IN ('{}', '[]', 'null', '')
""")
remaining_corrupted = cursor.fetchone()[0]
if remaining_corrupted == 0:
print("โ
Verification passed - no corrupted values remain")
else:
print(f"โ ๏ธ Warning: {remaining_corrupted} corrupted values still remain")
else:
print("โ
No corrupted values found - database is clean!")
except Exception as e:
print(f"โ Error during cleanup: {e}")
conn.rollback()
finally:
conn.close()
Return Value
This function returns None (implicitly). It performs side effects by printing diagnostic information to stdout and modifying the database. The function prints status messages including: sample data inspection, count of corrupted values, section IDs affected, update results, and verification status.
Dependencies
sqlite3os
Required Imports
import sqlite3
import os
Usage Example
import sqlite3
import os
# Ensure database exists at expected path
db_path = '/tf/active/vice_ai/complex_documents.db'
if os.path.exists(db_path):
# Run the cleanup function
raw_cleanup_database()
# Output will be printed to console showing:
# - Sample of stored values
# - Count of corrupted entries
# - Update results
# - Verification status
else:
print('Database not found')
Best Practices
- Always backup the database before running this cleanup function as it performs UPDATE operations
- This function is hardcoded to a specific database path (/tf/active/vice_ai/complex_documents.db) - ensure this path is correct for your environment
- The function commits changes automatically - there is no dry-run mode
- Review the printed diagnostic output before running in production to understand the scope of corruption
- The function uses GROUP_CONCAT which may have length limits for very large datasets
- Consider running during maintenance windows as it may lock the database table during updates
- The function handles exceptions and performs rollback on errors, but connection is always closed in finally block
- This is a diagnostic/repair tool and should not be part of regular application flow
Tags
Similar Components
AI-powered semantic similarity - components with related functionality:
-
function cleanup_corrupted_chat_session_ids 84.2% similar
-
function direct_fix 83.2% similar
-
function check_and_fix_corruption 80.6% similar
-
function check_specific_corruption 79.0% similar
-
function inspect_database 73.5% similar