function store_document

Maturity: 44

Thread-safe function that stores document information (file path, text content, metadata) in a global dictionary indexed by user email and document ID.

File: /tf/active/vicechatdev/vice_ai/app.py
Lines: 94 - 105
Complexity: simple

Purpose

This function manages document storage for a multi-user Flask application by maintaining a nested dictionary structure where documents are organized by user email. It uses thread locking to ensure safe concurrent access when multiple users upload documents simultaneously. The function creates user entries on-demand and timestamps each document with creation time.
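
The concurrent-access behaviour described above can be illustrated with a small sketch. It assumes module-level 'uploaded_documents' and 'document_lock' globals as in the usage example further down; the 'upload_worker' helper, the thread count, and the document IDs are hypothetical and exist only for this demonstration.

```python
from datetime import datetime
from threading import Lock, Thread

# Globals the function relies on (assumed to live at module level)
uploaded_documents = {}
document_lock = Lock()

def store_document(user_email, document_id, file_path, text_content, metadata):
    """Store document information for a user session"""
    with document_lock:
        if user_email not in uploaded_documents:
            uploaded_documents[user_email] = {}
        uploaded_documents[user_email][document_id] = {
            'file_path': file_path,
            'text_content': text_content,
            'metadata': metadata,
            'created_at': datetime.now()
        }

def upload_worker(worker_id):
    # Hypothetical helper: simulates one client storing 50 documents
    for i in range(50):
        store_document(
            'user@example.com',
            f'doc_{worker_id}_{i}',
            f'/uploads/{worker_id}_{i}.pdf',
            'extracted text',
            {'worker': worker_id},
        )

# Four threads write to the same user's entry at the same time
threads = [Thread(target=upload_worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every document ID is distinct, so all 4 * 50 records should be present
stored_count = len(uploaded_documents['user@example.com'])
print(stored_count)
```

Because the check for the user entry and the write happen inside the same 'with document_lock' block, no thread can observe a half-created user entry.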

Source Code

def store_document(user_email, document_id, file_path, text_content, metadata):
    """Store document information for a user session"""
    with document_lock:
        if user_email not in uploaded_documents:
            uploaded_documents[user_email] = {}
        
        uploaded_documents[user_email][document_id] = {
            'file_path': file_path,
            'text_content': text_content,
            'metadata': metadata,
            'created_at': datetime.now()
        }

Parameters

Name          Type  Default  Kind
user_email    -     -        positional_or_keyword
document_id   -     -        positional_or_keyword
file_path     -     -        positional_or_keyword
text_content  -     -        positional_or_keyword
metadata      -     -        positional_or_keyword

Parameter Details

user_email: String representing the user's email address, used as the primary key to organize documents by user. The function does not validate the format, so callers should pass a valid email string.

document_id: Unique identifier for the document, for example a UUID or hash. Used as the secondary key to retrieve specific documents for a user. Storing a second document under the same user and ID replaces the earlier record.

file_path: String containing the file system path where the uploaded document is stored. The path is stored as given; the function does not check that the file exists.

text_content: String containing the extracted text content from the document. This is the parsed/processed text that can be used for search, RAG, or other text processing operations.

metadata: Dictionary or object containing additional information about the document such as filename, file size, upload time, document type, or any custom metadata fields.

Return Value

This function returns None implicitly. Its only effect is to add or replace an entry in the global 'uploaded_documents' dictionary.
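
Since nothing is returned, callers have to read the stored record back from 'uploaded_documents'. The following is a minimal sketch of one way to do that under the same lock; 'read_document' is a hypothetical helper written for this example, not a function confirmed to exist in app.py.

```python
from datetime import datetime
from threading import Lock

# Same globals that store_document writes to (assumed module-level)
uploaded_documents = {}
document_lock = Lock()

def read_document(user_email, document_id):
    """Hypothetical reader: return a shallow copy of one record, or None."""
    with document_lock:
        record = uploaded_documents.get(user_email, {}).get(document_id)
        # Copy so the caller cannot change the stored dict's keys by accident
        return dict(record) if record is not None else None

# Populate the dictionary directly, in the shape store_document produces
with document_lock:
    uploaded_documents['user@example.com'] = {
        'doc_12345': {
            'file_path': '/uploads/document.pdf',
            'text_content': 'This is the extracted text from the document.',
            'metadata': {'filename': 'document.pdf'},
            'created_at': datetime.now(),
        }
    }

found = read_document('user@example.com', 'doc_12345')
missing = read_document('user@example.com', 'no_such_doc')
print(found['file_path'])
print(missing)
```

The copy is shallow: nested values such as 'metadata' are still shared with the stored record.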

Dependencies

  • datetime
  • threading

Required Imports

from datetime import datetime
from threading import Lock

Usage Example

from datetime import datetime
from threading import Lock

# Initialize required global variables
uploaded_documents = {}
document_lock = Lock()

# Define the function
def store_document(user_email, document_id, file_path, text_content, metadata):
    with document_lock:
        if user_email not in uploaded_documents:
            uploaded_documents[user_email] = {}
        
        uploaded_documents[user_email][document_id] = {
            'file_path': file_path,
            'text_content': text_content,
            'metadata': metadata,
            'created_at': datetime.now()
        }

# Example usage
user_email = 'user@example.com'
document_id = 'doc_12345'
file_path = '/uploads/document.pdf'
text_content = 'This is the extracted text from the document.'
metadata = {'filename': 'document.pdf', 'size': 1024, 'type': 'pdf'}

store_document(user_email, document_id, file_path, text_content, metadata)

# Verify storage
print(uploaded_documents[user_email][document_id])

Best Practices

  • Always initialize 'uploaded_documents' as an empty dictionary and 'document_lock' as a Lock() object in the global scope before using this function
  • Ensure document_id is unique per user to avoid overwriting existing documents
  • Consider implementing a cleanup mechanism to remove old documents and prevent unbounded memory growth
  • The function modifies global state; any other code that reads or writes 'uploaded_documents' should also hold 'document_lock', otherwise the lock inside this function gives no protection
  • Consider adding validation for parameters (e.g., checking if user_email is a valid email format, if file_path exists)
  • For production use, consider replacing the in-memory dictionary with a persistent storage solution (database, Redis, etc.)
  • The created_at timestamp comes from datetime.now(), which returns a naive value in the server's local time; consider using a timezone-aware UTC timestamp for consistency across time zones
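
Two of the suggestions above, UTC timestamps and a cleanup mechanism, can be sketched together. This is an illustration under stated assumptions rather than code from app.py: 'store_document_utc' and 'purge_old_documents' are hypothetical names, and the one-hour retention window is arbitrary.

```python
from datetime import datetime, timedelta, timezone
from threading import Lock

uploaded_documents = {}
document_lock = Lock()

def store_document_utc(user_email, document_id, file_path, text_content, metadata):
    """Hypothetical variant of store_document with a timezone-aware UTC timestamp."""
    with document_lock:
        uploaded_documents.setdefault(user_email, {})[document_id] = {
            'file_path': file_path,
            'text_content': text_content,
            'metadata': metadata,
            'created_at': datetime.now(timezone.utc),
        }

def purge_old_documents(max_age, now=None):
    """Hypothetical cleanup: drop records older than max_age, return the count removed."""
    now = now or datetime.now(timezone.utc)
    removed = 0
    with document_lock:
        # Iterate over a copy of the keys because entries are deleted in the loop
        for user_email in list(uploaded_documents):
            docs = uploaded_documents[user_email]
            expired = [doc_id for doc_id, record in docs.items()
                       if now - record['created_at'] > max_age]
            for doc_id in expired:
                del docs[doc_id]
                removed += 1
            if not docs:
                # Remove users that have no documents left
                del uploaded_documents[user_email]
    return removed

store_document_utc('user@example.com', 'doc_12345', '/uploads/document.pdf',
                   'This is the extracted text from the document.', {'type': 'pdf'})

# Pretend two hours have passed so the one-hour window has expired
two_hours_later = datetime.now(timezone.utc) + timedelta(hours=2)
removed = purge_old_documents(timedelta(hours=1), now=two_hours_later)
print(removed, uploaded_documents)
```

The sketch only removes the in-memory record; deleting the file at 'file_path' would be a separate step. The 'now' parameter exists so the cleanup can be exercised without waiting.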

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function store_uploaded_document 88.6% similar

    Stores uploaded document metadata and content in a thread-safe application state dictionary, organized by user email and document ID.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function get_user_documents 83.9% similar

    Thread-safe function that retrieves all documents associated with a specific user email from a global document storage dictionary.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function store_uploaded_document_v1 83.2% similar

    Stores an uploaded document in a thread-safe global application state dictionary, organizing documents by user email and document ID with metadata including name, content, file type, and upload timestamp.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function get_uploaded_document_v1 75.4% similar

    Retrieves a specific uploaded document for a given user from a thread-safe global application state dictionary.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function get_uploaded_document 73.8% similar

    Retrieves a specific uploaded document from the application state for a given user and document ID, returning document metadata and content in a thread-safe manner.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py