🔍 Code Extractor

function load_document_from_file

Maturity: 44

Loads a document from a JSON file stored in a documents directory, deserializes it into a ComplexDocument object, and returns it.

File: /tf/active/vicechatdev/vice_ai/complex_app.py
Lines: 85-97
Complexity: simple

Purpose

This function retrieves persisted document data from the file system. It builds the path {DOCUMENTS_DIR}/{doc_id}.json, reads and parses that JSON file, and reconstructs a ComplexDocument through the class's from_dict method. Any exception raised along the way is caught and logged, and the function returns None instead of propagating it. It supports document persistence and retrieval in a document management system, likely part of a Flask web application with RAG (Retrieval-Augmented Generation) capabilities.

Source Code

def load_document_from_file(doc_id):
    """Load document from file"""
    try:
        file_path = os.path.join(DOCUMENTS_DIR, f"{doc_id}.json")
        if os.path.exists(file_path):
            with open(file_path, 'r') as f:
                data = json.load(f)
            document = ComplexDocument.from_dict(data)
            logger.info(f"📂 Document loaded from file: {doc_id}")
            return document
    except Exception as e:
        logger.error(f"❌ Failed to load document {doc_id}: {e}")
    return None

Parameters

Name     Type              Default   Kind
doc_id   (not annotated)   (none)    positional_or_keyword

Parameter Details

doc_id: A unique identifier for the document to be loaded. It is used to build the filename '{doc_id}.json' inside DOCUMENTS_DIR. It is expected to be a string, typically a UUID or another unique identifier. The function does not validate or sanitize the value before joining it into the path.
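To illustrate how doc_id becomes a file path, here is a small sketch. It assumes a DOCUMENTS_DIR of './documents' and uses a hypothetical helper, document_path, that repeats the function's os.path.join call. The second call shows why an unvalidated doc_id matters: a value containing '../' yields a path that normalizes to a location outside the documents directory.

```python
import os

# Assumed value for illustration; the real DOCUMENTS_DIR is set in complex_app.py
DOCUMENTS_DIR = "./documents"


def document_path(doc_id):
    """Hypothetical helper mirroring the path construction in load_document_from_file."""
    return os.path.join(DOCUMENTS_DIR, f"{doc_id}.json")


normal = document_path("doc_12345")
crafted = document_path("../config")

print(normal)                     # ./documents/doc_12345.json on POSIX systems
print(os.path.normpath(crafted))  # config.json, which lies outside DOCUMENTS_DIR
```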

Return Value

Returns a ComplexDocument object if the file exists and is successfully read, parsed, and deserialized. Returns None in every other case: when the file does not exist (no log message is written), or when opening the file, parsing the JSON, or calling ComplexDocument.from_dict raises an exception (the error is logged at ERROR level). ComplexDocument is a custom application class that must provide a from_dict class method for deserialization.
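The sketch below exercises each return path against a temporary directory. ComplexDocument here is a hypothetical two-field stand-in, since the real class is defined elsewhere in the application. The function body is copied from the Source Code above so that it resolves DOCUMENTS_DIR, logger, and ComplexDocument from this script's globals.

```python
import json
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Temporary directory standing in for the application's DOCUMENTS_DIR
DOCUMENTS_DIR = tempfile.mkdtemp()


class ComplexDocument:
    """Hypothetical minimal stand-in for the application's document class."""

    def __init__(self, id, content):
        self.id = id
        self.content = content

    @classmethod
    def from_dict(cls, data):
        return cls(data["id"], data["content"])


def load_document_from_file(doc_id):
    """Load document from file (copied from complex_app.py)."""
    try:
        file_path = os.path.join(DOCUMENTS_DIR, f"{doc_id}.json")
        if os.path.exists(file_path):
            with open(file_path, "r") as f:
                data = json.load(f)
            document = ComplexDocument.from_dict(data)
            logger.info(f"📂 Document loaded from file: {doc_id}")
            return document
    except Exception as e:
        logger.error(f"❌ Failed to load document {doc_id}: {e}")
    return None


# Write a valid file, a file with malformed JSON, and a file missing a key
with open(os.path.join(DOCUMENTS_DIR, "good.json"), "w") as f:
    json.dump({"id": "good", "content": "hello"}, f)
with open(os.path.join(DOCUMENTS_DIR, "corrupt.json"), "w") as f:
    f.write("{not valid json")
with open(os.path.join(DOCUMENTS_DIR, "incomplete.json"), "w") as f:
    json.dump({"id": "incomplete"}, f)

good = load_document_from_file("good")              # ComplexDocument, logged at INFO
missing = load_document_from_file("missing")        # None, nothing logged
corrupt = load_document_from_file("corrupt")        # None, JSONDecodeError logged at ERROR
incomplete = load_document_from_file("incomplete")  # None, KeyError logged at ERROR
```

Note that the missing-file case and the failure cases return the same value, so callers that need to tell them apart must check for the file themselves.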

Dependencies

  • os
  • json
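  • logging

In addition to these imports, the function resolves three names from its module's global scope: DOCUMENTS_DIR (the storage directory), logger (used for info and error messages), and ComplexDocument (the document class). These must be defined or imported in the same module as the function.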

Required Imports

import os
import json
import logging

Usage Example

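import os
import json
import logging

# Setup: module-level names that load_document_from_file reads as globals
DOCUMENTS_DIR = './documents'
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

# Minimal stand-in for ComplexDocument; the real class is defined in the application
class ComplexDocument:
    def __init__(self, id, content):
        self.id = id
        self.content = content

    @classmethod
    def from_dict(cls, data):
        return cls(data['id'], data['content'])

# The function itself, defined in this same script so that it resolves
# DOCUMENTS_DIR, logger, and ComplexDocument from the globals above
def load_document_from_file(doc_id):
    """Load document from file"""
    try:
        file_path = os.path.join(DOCUMENTS_DIR, f"{doc_id}.json")
        if os.path.exists(file_path):
            with open(file_path, 'r') as f:
                data = json.load(f)
            document = ComplexDocument.from_dict(data)
            logger.info(f"📂 Document loaded from file: {doc_id}")
            return document
    except Exception as e:
        logger.error(f"❌ Failed to load document {doc_id}: {e}")
    return None

# Create the documents directory and save a sample document to load
os.makedirs(DOCUMENTS_DIR, exist_ok=True)
doc_id = 'doc_12345'
with open(os.path.join(DOCUMENTS_DIR, f'{doc_id}.json'), 'w') as f:
    json.dump({'id': doc_id, 'content': 'Example content'}, f)

# Use the function
document = load_document_from_file(doc_id)

if document:
    print(f'Successfully loaded document: {document.id}')
else:
    print('Document not found or failed to load')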

Best Practices

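  • Ensure DOCUMENTS_DIR points to the directory where documents are saved and is readable by the process; if the directory or file is missing, the function simply returns None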
  • The ComplexDocument class must implement a from_dict() class method that can reconstruct the object from a dictionary
  • Handle the None return value appropriately in calling code to account for missing or corrupted files
  • Consider implementing file locking if multiple processes/threads might access the same document files concurrently
  • The function returns None both for missing files and for load failures; only failures are logged (at ERROR level), so a missing file leaves no entry in the logs
  • Validate doc_id to prevent directory traversal attacks if it comes from user input
  • Consider adding file validation (e.g., JSON schema validation) before deserialization for production use
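The last two practices can be combined in a hardened loader. The sketch below is one possible approach, not the application's code. It resolves the path with os.path.realpath, rejects any doc_id whose path escapes the documents directory (checked with os.path.commonpath), and runs a lightweight key check in place of full JSON Schema validation. REQUIRED_KEYS, resolve_document_path, and load_document_data_safely are hypothetical names. The function returns the raw dict, which the caller would then pass to ComplexDocument.from_dict.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger(__name__)

# Hypothetical: keys every stored document is assumed to contain
REQUIRED_KEYS = {"id", "content"}


def resolve_document_path(documents_dir, doc_id):
    """Return the JSON path for doc_id, or None if it would escape documents_dir."""
    base = os.path.realpath(documents_dir)
    candidate = os.path.realpath(os.path.join(base, f"{doc_id}.json"))
    # commonpath equals base only when candidate lies inside base
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate


def load_document_data_safely(documents_dir, doc_id):
    """Load and minimally validate the raw dict for doc_id; None on any failure."""
    file_path = resolve_document_path(documents_dir, doc_id)
    if file_path is None:
        logger.warning("Rejected doc_id outside documents directory: %r", doc_id)
        return None
    if not os.path.exists(file_path):
        logger.info("No stored document for %r", doc_id)
        return None
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        logger.error("Failed to read document %r: %s", doc_id, e)
        return None
    # Structural check standing in for full JSON Schema validation
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        logger.error("Document %r is missing required fields", doc_id)
        return None
    return data


# Example usage against a temporary directory
docs = tempfile.mkdtemp()
with open(os.path.join(docs, "ok.json"), "w", encoding="utf-8") as f:
    json.dump({"id": "ok", "content": "hello"}, f)

ok = load_document_data_safely(docs, "ok")          # the dict written above
escaped = load_document_data_safely(docs, "../ok")  # None: path escapes docs
missing = load_document_data_safely(docs, "nope")   # None: no such file
```

Unlike the original, this variant logs the missing-file case separately and catches only I/O and JSON errors, so a bug elsewhere surfaces as an exception rather than a silent None.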

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function load_all_documents 72.9% similar

    Loads all JSON documents from a designated documents directory and returns them as a dictionary indexed by document ID.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function save_document_to_file 64.8% similar

    Persists a document object to the filesystem as a JSON file, using the document's ID as the filename.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function get_document_v7 61.5% similar

    Retrieves a document by its ID from an in-memory cache or loads it from persistent storage if not cached.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function load_session_from_disk 61.2% similar

    Loads a session from disk storage by reading a JSON file identified by session_id, deserializing the data, and converting timestamp strings back to datetime objects.

    From: /tf/active/vicechatdev/docchat/app.py
  • function load_chat_session_from_file 59.6% similar

    Loads a chat session from a JSON file stored in the CHAT_SESSIONS_DIR directory using the provided session_id as the filename.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py