function api_chat_upload_document

Maturity: 56

Flask API endpoint that handles document upload for chat context, processes the document to extract text content, and stores it for later retrieval in chat sessions.

File: /tf/active/vicechatdev/vice_ai/complex_app.py
Lines: 2259 - 2331
Complexity: complex

Purpose

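This endpoint lets users upload documents (PDF, DOCX, TXT, etc.) to use as context in chat conversations. It checks that a file with a non-empty filename was submitted and that the user is authenticated, generates a UUID-based document ID, extracts the text with the document processor, and stores the extracted text under the authenticated user's email. Only the extracted text and basic metadata (name, content type) are stored; the uploaded file itself is written to a temporary file for processing and then deleted.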

Source Code

def api_chat_upload_document():
    """Upload document for chat context"""
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400
        
        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No file selected'}), 400
        
        user_email = get_user_email()
        if not user_email:
            return jsonify({'error': 'User not authenticated'}), 401
        
        # Generate unique document ID
        document_id = str(uuid.uuid4())
        
        try:
            # Process the document using the document processor
            if not document_processor:
                return jsonify({'error': 'Document processor not available'}), 500
            
            # Save file content to temporary file for processing
            with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.filename)[1]) as temp_file:
                file_content = file.read()
                temp_file.write(file_content)
                temp_file.flush()
                
                try:
                    # Process the document
                    processed_result = document_processor.process_document(temp_file.name)
                    
                    if 'error' in processed_result:
                        return jsonify({'error': f'Document processing failed: {processed_result["error"]}'}), 400
                    
                    # Extract combined text content
                    extracted_content = document_processor.get_combined_text(processed_result)
                    
                    if not extracted_content or not extracted_content.strip():
                        return jsonify({'error': 'Could not extract text from document'}), 400
                    
                finally:
                    # Clean up temp file (best effort; ignore filesystem errors only)
                    try:
                        os.unlink(temp_file.name)
                    except OSError:
                        pass
            
            # Store the document
            store_uploaded_document(
                user_email=user_email,
                document_id=document_id,
                name=file.filename,
                content=extracted_content,
                file_type=file.content_type or 'application/octet-stream'
            )
            
            return jsonify({
                'document_id': document_id,
                'name': file.filename,
                'size': len(file_content),
                'type': file.content_type,
                'content_length': len(extracted_content),
                'message': 'Document uploaded successfully'
            })
            
        except Exception as e:
            logger.error(f"Document processing error: {e}")
            return jsonify({'error': f'Failed to process document: {str(e)}'}), 500
            
    except Exception as e:
        logger.error(f"Upload document error: {e}")
        return jsonify({'error': 'Failed to upload document'}), 500
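
The temporary-file handling in the source above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code in complex_app.py: `extract_via_temp_file` and `read_as_text` are hypothetical names, and the `extract` callback stands in for `document_processor.process_document` followed by `get_combined_text`. One deliberate difference from the endpoint: the sketch closes the temporary file before processing and deleting it, which avoids relying on the operating system allowing an open file to be reopened or unlinked.

```python
import os
import tempfile
from typing import Callable


def extract_via_temp_file(data: bytes, filename: str,
                          extract: Callable[[str], str]) -> str:
    """Write `data` to a temp file, run `extract` on its path, then delete it."""
    # Keep the original extension so an extractor that inspects the suffix still works.
    suffix = os.path.splitext(filename)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        path = tmp.name
    # The handle is closed at this point, so the extractor can reopen the path.
    try:
        text = extract(path)
        if not text or not text.strip():
            raise ValueError("Could not extract text from document")
        return text
    finally:
        try:
            os.unlink(path)
        except OSError:
            pass  # best-effort cleanup


def read_as_text(path: str) -> str:
    """Hypothetical stand-in for the real document processor."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

Calling `extract_via_temp_file(b"hello chat", "notes.txt", read_as_text)` returns the decoded text and leaves no file behind, whether extraction succeeds or raises.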

Return Value

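Returns a JSON response with an HTTP status code. On success (200): {'document_id': str, 'name': str, 'size': int, 'type': str or null, 'content_length': int, 'message': str}, where 'size' is the uploaded file's size in bytes and 'content_length' is the length of the extracted text in characters. On error: {'error': str} with status 400 (no file, empty filename, a processor-reported error, or no extractable text), 401 (user not authenticated), or 500 (document processor unavailable or an unexpected exception).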
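
To make the response contract concrete, here is a minimal sketch of how a caller might turn a status code and JSON body into a user-facing result. The function name `interpret_upload_response` and the message wording are hypothetical; only the status codes and the JSON keys come from the description above.

```python
from typing import Any, Dict, Tuple


def interpret_upload_response(status: int, payload: Dict[str, Any]) -> Tuple[bool, str]:
    """Map a status code and JSON body from the endpoint to (ok, message)."""
    if status == 200 and "document_id" in payload:
        return True, (
            f"Uploaded {payload['name']} as {payload['document_id']} "
            f"({payload['content_length']} characters extracted)"
        )
    # Every error response carries a single 'error' string.
    reason = payload.get("error", "Unknown error")
    if status == 400:
        return False, f"Rejected upload: {reason}"
    if status == 401:
        return False, f"Sign-in required: {reason}"
    return False, f"Server error ({status}): {reason}"
```

Branching on the status code rather than on the error text keeps the caller independent of the exact messages, which the endpoint builds partly from exception strings.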

Dependencies

  • flask
  • uuid
  • os
  • tempfile
  • logging

Required Imports

from flask import request, jsonify
import uuid
import os
import tempfile

Conditional/Optional Imports

These imports are only needed under specific conditions:

from document_processor import DocumentProcessor

Condition: Required for document processing. The endpoint expects a module-level instance named 'document_processor' and returns a 500 error if it is missing or falsy.

import logging

Condition: Required for error logging via 'logger' instance
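
The endpoint relies on two module-level names, `logger` and `document_processor`, whose initialization is not shown in the source. One way they could be set up is sketched below. This is an assumption, not the application's actual startup code: in particular, constructing `DocumentProcessor` without arguments is a guess, and `document_processor` is a project-local module rather than a published package.

```python
import logging

logger = logging.getLogger(__name__)

try:
    # Project-local module named in the import above.
    from document_processor import DocumentProcessor

    # Assumption: the class can be constructed without arguments.
    document_processor = DocumentProcessor()
except Exception as exc:  # ImportError, or a failure inside the constructor
    logger.warning("Document processor unavailable: %s", exc)
    # The endpoint checks for a falsy value and responds with a 500 error.
    document_processor = None
```

Falling back to `None` matches the `if not document_processor` check in the endpoint, so a missing processor surfaces as a clear error response instead of a `NameError`.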

Usage Example

// Client-side usage example (JavaScript fetch):
const formData = new FormData();
formData.append('file', fileInput.files[0]);

fetch('/api/chat-upload-document', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + authToken
  },
  body: formData
})
.then(response => response.json())
.then(data => {
  if (data.error) {
    console.error('Upload failed:', data.error);
  } else {
    console.log('Document uploaded:', data.document_id);
    console.log('Extracted content length:', data.content_length);
  }
})
.catch(error => console.error('Error:', error));

// Server-side context:
// Flask calls this function when a POST request is made to /api/chat-upload-document.
// Ensure document_processor and store_uploaded_document are initialized before the first request.

Best Practices

  • Always validate file presence and filename before processing
  • Write uploads to temporary files and delete them in a try-finally block, so cleanup happens even if processing fails
  • Handle errors at each stage (file validation, processing, storage)
  • Log errors with enough context for debugging
  • Check that the extracted content is not empty before storing it
  • Generate document IDs with UUIDs to avoid collisions
  • Return appropriate HTTP status codes for different error scenarios
  • Verify user authentication before processing uploads
  • Check document_processor availability before attempting to use it
  • Consider implementing file size limits to prevent resource exhaustion
  • Consider implementing file type validation based on content, not just extension
  • Store both original file metadata and extracted content for reference
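
The last two suggestions, a size limit and content-based type validation, are not implemented in the endpoint. A minimal sketch of both follows. The names `MAX_UPLOAD_BYTES`, `sniff_document_type`, and `validate_upload` are hypothetical, the 10 MiB cap is an arbitrary example value, and the three recognized formats are an assumption about what the document processor supports.

```python
import io
import zipfile
from typing import Optional

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


def sniff_document_type(data: bytes) -> Optional[str]:
    """Return 'pdf', 'docx', or 'text' based on content, or None if unrecognized."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        # DOCX files are ZIP archives that contain word/document.xml.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            return None
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    # Treat whitespace-only input as unrecognized rather than as text.
    return "text" if data.strip() else None


def validate_upload(data: bytes) -> Optional[str]:
    """Return an error message, or None if the upload passes both checks."""
    if len(data) > MAX_UPLOAD_BYTES:
        return "File too large"
    if sniff_document_type(data) is None:
        return "Unsupported or unrecognized file type"
    return None
```

In the endpoint, a check like `validate_upload(file_content)` would fit right after `file.read()`, returning a 400 response when it yields a message. For a framework-level cap on request size, Flask also provides the `MAX_CONTENT_LENGTH` configuration key.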

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function api_get_chat_uploaded_documents 84.1% similar

    Flask API endpoint that retrieves a list of documents uploaded by the authenticated user for chat functionality, returning document metadata without full content.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py

  • function api_upload_document_v1 80.8% similar

    Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and extracts basic text content for processing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py

  • function api_upload_document 78.9% similar

    Flask API endpoint that handles document upload, validates file type and size, processes the document to extract text content, and stores the document metadata in the system.

    From: /tf/active/vicechatdev/vice_ai/app.py

  • function api_delete_chat_uploaded_document 78.8% similar

    Flask API endpoint that deletes a user's uploaded document by document ID, requiring authentication and returning success/error responses.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py

  • function upload_document 75.2% similar

    Flask route handler that processes file uploads, saves them securely to disk, and indexes the document content for retrieval-augmented generation (RAG) search.

    From: /tf/active/vicechatdev/docchat/blueprint.py