function api_upload_document_v1

Maturity: 54

Flask API endpoint that handles document file uploads, validates the file extension and size, stores the file in a temporary directory, and records placeholder text content (real text extraction is not yet implemented) for later processing.

File:
/tf/active/vicechatdev/vice_ai/new_app.py
Lines:
2339 - 2417
Complexity:
moderate

Purpose

This endpoint is the upload entry point for document-based workflows in a web application. It accepts common document formats (PDF, Word, Excel, PowerPoint, RTF, ODT), checks the filename extension against an allow-list and the file size against a 10MB limit, generates a UUID for each document, sanitizes the filename, saves the file in a newly created temporary directory, and records the upload metadata in the user's session. Text extraction is currently a placeholder string rather than real document content.
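
The validation rules described above can be isolated into a small pure function. This is a minimal sketch, not the code in new_app.py: the names validate_upload, ALLOWED_EXTENSIONS and MAX_UPLOAD_BYTES are hypothetical, and the check looks only at the filename extension and the reported size, as the endpoint does, so it does not inspect file contents.

```python
import os

# Values mirror the endpoint's rules; the constant names are illustrative.
ALLOWED_EXTENSIONS = {'.pdf', '.doc', '.docx', '.xls', '.xlsx',
                      '.ppt', '.pptx', '.rtf', '.odt'}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10MB


def validate_upload(filename, size_bytes):
    """Return an error message, or None when the upload passes both checks."""
    if not filename:
        return 'No file selected'
    # Compare the lowercased extension so 'REPORT.PDF' is accepted.
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f'File type not supported: {ext}'
    # The limit is inclusive: exactly 10MB is allowed, one byte more is not.
    if size_bytes > MAX_UPLOAD_BYTES:
        return 'File too large (max 10MB)'
    return None


print(validate_upload('report.PDF', 2048))
print(validate_upload('notes.txt', 2048))
print(validate_upload('slides.pptx', 11 * 1024 * 1024))
```

Keeping the checks in one function like this would make them easy to unit-test without a running Flask application.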

Source Code

def api_upload_document():
    """Upload and process a document"""
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400
        
        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No file selected'}), 400
        
        # Validate file type
        allowed_extensions = {'.pdf', '.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx', '.rtf', '.odt'}
        file_ext = os.path.splitext(file.filename)[1].lower()
        
        if file_ext not in allowed_extensions:
            return jsonify({'error': f'File type not supported: {file_ext}'}), 400
        
        # Validate file size (10MB limit)
        file.seek(0, os.SEEK_END)
        file_size = file.tell()
        file.seek(0)
        
        if file_size > 10 * 1024 * 1024:  # 10MB
            return jsonify({'error': 'File too large (max 10MB)'}), 400
        
        # Generate unique document ID and secure filename
        import uuid
        from werkzeug.utils import secure_filename
        import tempfile
        
        document_id = str(uuid.uuid4())
        filename = secure_filename(file.filename)
        
        # Create temp file
        temp_dir = tempfile.mkdtemp()
        file_path = os.path.join(temp_dir, f"{document_id}_{filename}")
        
        # Save file
        file.save(file_path)
        
        # For now, just extract basic text content
        # This could be enhanced with DocumentProcessor if needed
        try:
            if file_ext == '.pdf':
                # Basic PDF text extraction
                text_content = f"Document content placeholder for {filename}"
            else:
                # For other document types
                text_content = f"Document content placeholder for {filename}"
        except Exception as e:
            text_content = f"Could not extract text from {filename}"
        
        # Store in session for this user
        user_email = get_current_user()
        if 'uploaded_documents' not in session:
            session['uploaded_documents'] = {}
        
        session['uploaded_documents'][document_id] = {
            'id': document_id,
            'filename': filename,
            'file_path': file_path,
            'text_content': text_content,
            'size': file_size,
            'uploaded_at': datetime.now().isoformat()
        }
        
        logger.info(f"✅ Document uploaded successfully: {filename} ({file_size} bytes)")
        
        return jsonify({
            'document_id': document_id,
            'filename': filename,
            'text_content': text_content[:500] + '...' if len(text_content) > 500 else text_content,
            'size': file_size,
            'text_length': len(text_content)
        })
        
    except Exception as e:
        logger.error(f"Document upload error: {e}")
        return jsonify({'error': 'Failed to process document'}), 500
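
One detail of the session write in the source above may matter in practice. Flask's documentation notes that in-place changes to mutable values stored in the session are not detected automatically, so adding a second document to an existing uploaded_documents dict may not be persisted unless session.modified = True is set or the top-level key is reassigned. The following is a minimal sketch of the reassignment approach; record_upload is a hypothetical helper, and a plain dict stands in for the Flask session so the snippet runs on its own.

```python
def record_upload(session_obj, document_id, metadata):
    """Store upload metadata by reassigning the top-level session key.

    Copying the dict and assigning it back goes through the mapping's
    __setitem__, which is the kind of change Flask's session tracks.
    """
    documents = dict(session_obj.get('uploaded_documents', {}))
    documents[document_id] = metadata
    session_obj['uploaded_documents'] = documents
    return documents


# A plain dict stands in for flask.session in this standalone example.
fake_session = {}
record_upload(fake_session, 'doc-1', {'filename': 'a.pdf', 'size': 120})
record_upload(fake_session, 'doc-2', {'filename': 'b.docx', 'size': 340})
print(sorted(fake_session['uploaded_documents']))
```

With the default cookie-based session, the stored metadata also counts against the browser's cookie size limit, which is another reason to keep the session entry small.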

Return Value

On success, returns a Flask JSON response with the default status 200: {'document_id': str (UUID), 'filename': str (sanitized filename), 'text_content': str (the full content, or the first 500 characters followed by '...' when the content is longer), 'size': int (bytes), 'text_length': int (total characters before truncation)}. On error, returns a (response, status) tuple whose body is {'error': str (error message)}, with status 400 for validation errors (no file, empty filename, unsupported type, file too large) or 500 for unexpected processing errors.
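
The relationship between text_content and text_length in the success payload can be illustrated with a small helper. This is a sketch only: build_preview is a hypothetical name, and the 500-character limit is taken from the source above.

```python
def build_preview(text, limit=500):
    """Return (preview, full_length) the way the success response reports them."""
    # Only content longer than the limit is cut and given a '...' suffix.
    preview = text[:limit] + '...' if len(text) > limit else text
    return preview, len(text)


short_preview, short_len = build_preview('hello')
long_preview, long_len = build_preview('x' * 600)
print(short_preview, short_len)
print(len(long_preview), long_len)
```

A client can tell that the preview was truncated by checking whether text_length exceeds 500. Comparing the two lengths directly is not reliable, because the appended '...' adds three characters to the preview.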

Dependencies

  • flask
  • werkzeug
  • uuid
  • tempfile
  • os
  • datetime
  • logging

Required Imports

from flask import request, jsonify, session
import os
import uuid
from werkzeug.utils import secure_filename
import tempfile
from datetime import datetime
import logging
# Note: the function also uses logger and get_current_user, which must be available at module level in new_app.py

Conditional/Optional Imports

These imports are performed inside the function body rather than at module level. They run on every upload that passes validation, so all three are required:

  • import uuid

    Condition: imported inside function for document ID generation

  • from werkzeug.utils import secure_filename

    Condition: imported inside function for filename sanitization

  • import tempfile

    Condition: imported inside function for temporary directory creation

Usage Example

// Client-side usage (JavaScript fetch example):
const formData = new FormData();
formData.append('file', fileInput.files[0]);

fetch('/api/upload-document', {
  method: 'POST',
  body: formData,
  credentials: 'include'
})
.then(async response => {
  const data = await response.json();
  // Error responses (400/500) carry only an 'error' field
  if (!response.ok) {
    throw new Error(data.error);
  }
  return data;
})
.then(data => {
  console.log('Document ID:', data.document_id);
  console.log('Filename:', data.filename);
  console.log('Size:', data.size, 'bytes');
  console.log('Text preview:', data.text_content);
})
.catch(error => console.error('Upload failed:', error));

# Python requests example:
import requests

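with open('document.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post(
        'http://localhost:5000/api/upload-document',
        files=files,
        cookies={'session': 'your_session_cookie'}
    )

# Error responses (400/500) carry only an 'error' field
if response.ok:
    result = response.json()
    print(f"Uploaded: {result['document_id']}")
else:
    print(f"Upload failed: {response.json().get('error')}")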
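
Because error responses carry only an 'error' field, client code may want to normalise both outcomes before using the result. The following is a sketch under that assumption; summarize_upload_response is a hypothetical helper that takes the HTTP status code and the decoded JSON body, so it can be used with any HTTP client.

```python
def summarize_upload_response(status_code, payload):
    """Turn a status code and decoded JSON body into a one-line summary."""
    if status_code == 200:
        return (f"Uploaded {payload['filename']} as "
                f"{payload['document_id']} ({payload['size']} bytes)")
    # 400 covers validation failures; anything else is treated as a failure
    # on the server side (the endpoint uses 500 for processing errors).
    kind = 'rejected' if status_code == 400 else 'failed'
    return f"Upload {kind}: {payload.get('error', 'unknown error')}"


ok_summary = summarize_upload_response(
    200, {'document_id': 'abc', 'filename': 'report.pdf', 'size': 2048})
bad_summary = summarize_upload_response(
    400, {'error': 'File too large (max 10MB)'})
print(ok_summary)
print(bad_summary)
```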

Best Practices

  • Always send files as multipart/form-data with the key 'file' in the request
  • Ensure the user is authenticated before calling this endpoint; a require_auth decorator is expected to enforce this, although no decorators appear in the source excerpt above
  • Maximum file size is 10MB - larger files will be rejected with 400 error
  • Supported file types: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .rtf, .odt
  • Document metadata is stored in session and will be lost when session expires
  • Files are stored in temporary directories created with tempfile.mkdtemp() and are never cleaned up automatically - implement a cleanup job or use context managers in production
  • The current implementation uses placeholder text extraction - enhance with actual DocumentProcessor for production use
  • Document IDs are UUIDs and should be stored client-side for subsequent operations
  • The function sanitizes filenames using secure_filename to prevent directory traversal attacks
  • Consider implementing virus scanning for uploaded files in production environments
  • Session storage of file paths may not work in distributed environments - consider database storage for production
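
The cleanup recommendation above could be implemented as a periodic job. The following is a minimal sketch under an assumption that differs from the current source: it presumes each upload directory is created under one dedicated root, for example with tempfile.mkdtemp(dir=root), whereas the endpoint currently calls tempfile.mkdtemp() with no arguments. The function name cleanup_stale_upload_dirs and the one-hour threshold are illustrative.

```python
import os
import shutil
import tempfile
import time


def cleanup_stale_upload_dirs(root, max_age_seconds, now=None):
    """Delete subdirectories of root last modified more than max_age_seconds ago.

    Returns the list of removed directory paths.
    """
    now = time.time() if now is None else now
    # Snapshot the entries first so nothing is deleted while iterating.
    with os.scandir(root) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            shutil.rmtree(entry.path, ignore_errors=True)
            removed.append(entry.path)
    return removed


# Demonstration with a throwaway root directory.
root = tempfile.mkdtemp(prefix='uploads_root_')
old_dir = tempfile.mkdtemp(dir=root)
new_dir = tempfile.mkdtemp(dir=root)
two_hours_ago = time.time() - 2 * 3600
os.utime(old_dir, (two_hours_ago, two_hours_ago))  # backdate one directory

removed = cleanup_stale_upload_dirs(root, max_age_seconds=3600)
old_dir_gone = not os.path.exists(old_dir)
new_dir_survived = os.path.isdir(new_dir)
print(len(removed), old_dir_gone, new_dir_survived)

shutil.rmtree(root)  # tidy up the demonstration directory
```

A job like this only removes files. Session entries that still point at a deleted file_path would need to be handled separately.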

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function api_upload_document 89.1% similar

    Flask API endpoint that handles document upload, validates file type and size, processes the document to extract text content, and stores the document metadata in the system.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function api_chat_upload_document 80.8% similar

    Flask API endpoint that handles document upload for chat context, processes the document to extract text content, and stores it for later retrieval in chat sessions.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function api_upload 80.8% similar

    Flask API endpoint that handles file uploads, validates file types, saves files to a configured directory structure, and automatically indexes the uploaded document for search/retrieval.

    From: /tf/active/vicechatdev/docchat/app.py
  • function upload_document 79.2% similar

    Flask route handler that processes file uploads, saves them securely to disk, and indexes the document content for retrieval-augmented generation (RAG) search.

    From: /tf/active/vicechatdev/docchat/blueprint.py
  • function api_list_documents_v1 74.7% similar

    Flask API endpoint that retrieves and returns a list of all documents uploaded by the currently authenticated user, including metadata such as filename, size, and creation date.

    From: /tf/active/vicechatdev/vice_ai/app.py