
function api_upload

Maturity: 52

Flask API endpoint that handles file uploads, validates file types, saves files to a configured directory structure, and automatically indexes the uploaded document for search/retrieval.

File: /tf/active/vicechatdev/docchat/app.py
Lines: 1198 - 1266
Complexity: moderate

Purpose

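This endpoint accepts file uploads for a document management system with RAG (Retrieval-Augmented Generation) capabilities. It checks the uploaded file's extension against config.SUPPORTED_EXTENSIONS, saves the file under config.DOCUMENT_DIR (optionally in a subfolder named by the folder form field), rejects the upload if a file with the same sanitized name already exists in the target folder, and then indexes the document for semantic search. The filename is sanitized with secure_filename(); the folder value is only normalized (backslashes converted to forward slashes, leading and trailing slashes stripped) and is not checked for ".." segments in the code shown. The endpoint is protected by the login_required decorator (the decorator lines are not part of the extract below) and returns JSON responses describing upload and indexing status.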

Source Code

def api_upload():
    """Upload and index a document"""
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400
        
        file = request.files['file']
        
        if file.filename == '':
            return jsonify({'error': 'Empty filename'}), 400
        
        # Check file extension
        file_ext = Path(file.filename).suffix.lower()
        if file_ext not in config.SUPPORTED_EXTENSIONS:
            return jsonify({
                'error': f'Unsupported file type. Supported: {", ".join(config.SUPPORTED_EXTENSIONS)}'
            }), 400
        
        # Get optional folder path (relative to qa_docs)
        folder_path = request.form.get('folder', '').strip()
        
        # Determine save location
        if folder_path:
            # Clean and validate folder path
            folder_path = folder_path.replace('\\', '/').strip('/')
            # Save to qa_docs subfolder
            target_dir = config.DOCUMENT_DIR / folder_path
        else:
            # Save to qa_docs root
            target_dir = config.DOCUMENT_DIR
        
        # Create directory if it doesn't exist
        target_dir.mkdir(parents=True, exist_ok=True)
        
        # Save file with secure filename
        filename = secure_filename(file.filename)
        filepath = target_dir / filename
        
        # Check if file already exists
        if filepath.exists():
            return jsonify({'error': f'File "{filename}" already exists in this folder'}), 400
        
        file.save(str(filepath))
        logger.info(f"Uploaded file: {filepath.relative_to(config.DOCUMENT_DIR)}")
        
        # Auto-index the document
        if not document_indexer:
            return jsonify({'error': 'Document indexer not initialized'}), 500
        
        result = document_indexer.index_document(filepath)
        
        if result.get('success'):
            return jsonify({
                'message': f'Successfully uploaded and indexed {filename}',
                'doc_id': result['doc_id'],
                'num_chunks': result['num_chunks'],
                'path': str(filepath.relative_to(config.DOCUMENT_DIR))
            })
        else:
            # If indexing fails, still report success for upload but mention indexing issue
            return jsonify({
                'message': f'Uploaded {filename} but indexing failed',
                'error': result.get('error', 'Unknown indexing error'),
                'path': str(filepath.relative_to(config.DOCUMENT_DIR))
            }), 206  # 206 Partial Content
    
    except Exception as e:
        logger.error(f"Upload error: {e}")
        return jsonify({'error': str(e)}), 500

Return Value

Returns a Flask JSON response. On success (200, the implicit default because no status code is passed): {'message': str, 'doc_id': str, 'num_chunks': int, 'path': str}. On partial success (206): {'message': str, 'error': str, 'path': str} when the file was saved but indexing failed. On error: {'error': str} with status 400 (no file part, empty filename, unsupported extension, or a file with the same name already exists in the target folder) or 500 (indexer not initialized, or any unhandled exception). The indexer check runs after file.save(), so a 500 for an uninitialized indexer still leaves the uploaded file on disk.
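
The status codes and payload shapes listed above can be handled on the client with a small dispatcher. This is a minimal sketch, not code from app.py: describe_upload_result is a hypothetical helper name, and it assumes the payload keys are exactly those documented in this section.

```python
from typing import Any, Dict


def describe_upload_result(status_code: int, payload: Dict[str, Any]) -> str:
    """Summarize an api_upload response in one line.

    Hypothetical client-side helper; it only relies on the status codes
    and JSON keys documented for this endpoint.
    """
    if status_code == 200:
        # Full success: the file is saved and indexed.
        return (
            f"indexed {payload['path']} as {payload['doc_id']} "
            f"({payload['num_chunks']} chunks)"
        )
    if status_code == 206:
        # Partial success: the file is on disk but not searchable yet.
        return f"saved {payload['path']} but not indexed: {payload['error']}"
    if status_code == 400:
        # Validation failure: nothing was saved.
        return f"rejected: {payload['error']}"
    # 500 or any unexpected status code.
    return f"server error ({status_code}): {payload.get('error', 'no details')}"
```

Treating 206 as its own branch matters because the file already exists on the server at that point, so retrying the same upload would hit the duplicate-name check and return 400.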

Dependencies

  • flask
  • werkzeug
  • pathlib (standard library)
  • logging (standard library)
  • config (custom module)
  • document_indexer (custom module)
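
The endpoint only reads four keys from the indexer result: success, doc_id, num_chunks, and error. A stand-in with the same contract can be useful when testing the upload path without the real indexer. The sketch below is an assumption-laden illustration: StubIndexer and its fail_with parameter are hypothetical, and the real document_indexer may return additional keys.

```python
from pathlib import Path
from typing import Any, Dict, List


class StubIndexer:
    """Hypothetical test double for the document indexer.

    It mimics only the result keys that api_upload reads:
    'success', 'doc_id', 'num_chunks', and 'error'.
    """

    def __init__(self, fail_with: str = "") -> None:
        # When fail_with is non-empty, every call reports an indexing failure.
        self.fail_with = fail_with
        self.indexed: List[Path] = []

    def index_document(self, filepath: Path) -> Dict[str, Any]:
        if self.fail_with:
            return {"success": False, "error": self.fail_with}
        self.indexed.append(filepath)
        # doc_id and num_chunks are placeholder values, not real index data.
        return {
            "success": True,
            "doc_id": f"doc-{len(self.indexed)}",
            "num_chunks": 1,
        }
```

With fail_with set, the endpoint's 206 branch can be exercised; with the default, the 200 branch.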

Required Imports

from flask import request, jsonify
from werkzeug.utils import secure_filename
from pathlib import Path
import logging
import config
from document_indexer import DocumentIndexer

Usage Example

# Client-side usage example (using requests library)
import requests

# Upload file to root directory
with open('document.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post(
        'http://localhost:5000/api/upload',
        files=files,
        cookies={'session': 'your_session_cookie'}
    )
    print(response.json())

# Upload file to specific folder
with open('report.docx', 'rb') as f:
    files = {'file': f}
    data = {'folder': 'reports/2024'}
    response = requests.post(
        'http://localhost:5000/api/upload',
        files=files,
        data=data,
        cookies={'session': 'your_session_cookie'}
    )
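    result = response.json()
    if response.status_code == 200:
        print(f"Uploaded: {result['doc_id']}, Chunks: {result['num_chunks']}")
    elif response.status_code == 206:
        # The file was saved, but indexing failed
        print(f"Saved {result['path']} but indexing failed: {result['error']}")
    else:
        # 400 (validation error) or 500 (server error)
        print(f"Upload failed: {result['error']}")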

Best Practices

  • Always use secure_filename() to sanitize uploaded filenames to prevent directory traversal attacks
  • Check file existence before saving to prevent accidental overwrites
  • Validate file extensions against a whitelist (config.SUPPORTED_EXTENSIONS) to reject unexpected file types; an extension check alone does not verify the file's actual content
  • Use Path objects for cross-platform file path handling
  • Create directories with parents=True and exist_ok=True to handle nested folder structures safely
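  • Return appropriate HTTP status codes: 200 for success, 400 for client errors, 500 for server errors. This endpoint also returns 206 for partial success; HTTP defines 206 Partial Content for responses to range requests, so clients should treat it here as an application-specific signal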
  • Log all upload operations for audit trails and debugging
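  • Handle indexing failures gracefully: still report upload success even if indexing fails
  • Ensure document_indexer is initialized before accepting uploads; in the code shown, the check runs after the file is saved, so a failed check leaves an unindexed file on disk
  • Normalize folder paths by replacing backslashes and stripping leading/trailing slashes, and reject ".." segments, which this normalization does not remove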
  • Use relative paths in responses to avoid exposing absolute server paths
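
Folder normalization as done in the endpoint does not by itself stop a value such as ../outside from resolving to a location outside the document directory. A minimal sketch of a containment check follows, assuming the base directory is a pathlib.Path like config.DOCUMENT_DIR. resolve_upload_dir is a hypothetical helper and is not part of app.py.

```python
from pathlib import Path


def resolve_upload_dir(base_dir: Path, folder: str) -> Path:
    """Return the target directory for `folder`, refusing paths outside base_dir.

    Hypothetical helper: applies the endpoint's normalization, then checks
    that the resolved target still lies under the resolved base directory.
    """
    # Same normalization as the endpoint: unify separators, trim slashes.
    cleaned = folder.strip().replace("\\", "/").strip("/")
    base = base_dir.resolve()
    # resolve() collapses ".." segments, so an escape becomes visible here.
    target = (base / cleaned).resolve()
    try:
        target.relative_to(base)
    except ValueError:
        raise ValueError(
            f"folder escapes the document directory: {folder!r}"
        ) from None
    return target
```

A caller would translate the ValueError into a 400 response before creating any directories. Resolving both paths first means the comparison is made on absolute, normalized paths rather than on the raw string.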

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function api_upload_document 87.7% similar

    Flask API endpoint that handles document upload, validates file type and size, processes the document to extract text content, and stores the document metadata in the system.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function upload_document 87.6% similar

    Flask route handler that processes file uploads, saves them securely to disk, and indexes the document content for retrieval-augmented generation (RAG) search.

    From: /tf/active/vicechatdev/docchat/blueprint.py
  • function api_upload_document_v1 80.8% similar

    Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and extracts basic text content for processing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function api_delete_chat_uploaded_document 70.8% similar

    Flask API endpoint that deletes a user's uploaded document by document ID, requiring authentication and returning success/error responses.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function api_list_documents_v1 69.5% similar

    Flask API endpoint that retrieves and returns a list of all documents uploaded by the currently authenticated user, including metadata such as filename, size, and creation date.

    From: /tf/active/vicechatdev/vice_ai/app.py