🔍 Code Extractor

function upload_document

Maturity: 50

Flask route handler that processes file uploads, saves them securely to disk, and indexes the document content for retrieval-augmented generation (RAG) search.

File: /tf/active/vicechatdev/docchat/blueprint.py
Lines: 187-216
Complexity: moderate

Purpose

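This endpoint handles the document-ingestion workflow of a document chat application. It first checks that the request contains a file part with a non-empty filename. It then sanitizes the filename with secure_filename, saves the file to the configured upload directory (config.UPLOAD_DIR), and passes the saved path to DocumentIndexer so the content becomes searchable. No file type or size validation is performed. According to the module's imports, the route is protected by flask_login authentication; the route and login decorators are not part of the extracted source. It returns JSON responses with HTTP status codes that indicate success or failure.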

Source Code

def upload_document():
    """Handle document upload and indexing"""
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400
        
        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No file selected'}), 400
        
        # Save file
        filename = secure_filename(file.filename)
        upload_path = Path(config.UPLOAD_DIR) / filename
        file.save(str(upload_path))
        
        # Index document
        indexer = DocumentIndexer()
        result = indexer.index_single_file(str(upload_path))
        
        if result['success']:
            return jsonify({
                'success': True,
                'message': f'Document indexed: {result["chunks_added"]} chunks added'
            })
        else:
            return jsonify({'error': result.get('error', 'Indexing failed')}), 500
            
    except Exception as e:
        logger.error(f"Error uploading document: {e}", exc_info=True)
        return jsonify({'error': str(e)}), 500

Return Value

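On success, returns a JSON response {'success': True, 'message': str} with Flask's default status 200. The message reports the number of chunks added (for example 'Document indexed: 42 chunks added'). On validation failure (no 'file' part in the request, or an empty filename), returns ({'error': str}, 400). On processing failure, returns ({'error': str}, 500). The error text is either the indexer's own error message ('Indexing failed' if it provides none) or, for any unexpected exception, str(e) of that exception. The body is always a JSON object containing either 'success' and 'message' keys or a single 'error' key. The success path returns a bare response object, while the error paths return (response, status) tuples.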
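The status-code mapping described above can be written as a small pure function. This is a sketch, not code from blueprint.py. `to_response` and `validation_error` are hypothetical helpers. The `result` dict shape (`success`, `chunks_added`, optional `error`) is inferred from how the handler reads the return value of `DocumentIndexer.index_single_file`. The helpers return plain `(payload, status)` pairs instead of Flask responses, so they can be checked without an application.

```python
from typing import Any, Dict, Tuple


def to_response(result: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Hypothetical helper: map an indexer result to (JSON payload, HTTP status)
    the same way upload_document does after the file has been saved."""
    if result['success']:
        # A bare jsonify(...) in Flask is sent with the default status 200
        message = f"Document indexed: {result['chunks_added']} chunks added"
        return {'success': True, 'message': message}, 200
    # Indexer failure: use its error text, or a generic fallback if it has none
    return {'error': result.get('error', 'Indexing failed')}, 500


def validation_error(message: str) -> Tuple[Dict[str, Any], int]:
    """Hypothetical helper: the 400 payload for a missing file or empty filename."""
    return {'error': message}, 400
```

For example, `to_response({'success': True, 'chunks_added': 42})` yields the message 'Document indexed: 42 chunks added' with status 200. A failure result without an `error` key falls back to 'Indexing failed' with status 500.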

Dependencies

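  • flask
  • flask-login
  • werkzeug
  • pathlib and logging (Python standard library; no installation needed)
  • config and document_indexer (project-local modules)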

Required Imports

from flask import Blueprint, request, jsonify
from flask_login import login_required, current_user
from werkzeug.utils import secure_filename
from pathlib import Path
import logging
logger = logging.getLogger(__name__)  # assumed: the module-level logger the except block uses
from config import config
from document_indexer import DocumentIndexer

Usage Example

# In Flask application setup
from flask import Flask
from flask_login import LoginManager
from your_module import docchat_bp  # replace with the module that defines the blueprint

app = Flask(__name__)
app.config['SECRET_KEY'] = 'your-secret-key'
login_manager = LoginManager()
login_manager.init_app(app)

# Flask-Login needs a user loader before a login-protected route can run
@login_manager.user_loader
def load_user(user_id):
    return None  # placeholder: look the user up in your own user store

# Register blueprint
app.register_blueprint(docchat_bp)

Client-side usage (JavaScript fetch):

const formData = new FormData();
formData.append('file', fileInput.files[0]);  // field name must be 'file'

fetch('/api/upload', {
  method: 'POST',
  body: formData,
  credentials: 'include'  // send the session cookie so the login check passes
})
  .then(response => response.json())
  .then(data => {
    if (data.success) {
      console.log(data.message); // e.g. 'Document indexed: 42 chunks added'
    } else {
      console.error(data.error);
    }
  });

Python client using requests:

import requests

session = requests.Session()  # assumed to be logged in already (login request not shown)
with open('document.pdf', 'rb') as f:
    response = session.post('http://localhost:5000/api/upload', files={'file': f})
print(response.status_code, response.json())
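Both client examples send a multipart/form-data request whose part is named `file`, which is the key the handler looks up in `request.files`. The following sketch builds such a body with only the standard library. This can help when debugging a client that keeps receiving the 400 'No file provided' response. `build_multipart` is a hypothetical helper, not part of the project. The optional fixed `boundary` argument exists only so the output is predictable; by default a random boundary is generated.

```python
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = 'application/octet-stream',
                    boundary: str = '') -> Tuple[bytes, str]:
    """Hypothetical helper: encode a single file part as multipart/form-data.

    Returns (body, content_type_header). For upload_document to find the
    file in request.files, `field` must be 'file'.
    """
    boundary = boundary or uuid.uuid4().hex
    # Part headers: the form field name and the original filename
    head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n'
        '\r\n'
    ).encode('utf-8')
    # Closing delimiter ends the multipart body
    tail = f'\r\n--{boundary}--\r\n'.encode('utf-8')
    return head + data + tail, f'multipart/form-data; boundary={boundary}'
```

You can pass the body and header to `urllib.request.Request(url, data=body, headers={'Content-Type': ctype}, method='POST')`. The login session cookie is still required, because the route is authenticated.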

Best Practices

  • Always sanitize uploaded filenames with secure_filename() to prevent directory traversal attacks, and reject the upload if it returns an empty string
  • Enforce a maximum upload size (for example with Flask's MAX_CONTENT_LENGTH setting) before processing uploads
  • Ensure UPLOAD_DIR exists and has appropriate permissions before deployment
  • Consider implementing virus scanning for uploaded files in production
  • Add rate limiting to prevent abuse of the upload endpoint
  • Delete the saved file when indexing fails; the current handler leaves it in UPLOAD_DIR, where failed uploads accumulate
  • Log all upload attempts with user information for audit trails
  • Consider using background tasks (Celery, RQ) for indexing large documents to avoid request timeouts
  • Validate file extensions against an allowlist of supported document types
  • Return generic error messages to clients; the current exception handler returns str(e), which can expose internal paths or library details
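Several of the practices above (sanitized filenames, an extension allowlist, making sure UPLOAD_DIR exists, and cleanup of failed uploads) can be gathered into two small helpers. This is a standard-library sketch under stated assumptions, not the project's code. `ALLOWED_EXTENSIONS`, `plan_upload_path`, and `index_or_cleanup` are hypothetical names. `safe_name` is assumed to be the output of werkzeug's secure_filename, which its documentation notes may be an empty string. `index_fn` stands in for `DocumentIndexer().index_single_file`, assumed to return a dict with a `success` key as the handler expects.

```python
import uuid
from pathlib import Path
from typing import Any, Callable, Dict

# Hypothetical allowlist; adjust to the formats DocumentIndexer actually supports
ALLOWED_EXTENSIONS = {'.pdf', '.docx', '.txt', '.md'}


def plan_upload_path(upload_dir: str, safe_name: str) -> Path:
    """Return a unique target path for an already-sanitized filename.

    Raises ValueError for empty names or extensions outside the allowlist.
    """
    if not safe_name:
        # secure_filename() returns '' for names such as '../../'
        raise ValueError('Invalid filename')
    suffix = Path(safe_name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f'Unsupported file type: {suffix or "(none)"}')
    directory = Path(upload_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the upload dir if missing
    # Random prefix so two uploads named 'report.pdf' do not overwrite each other
    return directory / f'{uuid.uuid4().hex[:8]}_{safe_name}'


def index_or_cleanup(path: Path,
                     index_fn: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Run the indexer; delete the saved file if indexing fails or raises."""
    try:
        result = index_fn(str(path))
    except Exception:
        path.unlink(missing_ok=True)  # remove the orphaned upload, then re-raise
        raise
    if not result.get('success'):
        path.unlink(missing_ok=True)  # do not leave unindexed files on disk
    return result
```

In upload_document, these helpers would wrap the existing `file.save` and `indexer.index_single_file` calls. A ValueError from `plan_upload_path` would map to a 400 response rather than the generic 500.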

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function api_upload 87.6% similar

    Flask API endpoint that handles file uploads, validates file types, saves files to a configured directory structure, and automatically indexes the uploaded document for search/retrieval.

    From: /tf/active/vicechatdev/docchat/app.py
  • function api_upload_document 85.7% similar

    Flask API endpoint that handles document upload, validates file type and size, processes the document to extract text content, and stores the document metadata in the system.

    From: /tf/active/vicechatdev/vice_ai/app.py
  • function api_upload_document_v1 79.2% similar

    Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and extracts basic text content for processing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function api_chat_upload_document 75.2% similar

    Flask API endpoint that handles document upload for chat context, processes the document to extract text content, and stores it for later retrieval in chat sessions.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function api_delete_chat_uploaded_document 74.0% similar

    Flask API endpoint that deletes a user's uploaded document by document ID, requiring authentication and returning success/error responses.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py