function api_upload
Flask API endpoint that handles file uploads, validates file types, saves files to a configured directory structure, and automatically indexes the uploaded document for search/retrieval.
/tf/active/vicechatdev/docchat/app.py
1198 - 1266
moderate
Purpose
This endpoint provides a file upload mechanism for a document management system with RAG (Retrieval-Augmented Generation) capabilities. It validates the uploaded file's extension against config.SUPPORTED_EXTENSIONS, saves the file under config.DOCUMENT_DIR (optionally inside a caller-supplied subfolder), rejects the upload if a file with the same sanitized name already exists in the target folder, and then indexes the document for semantic search. The endpoint is protected by the login_required decorator and returns JSON describing the upload and indexing status.
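The extension check described above can be sketched in isolation. This is a minimal illustration, not the module's own helper: the function name is_supported and the example extension set are assumptions, since the real contents of config.SUPPORTED_EXTENSIONS are not shown here.

```python
from pathlib import Path

# Hypothetical stand-in for config.SUPPORTED_EXTENSIONS; the real values
# are defined in the project's config module and are not shown in this page.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def is_supported(filename: str, supported=SUPPORTED_EXTENSIONS) -> bool:
    """Return True when the final suffix of filename, lowercased, is allowed."""
    # Path.suffix only returns the last suffix, so "a.pdf.exe" yields ".exe".
    return Path(filename).suffix.lower() in supported
```

Because only the final suffix is compared, a name such as report.pdf.exe is rejected, while Report.PDF is accepted regardless of case. The check looks at the name only and says nothing about the file's actual content.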
Source Code
def api_upload():
    """Upload and index a document"""
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400

        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'Empty filename'}), 400

        # Check file extension
        file_ext = Path(file.filename).suffix.lower()
        if file_ext not in config.SUPPORTED_EXTENSIONS:
            return jsonify({
                'error': f'Unsupported file type. Supported: {", ".join(config.SUPPORTED_EXTENSIONS)}'
            }), 400

        # Get optional folder path (relative to qa_docs)
        folder_path = request.form.get('folder', '').strip()

        # Determine save location
        if folder_path:
            # Clean and validate folder path
            folder_path = folder_path.replace('\\', '/').strip('/')
            # Save to qa_docs subfolder
            target_dir = config.DOCUMENT_DIR / folder_path
        else:
            # Save to qa_docs root
            target_dir = config.DOCUMENT_DIR

        # Create directory if it doesn't exist
        target_dir.mkdir(parents=True, exist_ok=True)

        # Save file with secure filename
        filename = secure_filename(file.filename)
        filepath = target_dir / filename

        # Check if file already exists
        if filepath.exists():
            return jsonify({'error': f'File "{filename}" already exists in this folder'}), 400

        file.save(str(filepath))
        logger.info(f"Uploaded file: {filepath.relative_to(config.DOCUMENT_DIR)}")

        # Auto-index the document
        if not document_indexer:
            return jsonify({'error': 'Document indexer not initialized'}), 500

        result = document_indexer.index_document(filepath)

        if result.get('success'):
            return jsonify({
                'message': f'Successfully uploaded and indexed {filename}',
                'doc_id': result['doc_id'],
                'num_chunks': result['num_chunks'],
                'path': str(filepath.relative_to(config.DOCUMENT_DIR))
            })
        else:
            # If indexing fails, still report success for upload but mention indexing issue
            return jsonify({
                'message': f'Uploaded {filename} but indexing failed',
                'error': result.get('error', 'Unknown indexing error'),
                'path': str(filepath.relative_to(config.DOCUMENT_DIR))
            }), 206  # 206 Partial Content

    except Exception as e:
        logger.error(f"Upload error: {e}")
        return jsonify({'error': str(e)}), 500
Return Value
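Returns a Flask JSON response. Every non-200 outcome is returned as a (response, status_code) tuple; full success returns the response alone, which Flask sends with status 200.
- 200 (full success): {'message': str, 'doc_id': str, 'num_chunks': int, 'path': str}
- 206 (partial success): {'message': str, 'error': str, 'path': str}; the file was saved but indexing failed
- 400 (validation error): {'error': str}; no file was provided, the filename is empty, the extension is unsupported, or a file with the same name already exists in the target folder
- 500 (server error): {'error': str}; the document indexer is not initialized (in which case the file has already been saved to disk) or an unexpected exception was raised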
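A client has to treat 206 differently from both 200 and the error codes, because the file is stored but not searchable. The following is a rough sketch of how a caller might branch on the documented response shapes; the function name summarize_upload is hypothetical and not part of the application.

```python
def summarize_upload(status_code: int, payload: dict) -> str:
    """Turn the endpoint's documented status/payload pairs into a short message."""
    if status_code == 200:
        # Full success: the document was saved and indexed.
        return (
            f"indexed {payload['path']} as {payload['doc_id']} "
            f"({payload['num_chunks']} chunks)"
        )
    if status_code == 206:
        # Partial success: the file is on disk, but indexing failed.
        return f"stored {payload['path']} but not indexed: {payload['error']}"
    # 400 and 500 responses carry only an 'error' key.
    return f"upload failed ({status_code}): {payload.get('error', 'unknown error')}"
```

In practice this would be called as summarize_upload(response.status_code, response.json()) on the object returned by requests.post.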
Dependencies
- flask
- werkzeug
- pathlib
- logging
- config (custom module)
- document_indexer (custom module)
Required Imports
from flask import request, jsonify
from werkzeug.utils import secure_filename
from pathlib import Path
import logging
import config
from document_indexer import DocumentIndexer
Usage Example
# Client-side usage example (using requests library)
import requests

# Upload file to root directory
with open('document.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post(
        'http://localhost:5000/api/upload',
        files=files,
        cookies={'session': 'your_session_cookie'}
    )
    print(response.json())

# Upload file to specific folder
with open('report.docx', 'rb') as f:
    files = {'file': f}
    data = {'folder': 'reports/2024'}
    response = requests.post(
        'http://localhost:5000/api/upload',
        files=files,
        data=data,
        cookies={'session': 'your_session_cookie'}
    )

result = response.json()
if response.status_code == 200:
    print(f"Uploaded: {result['doc_id']}, Chunks: {result['num_chunks']}")
Best Practices
- Always use secure_filename() to sanitize uploaded filenames to prevent directory traversal attacks
- Check file existence before saving to prevent accidental overwrites
- Validate file extensions against a whitelist (config.SUPPORTED_EXTENSIONS) to prevent malicious uploads
- Use Path objects for cross-platform file path handling
- Create directories with parents=True and exist_ok=True to handle nested folder structures safely
- Return appropriate HTTP status codes: 200 for success, 206 for partial success, 400 for client errors, 500 for server errors
- Log all upload operations for audit trails and debugging
- Handle indexing failures gracefully - still report upload success even if indexing fails
- Ensure document_indexer is initialized before accepting uploads
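- Normalize folder paths by replacing backslashes and stripping leading/trailing slashes; this normalization alone does not reject '..' segments, so the resolved target directory should also be checked to lie inside config.DOCUMENT_DIR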
- Use relative paths in responses to avoid exposing absolute server paths
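The endpoint sanitizes the filename with secure_filename(), but the folder form field is only normalized, not validated. One way to confine it to the document root is sketched below. This is an illustrative hardening step, not code from the application; the helper name resolve_upload_dir is hypothetical, and base_dir stands in for config.DOCUMENT_DIR.

```python
from pathlib import Path


def resolve_upload_dir(base_dir: Path, folder: str) -> Path:
    """Resolve a user-supplied folder under base_dir, rejecting any escape."""
    base = base_dir.resolve()
    # Same normalization the endpoint applies to the 'folder' form field.
    cleaned = folder.replace("\\", "/").strip("/")
    # resolve() collapses '..' segments (and symlinks that exist on disk).
    target = (base / cleaned).resolve()
    try:
        # relative_to raises ValueError when target is not inside base.
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"Folder escapes the document root: {folder!r}") from None
    return target
```

A caller could catch the ValueError and return a 400 response before any directory is created, so that a value such as ../outside never reaches mkdir().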
Similar Components
AI-powered semantic similarity - components with related functionality:
- function api_upload_document (87.7% similar)
- function upload_document (87.6% similar)
- function api_upload_document_v1 (80.8% similar)
- function api_delete_chat_uploaded_document (70.8% similar)
- function api_list_documents_v1 (69.5% similar)