function api_upload_document_v1
Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and returns placeholder text content (real text extraction is not yet implemented).
/tf/active/vicechatdev/vice_ai/new_app.py
2339 - 2417
moderate
Purpose
This endpoint provides a secure document upload mechanism for a web application. It accepts various document formats (PDF, Word, Excel, PowerPoint, RTF, ODT), validates them against size (10MB max) and type constraints, generates unique identifiers, stores files in temporary directories, and maintains upload metadata in the user's session. The function is designed to be the entry point for document-based workflows where users need to upload files for further processing or analysis.
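The type and size checks described above can be sketched as a standalone helper, which makes the rejection rules easier to read and test in isolation. The extension set, size limit, and error messages are copied from the endpoint's source; the helper name `validate_upload` and its signature are hypothetical and do not appear in the original module.

```python
import os

# Values copied from the endpoint; validate_upload itself is a hypothetical helper.
ALLOWED_EXTENSIONS = {'.pdf', '.doc', '.docx', '.xls', '.xlsx',
                      '.ppt', '.pptx', '.rtf', '.odt'}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # 10MB


def validate_upload(filename, size_bytes):
    """Return an error message for a rejected upload, or None if it passes."""
    if filename == '':
        return 'No file selected'
    # Compare the lower-cased extension so 'REPORT.PDF' is accepted.
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f'File type not supported: {ext}'
    # The limit is inclusive: exactly 10MB is allowed, one byte more is not.
    if size_bytes > MAX_SIZE_BYTES:
        return 'File too large (max 10MB)'
    return None
```

Note that the check looks only at the file extension, not at the file's actual contents, so it filters by name rather than verifying the format.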
Source Code
def api_upload_document():
    """Upload and process a document"""
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400

        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No file selected'}), 400

        # Validate file type
        allowed_extensions = {'.pdf', '.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx', '.rtf', '.odt'}
        file_ext = os.path.splitext(file.filename)[1].lower()
        if file_ext not in allowed_extensions:
            return jsonify({'error': f'File type not supported: {file_ext}'}), 400

        # Validate file size (10MB limit)
        file.seek(0, os.SEEK_END)
        file_size = file.tell()
        file.seek(0)
        if file_size > 10 * 1024 * 1024:  # 10MB
            return jsonify({'error': 'File too large (max 10MB)'}), 400

        # Generate unique document ID and secure filename
        import uuid
        from werkzeug.utils import secure_filename
        import tempfile

        document_id = str(uuid.uuid4())
        filename = secure_filename(file.filename)

        # Create temp file
        temp_dir = tempfile.mkdtemp()
        file_path = os.path.join(temp_dir, f"{document_id}_{filename}")

        # Save file
        file.save(file_path)

        # For now, just extract basic text content
        # This could be enhanced with DocumentProcessor if needed
        try:
            if file_ext == '.pdf':
                # Basic PDF text extraction
                text_content = f"Document content placeholder for {filename}"
            else:
                # For other document types
                text_content = f"Document content placeholder for {filename}"
        except Exception as e:
            text_content = f"Could not extract text from {filename}"

        # Store in session for this user
        user_email = get_current_user()
        if 'uploaded_documents' not in session:
            session['uploaded_documents'] = {}

        session['uploaded_documents'][document_id] = {
            'id': document_id,
            'filename': filename,
            'file_path': file_path,
            'text_content': text_content,
            'size': file_size,
            'uploaded_at': datetime.now().isoformat()
        }
        # Flask does not detect changes inside nested session objects,
        # so flag the session as modified to make sure the entry is saved
        session.modified = True

        logger.info(f"✅ Document uploaded successfully: {filename} ({file_size} bytes)")

        return jsonify({
            'document_id': document_id,
            'filename': filename,
            'text_content': text_content[:500] + '...' if len(text_content) > 500 else text_content,
            'size': file_size,
            'text_length': len(text_content)
        })

    except Exception as e:
        logger.error(f"Document upload error: {e}")
        return jsonify({'error': 'Failed to process document'}), 500
Return Value
On success, returns a Flask JSON response with the default status 200: {'document_id': str (UUID), 'filename': str (sanitized filename), 'text_content': str (the full content, or the first 500 characters followed by '...' when the content is longer), 'size': int (bytes), 'text_length': int (total characters before truncation)}. On error, returns a (response, status) tuple whose body is {'error': str (error message)}, with status 400 for validation errors (no file, empty filename, unsupported type, file too large) or 500 for processing errors.
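The preview rule for 'text_content' can be sketched on its own. The function name `make_preview` and its `limit` parameter are hypothetical; the logic mirrors the conditional expression in the endpoint's success response.

```python
def make_preview(text_content, limit=500):
    """Return text_content unchanged, or its first `limit` characters plus '...'."""
    if len(text_content) > limit:
        return text_content[:limit] + '...'
    return text_content
```

A client can therefore detect truncation by comparing 'text_length' with the limit: the preview is cut only when 'text_length' is greater than 500.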
Dependencies
- flask
- werkzeug
- uuid
- tempfile
- os
- datetime
- logging
Required Imports
from flask import request, jsonify, session
import os
import uuid
from werkzeug.utils import secure_filename
import tempfile
from datetime import datetime
import logging
Conditional/Optional Imports
These imports are only needed under specific conditions:
import uuid
Condition: imported inside function for document ID generation
Required (conditional)
from werkzeug.utils import secure_filename
Condition: imported inside function for filename sanitization
Required (conditional)
import tempfile
Condition: imported inside function for temporary directory creation
Required (conditional)
Usage Example
// Client-side usage (JavaScript fetch example):
const formData = new FormData();
formData.append('file', fileInput.files[0]);

fetch('/api/upload-document', {
    method: 'POST',
    body: formData,
    credentials: 'include'
})
    .then(response => response.json())
    .then(data => {
        console.log('Document ID:', data.document_id);
        console.log('Filename:', data.filename);
        console.log('Size:', data.size, 'bytes');
        console.log('Text preview:', data.text_content);
    })
    .catch(error => console.error('Upload failed:', error));

# Python requests example:
import requests

with open('document.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post(
        'http://localhost:5000/api/upload-document',
        files=files,
        cookies={'session': 'your_session_cookie'}
    )

result = response.json()
print(f"Uploaded: {result['document_id']}")
Best Practices
- Always send files as multipart/form-data with the key 'file' in the request
- Ensure user is authenticated before calling this endpoint (require_auth decorator enforces this)
- Maximum file size is 10MB - larger files will be rejected with 400 error
- Supported file types: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .rtf, .odt
- Document metadata is stored in session and will be lost when session expires
- Files are stored in temporary directories created with tempfile.mkdtemp() and are not cleaned up automatically - implement a cleanup job or use context managers in production
- The current implementation returns placeholder text instead of extracting content - enhance with actual DocumentProcessor for production use
- Document IDs are UUIDs and should be stored client-side for subsequent operations
- The function sanitizes filenames using secure_filename to prevent directory traversal attacks
- Consider implementing virus scanning for uploaded files in production environments
- Session storage of file paths may not work in distributed environments - consider database storage for production
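As a rough illustration of the cleanup mechanism recommended above, the following sketch removes the temporary directory that holds one uploaded file. It assumes the 'file_path' stored in the session points directly inside a directory created by tempfile.mkdtemp() under the system temp root, as in the endpoint; the function name `remove_uploaded_file` is hypothetical and not part of the original module.

```python
import os
import shutil
import tempfile


def remove_uploaded_file(file_path):
    """Delete the mkdtemp() directory containing file_path.

    Returns True if a directory was removed, False otherwise.
    Hypothetical helper: assumes the layout <temp root>/<mkdtemp dir>/<file>.
    """
    temp_dir = os.path.dirname(os.path.realpath(file_path))
    temp_root = os.path.realpath(tempfile.gettempdir())
    # Safety check: only delete directories that sit directly under the
    # system temp root, so an unexpected path cannot wipe another location.
    if os.path.dirname(temp_dir) != temp_root:
        return False
    if not os.path.isdir(temp_dir):
        return False
    shutil.rmtree(temp_dir, ignore_errors=True)
    return not os.path.exists(temp_dir)
```

Such a helper could be called when a document is no longer needed, or from a periodic job that walks the stored 'file_path' values; which trigger fits best depends on how long uploads must remain available.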
Similar Components
AI-powered semantic similarity - components with related functionality:
- function api_upload_document 89.1% similar
- function api_chat_upload_document 80.8% similar
- function api_upload 80.8% similar
- function upload_document 79.2% similar
- function api_list_documents_v1 74.7% similar