function view_document
Flask route handler that serves documents for in-browser viewing by accepting a file path as a query parameter, validating security constraints, and returning the file with appropriate MIME types and CORS headers.
File: /tf/active/vicechatdev/docchat/app.py
Lines: 1426 - 1504
Complexity: moderate
Purpose
This function provides an API endpoint for viewing documents (PDF, text, and Office formats) in a web browser. It resolves the requested path (absolute paths are used as-is; relative paths are joined to config.DOCUMENTS_DIR), checks that the file exists and lies in an allowed location to guard against directory traversal, maps the file extension to a MIME type (falling back to application/octet-stream for unknown extensions), and sets CORS headers so the response can be embedded in viewers such as Google Docs Viewer. It is designed for a document management or Q&A system where users need to view indexed documents.
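The extension-to-MIME lookup described above can be sketched as a standalone helper. This is a minimal illustration, not the code in app.py: the name `guess_mime_type` is hypothetical, and the table holds only a subset of the entries in the source listing.

```python
from pathlib import Path

# Subset of the extension-to-MIME table from view_document (illustrative only).
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def guess_mime_type(path: Path) -> str:
    """Hypothetical helper: map a file extension to a MIME type.

    The suffix is lower-cased first, so 'Report.PDF' and 'report.pdf'
    are treated the same. Unknown extensions fall back to a generic
    binary type, matching the behavior of the source listing.
    """
    return MIME_TYPES.get(path.suffix.lower(), "application/octet-stream")
```

Because of the fallback, an unrecognized extension is still served, just with a generic type that browsers typically download rather than display.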
Source Code
def view_document():
    """
    Serve document for viewing in browser
    Supports PDF, text files, etc.
    File path should be passed as 'path' query parameter
    """
    try:
        # Get path from query parameter
        filepath = request.args.get('path', '')
        if not filepath:
            return jsonify({'error': 'No path specified'}), 400
        logger.info(f"[VIEW_DOCUMENT] Requested path: {filepath}")
        # Convert to Path object
        file_path = Path(filepath)
        # Determine the full path
        # If it's already absolute, use as-is
        if file_path.is_absolute():
            full_path = file_path
            logger.info(f"[VIEW_DOCUMENT] Using absolute path: {full_path}")
        else:
            # Otherwise, prepend DOCUMENTS_DIR
            full_path = config.DOCUMENTS_DIR / filepath
            logger.info(f"[VIEW_DOCUMENT] Constructed relative path: {full_path}")
        # Security check: ensure file exists
        if not full_path.exists():
            logger.error(f"File not found: {full_path}")
            return jsonify({'error': 'File not found'}), 404
        # Additional security: check if file is within expected paths
        try:
            # Allow files from DOCUMENTS_DIR or any parent directory that contains qa_docs
            is_safe = (
                full_path.resolve().is_relative_to(config.DOCUMENTS_DIR.resolve()) or
                'qa_docs' in str(full_path) or
                full_path.resolve().is_relative_to(config.BASE_DIR.resolve())
            )
            if not is_safe:
                logger.warning(f"Access denied to file outside allowed paths: {full_path}")
                return jsonify({'error': 'Access denied'}), 403
        except (ValueError, OSError):
            logger.warning(f"Path resolution failed for: {full_path}")
            return jsonify({'error': 'Invalid path'}), 400
        # Get file extension
        ext = full_path.suffix.lower()
        # Determine MIME type
        mime_types = {
            '.pdf': 'application/pdf',
            '.txt': 'text/plain',
            '.md': 'text/markdown',
            '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
            '.doc': 'application/msword',
            '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
            '.xls': 'application/vnd.ms-excel',
            '.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
            '.ppt': 'application/vnd.ms-powerpoint'
        }
        mime_type = mime_types.get(ext, 'application/octet-stream')
        # For PDFs, serve directly for inline viewing
        # For other formats, may need conversion or download
        response = send_file(full_path, mimetype=mime_type, as_attachment=False)
        # Add headers to allow embedding in Google Docs Viewer
        response.headers['Access-Control-Allow-Origin'] = '*'
        response.headers['Access-Control-Allow-Methods'] = 'GET, OPTIONS'
        response.headers['Access-Control-Allow-Headers'] = 'Content-Type'
        return response
    except Exception as e:
        logger.error(f"Error serving document: {e}")
        return jsonify({'error': str(e)}), 500
Return Value
On success, returns a Flask Response object containing the requested file, served inline with the matching MIME type and CORS headers. On failure, returns a JSON body of the form {'error': message} with one of these HTTP status codes: 400 (no path specified, or path resolution failed), 403 (file outside the allowed locations), 404 (file not found), or 500 (unexpected exception; the exception text is returned as the error message).
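A client can distinguish the two response shapes by status code. The following is a minimal sketch of that handling; `summarize_view_response` is a hypothetical helper that is not part of app.py, and it assumes error bodies are the JSON objects described above.

```python
import json


def summarize_view_response(status_code: int, body: bytes) -> str:
    """Hypothetical client-side helper: describe a view_document response."""
    if status_code == 200:
        # Success: the body is the raw file content.
        return f"document received ({len(body)} bytes)"
    # Failure: the body is expected to be a JSON object with an 'error' key.
    try:
        message = json.loads(body.decode("utf-8")).get("error", "unknown error")
    except (ValueError, AttributeError):
        # Not valid UTF-8 or JSON, or not a JSON object (e.g. a proxy error page).
        message = "unparseable error body"
    return f"request failed with HTTP {status_code}: {message}"
```

The fallback branch matters in practice because a reverse proxy or WSGI server in front of the app may answer with HTML instead of the JSON error format.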
Dependencies
flask, pathlib, logging, werkzeug
Required Imports
from flask import request
from flask import jsonify
from flask import send_file
from pathlib import Path
import logging
import config
Usage Example
# Setup (in main Flask app)
from flask import Flask, request, jsonify, send_file
from pathlib import Path
import logging
import config
app = Flask(__name__)
logger = logging.getLogger(__name__)
# Configure paths in config.py
config.DOCUMENTS_DIR = Path('/path/to/documents')
config.BASE_DIR = Path('/path/to/app')
@app.route('/api/view-document')
def view_document():
    # ... function code here ...
    pass
# Client-side usage:
# GET request to: http://localhost:5000/api/view-document?path=reports/document.pdf
# Or with absolute path: http://localhost:5000/api/view-document?path=/absolute/path/to/file.pdf
# Example with requests library:
import requests
response = requests.get('http://localhost:5000/api/view-document', params={'path': 'reports/document.pdf'})
if response.status_code == 200:
    with open('downloaded_file.pdf', 'wb') as f:
        f.write(response.content)
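When building the request URL by hand instead of passing `params` to requests, the path value has to be URL-encoded. A minimal sketch using only the standard library follows; `build_view_url` is a hypothetical helper, and the `/api/view-document` route is taken from the example above.

```python
from urllib.parse import urlencode


def build_view_url(base_url: str, doc_path: str) -> str:
    """Hypothetical helper: build a view-document URL with an encoded path.

    urlencode escapes slashes, spaces, and other reserved characters so the
    whole document path arrives as a single 'path' query parameter.
    """
    query = urlencode({"path": doc_path})
    return f"{base_url}/api/view-document?{query}"


# Example: a relative document path containing a space.
url = build_view_url("http://localhost:5000", "reports/annual report.pdf")
```

Encoding is needed for file names that contain characters such as `&`, `#`, or spaces, which would otherwise truncate or split the query string.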
Best Practices
- Always validate and sanitize file paths to prevent directory traversal attacks
- Use Path.resolve() and is_relative_to() (Python 3.9+) for path validation, and run the check on the resolved path rather than the raw input
- Set appropriate CORS headers only when necessary for cross-origin access
- Log all file access attempts for security auditing
- Use send_file with as_attachment=False for inline viewing in browser
- Implement proper error handling with appropriate HTTP status codes
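- Maintain an allow-list of permitted file extensions and MIME types; as written, this function serves unknown extensions as application/octet-stream rather than rejecting them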
- Consider implementing rate limiting to prevent abuse
- Ensure config.DOCUMENTS_DIR and config.BASE_DIR are properly configured before deployment
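- The security check allows files under DOCUMENTS_DIR, files under BASE_DIR, or any path whose string contains 'qa_docs'. The 'qa_docs' test is a substring match on the unresolved path, so it can admit files outside the intended directories; tighten this logic to match your security requirements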
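The path-validation advice above can be sketched as a single helper that resolves the path first and then checks containment. This is an illustrative sketch under assumptions, not the check used in app.py: `resolve_within` is a hypothetical name, it accepts one allowed root, and it requires Python 3.9+ for `is_relative_to`.

```python
from pathlib import Path
from typing import Optional


def resolve_within(raw_path: str, root: Path) -> Optional[Path]:
    """Hypothetical helper: resolve raw_path and require it to lie under root.

    Returns the resolved path when it is inside root, otherwise None.
    """
    root = root.resolve()
    candidate = Path(raw_path)
    if not candidate.is_absolute():
        # Relative paths are interpreted relative to the allowed root.
        candidate = root / candidate
    # resolve() collapses '..' segments and follows symlinks, so the
    # containment test below runs on the real target location.
    resolved = candidate.resolve()
    return resolved if resolved.is_relative_to(root) else None
```

Unlike a substring test, this rejects both `qa_docs/../../secret.txt` and an absolute path that merely contains `qa_docs` in a file name. A caller would return 403 when the helper yields None and pass the resolved path to send_file otherwise.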
Similar Components
AI-powered semantic similarity - components with related functionality:
- function upload_document (68.4% similar)
- function api_upload_document_v1 (66.8% similar)
- function serve_generated_file (66.5% similar)
- function get_document_v4 (64.5% similar)
- function export_document (64.2% similar)