
function view_document

Maturity: 52

Flask route handler that serves documents for in-browser viewing. It accepts a file path in the 'path' query parameter, checks that the file exists and lies in an allowed location, and returns the file inline with a MIME type chosen from its extension and permissive CORS headers.

File: /tf/active/vicechatdev/docchat/app.py
Lines: 1426 - 1504
Complexity: moderate

Purpose

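This function provides an API endpoint for viewing documents (PDF, text, Office formats) in a web browser. It resolves the requested path (absolute paths are used as-is; relative paths are joined to config.DOCUMENTS_DIR), returns 404 if the file does not exist, and then accepts the path only if it resolves to a location under DOCUMENTS_DIR or BASE_DIR, or if its string contains 'qa_docs'. It picks a MIME type from the file extension, falling back to application/octet-stream for unknown extensions, and sets CORS headers that the source comments describe as intended for embedding in viewers such as Google Docs Viewer. Note that the 'qa_docs' test is a substring match on the unresolved path, so it is weaker than the two resolved-path checks: any existing path whose string contains 'qa_docs' passes, including one that uses '..' to leave the allowed directories. The function is designed for a document management or Q&A system where users need to view indexed documents.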
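The difference between the resolved-path checks and the substring check can be shown in isolation. The following is a minimal sketch, not code from app.py: is_within and substring_check are hypothetical helper names, and the directory layout is created in a temporary folder purely for illustration (POSIX-style paths assumed).

```python
import tempfile
from pathlib import Path


def is_within(base: Path, candidate: Path) -> bool:
    """Return True if candidate, after resolving '..' and symlinks, lies under base."""
    # Path.is_relative_to requires Python 3.9 or newer
    return candidate.resolve().is_relative_to(base.resolve())


def substring_check(candidate: Path, marker: str = "qa_docs") -> bool:
    """Mimic a substring test on the unresolved path string."""
    return marker in str(candidate)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    docs = root / "qa_docs"  # hypothetical documents directory
    docs.mkdir()
    (docs / "report.pdf").write_bytes(b"%PDF-1.4")
    (root / "secret.txt").write_text("not for viewing")

    inside = docs / "report.pdf"
    escaping = docs / ".." / "secret.txt"  # traversal out of the documents directory

    inside_ok = is_within(docs, inside)
    escaping_ok = is_within(docs, escaping)
    escaping_passes_substring = substring_check(escaping)

    # Joining an absolute path onto a base discards the base entirely
    joined_absolute = docs / "/etc/hostname"
```

In this sketch the traversal path is rejected by the resolved containment test but accepted by the substring test, which is why a substring clause combined with "or" weakens the overall check.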

Source Code

def view_document():
    """
    Serve document for viewing in browser
    Supports PDF, text files, etc.
    File path should be passed as 'path' query parameter
    """
    try:
        # Get path from query parameter
        filepath = request.args.get('path', '')
        if not filepath:
            return jsonify({'error': 'No path specified'}), 400
        
        logger.info(f"[VIEW_DOCUMENT] Requested path: {filepath}")
        
        # Convert to Path object
        file_path = Path(filepath)
        
        # Determine the full path
        # If it's already absolute, use as-is
        if file_path.is_absolute():
            full_path = file_path
            logger.info(f"[VIEW_DOCUMENT] Using absolute path: {full_path}")
        else:
            # Otherwise, prepend DOCUMENTS_DIR
            full_path = config.DOCUMENTS_DIR / filepath
            logger.info(f"[VIEW_DOCUMENT] Constructed relative path: {full_path}")
        
        # Security check: ensure file exists
        if not full_path.exists():
            logger.error(f"File not found: {full_path}")
            return jsonify({'error': 'File not found'}), 404
        
        # Additional security: check if file is within expected paths
        try:
            # Allow files from DOCUMENTS_DIR or any parent directory that contains qa_docs
            is_safe = (
                full_path.resolve().is_relative_to(config.DOCUMENTS_DIR.resolve()) or
                'qa_docs' in str(full_path) or
                full_path.resolve().is_relative_to(config.BASE_DIR.resolve())
            )
            if not is_safe:
                logger.warning(f"Access denied to file outside allowed paths: {full_path}")
                return jsonify({'error': 'Access denied'}), 403
        except (ValueError, OSError):
            logger.warning(f"Path resolution failed for: {full_path}")
            return jsonify({'error': 'Invalid path'}), 400
        
        # Get file extension
        ext = full_path.suffix.lower()
        
        # Determine MIME type
        mime_types = {
            '.pdf': 'application/pdf',
            '.txt': 'text/plain',
            '.md': 'text/markdown',
            '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
            '.doc': 'application/msword',
            '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
            '.xls': 'application/vnd.ms-excel',
            '.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
            '.ppt': 'application/vnd.ms-powerpoint'
        }
        
        mime_type = mime_types.get(ext, 'application/octet-stream')
        
        # For PDFs, serve directly for inline viewing
        # For other formats, may need conversion or download
        response = send_file(full_path, mimetype=mime_type, as_attachment=False)
        
        # Add headers to allow embedding in Google Docs Viewer
        response.headers['Access-Control-Allow-Origin'] = '*'
        response.headers['Access-Control-Allow-Methods'] = 'GET, OPTIONS'
        response.headers['Access-Control-Allow-Headers'] = 'Content-Type'
        
        return response
        
    except Exception as e:
        logger.error(f"Error serving document: {e}")
        return jsonify({'error': str(e)}), 500

Return Value

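On success, returns a Flask Response containing the requested file, served inline (as_attachment=False) with the MIME type chosen from its extension and the three CORS headers set. On failure, returns a JSON body with an 'error' key and one of these status codes: 400 (missing 'path' parameter, or path resolution raised ValueError or OSError), 404 (file not found), 403 (file outside the allowed paths), or 500 (any other exception; the body contains the exception message).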
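The order in which the handler reaches these codes can be summarized as a small decision function. This is an illustrative sketch, not part of app.py: expected_status is a hypothetical name, and it leaves out the 400 'Invalid path' and 500 branches, which depend on exceptions raised at runtime.

```python
def expected_status(path_param: str, exists: bool, is_safe: bool) -> int:
    """Mirror the order of checks in the handler's source listing."""
    if not path_param:
        return 400  # 'No path specified'
    if not exists:
        return 404  # 'File not found' (checked before the access check)
    if not is_safe:
        return 403  # 'Access denied'
    return 200      # file is sent with MIME type and CORS headers


statuses = [
    expected_status("", True, True),
    expected_status("reports/missing.pdf", False, False),
    expected_status("/etc/hostname", True, False),
    expected_status("reports/document.pdf", True, True),
]
```

Because the existence check runs first, a caller can distinguish a missing file (404) from an existing file in a disallowed location (403).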

Dependencies

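  • flask
  • werkzeug (installed as a Flask dependency)
  • pathlib (Python standard library)
  • logging (Python standard library)
  • config (local project module providing DOCUMENTS_DIR and BASE_DIR)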

Required Imports

from flask import request, jsonify, send_file
from pathlib import Path
import logging
import config

Usage Example

# Setup (in main Flask app)
from flask import Flask, request, jsonify, send_file
from pathlib import Path
import logging
import config

app = Flask(__name__)
logger = logging.getLogger(__name__)

# These paths normally live in config.py; they are set here only for illustration
config.DOCUMENTS_DIR = Path('/path/to/documents')
config.BASE_DIR = Path('/path/to/app')

@app.route('/api/view-document')
def view_document():
    # ... function code here ...
    pass

# Client-side usage:
# GET request to: http://localhost:5000/api/view-document?path=reports/document.pdf
# Or with an absolute path (it must still pass the allowed-path check):
# http://localhost:5000/api/view-document?path=/absolute/path/to/file.pdf

# Example with the requests library (run from a separate client process):
import requests
response = requests.get(
    'http://localhost:5000/api/view-document',
    params={'path': 'reports/document.pdf'},
    timeout=10,  # avoid hanging indefinitely if the server is unreachable
)
if response.status_code == 200:
    with open('downloaded_file.pdf', 'wb') as f:
        f.write(response.content)
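
When the URL is built by hand, for example to use as the source of an embedded viewer, the 'path' value needs to be percent-encoded. A minimal sketch using only the standard library follows; viewer_url is a hypothetical helper name, and the base URL is the localhost endpoint from the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint from the usage example; adjust host and port for a real deployment
BASE_URL = "http://localhost:5000/api/view-document"


def viewer_url(relative_path: str) -> str:
    """Build a view-document URL with the path safely encoded as a query parameter."""
    return f"{BASE_URL}?{urlencode({'path': relative_path})}"


url = viewer_url("reports/Q1 summary.pdf")
# Decoding the query string recovers the original path value
decoded = parse_qs(urlsplit(url).query)["path"][0]
```

The requests call shown earlier performs the same encoding automatically through its params argument.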

Best Practices

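  • Always validate and sanitize file paths to prevent directory traversal attacks
  • Use Path.resolve() and is_relative_to() (Python 3.9+) for path validation, and avoid substring checks on the unresolved path
  • Set CORS headers only when cross-origin access is needed; this handler sets Access-Control-Allow-Origin to '*' on every successful response
  • Log all file access attempts for security auditing
  • Use send_file with as_attachment=False for inline viewing in the browser
  • Return appropriate HTTP status codes, and avoid sending raw exception messages to clients as the 500 branch currently does
  • Maintain a whitelist of allowed MIME types; this handler falls back to application/octet-stream for unknown extensions instead of rejecting them
  • Consider implementing rate limiting to prevent abuse
  • Ensure config.DOCUMENTS_DIR and config.BASE_DIR are properly configured before deployment
  • The security check allows files under DOCUMENTS_DIR, files under BASE_DIR, and any path whose string contains 'qa_docs'. The three conditions are combined with "or", so the broadest one determines what is reachable; adjust this logic to your security requirements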
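
The MIME type whitelist mentioned in the list above could be enforced with a lookup that returns nothing for unknown extensions. The following is a sketch under assumptions, not the handler's current behavior: ALLOWED_MIME_TYPES reuses the extension pairs from the source listing, while mime_for is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

# Same extension-to-type pairs as the mime_types dict in the source listing
ALLOWED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".doc": "application/msword",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".ppt": "application/vnd.ms-powerpoint",
}


def mime_for(path: Path) -> Optional[str]:
    """Return the MIME type for an allowed extension, or None if it is not allowed."""
    return ALLOWED_MIME_TYPES.get(path.suffix.lower())


pdf_type = mime_for(Path("reports/Report.PDF"))  # extension match is case-insensitive
script_type = mime_for(Path("scripts/run.sh"))   # not in the whitelist
```

A handler built this way could respond with an error such as 415 (Unsupported Media Type) when mime_for returns None, rather than serving the file as application/octet-stream.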

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function upload_document 68.4% similar

    Flask route handler that processes file uploads, saves them securely to disk, and indexes the document content for retrieval-augmented generation (RAG) search.

    From: /tf/active/vicechatdev/docchat/blueprint.py
  • function api_upload_document_v1 66.8% similar

    Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and extracts basic text content for processing.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function serve_generated_file 66.5% similar

    Flask route handler that serves generated files (images, HTML, CSS, JS, etc.) from session-specific directories, with security checks and automatic MIME type detection.

    From: /tf/active/vicechatdev/full_smartstat/app.py
  • function get_document_v4 64.5% similar

    Flask API endpoint that retrieves a specific document with its text and data sections, including optional sharing information, for authenticated users.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function export_document 64.2% similar

    Flask route handler that exports a document in either DOCX or PDF format, verifying user ownership and document access before generating the export file.

    From: /tf/active/vicechatdev/vice_ai/new_app.py