🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)
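As a rough illustration of the full-text mode described above, keyword matching can be sketched as a case-insensitive substring test over each component's name, description, and tags. This is only a minimal sketch under assumptions: the `Component` record and the `full_text_search` helper are hypothetical names made up for this example, and the tool's actual index and matching logic are not shown on this page (it may also search the source code itself).

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical record for one indexed component (not the tool's real schema)."""
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


def full_text_search(components, query):
    """Return the components whose name, description, or tags contain the query.

    Matching is a plain case-insensitive substring test, so it finds literal
    keywords only and has no notion of related concepts.
    """
    needle = query.casefold()
    hits = []
    for component in components:
        haystack = " ".join(
            [component.name, component.description, *component.tags]
        ).casefold()
        if needle in haystack:
            hits.append(component)
    return hits


# Two sample records based on entries from the results list.
components = [
    Component(
        "merge_pdfs_v1",
        "Merges multiple PDF files into a single output PDF file.",
        ["pdf", "merge"],
    ),
    Component(
        "test_llm_client",
        "Tests the LLM client functionality.",
        ["testing", "llm"],
    ),
]

matches = full_text_search(components, "PDF")
names = [component.name for component in matches]
print(names)
```

A semantic mode would presumably replace the substring test with a similarity comparison between embeddings of the query and of each description, which is how it could surface components that never mention the keyword literally.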

Search Results for "PDF"

Found 50 matching components

  • function create_folder_hierarchy_v2

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.

    File: /tf/active/vicechatdev/offline_parser_docstore.py

    neo4j graph-database hierarchy folder-structure file-system
  • class RegulatoryExtractor

    A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

    File: /tf/active/vicechatdev/reg_extractor.py

    pdf-extraction regulatory-documents llm-extraction ocr data-extraction
  • class OneCo_hybrid_RAG

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class oneco_hybrid_rag
  • class FixedProjectVictoriaGenerator

    Fixed Project Victoria Disclosure Generator that properly handles all warranty sections.

    File: /tf/active/vicechatdev/fixed_project_victoria_generator.py

    class fixedprojectvictoriagenerator
  • function publish_document

    Publishes an approved controlled document by converting it to PDF with signatures and audit trail, uploading to FileCloud, and updating the document status to PUBLISHED.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management publishing pdf-conversion audit-trail controlled-documents
  • function get_document_download_url

    Retrieves a download URL for a controlled document, automatically selecting between editable (Word) and PDF formats based on document status or explicit request.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management file-download url-generation version-control filecloud
  • function convert_document_to_pdf_v1

    Converts a document version from an editable format (e.g., Word) to PDF without changing the document's status, uploading the result to FileCloud and updating the version record.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-conversion pdf-generation filecloud controlled-documents document-management
  • function archive_document_v1

    Archives a controlled document by changing its status to ARCHIVED, ensuring a PDF version exists and logging the action with audit trail and notifications.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management lifecycle archive controlled-documents audit-trail
  • function update_document

    Updates properties of a controlled document including title, description, status, owner, and metadata, with special handling for status transitions that require format conversions or publishing workflows.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management update controlled-document status-transition audit-trail
  • function _view_document

    Views and downloads the current version of a document, with special handling for FileCloud-stored documents versus locally stored documents.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management file-download filecloud storage panel
  • function _download_current_version

    Downloads the current version of a document from either FileCloud storage or standard storage, handling different storage types and triggering a browser download.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management file-download filecloud storage panel
  • class DocumentDetail

    Document detail view component

    File: /tf/active/vicechatdev/document_detail_backup.py

    class documentdetail
  • class OneCo_hybrid_RAG_v1

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class oneco_hybrid_rag
  • class DocumentProcessor_v5

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    class documentprocessor
  • class DocumentConverter

    A class that converts various document formats (Word, Excel, PowerPoint, OpenDocument, Visio) to PDF using LibreOffice's headless conversion capabilities, with support for parallel processing and directory structure preservation.

    File: /tf/active/vicechatdev/pdfconverter.py

    document-conversion pdf libreoffice batch-processing parallel-processing
  • class DocumentDetail_v1

    Document detail view component

    File: /tf/active/vicechatdev/document_detail_old.py

    class documentdetail
  • class ImprovedProjectVictoriaGenerator

    Improved Project Victoria Disclosure Generator with proper reference management.

    File: /tf/active/vicechatdev/improved_project_victoria_generator.py

    class improvedprojectvictoriagenerator
  • class ReferenceManager_v4

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class referencemanager
  • class OneCo_hybrid_RAG_v2

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • class PDFConverter

    A class that converts various document formats (Word, PowerPoint, Excel, images) to PDF format using LibreOffice and ReportLab libraries.

    File: /tf/active/vicechatdev/msg_to_eml.py

    pdf conversion document-processing file-conversion libreoffice
  • function merge_pdfs_v1

    Merges multiple PDF files into a single output PDF file with robust error handling and fallback mechanisms.

    File: /tf/active/vicechatdev/msg_to_eml.py

    pdf merge file-processing document-processing pdf-manipulation
  • function generate_html_from_msg

    Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email html-generation email-parsing formatting msg-file
  • function html_to_pdf

    Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.

    File: /tf/active/vicechatdev/msg_to_eml.py

    pdf-generation html-to-pdf email-conversion document-generation reportlab
  • function msg_to_pdf_improved

    Converts a Microsoft Outlook .msg file to PDF format using EML as an intermediate format for improved reliability, with fallback to direct conversion if needed.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-conversion msg-to-pdf file-conversion pdf-generation outlook
  • function msg_to_pdf

    Converts a Microsoft Outlook .msg email file to a single PDF document, including the email body and all attachments merged together.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-conversion pdf-generation msg-file outlook document-processing
  • function eml_to_pdf

    Converts an .eml email file to PDF format, including the email body and all attachments merged into a single PDF document.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-processing pdf-conversion eml-parser document-conversion attachment-handling
  • class FileCloudEmailProcessor

    A class that processes email files (.msg format) stored in FileCloud by finding, downloading, converting them to EML and PDF formats, and organizing them into mail_archive folders.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-processing file-conversion cloud-storage filecloud msg-to-eml
  • class ProjectVictoriaDisclosureGenerator

    Main class for generating Project Victoria disclosures from warranty claims.

    File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py

    class projectvictoriadisclosuregenerator
  • class DocumentProcessor_v6

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    class documentprocessor
  • function create_folder_hierarchy_v1

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    neo4j graph-database folder-hierarchy file-system path-processing
  • class DocumentExtractor

    A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

    File: /tf/active/vicechatdev/leexi/document_extractor.py

    document-processing text-extraction pdf word powerpoint
  • function test_document_extractor

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    File: /tf/active/vicechatdev/leexi/test_document_extractor.py

    testing document-extraction file-processing validation text-extraction
  • function search_and_locate

    Searches for specific numbered folders (01-08) in a SharePoint site and traces their locations, contents, and file distributions by type.

    File: /tf/active/vicechatdev/SPFCsync/search_detailed.py

    sharepoint search diagnostic folder-discovery microsoft-graph
  • function build_document_tree_lazy

    Builds a single-level document tree structure for lazy loading, scanning only immediate children of a target directory without recursively loading subdirectories.

    File: /tf/active/vicechatdev/docchat/app.py

    file-system directory-tree lazy-loading document-management file-browser
  • function build_document_tree_recursive

    Recursively builds a complete hierarchical tree structure of documents and folders from a target directory path, filtering for supported file types and skipping hidden/cache directories.

    File: /tf/active/vicechatdev/docchat/app.py

    file-system directory-traversal recursive document-management tree-structure
  • function view_document

    Flask route handler that serves documents for in-browser viewing by accepting a file path as a query parameter, validating security constraints, and returning the file with appropriate MIME types and CORS headers.

    File: /tf/active/vicechatdev/docchat/app.py

    flask file-serving document-viewer security path-validation
  • function export_to_pdf_v1

    Flask route handler that exports a chat conversation to a PDF file with formatted messages, roles, and references using the reportlab library.

    File: /tf/active/vicechatdev/docchat/app.py

    pdf-export document-generation chat-export reportlab flask-route
  • class DocumentProcessor_v4

    Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

    File: /tf/active/vicechatdev/docchat/document_processor.py

    class documentprocessor
  • class DocumentProcessor_v8

    Process different document types for indexing

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function test_libreoffice_conversion

    Tests LibreOffice's ability to convert a document file to PDF format using headless mode, with timeout protection and comprehensive error reporting.

    File: /tf/active/vicechatdev/docchat/test_problematic_files.py

    libreoffice pdf-conversion document-processing testing subprocess
  • function test_document_processor

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

    testing document-processing pdf-extraction text-extraction integration-test
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • function test_local_document

    Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py

    testing integration-test document-processing pdf-extraction llm
  • function test_single_document

    Tests end date extraction from a specific PDF document by downloading it from FileCloud, extracting text, and using LLM-based analysis to identify contract expiry dates.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_single_document.py

    testing integration-test document-processing pdf-extraction contract-analysis
  • function test_filecloud_connection

    Tests the connection to a FileCloud server by establishing a client connection and performing a document search operation to verify functionality.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

    testing filecloud connection-test document-search integration-test
  • function test_document_processing

    A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

    testing document-processing pdf-extraction integration-test contract-analysis
  • function test_llm_client

    Tests the LLM client functionality by analyzing a sample contract text and verifying the extraction of key contract metadata such as third parties, dates, and status.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

    testing llm contract-analysis integration-test validation
  • class TestDocumentProcessor

    A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing document-processing pdf ocr fallback
  • function test_ocr_fallback

    A test function that validates OCR fallback functionality when the primary llmsherpa PDF text extraction method fails.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing ocr pdf-processing text-extraction fallback-mechanism
