Search - Code Extractor

function clean_text_for_xml

Sanitizes text by removing or replacing XML-incompatible characters to ensure compatibility with Word document XML structure.

File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

text-processing xml sanitization word-documents character-encoding

function create_word_report_improved

Generates a formatted Microsoft Word report of warranty disclosures with a table of contents, structured sections, and references.

File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

document-generation word-processing report-generation docx warranty-management

function main_v14

Orchestrates the conversion of an improved markdown file containing warranty disclosures into multiple tabular formats (CSV, Excel, Word) with timestamp-based file naming.

File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

file-conversion markdown-processing warranty-data csv-export excel-export
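
The timestamp-based file naming mentioned above can be sketched as follows. This is a minimal illustration, not the code in improved_convert_disclosures_to_table.py: the function name `timestamped_outputs`, the `%Y%m%d_%H%M%S` stamp format, and the extension list are all assumptions.

```python
from datetime import datetime
from pathlib import Path


def timestamped_outputs(stem, out_dir=".", now=None, extensions=("csv", "xlsx", "docx")):
    """Return one output path per extension, all sharing a single timestamp.

    Hypothetical helper: the stamp format and the extensions are assumptions.
    """
    # Take the timestamp once so every output of a run carries the same suffix.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return {ext: Path(out_dir) / f"{stem}_{stamp}.{ext}" for ext in extensions}


paths = timestamped_outputs("warranty_disclosures", now=datetime(2024, 1, 2, 3, 4, 5))
print(paths["csv"].name)
```

Accepting `now` as a parameter keeps the helper deterministic in tests while defaulting to the current time in normal use.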

function test_complex_url_hyperlink

A test function that validates that complex FileCloud URLs (containing special characters, query parameters, and URL fragments) are written as clickable hyperlinks in generated Word documents.

File: /tf/active/vicechatdev/test_complex_hyperlink.py

testing word-document hyperlink docx url-handling

function create_word_report

Generates a formatted Microsoft Word document report containing warranty disclosures with a table of contents, metadata, and structured sections for each warranty.

File: /tf/active/vicechatdev/convert_disclosures_to_table.py

document-generation word-document docx report-generation warranty

function main_v25

Converts a markdown file containing warranty disclosure data into multiple tabular formats (CSV, Excel, Word) with timestamped output files.

File: /tf/active/vicechatdev/convert_disclosures_to_table.py

markdown-conversion data-extraction report-generation csv-export excel-export

function create_enhanced_word_document

Converts markdown-formatted warranty disclosure content into a formatted Microsoft Word document with hierarchical headings, styled text, lists, and special formatting for block references.

File: /tf/active/vicechatdev/improved_word_converter.py

document-generation markdown-to-word docx warranty-processing legal-documents

function main_v30

Main entry point function that reads a markdown file, converts it to an enhanced Word document with preserved heading structure, and saves it with a timestamped filename.

File: /tf/active/vicechatdev/improved_word_converter.py

document-conversion markdown-to-word file-processing docx main-entry-point

function clean_text_for_xml_v1

Sanitizes text strings to ensure XML 1.0 compatibility by removing or replacing invalid control characters and ensuring all characters meet XML specification requirements for Word document generation.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

text-processing xml sanitization data-cleaning word-documents
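
One way to enforce XML 1.0 compatibility is to drop every code point outside the character ranges the XML 1.0 specification allows. The sketch below assumes that approach; the name `strip_invalid_xml_chars` and the `replacement` parameter are hypothetical, and the indexed function may handle individual characters differently.

```python
import re

# Characters permitted by XML 1.0: tab, LF, CR, and the ranges
# U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF. Everything else is matched.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Remove (or replace) characters that XML 1.0 does not allow.

    Hypothetical helper, not the function indexed above.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


# NUL and vertical tab are removed; the tab character is kept.
print(repr(strip_invalid_xml_chars("a\x00b\x0bc\td")))
```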

function create_enhanced_word_document_v1

Converts markdown content into a formatted Microsoft Word document with proper styling, table of contents, warranty sections, and reference handling for Project Victoria warranty disclosures.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

document-generation word-processing markdown-conversion docx formatting

function format_inline_references

Formats inline citation references (e.g., [1], [2]) in a Word document paragraph by applying italic styling to them while preserving the rest of the text.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

document-formatting word-processing python-docx text-formatting citations
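
The citation handling described above amounts to splitting a string into plain segments and citation segments, then styling only the latter. The sketch below shows just the splitting step with the standard library; `split_citations` is a hypothetical name, and emitting each segment as a paragraph run with italics enabled for citations is assumed to happen in the caller.

```python
import re

# Matches numeric inline citations such as [1] or [12]. The capturing group
# makes re.split keep the citations in its output.
_CITATION = re.compile(r"(\[\d+\])")


def split_citations(text):
    """Split text into (segment, is_citation) pairs, preserving order.

    Hypothetical helper: a caller would add each segment as a separate run
    and apply italic styling when is_citation is True.
    """
    segments = []
    for piece in _CITATION.split(text):
        if piece:  # re.split can yield empty strings at the edges
            segments.append((piece, bool(_CITATION.fullmatch(piece))))
    return segments


for segment, is_citation in split_citations("See [1] and [2]."):
    print(repr(segment), "italic" if is_citation else "plain")
```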

function main_v2

Main orchestration function that reads an improved markdown file and converts it to an enhanced Word document with comprehensive formatting, including table of contents, warranty sections, disclosures, and bibliography.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

document-generation word-processing markdown-conversion docx file-processing

class OneCo_hybrid_RAG

A class named OneCo_hybrid_RAG

File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

class oneco_hybrid_rag

function publish_document

Publishes an approved controlled document by converting it to PDF with signatures and audit trail, uploading to FileCloud, and updating the document status to PUBLISHED.

File: /tf/active/vicechatdev/document_controller_backup.py

document-management publishing pdf-conversion audit-trail controlled-documents

function get_document_download_url

Retrieves a download URL for a controlled document, automatically selecting between editable (Word) and PDF formats based on document status or explicit request.

File: /tf/active/vicechatdev/document_controller_backup.py

document-management file-download url-generation version-control filecloud

function convert_document_to_pdf_v1

Converts a document version from an editable format (e.g., Word) to PDF without changing the document's status, uploading the result to FileCloud and updating the version record.

File: /tf/active/vicechatdev/document_controller_backup.py

document-conversion pdf-generation filecloud controlled-documents document-management

function get_document_edit_url

Generates an online editing URL for a document stored in FileCloud, allowing users to edit documents that are in editable states.

File: /tf/active/vicechatdev/document_controller_backup.py

document-management filecloud url-generation permission-control version-control

class OneCo_hybrid_RAG_v1

A class named OneCo_hybrid_RAG

File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

class oneco_hybrid_rag

class DocumentProcessor_v5

Process different document types for RAG context extraction

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

class documentprocessor

class DocumentConverter

A class that converts various document formats (Word, Excel, PowerPoint, OpenDocument, Visio) to PDF using LibreOffice's headless conversion capabilities, with support for parallel processing and directory structure preservation.

File: /tf/active/vicechatdev/pdfconverter.py

document-conversion pdf libreoffice batch-processing parallel-processing
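
A headless LibreOffice conversion is typically driven by a command line built per input file. The sketch below only builds that argument list; `build_convert_command` is a hypothetical name, and the binary name `soffice` is an assumption (some installations expose it as `libreoffice`). It is not the DocumentConverter implementation.

```python
from pathlib import Path


def build_convert_command(source, out_dir, binary="soffice"):
    """Build the argument list for a headless LibreOffice PDF conversion.

    Hypothetical helper: the binary name is an assumption and may differ
    between installations.
    """
    return [
        binary,
        "--headless",           # run without a GUI
        "--convert-to", "pdf",  # target format
        "--outdir", str(Path(out_dir)),
        str(Path(source)),
    ]


print(build_convert_command("report.docx", "out"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building one command per file makes it straightforward to distribute conversions across a worker pool.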

class DocumentDetail_v1

Document detail view component

File: /tf/active/vicechatdev/document_detail_old.py

class documentdetail

class OneCo_hybrid_RAG_v2

A class named OneCo_hybrid_RAG

File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

class oneco_hybrid_rag

class PDFConverter

A class that converts various document formats (Word, PowerPoint, Excel, images) to PDF format using LibreOffice and ReportLab libraries.

File: /tf/active/vicechatdev/msg_to_eml.py

pdf conversion document-processing file-conversion libreoffice

class DocxMerger

A class named DocxMerger

File: /tf/active/vicechatdev/word_merge.py

class docxmerger

function merge_word_documents

Merges track changes and comments from a revision Word document into a base Word document, creating a combined output document.

File: /tf/active/vicechatdev/word_merge.py

document-processing word-documents docx merge track-changes

class DocumentProcessor_v6

Process different document types for RAG context extraction

File: /tf/active/vicechatdev/offline_docstore_multi.py

class documentprocessor

class DocumentExtractor

A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

File: /tf/active/vicechatdev/leexi/document_extractor.py

document-processing text-extraction pdf word powerpoint

function test_document_extractor

A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

File: /tf/active/vicechatdev/leexi/test_document_extractor.py

testing document-extraction file-processing validation text-extraction

function save_config_to_file

Persists current application configuration values from the config module to a .env file, maintaining existing entries and formatting multi-value fields appropriately.

File: /tf/active/vicechatdev/docchat/app.py

configuration persistence file-io environment-variables dotenv

function export_to_word

Flask route handler that exports a chat conversation to a formatted Microsoft Word (.docx) document with styled headings, timestamps, and references.

File: /tf/active/vicechatdev/docchat/app.py

export word-document docx chat-history conversation-export

class DocumentProcessor_v4

Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

File: /tf/active/vicechatdev/docchat/document_processor.py

class documentprocessor

class DocumentProcessor_v8

Process different document types for indexing

File: /tf/active/vicechatdev/docchat/document_indexer.py

class documentprocessor

class DocumentIndexer

A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

File: /tf/active/vicechatdev/docchat/document_indexer.py

document-indexing vector-database chromadb embeddings pdf-processing

function test_docx_file

Tests the ability to open and read a Microsoft Word (.docx) document file, validating file existence, size, and content extraction capabilities.

File: /tf/active/vicechatdev/docchat/test_problematic_files.py

document-testing file-validation docx word-document diagnostic

function main_v109

A test harness function that validates the ability to open and process PowerPoint and Word document files, with fallback to LibreOffice conversion for problematic files.

File: /tf/active/vicechatdev/docchat/test_problematic_files.py

testing document-processing file-validation powerpoint word

class ContractAnalyzer

Main class for analyzing contract validity from FileCloud documents.

File: /tf/active/vicechatdev/contract_validity_analyzer/core/analyzer.py

class contractanalyzer

class DocumentProcessor_v1

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

document-processing text-extraction pdf-processing word-processing llmsherpa

class DocumentProcessor_v2

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

document-processing text-extraction pdf-processing word-processing llmsherpa

class DocumentConverter_v1

A class that converts various document formats (Word, Excel, PowerPoint, images) to PDF format using LibreOffice, unoconv, or PIL.

File: /tf/active/vicechatdev/document_auditor/src/document_converter.py

document-conversion pdf file-processing office-documents image-to-pdf

function process_inline_markdown

Processes inline markdown formatting by unescaping HTML entities in text. Currently performs basic cleanup while preserving markdown syntax for downstream processing.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

markdown text-processing html-entities preprocessing formatting
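
Unescaping HTML entities while leaving markdown syntax untouched can be done with the standard library alone. The sketch below assumes `html.unescape` is sufficient for this step; `unescape_inline` is a hypothetical name, not the function indexed above.

```python
import html


def unescape_inline(text):
    """Convert HTML entities to characters; markdown markers pass through.

    Hypothetical helper illustrating the preprocessing step only.
    """
    return html.unescape(text)


# Entities are decoded, while ** and ` remain for downstream processing.
print(unescape_inline("&lt;tag&gt; &amp; **bold** `code`"))
```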

function add_formatted_content_to_word_v1

Converts processed markdown elements into formatted content within a Microsoft Word document, handling headers, paragraphs, lists, tables, and code blocks with appropriate styling.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

markdown-conversion word-document document-generation formatting docx

function add_table_to_word_v1

Adds a formatted table to a Microsoft Word document using the python-docx library, with automatic column detection, header row styling, and debug logging.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

word-document table-generation docx document-formatting python-docx

function add_inline_formatting_to_paragraph

Parses markdown-formatted text and applies inline formatting (bold, italic, code) to a Microsoft Word paragraph object using the python-docx library.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

markdown word-document text-formatting docx inline-formatting

function export_to_docx_v1

Exports a document object to Microsoft Word DOCX format, converting sections, content, and references into a formatted Word document with proper styling and structure.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

document-export docx word-document file-generation content-formatting

class DocumentProcessor_v7

Lightweight document processor for chat upload functionality

File: /tf/active/vicechatdev/vice_ai/document_processor.py

class documentprocessor

function export_document

Flask route handler that exports a document in either DOCX or PDF format, verifying user ownership and document access before generating the export file.