Search - Code Extractor

function extract_warranty_data_improved

Parses markdown-formatted warranty documentation to extract structured warranty data including IDs, titles, sections, disclosure text, and reference citations.

File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

markdown-parsing text-extraction warranty-processing document-parsing regex

function parse_references_section

Parses a formatted references section string and extracts structured data including reference numbers, sources, and content previews using regular expressions.

File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

parsing text-processing references citations regex

function main_v14

Orchestrates the conversion of an improved markdown file containing warranty disclosures into multiple tabular formats (CSV, Excel, Word) with timestamp-based file naming.

File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

file-conversion markdown-processing warranty-data csv-export excel-export

function validate_and_alternatives

Validates whether a given keyword is a valid chemical compound, biochemical concept, or drug-related term using GPT-4, and returns alternative names/synonyms if valid.

File: /tf/active/vicechatdev/offline_parser_docstore.py

validation chemistry biochemistry drug-research llm

class RegulatoryExtractor

A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

File: /tf/active/vicechatdev/reg_extractor.py

pdf-extraction regulatory-documents llm-extraction ocr data-extraction

function test_markdown_link_parsing

A test function that validates markdown link parsing capabilities, specifically testing extraction and URL encoding of complex URLs containing special characters from Quill editor format.

File: /tf/active/vicechatdev/test_complex_hyperlink.py

testing markdown url-parsing regex url-encoding
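
The link extraction and URL encoding that this test exercises can be sketched as follows. This is a minimal illustration, not the code under test: the function name `extract_markdown_links` and the regular expression are assumptions, and the pattern does not handle URLs that themselves contain a closing parenthesis.

```python
import re
from typing import List, Tuple
from urllib.parse import quote

# Hypothetical pattern: [link text](url). URLs containing ")" are not supported.
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# Characters left untouched so the URL structure and existing %-escapes survive.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def extract_markdown_links(text: str) -> List[Tuple[str, str]]:
    """Return (label, percent-encoded URL) pairs for every markdown link."""
    return [
        (label, quote(url.strip(), safe=SAFE_CHARS))
        for label, url in MD_LINK_RE.findall(text)
    ]
```

Keeping `%` in the safe set avoids double-encoding URLs that are already partly escaped.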

function extract_warranty_data

Parses markdown-formatted warranty documentation to extract structured warranty information including IDs, titles, sections, source document counts, warranty text, and disclosure content.

File: /tf/active/vicechatdev/convert_disclosures_to_table.py

markdown-parsing data-extraction warranty-processing text-processing regex

function main_v25

Converts a markdown file containing warranty disclosure data into multiple tabular formats (CSV, Excel, Word) with timestamped output files.

File: /tf/active/vicechatdev/convert_disclosures_to_table.py

markdown-conversion data-extraction report-generation csv-export excel-export

function extract_warranty_sections

Parses markdown content to extract warranty section headers, returning a list of dictionaries containing section IDs and titles for table of contents generation.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

markdown-parsing text-processing warranty-documents table-of-contents document-structure
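
A minimal sketch of header extraction of this kind, assuming section headers look like `## 1.1 - Title` or `## 2.3(a): Title`. The header layout, the regular expression, and the name `extract_sections_sketch` are assumptions for illustration; the indexed function may use a different format.

```python
import re
from typing import Dict, List

# Assumed header layout: 1-6 "#" characters, a dotted numeric ID with an
# optional "(a)"-style suffix, a "-" or ":" separator, then the title.
SECTION_RE = re.compile(
    r"^#{1,6}[ \t]+(?P<id>\d+(?:\.\d+)*(?:\([a-z]\))?)[ \t]*[-:][ \t]*(?P<title>.+?)[ \t]*$",
    re.MULTILINE,
)


def extract_sections_sketch(markdown: str) -> List[Dict[str, str]]:
    """Return one {'id', 'title'} dict per matching header, in document order."""
    return [
        {"id": m.group("id"), "title": m.group("title")}
        for m in SECTION_RE.finditer(markdown)
    ]
```

Headers without a numeric ID (for example a document title) are skipped, which keeps the result suitable for a table of contents.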

class ReferenceManager_v2

Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

class referencemanager

class OneCo_hybrid_RAG

A class named OneCo_hybrid_RAG

File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

class oneco_hybrid_rag

class FixedProjectVictoriaGenerator

Fixed Project Victoria Disclosure Generator that properly handles all warranty sections.

File: /tf/active/vicechatdev/fixed_project_victoria_generator.py

class fixedprojectvictoriagenerator

class PatternBasedExtractor

Extract flocks based on farm-level In-Ovo usage patterns.

File: /tf/active/vicechatdev/pattern_based_extraction.py

class patternbasedextractor

function main_v11

Command-line interface function that orchestrates pattern-based extraction of poultry flock data, including data loading, pattern classification, geocoding, and export functionality.

File: /tf/active/vicechatdev/pattern_based_extraction.py

cli command-line-interface data-extraction poultry-data pattern-analysis

class DocumentDetail

Document detail view component

File: /tf/active/vicechatdev/document_detail_backup.py

class documentdetail

class ReferenceManager_v3

Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

class referencemanager

class OneCo_hybrid_RAG_v1

A class named OneCo_hybrid_RAG

File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

class oneco_hybrid_rag

class DocumentProcessor_v5

Process different document types for RAG context extraction

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

class documentprocessor

class DocumentDetail_v1

Document detail view component

File: /tf/active/vicechatdev/document_detail_old.py

class documentdetail

class ImprovedProjectVictoriaGenerator

Improved Project Victoria Disclosure Generator with proper reference management.

File: /tf/active/vicechatdev/improved_project_victoria_generator.py

class improvedprojectvictoriagenerator

class MeetingMinutesGenerator_v1

A class that generates professional meeting minutes from meeting transcripts using either OpenAI's GPT-4o or Google's Gemini AI models.

File: /tf/active/vicechatdev/advanced_meeting_minutes_generator.py

meeting-minutes transcript-processing llm gpt-4o gemini

class QueryBasedExtractor_v2

A class that performs targeted information extraction from text using LLM-based query-guided extraction, with support for handling long documents through chunking and token management.

File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

information-extraction text-processing llm openai query-based
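
Chunking a long document before sending it to an LLM can be sketched as below. This is an assumption-laden illustration: it approximates tokens by whitespace-separated words, and the name `chunk_text` and its parameters are hypothetical rather than taken from the indexed class.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping chunks of at most max_tokens words.

    Words stand in for tokens here; a real tokenizer would give tighter limits.
    """
    if max_tokens <= 0 or overlap < 0 or overlap >= max_tokens:
        raise ValueError("require 0 <= overlap < max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk already reaches the end of the document.
        if start + max_tokens >= len(words):
            break
    return chunks
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary is still seen whole in at least one chunk.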

class ReferenceManager_v4

Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

class referencemanager

class OneCo_hybrid_RAG_v2

A class named OneCo_hybrid_RAG

File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

class oneco_hybrid_rag

function msg_to_eml

Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.

File: /tf/active/vicechatdev/msg_to_eml.py

email-conversion msg-to-eml outlook email-processing file-conversion

function msg_to_eml_alternative

Converts Microsoft Outlook .msg files to .eml (email) format using the extract_msg library, preserving email headers, body content (plain text and HTML), and attachments.

File: /tf/active/vicechatdev/msg_to_eml.py

email-conversion msg-to-eml outlook mime email-processing

function html_to_pdf

Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.

File: /tf/active/vicechatdev/msg_to_eml.py

pdf-generation html-to-pdf email-conversion document-generation reportlab

function generate_neo4j_schema_report

Generates a comprehensive schema report of a Neo4j graph database, including node labels, relationships, properties, constraints, indexes, and sample data, outputting multiple file formats (JSON, HTML, Python snippets, Cypher examples).

File: /tf/active/vicechatdev/neo4j_schema_report.py

neo4j graph-database schema-analysis database-introspection documentation-generation

class ProjectVictoriaDisclosureGenerator

Main class for generating Project Victoria disclosures from warranty claims.

File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py

class projectvictoriadisclosuregenerator

class DocumentProcessor_v6

Process different document types for RAG context extraction

File: /tf/active/vicechatdev/offline_docstore_multi.py

class documentprocessor

function test_mixed_previous_reports

A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.

File: /tf/active/vicechatdev/leexi/test_enhanced_reports.py

testing document-extraction file-processing integration-test text-extraction

class DocumentExtractor

A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

File: /tf/active/vicechatdev/leexi/document_extractor.py

document-processing text-extraction pdf word powerpoint

function test_document_extractor

A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

File: /tf/active/vicechatdev/leexi/test_document_extractor.py

testing document-extraction file-processing validation text-extraction

function extract_previous_reports_summary

Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

File: /tf/active/vicechatdev/leexi/app.py

meeting-analysis document-extraction text-summarization llm openai

function test_attendee_extraction_comprehensive

A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.

File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py

testing attendee-extraction meeting-minutes transcript-parsing speaker-identification

function test_multiple_files

A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

File: /tf/active/vicechatdev/leexi/test_multiple_files.py

testing document-extraction file-processing text-extraction multiple-files

class PowerPointProcessor

A class that processes PowerPoint (.pptx) presentations to extract text content and tables, converting tables to markdown format and organizing content by slides.

File: /tf/active/vicechatdev/leexi/enhanced_meeting_minutes_generator.py

powerpoint pptx document-processing text-extraction table-extraction

function test_attendee_extraction

A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

File: /tf/active/vicechatdev/leexi/test_attendee_extraction.py

testing unit-test meeting-minutes attendee-extraction metadata-parsing

function parse_log_line

Parses a structured log line string and extracts timestamp, logger name, log level, and message components into a dictionary.

File: /tf/active/vicechatdev/SPFCsync/monitor.py

logging parsing regex text-processing log-analysis
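
A hedged sketch of regex-based log-line parsing, assuming lines follow the common `timestamp - logger - LEVEL - message` layout. The layout, the pattern, and the name `parse_log_line_sketch` are assumptions; the indexed function may expect a different format.

```python
import re
from typing import Dict, Optional

# Assumed layout: "2024-01-01 12:00:00,123 - logger.name - LEVEL - message"
LOG_LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?)"
    r" - (?P<logger>\S+)"
    r" - (?P<level>[A-Z]+)"
    r" - (?P<message>.*)$"
)


def parse_log_line_sketch(line: str) -> Optional[Dict[str, str]]:
    """Return the four components as a dict, or None if the line does not match."""
    match = LOG_LINE_RE.match(line.strip())
    return match.groupdict() if match else None
```

Returning `None` for non-matching lines lets a caller skip continuation lines such as traceback fragments without raising.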

function full_reading_example

Demonstrates the full reading mode of a RAG (Retrieval-Augmented Generation) system by processing all documents to answer a comprehensive query about key findings.

File: /tf/active/vicechatdev/docchat/example_usage.py

example demonstration RAG retrieval-augmented-generation full-reading

function process_chat_background

Processes chat requests asynchronously in a background thread, managing RAG engine interactions, progress updates, and session state for various query modes including basic, extensive, full_reading, and deep_reflection.

File: /tf/active/vicechatdev/docchat/app.py

background-processing async rag chat document-retrieval

class QueryBasedExtractor

A class that extracts relevant information from documents using a small LLM (large language model), designed for the Extensive and Full Reading modes of a RAG system.

File: /tf/active/vicechatdev/docchat/rag_engine.py

information-extraction document-processing llm rag query-based

class DocChatRAG

Main RAG engine with three operating modes: (1) Basic RAG (similarity search), (2) Extensive (full document retrieval with preprocessing), and (3) Full Reading (process all documents).

File: /tf/active/vicechatdev/docchat/rag_engine.py

class docchatrag

class DocumentProcessor_v4

Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

File: /tf/active/vicechatdev/docchat/document_processor.py

class documentprocessor

class DocumentProcessor_v8

Process different document types for indexing

File: /tf/active/vicechatdev/docchat/document_indexer.py

class documentprocessor

function test_docx_file

Tests the ability to open and read a Microsoft Word (.docx) document file, validating file existence, size, and content extraction capabilities.

File: /tf/active/vicechatdev/docchat/test_problematic_files.py

document-testing file-validation docx word-document diagnostic

function validate_azure_token_v1

Validates an Azure AD token by parsing the JWT id_token and extracting user information such as user ID, email, name, and preferred username.

File: /tf/active/vicechatdev/docchat/auth/azure_auth.py

azure authentication jwt token-validation oauth
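
Extracting user information from a JWT `id_token` can be sketched with the standard library alone. This sketch only decodes the payload; it does not verify the signature, issuer, audience, or expiry, so it is not a complete validation. The function name and the choice of claims (`oid`, `sub`, `email`, `name`, `preferred_username`) are assumptions about what the indexed function reads.

```python
import base64
import json
from typing import Any, Dict


def decode_id_token_claims(id_token: str) -> Dict[str, Any]:
    """Decode the payload segment of a JWT and pick out common user claims.

    NOTE: no signature or expiry check is performed here.
    """
    parts = id_token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return {
        "user_id": claims.get("oid") or claims.get("sub"),
        "email": claims.get("email") or claims.get("preferred_username"),
        "name": claims.get("name"),
        "preferred_username": claims.get("preferred_username"),
    }
```

In production the token should be verified against the identity provider's signing keys before any claim is trusted.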

function test_document_processor

A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

testing document-processing pdf-extraction text-extraction integration-test

function test_extraction_methods

A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

testing pdf-extraction ocr document-processing text-extraction

function test_local_document

Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py

testing integration-test document-processing pdf-extraction llm

Search Results for "extraction"