🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "extraction"

Found 50 matching components

  • function extract_warranty_data_improved

    Parses markdown-formatted warranty documentation to extract structured warranty data including IDs, titles, sections, disclosure text, and reference citations.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    markdown-parsing text-extraction warranty-processing document-parsing regex
  • function parse_references_section

    Parses a formatted references section string and extracts structured data including reference numbers, sources, and content previews using regular expressions.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    parsing text-processing references citations regex
  • function main_v10

    Orchestrates the conversion of an improved markdown file containing warranty disclosures into multiple tabular formats (CSV, Excel, Word) with timestamp-based file naming.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    file-conversion markdown-processing warranty-data csv-export excel-export
  • function validate_and_alternatives

    Validates whether a given keyword is a valid chemical compound, biochemical concept, or drug-related term using GPT-4, and returns alternative names/synonyms if valid.

    File: /tf/active/vicechatdev/offline_parser_docstore.py

    validation chemistry biochemistry drug-research llm
  • class RegulatoryExtractor

    A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

    File: /tf/active/vicechatdev/reg_extractor.py

    pdf-extraction regulatory-documents llm-extraction ocr data-extraction
  • function test_markdown_link_parsing

    A test function that validates markdown link parsing capabilities, specifically testing extraction and URL encoding of complex URLs containing special characters from Quill editor format.

    File: /tf/active/vicechatdev/test_complex_hyperlink.py

    testing markdown url-parsing regex url-encoding
  • function extract_warranty_data

    Parses markdown-formatted warranty documentation to extract structured warranty information including IDs, titles, sections, source document counts, warranty text, and disclosure content.

    File: /tf/active/vicechatdev/convert_disclosures_to_table.py

    markdown-parsing data-extraction warranty-processing text-processing regex
  • function main_v17

    Converts a markdown file containing warranty disclosure data into multiple tabular formats (CSV, Excel, Word) with timestamped output files.

    File: /tf/active/vicechatdev/convert_disclosures_to_table.py

    markdown-conversion data-extraction report-generation csv-export excel-export
  • function extract_warranty_sections

    Parses markdown content to extract warranty section headers, returning a list of dictionaries containing section IDs and titles for table of contents generation.

    File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

    markdown-parsing text-processing warranty-documents table-of-contents document-structure
  • class ReferenceManager_v2

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class referencemanager
  • class OneCo_hybrid_RAG

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class oneco_hybrid_rag
  • class FixedProjectVictoriaGenerator

    Fixed Project Victoria Disclosure Generator that properly handles all warranty sections.

    File: /tf/active/vicechatdev/fixed_project_victoria_generator.py

    class fixedprojectvictoriagenerator
  • class PatternBasedExtractor

    Extract flocks based on farm-level In-Ovo usage patterns.

    File: /tf/active/vicechatdev/pattern_based_extraction.py

    class patternbasedextractor
  • function main_v5

    Command-line interface function that orchestrates pattern-based extraction of poultry flock data, including data loading, pattern classification, geocoding, and export functionality.

    File: /tf/active/vicechatdev/pattern_based_extraction.py

    cli command-line-interface data-extraction poultry-data pattern-analysis
  • class DocumentDetail

    Document detail view component

    File: /tf/active/vicechatdev/document_detail_backup.py

    class documentdetail
  • class ReferenceManager_v3

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class referencemanager
  • class OneCo_hybrid_RAG_v1

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class oneco_hybrid_rag
  • class DocumentProcessor_v5

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    class documentprocessor
  • class DocumentDetail_v1

    Document detail view component

    File: /tf/active/vicechatdev/document_detail_old.py

    class documentdetail
  • class ImprovedProjectVictoriaGenerator

    Improved Project Victoria Disclosure Generator with proper reference management.

    File: /tf/active/vicechatdev/improved_project_victoria_generator.py

    class improvedprojectvictoriagenerator
  • class MeetingMinutesGenerator_v1

    A class that generates professional meeting minutes from meeting transcripts using either OpenAI's GPT-4o or Google's Gemini AI models.

    File: /tf/active/vicechatdev/advanced_meeting_minutes_generator.py

    meeting-minutes transcript-processing llm gpt-4o gemini
  • class QueryBasedExtractor_v2

    A class that performs targeted information extraction from text using LLM-based query-guided extraction, with support for handling long documents through chunking and token management.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    information-extraction text-processing llm openai query-based
  • class ReferenceManager_v4

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class referencemanager
  • class OneCo_hybrid_RAG_v2

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • function msg_to_eml

    Converts Microsoft Outlook .msg files to standard .eml format, preserving email headers, body content (plain text and HTML), and attachments.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-conversion msg-to-eml outlook email-processing file-conversion
  • function msg_to_eml_alternative

    Converts Microsoft Outlook .msg files to .eml (email) format using the extract_msg library, preserving email headers, body content (plain text and HTML), and attachments.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-conversion msg-to-eml outlook mime email-processing
  • function html_to_pdf

    Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.

    File: /tf/active/vicechatdev/msg_to_eml.py

    pdf-generation html-to-pdf email-conversion document-generation reportlab
  • function generate_neo4j_schema_report

    Generates a comprehensive schema report of a Neo4j graph database, including node labels, relationships, properties, constraints, indexes, and sample data, outputting multiple file formats (JSON, HTML, Python snippets, Cypher examples).

    File: /tf/active/vicechatdev/neo4j_schema_report.py

    neo4j graph-database schema-analysis database-introspection documentation-generation
  • class ProjectVictoriaDisclosureGenerator

    Main class for generating Project Victoria disclosures from warranty claims.

    File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py

    class projectvictoriadisclosuregenerator
  • class DocumentProcessor_v6

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    class documentprocessor
  • function test_mixed_previous_reports

    A test function that validates the DocumentExtractor's ability to extract text content from multiple file formats (TXT and Markdown) and combine them into a unified previous reports summary.

    File: /tf/active/vicechatdev/leexi/test_enhanced_reports.py

    testing document-extraction file-processing integration-test text-extraction
  • class DocumentExtractor

    A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

    File: /tf/active/vicechatdev/leexi/document_extractor.py

    document-processing text-extraction pdf word powerpoint
  • function test_document_extractor

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    File: /tf/active/vicechatdev/leexi/test_document_extractor.py

    testing document-extraction file-processing validation text-extraction
  • function extract_previous_reports_summary

    Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

    File: /tf/active/vicechatdev/leexi/app.py

    meeting-analysis document-extraction text-summarization llm openai
  • function test_attendee_extraction_comprehensive

    A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.

    File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py

    testing attendee-extraction meeting-minutes transcript-parsing speaker-identification
  • function test_multiple_files

    A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

    File: /tf/active/vicechatdev/leexi/test_multiple_files.py

    testing document-extraction file-processing text-extraction multiple-files
  • class PowerPointProcessor

    A class that processes PowerPoint (.pptx) presentations to extract text content and tables, converting tables to markdown format and organizing content by slides.

    File: /tf/active/vicechatdev/leexi/enhanced_meeting_minutes_generator.py

    powerpoint pptx document-processing text-extraction table-extraction
  • function test_attendee_extraction

    A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

    File: /tf/active/vicechatdev/leexi/test_attendee_extraction.py

    testing unit-test meeting-minutes attendee-extraction metadata-parsing
  • function parse_log_line

    Parses a structured log line string and extracts timestamp, logger name, log level, and message components into a dictionary.

    File: /tf/active/vicechatdev/SPFCsync/monitor.py

    logging parsing regex text-processing log-analysis
  • function full_reading_example

    Demonstrates the full reading mode of a RAG (Retrieval-Augmented Generation) system by processing all documents to answer a comprehensive query about key findings.

    File: /tf/active/vicechatdev/docchat/example_usage.py

    example demonstration RAG retrieval-augmented-generation full-reading
  • function process_chat_background

    Processes chat requests asynchronously in a background thread, managing RAG engine interactions, progress updates, and session state for various query modes including basic, extensive, full_reading, and deep_reflection.

    File: /tf/active/vicechatdev/docchat/app.py

    background-processing async rag chat document-retrieval
  • class QueryBasedExtractor

    A class that extracts relevant information from documents using a small LLM, designed for the Extensive and Full Reading modes in RAG systems.

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    information-extraction document-processing llm rag query-based
  • class DocChatRAG

    Main RAG engine with three operating modes: (1) Basic RAG (similarity search); (2) Extensive (full document retrieval with preprocessing); (3) Full Reading (process all documents).

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    class docchatrag
  • class DocumentProcessor_v4

    Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

    File: /tf/active/vicechatdev/docchat/document_processor.py

    class documentprocessor
  • class DocumentProcessor_v8

    Process different document types for indexing

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • function test_docx_file

    Tests the ability to open and read a Microsoft Word (.docx) document file, validating file existence, size, and content extraction capabilities.

    File: /tf/active/vicechatdev/docchat/test_problematic_files.py

    document-testing file-validation docx word-document diagnostic
  • function validate_azure_token_v1

    Validates an Azure AD token by parsing the JWT id_token and extracting user information such as user ID, email, name, and preferred username.

    File: /tf/active/vicechatdev/docchat/auth/azure_auth.py

    azure authentication jwt token-validation oauth
  • function test_document_processor

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

    testing document-processing pdf-extraction text-extraction integration-test
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • function test_local_document

    Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py

    testing integration-test document-processing pdf-extraction llm
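
Many of the components listed above are tagged `regex` and turn semi-structured text into dictionaries. `parse_log_line` is one: its description says it extracts timestamp, logger name, log level, and message. The listing does not show its source, so the block below is only a minimal sketch of that kind of extraction, not the actual implementation in `SPFCsync/monitor.py`. The log layout (`<timestamp> - <logger> - <LEVEL> - <message>`), the function name `parse_log_line_sketch`, and the dictionary keys are all assumptions made for illustration.

```python
import re
from typing import Optional

# ASSUMPTION: lines follow the common Python logging layout
#   "2024-01-15 10:30:45,123 - some.logger - INFO - message text"
# The real format used by SPFCsync/monitor.py is not shown in the listing.
LOG_LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?)"
    r" - (?P<logger>\S+)"
    r" - (?P<level>[A-Z]+)"
    r" - (?P<message>.*)$"
)


def parse_log_line_sketch(line: str) -> Optional[dict]:
    """Hypothetical helper: split one log line into its four components.

    Returns None when the line does not match the assumed layout.
    """
    match = LOG_LINE_RE.match(line.strip())
    if match is None:
        return None
    # groupdict() maps each named group to the text it captured.
    return match.groupdict()


if __name__ == "__main__":
    sample = "2024-01-15 10:30:45,123 - sync.monitor - INFO - Sync completed - 3 files"
    print(parse_log_line_sketch(sample))
```

Named groups keep the pattern and the output keys in one place, and returning `None` for non-matching lines lets a caller skip continuation lines such as stack traces instead of raising.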

Search Examples