🔍 Code Extractor

Search Components

Full-Text: fast keyword matching
Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "parsing"

Found 50 matching components

  • function extract_warranty_data_improved

    Parses markdown-formatted warranty documentation to extract structured warranty data including IDs, titles, sections, disclosure text, and reference citations.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    markdown-parsing text-extraction warranty-processing document-parsing regex
  • function parse_references_section

    Parses a formatted references section string and extracts structured data including reference numbers, sources, and content previews using regular expressions.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    parsing text-processing references citations regex
  • function get_bibtext

    Retrieves and parses BibTeX citation data for a given DOI (Digital Object Identifier), extracting the title and formatted BibTeX string.

    File: /tf/active/vicechatdev/offline_parser_docstore.py

    bibliography bibtex doi citation academic
  • function create_folder_hierarchy_v2

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.

    File: /tf/active/vicechatdev/offline_parser_docstore.py

    neo4j graph-database hierarchy folder-structure file-system
  • class RegulatoryExtractor

    A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

    File: /tf/active/vicechatdev/reg_extractor.py

    pdf-extraction regulatory-documents llm-extraction ocr data-extraction
  • function test_markdown_link_parsing

    A test function that validates markdown link parsing capabilities, specifically testing extraction and URL encoding of complex URLs containing special characters from Quill editor format.

    File: /tf/active/vicechatdev/test_complex_hyperlink.py

    testing markdown url-parsing regex url-encoding
  • function extract_warranty_data

    Parses markdown-formatted warranty documentation to extract structured warranty information including IDs, titles, sections, source document counts, warranty text, and disclosure content.

    File: /tf/active/vicechatdev/convert_disclosures_to_table.py

    markdown-parsing data-extraction warranty-processing text-processing regex
  • function create_word_report

    Generates a formatted Microsoft Word document report containing warranty disclosures with a table of contents, metadata, and structured sections for each warranty.

    File: /tf/active/vicechatdev/convert_disclosures_to_table.py

    document-generation word-document docx report-generation warranty
  • function clean_text_for_xml_v1

    Sanitizes text strings to ensure XML 1.0 compatibility by removing or replacing invalid control characters and ensuring all characters meet XML specification requirements for Word document generation.

    File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

    text-processing xml sanitization data-cleaning word-documents
  • function extract_warranty_sections

    Parses markdown content to extract warranty section headers, returning a list of dictionaries containing section IDs and titles for table of contents generation.

    File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

    markdown-parsing text-processing warranty-documents table-of-contents document-structure
  • function extract_total_references

    Extracts the total count of references from markdown-formatted content by first checking for a header line with the total, then falling back to manually counting reference entries.

    File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

    markdown parsing text-processing references bibliography
  • class OneCo_hybrid_RAG

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class oneco_hybrid_rag
  • function create_document_v1

    Creates a new version of an existing document in a document management system, storing the file in FileCloud and tracking version metadata in a Neo4j graph database.

    File: /tf/active/vicechatdev/document_controller_backup.py

    document-management version-control filecloud neo4j graph-database
  • class FileCloudAPI

    Python wrapper for the FileCloud REST API. This class provides methods to interact with FileCloud server APIs, handling authentication, session management, and various file operations.

    File: /tf/active/vicechatdev/FC_api copy.py

    class filecloudapi
  • class OneCo_hybrid_RAG_v1

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class oneco_hybrid_rag
  • class FileCloudAPI_v1

    Python wrapper for the FileCloud REST API. This class provides methods to interact with FileCloud server APIs, handling authentication, session management, and various file operations.

    File: /tf/active/vicechatdev/FC_api.py

    class filecloudapi
  • function create_folder_hierarchy

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    neo4j graph-database file-system hierarchy folder-structure
  • function main_v15

    Command-line interface function that orchestrates the generation of meeting minutes from a transcript file using either GPT-4o or Gemini LLM models.

    File: /tf/active/vicechatdev/advanced_meeting_minutes_generator.py

    cli command-line meeting-minutes transcript-processing llm
  • class OneCo_hybrid_RAG_v2

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • function parse_email_address

    Parses email address strings by handling multiple addresses separated by semicolons and converting them to comma-separated format.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email parsing string-manipulation formatting address-normalization
  • function generate_html_from_msg

    Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email html-generation email-parsing formatting msg-file
  • function html_to_pdf

    Converts HTML content to a PDF file using ReportLab with intelligent parsing of email-formatted HTML, including metadata extraction, body content processing, and attachment information.

    File: /tf/active/vicechatdev/msg_to_eml.py

    pdf-generation html-to-pdf email-conversion document-generation reportlab
  • function generate_simple_html_from_eml

    Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email html-generation email-parsing mime inline-images
  • function eml_to_pdf

    Converts an .eml email file to PDF format, including the email body and all attachments merged into a single PDF document.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email-processing pdf-conversion eml-parser document-conversion attachment-handling
  • function main_v42

    Entry point function that parses command-line arguments and orchestrates the FileCloud email processing workflow to find, download, and convert .msg files.

    File: /tf/active/vicechatdev/msg_to_eml.py

    cli command-line entry-point filecloud email-processing
  • class DocxMerger

    A class named DocxMerger

    File: /tf/active/vicechatdev/word_merge.py

    class docxmerger
  • function handle_potential_truncation

    Detects and handles truncated meeting minutes by comparing agenda items to discussion sections, then attempts regeneration with enhanced instructions to ensure completeness.

    File: /tf/active/vicechatdev/leexi/app.py

    meeting-minutes truncation-handling text-processing regeneration quality-assurance
  • function test_attendee_extraction_comprehensive

    A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.

    File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py

    testing attendee-extraction meeting-minutes transcript-parsing speaker-identification
  • class PowerPointProcessor

    A class that processes PowerPoint (.pptx) presentations to extract text content and tables, converting tables to markdown format and organizing content by slides.

    File: /tf/active/vicechatdev/leexi/enhanced_meeting_minutes_generator.py

    powerpoint pptx document-processing text-extraction table-extraction
  • function main_v2

    Command-line interface function that orchestrates the generation of enhanced meeting minutes from transcript files and PowerPoint presentations using various LLM models (GPT-4o, Azure GPT-4o, or Gemini).

    File: /tf/active/vicechatdev/leexi/enhanced_meeting_minutes_generator.py

    cli command-line meeting-minutes llm gpt-4
  • function test_attendee_extraction

    A test function that validates the attendee extraction logic of the EnhancedMeetingMinutesGenerator by parsing a meeting transcript and displaying extracted metadata including speakers, date, and duration.

    File: /tf/active/vicechatdev/leexi/test_attendee_extraction.py

    testing unit-test meeting-minutes attendee-extraction metadata-parsing
  • function parse_log_line

    Parses a structured log line string and extracts timestamp, logger name, log level, and message components into a dictionary.

    File: /tf/active/vicechatdev/SPFCsync/monitor.py

    logging parsing regex text-processing log-analysis
  • function analyze_logs

    Parses and analyzes log files to extract synchronization statistics, error counts, and performance metrics for a specified time period.

    File: /tf/active/vicechatdev/SPFCsync/monitor.py

    log-analysis file-synchronization monitoring statistics parsing
  • function validate_sharepoint_url

    Validates that a given URL string conforms to SharePoint site URL format requirements, checking for proper protocol, domain, and path structure.

    File: /tf/active/vicechatdev/SPFCsync/validate_config.py

    validation sharepoint url-validation microsoft configuration
  • function load_env_file

    Reads and parses environment variables from a .env file in the current directory, returning them as a dictionary.

    File: /tf/active/vicechatdev/SPFCsync/validate_config.py

    environment-variables configuration file-parsing dotenv settings
  • function load_config_v1

    Parses a .env file and loads key-value pairs into a dictionary, ignoring comments and handling errors gracefully.

    File: /tf/active/vicechatdev/SPFCsync/grant_sharepoint_access.py

    configuration environment-variables file-parsing dotenv settings
  • class SyncDiagnostics

    A diagnostic class that analyzes and reports on synchronization issues between SharePoint and FileCloud, identifying missing files and root causes of sync failures.

    File: /tf/active/vicechatdev/SPFCsync/deep_diagnostics.py

    diagnostics sync-analysis sharepoint filecloud troubleshooting
  • function format_datetime_v1

    Converts an ISO format datetime string into a human-readable UTC datetime string formatted as 'YYYY-MM-DD HH:MM:SS UTC'.

    File: /tf/active/vicechatdev/SPFCsync/dry_run_test.py

    datetime formatting string-manipulation ISO-format UTC
  • class SharePointFileCloudSync

    Orchestrates synchronization of documents from SharePoint to FileCloud, managing the complete sync lifecycle including document retrieval, comparison, upload, and folder structure creation.

    File: /tf/active/vicechatdev/SPFCsync/sync_service.py

    synchronization sharepoint filecloud document-management cloud-sync
  • class DocumentProcessor_v7

    Processes different document types for indexing.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • class OpenAIChatLLM

    Adapter class for interacting with OpenAI's Chat Completions API, supporting both GPT-4 and GPT-5 model families with automatic parameter adjustment based on model type.

    File: /tf/active/vicechatdev/docchat/llm_factory.py

    openai llm chat-completion gpt-4 gpt-5
  • function validate_azure_token_v1

    Validates an Azure AD token by parsing the JWT id_token and extracting user information such as user ID, email, name, and preferred username.

    File: /tf/active/vicechatdev/docchat/auth/azure_auth.py

    azure authentication jwt token-validation oauth
  • function parse_arguments_v1

    Parses command-line arguments for a legal contract extraction tool that processes documents from FileCloud storage.

    File: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py

    cli command-line argument-parsing argparse configuration
  • function parse_arguments

    Parses command-line arguments for a contract validity analysis tool that processes FileCloud documents with configurable options for paths, concurrency, output, and file filtering.

    File: /tf/active/vicechatdev/contract_validity_analyzer/main.py

    cli argument-parsing command-line argparse configuration
  • function main_v5

    Main entry point function for the Contract Validity Analyzer application that orchestrates configuration loading, logging setup, FileCloud connection, and contract analysis execution.

    File: /tf/active/vicechatdev/contract_validity_analyzer/main.py

    entry-point main-function cli-application contract-analysis filecloud
  • function test_international_tax_ids

    A test function that validates an LLM client's ability to extract tax identification numbers and business registration numbers from a multi-party international contract document across 8 different countries.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_international_tax_ids.py

    testing llm document-analysis tax-id-extraction international
  • class FileCloudClient_v1

    A client class for interacting with FileCloud storage systems through direct API calls, providing authentication, file search, download, and metadata retrieval capabilities.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/filecloud_client.py

    filecloud storage api-client file-management document-management
  • class DocumentProcessor_v1

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • class DocumentProcessor_v2

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • function test_european_csv

    A test function that validates the ability to read and parse European-formatted CSV files (semicolon delimiters, comma decimal separators) and convert them to proper numeric types.

    File: /tf/active/vicechatdev/vice_ai/test_regional_formats.py

    testing csv european-format data-parsing unit-test
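
Two of the results above, parse_email_address and format_datetime_v1, are described precisely enough to sketch. The block below is a minimal illustration of the described behavior only, not the code in msg_to_eml.py or dry_run_test.py. The function names parse_email_address_sketch and format_datetime_sketch are hypothetical, and treating a timestamp without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_email_address_sketch(raw: str) -> str:
    """Convert semicolon-separated addresses to a comma-separated string.

    Hypothetical stand-in for the behavior described for parse_email_address.
    """
    parts = [part.strip() for part in raw.split(";")]
    # Drop empty pieces left by trailing or doubled semicolons.
    return ", ".join(part for part in parts if part)


def format_datetime_sketch(iso_string: str) -> str:
    """Render an ISO 8601 string as 'YYYY-MM-DD HH:MM:SS UTC'.

    Hypothetical stand-in for the behavior described for format_datetime_v1.
    """
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
```

For instance, parse_email_address_sketch("a@x.com; b@y.com;") yields "a@x.com, b@y.com", and format_datetime_sketch("2024-03-05T16:30:00+02:00") yields "2024-03-05 14:30:00 UTC".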

Search Examples