function extract_total_references

Maturity: 41

Extracts the total count of references from markdown-formatted content by first checking for a header line with the total, then falling back to manually counting reference entries.

File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py
Lines: 73 - 88
Complexity: simple

Purpose

This function is designed to parse markdown documents that contain bibliographic references and determine the total number of references present. It uses a two-stage approach: first attempting to find an explicit '**Total References**:' header line with the count, and if that fails, manually counting lines that match the reference format '**[...]**'. This is useful for document processing pipelines that need to validate or report on reference counts in markdown-formatted academic or technical documents.

Source Code

def extract_total_references(markdown_content):
    """Extract total number of references from the markdown content"""
    lines = markdown_content.split('\n')
    for line in lines:
        if line.startswith('**Total References**:'):
            try:
                return int(line.split(':')[1].strip())
            except:
                pass
    
    # Count references manually if not found in header
    ref_count = 0
    for line in lines:
        if line.startswith('**[') and ']**' in line:
            ref_count += 1
    return ref_count

Parameters

Name              Type  Default  Kind
markdown_content  -     -        positional_or_keyword

Parameter Details

markdown_content: A string containing markdown-formatted text. Expected to contain references formatted as '**[reference_id]**' or a header line '**Total References**: N' where N is an integer. Can be multi-line content with newline characters separating lines.

Return Value

Returns an integer representing the total number of references in the markdown content. If a '**Total References**:' header line is found and its value parses as an integer, that value is returned as-is; it is not cross-checked against the reference entries actually present. Otherwise, returns the count of lines that start with '**[' and contain ']**'. Returns 0 if there is no parsable header and no line matches the reference pattern.
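
The return rules above can be checked with a few edge cases. The sketch below reproduces the function from the Source Code section so that it runs standalone; the sample strings are made up for illustration.

```python
def extract_total_references(markdown_content):
    """Copy of the documented function, reproduced so the example is self-contained."""
    lines = markdown_content.split('\n')
    for line in lines:
        if line.startswith('**Total References**:'):
            try:
                return int(line.split(':')[1].strip())
            except:
                pass

    # Count references manually if not found in header
    ref_count = 0
    for line in lines:
        if line.startswith('**[') and ']**' in line:
            ref_count += 1
    return ref_count


# The header value wins even when it disagrees with the number of entries.
mismatched = "**Total References**: 5\n**[1]** Smith, J. (2020). Example Paper."
header_wins = extract_total_references(mismatched)

# A header that cannot be parsed as an integer is skipped, so entries are counted.
unparsable = "**Total References**: many\n**[1]** Smith, J.\n**[2]** Doe, J."
fallback_count = extract_total_references(unparsable)

# An indented header does not match startswith(), so entries are counted instead.
indented = "  **Total References**: 9\n**[1]** Smith, J."
indented_count = extract_total_references(indented)

print(header_wins, fallback_count, indented_count)  # 5 2 1
```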

Usage Example

# Example 1: Markdown with explicit total
markdown_with_header = '''
**Total References**: 3

**[1]** Smith, J. (2020). Example Paper.
**[2]** Doe, J. (2021). Another Paper.
**[3]** Brown, A. (2022). Third Paper.
'''

total = extract_total_references(markdown_with_header)
print(f"Total references: {total}")  # Output: Total references: 3

# Example 2: Markdown without explicit total (manual count)
markdown_without_header = '''
**[1]** Smith, J. (2020). Example Paper.
**[2]** Doe, J. (2021). Another Paper.
'''

total = extract_total_references(markdown_without_header)
print(f"Total references: {total}")  # Output: Total references: 2

# Example 3: Empty or no references
empty_markdown = "Some text without references"
total = extract_total_references(empty_markdown)
print(f"Total references: {total}")  # Output: Total references: 0

Best Practices

  • Ensure markdown_content is a string; pass an empty string '' instead of None, because calling .split() on None raises AttributeError
  • The function uses a bare except: clause, which silently swallows every exception raised while parsing the header line (including KeyboardInterrupt and SystemExit); the only failure expected there is a ValueError from int(), so consider validating the input format beforehand
  • For manual counting, a line must start with '**[' at column 0 and contain ']**'; indented reference lines are not counted
  • The function assumes each reference is on its own line; inline references won't be counted
  • If the '**Total References**:' header exists but contains invalid data, the function falls back to manual counting rather than raising an error
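
One way to address the points above is a stricter variant. The following is a minimal sketch, not code from the original module: the name count_references_strict and both regular expressions are hypothetical. It assumes that tolerating leading whitespace and rejecting non-string input are acceptable changes in behavior.

```python
import re

# Hypothetical patterns, not taken from the original module.
# Header: optional indentation, the literal label, then digits only.
HEADER_RE = re.compile(r'^\s*\*\*Total References\*\*:\s*(\d+)\s*$')
# Entry: optional indentation, then '**[<id>]**' at the start of the line.
ENTRY_RE = re.compile(r'^\s*\*\*\[[^\]]+\]\*\*')


def count_references_strict(markdown_content):
    """Hypothetical stricter take on extract_total_references."""
    if not isinstance(markdown_content, str):
        # Fail loudly instead of hitting AttributeError on None.
        raise TypeError("markdown_content must be a str")

    lines = markdown_content.splitlines()

    # Stage 1: explicit header. The pattern only admits digits,
    # so int() cannot fail and no try/except is needed.
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            return int(match.group(1))

    # Stage 2: count lines that begin with a reference marker.
    return sum(1 for line in lines if ENTRY_RE.match(line))
```

Unlike the original, this sketch ignores a negative header value such as '**Total References**: -3' and falls back to counting entries, so it is not a drop-in replacement where that case matters.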

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function parse_references_section 66.4% similar

    Parses a formatted references section string and extracts structured data including reference numbers, sources, and content previews using regular expressions.

    From: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py
  • function process_markdown_content_v1 48.3% similar

    Parses markdown-formatted text content and converts it into a structured list of document elements (headers, paragraphs, lists, tables, code blocks) with their types and formatting preserved in original order.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function process_markdown_content 48.3% similar

    Parses markdown-formatted text content and converts it into a structured list of content elements with type annotations and formatting metadata suitable for document export.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • function extract_warranty_data_improved 47.8% similar

    Parses markdown-formatted warranty documentation to extract structured warranty data including IDs, titles, sections, disclosure text, and reference citations.

    From: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py
  • class ReferenceManager 47.0% similar

    Manages document references for inline citation and bibliography generation in a RAG (Retrieval-Augmented Generation) system.

    From: /tf/active/vicechatdev/fixed_project_victoria_generator.py