class DocumentSummary

Maturity: 43

A dataclass that encapsulates comprehensive analysis results of a document, including page-level and document-level summaries, topics, findings, and confidence metrics.

File:
/tf/active/vicechatdev/e-ink-llm/multi_page_processor.py
Lines:
29 - 37
Complexity:
simple

Purpose

DocumentSummary serves as a structured data container for storing and transferring complete document analysis results. It aggregates information from multi-page document processing, including individual page summaries, extracted topics, key findings, and an overall document summary with a confidence score. This class is typically used as the output format for document analysis pipelines that process PDFs or other multi-page documents.

Source Code

@dataclass
class DocumentSummary:
    """Summary of complete document analysis"""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float

Parameters

Name Type Default Kind
bases - -

Parameter Details

total_pages: Integer giving the total number of pages analyzed in the document. Expected to be a positive integer; the dataclass itself does not validate this.

document_type: String classification of the document type (e.g., 'report', 'invoice', 'contract', 'research paper'). Helps categorize the document for downstream processing.

main_topics: List of strings containing the primary topics or themes identified across the entire document. Each string represents a distinct topic.

key_findings: List of strings containing the most important findings, conclusions, or insights extracted from the document. Each string represents a significant finding.

page_summaries: List of strings where each element contains the summary of a corresponding page. The list length should match total_pages.

overall_summary: String containing a comprehensive summary of the entire document, synthesizing information from all pages.

confidence_score: Float representing the confidence level of the analysis, expected to lie between 0.0 and 1.0 (the range is not enforced by the class). Higher values indicate greater confidence in the extracted information.
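
The source shows no __post_init__, so the constraints described above are conventions and are not checked by the class. A minimal sketch of a check that a caller could run on an instance follows. The function name validate_summary is hypothetical and is not part of multi_page_processor.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Summary of complete document analysis"""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


def validate_summary(summary: DocumentSummary) -> List[str]:
    """Hypothetical helper: return constraint violations (empty list if none)."""
    problems = []
    # total_pages is documented as a positive integer
    if summary.total_pages < 1:
        problems.append("total_pages must be a positive integer")
    # one summary per page is expected
    if len(summary.page_summaries) != summary.total_pages:
        problems.append("page_summaries length does not match total_pages")
    # confidence_score is documented as lying in [0.0, 1.0]
    if not 0.0 <= summary.confidence_score <= 1.0:
        problems.append("confidence_score is outside the 0.0 to 1.0 range")
    return problems
```

Returning a list of messages instead of raising lets a pipeline log all problems at once. Raising ValueError on the first violation would be an equally reasonable design.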

Return Value

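Instantiation returns a DocumentSummary object holding the seven fields described above. With the default @dataclass settings shown in the usage example, Python generates __init__, __repr__, and __eq__ for the class. Instances are mutable: fields can be reassigned after construction. Because __eq__ is generated without frozen=True, __hash__ is set to None, so instances cannot be used as dict keys or set members. Treating the object as read-only is therefore a convention and not something the class enforces.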

Class Interface

Methods

__init__(total_pages: int, document_type: str, main_topics: List[str], key_findings: List[str], page_summaries: List[str], overall_summary: str, confidence_score: float)

Purpose: Automatically generated constructor that initializes all instance attributes with provided values

Parameters:

  • total_pages: Total number of pages in the analyzed document
  • document_type: Classification or category of the document
  • main_topics: List of primary topics identified in the document
  • key_findings: List of important findings or conclusions
  • page_summaries: List of summaries for each page
  • overall_summary: Comprehensive summary of the entire document
  • confidence_score: Confidence level of the analysis (0.0 to 1.0)

Returns: None (constructor)

__repr__() -> str

Purpose: Automatically generated method that returns a string representation of the object showing all attributes

Returns: String representation in the format 'DocumentSummary(total_pages=..., document_type=..., ...)'

__eq__(other) -> bool

Purpose: Automatically generated method that compares two DocumentSummary instances for equality based on all attributes

Parameters:

  • other: Another object to compare with

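Returns: True if other is an instance of exactly the same class and all fields compare equal, False if any field differs, and NotImplemented if other is of a different class (so an expression such as summary == "text" evaluates to False)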
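
The behavior of the generated methods can be checked with a short script. This is a sketch using made-up field values; it assumes the class is declared with a plain @dataclass decorator, as in the usage example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Summary of complete document analysis"""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


# Illustrative values only
fields = dict(
    total_pages=1,
    document_type='invoice',
    main_topics=['billing'],
    key_findings=['Total due: 100'],
    page_summaries=['Page 1: invoice'],
    overall_summary='A one-page invoice.',
    confidence_score=0.8,
)
a = DocumentSummary(**fields)
b = DocumentSummary(**fields)

# __eq__ compares field values, so two separate instances can be equal
same_values = (a == b)
same_object = (a is b)

# __repr__ lists every field as name=value
text = repr(a)

# Comparing with an unrelated type falls back to False
equals_string = (a == 'invoice')

# eq=True without frozen=True sets __hash__ to None
try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False

print(same_values, same_object, equals_string, hashable)
print(text)
```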

Attributes

Name Type Description Scope
total_pages int Total number of pages in the analyzed document instance
document_type str Classification or category of the document (e.g., 'report', 'invoice', 'contract') instance
main_topics List[str] List of primary topics or themes identified across the document instance
key_findings List[str] List of important findings, conclusions, or insights extracted from the document instance
page_summaries List[str] List of per-page summaries in page order; the list is zero-indexed, so index 0 holds the summary of page 1 instance
overall_summary str Comprehensive summary synthesizing information from the entire document instance
confidence_score float Confidence level of the analysis results, typically ranging from 0.0 to 1.0 instance

Dependencies

  • dataclasses
  • typing

Required Imports

from dataclasses import dataclass
from typing import List

Usage Example

from dataclasses import dataclass
from typing import List

@dataclass
class DocumentSummary:
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float

# Create a document summary instance
summary = DocumentSummary(
    total_pages=5,
    document_type='research paper',
    main_topics=['machine learning', 'natural language processing', 'transformers'],
    key_findings=['Transformers outperform RNNs', 'Attention mechanism is key'],
    page_summaries=['Page 1: Introduction to NLP', 'Page 2: Methodology', 'Page 3: Results', 'Page 4: Discussion', 'Page 5: Conclusion'],
    overall_summary='This paper presents a comprehensive study on transformer models in NLP tasks.',
    confidence_score=0.92
)

# Access attributes
print(f'Document has {summary.total_pages} pages')
print(f'Type: {summary.document_type}')
print(f'Confidence: {summary.confidence_score}')
for topic in summary.main_topics:
    print(f'Topic: {topic}')
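
Since the class exists to transfer analysis results, serialization is a common next step. The sketch below uses dataclasses.asdict and the standard json module to round-trip an instance. Whether the project itself serializes summaries this way is not shown in the source, and the field values are illustrative.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Summary of complete document analysis"""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


summary = DocumentSummary(
    total_pages=2,
    document_type='report',
    main_topics=['budget'],
    key_findings=['Spending rose'],
    page_summaries=['Page 1: Overview', 'Page 2: Figures'],
    overall_summary='A two-page budget report.',
    confidence_score=0.75,
)

# asdict converts the instance to a plain dict of field names to values
payload = json.dumps(asdict(summary))

# All fields are JSON-native types, so the dict can be passed back to __init__
restored = DocumentSummary(**json.loads(payload))
print(restored == summary)
```

The round trip works here because every field is an int, str, float, or list of str. Fields of other types would need custom encoding.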

Best Practices

  • Always ensure total_pages matches the length of page_summaries list for consistency
  • Keep confidence_score between 0.0 and 1.0 to maintain standard probability conventions
  • Use descriptive and consistent document_type values to enable proper categorization
  • Populate main_topics with distinct, non-overlapping topics for clarity
  • Keep key_findings concise and actionable, focusing on the most important insights
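  • Dataclass instances are mutable by default; treat a DocumentSummary as read-only after instantiation so that all consumers see the same results
  • Consider using frozen=True in the @dataclass decorator if immutability must be enforced; note that list fields can still be modified in place unless tuples are used instead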
  • Validate data before instantiation to ensure all required fields are properly populated
  • Use this class as a return type for document analysis functions to maintain consistent API contracts
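
Several of the practices above can be combined in one definition. The following is a sketch of a hypothetical variant, named FrozenDocumentSummary here; it is not the class in multi_page_processor.py. It uses frozen=True, tuple fields, and a __post_init__ check for the first two practices in the list.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FrozenDocumentSummary:
    """Hypothetical immutable variant of DocumentSummary."""
    total_pages: int
    document_type: str
    main_topics: Tuple[str, ...]
    key_findings: Tuple[str, ...]
    page_summaries: Tuple[str, ...]
    overall_summary: str
    confidence_score: float

    def __post_init__(self) -> None:
        # Runs after the generated __init__; it only reads fields,
        # which is allowed on a frozen dataclass
        if len(self.page_summaries) != self.total_pages:
            raise ValueError("page_summaries length must equal total_pages")
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be between 0.0 and 1.0")


summary = FrozenDocumentSummary(
    total_pages=1,
    document_type='contract',
    main_topics=('terms',),
    key_findings=('Term is 12 months',),
    page_summaries=('Page 1: Terms',),
    overall_summary='A one-page contract.',
    confidence_score=0.9,
)

# Assignment to a field of a frozen dataclass raises FrozenInstanceError
try:
    summary.confidence_score = 0.5
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# replace() builds a modified copy and re-runs __post_init__
revised = replace(summary, confidence_score=0.6)
print(assignment_blocked, summary.confidence_score, revised.confidence_score)
```

Tuples are used in place of lists so that the contents cannot be changed in place. A side effect is that instances become hashable, because every field is hashable.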

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class MultiPageAnalysisResult 75.7% similar

    A dataclass that encapsulates the complete results of analyzing a multi-page document, including individual page analyses, document summary, combined response, and processing statistics.

    From: /tf/active/vicechatdev/e-ink-llm/multi_page_llm_handler.py
  • class PageAnalysis 70.6% similar

    A dataclass that encapsulates the analysis results for a single PDF page, including its image representation, text content, dimensions, and optional analysis metadata.

    From: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py
  • class DataSection 64.0% similar

    A dataclass representing a dedicated data analysis section that stores analysis results, plots, dataset information, and conclusions separately from text content.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class AnalysisResult 63.9% similar

    A dataclass that encapsulates the results from statistical analysis operations, including metadata, file paths, and timestamps.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class AnalysisResult_v1 63.5% similar

    A dataclass that encapsulates the results from statistical analysis operations, including metadata, file paths, and timestamps.

    From: /tf/active/vicechatdev/vice_ai/models.py