class DocumentSummary
A dataclass that encapsulates comprehensive analysis results of a document, including page-level and document-level summaries, topics, findings, and confidence metrics.
/tf/active/vicechatdev/e-ink-llm/multi_page_processor.py
29 - 37
simple
Purpose
DocumentSummary serves as a structured data container for storing and transferring complete document analysis results. It aggregates information from multi-page document processing, including individual page summaries, extracted topics, key findings, and an overall document summary with a confidence score. This class is typically used as the output format for document analysis pipelines that process PDFs or other multi-page documents.
Source Code
class DocumentSummary:
    """Summary of complete document analysis"""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
total_pages: Integer representing the total number of pages analyzed in the document. Expected to be a positive integer; the dataclass itself performs no validation.
document_type: String classification of the document type (e.g., 'report', 'invoice', 'contract', 'research paper'). Helps categorize the document for downstream processing.
main_topics: List of strings containing the primary topics or themes identified across the entire document. Each string represents a distinct topic.
key_findings: List of strings containing the most important findings, conclusions, or insights extracted from the document. Each string represents a significant finding.
page_summaries: List of strings where each element contains the summary of a corresponding page. The list length should match total_pages.
overall_summary: String containing a comprehensive summary of the entire document, synthesizing information from all pages.
confidence_score: Float value expected to lie between 0.0 and 1.0, representing the confidence level of the analysis. Higher values indicate greater confidence in the extracted information. The class does not enforce this range.
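The expectations listed above (positive page count, one summary per page, score within 0.0 to 1.0) can be checked before a summary is passed on. The following is a minimal sketch, not part of the documented module: `find_problems` is a hypothetical helper, and the class is redeclared locally only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Local copy of the documented fields so the sketch is self-contained."""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


def find_problems(summary: DocumentSummary) -> List[str]:
    """Hypothetical helper: list every violated expectation (empty if none)."""
    problems = []
    if summary.total_pages < 1:
        problems.append("total_pages must be a positive integer")
    if len(summary.page_summaries) != summary.total_pages:
        problems.append("page_summaries length does not match total_pages")
    if not 0.0 <= summary.confidence_score <= 1.0:
        problems.append("confidence_score is outside [0.0, 1.0]")
    return problems


bad = DocumentSummary(
    total_pages=3,
    document_type="report",
    main_topics=["budget"],
    key_findings=["Costs rose"],
    page_summaries=["Page 1", "Page 2"],  # one summary short
    overall_summary="Quarterly budget report.",
    confidence_score=1.4,  # out of range
)
problems = find_problems(bad)
print(problems)
```

Returning a list of messages rather than raising on the first failure is a design choice for this sketch; it lets a caller report every inconsistency at once.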
Return Value
Instantiation returns a DocumentSummary object carrying all of the specified attributes. The @dataclass decorator generates __init__, __repr__, and __eq__ from the field annotations. Instances are mutable by default; treating them as read-only once created is a convention, not something the class enforces.
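A plain `@dataclass` (without `frozen=True`) produces an ordinary mutable class, so attribute assignment succeeds. If the read-only convention is to be kept, one option is `dataclasses.replace`, which builds a new instance with selected fields changed. This is a sketch with made-up field values; the class is redeclared locally so the example runs standalone.

```python
import dataclasses
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Local copy of the documented fields so the sketch is self-contained."""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


summary = DocumentSummary(
    total_pages=2,
    document_type="contract",
    main_topics=["licensing"],
    key_findings=["Term is 12 months"],
    page_summaries=["Parties and scope", "Term and termination"],
    overall_summary="Two-page licensing contract.",
    confidence_score=0.7,
)

# Direct assignment works: nothing in the class prevents mutation.
summary.confidence_score = 0.5

# replace() returns a new object and leaves the original untouched.
revised = dataclasses.replace(summary, confidence_score=0.9)

print(summary.confidence_score)                     # 0.5
print(revised.confidence_score)                     # 0.9
print(revised.main_topics is summary.main_topics)   # True
```

Note that `replace` makes a shallow copy: the list fields of `revised` are the same list objects as those of `summary`.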
Class Interface
Methods
__init__(total_pages: int, document_type: str, main_topics: List[str], key_findings: List[str], page_summaries: List[str], overall_summary: str, confidence_score: float)
Purpose: Automatically generated constructor that initializes all instance attributes with provided values
Parameters:
total_pages: Total number of pages in the analyzed document
document_type: Classification or category of the document
main_topics: List of primary topics identified in the document
key_findings: List of important findings or conclusions
page_summaries: List of summaries for each page
overall_summary: Comprehensive summary of the entire document
confidence_score: Confidence level of the analysis (0.0 to 1.0)
Returns: None (constructor)
__repr__() -> str
Purpose: Automatically generated method that returns a string representation of the object showing all attributes
Returns: String representation in the format 'DocumentSummary(total_pages=..., document_type=..., ...)'
__eq__(other) -> bool
Purpose: Automatically generated method that compares two DocumentSummary instances for equality based on all attributes
Parameters:
other: Another object to compare with
Returns: True if all attributes are equal, False otherwise
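The generated methods can be exercised as follows. This is a small sketch with illustrative values; the class is redeclared locally so it runs on its own. It also shows one consequence of the default settings that is easy to miss: because `__eq__` is generated and the class is not frozen, instances are unhashable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Local copy of the documented fields so the sketch is self-contained."""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


def make_invoice_summary() -> DocumentSummary:
    """Illustrative factory so two separate but equal objects can be built."""
    return DocumentSummary(
        total_pages=1,
        document_type="invoice",
        main_topics=["billing"],
        key_findings=[],
        page_summaries=["Invoice header and totals"],
        overall_summary="Single-page invoice.",
        confidence_score=0.8,
    )


a = make_invoice_summary()
b = make_invoice_summary()

print(repr(a))           # field-by-field representation
print(a == b, a is b)    # equal by value, but distinct objects

# Comparison with an unrelated type falls back to False.
equal_to_tuple = (a == (1, "invoice"))

# eq=True without frozen=True sets __hash__ to None.
try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False
print(hashable)
```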
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| total_pages | int | Total number of pages in the analyzed document | instance |
| document_type | str | Classification or category of the document (e.g., 'report', 'invoice', 'contract') | instance |
| main_topics | List[str] | List of primary topics or themes identified across the document | instance |
| key_findings | List[str] | List of important findings, conclusions, or insights extracted from the document | instance |
| page_summaries | List[str] | List of summaries for each page, where index corresponds to page number | instance |
| overall_summary | str | Comprehensive summary synthesizing information from the entire document | instance |
| confidence_score | float | Confidence level of the analysis results, typically ranging from 0.0 to 1.0 | instance |
Dependencies
- dataclasses
- typing
Required Imports
from dataclasses import dataclass
from typing import List
Usage Example
from dataclasses import dataclass
from typing import List

@dataclass
class DocumentSummary:
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float

# Create a document summary instance
summary = DocumentSummary(
    total_pages=5,
    document_type='research paper',
    main_topics=['machine learning', 'natural language processing', 'transformers'],
    key_findings=['Transformers outperform RNNs', 'Attention mechanism is key'],
    page_summaries=['Page 1: Introduction to NLP', 'Page 2: Methodology', 'Page 3: Results', 'Page 4: Discussion', 'Page 5: Conclusion'],
    overall_summary='This paper presents a comprehensive study on transformer models in NLP tasks.',
    confidence_score=0.92
)

# Access attributes
print(f'Document has {summary.total_pages} pages')
print(f'Type: {summary.document_type}')
print(f'Confidence: {summary.confidence_score}')
for topic in summary.main_topics:
    print(f'Topic: {topic}')
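Since the class is described as a container for storing and transferring results, it may be useful to see one way to serialize it. The sketch below uses `dataclasses.asdict` with the standard `json` module; JSON as the transport format is an assumption made for illustration, not something the source module is known to do.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DocumentSummary:
    """Local copy of the documented fields so the sketch is self-contained."""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float


summary = DocumentSummary(
    total_pages=2,
    document_type="report",
    main_topics=["sales"],
    key_findings=["Revenue grew"],
    page_summaries=["Overview", "Regional breakdown"],
    overall_summary="Short sales report.",
    confidence_score=0.92,
)

# asdict() converts the instance into a plain dict of built-in types.
payload = json.dumps(asdict(summary))

# Every key in the dict matches a constructor argument, so the dict
# can be unpacked straight back into a new instance.
restored = DocumentSummary(**json.loads(payload))

print(restored == summary)  # True
```

The round trip works here because every field is a string, number, or list of strings; fields holding other types would need custom encoding.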
Best Practices
- Always ensure total_pages matches the length of page_summaries list for consistency
- Keep confidence_score between 0.0 and 1.0 to maintain standard probability conventions
- Use descriptive and consistent document_type values to enable proper categorization
- Populate main_topics with distinct, non-overlapping topics for clarity
- Keep key_findings concise and actionable, focusing on the most important insights
- Dataclasses are mutable by default, so treat instances as read-only by convention and avoid modifying attributes after instantiation
- Consider using frozen=True in the @dataclass decorator if true immutability is required
- Validate data before instantiation to ensure all required fields are properly populated
- Use this class as a return type for document analysis functions to maintain consistent API contracts
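Two of the practices above, `frozen=True` and validation at construction time, can be combined. The following is a sketch of a variant class, not the definition in the source module: `FrozenDocumentSummary` and its `__post_init__` checks are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List


@dataclass(frozen=True)
class FrozenDocumentSummary:
    """Hypothetical frozen variant that validates its own fields."""
    total_pages: int
    document_type: str
    main_topics: List[str]
    key_findings: List[str]
    page_summaries: List[str]
    overall_summary: str
    confidence_score: float

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ has set the fields.
        if self.total_pages < 1:
            raise ValueError("total_pages must be a positive integer")
        if len(self.page_summaries) != self.total_pages:
            raise ValueError("page_summaries length must equal total_pages")
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be within [0.0, 1.0]")


frozen = FrozenDocumentSummary(
    total_pages=1,
    document_type="invoice",
    main_topics=["billing"],
    key_findings=["Total due: 120 EUR"],
    page_summaries=["Invoice header and totals"],
    overall_summary="Single-page invoice.",
    confidence_score=0.8,
)

# Reassigning a field on a frozen instance raises FrozenInstanceError.
try:
    frozen.total_pages = 3
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# Inconsistent input is rejected before an object is ever handed out.
try:
    FrozenDocumentSummary(2, "invoice", [], [], ["only one"], "", 0.5)
    rejected = False
except ValueError:
    rejected = True

print(assignment_blocked, rejected)  # True True
```

`frozen=True` only blocks attribute reassignment. The list fields can still be mutated in place (for example with `append`); using tuples instead of lists would be one way to close that gap.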
Similar Components
AI-powered semantic similarity - components with related functionality:
- class MultiPageAnalysisResult (75.7% similar)
- class PageAnalysis (70.6% similar)
- class DataSection (64.0% similar)
- class AnalysisResult (63.9% similar)
- class AnalysisResult_v1 (63.5% similar)