class PageAnalysis

Maturity: 41

A dataclass that encapsulates the analysis results for a single PDF page, including its image representation, text content, dimensions, and optional analysis metadata.

File: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py
Lines: 18 - 26
Complexity: simple

Purpose

PageAnalysis is a structured data container for the information gathered about a single PDF page during processing and analysis. It holds the page's visual representation (as a base64-encoded image), extracted text content, page dimensions, and optional analysis results such as a content type classification and a list of key elements. It is typically used in PDF processing pipelines where pages are analyzed individually and the results are stored in a structured format for further processing or reporting.

Source Code

@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None

Parameters

Name Type Default Kind
bases - -

Parameter Details

page_number: The sequential number of the page within the PDF document (typically 1-indexed). Used to identify and order pages within a document.

image_b64: Base64-encoded string representation of the page rendered as an image. This allows the visual content of the page to be stored and transmitted as text.
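As a rough illustration of how such a string could be produced and read back, the sketch below uses Python's standard base64 module. The helper names encode_image and decode_image are hypothetical, and the bytes only stand in for a rendered page image; the original module may build image_b64 differently.

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image(image_b64: str) -> bytes:
    """Decode a base64 string back into the original image bytes (hypothetical helper)."""
    return base64.b64decode(image_b64)


# Placeholder bytes standing in for a rendered page image (PNG signature only).
raw = b"\x89PNG\r\n\x1a\n"
encoded = encode_image(raw)

# The round trip returns the original bytes unchanged.
assert decode_image(encoded) == raw
# Base64 emits 4 characters for every 3 input bytes, rounded up.
assert len(encoded) == 4 * ((len(raw) + 2) // 3)
```

The 4-for-3 expansion means the stored string is about a third larger than the underlying image bytes.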

text_content: The extracted text content from the PDF page. Contains all readable text elements found on the page.

dimensions: A tuple of two integers (width, height) representing the pixel dimensions of the page image.

analysis_result: Optional string containing the results of any analysis performed on the page (e.g., summary, classification results, or structured analysis output). Defaults to None if no analysis has been performed.

content_type: Optional string indicating the type or category of content on the page (e.g., 'table', 'text', 'image', 'mixed'). Defaults to None if not classified.
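One way to keep content_type values consistent is to validate them against a fixed vocabulary before storing them. The sketch below is an assumption rather than part of the original module: ALLOWED_CONTENT_TYPES and normalize_content_type are hypothetical names, and the vocabulary is simply the set of examples listed above.

```python
from typing import Optional

# Hypothetical fixed vocabulary, taken from the examples in this section.
ALLOWED_CONTENT_TYPES = frozenset({"table", "text", "image", "mixed"})


def normalize_content_type(value: Optional[str]) -> Optional[str]:
    """Return a lowercased, validated content type, or None if unclassified."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    if cleaned not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"Unknown content type: {value!r}")
    return cleaned


print(normalize_content_type(" Table "))  # table
print(normalize_content_type(None))       # None
```

Rejecting unknown values early keeps downstream code from having to handle spelling or casing variants.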

key_elements: Optional list of strings identifying important elements or features found on the page (e.g., ['header', 'table', 'chart']). Defaults to None if not analyzed.

Return Value

Instantiation returns a PageAnalysis object holding the attributes described above. As a dataclass, it gets auto-generated __init__, __repr__, and __eq__ methods. Nothing enforces immutability unless the class is declared with frozen=True, so the object is an immutable-by-convention data container for page analysis results.
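To follow the immutable-by-convention approach, analysis results can be attached by creating an updated copy instead of reassigning attributes. The sketch below uses dataclasses.replace from the standard library; the class is redeclared with a plain @dataclass decorator (an assumption about the original) so the snippet runs on its own, and the field values are made up.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


# Page as produced by extraction, before any analysis has run.
raw_page = PageAnalysis(
    page_number=1,
    image_b64="aGVsbG8=",  # placeholder payload, not a real image
    text_content="hello",
    dimensions=(800, 1100),
)

# replace() builds a new instance; raw_page itself is left untouched.
analyzed = replace(
    raw_page,
    analysis_result="Greeting text only.",
    content_type="text",
)

print(raw_page.analysis_result)  # None
print(analyzed.content_type)     # text
```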

Class Interface

Methods

__init__(page_number: int, image_b64: str, text_content: str, dimensions: Tuple[int, int], analysis_result: Optional[str] = None, content_type: Optional[str] = None, key_elements: Optional[List[str]] = None) -> None

Purpose: Initializes a new PageAnalysis instance with the provided page data and optional analysis results. Auto-generated by the dataclass decorator.

Parameters:

  • page_number: The page number within the PDF document
  • image_b64: Base64-encoded image representation of the page
  • text_content: Extracted text from the page
  • dimensions: Tuple of (width, height) in pixels
  • analysis_result: Optional analysis results or summary
  • content_type: Optional content type classification
  • key_elements: Optional list of identified key elements

Returns: None (constructor)

__repr__() -> str

Purpose: Returns a string representation of the PageAnalysis object showing all field values. Auto-generated by the dataclass decorator.

Returns: String representation of the object in the format 'PageAnalysis(page_number=..., image_b64=..., ...)'

__eq__(other: object) -> bool

Purpose: Compares two PageAnalysis objects for equality based on all field values. Auto-generated by the dataclass decorator.

Parameters:

  • other: Another object to compare with

Returns: True if all fields are equal, False otherwise
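The generated comparison and representation can be seen in a short sketch. It assumes the class is declared with a plain @dataclass decorator, which is redeclared here so the snippet is self-contained; the field values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


a = PageAnalysis(1, "aGVsbG8=", "hello", (800, 1100))
b = PageAnalysis(1, "aGVsbG8=", "hello", (800, 1100))
c = PageAnalysis(2, "aGVsbG8=", "hello", (800, 1100))

print(a == b)   # True: every field matches
print(a == c)   # False: page_number differs
print(a is b)   # False: equality is by value, not identity
print(repr(a))  # lists every field, including the full image_b64 string
```

Under a plain @dataclass (eq=True, frozen=False), __hash__ is set to None, so instances cannot be used as dictionary keys or set members. Because __repr__ includes the full image_b64 value, printing or logging a real page can produce very long output.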

Attributes

Name | Type | Description | Scope
page_number | int | The sequential number of the page within the PDF document | instance
image_b64 | str | Base64-encoded string representation of the page rendered as an image | instance
text_content | str | The extracted text content from the PDF page | instance
dimensions | Tuple[int, int] | A tuple containing the width and height of the page image in pixels | instance
analysis_result | Optional[str] | Optional string containing analysis results, summaries, or structured output from page analysis | instance
content_type | Optional[str] | Optional classification of the page content type (e.g., 'table', 'text', 'image', 'mixed') | instance
key_elements | Optional[List[str]] | Optional list of identified key elements or features on the page (e.g., headers, tables, charts) | instance

Dependencies

  • dataclasses
  • typing

Required Imports

from dataclasses import dataclass
from typing import Tuple, Optional, List

Usage Example

from dataclasses import dataclass
from typing import Tuple, Optional, List

@dataclass
class PageAnalysis:
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None

# Create a PageAnalysis instance for a simple page
page_analysis = PageAnalysis(
    page_number=1,
    image_b64="iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
    text_content="This is the text content of page 1.",
    dimensions=(800, 1100)
)

# Create a PageAnalysis instance with optional fields
detailed_analysis = PageAnalysis(
    page_number=2,
    image_b64="base64_encoded_image_data_here",
    text_content="Page 2 contains a table and chart.",
    dimensions=(800, 1100),
    analysis_result="This page contains financial data with a summary table and trend chart.",
    content_type="mixed",
    key_elements=["table", "chart", "header"]
)

# Access attributes
print(f"Page {page_analysis.page_number}: {page_analysis.dimensions}")
print(f"Content type: {detailed_analysis.content_type}")
print(f"Key elements: {detailed_analysis.key_elements}")

Best Practices

  • Dataclasses are mutable unless declared with frozen=True, which the usage example does not do. Treat instances as immutable by convention: avoid modifying attributes after instantiation unless necessary.
  • The image_b64 field can contain large amounts of data for high-resolution pages. Consider memory implications when storing many PageAnalysis objects.
  • Always provide the required fields (page_number, image_b64, text_content, dimensions) during instantiation. Optional fields can be set later if needed.
  • Use meaningful values for content_type to enable consistent categorization across your application (e.g., establish a fixed set of content types).
  • The key_elements list should contain standardized element names for consistency in downstream processing.
  • When serializing PageAnalysis objects (e.g., to JSON), be aware that the image_b64 field may significantly increase payload size.
  • Page numbers should typically start at 1 to match conventional PDF page numbering, though 0-indexing is also acceptable if used consistently.
  • The dimensions tuple should represent (width, height) in pixels, matching the resolution of the image_b64 data.
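The payload-size concern can be handled by leaving the image out when serializing. The sketch below is one possible approach, not the original module's behavior: to_summary_dict is a hypothetical helper built on dataclasses.asdict, and the class is redeclared with a plain @dataclass decorator so the snippet runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


def to_summary_dict(page: PageAnalysis) -> Dict[str, Any]:
    """Convert a page to a dict, replacing the image with its length."""
    data = asdict(page)
    image = data.pop("image_b64")
    data["image_b64_length"] = len(image)
    return data


page = PageAnalysis(
    page_number=2,
    image_b64="aGVsbG8=" * 1000,  # placeholder standing in for a large image
    text_content="Page 2 contains a table and chart.",
    dimensions=(800, 1100),
    content_type="mixed",
    key_elements=["table", "chart", "header"],
)

# The JSON payload carries the image length, not the image itself.
payload = json.dumps(to_summary_dict(page))
print(len(payload) < len(page.image_b64))  # True
```

Note that json.dumps writes the dimensions tuple as a JSON array, so it comes back as a list when the payload is loaded.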

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class MultiPageAnalysisResult 81.5% similar

    A dataclass that encapsulates the complete results of analyzing a multi-page document, including individual page analyses, document summary, combined response, and processing statistics.

    From: /tf/active/vicechatdev/e-ink-llm/multi_page_llm_handler.py
  • class DocumentSummary 70.6% similar

    A dataclass that encapsulates comprehensive analysis results of a document, including page-level and document-level summaries, topics, findings, and confidence metrics.

    From: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py
  • class AnalysisResult 67.7% similar

    A dataclass that encapsulates the results from statistical analysis operations, including metadata, file paths, and timestamps.

    From: /tf/active/vicechatdev/vice_ai/smartstat_models.py
  • class AnalysisResult_v1 67.4% similar

    A dataclass that encapsulates the results from statistical analysis operations, including metadata, file paths, and timestamps.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DataSection 66.9% similar

    A dataclass representing a dedicated data analysis section that stores analysis results, plots, dataset information, and conclusions separately from text content.

    From: /tf/active/vicechatdev/vice_ai/models.py