PageAnalysis - Code Extractor

class PageAnalysis

Maturity: 41

A dataclass that encapsulates the analysis results for a single PDF page, including its image representation, text content, dimensions, and optional analysis metadata.

File:
/tf/active/vicechatdev/e-ink-llm/multi_page_processor.py

Lines:
18 - 26

Complexity:
simple

Purpose

PageAnalysis serves as a structured data container for storing comprehensive information about a single PDF page after processing and analysis. It holds the page's visual representation (as base64-encoded image), extracted text content, page dimensions, and optional analysis results such as content type classification and key elements identification. This class is typically used in PDF processing pipelines where pages need to be analyzed individually and their results stored in a structured format for further processing or reporting.

Source Code

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None

Parameters

Name	Type	Default	Kind
`bases`	-	-	-

Parameter Details

page_number: The sequential number of the page within the PDF document (typically 1-indexed). Used to identify and order pages within a document.

image_b64: Base64-encoded string representation of the page rendered as an image. This allows the visual content of the page to be stored and transmitted as text.

text_content: The extracted text content from the PDF page. Contains all readable text elements found on the page.

dimensions: A tuple of two integers (width, height) representing the pixel dimensions of the page image.

analysis_result: Optional string containing the results of any analysis performed on the page (e.g., summary, classification results, or structured analysis output). Defaults to None if no analysis has been performed.

content_type: Optional string indicating the type or category of content on the page (e.g., 'table', 'text', 'image', 'mixed'). Defaults to None if not classified.

key_elements: Optional list of strings identifying important elements or features found on the page (e.g., ['header', 'table', 'chart']). Defaults to None if not analyzed.
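As a minimal sketch of how the required fields might be populated, the following encodes raw image bytes with the standard-library base64 module. The helper name make_page_analysis and the placeholder bytes are assumptions for illustration; the rendering code in the original module is not shown in this document.

```python
import base64
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


def make_page_analysis(page_number: int, image_bytes: bytes, text: str,
                       width: int, height: int) -> PageAnalysis:
    """Hypothetical helper: wrap already-rendered page data in a PageAnalysis."""
    # b64encode returns bytes; decode to get the str that image_b64 expects.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return PageAnalysis(
        page_number=page_number,
        image_b64=image_b64,
        text_content=text,
        dimensions=(width, height),
    )


# Placeholder bytes stand in for a real rendered page image.
page = make_page_analysis(1, b"\x89PNG placeholder", "Hello", 800, 1100)

# Decoding the field recovers the original bytes.
decoded = base64.b64decode(page.image_b64)
```

The optional fields are left at their None defaults here, which matches a page that has been rendered but not yet analyzed.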

Return Value

Instantiation returns a PageAnalysis object with the specified attributes. The @dataclass decorator generates __init__, __repr__, and __eq__ automatically. Because the class is not declared with frozen=True, instances are mutable; treating them as immutable is a convention, not something the class enforces.
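One way to follow the immutable-by-convention approach is to build an updated copy with dataclasses.replace instead of assigning to attributes. This is a sketch of that pattern, not something the original module is known to do; the field values are made up.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


# A page that has been rendered but not yet analyzed (illustrative values).
original = PageAnalysis(
    page_number=1,
    image_b64="aGVsbG8=",
    text_content="Intro",
    dimensions=(800, 1100),
)

# replace() returns a new instance; the original object is left unchanged.
updated = replace(original, content_type="text", key_elements=["header"])
```

Note that replace makes a shallow copy, so unchanged fields in the new object refer to the same underlying values as the original.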

Class Interface

Methods

`__init__(page_number: int, image_b64: str, text_content: str, dimensions: Tuple[int, int], analysis_result: Optional[str] = None, content_type: Optional[str] = None, key_elements: Optional[List[str]] = None) -> None`

Purpose: Initializes a new PageAnalysis instance with the provided page data and optional analysis results. Auto-generated by the dataclass decorator.

Parameters:

page_number: The page number within the PDF document
image_b64: Base64-encoded image representation of the page
text_content: Extracted text from the page
dimensions: Tuple of (width, height) in pixels
analysis_result: Optional analysis results or summary
content_type: Optional content type classification
key_elements: Optional list of identified key elements

Returns: None (constructor)

`__repr__() -> str`

Purpose: Returns a string representation of the PageAnalysis object showing all field values. Auto-generated by the dataclass decorator.

Returns: String representation of the object in the format 'PageAnalysis(page_number=..., image_b64=..., ...)'

`__eq__(other: object) -> bool`

Purpose: Compares two PageAnalysis objects for equality based on all field values. Auto-generated by the dataclass decorator.

Parameters:

other: Another object to compare with

Returns: True if all fields are equal, False otherwise
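The generated comparison and representation methods can be illustrated with a short sketch. The field values are made up. The example also shows a side effect of the default dataclass settings: because __eq__ is generated and the class is not frozen, instances are unhashable.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


a = PageAnalysis(1, "aGVsbG8=", "Intro", (800, 1100))
b = PageAnalysis(1, "aGVsbG8=", "Intro", (800, 1100))
c = PageAnalysis(2, "aGVsbG8=", "Intro", (800, 1100))

same = (a == b)        # all fields match
different = (a == c)   # page_number differs
other_type = (a == "not a page")  # different class, so not equal

# repr lists every field in declaration order.
text = repr(a)

# Default dataclasses with eq=True and frozen=False set __hash__ to None.
try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False
```

Since instances cannot be used as dictionary keys or set members, a mapping keyed by page_number is a simple alternative when lookups are needed.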

Attributes

Name	Type	Description	Scope
`page_number`	int	The sequential number of the page within the PDF document	instance
`image_b64`	str	Base64-encoded string representation of the page rendered as an image	instance
`text_content`	str	The extracted text content from the PDF page	instance
`dimensions`	Tuple[int, int]	A tuple containing the width and height of the page image in pixels	instance
`analysis_result`	Optional[str]	Optional string containing analysis results, summaries, or structured output from page analysis	instance
`content_type`	Optional[str]	Optional classification of the page content type (e.g., 'table', 'text', 'image', 'mixed')	instance
`key_elements`	Optional[List[str]]	Optional list of identified key elements or features on the page (e.g., headers, tables, charts)	instance

Dependencies

dataclasses
typing

Required Imports

from dataclasses import dataclass
from typing import Tuple, Optional, List

Usage Example

from dataclasses import dataclass
from typing import Tuple, Optional, List

@dataclass
class PageAnalysis:
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None

# Create a PageAnalysis instance for a simple page
page_analysis = PageAnalysis(
    page_number=1,
    image_b64="iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
    text_content="This is the text content of page 1.",
    dimensions=(800, 1100)
)

# Create a PageAnalysis instance with optional fields
detailed_analysis = PageAnalysis(
    page_number=2,
    image_b64="base64_encoded_image_data_here",
    text_content="Page 2 contains a table and chart.",
    dimensions=(800, 1100),
    analysis_result="This page contains financial data with a summary table and trend chart.",
    content_type="mixed",
    key_elements=["table", "chart", "header"]
)

# Access attributes
print(f"Page {page_analysis.page_number}: {page_analysis.dimensions}")
print(f"Content type: {detailed_analysis.content_type}")
print(f"Key elements: {detailed_analysis.key_elements}")

Best Practices

Dataclasses are mutable by default, and this class is not declared with frozen=True. Treat instances as immutable by convention and avoid modifying attributes after instantiation unless necessary.
The image_b64 field can contain large amounts of data for high-resolution pages. Consider memory implications when storing many PageAnalysis objects.
Always provide the required fields (page_number, image_b64, text_content, dimensions) during instantiation. Optional fields can be set later if needed.
Use meaningful values for content_type to enable consistent categorization across your application (e.g., establish a fixed set of content types).
The key_elements list should contain standardized element names for consistency in downstream processing.
When serializing PageAnalysis objects (e.g., to JSON), be aware that the image_b64 field may significantly increase payload size.
Page numbers should typically start at 1 to match conventional PDF page numbering, though 0-indexing is also acceptable if used consistently.
The dimensions tuple should represent (width, height) in pixels, matching the resolution of the image_b64 data.
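The serialization and payload-size points above can be sketched as follows, using dataclasses.asdict and json from the standard library. The function name to_json and its include_image flag are hypothetical and not part of the original module.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class PageAnalysis:
    """Analysis result for a single PDF page"""
    page_number: int
    image_b64: str
    text_content: str
    dimensions: Tuple[int, int]
    analysis_result: Optional[str] = None
    content_type: Optional[str] = None
    key_elements: Optional[List[str]] = None


def to_json(page: PageAnalysis, include_image: bool = False) -> str:
    """Hypothetical helper: serialize a page, optionally dropping the image."""
    data = asdict(page)
    if not include_image:
        # The base64 image usually dominates payload size, so omit it
        # unless the caller explicitly asks for it.
        data.pop("image_b64")
    return json.dumps(data)


page = PageAnalysis(
    page_number=2,
    image_b64="A" * 10_000,  # stand-in for a large encoded image
    text_content="Page 2 contains a table and chart.",
    dimensions=(800, 1100),
    content_type="mixed",
    key_elements=["table", "chart"],
)

compact = to_json(page)
full = to_json(page, include_image=True)

# JSON has no tuple type, so dimensions comes back as a list.
loaded = json.loads(compact)
```

When the JSON is loaded back, dimensions must be converted to a tuple again before reconstructing a PageAnalysis if equality with the original object matters.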

Similar Components

AI-powered semantic similarity - components with related functionality:

class MultiPageAnalysisResult 81.5% similar

A dataclass that encapsulates the complete results of analyzing a multi-page document, including individual page analyses, document summary, combined response, and processing statistics.
From: /tf/active/vicechatdev/e-ink-llm/multi_page_llm_handler.py

class DocumentSummary 70.6% similar

A dataclass that encapsulates comprehensive analysis results of a document, including page-level and document-level summaries, topics, findings, and confidence metrics.
From: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py

class AnalysisResult 67.7% similar

A dataclass that encapsulates the results from statistical analysis operations, including metadata, file paths, and timestamps.
From: /tf/active/vicechatdev/vice_ai/smartstat_models.py

class AnalysisResult_v1 67.4% similar

A dataclass that encapsulates the results from statistical analysis operations, including metadata, file paths, and timestamps.
From: /tf/active/vicechatdev/vice_ai/models.py

class DataSection 66.9% similar

A dataclass representing a dedicated data analysis section that stores analysis results, plots, dataset information, and conclusions separately from text content.
From: /tf/active/vicechatdev/vice_ai/models.py
