class AnnotationInfo
A dataclass that stores comprehensive information about a detected annotation in a PDF document, including its type, visual properties, location, and associated text content.
/tf/active/vicechatdev/e-ink-llm/annotation_detector.py
18 - 26
simple
Purpose
This dataclass serves as a structured container for metadata about annotations detected in PDF documents. It captures visual characteristics (color, area, bounds), classification information (annotation type, confidence score), location data (page number, bounding box), and optional text content. It is typically used as a return type or data transfer object in PDF annotation detection and analysis workflows.
Source Code
class AnnotationInfo:
    """Information about a detected annotation"""
    annotation_type: str  # 'highlight', 'strikethrough', 'markup', 'underline', 'insertion'
    confidence: float  # Confidence score 0-1
    area: int  # Area in pixels
    color: Tuple[int, int, int]  # RGB color
    bounds: Tuple[int, int, int, int]  # x, y, width, height
    page_number: int  # Page where annotation was found
    text_content: Optional[str] = None  # Associated text if available
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
annotation_type: String identifier for the type of annotation detected. Expected values are: 'highlight', 'strikethrough', 'markup', 'underline', or 'insertion'. This categorizes the visual annotation style.
confidence: Float value between 0 and 1 representing the confidence score of the annotation detection. Higher values indicate greater certainty that the detected region is indeed an annotation of the specified type.
area: Integer representing the area of the annotation in pixels. The source does not state whether this is the bounding-box area (width * height) or a count of the pixels in the detected region. Useful for filtering or prioritizing annotations by size.
color: Tuple of three integers (R, G, B) representing the RGB color values of the annotation. Each value ranges from 0 to 255. Used to identify and categorize annotations by their visual appearance.
bounds: Tuple of four integers (x, y, width, height) defining the bounding box of the annotation. x and y are the top-left corner coordinates, width and height define the rectangle dimensions in pixels.
page_number: Integer indicating which page of the PDF document contains this annotation. The source does not specify whether numbering is 0-based or 1-based; follow the convention of the PDF library in use.
text_content: Optional string containing any text associated with or extracted from the annotation region. May be None if no text is available or if text extraction was not performed.
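The (x, y, width, height) layout of bounds often needs converting to corner coordinates before cropping or drawing. The following is a minimal sketch of that conversion; `bounds_to_corners` and `bounding_box_area` are hypothetical helpers for illustration and are not part of annotation_detector.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]  # x, y, width, height
    page_number: int
    text_content: Optional[str] = None


def bounds_to_corners(bounds: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """Hypothetical helper: convert (x, y, width, height) to (x0, y0, x1, y1)."""
    x, y, width, height = bounds
    return (x, y, x + width, y + height)


def bounding_box_area(bounds: Tuple[int, int, int, int]) -> int:
    """Hypothetical helper: area of the bounding rectangle in pixels."""
    _, _, width, height = bounds
    return width * height


highlight = AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
corners = bounds_to_corners(highlight.bounds)   # (100, 200, 400, 250)
box_area = bounding_box_area(highlight.bounds)  # 15000
```

Many imaging libraries expect the corner form, so keeping the conversion in one helper avoids mixing the two conventions.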
Return Value
Instantiation returns an AnnotationInfo object containing all the specified annotation metadata. As a dataclass, it automatically generates __init__, __repr__, __eq__, and other methods. Instances are mutable by default; they become immutable only if frozen=True is passed to the dataclass decorator.
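The difference between a default dataclass and a frozen one can be sketched as follows. `FrozenAnnotationInfo` is a hypothetical variant defined only for this illustration; the source does not show the decorator arguments actually used for AnnotationInfo.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


@dataclass(frozen=True)
class FrozenAnnotationInfo:
    # Hypothetical immutable variant with the same fields.
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


mutable = AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
mutable.confidence = 0.5  # allowed: plain @dataclass instances are mutable

frozen = FrozenAnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
try:
    frozen.confidence = 0.5
    assignment_rejected = False
except FrozenInstanceError:
    assignment_rejected = True  # frozen=True blocks attribute assignment

# dataclasses.replace builds a modified copy and works for both variants.
updated = replace(frozen, confidence=0.5)
```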
Class Interface
Methods
__init__(annotation_type: str, confidence: float, area: int, color: Tuple[int, int, int], bounds: Tuple[int, int, int, int], page_number: int, text_content: Optional[str] = None) -> None
Purpose: Initialize an AnnotationInfo instance with all required annotation metadata. Auto-generated by @dataclass decorator.
Parameters:
annotation_type: Type of annotation ('highlight', 'strikethrough', 'markup', 'underline', 'insertion')
confidence: Detection confidence score (0-1)
area: Annotation area in pixels
color: RGB color tuple (R, G, B)
bounds: Bounding box as (x, y, width, height)
page_number: Page number where annotation appears
text_content: Optional associated text content
Returns: None - initializes the instance
__repr__() -> str
Purpose: Return a string representation of the AnnotationInfo instance. Auto-generated by @dataclass decorator.
Returns: String representation showing all field values
__eq__(other: object) -> bool
Purpose: Compare two AnnotationInfo instances for equality based on all fields. Auto-generated by @dataclass decorator.
Parameters:
other: Another object to compare with
Returns: True if all fields are equal, False otherwise
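A short sketch of the generated methods in use, assuming the class is declared with a plain @dataclass decorator (the decorator line is not shown in the source excerpt). It also illustrates a side effect worth knowing: with the default eq=True and without frozen=True, the decorator sets __hash__ to None, so instances cannot be used as dict keys or set members.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


a = AnnotationInfo("underline", 0.8, 600, (0, 0, 255), (10, 20, 300, 2), 3)
b = AnnotationInfo("underline", 0.8, 600, (0, 0, 255), (10, 20, 300, 2), 3)

same_values = a == b    # field-by-field comparison via the generated __eq__
same_object = a is b    # identity is unaffected by __eq__
text = repr(a)          # lists every field as name=value

try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False    # eq=True without frozen=True disables hashing
```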
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| annotation_type | str | Type of annotation detected: 'highlight', 'strikethrough', 'markup', 'underline', or 'insertion' | instance |
| confidence | float | Confidence score of the detection, ranging from 0 to 1 | instance |
| area | int | Area of the annotation in pixels | instance |
| color | Tuple[int, int, int] | RGB color values of the annotation as a tuple (R, G, B) | instance |
| bounds | Tuple[int, int, int, int] | Bounding box coordinates and dimensions as (x, y, width, height) | instance |
| page_number | int | Page number where the annotation was found | instance |
| text_content | Optional[str] | Associated text content extracted from the annotation region, or None if not available | instance |
Dependencies
- dataclasses
- typing
Required Imports
from dataclasses import dataclass
from typing import Tuple, Optional
Usage Example
from dataclasses import dataclass
from typing import Tuple, Optional
@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None
# Create an annotation info object for a yellow highlight
annotation = AnnotationInfo(
    annotation_type='highlight',
    confidence=0.95,
    area=15000,
    color=(255, 255, 0),
    bounds=(100, 200, 300, 50),
    page_number=1,
    text_content='Important passage to remember'
)
# Access attributes
print(f"Type: {annotation.annotation_type}")
print(f"Confidence: {annotation.confidence}")
print(f"Location: Page {annotation.page_number}, bounds {annotation.bounds}")
print(f"Text: {annotation.text_content}")
# Create annotation without text content
strikethrough = AnnotationInfo(
    annotation_type='strikethrough',
    confidence=0.87,
    area=4000,  # consistent with the 200 x 20 bounding box
    color=(255, 0, 0),
    bounds=(150, 300, 200, 20),
    page_number=2
)
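Detection workflows usually produce many annotations per document, so a typical next step is filtering, ordering, and serializing them. The sketch below shows one way to do that with the standard library only; the 0.8 threshold and the sample values are arbitrary assumptions chosen for illustration.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


annotations: List[AnnotationInfo] = [
    AnnotationInfo("strikethrough", 0.87, 4000, (255, 0, 0), (150, 300, 200, 20), 2),
    AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1),
    AnnotationInfo("markup", 0.40, 900, (0, 0, 0), (20, 40, 30, 30), 1),
]

# Keep only detections at or above an (assumed) confidence threshold.
confident = [a for a in annotations if a.confidence >= 0.8]

# Order by page, then by the vertical position (y) of the bounding box.
reading_order = sorted(annotations, key=lambda a: (a.page_number, a.bounds[1]))

# Convert to a plain dict, e.g. as a first step toward JSON output.
record = asdict(reading_order[0])
```

`dataclasses.asdict` copies field values recursively, so the resulting dict can be modified without affecting the original instance.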
Best Practices
- This is a data container class with no custom methods (only those generated by @dataclass) - use it to store and pass annotation information between functions
- Ensure confidence values are always between 0 and 1 when creating instances
- RGB color values should be integers between 0 and 255
- Page numbers should be consistent with your PDF processing library's indexing (0-based or 1-based)
- The bounds tuple follows (x, y, width, height) format - ensure consistency when creating instances
- text_content is optional and defaults to None - only populate it when text extraction is performed
- Consider validating input values in a factory function or wrapper if strict constraints are needed
- Instances are mutable unless the class is declared with @dataclass(frozen=True) - prefer creating new instances over modifying shared ones if other code holds references to them
- Use type hints when working with collections of AnnotationInfo objects (e.g., List[AnnotationInfo])
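- The source does not state how area is computed - if you derive it from bounds, keep it equal to width * height; a pixel count of the detected region may be smaller than the bounding box but should never exceed it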
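The validation suggested above could be sketched as a small factory function. `make_annotation` is a hypothetical name, and the checks are assumptions drawn from the constraints listed in this section (confidence in 0-1, RGB in 0-255, non-negative dimensions, area no larger than the bounding box); the original module may validate differently or not at all.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


def make_annotation(
    annotation_type: str,
    confidence: float,
    area: int,
    color: Tuple[int, int, int],
    bounds: Tuple[int, int, int, int],
    page_number: int,
    text_content: Optional[str] = None,
) -> AnnotationInfo:
    """Hypothetical factory: validate values before building an AnnotationInfo."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if len(color) != 3 or any(not 0 <= channel <= 255 for channel in color):
        raise ValueError(f"color must be three integers in 0-255, got {color}")
    _, _, width, height = bounds
    if width < 0 or height < 0:
        raise ValueError(f"bounds width and height must be non-negative, got {bounds}")
    if not 0 <= area <= width * height:
        raise ValueError(f"area {area} does not fit inside bounds {bounds}")
    return AnnotationInfo(
        annotation_type, confidence, area, color, bounds, page_number, text_content
    )


valid = make_annotation("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)

error_message = None
try:
    make_annotation("highlight", 1.5, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
except ValueError as exc:
    error_message = str(exc)  # the out-of-range confidence is rejected
```

Keeping validation in a factory leaves the dataclass itself a plain container; a `__post_init__` method on the class would be an alternative if every construction path should be checked.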
Similar Components
AI-powered semantic similarity - components with related functionality:
- class AnnotationResult 76.7% similar
- class PageAnalysis 65.8% similar
- class SessionInfo 63.4% similar
- class AnnotationDetector 62.9% similar
- class TableInfo 60.3% similar