🔍 Code Extractor

class Document

Maturity: 51

A dataclass representing a document with hierarchical structure, versioning, metadata, and collaboration features.

File: /tf/active/vicechatdev/vice_ai/models.py
Lines: 532 - 601
Complexity: moderate

Purpose

The Document class serves as a container for structured documents with sections, version fields, metadata, and a list of users the document is shared with. It provides to_dict() and from_dict() methods for persistence and fills in created_at and updated_at at construction when they are omitted; it does not refresh updated_at or increment version_number on later changes. It is intended as the primary data model for document management code that needs ordered sections, version tracking, and multi-user sharing.

Source Code

@dataclass  # required for the generated __init__ and the __post_init__ hook
class Document:
    """Document as a container with hierarchical structure"""
    id: str
    owner: str
    title: str
    description: str = ""
    
    # Document structure (ordered list of section references)
    sections: List[DocumentSection] = None
    
    # Versioning
    current_version_id: str = None
    version_number: int = 1
    
    # Document-level metadata
    created_at: datetime = None
    updated_at: datetime = None
    tags: List[str] = None
    metadata: Dict[str, Any] = None
    
    # Collaboration
    shared_with: List[str] = None  # List of user emails
    
    def __post_init__(self):
        if self.created_at is None:
            self.created_at = datetime.now()
        if self.updated_at is None:
            self.updated_at = datetime.now()
        if self.sections is None:
            self.sections = []
        if self.tags is None:
            self.tags = []
        if self.metadata is None:
            self.metadata = {}
        if self.shared_with is None:
            self.shared_with = []
    
    def to_dict(self) -> Dict:
        return {
            'id': self.id,
            'owner': self.owner,
            'title': self.title,
            'description': self.description,
            'sections': [section.to_dict() for section in self.sections],
            'sections_count': len(self.sections),
            'current_version_id': self.current_version_id,
            'version_number': self.version_number,
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat(),
            'tags': self.tags,
            'metadata': self.metadata,
            'shared_with': self.shared_with
        }
    
    @classmethod
    def from_dict(cls, data: Dict) -> 'Document':
        return cls(
            id=data['id'],
            owner=data['owner'],
            title=data['title'],
            description=data.get('description', ''),
            sections=[DocumentSection.from_dict(s) for s in data.get('sections', [])],
            current_version_id=data.get('current_version_id'),
            version_number=data.get('version_number', 1),
            created_at=datetime.fromisoformat(data['created_at']),
            updated_at=datetime.fromisoformat(data['updated_at']),
            tags=data.get('tags', []),
            metadata=data.get('metadata', {}),
            shared_with=data.get('shared_with', [])
        )

Parameters

Name Type Default Kind
bases - -

Parameter Details

id: Unique identifier for the document (string). Should be unique across all documents in the system.

owner: Email or identifier of the document owner (string). Represents the primary user who created or owns the document.

title: The title/name of the document (string). Required field for document identification.

description: Optional detailed description of the document (string). Defaults to empty string if not provided.

sections: Ordered list of DocumentSection objects representing the hierarchical structure. Defaults to empty list if not provided.

current_version_id: Identifier of the current version (string or None). Used for version control tracking.

version_number: Integer version number of the document. Defaults to 1 for new documents.

created_at: Timestamp when document was created (datetime). Auto-set to current time if not provided.

updated_at: Timestamp of last update (datetime). Auto-set to current time if not provided.

tags: List of string tags for categorization and search. Defaults to empty list if not provided.

metadata: Dictionary for storing arbitrary key-value metadata. Defaults to empty dict if not provided.

shared_with: List of user emails who have access to this document. Defaults to empty list if not provided.

Return Value

Instantiation returns a Document object with all attributes initialized. The to_dict() method returns a dictionary representation with all fields serialized (datetime objects converted to ISO format strings, sections converted to dicts). The from_dict() class method returns a new Document instance reconstructed from a dictionary.

Class Interface

Methods

__post_init__(self) -> None

Purpose: Initializes default values for optional attributes after dataclass initialization. Automatically called after __init__.

Returns: None. Modifies instance attributes in-place.
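
The None-then-replace pattern used by __post_init__ can also be expressed with dataclasses.field(default_factory=...), the standard-library mechanism for per-instance mutable defaults. The following is a minimal sketch under that assumption; DocSketch is a hypothetical, trimmed stand-in and not the real Document class.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class DocSketch:
    """Hypothetical, trimmed stand-in for Document (not the real class)."""
    id: str
    owner: str
    title: str
    # default_factory runs once per instance, so lists and dicts are never shared
    tags: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)


a = DocSketch(id="a", owner="user@example.com", title="A")
b = DocSketch(id="b", owner="user@example.com", title="B")
a.tags.append("draft")
print(a.tags, b.tags)  # ['draft'] []
```

One behavioral difference is worth noting: with default_factory, an explicit tags=None stays None, whereas the __post_init__ approach in Document replaces an explicit None with an empty list.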

to_dict(self) -> Dict

Purpose: Serializes the Document instance to a dictionary representation suitable for JSON serialization or database storage.

Returns: Dictionary containing all document fields with datetime objects converted to ISO format strings, sections converted to dictionaries, and includes a computed 'sections_count' field.
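
As an illustration of what "suitable for JSON serialization" implies, the sketch below runs json.dumps over a hand-written dictionary shaped like the to_dict() output. The sample values are assumptions made for the example. Because to_dict() passes metadata through unchanged, a non-JSON value stored there only fails at dump time.

```python
import json
from datetime import datetime

# Hand-written sample shaped like the to_dict() output described above;
# the values are illustrative, not taken from a real document.
doc_dict = {
    "id": "doc-123",
    "owner": "user@example.com",
    "title": "My Document",
    "description": "",
    "sections": [],
    "sections_count": 0,
    "current_version_id": None,
    "version_number": 1,
    "created_at": "2024-01-02T03:04:05",
    "updated_at": "2024-01-02T03:04:05",
    "tags": ["draft"],
    "metadata": {"category": "technical"},
    "shared_with": [],
}

# Timestamps are already ISO strings, so the dict survives a JSON round trip.
round_trip_ok = json.loads(json.dumps(doc_dict)) == doc_dict
print(round_trip_ok)  # True

# metadata is passed through unchanged, so a datetime stored there
# is rejected by json.dumps() with a TypeError.
doc_dict["metadata"]["reviewed_at"] = datetime(2024, 1, 2)
try:
    json.dumps(doc_dict)
    dump_failed = False
except TypeError:
    dump_failed = True
print(dump_failed)  # True
```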

from_dict(cls, data: Dict) -> Document

Purpose: Class method that deserializes a dictionary into a Document instance. Handles type conversions including ISO datetime strings and nested DocumentSection objects.

Parameters:

  • data: Dictionary containing document data with keys matching Document attributes. Must include 'id', 'owner', 'title', 'created_at', and 'updated_at'. Other fields are optional with defaults.

Returns: New Document instance reconstructed from the dictionary data with all attributes properly typed and initialized.
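
Since from_dict() indexes the five required keys directly, a missing key surfaces as a KeyError. A caller that wants a clearer error could check the payload first. The helper below is a hypothetical sketch (missing_document_keys and REQUIRED_KEYS are not part of models.py); it also shows that an isoformat() string parses back with datetime.fromisoformat, the call from_dict() relies on.

```python
from datetime import datetime
from typing import Any, Dict, List

# Keys that from_dict() reads with data[...] rather than data.get(...)
REQUIRED_KEYS = ("id", "owner", "title", "created_at", "updated_at")


def missing_document_keys(data: Dict[str, Any]) -> List[str]:
    """Return the required keys absent from data (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in data]


payload = {"id": "doc-123", "owner": "user@example.com", "title": "My Document"}
missing_before = missing_document_keys(payload)
print(missing_before)  # ['created_at', 'updated_at']

# Fill in the timestamps as ISO strings, the format to_dict() emits
stamp = datetime(2024, 1, 2, 3, 4, 5).isoformat()
payload["created_at"] = stamp
payload["updated_at"] = stamp
missing_after = missing_document_keys(payload)
print(missing_after)  # []

# The stored string parses back to the original datetime
parsed = datetime.fromisoformat(payload["created_at"])
print(parsed)  # 2024-01-02 03:04:05
```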

Attributes

Name | Type | Description | Scope
id | str | Unique identifier for the document | instance
owner | str | Email or identifier of the document owner | instance
title | str | Title/name of the document | instance
description | str | Optional detailed description of the document, defaults to empty string | instance
sections | List[DocumentSection] | Ordered list of DocumentSection objects representing the document's hierarchical structure | instance
current_version_id | str | Identifier of the current version, used for version control tracking | instance
version_number | int | Integer version number, defaults to 1 for new documents | instance
created_at | datetime | Timestamp when the document was created, auto-initialized to current time | instance
updated_at | datetime | Timestamp of the last update, auto-initialized to current time | instance
tags | List[str] | List of string tags for categorization and search functionality | instance
metadata | Dict[str, Any] | Dictionary for storing arbitrary key-value metadata pairs | instance
shared_with | List[str] | List of user emails who have access to this document for collaboration | instance

Dependencies

  • datetime
  • typing
  • dataclasses

Required Imports

from datetime import datetime
from typing import List, Dict, Any
from dataclasses import dataclass

Usage Example

from datetime import datetime
from typing import List, Dict, Any
from dataclasses import dataclass

# Assumes Document and DocumentSection are already defined or imported from models.py
doc = Document(
    id='doc-123',
    owner='user@example.com',
    title='My Document',
    description='A sample document',
    tags=['important', 'draft']
)

# Access attributes
print(doc.title)  # 'My Document'
print(doc.version_number)  # 1
print(doc.created_at)  # Auto-generated timestamp

# Add metadata
doc.metadata['category'] = 'technical'
doc.tags.append('reviewed')

# Share with users
doc.shared_with.append('colleague@example.com')

# Serialize to dictionary
doc_dict = doc.to_dict()
print(doc_dict['sections_count'])  # 0

# Deserialize from dictionary
restored_doc = Document.from_dict(doc_dict)
print(restored_doc.title)  # 'My Document'

Best Practices

  • Always provide unique id values to avoid conflicts in document storage systems
  • Leave optional list, dict, and timestamp fields unset (or None) and let __post_init__ fill them in; there is no need to pass empty lists or dicts explicitly
  • Call to_dict() before persisting to databases or JSON files for proper serialization
  • Use from_dict() class method for deserialization rather than manual construction
  • updated_at is set only at construction and is never refreshed by the class, so assign a new timestamp yourself whenever you modify the document
  • Ensure DocumentSection class is properly defined with matching to_dict/from_dict methods
  • The sections list maintains order - use list operations to manage section sequence
  • Use metadata dictionary for extensible custom properties without modifying the class
  • Increment version_number when creating new versions of the document
  • The class avoids shared mutable defaults by declaring list and dict fields with a None default and replacing None with a fresh list or dict in __post_init__
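
The two manual bookkeeping steps in the list above (refreshing updated_at and incrementing version_number) could be wrapped in a small helper. This is a minimal sketch, not part of models.py: record_change is a hypothetical function, and DocSketch is a hypothetical stand-in that carries only the fields the helper touches.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocSketch:
    """Hypothetical stand-in with only the fields this sketch needs."""
    title: str
    version_number: int = 1
    updated_at: datetime = None

    def __post_init__(self):
        # Same None-then-replace pattern the Document class uses
        if self.updated_at is None:
            self.updated_at = datetime.now()


def record_change(doc, new_version: bool = False) -> None:
    """Refresh updated_at and optionally bump version_number (hypothetical helper)."""
    doc.updated_at = datetime.now()
    if new_version:
        doc.version_number += 1


doc = DocSketch(title="My Document", updated_at=datetime(2020, 1, 1))
doc.title = "My Document (revised)"
record_change(doc, new_version=True)
print(doc.version_number)  # 2
```

Calling the helper after every mutation keeps the bookkeeping in one place; the real Document accepts the same attribute assignments because it is a plain, non-frozen dataclass.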

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class DocumentSection 78.8% similar

    A class representing a section within a complex document, supporting hierarchical structure with headers, text content, and references.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • class DocumentVersion 78.8% similar

    A dataclass that represents a versioned snapshot of a document, capturing its structure, metadata, and change history at a specific point in time.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class ComplexDocument 77.3% similar

    A class representing a complex document with multiple sections, supporting section management, references, metadata, and serialization capabilities.

    From: /tf/active/vicechatdev/vice_ai/complex_app.py
  • class DocumentSection_v1 77.0% similar

    A dataclass representing a reference to a section (TextSection or DataSection) within a document structure, supporting hierarchical organization and section type differentiation.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class DocumentVersion_v1 67.6% similar

    Model representing a specific version of a controlled document.

    From: /tf/active/vicechatdev/CDocs/models/document.py