class Document_v1

Maturity: 42

The Document class represents a reMarkable document file. It extends the Item class with document-specific operations: content extraction, uploading, and rendering with annotations.

File: /tf/active/vicechatdev/rmcl/items.py
Lines: 214 - 281
Complexity: complex

Purpose

The Document class manages reMarkable document files (PDF, EPUB, notes). It handles document content extraction from ZIP archives, uploading new documents with proper metadata structure, and rendering annotated versions using the rmrl library. It caches annotated document sizes for performance optimization and provides both synchronous and asynchronous interfaces for all operations.

Source Code

class Document(Item):

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self._annotated_size = datacache.get_property(self.id, self.version, 'annotated_size')

    @add_sync
    async def contents(self):
        if await self.type() in (FileType.notes, FileType.unknown):
            return await self.raw()

        zf = zipfile.ZipFile(await self.raw(), 'r')
        for f in zf.filelist:
            if f.filename.endswith(str(await self.type())):
                return zf.open(f)
        return io.BytesIO(b'Unable to load file contents')

    @add_sync
    async def upload(self, new_contents, type_):
        if type_ not in (FileType.pdf, FileType.epub):
            raise TypeError(f"Cannot upload file of type {type_}")

        content = {
            'extraMetadata': {},
            'fileType': str(type_),
            'lastOpenedPage': 0,
            'lineHeight': -1,
            'margins': 100,
            'pageCount': 0,
            'textScale': 1,
            'transform': {},
        }

        f = io.BytesIO()
        with zipfile.ZipFile(f, 'w', zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(f'{self.id}.pagedata','')
            zf.writestr(f'{self.id}.content', json.dumps(content))
            zf.writestr(f'{self.id}.{type_}', new_contents.read())
        f.seek(0)

        return await self.upload_raw(f)

    @add_sync
    async def annotated(self, **render_kw):
        if render is None:
            raise ImportError("rmrl must be installed to get annotated documents")

        if 'progress_cb' not in render_kw:
            render_kw['progress_cb'] = (
                lambda pct: log.info(f"Rendering {self}: {pct:0.1f}%"))

        zf = zipfile.ZipFile(await self.raw(), 'r')
        # run_sync doesn't accept keyword arguments to be passed to the sync
        # function, so we'll assemble the function to call here.
        render_func = lambda: render(sources.ZipSource(zf), **render_kw)
        contents = (await trio.to_thread.run_sync(render_func))
        # Seek to end to get the length of this file.
        contents.seek(0, 2)
        self._annotated_size = contents.tell()
        datacache.set_property(self.id, self.version, 'annotated_size', self._annotated_size)
        contents.seek(0)
        return contents

    @add_sync
    async def annotated_size(self):
        if self._annotated_size is not None:
            return self._annotated_size
        return await self.size()
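
The comment in annotated() about keyword arguments describes a general pattern: when a runner only forwards positional arguments, the keyword arguments have to be bound before the call. The sketch below shows two equivalent ways to do that. `render_stub`, `run_without_kwargs`, and the `expand_pages` option are hypothetical stand-ins invented for this illustration; they are not rmrl or trio APIs.

```python
import functools


def render_stub(source, *, progress_cb=None, expand_pages=True):
    """Hypothetical stand-in for a renderer that takes keyword options."""
    if progress_cb is not None:
        progress_cb(100.0)
    return f"rendered {source} (expand_pages={expand_pages})"


def run_without_kwargs(func):
    """Hypothetical runner that can only call a zero-argument function."""
    return func()


seen = []
render_kw = {"progress_cb": seen.append, "expand_pages": False}

# Option 1: a lambda that closes over the arguments, as the source code does.
via_lambda = run_without_kwargs(lambda: render_stub("doc.zip", **render_kw))

# Option 2: functools.partial binds the same arguments up front.
via_partial = run_without_kwargs(
    functools.partial(render_stub, "doc.zip", **render_kw)
)
```

Both forms produce a zero-argument callable, so either could be handed to a runner such as trio.to_thread.run_sync.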

Parameters

Name | Type | Default | Kind
bases | Item | - |

Parameter Details

*args: Variable positional arguments passed to the parent Item class constructor for basic item initialization

**kw: Variable keyword arguments passed to the parent Item class constructor, typically including item metadata like id, version, and other Item properties

Return Value

Instantiation returns a Document object that represents a reMarkable document. Key method returns:

  • contents() returns a file-like object with the document contents
  • upload() returns the result of upload_raw()
  • annotated() returns a file-like object containing the rendered PDF with annotations
  • annotated_size() returns an integer size in bytes

Class Interface

Methods

__init__(self, *args, **kw)

Purpose: Initialize a Document instance, loading cached annotated size from datacache

Parameters:

  • *args: Positional arguments passed to parent Item class
  • **kw: Keyword arguments passed to parent Item class

Returns: None (constructor)

async contents(self) -> io.BytesIO

Purpose: Extract and return the document contents from the ZIP archive, handling different file types

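Returns: File-like object containing the document contents (the raw stream for notes/unknown types, the matching archive member for PDF/EPUB). If no archive member matches the file type, a BytesIO containing the message b'Unable to load file contents' is returned instead of an exception being raised.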
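
The lookup that contents() performs can be sketched without the library, assuming only the standard zipfile module. `build_demo_archive` and `extract_by_suffix` are hypothetical helpers written for this illustration; the fallback message is the one from the source code.

```python
import io
import zipfile


def build_demo_archive():
    """Create a small in-memory archive shaped like a document bundle."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("abc.content", "{}")
        zf.writestr("abc.pdf", b"%PDF-demo")
    buf.seek(0)
    return buf


def extract_by_suffix(archive, suffix):
    """Open the first member whose name ends with suffix.

    Falls back to a BytesIO placeholder, mirroring the source code,
    rather than raising when nothing matches.
    """
    zf = zipfile.ZipFile(archive, "r")
    for info in zf.infolist():
        if info.filename.endswith(suffix):
            return zf.open(info)
    return io.BytesIO(b"Unable to load file contents")


pdf_bytes = extract_by_suffix(build_demo_archive(), "pdf").read()
missing = extract_by_suffix(build_demo_archive(), "epub").read()
```

Because the fallback is ordinary bytes, a caller that needs to detect a failed extraction has to compare the result against the placeholder message; no exception signals the miss.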

contents_sync(self) -> io.BytesIO

Purpose: Synchronous version of contents() method

Returns: File-like object containing the document contents

async upload(self, new_contents, type_: FileType)

Purpose: Upload a new document file (PDF or EPUB) with proper reMarkable metadata structure

Parameters:

  • new_contents: File-like object with read() method containing the document data to upload
  • type_: FileType enum value, must be FileType.pdf or FileType.epub

Returns: Result from upload_raw() method call
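
The archive that upload() assembles before calling upload_raw() can be reproduced in isolation. This is a minimal sketch that copies the metadata dictionary from the source code; `build_upload_archive` is a hypothetical helper, and plain strings stand in for the FileType enum members.

```python
import io
import json
import zipfile

ALLOWED_TYPES = ("pdf", "epub")  # plain strings standing in for FileType members


def build_upload_archive(doc_id, new_contents, type_):
    """Build the in-memory ZIP bundle: .pagedata, .content, and the payload."""
    if type_ not in ALLOWED_TYPES:
        raise TypeError(f"Cannot upload file of type {type_}")

    content = {
        "extraMetadata": {},
        "fileType": type_,
        "lastOpenedPage": 0,
        "lineHeight": -1,
        "margins": 100,
        "pageCount": 0,
        "textScale": 1,
        "transform": {},
    }

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{doc_id}.pagedata", "")
        zf.writestr(f"{doc_id}.content", json.dumps(content))
        zf.writestr(f"{doc_id}.{type_}", new_contents.read())
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


archive = build_upload_archive("demo-id", io.BytesIO(b"%PDF-demo"), "pdf")
names = sorted(zipfile.ZipFile(archive).namelist())
```

Note that new_contents is read from its current position, so a stream that has already been consumed would yield an empty payload.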

upload_sync(self, new_contents, type_: FileType)

Purpose: Synchronous version of upload() method

Parameters:

  • new_contents: File-like object with read() method containing the document data
  • type_: FileType enum value (PDF or EPUB)

Returns: Result from upload_raw() method call

async annotated(self, **render_kw) -> io.BytesIO

Purpose: Render the document with annotations using rmrl library, caching the resulting size

Parameters:

  • **render_kw: Keyword arguments passed to rmrl.render(), such as progress_cb for progress callbacks

Returns: File-like object (the result of rmrl's render) containing the rendered PDF with annotations, rewound to position 0
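
The size measurement inside annotated() relies on a seek-to-end, tell, rewind sequence. A minimal sketch of that sequence on an in-memory stream follows; `measure_and_rewind` is a hypothetical helper name, not part of the library.

```python
import io


def measure_and_rewind(stream):
    """Return the stream length in bytes and leave it positioned at 0."""
    stream.seek(0, io.SEEK_END)  # same as seek(0, 2) in the source code
    size = stream.tell()
    stream.seek(0)
    return size


rendered = io.BytesIO(b"x" * 1234)
size = measure_and_rewind(rendered)
```

This works for any seekable stream and avoids reading the whole payload into memory just to learn its length.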

annotated_sync(self, **render_kw) -> io.BytesIO

Purpose: Synchronous version of annotated() method

Parameters:

  • **render_kw: Keyword arguments for rendering

Returns: File-like object containing the rendered PDF with annotations

async annotated_size(self) -> int

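Purpose: Get the size of the annotated document. Returns the cached value when one exists; otherwise falls back to the item's regular size()

Returns: Integer size in bytes. Until annotated() has run and cached a value for this id and version, this is the regular size and may differ from the size of the rendered output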
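
The cache-then-fallback behaviour can be sketched with an in-memory stand-in for datacache. `PropertyCache` is hypothetical: it copies the get_property/set_property call shapes seen in the source code and assumes that a missing entry yields None, which the `is not None` check in annotated_size() suggests but the source does not show.

```python
class PropertyCache:
    """Hypothetical in-memory stand-in for datacache."""

    def __init__(self):
        self._store = {}

    def get_property(self, id_, version, name):
        # Assumption: a missing entry is reported as None.
        return self._store.get((id_, version, name))

    def set_property(self, id_, version, name, value):
        self._store[(id_, version, name)] = value


def annotated_size(cache, id_, version, plain_size):
    """Prefer the cached annotated size; otherwise use the plain size."""
    cached = cache.get_property(id_, version, "annotated_size")
    return cached if cached is not None else plain_size


cache = PropertyCache()
before_render = annotated_size(cache, "doc-1", 1, plain_size=5000)

# Rendering would record the measured size under (id, version).
cache.set_property("doc-1", 1, "annotated_size", 7300)
after_render = annotated_size(cache, "doc-1", 1, plain_size=5000)

# A new version has no cached entry, so the fallback applies again.
next_version = annotated_size(cache, "doc-1", 2, plain_size=5200)
```

Keying on the version means a cached size is never reused for a document revision it was not measured on.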

annotated_size_sync(self) -> int

Purpose: Synchronous version of annotated_size() method

Returns: Integer representing the size in bytes

Attributes

Name | Type | Description | Scope
_annotated_size | int or None | Cached size in bytes of the annotated document, loaded from datacache on initialization and updated when annotated() is called | instance
id | str | Unique identifier for the document, inherited from Item class | instance
version | int | Version number of the document, inherited from Item class | instance

Dependencies

  • trio
  • rmrl

Required Imports

import functools
import io
import json
import logging
import uuid
import zipfile
import trio
from const import ROOT_ID
from const import TRASH_ID
from const import FileType
from datacache import datacache
from exceptions import DocumentNotFound
from exceptions import VirtualItemError
from sync import add_sync
from utils import now
from utils import parse_datetime

Conditional/Optional Imports

These imports are only needed under specific conditions:

from rmrl import render

Condition (optional): only needed when calling the annotated() method to render documents with annotations

from rmrl import sources

Condition (optional): only needed when calling the annotated() method to provide the ZIP source for rendering
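
The `render is None` check in annotated() suggests that the module guards its rmrl import and leaves the name set to None when the package is absent. That guard is not shown in the listing, so the following is only a sketch of the general pattern; `optional_import` and `require` are hypothetical helpers.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, feature):
    """Raise a descriptive ImportError when an optional module is missing."""
    if module is None:
        raise ImportError(
            f"{feature} is unavailable because an optional dependency is missing"
        )
    return module


present = optional_import("json")
absent = optional_import("no_such_module_for_this_sketch")
```

Deferring the error to the point of use lets the rest of the class work without the optional dependency installed.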

Usage Example

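# Instantiate a Document (typically done internally by the library;
# the constructor arguments shown here are illustrative)
doc = Document(id='some-uuid', version=1, parent='parent-id')

# Get document contents (async; `await` must run inside an async function)
contents = await doc.contents()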

# Or use synchronous version
contents = doc.contents_sync()

# Upload a new PDF document
with open('new_document.pdf', 'rb') as f:
    await doc.upload(f, FileType.pdf)

# Get annotated version with custom rendering options
annotated_pdf = await doc.annotated(progress_cb=lambda pct: print(f'{pct}%'))

# Get size of annotated document
size = await doc.annotated_size()

# Synchronous versions are also available
annotated_pdf = doc.annotated_sync()
size = doc.annotated_size_sync()

Best Practices

  • upload() accepts only FileType.pdf and FileType.epub and raises TypeError for anything else; check the type before calling it
  • The annotated() method requires rmrl library to be installed; handle ImportError appropriately
  • Use async methods (await) in async contexts, or use the _sync versions in synchronous code
  • The class caches annotated_size in datacache for performance; this persists across instances
  • When uploading documents, provide a file-like object with a read() method
  • The annotated() method can be resource-intensive; consider providing a progress callback
  • Document contents are stored in ZIP format internally; the class handles extraction automatically
  • The _annotated_size attribute is read from datacache once, in __init__; it stays None until a render has cached a size for this id and version
  • Methods decorated with @add_sync automatically generate synchronous versions with _sync suffix
  • Always seek file-like objects to position 0 before reading if you need to read multiple times
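
One way a decorator could generate the _sync twins mentioned above is sketched here. This is not the library's add_sync: the implementation is not shown in this listing, and the sketch swaps in asyncio for trio so that it runs with the standard library alone. `DemoDocument` is a hypothetical class.

```python
import asyncio
import functools


class add_sync:
    """Hypothetical stand-in for add_sync, built on asyncio instead of trio."""

    def __init__(self, afunc):
        self.afunc = afunc

    def __set_name__(self, owner, name):
        afunc = self.afunc

        @functools.wraps(afunc)
        def sync_version(*args, **kwargs):
            # Drive the coroutine to completion on a fresh event loop.
            return asyncio.run(afunc(*args, **kwargs))

        sync_version.__name__ = f"{name}_sync"
        # Restore the plain coroutine function under its own name and
        # register the blocking twin next to it.
        setattr(owner, name, afunc)
        setattr(owner, f"{name}_sync", sync_version)


class DemoDocument:
    @add_sync
    async def size(self):
        await asyncio.sleep(0)
        return 42


doc = DemoDocument()
sync_result = doc.size_sync()
async_result = asyncio.run(doc.size())
```

A blocking wrapper of this kind cannot be called from code that is already running inside an event loop, which is why the async and _sync variants are meant for different calling contexts.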

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class Item 69.2% similar

    Base class representing an item (document or folder) in a reMarkable cloud storage system, providing methods for metadata management, file operations, and synchronization.

    From: /tf/active/vicechatdev/rmcl/items.py
  • class Document 65.2% similar

    A dataclass representing a document with hierarchical structure, versioning, metadata, and collaboration features.

    From: /tf/active/vicechatdev/vice_ai/models.py
  • class RemarkableCloudManager 64.7% similar

    Unified manager for reMarkable Cloud operations that uses REST API as primary method with rmcl library as fallback, handling authentication, file operations, and folder management.

    From: /tf/active/vicechatdev/e-ink-llm/remarkable_cloud.py
  • class RemarkableNode 64.5% similar

    A dataclass representing a node (file or folder) in the reMarkable cloud storage system, containing metadata, hierarchy information, and component hashes for documents.

    From: /tf/active/vicechatdev/e-ink-llm/cloudtest/discovery.py
  • class RemarkableCloudWatcher 63.7% similar

    Monitors the reMarkable Cloud 'gpt_out' folder for new documents, automatically downloads them, and converts .rm (reMarkable native) files to PDF format.

    From: /tf/active/vicechatdev/e-ink-llm/mixed_cloud_processor.py