Document_v1 - Code Extractor

class Document_v1

Maturity: 42

Document class represents a reMarkable document file, extending the Item class to provide document-specific operations like content extraction, uploading, and rendering with annotations.

File:
/tf/active/vicechatdev/rmcl/items.py

Lines:
214 - 281

Complexity:
complex

Purpose

The Document class manages reMarkable document files (PDF, EPUB, notes). It extracts document content from ZIP archives, uploads new PDF or EPUB documents with the expected metadata structure, and renders annotated versions using the rmrl library. After each render it stores the annotated document's size in datacache, keyed by document id and version, so the size can be reported later without rendering again. Each operation is available through an asynchronous method and a synchronous counterpart with a _sync suffix.

Source Code

class Document(Item):

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self._annotated_size = datacache.get_property(self.id, self.version, 'annotated_size')

    @add_sync
    async def contents(self):
        if await self.type() in (FileType.notes, FileType.unknown):
            return await self.raw()

        zf = zipfile.ZipFile(await self.raw(), 'r')
        for f in zf.filelist:
            if f.filename.endswith(str(await self.type())):
                return zf.open(f)
        return io.BytesIO(b'Unable to load file contents')

    @add_sync
    async def upload(self, new_contents, type_):
        if type_ not in (FileType.pdf, FileType.epub):
            raise TypeError(f"Cannot upload file of type {type_}")

        content = {
            'extraMetadata': {},
            'fileType': str(type_),
            'lastOpenedPage': 0,
            'lineHeight': -1,
            'margins': 100,
            'pageCount': 0,
            'textScale': 1,
            'transform': {},
        }

        f = io.BytesIO()
        with zipfile.ZipFile(f, 'w', zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(f'{self.id}.pagedata','')
            zf.writestr(f'{self.id}.content', json.dumps(content))
            zf.writestr(f'{self.id}.{type_}', new_contents.read())
        f.seek(0)

        return await self.upload_raw(f)

    @add_sync
    async def annotated(self, **render_kw):
        if render is None:
            raise ImportError("rmrl must be installed to get annotated documents")

        if 'progress_cb' not in render_kw:
            render_kw['progress_cb'] = (
                lambda pct: log.info(f"Rendering {self}: {pct:0.1f}%"))

        zf = zipfile.ZipFile(await self.raw(), 'r')
        # run_sync doesn't accept keyword arguments to be passed to the sync
        # function, so we'll assemble the function to call out here.
        render_func = lambda: render(sources.ZipSource(zf), **render_kw)
        contents = (await trio.to_thread.run_sync(render_func))
        # Seek to end to get the length of this file.
        contents.seek(0, 2)
        self._annotated_size = contents.tell()
        datacache.set_property(self.id, self.version, 'annotated_size', self._annotated_size)
        contents.seek(0)
        return contents

    @add_sync
    async def annotated_size(self):
        if self._annotated_size is not None:
            return self._annotated_size
        return await self.size()

Parameters

Name	Type	Default	Kind
`bases`	Item	-	-

Parameter Details

*args: Variable positional arguments passed to the parent Item class constructor for basic item initialization

**kw: Variable keyword arguments passed to the parent Item class constructor, typically including item metadata like id, version, and other Item properties

Return Value

Instantiation returns a Document object that represents a reMarkable document. The methods return the following: contents() returns a file-like object with the document contents; upload() returns the result of upload_raw(); annotated() returns a BytesIO object containing the rendered PDF with annotations; annotated_size() returns the size in bytes as an integer, using the cached annotated size when one exists and the plain document size otherwise.

Class Interface

Methods

`__init__(self, *args, **kw)`

Purpose: Initialize a Document instance, loading cached annotated size from datacache

Parameters:

*args: Positional arguments passed to parent Item class
**kw: Keyword arguments passed to parent Item class

Returns: None (constructor)

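`async contents(self)`

Purpose: Return the document contents, extracting them from the ZIP archive when the document is a PDF or EPUB

Returns: File-like object containing the document contents. For notes and unknown types this is the raw stream from raw(). For PDF/EPUB it is the archive member whose filename ends with the file type, opened with ZipFile.open(). If no member matches, a BytesIO containing b'Unable to load file contents' is returned instead of raising an exception.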
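The suffix-matching lookup that contents() performs can be sketched in isolation. This is a minimal standalone illustration using only the standard library, not the library's own code: `extract_member` and the sample archive are hypothetical names introduced here.

```python
import io
import zipfile


def extract_member(archive, suffix):
    """Return the bytes of the first archive member whose name ends with suffix."""
    with zipfile.ZipFile(archive, "r") as zf:
        for info in zf.filelist:
            if info.filename.endswith(suffix):
                return zf.read(info)
    # Document.contents() returns a stream holding an error message on a miss;
    # this sketch signals a miss with None instead.
    return None


# Build a small in-memory archive shaped like a document bundle (contents are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("abc.content", "{}")
    zf.writestr("abc.pdf", b"%PDF-sample")
buf.seek(0)

pdf_bytes = extract_member(buf, "pdf")
```

Returning None on a miss is a design choice of the sketch. Callers of the real method should be aware that a miss yields a readable stream rather than an error, so the returned bytes may need to be checked.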

`contents_sync(self)`

Purpose: Synchronous version of contents() method

Returns: File-like object containing the document contents, as described for contents()

`async upload(self, new_contents, type_: FileType)`

Purpose: Upload a new document file (PDF or EPUB) with proper reMarkable metadata structure

Parameters:

new_contents: File-like object with read() method containing the document data to upload
type_: FileType enum value, must be FileType.pdf or FileType.epub; any other value raises TypeError

Returns: Result from upload_raw() method call
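The archive layout that upload() assembles can be reproduced with the standard library alone. The following is a rough sketch under stated assumptions: `build_upload_archive` is a hypothetical helper, the file type is passed as a plain string rather than a FileType value, and nothing is sent to any server.

```python
import io
import json
import zipfile


def build_upload_archive(doc_id, payload, file_type):
    """Pack payload bytes plus the two metadata entries into an in-memory ZIP."""
    content = {
        "extraMetadata": {},
        "fileType": file_type,
        "lastOpenedPage": 0,
        "lineHeight": -1,
        "margins": 100,
        "pageCount": 0,
        "textScale": 1,
        "transform": {},
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{doc_id}.pagedata", "")                  # empty page data entry
        zf.writestr(f"{doc_id}.content", json.dumps(content))  # JSON metadata entry
        zf.writestr(f"{doc_id}.{file_type}", payload)          # the document itself
    buf.seek(0)  # rewind so the caller can read the archive from the start
    return buf


archive = build_upload_archive("some-uuid", b"%PDF-sample", "pdf")
names = sorted(zipfile.ZipFile(archive).namelist())
```

All three entries share the document id as their base name, which is what lets contents() later find the payload by its file-type suffix.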

`upload_sync(self, new_contents, type_: FileType)`

Purpose: Synchronous version of upload() method

Parameters:

new_contents: File-like object with read() method containing the document data
type_: FileType enum value, must be FileType.pdf or FileType.epub

Returns: Result from upload_raw() method call

`async annotated(self, **render_kw) -> io.BytesIO`

Purpose: Render the document with annotations using rmrl library, caching the resulting size

Parameters:

**render_kw: Keyword arguments passed to rmrl.render(). If progress_cb is not supplied, a default callback is added that logs the rendering percentage

Returns: BytesIO object containing the rendered PDF with annotations, rewound to position 0. Raises ImportError if rmrl is not installed
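The size measurement that annotated() performs on the rendered stream (seek to the end, read the position, rewind) can be sketched on its own. `stream_size` is a hypothetical helper name, and the sample bytes stand in for a rendered PDF.

```python
import io
import os


def stream_size(stream):
    """Return the total length of a seekable stream and rewind it to the start."""
    stream.seek(0, os.SEEK_END)  # equivalent to seek(0, 2)
    size = stream.tell()
    stream.seek(0)  # leave the stream ready to be read from the beginning
    return size


rendered = io.BytesIO(b"annotated output")
size = stream_size(rendered)
```

Measuring by seeking avoids reading the whole stream into a second buffer just to count its bytes.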

`annotated_sync(self, **render_kw) -> io.BytesIO`

Purpose: Synchronous version of annotated() method

Parameters:

**render_kw: Keyword arguments for rendering

Returns: BytesIO object containing the rendered PDF with annotations

`async annotated_size(self) -> int`

Purpose: Get the size of the annotated document, using cached value if available or falling back to regular size

Returns: Integer representing the size in bytes of the annotated document
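The cache-then-fallback behaviour of annotated_size() can be illustrated with a small in-memory stand-in. This is a sketch under assumptions: `PropertyCache` is a hypothetical class that only mimics the get_property/set_property calls seen in the source, and `annotated_size_or_fallback` is an illustrative helper, not part of the library.

```python
class PropertyCache:
    """Hypothetical in-memory stand-in for a per-document property cache."""

    def __init__(self):
        self._store = {}

    def get_property(self, doc_id, version, name):
        # Returns None when nothing has been cached for this key.
        return self._store.get((doc_id, version, name))

    def set_property(self, doc_id, version, name, value):
        self._store[(doc_id, version, name)] = value


def annotated_size_or_fallback(cache, doc_id, version, plain_size):
    """Prefer the cached annotated size; fall back to the plain size."""
    cached = cache.get_property(doc_id, version, "annotated_size")
    return cached if cached is not None else plain_size


cache = PropertyCache()
before = annotated_size_or_fallback(cache, "some-uuid", 1, 1000)
cache.set_property("some-uuid", 1, "annotated_size", 1500)
after = annotated_size_or_fallback(cache, "some-uuid", 1, 1000)
newer = annotated_size_or_fallback(cache, "some-uuid", 2, 1100)
```

Because the key includes the version, a size cached for one version is not reused for a later version of the same document; in the sketch, version 2 falls back to its plain size.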

`annotated_size_sync(self) -> int`

Purpose: Synchronous version of annotated_size() method

Returns: Integer representing the size in bytes

Attributes

Name	Type	Description	Scope
`_annotated_size`	int or None	Cached size in bytes of the annotated document, loaded from datacache on initialization and updated when annotated() is called	instance
`id`	str	Unique identifier for the document, inherited from Item class	instance
`version`	int	Version number of the document, inherited from Item class	instance

Dependencies

trio
rmrl

Required Imports

import functools
import io
import json
import logging
import uuid
import zipfile
import trio
from const import ROOT_ID
from const import TRASH_ID
from const import FileType
from datacache import datacache
from exceptions import DocumentNotFound
from exceptions import VirtualItemError
from sync import add_sync
from utils import now
from utils import parse_datetime

Conditional/Optional Imports

These imports are only needed under specific conditions:

from rmrl import render

Condition: only needed when calling the annotated() method to render documents with annotations

Optional

from rmrl import sources

Condition: only needed when calling the annotated() method to provide ZIP source for rendering

Optional
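The `render is None` check in annotated() suggests that the optional import is guarded at module level. One common way to write such a guard is sketched below; how the module actually performs the import is an assumption here, and `require_render` is a hypothetical helper.

```python
# Guarded optional import: fall back to None when rmrl is not installed.
try:
    from rmrl import render, sources
except ImportError:
    render = None
    sources = None


def require_render():
    """Return the render function, or raise if the optional dependency is missing."""
    if render is None:
        raise ImportError("rmrl must be installed to get annotated documents")
    return render
```

With this pattern the module itself imports cleanly without rmrl, and the error surfaces only when a caller asks for an annotated document.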

Usage Example

# Instantiate a Document (typically done internally by the library;
# the constructor arguments shown here are illustrative)
doc = Document(id='some-uuid', version=1, parent='parent-id')

async def main():
    # Get document contents (async)
    contents = await doc.contents()

    # Upload a new PDF document
    with open('new_document.pdf', 'rb') as f:
        await doc.upload(f, FileType.pdf)

    # Get annotated version with a custom progress callback
    annotated_pdf = await doc.annotated(progress_cb=lambda pct: print(f'{pct:0.1f}%'))

    # Get size of annotated document
    size = await doc.annotated_size()

# await is only valid inside an async function, so run it with trio
trio.run(main)

# From synchronous code, use the _sync versions instead
contents = doc.contents_sync()
annotated_pdf = doc.annotated_sync()
size = doc.annotated_size_sync()

Best Practices

Check the document type before calling upload(); only FileType.pdf and FileType.epub are accepted, and any other type raises TypeError
The annotated() method requires rmrl library to be installed; handle ImportError appropriately
Use async methods (await) in async contexts, or use the _sync versions in synchronous code
The class caches annotated_size in datacache, keyed by document id and version, so the value persists across instances of the same document version
When uploading documents, provide a file-like object with a read() method
The annotated() method can be resource-intensive; consider providing a progress callback
Document contents are stored in ZIP format internally; the class handles extraction automatically
The _annotated_size attribute is read from datacache in __init__ and stays None until annotated() has been called for that document version
Methods decorated with @add_sync automatically generate synchronous versions with _sync suffix
Always seek file-like objects to position 0 before reading if you need to read multiple times
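To make the _sync naming convention concrete, here is one way a decorator could attach a blocking counterpart next to an async method. This is an illustrative sketch only, not the library's add_sync: `add_sync_sketch` is a hypothetical name, and it drives the coroutine with asyncio from the standard library, whereas the documented class runs on trio.

```python
import asyncio
import functools


class add_sync_sketch:
    """Decorator that registers an async method plus a '<name>_sync' twin."""

    def __init__(self, afunc):
        self.afunc = afunc

    def __set_name__(self, owner, name):
        afunc = self.afunc

        @functools.wraps(afunc)
        def sync_version(self_, *args, **kw):
            # Run the coroutine to completion on a fresh event loop.
            return asyncio.run(afunc(self_, *args, **kw))

        sync_version.__name__ = f"{name}_sync"
        # Put the plain async function back under its own name,
        # and add the blocking wrapper beside it.
        setattr(owner, name, afunc)
        setattr(owner, f"{name}_sync", sync_version)


class Demo:
    @add_sync_sketch
    async def value(self):
        return 42


demo = Demo()
sync_result = demo.value_sync()
async_result = asyncio.run(demo.value())
```

A wrapper of this kind cannot be called from inside an already running event loop, which is one reason to use the async methods directly in async code and keep the _sync versions for synchronous callers.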

Similar Components

AI-powered semantic similarity - components with related functionality:

class Item 69.2% similar

Base class representing an item (document or folder) in a reMarkable cloud storage system, providing methods for metadata management, file operations, and synchronization.
From: /tf/active/vicechatdev/rmcl/items.py
class Document 65.2% similar

A dataclass representing a document with hierarchical structure, versioning, metadata, and collaboration features.
From: /tf/active/vicechatdev/vice_ai/models.py
class RemarkableCloudManager 64.7% similar

Unified manager for reMarkable Cloud operations that uses REST API as primary method with rmcl library as fallback, handling authentication, file operations, and folder management.
From: /tf/active/vicechatdev/e-ink-llm/remarkable_cloud.py
class RemarkableNode 64.5% similar

A dataclass representing a node (file or folder) in the reMarkable cloud storage system, containing metadata, hierarchy information, and component hashes for documents.
From: /tf/active/vicechatdev/e-ink-llm/cloudtest/discovery.py
class RemarkableCloudWatcher 63.7% similar

Monitors the reMarkable Cloud 'gpt_out' folder for new documents, automatically downloads them, and converts .rm (reMarkable native) files to PDF format.
From: /tf/active/vicechatdev/e-ink-llm/mixed_cloud_processor.py
