build_document_tree_lazy - Code Extractor

function build_document_tree_lazy

Maturity: 51

Builds a single-level document tree structure for lazy loading, scanning only immediate children of a target directory without recursively loading subdirectories.

File: /tf/active/vicechatdev/docchat/app.py

Lines: 325 - 401

Complexity: moderate

Purpose

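This function supports efficient file system navigation in document management applications. It builds the tree one directory level per call, which avoids the cost of walking a deep directory structure up front. Files are listed only if they have a supported extension (.pdf, .txt, .md, .doc, .docx, .pptx, .ppt, .xlsx, .xls, .html); hidden directories (names starting with '.') and __pycache__ are skipped. Each subfolder is inspected one level deep, without building its items, and is listed only if it contains at least one supported file or visible subdirectory. For every listed file the function retrieves indexing status through get_document_info. Entries that raise an error during processing are logged and skipped. The function is suited to web applications that display file trees with on-demand expansion of folders.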
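The filtering rules described above can be sketched in isolation. This is a minimal illustration, not the code in app.py: the names SUPPORTED_EXTENSIONS, is_supported_file, is_visible_dir, and folder_has_content are hypothetical helpers introduced here, and only the extension list and the skip rules are taken from the documented function.

```python
from pathlib import Path

# Extension list copied from the documented function; the constant name is hypothetical.
SUPPORTED_EXTENSIONS = frozenset({
    '.pdf', '.txt', '.md', '.doc', '.docx',
    '.pptx', '.ppt', '.xlsx', '.xls', '.html',
})


def is_supported_file(path: Path) -> bool:
    """True for regular files whose extension (case-insensitive) is supported."""
    return path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS


def is_visible_dir(path: Path) -> bool:
    """True for directories that are neither hidden nor __pycache__."""
    return (
        path.is_dir()
        and not path.name.startswith('.')
        and path.name != '__pycache__'
    )


def folder_has_content(folder: Path) -> bool:
    """Peek one level into a folder; any() stops at the first match."""
    return any(
        is_supported_file(child) or is_visible_dir(child)
        for child in folder.iterdir()
    )
```

Keeping the extension set in one place would also remove the duplicated list that appears twice in the source, though whether that refactoring fits the real module is a judgment for its maintainers.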

Source Code

def build_document_tree_lazy(target_path, relative_base=""):
    """Build document tree for ONE level only (lazy loading)"""
    import time
    start_time = time.time()
    items = []
    
    logger.info(f"[BUILD_TREE_LAZY] Scanning single level: {target_path}")
    
    try:
        entries = sorted(target_path.iterdir(), key=lambda x: (not x.is_dir(), x.name.lower()))
    except PermissionError:
        logger.warning(f"[BUILD_TREE_LAZY] Permission denied: {target_path}")
        return {'children': []}
    
    for entry in entries:
        try:
            # Calculate relative path from document root
            if relative_base:
                relative_path = f"{relative_base}/{entry.name}"
            else:
                relative_path = entry.name
            
            if entry.is_dir():
                # Skip hidden and cache directories
                if entry.name.startswith('.') or entry.name == '__pycache__':
                    continue
                
                # For folders, just indicate they have children (don't load them yet)
                # Check if folder has any valid content
                has_content = any(
                    child.is_file() and child.suffix.lower() in ['.pdf', '.txt', '.md', '.doc', '.docx', '.pptx', '.ppt', '.xlsx', '.xls', '.html']
                    or (child.is_dir() and not child.name.startswith('.') and child.name != '__pycache__')
                    for child in entry.iterdir()
                )
                
                if has_content:
                    items.append({
                        'type': 'folder',
                        'name': entry.name,
                        'path': relative_path,
                        'hasChildren': True,  # Indicate this folder can be expanded
                        'loaded': False  # Not loaded yet
                    })
            
            elif entry.is_file():
                # Check if file extension is supported
                if entry.suffix.lower() in ['.pdf', '.txt', '.md', '.doc', '.docx', '.pptx', '.ppt', '.xlsx', '.xls', '.html']:
                    file_info = get_document_info(entry)
                    items.append({
                        'type': 'file',
                        'name': entry.name,
                        'path': relative_path,
                        'size': entry.stat().st_size,
                        'indexed': file_info['indexed'],
                        'doc_id': file_info.get('doc_id'),
                        'chunk_count': file_info.get('chunk_count', 0)
                    })
                    
        except Exception as e:
            logger.warning(f"[BUILD_TREE_LAZY] Error processing {entry}: {e}")
            continue
    
    elapsed = time.time() - start_time
    logger.info(f"[BUILD_TREE_LAZY] Completed in {elapsed:.2f}s, {len(items)} items")
    
    if relative_base:
        # Return children only for subfolder expansion
        return {'children': items}
    else:
        # Return root structure
        return {
            'type': 'folder',
            'name': target_path.name,
            'path': '',
            'children': items,
            'loaded': False
        }
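The sort key used when listing entries places folders before files and orders each group by name, ignoring case. The sketch below reproduces that key on plain (name, is_dir) tuples so it can be run without touching the file system; the sample names are invented for illustration.

```python
# Each tuple is (name, is_dir); the sample entries are made up for this demo.
entries = [
    ("notes.md", False),
    ("Zeta", True),
    ("alpha", True),
    ("Budget.xlsx", False),
]

# Same shape as the key in the source: (not is_dir, lowercased name).
# False sorts before True, so directories come first.
ordered = sorted(entries, key=lambda e: (not e[1], e[0].lower()))

for name, is_dir in ordered:
    print("folder" if is_dir else "file  ", name)
```

The folders alpha and Zeta come out first, followed by the files Budget.xlsx and notes.md.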

Parameters

Name	Type	Default	Kind
`target_path`	-	-	positional_or_keyword
`relative_base`	-	''	positional_or_keyword

Parameter Details

target_path: A Path object (from pathlib) representing the directory to scan. This should be the absolute or relative path to the folder whose immediate children will be listed. The function expects a pathlib.Path object, not a string.

relative_base: A string representing the relative path prefix from the document root. Defaults to empty string for root-level scanning. When expanding subfolders, this should contain the parent path (e.g., 'folder1/subfolder2') to build correct relative paths for nested items. Used to construct the full relative path for each discovered item.
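One way to derive a matching relative_base for a folder is to compute it from the document root with pathlib. This is a sketch under stated assumptions: relative_base_for is a hypothetical helper that does not appear in app.py, and document_root stands for whatever root directory the application is configured with.

```python
from pathlib import Path


def relative_base_for(document_root: Path, folder: Path) -> str:
    """Return the forward-slash path of folder relative to document_root.

    Returns '' when folder is the root itself, which matches the default
    value of relative_base. Raises ValueError if folder is outside the root.
    """
    rel = folder.relative_to(document_root)
    # An empty parts tuple means folder == document_root.
    return rel.as_posix() if rel.parts else ""
```

Because relative_to raises ValueError for a folder outside the root, the helper also acts as a basic guard against expanding paths that do not belong to the document tree.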

Return Value

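Returns a dictionary whose structure depends on how the function was called. For root-level calls (relative_base is empty), it returns: {'type': 'folder', 'name': str, 'path': '', 'children': list, 'loaded': False}. For subfolder expansion (relative_base is not empty), it returns: {'children': list}. Each item in 'children' is either a folder dict {'type': 'folder', 'name': str, 'path': str, 'hasChildren': True, 'loaded': False} or a file dict {'type': 'file', 'name': str, 'path': str, 'size': int, 'indexed': bool, 'doc_id': str|None, 'chunk_count': int}. Folders are only emitted when they contain a supported file or a visible subdirectory, so 'hasChildren' is always True for the folders that appear; folders without such content are omitted. If permission to read the target directory is denied, the function returns {'children': []} regardless of relative_base, so a root-level call can also come back without the 'type', 'name', and 'path' keys.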
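Because the result can take more than one shape, calling code may find it convenient to read it defensively. The helpers below are a sketch with hypothetical names (split_children, is_full_root_node); they assume only the dictionary keys listed in this section.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def split_children(result: Node) -> Tuple[List[Node], List[Node]]:
    """Split a result's children into (folders, files).

    Works for the root structure, the subfolder structure, and the
    permission-denied fallback, since all three carry a 'children' list.
    """
    children = result.get('children', [])
    folders = [c for c in children if c.get('type') == 'folder']
    files = [c for c in children if c.get('type') == 'file']
    return folders, files


def is_full_root_node(result: Node) -> bool:
    """True only for the complete root structure with 'type' and 'name'."""
    return result.get('type') == 'folder' and 'name' in result
```

A caller can check is_full_root_node before reading result['name'], which avoids a KeyError when a root-level scan hits a permission error.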

Dependencies

pathlib
time
logging

Required Imports

from pathlib import Path
import time
import logging

Conditional/Optional Imports

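No import is optional in practice. The function imports time inside its own body to measure how long the scan takes, so the import statement runs on every call; moving it to module level does not change behavior.

import time

The function also relies on two module-level names that are not imports: logger (a logging.Logger instance) and get_document_info.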

Usage Example

from pathlib import Path
import logging

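# build_document_tree_lazy looks up 'logger' and 'get_document_info' as
# module-level names, so both must exist in the module that defines it.
logger = logging.getLogger(__name__)

# Mock get_document_info function (replace with actual implementation)
def get_document_info(file_path):
    return {'indexed': False, 'doc_id': None, 'chunk_count': 0}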

# Scan root level
document_root = Path('/path/to/documents')
tree = build_document_tree_lazy(document_root)
print(f"Root folder: {tree['name']}")
print(f"Items found: {len(tree['children'])}")

# Expand a subfolder
subfolder_path = document_root / 'subfolder1'
subtree = build_document_tree_lazy(subfolder_path, relative_base='subfolder1')
print(f"Subfolder items: {len(subtree['children'])}")

# Access file information
for item in tree['children']:
    if item['type'] == 'file':
        print(f"File: {item['name']}, Size: {item['size']}, Indexed: {item['indexed']}")
    elif item['type'] == 'folder':
        print(f"Folder: {item['name']}, Has children: {item['hasChildren']}")

Best Practices

Ensure the 'logger' object is properly configured before calling this function
Implement the 'get_document_info' function to return document indexing status from your database or indexing system
Always pass a pathlib.Path object for target_path, not a string
Handle the different return structures based on whether relative_base is empty or not
The function filters for specific file extensions - modify the extension list if you need to support additional document types
Consider implementing caching for frequently accessed directories to improve performance
The function logs timing information - monitor these logs to identify performance bottlenecks
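A PermissionError on the target directory is not raised to the caller: the function logs a warning and returns {'children': []}, even for a root-level call, so check that 'name' is present before reading root fields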
When building a UI tree component, use the 'hasChildren' and 'loaded' flags to implement expand/collapse functionality
The relative_base parameter should use forward slashes (/) as path separators for consistency across platforms
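The caching suggestion above could be approached in several ways. One minimal sketch, assuming that directory modification time is an acceptable invalidation signal, wraps the tree builder in a small cache. LazyTreeCache is a hypothetical class, not part of app.py, and it accepts any callable with the same (target_path, relative_base) signature.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


class LazyTreeCache:
    """Cache single-level tree results, keyed by path and relative_base.

    An entry is reused only while the directory's st_mtime_ns is unchanged.
    """

    def __init__(self, builder: Callable[[Path, str], dict]) -> None:
        self._builder = builder
        self._cache: Dict[Tuple[str, str], Tuple[int, dict]] = {}

    def get(self, target_path: Path, relative_base: str = "") -> dict:
        key = (str(target_path), relative_base)
        mtime_ns = target_path.stat().st_mtime_ns
        cached = self._cache.get(key)
        if cached is not None and cached[0] == mtime_ns:
            return cached[1]  # directory unchanged since the last scan
        result = self._builder(target_path, relative_base)
        self._cache[key] = (mtime_ns, result)
        return result
```

A directory's modification time changes when entries are added, removed, or renamed, but not when a file's contents or its indexing status change, so cached 'size', 'indexed', and 'chunk_count' values can go stale. The cache also hands back the same dict object on a hit, so callers should avoid mutating it.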

Similar Components

AI-powered semantic similarity - components with related functionality:

function build_document_tree_recursive 76.5% similar

Recursively builds a complete hierarchical tree structure of documents and folders from a target directory path, filtering for supported file types and skipping hidden/cache directories.
From: /tf/active/vicechatdev/docchat/app.py

function api_document_tree 67.7% similar

Flask API endpoint that returns a hierarchical document tree structure from a configured document folder, supporting lazy loading and full expansion modes for efficient navigation and search.
From: /tf/active/vicechatdev/docchat/app.py

function show_directory_tree 56.5% similar

Recursively displays a visual tree structure of a directory and its contents, showing files with sizes and subdirectories up to a specified depth.
From: /tf/active/vicechatdev/e-ink-llm/cloudtest/test_complete_suite.py

function load_all_documents 56.4% similar

Loads all JSON documents from a designated documents directory and returns them as a dictionary indexed by document ID.
From: /tf/active/vicechatdev/vice_ai/complex_app.py

function create_folder_hierarchy_v2 52.4% similar

Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.
From: /tf/active/vicechatdev/offline_parser_docstore.py
