
function create_folder_hierarchy_v1

Maturity: 49

Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.

File: /tf/active/vicechatdev/offline_docstore_multi.py
Lines: 1233 - 1289
Complexity: complex

Purpose

This function parses a file path and creates the corresponding folder hierarchy in a Neo4j graph database. Folder paths are built from the './Documents' root. The first folder level is attached to the Rootfolder node named 'T001', and each deeper level is attached to its parent Subfolder, always through a PATH relationship. Each new Subfolder node receives a UUID as its UID and stores Name, Path, Level (written as a quoted string), and Keys properties. Before creating a node, the function looks for an existing Subfolder with the same Path and reuses its UID, so repeated calls do not duplicate folders. For a file that sits directly in the root there is nothing to create, and the function returns None.
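The path handling described above can be sketched without a database. The helper below is hypothetical (the name `folder_levels` and its `prefix`/`root` parameters are not part of the original module); it only mirrors how the function derives one (level, name, path) triple per folder. It uses `posixpath.join` where the original uses `os.path.join`, so the separator is always "/"; on POSIX systems the two behave identically.

```python
import posixpath


def folder_levels(file_path, prefix="Documents/", root="./Documents"):
    """Return a (level, name, path) tuple for each folder implied by file_path.

    Hypothetical helper for illustration; it performs no database access.
    """
    if file_path.startswith(prefix):
        rel_path = file_path[len(prefix):]
    else:
        # Paths without the expected prefix are reduced to the bare filename
        rel_path = posixpath.basename(file_path)

    folders = rel_path.split("/")[:-1]  # drop the filename itself

    levels = []
    current = root
    for i, name in enumerate(folders):
        current = posixpath.join(current, name)
        levels.append((i + 1, name, current))
    return levels


for level in folder_levels("Documents/reports/2024/quarterly/report.pdf"):
    print(level)
```

An empty list here corresponds to the cases where the original function returns None: a file directly in the Documents root, or a path that does not start with the prefix.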

Source Code

def create_folder_hierarchy(session, file_path):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    # Get path components relative to the Documents folder
    print("working on ",file_path)
    if file_path.startswith("Documents/"):
        rel_path = file_path[10:]  # Remove the 'Documents/' prefix (10 characters)
    else:
        rel_path = os.path.basename(file_path)  # Just use filename if no expected prefix
    
    # If file is directly in the Documents root
    if "/" not in rel_path:
        return None
    
    # Split into folder components
    folders = rel_path.split("/")
    folders.pop()  # Remove the filename itself
    
    if not folders:  # No subfolders
        return None
    print("Folders: ",folders)
    current_path = "./Documents"
    parent_uid = None
    key=evaluate_query(session,"match (x:Docstores)  where not ('Template' in labels(x)) return x.Keys")
    
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        
        # Check if this folder node already exists
        result = run_query(session,f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                          f" RETURN f.UID as uid")
        
        if not result:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the Rootfolder node (Name 'T001') since it's the first level
                run_query(session,f"MATCH (x:Rootfolder {{Name:'T001'}}) "
                         f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                run_query(session,f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                         f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            parent_uid = result[0]['uid']
    
    # Return the UID of the deepest subfolder
    return parent_uid

Parameters

Name Type Default Kind
session - - positional_or_keyword
file_path - - positional_or_keyword

Parameter Details

session: A Neo4j database session object used to execute Cypher queries against the graph database. This should be an active session from a Neo4j driver connection.

file_path: A string representing the file path to process. It is expected to start with the 'Documents/' prefix, which is stripped before parsing. A path without that prefix is reduced to its basename, so no folders are created for it. The path must use forward slashes (/) as separators. The folder hierarchy is taken from every component except the last, which is treated as the filename.

Return Value

Returns a string containing the UID (Unique Identifier) of the deepest subfolder node created or found in the hierarchy. Returns None when no folder components remain after processing: either the file sits directly in the Documents root, or the path does not start with 'Documents/'.

Dependencies

  • os
  • uuid
  • neo4j

Required Imports

import os
from uuid import uuid4

Usage Example

from neo4j import GraphDatabase
from uuid import uuid4
import os

# Assume evaluate_query and run_query functions are defined
# def evaluate_query(session, query): ...
# def run_query(session, query): ...

# Connect to Neo4j
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))

with driver.session() as session:
    # Create folder hierarchy for a nested file
    file_path = 'Documents/reports/2024/quarterly/report.pdf'
    deepest_folder_uid = create_folder_hierarchy(session, file_path)
    
    if deepest_folder_uid:
        print(f'Created hierarchy, deepest folder UID: {deepest_folder_uid}')
    else:
        print('No subfolders to create')
    
    # For a file in root directory
    root_file = 'Documents/readme.txt'
    result = create_folder_hierarchy(session, root_file)
    # Returns None since no subfolders exist

driver.close()

Best Practices

  • Ensure the Neo4j database has the required Rootfolder node with Name='T001' before calling this function
  • The function depends on external helper functions 'evaluate_query' and 'run_query' which must be properly implemented
  • File paths should use forward slashes (/) as separators for consistent parsing
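  • The function does not escape single quotes in the Cypher sense; it substitutes backticks for them (one backtick in folder names, two in paths) and interpolates the values directly into the query strings. This alters the stored names and may still be vulnerable to Cypher injection if file paths contain malicious content - consider using parameterized queries instead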
  • The function prints debug information to console; consider using proper logging for production environments
  • Ensure proper error handling around Neo4j session operations in production code
  • The function assumes a specific database schema - verify schema compatibility before use
  • Consider transaction management for atomic folder hierarchy creation
  • The Keys property is retrieved once and applied to all folders - verify this is the intended behavior for your use case
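As one way to act on the parameterized-query recommendation above, the sketch below builds the MERGE statement with Cypher parameters ($name, $path, and so on) instead of string interpolation. The helper name `build_subfolder_merge` and the sample values are hypothetical; the node labels, property names, and the 'T001' root name are taken from the source code shown earlier.

```python
from uuid import uuid4


def build_subfolder_merge(name, path, level, keys, parent_uid=None):
    """Return (query, params) attaching a new Subfolder to its parent.

    Hypothetical helper, not part of the original module.
    """
    params = {
        "uid": str(uuid4()),
        "name": name,
        "path": path,
        "level": str(level),  # the original stores Level as a string
        "keys": keys,
    }
    if parent_uid is None:
        # First level: attach to the root folder node
        match = "MATCH (p:Rootfolder {Name: $root_name})"
        params["root_name"] = "T001"
    else:
        # Deeper levels: attach to the parent Subfolder
        match = "MATCH (p:Subfolder {UID: $parent_uid})"
        params["parent_uid"] = parent_uid
    query = (
        match
        + " MERGE (p)-[:PATH]->(:Subfolder {UID: $uid, Name: $name,"
        " Path: $path, Level: $level, Keys: $keys})"
    )
    return query, params


# Sample values are placeholders; the quote in the name needs no escaping
query, params = build_subfolder_merge(
    "Q1 'final'", "./Documents/reports/Q1 'final'", 2, "demo-keys", parent_uid="abc"
)
print(query)
print(params["name"])
```

With the official Neo4j Python driver, the pair could be executed as session.run(query, params). Whether the run_query helper used by this function can forward parameters is not shown in the source, so it may need to be extended first.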

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_folder_hierarchy 96.4% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py
  • function create_folder_hierarchy_v2 93.9% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.

    From: /tf/active/vicechatdev/offline_parser_docstore.py
  • function create_folder 71.3% similar

    Creates a nested folder structure on a FileCloud server by traversing a path and creating missing directories.

    From: /tf/active/vicechatdev/filecloud_wuxi_sync.py
  • function add_document_to_graph 63.2% similar

    Creates nodes and relationships in a Neo4j graph database for a processed document, including its text and table chunks, connecting it to a folder hierarchy.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function add_document_to_graph_v1 62.4% similar

    Creates a Neo4j graph node for a processed document and connects it to a folder hierarchy, along with its text and table chunks.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py