function create_folder_hierarchy

Maturity: 51

Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py
Lines: 1277 - 1327
Complexity: complex

Purpose

This function mirrors a file system's folder structure in a Neo4j graph database. It derives the folder components beneath the top-level folder from a file path, creates a Subfolder node for each directory level, and links each level to its parent with a PATH relationship. The first-level Subfolder is connected to the Rootfolder node identified by rootfolder_uid. Before creating a node, the function looks up an existing Subfolder with the same Path and reuses its UID, so repeated calls for files in the same directory do not create duplicate folders. Typical uses are document management systems and knowledge graphs that need file organization represented as a graph.
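
The decomposition described above can be sketched without a database. This is a minimal illustration, not the function's actual code: it assumes POSIX-style paths and uses pathlib's relative_to in place of the string replacement the original performs, and the helper name folder_levels is hypothetical.

```python
from pathlib import PurePosixPath


def folder_levels(common_path, topfolder, file_path):
    """Return (name, path, level) for each folder between topfolder and the file."""
    base = PurePosixPath(common_path) / topfolder
    # relative_to raises ValueError if file_path does not lie under base
    relative = PurePosixPath(file_path).relative_to(base)
    levels = []
    current = base
    # parts[:-1] drops the filename, leaving only directory components
    for depth, name in enumerate(relative.parts[:-1], start=1):
        current = current / name
        levels.append((name, str(current), depth))
    return levels


levels = folder_levels(
    "/home/user/documents",
    "project",
    "/home/user/documents/project/reports/2024/report.pdf",
)
for name, path, level in levels:
    print(level, name, path)
```

Each tuple corresponds to one Subfolder node the function would look up or create: the Name, the Path used for the existence check, and the Level. A file that sits directly in the top-level folder yields an empty list, which matches the case where the function returns None.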

Source Code

def create_folder_hierarchy(session, common_path, file_path, topfolder, rootfolder_uid):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    print("working on ",file_path)
    subpath=str(file).replace(common_path,'').replace('/'+folder_name+'/','')
    # Split into folder components
    folders = subpath.split("/")
    folders.pop()  # Remove the filename itself
    
    if not folders:  # No subfolders
        return None
    print("Folders: ",folders)
    current_path = common_path+'/'+topfolder
    parent_uid = None
    key=evaluate_query(session,"match (x:Docstores)  where not ('Template' in labels(x)) return x.Keys")
    
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        
        # Check if this folder node already exists - get result DATA
        result_data = run_query(session,f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                          f" RETURN f.UID as uid").data()
        #print(result_data)
        
        # Check if the result data list is empty
        if not result_data:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the References node since it's the first level
                run_query(session,f"MATCH (x:Rootfolder {{UID:'{rootfolder_uid}'}}) "
                         f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                run_query(session,f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                         f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                         f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                         f"Level: '{i+1}',"
                         f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            # Access the UID from the first result in the data list
            parent_uid = result_data[0]['uid']
    
    # Return the UID of the deepest subfolder
    return parent_uid

Parameters

Name            Type  Default  Kind
session         -     -        positional_or_keyword
common_path     -     -        positional_or_keyword
file_path       -     -        positional_or_keyword
topfolder       -     -        positional_or_keyword
rootfolder_uid  -     -        positional_or_keyword

Parameter Details

session: An open Neo4j session used to execute Cypher queries against the graph database, for example one obtained from neo4j.GraphDatabase.driver(...).session(). It is passed to the run_query and evaluate_query helpers.

common_path: The base path string that is removed from the file's path to determine the relative folder structure. It is the common prefix shared by all files being processed.

file_path: The complete file system path string for the file being processed. As written, the function only prints this value; the folder hierarchy is parsed from the undefined name file, which suggests file_path was the intended input.

topfolder: The name of the top-level folder that serves as the root of the hierarchy being created. It is appended to common_path to build the starting current_path.

rootfolder_uid: The unique identifier (UID) string of the Rootfolder node in Neo4j to which the first-level Subfolder is connected via a PATH relationship.

Return Value

Returns the UID (string) of the deepest Subfolder node in the hierarchy, which represents the immediate parent folder of the file. Returns None if the path contains no subfolders, that is, the file sits directly in the top-level folder. The UID can be used to link the file's own node to its containing folder.
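
One way a caller might act on the return value, including the None case, is sketched below. The Document label, the reuse of the PATH relationship for files, and the names link_file_to_folder and RecordingSession are assumptions made for illustration; the source does not show how file nodes are modelled. The stub session only records calls so the sketch runs without a database.

```python
class RecordingSession:
    """Stand-in for a Neo4j session that records calls instead of executing them."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None, **kwargs):
        self.calls.append((query, dict(parameters or {}, **kwargs)))


def link_file_to_folder(session, file_uid, folder_uid, rootfolder_uid):
    """Attach a file node to its folder, or to the root folder if folder_uid is None."""
    if folder_uid is None:
        # No subfolders were found, so the file hangs off the Rootfolder
        parent_label, parent_uid = "Rootfolder", rootfolder_uid
    else:
        parent_label, parent_uid = "Subfolder", folder_uid
    # The label comes from a fixed pair above; all values are query parameters.
    # "Document" is a hypothetical label for the file node.
    query = (
        f"MATCH (p:{parent_label} {{UID: $parent_uid}}), "
        "(d:Document {UID: $file_uid}) "
        "MERGE (p)-[:PATH]->(d)"
    )
    session.run(query, parent_uid=parent_uid, file_uid=file_uid)
    return parent_label


session = RecordingSession()
print(link_file_to_folder(session, "file-1", None, "root-1"))
print(link_file_to_folder(session, "file-2", "folder-9", "root-1"))
```

With a real session, the same function would issue the two MERGE statements against the database; the stub simply makes the chosen parent and the parameters visible.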

Dependencies

  • neo4j
  • uuid
  • os

Required Imports

from neo4j import GraphDatabase
from uuid import uuid4
import os

Usage Example

from neo4j import GraphDatabase
from uuid import uuid4
import os

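# Assumes evaluate_query and run_query are defined in the same module
# as create_folder_hierarchy.
# The function body also reads the names `file` and `folder_name`, which
# Python resolves as globals of the module that defines the function.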

driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
session = driver.session()

# Create root folder first
rootfolder_uid = str(uuid4())
session.run("CREATE (r:Rootfolder {UID: $uid, Name: 'Documents'})", uid=rootfolder_uid)

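# Define paths
common_path = '/home/user/documents'
file_path = '/home/user/documents/project/reports/2024/report.pdf'
topfolder = 'project'

# Workaround for the undefined names used inside the function; this only
# works if the function is defined in this same module. Treating them as
# equivalent to file_path and topfolder is an assumption.
file = file_path
folder_name = topfolder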

# Create folder hierarchy
deepest_folder_uid = create_folder_hierarchy(
    session=session,
    common_path=common_path,
    file_path=file_path,
    topfolder=topfolder,
    rootfolder_uid=rootfolder_uid
)

if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
    # Now you can link the file node to this folder
else:
    print('No subfolders created')

session.close()
driver.close()

Best Practices

  • Ensure the Neo4j session is properly opened and closed using context managers or explicit close() calls
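  • The function references the undefined names 'file' and 'folder_name'. Python looks these up as globals of the module that defines the function, not in the caller's scope, so the call raises NameError unless that module defines them - consider using file_path and topfolder, or passing them as parameters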
  • The function uses string interpolation for Cypher queries which can be vulnerable to injection attacks - consider using parameterized queries instead
  • The function does not escape single quotes; it replaces them with one backtick in Name and two backticks in Path. This changes the stored values and is not Cypher string escaping (which uses \') - query parameters avoid the problem entirely
  • Error handling should be added for database connection failures and query execution errors
  • The evaluate_query and run_query helper functions must be available in scope
  • Consider adding transaction management to ensure atomicity of the folder hierarchy creation
  • The function prints debug information - consider using proper logging instead
  • Verify that the Docstores node exists before calling this function to avoid errors in the evaluate_query call
  • The function assumes forward slashes in paths - may need adjustment for Windows paths
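
Several of the points above (parameterized queries, logging instead of print, no manual quote handling) can be combined in one helper. The following is a sketch under stated assumptions, not the module's implementation: ensure_subfolder, StubSession, and the single MERGE-based query are hypothetical, and MERGE on Path replaces the original's separate lookup and create steps. The stub mimics only the run(...).single() calls used here so the example runs without a database.

```python
import logging
from uuid import uuid4

logger = logging.getLogger(__name__)

# Values are passed as parameters, so names containing quotes need no escaping.
ENSURE_SUBFOLDER = """
MATCH (p) WHERE (p:Rootfolder OR p:Subfolder) AND p.UID = $parent_uid
MERGE (f:Subfolder {Path: $path})
  ON CREATE SET f.UID = $new_uid, f.Name = $name, f.Level = $level, f.Keys = $keys
MERGE (p)-[:PATH]->(f)
RETURN f.UID AS uid
"""


def ensure_subfolder(session, parent_uid, name, path, level, keys):
    """Create the Subfolder at `path` if missing and return its UID."""
    params = {
        "parent_uid": parent_uid,
        "path": path,
        "name": name,
        "level": str(level),  # the original stores Level as a string
        "keys": keys,
        "new_uid": str(uuid4()),  # only used if the node is created
    }
    logger.debug("ensuring subfolder %s under %s", path, parent_uid)
    record = session.run(ENSURE_SUBFOLDER, params).single()
    if record is None:
        raise LookupError(f"no parent node with UID {parent_uid!r}")
    return record["uid"]


class _StubResult:
    def __init__(self, record):
        self._record = record

    def single(self):
        return self._record


class StubSession:
    """Records queries and pretends every folder is newly created."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None, **kwargs):
        params = dict(parameters or {}, **kwargs)
        self.calls.append((query, params))
        return _StubResult({"uid": params["new_uid"]})


session = StubSession()
uid = ensure_subfolder(session, "root-1", "O'Brien", "/docs/top/O'Brien", 1, "k1")
print(uid)
```

Because the folder name travels as a parameter, "O'Brien" is stored unchanged and never appears in the query text. Against a real database, the helper returns the existing UID when a Subfolder with that Path is already present.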

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_folder_hierarchy_v1 96.4% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function create_folder_hierarchy_v2 93.4% similar

    Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.

    From: /tf/active/vicechatdev/offline_parser_docstore.py
  • function create_folder 71.3% similar

    Creates a nested folder structure on a FileCloud server by traversing a path and creating missing directories.

    From: /tf/active/vicechatdev/filecloud_wuxi_sync.py
  • function add_document_to_graph 62.8% similar

    Creates nodes and relationships in a Neo4j graph database for a processed document, including its text and table chunks, connecting it to a folder hierarchy.

    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • function add_document_to_graph_v1 61.5% similar

    Creates a Neo4j graph node for a processed document and connects it to a folder hierarchy, along with its text and table chunks.

    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py