function create_folder_hierarchy_v1
Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, connecting each folder level with PATH relationships.
/tf/active/vicechatdev/offline_docstore_multi.py
1233 - 1289
complex
Purpose
This function parses a file path and creates a matching folder hierarchy in a Neo4j graph database. Paths are expected under a 'Documents/' prefix. The first-level folder is attached to the Rootfolder node named 'T001', and each deeper directory becomes a Subfolder node linked to its parent by a PATH relationship. Each folder node receives a unique identifier (UID) and stores its name, path, level, and keys. If the file sits directly under 'Documents/', or the path lacks that prefix, no folders are created. Before creating a node, the function looks up a Subfolder with the same Path and reuses it if found, so existing folders are not duplicated.
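The parsing half of the function can be checked without a database. This sketch mirrors the original's prefix handling and returns the (level, name, path) values that each Subfolder node would receive. It leaves out the quote substitution and all Neo4j calls. The helper name `plan_folder_nodes` is hypothetical, and `posixpath` is used to match what `os.path.join` does on Linux.

```python
import posixpath


def plan_folder_nodes(file_path):
    """Return (level, name, path) for each Subfolder the original would create or reuse.

    Hypothetical helper: mirrors the prefix handling of create_folder_hierarchy
    but skips the quote substitution and every database call.
    """
    prefix = "Documents/"
    if file_path.startswith(prefix):
        rel_path = file_path[len(prefix):]        # strip the 'Documents/' prefix
    else:
        rel_path = posixpath.basename(file_path)  # fallback: bare filename, no folders
    folders = rel_path.split("/")[:-1]            # drop the filename itself

    plan = []
    current_path = "./Documents"
    for level, name in enumerate(folders, start=1):
        current_path = posixpath.join(current_path, name)
        plan.append((level, name, current_path))
    return plan


print(plan_folder_nodes("Documents/reports/2024/quarterly/report.pdf"))
```

An empty list corresponds to the cases where the original returns None, including paths without the 'Documents/' prefix.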
Source Code
```python
import os
from uuid import uuid4

# evaluate_query(session, query) and run_query(session, query) are helper
# functions defined outside this excerpt.


def create_folder_hierarchy(session, file_path):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    # Get path components below the Documents folder
    print("working on ", file_path)
    if file_path.startswith("Documents/"):
        rel_path = file_path[10:]  # Remove the 'Documents/' prefix (10 characters)
    else:
        rel_path = os.path.basename(file_path)  # Just use filename if no expected prefix

    # If the file is directly in the Documents root, there is nothing to create
    if "/" not in rel_path:
        return None

    # Split into folder components
    folders = rel_path.split("/")
    folders.pop()  # Remove the filename itself
    if not folders:  # No subfolders
        return None
    print("Folders: ", folders)

    current_path = "./Documents"
    parent_uid = None
    # Keys of the non-Template Docstores node, stored on every new folder
    key = evaluate_query(session, "match (x:Docstores) where not ('Template' in labels(x)) return x.Keys")

    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        # Single quotes would end the Cypher string literal, so they are replaced
        # (one backtick in names, two in paths, as in the original)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")

        # Check if this folder node already exists
        result = run_query(session, f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                                    f" RETURN f.UID as uid")
        if not result:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # First level: connect to the Rootfolder node named 'T001'
                run_query(session, f"MATCH (x:Rootfolder {{Name:'T001'}}) "
                                   f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                                   f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                                   f"Level: '{i+1}',"
                                   f"Keys:'{key}'}})")
            else:
                # Deeper levels: connect to the parent folder
                run_query(session, f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                                   f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                                   f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                                   f"Level: '{i+1}',"
                                   f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            # Reuse the existing node as the parent of the next level
            parent_uid = result[0]['uid']

    # Return the UID of the deepest subfolder
    return parent_uid
```
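A parameterized variant sidesteps both the quote substitution and the injection risk by sending values as Cypher parameters instead of splicing them into the query text. The sketch below illustrates this under stated assumptions and is not the author's code. The name `create_folder_hierarchy_tx`, the explicit `keys` argument (standing in for the `evaluate_query` lookup), and the switch from `MERGE` to `CREATE` are choices made here. The original `MERGE` always includes a fresh UID, so it never matches an existing node anyway. The function only needs an object with a `run(query, parameters)` method, so it can be handed to `session.execute_write` in the 5.x Neo4j Python driver (`write_transaction` in 4.x) to build the whole hierarchy in one transaction.

```python
from uuid import uuid4

FIND_FOLDER = "MATCH (f:Subfolder {Path: $path}) RETURN f.UID AS uid"

# First-level folders hang off the Rootfolder node; deeper ones off their parent.
CREATE_UNDER_ROOT = (
    "MATCH (x:Rootfolder {Name: $root}) "
    "CREATE (x)-[:PATH]->(f:Subfolder {UID: $uid, Name: $name, Path: $path, "
    "Level: $level, Keys: $keys}) RETURN f.UID AS uid"
)
CREATE_UNDER_PARENT = (
    "MATCH (p:Subfolder {UID: $parent}) "
    "CREATE (p)-[:PATH]->(f:Subfolder {UID: $uid, Name: $name, Path: $path, "
    "Level: $level, Keys: $keys}) RETURN f.UID AS uid"
)


def create_folder_hierarchy_tx(tx, file_path, keys, root_name="T001"):
    """Parameterized sketch (hypothetical); returns the deepest Subfolder UID or None."""
    prefix = "Documents/"
    if not file_path.startswith(prefix):
        return None  # the original also ends up returning None here
    folders = file_path[len(prefix):].split("/")[:-1]  # drop the filename
    if not folders:
        return None

    current_path = "./Documents"
    parent_uid = None
    for level, name in enumerate(folders, start=1):
        current_path = f"{current_path}/{name}"
        found = tx.run(FIND_FOLDER, {"path": current_path}).single()
        if found is not None:
            parent_uid = found["uid"]  # reuse the existing node
            continue
        params = {
            "uid": str(uuid4()),
            "name": name,              # stored verbatim, no quote substitution
            "path": current_path,
            "level": str(level),       # stored as a string, as in the original
            "keys": str(keys),         # the original interpolates the value as text
        }
        if level == 1:
            created = tx.run(CREATE_UNDER_ROOT, {**params, "root": root_name}).single()
        else:
            created = tx.run(CREATE_UNDER_PARENT, {**params, "parent": parent_uid}).single()
        if created is None:
            # Unlike the original, fail loudly when the anchor node is missing.
            raise LookupError(f"no parent node found for {current_path!r}")
        parent_uid = created["uid"]
    return parent_uid
```

For example, `uid = session.execute_write(create_folder_hierarchy_tx, "Documents/reports/2024/report.pdf", keys)`. A raised `LookupError` rolls the transaction back, so a missing Rootfolder leaves no partial hierarchy behind. Because names are stored verbatim, a folder such as `O'Brien` keeps its apostrophe, and its Path will not match nodes written by the original function, which replaced the quote with two backticks.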
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| session | - | - | positional_or_keyword |
| file_path | - | - | positional_or_keyword |
Parameter Details
session: A Neo4j database session object used to execute Cypher queries against the graph database. This should be an active session from a Neo4j driver connection.
file_path: A string giving the file path to process, using forward slashes (/) as separators. If it starts with 'Documents/', that prefix is stripped and the remaining directory components (everything except the filename) become the folder hierarchy. Any other path is reduced to its bare filename, so no folders are created and the function returns None.
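Because the function splits only on "/", a Windows-style path such as `Documents\reports\a.pdf` does not start with 'Documents/' and therefore yields no folders. If callers may produce either separator, a small normalization step before the call can help. The sketch below uses the standard-library `PureWindowsPath`, which accepts both separators; the helper name `to_docstore_path` is hypothetical. On POSIX systems a backslash is a legal filename character, so this only fits if such names cannot occur.

```python
from pathlib import PureWindowsPath


def to_docstore_path(raw_path: str) -> str:
    """Return raw_path with forward slashes, ready for create_folder_hierarchy.

    Hypothetical helper. PureWindowsPath splits on both '\\' and '/'; as_posix()
    rejoins the parts with '/', dropping '.' components and repeated separators.
    """
    return PureWindowsPath(raw_path).as_posix()


print(to_docstore_path("Documents\\reports\\2024\\report.pdf"))
```

A call would then read `create_folder_hierarchy(session, to_docstore_path(raw_path))`.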
Return Value
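Returns a string containing the UID (unique identifier) of the deepest Subfolder node, whether newly created or already present. Returns None when the file sits directly under 'Documents/' with no subfolders, or when the path does not start with 'Documents/'. The UID is returned even if the creating query matched no parent node (for example, when the 'T001' Rootfolder is missing), in which case no node with that UID exists.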
Dependencies
os, uuid, neo4j
Required Imports
```python
import os
from uuid import uuid4
```
Usage Example
```python
from neo4j import GraphDatabase
from uuid import uuid4
import os

# Assume evaluate_query and run_query functions are defined
# def evaluate_query(session, query): ...
# def run_query(session, query): ...

# Connect to Neo4j
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))

with driver.session() as session:
    # Create folder hierarchy for a nested file
    file_path = 'Documents/reports/2024/quarterly/report.pdf'
    deepest_folder_uid = create_folder_hierarchy(session, file_path)
    if deepest_folder_uid:
        print(f'Created hierarchy, deepest folder UID: {deepest_folder_uid}')
    else:
        print('No subfolders to create')

    # For a file in root directory
    root_file = 'Documents/readme.txt'
    result = create_folder_hierarchy(session, root_file)
    # Returns None since no subfolders exist

driver.close()
```
Best Practices
- Ensure the Neo4j database has the required Rootfolder node with Name='T001' before calling this function
- The function depends on external helper functions 'evaluate_query' and 'run_query' which must be properly implemented
- File paths should use forward slashes (/) as separators for consistent parsing
- The function replaces single quotes with backticks (one backtick in folder names, two in paths), which alters the stored names. It still builds queries by string interpolation, so unusual input such as a trailing backslash can break out of a string literal; parameterized queries avoid both problems
- The function prints debug information to console; consider using proper logging for production environments
- Ensure proper error handling around Neo4j session operations in production code
- The function assumes a specific database schema - verify schema compatibility before use
- Consider transaction management for atomic folder hierarchy creation
- The Keys property is retrieved once and applied to all folders - verify this is the intended behavior for your use case
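The first and fifth practices (a present Rootfolder node, logging in place of print) can be combined into a pre-flight check. This is a minimal sketch, assuming the same `run(query, parameters)` interface that a Neo4j session provides; the helper name `require_root_folder` is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

# Parameterized lookup: the root name travels as $name, not inside the query text.
COUNT_ROOTS = "MATCH (x:Rootfolder {Name: $name}) RETURN count(x) AS n"


def require_root_folder(session, name="T001"):
    """Raise if no Rootfolder with this Name exists; warn if several do.

    Hypothetical pre-flight helper; returns the number of matching nodes.
    """
    record = session.run(COUNT_ROOTS, {"name": name}).single()
    count = record["n"] if record is not None else 0
    if count == 0:
        # The original's first-level MATCH would find nothing and create no node.
        logger.error("Rootfolder %r not found", name)
        raise LookupError(f"Rootfolder {name!r} not found")
    if count > 1:
        # Each matching root would receive its own copy of a new first-level folder.
        logger.warning("%d Rootfolder nodes named %r", count, name)
    else:
        logger.info("Rootfolder %r found", name)
    return count
```

Calling `logging.basicConfig(level=logging.INFO)` once at start-up makes these messages visible; the same logger could also replace the function's `print` calls.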
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_folder_hierarchy: 96.4% similar
- function create_folder_hierarchy_v2: 93.9% similar
- function create_folder: 71.3% similar
- function add_document_to_graph: 63.2% similar
- function add_document_to_graph_v1: 62.4% similar