function create_folder_hierarchy
Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file system path, connecting each folder level with PATH relationships.
/tf/active/vicechatdev/offline_docstore_multi_vice.py
1277 - 1327
complex
Purpose
This function is designed to mirror a file system's folder structure within a Neo4j graph database. It parses a file path, creates Subfolder nodes for each directory level, and establishes parent-child relationships between them. The function connects the top-level folder to a Rootfolder node and ensures that duplicate folders are not created by checking for existing nodes. This is typically used in document management systems or knowledge graphs where file organization needs to be represented in a graph structure.
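The path-parsing step described above can be illustrated without a database. The following is a minimal sketch, not the function's actual code: `folder_levels` is a hypothetical helper name, it assumes POSIX-style paths, and it assumes the file lives somewhere under `common_path/topfolder`. It returns the level as an integer, whereas the source stores `Level` as a string.

```python
from pathlib import PurePosixPath


def folder_levels(common_path, topfolder, file_path):
    """Return (name, path, level) for each folder between topfolder and the file.

    Hypothetical helper for illustration only; assumes POSIX-style paths.
    """
    base = PurePosixPath(common_path) / topfolder
    # relative_to raises ValueError if file_path is not located under base
    relative = PurePosixPath(file_path).relative_to(base)
    levels = []
    current = base
    # relative.parts[:-1] drops the filename, leaving only folder names
    for depth, name in enumerate(relative.parts[:-1], start=1):
        current = current / name
        levels.append((name, str(current), depth))
    return levels


levels = folder_levels(
    "/home/user/documents",
    "project",
    "/home/user/documents/project/reports/2024/report.pdf",
)
for name, path, level in levels:
    print(level, name, path)
```

Each tuple corresponds to one Subfolder node the function would look up or create, in order from the shallowest to the deepest level.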
Source Code
def create_folder_hierarchy(session, common_path, file_path, topfolder, rootfolder_uid):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    print("working on ",file_path)
    subpath=str(file).replace(common_path,'').replace('/'+folder_name+'/','')
    # Split into folder components
    folders = subpath.split("/")
    folders.pop()  # Remove the filename itself
    if not folders:  # No subfolders
        return None
    print("Folders: ",folders)
    current_path = common_path+'/'+topfolder
    parent_uid = None
    key=evaluate_query(session,"match (x:Docstores) where not ('Template' in labels(x)) return x.Keys")
    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")
        # Check if this folder node already exists - get result DATA
        result_data = run_query(session,f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                                        f" RETURN f.UID as uid").data()
        #print(result_data)
        # Check if the result data list is empty
        if not result_data:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the References node since it's the first level
                run_query(session,f"MATCH (x:Rootfolder {{UID:'{rootfolder_uid}'}}) "
                                  f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                                  f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                                  f"Level: '{i+1}',"
                                  f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                run_query(session,f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                                  f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                                  f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                                  f"Level: '{i+1}',"
                                  f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            # Access the UID from the first result in the data list
            parent_uid = result_data[0]['uid']
    # Return the UID of the deepest subfolder
    return parent_uid
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| session | - | - | positional_or_keyword |
| common_path | - | - | positional_or_keyword |
| file_path | - | - | positional_or_keyword |
| topfolder | - | - | positional_or_keyword |
| rootfolder_uid | - | - | positional_or_keyword |
Parameter Details
session: A Neo4j database session object used to execute Cypher queries against the graph database. Must be an active session from neo4j.GraphDatabase.driver().session()
common_path: The base/root path string that should be removed from the file_path to determine the relative folder structure. This represents the common prefix shared by all files being processed
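file_path: The complete file system path string for the file being processed. It is intended to be parsed to extract the folder hierarchy, but in the source as listed it is only printed; the path actually parsed comes from the undefined variable 'file'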
topfolder: The name of the top-level folder string that serves as the root of the hierarchy being created. This is appended to common_path to build the current_path
rootfolder_uid: The unique identifier (UID) string of the Rootfolder node in Neo4j to which the first-level Subfolder should be connected via a PATH relationship
Return Value
Returns the UID (string) of the deepest Subfolder node in the hierarchy, which represents the immediate parent folder of the file. Returns None if the path contains no subfolders below topfolder (the file sits directly in the top-level folder), in which case no nodes are created. The returned UID can be used to link the file node to its containing folder.
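Because the return value can be None, callers need to decide where to attach a file that has no subfolder. One possible convention is to fall back to the Rootfolder UID. This is an assumption for illustration, not something the source specifies, and `parent_uid_for_file` is a hypothetical helper name.

```python
def parent_uid_for_file(deepest_folder_uid, rootfolder_uid):
    """Pick the UID of the node a file should be attached to.

    Assumed convention (not from the source): a file with no subfolder
    is attached directly to the Rootfolder node.
    """
    if deepest_folder_uid is None:
        return rootfolder_uid
    return deepest_folder_uid


# A file nested in subfolders attaches to the deepest Subfolder
print(parent_uid_for_file("subfolder-uid", "root-uid"))
# A file with no subfolders falls back to the Rootfolder
print(parent_uid_for_file(None, "root-uid"))
```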
Dependencies
- neo4j
- uuid
- os
Required Imports
from neo4j import GraphDatabase
from uuid import uuid4
import os
Usage Example
from neo4j import GraphDatabase
from uuid import uuid4
import os

# Assuming evaluate_query and run_query functions are defined
# Assuming folder_name and file variables are defined in scope

driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
session = driver.session()

# Create root folder first
rootfolder_uid = str(uuid4())
session.run("CREATE (r:Rootfolder {UID: $uid, Name: 'Documents'})", uid=rootfolder_uid)

# Define paths
common_path = '/home/user/documents'
file_path = '/home/user/documents/project/reports/2024/report.pdf'
topfolder = 'project'

# Create folder hierarchy
deepest_folder_uid = create_folder_hierarchy(
    session=session,
    common_path=common_path,
    file_path=file_path,
    topfolder=topfolder,
    rootfolder_uid=rootfolder_uid
)

if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
    # Now you can link the file node to this folder
else:
    print('No subfolders created')

session.close()
driver.close()
Best Practices
- Ensure the Neo4j session is properly opened and closed using context managers or explicit close() calls
- The function has undefined variables 'file' and 'folder_name' that must be defined in the calling scope - consider passing these as parameters
- The function uses string interpolation for Cypher queries which can be vulnerable to injection attacks - consider using parameterized queries instead
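- The function replaces single quotes with backticks instead of escaping them, and does so inconsistently (one backtick for Name, two for Path), so stored values can differ from the real folder names - Cypher string literals escape a single quote with a backslash, and parameterized queries avoid the problem entirely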
- Error handling should be added for database connection failures and query execution errors
- The evaluate_query and run_query helper functions must be available in scope
- Consider adding transaction management to ensure atomicity of the folder hierarchy creation
- The function prints debug information - consider using proper logging instead
- Verify that the Docstores node exists before calling this function to avoid errors in the evaluate_query call
- The function assumes forward slashes in paths - may need adjustment for Windows paths
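The recommendation to use parameterized queries can be sketched as follows. This is a minimal illustration, not the author's implementation: `build_subfolder_merge` is a hypothetical helper that only builds the Cypher text and a parameter dictionary, mirroring the node properties used in the source. Labels cannot be passed as Cypher parameters, so the sketch restricts the parent label to the two values the source uses.

```python
def build_subfolder_merge(parent_label, parent_uid, folder_uid, name, path, level, keys):
    """Return (query, params) for creating one Subfolder under a parent node.

    Hypothetical helper for illustration; property names follow the source.
    """
    # Labels cannot be parameterized, so only allow the known ones
    if parent_label not in ("Rootfolder", "Subfolder"):
        raise ValueError(f"unexpected parent label: {parent_label!r}")
    query = (
        f"MATCH (p:{parent_label} {{UID: $parent_uid}}) "
        "MERGE (p)-[:PATH]->(:Subfolder {UID: $folder_uid, Name: $name, "
        "Path: $path, Level: $level, Keys: $keys})"
    )
    params = {
        "parent_uid": parent_uid,
        "folder_uid": folder_uid,
        "name": name,          # raw value, no quote replacement needed
        "path": path,
        "level": str(level),   # the source stores Level as a string
        "keys": keys,
    }
    return query, params


query, params = build_subfolder_merge(
    "Rootfolder", "root-1", "folder-1", "Bob's files", "/data/Bob's files", 1, "k"
)
print(query)
print(params["name"])
```

With the official neo4j Python driver, the pair could then be executed as `session.run(query, params)`. Whether the `run_query` helper used in the source accepts parameters is not shown, so that part remains an assumption.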
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_folder_hierarchy_v1 (96.4% similar)
- function create_folder_hierarchy_v2 (93.4% similar)
- function create_folder (71.3% similar)
- function add_document_to_graph (62.8% similar)
- function add_document_to_graph_v1 (61.5% similar)