function create_folder_hierarchy_v2
Creates a hierarchical structure of Subfolder nodes in a Neo4j graph database based on a file path, establishing parent-child relationships between folders.
/tf/active/vicechatdev/offline_parser_docstore.py
114 - 169
moderate
Purpose
This function parses a file path (expected to start with './PDF_docs/') and creates a corresponding hierarchy of Subfolder nodes in a Neo4j graph database. Each folder level is represented as a node with properties including UID, Name, Path, Level, and Keys. The function connects folders to their parent folders using PATH relationships, with the root level connecting to a Rootfolder node named 'T001'. It's designed for organizing document storage structures in a graph database, particularly for PDF document management systems.
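The path decomposition described above can be sketched without a database. This is a minimal illustration, not the function's own code: `folder_levels` is a hypothetical helper name, and it uses `posixpath.join` instead of `os.path.join` so the output is the same on every platform.

```python
import posixpath

PREFIX = "./PDF_docs/"


def folder_levels(file_path):
    """Return (level, name, path) for each folder level the function would visit.

    Hypothetical helper for illustration only; it mirrors the path parsing
    but creates no nodes. The original stores Level as a string property.
    """
    if not file_path.startswith(PREFIX):
        # The original falls back to the basename, which has no '/',
        # so no folder levels are produced.
        return []
    rel_path = file_path[len(PREFIX):]
    folders = rel_path.split("/")[:-1]  # drop the filename itself
    levels = []
    current_path = "./PDF_docs"
    for i, folder in enumerate(folders):
        current_path = posixpath.join(current_path, folder)
        levels.append((i + 1, folder, current_path))
    return levels


if __name__ == "__main__":
    for level in folder_levels("./PDF_docs/research/papers/2024/document.pdf"):
        print(level)
```

Each tuple corresponds to one Subfolder node: the first is attached to the Rootfolder node, and every later one is attached to the node before it.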
Source Code
def create_folder_hierarchy(graph, file_path):
    """Create a hierarchy of Subfolder nodes based on the file path"""
    # Get path components from the PDF_docs folder
    if file_path.startswith("./PDF_docs/"):
        rel_path = file_path[11:]  # Remove './PDF_docs/' prefix
    else:
        rel_path = os.path.basename(file_path)  # Just use filename if no expected prefix

    # If file is directly in the PDF_docs root
    if "/" not in rel_path:
        return None

    # Split into folder components
    folders = rel_path.split("/")
    folders.pop()  # Remove the filename itself

    if not folders:  # No subfolders
        return None

    current_path = "./PDF_docs"
    parent_uid = None
    key = graph.run("match (x:Docstores) where not ('Template' in labels(x)) return x.Keys").evaluate()

    # Create folder hierarchy
    for i, folder in enumerate(folders):
        current_path = os.path.join(current_path, folder)
        folder_escaped = folder.replace("'", "`")
        current_path_escaped = current_path.replace("'", "``")

        # Check if this folder node already exists
        result = graph.run(f"MATCH (f:Subfolder {{Path: '{current_path_escaped}'}})"
                           f" RETURN f.UID as uid").data()

        if not result:
            # Create new folder node
            folder_uid = str(uuid4())
            if i == 0:
                # Connect to the References node since it's the first level
                graph.run(f"MATCH (x:Rootfolder {{Name:'T001'}}) "
                          f" MERGE (x)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                          f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                          f"Level: '{i+1}',"
                          f"Keys:'{key}'}})")
            else:
                # Connect to parent folder
                graph.run(f"MATCH (p:Subfolder {{UID: '{parent_uid}'}})"
                          f" MERGE (p)-[:PATH]->(:Subfolder {{UID: '{folder_uid}', "
                          f"Name: '{folder_escaped}', Path: '{current_path_escaped}', "
                          f"Level: '{i+1}',"
                          f"Keys:'{key}'}})")
            parent_uid = folder_uid
        else:
            parent_uid = result[0]['uid']

    # Return the UID of the deepest subfolder
    return parent_uid
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| graph | - | - | positional_or_keyword |
| file_path | - | - | positional_or_keyword |
Parameter Details
graph: A Neo4j graph database connection object that provides a 'run' method to execute Cypher queries. The function calls '.evaluate()' and '.data()' on the result of 'run', which matches the py2neo Graph and Cursor API. This object is used to query and create nodes and relationships in the database.
file_path: A string representing the file path to process. Expected format is './PDF_docs/subfolder1/subfolder2/.../filename.ext'. The function extracts the folder hierarchy from this path. If the path doesn't start with './PDF_docs/', only the basename is used; since a basename contains no '/', the function then returns None before running any query.
Return Value
Returns a string containing the UID (Unique Identifier) of the deepest Subfolder node created or found in the hierarchy. Returns None when the path has no subfolder component: either the file sits directly in the PDF_docs root directory, or the path does not start with './PDF_docs/' and only its basename is examined. The returned UID can be used to link documents to their containing folder.
Dependencies
- neo4j
- py2neo
- uuid
Required Imports
from uuid import uuid4
import os
Usage Example
from uuid import uuid4
import os
from py2neo import Graph

# Establish Neo4j connection
graph = Graph('bolt://localhost:7687', auth=('neo4j', 'password'))

# Ensure required nodes exist
graph.run("MERGE (r:Rootfolder {Name:'T001'})")
graph.run("MERGE (d:Docstores {Keys:'default_key'})")

# Create folder hierarchy for a file
file_path = './PDF_docs/research/papers/2024/document.pdf'
deepest_folder_uid = create_folder_hierarchy(graph, file_path)

if deepest_folder_uid:
    print(f'Deepest folder UID: {deepest_folder_uid}')
    # Use the UID to link a document node
    graph.run(f"MATCH (f:Subfolder {{UID: '{deepest_folder_uid}'}}) "
              f"MERGE (f)-[:CONTAINS]->(:Document {{Name: 'document.pdf'}})")
else:
    print('File is in root directory, no subfolders created')
Best Practices
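- Ensure the Neo4j database has a Rootfolder node with Name='T001' before calling this function; if it is missing, the first-level MATCH finds nothing, no Subfolder is created, and the function still returns a freshly generated UID that refers to no node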
- Ensure at least one Docstores node exists in the database without a 'Template' label
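- Be aware that single quotes are replaced rather than escaped: a single quote in a folder name becomes one backtick in Name and two backticks in Path, so the stored values differ from the on-disk names, and a folder whose real name contains backticks can collide with one whose name contains single quotes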
- The function uses string interpolation in Cypher queries which could be vulnerable to injection attacks; consider using parameterized queries instead
- The function assumes './PDF_docs/' as the root path; modify the prefix removal logic if using a different root directory
- Consider adding error handling for database connection failures or query execution errors
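- Although the function uses MERGE, each MERGE pattern includes a freshly generated UID, so it never matches an existing node and always creates a new one; duplicates are avoided only by the preceding MATCH on Path, which is a separate query and therefore not safe against concurrent callers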
- The 'Keys' property is retrieved once and applied to all folders; ensure this is the intended behavior for your use case
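The advice above about parameterized queries can be sketched as follows. This is an illustrative variant, not the code in the source file: `create_subfolder` is a hypothetical helper, and it assumes `graph` is a py2neo `Graph`, whose `run` method accepts a dictionary of query parameters as its second argument. Because values travel as parameters, folder names containing quotes need no replacement.

```python
from uuid import uuid4


def create_subfolder(graph, parent_uid, name, path, level, keys):
    """Create one Subfolder under an existing Subfolder parent.

    Hypothetical helper: values are passed as Cypher parameters ($name, ...)
    instead of being interpolated into the query string.
    """
    folder_uid = str(uuid4())
    query = (
        "MATCH (p:Subfolder {UID: $parent_uid}) "
        "MERGE (p)-[:PATH]->(:Subfolder {UID: $uid, Name: $name, "
        "Path: $path, Level: $level, Keys: $keys})"
    )
    params = {
        "parent_uid": parent_uid,
        "uid": folder_uid,
        "name": name,          # stored verbatim, quotes included
        "path": path,
        "level": str(level),   # the original stores Level as a string
        "keys": keys,
    }
    graph.run(query, params)
    return folder_uid
```

The same pattern applies to the existence check on Path and to the first-level query that attaches a Subfolder to the Rootfolder node.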
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_folder_hierarchy_v1 93.9% similar
- function create_folder_hierarchy 93.4% similar
- function create_folder 67.5% similar
- function add_document_to_graph 63.6% similar
- function add_document_to_graph_v1 62.4% similar