function batch_create_nodes
Creates multiple Neo4j graph database nodes in batches for improved performance, automatically generating UIDs and timestamps for each node.
/tf/active/vicechatdev/CDocs/db/db_operations.py
712 - 759
moderate
Purpose
This function creates large numbers of nodes in a Neo4j database by processing them in configurable batches. Each batch is sent as a single query that uses the Cypher UNWIND clause, which reduces database round-trips. The function assigns a unique identifier (UID) and a creation timestamp to any node that lacks them. If an exception occurs, it logs the error and returns the UIDs collected up to that point instead of raising. It suits bulk data imports, migrations, or any scenario that requires creating many nodes with the same label.
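The batching step can be sketched without a database. The helper below slices a list into fixed-size chunks and fills in missing `UID` and `createdDate` keys. The name `prepare_batches` is hypothetical and not part of CDocs. Unlike the documented function, this sketch copies each dictionary instead of modifying the caller's data.

```python
import uuid
from datetime import datetime
from typing import Any, Dict, Iterator, List


def prepare_batches(nodes_data: List[Dict[str, Any]],
                    batch_size: int = 100) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most batch_size node dicts with UID/createdDate filled in."""
    for start in range(0, len(nodes_data), batch_size):
        batch = []
        for node in nodes_data[start:start + batch_size]:
            node = dict(node)  # copy so the caller's dict is left untouched
            node.setdefault('UID', str(uuid.uuid4()))
            node.setdefault('createdDate', datetime.now())
            batch.append(node)
        yield batch


batches = list(prepare_batches([{'name': f'n{i}'} for i in range(250)], batch_size=100))
batch_sizes = [len(b) for b in batches]  # 250 nodes -> batches of 100, 100, 50
```

With the default batch size of 100, a list of 250 nodes yields three batches, so the real function would issue three queries instead of 250.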
Source Code
def batch_create_nodes(label: str,
                       nodes_data: List[Dict[str, Any]],
                       batch_size: int = 100) -> List[str]:
    """
    Create multiple nodes in batches for better performance.

    Args:
        label: Label for all nodes
        nodes_data: List of property dictionaries for nodes to create
        batch_size: Number of nodes to create in each batch

    Returns:
        List of UIDs for created nodes
    """
    created_uids = []

    try:
        driver = get_driver()
        with driver.session() as session:
            # Process in batches
            for i in range(0, len(nodes_data), batch_size):
                batch = nodes_data[i:i+batch_size]

                # Ensure each node has a UID
                for node in batch:
                    if 'UID' not in node:
                        node['UID'] = str(uuid.uuid4())
                    if 'createdDate' not in node:
                        node['createdDate'] = datetime.now()
                    created_uids.append(node['UID'])

                # Create batch with single query using UNWIND
                session.run(
                    f"""
                    UNWIND $nodes as node
                    CREATE (n:{label} {{UID: node.UID}})
                    SET n += node
                    """,
                    nodes=batch
                )

                logger.debug(f"Created batch of {len(batch)} nodes")

        return created_uids

    except Exception as e:
        logger.error(f"Error in batch node creation: {e}")
        return created_uids
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| label | str | - | positional_or_keyword |
| nodes_data | List[Dict[str, Any]] | - | positional_or_keyword |
| batch_size | int | 100 | positional_or_keyword |
Parameter Details
label: The Neo4j node label to assign to all created nodes. This is a string that categorizes the nodes (e.g., 'Person', 'Document', 'Event'). Must be a valid Neo4j label name without special characters.
nodes_data: A list of dictionaries where each dictionary represents a node's properties. Each dictionary can contain any key-value pairs that will become node properties. If 'UID' or 'createdDate' keys are missing, they will be automatically generated. Example: [{'name': 'John', 'age': 30}, {'name': 'Jane', 'age': 25}]
batch_size: The number of nodes to create in each database transaction. Default is 100. Larger batches improve performance but consume more memory. Adjust based on node complexity and available resources. Typical range: 50-1000.
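The source code places `label` directly into the Cypher text with an f-string instead of passing it as a query parameter, so callers may want to check the label first. The following is a minimal sketch under that assumption. `validate_label` is a hypothetical helper, not part of CDocs, and its pattern is deliberately stricter than what Neo4j itself accepts.

```python
import re

# Conservative pattern: a letter or underscore, then letters, digits, or underscores.
_LABEL_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def validate_label(label: str) -> str:
    """Return label unchanged if it matches the conservative pattern, else raise."""
    if not _LABEL_PATTERN.fullmatch(label):
        raise ValueError(f"Unsafe Neo4j label: {label!r}")
    return label


safe_label = validate_label('Document')

try:
    validate_label('User) DETACH DELETE (n')
    rejected = False
except ValueError:
    rejected = True  # the malformed label never reaches the query text
```

A caller could then write `batch_create_nodes(validate_label(label), nodes_data)` whenever the label comes from outside the codebase.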
Return Value
Type: List[str]
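Returns a list of UID strings, one per node processed. Auto-generated UIDs are UUID v4 strings; UIDs supplied in nodes_data are returned unchanged. If an exception occurs, the function logs it and returns the UIDs collected so far. Because UIDs are appended before each batch's query runs, that list includes the UIDs of the batch that failed as well as those of earlier batches. An empty list is returned if nodes_data is empty or if the error occurs before the first batch is prepared (for example, when the driver cannot be obtained).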
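The source code appends each UID to the result list before the batch's query is sent, which affects what the return value means after an error. The sketch below mirrors that control flow with a pluggable runner in place of `session.run`. Both `collect_uids` and `fail_on_second_batch` are hypothetical names used only for illustration.

```python
import uuid
from typing import Any, Callable, Dict, List


def collect_uids(nodes_data: List[Dict[str, Any]],
                 batch_size: int,
                 run_batch: Callable[[List[Dict[str, Any]]], None]) -> List[str]:
    """Mirror the control flow of batch_create_nodes with a pluggable query runner."""
    created_uids: List[str] = []
    try:
        for i in range(0, len(nodes_data), batch_size):
            batch = nodes_data[i:i + batch_size]
            for node in batch:
                node.setdefault('UID', str(uuid.uuid4()))
                created_uids.append(node['UID'])  # recorded before the query runs
            run_batch(batch)  # stand-in for session.run(...)
    except Exception:
        pass  # the real function logs the error here
    return created_uids


calls: List[int] = []


def fail_on_second_batch(batch: List[Dict[str, Any]]) -> None:
    """Hypothetical runner: the first batch succeeds, the second raises."""
    calls.append(len(batch))
    if len(calls) == 2:
        raise RuntimeError("simulated database error")


uids = collect_uids([{'name': f'n{i}'} for i in range(5)], 2, fail_on_second_batch)
# uids holds 4 entries: 2 from the successful batch plus 2 from the failed one
```

In this simulation only two nodes would have been written, yet four UIDs come back, so the returned list is better treated as "UIDs attempted" than "UIDs confirmed".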
Dependencies
- neo4j
- uuid
- logging
- typing
- datetime
- traceback
- CDocs.db
- CDocs.db.schema_manager
Required Imports
import logging
import uuid
from typing import Dict, List, Any
from datetime import datetime
from neo4j import Driver
from CDocs.db import get_driver
Usage Example
import logging

# Import path inferred from the file location shown above (CDocs/db/db_operations.py)
from CDocs.db.db_operations import batch_create_nodes
# Setup logger
logger = logging.getLogger(__name__)
# Prepare node data
nodes_to_create = [
{'name': 'Alice', 'email': 'alice@example.com', 'role': 'admin'},
{'name': 'Bob', 'email': 'bob@example.com', 'role': 'user'},
{'name': 'Charlie', 'email': 'charlie@example.com', 'role': 'user', 'UID': 'custom-uid-123'}
]
# Create nodes in batches of 50
created_uids = batch_create_nodes(
label='User',
nodes_data=nodes_to_create,
batch_size=50
)
print(f"Created {len(created_uids)} nodes")
print(f"UIDs: {created_uids}")
# Example with larger dataset
large_dataset = [{'title': f'Document {i}', 'content': f'Content {i}'} for i in range(1000)]
doc_uids = batch_create_nodes('Document', large_dataset, batch_size=100)
Best Practices
- Choose batch_size based on node complexity and available memory. Start with default 100 and adjust if needed.
- Ensure the Neo4j database connection is properly configured before calling this function.
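- The function adds UID and createdDate keys directly to the dictionaries in nodes_data when they are missing. Copying the list alone does not prevent this; pass copies of the dictionaries (for example, via copy.deepcopy) if the original data must stay unchanged.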
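- Do not rely on the length of the returned list to confirm success: UIDs are appended before each batch's query runs, so the UIDs of a failed batch are included.
- The function catches all exceptions, logs them, and returns the UIDs collected so far, so a try/except around the call will not detect a failure. Check the log output or query the database to confirm the nodes exist.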
- For very large datasets (millions of nodes), consider implementing progress tracking or splitting into multiple function calls.
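- The label is inserted directly into the Cypher query text rather than passed as a parameter. Use only valid Neo4j label names (alphanumeric characters and underscores, no spaces) and never pass untrusted input as the label.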
- Consider creating indexes on frequently queried properties (especially UID) for better query performance after bulk creation.
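- The query creates each node with its UID and then applies SET n += node, which copies every key in the dictionary onto the node. Make sure every value is a type Neo4j can store as a property (for example, no nested dictionaries).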
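The point about input modification can be checked without a database. In this sketch, `add_defaults_in_place` is a hypothetical stand-in that performs the same key insertion as the source code, and the two runs compare a shallow list copy with `copy.deepcopy`.

```python
import copy
import uuid
from datetime import datetime
from typing import Any, Dict, List


def add_defaults_in_place(nodes: List[Dict[str, Any]]) -> None:
    """Stand-in for the mutation batch_create_nodes performs on its input."""
    for node in nodes:
        if 'UID' not in node:
            node['UID'] = str(uuid.uuid4())
        if 'createdDate' not in node:
            node['createdDate'] = datetime.now()


original = [{'name': 'Alice'}, {'name': 'Bob'}]

shallow = list(original)          # new list, same dict objects
add_defaults_in_place(shallow)
shallow_protected = 'UID' not in original[0]   # False: the originals were mutated

original = [{'name': 'Alice'}, {'name': 'Bob'}]
deep = copy.deepcopy(original)    # new list and new dicts
add_defaults_in_place(deep)
deep_protected = 'UID' not in original[0]      # True: the originals are untouched
```

For flat property dictionaries, `[dict(n) for n in original]` would protect the originals equally well and avoids copying nested values that Neo4j could not store anyway.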
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_node 69.5% similar
- function create_node_with_uid 65.4% similar
- function create_node_and_ensure_relationships 62.2% similar
- function create_node_with_relationship 56.5% similar
- function get_nodes_by_label 56.4% similar