🔍 Code Extractor

function batch_create_nodes

Maturity: 64

Creates multiple Neo4j graph database nodes in batches for improved performance, automatically generating UIDs and timestamps for each node.

File: /tf/active/vicechatdev/CDocs/db/db_operations.py
Lines: 712 - 759
Complexity: moderate

Purpose

This function creates large numbers of nodes in a Neo4j database by processing them in configurable batches. Each batch is sent as a single query that uses the Cypher UNWIND clause, which reduces database round-trips. Nodes that lack a UID or a creation timestamp (createdDate) are assigned one automatically. Errors are caught and logged rather than raised, and the function returns the UIDs it had collected up to that point. It suits bulk data imports, migrations, or any scenario that creates many nodes with the same label.

Source Code

def batch_create_nodes(label: str, 
                      nodes_data: List[Dict[str, Any]],
                      batch_size: int = 100) -> List[str]:
    """
    Create multiple nodes in batches for better performance.
    
    Args:
        label: Label for all nodes
        nodes_data: List of property dictionaries for nodes to create
        batch_size: Number of nodes to create in each batch
        
    Returns:
        List of UIDs for created nodes
    """
    created_uids = []
    
    try:
        driver = get_driver()
        with driver.session() as session:
            # Process in batches
            for i in range(0, len(nodes_data), batch_size):
                batch = nodes_data[i:i+batch_size]
                
                # Ensure each node has a UID
                for node in batch:
                    if 'UID' not in node:
                        node['UID'] = str(uuid.uuid4())
                    if 'createdDate' not in node:
                        node['createdDate'] = datetime.now()
                    created_uids.append(node['UID'])
                    
                # Create batch with single query using UNWIND
                session.run(
                    f"""
                    UNWIND $nodes as node
                    CREATE (n:{label} {{UID: node.UID}})
                    SET n += node
                    """,
                    nodes=batch
                )
                
                logger.debug(f"Created batch of {len(batch)} nodes")
                
            return created_uids
            
    except Exception as e:
        logger.error(f"Error in batch node creation: {e}")
        return created_uids

Parameters

Name        Type                  Default  Kind
label       str                   -        positional_or_keyword
nodes_data  List[Dict[str, Any]]  -        positional_or_keyword
batch_size  int                   100      positional_or_keyword

Parameter Details

label: The Neo4j label assigned to every created node (e.g., 'Person', 'Document', 'Event'). The value is interpolated directly into the Cypher query string rather than passed as a query parameter, so it must be a valid label name and should never come from untrusted input.
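
Because the source builds the query with an f-string around label, a caller may want to check the value before passing it in. The following is a minimal sketch of such a check; validate_label and its pattern are hypothetical helpers, not part of db_operations.py, and the rule is deliberately stricter than what Neo4j itself accepts.

```python
import re

# Hypothetical helper, not part of db_operations.py.
# Conservative rule: a letter or underscore, then letters, digits, or underscores.
_LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_label(label: str) -> str:
    """Return the label unchanged if it is safe to interpolate, else raise ValueError."""
    if not isinstance(label, str) or not _LABEL_PATTERN.fullmatch(label):
        raise ValueError(f"Unsafe Neo4j label: {label!r}")
    return label
```

A caller could then write batch_create_nodes(validate_label(user_label), nodes), so that a malformed label fails before any query is built.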

nodes_data: A list of dictionaries, one per node. Each dictionary's key-value pairs become the node's properties. If the 'UID' or 'createdDate' key is missing, the function generates a value and adds it to the caller's dictionary in place. Example: [{'name': 'John', 'age': 30}, {'name': 'Jane', 'age': 25}]
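
Since the source assigns node['UID'] and node['createdDate'] on the dictionaries it receives, callers who need their input untouched can pass a copy. This is a small sketch of that idea; copy_nodes is a hypothetical name, and the manual assignment below only stands in for what batch_create_nodes would add.

```python
import copy
from typing import Any, Dict, List


def copy_nodes(nodes_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a deep copy so in-place additions do not reach the caller's data."""
    # Copying only the outer list (list(nodes_data)) would still share the dictionaries.
    return copy.deepcopy(nodes_data)


original = [{'name': 'John', 'age': 30}]
working = copy_nodes(original)

# Stand-in for the keys batch_create_nodes adds to each dictionary.
working[0]['UID'] = 'generated-elsewhere'
```

After this, original[0] still has only 'name' and 'age', while working[0] carries the extra key.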

batch_size: The number of nodes sent to the database in each query. Default is 100. Larger batches reduce round-trips but consume more memory per query. Adjust based on node complexity and available resources. Typical range: 50-1000. Must be at least 1: with 0 or a negative value no nodes are created and the function returns an empty list.
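
To see how batch_size divides the input, the slicing loop from the source can be reproduced on its own. This is an illustrative sketch; iter_batches is a hypothetical name, and the explicit size check is an addition that the original function does not perform.

```python
from typing import Any, Dict, Iterator, List


def iter_batches(nodes_data: List[Dict[str, Any]],
                 batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of nodes_data, each at most batch_size long."""
    if batch_size < 1:
        # Added guard: the original function has no such check.
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(nodes_data), batch_size):
        yield nodes_data[i:i + batch_size]


# 250 nodes with a batch size of 100 give two full batches and one partial batch.
sizes = [len(batch) for batch in iter_batches([{'n': i} for i in range(250)], 100)]
```

The last batch is simply shorter than the others; no padding or special handling is involved.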

Return Value

Type: List[str]

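Returns a list of UID strings, in input order, for the nodes the function processed. Generated UIDs are UUID v4 strings; UIDs supplied in nodes_data are returned unchanged. If an error occurs, the exception is logged and the list collected so far is returned. Because UIDs are recorded before each batch's query runs, that list also includes the UIDs of the batch that failed, so it can overstate what was actually written. An empty list is returned if nodes_data is empty or if the error occurs before any batch is prepared (for example, when get_driver() fails).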
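
In the source, each UID is appended to created_uids before the batch's session.run call, so a failing batch still contributes its UIDs to the return value. The sketch below mirrors that control flow without a database; simulate_batch_create and flaky_run are hypothetical stand-ins, the injected run_batch callable replaces session.run, and the createdDate handling is omitted for brevity.

```python
import uuid
from typing import Any, Callable, Dict, List


def simulate_batch_create(nodes_data: List[Dict[str, Any]],
                          batch_size: int,
                          run_batch: Callable[[List[Dict[str, Any]]], None]) -> List[str]:
    """Mirror the control flow of batch_create_nodes with an injected write step."""
    created_uids: List[str] = []
    try:
        for i in range(0, len(nodes_data), batch_size):
            batch = nodes_data[i:i + batch_size]
            for node in batch:
                if 'UID' not in node:
                    node['UID'] = str(uuid.uuid4())
                created_uids.append(node['UID'])  # recorded before the write
            run_batch(batch)  # stands in for session.run(...)
    except Exception:
        pass  # the real function logs the error at this point
    return created_uids


written: List[str] = []


def flaky_run(batch: List[Dict[str, Any]]) -> None:
    """Accept the first batch, then fail on every later one."""
    if len(written) >= 2:
        raise RuntimeError("simulated database failure")
    written.extend(node['UID'] for node in batch)


nodes = [{'name': f'n{i}'} for i in range(4)]
returned = simulate_batch_create(nodes, 2, flaky_run)
```

Here returned holds four UIDs while written holds only the first two, which is why the length of the return value alone does not confirm that every node exists.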

Dependencies

  • neo4j
  • uuid
  • logging
  • typing
  • datetime
  • traceback
  • CDocs.db
  • CDocs.db.schema_manager

Required Imports

import logging
import uuid
from typing import Dict, List, Any
from datetime import datetime
from neo4j import Driver
from CDocs.db import get_driver

Usage Example

# Import path inferred from the file location above; adjust to your package layout
from CDocs.db.db_operations import batch_create_nodes

# Prepare node data
nodes_to_create = [
    {'name': 'Alice', 'email': 'alice@example.com', 'role': 'admin'},
    {'name': 'Bob', 'email': 'bob@example.com', 'role': 'user'},
    {'name': 'Charlie', 'email': 'charlie@example.com', 'role': 'user', 'UID': 'custom-uid-123'}
]

# Create nodes in batches of 50
created_uids = batch_create_nodes(
    label='User',
    nodes_data=nodes_to_create,
    batch_size=50
)

print(f"Created {len(created_uids)} nodes")
print(f"UIDs: {created_uids}")

# Example with larger dataset
large_dataset = [{'title': f'Document {i}', 'content': f'Content {i}'} for i in range(1000)]
doc_uids = batch_create_nodes('Document', large_dataset, batch_size=100)

Best Practices

  • Choose batch_size based on node complexity and available memory. Start with default 100 and adjust if needed.
  • Ensure the Neo4j database connection is properly configured before calling this function.
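  • The function mutates the dictionaries in nodes_data by adding UID and createdDate keys where missing. Pass a deep copy if the original data must stay unchanged; copying only the outer list is not enough.
  • Do not rely on the length of the returned list alone to confirm success: it can include UIDs from a batch whose query failed. Check the logs or query the database for the returned UIDs.
  • The function catches all exceptions, logs them, and returns partial results, so a try/except around the call will not see database errors. Look for "Error in batch node creation" in the logs instead.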
  • For very large datasets (millions of nodes), consider implementing progress tracking or splitting into multiple function calls.
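  • Ensure the label parameter is a valid Neo4j label name (alphanumeric, no spaces or special characters except underscore). The label is interpolated into the query string, so validate it whenever the value originates outside your own code.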
  • Consider creating indexes on frequently queried properties (especially UID) for better query performance after bulk creation.
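  • The function uses SET n += node, which copies every key of each dictionary onto the newly created node. Ensure every value in nodes_data is a type Neo4j can store as a property; nested dictionaries, for example, are not allowed.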
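
Because the function logs and swallows exceptions, one way to confirm what was actually written is to query for the returned UIDs afterwards. The following is a minimal sketch under that assumption; find_missing_uids is a hypothetical helper, session is expected to be an open Neo4j session such as the one get_driver().session() provides, and the label is interpolated as in the original query, so it should be validated first.

```python
from typing import Any, List, Set


def find_missing_uids(session: Any, label: str, uids: List[str]) -> Set[str]:
    """Return the UIDs that have no matching node with the given label."""
    # The label is interpolated into the query text; pass only trusted values.
    result = session.run(
        f"MATCH (n:{label}) WHERE n.UID IN $uids RETURN n.UID AS uid",
        uids=uids,
    )
    found = {record["uid"] for record in result}
    return set(uids) - found
```

An empty result means every returned UID has a node; a non-empty result lists the UIDs that may need to be retried. For very long UID lists, the check could itself be run in slices.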

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_node 69.5% similar

    Creates a node in a Neo4j graph database with a specified label and properties, automatically generating a unique ID and timestamp if not provided.

    From: /tf/active/vicechatdev/CDocs/db/db_operations.py
  • function create_node_with_uid 65.4% similar

    Creates a new node in a Neo4j graph database with a specified UID, label, and properties, automatically adding a creation timestamp if not provided.

    From: /tf/active/vicechatdev/CDocs/db/db_operations.py
  • function create_node_and_ensure_relationships 62.2% similar

    Creates a new node in a Neo4j graph database and establishes multiple relationships to existing nodes within a single atomic transaction.

    From: /tf/active/vicechatdev/CDocs/db/db_operations.py
  • function create_node_with_relationship 56.5% similar

    Creates a new node in a Neo4j graph database and optionally establishes a relationship with an existing node in a single atomic operation.

    From: /tf/active/vicechatdev/CDocs/db/db_operations.py
  • function get_nodes_by_label 56.4% similar

    Retrieves nodes from a Neo4j graph database by label with optional property filtering, pagination, and sorting capabilities.

    From: /tf/active/vicechatdev/CDocs/db/db_operations.py