function get_database_statistics
Retrieves statistical information about a Neo4j graph database, including counts of nodes grouped by label and counts of relationships grouped by type.
/tf/active/vicechatdev/neo4j_schema/neo4j_python_snippets.py
2294 - 2314
simple
Purpose
This function gives a quick snapshot of a Neo4j database's contents by counting nodes per label and relationships per type. It is useful for database monitoring, checking data distribution, generating reports, and validating data imports. The function runs two Cypher queries through run_query(): one groups nodes by their label list, the other groups relationships by type. Only the first label of each node is used as the key in the result, so the node counts are exact only when every node carries a single label.
Source Code
def get_database_statistics():
    """Get general statistics about the database"""
    stats = {}

    # Node counts by label
    query = """
    MATCH (n)
    RETURN labels(n) AS label, count(*) AS count
    """
    label_counts = run_query(query)
    stats["node_counts"] = {record["label"][0]: record["count"] for record in label_counts if record["label"]}

    # Relationship counts by type
    query = """
    MATCH ()-[r]->()
    RETURN type(r) AS type, count(*) AS count
    """
    rel_counts = run_query(query)
    stats["relationship_counts"] = {record["type"]: record["count"] for record in rel_counts}

    return stats
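To make the dictionary-building step concrete, the sketch below applies the same comprehension to hand-written rows. The rows are hypothetical sample data in the shape run_query() is assumed to return (a list of dicts), and collapse_by_first_label is an illustrative name, not part of the original module.

```python
# Hypothetical rows, shaped like the output of the node-count query:
# one row per distinct label list.
sample_rows = [
    {"label": ["Person"], "count": 100},
    {"label": ["Person", "Employee"], "count": 50},
    {"label": ["Company"], "count": 25},
    {"label": [], "count": 3},  # unlabeled nodes
]


def collapse_by_first_label(rows):
    """Same comprehension as in get_database_statistics()."""
    return {row["label"][0]: row["count"] for row in rows if row["label"]}


node_counts = collapse_by_first_label(sample_rows)
print(node_counts)  # {'Person': 50, 'Company': 25}
```

The unlabeled row is dropped, and the second 'Person' row replaces the first rather than adding to it, because both share the same first label.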
Return Value
Returns a dictionary with two keys. 'node_counts' maps node labels (strings, taken from the first label of each node) to counts (integers). 'relationship_counts' maps relationship types (strings) to counts (integers). Example: {'node_counts': {'Person': 100, 'Company': 50}, 'relationship_counts': {'WORKS_AT': 75, 'KNOWS': 200}}
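As an illustration of working with this structure, the following sketch reduces the returned dictionary to a few totals. The helper name summarize is hypothetical, and the input is the example value from the description above rather than real database output.

```python
def summarize(stats):
    """Reduce the statistics dictionary to a handful of totals."""
    node_counts = stats.get("node_counts", {})
    rel_counts = stats.get("relationship_counts", {})
    return {
        "labels": len(node_counts),
        "nodes_counted": sum(node_counts.values()),
        "relationship_types": len(rel_counts),
        "relationships": sum(rel_counts.values()),
    }


example = {
    "node_counts": {"Person": 100, "Company": 50},
    "relationship_counts": {"WORKS_AT": 75, "KNOWS": 200},
}
summary = summarize(example)
print(summary)
# {'labels': 2, 'nodes_counted': 150, 'relationship_types': 2, 'relationships': 275}
```

Note that 'nodes_counted' is the sum of the per-label counts, which is not necessarily the total number of nodes in the database.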
Dependencies
neo4j
Required Imports
from neo4j import GraphDatabase
Usage Example
# Assuming run_query() is already defined and configured
# Example run_query implementation:
from neo4j import GraphDatabase
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
def run_query(query, parameters=None):
    with driver.session() as session:
        result = session.run(query, parameters or {})
        return [record.data() for record in result]
# Use the function
stats = get_database_statistics()
print(f"Node counts: {stats['node_counts']}")
print(f"Relationship counts: {stats['relationship_counts']}")
# Example output:
# Node counts: {'Person': 150, 'Company': 25, 'Product': 300}
# Relationship counts: {'WORKS_AT': 120, 'PURCHASED': 450, 'KNOWS': 200}
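Since every call rescans the graph, callers that poll these statistics may prefer to reuse a recent result. The sketch below is one possible approach, not part of the original module: make_cached is a hypothetical helper that wraps any zero-argument callable, and fake_statistics stands in for get_database_statistics() so the example runs without a database.

```python
import time


def make_cached(fetch, ttl_seconds=300.0, clock=time.monotonic):
    """Wrap a zero-argument callable; reuse its result for ttl_seconds."""
    cache = {"value": None, "expires": None}

    def cached():
        now = clock()
        # Refresh on first use or once the cached value has expired.
        if cache["expires"] is None or now >= cache["expires"]:
            cache["value"] = fetch()
            cache["expires"] = now + ttl_seconds
        return cache["value"]

    return cached


# Stand-in for get_database_statistics(); records how often it is called.
calls = []


def fake_statistics():
    calls.append(1)
    return {"node_counts": {"Person": 150}, "relationship_counts": {"KNOWS": 200}}


cached_stats = make_cached(fake_statistics, ttl_seconds=60.0)
first = cached_stats()
second = cached_stats()
print(len(calls))  # 1: the second call was served from the cache
```

In real use the wrapper would be built as make_cached(get_database_statistics), with a TTL chosen to match how stale the counts are allowed to be.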
Best Practices
- Ensure the run_query() function is properly implemented with error handling and connection management
- This function may be slow on large databases as it scans all nodes and relationships; consider caching results or running during off-peak hours
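- The function assumes nodes have at least one label; nodes without labels are filtered out by the 'if record["label"]' condition
- Only the first label of each node is used as the dictionary key; when several label combinations share the same first label, later rows overwrite earlier ones instead of being summed
- For very large databases, a LIMIT clause would only truncate the result rows and would not avoid the scan; consider database metadata such as CALL db.labels() and CALL db.relationshipTypes(), or per-label count queries, instead of full scans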
- Close the Neo4j driver connection properly when done to avoid resource leaks
- Consider adding error handling for database connection failures or query execution errors
- The function returns empty dictionaries for node_counts and relationship_counts if the database is empty
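If a count per individual label is needed, one option is to unwind the label list in Cypher so that each node contributes to every label it carries. The sketch below is an alternative under stated assumptions, not the original function: get_label_statistics is a hypothetical name, run_query is passed in as a parameter and assumed to return a list of dicts, and fake_run_query supplies made-up rows so the example runs without a database.

```python
def get_label_statistics(run_query):
    """Count nodes once per label they carry, and relationships per type."""
    node_query = """
    MATCH (n)
    UNWIND labels(n) AS label
    RETURN label, count(*) AS count
    """
    rel_query = """
    MATCH ()-[r]->()
    RETURN type(r) AS type, count(*) AS count
    """
    # UNWIND on an empty label list yields no rows, so unlabeled
    # nodes drop out without an explicit filter.
    node_rows = run_query(node_query)
    rel_rows = run_query(rel_query)
    return {
        "node_counts": {row["label"]: row["count"] for row in node_rows},
        "relationship_counts": {row["type"]: row["count"] for row in rel_rows},
    }


def fake_run_query(query):
    """Stand-in for run_query() returning made-up rows."""
    if "UNWIND" in query:
        return [
            {"label": "Person", "count": 150},
            {"label": "Employee", "count": 50},
        ]
    return [{"type": "KNOWS", "count": 200}]


label_stats = get_label_statistics(fake_run_query)
print(label_stats)
# {'node_counts': {'Person': 150, 'Employee': 50}, 'relationship_counts': {'KNOWS': 200}}
```

With this variant a node labeled both Person and Employee is counted under each label, so the per-label counts can add up to more than the number of nodes. It still scans every node and relationship.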
Similar Components
AI-powered semantic similarity - components with related functionality:
- function get_document_stats 70.8% similar
- function get_system_stats 70.5% similar
- function get_review_statistics 64.1% similar
- function get_approval_statistics_v1 58.7% similar
- function generate_neo4j_schema_report 58.4% similar