check_document_hash_exists

function check_document_hash_exists

Maturity: 54

Checks whether a document with a given SHA-256 hash already exists by querying the graph database for a matching DocumentVersion node.

File:
/tf/active/vicechatdev/CDocs/utils/document_processor.py

Lines:
509 - 537

Complexity:
simple

Purpose

This function supports duplicate detection for document uploads by checking whether a file with the same hash has already been stored. It queries a Neo4j graph database for a DocumentVersion node with a matching hash that is linked to a ControlledDocument through a HAS_VERSION relationship, and returns both the existence status and the owning document's UID. A DocumentVersion node with no such relationship is reported as not found. This is useful for deduplication, version control, and avoiding redundant storage of identical files.

Source Code

def check_document_hash_exists(file_hash: str) -> Tuple[bool, Optional[str]]:
    """
    Check if a document with the given hash already exists.
    
    Args:
        file_hash: SHA-256 hash of the file
        
    Returns:
        Tuple of (exists, document_uid)
    """
    try:
        result = db.run_query(
            """
            MATCH (v:DocumentVersion {hash: $hash})
            MATCH (d:ControlledDocument)-[:HAS_VERSION]->(v)
            RETURN d.UID as docUID, v.UID as versionUID
            LIMIT 1
            """,
            {"hash": file_hash}
        )
        
        if result and 'docUID' in result[0]:
            return True, result[0]['docUID']
            
        return False, None
        
    except Exception as e:
        logger.error(f"Error checking document hash: {e}")
        return False, None

Parameters

Name	Type	Default	Kind
`file_hash`	str	-	positional_or_keyword

Parameter Details

file_hash: A SHA-256 hash string representing the cryptographic hash of a file. This should be a 64-character hexadecimal string generated from the file's contents. Used to uniquely identify document versions in the database.
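The format described above can be checked before the database is queried. The following is a minimal sketch; the helper name `is_sha256_hex` is hypothetical and not part of the CDocs code base. It accepts both upper- and lowercase hex digits, whereas `hashlib`'s `hexdigest()` emits lowercase; whether the stored hashes are always lowercase is an assumption to verify before normalizing with `.lower()`.

```python
import re

# Exactly 64 hexadecimal characters, the length of a SHA-256 hex digest.
_SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def is_sha256_hex(value) -> bool:
    """Return True if value is a string shaped like a SHA-256 hex digest.

    Hypothetical helper: it checks the format only and cannot tell whether
    the string was actually produced by hashing a file.
    """
    return isinstance(value, str) and _SHA256_HEX.fullmatch(value) is not None
```

A caller could reject malformed input early with `if not is_sha256_hex(file_hash): ...` rather than spending a database round trip on a value that cannot match.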

Return Value

Type: Tuple[bool, Optional[str]]

Returns a tuple containing two elements: (1) a boolean indicating whether a document with the given hash exists (True if found, False otherwise), and (2) an optional string containing the document's UID if found, or None if not found or if an error occurred. The UID is retrieved from the ControlledDocument node associated with the matching DocumentVersion. The query also fetches the version's UID (versionUID), but the function does not return it.
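Because "not found" and "query failed" share the same return value, a caller that must tell them apart needs a different shape of function. The sketch below is one possible variant, not the CDocs implementation: `find_document_uid_by_hash` and the `QueryRunner` type are hypothetical names, and the query runner is passed in as a parameter on the assumption that it behaves like `db.run_query` in the source (a Cypher string plus a parameter dict in, a list of row dicts out).

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical type for a callable shaped like db.run_query in the source.
QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

HASH_LOOKUP_QUERY = """
MATCH (d:ControlledDocument)-[:HAS_VERSION]->(v:DocumentVersion {hash: $hash})
RETURN d.UID AS docUID
LIMIT 1
"""


def find_document_uid_by_hash(file_hash: str, run_query: QueryRunner) -> Optional[str]:
    """Return the owning document's UID, or None if no version has this hash.

    Unlike check_document_hash_exists, exceptions raised by the query runner
    are not caught, so a failed lookup raises instead of looking like a miss.
    """
    rows = run_query(HASH_LOOKUP_QUERY, {"hash": file_hash})
    if rows and rows[0].get("docUID") is not None:
        return rows[0]["docUID"]
    return None
```

Passing the runner as an argument also makes the lookup easy to exercise with a stub in tests, without a live Neo4j instance.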

Dependencies

CDocs.db
logging

Required Imports

from typing import Tuple, Optional
from CDocs import db
import logging

Usage Example

import hashlib

# Import path inferred from the file location listed above (an assumption)
from CDocs.utils.document_processor import check_document_hash_exists

# Calculate the SHA-256 hash of the file contents
with open('document.pdf', 'rb') as f:
    file_hash = hashlib.sha256(f.read()).hexdigest()

# Check whether a version with this hash is already stored
exists, doc_uid = check_document_hash_exists(file_hash)

if exists:
    print(f"Document already exists with UID: {doc_uid}")
else:
    print("Document is new, proceed with upload")
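For large files, reading the entire content into memory before hashing can be avoided. The following sketch computes the same digest incrementally; the function name `sha256_of_file` and the 1 MiB default chunk size are illustrative choices, not taken from the CDocs code.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is identical to `hashlib.sha256(data).hexdigest()` over the full contents, so it can be passed to `check_document_hash_exists` in the same way.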

Best Practices

Validate that file_hash is a 64-character hexadecimal SHA-256 digest before calling this function; the query compares the stored hash string exactly, so a malformed or differently cased value simply yields (False, None)
Check the boolean first; the document UID is None whenever the boolean is False
The function returns (False, None) both when no match is found and when the query fails, so callers cannot tell the two cases apart from the return value; consult the log output where a false negative would be costly
The function relies on a module-level 'logger' object in document_processor.py; make sure logging is configured so that query errors are visible
The query uses LIMIT 1 without an ORDER BY, so if several documents have a version with the same hash, which one is returned is not guaranteed
Each call issues one database query; when checking many hashes, a single batched query may be more efficient
The function catches Exception broadly; consider catching more specific database exceptions for production use
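As an illustration of the batching point, many hashes can be looked up in one query using Cypher's `IN` operator. This is a sketch under stated assumptions, not part of CDocs: `find_existing_hashes` is a hypothetical name, and the query runner is injected on the assumption that it has the same shape as `db.run_query` in the source (query string and parameter dict in, list of row dicts out).

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical type for a callable shaped like db.run_query in the source.
QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

BATCH_HASH_QUERY = """
MATCH (d:ControlledDocument)-[:HAS_VERSION]->(v:DocumentVersion)
WHERE v.hash IN $hashes
RETURN v.hash AS hash, d.UID AS docUID
"""


def find_existing_hashes(file_hashes: Iterable[str], run_query: QueryRunner) -> Dict[str, str]:
    """Map each hash that already exists to the UID of a document holding it.

    Hashes that are absent from the database are absent from the result.
    Query errors propagate to the caller rather than being swallowed.
    """
    hashes = list(dict.fromkeys(file_hashes))  # de-duplicate, keep order
    if not hashes:
        return {}  # nothing to look up, so skip the database round trip
    found: Dict[str, str] = {}
    for row in run_query(BATCH_HASH_QUERY, {"hashes": hashes}):
        # Keep the first document seen if several share the same hash.
        found.setdefault(row["hash"], row["docUID"])
    return found
```

For very large inputs it may still be worth splitting the list into chunks so that no single query carries an unbounded parameter list.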

Similar Components

AI-powered semantic similarity - components with related functionality:

function check_document_exists_by_uid 62.9% similar

Queries a Neo4j database to check if a ControlledDocument with a specific UID exists and returns the document object if found.
From: /tf/active/vicechatdev/CDocs/FC_sync.py

function node_exists 60.3% similar

Checks if a node with a specific UID exists in a Neo4j graph database by querying for the node and returning a boolean result.
From: /tf/active/vicechatdev/CDocs/db/db_operations.py

function check_document_exists_by_doc_number 57.5% similar

Queries a Neo4j database to check if a ControlledDocument exists with a specific document number and returns the document object if found.
From: /tf/active/vicechatdev/CDocs/FC_sync.py

function compare_document_versions 54.9% similar

Compares two document versions by their UIDs and generates a summary of changes including metadata differences and hash comparisons.
From: /tf/active/vicechatdev/CDocs/utils/document_processor.py

function get_document_with_relationships 52.8% similar

Retrieves a complete document from Neo4j graph database along with all its related entities including versions, reviews, approvals, and authors.
From: /tf/active/vicechatdev/CDocs/db/db_operations.py
