
function search_documents

Maturity: 61

Searches for documents in a Neo4j graph database based on multiple optional filter criteria including text query, document type, department, status, and owner.

File: /tf/active/vicechatdev/document_controller_backup.py
Lines: 308 - 402
Complexity: moderate

Purpose

This function provides a flexible document search capability for a controlled document management system (CDocs), where documents carry properties such as type, department, status, and owner. It builds a Cypher query from whichever filters are supplied, executes it against a Neo4j database, and returns the matching documents as a list of dictionaries. The text query is matched against the title and description fields; the remaining filters are exact matches on document metadata. Each call is logged (the excerpt below shows an explicit logger.info call; the listed imports also include the log_controller_action decorator, which does not appear in the excerpt). Permission filtering is present only as commented-out code, so the user argument currently has no effect on the results.

Source Code

def search_documents(query=None, doc_type=None, department=None, status=None, owner=None, limit=100, user=None):
    """
    Search for documents based on criteria.
    
    Parameters
    ----------
    query : str, optional
        Text search query
    doc_type : str, optional
        Document type to filter by
    department : str, optional
        Department to filter by
    status : str, optional
        Status to filter by
    owner : str, optional
        Owner UID to filter by
    limit : int, optional
        Maximum number of results to return
    user : DocUser, optional
        The current user (for permission filtering)
        
    Returns
    -------
    List[Dict[str, Any]]
        List of document dictionaries matching the search criteria
    """
    try:
        from CDocs.db import db_operations
        
        logger.info("Controller action: search_documents")
        
        # Build the Cypher query
        cypher_query = """
        MATCH (d:Document)
        """
        
        # Add optional filters
        where_clauses = []
        params = {}
        
        if query:
            where_clauses.append("(d.title CONTAINS $query OR d.description CONTAINS $query)")
            params["query"] = query
        
        if doc_type:
            where_clauses.append("d.doc_type = $doc_type")
            params["doc_type"] = doc_type
        
        if department:
            where_clauses.append("d.department = $department")
            params["department"] = department
        
        if status:
            where_clauses.append("d.status = $status")
            params["status"] = status
        
        if owner:
            where_clauses.append("d.owner_id = $owner")
            params["owner"] = owner
        
        # Add WHERE clause if we have any conditions
        if where_clauses:
            cypher_query += "WHERE " + " AND ".join(where_clauses)
        
        # Add permission filtering if user is provided
        # This is commented out for now as it depends on schema details
        # if user and hasattr(user, 'uid') and user.role != 'ADMIN':
        #    # Only add more WHERE conditions if we already have some
        #    connector = "AND" if where_clauses else "WHERE"
        #    cypher_query += f" {connector} (d.owner_id = $user_id OR d.is_public = true)"
        #    params["user_id"] = user.uid
        
        # Add RETURN clause with LIMIT
        cypher_query += f"""
        RETURN d 
        ORDER BY d.created_date DESC
        LIMIT {int(limit)}
        """
        
        # Execute query
        result = db_operations.run_query(cypher_query, params)
        
        # Process results into a list of document dictionaries
        documents = []
        if result:
            for record in result:
                if 'd' in record:
                    document = dict(record['d'])
                    documents.append(document)
        
        return documents
        
    except Exception as e:
        logger.error(f"Error in controller action search_documents: {e}")
        raise
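
The query construction above can be exercised without a database by factoring it into a pure function. The following is a minimal sketch, not part of the original module: build_search_query is a hypothetical helper name, and it mirrors the filter logic and the int() coercion of limit shown in the source.

```python
from typing import Any, Dict, Optional, Tuple


def build_search_query(
    query: Optional[str] = None,
    doc_type: Optional[str] = None,
    department: Optional[str] = None,
    status: Optional[str] = None,
    owner: Optional[str] = None,
    limit: int = 100,
) -> Tuple[str, Dict[str, Any]]:
    """Return (cypher, params) for the given filters without touching the database."""
    where_clauses = []
    params: Dict[str, Any] = {}

    # Text search: substring match on title or description
    if query:
        where_clauses.append("(d.title CONTAINS $query OR d.description CONTAINS $query)")
        params["query"] = query

    # Exact-match filters: (parameter name, node property, value)
    for name, prop, value in (
        ("doc_type", "doc_type", doc_type),
        ("department", "department", department),
        ("status", "status", status),
        ("owner", "owner_id", owner),
    ):
        if value:
            where_clauses.append(f"d.{prop} = ${name}")
            params[name] = value

    lines = ["MATCH (d:Document)"]
    if where_clauses:
        lines.append("WHERE " + " AND ".join(where_clauses))
    lines.append("RETURN d")
    lines.append("ORDER BY d.created_date DESC")
    # int() coerces the limit before interpolation, as in the original function
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines), params


cypher, params = build_search_query(query="safety", status="PUBLISHED", limit=10)
print(cypher)
print(params)
```

Separating query construction from execution in this way would let the filter logic be unit-tested without a running Neo4j instance.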

Parameters

Name        Type  Default  Kind
query       -     None     positional_or_keyword
doc_type    -     None     positional_or_keyword
department  -     None     positional_or_keyword
status      -     None     positional_or_keyword
owner       -     None     positional_or_keyword
limit       -     100      positional_or_keyword
user        -     None     positional_or_keyword

Parameter Details

query: Optional text string to search within document titles and descriptions. Uses CONTAINS operator for partial matching. Can be None to skip text search filtering.

doc_type: Optional string to filter documents by their type (e.g., 'policy', 'procedure', 'form'). Must match the doc_type property exactly. Can be None to include all document types.

department: Optional string to filter documents by department (e.g., 'HR', 'Engineering', 'Finance'). Must match the department property exactly. Can be None to include all departments.

status: Optional string to filter documents by their current status (e.g., 'DRAFT', 'PUBLISHED', 'ARCHIVED'). Must match the status property exactly. Can be None to include all statuses.

owner: Optional string representing the owner's UID (user identifier) to filter documents by ownership. Must match the owner_id property exactly. Can be None to include documents from all owners.

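limit: Integer specifying the maximum number of documents to return. Defaults to 100. The value is coerced with int() and interpolated into the LIMIT clause; it is not otherwise validated, so callers should pass a positive integer. Results are ordered by created_date in descending order (newest first).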

user: Optional DocUser object representing the current user making the search request. Intended for permission filtering to restrict results based on user access rights. Currently not actively used in the query but included for future permission implementation.
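
If the permission filter were enabled, one way to avoid the AND/WHERE connector logic in the commented-out code is to append the condition to the same clause list as the other filters before the WHERE clause is joined. This is a hedged sketch only: add_permission_filter is a hypothetical helper, the is_public property and the role attribute are taken from the commented-out code and may not match the actual schema, and SimpleNamespace stands in for a DocUser object.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def add_permission_filter(
    where_clauses: List[str], params: Dict[str, Any], user: Optional[Any]
) -> None:
    """Append an owner-or-public condition for non-admin users (mutates in place)."""
    if user is None or not hasattr(user, "uid"):
        return  # no user context: leave the query unrestricted, as today
    if getattr(user, "role", None) == "ADMIN":
        return  # admins see everything
    where_clauses.append("(d.owner_id = $user_id OR d.is_public = true)")
    params["user_id"] = user.uid


# Stand-in for a DocUser; only the uid and role attributes are assumed here
viewer = SimpleNamespace(uid="user456", role="VIEWER")

clauses = ["d.department = $department"]
params = {"department": "HR"}
add_permission_filter(clauses, params, viewer)

where = "WHERE " + " AND ".join(clauses)
print(where)
print(params)
```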

Return Value

Returns a List[Dict[str, Any]] containing document dictionaries. Each dictionary represents a document node from the Neo4j database with all its properties (e.g., title, description, doc_type, department, status, owner_id, created_date, etc.). Returns an empty list if no documents match the criteria or if the query yields no result. Errors do not produce an empty list: any exception raised during query construction or execution is logged and then re-raised to the caller. The list is ordered by created_date in descending order and limited to the specified number of results.
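
Because exceptions propagate, a caller that would rather degrade to an empty result has to catch them itself. The sketch below illustrates one way to do that; search_or_empty is a hypothetical wrapper, and failing_search and working_search are stand-ins for search_documents so the example runs without a database.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def search_or_empty(
    search_fn: Callable[..., List[Dict[str, Any]]], **filters: Any
) -> List[Dict[str, Any]]:
    """Call search_fn(**filters); log the failure and return [] if it raises."""
    try:
        return search_fn(**filters)
    except Exception:
        logger.exception("Document search failed for filters %r", filters)
        return []


def failing_search(**filters: Any) -> List[Dict[str, Any]]:
    # Stand-in that simulates a database error
    raise RuntimeError("database unavailable")


def working_search(**filters: Any) -> List[Dict[str, Any]]:
    # Stand-in that simulates a successful search
    return [{"title": "Safety Protocol", "status": "PUBLISHED"}]


print(search_or_empty(failing_search, query="safety"))
print(search_or_empty(working_search, query="safety"))
```

Whether swallowing the error is appropriate depends on the caller; a user-facing view may prefer to surface the failure rather than show an empty list.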

Dependencies

  • logging
  • CDocs.db.db_operations
  • CDocs.models.user_extensions
  • CDocs.controllers

Required Imports

import logging
from CDocs.controllers import log_controller_action
from CDocs.models.user_extensions import DocUser

Conditional/Optional Imports

These imports are only needed under specific conditions:

from CDocs.db import db_operations

Condition: imported lazily inside the function at runtime, always needed for function execution

Required (conditional)

Usage Example

from CDocs.controllers import search_documents
from CDocs.models.user_extensions import DocUser

# Simple text search
results = search_documents(query='safety protocol')

# Search with multiple filters
results = search_documents(
    query='procedure',
    doc_type='SOP',
    department='Engineering',
    status='PUBLISHED',
    limit=50
)

# Search by owner
results = search_documents(
    owner='user123',
    status='DRAFT'
)

# Search with user context (accepted for future permission filtering; currently ignored by the query)
current_user = DocUser(uid='user456', role='VIEWER')
results = search_documents(
    department='HR',
    user=current_user
)

# Process results
for doc in results:
    print(f"Title: {doc.get('title')}, Status: {doc.get('status')}")

Best Practices

  • Always handle the returned list defensively, as it may be empty if no documents match; errors are not reported as an empty list but raised as exceptions
  • Use the limit parameter to prevent retrieving excessive amounts of data, especially in production environments
  • Consider implementing pagination for large result sets rather than increasing the limit
  • The text query parameter uses CONTAINS which is case-sensitive in Neo4j; consider normalizing search terms
  • Ensure proper error handling when calling this function as it re-raises exceptions after logging
  • The user parameter is included for future permission filtering but is currently not enforced; do not rely on it for access control until implemented
  • Filter parameters must match exact values in the database; consider providing users with valid options from a controlled vocabulary
  • Results are ordered by created_date DESC, so newest documents appear first
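  • The function builds its Cypher query dynamically; filter values are passed as query parameters rather than interpolated, and limit is the only value inserted into the query text (after coercion with int()), so keep any new filters parameterized as well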
  • Consider adding indexes on frequently queried properties (doc_type, department, status, owner_id) for better performance
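
Two of the points above, pagination and case-insensitive matching, can be sketched together. This is an illustration under stated assumptions rather than the module's behavior: build_paged_text_query is a hypothetical helper, it uses Cypher's toLower() function with a lowercased search term, and it passes SKIP and LIMIT as query parameters instead of interpolating them.

```python
from typing import Any, Dict, Tuple


def build_paged_text_query(
    query: str, page: int = 1, page_size: int = 25
) -> Tuple[str, Dict[str, Any]]:
    """Return (cypher, params) for a case-insensitive, paginated text search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")

    cypher = "\n".join([
        "MATCH (d:Document)",
        # Lowercase both sides so 'Safety' and 'safety' match the same documents
        "WHERE toLower(d.title) CONTAINS $query"
        " OR toLower(d.description) CONTAINS $query",
        "RETURN d",
        "ORDER BY d.created_date DESC",
        "SKIP $skip",
        "LIMIT $limit",
    ])
    params = {
        "query": query.lower(),
        "skip": (page - 1) * page_size,  # number of rows to skip before this page
        "limit": page_size,
    }
    return cypher, params


cypher, params = build_paged_text_query("Safety", page=3, page_size=20)
print(cypher)
print(params)
```

Applying toLower() to a property on every row can be costly on large graphs; storing a normalized copy of the searchable text or using a full-text index may be worth evaluating instead.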

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function search_documents_v1 94.7% similar

    Searches for controlled documents in a Neo4j graph database based on multiple optional filter criteria including text query, document type, department, status, and owner.

    From: /tf/active/vicechatdev/CDocs/controllers/document_controller.py
  • function get_documents_v1 82.2% similar

    Retrieves filtered and paginated documents from a Neo4j graph database with permission-based access control, supporting multiple filter criteria and search functionality.

    From: /tf/active/vicechatdev/document_controller_backup.py
  • function get_documents 80.7% similar

    Retrieves controlled documents from a Neo4j database with comprehensive filtering, permission-based access control, pagination, and full-text search capabilities.

    From: /tf/active/vicechatdev/CDocs/controllers/document_controller.py
  • function search_documents_in_filecloud 71.5% similar

    Searches for controlled documents in FileCloud using text search and optional metadata filters, returning structured document information including UIDs, versions, and metadata.

    From: /tf/active/vicechatdev/CDocs/controllers/filecloud_controller.py
  • function get_all_documents 65.3% similar

    Retrieves all controlled documents from a Neo4j graph database with their associated owner information, formatted for administrative management interfaces.

    From: /tf/active/vicechatdev/CDocs/controllers/admin_controller.py