function get_file_info

Maturity: 51

Retrieves metadata for a given file path: the file size in bytes, a hash of the file's contents, and the path itself.

File: /tf/active/vicechatdev/mailsearch/compare_documents.py
Lines: 65 - 84
Complexity: simple

Purpose

This function provides a safe way to gather essential file information (size, hash, and path) for file management, deduplication, integrity checking, or comparison operations. It handles errors gracefully by returning default values when file access fails, making it suitable for batch processing scenarios where some files might be inaccessible.
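
As an illustration of the deduplication use case, the following is a minimal sketch rather than code from the project. find_duplicates() is a hypothetical helper, and calculate_file_hash() is a SHA-256 stand-in for the project's own helper. Files that could not be read come back with an empty hash and are skipped.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def calculate_file_hash(filepath: str) -> str:
    """Stand-in hash helper (SHA-256 here; the project's helper may differ)."""
    digest = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            digest.update(chunk)
    return digest.hexdigest()


def get_file_info(filepath: str) -> Dict:
    """Same behavior as the documented function."""
    try:
        stat = os.stat(filepath)
        return {
            'size': stat.st_size,
            'hash': calculate_file_hash(filepath),
            'path': filepath
        }
    except Exception as e:
        print(f"Error getting info for {filepath}: {e}")
        return {'size': 0, 'hash': '', 'path': filepath}


def find_duplicates(paths: List[str]) -> List[List[str]]:
    """Hypothetical helper: group paths whose size and hash both match."""
    groups = defaultdict(list)
    for path in paths:
        info = get_file_info(path)
        if info['hash'] == '':
            continue  # unreadable or missing file: skip it
        groups[(info['size'], info['hash'])].append(info['path'])
    # Keep only groups that contain more than one file.
    return [group for group in groups.values() if len(group) > 1]
```

Calling find_duplicates() with a list of paths returns one list per set of identical files, in the order the paths were supplied.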

Source Code

def get_file_info(filepath: str) -> Dict:
    """
    Get file information including size and hash
    
    Args:
        filepath: Path to the file
        
    Returns:
        Dictionary with file information
    """
    try:
        stat = os.stat(filepath)
        return {
            'size': stat.st_size,
            'hash': calculate_file_hash(filepath),
            'path': filepath
        }
    except Exception as e:
        print(f"Error getting info for {filepath}: {e}")
        return {'size': 0, 'hash': '', 'path': filepath}

Parameters

Name      Type  Default  Kind
filepath  str   -        positional_or_keyword

Parameter Details

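filepath: Absolute or relative path to the file to inspect. It is annotated as str, but any path-like object accepted by os.stat() also works, provided calculate_file_hash() accepts it too. The file should exist and be readable by the process; if it is not, the function does not raise but returns the default values described under Return Value.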

Return Value

Type: Dict

Returns a dictionary with three keys: 'size' (int), the file size in bytes; 'hash' (str), the digest returned by calculate_file_hash(); and 'path', the filepath argument passed through unchanged. If any exception is raised while reading the file, the function prints a message and returns a dictionary with size=0, hash='', and the original path.
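
Because an empty file also reports size=0, the size alone cannot signal a failure. The sketch below shows one way a caller might tell the two cases apart. lookup_failed() is a hypothetical helper, and the two dictionaries only mimic the shapes described above, assuming the hash helper returns a hex digest (SHA-256 is used here purely for illustration).

```python
import hashlib
from typing import Dict


def lookup_failed(info: Dict) -> bool:
    """Hypothetical helper: True when the dictionary holds the error defaults."""
    # A hex digest is never empty, so an empty hash marks the fallback result.
    return info['hash'] == ''


# Shape of the result for a readable but empty file.
empty_file = {'size': 0, 'hash': hashlib.sha256(b'').hexdigest(), 'path': 'empty.txt'}
# Shape of the fallback returned when the file could not be read.
missing_file = {'size': 0, 'hash': '', 'path': 'missing.txt'}

print(lookup_failed(empty_file))    # False
print(lookup_failed(missing_file))  # True
```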

Dependencies

  • os
  • hashlib (not used by get_file_info() directly; presumably required by calculate_file_hash())

Required Imports

import os
from typing import Dict

Usage Example

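# Self-contained example: calculate_file_hash() is defined here as a SHA-256
# stand-in. The project's own helper may use a different algorithm.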
import os
from typing import Dict
import hashlib

def calculate_file_hash(filepath: str) -> str:
    """Calculate SHA256 hash of file"""
    sha256_hash = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for byte_block in iter(lambda: f.read(4096), b''):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()

def get_file_info(filepath: str) -> Dict:
    try:
        stat = os.stat(filepath)
        return {
            'size': stat.st_size,
            'hash': calculate_file_hash(filepath),
            'path': filepath
        }
    except Exception as e:
        print(f"Error getting info for {filepath}: {e}")
        return {'size': 0, 'hash': '', 'path': filepath}

# Usage
file_info = get_file_info('/path/to/myfile.txt')
print(f"File size: {file_info['size']} bytes")
print(f"File hash: {file_info['hash']}")
print(f"File path: {file_info['path']}")

Best Practices

  • Ensure the calculate_file_hash() function is available before calling this function
  • An empty file and a failed lookup both report size=0, so check for hash == '' in calling code to detect the error case; a helper that returns a hex digest never yields an empty string for a readable file
  • Consider validating that the filepath exists before calling if you need to distinguish between different error types
  • For large files, be aware that calculate_file_hash() may take significant time to compute
  • The function prints errors to stdout; consider logging or returning error information in the dictionary for production use
  • The broad Exception catch may hide specific issues; consider catching specific exceptions (FileNotFoundError, PermissionError) for better error handling
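
The last two points could be combined roughly as follows. This is a sketch under stated assumptions, not the project's implementation: get_file_info_with_error() and the 'error' key are hypothetical, and calculate_file_hash() is again a SHA-256 stand-in.

```python
import hashlib
import logging
import os
from typing import Dict

logger = logging.getLogger(__name__)


def calculate_file_hash(filepath: str) -> str:
    """Stand-in hash helper (SHA-256 here; the project's helper may differ)."""
    digest = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            digest.update(chunk)
    return digest.hexdigest()


def get_file_info_with_error(filepath: str) -> Dict:
    """Hypothetical variant that logs failures and reports why a lookup failed."""
    try:
        size = os.stat(filepath).st_size
        file_hash = calculate_file_hash(filepath)
    except FileNotFoundError:
        error = 'not_found'
    except PermissionError:
        error = 'permission_denied'
    except OSError as exc:
        # Any other OS-level failure, for example the path is a directory.
        error = f'os_error: {exc}'
    else:
        return {'size': size, 'hash': file_hash, 'path': filepath, 'error': None}
    # Log instead of printing so callers can route or silence the message.
    logger.warning("Could not read %s (%s)", filepath, error)
    return {'size': 0, 'hash': '', 'path': filepath, 'error': error}
```

Callers that only read 'size', 'hash', and 'path' keep working unchanged, while new code can branch on 'error' instead of inferring failure from the default values.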

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function calculate_file_hash 62.3% similar

    Calculates the MD5 hash of a file by reading it in chunks to handle large files efficiently.

    From: /tf/active/vicechatdev/mailsearch/compare_documents.py
  • function extract_metadata 55.6% similar

    Extracts metadata from file content by analyzing the file type and computing file properties including hash, size, and type-specific metadata.

    From: /tf/active/vicechatdev/CDocs/utils/document_processor.py
  • function process_document 50.5% similar

    Processes a document file (DOCX, DOC, or PDF) and extracts comprehensive metadata including file information, content metadata, and cryptographic hash.

    From: /tf/active/vicechatdev/CDocs/utils/document_processor.py
  • function extract_metadata_docx 47.0% similar

    Extracts comprehensive metadata from Microsoft Word DOCX files, including document properties, statistics, and fallback title extraction from content or filename.

    From: /tf/active/vicechatdev/CDocs/utils/document_processor.py
  • function extract_metadata_pdf 46.8% similar

    Extracts metadata from PDF files including title, author, creation date, page count, and other document properties using PyPDF2 library.

    From: /tf/active/vicechatdev/CDocs/utils/document_processor.py