function get_file_info
Retrieves file metadata including size in bytes and cryptographic hash for a given file path.
/tf/active/vicechatdev/mailsearch/compare_documents.py
65 - 84
simple
Purpose
This function provides a safe way to gather essential file information (size, hash, and path) for file management, deduplication, integrity checking, or comparison operations. It handles errors gracefully by returning default values when file access fails, making it suitable for batch processing scenarios where some files might be inaccessible.
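As an illustration of the batch deduplication scenario mentioned above, the following is a minimal sketch, not part of the original module. It assumes calculate_file_hash() computes a SHA-256 hex digest (as in the usage example further down), and group_duplicates is a hypothetical helper introduced only for this example.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def calculate_file_hash(filepath: str) -> str:
    """Stand-in helper (assumed SHA-256); the real one is defined elsewhere."""
    digest = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for block in iter(lambda: f.read(4096), b''):
            digest.update(block)
    return digest.hexdigest()


def get_file_info(filepath: str) -> Dict:
    """Same behavior as the documented function."""
    try:
        stat = os.stat(filepath)
        return {
            'size': stat.st_size,
            'hash': calculate_file_hash(filepath),
            'path': filepath
        }
    except Exception as e:
        print(f"Error getting info for {filepath}: {e}")
        return {'size': 0, 'hash': '', 'path': filepath}


def group_duplicates(paths: List[str]) -> Dict[Tuple[int, str], List[str]]:
    """Hypothetical helper: group readable files by (size, hash)."""
    groups = defaultdict(list)
    for path in paths:
        info = get_file_info(path)
        if not info['hash']:
            # An empty hash marks a failed lookup; skip it and keep going
            continue
        groups[(info['size'], info['hash'])].append(info['path'])
    # Keep only groups that contain more than one file
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Because inaccessible files come back as the fallback dictionary instead of raising, the loop can process the whole batch and simply skip the entries whose hash is empty.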
Source Code
def get_file_info(filepath: str) -> Dict:
    """
    Get file information including size and hash

    Args:
        filepath: Path to the file

    Returns:
        Dictionary with file information
    """
    try:
        stat = os.stat(filepath)
        return {
            'size': stat.st_size,
            'hash': calculate_file_hash(filepath),
            'path': filepath
        }
    except Exception as e:
        print(f"Error getting info for {filepath}: {e}")
        return {'size': 0, 'hash': '', 'path': filepath}
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| filepath | str | - | positional_or_keyword |
Parameter Details
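filepath: Absolute or relative path to the file to analyze. The annotation is str; os.stat() also accepts path-like objects, so those work as long as calculate_file_hash() accepts them too. The path should point to an existing file the process can read. If it does not, the function does not raise: it prints the error and returns the fallback dictionary described under Return Value.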
Return Value
Type: Dict
Returns a dictionary with three keys: 'size' (int) containing the file size in bytes, 'hash' (str) containing the file's cryptographic hash computed by calculate_file_hash(), and 'path' (str) containing the original filepath. On error, returns a dictionary with size=0, hash='', and the original path.
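Since the fallback dictionary and a genuinely empty file both report size 0, callers need some other way to tell them apart. The sketch below shows one possible check; lookup_failed is a hypothetical helper, and it assumes calculate_file_hash() always returns a non-empty digest for a readable file (true for SHA-256, which the usage example assumes).

```python
import hashlib
from typing import Dict


def lookup_failed(info: Dict) -> bool:
    """Hypothetical helper: True when info is the error fallback."""
    return info['hash'] == ''


# The fallback returned when the file could not be read
failed = {'size': 0, 'hash': '', 'path': '/path/to/missing.txt'}

# A readable empty file: size 0, but a digest is still present
empty = {
    'size': 0,
    'hash': hashlib.sha256(b'').hexdigest(),
    'path': '/path/to/empty.txt'
}

print(lookup_failed(failed))  # True
print(lookup_failed(empty))   # False
```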
Dependencies
- os
- hashlib
Required Imports
import os
from typing import Dict
Usage Example
# Assuming calculate_file_hash() is defined
import os
from typing import Dict
import hashlib


def calculate_file_hash(filepath: str) -> str:
    """Calculate SHA256 hash of file"""
    sha256_hash = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for byte_block in iter(lambda: f.read(4096), b''):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()


def get_file_info(filepath: str) -> Dict:
    try:
        stat = os.stat(filepath)
        return {
            'size': stat.st_size,
            'hash': calculate_file_hash(filepath),
            'path': filepath
        }
    except Exception as e:
        print(f"Error getting info for {filepath}: {e}")
        return {'size': 0, 'hash': '', 'path': filepath}


# Usage
file_info = get_file_info('/path/to/myfile.txt')
print(f"File size: {file_info['size']} bytes")
print(f"File hash: {file_info['hash']}")
print(f"File path: {file_info['path']}")
Best Practices
- Ensure the calculate_file_hash() function is available before calling this function
- Check for the error case in calling code by testing hash == '' rather than size == 0: an empty file also has size 0, but it still yields a non-empty hash as long as calculate_file_hash() returns a digest string
- Consider validating that the filepath exists before calling if you need to distinguish between different error types
- For large files, be aware that calculate_file_hash() may take significant time to compute
- The function prints errors to stdout; consider logging or returning error information in the dictionary for production use
- The broad Exception catch may hide specific issues; consider catching specific exceptions (FileNotFoundError, PermissionError) for better error handling
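The last two points can be combined into one variant. The following is a sketch under stated assumptions, not the module's actual code: get_file_info_strict and the 'error' key are hypothetical names, and calculate_file_hash() is again assumed to compute SHA-256.

```python
import hashlib
import logging
import os
from typing import Dict

logger = logging.getLogger(__name__)


def calculate_file_hash(filepath: str) -> str:
    """Stand-in helper (assumed SHA-256); the real one is defined elsewhere."""
    digest = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for block in iter(lambda: f.read(4096), b''):
            digest.update(block)
    return digest.hexdigest()


def get_file_info_strict(filepath: str) -> Dict:
    """Hypothetical variant: specific exceptions, logging, and an 'error' key."""
    try:
        size = os.stat(filepath).st_size
        file_hash = calculate_file_hash(filepath)
    except FileNotFoundError:
        error = 'not_found'
    except PermissionError:
        error = 'permission_denied'
    except OSError as e:
        # Any other OS-level failure, e.g. the path is a directory
        error = f'os_error: {e}'
    else:
        return {'size': size, 'hash': file_hash, 'path': filepath, 'error': None}

    # Log instead of printing so the application controls the output
    logger.warning("Could not read %s: %s", filepath, error)
    return {'size': 0, 'hash': '', 'path': filepath, 'error': error}
```

The first three keys keep the original shape, so existing callers continue to work, while new callers can inspect 'error' to see why a lookup failed. Exceptions that are not OSError subclasses propagate to the caller instead of being swallowed.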
Similar Components
AI-powered semantic similarity - components with related functionality:
- function calculate_file_hash (62.3% similar)
- function extract_metadata (55.6% similar)
- function process_document (50.5% similar)
- function extract_metadata_docx (47.0% similar)
- function extract_metadata_pdf (46.8% similar)