function find_best_folder
Finds the best matching folder in a directory tree by comparing hierarchical document codes with folder names containing numeric codes.
/tf/active/vicechatdev/mailsearch/copy_signed_documents.py
37 - 83
moderate
Purpose
This function walks the directory tree under wuxi2_root to find the most appropriate folder for a document, based on the document's hierarchical code (e.g., '1.2.3'). It compares the document code with the numeric codes found in folder names along each path, considering only folder codes with fewer parts than the document code. Among those, it prefers the code with the most matching leading parts and, on a tie, the longer code. This suits hierarchical filing systems in which folders represent higher-level categories and each document belongs in the most specific matching folder.
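The matching rule described above can be sketched in isolation. This is a minimal illustration, not the module's code: the helper name prefix_score and the candidate codes are hypothetical, and codes are assumed to be lists of string parts.

```python
def prefix_score(code_parts, folder_parts):
    """Count how many leading parts of the two codes are equal."""
    score = 0
    for cp, fp in zip(code_parts, folder_parts):
        if cp != fp:
            break
        score += 1
    return score


doc_parts = "1.2.3".split(".")

# Hypothetical folder codes, compared against the document code '1.2.3'
for candidate in ["1", "1.2", "1.3", "1.2.3", "2"]:
    folder_parts = candidate.split(".")
    # Only folder codes with fewer parts than the document code are eligible
    eligible = len(folder_parts) < len(doc_parts)
    print(candidate, eligible, prefix_score(doc_parts, folder_parts))
```

Here '1.2' is the preferred candidate: it is eligible and shares two leading parts, while '1.2.3' is skipped because it has as many parts as the document code.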
Source Code
def find_best_folder(doc_code, wuxi2_root=WUXI2_ROOT):
    """Find the best matching folder in wuxi2 based on code structure"""
    code_parts = extract_code_parts(doc_code)
    doc_code_length = len(code_parts)
    best_match = None
    best_score = 0
    best_folder_length = 0
    for root, dirs, files in os.walk(wuxi2_root):
        rel_path = os.path.relpath(root, wuxi2_root)
        if rel_path == '.':
            continue
        path_parts = rel_path.split(os.sep)
        # Look for folders with document codes in their names
        for folder in path_parts:
            folder_codes = re.findall(r'\d+(?:\.\d+)*', folder)
            for folder_code in folder_codes:
                folder_parts = folder_code.split('.')
                folder_code_length = len(folder_parts)
                # Skip folders with coding same length or longer than the document
                # When codes are equal length, document goes next to folder (in parent), not inside
                if folder_code_length >= doc_code_length:
                    continue
                # Calculate match score (how many leading parts match)
                score = 0
                for cp, fp in zip(code_parts, folder_parts):
                    if cp == fp:
                        score += 1
                    else:
                        break
                # Update best match if this is better
                # Priority: 1) Higher score, 2) Longer folder code (closer to doc length)
                if (score > best_score or
                        (score == best_score and folder_code_length > best_folder_length)):
                    best_score = score
                    best_folder_length = folder_code_length
                    best_match = root
    return best_match, best_score
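The folder-name scan in the source relies on the pattern `\d+(?:\.\d+)*`. The short sketch below shows what that pattern extracts; the folder names and the helper name folder_codes are made up for illustration.

```python
import re

# Same pattern as in the source: one or more digit groups joined by dots
CODE_PATTERN = re.compile(r'\d+(?:\.\d+)*')


def folder_codes(folder_name):
    """Return every dotted numeric code found in a folder name."""
    return CODE_PATTERN.findall(folder_name)


# Hypothetical folder names
for name in ['1.2 Specifications', '3.1.4 Reports 2023', 'Archive']:
    print(name, '->', folder_codes(name))
# 1.2 Specifications -> ['1.2']
# 3.1.4 Reports 2023 -> ['3.1.4', '2023']
# Archive -> []
```

Any run of digits counts as a code, so a year or version number in a folder name is treated as a one-part code and competes with the real codes.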
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| doc_code | - | - | positional_or_keyword |
| wuxi2_root | - | WUXI2_ROOT | positional_or_keyword |
Parameter Details
doc_code: A string representing the document's hierarchical code (e.g., '1.2.3.4'). The code should contain numeric parts separated by dots, representing a hierarchical classification system. This code is parsed to find the best matching folder in the directory tree.
wuxi2_root: A string path to the root directory where the search should begin. Defaults to WUXI2_ROOT constant. This directory is traversed recursively to find folders with matching codes. The path should be absolute or relative to the current working directory.
Return Value
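Returns a tuple (best_match, best_score). best_match is the full path (a string) of the best matching folder, or None if no folder in the tree contains a numeric code with fewer parts than the document code. best_score is an integer counting the leading code parts shared by the document code and the folder code (e.g., for doc_code '1.2.3.4' and a folder containing '1.2', the score is 2). Higher scores indicate better matches. Because ties at the initial score of 0 are broken by folder code length, a folder can be returned with a score of 0 even though none of its leading parts match, so check best_score as well as best_match.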
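One way a caller might interpret the returned tuple is sketched below. The helper choose_destination, its min_score threshold, and the example paths are hypothetical and not part of the module; the sketch only assumes the (best_match, best_score) shape described above.

```python
def choose_destination(best_match, best_score, fallback_dir, min_score=1):
    """Pick a target folder, treating weak or missing matches as 'no match'.

    A score below min_score means no leading code part matched, so the
    fallback directory is used instead of the returned folder.
    """
    if best_match is None or best_score < min_score:
        return fallback_dir
    return best_match


# Hypothetical results, in the shape returned by find_best_folder
print(choose_destination('/data/wuxi2/1 General/1.2 Specs', 2, '/data/unsorted'))
print(choose_destination('/data/wuxi2/2 Other', 0, '/data/unsorted'))
print(choose_destination(None, 0, '/data/unsorted'))
```

The first call keeps the matched folder; the other two fall back to the unsorted directory.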
Dependencies
- os
- re
Required Imports
import os
import re
Usage Example
import os
import re

# Define the required constant and helper function
# (find_best_folder itself must also be defined or imported in this scope)
WUXI2_ROOT = '/path/to/wuxi2'

def extract_code_parts(code):
    """Helper function to extract code parts"""
    return code.split('.')

# Use the function
doc_code = '1.2.3.4'
best_folder, score = find_best_folder(doc_code)
if best_folder:
    print(f"Best folder: {best_folder}")
    print(f"Match score: {score}")
else:
    print("No matching folder found")

# With custom root
custom_root = '/custom/path'
best_folder, score = find_best_folder('2.1.5', wuxi2_root=custom_root)
Best Practices
- Ensure the extract_code_parts function is defined before calling find_best_folder, as it's a required dependency
- The WUXI2_ROOT constant should be defined at module level if using the default parameter
- Document codes should follow a consistent hierarchical format (e.g., '1.2.3') for proper matching
- The function skips folders with codes equal to or longer than the document code, as documents should be placed alongside (not inside) folders of equal hierarchy
- For large directory trees, this function may be slow as it performs a full recursive walk; consider caching results if calling repeatedly
- Folder names can contain multiple numeric codes; the function will check all of them for matches
- The function returns None as best_match if no suitable folder is found; always check for None before using the result, and treat a score of 0 as the absence of a real match
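The caching advice above could take the form of a one-time index of the tree that is reused for many lookups. The following is a rough sketch under stated assumptions, not the module's implementation: build_folder_index and find_best_folder_indexed are hypothetical names, doc_code.split('.') stands in for extract_code_parts, and the index must be rebuilt whenever the folder tree changes.

```python
import os
import re
import tempfile

CODE_PATTERN = re.compile(r'\d+(?:\.\d+)*')


def build_folder_index(root_dir):
    """Walk the tree once, recording the codes found along each folder's path."""
    index = []
    for root, _dirs, _files in os.walk(root_dir):
        rel_path = os.path.relpath(root, root_dir)
        if rel_path == '.':
            continue
        codes = []
        for folder in rel_path.split(os.sep):
            codes.extend(c.split('.') for c in CODE_PATTERN.findall(folder))
        index.append((root, codes))
    return index


def find_best_folder_indexed(doc_code, index):
    """Apply the same scoring rules as the source, but against the index."""
    code_parts = doc_code.split('.')  # assumption: stands in for extract_code_parts
    best_match, best_score, best_length = None, 0, 0
    for root, codes in index:
        for folder_parts in codes:
            if len(folder_parts) >= len(code_parts):
                continue
            score = 0
            for cp, fp in zip(code_parts, folder_parts):
                if cp != fp:
                    break
                score += 1
            if score > best_score or (
                score == best_score and len(folder_parts) > best_length
            ):
                best_match, best_score, best_length = root, score, len(folder_parts)
    return best_match, best_score


# Demo on a throwaway tree with made-up folder names
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, '1 General', '1.2 Specs'))
    os.makedirs(os.path.join(tmp, '2 Other'))
    index = build_folder_index(tmp)  # one walk, reused for every lookup
    best_folder, score = find_best_folder_indexed('1.2.3', index)
    print(os.path.basename(best_folder), score)  # 1.2 Specs 2
```

The trade-off is staleness: folders created after the index is built are invisible until it is rebuilt.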
Similar Components
AI-powered semantic similarity - components with related functionality:
- function scan_wuxi2_folder 70.3% similar
- function scan_wuxi2_folder_v1 67.1% similar
- function find_best_match 60.9% similar
- function main_v57 59.6% similar
- function main_v102 58.2% similar