šŸ” Code Extractor

function analyze_structure

Maturity: 44

Analyzes and reports on the folder structure of a SharePoint site, displaying folder paths, file counts, and searching for expected folder patterns.

File: /tf/active/vicechatdev/SPFCsync/analyze_structure.py
Lines: 10 - 92
Complexity: moderate

Purpose

This function connects to a SharePoint site through the Microsoft Graph API, retrieves all documents under the root path ('/'), groups them by folder path, and prints a report: the total item count, each unique folder path with its file count, up to three sample files per folder, and the results of a case-insensitive substring search for expected patterns (numbered folders 01-08 and the names UCJ, Toxicology, CMC, Quality, Clinical, Regulatory, Marketing, and Manufacturing) in both folder paths and file names. It is useful for auditing SharePoint document libraries, understanding folder organization, and checking that expected folders exist.
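The grouping step described above can be sketched in isolation. This is a minimal illustration, not the module's code: `group_by_folder` and the sample items are hypothetical, and only the 'name' and 'folder_path' keys are taken from the source shown below.

```python
from collections import defaultdict

# Hypothetical sample data; the real items come from client.get_all_documents("/").
# Only the 'name' and 'folder_path' keys appear in the documented source.
sample_documents = [
    {"name": "protocol.docx", "folder_path": "/01 UCJ"},
    {"name": "summary.pdf", "folder_path": "/01 UCJ"},
    {"name": "batch_record.xlsx", "folder_path": "/03 CMC"},
    {"name": "orphan.txt"},  # no folder_path key, so it lands under 'Unknown'
]


def group_by_folder(documents):
    """Map each folder path to the list of documents found in it."""
    files_by_folder = defaultdict(list)
    for doc in documents:
        files_by_folder[doc.get("folder_path", "Unknown")].append(doc)
    return dict(files_by_folder)


grouped = group_by_folder(sample_documents)
for folder_path in sorted(grouped):
    print(f"{folder_path}: {len(grouped[folder_path])} files")
```

Using a single dictionary keyed by folder path makes the separate set of folder paths unnecessary, since the dictionary keys already hold the unique paths.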

Source Code

def analyze_structure():
    """Analyze the current folder structure"""
    config = Config()
    
    try:
        client = SharePointGraphClient(
            site_url=config.SHAREPOINT_SITE_URL,
            client_id=config.AZURE_CLIENT_ID,
            client_secret=config.AZURE_CLIENT_SECRET
        )
        
        print("✅ SharePoint Graph client initialized successfully")
        print(f"Site ID: {client.site_id}")
        print(f"Drive ID: {client.drive_id}")
        print()
        
    except Exception as e:
        print(f"❌ Failed to initialize client: {e}")
        return
    
    print("🔍 ANALYZING CURRENT FOLDER STRUCTURE")
    print("=" * 60)
    
    # Get all documents and analyze their paths
    try:
        documents = client.get_all_documents("/")
        print(f"✅ Found {len(documents)} total items")
        print()
        
        # Analyze folder distribution
        folder_paths = set()
        file_by_folder = {}
        
        for doc in documents:
            folder_path = doc.get('folder_path', 'Unknown')
            folder_paths.add(folder_path)
            
            if folder_path not in file_by_folder:
                file_by_folder[folder_path] = []
            file_by_folder[folder_path].append(doc)
        
        print(f"📁 Found {len(folder_paths)} unique folder paths:")
        print("-" * 40)
        
        for folder_path in sorted(folder_paths):
            files_in_folder = len(file_by_folder[folder_path])
            print(f"📁 {folder_path}: {files_in_folder} files")
            
            # Show first few files as examples
            if files_in_folder > 0:
                example_files = file_by_folder[folder_path][:3]
                for file_info in example_files:
                    print(f"   📄 {file_info.get('name', 'Unknown')}")
                if files_in_folder > 3:
                    print(f"   ... and {files_in_folder - 3} more files")
            print()
        
        # Look for specific patterns
        print("\n🔍 SEARCHING FOR EXPECTED FOLDER PATTERNS")
        print("-" * 50)
        
        expected_patterns = [
            "01", "02", "03", "04", "05", "06", "07", "08",
            "UCJ", "Toxicology", "CMC", "Quality", "Clinical", 
            "Regulatory", "Marketing", "Manufacturing"
        ]
        
        for pattern in expected_patterns:
            matching_folders = [path for path in folder_paths if pattern.lower() in path.lower()]
            matching_files = [doc for doc in documents if pattern.lower() in doc.get('name', '').lower()]
            
            if matching_folders or matching_files:
                print(f"🔍 Pattern '{pattern}':")
                if matching_folders:
                    print(f"   📁 Folders: {matching_folders}")
                if matching_files:
                    print(f"   📄 Files: {len(matching_files)} files contain this pattern")
                print()
        
    except Exception as e:
        print(f"❌ Failed to get documents: {e}")
        import traceback
        traceback.print_exc()

Return Value

This function does not return any value (implicitly returns None). It prints analysis results directly to stdout, including success/error messages, folder structure information, file counts, and pattern matching results.
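Because the report only goes to stdout, a caller that needs the text has to capture it. The sketch below shows one way with the standard library's contextlib.redirect_stdout; `print_report` is a hypothetical stand-in for analyze_structure() so the example runs without SharePoint access, and `capture_stdout` is an illustrative helper, not part of the module.

```python
import contextlib
import io


def print_report():
    # Hypothetical stand-in for analyze_structure(): it prints and returns None.
    print("Found 3 total items")
    print("/01 UCJ: 2 files")


def capture_stdout(func):
    """Run func() and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


report = capture_stdout(print_report)
lines = report.splitlines()
```

Note that traceback.print_exc() writes to stderr by default, so a traceback printed after a failed document retrieval would not appear in the captured text.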

Dependencies

  • sharepoint_graph_client
  • config
  • traceback

Required Imports

from sharepoint_graph_client import SharePointGraphClient
from config import Config

Conditional/Optional Imports

These imports are only needed under specific conditions:

import traceback

Condition: only used when an exception occurs during document retrieval to print detailed error information


Usage Example

# Ensure config.py exists with required settings
# Example config.py:
# class Config:
#     SHAREPOINT_SITE_URL = 'https://yourtenant.sharepoint.com/sites/yoursite'
#     AZURE_CLIENT_ID = 'your-client-id'
#     AZURE_CLIENT_SECRET = 'your-client-secret'

# Run the analysis
analyze_structure()

# Output will be printed to console showing:
# - SharePoint connection status
# - Total number of items found
# - List of folder paths with file counts
# - Sample files in each folder
# - Pattern matching results for expected folders
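
The example config.py above hardcodes credentials in comments for illustration. As a hedged alternative, the settings could be read from environment variables. This is a sketch under assumptions: the project's actual config.py is not shown here, and the `env` parameter and `REQUIRED_SETTINGS` tuple are hypothetical; only the three attribute names come from the source.

```python
import os

# Attribute names taken from the documented source; everything else is illustrative.
REQUIRED_SETTINGS = ("SHAREPOINT_SITE_URL", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


class Config:
    """Hypothetical config.py that reads its settings from a mapping of variables."""

    def __init__(self, env=None):
        # Default to the process environment; a dict can be passed for testing.
        env = os.environ if env is None else env
        missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
        if missing:
            raise RuntimeError(f"Missing settings: {', '.join(missing)}")
        for name in REQUIRED_SETTINGS:
            setattr(self, name, env[name])


# Demonstration with placeholder values instead of real credentials.
demo = Config({
    "SHAREPOINT_SITE_URL": "https://yourtenant.sharepoint.com/sites/yoursite",
    "AZURE_CLIENT_ID": "your-client-id",
    "AZURE_CLIENT_SECRET": "your-client-secret",
})
print(demo.SHAREPOINT_SITE_URL)
```

Failing early on missing settings gives a clearer message than an authentication error raised later during client initialization.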

Best Practices

  • Ensure the Azure AD application has been granted appropriate SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All) before running
  • The function prints output directly to stdout, so redirect output if you need to capture results programmatically
  • The function returns None both on success and when client initialization fails, so the return value cannot signal failure; check the printed output for the failure message instead
  • The function analyzes all documents from root ('/'), which may be slow for large SharePoint sites with many files
  • Expected patterns list can be customized by modifying the expected_patterns list in the source code
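  • Error handling is basic: exceptions raised during client initialization or document retrieval are caught and printed rather than re-raised, while Config() is constructed outside any try block, so wrap the call in try-except if configuration errors must be handled in production use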
  • The function shows only the first 3 files per folder as examples to avoid cluttering output
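
One consequence of the pattern search worth illustrating: the source uses a case-insensitive substring test, so a short pattern such as "01" also matches unrelated paths that merely contain those characters. The sketch below contrasts that test with a stricter token-based variant. The stricter variant is an assumption for illustration, not something the module implements, and the sample paths are hypothetical.

```python
import re


def matches_substring(pattern, path):
    """Case-insensitive substring test, as used in analyze_structure()."""
    return pattern.lower() in path.lower()


def matches_token(pattern, path):
    """Stricter variant (not in the source): the pattern must equal a whole
    token of the path, where tokens are split on '/', whitespace, '_' and '-'."""
    tokens = re.split(r"[/\s_-]+", path.lower())
    return pattern.lower() in tokens


# Hypothetical sample paths.
folder_paths = ["/01 UCJ", "/Reports 2001", "/03_CMC"]

loose = [p for p in folder_paths if matches_substring("01", p)]
strict = [p for p in folder_paths if matches_token("01", p)]

print("substring:", loose)
print("token:", strict)
```

With these sample paths the substring test also reports "/Reports 2001" for the pattern "01", while the token test reports only "/01 UCJ". Whether the stricter behavior is desirable depends on how the folders in the library are actually named.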

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function test_folder_structure 80.9% similar

    Tests SharePoint folder structure by listing root-level folders, displaying their contents, and providing a summary of total folders and documents.

    From: /tf/active/vicechatdev/SPFCsync/test_folder_structure.py
  • function search_and_locate 74.9% similar

    Searches for specific numbered folders (01-08) in a SharePoint site and traces their locations, contents, and file distributions by type.

    From: /tf/active/vicechatdev/SPFCsync/search_detailed.py
  • function search_for_folders 74.8% similar

    Searches for specific predefined folders in a SharePoint site using Microsoft Graph API and prints the search results with their locations.

    From: /tf/active/vicechatdev/SPFCsync/diagnostic_comprehensive.py
  • function explore_site_structure 73.6% similar

    Explores and displays the complete structure of a SharePoint site using Microsoft Graph API, including drives, document libraries, lists, and alternative API endpoints.

    From: /tf/active/vicechatdev/SPFCsync/diagnostic_comprehensive.py
  • function compare_with_expected_folders 72.7% similar

    Compares SharePoint folders found via Microsoft Graph API against a predefined list of expected folder names from a reference screenshot, reporting matches, missing folders, and additional folders.

    From: /tf/active/vicechatdev/SPFCsync/test_folder_structure.py