function load_vendor_list
Loads unique vendor names from the first column of an Excel file, removing any null values and returning them as a list.
/tf/active/vicechatdev/find_email/extract_vendor_batch.py
30 - 41
simple
Purpose
This function is designed to extract vendor names from an enriched Excel file for further processing. It reads the Excel file using pandas, identifies the first column as the vendor column, removes duplicate and null entries, and returns a clean list of unique vendor names. This is typically used as an initial data loading step in vendor management or email extraction workflows.
Source Code
def load_vendor_list(excel_file: str) -> List[str]:
    """Load vendor names from enriched Excel file"""
    print(f"Loading vendors from: {excel_file}")
    df = pd.read_excel(excel_file)
    # Assume first column contains vendor names
    vendor_column = df.columns[0]
    vendors = df[vendor_column].dropna().unique().tolist()
    print(f"Found {len(vendors)} unique vendors")
    return vendors
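The core of the function is the `dropna().unique().tolist()` chain. As a minimal sketch, the same chain can be tried on an in-memory Series instead of an Excel file; the sample values below are hypothetical and chosen only to show which entries survive.

```python
import pandas as pd

# Hypothetical first-column values: a duplicate, a missing cell,
# an empty string, and a case variant of an existing name.
column = pd.Series(["Acme", "Globex", None, "Acme", "", "acme"])

# Same chain as in load_vendor_list: drop missing values, then deduplicate.
vendors = column.dropna().unique().tolist()
print(vendors)  # ['Acme', 'Globex', '', 'acme']
```

The missing cell and the exact duplicate are removed, while the empty string and the differently cased name are kept, in order of first appearance.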
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| excel_file | str | - | positional_or_keyword |
Parameter Details
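excel_file: String path to the Excel file containing vendor data. The path can be absolute or relative. The file must be in an Excel format (.xlsx, .xls) that pandas can read. The first column of the file is expected to contain vendor names. With the default pd.read_excel settings used here, only the first sheet is read and its first row is treated as the header, so the header cell of the first column is not returned as a vendor name.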
Return Value
Type: List[str]
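Returns a list of the unique values from the first column of the Excel file, in order of first appearance. Null/NaN values are filtered out by dropna(), and duplicates are removed by unique(). The values are annotated as List[str], but they are not converted, so a column containing numbers yields numbers. Returns an empty list if the first column contains no valid data. A sheet with no columns at all is not handled: df.columns[0] raises an IndexError in that case.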
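A sheet with no columns gives `df.columns[0]` nothing to index, so that case may be worth guarding separately from a column that is merely empty. The following is a minimal sketch of such a guard, assuming a hypothetical helper named `first_column_values` that works on an already loaded DataFrame; it is not part of the original module.

```python
from typing import List

import pandas as pd


def first_column_values(df: pd.DataFrame) -> List[object]:
    """Return unique non-null values of the first column, or [] if there is none."""
    # A DataFrame without columns has no first column to read.
    if len(df.columns) == 0:
        return []
    # Select by position so the result is always a single column.
    return df.iloc[:, 0].dropna().unique().tolist()


print(first_column_values(pd.DataFrame()))  # []
print(first_column_values(pd.DataFrame({"Vendor": ["Acme", None, "Acme"]})))  # ['Acme']
```

Selecting by position with `iloc` rather than by column label is a design choice here: it still returns one column even if two columns happen to share the same header.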
Dependencies
pandas
Required Imports
import pandas as pd
from typing import List
Usage Example
import pandas as pd
from typing import List

def load_vendor_list(excel_file: str) -> List[str]:
    """Load vendor names from enriched Excel file"""
    print(f"Loading vendors from: {excel_file}")
    df = pd.read_excel(excel_file)
    vendor_column = df.columns[0]
    vendors = df[vendor_column].dropna().unique().tolist()
    print(f"Found {len(vendors)} unique vendors")
    return vendors

# Usage
vendors = load_vendor_list('vendors_enriched.xlsx')
print(f"Loaded vendors: {vendors}")

# Example with error handling
try:
    vendors = load_vendor_list('/path/to/vendor_data.xlsx')
    for vendor in vendors:
        print(f"Processing vendor: {vendor}")
except FileNotFoundError:
    print("Excel file not found")
except Exception as e:
    print(f"Error loading vendors: {e}")
Best Practices
- Ensure the Excel file exists before calling this function to avoid FileNotFoundError
- The function assumes vendor names are in the first column - verify your Excel file structure matches this expectation
- Consider adding error handling for invalid file formats or corrupted Excel files
- For large Excel files, be aware of memory usage as pandas loads the entire file into memory
- The function prints progress to stdout - redirect or suppress if running in production/silent mode
- The unique() operation compares names case-sensitively, so "Acme" and "acme" count as separate vendors - consider normalizing case if needed
- Empty strings are not filtered out, only NaN/None values - add additional filtering if needed
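The last two points can be handled in a post-processing step on the returned list. The following is a minimal sketch, assuming a hypothetical helper named `normalize_vendors` that is not part of the original module; it strips surrounding whitespace, drops blank entries, and deduplicates case-insensitively while keeping the first spelling seen.

```python
from typing import Iterable, List


def normalize_vendors(vendors: Iterable[object]) -> List[str]:
    """Strip whitespace, drop blank names, and deduplicate case-insensitively."""
    seen = set()
    cleaned: List[str] = []
    for raw in vendors:
        name = str(raw).strip()
        if not name:
            continue  # skip empty or whitespace-only entries
        key = name.casefold()  # case-insensitive comparison key
        if key in seen:
            continue  # an earlier spelling of this name was already kept
        seen.add(key)
        cleaned.append(name)
    return cleaned


print(normalize_vendors(["Acme", " acme ", "", "Globex", "   "]))  # ['Acme', 'Globex']
```

It could be applied as `normalize_vendors(load_vendor_list(path))`. Whether case-insensitive matching is appropriate depends on the data, since two distinct vendors could differ only by case.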
Similar Components
AI-powered semantic similarity - components with related functionality:
- function extract_batch 59.5% similar
- function main_v27 53.3% similar
- function main_v15 53.1% similar
- class VendorEnricher 49.7% similar
- function main_v28 49.2% similar