function detect_outliers_iqr_v1
Detects outliers in a dataset using the Interquartile Range (IQR) method, returning a boolean mask of outliers and the calculated lower and upper bounds.
File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py
Lines: 72 - 83
Complexity: simple
Purpose
This function implements the IQR outlier detection method, a robust statistical technique for identifying data points that fall outside the typical range. It calculates the first quartile (Q1), third quartile (Q3), and IQR (Q3-Q1), then determines lower and upper bounds using a multiplier (default 1.5). Data points below the lower bound or above the upper bound are flagged as outliers. This is commonly used in exploratory data analysis, data cleaning, and preprocessing pipelines to identify anomalous values that may need special handling or removal.
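To make the arithmetic concrete, here is a minimal sketch that walks through the same steps on a small, made-up Series. The sample values are illustrative assumptions, and the quartiles shown rely on pandas' default linear interpolation.

```python
import pandas as pd

# Hypothetical sample: nine evenly spaced values plus one extreme value
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1 = data.quantile(0.25)   # 3.25 with the default linear interpolation
q3 = data.quantile(0.75)   # 7.75
iqr = q3 - q1              # 4.5

multiplier = 1.5
lower_bound = q1 - multiplier * iqr   # 3.25 - 6.75 = -3.5
upper_bound = q3 + multiplier * iqr   # 7.75 + 6.75 = 14.5

# Strict comparisons: a value exactly equal to a bound is not flagged
outliers = (data < lower_bound) | (data > upper_bound)
print(data[outliers].tolist())  # [100]
```

Only the value 100 lies outside the interval from -3.5 to 14.5, so it is the single flagged point.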
Source Code
def detect_outliers_iqr(data, multiplier=1.5):
    """
    Detect outliers using IQR method
    Returns: indices of outliers, lower bound, upper bound
    """
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = (data < lower_bound) | (data > upper_bound)
    return outliers, lower_bound, upper_bound
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| data | - | - | positional_or_keyword |
| multiplier | - | 1.5 | positional_or_keyword |
Parameter Details
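data: A pandas Series or DataFrame column containing numerical data to analyze for outliers. Must support the .quantile() method and contain numeric values (int or float). Missing values (NaN) do not raise an error: pandas skips them when computing the quartiles, and comparisons against NaN evaluate to False, so they are never flagged. Handle them before calling this function if they need separate treatment.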
multiplier: A numeric value (float or int) that controls the sensitivity of outlier detection. Default is 1.5 (the standard IQR rule). Lower values (e.g., 1.0) narrow the bounds and flag more points; higher values (e.g., 3.0) widen the bounds and flag fewer. Typical range is 1.0 to 3.0. Should be positive; the function does not validate this.
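The effect of the multiplier can be seen by running the same data through several values. The sketch below re-declares the function with the same logic as the Source Code section so that it runs on its own; the sample Series is a made-up illustration, not data from the original script.

```python
import pandas as pd

def detect_outliers_iqr(data, multiplier=1.5):
    """Same logic as the function documented on this page."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound

# Hypothetical sample with two moderately high values (24 and 30)
data = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 24, 30])

flagged_counts = {}
for multiplier in (1.0, 1.5, 3.0):
    outliers, lower, upper = detect_outliers_iqr(data, multiplier=multiplier)
    flagged_counts[multiplier] = int(outliers.sum())
    print(f"multiplier={multiplier}: bounds=({lower}, {upper}), "
          f"flagged={flagged_counts[multiplier]}")
```

On this sample the number of flagged values drops from 2 to 1 to 0 as the multiplier increases from 1.0 to 1.5 to 3.0.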
Return Value
Returns a tuple containing three elements: (1) outliers - a pandas Series of boolean values with the same index as the input data, where True indicates an outlier and False indicates a normal value; (2) lower_bound - a numeric value (float) representing the threshold below which values are considered outliers; (3) upper_bound - a numeric value (float) representing the threshold above which values are considered outliers. Both comparisons are strict, so a value exactly equal to a bound is not flagged.
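As a sketch of how the three returned elements might be used together, the example below selects with the mask and, as an optional alternative to dropping rows, caps values at the bounds with Series.clip. The capping step is an assumption added for illustration and is not part of the original script; the function is re-declared with the same logic so the block runs on its own.

```python
import pandas as pd

def detect_outliers_iqr(data, multiplier=1.5):
    """Same logic as the function documented on this page."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound

data = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100, 12, 14, 13])
outliers, lower, upper = detect_outliers_iqr(data)

# (1) The mask shares the input's index, so it can select or exclude rows
flagged = data[outliers]
kept = data[~outliers]

# (2) and (3) The bounds can cap values instead of removing them
# (illustrative alternative; cast to float because the bounds are floats)
capped = data.astype(float).clip(lower=lower, upper=upper)

print(flagged.tolist())   # [100]
print(lower, upper)       # 8.5 16.5
print(capped.max())       # 16.5
```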
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Create sample data with one obvious outlier (100)
data = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100, 12, 14, 13])

# Detect outliers using default multiplier (1.5)
outliers, lower, upper = detect_outliers_iqr(data)
print(f"Lower bound: {lower}")
print(f"Upper bound: {upper}")
print(f"Number of outliers: {outliers.sum()}")
print(f"Outlier values: {data[outliers].values}")

# Use more conservative detection (multiplier=3.0)
outliers_conservative, lower_c, upper_c = detect_outliers_iqr(data, multiplier=3.0)
print(f"Conservative outliers: {outliers_conservative.sum()}")

# Filter out outliers from dataset
clean_data = data[~outliers]
Best Practices
- Handle missing values (NaN) explicitly before calling this function: pandas skips them when computing the quartiles and they are never flagged as outliers, so they pass through silently
- The default multiplier of 1.5 is standard for IQR method; use 3.0 for more conservative outlier detection (fewer outliers flagged)
- This method works best with roughly symmetric, unimodal data; for highly skewed distributions the symmetric bounds tend to flag many legitimate values in the long tail, so consider other methods
- Always inspect the lower_bound and upper_bound values to ensure they make sense for your domain
- Use the returned boolean Series to filter data: clean_data = data[~outliers]
- Consider visualizing outliers using box plots or scatter plots before removing them
- Document the multiplier value used in your analysis for reproducibility
- This method is univariate; for multivariate outlier detection, apply to each column separately or use multivariate methods
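The point about missing values can be checked directly. The following minimal sketch uses a made-up Series containing one NaN and compares the bounds with and without it; the function is re-declared with the same logic as the Source Code section so the block runs on its own.

```python
import numpy as np
import pandas as pd

def detect_outliers_iqr(data, multiplier=1.5):
    """Same logic as the function documented on this page."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound

# Hypothetical sample with one missing value at position 2
data = pd.Series([10, 12, np.nan, 13, 11, 100, 12, 14, 13])

outliers, lower, upper = detect_outliers_iqr(data)
_, lower_nonan, upper_nonan = detect_outliers_iqr(data.dropna())

# quantile() skips NaN, so the bounds match those of the NaN-free data
same_bounds = bool(lower == lower_nonan and upper == upper_nonan)
# Comparisons against NaN are False, so the missing value is not flagged
nan_flagged = bool(outliers.iloc[2])
# Missing values therefore need to be counted or handled separately
n_missing = int(data.isna().sum())

print(same_bounds, nan_flagged, n_missing)  # True False 1
```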
Similar Components
AI-powered semantic similarity - components with related functionality:
- function detect_outliers_iqr_v2 93.2% similar
- function detect_outliers_iqr 86.7% similar
- function remove_outliers 85.3% similar
- function remove_outliers_iqr 83.9% similar
- function remove_outliers_iqr_v1 83.2% similar