function detect_outliers_iqr
Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).
File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py
Lines: 68 - 84
Complexity: simple
Purpose
This function identifies data points that fall outside the bounds defined by Q1 - (multiplier × IQR) and Q3 + (multiplier × IQR), where Q1 and Q3 are the first and third quartiles. The default multiplier of 3.0 is more lenient than the standard 1.5, making it suitable for detecting only extreme outliers rather than mild ones. This is useful in data cleaning, anomaly detection, and exploratory data analysis where you want to identify only the most extreme values that might indicate data quality issues or genuinely unusual observations.
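To make the bounds concrete, here is a minimal sketch that computes them by hand for a small Series at both the standard 1.5 multiplier and the default 3.0. The sample values are made up for illustration and do not come from the original script; the quartiles assume pandas' default linear interpolation.

```python
import pandas as pd

# Illustrative sample (an assumption for this example): values 10-18, plus 26 and 60
data = pd.Series([10, 11, 12, 13, 14, 15, 16, 17, 18, 26, 60])

q1 = data.quantile(0.25)  # 12.5 with pandas' default linear interpolation
q3 = data.quantile(0.75)  # 17.5
iqr = q3 - q1             # 5.0

# Lower and upper bound for each multiplier
bounds = {}
for multiplier in (1.5, 3.0):
    bounds[multiplier] = (q1 - multiplier * iqr, q3 + multiplier * iqr)
    print(f"multiplier={multiplier}: bounds={bounds[multiplier]}")
```

With these numbers the 1.5 bounds are (5.0, 25.0) and the 3.0 bounds are (-2.5, 32.5), so 26 counts as an outlier only under the stricter multiplier, while 60 falls outside both.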
Source Code
def detect_outliers_iqr(data, multiplier=3.0):
    """
    Detect outliers using IQR method with a more lenient multiplier (3.0)
    to only catch extreme outliers
    """
    if len(data) < 4:  # Need at least 4 points for IQR
        return []
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| data | - | - | positional_or_keyword |
| multiplier | - | 3.0 | positional_or_keyword |
Parameter Details
data: A pandas Series of numeric values (int or float) to check for outliers. If it has fewer than 4 elements, the function returns an empty list without computing quartiles.
multiplier: A float (default 3.0) that scales the IQR to set how far beyond the quartiles a value must lie before it is flagged. Higher values (e.g., 3.0) are more lenient and flag only extreme outliers; lower values (e.g., 1.5) are stricter and also flag mild outliers. It should be positive, although the function does not validate this. Common choices are 1.5 (standard), 2.0 (moderate), and 3.0 (lenient).
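The effect of the multiplier can be seen by calling the function with several values on the same Series. The sketch below repeats the function from the Source Code section so that it runs on its own; the data is invented for illustration.

```python
import pandas as pd


def detect_outliers_iqr(data, multiplier=3.0):
    """Copy of the documented function, repeated so this example is self-contained."""
    if len(data) < 4:  # Need at least 4 points for IQR
        return []
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()


# Illustrative data (an assumption for this example): 28 is mildly high, 45 is extreme
data = pd.Series([20, 22, 21, 23, 22, 24, 21, 23, 22, 28, 45])

# Map each multiplier to the index labels it flags
flagged = {m: detect_outliers_iqr(data, multiplier=m) for m in (1.5, 2.0, 3.0)}
for m, labels in flagged.items():
    print(f"multiplier={m}: labels={labels}, values={data.loc[labels].tolist()}")
```

Here Q1 is 21.5 and Q3 is 23.5, so the upper bound moves from 26.5 (1.5) to 27.5 (2.0) to 29.5 (3.0). The value 28 is flagged at 1.5 and 2.0 but not at 3.0, while 45 is flagged at every setting.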
Return Value
Returns a Python list of index labels (not integer positions) identifying the detected outliers in the input Series. Returns an empty list if no outliers are found or if the input has fewer than 4 elements. The labels keep the type of the Series index (for example integers, strings, or pandas Timestamp objects for a datetime index).
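The difference between labels and positions matters whenever the Series does not use the default 0..n-1 index. The sketch below reproduces the function's final expression inline on a made-up Series with a custom integer index (the index values are an assumption for illustration) and contrasts the returned labels with the corresponding positions.

```python
import numpy as np
import pandas as pd

# Illustrative Series whose index labels differ from its positions
readings = pd.Series([10, 12, 11, 13, 200], index=[101, 102, 103, 104, 105])

q1, q3 = readings.quantile(0.25), readings.quantile(0.75)
iqr = q3 - q1
mask = (readings < q1 - 3.0 * iqr) | (readings > q3 + 3.0 * iqr)

labels = readings[mask].index.tolist()               # what the function returns
positions = np.flatnonzero(mask.to_numpy()).tolist()  # integer offsets, for contrast

print(labels)     # index labels, to be used with .loc
print(positions)  # positions, to be used with .iloc
print(readings.loc[labels].tolist(), readings.iloc[positions].tolist())
```

In this example the labels are [105] and the positions are [4]; both select the value 200, but only through the matching accessor.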
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd
# Create sample data with outliers
data = pd.Series([10, 12, 11, 13, 12, 14, 11, 100, 13, 12, 15, 14])
# Detect extreme outliers with default multiplier (3.0)
outlier_indices = detect_outliers_iqr(data)
print(f"Outlier indices: {outlier_indices}")
print(f"Outlier values: {data.loc[outlier_indices].tolist()}")
# Use stricter detection with lower multiplier
strict_outliers = detect_outliers_iqr(data, multiplier=1.5)
print(f"Strict outlier indices: {strict_outliers}")
# Example with named index
data_named = pd.Series([10, 12, 11, 13, 200], index=['a', 'b', 'c', 'd', 'e'])
outliers = detect_outliers_iqr(data_named)
print(f"Named index outliers: {outliers}")
Best Practices
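- Ensure input data is a pandas Series with numeric values; the function calls data.quantile, so a plain list or NumPy array with 4 or more elements raises AttributeError, and non-numeric data will fail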
- The function requires at least 4 data points to calculate meaningful quartiles; fewer points will return an empty list
- The default multiplier of 3.0 is lenient and suitable for catching only extreme outliers; use 1.5 for standard outlier detection or 2.0 for moderate detection
- Consider the distribution of your data: IQR method works best with roughly symmetric distributions and may not be ideal for highly skewed data
- The returned values are index labels and can be used directly to inspect or remove outliers: data.loc[outlier_indices] or data.drop(outlier_indices)
- For time series data, consider whether outliers should be detected globally or within rolling windows
- Always visualize detected outliers (e.g., with box plots) to verify they make sense in your domain context
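- Handle missing values (NaN) before calling this function: pandas skips NaN when computing quantiles and NaN never satisfies the bound comparisons, so missing values are never flagged, but they still count toward the 4-point minimum checked with len(data)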
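Several of the practices above can be combined in one short workflow: drop missing values, detect outliers, then remove them by label. This is a sketch under stated assumptions, not part of the original script; the data is invented, and the function is repeated from the Source Code section so the example runs on its own.

```python
import numpy as np
import pandas as pd


def detect_outliers_iqr(data, multiplier=3.0):
    """Copy of the documented function, repeated so this example is self-contained."""
    if len(data) < 4:  # Need at least 4 points for IQR
        return []
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()


# Illustrative data with two missing values and one extreme reading
raw = pd.Series([10.0, 12.0, np.nan, 11.0, 13.0, 12.0, np.nan, 95.0, 14.0])

# len() counts NaN, count() does not; the function's size check uses len()
print(len(raw), raw.count())

clean = raw.dropna()                         # original index labels are preserved
outlier_labels = detect_outliers_iqr(clean)  # labels refer to the original index
trimmed = clean.drop(outlier_labels)         # remove outliers by label

print(outlier_labels, trimmed.tolist())
```

Because dropna keeps the original labels, the flagged label (7 in this example) still points at the same row in raw, which makes it easy to trace an outlier back to the unfiltered data.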
Similar Components
AI-powered semantic similarity - components with related functionality:
- function detect_outliers_iqr_v2 91.4% similar
- function remove_outliers_iqr 88.6% similar
- function remove_outliers 86.8% similar
- function detect_outliers_iqr_v1 86.7% similar
- function remove_outliers_iqr_v1 86.3% similar