function detect_outliers_iqr_v2
Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py
82 - 91
simple
Purpose
This function identifies outliers in numerical data by calculating the first quartile (Q1), third quartile (Q3), and interquartile range (IQR), then flagging values that fall below Q1 - 3×IQR or above Q3 + 3×IQR. The 3×IQR multiplier provides more conservative outlier detection compared to the standard 1.5×IQR, reducing false positives. Useful for data cleaning, exploratory data analysis, and identifying anomalies in statistical datasets.
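To see how the multiplier changes what gets flagged, the sketch below computes both sets of bounds on a small made-up Series. The helper `iqr_bounds` and the sample values are illustrative assumptions, not part of the documented function.

```python
import pandas as pd

def iqr_bounds(series, multiplier):
    """Hypothetical helper: return (lower, upper) bounds at Q1/Q3 -/+ multiplier * IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

# Made-up sample: one mild outlier (19) and one extreme outlier (100)
values = pd.Series([10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 19, 100])

low_15, high_15 = iqr_bounds(values, 1.5)  # standard multiplier
low_3, high_3 = iqr_bounds(values, 3)      # conservative multiplier used by this function

flagged_15 = values[(values < low_15) | (values > high_15)]
flagged_3 = values[(values < low_3) | (values > high_3)]

print(f"1.5 x IQR bounds: ({low_15}, {high_15}) -> {flagged_15.tolist()}")
print(f"3 x IQR bounds:   ({low_3}, {high_3}) -> {flagged_3.tolist()}")
```

For this sample Q1 = 12 and Q3 = 14, so the 1.5×IQR bounds are 9 and 17 while the 3×IQR bounds are 6 and 20. The value 19 is flagged only under the 1.5× rule; 100 is flagged under both.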
Source Code
def detect_outliers_iqr(data, column_name):
    """Detect outliers using IQR method"""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR  # Using 3*IQR for more conservative outlier detection
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| data | - | - | positional_or_keyword |
| column_name | - | - | positional_or_keyword |
Parameter Details
data: A pandas Series of numeric values (int or float) to analyze for outliers. The function calls data.quantile(), so a plain list or NumPy array must be wrapped in pd.Series first. Series.quantile skips missing values (NaN), and NaN entries are never flagged as outliers because comparisons against NaN evaluate to False.
column_name: String naming the column being analyzed. This parameter is declared but never used in the function body; it may have been intended for logging or labeling. It has no default value, so callers must still pass something for it.
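The two constraints on the parameters can be checked directly. The sketch below repeats the function from the Source Code section so it runs on its own; the sample values and the `errors` dictionary are illustrative assumptions.

```python
import pandas as pd

def detect_outliers_iqr(data, column_name):
    """Copy of the documented function, repeated so the example is self-contained."""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound

raw = [10, 12, 13, 12, 11, 14, 100]  # made-up sample values
errors = {}

# A plain list has no .quantile() method, so the first line of the body fails
try:
    detect_outliers_iqr(raw, "demo")
except AttributeError as exc:
    errors["list_input"] = type(exc).__name__

# column_name is unused but has no default, so leaving it out is a TypeError
try:
    detect_outliers_iqr(pd.Series(raw))
except TypeError as exc:
    errors["missing_column_name"] = type(exc).__name__

# Wrapping the list in a Series and passing any label works
outliers, lower_bound, upper_bound = detect_outliers_iqr(pd.Series(raw), "demo")

print(errors)
print(outliers.tolist())
```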
Return Value
Returns a tuple containing three elements: (1) outliers - a pandas Series or filtered data structure containing only the values identified as outliers, (2) lower_bound - a numeric value representing the lower threshold (Q1 - 3×IQR), below which values are considered outliers, (3) upper_bound - a numeric value representing the upper threshold (Q3 + 3×IQR), above which values are considered outliers.
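One common way to use the returned tuple is to drop the flagged rows from the originating DataFrame by index. The sketch below repeats the documented function so it is self-contained; the DataFrame contents and the `cleaned` variable are illustrative assumptions.

```python
import pandas as pd

def detect_outliers_iqr(data, column_name):
    """Copy of the documented function, repeated so the example is self-contained."""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound

df = pd.DataFrame({"values": [10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14]})

outliers, lower_bound, upper_bound = detect_outliers_iqr(df["values"], "values")

# The outliers Series keeps the original row labels, so they can be dropped directly
cleaned = df.drop(index=outliers.index)

print(f"Bounds: ({lower_bound}, {upper_bound})")
print(f"Outlier rows: {outliers.index.tolist()}")
print(f"Rows before: {len(df)}, after: {len(cleaned)}")
```

For this sample Q1 = 11 and Q3 = 14, giving bounds of 2 and 23, so rows 6 (value 100) and 11 (value -50) are removed.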
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd
# Create sample data with outliers
data = pd.Series([10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14])
# Detect outliers
outliers, lower_bound, upper_bound = detect_outliers_iqr(data, 'sample_column')
print(f"Lower bound: {lower_bound}")
print(f"Upper bound: {upper_bound}")
print(f"Outliers detected: {len(outliers)}")
print(f"Outlier values:\n{outliers}")
# Use with DataFrame column
df = pd.DataFrame({'values': [10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14]})
outliers, lb, ub = detect_outliers_iqr(df['values'], 'values')
print(f"\nOutliers from DataFrame: {outliers.tolist()}")
Best Practices
- The 'column_name' parameter is not used in the function body, but it has no default value, so a value must still be passed; omitting it raises a TypeError
- Ensure input data is numeric; non-numeric data will cause errors during quantile calculations
- Decide how to treat missing values (NaN) before calling this function: pandas skips them when computing quantiles and they are never flagged as outliers, so they pass through silently
- The function uses 3×IQR instead of the standard 1.5×IQR, making it more conservative (fewer outliers detected)
- Consider the context of your data when interpreting results - not all statistical outliers are errors or anomalies
- For small datasets (n < 30), the quartile estimates are unstable and the IQR method may not be reliable; consider alternative methods
- The returned outliers maintain their original index from the input data, useful for tracking which observations are outliers
- If you need standard outlier detection, modify the multiplier from 3 to 1.5 in the bounds calculations
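The points about numeric input, missing values, and index preservation can be combined in one short sketch. It repeats the documented function so it runs on its own and uses `pd.to_numeric(..., errors="coerce")` as one possible way to clean the input; the sample strings are made up for illustration.

```python
import pandas as pd

def detect_outliers_iqr(data, column_name):
    """Copy of the documented function, repeated so the example is self-contained."""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound

# Made-up raw column containing strings and one unparseable entry
raw = pd.Series(["10", "12", "n/a", "13", "11", "14", "100", "12"])

# Coerce to numbers; anything that cannot be parsed becomes NaN
numeric = pd.to_numeric(raw, errors="coerce")
n_missing = int(numeric.isna().sum())

# NaN is skipped by quantile() and never satisfies either comparison
outliers, lower_bound, upper_bound = detect_outliers_iqr(numeric, "demo")

print(f"Missing values: {n_missing}")
print(f"Bounds: ({lower_bound}, {upper_bound})")
print(f"Outliers by original index: {outliers.to_dict()}")
```

Here the NaN at index 2 is neither counted in the quartiles nor reported as an outlier, and the single flagged value keeps its original index (6), which makes it easy to trace back to the source row.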
Similar Components
AI-powered semantic similarity - components with related functionality:
- function detect_outliers_iqr_v1 93.2% similar
- function detect_outliers_iqr 91.4% similar
- function remove_outliers_iqr 89.6% similar
- function remove_outliers 88.5% similar
- function remove_outliers_iqr_v1 87.7% similar