🔍 Code Extractor

function detect_outliers_iqr_v2

Maturity: 46

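Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold. The source file defines the function as detect_outliers_iqr; the _v2 suffix distinguishes it from similarly named components listed below.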

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py
Lines: 82-91
Complexity: simple

Purpose

This function identifies outliers in numerical data in three steps. It computes the first quartile (Q1), the third quartile (Q3), and the interquartile range (IQR = Q3 - Q1). It then flags values below Q1 - 3×IQR or above Q3 + 3×IQR.

Compared with the standard 1.5×IQR fences, the 3×IQR multiplier widens the accepted range. As a result, only more extreme values are flagged, and fewer ordinary points are misclassified as outliers. The function is useful for data cleaning, exploratory data analysis, and spotting anomalies in statistical datasets.
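The fences the function computes are written out below. The worked arithmetic uses the 13-value sample from the usage example. pandas interpolates quantiles linearly by default, but for this sample the 25th and 75th percentile positions fall exactly on data points (sorted: -50, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 100), so no interpolation is needed.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{lower bound} &= Q_1 - 3 \cdot \mathrm{IQR} \\
\text{upper bound} &= Q_3 + 3 \cdot \mathrm{IQR} \\[6pt]
Q_1 = 11,\; Q_3 = 14 \;&\Rightarrow\; \mathrm{IQR} = 3,\quad
\text{lower} = 11 - 3 \cdot 3 = 2,\quad
\text{upper} = 14 + 3 \cdot 3 = 23
\end{aligned}
```

Only 100 and -50 fall outside [2, 23], so those are the two values the function reports for that sample.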

Source Code

def detect_outliers_iqr(data, column_name):
    """Detect outliers using IQR method"""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR  # Using 3*IQR for more conservative outlier detection
    upper_bound = Q3 + 3 * IQR
    
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound
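The sketch below shows how the multiplier changes what gets flagged. It uses a hypothetical variant, detect_outliers_iqr_k, which is not part of the source file. The variant keeps the same quartile logic but drops the unused column_name argument and exposes the multiplier as an assumed parameter k. On a small series, Tukey's standard k = 1.5 flags the value 20, while the conservative k = 3 used above flags nothing.

```python
import pandas as pd


def detect_outliers_iqr_k(data: pd.Series, k: float = 3.0):
    """Hypothetical variant: same IQR fences as the original, but the multiplier k is configurable."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    # Boolean indexing keeps the original index labels of the flagged values
    mask = (data < lower_bound) | (data > upper_bound)
    return data[mask], lower_bound, upper_bound


s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20])

# Standard fences (k = 1.5) versus the conservative fences (k = 3)
standard, lo_std, hi_std = detect_outliers_iqr_k(s, k=1.5)
conservative, lo_cons, hi_cons = detect_outliers_iqr_k(s, k=3.0)

print(f"k=1.5: bounds ({lo_std}, {hi_std}), outliers {standard.tolist()}")
print(f"k=3.0: bounds ({lo_cons}, {hi_cons}), outliers {conservative.tolist()}")
```

Here Q1 = 3.5 and Q3 = 8.5, so IQR = 5. The k = 1.5 fences are (-4.0, 16.0) and the k = 3 fences are (-11.5, 23.5). The value 20 therefore lies outside the standard fences but inside the conservative ones.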

Parameters

Name         Type  Default  Kind
data         -     -        positional_or_keyword
column_name  -     -        positional_or_keyword

Parameter Details

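data: A pandas Series of numeric values (int or float) to analyze for outliers. The function calls data.quantile() and uses boolean indexing, so a plain list or NumPy array raises an AttributeError; wrap such input in pd.Series first. pandas excludes missing values (NaN) when computing the quartiles, and NaN entries are never flagged as outliers.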

column_name: A string naming the column being analyzed. The parameter is declared but never used in the function body. It appears to be vestigial, perhaps intended for logging or labeling. It has no default, so callers must still supply a value.
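The following snippet illustrates both parameter constraints. It reproduces the function verbatim so it runs on its own and reuses the sample values from the usage example. A bare NumPy array is rejected because it has no .quantile method. Leaving out column_name fails because the parameter has no default.

```python
import numpy as np
import pandas as pd


def detect_outliers_iqr(data, column_name):
    """Copy of the documented function, reproduced so this snippet is self-contained."""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound


raw = np.array([10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14])

# A NumPy array (or list) has no .quantile method, so the call fails
ndarray_error = None
try:
    detect_outliers_iqr(raw, "values")
except AttributeError as exc:
    ndarray_error = exc

# Wrapping the array in a Series makes it work
outliers, lb, ub = detect_outliers_iqr(pd.Series(raw), "values")
print(outliers.tolist(), lb, ub)

# column_name has no default, so it must be passed even though it is ignored
missing_arg_error = None
try:
    detect_outliers_iqr(pd.Series(raw))
except TypeError as exc:
    missing_arg_error = exc
```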

Return Value

Returns a tuple of three elements:
  • outliers: a pandas Series containing only the values flagged as outliers, keeping their original index labels
  • lower_bound: the lower threshold (Q1 - 3×IQR); values below it are considered outliers
  • upper_bound: the upper threshold (Q3 + 3×IQR); values above it are considered outliers
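The snippet below unpacks the returned tuple and uses the preserved index labels to drop the flagged rows from a DataFrame. It reproduces the function verbatim and uses the DataFrame from the usage example. Passing the labels to df.drop is one possible way to use the result, not something the source prescribes.

```python
import pandas as pd


def detect_outliers_iqr(data, column_name):
    """Copy of the documented function, reproduced so this snippet is self-contained."""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound


df = pd.DataFrame({"values": [10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14]})
outliers, lower_bound, upper_bound = detect_outliers_iqr(df["values"], "values")

# The outlier Series keeps the row labels of df
print(outliers.index.tolist())   # → [6, 11]
print(lower_bound, upper_bound)  # → 2.0 23.0

# Because the labels are preserved, the flagged rows can be removed directly
cleaned = df.drop(index=outliers.index)
print(len(cleaned))              # → 11
```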

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Assumes detect_outliers_iqr (see Source Code above) has been defined in this session

# Create sample data with outliers
data = pd.Series([10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14])

# Detect outliers
outliers, lower_bound, upper_bound = detect_outliers_iqr(data, 'sample_column')

print(f"Lower bound: {lower_bound}")
print(f"Upper bound: {upper_bound}")
print(f"Outliers detected: {len(outliers)}")
print(f"Outlier values:\n{outliers}")

# Use with DataFrame column
df = pd.DataFrame({'values': [10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14]})
outliers, lb, ub = detect_outliers_iqr(df['values'], 'values')
print(f"\nOutliers from DataFrame: {outliers.tolist()}")

Best Practices

  • The 'column_name' parameter is never used in the function body. Because it has no default value, it must still be passed (any value works); omitting it raises a TypeError
  • Ensure input data is numeric; non-numeric data will cause errors during quantile calculations
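  • pandas excludes missing values (NaN) when computing the quartiles, and NaN entries are never reported as outliers because comparisons with NaN evaluate to False; drop or impute them beforehand if they need separate handling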
  • The function uses 3×IQR instead of the standard 1.5×IQR, making it more conservative (fewer outliers detected)
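  • Consider the context of your data when interpreting results: not all statistical outliers are errors or anomalies
  • For small datasets (a common rule of thumb is n < 30), quartile estimates are unstable and the bounds can shift noticeably with a few observations; interpret results cautiously or consider alternative methods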
  • The returned outliers maintain their original index from the input data, useful for tracking which observations are outliers
  • If you need standard outlier detection, modify the multiplier from 3 to 1.5 in the bounds calculations
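The missing-value behavior described above can be checked with the following snippet, which reproduces the function verbatim. It appends one NaN to the usage-example sample. Series.quantile skips the NaN, so the bounds match the NaN-free sample. The NaN itself is not reported, because every comparison with NaN is False.

```python
import numpy as np
import pandas as pd


def detect_outliers_iqr(data, column_name):
    """Copy of the documented function, reproduced so this snippet is self-contained."""
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers, lower_bound, upper_bound


with_nan = pd.Series([10, 12, 13, 12, 11, 14, 100, 13, 12, 11, 15, -50, 14, np.nan])
outliers, lb, ub = detect_outliers_iqr(with_nan, "values")

# Quartiles are computed on the 13 non-missing values, so the bounds are unchanged
print(lb, ub)              # → 2.0 23.0
# The NaN is not flagged; it has to be handled separately if it matters
print(outliers.tolist())   # → [100.0, -50.0]
print(int(with_nan.isna().sum()), "missing value(s) not covered by the check")
```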

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function detect_outliers_iqr_v1 93.2% similar

    Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py
  • function detect_outliers_iqr 91.4% similar

    Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py
  • function remove_outliers_iqr 89.6% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3*IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py
  • function remove_outliers 88.5% similar

    Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function remove_outliers_iqr_v1 87.7% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py