🔍 Code Extractor

function detect_outliers_iqr

Maturity: 48

Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py
Lines: 68 - 84
Complexity: simple

Purpose

This function identifies data points that fall outside the bounds defined by Q1 - (multiplier × IQR) and Q3 + (multiplier × IQR), where Q1 and Q3 are the first and third quartiles. The default multiplier of 3.0 is more lenient than the standard 1.5, making it suitable for detecting only extreme outliers rather than mild ones. This is useful in data cleaning, anomaly detection, and exploratory data analysis where you want to identify only the most extreme values that might indicate data quality issues or genuinely unusual observations.
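The bound arithmetic described above can be worked through by hand. The following is a minimal sketch that reuses the sample values from the usage example on this page; the variable names are illustrative and not part of the documented function.

```python
import pandas as pd

# Sample values taken from the usage example; 100 is the suspicious point.
values = pd.Series([10, 12, 11, 13, 12, 14, 11, 100, 13, 12, 15, 14])
multiplier = 3.0

# Quartiles and interquartile range (pandas uses linear interpolation by default).
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Fences: anything strictly outside [lower_bound, upper_bound] is flagged.
lower_bound = q1 - multiplier * iqr
upper_bound = q3 + multiplier * iqr

flagged = values[(values < lower_bound) | (values > upper_bound)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"bounds=({lower_bound}, {upper_bound})")
print(f"flagged labels: {flagged.index.tolist()}")
```

For this sample the quartiles are 11.75 and 14.0, so the fences sit at 5.0 and 20.75 and only the value 100 (label 7) falls outside them.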

Source Code

def detect_outliers_iqr(data, multiplier=3.0):
    """
    Detect outliers using IQR method with a more lenient multiplier (3.0)
    to only catch extreme outliers
    """
    if len(data) < 4:  # Need at least 4 points for IQR
        return []
    
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()

Parameters

Name        Type  Default  Kind
data        -     -        positional_or_keyword
multiplier  -     3.0      positional_or_keyword

Parameter Details

data: A pandas Series of numeric values (int or float) to analyze for outliers. If the Series has fewer than 4 entries, the function returns an empty list without computing quartiles.

multiplier: A float (default 3.0) that sets the sensitivity of outlier detection. Higher values (e.g., 3.0) are more lenient and flag only extreme outliers, while lower values (e.g., 1.5) are stricter and also flag mild outliers. It should be a positive number; the function does not validate it. Common values are 1.5 (standard), 2.0 (moderate), and 3.0 (lenient).
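To illustrate how the multiplier changes sensitivity, here is a small sketch that runs the function from the Source Code section with the three common values. The sample data (including the mild outlier 19) is made up for illustration.

```python
import pandas as pd

def detect_outliers_iqr(data, multiplier=3.0):
    """Copy of the documented function, repeated so the example runs on its own."""
    if len(data) < 4:
        return []
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()

# Illustrative data: 100 is an extreme outlier (label 7), 19 a mild one (label 12).
sample = pd.Series([10, 12, 11, 13, 12, 14, 11, 100, 13, 12, 15, 14, 19])

# Lower multipliers give tighter fences and flag more points.
results = {m: detect_outliers_iqr(sample, multiplier=m) for m in (1.5, 2.0, 3.0)}
for m, labels in results.items():
    print(f"multiplier={m}: {labels}")
```

With this data, Q1 is 12 and Q3 is 14, so the upper fence is 17 at 1.5, 18 at 2.0, and 20 at 3.0. The mild value 19 is flagged by the first two settings but not by the default.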

Return Value

Returns a Python list of index labels (not integer positions) identifying the detected outliers in the input Series. If no outliers are found, or if the input has fewer than 4 points, it returns an empty list. The labels keep the type of the Series index (integers, strings, or datetime objects, for example).
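Because the return value holds index labels, label-based selection is the natural way to use it. The sketch below assumes the function from the Source Code section and reuses the named-index data from the usage example; the variable names are illustrative.

```python
import pandas as pd

def detect_outliers_iqr(data, multiplier=3.0):
    """Copy of the documented function, repeated so the example runs on its own."""
    if len(data) < 4:
        return []
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()

readings = pd.Series([10, 12, 11, 13, 200], index=["a", "b", "c", "d", "e"])

labels = detect_outliers_iqr(readings)       # index labels, here strings
outlier_values = readings.loc[labels]        # select outliers by label
without_outliers = readings.drop(labels)     # remove outliers by label

# Fewer than 4 points: the early return yields an empty list.
too_short = detect_outliers_iqr(pd.Series([1, 2, 1000]))

print(labels, outlier_values.tolist(), without_outliers.tolist(), too_short)
```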

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Create sample data with outliers
data = pd.Series([10, 12, 11, 13, 12, 14, 11, 100, 13, 12, 15, 14])

# Detect extreme outliers with default multiplier (3.0)
outlier_indices = detect_outliers_iqr(data)
print(f"Outlier indices: {outlier_indices}")
print(f"Outlier values: {data.loc[outlier_indices].tolist()}")  # select by index label

# Use stricter detection with lower multiplier
strict_outliers = detect_outliers_iqr(data, multiplier=1.5)
print(f"Strict outlier indices: {strict_outliers}")

# Example with named index
data_named = pd.Series([10, 12, 11, 13, 200], index=['a', 'b', 'c', 'd', 'e'])
outliers = detect_outliers_iqr(data_named)
print(f"Named index outliers: {outliers}")

Best Practices

  • Ensure input data is a pandas Series with numeric values; the function will fail on non-numeric data
  • The function requires at least 4 data points to calculate meaningful quartiles; fewer points will return an empty list
  • The default multiplier of 3.0 is lenient and suitable for catching only extreme outliers; use 1.5 for standard outlier detection or 2.0 for moderate detection
  • Consider the distribution of your data: IQR method works best with roughly symmetric distributions and may not be ideal for highly skewed data
  • The returned index labels can be used directly to select or remove outliers: data.loc[outlier_indices] or data.drop(outlier_indices)
  • For time series data, consider whether outliers should be detected globally or within rolling windows
  • Always visualize detected outliers (e.g., with box plots) to verify they make sense in your domain context
  • Handle missing values (NaN) before calling this function: pandas skips NaN when computing quantiles, but NaN entries still count toward the 4-point length check and are never flagged as outliers
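A couple of the practices above (handling NaN first, then dropping the flagged labels) can be sketched as follows. This assumes the function from the Source Code section; the data and variable names are made up for illustration.

```python
import pandas as pd

def detect_outliers_iqr(data, multiplier=3.0):
    """Copy of the documented function, repeated so the example runs on its own."""
    if len(data) < 4:
        return []
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = data[(data < lower_bound) | (data > upper_bound)]
    return outliers.index.tolist()

# Illustrative data with one missing value (label 2) and one extreme value (label 5).
raw = pd.Series([10, 12, None, 11, 13, 200])

# Drop NaN explicitly so the length check reflects real observations.
observed = raw.dropna()

outlier_labels = detect_outliers_iqr(observed)
cleaned = observed.drop(outlier_labels)

print(f"outlier labels: {outlier_labels}")
print(f"cleaned values: {cleaned.tolist()}")
```

Note that dropna() keeps the original labels, so the returned labels still refer to rows of the original Series.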

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function detect_outliers_iqr_v2 91.4% similar

    Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py
  • function remove_outliers_iqr 88.6% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3*IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py
  • function remove_outliers 86.8% similar

    Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function detect_outliers_iqr_v1 86.7% similar

    Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py
  • function remove_outliers_iqr_v1 86.3% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py