
function detect_outliers_iqr_v1

Maturity: 47

Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py
Lines: 72 - 83
Complexity: simple

Purpose

This function implements the IQR outlier detection method, a robust statistical technique for identifying data points that fall outside the typical range. It calculates the first quartile (Q1), third quartile (Q3), and IQR (Q3-Q1), then determines lower and upper bounds using a multiplier (default 1.5). Data points below the lower bound or above the upper bound are flagged as outliers. This is commonly used in exploratory data analysis, data cleaning, and preprocessing pipelines to identify anomalous values that may need special handling or removal.
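The arithmetic described above can be traced by hand on a small example. The following is a minimal sketch, not part of the original script: the sample values are made up for illustration, and it assumes pandas' default linear interpolation for quantiles.

```python
import pandas as pd

# Illustrative sample (hypothetical values, not from the original analysis)
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 50])

q1 = data.quantile(0.25)   # first quartile: 3.0
q3 = data.quantile(0.75)   # third quartile: 7.0
iqr = q3 - q1              # interquartile range: 4.0

multiplier = 1.5
lower_bound = q1 - multiplier * iqr   # 3.0 - 6.0 = -3.0
upper_bound = q3 + multiplier * iqr   # 7.0 + 6.0 = 13.0

# Values strictly below the lower bound or above the upper bound are flagged
flagged = data[(data < lower_bound) | (data > upper_bound)]
print(flagged.tolist())  # [50]
```

Only the value 50 falls outside the interval from -3.0 to 13.0, so it is the single flagged point.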

Source Code

def detect_outliers_iqr(data, multiplier=1.5):
    """
    Detect outliers using the IQR method
    Returns: boolean mask of outliers, lower bound, upper bound
    """
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - multiplier * IQR
    upper_bound = Q3 + multiplier * IQR
    outliers = (data < lower_bound) | (data > upper_bound)
    return outliers, lower_bound, upper_bound

Parameters

Name        Type  Default  Kind
data        -     -        positional_or_keyword
multiplier  -     1.5      positional_or_keyword

Parameter Details

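data: A pandas Series (for example, a single DataFrame column) containing the numeric data (int or float) to analyze for outliers. It must support the .quantile() method. Missing values (NaN) are skipped by pandas when the quartiles are computed and are never flagged as outliers, because comparisons with NaN evaluate to False; handle them before calling this function if they need different treatment.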
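How missing values interact with the function can be checked directly. The sketch below is illustrative only: it repeats the documented function so the block is self-contained, and the sample series (including the NaN) is hypothetical.

```python
import numpy as np
import pandas as pd


def detect_outliers_iqr(data, multiplier=1.5):
    """Copy of the documented function so this sketch runs on its own."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound


# Hypothetical sample with one missing value at the end
with_nan = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 50, np.nan])

mask, lower, upper = detect_outliers_iqr(with_nan)

# Series.quantile skips NaN, so the bounds match those of the NaN-free data
print(lower, upper)          # -3.0 13.0
# Comparisons with NaN are False, so the missing entry is not flagged
print(bool(mask.iloc[-1]))   # False
print(int(mask.sum()))       # 1

# Dropping missing values first makes the handling explicit
mask_clean, _, _ = detect_outliers_iqr(with_nan.dropna())
```

Calling dropna() first does not change the bounds here, but it keeps the mask aligned with only the observed values, which may be easier to reason about downstream.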

multiplier: A numeric value (float or int) that controls the sensitivity of outlier detection. Default is 1.5 (the standard IQR rule). Lower values (e.g., 1.0) make detection more aggressive, flagging more outliers. Higher values (e.g., 3.0) make it more conservative. The typical range is 1.0 to 3.0. The value should be positive; the function itself does not check this.
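The effect of the multiplier can be seen by running the same data through several values. This is a minimal sketch under stated assumptions: the function is repeated so the block is self-contained, and the sample values are hypothetical, chosen so that each multiplier flags a different number of points.

```python
import pandas as pd


def detect_outliers_iqr(data, multiplier=1.5):
    """Copy of the documented function so this sketch runs on its own."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound


# Hypothetical sample: Q1 = 4.0, Q3 = 10.0, so IQR = 6.0
sample = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 17, 20, 50])

counts = {}
for m in (1.0, 1.5, 3.0):
    mask, lower, upper = detect_outliers_iqr(sample, multiplier=m)
    counts[m] = int(mask.sum())
    print(f"multiplier={m}: bounds=({lower}, {upper}), flagged={sample[mask].tolist()}")

print(counts)  # {1.0: 3, 1.5: 2, 3.0: 1}
```

With these values the upper bound moves from 16.0 to 19.0 to 28.0, so the flagged set shrinks from three points to one as the multiplier grows.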

Return Value

Returns a tuple containing three elements: (1) outliers - a pandas Series of boolean values with the same index as input data, where True indicates an outlier and False indicates a normal value; (2) lower_bound - a numeric value (float) representing the minimum threshold below which values are considered outliers; (3) upper_bound - a numeric value (float) representing the maximum threshold above which values are considered outliers.
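Because the first element is a boolean mask rather than a list of positions, recovering labels or positions takes one extra step. The sketch below shows one way to do that; the function is repeated for self-containment, and the series and its letter index are hypothetical.

```python
import pandas as pd


def detect_outliers_iqr(data, multiplier=1.5):
    """Copy of the documented function so this sketch runs on its own."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound


# Hypothetical readings with a non-default (string) index
readings = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 50], index=list("abcdefghi"))

outliers, lower_bound, upper_bound = detect_outliers_iqr(readings)

# The mask carries the same index as the input
print(outliers.index.equals(readings.index))   # True

# Index labels of the flagged entries
outlier_labels = readings[outliers].index.tolist()
print(outlier_labels)                          # ['i']

# Integer positions of the flagged entries
outlier_positions = outliers.to_numpy().nonzero()[0].tolist()
print(outlier_positions)                       # [8]
```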

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# detect_outliers_iqr is assumed to be defined as in the Source Code section

# Create sample data with outliers
data = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100, 12, 14, 13])

# Detect outliers using default multiplier (1.5)
outliers, lower, upper = detect_outliers_iqr(data)

print(f"Lower bound: {lower}")
print(f"Upper bound: {upper}")
print(f"Number of outliers: {outliers.sum()}")
print(f"Outlier values: {data[outliers].values}")

# Use more conservative detection (multiplier=3.0)
outliers_conservative, lower_c, upper_c = detect_outliers_iqr(data, multiplier=3.0)
print(f"Conservative outliers: {outliers_conservative.sum()}")

# Filter out outliers from dataset
clean_data = data[~outliers]

Best Practices

  • Decide how missing values (NaN) should be treated before calling this function: pandas skips them when computing quartiles, and they are never flagged as outliers because comparisons with NaN evaluate to False
  • The default multiplier of 1.5 is standard for the IQR method; use 3.0 for more conservative outlier detection (fewer outliers flagged)
  • This method works best with roughly symmetric (for example, near-normal) data; for highly skewed distributions it can flag many legitimate values in the long tail, so consider other methods
  • Always inspect the lower_bound and upper_bound values to ensure they make sense for your domain
  • Use the returned boolean Series to filter data: clean_data = data[~outliers]
  • Consider visualizing outliers using box plots or scatter plots before removing them
  • Document the multiplier value used in your analysis for reproducibility
  • This method is univariate; for multivariate outlier detection, apply to each column separately or use multivariate methods
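The per-column approach mentioned in the last item can be sketched as follows. This is one possible arrangement rather than the original script's method: the DataFrame, its column names, and the rule of dropping a row when any column flags it are all assumptions made for illustration.

```python
import pandas as pd


def detect_outliers_iqr(data, multiplier=1.5):
    """Copy of the documented function so this sketch runs on its own."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - multiplier * iqr
    upper_bound = q3 + multiplier * iqr
    return (data < lower_bound) | (data > upper_bound), lower_bound, upper_bound


# Hypothetical two-column dataset with one extreme value per column
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 50],
    "b": [10, 20, 30, 40, -500, 60, 70, 80, 90],
})

# One boolean mask per column, collected into a DataFrame of flags
flags = pd.DataFrame({col: detect_outliers_iqr(df[col])[0] for col in df.columns})

# Assumed policy: a row is removed if any column flags it
rows_with_outlier = flags.any(axis=1)
clean_df = df[~rows_with_outlier]

print(rows_with_outlier[rows_with_outlier].index.tolist())  # [4, 8]
print(clean_df.shape)                                       # (7, 2)
```

Treating each column independently ignores relationships between columns, so a point that is unusual only in combination would not be caught this way.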

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function detect_outliers_iqr_v2 93.2% similar

    Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py
  • function detect_outliers_iqr 86.7% similar

    Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py
  • function remove_outliers 85.3% similar

    Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function remove_outliers_iqr 83.9% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py
  • function remove_outliers_iqr_v1 83.2% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py