
function remove_outliers_iqr_v1

Maturity: 42

Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py
Lines: 74 - 85
Complexity: simple

Purpose

This function identifies and removes statistical outliers from a specified column in a pandas DataFrame. It uses the IQR method, defining outliers as values that fall below Q1 - 3×IQR or above Q3 + 3×IQR. This is useful for data cleaning and preprocessing tasks where extreme values need to be filtered out to improve data quality and statistical analysis accuracy.
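The bounds described above can be worked through by hand on a small sample. The sketch below is illustrative only: the sample values are the same ones used in the usage example later on this page, and the variable names are chosen for this example rather than taken from analysis.py.

```python
import pandas as pd

# Sample values; 100 is the obvious extreme point.
values = pd.Series([10, 12, 13, 14, 15, 100, 11, 13, 14, 12])

q1 = values.quantile(0.25)   # first quartile
q3 = values.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range

# 3×IQR fences, as used by the documented function
lower_bound = q1 - 3 * iqr
upper_bound = q3 + 3 * iqr

# Values strictly outside the fences are treated as outliers
flagged = values[(values < lower_bound) | (values > upper_bound)]

print(q1, q3, iqr, lower_bound, upper_bound)  # 12.0 14.0 2.0 6.0 20.0
print(flagged.tolist())                       # [100]
```

Note that the comparisons are strict, so a value exactly equal to a bound is kept.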

Source Code

def remove_outliers_iqr(data, column):
    """Remove outliers using IQR method"""
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    
    outliers_mask = (data[column] < lower_bound) | (data[column] > upper_bound)
    n_outliers = outliers_mask.sum()
    
    return data[~outliers_mask], n_outliers

Parameters

Name    Type  Default  Kind
data    -     -        positional_or_keyword
column  -     -        positional_or_keyword

Parameter Details

data: A pandas DataFrame containing the data to be processed. Must be a valid DataFrame object with at least one column.

column: String or column identifier specifying which column in the DataFrame to analyze for outliers. The column must exist in the DataFrame and contain numeric data suitable for quantile calculations.
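The function itself performs no validation of these requirements. A caller who wants to check them up front could use something like the following sketch. The helper name check_outlier_column is hypothetical and not part of the documented code; it also reports missing values, because comparisons against NaN evaluate to False, so rows with NaN in the column are never flagged as outliers.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_outlier_column(data, column):
    """Hypothetical pre-check; not part of the documented function.

    Returns the number of missing values in the column.
    """
    if column not in data.columns:
        raise KeyError(f"column {column!r} not found in DataFrame")
    if not is_numeric_dtype(data[column]):
        raise TypeError(f"column {column!r} is not numeric")
    return int(data[column].isna().sum())

df = pd.DataFrame({
    "values": [10.0, 12.0, np.nan, 14.0],
    "label": ["a", "b", "c", "d"],
})

n_missing = check_outlier_column(df, "values")
print(n_missing)

# NaN compares as False on both sides, so the NaN row is not flagged
mask = (df["values"] < 0) | (df["values"] > 100)
print(mask.tolist())
```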

Return Value

Returns a tuple of two elements: (1) a new pandas DataFrame containing every row whose value in the specified column was not flagged as an outlier, with all columns and the original index labels retained, and (2) the number of rows removed. The count is the sum of the boolean outlier mask, so it is a NumPy integer rather than a built-in int.
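A short sketch of how the two return values behave in practice. The function body is repeated from the source so the example runs on its own. The int() conversion and reset_index() call are optional conveniences suggested here, not something the original code does.

```python
import pandas as pd

def remove_outliers_iqr(data, column):
    """Remove outliers using IQR method (copied from the source above)."""
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers_mask = (data[column] < lower_bound) | (data[column] > upper_bound)
    n_outliers = outliers_mask.sum()
    return data[~outliers_mask], n_outliers

df = pd.DataFrame({"values": [10, 12, 13, 14, 15, 100, 11, 13, 14, 12]})
cleaned, n_raw = remove_outliers_iqr(df, "values")

# The input DataFrame keeps all of its rows; only the result is filtered
print(len(df), len(cleaned))

# Index labels of surviving rows are unchanged, so label 5 is simply absent
print(cleaned.index.tolist())

# Optional: convert the count to a built-in int (e.g. before JSON serialization)
n_outliers = int(n_raw)

# Optional: renumber the rows 0..n-1 if a gap-free index is needed
cleaned_reset = cleaned.reset_index(drop=True)
```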

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

def remove_outliers_iqr(data, column):
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers_mask = (data[column] < lower_bound) | (data[column] > upper_bound)
    n_outliers = outliers_mask.sum()
    return data[~outliers_mask], n_outliers

# Example usage
df = pd.DataFrame({
    'values': [10, 12, 13, 14, 15, 100, 11, 13, 14, 12],
    'category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
})

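# With this data Q1 = 12 and Q3 = 14, so the bounds are 6 and 20;
# only the row with the value 100 falls outside them.
cleaned_df, num_outliers = remove_outliers_iqr(df, 'values')
print(f'Removed {num_outliers} outliers')  # Removed 1 outliers
print(cleaned_df)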

Best Practices

  • Ensure the specified column contains numeric data before calling this function to avoid errors
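  • The function uses a 3×IQR multiplier, which flags only extreme values and therefore removes fewer rows than the common 1.5×IQR rule; the multiplier is hard-coded, so edit it in the code if different sensitivity is needed
  • The function does not modify the input DataFrame; it returns a filtered result, so the original data is preserved. If you plan to assign into the returned DataFrame, consider calling .copy() on it first, since some pandas versions emit a SettingWithCopyWarning otherwise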
  • Check the n_outliers return value to understand how much data was removed
  • This method assumes a roughly symmetric distribution; for highly skewed data, consider alternative outlier detection methods
  • The function removes entire rows where outliers are detected in the specified column, affecting all columns in the DataFrame
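As one way to act on the multiplier advice above without editing the function each time, the threshold can be exposed as a parameter. This is a sketch under stated assumptions: the name remove_outliers_iqr_k and the parameter k are hypothetical, and the .copy() and int() calls are choices made for this example rather than behavior of the original code.

```python
import pandas as pd

def remove_outliers_iqr_k(data, column, k=3.0):
    """Hypothetical variant of remove_outliers_iqr with a configurable multiplier k."""
    q1 = data[column].quantile(0.25)
    q3 = data[column].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    mask = (data[column] < lower_bound) | (data[column] > upper_bound)
    # .copy() returns an independent DataFrame; int() gives a built-in integer
    return data[~mask].copy(), int(mask.sum())

df = pd.DataFrame({"values": [10, 12, 13, 14, 15, 100, 11, 13, 14, 12, 19]})

# Default 3×IQR: bounds are 4.5 and 22.0, so only 100 is removed
cleaned_3, n_3 = remove_outliers_iqr_k(df, "values")

# 1.5×IQR: bounds are 8.25 and 18.25, so 19 and 100 are both removed
cleaned_15, n_15 = remove_outliers_iqr_k(df, "values", k=1.5)

print(n_3, n_15)  # 1 2
```

Comparing the counts for two values of k is a quick way to see how much data a stricter threshold would discard before committing to it.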

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function remove_outliers_iqr 97.3% similar

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py
  • function remove_outliers 95.8% similar

    Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
  • function detect_outliers_iqr_v2 87.7% similar

    Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py
  • function detect_outliers_iqr 86.3% similar

    Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py
  • function detect_outliers_iqr_v1 83.2% similar

    Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py