remove_outliers_iqr_v1 - Code Extractor

function remove_outliers_iqr_v1

Maturity: 42

Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py

Lines:
74 - 85

Complexity:
simple

Purpose

This function identifies and removes statistical outliers from a specified column in a pandas DataFrame. It uses the IQR method, defining outliers as values that fall below Q1 - 3×IQR or above Q3 + 3×IQR. This is useful for data cleaning and preprocessing tasks where extreme values need to be filtered out to improve data quality and statistical analysis accuracy.
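The bounds described above can be worked through by hand on a small sample. The sketch below uses hypothetical data chosen only to illustrate the arithmetic, and relies on the default linear interpolation of pandas' `quantile`.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the arithmetic.
values = pd.Series([10, 12, 13, 14, 15, 100, 11, 13, 14, 12])

q1 = values.quantile(0.25)   # first quartile
q3 = values.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range

lower_bound = q1 - 3 * iqr
upper_bound = q3 + 3 * iqr

# Values strictly below lower_bound or strictly above upper_bound are outliers.
flagged = values[(values < lower_bound) | (values > upper_bound)]

print(q1, q3, iqr)                # 12.0 14.0 2.0
print(lower_bound, upper_bound)   # 6.0 20.0
print(flagged.tolist())           # [100]
```

Because the comparisons are strict, a value exactly equal to a bound is kept.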

Source Code

def remove_outliers_iqr(data, column):
    """Remove outliers using IQR method"""
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    
    outliers_mask = (data[column] < lower_bound) | (data[column] > upper_bound)
    n_outliers = outliers_mask.sum()
    
    return data[~outliers_mask], n_outliers

Parameters

Name	Type	Default	Kind
`data`	-	-	positional_or_keyword
`column`	-	-	positional_or_keyword

Parameter Details

data: A pandas DataFrame containing the data to be processed. Must be a valid DataFrame object with at least one column.

column: String or column identifier specifying which column in the DataFrame to analyze for outliers. The column must exist in the DataFrame and contain numeric data suitable for quantile calculations.
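The function itself does not validate its arguments, so the requirements above are the caller's responsibility. One possible pre-flight check is sketched below; `check_outlier_inputs` is a hypothetical helper and not part of the original code.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_outlier_inputs(data, column):
    """Hypothetical pre-flight check; not part of the original function."""
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data must be a pandas DataFrame")
    if column not in data.columns:
        raise KeyError(f"column {column!r} not found in DataFrame")
    if not is_numeric_dtype(data[column]):
        raise TypeError(f"column {column!r} must be numeric")


df = pd.DataFrame({"values": [1.0, 2.0, 3.0], "category": ["A", "B", "A"]})

check_outlier_inputs(df, "values")   # passes silently

try:
    check_outlier_inputs(df, "category")
except TypeError as exc:
    print(exc)   # the 'category' column holds strings, so it is rejected
```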

Return Value

Returns a tuple of two elements: (1) a new pandas DataFrame containing every row of the input whose value in the specified column lies within the bounds, with all columns and the original index labels preserved, and (2) the number of outlier rows removed. The count is produced by summing the boolean mask, so it is a NumPy integer rather than a built-in int.
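A short sketch of how the returned tuple might be handled. The function body is repeated (with shortened variable names) so the snippet runs on its own; the `int()` conversion and `reset_index` call are optional conveniences assumed here, not something the original code does.

```python
import pandas as pd


def remove_outliers_iqr(data, column):
    """Same logic as the documented function, repeated so the snippet runs alone."""
    q1 = data[column].quantile(0.25)
    q3 = data[column].quantile(0.75)
    iqr = q3 - q1
    mask = (data[column] < q1 - 3 * iqr) | (data[column] > q3 + 3 * iqr)
    return data[~mask], mask.sum()


df = pd.DataFrame({"values": [10, 12, 13, 14, 15, 100, 11, 13, 14, 12]})
cleaned, n_removed = remove_outliers_iqr(df, "values")

n_removed = int(n_removed)                 # convert the NumPy integer to a plain int
cleaned = cleaned.reset_index(drop=True)   # optional: renumber rows from 0

print(len(df), len(cleaned), n_removed)    # 10 9 1
```

The input DataFrame still has all 10 rows afterwards; only the returned object is filtered.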

Dependencies

pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

def remove_outliers_iqr(data, column):
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 3 * IQR
    upper_bound = Q3 + 3 * IQR
    outliers_mask = (data[column] < lower_bound) | (data[column] > upper_bound)
    n_outliers = outliers_mask.sum()
    return data[~outliers_mask], n_outliers

# Example usage
df = pd.DataFrame({
    'values': [10, 12, 13, 14, 15, 100, 11, 13, 14, 12],
    'category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
})

cleaned_df, num_outliers = remove_outliers_iqr(df, 'values')
print(f'Removed {num_outliers} outliers')  # Removed 1 outliers (the row with value 100)
print(cleaned_df)

Best Practices

Ensure the specified column contains numeric data before calling this function; the quantile and comparison steps will otherwise fail
The function uses a 3×IQR multiplier, which flags fewer points than the conventional 1.5×IQR rule; the multiplier is hard-coded, so edit it in the code if a different sensitivity is needed
The function does not modify the input DataFrame; it returns a filtered result. Call .copy() on the returned DataFrame if you intend to modify it afterwards
Check the n_outliers return value to understand how much data was removed
The bounds sit at the same distance below Q1 and above Q3, so the method works best on roughly symmetric distributions; for highly skewed data, consider alternative outlier detection methods
The function removes entire rows where outliers are detected in the specified column, so values in all other columns of those rows are dropped as well
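Since the multiplier is hard-coded in the original, one way to make the sensitivity adjustable is to expose it as a parameter. The variant below is a sketch only: the name `remove_outliers_iqr_k`, the parameter `k`, and the sample data are hypothetical. It also returns an explicit copy and a plain int, which the original does not do.

```python
import pandas as pd


def remove_outliers_iqr_k(data, column, k=3.0):
    """Hypothetical variant with a configurable multiplier k (not the original API)."""
    q1 = data[column].quantile(0.25)
    q3 = data[column].quantile(0.75)
    iqr = q3 - q1
    mask = (data[column] < q1 - k * iqr) | (data[column] > q3 + k * iqr)
    # Return an explicit copy so later edits never touch the input data.
    return data[~mask].copy(), int(mask.sum())


df = pd.DataFrame({"values": [10, 12, 13, 14, 15, 19, 100, 11, 13, 14, 12]})

_, n_wide = remove_outliers_iqr_k(df, "values")            # 3×IQR bounds
_, n_narrow = remove_outliers_iqr_k(df, "values", k=1.5)   # 1.5×IQR bounds

print(n_wide, n_narrow)   # 1 2
```

With this sample, the 3×IQR bounds remove only 100, while the 1.5×IQR bounds also remove 19.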

Similar Components

AI-powered semantic similarity - components with related functionality:

function remove_outliers_iqr 97.3% similar

Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py
function remove_outliers 95.8% similar

Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
function detect_outliers_iqr_v2 87.7% similar

Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py
function detect_outliers_iqr 86.3% similar

Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py
function detect_outliers_iqr_v1 83.2% similar

Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py
