remove_outliers - Code Extractor

function remove_outliers

Maturity: 29

Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py

Lines:
26 - 32

Complexity:
simple

Purpose

This function identifies and filters out statistical outliers in one column of a dataset using the IQR method, a rule that is robust to extreme values because it relies on quartiles rather than the mean and standard deviation. It calculates the first quartile (Q1), the third quartile (Q3), and IQR = Q3 - Q1, then keeps only the rows whose value lies between Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, inclusive. This is commonly used in data preprocessing and exploratory data analysis to remove extreme values that may skew summary statistics or model fits.
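To make the arithmetic concrete, here is a small worked sketch of the bounds calculation on a standalone Series. The sample values are made up for illustration, and the quartiles rely on pandas' default linear interpolation in `quantile`.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the calculation.
values = pd.Series([10, 12, 13, 14, 15, 16, 17, 18, 100, 200])

q1 = values.quantile(0.25)        # 13.25 with default linear interpolation
q3 = values.quantile(0.75)        # 17.75
iqr = q3 - q1                     # 4.5
lower_bound = q1 - 1.5 * iqr      # 6.5
upper_bound = q3 + 1.5 * iqr      # 24.5

# Values outside [lower_bound, upper_bound] are the ones the function would drop.
outliers = values[(values < lower_bound) | (values > upper_bound)]
print(outliers.tolist())          # [100, 200]
```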

Source Code

def remove_outliers(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

Parameters

Name	Type	Default	Kind
`df`	-	-	positional_or_keyword
`column`	-	-	positional_or_keyword

Parameter Details

df: A pandas DataFrame containing the data to be filtered. Must be a valid DataFrame object with at least one numeric column.

column: String representing the name of the column in the DataFrame to check for outliers. The column must exist in the DataFrame and should contain numeric data (int or float) for quantile calculations to work properly.
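The function itself does not check these preconditions. One way to enforce them is a thin wrapper such as the sketch below; `remove_outliers_checked` is a hypothetical name that is not part of the source file, and the choice of exceptions is an assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def remove_outliers(df, column):
    # Same logic as the documented function, repeated so the sketch runs on its own.
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]


def remove_outliers_checked(df, column):
    """Hypothetical wrapper: validate the inputs, then delegate to remove_outliers."""
    if column not in df.columns:
        raise KeyError(f"Column {column!r} not found in DataFrame")
    if not is_numeric_dtype(df[column]):
        raise TypeError(f"Column {column!r} must be numeric, got {df[column].dtype}")
    return remove_outliers(df, column)


sample = pd.DataFrame({"values": [10, 12, 13, 14, 100], "category": list("ABABA")})
cleaned = remove_outliers_checked(sample, "values")

# A non-numeric column is rejected with a clear message before any quantile call.
try:
    remove_outliers_checked(sample, "category")
except TypeError as exc:
    print(f"Rejected: {exc}")
```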

Return Value

Returns a new pandas DataFrame containing only the rows whose value in the specified column lies between lower_bound and upper_bound, inclusive. The result has the same columns as the input and keeps the original index labels of the surviving rows; the index is not reset. Rows where the column is NaN are also dropped, because comparisons against NaN evaluate to False. If no rows are excluded, the result has the same contents as the input but is still a separate DataFrame object.
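The behaviour of the return value can be checked with a short sketch. The data below is invented for illustration; the function body is repeated from the source so the example is self-contained.

```python
import numpy as np
import pandas as pd


def remove_outliers(df, column):
    # Same logic as the documented function.
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]


# Hypothetical data: one extreme value (500.0) and one missing value.
df = pd.DataFrame({"values": [10.0, 500.0, 12.0, np.nan, 11.0, 13.0]})
cleaned = remove_outliers(df, "values")

print(len(df))                  # 6: the input is left untouched
print(cleaned.index.tolist())   # [0, 2, 4, 5]: original labels, outlier and NaN rows gone

# Even when nothing is filtered out, the result is a distinct object.
no_outliers = pd.DataFrame({"values": [1.0, 2.0, 3.0]})
same_rows = remove_outliers(no_outliers, "values")
print(same_rows is no_outliers)       # False
print(same_rows.equals(no_outliers))  # True
```

If a contiguous index is needed afterwards, calling `reset_index(drop=True)` on the result is one option.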

Dependencies

pandas

Required Imports

import pandas as pd

Usage Example

import pandas as pd

# Create sample data with outliers
data = {'values': [10, 12, 13, 14, 15, 16, 17, 18, 100, 200],
        'category': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']}
df = pd.DataFrame(data)

# Remove outliers from 'values' column
df_cleaned = remove_outliers(df, 'values')

print(f"Original shape: {df.shape}")
print(f"Cleaned shape: {df_cleaned.shape}")
print(df_cleaned)

Best Practices

Ensure the specified column contains numeric data before calling this function to avoid errors
Be aware that the returned DataFrame may be significantly smaller than the input if many values fall outside the bounds; the input DataFrame itself is not modified
The 1.5 * IQR multiplier is a standard threshold, but consider creating a parameterized version if different sensitivity levels are needed
Always inspect the data before and after outlier removal to understand the impact on your dataset
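The bounds extend the same distance (1.5 * IQR) beyond each quartile, so the rule works best on roughly symmetric distributions; for highly skewed data it can flag many legitimate values in the long tail, so consider alternative outlier detection methods
The function returns a new DataFrame rather than filtering in place, so the original DataFrame remains unchanged unless you reassign the result to the same variable
Consider handling missing values (NaN) in the column before applying this function: pandas skips NaN when computing the quantiles, but rows with NaN in the column fail both comparisons and are silently dropped from the result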
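A parameterized variant along the lines suggested above might look like the following sketch. `remove_outliers_k`, the `k` multiplier, and the `keep_nan` flag are hypothetical names introduced here for illustration; none of them exist in the source file.

```python
import numpy as np
import pandas as pd


def remove_outliers_k(df, column, k=1.5, keep_nan=False):
    """Hypothetical variant of remove_outliers with a configurable IQR multiplier.

    k: how many IQRs beyond the quartiles a value may lie before it is dropped.
    keep_nan: if True, rows with NaN in `column` are kept instead of dropped.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = (df[column] >= q1 - k * iqr) & (df[column] <= q3 + k * iqr)
    if keep_nan:
        mask = mask | df[column].isna()
    return df[mask]


# Invented data: 28 is a mild outlier, 200 an extreme one, plus one missing value.
data = pd.DataFrame({"values": [10, 12, 13, 14, 15, 16, 17, 18, 28, 200, np.nan]})

strict = remove_outliers_k(data, "values")                    # drops 28, 200 and the NaN row
lenient = remove_outliers_k(data, "values", k=3.0)            # drops 200 and the NaN row
with_nan = remove_outliers_k(data, "values", keep_nan=True)   # drops 28 and 200 only

print(len(strict), len(lenient), len(with_nan))  # 8 9 9
```

With the default `k=1.5` and `keep_nan=False`, the sketch behaves like the documented function; `k=3.0` corresponds to the wider threshold used by several of the similar components listed below.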

Similar Components

AI-powered semantic similarity - components with related functionality:

function remove_outliers_iqr_v1 95.8% similar

Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py

function remove_outliers_iqr 95.1% similar

Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py

function detect_outliers_iqr_v2 88.5% similar

Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py

function detect_outliers_iqr 86.8% similar

Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py

function detect_outliers_iqr_v1 85.3% similar

Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py
