detect_outliers_zscore - Code Extractor

function detect_outliers_zscore

Maturity: 40

Detects outliers in numerical data using the Z-score statistical method, identifying data points that deviate significantly from the mean.

File:
/tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py

Lines:
86 - 93

Complexity:
simple

Purpose

This function identifies outliers in a dataset by calculating a Z-score for each data point (the number of standard deviations it lies from the mean) and flagging points whose absolute Z-score exceeds a specified threshold. It's commonly used in data cleaning, anomaly detection, and statistical analysis to identify unusual observations that may warrant further investigation or removal. The Z-score method assumes the data follows a normal distribution and is most effective for univariate outlier detection.
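To make the Z-score concrete, the following minimal sketch computes it by hand and compares the result with scipy.stats.zscore, the call used inside the function. The sample values are illustrative only and do not come from the original script.

```python
import numpy as np
from scipy import stats

# Illustrative sample (not from the original script): mean 5, population std 2.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Z-score by hand: subtract the mean, then divide by the population
# standard deviation (ddof=0, which is the default in scipy.stats.zscore).
mean = data.mean()
std = data.std(ddof=0)
manual_z = (data - mean) / std

# The same quantity as computed inside detect_outliers_zscore.
scipy_z = stats.zscore(data)

print(manual_z)
print(np.allclose(manual_z, scipy_z))
```

The function then takes the absolute value of these scores and compares them with the threshold.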

Source Code

def detect_outliers_zscore(data, threshold=3):
    """
    Detect outliers using Z-score method
    Returns: indices of outliers
    """
    z_scores = np.abs(stats.zscore(data))
    outliers = z_scores > threshold
    return outliers

Parameters

Name	Type	Default	Kind
`data`	-	-	positional_or_keyword
`threshold`	-	3	positional_or_keyword

Parameter Details

data: A 1D array-like object (list, numpy array, pandas Series) containing numerical values to analyze for outliers. Should contain numeric data without NaN values for accurate Z-score calculation. The data should ideally follow a normal distribution for best results.
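If the input may contain NaN or infinite values, one option is to compute Z-scores over the finite entries only. The wrapper below is a hypothetical sketch, not part of the original script; the name detect_outliers_zscore_finite and the choice to leave non-finite entries unflagged are assumptions.

```python
import numpy as np
from scipy import stats

def detect_outliers_zscore_finite(data, threshold=3):
    """Hypothetical wrapper: ignore NaN/inf when computing Z-scores."""
    values = np.asarray(data, dtype=float)
    finite = np.isfinite(values)
    mask = np.zeros(values.shape, dtype=bool)
    # Z-scores are computed over the finite entries only; non-finite
    # entries stay False (not flagged) in the returned mask.
    if finite.sum() > 1:
        z = np.abs(stats.zscore(values[finite]))
        mask[finite] = z > threshold
    return mask

# Illustrative data with one missing value and one extreme value.
raw = [10, 12, np.nan, 11, 13, 12, 11, 12, 14, 13, 12, 11, 100]
mask = detect_outliers_zscore_finite(raw, threshold=3)
print(np.where(mask)[0])
```

Whether missing values should be dropped, imputed, or reported separately depends on the analysis; this sketch only keeps them from contaminating the mean and standard deviation.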

threshold: A positive numeric value (default=3) giving the number of standard deviations from the mean beyond which a data point is flagged as an outlier. Common values are 2, 2.5, and 3; under a normal distribution, about 95% of values fall within 2 standard deviations of the mean and about 99.7% within 3. Higher thresholds flag fewer points.
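As a rough guide to choosing a threshold, the sketch below computes the share of a normal distribution that lies within k standard deviations of the mean, and hence the share a threshold of k would be expected to flag if the data were exactly normal. The helper name coverage is illustrative.

```python
from scipy.stats import norm

def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return norm.cdf(k) - norm.cdf(-k)

# Expected share inside the band and share flagged, per threshold.
for k in (2, 2.5, 3):
    inside = coverage(k)
    print(f"threshold={k}: {inside:.4f} inside, {1 - inside:.4f} flagged")
```

Real data is rarely exactly normal, so these figures are a starting point rather than a guarantee.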

Return Value

Returns a boolean numpy array of the same length as the input data, where True indicates an outlier at that index position and False indicates a normal value. Despite the docstring saying 'indices of outliers', the function actually returns a boolean mask that can be used to filter or index the original data.

Dependencies

numpy
scipy

Required Imports

import numpy as np
from scipy import stats

Usage Example

import numpy as np
from scipy import stats

def detect_outliers_zscore(data, threshold=3):
    z_scores = np.abs(stats.zscore(data))
    outliers = z_scores > threshold
    return outliers

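# Example usage
# Note: with n values and the population standard deviation used by
# stats.zscore, |Z| cannot exceed sqrt(n - 1), which is 3 for n = 10. The
# value 100 scores about 2.997 here, so threshold=3 would flag nothing.
data = [10, 12, 11, 13, 12, 100, 11, 12, 14, 13]
outlier_mask = detect_outliers_zscore(data, threshold=2.5)
print(f"Outlier mask: {outlier_mask}")
print(f"Outlier values: {np.array(data)[outlier_mask]}")
print(f"Outlier indices: {np.where(outlier_mask)[0]}")

# With pandas Series
import pandas as pd
df = pd.DataFrame({'values': data})
df['is_outlier'] = detect_outliers_zscore(df['values'], threshold=2.5)
print(df)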
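One caveat for short inputs such as the ten-value list above: when the standard deviation is computed with ddof=0, as scipy.stats.zscore does by default, no absolute Z-score in a sample of n values can exceed sqrt(n - 1) (Samuelson's inequality). The sketch below illustrates this; the helper name max_abs_zscore is hypothetical.

```python
import numpy as np
from scipy import stats

def max_abs_zscore(n):
    """Upper bound on |Z| for n values when the standard deviation
    uses ddof=0 (Samuelson's inequality): sqrt(n - 1)."""
    return float(np.sqrt(n - 1))

# Ten values with one extreme point: the largest |Z| stays below the
# bound of 3 no matter how far out the extreme value is pushed.
peaks = {}
for extreme in (100, 1_000, 10_000):
    sample = [10, 12, 11, 13, 12, extreme, 11, 12, 14, 13]
    peaks[extreme] = float(np.abs(stats.zscore(sample)).max())
    print(extreme, round(peaks[extreme], 4), max_abs_zscore(10))
```

With ten or fewer values, a threshold of 3 therefore cannot flag anything; a lower threshold or a method that does not depend on the mean and standard deviation is needed.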

Best Practices

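Ensure input data does not contain NaN or infinite values: a single non-finite value turns every Z-score into NaN under scipy.stats.zscore's default nan_policy='propagate', and since NaN > threshold is False the function then silently flags nothing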
The Z-score method assumes data follows a normal distribution; consider using alternative methods (IQR, modified Z-score) for skewed distributions
A threshold of 3 is a common default (about 99.7% of normally distributed values fall within 3 standard deviations of the mean), but adjust based on your domain and tolerance for false positives
The function returns a boolean mask, not indices; use np.where(outliers)[0] to get actual indices
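For small datasets (n < 30), the Z-score method may not be reliable: an outlier inflates the mean and standard deviation it is measured against, and with the default population standard deviation no |Z| can exceed sqrt(n - 1), so a threshold of 3 can never flag a point when n <= 10; consider using other outlier detection methods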
Consider handling outliers appropriately after detection (removal, transformation, or investigation) rather than automatic deletion
For multivariate outlier detection, consider using Mahalanobis distance or other multivariate methods instead
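As an illustration of the modified Z-score mentioned above, the following sketch replaces the mean and standard deviation with the median and the median absolute deviation (MAD). It is not part of the original script: the function name is hypothetical, and the constant 0.6745 and cutoff 3.5 are the commonly cited values attributed to Iglewicz and Hoaglin, used here as assumptions.

```python
import numpy as np

def detect_outliers_modified_zscore(data, threshold=3.5):
    """Hypothetical robust variant: median/MAD-based modified Z-score."""
    values = np.asarray(data, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch flags nothing in that case.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to an ordinary
    # Z-score when the data is normally distributed.
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

data = [10, 12, 11, 13, 12, 100, 11, 12, 14, 13]
mask = detect_outliers_modified_zscore(data)
print(np.where(mask)[0])
```

Because the median and MAD are barely affected by a single extreme value, this variant flags the value 100 in a ten-element sample where the plain Z-score with a threshold of 3 cannot.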

Similar Components

AI-powered semantic similarity - components with related functionality:

function detect_outliers_iqr_v2 65.7% similar

Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py

function detect_outliers_iqr_v1 65.2% similar

Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py

function detect_outliers_iqr 59.8% similar

Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py

function remove_outliers_iqr 58.1% similar

Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py

function remove_outliers 57.3% similar

Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.
From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py
