function clean_for_json
Recursively traverses and sanitizes Python data structures (dicts, lists, tuples, numpy arrays) to ensure all values are JSON-serializable by converting numpy types, handling NaN/Inf values, and normalizing data types.
/tf/active/vicechatdev/vice_ai/smartstat_scripts/f0b81d95-24d9-418a-8d9f-1b241684e64c/project_1/analysis.py
472 - 494
moderate
Purpose
This function prepares nested Python data structures containing numpy arrays, numpy scalars, pandas missing-value markers (such as pd.NA), and plain Python numbers for JSON serialization. It handles edge cases such as NaN and Infinity, numpy-specific numeric types, and arbitrarily nested dicts, lists, and tuples. Common use cases include preparing data analysis results for API responses, saving computation results to JSON files, and transmitting scientific computing data over web services.
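To make the motivation concrete, the snippet below is a minimal sketch of what the standard `json` module does with uncleaned values. The `result` dict and its keys are hypothetical and serve only as an illustration.

```python
import json

import numpy as np

# Hypothetical analysis result; the keys and values are illustrative only.
result = {"count": np.int64(3), "mean": float("nan")}

# numpy scalars such as np.int64 are rejected by the default encoder.
try:
    json.dumps(result)
    error = None
except TypeError as exc:
    error = str(exc)

# Python floats do serialize, but NaN is written as the bare token NaN,
# which strict JSON parsers reject.
nan_text = json.dumps({"mean": float("nan")})

# With allow_nan=False the encoder raises instead of emitting that token.
try:
    json.dumps({"mean": float("nan")}, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(error)
print(nan_text)
print(strict_error)
```

Both failure modes go away once the values are converted to plain Python types and non-finite floats are replaced with None, which is what `clean_for_json` does.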
Source Code
def clean_for_json(obj):
    """Recursively clean data structure for JSON serialization"""
    if isinstance(obj, dict):
        return {str(k): clean_for_json(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [clean_for_json(item) for item in obj]
    elif isinstance(obj, tuple):
        return clean_for_json(list(obj))
    elif isinstance(obj, (np.integer, np.int64, np.int32)):
        return int(obj)
    elif isinstance(obj, (np.floating, np.float64, np.float32)):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return clean_for_json(obj.tolist())
    elif isinstance(obj, float):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return obj
    elif pd.isna(obj):
        return None
    return obj
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| obj | - | - | positional_or_keyword |
Parameter Details
obj: Any Python object to be cleaned for JSON serialization. Can be a primitive type (int, float, str), collection (dict, list, tuple), numpy array (np.ndarray), numpy scalar (np.integer, np.floating), pandas NA value, or nested combinations of these types. The function recursively processes nested structures.
Return Value
Returns a cleaned copy of the input. Dictionaries have string keys and cleaned values; lists and tuples become lists with cleaned elements; numpy arrays are converted to (possibly nested) lists; numpy integer and floating scalars become Python int and float; NaN and positive or negative Infinity become None; pandas missing values (None, pd.NA, pd.NaT) become None; all other values are returned unchanged. The output mirrors the nesting of the input, except that tuples and arrays are replaced by lists. Because unrecognized values pass through unchanged, the result is JSON-serializable only if those values already are.
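The conversion rules above can be checked one at a time. The following is a small sketch in which the function body is copied from the Source Code section so that the snippet runs on its own; the sample inputs are made up for illustration.

```python
import math

import numpy as np
import pandas as pd


def clean_for_json(obj):
    # Copied from the Source Code section so the snippet is self-contained.
    if isinstance(obj, dict):
        return {str(k): clean_for_json(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [clean_for_json(item) for item in obj]
    elif isinstance(obj, tuple):
        return clean_for_json(list(obj))
    elif isinstance(obj, (np.integer, np.int64, np.int32)):
        return int(obj)
    elif isinstance(obj, (np.floating, np.float64, np.float32)):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return clean_for_json(obj.tolist())
    elif isinstance(obj, float):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return obj
    elif pd.isna(obj):
        return None
    return obj


# A 2-D array becomes nested lists; NaN and Inf inside it become None.
matrix = clean_for_json(np.array([[1.5, np.nan], [np.inf, 2.0]]))

# Non-string keys are converted with str().
keyed = clean_for_json({2020: np.int32(7), (1, 2): "pair"})

# Missing-value markers collapse to None; ordinary strings pass through.
missing = clean_for_json([None, pd.NA, pd.NaT, "abc"])

# float32 values are widened to Python floats, which exposes the float32
# rounding error in the result.
widened = clean_for_json(np.float32(0.1))

print(matrix)
print(keyed)
print(missing)
print(widened)
```

Note the last case: widening a float32 does not round it back to the decimal literal it was created from, so the serialized number can carry more digits than expected.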
Dependencies
- numpy
- pandas
- math
Required Imports
import numpy as np
import pandas as pd
import math
Usage Example
import numpy as np
import pandas as pd
import math
import json

def clean_for_json(obj):
    if isinstance(obj, dict):
        return {str(k): clean_for_json(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [clean_for_json(item) for item in obj]
    elif isinstance(obj, tuple):
        return clean_for_json(list(obj))
    elif isinstance(obj, (np.integer, np.int64, np.int32)):
        return int(obj)
    elif isinstance(obj, (np.floating, np.float64, np.float32)):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return clean_for_json(obj.tolist())
    elif isinstance(obj, float):
        if math.isnan(obj) or math.isinf(obj):
            return None
        return obj
    elif pd.isna(obj):
        return None
    return obj

# Example usage
data = {
    'array': np.array([1, 2, 3]),
    'float': np.float64(3.14),
    'nan': float('nan'),
    'inf': float('inf'),
    'nested': {
        'tuple': (1, 2, 3),
        'pd_na': pd.NA
    }
}

cleaned = clean_for_json(data)
print(json.dumps(cleaned, indent=2))
# Output (with indent=2, json.dumps puts each list element on its own line):
# {
#   "array": [
#     1,
#     2,
#     3
#   ],
#   "float": 3.14,
#   "nan": null,
#   "inf": null,
#   "nested": {
#     "tuple": [
#       1,
#       2,
#       3
#     ],
#     "pd_na": null
#   }
# }
Best Practices
- Always use this function before calling json.dumps() on data structures containing numpy or pandas objects to avoid TypeError exceptions
- Be aware that NaN and Infinity values are converted to None (null in JSON), which may affect downstream data analysis
- Dictionary keys are converted to strings, so numeric keys lose their original type, and distinct keys that share a string form (for example 1 and "1") collapse into a single entry
- Tuples are converted to lists in the output, losing the immutability property
- For large numpy arrays, consider the memory implications of converting to nested Python lists
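- The function does not handle custom objects or classes; these are returned unchanged and may still cause JSON serialization errors. The same applies to np.bool_, sets, and datetime values. Passing a pandas Series or DataFrame raises a ValueError, because pd.isna() returns an array-like result whose truth value is ambiguous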
- Consider validating the output with json.dumps() to ensure complete serializability if dealing with unknown data types
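The limitations listed above can be worked around in several ways. The sketch below shows one possibility. `clean_for_json_extended` is a hypothetical variant written for this illustration, not part of the documented module, and the sample `payload` is made up. It adds branches for np.bool_, pandas Series, and pandas DataFrame, then validates the result with `json.dumps(..., allow_nan=False)`.

```python
import json
import math

import numpy as np
import pandas as pd


def clean_for_json_extended(obj):
    """Hypothetical variant of clean_for_json with a few extra branches."""
    if isinstance(obj, dict):
        return {str(k): clean_for_json_extended(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [clean_for_json_extended(item) for item in obj]
    if isinstance(obj, (np.ndarray, pd.Series)):
        return clean_for_json_extended(obj.tolist())
    if isinstance(obj, pd.DataFrame):
        # Assumption: one dict per row is the desired JSON shape.
        return clean_for_json_extended(obj.to_dict(orient="records"))
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, (float, np.floating)):
        value = float(obj)
        return value if math.isfinite(value) else None
    # Only call pd.isna on scalars, so the result is always a plain bool.
    if pd.api.types.is_scalar(obj) and pd.isna(obj):
        return None
    return obj


# Illustrative payload containing values the documented function rejects
# or passes through unchanged.
payload = {
    "flag": np.bool_(True),
    "series": pd.Series([1.0, float("nan")]),
    "frame": pd.DataFrame({"x": [1, 2]}),
    1: "int key",
}

cleaned = clean_for_json_extended(payload)

# Validation step: raises TypeError or ValueError if anything unserializable
# or non-finite slipped through.
text = json.dumps(cleaned, allow_nan=False)

# Keys with the same string form still collapse; the later entry wins.
collided = clean_for_json_extended({1: "a", "1": "b"})

print(text)
print(collided)
```

Values such as datetime objects, sets, and custom classes still pass through unchanged in this sketch, so the final `json.dumps` call remains the actual check.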
Similar Components
AI-powered semantic similarity - components with related functionality:
- function clean_for_json_v7: 95.9% similar
- function clean_for_json_v12: 94.1% similar
- function clean_for_json_v15: 93.1% similar
- function clean_for_json_v8: 92.8% similar
- function clean_for_json_v13: 92.3% similar