function is_dataframe
Checks whether the supplied data object is a pandas DataFrame or a Dask DataFrame, importing dask.dataframe lazily and only when it is already loaded.
/tf/active/vicechatdev/patches/util.py
1477 - 1485
simple
Purpose
This utility function provides a safe way to determine whether an object is a DataFrame from either the pandas or Dask library. pandas is reached through the module-level name pd, while dask.dataframe is imported inside the function only if both it and pandas already appear in sys.modules, so calling the function never triggers a first-time Dask import. This is particularly useful in libraries that support multiple DataFrame implementations without requiring all dependencies to be installed.
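The sys.modules lookup described above can be sketched in isolation. This is a minimal illustration, not code from util.py: the helper name get_if_loaded and the made-up module name are assumptions introduced here.

```python
import sys


def get_if_loaded(module_name):
    """Return an already-imported module, or None (hypothetical helper)."""
    # sys.modules is the interpreter's cache of imported modules;
    # a dictionary lookup on it never triggers an import.
    return sys.modules.get(module_name)


# 'sys' is always loaded; a made-up name is absent, and looking it up imports nothing.
print(get_if_loaded("sys") is sys)
print(get_if_loaded("not_a_real_module_xyz"))
```

The same lookup applied to 'dask.dataframe' is what lets is_dataframe skip Dask entirely when nothing else in the process has imported it.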
Source Code
def is_dataframe(data):
    """
    Checks whether the supplied data is of DataFrame type.
    """
    dd = None
    if 'dask.dataframe' in sys.modules and 'pandas' in sys.modules:
        import dask.dataframe as dd
    return((pd is not None and isinstance(data, pd.DataFrame)) or
           (dd is not None and isinstance(data, dd.DataFrame)))
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| data | - | - | positional_or_keyword |
Parameter Details
data: Any Python object to be checked. Typically expected to be a DataFrame-like object, but can be any type. No type constraints are enforced at the parameter level.
Return Value
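Returns a boolean: True if data is an instance of pandas.DataFrame or dask.dataframe.DataFrame, False otherwise. The Dask check runs only when both dask.dataframe and pandas are already present in sys.modules. The pd is not None guard suggests the defining module may set pd to None when pandas is unavailable, in which case the pandas check is skipped as well.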
Dependencies
- sys
- pandas
- dask
Required Imports
import sys
import pandas as pd
Conditional/Optional Imports
These imports are only needed under specific conditions:
import dask.dataframe as dd
Condition: only if both dask.dataframe and pandas are already loaded in sys.modules
Usage Example
import sys
import pandas as pd

# Assuming the function is defined or imported
def is_dataframe(data):
    dd = None
    if 'dask.dataframe' in sys.modules and 'pandas' in sys.modules:
        import dask.dataframe as dd
    return((pd is not None and isinstance(data, pd.DataFrame)) or
           (dd is not None and isinstance(data, dd.DataFrame)))

# Example usage
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(is_dataframe(df))  # Output: True

my_list = [1, 2, 3]
print(is_dataframe(my_list))  # Output: False

# With Dask (if available)
try:
    import dask.dataframe as dd
    dask_df = dd.from_pandas(df, npartitions=2)
    print(is_dataframe(dask_df))  # Output: True
except ImportError:
    print('Dask not available')
Best Practices
- This function expects 'pd' to be a module-level name in the module where it is defined (for example via import pandas as pd); if that name is undefined there, the call raises a NameError regardless of what the caller has imported
- dask.dataframe is imported inside the function body, and only when it is already present in sys.modules, so the function does not raise an ImportError when dask is not installed
- Checking sys.modules first also means the function never triggers a first-time import of dask, which a try-except around the import would do whenever dask is installed
- This function is best used in libraries that want to support both pandas and dask without making dask a hard requirement
- The function does not check for DataFrame-like objects from other libraries (e.g., polars, cudf) - it only checks pandas and dask
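The dependency on a module-level 'pd' noted above can be avoided by resolving both libraries through sys.modules. The following is a sketch under that assumption, not the implementation in util.py; the name is_dataframe_standalone is hypothetical.

```python
import sys


def is_dataframe_standalone(data):
    """Hypothetical variant of is_dataframe that needs no module-level `pd`."""
    # Both lookups return None when the library has not been imported yet.
    pd = sys.modules.get("pandas")
    dd = sys.modules.get("dask.dataframe")
    if pd is not None and isinstance(data, pd.DataFrame):
        return True
    # Dask is consulted only when pandas is also loaded, mirroring the original condition.
    if pd is not None and dd is not None and isinstance(data, dd.DataFrame):
        return True
    return False


print(is_dataframe_standalone([1, 2, 3]))
```

This relies on the observation that an object can only be a pandas or Dask DataFrame if the corresponding library has already been imported somewhere in the process, so an absent sys.modules entry is enough to answer False.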
Similar Components
AI-powered semantic similarity - components with related functionality:
- function is_series 82.3% similar
- function is_dask_array 70.7% similar
- function is_cupy_array 62.2% similar
- function isdatetime 57.7% similar
- function load_dataset 52.2% similar