pydit.wrangling.blanks.check_blanks
- pydit.wrangling.blanks.check_blanks(obj, columns=None, include_zeroes=False, include_nullstrings_and_spaces=False, totals_only=True, silent=False)
By default, returns a summary dictionary with column names as keys and counts of blanks as values, for the selected columns (or for all columns if no column list is provided).
- If “totals_only” is False, it instead returns detailed information about the blanks:
the original/copied dataframe with:
one boolean column per input column, True when that record is blank in that column
a summary boolean column, True if any of the previous columns is True
See the missingno library (https://github.com/ResidentMario/missingno) for a nice visualization of missing data (it seems to come with Anaconda).
- Parameters:
obj (DataFrame or Series) – The dataframe or series to check for blanks
columns (list, optional, default None) – The columns to check for blanks. If None, all columns are checked.
include_zeroes (bool, optional, default False) – If True, zeroes are also counted as blanks
include_nullstrings_and_spaces (bool, optional, default False) – If True, null strings and spaces are also counted as blanks
totals_only (bool, optional, default True) – If True, only the total counts are returned
silent (bool, optional, default False) – If True, the logging level is set to critical, i.e. no info messages are shown
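The include_zeroes and include_nullstrings_and_spaces flags can be read as widening what counts as a blank. The following is a minimal pandas-only sketch of that idea, not pydit's actual implementation. The helper names `count_blanks_sketch` and `_is_zero` are hypothetical, and the sketch assumes that "null strings and spaces" means strings that are empty or whitespace-only and that only numeric zeroes (not the string '0') count as zeroes.

```python
import numbers

import pandas as pd


def _is_zero(value):
    # Assumption: only numeric zeroes count; booleans and strings do not.
    return (
        isinstance(value, numbers.Number)
        and not isinstance(value, bool)
        and value == 0
    )


def count_blanks_sketch(df, columns=None, include_zeroes=False,
                        include_nullstrings_and_spaces=False):
    """Hypothetical helper: count blanks per column with plain pandas."""
    cols = list(df.columns) if columns is None else list(columns)
    counts = {}
    for col in cols:
        series = df[col]
        mask = series.isna()  # NaN / None always count as blank
        if include_zeroes:
            mask = mask | series.map(_is_zero).astype(bool)
        if include_nullstrings_and_spaces:
            # Assumption: empty or whitespace-only strings are blanks
            is_empty = series.map(
                lambda v: isinstance(v, str) and v.strip() == ""
            ).astype(bool)
            mask = mask | is_empty
        counts[col] = int(mask.sum())
    return counts


df_strings = pd.DataFrame({"text": ["hello", "", " ", "world", None]})
strict = count_blanks_sketch(df_strings)
wide = count_blanks_sketch(df_strings, include_nullstrings_and_spaces=True)
```

Here `strict` counts only the None entry, while `wide` also counts the empty and whitespace-only strings, which mirrors the behaviour the parameter descriptions suggest.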
- Returns:
A summary dictionary with the count of blanks in each column (the default), or, if totals_only is False, a dataframe with boolean columns flagging the blanks.
- Return type:
dict or pandas.DataFrame
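To make the shape of the detailed (totals_only=False) result more concrete, here is a rough pandas-only sketch of an equivalent structure. It is not pydit's code: the helper name `flag_blanks_sketch` is hypothetical, the `<column>_blanks` naming is an assumption generalized from the `A_blanks` and `has_blanks` names used in the examples on this page, and only NaN/None are treated as blanks.

```python
import pandas as pd


def flag_blanks_sketch(df, columns=None):
    """Hypothetical helper: copy df and add per-column and per-row blank flags."""
    cols = list(df.columns) if columns is None else list(columns)
    out = df.copy()  # leave the caller's dataframe untouched
    flag_cols = []
    for col in cols:
        flag = f"{col}_blanks"  # naming assumed from the 'A_blanks' example
        out[flag] = df[col].isna()
        flag_cols.append(flag)
    # Summary flag: True when any checked column is blank in that row
    out["has_blanks"] = out[flag_cols].any(axis=1)
    return out


df_small = pd.DataFrame({"A": [1, None], "B": [None, 2]})
detailed = flag_blanks_sketch(df_small)
rows_with_blanks = detailed[detailed["has_blanks"]]
```

A result of this shape can be filtered on the summary column, as in the last line, to pull out only the records that need attention.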
See also
profile_dataframe, includes
Examples
Basic usage with a DataFrame containing NaN values:
>>> import pandas as pd
>>> import numpy as np
>>> from pydit.wrangling.blanks import check_blanks
>>> df = pd.DataFrame({
...     'A': [1, 2, None, 4],
...     'B': ['x', 'y', None, 'z'],
...     'C': [1.0, 2.0, 3.0, 4.0]
... })
>>> result = check_blanks(df, silent=True)
>>> result['A']
1
>>> result['B']
1
>>> result['C']
0
Test with specific columns:
>>> result = check_blanks(df, columns=['A', 'B'], silent=True)
>>> len(result)
2
>>> 'C' in result
False
Test including zeroes as blanks:
>>> df_zeros = pd.DataFrame({'A': [1, 0, 3], 'B': [0, 2, 0]})
>>> result = check_blanks(df_zeros, include_zeroes=True, silent=True)
>>> result['A']
1
>>> result['B']
2
Test including null strings and spaces:
>>> df_strings = pd.DataFrame({
...     'text': ['hello', '', ' ', 'world', None]
... })
>>> result = check_blanks(df_strings, include_nullstrings_and_spaces=True, silent=True)
>>> result['text']
3
Test with Series input:
>>> series = pd.Series([1, None, 3, None], name='my_series')
>>> result = check_blanks(series, silent=True)
>>> result['my_series']
2
Test with totals_only=False to get detailed DataFrame:
>>> df_small = pd.DataFrame({'A': [1, None], 'B': [None, 2]})
>>> result = check_blanks(df_small, totals_only=False, silent=True)
>>> 'A_blanks' in result.columns
True
>>> 'has_blanks' in result.columns
True
>>> int(result['has_blanks'].sum())
2