pydit.wrangling.duplicates.check_duplicates
- pydit.wrangling.duplicates.check_duplicates(obj, columns=None, keep=False, ascending=None, add_indicator_column=False, also_return_non_duplicates=False, dropna=True, silent=False)
Check for duplicates in a dataframe.
- Parameters:
obj (DataFrame or Series) – The dataframe or series to check for duplicates
columns (str or list, optional) – Column or list of columns to check; a single column may be given as a string or as a one-item list. If multiple columns are provided, duplicates are evaluated on the combination of those columns.
keep ('first', 'last' or False, optional) – Argument for the pandas DataFrame.duplicated() method. Defaults to False.
ascending (True, False, boolean list with same len() as columns, or None, optional) – Sorting criteria to provide to DataFrame.sort_values() which runs just before the duplicates check. Defaults to None.
add_indicator_column (bool, optional) – If True, a boolean column is added to the dataframe to flag duplicate rows. Defaults to False.
also_return_non_duplicates (bool, optional) – If True, the return values will include non-duplicate rows too. Defaults to False.
dropna (bool, optional) – If True, the check will ignore NaN values. Defaults to True.
silent (bool, optional) – If True, minimises output. Defaults to False.
- Returns:
The DataFrame containing the duplicate rows, or None if no duplicates are found. If also_return_non_duplicates is True, the returned DataFrame also includes the non-duplicate rows.
- Return type:
pandas.DataFrame or None
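The keep, ascending and add_indicator_column options are easier to follow next to the pandas calls they refer to. The sketch below uses plain pandas only (DataFrame.sort_values() and DataFrame.duplicated()); it is not pydit's implementation. The sample data and the `_duplicate` column name are made up for illustration, and the sort keys are an assumption about how ascending is applied.

```python
import pandas as pd

# Hypothetical sample data: "id" contains repeated values.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3, 3, 3],
        "amount": [10, 20, 25, 30, 35, 40],
    }
)

# keep=False flags every row that belongs to a duplicate group.
mask_all = df.duplicated(subset=["id"], keep=False)
dupes_all = df[mask_all]

# keep="first" leaves the first occurrence of each group unflagged.
mask_first = df.duplicated(subset=["id"], keep="first")
dupes_first = df[mask_first]

# Sorting before the check changes which row counts as the "first" one.
# Here the largest amount per id is sorted to the top of its group.
sorted_df = df.sort_values(["id", "amount"], ascending=[True, False])
mask_sorted = sorted_df.duplicated(subset=["id"], keep="first")
dupes_sorted = sorted_df[mask_sorted]

# An indicator column keeps all rows and marks the duplicates instead.
flagged = df.assign(_duplicate=mask_all)

print(dupes_all)
print(dupes_first)
print(dupes_sorted)
print(flagged)
```

With keep=False all five rows sharing an id are reported. With keep='first' only the later occurrences are reported, so the sort order applied beforehand decides which row of each group is treated as the original.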