pydit.wrangling.duplicates.check_duplicates

pydit.wrangling.duplicates.check_duplicates(obj, columns=None, keep=False, ascending=None, add_indicator_column=False, also_return_non_duplicates=False, dropna=True, silent=False)

Check for duplicates in a DataFrame or Series.

Parameters:
  • obj (DataFrame or Series) – The dataframe or series to check for duplicates

  • columns (str or list, optional) – Column name or list of column names to check. If multiple columns are provided, rows are treated as duplicates only when they match on all of those columns combined. Defaults to None.

  • keep ('first', 'last' or False, optional) – Argument for the pandas DataFrame.duplicated() method. Defaults to False.

  • ascending (True, False, boolean list with same len() as columns, or None, optional) – Sorting criteria to provide to DataFrame.sort_values() which runs just before the duplicates check. Defaults to None.

  • add_indicator_column (bool, optional) – If True, a boolean column is added to the dataframe to flag duplicate rows. Defaults to False.

  • also_return_non_duplicates (bool, optional) – If True, the return values will include non-duplicate rows too. Defaults to False.

  • dropna (bool, optional) – If True, the check will ignore NaN values. Defaults to True.

  • silent (bool, optional) – If True, minimises output. Defaults to False.
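
The keep and ascending parameters map onto pandas' own DataFrame.duplicated() and DataFrame.sort_values(). The sketch below uses plain pandas only, not pydit, to show how those two calls interact. The sample data and the choice of sort columns are assumptions made for illustration; which columns check_duplicates actually sorts by is not stated here.

```python
import pandas as pd

# Illustrative data (assumed): three ids, two of them repeated.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3, 3, 3],
        "updated": [10, 30, 20, 5, 7, 6],
    }
)

# keep=False flags every row that belongs to a duplicate group.
mask_all = df.duplicated(subset=["id"], keep=False)

# keep="first" leaves the first occurrence of each group unflagged.
mask_first = df.duplicated(subset=["id"], keep="first")

# Sorting first decides which row counts as the "first" occurrence.
# Here the highest "updated" value per id comes first.
sorted_df = df.sort_values(["id", "updated"], ascending=[True, False])
mask_sorted = sorted_df.duplicated(subset=["id"], keep="first")
kept_rows = sorted_df[~mask_sorted]

print(mask_all.tolist())
print(mask_first.tolist())
print(kept_rows)
```

With keep=False all five rows sharing an id are flagged, which suits inspection; 'first' or 'last' suits deciding which single row per group to retain.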

Returns:

The DataFrame containing the duplicate rows, or None if no duplicates are found. If also_return_non_duplicates is True, the returned DataFrame also includes the non-duplicate rows.

Return type:

pandas.DataFrame or None
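
Because the result can be either a DataFrame or None, callers need an explicit None check. The sketch below illustrates that pattern with a hypothetical stand-in, find_duplicates, which only mimics the return contract described above. It is not pydit's implementation, and the sample frames are assumed.

```python
import pandas as pd


def find_duplicates(df, columns):
    """Hypothetical stand-in: return duplicate rows, or None if there are none."""
    dupes = df[df.duplicated(subset=columns, keep=False)]
    return None if dupes.empty else dupes


clean = pd.DataFrame({"id": [1, 2, 3]})
dirty = pd.DataFrame({"id": [1, 2, 2]})

result_clean = find_duplicates(clean, ["id"])
result_dirty = find_duplicates(dirty, ["id"])

# Test with "is None": truth-testing a DataFrame (if result:) raises ValueError.
n_dupes = 0 if result_dirty is None else len(result_dirty)
print(result_clean, n_dupes)
```

The same "is None" check applies to a value returned by check_duplicates itself.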