pydit.wrangling.duplicates

Module for checking for duplicates in a dataframe.

It wraps the pandas.DataFrame.duplicated() to perform a more “end to end” check with logging and informational messages. This can be useful in an audit scenario as we tend to have to do a lot of duplicate checks in intermediate files.

Functions

check_duplicates

Check for duplicates in a dataframe.