pydit.wrangling.counts.count_related_key
- pydit.wrangling.counts.count_related_key(df1, df2, left_on='', right_on='', on='')
Adds a column to each DataFrame counting the occurrences of each key in the other DataFrame.
This works much like adding COUNTIF() in Excel to sanity-check whether an identifier in one sheet is fully present in another (presumably the master), or whether there are duplicated keys, orphans, gaps, etc.
The routine counts in both directions, so you can quickly check whether the relationship is one-to-one, one-to-many, many-to-many, etc.
See also cross_check_key(), which checks referential integrity in a more conceptual way; often, though, you just want to add the counts and filter for values greater than 1 or equal to zero.
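The counting idea described above can be sketched with plain pandas. This is a minimal illustration, not pydit's actual implementation: the helper `count_in_other`, the sample frames, and the column names are all hypothetical.

```python
import pandas as pd


def count_in_other(df, other, key, other_key, new_col):
    """Hypothetical helper: count how often each df[key] value occurs in other[other_key]."""
    # value_counts() gives a Series indexed by key value, holding occurrence counts.
    counts = other[other_key].value_counts()
    out = df.copy()
    # Keys absent from the other frame map to NaN; treat them as zero matches.
    out[new_col] = out[key].map(counts).fillna(0).astype(int)
    return out


# Illustrative data: orders reference customers through cust_id -> id.
orders = pd.DataFrame({"cust_id": [1, 1, 2, 4]})
customers = pd.DataFrame({"id": [1, 2, 3]})

# Count in both directions, as the routine described above does.
orders_counted = count_in_other(orders, customers, "cust_id", "id", "count_id")
customers_counted = count_in_other(customers, orders, "id", "cust_id", "count_cust_id")
```

Here the order with `cust_id` 4 gets a count of 0 (no matching customer), while customer 1 gets a count of 2 (referenced by two orders).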
- Parameters:
df1 (DataFrame) – A pandas DataFrame object
df2 (DataFrame) – A pandas DataFrame object to compare against
left_on (str, optional, default "") – column to use as key for df1
right_on (str, optional, default "") – column to use as key for df2
on (str, optional, default "") – column to use as key for both df1 and df2 when the key column has the same name in each
- Returns:
A tuple of the two DataFrames, each with a new column holding the count of matching records found in the other. In df1 the column is named “count_[key2]” and in df2 it is named “count_[key1]”.
- Return type:
tuple of (DataFrame, DataFrame)
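Once the count columns are in place, filtering on them is straightforward. The sketch below uses hand-built stand-ins for the two returned frames rather than calling pydit; the data and the column names `count_id` and `count_cust_id` are assumptions chosen for illustration.

```python
import pandas as pd

# Hand-built stand-ins for the two returned frames (illustrative values only).
df1 = pd.DataFrame({"cust_id": [1, 1, 2, 4], "count_id": [1, 1, 1, 0]})
df2 = pd.DataFrame({"id": [1, 2, 3], "count_cust_id": [2, 1, 0]})

# Orphans: rows in df1 whose key never appears in df2.
orphans = df1[df1["count_id"] == 0]

# Unreferenced keys: rows in df2 that nothing in df1 points to.
unreferenced = df2[df2["count_cust_id"] == 0]

# Keys matched more than once, which rules out a one-to-one relationship.
repeated = df2[df2["count_cust_id"] > 1]
```

If both count columns contain only the value 1, every key matches exactly once in each direction and the relationship is one-to-one.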