pydit 0.2.00 documentation

pydit.wrangling.counts.count_related_key

pydit.wrangling.counts.count_related_key(df1, df2, left_on='', right_on='', on='')

Adds a column to each DataFrame counting the occurrences of each key in the other DataFrame.

This works similarly to adding COUNTIF() in Excel to sense-check whether an identifier in one sheet is fully present in another (presumably the master), or whether there are duplicated keys, orphans/gaps, etc.

This routine counts in both directions, so you can quickly check whether the relationship is one-to-one, many-to-many, etc.

See also cross_check_key(), which checks referential integrity in a more conceptual way. Often, though, you just want to add the counts and filter for values greater than 1 or equal to zero.
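The COUNTIF-style counting described above can be sketched with plain pandas. This is a minimal illustration, not pydit's actual implementation: the helper count_in_other and the example frames orders and customers are hypothetical names introduced here.

```python
import pandas as pd


def count_in_other(left, right, left_key, right_key):
    """Return a copy of `left` with a column counting matches in `right`.

    Hypothetical helper for illustration only; not part of pydit.
    """
    # How many times each key value occurs in the other frame.
    occurrences = right[right_key].value_counts()
    out = left.copy()
    # Keys absent from `right` map to NaN, which becomes a count of 0.
    out[f"count_{right_key}"] = (
        out[left_key].map(occurrences).fillna(0).astype(int)
    )
    return out


# Made-up example data: customer 4 has no master record, customer 3 has no orders.
orders = pd.DataFrame({"customer_id": [1, 1, 2, 4]})
customers = pd.DataFrame({"cust_id": [1, 2, 3]})

# Count in both directions, as the routine above is described as doing.
orders_counted = count_in_other(orders, customers, "customer_id", "cust_id")
customers_counted = count_in_other(customers, orders, "cust_id", "customer_id")

print(orders_counted)
print(customers_counted)
```

A zero in either count column marks a key with no match on the other side, and a value above 1 marks a key that appears several times on the other side.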

Parameters:
  • df1 (DataFrame) – A pandas DataFrame object

  • df2 (DataFrame) – A pandas DataFrame object to compare against

  • left_on (str, optional, default "") – column to use as key for df1

  • right_on (str, optional, default "") – column to use as key for df2

  • on (str, optional, default "") – column to use as key for both df1 and df2 when the key column has the same name in each
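One way to read how the three key parameters interact is sketched below. The helper resolve_keys is hypothetical and is not pydit's code. Two things here are assumptions rather than documented behaviour: that on takes precedence over left_on and right_on, and that an error is raised when no key is supplied.

```python
def resolve_keys(left_on="", right_on="", on=""):
    """Hypothetical helper: pick the key column name for each frame."""
    if on:
        # A shared column name serves as the key on both sides.
        # Assumption: `on` wins if the other two are also given.
        return on, on
    if left_on and right_on:
        # Different column names on each side.
        return left_on, right_on
    # Assumption: with no usable key there is nothing to count on.
    raise ValueError("Provide either `on`, or both `left_on` and `right_on`.")


print(resolve_keys(on="id"))
print(resolve_keys(left_on="customer_id", right_on="cust_id"))
```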

Returns:

A tuple of the two DataFrames, each with a new column holding the count of matching records found in the other. In df1 the column is named “count_[key2]” and in df2 it is named “count_[key1]”.

Return type:

tuple of (DataFrame, DataFrame)
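Once the count columns exist, the usual next step is to filter on them. The sketch below uses hand-written stand-ins for the two returned frames, so it does not call pydit. The key names customer_id and cust_id are made up, and the column names follow the “count_[key2]” / “count_[key1]” convention as read from the description above.

```python
import pandas as pd

# Stand-ins for the two returned frames; the count columns are typed in by hand.
df1 = pd.DataFrame(
    {"customer_id": [1, 1, 2, 4], "count_cust_id": [1, 1, 1, 0]}
)
df2 = pd.DataFrame(
    {"cust_id": [1, 2, 3], "count_customer_id": [2, 1, 0]}
)

# Rows in df1 whose key is missing from df2 (orphans).
orphans = df1[df1["count_cust_id"] == 0]

# Keys in df2 that df1 never references (gaps).
unused = df2[df2["count_customer_id"] == 0]

# Keys in df2 that df1 references more than once.
multi = df2[df2["count_customer_id"] > 1]

# Counts never above 1 on either side would suggest a one-to-one relationship.
is_one_to_one = bool(
    df1["count_cust_id"].max() <= 1 and df2["count_customer_id"].max() <= 1
)

print(orphans, unused, multi, is_one_to_one, sep="\n")
```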

© Copyright . Created using Sphinx 9.1.0.