pydit.wrangling.merge.merge_outer_and_split¶

pydit.wrangling.merge.merge_outer_and_split(dffact, dfdim, on=None, left_on=None, right_on=None, suffixes=(None, None))[source]¶

Merge two dataframes, and keep the joinable 1:1 or 1:n the nan or unmatched are returned in separate dataframes or Excel sheets. Under the bonnet it will - filter nulls from the key columns (and store in a separate dataframe to return those exceptions - on the clean version will do an outer merge with indicator=True - will split the results into the 3 dataframes (both, left, right) - will write the results to an Excel file if specified

It will also check for duplicates in the key columns and exit if found in the right dataframe (presumed to be a dimension/ file ) or warn if they are in the left dataframe (presumably the transaction/fact file).

Parameters:

dffact (pandas.DataFrame) – The left dataframe
dfdim (pandas.DataFrame) – The right dataframe
left_on (list) – The list of key columns to join on in the left dataframe
right_on (list) – The list of key columns to join on in the right dataframe
suffixes (tuple, optional) – The suffixes to use for the left and right dataframes, by default (None,None)

Returns:

A tuple of dataframes, (both, left, right, left_na, right_na) both : the rows that matched left : the rows that matched but had nulls in the right dataframe right : the rows that matched but had nulls in the left dataframe left_na : the rows that have nulls in the specified key columns in the left dataframe right_na : the rows that have nulls in the specified key columns in the right dataframe

Return type:

tuple

Raises:

ValueError – If the right dataframe has duplicates in the key columns

pydit.wrangling.merge.merge_outer_and_split¶

Table of Contents

Previous topic

Next topic

This Page