pydit.wrangling.split_transactions.check_for_split_transactions

pydit.wrangling.split_transactions.check_for_split_transactions(df, limits, amount_col='amount', categ_col='supplier', date_col='date', tolerance_perc=0.01, tolerance_abs=100, days_horizon=30)

Checks for transactions that are just below a threshold.

This function looks for transactions whose running total, accumulated per category within the days horizon, lands just below one of the given limits or goes over it, within the specified tolerance. It returns a DataFrame with the original columns, sorted by category and date, in which those transactions are flagged.

Parameters:
  • df (pd.DataFrame) – The dataframe to check

  • limits (list or tuple) – The list of limits to check for, expressed in the same units as the amount column

  • amount_col (str) – The name of the column in the dataframe that contains the amounts

  • categ_col (str) – The name of the column in the dataframe that contains the categories, e.g. supplier or submitter

  • date_col (str) – The name of the column in the dataframe that contains the dates

  • tolerance_perc (float) – The percentage tolerance to apply to the limits. Default is 0.01.

  • tolerance_abs (float) – The absolute tolerance to apply to the limits. Default is 100.

  • days_horizon (int) – The number of days to look back for the running total. Default is 30.

Returns:

A new DataFrame with the original columns, sorted (ascending) by category and date, plus the following columns:

  • highest_limit_hit_just_below: the highest limit hit just below

  • highest_limit_hit_above: the highest limit hit just above

  • running_total: the running total of the amounts for the category

Return type:

pd.DataFrame
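
Example:

The following is a minimal pandas sketch of the idea described above, not pydit's actual implementation. It computes a per-category running total over a look-back window and flags rows whose total lands just below a single limit. The helper name `flag_near_limit`, the output columns `just_below` and `at_or_above`, and the rule of taking the wider of the two tolerances are assumptions made for illustration; the library may combine the tolerances differently.

```python
import pandas as pd


def flag_near_limit(df, limit, amount_col="amount", categ_col="supplier",
                    date_col="date", tolerance_perc=0.01, tolerance_abs=100,
                    days_horizon=30):
    """Hypothetical helper: illustrates the concept only, not pydit's code."""
    # Sort by category and date so each group is chronologically ordered.
    out = df.sort_values([categ_col, date_col]).reset_index(drop=True)

    # Time-based rolling sum of amounts within each category.
    rolled = (
        out.set_index(date_col)
           .groupby(categ_col)[amount_col]
           .rolling(f"{days_horizon}D")
           .sum()
    )
    # Groups come back in sorted key order, matching the sorted frame.
    out["running_total"] = rolled.to_numpy()

    # ASSUMPTION: the tolerance band is the wider of the two tolerances.
    band = max(limit * tolerance_perc, tolerance_abs)
    total = out["running_total"]
    out["just_below"] = (total >= limit - band) & (total < limit)
    out["at_or_above"] = total >= limit
    return out


# Made-up sample data for illustration.
transactions = pd.DataFrame({
    "supplier": ["A", "B", "A", "A"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-05",
                            "2024-01-10", "2024-03-01"]),
    "amount": [4000, 2000, 5950, 3000],
})

result = flag_near_limit(transactions, limit=10000)
print(result)
```

In this sample, supplier A's two January payments fall within the same 30-day window, so the second one is the row flagged as just below the 10000 limit. The March payment is outside the window and starts a fresh total.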