pydit.wrangling.split_transactions.check_for_split_transactions
- pydit.wrangling.split_transactions.check_for_split_transactions(df, limits, amount_col='amount', categ_col='supplier', date_col='date', tolerance_perc=0.01, tolerance_abs=100, days_horizon=30)
Checks for transactions that fall just below a threshold.
This function returns a DataFrame with the original columns, sorted by category and date. It flags transactions whose running total for the category, accumulated within the days horizon, lands just below a limit (within the specified tolerance) or goes over it.
- Parameters:
df (pd.DataFrame) – The dataframe to check
limits (list or tuple) – The list of limits to check for, expressed in the same units as the amount column
amount_col (str) – The name of the column in the dataframe that contains the amounts
categ_col (str) – The name of the column in the dataframe that contains the categories e.g. supplier, submitter, etc.
date_col (str) – The name of the column in the dataframe that contains the dates
tolerance_perc (float) – The percentage tolerance to apply to the limits. Default is 0.01.
tolerance_abs (float) – The absolute tolerance to apply to the limits. Default is 100.
days_horizon (int) – The number of days to look back for the running total. Default is 30.
- Returns:
A new DataFrame with the original columns, sorted (ascending) by category and date, plus the following columns:
  - highest_limit_hit_just_below: the highest limit hit just below
  - highest_limit_hit_above: the highest limit hit just above
  - running_total: the running total of the amounts for the category
- Return type:
pd.DataFrame
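The idea described above (a per-category running total over a look-back window, compared against each limit) can be illustrated with a small pandas sketch. This is not pydit's implementation. The helper name flag_near_limit, the output column limit_just_below, the use of NaN for "no hit", and the way tolerance_abs is applied are all assumptions made for illustration, and tolerance_perc is left out entirely.

```python
import pandas as pd


def flag_near_limit(df, limits, amount_col="amount", categ_col="supplier",
                    date_col="date", tolerance_abs=100, days_horizon=30):
    """Hypothetical helper: NOT the pydit function, only a sketch of the idea."""
    out = df.sort_values([categ_col, date_col]).reset_index(drop=True)

    # Rolling sum of amounts per category over a time window of days_horizon days.
    # Rows are already sorted by category, so each group is a contiguous block.
    totals = []
    for _, group in out.groupby(categ_col, sort=False):
        amounts = group.set_index(date_col)[amount_col]
        totals.extend(amounts.rolling(f"{days_horizon}D").sum().tolist())
    out["running_total"] = totals

    def highest_just_below(total):
        # Assumption: "just below" means within tolerance_abs under the limit.
        hits = [lim for lim in limits if lim - tolerance_abs <= total < lim]
        return max(hits) if hits else float("nan")

    out["limit_just_below"] = out["running_total"].map(highest_just_below)
    return out


# Made-up sample data: supplier A reaches 9,950 within ten days,
# just under a 10,000 limit.
sample = pd.DataFrame(
    {
        "supplier": ["A", "A", "A", "B"],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-10", "2024-03-01", "2024-01-05"]
        ),
        "amount": [4000, 5950, 3000, 500],
    }
)
result = flag_near_limit(sample, limits=[10000])
print(result)
```

In this sample only the second transaction of supplier A is flagged, because the third one falls outside the 30-day window and the running total restarts from that amount alone.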