pydit.statistics.benford

Module to compute the Benford’s Law frequencies for a column in a DataFrame.

This is a common audit test for finding indications (not conclusive) of fraud or errors in a population. Benford’s Law gives the expected distribution of the “first n digits” of a magnitude.
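For reference, Benford’s Law assigns a leading-digit sequence d (the first n digits read as an integer) the expected probability log10(1 + 1/d). The following is a minimal sketch in plain Python, independent of this module; the name expected_benford_probability is hypothetical and is not pydit's API.

```python
import math


def expected_benford_probability(leading: int) -> float:
    """Expected Benford's Law probability of a leading-digit sequence.

    `leading` is the first n digits read as an integer,
    e.g. 1 (n=1) or 49 (n=2).
    Hypothetical helper for illustration only; not part of pydit.
    """
    if leading < 1:
        raise ValueError("leading digits must form a positive integer")
    return math.log10(1 + 1 / leading)


# First-digit probabilities: about 0.301 for 1, falling to about 0.046 for 9.
first_digit = {d: expected_benford_probability(d) for d in range(1, 10)}

# First-two-digit probabilities cover the sequences 10 to 99.
first_two_digits = {d: expected_benford_probability(d) for d in range(10, 100)}
```

Within each table the probabilities sum to 1, which is a quick sanity check on any implementation.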

It applies to naturally occurring magnitudes that span several orders of magnitude, such as the lengths of rivers or the populations of cities (please do your research before applying it); quantities confined to a narrow range, such as the heights of people, do not follow it. Because it posits that low leading digits are more common, it tends to highlight fabricated transactions: to humans, it looks more natural to create amounts with an even mix of low and high digits, whereas under Benford’s Law a transaction starting with 9 or 8 is disproportionately less likely to occur.

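Also, where there is an artificial limit (for example, approvals are needed over a certain amount), there tends to be an excess of transactions just below that limit (e.g. $4,980 rather than $4,000 for a limit of $5,000). This shows up as an unexpectedly high frequency of the corresponding leading digits, here 49 in a first-two-digits test.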
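To illustrate how amounts clustered just below a limit appear in a leading-digit count, here is a small sketch. The helper leading_digits and the list of amounts are made up for illustration; they are not part of pydit and not real data.

```python
from collections import Counter


def leading_digits(value: float, n: int = 1) -> int:
    """Return the first n significant digits of a non-zero number as an int.

    Hypothetical helper for illustration only; not part of pydit.
    """
    if value == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the significant digits first,
    # e.g. 4980 -> '4.98000000000000e+03'.
    mantissa = f"{abs(value):.14e}".split("e")[0].replace(".", "")
    return int(mantissa[:n])


# Made-up amounts, several of them just under a 5,000 approval limit.
amounts = [4980, 4950, 4000, 4995, 1200, 310]

first_digit_counts = Counter(leading_digits(a, n=1) for a in amounts)
first_two_counts = Counter(leading_digits(a, n=2) for a in amounts)
```

In this toy input the first-digit count only shows four amounts starting with 4, while the first-two-digits count separates the three amounts starting with 49 from the single amount starting with 40.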

Functions

benford_list_anomalies

Returns the expected and actual Benford's Law frequencies for a column of values.

benford_mad

Returns the Mean Absolute Deviation (MAD) of the Benford's Law frequencies.

benford_probability

Returns the Benford's Law probability for the first n digits provided.

benford_to_dataframe

Returns a summary with the expected and actual Benford's Law frequency.

benford_to_plot

Plots the histogram with Benford's Law expected and the actual frequencies.
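As a rough illustration of the quantity behind benford_mad, the sketch below computes a mean absolute deviation between actual and expected leading-digit proportions. It assumes MAD means the average of the absolute differences over every possible leading-digit value, which is the usual definition in Benford analysis; the module's own implementation, signature and handling of zeros or missing values may differ. The name benford_mad_sketch is hypothetical.

```python
import math
from collections import Counter


def benford_mad_sketch(values, n=1):
    """Mean absolute deviation of actual vs expected first-n-digit proportions.

    Illustrative only; this is an assumption about the calculation,
    not pydit's benford_mad.
    """
    # First n significant digits of each non-zero value, read from
    # scientific notation (e.g. 4980 -> '4.98000000000000e+03' -> 49 for n=2).
    leading = [
        int(f"{abs(v):.14e}".replace(".", "")[:n]) for v in values if v != 0
    ]
    if not leading:
        raise ValueError("no non-zero values to analyse")

    counts = Counter(leading)
    total = len(leading)

    # Possible leading-digit sequences: 1..9 for n=1, 10..99 for n=2, etc.
    deviations = [
        abs(counts.get(d, 0) / total - math.log10(1 + 1 / d))
        for d in range(10 ** (n - 1), 10 ** n)
    ]
    return sum(deviations) / len(deviations)


# Powers of 2 are a classic sequence that conforms closely to Benford's Law.
mad_conforming = benford_mad_sketch([2 ** k for k in range(1000)])

# Values that all start with 9 deviate strongly.
mad_skewed = benford_mad_sketch([9, 90, 900])
```

A value close to zero indicates that the actual proportions track the expected ones; larger values indicate a stronger departure that may merit a closer look, not proof of fraud.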