pydit.wrangling

Sub-package (./wrangling) containing the core data wrangling functionality.

The modules are also self-standing: you should be able to copy any .py file and import it in your script, with no dependencies on other modules.

There may be some exceptions to this principle in the logging module, but you should be able to create your own logger object and run with it.
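
As a rough illustration of "create your own logger object", the sketch below builds a console logger with the standard library only. It assumes the copied module works with a regular `logging.Logger`; the helper name `make_logger` and the logger name `"wrangling"` are hypothetical and not part of pydit.

```python
import logging
import sys


def make_logger(name: str = "wrangling", level: int = logging.INFO) -> logging.Logger:
    """Return a console logger; repeated calls do not add duplicate handlers."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        # Attach a single stdout handler with a simple timestamped format.
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Keep messages from also being emitted by the root logger.
    logger.propagate = False
    return logger


logger = make_logger()
logger.info("Logger ready")
```

How the logger is handed to a copied module depends on that module's code, so check how it obtains its logger before wiring this in.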

`anonymise`	Module for anonymising a key/identifier column
`blanks`	Checks for various types of nulls/blanks in a dataframe and returns counts.
`calendar_table`	Function to create a calendar DataFrame to be used as a lookup table
`cleanup_dataframe_columns_names`	Module for cleaning up column names of a DataFrame
`coalesce_dataframe_columns`	Function for coalescing columns in a pandas DataFrame.
`coalesce_dataframe_values`	Creates a new column that keeps the top N most frequent values and replaces the rest with "Other".
`collapse_dataframe_levels`	Implementation of the collapse_levels function.
`counts`	Module that implements a few useful count-related functions. Takes inspiration from the usual counta and countif functions in Excel.
`date_time_calculations`	Module with functions for date and time calculations.
`duplicates`	Module for checking for duplicates in a dataframe.
`file_utils`	File utilities for saving and loading files
`fillna`	Improving on fillna() with options for various data types and opinionated defaults.
`fuzzy_matching`	Module with utility functions for fuzzy matching
`groupby_text_concatenate`	Groupby text column into concatenated text
`keyword_search_batch`	Functions to sweep a dataframe for keywords and return a matrix of matches.
`lookup_values`(df, key, df_ref, key_ref, ...)	Look up values from a reference dataframe and return values from a column. If the key is a list, it will return a list of values.
`map_common_values`	Module to map/add various values like 1, 2, 3 to "High", "Medium", "Low".
`merge`	Module to merge dataframes with prefixes or suffixes for all fields, not just those that have collisions.
`referential_integrity_check`	Module to perform referential integrity checks on two dataframes.
`sequence`	Module to check for numerical sequence of DataFrame column or Series
`split_transactions`	Utility functions to do analysis/detection of split purchases/expenses
`truncate_datetime`	Implementation of the truncate_datetime family of functions.
`various`	Utility functions, they are not used directly in the core functions.
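
To make the "top N plus Other" idea from the `coalesce_dataframe_values` entry above concrete, here is a minimal pandas-free sketch on a plain list. It is not the pydit implementation; the function name `keep_top_n` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import Counter


def keep_top_n(values, top_n=2, other_label="Other"):
    """Keep the top_n most frequent values; replace all others with other_label."""
    counts = Counter(values)
    # most_common returns (value, count) pairs, highest count first.
    top = {value for value, _ in counts.most_common(top_n)}
    return [v if v in top else other_label for v in values]


colours = ["red", "red", "red", "blue", "blue", "green", "pink"]
print(keep_top_n(colours, top_n=2))
# ['red', 'red', 'red', 'blue', 'blue', 'Other', 'Other']
```

In a DataFrame setting the same idea would typically be applied to one column, with the result written to a new column so the original values are preserved.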
