pydit.wrangling.fuzzy_matching.create_fuzzy_key
- pydit.wrangling.fuzzy_matching.create_fuzzy_key(df, input_col, output_col='fuzzy_key', token_sort=None)
Create a fuzzy key for a dataframe. Note that this key preserves the spaces after tokenisation, which may work better when computing the Levenshtein distance. If you want a more compact string, you need to tweak the code so that the clean_string function removes spaces.
- Parameters:
df (pd.DataFrame) – The dataframe to create the fuzzy key for
input_col (str) – The column to create the fuzzy key from
output_col (str, optional) – The column to write the fuzzy key to, by default “fuzzy_key”
token_sort (str, optional) – The token sorting algorithm to apply when building the key. Can be “token_set_sort”, “token_sort” or None. With None, no token sorting is applied and any such handling is left to other libraries
- Returns:
The fuzzy key
- Return type:
pandas.Series
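The idea of a space-preserving fuzzy key can be illustrated with a minimal, standard-library-only sketch. This is not pydit's implementation: the helper name `sketch_fuzzy_key` is hypothetical, the cleaning steps (accent stripping, lowercasing, replacing punctuation with spaces) are assumptions, and the reading of “token_set_sort” as "deduplicate the tokens, then sort them" is also an assumption.

```python
import re
import unicodedata


def sketch_fuzzy_key(text, token_sort=None):
    """Hypothetical helper: build a space-separated fuzzy key from one value.

    Illustration only; the cleaning rules below are assumptions,
    not the behaviour of pydit's create_fuzzy_key.
    """
    # Decompose accented characters and drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", str(text))
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, turn every run of non-alphanumerics into one space, tokenise.
    tokens = re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).split()

    if token_sort == "token_sort":
        # Sort tokens so that word order no longer matters.
        tokens = sorted(tokens)
    elif token_sort == "token_set_sort":
        # Assumed meaning: remove duplicate tokens, then sort.
        tokens = sorted(set(tokens))
    elif token_sort is not None:
        raise ValueError(f"Unsupported token_sort value: {token_sort!r}")

    # Join with single spaces, so the key keeps its token boundaries.
    return " ".join(tokens)


keys = [sketch_fuzzy_key(name, token_sort="token_sort")
        for name in ["Smith,  John", "john SMITH"]]
```

With token sorting enabled, both sample names above reduce to the same key, which is what makes the key useful for grouping near-duplicates before computing an edit distance. On a dataframe, a helper like this could be applied to one column with `df[input_col].map(sketch_fuzzy_key)`.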