pydit.wrangling.fuzzy_matching.create_fuzzy_key

pydit.wrangling.fuzzy_matching.create_fuzzy_key(df, input_col, output_col='fuzzy_key', token_sort=None)

Create a fuzzy key for a dataframe. Note that this key preserves the spaces after tokenisation, which may work better when computing the Levenshtein distance. If you want a more compact string, you need to tweak the code so that the clean_string function removes spaces.

Parameters:
  • df (pd.DataFrame) – The dataframe to create the fuzzy key for

  • input_col (str) – The column to create the fuzzy key from

  • output_col (str, optional) – The column to write the fuzzy key to, by default “fuzzy_key”

  • token_sort (str, optional) – The token sorting algorithm to apply, or None to skip token sorting and rely on other libraries for it. Can be “token_set_sort”, “token_sort” or None

Returns:

The fuzzy key

Return type:

pandas.Series
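
Example:

The following is a minimal sketch of the idea behind a space-preserving fuzzy key, not pydit's implementation. The helper name sketch_key is hypothetical, and the normalisation it applies (lowercasing, replacing punctuation with spaces) is an assumption, since the behaviour of clean_string is not described on this page. It only illustrates what the three token_sort options could plausibly mean.

```python
import re


def sketch_key(value, token_sort=None):
    """Hypothetical stand-in for a fuzzy key; not pydit's code."""
    # Assumed normalisation: lowercase, then turn every run of
    # non-alphanumeric characters into a single space.
    tokens = re.sub(r"[^a-z0-9]+", " ", str(value).lower()).split()
    if token_sort == "token_sort":
        # Sort tokens so that word order no longer matters.
        tokens = sorted(tokens)
    elif token_sort == "token_set_sort":
        # Also drop repeated tokens before sorting.
        tokens = sorted(set(tokens))
    # Join with single spaces so token boundaries are preserved in the key.
    return " ".join(tokens)


names = ["Smith, John", "john SMITH", "Acme Acme Ltd."]
for name in names:
    print(repr(name), "->", repr(sketch_key(name, token_sort="token_sort")))
```

With a dataframe, the same helper could be applied column-wise with df[input_col].map(sketch_key). For the real function, the documented call is create_fuzzy_key(df, input_col, output_col, token_sort), which returns the key as a pandas.Series.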