Navigation

  • index
  • modules |
  • next |
  • previous |
  • pydit 0.2.00 documentation »
  • pydit »
  • pydit.wrangling »
  • pydit.wrangling.keyword_search_batch »
  • pydit.wrangling.keyword_search_batch.keyword_search

pydit.wrangling.keyword_search_batch.keyword_search

pydit.wrangling.keyword_search_batch.keyword_search(obj, keywords, columns=None, return_data='full', regexp=True, case_sensitive=False, labels=None, key_column=None)

Searches a dataframe or series for the given keywords and returns a matrix of matches.

Creates one boolean column per keyword in the dataframe, plus a combined column that is True if any of the keyword columns is True. By default the new columns are named sequentially, because using the keywords themselves as column names may raise errors when they contain special characters or result in duplicated or reserved names. If you need descriptive names, you can supply them through labels.
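The behaviour described above can be approximated with plain pandas. The following is a minimal sketch, not pydit's implementation: the helper name sketch_keyword_matrix is hypothetical, and the zero-padded numbering in kw_match_01, kw_match_02 is an assumption about how the kw_match_NN names are formed.

```python
import pandas as pd


def sketch_keyword_matrix(df, keywords, columns=None, case_sensitive=False):
    """Illustrative stand-in only; this is NOT pydit's keyword_search."""
    cols = list(df.columns) if columns is None else list(columns)
    out = df.copy()
    hit_cols = []
    for i, kw in enumerate(keywords, start=1):
        name = f"kw_match_{i:02d}"  # assumed naming; the real format may differ
        # Cast to str so non-text columns can be searched as well, then flag
        # a row when the keyword is found in any of the searched columns.
        out[name] = (
            df[cols]
            .astype(str)
            .apply(lambda s: s.str.contains(kw, case=case_sensitive, regex=True))
            .any(axis=1)
        )
        hit_cols.append(name)
    # Combined column: True if any keyword matched the row.
    out["kw_match_all"] = out[hit_cols].any(axis=1)
    return out


df = pd.DataFrame(
    {
        "desc": ["Order 17", "Gift card", "invoice 42"],
        "note": ["REFUND pending", "n/a", "paid"],
    }
)
result = sketch_keyword_matrix(df, ["refund", r"gift\s+card"])
```

Here the first row is flagged for the first keyword through the note column, the second row for the second keyword through the desc column, and the third row has no hits, so its kw_match_all value is False.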

Parameters:
  • obj (pandas.DataFrame or pandas.Series) – The dataframe or series to search

  • keywords (list) – The list of regular expressions or string keywords to search for.

  • columns (list) – The list of columns to search. If None, all columns are searched.

  • return_data (str, optional, default="full") – Controls what is returned:
    • "full": the full dataframe plus the hit columns.
    • "target": the target columns plus the hit columns.
    • "result": only the boolean result columns.
    • "detail": a dataframe with one hit per row.
    • "full_hits", "target_hits" or "result_hits": same as the corresponding option above, but only rows with a hit are returned.

  • regexp (bool, default True) – If True then the keywords are treated as regular expressions, otherwise a simpler string search is performed.

  • case_sensitive (bool, default False) – If True, the keywords are matched case-sensitively. The default is False because in the most typical case we do NOT care about case. Note: to ignore case for individual keywords only, use case_sensitive=True and include the special prefix (?i) in those regular expressions, the same way you would in re.findall('(?i)test', s).

  • labels (list, optional) – The list of labels to use for the hit columns. If None, the columns are named kw_match_NN. The list must have the same length as keywords. Labels may be repeated, in which case the keywords sharing a label are automatically grouped (rolled up) under that label.

  • key_column (str, optional, default=None) – If return_data="detail", this is the column to use as the key for the returned dataframe.
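The (?i) prefix mentioned under case_sensitive is the standard inline flag of Python's re module. The short sketch below uses only re, with a made-up sample string, to show how the prefix changes matching for a single pattern; it does not call pydit.

```python
import re

text = "Test results: TEST passed, test skipped"

# Without the inline flag the pattern is case sensitive,
# so only the lowercase occurrence is found.
sensitive = re.findall("test", text)

# With (?i) at the start, case is ignored for this pattern only.
insensitive = re.findall("(?i)test", text)

print(sensitive)    # ['test']
print(insensitive)  # ['Test', 'TEST', 'test']
```

This is why combining case_sensitive=True with a (?i) prefix on selected keywords allows case handling to be chosen per keyword.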

Returns:

A copy of the dataframe with the new hit columns added, or just the boolean columns for each keyword (depending on return_data), plus a column kw_match_all that is True if any of the other hit columns is True.

Return type:

DataFrame
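To make the return shapes more concrete, the sketch below builds by hand a frame shaped like a "full" result and derives "result"-style and "full_hits"-style views from it with plain pandas. The data, the kw_match_01 and kw_match_02 column names, and the assumption that the "result" view includes kw_match_all are all illustrative; the function itself is not called.

```python
import pandas as pd

# Hypothetical "full"-style output: original column plus the hit columns.
full = pd.DataFrame(
    {
        "desc": ["Order 17", "Gift card", "invoice 42"],
        "kw_match_01": [True, False, False],
        "kw_match_02": [False, True, False],
        "kw_match_all": [True, True, False],
    }
)

hit_cols = [c for c in full.columns if c.startswith("kw_match_")]

# "result"-style view: only the boolean columns (assumed to include kw_match_all).
result_view = full[hit_cols]

# "full_hits"-style view: all columns, restricted to rows with at least one hit.
full_hits_view = full[full["kw_match_all"]]
```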

© Copyright . Created using Sphinx 9.1.0.