Is there a non-looping way to perform text searching in a data frame?

I have a huge list of ngrams to search. For each ngram, I want to know how often it appears in my historic dataframe and the mean of a numeric variable over the rows that contain it. I have a really ugly way of doing it (that works), but because the list of ngrams is huge, it's really slow.

I am trying to avoid the loop, since I guess it is the main reason for the slowness, but I don't see how to do it.

Any ideas?

import pandas as pd

output = pd.DataFrame()

ngrams = ['ngram1', 'ngram2', 'ngram3', ..., 'ngram350000']

# For each ngram: count the rows whose text contains it, and average numeric_variable over those rows
for i in ngrams:
    temp = pd.DataFrame(data={'ngram' : [i],
                              'count' : historic_df['text_variable'].str.contains(i, na=False).sum(),
                              'mean' : historic_df[historic_df['text_variable'].str.contains(i, na=False)]['numeric_variable'].mean()})
    output = pd.concat([output, temp], axis=0)
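Two costs in the loop above can be removed even before vectorizing: str.contains is evaluated twice for every ngram, and output is rebuilt with pd.concat on every iteration, which copies all previously accumulated rows each time. The following is a minimal sketch that keeps the loop but computes each mask once and builds the result frame a single time. The toy historic_df and ngrams values are hypothetical stand-ins, and regex=False assumes the ngrams are plain text rather than regular expressions (str.contains treats its pattern as a regex by default).

```python
import pandas as pd

# Toy stand-in for the asker's data (hypothetical values)
historic_df = pd.DataFrame({
    'text_variable': ['red apple pie', 'green apple', None, 'apple pie recipe'],
    'numeric_variable': [1.0, 2.0, 3.0, 4.0],
})
ngrams = ['apple pie', 'green', 'banana']

text = historic_df['text_variable']
rows = []
for ngram in ngrams:
    # Compute the boolean mask once and reuse it for both statistics;
    # regex=False matches the ngram as a literal substring (assumes plain-text ngrams)
    mask = text.str.contains(ngram, regex=False, na=False)
    rows.append({'ngram': ngram,
                 'count': int(mask.sum()),
                 'mean': historic_df.loc[mask, 'numeric_variable'].mean()})

# Build the result frame once instead of calling pd.concat on every iteration
output = pd.DataFrame(rows)
print(output)
```

This is still one full scan of the text column per ngram, so it helps with constant factors, not with the number of scans.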


Solution 1:[1]

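Try apply() instead of the explicit loop. Note that apply still calls the function once per ngram in Python, so it mostly tidies the code; historic_df is still scanned once per ngram.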

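import pandas as pd

def ngram_stats(ngram):
    # Rows of historic_df whose text contains this ngram
    mask = historic_df['text_variable'].str.contains(ngram, na=False)
    # Return both statistics as a Series; apply() expands them into columns
    return pd.Series({'count': mask.sum(),
                      'mean': historic_df.loc[mask, 'numeric_variable'].mean()})

ngrams = pd.DataFrame({'ngram': ['ngram1', 'ngram2', 'ngram3', ..., 'ngram350000']})

# Apply over the 'ngram' column so the function receives one ngram at a time
# (ngrams.apply(func) on the whole DataFrame would pass entire columns instead)
output = ngrams.join(ngrams['ngram'].apply(ngram_stats))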
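One way to avoid scanning the whole column once per ngram is to invert the search: generate the candidate word n-grams of each text once, explode them into rows, keep only those in the ngram list, and aggregate with groupby. This is a swapped-in technique, not the approach from the answer above, and it rests on an assumption: the ngrams are whole, whitespace-separated word sequences. It therefore matches 'apple pie' inside 'red apple pie' but not a partial-word substring such as 'pie' inside 'pies', which str.contains would match. The helper word_ngrams and the toy data are hypothetical.

```python
import pandas as pd

# Toy stand-in for the asker's data (hypothetical values)
historic_df = pd.DataFrame({
    'text_variable': ['red apple pie', 'green apple', None, 'apple pie recipe'],
    'numeric_variable': [1.0, 2.0, 3.0, 4.0],
})
ngrams = ['apple pie', 'green', 'banana']


def word_ngrams(text, sizes):
    """Hypothetical helper: distinct contiguous word n-grams of the given sizes.

    Uses plain whitespace tokenization (an assumption); non-strings yield [].
    """
    if not isinstance(text, str):
        return []
    words = text.split()
    grams = {' '.join(words[i:i + n]) for n in sizes for i in range(len(words) - n + 1)}
    # The set removed duplicates, so each row counts at most once per ngram,
    # like str.contains(...).sum(); sorting keeps explode's output deterministic
    return sorted(grams)


# Only generate n-gram lengths that actually occur in the query list
sizes = {len(g.split()) for g in ngrams}

# One row per (original row, distinct n-gram of its text); empty lists become NaN
exploded = (historic_df
            .assign(ngram=historic_df['text_variable'].map(lambda t: word_ngrams(t, sizes)))
            .explode('ngram'))

# Keep only the n-grams being searched for, then aggregate per ngram
matched = exploded[exploded['ngram'].isin(set(ngrams))]
stats = matched.groupby('ngram')['numeric_variable'].agg(count='size', mean='mean')

# Reinsert ngrams that never matched (count 0, mean NaN), in the original order
output = (stats.reindex(ngrams)
          .fillna({'count': 0})
          .astype({'count': int})
          .rename_axis('ngram')
          .reset_index())
print(output)
```

It still iterates over the texts once in Python (inside map), but the work now grows with the number of words in historic_df and the number of distinct ngram lengths, rather than with rows times ngrams. The isin filter against a set stays cheap even for a very long ngram list.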

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 stahh