Apply Lambda to Function Not Working? - Python

I have a DataFrame called central_edi_posts_grouped, like:

central_edi_posts_grouped = pd.DataFrame({'PageNumber': [175, 162, 576], 'new_tags': [['flower architecture people'], ['hair red bobbles'], ['sweets chocolate shop']]})

<OUT>
PageNumber   new_tags
   175       flower architecture people...
   162       hair red bobbles...
   576       sweets chocolate shop...

I have a function defined to tokenise, apply a topic model and write the top 20 words to a csv:

def topic_model_new(grid_document):
    ''' this function is used to conduct topic modelling for each document '''
    text_list=grid_document.tolist()
    tokens = grid_document.astype(str).apply(nltk.word_tokenize)
    #convert tokenized lists into dictionary
    dictionary = corpora.Dictionary(tokens)
    #create document term matrix
    doc_term_matrix = [dictionary.doc2bow(tag) for tag in tokens]
    #initialise topic model from gensim
    LDA = gensim.models.ldamodel.LdaModel
    #build and train topic model
    lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=8, random_state=100,
                chunksize=400, passes=50,iterations=100)
    #let's check out the coherence number
    #from gensim.models.coherencemodel import CoherenceModel
    #coherence_model_lda = CoherenceModel(model=lda_model, texts=tokens, dictionary=dictionary , coherence='c_v')
    #coherence_lda = coherence_model_lda.get_coherence()

    #write top 20 words from each document as csv
    top_words_per_topic = []
    for t in range(lda_model.num_topics):
        top_words_per_topic.extend([(t, ) + x for x in lda_model.show_topic(t, topn = 20)])
    #return csv - write first row then append subsequent rows
    return pd.DataFrame(top_words_per_topic, columns=['Topic', 'Word', 'P']).to_csv("top_words_loop_test.csv", mode='a', index = False, header=True)

I want the function to occur for each row in the dataframe. I have tried to use apply() to do so, however, the code I've written appears to not recognise the new_tags column. The code I am using is:

central_edi_posts_grouped['new_tags'].apply(lambda x: topic_model_new(x))

Anyone have any ideas how to solve this?? Thanks!



Solution 1:[1]

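apply() calls your function once per row, passing in a single cell value, but your function expects a whole column: it calls .tolist(), .astype() and .apply() on its argument, which only a pandas Series provides.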
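To see the mismatch, it can help to check what the function actually receives in each case. The snippet below is only a minimal sketch: describe_input is a hypothetical helper (not part of the question's code) that reports the type of its argument, and the DataFrame is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data from the question: each cell of 'new_tags' is a list holding one string.
central_edi_posts_grouped = pd.DataFrame({
    'PageNumber': [175, 162, 576],
    'new_tags': [['flower architecture people'],
                 ['hair red bobbles'],
                 ['sweets chocolate shop']],
})

def describe_input(value):
    """Hypothetical helper: return the type name of whatever it is given."""
    return type(value).__name__

# With apply(), the function is called once per row and sees one cell at a time.
per_row_types = central_edi_posts_grouped['new_tags'].apply(describe_input).tolist()

# Called directly on the column, the function sees the whole Series.
whole_column_type = describe_input(central_edi_posts_grouped['new_tags'])

print(per_row_types)       # ['list', 'list', 'list']
print(whole_column_type)   # Series
```

A plain list has no .tolist() or .astype() method, so topic_model_new fails on its first lines when it is handed a single cell.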

So you could either

  1. Change your function so that it processes a single row, i.e. takes one row's tags as input instead of a Series. After that change, apply() creates a separate topic model for each row. By the way, central_edi_posts_grouped['new_tags'].apply(lambda x: topic_model_new(x)) is equivalent to central_edi_posts_grouped['new_tags'].apply(topic_model_new); you don't need the lambda there.

Or

  2. Use your existing function unchanged, but call it once on the whole column rather than on each row: topic_model_new(central_edi_posts_grouped['new_tags']). That creates one topic model for the entire DataFrame.
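The two options can be sketched side by side as follows. This is an illustration under stated assumptions, not the original topic model: tokens_for_row and tokens_for_column are hypothetical stand-ins that only split the tags into words, so the example runs without nltk or gensim. The real per-row or per-column modelling code would take their place.

```python
import pandas as pd

central_edi_posts_grouped = pd.DataFrame({
    'PageNumber': [175, 162, 576],
    'new_tags': [['flower architecture people'],
                 ['hair red bobbles'],
                 ['sweets chocolate shop']],
})

def tokens_for_row(tags):
    """Option 1 (hypothetical): works on ONE cell, here a list of tag strings."""
    # Stand-in for the per-row topic model: join the strings and split on whitespace.
    return ' '.join(tags).split()

def tokens_for_column(column):
    """Option 2 (hypothetical): works on the WHOLE column, like topic_model_new."""
    # Stand-in for the single, column-wide topic model.
    return [' '.join(tags).split() for tags in column]

# Option 1: apply() calls the function once per row; no lambda is needed.
per_row = central_edi_posts_grouped['new_tags'].apply(tokens_for_row)

# Option 2: call the function once, passing the Series itself.
whole = tokens_for_column(central_edi_posts_grouped['new_tags'])

print(per_row.iloc[0])   # ['flower', 'architecture', 'people']
print(len(whole))        # 3
```

Which option fits depends on whether you want one model per row or one model for all rows together. Note that a single row here contains only a few words, which is very little text for a topic model to learn from.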

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ewz93