Apply Lambda to Function Not Working? - Python
I have a DataFrame called central_edi_posts_grouped that looks like this:
df = pd.DataFrame({'PageNumber': [175, 162, 576], 'new_tags': [['flower architecture people'], ['hair red bobbles'], ['sweets chocolate shop']})
Output:
PageNumber new_tags
175 flower architecture people...
162 hair red bobbles...
576 sweets chocolate shop...
I have a function defined to tokenise, apply a topic model and write the top 20 words to a csv:
import gensim
import nltk
import pandas as pd
from gensim import corpora

def topic_model_new(grid_document):
    """Conduct topic modelling on a Series of documents (one document per element)."""
    # tokenise each document
    tokens = grid_document.astype(str).apply(nltk.word_tokenize)
    # convert tokenised lists into a dictionary
    dictionary = corpora.Dictionary(tokens)
    # create the document-term matrix
    doc_term_matrix = [dictionary.doc2bow(tag) for tag in tokens]
    # initialise topic model from gensim
    LDA = gensim.models.ldamodel.LdaModel
    # build and train topic model
    lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=8, random_state=100,
                    chunksize=400, passes=50, iterations=100)
    # let's check out the coherence number
    # from gensim.models.coherencemodel import CoherenceModel
    # coherence_model_lda = CoherenceModel(model=lda_model, texts=tokens, dictionary=dictionary, coherence='c_v')
    # coherence_lda = coherence_model_lda.get_coherence()
    # collect the top 20 words for each topic
    top_words_per_topic = []
    for t in range(lda_model.num_topics):
        top_words_per_topic.extend([(t, ) + x for x in lda_model.show_topic(t, topn=20)])
    # append to the csv; note that header=True writes the header row again on every call
    return pd.DataFrame(top_words_per_topic, columns=['Topic', 'Word', 'P']).to_csv("top_words_loop_test.csv", mode='a', index=False, header=True)
I want the function to occur for each row in the dataframe. I have tried to use apply() to do so, however, the code I've written appears to not recognise the new_tags column.
The code I am using is:
central_edi_posts_grouped['new_tags'].apply(lambda x: topic_model_new(x))
Anyone have any ideas how to solve this?? Thanks!
Solution 1:[1]
apply() on a column calls your function once per row, passing in that row's value, but your function expects a whole column (a pandas Series).
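To see the mismatch concretely, here is a minimal sketch using the sample data from the question. The helper inspect_arg is hypothetical (it is not part of the question's code) and only reports what type of value it receives.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'PageNumber': [175, 162, 576],
    'new_tags': [['flower architecture people'],
                 ['hair red bobbles'],
                 ['sweets chocolate shop']],
})

def inspect_arg(x):
    # Hypothetical helper: return the type name of whatever is passed in.
    return type(x).__name__

# Series.apply calls the function once per element, so it sees each list.
per_element_types = df['new_tags'].apply(inspect_arg).tolist()

# Calling the function directly passes the whole column as one Series.
whole_column_type = inspect_arg(df['new_tags'])

# A plain list has no .astype(), which topic_model_new relies on.
element_has_astype = hasattr(df['new_tags'].iloc[0], 'astype')

print(per_element_types, whole_column_type, element_has_astype)
```

So inside apply(), topic_model_new is handed a plain Python list rather than a Series, which is why the Series methods it uses are not available.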
So you could either
- Change your function so that it processes a single row (that is, it takes one row's value as input). By the way,
central_edi_posts_grouped['new_tags'].apply(lambda x: topic_model_new(x)) is equivalent to central_edi_posts_grouped['new_tags'].apply(topic_model_new); you don't need the lambda there. After that modification, the call would create a topic model for each row.
Or
- Use your existing method but call it on the whole column and not each row:
topic_model_new(central_edi_posts_grouped['new_tags']). That would create a topic model for the entire DataFrame.
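The two options can be sketched side by side. This is only an illustration under stated assumptions: top_words_for_row and top_words_for_column are hypothetical names, str.split stands in for nltk.word_tokenize, and a simple word count (collections.Counter) stands in for the LDA model so the example runs without gensim or nltk.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'PageNumber': [175, 162, 576],
    'new_tags': [['flower architecture people'],
                 ['hair red bobbles'],
                 ['sweets chocolate shop']],
})

def top_words_for_row(tags, topn=2):
    # Hypothetical per-row function: `tags` is one cell, i.e. a list of strings.
    # str.split stands in for nltk.word_tokenize, Counter for the topic model.
    tokens = [word for tag in tags for word in tag.split()]
    return Counter(tokens).most_common(topn)

def top_words_for_column(column, topn=2):
    # Hypothetical whole-column function: `column` is a Series of lists.
    tokens = [word for tags in column for tag in tags for word in tag.split()]
    return Counter(tokens).most_common(topn)

# Option 1: apply() runs the per-row function once per row (one result each).
per_row = df['new_tags'].apply(top_words_for_row)

# Option 2: call the whole-column function once (one result for all rows).
whole = top_words_for_column(df['new_tags'])

print(per_row)
print(whole)
```

Which option fits depends on whether you want one model per row or one model over all rows; with a single short document per row, a per-row topic model has very little text to learn from.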
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ewz93 |
