Category "nlp"

Analysing words in dataset based on training data

I have a training dataset, for example (columns Letter and Word): A Apple, B Bat, C Cat, D Dog, E Elephant, and I need to check the dataframe

Word2Vec + CNN Overfitting

I am currently training a Word2Vec + CNN model for Twitter sentiment analysis in the COVID-19 vaccine domain. I used the pre-trained GoogleNewsVectorNegative300 word

How to get the dimensions of a word2vec vector?

I have run a word2vec model on my data list_of_sentence:
    from gensim.models import Word2Vec
    w2v_model = Word2Vec(list_of_sentence, min_count=5, workers=4)
    print(

Can I use NLP to recommend something?

I have a project description, and based on it I want to recommend the best users to work on the project, using each user's CV. Is NLP good for th
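One simple baseline, sketched here under assumptions (plain bag-of-words counts, a toy regex tokenizer, hypothetical user names and CV text): represent the project description and each CV as word-count vectors and rank users by cosine similarity. In practice TF-IDF weighting or sentence embeddings would usually replace the raw counts.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_users(project, cvs):
    """Return (score, user) pairs sorted from best to worst match."""
    p = bow(project)
    return sorted(((cosine(p, bow(cv)), user) for user, cv in cvs.items()), reverse=True)


# Hypothetical data
project = "Python developer for a machine learning project"
cvs = {
    "alice": "Python engineer with machine learning experience",
    "bob": "Graphic designer, Photoshop and Illustrator",
}
ranking = rank_users(project, cvs)
print(ranking)
```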

Word2Vec + LSTM Good Training and Validation but Poor on Test

I am currently training a Word2Vec + LSTM model for Twitter sentiment analysis. I use the pre-trained GoogleNewsVectorNegative300 word embedding. The reason I used t

Output 2D array to a Matrix as a CSV - Python

I have a 2D array with vectorised rows with each row representing a document in the corpus: array[[ 0.0 0.0 0.4583 0.6584 0.0] ...
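If the rows are already in a NumPy array, `np.savetxt` can write them out as a comma-separated matrix, one document per line. A minimal sketch; the second row and the column labels t0 to t4 are hypothetical placeholders (for a TF-IDF matrix they would normally be the vectorizer's feature names). If the matrix is a SciPy sparse matrix, convert it with `.toarray()` first.

```python
import io

import numpy as np

# Hypothetical document-term matrix: one row per document, one column per term
doc_vectors = np.array([
    [0.0, 0.0, 0.4583, 0.6584, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.7],
])
terms = ["t0", "t1", "t2", "t3", "t4"]  # hypothetical column labels

buf = io.StringIO()  # pass a filename such as "matrix.csv" instead to write to disk
# comments="" stops savetxt from prefixing the header line with "# "
np.savetxt(buf, doc_vectors, delimiter=",", fmt="%.4f",
           header=",".join(terms), comments="")
csv_text = buf.getvalue()
print(csv_text)
```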

SpaCy Matcher - Restricting Potential Matches

Not too sure exactly how to word the problem, so thank you for indulging the title... I'm using SpaCy's Matcher function to parse clauses (adverbial/preposition

Shop name classification

I have a list of merchant names and their corresponding Merchant Category Codes (MCCs). About 80 percent of the MCCs appear to be correct. The total number of MCCs is abo

Vectorizing text data from dictionary values in a pickle file

I'm new to NLP, teaching myself, and working on a classification task. I have a pickle file with data like this: {'food' : {'f1.txt', 'f2.txt', 'f

R: How can I add titles based on grouping variable in word_associate?

I am using the word_associate function (from the qdap package) in R Markdown to create word clouds across a grouping variable with multiple categories. I would like the titles of each w

How to generate a sentence around words in Keras?

I know how to generate the next word in Keras with an LSTM, but how do I predict the previous word? For example, if you have two words like "car" and "running", then it sho

I wrote TF-IDF code to analyze an annual report; I want to know the importance of specific keywords

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    import path
    import

Will NER improve Text Categorization?

I was wondering - if I'm doing text categorization (with SpaCy, using their textcat-multi component for example), will those results improve if an NER component

Text Classification on a custom dataset with spacy v3

I am really struggling to get things working with the new spacy v3. The documentation is thorough; however, I am trying to run a training loop in a script. (I

Add Noise to Background for Voice Separation

I want to implement a voice separation project. I have a voice dataset with no background noise and a separate noise dataset, with sounds such as engine sound, horn sound
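A common way to build training pairs for this kind of separation task is to add noise to the clean speech at a chosen signal-to-noise ratio (SNR) and keep the clean signal as the target. A minimal NumPy sketch, assuming both signals are mono float arrays at the same sample rate; the synthetic tone and random noise stand in for real recordings, and `mix_at_snr` is a hypothetical helper name.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to clean speech so the mixture has the requested SNR in dB.

    Assumes 1-D float arrays at the same sample rate. The noise clip is
    tiled (repeated) and cropped to the speech length before scaling.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain chosen so that 10*log10(speech_power / (gain**2 * noise_power)) == snr_db
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Stand-in signals: 1 s of a 220 Hz tone at 16 kHz and a shorter noise clip
sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noise = np.random.default_rng(0).normal(size=4000)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
# Training pair: model input = mixture, target = speech
```

Sampling a different SNR and noise clip for each example is a simple way to get many varied mixtures from the two datasets.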

How to get TF-IDF value of a word from all set of documents?

I need the TF-IDF value of a word across a number of documents, not just within a single, specific document. For example, consider this corpus c
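TF-IDF is defined per (word, document) pair, so one corpus-level value requires an aggregation choice. A minimal pure-Python sketch, assuming whitespace tokenization, tf = count / document length, idf = ln(N / df), and the mean over all documents as the aggregate (sum or max are equally defensible). scikit-learn's TfidfVectorizer differs by default: it uses raw counts, a smoothed idf of ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its numbers will not match these.

```python
import math


def corpus_tfidf(docs, word):
    """Mean TF-IDF of `word` over all documents in `docs`."""
    word = word.lower()
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = sum(1 for tokens in tokenized if word in tokens)  # document frequency
    if df == 0:
        return 0.0
    idf = math.log(n_docs / df)
    total = 0.0
    for tokens in tokenized:
        tf = tokens.count(word) / len(tokens) if tokens else 0.0
        total += tf * idf
    return total / n_docs


docs = ["the cat sat", "the dog sat", "the cat ran"]
print(corpus_tfidf(docs, "cat"))
```

A word that occurs in every document ("the" here) gets idf = ln(1) = 0, so its corpus-level score is 0.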

Removing Non-English Words from CSV - NLTK

I am relatively new to Python and NLTK. I have Flickr data stored in a CSV file and want to remove non-English words from the tags column. I keep getting er
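A minimal sketch of the filtering step using only the standard library, assuming the tags column holds space-separated tags. The small ENGLISH_WORDS set and the CSV contents are hypothetical stand-ins; with NLTK you could build the vocabulary from the words corpus after `nltk.download("words")`. Dictionary lookup also drops proper nouns, plurals missing from the word list, and slang, so treat it as a rough filter.

```python
import csv
import io

# Stand-in English vocabulary (assumption). With NLTK, for example:
#   nltk.download("words")
#   ENGLISH_WORDS = {w.lower() for w in nltk.corpus.words.words()}
ENGLISH_WORDS = {"beach", "sunset", "city", "night", "dog"}


def keep_english(tag_string, vocab=ENGLISH_WORDS):
    """Keep only the space-separated tags that appear in the vocabulary."""
    return " ".join(t for t in tag_string.split() if t.lower() in vocab)


# Hypothetical CSV with a "tags" column
raw = "id,tags\n1,beach sunset playa\n2,city nacht night\n"
rows = list(csv.DictReader(io.StringIO(raw)))
for row in rows:
    row["tags"] = keep_english(row["tags"])

# Write the cleaned rows back out (use open("cleaned.csv", "w", newline="") for a file)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "tags"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```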

kwic() function returns fewer rows than it should

I'm currently trying to perform sentiment analysis on a kwic object, but I suspect that the kwic() function does not return all the rows it should. I'm no

I want to ask about the structure of "query, key, value" in the Transformer

I'm a beginner in NLP, so I'm trying to reproduce the most basic Transformer code from "Attention Is All You Need". While doing it, I ran into a question. In the MultiHeadAttention l
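For orientation, a minimal NumPy sketch of scaled dot-product and multi-head self-attention in the style of "Attention Is All You Need": query, key and value are three separate learned linear projections of the same input, split into heads, attended, concatenated and projected again. Random weights stand in for trained parameters; masking, biases, dropout and batching are omitted, and the function names are illustrative rather than taken from any particular codebase.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V over the last two axes."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); each w_*: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Q, K and V all come from the same x, via different projections
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    out, weights = scaled_dot_product_attention(q, k, v)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # concatenate heads
    return out @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)
```

In encoder-decoder (cross) attention, the queries come from the decoder while keys and values come from the encoder output; the computation is otherwise the same.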

Tell `kwic()` to ignore stopwords when situating keywords in context?

I once again have a question about the kwic() function from the quanteda package. I want to extract the five words around a specific keyword (in the example bel
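In quanteda, one way to do this is to remove stopwords from the tokens object first (for example with tokens_remove(toks, stopwords("en"))) and then call kwic() with window = 5, so the window counts only the words that remain. The same idea as a Python sketch (not quanteda's API), with a small hypothetical stopword list and plain whitespace tokenization:

```python
# Hypothetical stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "it"}


def kwic_no_stopwords(text, keyword, window=5, stopwords=STOPWORDS):
    """Return (pre, keyword, post) triples, counting the window without stopwords."""
    keyword = keyword.lower()
    # Drop stopwords before windowing, but never drop the keyword itself
    tokens = [t for t in text.lower().split() if t not in stopwords or t == keyword]
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            pre = tokens[max(0, i - window):i]
            post = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(pre), tok, " ".join(post)))
    return hits


text = "the vaccine was approved and the rollout of the vaccine is going well in the city"
hits = kwic_no_stopwords(text, "rollout")
print(hits)
```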
