Category "nlp"

Word2Vec + LSTM Good Training and Validation but Poor on Test

I'm currently training my Word2Vec + LSTM model for Twitter sentiment analysis. I use the pre-trained GoogleNewsVectorNegative300 word embedding. The reason I used t

Output 2D array to a Matrix as a CSV - Python

I have a 2D array with vectorised rows with each row representing a document in the corpus: array[[ 0.0 0.0 0.4583 0.6584 0.0] ...
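One way to approach the question above, as a minimal sketch: it assumes the array can be treated as a plain list of row lists, and it uses only Python's standard `csv` module. The helper name `rows_to_csv` and the second row of values are hypothetical; only the first row comes from the excerpt.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize a 2D sequence (one document vector per row) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)  # one CSV line per document row
    return buffer.getvalue()

# Hypothetical document vectors; only the first row appears in the question.
matrix = [
    [0.0, 0.0, 0.4583, 0.6584, 0.0],
    [0.1, 0.0, 0.0, 0.25, 0.5],
]
csv_text = rows_to_csv(matrix)
print(csv_text)
```

To write to disk instead, the same writer can be pointed at a file opened with `open(path, "w", newline="")`. If the data is a NumPy array, `numpy.savetxt(path, array, delimiter=",")` is a common alternative.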

SpaCy Matcher - Restricting Potential Matches

Not too sure exactly how to word the problem, so thank you for indulging the title... I'm using SpaCy's Matcher function to parse clauses (adverbial/preposition

Shop name classification

I have a list of merchant names and their corresponding Merchant Category Codes (MCCs). It seems that about 80 percent of the MCCs are correct. The total number of MCCs is abo

Vectorizing text data of dictionaries' values from pickle file

I'm new to NLP, trying to learn it by myself, and working on classification. I have a pickle file with some data like this, {'food' : {'f1.txt', 'f2.txt', 'f

R: How can I add titles based on grouping variable in word_associate?

I am using the word_associate package in R Markdown to create word clouds across a grouping variable with multiple categories. I would like the titles of each w

How to generate a sentence around words in Keras?

I know how to generate the next word in Keras with an LSTM, but how do I predict the previous word? For example, if you have two words like "car" and "running", then it sho

I created TF-IDF code to analyze an annual report; I want to know the importance of specific keywords

import pandas as pd from sklearn.feature_extraction.text import TfidfTransformer from sklearn.feature_extraction.text import TfidfVectorizer import path import

Will NER improve Text Categorization?

I was wondering - if I'm doing text categorization (with SpaCy, using their textcat-multi component for example), will those results improve if an NER component

Text Classification on a custom dataset with spacy v3

I am really struggling to make things work with the new spacy v3 version. The documentation is full. However, I am trying to run a training loop in a script. (I

Add Noise to Background for Voice Separation

I want to implement a voice separation project. I have a voice dataset with no background noise and a noise dataset with sounds such as engine sound, horn sound

How to get TF-IDF value of a word from all set of documents?

I need a TF-IDF value for a word that is found in a number of documents, not only in a single or specific document. For example, consider this corpus c

Removing Non-English Words from CSV - NLTK

I am relatively new to Python and NLTK. I have got hold of Flickr data stored in CSV and want to remove non-English words from the tags column. I keep getting er

kwic() function returns fewer rows than it should

I'm currently trying to perform a sentiment analysis on a kwic object, but I'm afraid that the kwic() function does not return all rows it should return. I'm no

Question about the structure of "query, key, value" in the Transformer

I'm a beginner at NLP, so I'm trying to reproduce the most basic Transformer ("Attention Is All You Need") code. But a question came up while doing it. In the MultiHeadAttention l

Tell `kwic()` to ignore stopwords when situating keywords in context?

I once again have a question about the kwic() function from the quanteda package. I want to extract the five words around a specific keyword (in the example bel

Using a target size (torch.Size([2])) that is different to the input size (torch.Size([2, 5])) is deprecated. Please ensure they have the same size

When I use criterion = nn.BCELoss() for my binary classification task, it causes a problem and prints "Using a target size (torch.Size([2])) that is different

Error while creating a model for binary classification for text classification

code: model = create_model() model.compile(optimize=tf.keras.optimizers.Adam(learning_rate=2e-5), loss=tf.keras.losses.BinaryCrossentropy(),

Continuous Bag of Words

I have a question related to the Continuous Bag of Words model. If I have a vocabulary size of 1000, a window size of 2, and the number of nodes in the hidden la

I want to add numeric columns to my tfidf sparse matrix

I tried to do it with sp.hstack() and with
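For reference, a minimal sketch of the `sp.hstack()` route mentioned above. It assumes `sp` refers to `scipy.sparse` and that the numeric columns are a dense NumPy array with one row per document; the matrices and the names `tfidf`, `numeric`, and `combined` are made up for illustration and do not come from the question.

```python
import numpy as np
from scipy import sparse as sp

# Hypothetical stand-in for a TF-IDF matrix: 2 documents x 3 terms.
tfidf = sp.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                [0.3, 0.0, 0.7]]))

# Hypothetical numeric features: 2 documents x 2 extra columns.
numeric = np.array([[10.0, 1.0],
                    [20.0, 0.0]])

# Convert the dense block to sparse, then stack column-wise.
# Both blocks must have the same number of rows.
combined = sp.hstack([tfidf, sp.csr_matrix(numeric)], format="csr")

print(combined.shape)
print(combined.toarray())
```

The result stays sparse, so it can usually be passed to estimators that accept sparse input. Numeric columns on a very different scale from the TF-IDF weights may need scaling first, depending on the model.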