Category "nlp"

How to Vectorize python function

I have built a resume parser, and to parse my resumes I run my parse function over each one in a for loop. Is there a way to vectorize this approach?
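A note on the premise: `numpy.vectorize` is documented as a convenience wrapper that is essentially a for loop, so it will not speed up a per-document parser. Real vectorization only helps for array arithmetic. For "apply a Python function to every resume", the usual options are a plain `map` or running the calls concurrently. The sketch below assumes a hypothetical `parse_resume` standing in for the real parser and uses `concurrent.futures` from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parse_resume(text: str) -> dict:
    """Hypothetical stand-in for the real resume parser."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {"name": lines[0] if lines else None, "n_lines": len(lines)}


def parse_all(texts: Iterable[T], parse: Callable[[T], R], max_workers: int = 4) -> List[R]:
    """Apply `parse` to every text concurrently, preserving input order.

    Threads help when parsing waits on I/O (reading files, calling services).
    For CPU-bound pure-Python parsing, ProcessPoolExecutor is usually the
    better swap, with `parse` defined at module level and the call placed
    under an `if __name__ == "__main__":` guard.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(parse, texts))


resumes = ["Jane Doe\nPython developer", "John Smith\nData analyst\nSQL"]
results = parse_all(resumes, parse_resume)
print(results)
```

The loop is still there; the gain comes only from overlapping work, so measure before and after on real resumes.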

How to store Bag of Words or Embeddings in a Database

I would like to store vector features, such as Bag-of-Words or word-embedding vectors for a large number of texts, in a dataset stored in a SQL database. What're t
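One common pattern is to serialize each dense vector to bytes and keep it in a BLOB column next to the document ID. This is a minimal sketch assuming SQLite (via Python's built-in `sqlite3`) as a stand-in for whatever SQL database is actually used; the table and column names are hypothetical. For sparse Bag-of-Words vectors, a normalized `(doc_id, term_id, count)` table is an alternative worth considering.

```python
import sqlite3
from array import array
from typing import List, Sequence


def to_blob(vec: Sequence[float]) -> bytes:
    # Pack as 32-bit floats ("f"); the same typecode must be used when reading.
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> List[float]:
    out = array("f")
    out.frombytes(blob)
    return out.tolist()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: store the dimension so readers can sanity-check blobs.
conn.execute(
    "CREATE TABLE doc_vectors (doc_id INTEGER PRIMARY KEY, dim INTEGER NOT NULL, vec BLOB NOT NULL)"
)
vectors = {1: [0.5, -1.25, 2.0], 2: [0.0, 0.75, -0.5]}
conn.executemany(
    "INSERT INTO doc_vectors (doc_id, dim, vec) VALUES (?, ?, ?)",
    [(doc_id, len(v), to_blob(v)) for doc_id, v in vectors.items()],
)
conn.commit()

row = conn.execute("SELECT dim, vec FROM doc_vectors WHERE doc_id = ?", (1,)).fetchone()
dim, restored = row[0], from_blob(row[1])
print(dim, restored)
```

The database only stores the vectors here; similarity search still happens after loading them back into Python.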

R: Correct Way to Calculate Cosine Similarity?

I am working with the R programming language. I have the following data: text = structure(list(id = 1:8, reviews = c("I guess the employee decided to buy their
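Whichever R package or hand-written function is used, its output should agree with the standard definition applied to two document vectors $\mathbf{a}$ and $\mathbf{b}$ (for example, two rows of a document-term matrix built from the `reviews` column). The small worked case at the end is a quick sanity check for any implementation.

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}

% Worked check: a = (1, 1, 0), b = (1, 0, 1)
\cos(\mathbf{a}, \mathbf{b}) = \frac{1}{\sqrt{2}\,\sqrt{2}} = 0.5
```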

Error ('power iteration failed to converge within 100 iterations') when I tried to summarize a text document using Python networkx

I got a PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 100 iterations') when I tried to summ
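The message comes from PageRank's power iteration: `nx.pagerank` repeats the update up to `max_iter` times (default 100) and raises `nx.PowerIterationFailedConvergence` if the total change never drops below its tolerance (`tol`, default 1e-06, scaled by the number of nodes). The usual first step is to pass a larger `max_iter` or a looser `tol` to `nx.pagerank`. The pure-Python sketch below is not networkx itself; it reimplements the same style of loop so the stopping rule is visible. The names `pagerank` and `ConvergenceError` and the toy sentence-similarity graph are hypothetical.

```python
from typing import Dict, Hashable


class ConvergenceError(RuntimeError):
    """Raised when the iteration budget runs out (mirrors the networkx error)."""


def pagerank(adj: Dict[Hashable, Dict[Hashable, float]], alpha: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> Dict[Hashable, float]:
    """Weighted PageRank by power iteration; every edge target must be a key of adj."""
    nodes = list(adj)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(max_iter):
        xlast = x
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = alpha * sum(xlast[v] for v in nodes if out_weight[v] == 0)
        x = {v: (1.0 - alpha) / n + dangling / n for v in nodes}
        for v in nodes:
            if out_weight[v] == 0:
                continue
            for u, w in adj[v].items():
                x[u] += alpha * xlast[v] * w / out_weight[v]
        # Stop once the total L1 change is below n * tol.
        err = sum(abs(x[v] - xlast[v]) for v in nodes)
        if err < n * tol:
            return x
    raise ConvergenceError(f"power iteration failed to converge within {max_iter} iterations")


# Toy symmetric similarity graph between three sentences.
sim = {
    "s1": {"s2": 0.8, "s3": 0.1},
    "s2": {"s1": 0.8, "s3": 0.4},
    "s3": {"s1": 0.1, "s2": 0.4},
}
scores = pagerank(sim)
print(max(scores, key=scores.get))
```

With the default budget the toy graph converges; capping `max_iter` very low reproduces the same kind of error.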

Continual pre-training vs. Fine-tuning a language model with MLM

I have some custom data I want to use to further pre-train the BERT model. I’ve tried the following two approaches so far: Starting with a pre-trained BER
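Both approaches train on the same masked-language-modeling (MLM) objective, so the input corruption is identical. The original BERT recipe selects about 15% of tokens. Of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. The model is scored only on the selected positions. The sketch below applies that rule to string tokens with an independent per-token draw; the function name and toy vocabulary are hypothetical. In Hugging Face Transformers, `DataCollatorForLanguageModeling` (with `mlm_probability=0.15` by default) applies a similar scheme to token IDs.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                mlm_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted inputs, labels); labels are None where no prediction is scored."""
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mlm_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # else (remaining 10%): keep the original token unchanged
    return inputs, labels


vocab = ["the", "cat", "sat", "on", "mat"]
inputs, labels = mask_tokens(["the", "cat", "sat"], vocab, random.Random(0))
print(inputs, labels)
```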

How to get up and running with spaCy for Vietnamese?

I succeeded with English using python -m spacy download en_core_web_lg, python -m spacy download en_core_web_sm and python -m spacy download en. I read https://spacy.io/mod

Definition of downstream tasks in NLP

What does the term "downstream tasks" mean in NLP? I have seen it used in several articles, but I can't understand the idea behind it.

How to fix a runtime error when computing an LDA model's coherence score?

text='Alice is a student.She likes studying.Teachers are giving a lot of homewok.' I am trying to get topics from a simple text (like the one above) with coherence scor

Follow-up question regarding a Keras model issue

So about a week ago I posted this question: Issues running a Keras model with custom layers. The suggestion there was to try to make this question smaller and t

Extracting names from a text file using spaCy

I have a text file which contains lines as shown below: Electronically signed : Wes Scott, M.D.; Jun 26 2010 11:10AM CST The patient was referred by Dr. J
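For lines that follow a fixed template, such as `Electronically signed : <name>, <credentials>;`, a regular expression can pull the name out directly. Note that this is a swapped-in technique, not spaCy. spaCy's named-entity recognizer (entities in `doc.ents` with `ent.label_ == "PERSON"`) is the better fit for free-text mentions like "referred by Dr. …". The pattern and function name below are hypothetical and assume the name ends at the first comma or semicolon.

```python
import re
from typing import List

# Capture everything after "Electronically signed :" up to the first comma or semicolon.
SIGNED_RE = re.compile(r"Electronically signed\s*:\s*([^,;]+)")


def signers(text: str) -> List[str]:
    return [m.group(1).strip() for m in SIGNED_RE.finditer(text)]


sample = "Electronically signed : Wes Scott, M.D.; Jun 26 2010 11:10AM CST The patient was referred by Dr. J"
print(signers(sample))
```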

How do I remove nonsensical or incomplete words from a corpus?

I am using some text for NLP analyses. I have cleaned the text by removing non-alphanumeric characters, blanks, duplicate words and stopwords, a
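One common way to drop misspelled, truncated, or nonsensical tokens is to keep only words found in a reference vocabulary, such as a dictionary word list or the vocabulary of a pre-trained embedding. This is a minimal sketch assuming a tiny hypothetical `vocab` set; in practice the set might come from a larger list (NLTK's `words` corpus is one option). Domain terms missing from the list will also be dropped, so the list should be checked against the corpus.

```python
import re
from typing import List, Set


def keep_known_words(text: str, vocabulary: Set[str], min_len: int = 2) -> List[str]:
    """Lowercase, split on non-letters, and keep tokens present in `vocabulary`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t in vocabulary]


vocab = {"patient", "reported", "mild", "pain"}  # hypothetical reference list
print(keep_known_words("Patient reportd mild pain xqz p", vocab))
```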

spaCy: train NER using multiprocessing

I am trying to train a custom NER model using spaCy. Currently I have more than 2k training records, and each text consists of more than 100 words, at least

Tokenizing an HTML document

I have an HTML document and I'd like to tokenize it using spaCy while keeping each HTML tag as a single token. Here's my code: import spacy from spacy.symbols impo
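One way to guarantee that each tag survives as one token is to pre-tokenize with a regular expression whose first alternative matches a whole tag. This is a swapped-in technique, not spaCy's own tokenizer. The resulting list could then be handed to spaCy as pre-split words, for example via `spacy.tokens.Doc(nlp.vocab, words=tokens)`. The pattern is a simplifying assumption: it treats any `<...>` run as a tag and does not handle `>` inside attribute values.

```python
import re
from typing import List

# Order matters: whole tags first, then word runs, then single punctuation marks.
TOKEN_RE = re.compile(r"<[^>]+>|\w+|[^\w\s]")


def tokenize_html(html: str) -> List[str]:
    return TOKEN_RE.findall(html)


print(tokenize_html('<p class="x">Hello, <b>world</b>!</p>'))
```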

Embedding 3D data in Pytorch

I want to implement character-level embedding. This is the usual word embedding: Word Embedding Input: [ [‘who’, ‘is’, ‘this’
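For character-level embedding, the input gains one axis: `(batch, words, chars)` instead of `(batch, words)`. The sketch below builds that padded 3D index structure in plain Python; the function name, the padding index 0, and the unknown-character index 1 are hypothetical choices. In PyTorch, `nn.Embedding` accepts index tensors of any shape and appends the embedding dimension, so `torch.tensor(x)` fed to `nn.Embedding(num_chars, dim, padding_idx=0)` would yield `(batch, words, chars, dim)`. A character encoder (CNN or LSTM) is then typically run on a view of shape `(batch * words, chars, dim)`.

```python
from typing import Dict, List


def char_index_tensor(sentences: List[List[str]], char2idx: Dict[str, int],
                      pad: int = 0, unk: int = 1) -> List[List[List[int]]]:
    """Map each character to an index and pad to (batch, max_words, max_chars)."""
    max_words = max(len(s) for s in sentences)
    max_chars = max(len(w) for s in sentences for w in s)
    batch = []
    for sent in sentences:
        # Pad every word to max_chars characters.
        rows = [[char2idx.get(c, unk) for c in w] + [pad] * (max_chars - len(w)) for w in sent]
        # Pad every sentence to max_words words (fresh lists, no aliasing).
        rows += [[pad] * max_chars for _ in range(max_words - len(sent))]
        batch.append(rows)
    return batch


sentences = [["who", "is", "this"], ["hi"]]
chars = sorted({c for s in sentences for w in s for c in w})
char2idx = {c: i + 2 for i, c in enumerate(chars)}  # 0 = pad, 1 = unknown
x = char_index_tensor(sentences, char2idx)
print(x)
```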

Tensorflow-addons seq2seq - start and end tokens in BaseDecoder or BasicDecoder

I am writing code inspired by https://www.tensorflow.org/addons/api_docs/python/tfa/seq2seq/BasicDecoder. In the translation/generation we instantiate a Basic

assertion failed: [Condition x == y did not hold element-wise:]

I have built a BiLSTM model with an attention layer for a sentence classification task, but I am getting an error that my assertion has failed due to a mismatch in n

TFA BeamSearchDecoder Clarification Request

If the question seems too dumb, it is because I am new to TensorFlow. I was implementing a toy encoder-decoder problem using TensorFlow 2’s TFA seq2seq imp

Read GloVe pre-trained embeddings into R, as a matrix

Working in R. I know the pre-trained GloVe embeddings (e.g., "glove.6B.50d.txt") can be found here: https://nlp.stanford.edu/projects/glove/. However, I've had

Huggingface distilbert-base-uncased-finetuned-sst-2-english runs out of RAM with only a few KB?

My dataset is only 10 thousand sentences. I run it in batches of 100, and clear the memory on each run. I manually slice the sentences to only 50 characters. Af

ValueError: The first argument to `Layer.call` must always be passed

I was trying to build a model with the Sequential API (it has already worked for me with the Functional API). Here is the model that I'm trying to build in Sequ