Category "data-preprocessing"

Generating multiple labels for documents

Currently, I am working on a task where we are scraping pages from the web and trying to generate labels for each webpage. For that, we have extracted the text data

Preprocess the text contained in the cells of an Excel column using Orange

I would like to preprocess (lowercase, remove stopwords, lemmatize, remove punctuation, etc.) the text contained in the cells of a column of an Excel file

Join large set of CSV files where the header is the timestamp for the file

I have a large set of CSV files (approx. 15,000) and would like to figure out how to join them together into one file for data processing. Each file is in a

PCA for Recurrent Neural Networks (LSTM) - Shall I use PCA for target variables too?

I have a seasonal time-series dataset containing 3 target variables and n feature variables. I am trying to apply a PCA algorithm before feeding the data to a si

Encoding each value in a pandas cell

I have a dataset Inp1 Inp2 Inp3 Output A,B,C AI,UI,JI Apple,Bat,Dog Animals L,M,N LI,DO,LI Lawn, Moon, Noon Noun X,Y

When predicting, shall we scale unseen inputs, and un-scale outputs of a model?

I am new to machine learning, and I followed this tutorial to implement an LSTM model in Keras/TensorFlow: https://www.tensorflow.org/tutorials/structured_data/tim