Category "scikit-learn"

Using Scikit's StandardScaler correctly across multiple programs

I have a question that is very similar to this topic, but I want to reuse the StandardScaler instead of LabelEncoder. Here's what I have done: # in one pro
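
The excerpt is cut off, so here is only a general sketch of the usual pattern. It assumes joblib (installed as a scikit-learn dependency) and uses a placeholder file name. One program fits the scaler and saves it. The other loads it and calls only `transform`, so both programs use the same learned mean and scale.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Program 1: fit the scaler on the training data and persist it.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_train)

# "scaler.joblib" is a placeholder; use any path both programs can reach.
path = os.path.join(tempfile.gettempdir(), "scaler.joblib")
joblib.dump(scaler, path)

# Program 2: load the fitted scaler and reuse its learned mean_ and scale_.
loaded = joblib.load(path)
X_new = np.array([[2.0, 20.0], [4.0, 40.0]])
X_new_scaled = loaded.transform(X_new)  # transform only; refitting would change the statistics
print(X_new_scaled)
```

Pickled estimators are best loaded with the same scikit-learn version that saved them.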

PCA in sklearn: how to interpret pca.components_

I ran PCA on a data frame with 10 features using this simple code: pca = PCA(); fit = pca.fit(dfPca). The result of pca.explained_variance_ratio_ shows: array
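
A minimal sketch of how `components_` is laid out. It uses a hypothetical 4-feature stand-in for `dfPca`, not the asker's 10-feature data. Each row is one principal component, a unit-length vector of weights on the original features. Labelling the columns with the feature names makes the weights readable.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for dfPca: 100 rows, 4 named features.
dfPca = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])

pca = PCA().fit(dfPca)

# components_ has shape (n_components, n_features): row i holds the weights
# of principal component i on each original feature.
weights = pd.DataFrame(pca.components_, columns=dfPca.columns,
                       index=[f"PC{i + 1}" for i in range(pca.n_components_)])
print(weights.round(3))

# Feature with the largest absolute weight on each component.
print(weights.abs().idxmax(axis=1))
# Share of total variance captured by each component (same row order).
print(pca.explained_variance_ratio_)
```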

In Keras/TensorFlow, is there a way to add a preprocessing layer to the output, similar to TransformedTargetRegressor in sklearn?

I want to use keras to build a neural network regression model from X_train -> Y_train. In this example, however, I need to perform a preprocessing transform
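
For reference, this sketch shows what scikit-learn's `TransformedTargetRegressor` does, using `LinearRegression` on synthetic data in place of a Keras model. It fits the regressor on a transformed target, then applies `inverse_transform` to the predictions. The second half shows the same steps done by hand, which is the pattern one could wrap around a Keras model. A scikit-learn-compatible wrapper such as SciKeras's `KerasRegressor` could in principle be passed as `regressor`; that is not shown or tested here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
Y_train = X_train @ np.array([100.0, -50.0, 25.0]) + 1000.0  # large-scale target

# The regressor is fit on StandardScaler-transformed Y; predict() applies
# inverse_transform so outputs come back in the original units.
model = TransformedTargetRegressor(regressor=LinearRegression(),
                                   transformer=StandardScaler())
model.fit(X_train, Y_train)
pred = model.predict(X_train[:5])

# Manual equivalent: scale Y before fitting, invert the scaling after predicting.
y_scaler = StandardScaler().fit(Y_train.reshape(-1, 1))
reg = LinearRegression().fit(X_train, y_scaler.transform(Y_train.reshape(-1, 1)).ravel())
pred_manual = y_scaler.inverse_transform(reg.predict(X_train[:5]).reshape(-1, 1)).ravel()
print(pred, pred_manual)
```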

ImportError: No module named 'sklearn.lda'

When I run classifier.py in the openface demos directory using: classifier.py train ./generated-embeddings/ I get the following error message: --> fro
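
The error message is cut off, so this sketch assumes the failing line imports `LDA` from `sklearn.lda`. That module was deprecated in scikit-learn 0.17 and removed in 0.19. The class now lives in `sklearn.discriminant_analysis`, and aliasing it as `LDA` leaves the rest of the calling code unchanged.

```python
import numpy as np

# Old location (removed in 0.19): from sklearn.lda import LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Toy two-class data, purely for illustration.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LDA().fit(X, y)
pred = clf.predict([[0.1, 0.1], [3.0, 3.0]])
print(pred)  # [0 1]
```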

multilayer_perceptron: ConvergenceWarning: Stochastic Optimizer: Maximum iterations reached and the optimization hasn't converged yet. Warning?

I have written a basic program to understand what's happening in an MLP classifier: from sklearn.neural_network import MLPClassifier. Data: a dataset of body met
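
This sketch uses the iris data as a stand-in for the asker's dataset. It first triggers the warning on purpose: the stochastic solver stops at `max_iter` before the loss improvement falls below `tol`. It then applies the two usual remedies, standardizing the inputs and raising `max_iter`. A fitted `n_iter_` below `max_iter` means training stopped on the tolerance criterion rather than the iteration cap.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# A deliberately tiny max_iter makes the default adam solver stop early and warn.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    MLPClassifier(max_iter=5, random_state=0).fit(X, y)
hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)

# Common remedies: standardize the inputs and allow more iterations.
clf = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0))
clf.fit(X, y)
mlp = clf[-1]  # the fitted MLPClassifier step
print(hit_limit, mlp.n_iter_, clf.score(X, y))
```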

PLS-DA Loading Plot in Python

How can I make a loading plot of a PLS-DA model with Matplotlib, like the loading plot for PCA? This answer explains how it can be done with PCA: Plot

sklearn lda gridsearchcv with pipeline

pipe = Pipeline([('reduce_dim', LinearDiscriminantAnalysis()), ('classify', LogisticRegression())])
param_grid = [{'classify__penalty': ['l1', 'l2'],
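
The grid in the excerpt is truncated, so here is a complete version on the iris data, with the extra grid values chosen only for illustration. Two details matter here. Step parameters are addressed as `<step name>__<parameter>`. `'l1'` also needs a solver that supports it: the default `lbfgs` accepts only `'l2'` (or no penalty), so this sketch switches to `saga`. LDA's `n_components` can be at most `n_classes - 1`, which is 2 for iris.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("reduce_dim", LinearDiscriminantAnalysis()),
    # saga supports both 'l1' and 'l2'; the default lbfgs solver rejects 'l1'.
    ("classify", LogisticRegression(solver="saga", max_iter=5000)),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = [{
    "reduce_dim__n_components": [1, 2],  # at most n_classes - 1 = 2 for iris
    "classify__penalty": ["l1", "l2"],
    "classify__C": [0.1, 1.0, 10.0],
}]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```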

sklearn RandomForestRegressor discrepancy in the displayed tree values

While using the RandomForestRegressor, I noticed something strange. To illustrate the problem, here is a small example. I applied the RandomForestRegressor on a tes
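
The example is cut off, so this sketch only illustrates one common source of such discrepancies, using synthetic data. The forest's prediction is the mean of its trees' predictions. With the default `bootstrap=True`, each tree is grown on a resample drawn with replacement. A leaf value displayed for one tree is therefore the mean of that tree's bootstrap targets, which usually differs from the mean of the original rows in that leaf. Setting `bootstrap=False` makes every tree see all rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

rf = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

# Each fitted tree is available in rf.estimators_.
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
forest = rf.predict(X)

# The forest output equals the average over the individual trees.
print(np.allclose(forest, per_tree.mean(axis=0)))  # True
```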

How to improve the prediction of missing data using sklearn regression?

I need to predict some missing data. I have a dataset of production values over the last 7 years, which are supposedly reported hourly. However, many datapoints ar
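
A minimal sketch of the basic "fit on observed rows, predict the missing rows" approach. It runs on a synthetic hourly series, not the asker's data. The calendar features (hour, day of week, month) are an assumption about what drives production. Time-series methods that use neighbouring values, such as lags or interpolation, may well do better on real data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical hourly production series (synthetic, 60 days).
idx = pd.date_range("2020-01-01", periods=24 * 60, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
hours = idx.hour.to_numpy()
production = 50 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, len(idx))
df = pd.DataFrame({"production": production}, index=idx)

# Knock out 20% of the values to simulate missing reports.
df.loc[df.sample(frac=0.2, random_state=0).index, "production"] = np.nan

# Calendar features so the model can learn daily/weekly/seasonal patterns.
features = pd.DataFrame({"hour": hours,
                         "dayofweek": idx.dayofweek.to_numpy(),
                         "month": idx.month.to_numpy()}, index=idx)

known = df["production"].notna()
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features[known], df.loc[known, "production"])
df.loc[~known, "production"] = model.predict(features[~known])

# Possible here only because the data is synthetic: compare against the truth.
mae = np.abs(df.loc[~known, "production"].to_numpy() - production[~known.to_numpy()]).mean()
print(round(mae, 3))
```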

Stratified Sampling in Pandas

I've looked at the Sklearn stratified sampling docs as well as the pandas docs and also Stratified samples from Pandas and sklearn stratified sampling based on
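
Two common ways to draw a stratified sample, shown on a made-up frame with a 60/40 label split. `DataFrameGroupBy.sample` (pandas 1.1 and later) takes the same fraction from every group. `train_test_split(..., stratify=...)` keeps the class proportions in both the train and test parts.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(100), "label": ["a"] * 60 + ["b"] * 40})

# Option 1 (pandas >= 1.1): sample 20% from each label group.
sample = df.groupby("label").sample(frac=0.2, random_state=0)

# Option 2: a stratified train/test split with scikit-learn.
train, test = train_test_split(df, test_size=0.2, stratify=df["label"], random_state=0)

print(sample["label"].value_counts().to_dict())  # {'a': 12, 'b': 8}
print(test["label"].value_counts().to_dict())    # {'a': 12, 'b': 8}
```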

What is the difference between OneVsRestClassifier and MultiOutputClassifier in scikit learn?

Can someone please explain (maybe with an example) the difference between OneVsRestClassifier and MultiOutputClassifier in scikit-learn? I've read docume
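
A small sketch contrasting the two on iris, where the second target column is a made-up binary label added only for illustration. `OneVsRestClassifier` handles a single target by fitting one binary "class k vs. rest" model per class; it also accepts a binary indicator matrix for multilabel problems. `MultiOutputClassifier` fits one independent classifier per target column, and each column may itself be multiclass.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier

X, y = load_iris(return_X_y=True)  # one target with 3 classes

# One target: one binary model per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Two targets: one (possibly multiclass) model per column.
Y = np.column_stack([y, (X[:, 0] > 5.8).astype(int)])  # second column is hypothetical
moc = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

print(len(ovr.estimators_), ovr.predict(X[:3]).shape)  # 3 (3,)
print(len(moc.estimators_), moc.predict(X[:3]).shape)  # 2 (3, 2)
```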

ImportError: No module named model_selection

I am trying to use the train_test_split function and wrote: from sklearn.model_selection import train_test_split, and this causes ImportError: No module named m
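
`sklearn.model_selection` was introduced in scikit-learn 0.18, so this error usually points to an older installation. A first step is to check the installed version; upgrading scikit-learn is the real fix. The fallback import below is only a stopgap for code that must still run on pre-0.18 releases, where `train_test_split` lived in `sklearn.cross_validation`.

```python
import sklearn

print(sklearn.__version__)  # model_selection exists from 0.18 onward

try:
    from sklearn.model_selection import train_test_split
except ImportError:  # very old releases (< 0.18) only
    from sklearn.cross_validation import train_test_split

X = list(range(10))
y = [0, 1] * 5
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))  # 7 3
```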

python warnings.filterwarnings does not ignore DeprecationWarning from 'import sklearn.ensemble'

I am trying to silence the DeprecationWarning with the following method: import warnings; warnings.filterwarnings(action='ignore'); from sklearn.ensemble import
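
This stdlib-only sketch shows why an `ignore` filter set before an import can stop working. `warnings.filterwarnings` inserts each new filter at the front of the list, so a filter installed later wins. A package that installs its own filter while it is being imported therefore overrides the earlier blanket `ignore`; older scikit-learn releases are commonly reported to have done this. Re-applying the ignore filter after the import restores it for later calls, but it cannot hide warnings emitted during the import itself.

```python
import warnings


def library_call():
    # Stand-in for code inside a package that emits a DeprecationWarning.
    warnings.warn("this API is deprecated", DeprecationWarning)


with warnings.catch_warnings(record=True) as shown:
    warnings.filterwarnings("ignore")  # the user's blanket filter
    # Simulates a package installing its own filter afterwards (e.g. at import time):
    warnings.filterwarnings("always", category=DeprecationWarning)
    library_call()

with warnings.catch_warnings(record=True) as hidden:
    warnings.filterwarnings("always", category=DeprecationWarning)  # package's filter
    warnings.simplefilter("ignore", category=DeprecationWarning)    # re-applied afterwards
    library_call()

print(len(shown), len(hidden))  # 1 0
```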

ROC curve over number of nearest neighbors

I'm struggling to reimplement and reproduce the results of one of the unsupervised anomaly detection methods, which are shown below. Picture credit: this paper
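
The figure and paper are not available here, so this sketch assumes a k-NN distance detector as a stand-in. Each point's anomaly score is its distance to its k-th nearest neighbour, and ROC AUC is computed for several values of k on synthetic data. `sklearn.metrics.roc_curve(y_true, score)` would give the full curve for each k if a plot is needed.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(-6, 6, size=(20, 2))
X = np.vstack([inliers, outliers])
y_true = np.r_[np.zeros(200), np.ones(20)]  # 1 = anomaly

aucs = {}
for k in [1, 5, 10, 20, 50]:
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # With no query argument, kneighbors() excludes each point from its own neighbours.
    dist, _ = nn.kneighbors()
    score = dist[:, -1]  # distance to the k-th neighbour = anomaly score
    aucs[k] = roc_auc_score(y_true, score)

print({k: round(v, 3) for k, v in aucs.items()})
```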

returning cov and std from sklearn gaussian process?

I can return the covariance or the standard deviation from a GP using sklearn, like: y, cov = gp.predict(Xpredict, return_cov=True); y, std = gp.predict(Xpredict,
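
Recent scikit-learn versions refuse to return both at once (`return_std` and `return_cov` are mutually exclusive). The predictive standard deviations are, however, the square roots of the covariance diagonal, so one call with `return_cov=True` yields both. The toy data below is illustrative, and the fixed RBF kernel (`optimizer=None`) is an assumption made to keep the result deterministic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 10, 8).reshape(-1, 1)
y = np.sin(X).ravel()
Xpredict = np.array([[0.5], [3.3], [7.7], [12.0]])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None).fit(X, y)

y_mean, cov = gp.predict(Xpredict, return_cov=True)
_, std = gp.predict(Xpredict, return_std=True)

# std is sqrt(diag(cov)); clip guards against tiny negative round-off.
std_from_cov = np.sqrt(np.clip(np.diag(cov), 0, None))
print(np.allclose(std, std_from_cov, atol=1e-6))  # True
```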

Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)

I saw this tutorial in R with autoplot. They plotted the loadings and loading labels: autoplot(prcomp(df), data = iris, colour = 'Species', loadings =
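
A rough matplotlib equivalent of that biplot on standardized iris data. It scatters the PC scores and draws one labelled arrow per feature. Here the loadings are taken as `components_.T * sqrt(explained_variance_)`, one common convention. ggfortify may scale its arrows differently, so lengths need not match R's output exactly.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)

# Loadings: eigenvectors scaled by the square root of each component's variance.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], c=iris.target, s=10, cmap="viridis")
for (lx, ly), name in zip(loadings, iris.feature_names):
    ax.arrow(0, 0, lx, ly, color="tab:red", head_width=0.05)
    ax.text(lx * 1.15, ly * 1.15, name, color="tab:red", ha="center")
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.1%})")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.1%})")
```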

Return confidence score with custom model for Vertex AI batch predictions

I uploaded a pretrained scikit-learn classification model to Vertex AI and ran a batch prediction on 5 samples. It just returned a list of false predictions wit
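
On the scikit-learn side, a confidence score usually comes from `predict_proba` rather than `predict`. This sketch uses a breast-cancer classifier as a stand-in model. How Vertex AI can be made to call `predict_proba` depends on the serving container or custom prediction routine, which is not shown here. The `response` structure is purely hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

samples = X[:5]
labels = clf.predict(samples)        # hard labels only
proba = clf.predict_proba(samples)   # per-class probabilities, shape (5, 2)
confidence = proba.max(axis=1)       # probability of the predicted class

# Hypothetical response format pairing each label with its confidence.
response = [{"label": int(l), "confidence": float(c)} for l, c in zip(labels, confidence)]
print(response)
```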

Suppress scientific notation in sklearn.metrics.plot_confusion_matrix

I was trying to plot a confusion matrix nicely, so I used the built-in plot_confusion_matrix function from scikit-learn's newer version 0.22. However, one value of
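
In 0.22, cell text defaults to the `'.2g'` format, which renders a count like 12000 as `1.2e+04`. Both `plot_confusion_matrix` (removed in scikit-learn 1.2) and `ConfusionMatrixDisplay.plot` accept a `values_format` format spec, and `"d"` prints plain integers. The counts below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line for interactive use
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay

# Made-up counts with one large value.
cm = np.array([[12000, 3], [7, 50]])
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["neg", "pos"])
disp.plot(values_format="d")  # "d" = integer format, no scientific notation

print([[t.get_text() for t in row] for row in disp.text_])
# [['12000', '3'], ['7', '50']]
```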

sklearn: calculating accuracy score of k-means on the test data set

I am doing k-means clustering on a set of 30 samples with 2 clusters (I already know there are two classes). I divide my data into training and test sets and t
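
K-means cluster ids are arbitrary: cluster 0 may correspond to class 1. Plain `accuracy_score` on raw cluster ids can therefore look terrible even for a perfect clustering. This sketch runs on synthetic blobs. It maps each cluster to the majority true class among its training points before scoring the test set. It also shows `adjusted_rand_score`, which is invariant to label permutations and needs no mapping.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=30, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Map each cluster id to the most common true label among its training points.
mapping = {c: np.bincount(y_train[km.labels_ == c]).argmax() for c in range(2)}
y_pred = np.array([mapping[c] for c in km.predict(X_test)])

acc = accuracy_score(y_test, y_pred)
ari = adjusted_rand_score(y_test, km.predict(X_test))
print(acc, ari)
```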

Import error _euclidean_distances from sklearn.metrics.pairwise

I am working with Orange 3.30.1 trying to use the Python Script widget to add SMOTE to my data classification problem (the Orange team has refrained from implem