Category "scikit-learn"

SHAP: XGBoost and LightGBM difference in shap_values calculation

I have this code in Visual Studio Code: import pandas as pd import numpy as np import shap import matplotlib.pyplot as plt import xgboost as xgb from sklearn.m

Difference between Shuffle and Random_State in train test split?

I tried both on a small dataset sample and it returned the same output. So the question is, what is the difference between the "shuffle" and the "random_state"

AttributeError: 'RandomOverSampler' object has no attribute 'fit_sample'

I am trying to use RandomOverSampler from imblearn but I'm getting an error. Looking at other posts, there seems to be a problem with older versions, but I checked

ValueError: Unable to coerce to Series, length must be 1: given n

I have been trying to use RF regression from scikit-learn, but I’m getting an error with my standard (from docs and tutorials) model. Here is the code: im

Get prediction confidence through Decision Tree Regression in sklearn

Is there a way I can attach some sort of confidence with my predictions from Decision Tree Regression output in python? from sklearn.tree import DecisionTreeR

Difference between cosine similarity and cosine distance

It looks like scipy.spatial.distance.cdist cosine similarity distance: link to cos distance 1 1 - u*v/(||u||||v||) is different from sklearn.metrics.pairwis

Getting a ValueError: how to use string data type in model.fit in Jupyter using DecisionTreeClassifier?

This is the code: import pandas as pd from sklearn.tree import DecisionTreeClassifier dataset = pd.read_csv("emotion.csv") X = dataset.drop(columns = ["mood"]) y

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Importing from pyxdameraulevenshtein gives the following error. I have pyxdameraulevenshtein==1.5.3, pandas==1.1.4 and scikit-learn==0.20.2. NumPy is 1.16.1.

Sklearn - Permutation Importance leads to non-zero values for zero-coefficients in model

I'm confused by sklearn's permutation_importance function. I have fitted a pipeline with a regularized logistic regression, leading to several feature coefficie

How to plot the principal vectors of each variable after performing PCA?

My question mainly comes from this post: https://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance In the article, the author plotted the v

FeatureUnion vs ColumnTransformer?

What is the difference between FeatureUnion() and ColumnTransformer() in sklearn? Which should I use if I want to build a supervised model with features cont

How can I use a ML model trained with Google Vertex AI with scikit learn?

I have a problem with Vertex AI. I have trained a model using the API for Vertex AI in Python. After the training, I want to retrieve the model and use it as a

Having issues importing the imblearn Python package in a Jupyter notebook on Anaconda

I wanted to install imbalanced-learn using pip install imbalanced-learn. Then I tried the import: from imblearn.ensemble import EasyEnsembleClassifier This imp

How to increase the number of iterations to optimize my cost function at each step using partial_fit with scikit-learn's SGDClassifier?

When using partial_fit with scikit-learn's SGDClassifier, the number of iterations for the convergence of the cost function equals 1, as stated in the description: Perfor

Changing label names of k-means clusters

I am doing k-means clustering through sklearn in Python. I am wondering how to change the generated label names for the k-means clusters. For example: data

Difference between GroupShuffleSplit and GroupKFold

As the title says, I want to know the difference between sklearn's GroupKFold and GroupShuffleSplit. Both make train-test splits given for data that has a group

Not possible to load skmisc.loess in Python

I am using the package plotnine to make ggplots. In this context I wanted to use "loess". The package gives an error and says: "For loess smoothing, install 's

ImportError: No module named grid_search, learning_curve

Problem with scikit-learn: I can't use learning_curve of sklearn and sklearn.grid_search. When I do import sklearn (it works) from sklearn.cluster import biclus

Installing SciPy and scikit-learn on Apple M1

The installation on the M1 chip for the following packages: NumPy 1.21.1, pandas 1.3.0, torch 1.9.0 and a few other ones works fine for me. They also seem to wo

How to find cut-off height in agglomerative clustering with a predefined number of clusters in sklearn?

I'm deploying sklearn's hierarchical clustering algorithm with the following code: AgglomerativeClustering(compute_distances = True, n_clusters = 15, linkage =