Category "scikit-learn"

Model works perfectly but GridSearch causes error

While working on a project I came across a weird error: fitting my model works perfectly, but when I apply grid search it gives me an error. The code p
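The excerpt is cut off before the error, so the cause is unknown. One frequent reason a model fits on its own but fails under `GridSearchCV` is that the estimator is a `Pipeline` and the grid keys lack the `<step>__<param>` prefix. A minimal sketch, assuming a hypothetical scaler + `SVC` pipeline on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # synthetic data

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Grid keys must name the step: "svc__C", not "C".
# A bare "C" would make the search raise a ValueError about an invalid parameter.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
```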

Is there a way to use mutual information as part of a pipeline in scikit-learn?

I'm creating a model with scikit-learn. The pipeline that seems to work best is mutual_info_classif with a threshold, i.e. only include fields whose mut
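scikit-learn's built-in selectors (`SelectKBest`, `SelectPercentile`) accept `mutual_info_classif` as `score_func`, but they select by count or percentile rather than by a score threshold. A minimal sketch of a threshold-based step, written as a hypothetical custom transformer `MutualInfoThreshold` so that it fits inside a `Pipeline` (synthetic data, arbitrary threshold):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class MutualInfoThreshold(TransformerMixin, BaseEstimator):
    """Hypothetical selector: keep features whose MI with y exceeds `threshold`."""

    def __init__(self, threshold=0.01, random_state=None):
        self.threshold = threshold
        self.random_state = random_state

    def fit(self, X, y):
        # Estimate mutual information on the data passed to fit (the training fold)
        self.scores_ = mutual_info_classif(X, y, random_state=self.random_state)
        self.mask_ = self.scores_ > self.threshold
        if not self.mask_.any():
            # Keep at least the best feature so later steps still get input
            self.mask_[np.argmax(self.scores_)] = True
        return self

    def transform(self, X):
        # Drop the columns that fell below the threshold
        return np.asarray(X)[:, self.mask_]


X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)
pipe = Pipeline([
    ("mi", MutualInfoThreshold(threshold=0.05, random_state=0)),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)
```

Because `threshold` is a constructor parameter, it can be tuned like any other, e.g. `{"mi__threshold": [0.01, 0.05, 0.1]}` in `GridSearchCV`.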

Compute class weight function issue in 'sklearn' library when used in 'Keras' classification (Python 3.8, only in VS Code)

The classifier script I wrote works fine, and I recently added weight balancing to the fitting. Since I added the weight estimation function using the 'sklearn' lib
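The excerpt stops before the error message, so this is only a guess at the cause: in recent scikit-learn releases, the `classes` and `y` arguments of `compute_class_weight` are keyword-only, so positional calls copied from older tutorials raise a `TypeError`. A sketch with hypothetical integer labels that also converts the result into the `{class_index: weight}` dict that Keras' `fit(class_weight=...)` expects:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical, imbalanced integer labels (4 x class 0, 2 x class 1, 6 x class 2)
y_train = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2])
classes = np.unique(y_train)

# Pass classes and y by keyword; "balanced" gives n_samples / (n_classes * count)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Keras expects a dict mapping class index -> weight
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
# model.fit(X_train, y_train, class_weight=class_weight)  # 'model' is your Keras model
```

With these labels the weights come out as 1.0, 2.0 and about 0.667, so the rarest class counts most.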

How to write a custom wrapper for a prediction function in xgboost or other estimators

So I want to manipulate the result of my prediction, and I need to do it within the estimator. I tried to write a wrapper like this, but my kernel just dies when
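The original wrapper is not shown, so this is a sketch of one common pattern rather than a fix: subclass scikit-learn's `RegressorMixin` and `BaseEstimator`, fit a clone of the inner model, and post-process its output in `predict`. The class name `ClippedRegressor` and the clipping rule are hypothetical; `LinearRegression` stands in for the inner model so the example runs without xgboost, and an `xgboost.XGBRegressor` should drop in the same way since it follows the scikit-learn estimator API.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression


class ClippedRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: clip the inner model's predictions to [lower, upper]."""

    def __init__(self, estimator=None, lower=0.0, upper=None):
        self.estimator = estimator
        self.lower = lower
        self.upper = upper

    def fit(self, X, y, **fit_params):
        base = self.estimator if self.estimator is not None else LinearRegression()
        # Fit a fresh copy so the user's estimator object is left untouched
        self.estimator_ = clone(base).fit(X, y, **fit_params)
        return self

    def predict(self, X):
        raw = self.estimator_.predict(X)
        # Post-process the raw predictions inside the estimator
        return np.clip(raw, self.lower, self.upper)


X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel() - 5.0  # synthetic target, negative for the first rows
preds = ClippedRegressor().fit(X, y).predict(X)
```

Since the wrapper exposes its constructor arguments through `get_params`, nested parameters such as `estimator__max_depth` can be searched when an XGBoost model is passed in.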

Yellowbrick: is it possible to pass in different pairwise distance metrics for scoring methods

sklearn defines a large number of pairwise distance metrics for something like silhouette score: https://scikit-learn.org/stable/modules/generated/sklearn.metri
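At the scikit-learn level, `silhouette_score` takes a `metric` argument that accepts the metric names allowed by `sklearn.metrics.pairwise_distances`; whether a given Yellowbrick visualizer forwards such an argument is not covered here. A minimal sketch comparing a few metrics on synthetic blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # synthetic data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Same cluster labels, scored under different distance metrics
scores = {
    metric: silhouette_score(X, labels, metric=metric)
    for metric in ("euclidean", "manhattan", "cosine")
}
```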

scikit-learn neural net beginner - results not what I expect

I have a simple example for which I am attempting to perform a classification using the MLPClassifier. from sklearn.neural_network import MLPClassifier # What
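The example itself is cut off, so this only shows a baseline worth comparing against: `MLPClassifier` is sensitive to feature scale, and the scikit-learn user guide recommends scaling inputs, which a pipeline does without fitting the scaler on test data. The dataset and layer size below are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=8, random_state=0)  # synthetic
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaler is fitted on the training split only, then applied to the test split
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)  # mean accuracy on held-out data
```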

How to use train_x and train_y from the sklearn k-fold split generator

I am using the sklearn k-fold generator to split some data 10 times. When I run the code below, I expect train_x, train_y, test_x, test_y to contain all 10 splits h
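`KFold.split` does not return every fold's data at once; it yields one pair of index arrays per fold, so the train/test arrays have to be built inside the loop (or collected into a list). A minimal sketch with hypothetical toy arrays and 10 splits:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(40).reshape(20, 2)  # hypothetical data: 20 samples, 2 features
y = np.arange(20)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
folds = []
for train_idx, test_idx in kf.split(X):
    # split() yields index arrays; use them to slice X and y for this fold
    train_x, test_x = X[train_idx], X[test_idx]
    train_y, test_y = y[train_idx], y[test_idx]
    folds.append((train_x, train_y, test_x, test_y))
```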

Elbow Method for K-Means in Python

I'm using the K-Means algorithm (in sklearn) to cluster a 1-D array of values, and I want to determine the optimal number of clusters (K) in my script. I'm familiar with
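A sketch of the elbow method on 1-D data, under two assumptions: the values are reshaped to a single column because `KMeans` expects 2-D input, and the elbow is picked automatically with a simple knee heuristic (the point farthest from the straight line joining the first and last points of the inertia curve). That heuristic is one option among several and is not part of scikit-learn; the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 1-D values drawn around three centres
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(c, 0.5, 50) for c in (0, 10, 20)])
X = values.reshape(-1, 1)  # KMeans expects a 2-D array

ks = np.arange(1, 9)
inertias = np.array(
    [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
)

# Normalise both axes to [0, 1]; the curve then runs from (0, 1) to (1, 0)
x = (ks - ks[0]) / (ks[-1] - ks[0])
yv = (inertias - inertias[-1]) / (inertias[0] - inertias[-1])
# Distance to the line x + y = 1 is proportional to |x + y - 1|
best_k = int(ks[np.argmax(np.abs(x + yv - 1))])
```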

Looping through each row in array to calculate cosine similarity

I have a subset of a dataframe that looks like:

PageNumber  english_only_tags
175         flower architecture people
162         hair red bobbles
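Rather than looping over rows, the tag strings can be vectorized once and `cosine_similarity` applied to the whole matrix, which gives every pairwise score at once. A sketch using only the two rows visible in the excerpt, treating each space-separated tag as a token (an assumption about how the tags should be compared):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame({
    "PageNumber": [175, 162],
    "english_only_tags": ["flower architecture people", "hair red bobbles"],
})

# One row per page, one column per distinct tag
tag_matrix = CountVectorizer().fit_transform(df["english_only_tags"])

# All pairwise cosine similarities at once, labelled by page number
sim = pd.DataFrame(
    cosine_similarity(tag_matrix),
    index=df["PageNumber"],
    columns=df["PageNumber"],
)
```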

Polynomial Expansion without sklearn

I want to try to recreate this function from scratch (without using sklearn): # The matrix is M which is 1000x10 matrix. from sklearn.preprocessing import Po
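A from-scratch sketch in NumPy, assuming the goal is to reproduce `PolynomialFeatures` with its default `include_bias=True`: for each degree from 1 upward, take every combination-with-replacement of column indices and multiply those columns, which matches scikit-learn's column order. `poly_expand` is a hypothetical name and `M` is random data of the stated 1000x10 shape.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_expand(M, degree=2, include_bias=True):
    """Hypothetical re-implementation of a polynomial feature expansion."""
    n_samples, n_features = M.shape
    columns = []
    if include_bias:
        columns.append(np.ones(n_samples))  # the constant term
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(n_features), d):
            # e.g. combo (0, 3) -> x0 * x3, combo (2, 2) -> x2 ** 2
            columns.append(np.prod(M[:, combo], axis=1))
    return np.column_stack(columns)


M = np.random.default_rng(0).random((1000, 10))
P = poly_expand(M, degree=2)  # 1 + 10 + 55 = 66 columns
```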

Pass information between pipeline steps in sklearn

I am working on a simple text generation problem with LSTMs. To make the preprocessing more compact and reproducible, I decided to implement everything in sklea
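Between steps, a scikit-learn `Pipeline` passes only the transformed `X` (and `y` to each step's fit), so the standard API has no channel for arbitrary side information produced by one step. One workaround, sketched here with a hypothetical two-sentence corpus, is to fit the preprocessing pipeline first and then read a step's fitted attributes through `named_steps`, for example to size an LSTM's input layer outside the pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    # Convert the sparse count matrix to a dense NumPy array
    return X.toarray()


corpus = ["the cat sat", "the dog sat down"]  # hypothetical text
preprocess = Pipeline([
    ("vec", CountVectorizer()),
    ("dense", FunctionTransformer(to_dense)),
])
X = preprocess.fit_transform(corpus)

# After fitting, read state learned by an earlier step
vocab_size = len(preprocess.named_steps["vec"].vocabulary_)
```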

Cosine similarity and SVC using scikit-learn

I am trying to apply the cosine similarity kernel to text classification with an SVM on a raw dataset of 1000 words: # Libraries import numpy as np from sklear
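`SVC` accepts a callable as `kernel`, which must return the Gram matrix between two sets of samples, and `cosine_similarity` has exactly that shape contract. A minimal sketch on a hypothetical four-document corpus (the question's 1000-word dataset is not shown):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import SVC

# Hypothetical toy corpus with binary labels
texts = [
    "good movie great plot",
    "great acting good film",
    "terrible movie bad plot",
    "bad acting awful film",
]
labels = np.array([1, 1, 0, 0])

vec = TfidfVectorizer()
X = vec.fit_transform(texts).toarray()  # dense to keep the example simple

# A callable kernel receives two sample matrices and returns their Gram matrix
clf = SVC(kernel=cosine_similarity)
clf.fit(X, labels)
preds = clf.predict(vec.transform(["good plot great film"]).toarray())
```

Because `TfidfVectorizer` L2-normalizes each row by default, the plain linear kernel on its output already equals cosine similarity, so `SVC(kernel="linear")` uses the same Gram matrix.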

Is this a valid approach to scale your target in machine learning without leaking information? [closed]

Consider a housing price dataset where the goal is to predict the sale price. I would like to do this by predicting the "Sale price per Squar
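`TransformedTargetRegressor` cannot express this directly, since its `func` sees only `y`. One alternative is a hypothetical wrapper that divides the target by an area column in `fit` and multiplies the prediction back in `predict`. The division uses only each row's own area, not a statistic fitted across rows, so this step by itself carries no information from test rows into training; whether the full approach is leak-free depends on details cut off in the excerpt. The area column index and the data are assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression


class PerAreaTargetRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: learn price per unit area, predict total price."""

    def __init__(self, estimator=None, area_col=0):
        self.estimator = estimator
        self.area_col = area_col

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        area = X[:, self.area_col]
        base = self.estimator if self.estimator is not None else LinearRegression()
        # Train on the per-area target; each row uses only its own area
        self.estimator_ = clone(base).fit(X, np.asarray(y, dtype=float) / area)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Scale the per-area prediction back to a total price
        return self.estimator_.predict(X) * X[:, self.area_col]


rng = np.random.default_rng(0)
area = rng.uniform(50, 150, size=40)  # synthetic areas
other = rng.normal(size=40)           # synthetic extra feature
X = np.column_stack([area, other])
y = 100.0 * area                      # synthetic price: 100 per unit area
pred = PerAreaTargetRegressor().fit(X, y).predict(X)
```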

TypeError: 'module' object is not iterable in Django 4

I am getting the above error, and it has persisted so long that at this point I really need help. I am u

XGBoost model quantization - Sklearn model quantization

I am looking for solutions to quantize sklearn models, specifically XGBoost models. I did find solutions to quantize PyTorch and TensorFlow mod

How to slice a XGBClassifier/XGBRegressor model into sub-models?

This document shows that an XGBoost API-trained model can be sliced with the following code: from sklearn.datasets import make_classification import xgboost as xgb bo

No module named 'sklearn.ensemble.forest'

I am using this code to detect face_spoofing import numpy as np import cv2 import joblib from face_detector import get_face_detector, find_faces def calc_hist(

Replace entire pandas dataframe after scaling without warning

I have tried this according to this answer: x = df[feature_collums] y = df[[label_column]][label_column] from sklearn.preprocessing import MinMaxScaler scaler =
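The linked answer is not reproduced, so this is a sketch under assumptions: the warning is possibly pandas' `SettingWithCopyWarning` from writing into `x`, which is a slice of `df`. Building a new DataFrame from the scaler's output, reusing the original column names and index, avoids modifying a slice. The column names and values are hypothetical; in scikit-learn 1.2 and later, `scaler.set_output(transform="pandas")` is another option.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the question's df
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0], "label": [0, 1, 0]})
feature_columns = ["a", "b"]
label_column = "label"

x = df[feature_columns]
y = df[label_column]  # a Series; same result as df[[label_column]][label_column]

scaler = MinMaxScaler()
# New frame instead of writing into the slice; keep column names and row index
x_scaled = pd.DataFrame(scaler.fit_transform(x), columns=feature_columns, index=x.index)
```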

How to set AUC as the scoring method while searching for hyperparameters?

I want to perform a random search on a classification problem, where the scoring method is AUC instead of accuracy. Have a look at my code f
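The code itself is cut off, but in scikit-learn the search's `scoring` parameter takes the string `"roc_auc"` for binary problems (`"roc_auc_ovr"` or `"roc_auc_ovo"` for multiclass). A minimal sketch with `RandomizedSearchCV` over a hypothetical `LogisticRegression` on synthetic data:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)  # synthetic binary data

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=10,
    scoring="roc_auc",  # rank candidates by ROC AUC instead of accuracy
    cv=5,
    random_state=0,
)
search.fit(X, y)
```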