Category "scikit-learn"

Classifier trained with different number of folds in GridSearchCV gives the same decision_function?

As stated in the title, I’m confused by the k-folding approach in GridSearchCV which allows you to specify its cv attribute as the number of folds. Howeve

How to create synthetic data based on dataset with mixed data types for classification problem?

I am trying to build a classification model, but I don't have enough data. What would be the most appropriate way to create synthetic data based on my existing

LabelEncoding a permutation of combination of columns

I'd like to create class labels for a permutation of two columns using sklearn's LabelEncoder(). How do I achieve the following behavior? import pandas as pd im

Error: "ValueError: could not convert string to float: 'Private Sector/Self Employed' "

Output: "ValueError: could not convert string to float: 'Private Sector/Self Employed' ". I need help with this error, as I get it consistently. import nu

RandomForestClassifier with large feature datatypes

Is it possible to mix small datatypes (such as bits) and long datatypes (such as 256-bit hashes) when using a machine learning model in scikit-learn such as the

Sklearn error: None of [Int64Index([2, 3], dtype='int64')] are in the [columns]

Could someone explain why this code: from sklearn.model_selection import train_test_split import pandas as pd from sklearn.model_selection import StratifiedKFol

How do I port my machine learning model from Python to a Java web app?

So I've been developing some machine learning models using sklearn and TensorFlow in Python, and I want to integrate them into a Java web app. So far I've been s

Conversion between binary vector and 128 bit number

Is there a way to convert back and forth between a binary vector and a 128-bit number? I have the following binary vector: import numpy as np bits = np.array([

Generate binary outcome dummy data based on probability of items and its feature

I want to generate synthetic data from scratch, namely binary outcome sequence data (0/1). My data has the following property. For the sake of an example, let's

Yellowbrick: PredictionError dimensionality issue

I'm trying to use the yellowbrick PredictionError and am running into strange dimensionality issues. I am using yellowbrick version 1.4. Suppose we had this ver

How to interpret MSE in Keras Regressor

I am trying to build a model to predict house prices. I have some features X (no. of bathrooms, etc.) and a target Y (ranging from around $300,000 to $800,000). I have

Calculate cosine similarity and output without duplicates?

I have the following vectors in my toy example: data = pd.DataFrame({ 'id': [1, 2, 3, 4, 5], 'a': [55, 2123, -19.3, 9, -8],

Mfcc classification: in sklearn how could I solve the dimension error of y

prediction_class = labelencoder.inverse_transform(predicted_label) prediction_class ValueError: y should be a 1d array, got an array of shape (1, 10) instead. p

In Anaconda, couldn't download an older version of the scikit-learn package using "pip install scikit-learn==0.21.3"

I'm getting this error: Collecting scikit-learn==0.21.3 Using cached scikit-learn-0.21.3.tar.gz (12.2 MB) Requirement already satisfied: numpy>=1.11.0 in c:\u

How sklearn.metrics.r2_score works

I tried to implement the formula from Wikipedia, but the results are different. Why is that? y_true = np.array([1, 1, 0]) y_pred = np.array([1, 0, 1]) r2 = r2_score(y_
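The arrays quoted in this excerpt are small enough to check by hand. The following is a minimal sketch, assuming the Wikipedia formula in question is the standard definition R² = 1 − SS_res / SS_tot, which is also what `r2_score` computes for a single output. The helper name `manual_r2` is hypothetical and is not part of scikit-learn.

```python
# Manual R^2 (coefficient of determination) for the arrays quoted above.
# Assumption: the formula meant is R^2 = 1 - SS_res / SS_tot.
y_true = [1, 1, 0]
y_pred = [1, 0, 1]


def manual_r2(y_true, y_pred):
    """Hypothetical helper: R^2 computed from its textbook definition."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of y_true around its own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


r2 = manual_r2(y_true, y_pred)
print(r2)  # about -2.0 (negative: worse than always predicting the mean)
```

If a hand-written version disagrees with this, one thing worth checking is whether SS_tot is taken around the mean of `y_true` rather than the mean of `y_pred`.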

How to get TF-IDF value of a word from all set of documents?

I need a TF-IDF value for a word that is found in a number of documents, not only in a single or specific document. For example, consider this corpus c

How do I convert a .csv file to a .data file?

Does anyone know how to convert a .csv file to a .data or know how to use only half of a .data file like a csv file? I'm trying to achieve a mean Average Positi

How can I plot a model, which is trained with a scaled dataset?

I have a major problem with XAI in general (SHAP, LIME, you name it). Here is a basic example for SHAP. My problem is that when I use a real tuned model, which is

Find the best feature value to reach the largest predicted value without iterating with sklearn estimator

I have a complex system with a lot of parameters, and each parameter interacts with the others. I could have some parameter values of this system at one time ("a", "b",