Category "scikit-learn"

Sklearn decision tree plot does not appear

I am trying to follow scikit learn example on decision trees: from sklearn.datasets import load_iris from sklearn import tree X, y = load_iris(return_X_y=True)

How to extract coefficients from fitted pipeline for penalized logistic regression?

I have a set of training data that consists of X, which is a set of n columns of data (features), and Y, which is one column of target variable. I am trying to

How tf-idf model handles unseen words during test-data?

I have read many blogs but was not satisfied with the answers, Suppose I train tf-idf model on few documents example: " John like horror movie." " Ryan w

How to extract only English words from a from big text corpus using nltk?

I am want remove all non dictionary english words from text corpus. I have removed stopwords, tokenized and countvectorized the data. I need extract only the E

How to properly pickle sklearn pipeline when using custom transformer

I am trying to pickle a sklearn machine-learning model, and load it in another project. The model is wrapped in pipeline that does feature encoding, scaling etc

SKLearn: Getting distance of each point from decision boundary?

I am using SKLearn to run SVC on my data. from sklearn import svm svc = svm.SVC(kernel='linear', C=C).fit(X, y) I want to know how I can get the distance of

Why does this decision tree's values at each step not sum to the number of samples?

I'm reading about decision trees and bagging classifiers, and I'm trying to show the first decision tree that is used in the bagging classifier. I'm confused a

How to use manhattan distance for SpectralCluster in sklearn

I am trying to use manhattan distance for SpectralClustering() in Sklearn. I am trying to set the affinity parameter to be manhattan, but getting the following

Reading ARFF from ZIP with zipfile and scipy.io.arff

I want to process quite big ARFF files in scikit-learn. The files are in a zip archive and I do not want to unpack the archive to a folder before processing. He

Create Bayesian Network and learn parameters with Python3.x [closed]

I'm searching for the most appropriate tool for python3.x on Windows to create a Bayesian Network, learn its parameters from data and perform

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples

I'm getting this weird error: classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.

Classification metrics can't handle a mix of binary and continuous targets [duplicate]

I try to train and test several scikit-learn models and attempt to print off the accuracy. Only some of these models work, others fail with th

CatBoost precision imbalanced classes

I use a CatBoostClassifier and my classes are highly imbalanced. I applied a scale_pos_weight parameter to account for that. While training with an evaluation d

Create ngrams only for words on the same line (disregarding line breaks) with Scikit-learn CountVectorizer

When using the scikit-learn library in Python, I can use the CountVectorizer to create ngrams of a desired length (e.g. 2 words) like so: from sklearn.metrics.

Get the coefficients of my sklearn polynomial regression model in Python

I want to get the coefficients of my sklearn polynomial regression model in Python so I can write the equation elsewhere.. i.e. ax1^2 + ax + bx2^2 + bx2 + c I'

How can i impelement SMOTE inside a columnTransformer?

I'm trying to implement SMOTENC inside a column transformer. However I'm getting error. The code and the error is provided below. #Create a mask for categorical

Kmean clustering labels in Python

I have a dataset with 7 labels in the target variable. X = data.drop('target', axis=1) Y = data['target'] Y.unique() array(['Normal_Weight', 'Overweight_Level_

Proper inputs for Scikit Learn roc_auc_score and ROC Plot

I am trying to determine roc_auc_score for a fit model on a validation set. I am seeing some conflicting information on function inputs. Documentation says: "y_

How to apply cross_val_score to cross valid our own model

Usually, we apply cross_val_score to the Sklearn models by doing the following way. scores = cross_val_score(clf, X, y, cv=5, scoring='f1_macro') Now I have my

Get U, Sigma, V* matrix from Truncated SVD in scikit-learn

I am using truncated SVD from scikit-learn package. In the definition of SVD, an original matrix A is approxmated as a product A ≈ UΣV* where