Category "data-science"

ParseError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file. (read_csv)

I cannot get the pandas read_csv method to work properly on Kaggle. The error that I get is: ParseError: Error tokenizing data. C error: Buffer overflow caught - possible ma

How to add a new row after each unique entry in a pandas DataFrame

I have to add a new row at the end of each person's information. In the new row, all the information will be the same as in the last row, like name, last_upd

Extract YouTube Channel Community Feed Data

I am trying to collect the community feed data from a channel for analytics. I couldn't find a way using the YouTube Data API v3. Is there a way to extract such

Creating custom colourmap for geopandas.explore plot

all code: def rgb2hex(r,g,b): return '#{:02x}{:02x}{:02x}'.format(r,g,b) def rg(num): num = int(np.round((num / 100) * 124)) r = (124 - num) g

Plot scikit-learn (sklearn) SVM decision boundary / surface

I am currently performing multi-class SVM with a linear kernel using Python's scikit-learn library. The sample training data and testing data are as given below: Mode

Find closest datapoint to a date in another dataframe

I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Pati

Network Flow Dataframe - Merging Memory Error - Unable to allocate array with shape and data type

I have 3 big CSV files, and they all have the same 76 columns. The numbers of rows differ: 17809 rows, 124262 rows, and 108779 rows. I am trying to merge these 3 d

How to merge multiple datasets with differences in merge-index strings?

Hello, I am struggling to find a solution to what is probably a very common problem. I want to merge two CSV files with soccer data. They basically store different data

Integer Programming for NNC

I'm trying to implement Integer Programming for a Nearest Neighbor Classifier in Python using cvxpy. Short intro: Given a dataset of n points with a color (red or

Upsampling using SMOTE in python

I am trying to use SMOTE in Python to handle a highly imbalanced data set. After splitting the data set into train and test, I generate synthetic samples using SMO

Pandas DataFrame: divide features into groups of high correlation

I have a dataframe with over 280 features. I ran a correlation map to detect groups of features that are highly correlated. Now, I want to divide the features to

How to download a file in PyCharm instead of !wget in Colab? [duplicate]

When I try some code in pandas, the shell command wget is used in Colab as follows: import pandas as pd !wget abc.com/sales.csv If I want

How to count the same rows between multiple CSV files in Pandas?

I merged 3 different CSV (D1, D2, D3) NetFlow datasets, created one big dataset (df), and applied KMeans clustering to this dataset. To merge them I did not use

How to convert the values of an attribute having categorical values to integer type?

I have a dataset in which one of its columns is Ex-Showroom_Price, and I'm trying to convert its values to integers but I'm getting an error. import pandas as p

How to log Hydra's multi-run in MLflow

I am trying to manage the results of machine learning with MLflow and Hydra. So I tried to run it using the multi-run feature of Hydra. I used the following cod

Is there a way to forecast sales for multiple products across multiple stores?

My data is in this format (both multiple and multivariate time series). I need to predict what the number of units sold is going to be for every product across different st

Keras ModelCheckpoint val_loss decreases but says it doesn't

I use a ModelCheckpoint in Keras to save only the best models. Although I see the val_loss decreasing, the ModelCheckpoint says it is not. Any ideas? checkpoint = Mod

Pandas: copy a value from one column to another if a condition is met

I have a dataframe: df = col1 col2 col3 1 2 3 1 4 6 3 7 2 I want to edit df, such that when the value of col1 is smaller than

Multivariate XGBoost time series

I implemented a univariate XGBoost time series using the following code: def series_to_supervised(data, n_in=1, n_out=1, dropnan=True): n_vars = 1 if type(d