Category "data-science"

Seasonality is always 7 when running seasonal_decompose(). Why is that?

I have been running seasonal_decompose() from statsmodels on about 20 totally different datasets. Is it standard that the seasonality is 7 when looking at a

JSON link from Google developer tools not working in Python (or in browser)

I am trying to extract the data in the table at https://www.ecoregistry.io/emit-certifications/ra/10 Using the Google developer tools > Network tab, I am able

Writing characters in the CSV file instead of writing the sentence

I want to save my data in CSV format. I have some sentences and I want to save every sentence in a different row, but the output is like this: This is my c

Efficiently and constantly reorganize searchable data based on access frequency

I want to be able to organize data for efficiency and constantly update the order of that data based on frequency of access, relevancy, and accuracy. For exampl

Add a new column for color code from red to green based on the increasing value of Opportunity in a data frame

I have a data frame and I want to generate a new column of colour codes which starts from red for the lowest value of Opportunity and moves toward green for hi

Tasks sequence in Prefect Flow Python

I'm currently working with the Python framework Prefect (prefect.io). I wrote the code below: from prefect import Flow, task @task def say_hello(): print('H

VIF Vs Mutual Info

I was searching for the best ways for feature selection in a regression problem & came across a post suggesting mutual info for regression, I tried the same

Problems with logistic regression in Titanic dataset

I'm an aspiring data scientist. I stumbled across the Titanic dataset. I tried to use logistic regression for the problem. However, I got stuck. Since I have tw

How to convert mean value of each column variable and fill this mean value to corresponding variable in dataframe? [duplicate]

I have a mining dataset which has the following features: Rock_type, Gold in grams (AU). Rock type has 8 different rock types and Gold (AU) has pr

Plot Python surface with non-square data

I ran a series of simulations and want to create a response surface of the performance based off my two parameters, tol and eta. The issue I'm having is actuall

RAPIDS/NUMBA: Faster way to parallelize a for-loop on small data?

If I have data that easily fits into memory, but I need to iterate over it hundreds or thousands of times, is there a faster way? For instance, if I have 400k d

"Classification metrics can't handle a mix of continuous and binary targets" when trying to set a custom eval_metric using LGBMClassifier

My y going in and both y_train and y_eval are binary int, what am I doing wrong? I noticed the predictions going out are like this [0.,1.,0. ...] which is proba

Cannot import functions from one directory's subfolder into other Python files in the same parent directory's subfolders. How do I use __init__.py files

parent_folder / subfolder1 / subsubfolder1/ a.py b.py subsubfolder2/ c.py d.py e.py subfolder2 / subsubfolder2/ f.py g.py subfolder3 / h.py i.py g.py I want to

Model works perfectly but GridSearch causes error

While working on a project I have come across a weird error, where fitting my model works perfectly but when I apply gridsearch it gives me an error. The code p

How to merge two dfs in pandas (based on datetime period), and add rows if duplicates

I have the following 2 dfs: diag id encounter_key start_of_period end_of_period 1 AAA 2020-06-12 2021-07-07 1 BBB 2021-12-31 2022-01-04 drug id start_datetime

How to estimate similarity between sensor data based on the number of occurrence?

Following is my sample data: data = {850.0: 6, -852.0: 5, 992.0: 29, -993.0: 25, 990.0: 27, -992.0: 28, 965.0: 127, 988.0: 37, -994.0: 24, 996.0: 14, -996.0: 1

I want to add numeric columns to my tfidf sparse matrix

[here] I tried to do it with sp.hstack() and with

Keras Dense Model ValueError: logits and labels must have the same shape ((None, 200, 1) vs (None, 1, 1))

I'm new to machine learning and I'm trying to train a model. I'm using this official Keras example as a guide to set up my dataset and feed it into the model: https

How to calculate values in Pandas Dataframe itself?

You can see my dataframe below. The x values differ, but the other values are the same as the values to their left; for example, column 15 and column 16 have the same value. I

Python can't find database if script is run from other file

I am struggling with a data-heavy project. I can run a file that uses a query file (all the queries and converters are in here) without problems, but whe