Category "pandas"

What does "100 *" mean in "100 * df.isna().mean()"?

Can anyone explain the purpose of 100 * in the following line of code: 100 * df.isna().mean()? Is it intended to get the percentage of the average value?
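A minimal sketch of what that expression computes, assuming a small made-up DataFrame (the column names and values here are illustrative only, not from the question). `df.isna()` yields booleans, `.mean()` averages them per column to give the fraction of missing values, and multiplying by 100 turns that fraction into a percentage:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names "a" and "b" are assumptions.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
})

# isna() marks missing cells as True; the mean of booleans is the
# fraction of True values, i.e. the share of missing entries per column.
missing_fraction = df.isna().mean()

# Scaling by 100 expresses that share as a percentage.
missing_percent = 100 * missing_fraction
print(missing_percent)
```

So the result reads as "percent of values missing in each column" rather than a percentage of the column's average value.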

Is it more efficient to read very large files line by line, or to read each file in one step with a pandas DataFrame?

I have run my script on an instance with 18 GB of RAM, 4 CPUs, and 20 GB of disk in both use cases. My use case is (read line by line): Read line by line and proce

Remove outliers from df based on one column

My df has a price column that looks like 0 2125.000000 1 14469.483703 2 14101.832820 3 20287.619019 4 14469.483703

How to solve the problem with installing Google Colab?

Tried to solve a simple problem from google.colab import files import numpy as np file = files.upload() !ls my_array = np.loadtxt('train_vector.csv', delimi

Calculating a difference for groups within dataframe

I have a dataframe structured like the example, df, below. This contains 2 variables, time and state. Since these are repeated observations for identity, I want

Add a column based on a condition that iterates over a list

So I have the following dataframe: Person_x Person_y Apple_x Banana_x Orange_x Apple_y Banana_y Orange_y Tomas Sidd

How to exclude future dates from an Excel data file using pandas?

I'm trying to limit my dataset to dates before today. Below creates a graph but the mask doesn't have any impact. Any help appreciated. df = pd.read_excel("./da

Why memory usage increases when reopening a Parquet file with pandas?

I generated a Pandas dataframe of 8.481.288 rows and 451 columns, where most of the columns have integer values. When I generate this dataframe, the total memor

Pandas pick the higher value for each unique id

I have a df of customers CUST_ID | SEGMENT | AREA 1 | B | CAD 1 | A | RAM 2 | B | CAD 2 | C | RAM 3 | B

Calculate value based on previous value and multiplication

I am trying to do something which is very simple in Excel, but I can't seem to find the way to do it in Python. I want to calculate the next value in a d

How to get pandas to return the row index on which a CSV read error occurs

I have a CSV: '1\n2\na'. If I read it with something like pd.read_csv(io.StringIO('1\n2\na'), names=['A'], dtype={'A': 'float'}) specifying that the first colum

Setting Data Frame Column Names with Data Frame includes extra characters: ('ColumnName',)

I've got a python script set to pull data and column names from a Pervasive PSQL database, and it then creates the table and records in MS SQL. I'm creating dat

Python API Call: JSON to Pandas DF

I'm working on pulling data from a public API and converting the response JSON file to a Pandas Dataframe. I've written the code to pull the data and gotten a s

How to transform columns with method chaining?

What's the most fluent (or easy to read) method chaining solution for transforming columns in Pandas? (“method chaining” or “fluent” is

Python/Pandas Add string to rows in a column that contain a character a specific number of times

I have a Pandas DataFrame(data) with a ['Duration'] column as 'object' type that has time durations in format: 'H:%M:%S' such as '1:47:54' with 7 characters, bu

Adding new dataframe columns using information extracted from the URL in the url column, where the URL may be missing information

Given: A pandas dataframe that contains a user_url column among other columns. Expectation: New columns added to the original dataframe where the columns are co

How do I remove hours and seconds from my DataFrame column in python? [duplicate]

I have a DataFrame: Age Gender Address Date 15 M 172 ST 2022-02-07 00:00:00 I want to remove hh:mm:ss. I tried: import datetime

Getting `A value is trying to be set on a copy of a slice from a DataFrame.` when setting a column

I know a value should not be set on a view of a pandas dataframe and I'm not doing that but I'm getting this error. I have a function like this: def do_somethin

Pandas groupby feature question for output CSV

I have the following code: df.groupby('AccountNumber')[['TotalStake','TotalPayout']].sum() which displays as I would like it to in pandas. The issue is when I ou

Alternative way to append a dataframe to itself N times and populate new column

Is there an alternative way to append a dataframe to itself N times where N is based on a list length, and the list contents are added as a new column to the da