Category "pandas"

Retrieving values based on other values (dataframe) - how to make my code more efficient?

So after much trying I've managed to get something a bit closer to what I intend to do. The scenario is as follows: a dataframe with many columns of which one conta

How can I plot specific Excel data from two columns with conditions?

I have a huge spreadsheet of data that looks something like this: Date IDNumber Item 2021-05-10 1 Apple 2021-05-10 1 Orange 2021-05-10 2 Apple 2021-05-10 2 Gra

Sum of list values in a df, new column, values are objects

I have a df made of values from a dictionary. I can get rid of [], ',' and split it all in different cols (one col per number). But can't make the transfer to f

Make a mean of several years of dataframes, hour by hour

I have several dataframes of some value taken every hour, over several years, like this: df1 Out[6]: time P G(i) H_sun T2m WS10m Int

How to convert mean value of each column variable and fill this mean value to corresponding variable in dataframe? [duplicate]

I have a mining dataset which has the following features: Rock_type, Gold in grams (AU). Rock type has 8 different rock types and Gold (AU) has pr

Iterating through XMLs, making dataframes from nodes and merging them with a master dataframe. How should I optimize this code?

I'm trying to iterate through a lot of xml files that have ~1000 individual nodes that I want to iterate through to extract specific attributes (each node has 1

Split second level MultiIndex column to create three level column in Pandas

Given a multiindex df X E1_ex0 E1_ex2 E2_ex0 E4_ex0 0 3 4 1 1 1 4 3 2 0 I would like to s

Pandas Merging 101

How can I perform a (INNER| (LEFT|RIGHT|FULL) OUTER) JOIN with pandas? How do I add NaNs for missing rows after a merge? How do I get rid of NaNs after merging?
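
A minimal sketch of the join types this question asks about, assuming two small illustrative frames (`left`, `right`) and a shared column named `key`; none of these names come from the original post. `DataFrame.merge` selects the join type through its `how` parameter, and `dropna` / `fillna` are two common ways to deal with the NaNs an outer join introduces.

```python
import pandas as pd

# Hypothetical example frames; "key", "left_val" and "right_val" are illustrative names.
left = pd.DataFrame({"key": ["A", "B", "C"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["B", "C", "D"], "right_val": [20, 30, 40]})

# INNER JOIN: only keys present in both frames.
inner = left.merge(right, on="key", how="inner")

# LEFT OUTER JOIN: every row of `left`; unmatched rows get NaN in right_val.
left_join = left.merge(right, on="key", how="left")

# RIGHT OUTER JOIN: every row of `right`; unmatched rows get NaN in left_val.
right_join = left.merge(right, on="key", how="right")

# FULL OUTER JOIN: union of keys from both frames.
outer = left.merge(right, on="key", how="outer")

# Two ways to handle the NaNs after merging:
outer_dropped = outer.dropna()   # keep only fully matched rows
outer_filled = outer.fillna(0)   # or replace missing values with a default
```

Whether dropping or filling is appropriate depends on what a missing match means in the data at hand.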

Comparing lists within dataframe column to another list using numpy's where function

This is my first post at Stack Overflow, so thank you for the help. I am trying to replicate some code where I can match a list within a dataframe to another list,

Read Parquet file from S3 in EMR cluster taking a long time

I am trying to read a parquet file (not compressed) into a pandas dataframe on an EMR cluster. I am using EMR 6.4 and parquet version 1.1.5. We are in the proces

How to handle a variable-size JSON file in Python to create a DataFrame using pandas

I am trying to build a DataFrame using pandas but I am not able to handle the case when I have the variable size of JSON chunks I am getting. eg: 1st chunk: {'a

Pytest logging ignores dependency warnings

I have a simple Python script that leads to a pandas SettingWithCopyWarning: import logging import pandas as pd def method(): logging.info("info") l

Add a comma after two words in pandas

I have the following texts in a df column: La Palma La Palma Nueva La Palma, Nueva Concepcion El Estor El Estor Nuevo Nuevo Leon San Jose La Paz Colombia Mexico

Generate binary outcome dummy data based on probability of items and its feature

I want to generate synthetic data from scratch, namely binary outcome sequence data (0/1). My data has the following property: for the sake of an example, let's

Pandas to read an Excel file from S3, apply some operation and write the file to the same location

I am using pandas to read an Excel file from S3, and I will be doing some operation on one of the columns and writing the new version to the same location. Basically ne

How do I calculate the percentage (counted non-numerical values) in Pandas?

Basically, I have the columns date and intensity which I have grouped by date this way: intensity = dataframe_scraped.groupby(["date","intensity"]).count()['sen

Yellowbrick: PredictionError dimensionality issue

I'm trying to use the yellowbrick PredictionError and am running into strange dimensionality issues. I am using yellowbrick version 1.4. Suppose we had this ver

Find last available date if date does not exist in other DataFrame

Suppose that you have two data frames which can be created using code below: df1 = pd.DataFrame(data={'start_date': ['2021-07-02', '2021-07-09',

Unable to identify cause of: ValueError: Must have equal len keys and value when setting with an iterable

Background: I have a script that makes a daily API call for financial data, returns the data as a JSON object, saves it into a pandas df before doing some manipu

Python/Pandas Calculate the mean time (hour) of a Datetime column

I have a Pandas DataFrame (data) with a column ['Date'] in DateTime (date and time) which represents the time of arrival. How to calculate the mean of only the