Category "dataframe"

How to make a loop of random column combinations without repeating the combination in pandas dataframe?

I have a pandas dataframe that has 6 columns (A, B, D, E, F, G). I want to randomly draw combinations of 4 columns (e.g. ABDE, ADEF, AEFG). And then add the c

How to save the Pandas dataframe from pd.crosstab as a figure (with render_mpl_table)?

I'm trying to save output from crosstab as an image. I found a great solution here: "How to save the Pandas dataframe/series data as a figure?". However, I am not

Extracting specific number of rows from dataframe

I have a CSV file with two columns, imagename and ID. There are multiple image names for the same ID, as shown in the picture. The number of image names per ID is

Importing a data frame from CSV file using Pandas with column name having spaces

I am trying to import a data frame from a .csv file which contains Per Capita Income. Moreover, in the above mentioned file the column name is Per Capita Income

How to plot Dataframe for many rows?

I have a dataset where each row plots an ECG, with 50k rows, 181 columns and has 4 classes, represented in the last column (0, 1, 2, 3). So, I need to "convert"

Dataframe new columns to tell if the row contains column's header text

I have a 2-column dataframe, as in the first screenshot. I want to add new columns (based on the contents of the Note column in the original dataframe) to tell if the Note colu

R Dataframe Filter Values

I have a dataframe that looks like the one below: Place Time1 Time2 Time3 Time4 Time5 Time6 Time7 Time8 Time9 ... CA 0.2 0.3 0.1 0.

Code acting differently inside of a function in R

I've got this set of code here in R that separates a dataframe containing tweets by the day they were posted. I'm finding a weird interaction where, if I were to run the

Generate multiple new pandas dataframes using lists and for loops

I have the following dataframe: import pandas as pd import numpy as np from numpy import rec, nan df1=pd.DataFrame.from_records(rec.array([(202001L, 2020L, 'app

DataFrame is highly fragmented

I have the following code, but when I run it I receive the error: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling fra

Python Bytes & Lists & Encryption

I'm using Fernet to encrypt my data with this implementation. Let's assume that I have these three data: data = [fernet.encrypt("Hello".encode()), fernet.encryp

Python: How do I save scholarly.search_pubs() result as a dataframe?

I used the following code to find an article using the scholarly.search_pubs() function: search_query = scholarly.search_pubs('A Bayesian Analysis of the Style

Iterating over a dataframe twice: which is the ideal way?

I am trying to create a dataframe for Sankey chart in Power BI which needs source and destination like this. id Source Destination 1 Starting a next point b 1

Read many parquet files from S3 to pandas dataframe

I've been researching this topic for a few days now and have yet to come up with a working solution. Apologies if this question is repetitive (although I have c

Error in creating dynamic columns from existing column having nested list of lists

I want to create two columns from an existing column which contains a nested list of lists as values. Rows of records consisting of 3 company participants and their

Apply function on pandas.DataFrame by group of values in a column

I have a data frame object in pandas with a column (let's say) "group". There are 20 groups. I want to apply a function (sum) to multiple rows of the same group

How to group by values in a column and find time difference using python?

I have a dataframe as shown below: Col A Time Col B Col C 123 2018-01-06 03:45:23 B 1 141 2018-01-08 12:45:55 C 0 123 2018-01-08 11:45:29 A 0 123 2018-01-08 01

Pandas create rows based on interval between two dates

I am trying to expand a dataframe containing a number of columns by creating rows based on the interval between two date columns. For this I am currently using

Fastest way to fill multiple columns by a given condition on other columns pandas

I'm working with a very long dataframe, so I'm looking for the fastest way to fill several columns at once given certain conditions. So let's say you have this

Adding Pandas column in custom function not working when using numpy

I have the following function: def create_col4(df): df['col4'] = df['col1'] + df['col2'] If I apply this function within my jupyter notebook as in create_c