Category "dataframe"

pandas how to check dtype for all columns in a dataframe?

It seems that dtype only works for a pandas.Series, right? Is there a function to display the data types of all columns at once?

suppress Name dtype from python pandas describe

Let's say I have r = pd.DataFrame({'A': 1, 'B': pd.Series(1, index=list(range(4)), dtype='float32')}) and r['B'].describe()[['mean','std','min','m

Color pandas DataFrame value if larger than 1.5*median(column)

Let's say I have a DataFrame that looks like this: df= pd.DataFrame({'A': [1,-2,0,-1,17], 'B': [11,-23,1,-3,132], 'C': [121,

How to extract .zst files into a pandas dataframe

I'm a bit of a beginner when it comes to Python, but one of my school projects needs me to run classification algorithms on this Reddit popularity data

Python Pandas Group by date using datetime data

I have a column Date_Time that I wish to group by date without creating a new column. Is this possible? The current code I have does not work. df = pd.group

Compare two dataframes Pyspark

I'm trying to compare two data frames that have the same number of columns, i.e. 4 columns, with id as the key column in both data frames: df1 = spark.read.csv("/path/to/

Remove repeating column values in Python Pandas

I have a data set that has dates and subtotals of other columns. I want to remove the recurring dates within each subtotal

H2O Python - How to convert an H2OFrame to a DataFrame with correct characters and datetimes

I have a CSV file and want to use H2O for deep learning. But it contains some Chinese text and datetime values, and when I finish the deep learning I need to save the output to CSV, i

How to remove outliers from a DataFrame using IQR?

I have a DataFrame with a lot of columns (around 100 features). I want to apply the interquartile range method to remove the outliers from the data frame. I a

Find the max value of an array column and find the associated value in another array within the dataframe

I have a CSV file with the data below: Id Subject Marks 1 M,P,C 10,8,6 2 M,P,C 5,7,9 3 M,P,C 6,7,4 I need to find the max value in the Marks column for each Id an

How can I fill in missing CSV file values based on a reference CSV file

I have a reference file like this: Id, Value1, Value2 a, a1, a2 b, b1, b2 c, c1, c2 d, d1, d2 ... n, n1, n2 and the missing file: Id, Value1, Value2 d, ,

Spark pivot groupby performance very slow

I am trying to pivot a dataframe of 6 GB of raw data, and it used to take 30 minutes (aggregation function sum): x_pivot = raw_df.groupBy("a", "b", "c"

How to replace the missing values with average of ffill() and bfill() in pandas?

This is a sample dataframe and it contains NA: x y z datetime 0 2 3 4 02-02-2019 1 NA NA NA 03-02-2019 2 3 5 7 04-0

How can I get branch of a networkx graph as a list from pandas dataframe in Python?

I have a pandas dataframe df which looks as follows: From To 0 Node1 Node2 1 Node1 Node3 2 Node2 Node4 3 Node2 Node5 4 Node3 Node6 5 No

Instantiating objects in a loop and getting one dataframe from them

I have defined a class "Scraper" and the method "scraping" contained in it outputs a list with price information ("results"). My objects are several online shop

Pandas Rolling window to calculate sum of the same items of the last n days

Following up on this question, now I would like to calculate the sum/mean of a different column given the same grouping on a rolling window. Here is the code

How to read data from multiple folders from ADLS into a Databricks dataframe

The file path format is data/year/weeknumber/no of day/data_hour.parquet: data/2022/05/01/00/data_00.parquet data/2022/05/01/01/data_01.parquet data/2022/05/01/02/da

Select two sets of columns by column names in Pandas

Take the DataFrame in the answer of Loc vs. iloc vs. ix vs. at vs. iat? for example. df = pd.DataFrame( {'age':[30, 2, 12, 4, 32, 33, 69], 'color':['blue', 'g

Combination of all pairs of rows using R

Here is my dataset: data <- read.table(header = TRUE, text = " group index group_index x y z a 1 a1 12 13 14 a 2 a2

How to connect across multiple consecutive missing data values using geom_line?

I have a similar problem to Q: Connecting across missing values with geom_line, but found the answers provided only connect the lines when there is one missing
