Category

How can i fill in missing csv file value base on reference csv file

I have a reference file like this Id, Value1, Value2 a, a1, a2 b, b1, b2 c, c1, c2 d, d1, d2 ... n, n1, n2 and the missing file Id, Value1, Value2 d, ,

Spark pivot groupby performance very slow

I am trying to pivot the dataframe of raw data size 6 GB and it used to take 30 minutes time (aggregation function sum): x_pivot = raw_df.groupBy("a", "b", "c"

How to replace the missing values with average of ffill() and bfill() in pandas?

This is a sample dataframe and it containsNA: x y z datetime 0 2 3 4 02-02-2019 1 NA NA NA 03-02-2019 2 3 5 7 04-0

How can I get branch of a networkx graph as a list from pandas dataframe in Python?

I have a pandas dataframe df which looks as follows: From To 0 Node1 Node2 1 Node1 Node3 2 Node2 Node4 3 Node2 Node5 4 Node3 Node6 5 No

Instancing objects with loop and get one dataframe from it

I have defined a class "Scraper" and the method "scraping" contained in it outputs a list with price information ("results"). My objects are several online shop

Pandas Rolling window to calculate sum of the same items of the last n days

Following up with this question, now I would like to calculate the sum/mean of a different column given the same grouping on a rolling window. Here is the code

how to read data from multiple folder from adls to databricks dataframe

file path format is data/year/weeknumber/no of day/data_hour.parquet data/2022/05/01/00/data_00.parquet data/2022/05/01/01/data_01.parquet data/2022/05/01/02/da

Select two sets of columns by column names in Pandas

Take the DataFrame in the answer of Loc vs. iloc vs. ix vs. at vs. iat? for example. df = pd.DataFrame( {'age':[30, 2, 12, 4, 32, 33, 69], 'color':['blue', 'g

Combination of all pairs of rows using R

Here is my dataset: data <- read.table(header = TRUE, text = " group index group_index x y z a 1 a1 12 13 14 a 2 a2

How to connect across multiple consecutive missing data values using geom_line?

I have a similar problem to Q: Connecting across missing values with geom_line, but found the answers provided only connect the lines when there is one missing

Get for each row the last column name with a certain value

I have this kind of dataframe, and I'm looking to get for each row the last column name equals to 1 Here is an example of my dataframe col1 col2

How to select several rows when reading a csv file using pandas?

I have a very large csv file with millions of rows and a list of the row numbers that I need.like rownumberList = [1,2,5,6,8,9,20,22] I know there is somethi

check if timestamp column is in date range from another dataframe

I have a dataframe, df_A with two columns 'amin' and 'amax', which is a set of time range. My objective is to find whether a column in df_B lies between any o

Spark dataframe transform multiple rows to column

I am a novice to spark, and I want to transform below source dataframe (load from JSON file): +--+-----+-----+ |A |count|major| +--+-----+-----+ | a| 1| m

Pandas Dataframe: Replacing NaN with row average

I am trying to learn pandas but I have been puzzled with the following. I want to replace NaNs in a DataFrame with the row average. Hence something like df.fil

DATAFRAME TO BIGQUERY - Error: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1yeitxcu_job_4b7daa39.parquet'

I am uploading a dataframe to a bigquery table. df.to_gbq('Deduplic.DailyReport', project_id=BQ_PROJECT_ID, credentials=credentials, if_exists='append') And I

Category "dataframe"