Category "pandas"

How to use pandas and numpy to compare two Excel workbooks with multiple tabs?

I have two xlsx files that have multiple tabs. I need to compare values in each tab based on the tab name. (e.g. sheet1 in file1 needs to be compared with sheet

Reindex Pandas Series case insensitive (Combining matches)

I have a pandas series with string indices and integer values: (My actual series has >1000 entries) Count apple 1 bear 2 cat 3 Apple 10 pig 20 Cat 30 ApPl
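
One possible approach to the case-insensitive combination asked about above, as a minimal sketch: group the series on its lower-cased index labels and sum the matches. The sample data uses only the entries visible in the excerpt (the truncated ones are left out), and summing is an assumption about how matches should be combined.

```python
import pandas as pd

# Entries visible in the question excerpt; truncated entries are omitted.
s = pd.Series(
    [1, 2, 3, 10, 20, 30],
    index=["apple", "bear", "cat", "Apple", "pig", "Cat"],
    name="Count",
)

# Group on the lower-cased labels so "apple" and "Apple" fall into one bucket,
# then sum the values in each bucket (assumed combining rule).
combined = s.groupby(s.index.str.lower()).sum()

print(combined)
```

If the matches should be combined differently (for example with `max` or `first`), only the final aggregation call would change.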

How to get the week number starting from Monday of a given month in Python?

I was trying to calculate the week number starting from the first Monday of October. Are there any functions in pandas or datetime to do the calculation efficiently?

Iterating over a dataframe twice: which is the ideal way?

I am trying to create a dataframe for Sankey chart in Power BI which needs source and destination like this. id Source Destination 1 Starting a next point b 1

Read many parquet files from S3 to pandas dataframe

I've been researching this topic for a few days now and have yet to come up with a working solution. Apologies if this question is repetitive (although I have c

Error in creating dynamic columns from existing column having nested list of lists

I want to create two columns from an existing column that contains nested lists of lists as values. Each record consists of 3 participating companies and their

Apply function on pandas.DataFrame by group of values in a column

I have a data frame object in pandas with a column (let's say) "group". There are 20 groups. I want to apply a function (sum) to multiple rows of the same group
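
A per-group sum of this kind is commonly written with `groupby`. The sketch below assumes a hypothetical numeric column named `value` next to the `group` column from the question, and uses four made-up rows instead of the 20 groups mentioned.

```python
import pandas as pd

# Hypothetical data: "group" comes from the question, "value" is an assumed name.
df = pd.DataFrame(
    {
        "group": ["a", "b", "a", "b"],
        "value": [1, 2, 3, 4],
    }
)

# Sum "value" over all rows that share the same "group" label.
totals = df.groupby("group")["value"].sum()

print(totals)
```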

How to group by values in a column and find time difference using python?

I have a dataframe as shown below: Col A Time Col B Col C 123 2018-01-06 03:45:23 B 1 141 2018-01-08 12:45:55 C 0 123 2018-01-08 11:45:29 A 0 123 2018-01-08 01

Pandas column-wise rolling works with np.float64 but returns empty array with np.float32 and np.float16

I ran into a strange observation where the same code works with np.float64 but not with np.float32 or np.float16. Here's code to reproduce the results: >>

Pandas create rows based on interval between two dates

I am trying to expand a dataframe containing a number of columns by creating rows based on the interval between two date columns. For this I am currently using

Fastest way to fill multiple columns by a given condition on other columns pandas

I'm working with a very long dataframe, so I'm looking for the fastest way to fill several columns at once given certain conditions. So let's say you have this

Adding Pandas column in custom function not working when using numpy

I have the following function: def create_col4(df): df['col4'] = df['col1'] + df['col2'] If I apply this function within my jupyter notebook as in create_c

Fastest way to count event occurrences in a Pandas dataframe?

I have a Pandas dataframe with ~100,000,000 rows and 3 columns (Names str, Time int, and Values float), which I compiled from ~500 CSV files using glob.glob(pat

Getting the average value of each hour for specific columns in data frame

I have a data frame with the date/time passed as "parse_dates" and then set as the index column for the data frame. Flow Enter Leave

Pandas MultiIndex DataFrame

I have an array: w = np.array([1, 2, 3]) and I need to create a Dataframe with a MultiIndex looking like this: df= 0 1 2 0 0 1 1 1 1 1 1 1 2 1

Convert pandas dataframe hourly values in column names (H1, H2,... ) to a series in a separate column

I am trying to convert a dataframe in which hourly data appears in distinct columns, like here: ... to a dataframe that only contains two columns ['datetime',

Mode values under a categorical column in Python appear in the form of a list

I am using this code to get the mode of a categorical column: df.groupby('user_id')['product'].agg(pd.Series.mode).reset_index().rename(columns = {'product': 'm
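
A likely reason for the list-like output is that `pd.Series.mode` returns every tied value, so a group with a tie yields more than one entry. The sketch below shows one way to force a single value per group by keeping the first mode. The sample rows and the helper name `first_mode` are hypothetical, and keeping the first of several tied modes is an assumption about the desired result.

```python
import pandas as pd

# Hypothetical data: user 1 has a clear mode, user 2 has a tie between "p" and "q".
df = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2],
        "product": ["x", "x", "y", "p", "q"],
    }
)


def first_mode(values: pd.Series) -> str:
    """Return one mode only; Series.mode() sorts ties, so this picks the smallest."""
    return values.mode().iloc[0]


# One scalar per user instead of an array of tied modes.
result = df.groupby("user_id")["product"].agg(first_mode).reset_index()

print(result)
```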

Drop Non English Rows Pandas [duplicate]

df.review: de la nada mi ya no se escucha I tried to set it up It is a good product The aim is to remove non-English rows. I tried this and

How to plot two plotly figures with common animation_frame

I am trying to plot both a scatterplot and a line plot, in the same figure. One is for objects and the other for lane markers. The outcome should be one figure

Pandas Method chaining: reassigning a column using df.assign()

I have a dataframe with stock returns in one column, strategy values in another, and another column called trades with boolean values (True, False). My de