Category "pandas"

Uncomfortable output of mode() in pandas Dataframe

I have a dataframe with several columns (the features).
    >>> print(df)
       col1  col2
    a     1     1
    b     2     2
    c     3     3
    d     3     2
I woul
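A minimal sketch, assuming the goal is one mode per column as a flat Series: `DataFrame.mode()` returns a DataFrame (one row per tied mode, with shorter columns padded with NaN), so taking its first row gives a Series indexed by column name.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({"col1": [1, 2, 3, 3], "col2": [1, 2, 3, 2]}, index=list("abcd"))

# mode() returns a DataFrame; with ties it has several rows (shorter columns padded with NaN)
modes_frame = df.mode()

# Keep only the first row to get one mode per column as a Series
modes = modes_frame.iloc[0]
print(modes)
```

If several values tie for a column, only the first one listed survives this step; inspect `modes_frame` directly when ties matter.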

Saving pandas data frame to .mat file in python3

I have a pandas data frame 'df'. It looks like the sample below, but the original data has many more rows. I would like to save it as a .mat file named 'meta.mat'. I tried:

How to subset Pandas Dataframe using an OR operator whilst avoiding "FutureWarning: elementwise comparison failed;"

I have a Pandas dataframe (tempDF) of 5 columns by N rows. Each element of the dataframe is an object (string in this case). For example, the dataframe looks li
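A minimal sketch of an OR filter on string columns, assuming a hypothetical column `col1` (the real column names are cut off above). Each comparison needs its own parentheses because `|` binds more tightly than `==`; `isin` expresses the same filter without chaining.

```python
import pandas as pd

# Hypothetical stand-in for tempDF: string (object) columns
tempDF = pd.DataFrame({
    "col1": ["a", "b", "c", "a"],
    "col2": ["x", "y", "z", "w"],
})

# Each comparison needs its own parentheses: | binds more tightly than ==
subset_or = tempDF[(tempDF["col1"] == "a") | (tempDF["col1"] == "b")]

# Equivalent, and easier to extend to many values
subset_isin = tempDF[tempDF["col1"].isin(["a", "b"])]
```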

pandas diff() giving 0 value for first difference, I want the actual value instead

I have df:
    Hour  Energy Wh
       1          4
       2          6
       3          9
       4         15
I would like to add a column that shows the per hour differenc
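A minimal sketch using the sample data above: `diff()` leaves the first row empty (NaN), so filling that gap with the original column puts the actual first reading there instead of a placeholder. The new column name `Per hour` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]})

# diff() leaves NaN in the first row; fill that gap with the original first value
df["Per hour"] = df["Energy Wh"].diff().fillna(df["Energy Wh"])
print(df)
```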

sorting rows in a pandas dataframe in a way which is not alphabetical

I have some dataframes (df) with categorical data starting with: a, b, c and a category for "remaining categories". I would like to sort the month column in t
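One common approach, sketched here with hypothetical month data: convert the column to an ordered `Categorical` whose `categories` list the desired order, then `sort_values` follows that order rather than the alphabet. The same idea places a label such as "remaining categories" wherever it appears in the list, for example last.

```python
import pandas as pd

# Hypothetical data; the desired order is chronological, not alphabetical
df = pd.DataFrame({"month": ["Mar", "Jan", "Apr", "Feb"], "value": [3, 1, 4, 2]})
order = ["Jan", "Feb", "Mar", "Apr"]

# An ordered Categorical sorts by the position of each label in `order`
df["month"] = pd.Categorical(df["month"], categories=order, ordered=True)
sorted_df = df.sort_values("month")
```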

Combining Python variables into SQL queries

I am pulling data from an online database using SQL/PostgreSQL queries and converting it into a pandas DataFrame. I want to be able to change the d
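A minimal sketch of passing Python variables into a query with `pd.read_sql_query(..., params=...)` rather than string formatting. An in-memory SQLite database stands in for the PostgreSQL server, and the table and column names are hypothetical. The placeholder style depends on the driver: `sqlite3` uses `?`, while psycopg2 uses `%s` (or `%(name)s`).

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the remote PostgreSQL server
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (day TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 3.0)],
)

# Python variables that change between runs
start, end = "2020-01-02", "2020-01-03"

# Pass the values through params instead of formatting them into the SQL string
df = pd.read_sql_query(
    "SELECT day, value FROM readings WHERE day BETWEEN ? AND ? ORDER BY day",
    con,
    params=(start, end),
)
```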

Joining on datetime64[ns, UTC] fails using pandas.join

I'm trying to join two pandas.DataFrames on a datetime64[ns, UTC] field and it's failing with a ValueError (described below) that is not intuitive to me. Consid
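One thing worth checking, offered as an assumption since the error text is cut off: `DataFrame.join(other, on="ts")` matches the caller's `ts` column against `other`'s index, not against a column of `other`. A minimal sketch with hypothetical frames shows both working variants on a tz-aware key.

```python
import pandas as pd

left = pd.DataFrame({
    "ts": pd.to_datetime(["2020-01-01", "2020-01-02"], utc=True),
    "a": [1, 2],
})
right = pd.DataFrame({
    "ts": pd.to_datetime(["2020-01-01", "2020-01-02"], utc=True),
    "b": [3, 4],
})

# Option 1: merge matches column to column
merged = left.merge(right, on="ts")

# Option 2: join with on= needs the key to be the index of the right frame
joined = left.join(right.set_index("ts"), on="ts")
```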

Chunking DataFrame by gaps in datetime index

First of all, my apologies if the title was too ambiguous. I have a pd.DataFrame whose index has dtype datetime64. These indices, however, are not equally
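A minimal sketch of splitting on gaps, assuming a hypothetical threshold `max_gap`: mark every row whose step from the previous timestamp exceeds the threshold, take a running count of those marks as a chunk id, and group by it.

```python
import pandas as pd

# Hypothetical irregular index: a gap of almost an hour after the third timestamp
idx = pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 00:01", "2021-01-01 00:02",
    "2021-01-01 01:00", "2021-01-01 01:01",
])
df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=idx)

# Assumed threshold: any step larger than this starts a new chunk
max_gap = pd.Timedelta(minutes=10)

# True wherever the step from the previous timestamp exceeds the threshold
new_chunk = df.index.to_series().diff() > max_gap
# Running count of gaps gives a chunk id per row (0, 0, 0, 1, 1)
chunk_id = new_chunk.cumsum()

chunks = [group for _, group in df.groupby(chunk_id.to_numpy())]
```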

Convert large csv to hdf5

I have a 100M-line CSV file (actually many separate CSV files) totaling 84 GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in te

Convert list into a pandas data frame

I am trying to convert my output into a pandas data frame and I am struggling. I have this list:
    my_list = [1,2,3,4,5,6,7,8,9]
I want to create a pandas data
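The intended shape is cut off above, so here is a minimal sketch of the two most likely readings: one column with a row per element, or a 3x3 grid if the list holds row-major table data. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# One column, one row per list element ("value" is a hypothetical column name)
df_column = pd.DataFrame(my_list, columns=["value"])

# Or a 3x3 grid, if the list holds row-major table data
df_grid = pd.DataFrame(np.array(my_list).reshape(3, 3), columns=["a", "b", "c"])
```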

Python Pandas add Filename Column CSV

My Python code works correctly in the example below. It combines a directory of CSV files and matches the headers. However, I want to take it a step furthe
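A minimal sketch, assuming the next step is tagging each row with the file it came from: add the column right after `read_csv`, then concatenate. The helper `combine_csvs` and the demo file names are hypothetical; `concat` aligns columns by header name.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csvs(directory):
    """Concatenate every CSV in `directory`, tagging rows with their source file name."""
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        df = pd.read_csv(path)
        df["filename"] = os.path.basename(path)  # new column holding the source file
        frames.append(df)
    # ignore_index gives a clean 0..n-1 index; columns are aligned by header name
    return pd.concat(frames, ignore_index=True)


# Demo with two small, made-up files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_csv(os.path.join(tmp, "first.csv"), index=False)
    pd.DataFrame({"b": [5], "a": [6]}).to_csv(os.path.join(tmp, "second.csv"), index=False)
    combined = combine_csvs(tmp)
```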

pandas: merge (join) two data frames on multiple columns

I am trying to join two pandas data frames using two columns:
    new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')
but got
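In the call above, `left_on` and `right_on` are single strings that happen to look like lists. A minimal sketch with hypothetical frames, reusing the question's column names, passes real Python lists instead, so each position in `left_on` pairs with the same position in `right_on`.

```python
import pandas as pd

# Hypothetical frames using the column names from the question
A_df = pd.DataFrame({"A_c1": [1, 1, 2], "c2": ["x", "y", "x"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2], "c2": ["x", "x"], "b_val": [100, 300]})

# left_on/right_on take lists of column names, not a single string like '[A_c1,c2]'
new_df = pd.merge(A_df, B_df, how="left", left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
```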

What is index_col=0? and pd.index.name = None

    import pandas as pd
    f500 = pd.read_csv('f500.csv',index_col=0)
    f500.index.name = None
I don't know what this means. What's the role of 'index_c
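A minimal sketch with a tiny made-up CSV in place of `f500.csv`: `index_col=0` makes the first column the row index instead of an ordinary data column, and that index inherits the column's header as its name. Setting `index.name = None` only removes that label; the index values stay the same.

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for f500.csv
csv_text = "name,value\nA,1\nB,2\n"

# index_col=0: use the first column ("name") as the row index instead of a data column
f500 = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(f500.index.name)  # the index keeps the header of that column as its name

# Setting the name to None just removes that label, e.g. from printed output
f500.index.name = None
```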

Convert n same-size, d-dimensional numpy arrays to a dataframe with d+n columns

I recently asked this question about converting n 2-dimensional arrays to a dataframe with 2+n columns. The solution I got works perfectly well, but cannot ea
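One way to generalize to any d, sketched under the assumption that all n arrays share one shape: build a `MultiIndex` with one level per dimension, flatten each array into a column, and let `reset_index` turn the d levels into columns. The array contents and the `dim*`/`arr*` names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: n = 2 arrays, each with the same d = 2 dimensional shape (2, 3)
arrays = [np.arange(6).reshape(2, 3), 10 * np.arange(6).reshape(2, 3)]
shape = arrays[0].shape

# One index level per dimension; from_product enumerates positions in C (row-major) order,
# which matches the order of ndarray.ravel()
index = pd.MultiIndex.from_product(
    [range(size) for size in shape],
    names=[f"dim{i}" for i in range(len(shape))],
)

# n value columns, then reset_index turns the d index levels into columns: d + n in total
df = pd.DataFrame({f"arr{k}": a.ravel() for k, a in enumerate(arrays)}, index=index).reset_index()
```

Nothing in the construction depends on d, so the same code handles 3-D or higher arrays unchanged.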

AttributeError: 'Series' object has no attribute 'reshape'

I'm using the scikit-learn linear regression algorithm. While scaling the target feature Y with:
    Ys = scaler.fit_transform(Y)
I got ValueError: Expected 2D arr
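A minimal sketch, assuming `Y` is a pandas Series: current pandas Series objects have no `reshape` method, so reshape the underlying NumPy array, or convert the Series to a one-column DataFrame. Either gives the 2-D `(n_samples, 1)` input that scalers expect. The data and the name `target` are hypothetical, and the scaler call is left as a comment.

```python
import pandas as pd

# Hypothetical target column
Y = pd.Series([1.0, 2.0, 3.0], name="target")

# A Series has no .reshape(); go through the underlying NumPy array instead
Y_2d = Y.to_numpy().reshape(-1, 1)  # shape (n_samples, 1)

# Alternatively, keep it as a one-column DataFrame
Y_frame = Y.to_frame()

# Either can then be passed to a scaler, e.g. scaler.fit_transform(Y_2d)
```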

How do I read a large csv file with pandas?

I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error:
    MemoryError                               Traceback (most recent
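A minimal sketch of chunked reading, assuming the work can be done piece by piece, such as running totals: with `chunksize`, `read_csv` yields DataFrames of at most that many rows instead of loading the whole file at once. A small in-memory CSV with hypothetical columns stands in for the real file. Restricting `usecols` and giving compact `dtype`s can also reduce memory.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for the 6 GB file (hypothetical columns x, y)
csv_text = "x,y\n" + "\n".join(f"{i},{2 * i}" for i in range(10))

total_y = 0
n_rows = 0
# chunksize makes read_csv return an iterator of DataFrames instead of one big frame
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_y += chunk["y"].sum()
    n_rows += len(chunk)
```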

Pandas - find specific value in entire dataframe

I have a dataframe and I want to search all columns for values equal to the text 'Apple'. I know how to do it with one column, but how can I apply this to ALL column
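A minimal sketch with hypothetical data: `DataFrame.eq` compares every cell at once and returns a boolean DataFrame. Reducing that mask with `any(axis=1)` selects matching rows, and scanning it column by column gives exact positions. For substring rather than exact matches, `Series.str.contains` would replace `eq` on each column.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "a": ["Apple", "Pear", "Kiwi"],
    "b": ["Plum", "Apple", "Fig"],
    "c": ["x", "y", "z"],
})

# Element-wise equality over the whole frame gives a boolean DataFrame
mask = df.eq("Apple")

# Rows where any column matches
rows_with_apple = df[mask.any(axis=1)]

# Exact (row, column) positions of every match
positions = [(row, col) for col in df.columns for row in df.index[mask[col].to_numpy()]]
```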

Confidence Interval in Python dataframe

I am trying to calculate the mean and 95% confidence interval of a column "Force" in a large dataset. I need the result by using the groupby function by groupi
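A minimal sketch, assuming a normal approximation is acceptable: compute per-group mean, sample standard deviation (pandas uses `ddof=1` by default) and count, then use a half-width of 1.96 times the standard error. The grouping column `group` and the data are hypothetical. For small groups, a t quantile such as `scipy.stats.t.ppf(0.975, n - 1)` in place of 1.96 gives a more accurate interval.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "group" stands in for whatever column the grouping uses
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "Force": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

stats = df.groupby("group")["Force"].agg(["mean", "std", "count"])

# Normal approximation: half-width = 1.96 * standard error of the mean
half_width = 1.96 * stats["std"] / np.sqrt(stats["count"])
stats["ci_low"] = stats["mean"] - half_width
stats["ci_high"] = stats["mean"] + half_width
```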

Looping over pandas DataFrame

I have a weird issue: the result doesn't change from one iteration to the next. The code is the following:
    import pandas as pd
    import numpy as np
    X = np.arange(10,100)

Pivot table sorting

I have a pivot table result as below:
                        len
              MERCHANT_NAME
    MCC_CODE
    0.0            58635982
    742.0              7378
    763.0               750
    780.0               281
    1
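A minimal sketch, rebuilding the four visible rows in shuffled order with the same two-level column header: because the columns form a MultiIndex, the sort key is passed as the tuple `("len", "MERCHANT_NAME")`. Sorting by the `MCC_CODE` index is the other option. Which order the question wants is cut off, so both are shown.

```python
import pandas as pd

# The four rows shown in the question, in a shuffled order, with the same
# two-level column header ("len", "MERCHANT_NAME") and index name "MCC_CODE"
pt = pd.DataFrame(
    {("len", "MERCHANT_NAME"): [7378, 58635982, 281, 750]},
    index=pd.Index([742.0, 0.0, 780.0, 763.0], name="MCC_CODE"),
)

# With MultiIndex columns, name the column to sort by as a tuple
by_count = pt.sort_values(by=("len", "MERCHANT_NAME"), ascending=False)

# Or sort by the MCC_CODE index instead
by_code = pt.sort_index()
```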