I have some dataframes (df) with categorical data starting with a, b, c, and a category for "remaining categories". I would like to sort the month column in t
I am pulling data from an online database using SQL/PostgreSQL queries and converting it into a Python dataframe using pandas. I want to be able to change the d
I'm trying to join two pandas.DataFrames on a datetime64[ns, UTC] field and it's failing with a ValueError (described below) that is not intuitive to me. Consid
First of all, my apologies if the title was too ambiguous. I have a pd.DataFrame with a datetime64 index. These index values, however, are not equally
I have a 100M-line CSV file (actually many separate CSV files) totaling 84 GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in te
I am trying to convert my output into a pandas DataFrame and I am struggling. I have this list: my_list = [1,2,3,4,5,6,7,8,9]. I want to create a pandas data
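The intended layout is cut off in the question. Here is a minimal sketch of two common shapes. The column names and the "rows of three" layout are assumptions for illustration only.

```python
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# One column, one row per list element ("value" is a hypothetical column name).
df_single = pd.DataFrame({"value": my_list})

# Assumption: if the goal is rows of three, chunk the list into sublists first.
rows = [my_list[i:i + 3] for i in range(0, len(my_list), 3)]
df_grid = pd.DataFrame(rows, columns=["a", "b", "c"])  # hypothetical column names

print(df_single)
print(df_grid)
```

The DataFrame constructor treats a flat list as a single column and a list of lists as rows. Reshaping the data before construction is usually simpler than reshaping afterwards.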
My Python code works correctly in the example below. It combines a directory of CSV files and matches the headers. However, I want to take it a step furthe
I am trying to join two pandas data frames using two columns: new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]') but got
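In the call above, `left_on` and `right_on` are given strings that merely look like lists (`'[A_c1,c2]'`). pandas expects a list of column names for a multi-column join. Here is a minimal sketch with made-up data, using the column names from the question:

```python
import pandas as pd

# Hypothetical toy frames; only the key column names come from the question.
A_df = pd.DataFrame({"A_c1": [1, 1, 2], "c2": ["x", "y", "x"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2], "c2": ["x", "x"], "b_val": [100, 200]})

# Pass real Python lists: the i-th left key is matched against the i-th right key.
new_df = pd.merge(
    A_df,
    B_df,
    how="left",
    left_on=["A_c1", "c2"],
    right_on=["B_c1", "c2"],
)
print(new_df)
```

With `how="left"`, every row of `A_df` is kept, and rows without a match get NaN in the `B_df` columns. If both frames used identical key names, `on=[...]` would be the shorter spelling.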
Given this code:
import pandas as pd
f500 = pd.read_csv('f500.csv', index_col=0)
f500.index.name = None
I don't know what this means. What's the role of 'index_c
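Here is a small sketch of what those two lines do. It uses an in-memory CSV in place of `f500.csv`, with hypothetical company data.

```python
import io
import pandas as pd

# Stand-in for f500.csv: the first column holds row labels (hypothetical data).
csv_text = "company,rank,revenues\nCompany A,1,100\nCompany B,2,90\n"

# Default: every column becomes a data column, and rows get a 0..n-1 RangeIndex.
default = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column is used as the row labels (the index) instead.
f500 = pd.read_csv(io.StringIO(csv_text), index_col=0)

# The index inherits that column's header ("company") as its name; this clears it,
# which only changes how the frame is displayed, not the data.
f500.index.name = None

print(default)
print(f500)
```

After `index_col=0`, rows can be looked up by label, e.g. `f500.loc["Company A", "rank"]`.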
I recently asked a question about converting n 2-dimensional arrays to a dataframe with 2+n columns. The solution I got works perfectly well, but cannot ea
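One possible reading of "2+n columns" is two columns for each cell's row and column position, plus one value column per array. Under that assumption, a vectorized sketch that works for any n could look like this (the array contents and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: n arrays that all share the same 2-D shape.
arrays = [np.arange(6).reshape(2, 3), np.arange(6).reshape(2, 3) * 10]

n_rows, n_cols = arrays[0].shape
# np.indices gives the row and column position of every cell.
rows, cols = np.indices((n_rows, n_cols))

data = {"row": rows.ravel(), "col": cols.ravel()}
for k, arr in enumerate(arrays):
    # ravel() uses the same C order as the index grids, so positions line up.
    data[f"arr_{k}"] = arr.ravel()  # hypothetical column naming

df = pd.DataFrame(data)
print(df)
```

Because the loop only adds columns, the same code handles any number of equally shaped arrays.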
I'm using scikit-learn's linear regression algorithm. While scaling the target feature Y with Ys = scaler.fit_transform(Y), I got ValueError: Expected 2D arr
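scikit-learn scalers expect a 2-D array of shape (n_samples, n_features), so a 1-D target has to be reshaped into a single column first. Here is a minimal sketch. It assumes the scaler is `StandardScaler` and uses a made-up target, since the question does not show either.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 1-D target; StandardScaler is an assumption about which scaler is used.
Y = np.array([3.0, 5.0, 7.0, 9.0])

scaler = StandardScaler()
# reshape(-1, 1) turns shape (4,) into (4, 1): four samples, one feature.
Ys = scaler.fit_transform(Y.reshape(-1, 1))

# Flatten back to 1-D if downstream code expects a vector.
Ys_1d = Ys.ravel()

# inverse_transform also needs 2-D input, e.g. to map predictions back to original units.
Y_back = scaler.inverse_transform(Ys).ravel()
```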
I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error: MemoryError Traceback (most recent
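A common workaround is to stream the file in chunks, so only part of it is in memory at once, and to narrow the columns and dtypes being loaded. Here is a minimal sketch. It uses a small in-memory CSV as a stand-in for the real file, and the column names and the running sum are hypothetical.

```python
import io
import pandas as pd

# Stand-in for the ~6 GB file; in practice pass the file path instead of a buffer.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 0.5}" for i in range(10))
source = io.StringIO(csv_text)

total = 0.0
row_count = 0
# chunksize makes read_csv yield DataFrames of at most that many rows;
# usecols and smaller dtypes further reduce memory per chunk.
for chunk in pd.read_csv(
    source,
    chunksize=4,
    usecols=["id", "value"],
    dtype={"id": "int32", "value": "float32"},
):
    total += chunk["value"].sum()
    row_count += len(chunk)

print(row_count, total)
```

This only helps if the work can be done chunk by chunk (aggregations, filtering, writing out). If the full table really must be in memory at once, reducing columns and dtypes is the main lever.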
I have a dataframe and I want to search all columns for values equal to the text 'Apple'. I know how to do it with one column, but how can I apply this to ALL column
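The one-column comparison can be applied to the whole frame, then reduced per row. Here is a sketch with hypothetical data, showing both exact matching and substring matching:

```python
import pandas as pd

# Hypothetical data with a mix of text and numeric columns.
df = pd.DataFrame({
    "fruit": ["Apple", "Pear", "Plum"],
    "brand": ["Sony", "Apple", "Dell"],
    "count": [1, 2, 3],
})

# Exact match: eq compares every cell; any(axis=1) keeps rows with at least one hit.
mask_exact = df.eq("Apple").any(axis=1)
rows_exact = df[mask_exact]

# Substring match: cast every cell to str so numeric columns work with .str.contains.
mask_sub = (
    df.astype(str)
    .apply(lambda col: col.str.contains("Apple", regex=False))
    .any(axis=1)
)
rows_sub = df[mask_sub]

print(rows_exact)
print(rows_sub)
```

Using `any(axis=0)` instead reports which columns contain the value rather than which rows.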
I am trying to calculate the mean and 95% confidence interval of a column "Force" in a large dataset. I need to get the result using the groupby function by groupi
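The grouping column is cut off in the question, so `Group` below is a hypothetical name, and the data is made up. This sketch computes a per-group mean and a 95% interval using the normal approximation (z = 1.96). That is reasonable for large groups; small groups would call for a t-distribution quantile instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Force" is from the question, "Group" is an assumed grouping column.
df = pd.DataFrame({
    "Group": ["a", "a", "a", "b", "b", "b"],
    "Force": [1.0, 2.0, 3.0, 10.0, 12.0, 14.0],
})

stats = df.groupby("Group")["Force"].agg(["mean", "std", "count"])

# Standard error of the mean; pandas' std is the sample std (ddof=1) by default.
sem = stats["std"] / np.sqrt(stats["count"])

z = 1.96  # normal approximation for a 95% interval
stats["ci_low"] = stats["mean"] - z * sem
stats["ci_high"] = stats["mean"] + z * sem
print(stats)
```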
I have a weird issue: the result doesn't change between iterations. The code is the following:
import pandas as pd
import numpy as np
X = np.arange(10,100)
I have a pivot table result as below:
              len
              MERCHANT_NAME
MCC_CODE
0.0           58635982
742.0         7378
763.0         750
780.0         281
1
I am using Seaborn to plot some data in Pandas. I am making some very large plots (factorplots). To see them, I am using some visualisation facilities at my u
I have a pandas.DataFrame that contains some values in scientific notation, and I want to change those values to normal notation without the e+... import pandas a
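Scientific notation is only how pandas displays the floats; the stored numbers are the same either way. Here is a minimal sketch, with hypothetical values, showing two options: change the display format, or produce formatted strings (for example, for export).

```python
import pandas as pd

df = pd.DataFrame({"value": [1.5e-05, 2.3e+07, 42.0]})  # hypothetical values

# Option 1: change only how floats are printed, temporarily, via an option context.
with pd.option_context("display.float_format", "{:.6f}".format):
    print(df)

# Option 2: build formatted strings; note this column is text, not numbers.
df["value_str"] = df["value"].map("{:.6f}".format)
print(df)
```

To make the display change permanent for the session, `pd.set_option("display.float_format", ...)` accepts the same formatter.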
Background: I have a large 40 MB XLSX file that contains client data which is grouped over multiple levels, like so: Expanded - Not Expanded (sorry about the ter