I have a 100M-line CSV file (actually many separate CSV files) totaling 84 GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in te
I am trying to convert my output into a pandas DataFrame and I am struggling. I have this list: my_list = [1,2,3,4,5,6,7,8,9]. I want to create a pandas data
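The desired shape is cut off above, so here is a minimal sketch under two guesses: either one column with one row per element, or a 3x3 layout. The column names `values`, `A`, `B`, `C` are hypothetical.

```python
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# One row per list element in a single column ("values" is a hypothetical name).
df = pd.DataFrame(my_list, columns=["values"])

# If a 3x3 table was intended, split the list into rows of 3 first
# (column names A, B, C are hypothetical).
df_3x3 = pd.DataFrame(
    [my_list[i:i + 3] for i in range(0, len(my_list), 3)],
    columns=["A", "B", "C"],
)
print(df_3x3)
```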
My Python code works correctly in the example below. It combines a directory of CSV files and matches the headers. However, I want to take it a step furthe
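The author's own code is not shown, so this is only a sketch of one common way to combine a directory of CSVs with header matching: `glob` to find the files and `pd.concat` to align columns by name. The function name `combine_csvs` and the demo files are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csvs(directory):
    """Concatenate every *.csv in `directory`, aligning columns by header name."""
    paths = sorted(glob.glob(os.path.join(directory, "*.csv")))
    frames = [pd.read_csv(p) for p in paths]
    # concat aligns on column labels; a header missing from one file becomes NaN there.
    return pd.concat(frames, ignore_index=True, sort=False)


# Demo: two hypothetical files with the same headers in a different order.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w") as f:
        f.write("id,name\n1,x\n")
    with open(os.path.join(tmp, "b.csv"), "w") as f:
        f.write("name,id\ny,2\n")
    combined = combine_csvs(tmp)

print(combined)
```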
I am trying to join two pandas data frames using two columns: new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]') but got
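A likely cause is that `left_on` and `right_on` receive a single string such as `'[A_c1,c2]'` rather than a list of column names. A minimal sketch with hypothetical data (only the key column names come from the question):

```python
import pandas as pd

# Hypothetical frames; only the key column names are taken from the question.
A_df = pd.DataFrame({"A_c1": [1, 1, 2], "c2": ["x", "y", "x"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2], "c2": ["x", "x"], "b_val": [100, 300]})

# left_on / right_on take Python lists of column names, matched position by position.
new_df = pd.merge(A_df, B_df, how="left",
                  left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(new_df)
```

With `how="left"`, rows of `A_df` without a match keep their place and get NaN in the `B_df` columns.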
''' import pandas as pd f500 = pd.read_csv('f500.csv',index_col=0) f500.index.name = None ''' I don't know what this means. What's the role of 'index_c
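A small sketch of what those two lines do, using an in-memory stand-in for `f500.csv` (the contents below are hypothetical, not the real file):

```python
import io

import pandas as pd

# Hypothetical stand-in for f500.csv: the first column holds company labels.
csv_text = "company,rank,revenues\nA,1,500\nB,2,300\n"

# index_col=0 makes the first CSV column the row index instead of a 0..n-1 RangeIndex.
f500 = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(f500.index.name)  # prints: company (taken from that column's header)

# Setting the name to None only drops the label printed above the index;
# the index values themselves are unchanged.
f500.index.name = None
print(f500)
```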
I recently asked this question about converting n 2-dimensional arrays to a DataFrame with 2+n columns. The solution I got works perfectly well, but cannot ea
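The earlier solution is not quoted here, so this is a sketch under an assumption: all n arrays share one shape, and the two extra columns are each cell's row and column position. The helper `arrays_to_frame` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def arrays_to_frame(arrays, names):
    """Flatten n equally shaped 2-D arrays into a DataFrame with 2 + n columns.

    Assumes the two extra columns are the row and column position of each cell.
    """
    rows, cols = np.indices(arrays[0].shape)  # position grids with the same shape
    data = {"row": rows.ravel(), "col": cols.ravel()}
    for name, arr in zip(names, arrays):
        data[name] = np.asarray(arr).ravel()  # row-major order matches the grids
    return pd.DataFrame(data)


a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
df = arrays_to_frame([a, b], ["a", "b"])
print(df)
```

Because the loop runs over the list, the same call works for any n without changing the function.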
I'm using scikit-learn's linear regression algorithm. While scaling the target Y with Ys = scaler.fit_transform(Y), I got ValueError: Expected 2D arr
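scikit-learn scalers expect a 2-D array of shape (n_samples, n_features), and a target vector is usually 1-D. A minimal NumPy-only sketch of the usual reshape. The sample values are hypothetical, and the `scaler` line is left as a comment because the scaler's type is not shown.

```python
import numpy as np

# Hypothetical 1-D target vector, shape (n,).
Y = np.array([3.0, 5.0, 7.0, 9.0])

# reshape(-1, 1) turns it into one feature column of shape (n, 1).
Y2d = Y.reshape(-1, 1)
print(Y2d.shape)  # (4, 1)

# Ys = scaler.fit_transform(Y2d)  # scaler as in the question
# Ys.ravel() would flatten the scaled result back to 1-D if needed.
```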
I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error: MemoryError Traceback (most recent
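One common approach is to read the file in pieces with `chunksize`, which makes `read_csv` return an iterator of DataFrames. In this sketch the in-memory CSV, the `key`/`value` columns, and the per-key sum are hypothetical. Whatever reduction you need has to fit the chunked pattern.

```python
import io

import pandas as pd

# Hypothetical stand-in for the 6 GB file; pass the real path instead.
csv_text = "key,value\na,1\nb,2\na,3\nc,4\nb,5\n"

totals = {}
# Each chunk holds at most `chunksize` rows, so only one chunk is in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Reduce each chunk, then merge the partial results.
    for key, s in chunk.groupby("key")["value"].sum().items():
        totals[key] = totals.get(key, 0) + s
print(totals)
```

Passing `usecols` and narrower `dtype`s to `read_csv` can also cut memory use, with or without chunking.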
I have a dataframe and I want to search all columns for values that are the text 'Apple'. I know how to do it with one column, but how can I apply this to ALL column
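A sketch that applies the test to every column at once with `DataFrame.isin`, assuming an exact match on 'Apple' (the data below is hypothetical):

```python
import pandas as pd

# Hypothetical frame with mixed column types.
df = pd.DataFrame({
    "fruit": ["Apple", "Pear", "Plum"],
    "brand": ["Acme", "Apple", "Zeta"],
    "qty": [3, 5, 7],
})

# Boolean frame: True wherever a cell equals 'Apple' exactly.
mask = df.isin(["Apple"])
# Rows where any column matches.
rows_with_apple = df[mask.any(axis=1)]
# Columns that contain 'Apple' at least once.
cols_with_apple = df.columns[mask.any(axis=0)].tolist()
print(rows_with_apple)
print(cols_with_apple)
```

For substring matches instead of exact ones, something like `df.apply(lambda s: s.astype(str).str.contains("Apple"))` builds the same kind of mask.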
I am trying to calculate the mean and 95% confidence interval of a column "Force" in a large dataset. I need the result using the groupby function by groupi
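A sketch using the normal approximation (mean ± 1.96 × standard error) per group. `Force` is from the question. The grouping column is cut off, so `Group` and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "Group" stands in for the grouping column cut off above.
df = pd.DataFrame({
    "Group": ["a", "a", "a", "b", "b", "b"],
    "Force": [1.0, 2.0, 3.0, 10.0, 12.0, 14.0],
})

stats = df.groupby("Group")["Force"].agg(["mean", "std", "count"])
# Normal approximation: mean +/- 1.96 * standard error of the mean.
# For small groups a Student-t quantile, e.g. scipy.stats.t.ppf(0.975, n - 1),
# is more appropriate than 1.96.
sem = stats["std"] / np.sqrt(stats["count"])
stats["ci95_low"] = stats["mean"] - 1.96 * sem
stats["ci95_high"] = stats["mean"] + 1.96 * sem
print(stats)
```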
I have a weird issue: the result doesn't change between iterations. The code is the following: import pandas as pd import numpy as np X = np.arange(10,100)
I have a pivot table result as below: len MERCHANT_NAME MCC_CODE 0.0 58635982 742.0 7378 763.0 750 780.0 281 1
I am using Seaborn to plot some data in Pandas. I am making some very large plots (factorplots). To see them, I am using some visualisation facilities at my u
I have a pandas.DataFrame that contains some values in scientific notation, and I want to change those values to normal notation without the e+... import pandas a
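If the goal is only how the values are printed, the `display.float_format` option controls that without changing the stored floats. A sketch with hypothetical values, scoped with `pd.option_context` so the setting does not leak:

```python
import pandas as pd

# Hypothetical values that pandas would otherwise show in scientific notation.
df = pd.DataFrame({"value": [1.5e+08, 2.5e-03, 3.0]})

# Applies only inside the with-block; the stored floats are untouched.
with pd.option_context("display.float_format", "{:.4f}".format):
    text = df.to_string()
print(text)
```

To change the values themselves into formatted text instead, `df["value"].map("{:.4f}".format)` returns strings.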
Background: I have a large 40 MB XLSX file that contains client data which is grouped over multiple levels, like so: Expanded - Not Expanded (sorry about the ter
I have a DataFrame with a datetime index whose values are non-unique (see last two index values). I would like to get the next valid index value given a
From a pandas DataFrame, how do I get the index of non-NaN values? My DataFrame is: A b c 0 1 q1 1 1 2 NaN 3 2 3 q2 3 3 4 q1
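A sketch using `notna()` as a boolean mask on the index. The frame rebuilds only the first three rows from the question, because the last row is cut off.

```python
import numpy as np
import pandas as pd

# First three rows of the question's frame (the fourth row is truncated there).
df = pd.DataFrame({"A": [1, 2, 3], "b": ["q1", np.nan, "q2"], "c": [1, 3, 3]})

# Index labels where column b is not NaN.
idx = df.index[df["b"].notna()].tolist()
# Index labels where no column in the row is NaN.
idx_all = df.index[df.notna().all(axis=1)].tolist()
print(idx, idx_all)
```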
I have a dataframe: s1 = pd.Series([5, 6, 7]) s2 = pd.Series([7, 8, 9]) df = pd.DataFrame([list(s1), list(s2)], columns = ["A", "B", "C"]) A B C 0 5
I have a dataframe df: Cat B_1 A_2 C_3 A 1 2 3 B 4 5 6 C 7 8 9 which I want to convert into a dataframe so that the rows in column