Category "pandas"

Convert large csv to hdf5

I have a 100M-line CSV file (actually many separate CSV files) totaling 84GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in te

Convert list into a pandas data frame

I am trying to convert my output into a pandas data frame and I am struggling. I have this list: my_list = [1,2,3,4,5,6,7,8,9]. I want to create a pandas data

Python Pandas add Filename Column CSV

My python code works correctly in the below example. My code combines a directory of CSV files and matches the headers. However, I want to take it a step furthe

pandas: merge (join) two data frames on multiple columns

I am trying to join two pandas data frames using two columns: new_df = pd.merge(A_df, B_df, how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]') but got
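
A minimal sketch of a two-column left join, assuming the key columns are named A_c1 and c2 in the left frame and B_c1 and c2 in the right frame (the frames and their values below are hypothetical, made up for illustration). The point it shows is that `left_on` and `right_on` take lists of column names rather than a single bracketed string:

```python
import pandas as pd

# Hypothetical frames; only the key column names follow the question.
A_df = pd.DataFrame({"A_c1": [1, 2, 3], "c2": ["x", "y", "z"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2, 4], "c2": ["x", "y", "z"], "b_val": [0.1, 0.2, 0.4]})

# Pass the join keys as lists of column names, one list per frame.
new_df = pd.merge(
    A_df,
    B_df,
    how="left",
    left_on=["A_c1", "c2"],
    right_on=["B_c1", "c2"],
)
```

With `how="left"`, every row of `A_df` is kept, and rows without a match on both keys get NaN in the columns that come from `B_df`.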

What is index_col=0? and pd.index.name = None

''' import pandas as pd f500 = pd.read_csv('f500.csv',index_col=0) f500.index.name = None ''' I don't know what this means. What's the role of 'index_c

Convert n same-size, d-dimensional numpy arrays to a dataframe with d+n columns

I recently asked this question, about converting n 2-dimensional arrays to a dataframe with 2+n columns. The solution I got works perfectly well, but can not ea

AttributeError: 'Series' object has no attribute 'reshape'

I'm using the scikit-learn linear regression algorithm. While scaling Y target feature with: Ys = scaler.fit_transform(Y) I got ValueError: Expected 2D arr

How do I read a large csv file with pandas?

I am trying to read a large csv file (approx. 6 GB) in pandas and I am getting a memory error: MemoryError Traceback (most recent
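
One common way around this kind of memory error is to read the file in chunks with the `chunksize` parameter of `pd.read_csv`, so that only part of the file is in memory at a time. The sketch below is an illustration under assumptions, not the asker's code: it uses a small in-memory CSV with hypothetical columns `id` and `value` as a stand-in for the large file.

```python
import io

import pandas as pd

# Stand-in for the large file; in practice a file path would be passed instead.
csv_data = io.StringIO(
    "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
)

total = 0
rows = 0
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["value"].sum()  # aggregate per chunk, keep only the result
    rows += len(chunk)
```

This only helps when each chunk can be reduced (filtered, aggregated, or written elsewhere) before the next one is read; concatenating all chunks back together would need the full memory again.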

Pandas - find specific value in entire dataframe

I have a dataframe and I want to search all columns for values that are the text 'Apple'. I know how to do it with one column, but how can I apply this to ALL column

Confidence Interval in Python dataframe

I am trying to calculate the mean and confidence interval(95%) of a column "Force" in a large dataset. I need the result by using the groupby function by groupi

Looping over pandas DataFrame

I have a weird issue that the result doesn't change for each iteration. The code is the following: import pandas as pd import numpy as np X = np.arange(10,100)

Pivot table sorting

I have a pivot table result as below : len MERCHANT_NAME MCC_CODE 0.0 58635982 742.0 7378 763.0 750 780.0 281 1

Setting plot background colour in Seaborn

I am using Seaborn to plot some data in Pandas. I am making some very large plots (factorplots). To see them, I am using some visualisation facilities at my u

How to suppress scientific notation in values in a pandas dataframe?

I have pandas.DataFrame that contains some values with scientific notation and I want to change those values to a normal value without the e+... import pandas a

Splitting Excel Data by Groupings into Separate Workbook Sheets

Background: I have a large 40MB XLSX file that contains client data which is grouped over multiple levels, like so: Expanded - Not Expanded (sorry about the ter

Pandas: how to get index value of non-unique index

I have a data frame with a date time index where index values are non unique (see last two index values). I would like to get the next valid index value given a

index of non "NaN" values in Pandas

From Pandas data frame, how to get index of non "NaN" values? My data frame is A b c 0 1 q1 1 1 2 NaN 3 2 3 q2 3 3 4 q1

Insert a row to pandas dataframe

I have a dataframe: s1 = pd.Series([5, 6, 7]) s2 = pd.Series([7, 8, 9]) df = pd.DataFrame([list(s1), list(s2)], columns = ["A", "B", "C"]) A B C 0 5

python: how to melt dataframe retaining specific order / custom sorting

I have a dataframe df Cat B_1 A_2 C_3 A 1 2 3 B 4 5 6 C 7 8 9 which I want to convert into a dataframe so that the rows in column