Category "pandas"

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

I have a dataframe which looks like this:

       A       B         C
    1  red78   square    big235
    2  green   circle    small123
    3  blue45  triangle  big657
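Here is a minimal sketch of one common approach, assuming the goal is to keep only the digits. `Series.str.replace` with the regex `\D` (any non-digit character) strips everything else. The DataFrame below is rebuilt from the excerpt. Values with no digits, like `green`, become empty strings, so `pd.to_numeric(..., errors='coerce')` turns those into NaN when a numeric column is wanted.

```python
import pandas as pd

# Rebuilt from the excerpt above
df = pd.DataFrame(
    {
        "A": ["red78", "green", "blue45"],
        "B": ["square", "circle", "triangle"],
        "C": ["big235", "small123", "big657"],
    },
    index=[1, 2, 3],
)

# \D matches any non-digit character; replacing it with '' keeps only digits
df["C"] = df["C"].str.replace(r"\D", "", regex=True)

# 'green' has no digits, so it becomes ''; errors='coerce' turns that into NaN
df["A"] = pd.to_numeric(df["A"].str.replace(r"\D", "", regex=True), errors="coerce")

print(df)
```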

How to get multiple column-slices of a dataframe in pandas

For example,

    import numpy as np
    from pandas import DataFrame
    df = DataFrame(np.arange(8).reshape(1, 8), columns = list('abcdefgh'))

I want to select the columns 'b':'d' and 'f
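This sketch shows two ways to combine several column slices. The excerpt is cut off after `'f`, so the second range `'f':'h'` is only an assumption for illustration. `pd.concat` places label-based `.loc` slices side by side. Alternatively, `np.r_` builds a single positional indexer for `.iloc`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(1, 8), columns=list("abcdefgh"))

# Label-based: concatenate the slices column-wise.
# 'f':'h' is an assumed second range; the excerpt is truncated after 'f
sliced = pd.concat([df.loc[:, "b":"d"], df.loc[:, "f":"h"]], axis=1)

# Positional: np.r_ joins the ranges 1..3 (b..d) and 5..7 (f..h) into one array
same = df.iloc[:, np.r_[1:4, 5:8]]

print(sliced)
```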

Select multiple columns by labels in pandas

I've been looking through the Python documentation and the forums for ways to select columns, but every example of indexing columns I found is too simplistic.
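For reference, this sketch uses a made-up DataFrame to show the basic label-based selections. Passing a list of labels to `[]` or `.loc` returns just those columns. `DataFrame.filter` can also select by exact names (`items=`), by substring (`like=`), or by regular expression (`regex=`).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

subset = df[["a", "c"]]               # list of labels -> DataFrame with those columns
subset_loc = df.loc[:, ["a", "c"]]    # same selection with .loc (rows, columns)
by_regex = df.filter(regex="^[bd]$")  # columns whose names match a regex

print(subset)
```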

Best format for Pandas serialization on disk

For my workload, I need to serialize pandas DataFrames (text + data) to disk, each about 5 GB in size. I came across various solutions: HDF5: Issues wi

VS Code no longer showing option to view DataFrame in Data Viewer

I'm working with pandas in VS Code and I've been using the View value in Data Viewer option to look at my DataFrames while debugging. For some reason VS Code h

Pandas count null values in a groupby function

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'thr
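One way to count missing values per group is sketched below. The excerpt's DataFrame is cut off, so the data is hypothetical: only column `A` matches the excerpt, and column `C` with NaNs is invented. `isna()` turns each cell into True or False, and summing those booleans per group gives the number of nulls.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        # Hypothetical column with some missing values
        "C": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0, 7.0, 8.0],
    }
)

# Mark nulls, group the boolean frame by the key column, and count True values
null_counts = df.drop(columns="A").isna().groupby(df["A"]).sum()
print(null_counts)
```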

How to replace pandas dataframe column names with another list or dictionary matching

I need to replace the column names of a pandas DataFrame, which are names like 'real_tag', with names like 'Descripcion'. list = [{'real_tag': 'FA0:4AIS0007', 'Descr
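A sketch of one approach is to build a plain `{old: new}` dict from the list and pass it to `DataFrame.rename(columns=...)`. This assumes every dict in the list has a `'real_tag'` key and a `'Descripcion'` key, since the excerpt is truncated. The tag `TAG_X` and the description values are invented. Naming a variable `list` also shadows the built-in, so `tag_list` is used here instead.

```python
import pandas as pd

# Hypothetical mapping data in the shape the question suggests
tag_list = [
    {"real_tag": "FA0:4AIS0007", "Descripcion": "Sensor 7"},
    {"real_tag": "TAG_X", "Descripcion": "Sensor X"},
]

df = pd.DataFrame([[1, 2, 3]], columns=["FA0:4AIS0007", "TAG_X", "untouched"])

# {real_tag: Descripcion} lookup; columns missing from it keep their names
mapping = {item["real_tag"]: item["Descripcion"] for item in tag_list}
df = df.rename(columns=mapping)
print(df.columns.tolist())
```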

Python: create a pandas data frame from a list

I am using the following code to create a data frame from a list:

    test_list = ['a','b','c','d']
    df_test = pd.DataFrame.from_records(test_list, columns=['my_let
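For a flat list of scalars, the plain `DataFrame` constructor should be enough, as sketched below. The column name `letters` is a placeholder because the original name is cut off.

```python
import pandas as pd

test_list = ["a", "b", "c", "d"]

# Each list element becomes one row of a single column
df_test = pd.DataFrame(test_list, columns=["letters"])

# Equivalent: map the column name to the list
df_dict = pd.DataFrame({"letters": test_list})
print(df_test)
```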

MemoryError: Unable to allocate 1.88 GiB for an array with shape (2549150, 99) and data type object

I have a problem. I want to normalize a list of dicts with pd.json_normalize(...), but unfortunately I get a MemoryError. Is there an option to work arou

Pandas- rename dataframe multilevel header according to the name of the first level header

I have a dataframe like this:

       X     Y
       a  b  a  b
    0  1  3  4  2
    1  5  7  8  6

And I want to rename a specific column name, fo
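This sketch uses the frame from the excerpt to show one way to rename a single second-level label only under a given first-level header. `rename(columns=..., level=1)` would rename `a` under both `X` and `Y`, so the code rebuilds the column `MultiIndex` from tuples instead. The new name `a_new` is hypothetical.

```python
import pandas as pd

# Rebuilt from the excerpt: two first-level headers, each with 'a' and 'b'
columns = pd.MultiIndex.from_product([["X", "Y"], ["a", "b"]])
df = pd.DataFrame([[1, 3, 4, 2], [5, 7, 8, 6]], columns=columns)

# Rename ('X', 'a') only; every other (top, sub) pair is kept unchanged
df.columns = pd.MultiIndex.from_tuples(
    [(top, "a_new" if (top, sub) == ("X", "a") else sub) for top, sub in df.columns]
)
print(df)
```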

pandas astype python bool instead of numpy.bool_

I need to convert a pandas dataframe to a JSON object. However, json.dumps(df.to_dict(orient='records')) fails because the boolean columns are not JSON serializa
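One possible workaround, sketched below, is to let pandas do the conversion with `DataFrame.to_json`, which writes NumPy booleans as JSON `true`/`false`. If a Python object is needed, parse the result back with `json.loads`. The example DataFrame is made up.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["x", "y"], "flag": [True, False]})

# to_json serializes NumPy scalars (numpy.bool_ included) itself
records = json.loads(df.to_json(orient="records"))

# records now holds plain Python bools, so json.dumps works
payload = json.dumps(records)
print(payload)
```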

reading multiple tabs from excel in different dataframes

I am trying to read multiple tabs of a spreadsheet into different dataframes, and the program should stop once all tabs with data have been read. For the first part I am look
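In the sketch below, `pd.read_excel(path, sheet_name=None)` reads every worksheet in one call and returns a dict of DataFrames keyed by sheet name, so no loop over tab names is needed. Dropping sheets without data is then a dict comprehension. The helper names are hypothetical, and reading `.xlsx` files requires an engine such as openpyxl to be installed.

```python
import pandas as pd


def drop_empty(sheets):
    """Keep only sheets whose DataFrame has at least one row and one column."""
    return {name: frame for name, frame in sheets.items() if not frame.empty}


def read_nonempty_sheets(path):
    # sheet_name=None -> dict mapping sheet name to DataFrame for every tab
    sheets = pd.read_excel(path, sheet_name=None)
    return drop_empty(sheets)
```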

Uncomfortable output of mode() in pandas Dataframe

I have a dataframe with several columns (the features).

    >>> print(df)
       col1  col2
    a     1     1
    b     2     2
    c     3     3
    d     3     2

I woul
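`DataFrame.mode()` returns a DataFrame because a column can have several equally common values, and the extra rows are padded with NaN. If one value per column is enough, a common approach is to take the first row. Here is a sketch on the excerpt's data:

```python
import pandas as pd

# Rebuilt from the excerpt
df = pd.DataFrame({"col1": [1, 2, 3, 3], "col2": [1, 2, 3, 2]}, index=list("abcd"))

# mode() gives one row per tied value, sorted; row 0 holds the smallest mode
modes = df.mode().iloc[0]
print(modes)
```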

Saving pandas data frame to .mat file in python3

I have a pandas data frame 'df'; it looks like the one below, but the original data has many rows. I would like to save it as a .mat file named 'meta.mat'. I tried:

How to subset Pandas Dataframe using an OR operator whilst avoiding "FutureWarning: elementwise comparison failed;"

I have a Pandas dataframe (tempDF) of 5 columns by N rows. Each element of the dataframe is an object (string in this case). For example, the dataframe looks li
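This sketch shows the two usual ways to express an OR filter on a hypothetical `tempDF` with invented column names and values. One way is to wrap each comparison in parentheses and combine them with `|`. The other is to use `isin` when one column should match any of several values. The sketch does not diagnose the specific warning, because its cause isn't visible in the excerpt.

```python
import pandas as pd

tempDF = pd.DataFrame({"col1": ["x", "y", "z", "x"], "col2": ["p", "q", "r", "s"]})

# Parentheses are required: | binds more tightly than == in Python
either = tempDF[(tempDF["col1"] == "x") | (tempDF["col2"] == "r")]

# One column matching any value in a list
in_set = tempDF[tempDF["col1"].isin(["x", "y"])]
print(either)
```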

pandas diff() giving 0 value for first difference, I want the actual value instead

I have df:

    Hour  Energy Wh
    1     4
    2     6
    3     9
    4     15

I would like to add a column that shows the per hour differenc
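`diff()` leaves the first row as NaN because there is no previous value. As the sketch below shows, filling that gap from the original column puts the first reading itself in that row. Column names follow the excerpt; `Energy diff` is a hypothetical name for the new column.

```python
import pandas as pd

# Rebuilt from the excerpt
df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]})

# diff() -> NaN, 2, 3, 6; fillna() puts the first actual value in place of NaN
df["Energy diff"] = df["Energy Wh"].diff().fillna(df["Energy Wh"])
print(df)
```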

sorting rows in a pandas dataframe in a way which is not alphabetical

I have some dataframes (df) with categorical data: a, b, c, and a category for "remaining categories". I would like to sort the month column in t
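One common approach, sketched below, is to convert the column to an ordered `Categorical` whose category list defines the sort order. The column name `category` and the label `Other` for the remaining categories are hypothetical. A plain string sort would put `Other` first, because uppercase letters sort before lowercase ones.

```python
import pandas as pd

df = pd.DataFrame({"category": ["c", "Other", "a", "b", "a"]})

# The order of this list is the sort order, independent of the alphabet
order = ["a", "b", "c", "Other"]
df["category"] = pd.Categorical(df["category"], categories=order, ordered=True)

df_sorted = df.sort_values("category")
print(df_sorted)
```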

Combining Python variables into SQL queries

I am pulling data from an online database using SQL/postgresql queries and converting it into a Python dataframe using Pandas. I want to be able to change the d
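This sketch passes Python values as query parameters instead of formatting them into the SQL string. It uses an in-memory SQLite database so that it runs standalone, and the table, columns, and dates are invented. SQLite's placeholder is `?`. A PostgreSQL driver such as psycopg2 uses `%s` or `%(name)s` placeholders instead.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (day TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2021-01-01", 1.5), ("2021-01-02", 2.5), ("2021-01-03", 4.0)],
)

start, end = "2021-01-02", "2021-01-03"  # the values to vary per query

# The driver substitutes the ? placeholders; no string formatting of SQL
df = pd.read_sql(
    "SELECT day, value FROM readings WHERE day BETWEEN ? AND ? ORDER BY day",
    con,
    params=(start, end),
)
print(df)
```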

Joining on datetime64[ns, UTC] fails using pandas.join

I'm trying to join two pandas.DataFrames on a datetime64[ns, UTC] field and it's failing with a ValueError (described below) that is not intuitive to me. Consid
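The excerpt stops before the error, so its cause can't be confirmed here. As a reference point, this sketch combines two tz-aware (UTC) frames with invented data in two ways: with `merge` on a column and with `join` on the index. One thing worth checking in the original code is that `join(other, on='col')` matches the caller's column against `other`'s index, not against a column of the same name.

```python
import pandas as pd

left = pd.DataFrame(
    {"ts": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00"], utc=True), "x": [1, 2]}
)
right = pd.DataFrame(
    {"ts": pd.to_datetime(["2021-01-01 01:00", "2021-01-01 02:00"], utc=True), "y": [10, 20]}
)

# Column-on-column: merge matches equal tz-aware timestamps
merged = left.merge(right, on="ts", how="inner")

# Index-on-index: join aligns the two DatetimeIndexes
joined = left.set_index("ts").join(right.set_index("ts"), how="inner")
print(merged)
```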

Chunking DataFrame by gaps in datetime index

First of all, my apologies if the title is too ambiguous. I have a pd.DataFrame with a datetime64 index. These indices, however, are not equally
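A common pattern for this is sketched below. It computes the time step between consecutive index entries and flags steps larger than a chosen threshold. The cumulative sum of those flags then serves as a chunk ID for `groupby`. The 5-minute threshold and the sample timestamps are assumptions.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 00:01", "2021-01-01 00:02",
     "2021-01-01 01:00", "2021-01-01 01:01"]
)
df = pd.DataFrame({"v": range(5)}, index=idx)

gap = pd.Timedelta(minutes=5)  # assumed threshold for "a gap"

# diff() gives the step to the previous timestamp (NaT for the first row,
# which compares as False); each True starts a new chunk
chunk_id = (df.index.to_series().diff() > gap).cumsum()

chunks = [chunk for _, chunk in df.groupby(chunk_id)]
print([len(c) for c in chunks])
```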