Category "pandas"

'function' object has no attribute 'apply'

I have a data frame df, which has a column 'query' containing text data. I am trying to clean the text data with the apply function. But getting the above er

Filter rows in csv file based on another csv file and save the filtered data in a new file

Good day all. I was trying to filter file2 based on file1, where file1 is a subset of file2. But file2 has a description column that I need to be able to an

In Pandas, how to return the id for the next value which is above/below a threshold

I have a dataframe like this:

                  date   value
0  2018-05-15 06:00:00  100.86
1  2018-05-15 07:00:00  101.99
2  2018-05-15 08:00:00  110.00
3  201

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

I have a dataframe which looks like this:

   A        B         C
1  red78    square    big235
2  green    circle    small123
3  blue45   triangle  big657

How to get multiple column-slices of a dataframe in pandas

For example:

import numpy as np
from pandas import DataFrame
df = DataFrame(np.arange(8).reshape(1, 8), columns=list('abcdefgh'))

I want to select the columns 'b':'d' and 'f

Select multiple columns by labels in pandas

I've been looking around for ways to select columns through the Python documentation and the forums, but every example on indexing columns is too simplistic.

Best format for Pandas serialization on disk

For my workload, I need to serialize Pandas dataframes (text + data) to disk, with a size of 5 GB per dataframe. I came across various solutions: HDF5: Issues wi

VS Code no longer showing option to view DataFrame in Data Viewer

I'm working with pandas in VS Code and I've been using the View value in Data Viewer option to look at my DataFrames while debugging. For some reason VS Code h

Pandas count null values in a groupby function

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'thr

How to replace pandas dataframe column names with another list or dictionary matching

I need to replace column names of a pandas DataFrame having names like 'real_tag' and rename them like 'Descripcion' list = [{'real_tag': 'FA0:4AIS0007', 'Descr

Python: create a pandas data frame from a list

I am using the following code to create a data frame from a list: test_list = ['a','b','c','d'] df_test = pd.DataFrame.from_records(test_list, columns=['my_let

MemoryError: Unable to allocate 1.88 GiB for an array with shape (2549150, 99) and data type object

I have a problem. I want to normalize a list that contains dicts with pd.json_normalize(...), but unfortunately I got a MemoryError. Is there an option to work arou

Pandas- rename dataframe multilevel header according to the name of the first level header

I have a dataframe like this:

   X     Y
   a  b  a  b
0  1  3  4  2
1  5  7  8  6

And I want to rename a specific column name, fo

pandas astype python bool instead of numpy.bool_

I need to convert a pandas dataframe to a JSON object. However json.dumps(df.to_dict(orient='records')) fails as the boolean columns are not JSON serializa

reading multiple tabs from excel in different dataframes

I am trying to read multiple tabs in a spreadsheet into different dataframes, and once all tabs with data have been read the program should stop. For the first part I am look

Uncomfortable output of mode() in pandas Dataframe

I have a dataframe with several columns (the features).

>>> print(df)
   col1  col2
a     1     1
b     2     2
c     3     3
d     3     2

I woul

Saving pandas data frame to .mat file in python3

I have a pandas data frame 'df'. It looks like the one below, but the original data has many rows. I would like to save this as a .mat file with the name 'meta.mat'. I tried:

How to subset Pandas Dataframe using an OR operator whilst avoiding "FutureWarning: elementwise comparison failed;"

I have a Pandas dataframe (tempDF) of 5 columns by N rows. Each element of the dataframe is an object (string in this case). For example, the dataframe looks li

pandas diff() giving 0 value for first difference, I want the actual value instead

I have df:

Hour  Energy Wh
1     4
2     6
3     9
4     15

I would like to add a column that shows the per hour differenc

sorting rows in a pandas dataframe in a way which is not alphabetical

I have some dataframes (df) with categorical data starting with: a, b, c and a category for "remaining categories". I would like to sort the month column in t