Category "pandas"

pandas dataframe replace blanks with NaN

I have a dataframe with empty cells and would like to replace these empty cells with NaN. A solution previously proposed on this forum works, but only if the ce

How to reorder indexed rows based on a list in Pandas data frame

I have a data frame that looks like this: company Amazon Apple Yahoo name A 0 130 0 C 173 0 0 Z 0 0

Find the column name which has the maximum value for each row

I have a DataFrame like this one: In [7]: frame.head() Out[7]: Communications and Search Business General Lifestyle 0 0.745763 0.050847 0.118644

Count most frequent 100 words from sentences in Dataframe Pandas

I have text reviews in one column of a Pandas dataframe and I want to count the N most frequent words with their frequency counts (in the whole column - NOT in single

How To Solve KeyError: u"None of [Index([..], dtype='object')] are in the [columns]"

I'm trying to create an SVM model from what I found on GitHub here, but it keeps returning this error. Traceback (most recent call last): File "C:\Users\Me\Do

'function' object has no attribute 'apply'

I have a data frame df, which has a column 'query' containing text data. I am trying to clean the text data with the apply function, but I am getting the above er

Filter rows in csv file based on another csv file and save the filtered data in a new file

Good day all. I was trying to filter file2 based on file1, where file1 is a subset of file2. But file2 has a description column that I need to be able to an

In Pandas, how to return the id for the next value which is above/below a threshold

I have a dataframe like this: date value 0 2018-05-15 06:00:00 100.86 1 2018-05-15 07:00:00 101.99 2 2018-05-15 08:00:00 110.00 3 201

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

I have a dataframe which looks like this: A B C 1 red78 square big235 2 green circle small123 3 blue45 triangle big657

How to get multiple column-slices of a dataframe in pandas

For example, from pandas import DataFrame df = DataFrame(np.arange(8).reshape(1, 8), columns = list('abcdefgh')) I want to select the columns 'b':'d' and 'f

Select multiple columns by labels in pandas

I've been looking around for ways to select columns through the Python documentation and the forums, but every example of indexing columns is too simplistic.

Best format for Pandas serialization on disk

For my workload, I need to serialize Pandas dataframes (text + data) to disk, with a size of 5 GB per dataframe. I came across various solutions: HDF5: Issues wi

VS Code no longer showing option to view DataFrame in Data Viewer

I'm working with pandas in VS Code and I've been using the "View value in Data Viewer" option to look at my dataframes while debugging. For some reason VS Code h

Pandas count null values in a groupby function

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'thr

How to replace pandas dataframe column names with another list or dictionary matching

I need to replace column names of a pandas DataFrame having names like 'real_tag' and rename them like 'Descripcion' list = [{'real_tag': 'FA0:4AIS0007', 'Descr

Python: create a pandas data frame from a list

I am using the following code to create a data frame from a list: test_list = ['a','b','c','d'] df_test = pd.DataFrame.from_records(test_list, columns=['my_let

MemoryError: Unable to allocate 1.88 GiB for an array with shape (2549150, 99) and data type object

I have a problem. I want to normalize a list containing dicts with pd.json_normalize(...), but unfortunately I get a MemoryError. Is there an option to work arou

Pandas- rename dataframe multilevel header according to the name of the first level header

I have a dataframe like this: X Y a b a b 0 1 3 4 2 1 5 7 8 6 And I want to rename a specific column name, fo

pandas astype python bool instead of numpy.bool_

I need to convert a pandas dataframe to a JSON object. However json.dumps(df.to_dict(orient='records')) fails as the boolean columns are not JSON serializa

reading multiple tabs from excel in different dataframes

I am trying to read multiple tabs in a spreadsheet into different dataframes, and once all tabs with data have been read the program should stop. For the first part I am look