Category "pandas"

How to save a new sheet in an existing Excel file, using pandas?

I want to use Excel files to store data processed with Python. My problem is that I can't add sheets to an existing Excel file. Here I suggest a sample code to

How to create dictionary to look for dropped zeros?

I ran into this specific problem where I have a dataframe of ID numbers. Some of these account numbers have dropped leading zeros. dataframe is df. ID 345 345 5

Why has seaborn/matplotlib filled below the line in this lineplot?

I don't know why my plot looks like this: I only want to display lines with no fill. Code below. Note this also happens if I run in Spyder or cmd. import ma

Remove one out of two legends from Seaborn Scatterplot

Using the 'tips' dataset as a toy model, I generate the following plot: import seaborn as sns import matplotlib.pyplot as plt tips = sns.load_dataset("tips")

Word count Matrix of document corpus with Pandas Dataframe

Well, I have a corpus of 2000+ text documents and I'm trying to make a matrix with pandas dataframe in the most elegant way. The matrix would look like this: d

Python_OSError: [Errno 28] No space left on device

I have the following error while exporting pandas dataframe into csv file. I have enough space in my hard disk. OSError: [Errno 28] No space left on device

pandas - filter on groups which have at least one column containing non-null values in a groupby

I have the following python pandas dataframe: df = pd.DataFrame({'Id': ['1', '1', '1', '2', '2', '3'], 'A': ['TRUE', 'TRUE', 'TRUE', 'TRUE', 'TRUE', 'FALSE'],

Streamlit Pandas Query Function Syntax Error When Finding Column in CSV Dataframe

When using Streamlit to build a data interface, I get a syntax error. My downloaded csv dataframe has a column 'NUMBER OF PERSONS INJURED', after converting i

How do I read SQL stored procedure data through pyodbc and get results into a dataframe?

I have a stored proc in SQL Server called test.storedproc My py script is as follows import pyodbc import pandas as pd conn = pyodbc.connect('Driver={SQL Server

Plot elapsed time on x axis using date indexed time-series data

In my pandas dataframe, my time series data is indexed by absolute time (a date of format YYYY-MM-DD HH24:MI:SS.nnnnn): 2017-01-04 16:25:25.143493 58 2017-0

how to assign an entire list to each row of a pandas dataframe

I have a dataframe and a list df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6]}) mylist= [10,20,30,40,50] I would like to have a list as element in each row of a
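
One possible approach to the question above, as a minimal sketch: build a column holding one list object per row. The column name `C` is an assumption made for illustration, since the excerpt is cut off before naming one.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
mylist = [10, 20, 30, 40, 50]

# One list per row; list(mylist) makes a copy so rows do not share
# a single list object. 'C' is a hypothetical column name.
df['C'] = [list(mylist) for _ in range(len(df))]

print(df)
```

Assigning `df['C'] = mylist` directly would fail here, because pandas would try to align the five list items with the three rows.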

Convert numpy array from space separated to comma separated in python

This data is in a .csv file. Generally we expect an array/list with comma-separated values like [1,2,3,4], but it seems that nothing happened in this case. data =

Pandas (Python): Fill empty cells with previous row value?

I want to fill empty cells with the previous row value if they start with a number. For example, I have Text Text 30 Text Text

How to create and annotate a stacked proportional bar chart

I'm struggling to create a stacked bar chart derived from value_counts() of a column from a dataframe. Assume a dataframe like the following, where responder i

How do I get list of all possible tickers (and also maybe their meanings) for various dataset libraries?

so the way I usually get some dataset (in this example, US Product Price Index) from econdb library is this: import datetime import pandas_datareader as pdr imp

Unable to read "#N/A" as string

I am having a problem reading "#N/A" as a string. I tried using both "keep_default_na=False" and "na_filter = False". It works for "NA", "N/A", "#NA" but

pandas how to check dtype for all columns in a dataframe?

It seems that dtype only works for pandas.DataFrame.Series, right? Is there a function to display data types of all columns at once?
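
As a minimal sketch for the question above: the `DataFrame.dtypes` attribute returns a Series with one dtype per column. The sample frame below is hypothetical and only there to make the snippet runnable.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})

# dtypes is an attribute, not a method: it maps column name -> dtype
col_types = df.dtypes

print(col_types)
```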

suppress Name dtype from python pandas describe

Let's say I have r = pd.DataFrame({'A':1 , 'B':pd.Series(1,index=list(range(4)),dtype='float32')}) And r['B'].describe()[['mean','std','min','m

Showing gps points on altair world map

I'm building (for learning purposes) a Python program that extracts GPS data from *jpg files in a directory and displays the GPS coordinates from the photos on

Color pandas DataFrame value if larger than 1.5*median(column)

Let's say I have a DataFrame that looks like this: df= pd.DataFrame({'A': [1,-2,0,-1,17], 'B': [11,-23,1,-3,132], 'C': [121,