Category "pandas"

Writing a Luigi Target as a CSV with Pandas

I have a basic Luigi pipeline that I'm writing. The pipeline will download Apple stock data and create a CSV out of it. The following is what I've written: # Do

PEP8 guidance for column names in pandas dataframe?

Is there a recommended standard naming convention for columns in pandas DataFrames? As I looked around, this seems to be the most relevant question or

How do I request a zipfile, extract it, then create pandas dataframes from the csv files?

Load in these CSV files from Sean Lahman's Baseball Database. For this assignment, we will use the 'Salaries.csv' and 'Teams.csv' tables. Read these tables

pandas group by ALL functionality?

I'm using the pandas groupby+agg functionality to generate nice reports: aggs_dict = {'a':['mean', 'std'], 'b': 'size'}; df.groupby('year').agg(aggs_dict) I wo

Append column to pandas dataframe

This is probably easy, but I have the following data:

In data frame 1:

index  dat1
0      9
1      5

In data frame 2:

index  dat2
0      7
1      6

I want a da

How to annotate bar chart with values different to those from get_height()

I solved my own question after a long and failed search, so I'm posting the question here and the answer immediately below. The goal: plot percentages but annot

How to rank plot in seaborn boxplot

Take the following seaborn boxplot for example, from https://stanford.edu/~mwaskom/software/seaborn/examples/horizontal_boxplot.html import numpy as np import

Remove white space from entire DataFrame

I have a dataframe with 22 columns and 65 rows. The data comes in from a CSV file. Each of the values in the dataframe has extra unwanted whitespace. So if I do a lo

check if pair of values is in pair of columns in pandas

Basically, I have latitude and longitude (on a grid) in two different columns. I am getting fed two-element lists (could be numpy arrays) of a new coordinate se

Duplicated rows when merging dataframes in Python

I am currently merging two dataframes with an outer join. However, after merging, I see all the rows are duplicated even when the columns that I merged upon con

How to prevent rows combining in pd.read_csv() with Google Sheets

I'm seeing an odd behaviour where the first 5 rows in my Google Sheet are combined into one row in my dataframe. This is the output from df.columns.values: ['bus

How to remove the border of Pandas dataframe?

When I write a pandas dataframe to Excel, a border is generated around the header automatically. When I use styleframe to write to Excel, the border of the whole table wi

Apply a function to three parallel arrays of arrays

I have three arrays of arrays like this: catLabels = [catA, catB, catC] binaryLabels = [binA, binB, binC] trueLabels = [] trueLabels.extend(repeat(y_true_cat

Is there a way to reorder a dataframe's columns using a user-defined list?

Hi there heroes! I'm currently working on a project where I have to process 2D arrays using pandas (numpy is out of the question in the context for reasons I can't

Pandas Merge - How to avoid duplicating columns

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (curre

Append new dataframe to existing Excel sheet using Python pandas

I currently have this code. It works perfectly. It loops through Excel files in a folder, removes the first 2 rows, then saves them as individual Excel files,

pandas_ta parabolic SAR giving wrong values for yfinance

I made a function that uses the psar function from the pandas_ta library. This function seems to work incorrectly: it gives the PSARl, PSARs and PSARr values on

Pretty Printing a pandas dataframe

How can I print a pandas dataframe as a nice text-based table, like the following? +------------+---------+-------------+ | column_one | col_two | column_3

Pandas split column into multiple columns by comma

I am trying to split a column into multiple columns based on comma/space separation. My dataframe currently looks like KEYS

How do you create merge_asof functionality in PySpark?

Table A has many columns with a date column, Table B has a datetime and a value. The data in both tables are generated sporadically with no regular interval. Ta