Category "pandas"

Grouping by multiple columns to find duplicate rows pandas

I have a df id val1 val2 1 1.1 2.2 1 1.1 2.2 2 2.1 5.5 3 8.8 6.2 4 1.1 2.2 5 8.8 6.2 I want t

Convert df.apply to Spark to run in parallel using all the cores

We have a pandas dataframe that we are using. We have a function we use in retail data which runs on a daily basis, row by row, to calculate the item to item differe

Pandas - dataframe groupby - how to get sum of multiple columns

This should be an easy one, but somehow I couldn't find a solution that works. I have a pandas dataframe which looks like this: index col1 col2 col3 col4

Strict regex in Pandas replace

I need to write a strict regular expression to replace certain values in my pandas dataframe. This is an issue that was raised after solving the question that I

Pyspark-pandas not working on Spark 3.1.2

I am using spark 3.1.2 and attempting to use pyspark-pandas. However when attempting from pyspark import pandas as ps I am getting the following error: ImportEr

Python for Google Sheets: write dataframes to different sheets in the same workbook

Using the code below, I am able to write the dataframe df1 to the default first sheet (starting at cell ‘B7’) of the Google Sheet workbook. In the s

Drop Columns in Pandas Dataframe: Inconsistency in Output

Problem: While dropping column labelled 'Happiness_Score' below, I'm getting it dropped in the parent Dataframe as well. This is not supposed to happen, would l

SQLAlchemy (psycopg2.ProgrammingError) can't adapt type 'dict'

Couldn't find a solution on the web for my problem. I am trying to insert this pandas df to a Postgresql table using SQLAlchemy Pandas 0.24.2 sqlalchemy 1.3.

Maintaining the order of the elements in a frozen set

I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this: x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]

How to sort Plotly bar chart in descending order

I have created a basic bar chart in plotly that I would like to sort by descending order. I couldn't find an easy way to specify this in the plotly syntax, so

Melting pandas data frame with multiple variable names and multiple value names

How can I melt a pandas data frame using multiple variable names and values? I have the following data frame that changes its shape in a for loop. In one of the

ERR_CONNECTION_REFUSED on browser when opening dtale with Eclipse Pydev

Opening a dtale sheet using Eclipse Pydev on Windows leads to ERR_CONNECTION_REFUSED on browser. The same code works on spyder and jupyter however. I know dtale

Python Pandas - Concat dataframes with different columns ignoring column names

I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings

Python : Changing the original data using a for loop

I have some really big txt files (> 2 GB) where the quality of the data is not good. In some columns (that should be integer), for values below 1000.00 , '.'

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte

I am new to Python and am trying to read a CSV file using the script below. Past=pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv",encoding='utf-8') But, getting

How to split a DataFrame based on consecutive index?

I have a DataFrame 'work' with a non-consecutive index, here is an example: Index Column1 Column2 4464 10.5 12.7 4465 11.3 12.8 4466 10.3 22.8 5123 1

Getting alternating results with pandas melt

I was trying to convert the first image in this album into the second with pandas but all I got was the third one... Original Year Jan Feb Mar A

Passing dataframe and using its name to create the csv file

I have a requirement where I need to pass different dataframes and print the rows in dataframes to the csv file and the name of the file needs to be the datafram

Slice DataFrame using Dates from a List and its offset as a range for slicing in a for loop

This is my DataFrame df with calendar days frequency and DateTime Object as Index. This data starts from 1989-01-03 till present day: Pri