Category "pandas"

Upload large csv file to cloud storage using Python

Hi I am trying to upload a large csv file but I am getting the below error: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded w

Pandas rolling window cumsum

I have a pandas df as follows: YEAR MONTH USERID TRX_COUNT 2020 1 1 1 2020 2 1 2 2020 3 1 1 2020 12

Pandas dataframe to mongoDB document

I have a pandas dataframe that I import to a MongoDB database. Each row of the dataframe is transformed into a document like the one below. { title: 'Spaci

How to read a csv file with commas in field with pandas python?

Hi I have a csv file with items like this product_id,url 100,https://url/p/Cimory-Yogurt-Squeeze-Original-120-g-745133 "1000,""https://url/p/OREO-Biskuit-Dark-

How to append two cell values located in the same dataframe column?

Disclosure Source 35 36 37 38 39 202-1 GRI 202: Market Presence 40 2016 41 42 43 The Source Co

Standard Deviation coming NaN in Pyspark rolling window

I have a dataset with 4 sensor values, 'volt', 'pressure', 'rotate' and 'vibration'. For these sensor values I am calculating rolling mean and rolling standard

Elegant pandas way to stratified frequencies

I want to know if I can reach my desired result in a "better" way. This means with fewer steps (but readable code!) and some pandas built-in features. That is

Plot histogram from two columns of csv using pandas

I have a csv file containing two columns. What I'd like to do is to plot a histogram based on these two columns. My code is as follows: data = pd.read_csv('data

Error while translating the data-frame column in pandas : IndexError: list index out of range

I am trying to translate a column of my dataframe. Using the below code: # Import the library: import googletrans from googletrans import Translator # Create

How to replace values in column when other column is not nan or replace with other?

I am new to pandas and am looking for a solution where I can replace the values of one column with those of other columns. For example: replace the value of col A with the values in Col E

Google Analytics response to Pandas Dataframe in Python

Still a newbie to Python so please be gentle. I'm trying to parse a Google Analytics Reporting API V4 response to a Pandas dataframe in Python, specifically usi

Python pandas : pivot_table simple string aggregation and sort

I'm trying to achieve something with pandas which is very straightforward to do in an Excel PivotTable: From what I've seen, the following code seems logical, but

AttributeError: 'int' object has no attribute 'split' pandas

Please, I know there are several issues related to this error of mine, but I'm learning, I don't understand almost anything, so please, if it's not asking too m

How do you compare columns 'a' and 'b' to return 'c' or 'd'?

I am trying to compare two columns and then return a third value from one of the two adjacent columns. I have read that using iterrows is not the correct way to
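
One vectorized alternative to iterrows for this kind of comparison is `numpy.where`. The following is only a minimal sketch: the excerpt does not state the actual condition, so the rule "take 'c' when 'a' is greater than 'b', otherwise 'd'" is an assumption, and the sample values and the `result` column name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names follow the question title.
df = pd.DataFrame({
    "a": [1, 5, 3],
    "b": [2, 4, 3],
    "c": ["c0", "c1", "c2"],
    "d": ["d0", "d1", "d2"],
})

# Assumed rule: take the value from 'c' where a > b, otherwise from 'd'.
df["result"] = np.where(df["a"] > df["b"], df["c"], df["d"])
```

The comparison runs on whole columns at once, so no explicit row loop is needed; a different rule only requires changing the boolean expression passed as the first argument.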

issue on pandas_ta adx indicator

When I run this code I get the error below; it is obviously missing the close value. df['ADX'] = ta.adx(df['High'], df['Low'],length = 14) df output: TypeError

pd.read_csv - dates in pandas multiindex column names

I import a csv file into a pandas dataframe. df=pd.read_csv('data.csv',index_col=[0],header=[0,1]) My data has a column multiindex with two levels. Level(0) co

Verify that a column name is a unique identifier

I have a dataset called df_authors and in that dataset I have a column called author. I have to verify that df_authors.author is a unique identifier. What I tri
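
A possible check for this, sketched under assumptions: the excerpt is cut off before showing what was tried, so the helper name `is_unique_identifier` and the sample rows below are hypothetical. Only `df_authors` and `author` come from the question. The sketch treats a column as a unique identifier when it has no missing values and no repeated values.

```python
import pandas as pd

def is_unique_identifier(df: pd.DataFrame, column: str) -> bool:
    """Return True if `column` has no missing values and no duplicates."""
    col = df[column]
    return bool(col.notna().all() and col.is_unique)

# Hypothetical sample data standing in for the real dataset.
df_authors = pd.DataFrame({"author": ["Ann", "Bob", "Cy"], "books": [3, 1, 2]})
df_dupes = pd.DataFrame({"author": ["Ann", "Bob", "Ann"], "books": [3, 1, 2]})

unique_ok = is_unique_identifier(df_authors, "author")
unique_bad = is_unique_identifier(df_dupes, "author")
```

To see which values repeat, `df_dupes[df_dupes["author"].duplicated(keep=False)]` lists every row whose `author` appears more than once.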

Clustering between two sets of data points - Python

I'm hoping to use k-means clustering to plot and return the position of each cluster's centroid. The following groups two sets of xy scatter points into 6 clust

Sklearn Pipeline with KernelExplainer and data to predict as DataFrame leads to error

I want to calculate shap values from a sklearn pipeline with a preprocessor and a model. When I do it with the code below I get 0 for all shape_values def creat