Category "pandas"

Standard deviation comes out as NaN in PySpark rolling window

I have a dataset with 4 sensor values, 'volt', 'pressure', 'rotate' and 'vibration'. For these sensor values I am calculating rolling mean and rolling standard

Elegant pandas way to compute stratified frequencies

I want to know if I can reach my desired result in a "better" way? This means with fewer steps (but readable code!) and some pandas built-in features. That is

Plot histogram from two columns of csv using pandas

I have a csv file containing two columns. What I'd like to do is to plot a histogram based on these two columns. My code is as follows: data = pd.read_csv('data

Error while translating a dataframe column in pandas: IndexError: list index out of range

I am trying to translate a column of my dataframe using the code below: # Import the library: import googletrans from googletrans import Translator # Create

How to replace values in a column when another column is not NaN, or else replace with another?

I am new to pandas and am looking for a solution where I can replace one column's values with those of other columns. For example: replace the value of col A with the values in col E

Google Analytics response to Pandas Dataframe in Python

Still a newbie to Python so please be gentle. I'm trying to parse a Google Analytics Reporting API V4 response to a Pandas dataframe in Python, specifically usi

Python pandas: pivot_table simple string aggregation and sort

I'm trying to achieve something with pandas which is very straightforward to do in an Excel PivotTable: From what I've seen, the following code seems logical, but

AttributeError: 'int' object has no attribute 'split' pandas

Please, I know there are several issues related to this error of mine, but I'm learning, I don't understand almost anything, so please, if it's not asking too m

How do you compare columns 'a' and 'b' to return 'c' or 'd'?

I am trying to compare two columns and then return a third value from one of the two adjacent columns. I have read that using iterrows is not the correct way to

Issue with pandas_ta ADX indicator

When I run this code, I get this error about a missing close value. df['ADX'] = ta.adx(df['High'], df['Low'], length=14) df output: TypeError

pd.read_csv - dates in pandas multiindex column names

I import a csv file into a pandas dataframe. df=pd.read_csv('data.csv',index_col=[0],header=[0,1]) My data has a column multiindex with two levels. Level(0) co

Verify that a column name is a unique identifier

I have a dataset called df_authors and in that dataset I have a column called author. I have to verify that df_authors.author is a unique identifier. What I tri
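
One way to approach this check, as a minimal sketch: pandas exposes `Series.is_unique`, and `Series.duplicated` can list the offending values. The sample rows below are hypothetical; only the names `df_authors` and `author` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only df_authors and author are taken from the question.
df_authors = pd.DataFrame({"author": ["Ann", "Bob", "Cy", "Bob"]})

# True only if no value in the column occurs more than once.
author_is_unique = df_authors["author"].is_unique

# A usable identifier should arguably have no missing values either (an assumption).
has_missing = df_authors["author"].isna().any()

# When the check fails, list the values that occur more than once.
dup_mask = df_authors["author"].duplicated(keep=False)
repeated = df_authors.loc[dup_mask, "author"].unique().tolist()

print(author_is_unique, has_missing, repeated)
```

With `keep=False`, every occurrence of a repeated value is flagged, which makes it easier to inspect all conflicting rows rather than only the later ones.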

Clustering between two sets of data points - Python

I'm hoping to use k-means clustering to plot and return the position of each cluster's centroid. The following groups two sets of xy scatter points into 6 clust

Sklearn Pipeline with KernelExplainer and data to predict as DataFrame leads to error

I want to calculate shap values from a sklearn pipeline with a preprocessor and a model. When I do it with the code below I get 0 for all shape_values def creat

Remove rows in a dataframe that are not all 1 or all 0

I need to retain the rows in the dataframe whose values are all 0 or all 1. a = np.repeat(0,10) b = np.repeat(1,10) ab = pd.DataFrame({'col1':a,'col2':b}).tr
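
A possible approach, sketched on a small hypothetical frame rather than the one in the question (whose construction is cut off above): build one boolean mask per condition with `DataFrame.eq(...).all(axis=1)` and combine the masks with `|`.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are uniform, row 2 mixes 0 and 1.
df = pd.DataFrame(
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    columns=["c1", "c2", "c3"],
)

# Row-wise checks: every value equals 0, or every value equals 1.
all_zero = df.eq(0).all(axis=1)
all_one = df.eq(1).all(axis=1)

# Keep only the rows that satisfy either condition.
kept = df[all_zero | all_one]
print(kept)
```

The explicit comparisons keep the intent readable and avoid relying on the values being restricted to 0 and 1.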

Sort column values based on floats inside a string, then concat

I'm working on a pretty messy DF. Looking like this, but with 30 columns: a b some text (other text) : 56.3% (text again: 40%) again text (not same text) : 33%

How to save my first dataframe value with Pandas?

I just don't get it. I'm trying to save two different values (to different positions) to an Excel file, but the first one gets overwritten every time. Why? @classme

How do I get a conditional total in a pandas dataframe

I have a 32000-row, 20-column dataframe consisting of data on many securities. An example of the target columns is as follows: The output that I want is like this: Eff

Using a variable within str.contains()

Pretty much the title. Is there any way to use a variable to filter in str.contains()? I have been unsuccessful in using a str+@variable
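
A minimal sketch of one way this can work, assuming a hypothetical column `name` and variable `needle` (neither appears in the question): `Series.str.contains` accepts an ordinary Python string, so the variable can be passed directly, and `re.escape` protects it when it is embedded in a larger regular expression.

```python
import re

import pandas as pd

# Hypothetical data and search term, for illustration only.
df = pd.DataFrame({"name": ["apple pie", "banana", "pineapple", None]})
needle = "apple"

# Literal substring match; na=False treats missing values as non-matches.
mask = df["name"].str.contains(needle, regex=False, na=False)
filtered = df[mask]

# Embedding the variable in a regex: escape it so special characters stay literal.
pattern = rf"^{re.escape(needle)}"
starts_with = df[df["name"].str.contains(pattern, na=False)]

print(filtered)
print(starts_with)
```

The `@variable` syntax is specific to expression strings passed to `DataFrame.query` and `DataFrame.eval`; in a plain method call like the one above, the variable is simply passed as an argument.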