Category "pandas"

Plotting Time Series using pandas

I have a .csv file containing time series data with headers like Description, Date and Values. I am looking to make a line graph for this time series in such th

How to extract only English words from a big text corpus using nltk?

I want to remove all non-dictionary English words from a text corpus. I have removed stopwords, tokenized and countvectorized the data. I need to extract only the E

Check if timestamp column is in date range from another dataframe

I have a dataframe, df_A, with two columns 'amin' and 'amax', which define a set of time ranges. My objective is to find whether a column in df_B lies between any o

How do I replicate the SuperTrend indicator from the Binance website?

I'm trying to implement (in Python) the SuperTrend indicator that you can see on the Binance website if you click on the TradingView tab and add it here. So far I've tried m

Pandas TimeSeries resample produces NaNs

I am resampling a Pandas TimeSeries. The timeseries consist of binary values (it is a categorical variable) with no missing values, but after resampling NaNs ap

Why is "insert into" inside stored procedure not working from python?

I wrote a stored procedure in SQL Server that gets passed 4 parameters. I want to check the first parameter @table_name to make sure it uses only whitelist char

Pandas: ValueError: cannot convert float NaN to integer

I get ValueError: cannot convert float NaN to integer for the following: df = pandas.read_csv('zoom11.csv') df[['x']] = df[['x']].astype(int) The "x" is a column i
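
For context on this error, a minimal sketch of two common ways around it, assuming the "x" column holds whole-number floats plus some NaN. The small DataFrame below is a hypothetical stand-in for the data in 'zoom11.csv', and the sentinel value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "x" column read from the CSV.
df = pd.DataFrame({"x": [1.0, 2.0, np.nan]})

# Option 1: the nullable integer dtype keeps missing values as <NA>.
nullable = df["x"].astype("Int64")

# Option 2: replace NaN with a sentinel (0 here, an arbitrary choice),
# then cast to a plain NumPy integer dtype.
filled = df["x"].fillna(0).astype(int)
```

Which option fits depends on whether the missing values need to stay visible downstream.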

Pandas read csv not reading a file properly. Not splitting into proper columns

So I'm trying to read in this dataset from Kaggle. https://www.kaggle.com/gmadevs/atp-matches-dataset#atp_matches_2016.csv I'm using pandas' read_csv functio

pandas fill missing dates in time series

I have a dataframe which has aggregated data for some days. I want to add in the missing days. I was following another post, Add missing dates to pandas datafr
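
One way this is often approached is to reindex against a complete daily range. The sketch below assumes a daily frequency and that missing days should be filled with 0. The sample series and its dates are made up for illustration.

```python
import pandas as pd

# Hypothetical aggregated data with a gap between Jan 1 and Jan 4.
s = pd.Series([5, 7], index=pd.to_datetime(["2021-01-01", "2021-01-04"]))

# Build the complete daily index covering the observed span.
full_index = pd.date_range(s.index.min(), s.index.max(), freq="D")

# Reindex so the missing days appear, filled with 0 (an assumption).
filled = s.reindex(full_index, fill_value=0)
```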

Pandas Dataframe: Replacing NaN with row average

I am trying to learn pandas but I have been puzzled with the following. I want to replace NaNs in a DataFrame with the row average. Hence something like df.fil
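
A minimal sketch of one possible approach, assuming all columns are numeric: apply fillna row by row with each row's own mean. The sample frame and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with one NaN per row.
df = pd.DataFrame(
    {
        "c1": [1.0, np.nan, 3.0],
        "c2": [3.0, 4.0, np.nan],
        "c3": [np.nan, 6.0, 9.0],
    }
)

# For every row, replace NaN with the mean of that row's non-missing values.
filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)
```

Row-wise apply is easy to read but can be slow on large frames. It is shown here only because it mirrors the wording of the question.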

ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic'

I have this output: [Pandas-profiling] ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic' when trying to import pandas-profili

How to handle seaborn pairplot errors when the dataset has NaN values?

I have a pandas DataFrame with multiple columns filled with numbers and rows, and the 1st column has the categorical data. Obviously, I have NaN values and zero

DATAFRAME TO BIGQUERY - Error: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1yeitxcu_job_4b7daa39.parquet'

I am uploading a dataframe to a bigquery table. df.to_gbq('Deduplic.DailyReport', project_id=BQ_PROJECT_ID, credentials=credentials, if_exists='append') And I

What is the difference between combine_first and fillna?

These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use
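
One case where the two can differ is when the indexes do not match. The sketch below uses two made-up Series, not the code from the question: fillna keeps the caller's index, while combine_first returns the union of both indexes.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: `b` has an extra label (2) that `a` lacks.
a = pd.Series([1.0, np.nan], index=[0, 1])
b = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])

# fillna only patches holes at labels that already exist in `a`.
via_fillna = a.fillna(b)

# combine_first patches holes and also adds labels found only in `b`.
via_combine = a.combine_first(b)
```

When both objects share the same index, the two results match, which may be why they look equivalent in the question.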

Grouping by multiple columns to find duplicate rows pandas

I have a df

id  val1  val2
1   1.1   2.2
1   1.1   2.2
2   2.1   5.5
3   8.8   6.2
4   1.1   2.2
5   8.8   6.2

I want t

Convert df.apply to Spark to run in parallel using all the cores

We have a pandas dataframe that we are using. We have a function we use in retail data which runs on a daily basis row by row to calculate the item to item differe

Pandas - dataframe groupby - how to get sum of multiple columns

This should be an easy one, but somehow I couldn't find a solution that works. I have a pandas dataframe which looks like this: index col1 col2 col3 col4

Strict regex in Pandas replace

I need to write a strict regular expression to replace certain values in my pandas dataframe. This is an issue that was raised after solving the question that I

Pyspark-pandas not working on Spark 3.1.2

I am using Spark 3.1.2 and attempting to use pyspark-pandas. However, when attempting from pyspark import pandas as ps I am getting the following error: ImportEr