Category "pandas"

How to query a numerical column name in pandas?

Let's suppose I create a dataframe with columns and query it, i.e. pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b']).query('a>1'). This will give me a b

Get for each row the last column name with a certain value

I have this kind of dataframe, and I'm looking to get, for each row, the name of the last column equal to 1. Here is an example of my dataframe: col1 col2
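
One possible approach, sketched under assumptions: the example frame below is hypothetical (the original data is cut off in the excerpt), and the names `is_one` and `last_col` are illustrative only. The idea is to reverse the column order so that `idxmax`, which returns the first maximum, lands on the last matching column.

```python
import pandas as pd

# Hypothetical example data; the original frame is not shown in full.
df = pd.DataFrame(
    {"col1": [1, 0, 1], "col2": [0, 1, 1], "col3": [1, 0, 0]}
)

# Boolean mask of the cells equal to 1.
is_one = df.eq(1)

# Reverse the columns so idxmax (first maximum) finds the last match
# in the original column order.
last_col = is_one.iloc[:, ::-1].astype(int).idxmax(axis=1)

# Rows without any 1 would otherwise get a misleading label; mask them to NaN.
last_col = last_col.where(is_one.any(axis=1))

print(last_col)
```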

Best way to store many pandas dataframes in a single file

I have 20,000 ~1000-row dataframes, each of which has a name, in a 170GB pickle file at the moment. I'd like to write these to a file so I can load them i

How to select several rows when reading a csv file using pandas?

I have a very large csv file with millions of rows and a list of the row numbers that I need, like rownumberList = [1,2,5,6,8,9,20,22]. I know there is somethi
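
A minimal sketch of one way to do this, assuming the row numbers are 0-based positions among the data rows and the file has a single header line. The in-memory CSV and the names `wanted` and `selected` are hypothetical stand-ins. `read_csv` accepts a callable for `skiprows`, which is called with each 0-based line index and skips the line when it returns True.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large file: a header plus 30 data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(30))

rownumberList = [1, 2, 5, 6, 8, 9, 20, 22]
wanted = set(rownumberList)  # set lookup keeps the per-line check cheap

# Line 0 is the header, so data row n sits on file line n + 1.
selected = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i != 0 and (i - 1) not in wanted,
)

print(selected)
```

If the row numbers instead refer to raw file lines, the `i - 1` offset would need adjusting.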

Plotting Time Series using pandas

I have a .csv file containing time series data with headers like Description, Date and Values. I am looking to make a line graph for this time series in such th

How to extract only English words from a big text corpus using nltk?

I want to remove all non-dictionary English words from a text corpus. I have removed stopwords, tokenized and countvectorized the data. I need to extract only the E

check if timestamp column is in date range from another dataframe

I have a dataframe, df_A, with two columns 'amin' and 'amax', which define a set of time ranges. My objective is to find whether a column in df_B lies between any o

How do I replicate SuperTrend indicator from Binance website?

I'm trying to implement (in Python) the SuperTrend indicator that you can see on the Binance website if you click on the TradingView tab and add it here. So far I've tried m

Pandas TimeSeries resample produces NaNs

I am resampling a Pandas TimeSeries. The timeseries consists of binary values (it is a categorical variable) with no missing values, but after resampling NaNs ap

Why is "insert into" inside stored procedure not working from python?

I wrote a stored procedure in SQL Server that gets passed 4 parameters. I want to check the first parameter @table_name to make sure it uses only whitelist char

Pandas: ValueError: cannot convert float NaN to integer

I get ValueError: cannot convert float NaN to integer for the following: df = pandas.read_csv('zoom11.csv') df[['x']] = df[['x']].astype(int) The "x" is a column i
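
Two common ways around this error, sketched on a small hypothetical frame rather than the original 'zoom11.csv' (whose contents are not shown). The names `as_nullable` and `as_filled` are illustrative, and filling with 0 is only an assumption about what a sensible placeholder might be.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the "x" column read from the CSV.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# Option 1: the nullable integer dtype keeps missing values as <NA>.
as_nullable = df["x"].astype("Int64")

# Option 2: replace missing values first, then cast to a plain integer dtype.
as_filled = df["x"].fillna(0).astype(int)

print(as_nullable)
print(as_filled)
```

The first option preserves the fact that a value was missing; the second is simpler but makes a filled 0 indistinguishable from a real one.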

Pandas read csv not reading a file properly. Not splitting into proper columns

So I'm trying to read in this dataset from Kaggle. https://www.kaggle.com/gmadevs/atp-matches-dataset#atp_matches_2016.csv I'm using pandas' read_csv functio

pandas fill missing dates in time series

I have a dataframe which has aggregated data for some days. I want to add in the missing days. I was following another post, Add missing dates to pandas datafr
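
A minimal sketch of one way to add the missing days, assuming a daily frequency and that absent days should count as 0. The column names `date` and `count` and the sample values are hypothetical, since the original frame is not shown.

```python
import pandas as pd

# Hypothetical aggregated data with 2021-01-03 and 2021-01-04 missing.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"]),
        "count": [3, 1, 4],
    }
)

# Build the complete daily range between the first and last observed day.
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# Reindex onto the full range; days with no data get the fill value.
filled = (
    df.set_index("date")
    .reindex(full_range, fill_value=0)
    .rename_axis("date")
    .reset_index()
)

print(filled)
```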

Pandas Dataframe: Replacing NaN with row average

I am trying to learn pandas but I have been puzzled by the following. I want to replace NaNs in a DataFrame with the row average. Hence something like df.fil
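
One way this could be done, shown on hypothetical data. Passing a Series to `DataFrame.fillna` matches its index against column labels, so a Series of row means does not fill row-wise directly; the sketch below instead fills each row with its own mean.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN per row.
df = pd.DataFrame(
    {"a": [1.0, np.nan], "b": [np.nan, 4.0], "c": [3.0, 6.0]}
)

# Fill each row's NaNs with that row's mean (mean skips NaN by default).
filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)

print(filled)
```

Row-wise `apply` is easy to read but can be slow on large frames, so a vectorized alternative may be preferable there.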

ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic'

I have this output: [Pandas-profiling] ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic' when trying to import pandas-profili

How to handle seaborn pairplot errors when the dataset has NaN values?

I have a pandas DataFrame with multiple columns filled with numbers and rows, and the 1st column has the categorical data. Obviously, I have NaN values and zero

DATAFRAME TO BIGQUERY - Error: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1yeitxcu_job_4b7daa39.parquet'

I am uploading a dataframe to a bigquery table. df.to_gbq('Deduplic.DailyReport', project_id=BQ_PROJECT_ID, credentials=credentials, if_exists='append') And I

What is the difference between combine_first and fillna?

These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use
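
A small illustration of one case where the two differ, using hypothetical Series rather than the question's own code (which is not shown here). When the indexes match, the results agree; when the other object has extra labels, `combine_first` returns the union of both indexes while `fillna` keeps only the caller's index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: s2 has an extra label "c" that s1 lacks.
s1 = pd.Series([1.0, np.nan], index=["a", "b"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# fillna only patches holes at labels that already exist in s1.
via_fillna = s1.fillna(s2)

# combine_first patches holes and also brings in labels present only in s2.
via_combine = s1.combine_first(s2)

print(via_fillna)
print(via_combine)
```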