Category "pandas"

AWS Athena table from python output with dates - dates get wrongly converted

I have a pandas DataFrame containing a date column ("2022-02-02"). I write this table to parquet using pyarrow. df[col] = df[col].astype(str) df.to_parquet(loc)

Binning 2D data with circles instead of rectangles - from pandas df

I have a dataframe of x, y data and need to bin it into circles, i.e. a grid of circles of a certain size and spacing centered on some point. So for example some da
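A minimal sketch of one way such circular binning could look, assuming the columns are named `x` and `y`, the circle centers sit on a square grid, and the radius is at most half the spacing so circles never overlap. The function name `bin_into_circles` and its parameters are hypothetical, not taken from the question.

```python
import numpy as np
import pandas as pd


def bin_into_circles(df, radius, spacing, origin=(0.0, 0.0)):
    """Count points falling inside circles laid out on a square grid.

    Assumes radius <= spacing / 2, so each point can belong to at most
    one circle: the one around its nearest grid center.
    """
    # Index of the nearest grid center along each axis.
    ix = np.rint((df["x"] - origin[0]) / spacing).astype(int)
    iy = np.rint((df["y"] - origin[1]) / spacing).astype(int)

    # Coordinates of that nearest center.
    cx = origin[0] + ix * spacing
    cy = origin[1] + iy * spacing

    # Keep only points that lie within the circle around the center.
    inside = np.hypot(df["x"] - cx, df["y"] - cy) <= radius
    binned = df.assign(cx=cx, cy=cy)[inside]

    return binned.groupby(["cx", "cy"]).size().rename("count").reset_index()


points = pd.DataFrame({"x": [0.1, 0.2, 5.0, 2.5], "y": [0.0, -0.1, 5.1, 2.5]})
result = bin_into_circles(points, radius=1.0, spacing=5.0)
```

Points that fall in the gaps between circles, such as (2.5, 2.5) here, are dropped rather than assigned to a bin.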

How do I upload external data in Explainerdashboard

I am trying to upload external data into the dashboard using explainer.set_x_row_func() and explainer.set_y_func(). Does anyone know how to do this? Below is ho

Pandas merge returns NaN values

Please consider two pandas DataFrames, df1 and df2: df1 = pd.read_csv('df1.csv', sep=';') df2 = pd.read_csv('df2.csv', sep=';') We convert to date fields: df1['

Add a new record for each missing second in a DataFrame with TimeStamp [duplicate]

Consider the following pandas DataFrame: | date | counter | |-------------------------------------|--------------
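A minimal sketch of filling in missing seconds, assuming the `date` column holds timestamps and that a missing second should get a `counter` of 0. The fill value and the sample data are assumptions, since the excerpt is cut off before it states the desired result.

```python
import pandas as pd

# Hypothetical sample data with a gap between 00:00:01 and 00:00:04.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 00:00:00",
        "2021-01-01 00:00:01",
        "2021-01-01 00:00:04",
    ]),
    "counter": [5, 3, 7],
})

# Every second between the first and last timestamp, inclusive.
full_index = pd.date_range(
    df["date"].min(), df["date"].max(), freq=pd.Timedelta(seconds=1)
)

# Reindex onto the complete range; new rows get counter = 0 (assumption).
filled = (
    df.set_index("date")
      .reindex(full_index, fill_value=0)
      .rename_axis("date")
      .reset_index()
)
```

Dropping `fill_value` leaves the new rows as NaN, which can then be forward-filled or interpolated if that suits the data better.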

Comparing 2 columns with different rows in different csv files, and output status to another csv file

I have 2 csv files as shown below. They contain different numbers of rows and the columns are not aligned/sorted along a common index. I need to compare the col

Error with delimiters on dataframe when trying to upload it to MSSQL

So I've been trying to upload a dataframe to a specific table in MSSQL, and I've been trying to use the BCPANDAS library to upload the data to it. However th

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a data frame df and I use several columns from it to groupby: df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean() In the above way I almos
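As a hedged illustration of getting several statistics per group in one pass, the sketch below uses `GroupBy.agg` with a list of function names. The column names follow the excerpt; the sample values are made up.

```python
import pandas as pd

# Made-up data using the column names from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "y"],
    "col3": [1.0, 3.0, 5.0, 7.0],
    "col4": [10, 20, 30, 40],
})

# One row per (col1, col2) group; columns become a two-level index
# of (original column, statistic).
stats = df.groupby(["col1", "col2"])[["col3", "col4"]].agg(["count", "mean"])

# Number of rows per group, regardless of missing values.
sizes = df.groupby(["col1", "col2"]).size()
```

Note that `count` ignores missing values in each column, while `size` counts every row in the group.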

Poor accuracy score for Semi-Supervised Support Vector Machine

I am using a Semi-Supervised approach for Support Vector Machine in Python for the image classification from PASCAL VOC 2007 data. I have tried with the default

dtale show in jupyter notebook

I am exploring this new Python package named dtale. It is very convenient for pandas data frames visualization. https://pypi.org/project/dtale/ It worked onc

Inconsistent indexing of subplots returned by `pandas.DataFrame.plot` when changing plot kind

I know that this issue is known and has already been discussed. But I am encountering a strange behaviour; maybe someone has an idea why: When I run this: plot = df.p

How to check if a value in a DataFrame is of type Decimal

I am writing a data test for some api calls that return a DataFrame with a date type and a type Decimal. I can't find a way to verify the Decimal the DataFrame
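One possible check, sketched under the assumption that the Decimal values are stored in an object-dtype column. The helper name `column_is_decimal` and the sample columns are hypothetical.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical frame: "price" holds Decimal objects, "qty" plain integers.
df = pd.DataFrame({
    "price": [Decimal("1.10"), Decimal("2.50")],
    "qty": [1, 2],
})


def column_is_decimal(series):
    """Return True if every value in the series is a decimal.Decimal."""
    return bool(series.map(lambda value: isinstance(value, Decimal)).all())


price_ok = column_is_decimal(df["price"])
qty_ok = column_is_decimal(df["qty"])
```

Checking `df["price"].dtype` alone is not enough here, because a column of Decimal objects simply reports `object`.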

Get index and column with multiple headers and index_col in Pandas DataFrame

I have a dataframe with multiple headers and column indexes, and would like to retrieve the list of entries that are non-zero. The dataframe is constructed from

How to edit/ sort a non-column column in Python?

I wrote the script below, and I'm 98% content with the output. However, the unorganized manner/ disorder of the 'Approved' field bugs me. As you can see, I trie

Geopandas not plotting correct colors

My Geopandas DataFrame has 3 polygons and 9 points with color_rgba column computed with matplotlib.colors.to_rgba function: import contextily as ctx import geop

NumPy where function in Python

I have a data frame like this: pd.DataFrame({'Material': ['Steel (16MnCr5)', 'X', 'X', 'X', 'Carbon black', 'Sulfur', 'Copper'], 'Weight': [4, 8, 0, 8, 6, 9, 3

How to count a particular value in a given column for each value of another column

I want to count the occurrences of a particular value in a given column
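A small sketch of one reading of this question: count how often one value appears in a column, separately for each value of a second column. The column names `group` and `status` and the target value `"yes"` are assumptions, since the question gives no data.

```python
import pandas as pd

# Hypothetical data: count "yes" in `status` for each `group`.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "status": ["yes", "no", "yes", "yes", "no"],
})

# Boolean mask of matches, summed within each group.
counts = df["status"].eq("yes").groupby(df["group"]).sum()
```

Groups with no matches still appear in the result with a count of 0, which filtering first and then grouping would not give.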

How to fix ParserError: year 0 is out of range: 0000-00-00 with Python Pandas to_datetime method

I am trying to convert a column "travel_start" to a datetime object. Dashboard["travel_start"] = pd.to_datetime(Dashboard["travel_start"]) But I get the fol
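A hedged sketch of one common workaround: passing `errors="coerce"` so that unparseable placeholders such as `0000-00-00` become `NaT` instead of raising. The sample values are made up, and the column name follows the excerpt.

```python
import pandas as pd

# Made-up data including the placeholder date from the error message.
dashboard = pd.DataFrame({
    "travel_start": ["2021-05-01", "0000-00-00", "2021-06-15"],
})

# Invalid dates are converted to NaT rather than raising an error.
dashboard["travel_start"] = pd.to_datetime(
    dashboard["travel_start"], errors="coerce"
)

# Rows whose date could not be parsed.
invalid_rows = dashboard[dashboard["travel_start"].isna()]
```

Whether silently turning bad dates into `NaT` is acceptable depends on the data; inspecting `invalid_rows` first shows how many values are affected.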

Row-wise correlation between two DataFrames with unequal numbers of columns

I have two DataFrames (Dataset1 = 200 rows, 34 columns; Dataset2 = 200 rows, 22 columns). I want the row-wise correlation between both datasets. How can I perform this? I

Plot multiple columns side by side

I have the dataframe below. 111_a 111_b 222_a 222_b 333_a 333_b row_1 1.0 2.0 1.5 2.5 1.0 2.5 row_2 1.0 2.0 1.5 2.5 1.0