Category "pandas"

How to merge two dfs in pandas (based on datetime period), and add rows if duplicates

I have the following 2 dfs:
diag
id  encounter_key  start_of_period  end_of_period
1   AAA            2020-06-12       2021-07-07
1   BBB            2021-12-31       2022-01-04
drug
id  start_datetime

How to estimate similarity between sensor data based on the number of occurrences?

Following is my sample data: data = {850.0: 6, -852.0: 5, 992.0: 29, -993.0: 25, 990.0: 27, -992.0: 28, 965.0: 127, 988.0: 37, -994.0: 24, 996.0: 14, -996.0: 1

Python DataFrame manipulation: How to extract a set of columns in a fast way

I need to access and extract information from a DataFrame that is used by other colleagues in a research group. The DataFrame structure is: zee.loc[zee['layer'

Issues sorting dataframe using isin

I have a dataframe that was converted from a csv using pd.read_csv, filled with information about California counties; it looks a little something like this: Cou

Python: how to compare two columns' data in one dataframe

So I have grouped data from this column, and I want to compare two countries, 'US' and 'GB', in one dataframe so I can make a visualization f

Dataframe add new row if the index does not exist like a dictionary without checking existence

import pandas as pd a = [['a', 1, 2, 3], ['b', 4, 5, 6], ['c', 7, 8, 9]] df = pd.DataFrame(a, columns=['alpha', 'one', 'two', 'three']) df.set_index(['alpha'],
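One possible approach, sketched on hypothetical data modelled on the excerpt above (the full question is cut off, so the `set_index` call and the new row values here are assumptions): assigning through `.loc` replaces the row when the label exists and adds it when it does not, much like `d[key] = value` on a dictionary.

```python
import pandas as pd

# Hypothetical data modelled on the excerpt; the original code is truncated.
rows = [['a', 1, 2, 3], ['b', 4, 5, 6], ['c', 7, 8, 9]]
df = pd.DataFrame(rows, columns=['alpha', 'one', 'two', 'three']).set_index('alpha')

# .loc assignment needs no existence check:
df.loc['b'] = [40, 50, 60]   # label already present, so the row is overwritten
df.loc['d'] = [10, 11, 12]   # label missing, so a new row is appended

print(df)
```

This is convenient for a handful of rows; when adding many rows it is usually faster to collect them first and build or concatenate a DataFrame once.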

Adding a summarised column back into a dataframe in python

As part of some data cleansing, I want to add the mean of a variable back into a dataframe to use if the variable is missing for a particular observation. So I'
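A minimal sketch of one way this could be done, using made-up column names (`group`, `value`) since the question's own data is cut off. It adds the overall mean as a column and uses it to fill missing values; the per-group variant with `groupby(...).transform('mean')` is shown as an alternative, assuming a grouping column exists.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    'group': ['x', 'x', 'y', 'y'],
    'value': [1.0, np.nan, 4.0, 6.0],
})

# Overall mean (NaN is skipped by default), broadcast to every row.
df['value_mean'] = df['value'].mean()
df['value_filled'] = df['value'].fillna(df['value_mean'])

# Alternative: mean within each group, aligned back to the original rows.
df['group_mean'] = df.groupby('group')['value'].transform('mean')
df['value_filled_by_group'] = df['value'].fillna(df['group_mean'])

print(df)
```

`transform` returns a result with the same index as the input, which is why it can be assigned straight back as a column without a merge.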

Highlight element based on boolean pandas df

I have 2 data frames with identical indices/columns: df = pd.DataFrame({'A':[5.5, 3, 0, 3, 1], 'B':[2, 1, 0.2, 4, 5],

How to groupby a column but keep all rows as columns

I have a dataframe that was a result of a join operation. This operation had multiple matches, resulting in multiple rows. I want to move resulting match rows t

Python: Formatting a Pandas dataframe head with LaTeX

I have made a Pandas dataframe from several NumPy arrays and tried to format the column headers using LaTeX, but it looks awful. I'm working with Jupyter Notebook. i

Creating a mean column in a dataframe dependent on other variables of the dataframe in pandas

I have code that is roughly like this: import numpy as np import pandas as pd df = pd.DataFrame({'Group':['a','a','b','b','b','c','c'], 'Label':[0,1,0,1,1,0,

how to create 3 tables using join in pandas/python?

I need help / guidance with my code below to see whether I am doing something wrong or what I need to add. I am trying to create three tables using joins in pandas. Can anyone

Simple way to create multiindex columns with pandas

I am sorry for asking, but I did not understand the existing answers. I simply glued two data frames with the same column names. | | X | Y | X | Y | |-

module 'numpy' has no attribute 'ndarray'

My Jupyter notebook crashed, so I had to reinstall it, but in the new Jupyter notebook I cannot run pandas. import pandas as pd AttributeError

Creating a new dataframe column with the number of overlapping words between dataframe and list

I'm having some trouble fixing the following problem: I have a dataframe with tokenised text on every row that looks (something) like the following index feelin

Plotting subplots of dataframe with subplots of piecharts or nested pie chart with Pandas and Matplotlib

Hi, my dataframe has the following format and is named immunizations_df: I'm trying to plot subplots of pie charts, where each pie chart represents the number of va

How to map single column in pandas using multiple columns (text and numbers) in a separate df

I'm trying to convert U.S. geolocation codes for states, counties and cities. The problem is, the county and city codes are duplicated -- meaning, multiple stat

How to select all columns whose names start with X in a pandas DataFrame

I have a DataFrame: import pandas as pd import numpy as np df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8], 'foo.fighters': [0
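A short sketch of one common way to do this, on a small hypothetical frame (the question's own data is truncated, so the `foo.bar` and `other` columns and the `foo.` prefix are assumptions): build a boolean mask over the column names with `str.startswith` and pass it to `.loc`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only 'foo.aa' appears in the excerpt.
df = pd.DataFrame({
    'foo.aa': [1.0, 2.1, np.nan],
    'foo.bar': [0, 1, 2],
    'other': ['x', 'y', 'z'],
})

prefix = 'foo.'  # assumed prefix for illustration

# Boolean mask over the column labels, applied on the column axis.
foo_cols = df.loc[:, df.columns.str.startswith(prefix)]

print(foo_cols.columns.tolist())
```

`df.filter(regex=r'^foo\.')` should give the same selection; the mask form avoids having to escape regex metacharacters in the prefix.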

How to display an error message when pd.read_csv fails

I use pd.read_csv to fetch GCS data. However, when the file size is too large or something, the Python task force-quits automatically at the line using pd.read_c

Get specific rows which match condition pandas [duplicate]

I have the following dataframe. My current code is as follows: The desired outcome is to show only instances where ImageFileName is services.exe and the P