Category "dataframe"

How to merge two dfs in pandas (based on datetime period), and add rows if duplicates

I have the following 2 dfs:
diag
id  encounter_key  start_of_period  end_of_period
1   AAA            2020-06-12       2021-07-07
1   BBB            2021-12-31       2022-01-04
drug
id  start_datetime

How do I pass arguments to srvyr inside of a function?

So I'm using srvyr to calculate survey means of a variable (y) from a survey object, grouping by a categorical variable (x) from that same survey object, and th

How to estimate similarity between sensor data based on the number of occurrences?

Following is my sample data: data = {850.0: 6, -852.0: 5, 992.0: 29, -993.0: 25, 990.0: 27, -992.0: 28, 965.0: 127, 988.0: 37, -994.0: 24, 996.0: 14, -996.0: 1

Python DataFrame manipulation: How to extract a set of columns in a fast way

I need to access and extract information from a DataFrame that is used by other colleagues in a research group. The DataFrame structure is: zee.loc[zee['layer'

Dataframe add new row if the index does not exist like a dictionary without checking existence

import pandas as pd
a = [['a', 1, 2, 3], ['b', 4, 5, 6], ['c', 7, 8, 9]]
df = pd.DataFrame(a, columns=['alpha', 'one', 'two', 'three'])
df.set_index(['alpha'],

Constant warning message with reshape::melt in R

I am constantly getting a warning message like: as.is should be specified by the caller using true. Code is like: difficulty_data <- data_original[,c(-1)] %

Highlight element based on boolean pandas df

I have 2 data frames with identical indices/columns: df = pd.DataFrame({'A':[5.5, 3, 0, 3, 1], 'B':[2, 1, 0.2, 4, 5],

Simultaneously remove the first and last rows of a data frame until reaching a row that does not have an NA

I have a dataframe that contains NA values, and I want to remove some rows that have an NA (i.e., not complete cases). However, I only want to remove rows at th

Replace values in a dataframe based on lookup table

I am having some trouble replacing values in a dataframe. I would like to replace values based on a separate table. Below is an example of what I am trying to d

Interactive filtering data table in Plotly by using a dropdown

I am trying to make an interactive table where the values of the table change by selecting a value from a dropdown. This should be done only in Plotly (not Dash

Creating a new dataframe column with the number of overlapping words between dataframe and list

I'm having some trouble fixing the following problem: I have a dataframe with tokenised text on every row that looks (something) like the following index feelin

How to map single column in pandas using multiple columns (text and numbers) in a separate df

I'm trying to convert U.S. geolocation codes for states, counties and cities. The problem is, the county and city codes are duplicated -- meaning, multiple stat

How to select all columns whose names start with X in a pandas DataFrame

I have a DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8], 'foo.fighters': [0

Show method for DynamicFrame in AWS Glue returns empty field

When I try to use dyF.show(), it returns an empty field, even though I checked the schema and count() and I know the table is populated. I transformed it int

Retrieving data from the Air Quality Index (AQI) website through the API and only receiving a small number of stations

I'm working on a personal project and I'm trying to retrieve air quality data from the https://aqicn.org website using their API. I've used this code, which I'v

Get specific rows which match condition pandas [duplicate]

I have the following dataframe My current code is as follows: Outcome is to only show instances where ImageFileName is services.exe and the P

How to join two very large dataframes together with same columns?

I have two datasets that look like this:
df1:
Date     City     State  Quantity
2019-01  Chicago  IL     35
2019-01  Orlando  FL     322
...      ....     ...    ...
2021-07  Chicago  IL     334
202

Get records that are a time interval away from a given date and specific conditions on a pandas DataFrame

Consider the following Python pandas DataFrame: | ID | date | direction | country_ID | |-----------|-------------------------|----

How to get only a single value from a dataframe in Python

I have a dataframe df_my that looks like this:
   id  name  age  major
----------------------------------------
0  1   Mark  34   Engli

Selecting a subset of a dataframe based on a list - pandas

I am working with a large dataframe (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) with pandas in Python 3, using PyCharm. The column