Category "dataframe"

Way to change value based on condition with previous validated data?

I cannot manage to implement in an efficient way a method that could change values in dataframes based on difference with previous "validated" data. I have a da

How to perform loc with one condition that includes two columns

I have a df with two columns, A and B, both of which contain string values. Example: df_1 = pd.DataFrame(data={ "A":['a','b','c'], "B":['a x d','z y

Python - How to combine 2 pandas.core.frame.DataFrame objects with the same column name [duplicate]

So I have 2 pandas.core.frame.DataFrame objects like this: anomalies: Sales outlet Date 2006-07-01 700 2 a

Trying to get the minimum date and getting TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'

I'm reading from an Excel file: GA = pd.read_excel("file.xlsx", sheet_name=0, engine="openpyxl") The data types are: Email object Date datetime64[ns] Name object

Convert date + time strings to epoch milliseconds in dataframe column (when present)

I have a dataframe with a column called "snapshot_timestamp" where the time is in this format: 2022-05-01 23:45:47.428 (year, month, day, hour, minutes, seconds
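
A minimal sketch of the conversion this question describes, assuming the timestamps are naive UTC strings in exactly the format shown and that missing entries should stay missing. The helper name `to_epoch_ms` is hypothetical and does not come from the original post.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: timestamps are naive and should be interpreted as UTC.
EPOCH = datetime(1970, 1, 1)
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def to_epoch_ms(value: object) -> Optional[int]:
    """Convert 'YYYY-MM-DD HH:MM:SS.fff' to epoch milliseconds.

    Anything that is not a non-empty string (None, NaN, "") is treated
    as "not present" and returns None.
    """
    if not isinstance(value, str) or not value:
        return None
    parsed = datetime.strptime(value, FORMAT)
    # Integer division of timedeltas avoids float rounding on the milliseconds.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_ms("2022-05-01 23:45:47.428"))  # 1651448747428
print(to_epoch_ms(None))                       # None
```

On a pandas column, something like `df["snapshot_timestamp"].map(to_epoch_ms)` should apply the helper row by row; for large frames a vectorized approach built on `pd.to_datetime` would likely be faster.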

How to combine two columns in pandas dataframe and set values to them?

I have two columns in a pandas dataframe, Latitude and Longitude. I am trying to combine them into a single column, LOCATION. If we see the data there are only two loc

R Studio keeps crashing when I'm trying to merge multiple csv files into a data frame. How do I fix this?

I have 12 csv files that I need to merge for an analysis project, and their size ranges from 20 MB to 120 MB per file. I attempted cutting down to only using the nece

How to do if else condition on Pyspark columns with regex?

I have a pyspark dataframe event_name 0 a-markets-l1 1 a-markets-watch 2 a-markets-buy 3 a-markets-z2 4 scroll_down This dataframe has event_name column EXCL

How to fill missing values in one column based on another in R

I have a subset of a data frame as below. I want to fill the NAs in column "age at disease" so that the age of one individual with disease be same as the sibling

Assigning each Excel sheet to a variable while looping (using openpyxl) and creating a dataframe from each sheet

I have an Excel document with multiple sheets containing different data sets. For instance, the first sheet has two columns of data whereas the second sheet (sheet 2) ha

Python pandas - series to dataframe

How do I print out only the country names that exist in the dataframe among series with country names as index?

Convert months of dates into a sequence

I want to combine months from years into a sequence. For example, I have a dataframe like this: stuff_id date 1 2015-02-03 2 2015-03-03 3

(Pandas, Python) Selecting indices of a parent DF based on shared column values with a child DF

(I recently asked this question on r/learnpython (here), but didn't get any feedback, so am re-posting it verbatim here. Hope that is okay!) Suppose I have a D

What could be wrong with a Pandas DataFrame?

I couldn't make head or tail of this: I have a function that reads a bunch of csv files from a S3 bucket, concats them and returns the DataFrame: def create_df(

My x-axis is messed up for huge datasets?

I am trying to plot a countplot using the Seaborn library. The dataset is huge, with more than 100,000 entries and 67 columns. I have trie

Summing row values after a groupby but based on a dictionary condition?

I am trying to figure out how to add row entries of the numeric columns (supply, demand). I am at a complete loss. My initial thoughts are to do this with a dic

Sum of list values in a df, new column, values are objects

I have a df made of values from a dictionary. I can get rid of [], ',' and split it all in different cols (one col per number). But can't make the transfer to f

Create several new variables using a vector of names and a vector for computation within dplyr::mutate

I'd like to create several new columns. They should take their names from one vector and they should be computed by taking one column in the data and dividing i

Make a mean of several yearly dataframes, hour by hour

I have several dataframes of some value taken every hour, over several years, like this: df1 Out[6]: time P G(i) H_sun T2m WS10m Int

How to create a new data frame of the latest value reported in each column?

I've got a table like this: country continent date n_case Ex TD TC