How can I remove some NA rows but not all of them?

I have multiple data frames with information about listed companies, starting from the year 2000. I want to put them in a list (let's call it df) because I want to run regressions on them. However, a company that got listed in 2005, for example, will have NA values in the rows before 2005. For each data frame, I want to remove the rows from before the company was listed (the number of such NA rows varies between data frames).

I only know of lapply(df, na.omit). The problem is that some values are also missing within the data, e.g. where a company did not record some variable, so there is an NA for that single value even after 2005. I want to replace those with zero rather than remove the whole row.

How can I remove the first rows with NA values but replace the ones within the data with zeros using R?



Solution 1:[1]

Assuming company is the company name column, date is the date column, and value is the column you want to operate on, try either of the following:

If you have company-wise starting dates in a data frame, say joinig_df, then it is quite easy:

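df <- merge(df, joinig_df, by="company") # Adds the start_dates column from joinig_df to df
df <- df[df$date>=df$start_dates,] # Keep only rows on or after each company's start date
df$value[is.na(df$value)] <- 0 # Replace the remaining NA values with zero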
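
As a rough illustration of the merge approach, here is a minimal sketch on made-up data. The toy values, the listing years, and the assumption that joinig_df holds a start_dates column are all hypothetical and only meant to show the mechanics.

```r
# Toy data (made up): company B is listed from 2005; A has one missing value in 2006.
toy <- data.frame(
  company = rep(c("A", "B"), each = 3),
  date    = rep(c(2004, 2005, 2006), times = 2),
  value   = c(10, 12, NA, NA, 5, 7)
)

# Hypothetical lookup table of listing years (the column name start_dates is assumed).
joinig_df <- data.frame(company = c("A", "B"), start_dates = c(2000, 2005))

merged <- merge(toy, joinig_df, by = "company")        # attach each company's start date
merged <- merged[merged$date >= merged$start_dates, ]  # keep rows from the listing year onward
merged$value[is.na(merged$value)] <- 0                 # NAs that remain become 0
merged <- merged[order(merged$company, merged$date), ] # restore a predictable row order
merged$start_dates <- NULL                             # drop the helper column

print(merged)
```

Only the pre-listing row for B is dropped, while the missing 2006 value for A is kept and set to zero.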

If you don't have the starting dates in a separate data frame as above, then try the following:

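df$value[is.na(df$value)] <- 0 # Replace NA values with zero
df <- df[order(df$company, df$date),] # Ensure data is sorted by company and then by date
df$val_csum <- ave(df$value, df$company, FUN=cumsum) # Cumulative sum of values within each company
df <- df[df$val_csum>0, ] # Drop leading rows; assumes the running total is positive once the company is listed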
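
Since the question keeps the data frames in a list, the same idea can be sketched per data frame with lapply. The variant below is not the method above but a swapped-in alternative: it flags rows with cumsum(!is.na(...)) before filling zeros, so it should not depend on the values being positive. The helper name trim_leading_na, the toy list dfs, and the single value column are assumptions made for illustration.

```r
# Hypothetical helper: drop rows before the first non-NA value, then zero-fill the rest.
trim_leading_na <- function(d, col = "value") {
  d <- d[order(d$date), , drop = FALSE]   # make sure rows are in date order
  listed <- cumsum(!is.na(d[[col]])) > 0  # TRUE from the first observed value onward
  d <- d[listed, , drop = FALSE]          # remove the leading NA rows only
  d[[col]][is.na(d[[col]])] <- 0          # NAs inside the data become 0
  d
}

# Toy list of data frames (made up), one per company.
dfs <- list(
  A = data.frame(date = 2000:2003, value = c(NA, NA, 4, NA)),
  B = data.frame(date = 2000:2003, value = c(1, NA, -3, 2))
)

cleaned <- lapply(dfs, trim_leading_na)
print(cleaned)
```

Here A loses its two leading NA rows and has its later NA set to zero, while B keeps all four rows even though its running total dips below zero.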

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Muhammad Rasel