Set a time-varying variable to its initial value in a data frame

I have a data frame with the population of US states over different time periods. I want to set each state's population to its initial value, because I need this variable to be time-independent.

This is my dataset:

state <- c("Alabama", "Alabama", "Alabama", "Arkansas", "Arkansas", "Arkansas", "Arkansas")
year <- c(1990, 1991, 1992, 2002, 2003, 2005, 2011)
population <- c(10000, 11000, 12000, 23000, 24000, 25000, 30000)
df <- data.frame(state, year, population)

I want to obtain this (only "population" changes):

state <- c("Alabama", "Alabama", "Alabama", "Arkansas", "Arkansas", "Arkansas", "Arkansas")
year <- c(1990, 1991, 1992, 2002, 2003, 2005, 2011)
population <- c(10000, 10000, 10000, 23000, 23000, 23000, 23000)
df <- data.frame(state, year, population)

This is just a small fraction of my full dataset, so I need code that does not require me to type in each state's name by hand.

Thanks!

r


Solution 1:[1]

We can take the first population value within each state and assign it to every row of that state, using first() from dplyr.

library(dplyr)

df %>%
  group_by(state) %>%
  mutate(population = first(population))

Output

  state     year population
  <chr>    <dbl>      <dbl>
1 Alabama   1990      10000
2 Alabama   1991      10000
3 Alabama   1992      10000
4 Arkansas  2002      23000
5 Arkansas  2003      23000
6 Arkansas  2005      23000
7 Arkansas  2011      23000

Or it could be written as:

df %>%
  group_by(state) %>%
  mutate(population = population[1])

Or with data.table:

library(data.table)
dt <- as.data.table(df)

dt[, population := population[1], by = state]

Or with base R:

# ave() applies FUN within each state and returns a vector as long as the input
df$population <- ave(df$population, df$state, FUN = function(y) y[1])
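
All of the approaches above take the first row within each state, so they give the initial value only if the rows are already sorted by year, as they are in the example. If the full dataset is not guaranteed to be sorted, one possible sketch is to pick the population at the earliest year explicitly with which.min(). The shuffled data frame df_shuffled and the name result below are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Hypothetical copy of the example data with the rows in arbitrary order
df_shuffled <- data.frame(
  state = c("Arkansas", "Alabama", "Arkansas", "Alabama",
            "Arkansas", "Alabama", "Arkansas"),
  year = c(2011, 1992, 2002, 1990, 2005, 1991, 2003),
  population = c(30000, 12000, 23000, 10000, 25000, 11000, 24000)
)

result <- df_shuffled %>%
  group_by(state) %>%
  # which.min(year) is the position of the earliest year within the group
  mutate(population = population[which.min(year)]) %>%
  ungroup()
```

Alternatively, sorting first with arrange(state, year) and then using first(population) should give the same values, at the cost of reordering the rows.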

Data

df <- structure(list(state = c("Alabama", "Alabama", "Alabama", "Arkansas", 
"Arkansas", "Arkansas", "Arkansas"), year = c(1990, 1991, 1992, 
2002, 2003, 2005, 2011), population = c(10000, 11000, 12000, 
23000, 24000, 25000, 30000)), class = "data.frame", row.names = c(NA, 
-7L))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1