Loop through timeseries and fill missing data - Python

I have a DF such as the one below:

ID Year Value
1 2007 1
1 2008 1
1 2009 1
1 2011 1
1 2013 1
1 2014 1
1 2015 1
2 2008 1
2 2010 1
2 2011 1
2 2012 1
2 2013 1
2 2014 1
3 2009 1
3 2010 1
3 2011 1
3 2012 1
3 2013 1
3 2014 1
3 2015 1

As you can see, ID 1 is missing values for 2010 and 2012, ID 2 is missing 2007, 2009 and 2015, and ID 3 is missing 2007 and 2008. I would like to fill these gaps with the value 1, so that every ID has a row for each year from 2007 to 2015. What I would like to achieve is below:

ID Year Value
1 2007 1
1 2008 1
1 2009 1
1 2010 1
1 2011 1
1 2012 1
1 2013 1
1 2014 1
1 2015 1
2 2007 1
2 2008 1
2 2009 1
2 2010 1
2 2011 1
2 2012 1
2 2013 1
2 2014 1
2 2015 1
3 2007 1
3 2008 1
3 2009 1
3 2010 1
3 2011 1
3 2012 1
3 2013 1
3 2014 1
3 2015 1
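For reference, here is a minimal pandas sketch of this target. It is not taken from the answers below. It builds every (ID, Year) combination with pd.MultiIndex.from_product, then reindexes the data on that index with fill_value=1. The sketch assumes the target range is 2007 to 2015 for every ID, as in the table above. The names df, full_index and result are illustrative.

```python
import pandas as pd

# Input data from the question: the (ID, Year) pairs that are present
df = pd.DataFrame({
    "ID": [1] * 7 + [2] * 6 + [3] * 7,
    "Year": [2007, 2008, 2009, 2011, 2013, 2014, 2015,
             2008, 2010, 2011, 2012, 2013, 2014,
             2009, 2010, 2011, 2012, 2013, 2014, 2015],
    "Value": 1,
})

# Every combination of ID and year 2007-2015 (assumed target range)
full_index = pd.MultiIndex.from_product(
    [df["ID"].unique(), range(2007, 2016)], names=["ID", "Year"]
)

# Reindex on (ID, Year): rows that did not exist are added with Value = 1
result = (
    df.set_index(["ID", "Year"])
      .reindex(full_index, fill_value=1)
      .reset_index()
)
print(result)
```

Because every ID gets the same year range, no explicit loop over IDs is needed. The reindex adds only the missing combinations and leaves the existing rows unchanged.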

I have written the code below so far. However, it only fills the gaps for a single ID, and I am struggling to find a way to loop through each ID and add a value for each missing year:

import pandas as pd

idx = pd.date_range('2007', '2020', freq='Y')  # year-end dates 2007-12-31 ... 2019-12-31
DF.index = pd.DatetimeIndex(DF.index)
DF_s = DF.reindex(idx, fill_value=0)

Any ideas would be helpful, please.
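The question asks specifically about looping over each ID, so here is a hedged sketch of that loop. It is an illustration and not part of the original post. The loop groups the rows by ID and indexes each group by its integer Year rather than a DatetimeIndex. It then reindexes each group on 2007 to 2015 with fill_value=1 and concatenates the pieces. The small sample frame, the 2007 to 2015 range and the variable names are all assumptions.

```python
import pandas as pd

# Small illustrative subset of the question's data (assumed, not the full table)
df = pd.DataFrame({
    "ID":    [1, 1, 2, 2, 3],
    "Year":  [2007, 2009, 2008, 2010, 2015],
    "Value": [1, 1, 1, 1, 1],
})

# Target years 2007-2015 (assumed range), named so reset_index yields "Year"
years = pd.Index(range(2007, 2016), name="Year")

pieces = []
for id_, group in df.groupby("ID"):
    # Index this ID's rows by year; missing years are added with Value = 1
    filled = group.set_index("Year")[["Value"]].reindex(years, fill_value=1)
    filled = filled.reset_index()
    filled.insert(0, "ID", id_)  # put the ID back as the first column
    pieces.append(filled)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

This mirrors the single-ID reindex attempt above, with two changes: it uses integer years instead of dates, and it runs once per group. A single reindex on an (ID, Year) MultiIndex would avoid the explicit loop altogether.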



Solution 1:[1]

I'm not sure I understood what you want to achieve. However, if you want to fill NaNs in the "Value" column only for the years 2007 to 2015 (assuming there are other years where the column should stay empty), you could do something like this:

import math
import pandas as pd

df1 = pd.DataFrame({'ID': [1,1,1,2,2,2],
                    'Year': [2007,2010,2020,2007,2010,2015],
                    'Value': [1,None,None,None,1,None]})

# Return 0 when Value is missing and Year lies in 2007-2015; otherwise keep Value
# (the question asks for 1, so use 1 here if that is the desired fill value)
def func(x, y):
    return 0 if math.isnan(y) and 2007 <= x <= 2015 else y

# Apply it row by row (axis=1) and overwrite the Value column
df1['Value'] = df1.apply(lambda x: func(x.Year, x.Value), axis=1)

#    ID  Year  Value
# 0  1   2007  1.0
# 1  1   2010  0.0
# 2  1   2020  NaN
# 3  2   2007  0.0
# 4  2   2010  1.0
# 5  2   2015  0.0
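The same conditional fill can be written without apply. This sketch uses a boolean mask built from Series.isna and Series.between (which includes both end points by default), then assigns through .loc. For this example frame it should reproduce the output shown above. Like the answer, it only fills rows that already exist and does not add rows for missing years.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2],
                    'Year': [2007, 2010, 2020, 2007, 2010, 2015],
                    'Value': [1, None, None, None, 1, None]})

# Mask: Value is missing AND Year lies in 2007..2015 (inclusive)
mask = df1['Value'].isna() & df1['Year'].between(2007, 2015)
df1.loc[mask, 'Value'] = 0
print(df1)
```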

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Frank Gallagher