How to make MATLAB's fillmissing function impute only a certain number of missing values between known values?

Consider the following code, given only as an example:

A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);

The resulting timetable is: [image: starting timetable with undesired missing values]

I would like to use the MATLAB function fillmissing to impute missing data according to the following rules:

  • missing data at the beginning of the time series should not be imputed
  • missing data at the end of the time series should not be imputed
  • missing data between known values should be imputed only if the number of missing values between those known values is strictly less than 2

The resulting timetable should be: [image: the final table with desired missing data imputation]

Notice that only the 4th row of column A2 has been imputed here. Can I do this with fillmissing? If not, how else can I do it?



Solution 1:[1]

You can find the first and last non-NaN values using find. Based on these indices, you can conditionally fill the missing data if there are fewer than 2 missing values between them. For some vector v:

idxNaN = isnan( v );                          % Logical mask of values which are NaN
idxDataStart = find( ~idxNaN, 1, 'first' );   % Index of the first non-NaN value
idxDataEnd =   find( ~idxNaN, 1, 'last' );    % Index of the last non-NaN value
idxData = idxDataStart:idxDataEnd;            % Span from first to last known value
numValsMissing = nnz( idxNaN(idxData) );      % Number of NaNs inside that span
if numValsMissing < 2                         % Fill only if fewer than 2 NaNs in the span
    v(idxData) = fillmissing(v(idxData), 'linear');   % fillmissing needs a fill method
end
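
Note that this check counts all NaNs between the first and last known values together, not each gap separately. If a column could contain several separate gaps, a per-gap variant may be closer to the stated rule. The following is only a sketch under that assumption: the example vector and the names maxRun, runStart, runEnd and candidate are illustrative and do not come from the question.

```matlab
v = [NaN; 3; NaN; 5; NaN; NaN; 12; NaN];   % illustrative column vector (assumed data)
maxRun = 1;                                 % fill runs of at most this many NaNs

isMissing = isnan(v);
d = diff([false; isMissing; false]);        % +1 where a NaN run starts, -1 just after it ends
runStart = find(d == 1);                    % first index of each NaN run
runEnd   = find(d == -1) - 1;               % last index of each NaN run

% Candidate values for interior gaps; 'EndValues','none' leaves the ends as NaN
candidate = fillmissing(v, 'linear', 'EndValues', 'none');

for k = 1:numel(runStart)
    runLen = runEnd(k) - runStart(k) + 1;
    isInterior = runStart(k) > 1 && runEnd(k) < numel(v);
    if isInterior && runLen <= maxRun
        % Copy the interpolated values for this short interior run only
        v(runStart(k):runEnd(k)) = candidate(runStart(k):runEnd(k));
    end
end
```

Here the single NaN between 3 and 5 is interpolated, while the run of two NaNs and the NaNs at either end are left alone.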

For your array A, you can loop over the columns and apply the find-based check to each one, where each column is a vector v.

A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];

for ii = 1:size(A,2)
    v = A(:,ii);
    idxNaN = isnan( v );
    idxDataStart = find( ~idxNaN, 1, 'first' );
    idxDataEnd =   find( ~idxNaN, 1, 'last' );
    idxData = idxDataStart:idxDataEnd;
    numValsMissing = nnz( idxNaN(idxData) );
    if numValsMissing < 2
        v(idxData) = fillmissing(v(idxData),'linear');
    end
    A(:,ii) = v;
end
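
As a possible alternative to the loop, newer MATLAB releases offer a 'MaxGap' option for fillmissing (this sketch assumes R2020b or later). The gap size is measured as the distance between the known values surrounding the gap, so with the default sample points a single missing value has a gap size of 2. Combined with 'EndValues','none', this should reproduce the rules in the question; the variable name B is only illustrative.

```matlab
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 NaN];

% Fill column-wise by linear interpolation, but:
%   'EndValues','none' leaves leading and trailing NaNs untouched
%   'MaxGap',2         fills only gaps whose surrounding known values are
%                      at most 2 samples apart (i.e. a single NaN)
B = fillmissing(A, 'linear', 'EndValues', 'none', 'MaxGap', 2);
```

Unlike the loop, this treats each gap separately rather than counting all NaNs in a column. The filled array can be written back to the timetable with TT{:,:} = B.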

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wolfie