How to match the fit_transform output of the imputed dataset with the original dataset while handling missing data in an ML model?

When I try to fill in missing values with KNNImputer using the following line of code:

pd.DataFrame(knn_imputer.fit_transform(data),
             index=data.index,
             columns=data.columns)

I receive the following error message:

Traceback (most recent call last):
  File "c:\Users\myname\Desktop\Project\PythonTool\calculator\database-analyzer\database_analyzer.py", line 384, in <module>
    main()
  File "c:\Users\myname\Desktop\Project\PythonTool\calculator\database-analyzer\database_analyzer.py", line 232, in main
    train_data_engineered = missingvalue_handler(train_data_engineered)
  File "c:\Users\myname\Desktop\Project\PythonTool\calculator\database-analyzer\utilities_module.py", line 1268, in missingvalue_handler
    return pd.DataFrame(knn_imputer.fit_transform(new_data),
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pandas\core\frame.py", line 695, in __init__
    mgr = ndarray_to_mgr(
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pandas\core\internals\construction.py", line 351, in ndarray_to_mgr    
    _check_values_indices_shape_match(values, index, columns)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\pandas\core\internals\construction.py", line 422, in _check_values_indices_shape_match
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (196, 1032), indices imply (196, 1033)

I know the reason for this: the imputer drops one column entirely, which reduces the column count from 1033 to 1032. How can I fix this without knowing which column has been removed?
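One possible way to approach this, sketched under the assumption that the missing column is one with no observed values at all (by default KNNImputer discards features that are entirely NaN when it is fitted): work out which columns survive before imputing, then reuse those labels for the result. The helper name `impute_keep_labels` and the small `demo` frame are hypothetical and only serve as illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def impute_keep_labels(data, n_neighbors=5):
    """Impute with KNNImputer and return a DataFrame with matching labels.

    Hypothetical helper: assumes the only columns the imputer drops are
    those that contain no observed values at all.
    """
    # Columns with at least one non-missing value; all-NaN columns are excluded.
    kept = data.columns[data.notna().any(axis=0)]
    imputer = KNNImputer(n_neighbors=n_neighbors)
    # Fit only on the surviving columns so the output width matches `kept`.
    values = imputer.fit_transform(data[kept])
    return pd.DataFrame(values, index=data.index, columns=kept)


# Illustrative data: column "b" is entirely missing, "a" has one gap.
demo = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [np.nan, np.nan, np.nan, np.nan],
        "c": [2.0, 3.0, 4.0, 5.0],
    },
    index=["x", "y", "z", "w"],
)

result = impute_keep_labels(demo, n_neighbors=2)

# Columns present in the input but absent from the imputed output.
dropped = demo.columns.difference(result.columns)
print("dropped columns:", list(dropped))
print(result)
```

The same comparison (`data.columns.difference(result.columns)`) reveals which of the 1033 columns was removed. If the empty column has to stay in the output, recent scikit-learn releases (1.2 and later, to my knowledge) accept `KNNImputer(keep_empty_features=True)`, which keeps such columns instead of dropping them; check the documentation for the installed version before relying on it.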



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow