How do engineered features help when they are not present in the test data?
I am trying to classify drones versus birds using machine learning. I have a large number of radar feature-vector samples, each consisting of position (x, y, z), velocity (vx, vy, vz), acceleration (ax, ay, az), noise, SNR and some other features, and the true class of every sample is known. However, these basic features cannot distinguish drones from birds on new (out-of-bag) samples.

So I am going to try feature engineering to generate new features. One example is the standard deviation of speed within a track: compute the mean speed over all samples of the same track, take the difference between each sample's speed and that mean, average the squared differences, and take the square root. Similarly, I generate other features from sums, differences, or deviations from the average over the samples of the same track.
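A minimal sketch of the per-track speed standard deviation described above, computed in batch over a training set. It assumes each sample can be reduced to a hypothetical `(track_id, vx, vy, vz)` tuple and that "speed" means the magnitude of the velocity vector; neither the tuple layout nor the function names come from the question.

```python
import math
from collections import defaultdict


def speed(vx, vy, vz):
    """Speed as the magnitude of the velocity vector (assumed definition)."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)


def track_speed_std(samples):
    """Population standard deviation of speed for each track.

    `samples` is an iterable of (track_id, vx, vy, vz) tuples; this layout is
    hypothetical and stands in for the radar feature vectors.
    """
    speeds = defaultdict(list)
    for track_id, vx, vy, vz in samples:
        speeds[track_id].append(speed(vx, vy, vz))

    result = {}
    for track_id, values in speeds.items():
        mean = sum(values) / len(values)
        # Average the squared deviations from the mean speed, then take the root.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        result[track_id] = math.sqrt(var)
    return result


samples = [("t1", 3, 4, 0), ("t1", 6, 8, 0), ("t2", 1, 0, 0)]
print(track_speed_std(samples))  # t1 speeds are 5 and 10, so its std is 2.5
```

This needs every sample of a track up front, which is exactly what is unavailable when samples arrive one at a time.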
After obtaining these features, I will use them to train the model that will be used for classification.
However, I can apply all this feature engineering to the training dataset, but the same engineered features will not be present in the test data from the operational scenario, where samples arrive one after another. Also, in operation we do not know how many samples a track will contain. So how can these engineered features be computed so that a test feature vector with the same features can be built in the actual operational scenario?
If they cannot be obtained at test time, how can a model trained on these engineered features solve the classification problem when they are missing from the test data?
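One common way to obtain a per-track statistic when samples arrive one at a time, and the track length is unknown, is to keep a running estimate per track and update it on every new sample. Welford's online algorithm is a standard choice for the mean and variance. The sketch below assumes each incoming radar sample carries a track identifier and that speed is the velocity magnitude; `RunningSpeedStats`, `TrackFeatureStore` and `add_sample` are hypothetical names. After n samples it gives the same population standard deviation as a batch computation over those n samples, so a feature vector can be formed at any point in the track.

```python
import math


class RunningSpeedStats:
    """Incremental mean and std of speed for one track (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, vx, vy, vz):
        s = math.sqrt(vx * vx + vy * vy + vz * vz)  # speed = |velocity| (assumed)
        self.n += 1
        delta = s - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (s - self.mean)

    @property
    def std(self):
        """Population std of the speeds seen so far (0.0 for fewer than two samples)."""
        return math.sqrt(self._m2 / self.n) if self.n > 0 else 0.0


class TrackFeatureStore:
    """Keeps one RunningSpeedStats per track id (hypothetical helper)."""

    def __init__(self):
        self._tracks = {}

    def add_sample(self, track_id, vx, vy, vz):
        stats = self._tracks.setdefault(track_id, RunningSpeedStats())
        stats.update(vx, vy, vz)
        # Features available for this track after the current sample.
        return {"n_samples": stats.n, "mean_speed": stats.mean, "speed_std": stats.std}


store = TrackFeatureStore()
store.add_sample("t1", 3, 4, 0)           # first sample: speed_std is 0.0
print(store.add_sample("t1", 6, 8, 0))    # mean_speed 7.5, speed_std 2.5
```

Because the operational feature is computed from a partial track, one option is to compute the training features the same way, from track prefixes of varying length, and to include the sample count as a feature, so that training and test feature distributions match.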
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
