Model classification problem in knowledge distillation
I was reading some articles on knowledge distillation recently. In my understanding, the teacher model and the student model described in these articles end up with the same number of output classes after the final fully connected layer. My question is: if the teacher model is trained on ten categories, can the student model be trained to predict only three or four of them? Is this possible? A sketch of what I mean is below.
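As an illustration of the idea being asked about (not an answer from the original post), here is a minimal PyTorch sketch of one possible approach: slice the teacher's logits down to the class indices the student keeps, re-normalize them with a softened softmax, and distill against that subset. The class indices, temperature, loss weights, and toy networks below are illustrative assumptions only.

```python
# Minimal sketch: distilling a 4-class student from a 10-class teacher
# by keeping only the teacher logits for the shared classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical setup: teacher covers 10 classes, student keeps 4 of them.
TEACHER_CLASSES = 10
KEPT_CLASS_IDS = [0, 3, 5, 7]          # teacher class indices the student cares about
STUDENT_CLASSES = len(KEPT_CLASS_IDS)
TEMPERATURE = 4.0

teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, TEACHER_CLASSES))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, STUDENT_CLASSES))

def distillation_loss(x, hard_labels):
    """KL distillation on the kept classes plus the usual cross-entropy term."""
    with torch.no_grad():
        teacher_logits = teacher(x)                     # (B, 10)
        # Keep only the logits for the classes the student models,
        # then re-normalize over that subset with a softened softmax.
        kept_logits = teacher_logits[:, KEPT_CLASS_IDS] # (B, 4)
        soft_targets = F.softmax(kept_logits / TEMPERATURE, dim=1)

    student_logits = student(x)                         # (B, 4)
    log_probs = F.log_softmax(student_logits / TEMPERATURE, dim=1)

    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * TEMPERATURE ** 2
    ce = F.cross_entropy(student_logits, hard_labels)   # labels remapped to 0..3
    return 0.7 * kd + 0.3 * ce

# Tiny usage example with random data (hard labels already remapped to 0..3).
x = torch.randn(8, 32)
y = torch.randint(0, STUDENT_CLASSES, (8,))
loss = distillation_loss(x, y)
loss.backward()
```

One caveat with this kind of subset distillation: re-normalizing over only the kept classes discards the probability mass the teacher assigned to the other six classes, so the soft targets may look more confident than the teacher actually was; some setups add an extra "other" class to the student to absorb that mass.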
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.
