How to put a 20B model on a single GPU with DeepSpeed offload

I'm trying to run inference with GPT-NeoX on a single A100 (40 GB) GPU using DeepSpeed ZeRO-Offload (https://www.deepspeed.ai/tutorials/zero-offload/). I know it's possible to use it for training, but is there any way to use it for inference?
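The ZeRO-Offload tutorial linked above targets training, but ZeRO stage 3 with parameter offload (sometimes called ZeRO-Inference) can also be used at inference time by wrapping the model with `deepspeed.initialize` and a stage-3 config. Below is a minimal sketch under that assumption; the config values (batch size, fp16) are illustrative, and `build_engine` is a hypothetical helper, not part of DeepSpeed.

```python
# Sketch: ZeRO stage-3 parameter offload to fit a large model on one GPU.
# Assumes deepspeed and a PyTorch model (e.g. from transformers) are available.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,               # partition parameters (ZeRO stage 3)
        "offload_param": {        # keep parameters in CPU RAM,
            "device": "cpu",      # streaming them to the GPU as needed
            "pin_memory": True,
        },
    },
    "train_micro_batch_size_per_gpu": 1,  # required key even when only inferring
}

def build_engine(model):
    """Wrap a model in a DeepSpeed engine with parameter offload (sketch)."""
    import deepspeed  # imported lazily so the config above stands alone
    engine, *_ = deepspeed.initialize(model=model, config=ds_config)
    engine.module.eval()
    return engine
```

After wrapping, run the forward pass through the engine under `torch.no_grad()`; with `offload_param` set to `"cpu"`, only the layers currently executing need to reside in the 40 GB of GPU memory.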



Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0 in accordance with Stack Overflow's attribution requirements.