Vowpal Wabbit: Warm-starting a contextual bandit
I know from this paper (https://arxiv.org/pdf/1901.00301.pdf) that contextual bandits can be implemented with a warm-start method. However, after reviewing the Vowpal Wabbit documentation (https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits) and other Stack Overflow questions (e.g., Vowpal Wabbit: question on training contextual bandit on historical data), the most appropriate way of doing this is still not clear to me. Several other questions online show people confused about the correct approach (e.g., How to do warm start and use vowpal wabbit for contextual bandits).
For my example, I am using a contextual bandit (cb_explore_adf) with 8 actions, each representing a price margin, and 6 features. I have historical data in which the price margins were chosen by a deterministic heuristic, so the probabilities are set to 1. Cost was based on profits.
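To make the setup concrete, here is a minimal sketch of how one logged row could be written in the standard cb_adf text format, where the `action:cost:probability` label sits only on the action that was taken. The feature names (`demand`, `stock`, `margin`), the margin values, and the choice of cost as negative profit are assumptions for illustration, not taken from my actual data. It only shows the usual cb_adf label layout and makes no claim about which label type the warm-start options expect.

```python
# Hypothetical price margins, one per action (8 actions as in the question).
MARGINS = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4]


def to_cb_adf_example(context, chosen_action, cost, probability, margins):
    """Format one logged decision as a multi-line cb_adf text example.

    context:       dict of shared (context) feature name -> numeric value
    chosen_action: 0-based index of the margin the heuristic picked
    cost:          observed cost (assumed here to be negative profit)
    probability:   logging probability (1.0 for a deterministic heuristic)
    """
    shared = "shared |c " + " ".join(f"{k}:{v}" for k, v in context.items())
    lines = [shared]
    for index, margin in enumerate(margins):
        action_features = f"|a margin:{margin}"
        if index == chosen_action:
            # Only the action that was actually taken carries a label.
            lines.append(f"0:{cost}:{probability} {action_features}")
        else:
            lines.append(action_features)
    return "\n".join(lines)


if __name__ == "__main__":
    # One made-up historical row: the heuristic chose the third margin.
    row = to_cb_adf_example({"demand": 0.7, "stock": 12}, 2, -3.5, 1.0, MARGINS)
    print(row)
```

Each historical observation becomes one such block, with blocks separated by a blank line when written to a file.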
However, what is not clear to me is how to set the warm-starting arguments (i.e., --warm_start X, --interaction X, --warm_start_update, and --interaction_update). Should the value for --warm_start be the number of warm-start examples? For example, if I have a data frame with 50,000 observations, should the argument be --warm_start 50000? How do you determine the interaction value? I have seen two examples that used --interaction 1000. Lastly, in what cases should you use --warm_start_update and --interaction_update?
I trained a contextual bandit in Vowpal Wabbit (Python) with a warm-start using these arguments: '--cb_explore_adf -q :: --epsilon 0.10 --bag 5 --power_t 0 --warm_start 21873 --interaction 1000 --warm_start_update --interaction_update --warm_cb_cs --save_resume'. I then loaded this model, ran a simulation, and compared it to a bandit without a warm-start; the warm-started bandit gave better results. Were these arguments used correctly? I know this is a lot of questions, but I would like to see more clarification on how to warm-start a contextual bandit in Vowpal Wabbit.
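For reference, this is roughly how the argument string quoted above can be assembled from the data. It is only a sketch: the helper name `build_warm_start_args` is hypothetical, and tying `--warm_start` to the number of historical rows is exactly the assumption I am asking about, not a confirmed rule. The flags themselves are the ones from my command, unchanged.

```python
def build_warm_start_args(n_warm_start, n_interaction,
                          warm_start_update=True, interaction_update=True):
    """Assemble the VW argument string used in the question.

    n_warm_start:  value passed to --warm_start (assumed here to be the
                   number of historical rows; this is the open question)
    n_interaction: value passed to --interaction
    """
    parts = [
        "--cb_explore_adf",
        "-q ::",
        "--epsilon 0.10",
        "--bag 5",
        "--power_t 0",
        f"--warm_start {n_warm_start}",
        f"--interaction {n_interaction}",
    ]
    # The two update switches are plain on/off flags in the command.
    if warm_start_update:
        parts.append("--warm_start_update")
    if interaction_update:
        parts.append("--interaction_update")
    parts += ["--warm_cb_cs", "--save_resume"]
    return " ".join(parts)


if __name__ == "__main__":
    # 21873 historical rows and 1000 interaction examples, as in my run.
    print(build_warm_start_args(21873, 1000))
```

The resulting string is what I passed to the Vowpal Wabbit Python constructor when creating the model.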
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow