An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
[24] P. Auer and C.-K. Chiang. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In 29th Annual Conference on Learning Theory, 2016.