[20] P. A. Ortega and D. A. Braun. Information, utility and bounded rationality. In International Conference on Artificial General Intelligence. Springer Berlin Heidelberg, 2011. [21] D. H. Wolpert, M. Harré, E. Olbrich, N. Bertschinger, and J. Jost. Hysteresis effects of changing the parameters of noncooperative games. Physical Review E, 85, 2012. [22] S. Bubeck and A. Slivkins. The best of both worlds: stochastic and adversarial bandits. In In Proceedings ofthe International Conference on Computational Learning Theory (COLT), 2012. [23] Y. Seldin and A. Silvkins. One practical algorithm for both stochastic and adversarial bandits. In 31 st International Conference on Machine Learning, 2014. [24] P. Auer and C. Chao-Kai. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In 29th Annual Conference on Learning Theory, 2016. [25] M. L. Littman. Friend-or-Foe Q-Learning in General-Sum Games. In Proceedings of the International Conference on Machine Learning (ICML), 2001. [26] R. Powers and Y. Shoham. New criteria and a new algorithm for learning in multi-agent systems. In Advances in neural information processing systems, pages 1089–1096, 2005. [27] A. Greenwald and K. Hall. Correlated Q-Learning. In Proceedings of the 22nd Conference on Artificial Intelligence, pages 242–249, 2003. [28] J. W. Crandall and M. A. Goodrich. Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning. Machine Learning, 82(3):281–314, 2011. [29] P. Hernandez-Leal and M. Kaisers. Learning against sequential opponents in repeated stochastic games. In The 3rd Multi-disciplinary Conference on Reinforcement Learning and Decision Making, Ann Arbor, 2017. [30] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285–294, 1933. [31] O. Chappelle and L. Li. An empirical evaluation of Thompson Sampling. In Advances in neural information processing systems, 2011. [32] C. K. Ling, F. Fang, and J. Z. Kolter. What game are we playing? end-to-end learning in normal and extensive form games. arXiv preprint arXiv:1805.02777, 2018. [33] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013. [34] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.