
Information, utility and bounded rationality

[20] P. A. Ortega and D. A. Braun. Information, utility and bounded rationality. In International Conference on Artificial General Intelligence. Springer Berlin Heidelberg, 2011.
[21] D. H. Wolpert, M. Harré, E. Olbrich, N. Bertschinger, and J. Jost. Hysteresis effects of changing the parameters of noncooperative games. Physical Review E, 85, 2012.
[22] S. Bubeck and A. Slivkins. The best of both worlds: stochastic and adversarial bandits. In Proceedings of the International Conference on Computational Learning Theory (COLT), 2012.
[23] Y. Seldin and A. Slivkins. One practical algorithm for both stochastic and adversarial bandits. In 31st International Conference on Machine Learning, 2014.
[24] P. Auer and C.-K. Chiang. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In 29th Annual Conference on Learning Theory, 2016.
[25] M. L. Littman. Friend-or-Foe Q-Learning in General-Sum Games. In Proceedings of the International Conference on Machine Learning (ICML), 2001.
[26] R. Powers and Y. Shoham. New criteria and a new algorithm for learning in multi-agent systems. In Advances in Neural Information Processing Systems, pages 1089–1096, 2005.
[27] A. Greenwald and K. Hall. Correlated Q-Learning. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 242–249, 2003.
[28] J. W. Crandall and M. A. Goodrich. Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning. Machine Learning, 82(3):281–314, 2011.
[29] P. Hernandez-Leal and M. Kaisers. Learning against sequential opponents in repeated stochastic games. In The 3rd Multi-disciplinary Conference on Reinforcement Learning and Decision Making, Ann Arbor, 2017.
[30] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285–294, 1933.
[31] O. Chapelle and L. Li. An empirical evaluation of Thompson Sampling. In Advances in Neural Information Processing Systems, 2011.
[32] C. K. Ling, F. Fang, and J. Z. Kolter. What game are we playing? End-to-end learning in normal and extensive form games. arXiv preprint arXiv:1805.02777, 2018.
[33] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[34] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
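The bounded-rationality model of reference [20] casts decision-making as a trade-off between expected utility and an information-processing cost, measured as the KL divergence from a prior policy; the optimal choice distribution is the softmax-like rule p(x) ∝ p₀(x)·exp(β·U(x)), where the inverse temperature β sets the available processing resources. A minimal NumPy sketch of that rule (function name and example values are illustrative, not from the source):

```python
import numpy as np

def bounded_rational_policy(utilities, prior, beta):
    """Free-energy optimal choice distribution: p(x) proportional to prior(x) * exp(beta * U(x)).

    beta (inverse temperature) trades expected utility against the
    KL-divergence cost of deviating from the prior policy.
    """
    logits = np.log(np.asarray(prior, dtype=float)) + beta * np.asarray(utilities, dtype=float)
    logits -= logits.max()          # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Three options under a uniform prior over choices.
U = [1.0, 2.0, 0.5]
prior = np.ones(3) / 3

# beta -> 0: no processing resources, the policy stays at the prior.
print(bounded_rational_policy(U, prior, beta=0.0))
# Large beta: approaches unbounded rationality, mass concentrates on argmax U.
print(bounded_rational_policy(U, prior, beta=50.0))
```

The two limiting cases bracket the references above: β → 0 recovers the prior (pure information cost), while β → ∞ recovers the classical expected-utility maximizer.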

