AGI University

Universal Transformers


Last updated 6 years ago

Despite their recent successes, feed-forward sequence models like the Transformer fail to generalize on many tasks that recurrent models handle with ease (e.g. copying strings when their lengths exceed those observed at training time). Moreover, in contrast to RNNs, the Transformer is not computationally universal, which limits its theoretical expressivity. In this paper we propose the Universal Transformer, which addresses these practical and theoretical shortcomings, and we show that it leads to improved performance on several tasks.

  1. Instead of recurring over the individual symbols of sequences like RNNs, the Universal Transformer repeatedly revises its representations of all symbols in the sequence with each recurrent step.

  2. In order to combine information from different parts of a sequence, it employs a self-attention mechanism in every recurrent step.

  3. Assuming sufficient memory, its recurrence makes the Universal Transformer computationally universal.

  4. We further employ an adaptive computation time (ACT) mechanism to allow the model to dynamically adjust the number of times the representation of each position in a sequence is revised.

    1. Beyond saving computation, we show that ACT can improve the accuracy of the model.
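The four points above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: it uses a single attention head, omits the position and timestep embeddings, uses random untrained weights, and approximates ACT with a simplified Graves-style halting rule. All names (`init_params`, `ut_step`, `universal_transformer`, `w_halt`, `threshold`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_model, d_ff, rng):
    # Random, untrained weights (assumption: illustration only).
    s = 1.0 / np.sqrt(d_model)
    return {
        "Wq": rng.normal(0, s, (d_model, d_model)),
        "Wk": rng.normal(0, s, (d_model, d_model)),
        "Wv": rng.normal(0, s, (d_model, d_model)),
        "W1": rng.normal(0, s, (d_model, d_ff)),
        "W2": rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
        "w_halt": rng.normal(0, s, (d_model,)),
        "b_halt": 0.0,
    }

def ut_step(h, p):
    """One recurrent step: revise every position in parallel.

    The same parameters p are reused at every step, which is what makes
    the model recurrent in depth rather than a stack of distinct layers.
    """
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(h.shape[-1]))  # self-attention over all positions
    h = layer_norm(h + attn @ v)
    ff = np.maximum(0.0, h @ p["W1"]) @ p["W2"]     # position-wise feed-forward
    return layer_norm(h + ff)

def universal_transformer(x, p, max_steps=8, threshold=0.99):
    """Apply ut_step repeatedly with simplified per-position ACT halting."""
    h = x.copy()
    n = x.shape[0]
    cum_halt = np.zeros(n)               # accumulated halting probability
    n_updates = np.zeros(n, dtype=int)   # revisions performed per position
    output = np.zeros_like(x)            # halting-weighted average of states
    for t in range(max_steps):
        running = cum_halt < threshold
        if not running.any():
            break
        h_new = ut_step(h, p)
        p_halt = sigmoid(h_new @ p["w_halt"] + p["b_halt"])
        is_last = (t == max_steps - 1)
        halts_now = running & ((cum_halt + p_halt >= threshold) | is_last)
        # Continuing positions contribute p_halt; positions halting now
        # contribute the remainder, so each position's weights sum to 1.
        weight = np.where(halts_now, 1.0 - cum_halt, p_halt) * running
        output += weight[:, None] * h_new
        cum_halt = cum_halt + weight
        n_updates += running
        # Halted positions keep their state but can still be attended to.
        h = np.where(running[:, None], h_new, h)
    return output, n_updates, cum_halt

rng = np.random.default_rng(0)
params = init_params(d_model=16, d_ff=32, rng=rng)
tokens = rng.normal(size=(5, 16))        # 5 positions, 16-dim embeddings
out, n_updates, cum_halt = universal_transformer(tokens, params, max_steps=6)
print(out.shape, n_updates)
```

In this sketch `n_updates` records how many times each position was revised, so positions that halt early stop changing while the others continue, which is the behaviour point 4 describes.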

Our experiments show that, on various algorithmic tasks and a diverse set of large-scale language understanding tasks, the Universal Transformer generalizes significantly better and outperforms both a vanilla Transformer and an LSTM in machine translation. It also achieves a new state of the art on the bAbI linguistic reasoning task and the challenging LAMBADA language modeling task.