AGI University
  • The AGI Landscape
  • Our vision
  • Papers
  • Rationality and intelligence
  • AI safety gridworlds
  • Modeling Friends and Foes
  • The Forget-me-not Process
  • Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study
  • Universal Transformers
  • Graph Convolutional Policy Network
  • Thermodynamics as a theory of decision-making with information-processing costs
  • Concrete Problems in AI Safety
  • A course in game theory
  • Theory of games and economic behavior
  • Reinforcement learning: An introduction (1st edition)
  • Regret analysis of stochastic and nonstochastic multi-armed bandit problems
  • The nonstochastic multiarmed bandit problem
  • Information theory of decisions and actions
  • Clustering with Bregman divergences
  • Quantal Response Equilibria for Normal Form Games
  • The numerics of GANs
  • The Mechanics of n-Player Differentiable Games
  • Reactive bandits with attitude
  • Data clustering by Markovian relaxation and the information bottleneck method
  • Information bottleneck for Gaussian variables
  • Bounded Rationality, Abstraction, and Hierarchical Decision-Making: An Information-Theoretic Optimality Principle
  • Risk sensitive path integral control
  • Information, utility and bounded rationality
  • Hysteresis effects of changing the parameters of noncooperative games
  • The best of both worlds: stochastic and adversarial bandits
  • One practical algorithm for both stochastic and adversarial bandits
  • An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
  • Friend-or-Foe Q-Learning in General-Sum Games
  • New criteria and a new algorithm for learning in multi-agent systems
  • Correlated Q-Learning
  • Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning
  • Learning against sequential opponents in repeated stochastic games
  • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
  • An empirical evaluation of Thompson Sampling
  • What game are we playing? End-to-end learning in normal and extensive form games
  • Intriguing properties of neural networks
  • Explaining and harnessing adversarial examples
  • Go-Explore
  • The Landscape of Deep Reinforcement Learning
  • Modeling AGI Safety Frameworks with Causal Influence Diagrams
  • Papers
    • Measuring and avoiding side effects using relative reachability

Graph Convolutional Policy Network

Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research.

This is especially important in the task of molecular graph generation, whose goal is to discover novel molecules with desired properties such as drug-likeness and synthetic accessibility, while obeying physical laws such as chemical valency.

However, designing models that find molecules which optimize desired properties while incorporating highly complex and non-differentiable rules remains a challenging task.

Here we propose the Graph Convolutional Policy Network (GCPN), a general graph convolutional network-based model for goal-directed graph generation through reinforcement learning.
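
The abstract frames generation as sequential decision-making: the state is the partially built graph, actions grow it, and domain rules constrain which actions are legal. Below is a toy sketch of that framing, with a per-node degree cap standing in for chemical valency; all names are illustrative and not taken from the GCPN code.

```python
import itertools

MAX_DEGREE = 4  # toy stand-in for a chemical valency limit

class GraphBuildEnv:
    """State: a partially built graph; action: add one edge."""

    def __init__(self, num_nodes=6):
        self.num_nodes = num_nodes
        self.edges = set()

    def degree(self, v):
        return sum(1 for e in self.edges if v in e)

    def legal_actions(self):
        # Domain rule enforced by the environment: an edge is legal
        # only if both endpoints stay under the degree cap.
        return [
            (u, v)
            for u, v in itertools.combinations(range(self.num_nodes), 2)
            if (u, v) not in self.edges
            and self.degree(u) < MAX_DEGREE
            and self.degree(v) < MAX_DEGREE
        ]

    def step(self, action):
        self.edges.add(action)
        done = not self.legal_actions()
        # A real reward would score properties such as drug-likeness;
        # 0.0 is a placeholder here.
        return self.edges, 0.0, done
```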

The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules.
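
A minimal REINFORCE-style sketch of that combined training signal, assuming a fixed-size state embedding and a hypothetical discriminator; the module shapes and names are placeholders, not the paper's actual networks.

```python
import torch
import torch.nn as nn

# Toy policy and discriminator over a fixed-size state embedding.
policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
discriminator = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(state_vec, property_reward):
    dist = torch.distributions.Categorical(logits=policy(state_vec))
    action = dist.sample()
    # Adversarial term: reward graphs the discriminator finds realistic.
    adv_reward = torch.log(discriminator(state_vec) + 1e-8).detach()
    reward = property_reward + adv_reward
    # Policy gradient: raise log-probability of high-reward actions.
    loss = (-dist.log_prob(action) * reward).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return action.item()

train_step(torch.randn(16), 0.7)
```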

Experimental results show that GCPN achieves a 61% improvement over state-of-the-art baselines on chemical property optimization while still resembling known molecules, and a 184% improvement on the constrained property optimization task.
