Information is power: intrinsic control via information capture
- Authors:
- Nicholas Rhinehart (UC Berkeley)
- Jenny Wang (UC Berkeley)
- Glen Berseth (UC Berkeley)
- John D. Co-Reyes (UC Berkeley)
- Danijar Hafner (University of Toronto; Google Research, Brain Team)
- Chelsea Finn (Stanford University)
- Sergey Levine (UC Berkeley)
NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems, December 2021, Article No. 822, Pages 10745–10758
Published: 10 June 2024
ABSTRACT
Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents asks: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation, estimated using a latent state-space model. This objective induces the agent both to gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control over dynamic objects in a variety of partially-observed environments sensed through visual observations, without any extrinsic reward.
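The objective the abstract describes can be read as a per-step intrinsic reward of the form $r_t = -\mathcal{H}(b_t)$, where $b_t$ is the agent's belief over latent world states produced by the state-space model. Below is a minimal, hypothetical sketch of that reward computation, assuming a toy recurrent filter with a diagonal-Gaussian belief; the names `GaussianBeliefFilter` and `intrinsic_reward` are illustrative and not taken from the paper or its released code.

```python
import torch
import torch.nn as nn


class GaussianBeliefFilter(nn.Module):
    """Toy recurrent filter producing a diagonal-Gaussian belief over latent states."""

    def __init__(self, obs_dim: int, latent_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 2 * latent_dim)
        self.hidden_dim = hidden_dim

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        # Fold the new observation into the recurrent state, then read out
        # the mean and log-scale of a diagonal-Gaussian belief.
        h = self.rnn(obs, h)
        mean, log_std = self.head(h).chunk(2, dim=-1)
        belief = torch.distributions.Normal(mean, log_std.exp())
        return belief, h


def intrinsic_reward(belief: torch.distributions.Normal) -> torch.Tensor:
    # Negative belief entropy: the agent is rewarded for holding confident
    # (low-entropy) beliefs about the world state, which it can achieve both
    # by gathering information and by making the world more predictable.
    return -belief.entropy().sum(dim=-1)


if __name__ == "__main__":
    f = GaussianBeliefFilter(obs_dim=16, latent_dim=8)
    h = torch.zeros(1, f.hidden_dim)
    belief, h = f(torch.randn(1, 16), h)
    print(intrinsic_reward(belief))  # one intrinsic reward per batch element
```

In a full agent this reward would drive a standard deep RL learner, with the paper's deep variational Bayes filter supplying the belief; the sketch covers only the reward computation.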
Published in
NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems
December 2021
30517 pages
ISBN: 9781713845393
- Editors:
- M. Ranzato,
- A. Beygelzimer,
- Y. Dauphin,
- P.S. Liang,
- J. Wortman Vaughan
Copyright © 2021 Neural Information Processing Systems Foundation, Inc.
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited