Dopamine: Google's Open-Source Reinforcement Learning Framework Built on TensorFlow
Dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).
Our design principles are:
Easy experimentation: Make it easy for new users to run benchmark experiments.
Flexible development: Make it easy for new users to try out research ideas.
Compact and reliable: Provide implementations for a few, battle-tested algorithms.
Reproducible: Facilitate reproducibility in results. In particular, our setup follows the recommendations given by Machado et al. (2018).
In the spirit of these principles, this first version focuses on supporting the state-of-the-art, single-GPU Rainbow agent (Hessel et al., 2018) applied to Atari 2600 game-playing (Bellemare et al., 2013). Specifically, our Rainbow agent implements the three components identified as most important by Hessel et al.:
n-step Bellman updates (see e.g. Mnih et al., 2016; a standard form of the target is sketched after this list)
Prioritized experience replay (Schaul et al., 2015)
Distributional reinforcement learning (C51; Bellemare et al., 2017)
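For readers new to these components, it may help to see the shape of the n-step update. A standard form of the target (notation ours, not taken from the Dopamine code) is:

G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k R_{t+k+1} + \gamma^n \max_{a'} Q(S_{t+n}, a')

Prioritized experience replay then samples stored transitions with probability that increases with their temporal-difference error, and C51 learns a categorical distribution over returns rather than only their expected value.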
For completeness, we also provide an implementation of DQN (Mnih et al., 2015). For additional details, please see our documentation.
We provide a set of Colaboratory notebooks which demonstrate how to use Dopamine.
This is not an official Google product.
What's new
02/09/2019: Dopamine has switched its network definitions to use tf.keras.Model. The previous tf.contrib.slim based networks are removed. If your agents inherit from Dopamine agents you need to update your code: ._get_network_type() and ._network_template() are replaced with ._create_network(), and network_type definitions are moved inside the model definition.

  # The following two functions are replaced with `_create_network()`.
  # def _get_network_type(self):
  #   return collections.namedtuple('DQN_network', ['q_values'])
  # def _network_template(self, state):
  #   return self.network(self.num_actions, self._get_network_type(), state)

  def _create_network(self, name):
    """Builds the convolutional network used to compute the agent's Q-values.

    Args:
      name: str, this name is passed to the tf.keras.Model and used to create
        the variable scope under the hood by the tf.keras.Model.
    Returns:
      network: tf.keras.Model, the network instantiated by the Keras model.
    """
    # `self.network` is set to `atari_lib.NatureDQNNetwork`.
    network = self.network(self.num_actions, name=name)
    return network

  def _build_networks(self):
    # The following two lines are replaced.
    # self.online_convnet = tf.make_template('Online', self._network_template)
    # self.target_convnet = tf.make_template('Target', self._network_template)
    self.online_convnet = self._create_network(name='Online')
    self.target_convnet = self._create_network(name='Target')

If your code overrides ._network_template(), ._get_network_type() or ._build_networks(), make sure you update it to fit the new API. If your code overrides ._build_networks(), you need to replace tf.make_template('Online', self._network_template) with self._create_network(name='Online'). The variables of each network can be obtained from the networks themselves, e.g. vars = self.online_convnet.variables. Baselines and older checkpoints can be loaded by adding the following line to your gin file:

  atari_lib.maybe_transform_variable_names.legacy_checkpoint_load = True

11/06/2019: Visualization utilities added to generate videos and still images of a trained agent interacting with its environment. See an example colaboratory here.

30/01/2019: Dopamine 2.0 now supports general discrete-domain gym environments.

01/11/2018: Download links for each individual checkpoint, to avoid having to download all of the checkpoints.

29/10/2018: Graph definitions now show up in Tensorboard.

16/10/2018: Fixed a subtle bug in the IQN implementation and updated the colab tools, the JSON files, and all the downloadable data.

18/09/2018: Added support for double-DQN style updates for the ImplicitQuantileAgent. Can be enabled via the double_dqn constructor parameter.

18/09/2018: Added support for reporting in-iteration losses directly from the agent to Tensorboard. Set run_experiment.create_agent.debug_mode = True via the configuration file or using the gin_bindings flag to enable it. Control the frequency of writes with the summary_writing_frequency agent constructor parameter (defaults to 500).

27/08/2018: Dopamine launched!
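To make the migration concrete, below is a minimal sketch of a custom network under the new API. It assumes only what the snippet above shows: the agent instantiates self.network(self.num_actions, name=name) and expects the model's call() to return a namedtuple with a q_values field. The class name, layers, and the way the network is handed to the agent are illustrative assumptions, not code from the Dopamine source.

  import collections
  import tensorflow as tf

  # Hypothetical return type mirroring the namedtuple in the migration note above.
  MyNetworkType = collections.namedtuple('my_dqn_network', ['q_values'])

  class MyDQNNetwork(tf.keras.Model):
    """Illustrative Q-network compatible with the keras-based agent API."""

    def __init__(self, num_actions, name=None):
      super(MyDQNNetwork, self).__init__(name=name)
      self.num_actions = num_actions
      self.flatten = tf.keras.layers.Flatten()
      self.hidden = tf.keras.layers.Dense(512, activation='relu')
      self.q_head = tf.keras.layers.Dense(num_actions)

    def call(self, state):
      # Cast observations to float and produce one Q-value per action.
      x = self.flatten(tf.cast(state, tf.float32))
      x = self.hidden(x)
      return MyNetworkType(q_values=self.q_head(x))

  # Assumption: the keras-era agents accept the network class via their
  # constructor (e.g. network=MyDQNNetwork), so that _create_network() builds
  # it as self.network(self.num_actions, name=name).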
Instructions
Install via source
Installing from source allows you to modify the agents and experiments as you please, and is likely to be the pathway of choice for long-term use. These instructions assume that you've already set up your favourite package manager (e.g. apt on Ubuntu, homebrew on Mac OS X), and that a C++ compiler is available from the command-line (almost certainly the case if your favourite package manager works).
The instructions below assume that you will be running Dopamine in a virtual environment. A virtual environment lets you control which dependencies are installed for which program; however, this step is optional and you may choose to ignore it.
Dopamine is a TensorFlow-based framework, and we recommend you also consult the TensorFlow documentation for additional details.
Finally, note that the commands below create a Python 3.6 environment. Dopamine is also compatible with Python 2.7, although some additional steps may be needed during installation.
First install Anaconda, which we will use as the environment manager, then proceed below.
conda create --name dopamine-env python=3.6
conda activate dopamine-env
This will create a directory called dopamine-env in which your virtual environment lives. The last command activates the environment.
Install the dependencies below, based on your operating system, and then finally download the Dopamine source, e.g.
git clone https://github.com/google/dopamine.git
Ubuntu
sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config gym opencv-python tensorflow==1.15
Mac OS X
brew install cmake zlib
pip install absl-py atari-py gin-config gym opencv-python tensorflow==1.15
Running tests
You can test whether the installation was successful by running the following:
cd dopamine
export PYTHONPATH=${PYTHONPATH}:.
python tests/dopamine/atari_init_test.py
If you want to run some of the other tests you will need to pip install mock.
Training agents
Atari games
The entry point to the standard Atari 2600 experiment is dopamine/discrete_domains/train.py. To run the basic DQN agent,
python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin'
By default, this will kick off an experiment lasting 200 million frames. The command-line interface will output statistics about the latest training episode:
[...]
I0824 17:13:33.078342 140196395337472 tf_logging.py:115] gamma: 0.990000
I0824 17:13:33.795608 140196395337472 tf_logging.py:115] Beginning training...
Steps executed: 5903 Episode length: 1203 Return: -19.
To get finer-grained information about the process, you can adjust the experiment parameters in dopamine/agents/dqn/configs/dqn.gin, in particular by reducing Runner.training_steps and Runner.evaluation_steps, which together determine the total number of steps needed to complete an iteration. This is useful if you want to inspect log files or checkpoints, which are generated at the end of each iteration.
More generally, the whole of Dopamine is easily configured using the gin configuration framework.
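For example, the same parameters can be overridden from the command line via the gin_bindings flag mentioned in the changelog above, without editing the gin file itself; the values here are purely illustrative, and the exact flag syntax may differ slightly across Dopamine versions:

python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin' \
  --gin_bindings='Runner.training_steps=1000' \
  --gin_bindings='Runner.evaluation_steps=500'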
Non-Atari discrete environments
We provide sample configuration files for training an agent on Cartpole and Acrobot. For example, to train C51 on Cartpole with default settings, run the following command:
python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/rainbow/configs/c51_cartpole.gin'
You can train Rainbow on Acrobot with the following command:
python -um dopamine.discrete_domains.train \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/rainbow/configs/rainbow_acrobot.gin'
Install as a library
An easy, alternative way to install Dopamine is as a Python library:
# Alternatively brew install, see Mac OS X instructions above.
sudo apt-get update && sudo apt-get install cmake
pip install dopamine-rl
pip install atari-py
Depending on your particular system configuration, you may also need to install zlib (see "Install via source" above).
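Once installed as a library, experiments can also be driven directly from Python rather than through the train.py entry point. Below is a minimal sketch, assuming the load_gin_configs and create_runner helpers used in the project's Colaboratory notebooks are available in your installed version; the base directory, gin file, and binding value are illustrative:

  from dopamine.discrete_domains import run_experiment

  # Illustrative paths; point these at your own log directory and gin file.
  BASE_DIR = '/tmp/dopamine_cartpole'
  GIN_FILE = 'dopamine/agents/rainbow/configs/c51_cartpole.gin'

  # Parse the gin configuration, optionally overriding individual parameters.
  run_experiment.load_gin_configs([GIN_FILE],
                                  gin_bindings=['Runner.training_steps=1000'])

  # Build the Runner (agent plus environment) described by the gin file and train.
  runner = run_experiment.create_runner(BASE_DIR)
  runner.run_experiment()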
Running tests
From the root directory, tests can be run with a command such as:
python -um tests.agents.rainbow.rainbow_agent_test
References
Bellemare et al., The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013.
Machado et al., Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents. Journal of Artificial Intelligence Research, 2018.
Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
Mnih et al., Human-level Control through Deep Reinforcement Learning. Nature, 2015.
Mnih et al., Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the International Conference on Machine Learning, 2016.
Schaul et al., Prioritized Experience Replay. Proceedings of the International Conference on Learning Representations, 2016.
Giving credit
If you use Dopamine in your work, we ask that you cite our white paper. Here is an example BibTeX entry:
@article{castro18dopamine,
  author        = {Pablo Samuel Castro and Subhodeep Moitra and Carles Gelada and Saurabh Kumar and Marc G. Bellemare},
  title         = {Dopamine: {A} {R}esearch {F}ramework for {D}eep {R}einforcement {L}earning},
  year          = {2018},
  url           = {http://arxiv.org/abs/1812.06110},
  archivePrefix = {arXiv}
}