Status: Maintenance (expect bug fixes and minor updates)

Procgen Benchmark

[Blog Post] [Paper]

16 simple-to-use procedurally-generated gym environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills. The environments run at high speed (thousands of steps per second) on a single core.

We're currently running a competition which uses these environments to measure sample efficiency and generalization in RL. You can learn more and register here.

These environments are associated with the paper Leveraging Procedural Generation to Benchmark Reinforcement Learning (citation). The code for running some experiments from the paper is in the train-procgen repo. For those familiar with the original CoinRun environment, be sure to read the updated CoinRun description below as there have been subtle changes to the environment.

Compared to Gym Retro, these environments are:

- Faster: Gym Retro environments are already fast, but Procgen environments can run >4x faster.
- Randomized: Gym Retro environments are always the same, so you can memorize a sequence of actions that will get the highest reward. Procgen environments are randomized so this is not possible.
- Customizable: If you install from source, you can perform experiments where you change the environments, or build your own environments. The environment-specific code for each environment is often less than 300 lines. This is almost impossible with Gym Retro.

Supported platforms:

- Windows 10
- macOS 10.14 (Mojave), 10.15 (Catalina)
- Linux (manylinux2010)

Supported pythons:

- 3.6 64-bit
- 3.7 64-bit
- 3.8 64-bit

Supported CPUs:

Must have at least AVX

Installation

First make sure you have a supported version of python:

# run these commands to check for the correct python version
python -c "import sys; assert (3,6,0) <= sys.version_info <= (3,9,0), 'python is incorrect version'; print('ok')"
python -c "import platform; assert platform.architecture()[0] == '64bit', 'python is not 64-bit'; print('ok')"

To install the wheel:

pip install procgen

If you get an error like "Could not find a version that satisfies the requirement procgen", please upgrade pip: pip install --upgrade pip.

To try an environment out interactively:

python -m procgen.interactive --env-name coinrun

The keys are: left/right/up/down + q, w, e, a, s, d for the different (environment-dependent) actions. Your score is displayed as "episode_return" in the lower left. At the end of an episode, you can see your final "episode_return" as well as "prev_level_complete" which will be 1 if you successfully completed the level.

To create an instance of the gym environment:

import gym
env = gym.make("procgen:procgen-coinrun-v0")
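
Once created, the environment follows the standard Gym API. As a minimal sketch (episode length and rewards are game-dependent), a random agent might look like this:

import gym

env = gym.make("procgen:procgen-coinrun-v0")
obs = env.reset()
episode_return = 0.0
while True:
    # sample one of the 15 discrete button combos at random
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_return += reward
    if done:
        print("episode_return:", episode_return)
        break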

To create an instance of the gym3 (vectorized) environment:

from procgen import ProcgenGym3Env
env = ProcgenGym3Env(num=1, env_name="coinrun")
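
gym3 environments use act()/observe() in place of step()/reset(). The sketch below follows the bundled random-agent example (see procgen.examples), sampling random actions for a few steps:

from gym3 import types_np
from procgen import ProcgenGym3Env

env = ProcgenGym3Env(num=1, env_name="coinrun")
for step in range(10):
    # act() takes a batch of actions, one per environment in the vector
    env.act(types_np.sample(env.ac_space, bshape=(env.num,)))
    rew, obs, first = env.observe()
    print(f"step {step} reward {rew} first {first}")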

Docker

A Dockerfile is included to demonstrate a minimal Docker-based setup that works for running the random agent example.

docker build docker --tag procgen
docker run --rm -it procgen python3 -m procgen.examples.random_agent_gym

Environments

The observation space is a box space with the RGB pixels the agent sees in a numpy array of shape (64, 64, 3). The expected step rate for a human player is 15 Hz.

The action space is Discrete(15) for which button combo to press. The button combos are defined in env.py.

If you are using the vectorized environment, the observation space is a dictionary space where the pixels are under the key "rgb".
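
A quick sketch that inspects both interfaces (the shapes in the comments assume the defaults described above):

import gym
from procgen import ProcgenGym3Env

env = gym.make("procgen:procgen-coinrun-v0")
print(env.observation_space)   # Box with shape (64, 64, 3), uint8 RGB pixels
print(env.action_space)        # Discrete(15)

venv = ProcgenGym3Env(num=4, env_name="coinrun")
rew, obs, first = venv.observe()
print(obs["rgb"].shape)        # (4, 64, 64, 3): one frame per environment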

Here are the 16 environments: bigfish, bossfight, caveflyer, chaser, climber, coinrun, dodgeball, fruitbot, heist, jumper, leaper, maze, miner, ninja, plunder, and starpilot.

Environment Options

- env_name - Name of environment, or comma-separated list of environment names to instantiate as each env in the VecEnv.
- num_levels=0 - The number of unique levels that can be generated. Set to 0 to use unlimited levels.
- start_level=0 - The lowest seed that will be used to generate levels. 'start_level' and 'num_levels' fully specify the set of possible levels.
- paint_vel_info=False - Paint player velocity info in the top left corner. Only supported by certain games.
- use_generated_assets=False - Use randomly generated assets in place of human designed assets.
- debug=False - Set to True to use the debug build if building from source.
- debug_mode=0 - A useful flag that's passed through to procgen envs. Use however you want during debugging.
- center_agent=True - Determines whether observations are centered on the agent or display the full level. Override at your own risk.
- use_sequential_levels=False - When you reach the end of a level, the episode is ended and a new level is selected. If use_sequential_levels is set to True, reaching the end of a level does not end the episode, and the seed for the new level is derived from the current level seed. If you combine this with start_level=<some seed> and num_levels=1, you can have a single linear series of levels similar to a gym-retro or ALE game.
- distribution_mode="hard" - What variant of the levels to use; the options are "easy", "hard", "extreme", "memory", "exploration". All games support "easy" and "hard", while other options are game-specific. The default is "hard". Switching to "easy" will reduce the number of timesteps required to solve each game and is useful for testing or when working with limited compute resources.
- use_backgrounds=True - Normally games use human designed backgrounds; if this flag is set to False, games will use pure black backgrounds.
- restrict_themes=False - Some games select assets from multiple themes; if this flag is set to True, those games will only use a single theme.
- use_monochrome_assets=False - If set to True, games will use monochromatic rectangles instead of human designed assets. Best used with restrict_themes=True.

Here's how to set the options:

import gym
env = gym.make("procgen:procgen-coinrun-v0", start_level=0, num_levels=1)
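
Any of the options listed above can be combined as keyword arguments in the same way; for example, a sketch using several of them at once:

import gym

# easy variant with plain black backgrounds and a single, monochromatic theme
env = gym.make(
    "procgen:procgen-coinrun-v0",
    distribution_mode="easy",
    use_backgrounds=False,
    restrict_themes=True,
    use_monochrome_assets=True,
)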

Since the gym environment is adapted from a gym3 environment, early calls to reset() are disallowed and the render() method does not do anything. To render the environment, pass render=True, which will set render_human=True on the environment and wrap it in a gym3.ViewerWrapper.
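
For example, to watch an agent (here a random one) play in a viewer window:

import gym

# render=True wraps the environment in a gym3.ViewerWrapper
env = gym.make("procgen:procgen-coinrun-v0", render=True)
env.reset()
for _ in range(100):
    env.step(env.action_space.sample())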

For the gym3 vectorized environment:

from procgen import ProcgenGym3Env
env = ProcgenGym3Env(num=1, env_name="coinrun", start_level=0, num_levels=1)

Saving and loading the environment state

If you are using the gym3 interface, you can save and load the environment state:

from procgen import ProcgenGym3Env
env = ProcgenGym3Env(num=1, env_name="coinrun", start_level=0, num_levels=1)
states = env.callmethod("get_state")
env.callmethod("set_state", states)

The get_state call returns a list of byte strings representing the state of each game in the vectorized environment; passing such a list to set_state restores those states.
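
Because a saved state fully describes a game, restoring it and replaying the same actions should reproduce the same observations. A minimal sketch of that round trip (building on the environment above):

import numpy as np
from gym3 import types_np
from procgen import ProcgenGym3Env

env = ProcgenGym3Env(num=1, env_name="coinrun", start_level=0, num_levels=1)
states = env.callmethod("get_state")               # one byte string per game
actions = types_np.sample(env.ac_space, bshape=(env.num,))

env.act(actions)
_, obs_a, _ = env.observe()

env.callmethod("set_state", states)                # rewind to the saved state
env.act(actions)                                   # replay the same actions
_, obs_b, _ = env.observe()

assert np.array_equal(obs_a["rgb"], obs_b["rgb"])  # identical frames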

Notes

- You should depend on a specific version of this library (using ==) for your experiments to ensure they are reproducible. You can get the current installed version with pip show procgen.
- This library does not require or make use of GPUs.
- While the library should be thread safe, each individual environment instance should only be used from a single thread. The library is not fork safe unless you set num_threads=0. Even if you do that, Qt is not guaranteed to be fork safe, so you should probably create the environment after forking or not use fork at all.
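
For the fork-safety note above, a minimal sketch of the num_threads=0 option:

from procgen import ProcgenGym3Env

# num_threads=0 avoids background worker threads (see the fork-safety note)
env = ProcgenGym3Env(num=1, env_name="coinrun", num_threads=0)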

Install from Source

If you want to change the environments or create new ones, you should build from source. You can get miniconda from https://docs.conda.io/en/latest/miniconda.html if you don't have it, or install the dependencies from environment.yml manually. On Windows you will also need "Visual Studio 15 2017" installed.

git clone git@github.com:openai/procgen.git
cd procgen
conda env update --name procgen --file environment.yml
conda activate procgen
pip install -e .
# this should say "building procgen...done"
python -c "from procgen import ProcgenGym3Env; ProcgenGym3Env(num=1, env_name='coinrun')"
# this should create a window where you can play the coinrun environment
python -m procgen.interactive

The environment code is in C++ and is compiled into a shared library exposing the gym3.libenv C interface that is then loaded by python. The C++ code uses Qt for drawing.

Create a new environment

Once you have installed from source, you can customize an existing environment or make a new environment of your own. If you want to create a fast C++ 2D environment, you can fork this repo and do the following:

1. Copy src/games/bigfish.cpp to src/games/<name>.cpp
2. Replace BigFish with <Name> and "bigfish" with "<name>" in your cpp file
3. Add src/games/<name>.cpp to CMakeLists.txt
4. Run python -m procgen.interactive --env-name <name> to test it out

This repo includes a Travis configuration that will compile your environment and build python wheels for easy installation. In order to have this build more quickly by caching the Qt compilation, you will want to configure a GCS bucket in common.py and set up service account credentials.

Add information to the info dictionary

To export game information from the C++ game code to Python, you can define a new info_type. info_types appear in the info dict returned by the gym environment, or in get_info() from the gym3 environment.

To define a new one, add the following code to the VecGame constructor here: vecgame.cpp

{
    struct libenv_tensortype s;
    strcpy(s.name, "heist_key_count");
    s.scalar_type = LIBENV_SCALAR_TYPE_DISCRETE;
    s.dtype = LIBENV_DTYPE_INT32;
    s.ndim = 0;
    s.low.int32 = 0;
    s.high.int32 = INT32_MAX;
    info_types.push_back(s);
}

This lets the Python code know to expect a single integer and expose it in the info dict.

After adding that, you can add the following code to heist.cpp:

void observe() override {
    Game::observe();
    int32_t key_count = 0;
    for (const auto &has_key : has_keys) {
        if (has_key) {
            key_count++;
        }
    }
    *(int32_t *)(info_bufs[info_name_to_offset.at("heist_key_count")]) = key_count;
}

This populates the heist_key_count info value each time the environment is observed.
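
On the Python side, the new value then appears alongside the built-in keys. A sketch reading it through the gym interface (this assumes you built from source with the heist_key_count changes above):

import gym

env = gym.make("procgen:procgen-heist-v0")
env.reset()
_, _, _, info = env.step(env.action_space.sample())
# the custom info type shows up in the info dict returned by step()
print(info["heist_key_count"])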

If you run the interactive script (making sure that you installed from source), the new keys should appear in the bottom left hand corner:

python -m procgen.interactive --env-name heist

Changelog

See CHANGES for changes present in each release.

Contributing

See CONTRIBUTING for information on contributing.

Assets

See ASSET_LICENSES for asset license information.

Citation

Please cite using the following bibtex entry:

@article{cobbe2019procgen,
  title={Leveraging Procedural Generation to Benchmark Reinforcement Learning},
  author={Cobbe, Karl and Hesse, Christopher and Hilton, Jacob and Schulman, John},
  journal={arXiv preprint arXiv:1912.01588},
  year={2019}
}
