torch-twrl: a reinforcement learning (RL) framework from Twitter

Reader submission 955 2022-10-29

torch-twrl: Reinforcement Learning in Torch

torch-twrl is an RL framework built in Lua/Torch by Twitter.

Installation

Install torch

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

Install torch-twrl

git clone --recursive https://github.com/twitter/torch-twrl.git
cd torch-twrl
luarocks make

Want to play in the gym?

Start a virtual environment, not necessary but it helps keep your installation clean

Download and install OpenAI Gym, gym-http-api requirements, and ffmpeg

pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install gym
pip install -r src/gym-http-api/requirements.txt
brew install ffmpeg

Works so far?

You should have everything you need:

Start your gym_http_server with

python src/gym-http-api/gym_http_server.py

In a new console window (or tab), run the example script (policy gradient agent in environment CartPole-v0)

cd examples
chmod u+x cartpole-pg.sh
./cartpole-pg.sh

This script sets the parameters for the experiment. In detail, here is what it calls:

th run.lua \
    -env 'CartPole-v0' \
    -policy categorical \
    -learningUpdate reinforce \
    -model mlp \
    -optimAlpha 0.9 \
    -timestepsPerBatch 1000 \
    -stepsizeStart 0.3 \
    -gamma 1 \
    -nHiddenLayerSize 10 \
    -gradClip 5 \
    -baselineType padTimeDepAvReturn \
    -beta 0.01 \
    -weightDecay 0 \
    -windowSize 10 \
    -nSteps 1000 \
    -nIterations 1000 \
    -video 100 \
    -optimType rmsprop \
    -verboseUpdate true \
    -uploadResults false \
    -renderAllSteps false

Your results should look something like our results from the OpenAI Gym leaderboard.
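Two of the parameters in the example call above, -gamma and -baselineType, shape the learning signal that a REINFORCE update receives. The following is a minimal Python sketch of that idea, not torch-twrl's Lua code. It assumes, from the name alone, that padTimeDepAvReturn means "average the return at each timestep across the batch, zero-padding episodes that ended early"; the function names are illustrative and the real implementation may differ.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def time_dependent_baseline(batch_returns: List[List[float]]) -> List[float]:
    """Average return per timestep; shorter episodes count as 0 once they end.

    Assumption: this is one plausible reading of 'padTimeDepAvReturn'.
    """
    horizon = max(len(r) for r in batch_returns)
    n_episodes = len(batch_returns)
    return [
        sum(r[t] if t < len(r) else 0.0 for r in batch_returns) / n_episodes
        for t in range(horizon)
    ]


def advantages(batch_rewards: List[List[float]], gamma: float) -> List[List[float]]:
    """Return minus baseline for every step of every episode in the batch."""
    batch_returns = [discounted_returns(r, gamma) for r in batch_rewards]
    baseline = time_dependent_baseline(batch_returns)
    return [[g - baseline[t] for t, g in enumerate(ret)] for ret in batch_returns]
```

With -gamma 1, as in the example call, the return at each step is simply the sum of the remaining rewards, so in CartPole-v0 (reward 1 per step) longer episodes receive positive advantages and shorter ones negative.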

Doesn't work?

Test the gym-http-api

cd src/gym-http-api/
nose2

Start a Gym HTTP server in your virtual environment

python src/gym-http-api/gym_http_server.py

In a new console window (or tab), run torch-twrl tests

luarocks make; th test/test.lua

Dependencies

Testing RL development is a tricky endeavor: it requires well-established, unified baselines and a large community of active developers. The OpenAI Gym provides a great set of example environments for this purpose. Link: https://github.com/openai/gym

The OpenAI Gym is written in Python and expects the algorithms that interact with its environments to be written in Python as well. torch-twrl is compatible with the OpenAI Gym through the Gym HTTP API from OpenAI; gym-http-api is a submodule of torch-twrl.
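To make the HTTP bridge concrete, here is a hedged Python sketch of the kind of JSON requests a non-Python client sends to gym_http_server.py. It assumes the routes documented by gym-http-api (/v1/envs/, /v1/envs/<instance_id>/reset/, /v1/envs/<instance_id>/step/) and a server listening on 127.0.0.1:5000; check both against the submodule in src/gym-http-api. The helper names are illustrative and are not part of torch-twrl.

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:5000"  # assumed default address of gym_http_server.py


def create_env_request(env_id):
    """Describe the request that asks the server to create an environment."""
    return "POST", BASE_URL + "/v1/envs/", {"env_id": env_id}


def reset_request(instance_id):
    """Describe the request that resets an environment instance."""
    return "POST", "{}/v1/envs/{}/reset/".format(BASE_URL, instance_id), {}


def step_request(instance_id, action, render=False):
    """Describe the request that applies one action to an environment instance."""
    url = "{}/v1/envs/{}/step/".format(BASE_URL, instance_id)
    return "POST", url, {"action": action, "render": render}


def send(method, url, payload):
    """Send one JSON request and decode the JSON reply (needs a running server)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage with a server running (response field names are assumptions):
#   instance_id = send(*create_env_request("CartPole-v0"))["instance_id"]
#   observation = send(*reset_request(instance_id))["observation"]
#   result = send(*step_request(instance_id, action=0))
```

Separating "describe the request" from "send it" keeps the protocol visible and lets the request shapes be checked without a server.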

All Lua dependencies should be installed on your first build.

Note: if you make changes, you will need to recompile with

luarocks make

Agents

torch-twrl implements several agents, located in src/agents. Each agent is defined by a model, a policy, and a learning update.

Random
    model: noModel
    policy: random
    learningUpdate: noLearning

TD(Lambda)
    model: qFunction
    policy: egreedy
    learningUpdate: tdLambda - implements temporal difference (Q-learning or SARSA) learning with eligibility traces (replacing or accumulating)

Policy Gradient (Williams, 1992)
    model: mlp - multilayer perceptron, final layer: tanh for continuous, softmax for discrete
    policy: stochasticModelPolicy, normal for continuous actions, categorical for discrete
    learningUpdate: reinforce
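As an illustration of what the TD(Lambda) agent listed above combines, here is a minimal tabular sketch in Python of an epsilon-greedy policy and one TD(lambda) update with a choice of SARSA or Q-learning targets and replacing or accumulating traces. It is a textbook-style sketch under stated assumptions, not torch-twrl's Lua implementation: the function names, the dictionary-based value table, and the parameter names are all illustrative. The Q-learning branch here also does not cut traces after exploratory actions, as Watkins's Q(lambda) would.

```python
import random
from collections import defaultdict


def egreedy(q, state, n_actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))


def td_lambda_update(q, traces, transition, alpha, gamma, lam,
                     n_actions, sarsa=True, replacing=True):
    """Apply one TD(lambda) step to the tabular action values in q.

    q and traces are defaultdict(float) keyed by (state, action).
    """
    state, action, reward, next_state, next_action, done = transition
    if done:
        target = reward
    elif sarsa:  # on-policy: bootstrap from the action actually taken next
        target = reward + gamma * q[(next_state, next_action)]
    else:        # Q-learning: bootstrap from the greedy next action
        target = reward + gamma * max(q[(next_state, a)] for a in range(n_actions))
    delta = target - q[(state, action)]

    # Mark the visited pair: replacing traces reset to 1, accumulating add 1.
    key = (state, action)
    traces[key] = 1.0 if replacing else traces[key] + 1.0

    # Spread the TD error over recently visited pairs, then decay the traces.
    for k in list(traces):
        q[k] += alpha * delta * traces[k]
        traces[k] *= gamma * lam


# Example: two steps of a short episode with 2 actions.
q_table = defaultdict(float)
trace_table = defaultdict(float)
td_lambda_update(q_table, trace_table, (0, 1, 1.0, 1, 0, False),
                 alpha=0.5, gamma=1.0, lam=0.5, n_actions=2)
td_lambda_update(q_table, trace_table, (1, 0, 1.0, 2, 0, True),
                 alpha=0.5, gamma=1.0, lam=0.5, n_actions=2)
```

The second update also raises the value of the first state-action pair, because its eligibility trace is still nonzero; that backward credit assignment is what the traces add over a one-step TD update.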

Important note about agent/environment compatibility:

The OpenAI Gym has many environments and is continuously growing. Some agents may be compatible with only a subset of environments. That is, an agent built for continuous action space environments may not work if the environment expects discrete action spaces.
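A quick way to catch such a mismatch before launching an experiment is to compare the kind of action space a policy produces with the kind the environment expects. The Python sketch below assumes only the mapping stated in the agent list (categorical for discrete actions, normal for continuous ones) and two well-known Gym environments; the tables and the check_agent helper are hypothetical and would need to be extended for other policies and environments.

```python
# Action-space kind each policy drives, per the agent descriptions.
POLICY_ACTION_SPACE = {
    "categorical": "discrete",
    "normal": "continuous",
}

# Illustrative subset of environments; extend as needed.
ENV_ACTION_SPACE = {
    "CartPole-v0": "discrete",
    "Pendulum-v0": "continuous",
}


def check_agent(policy, env_id):
    """Return True if the policy matches the environment, else raise ValueError."""
    produced = POLICY_ACTION_SPACE[policy]
    expected = ENV_ACTION_SPACE[env_id]
    if produced != expected:
        raise ValueError(
            "policy '{}' produces {} actions, but {} expects {} actions".format(
                policy, produced, env_id, expected
            )
        )
    return True
```

For example, check_agent("categorical", "CartPole-v0") passes, while check_agent("normal", "CartPole-v0") raises a ValueError.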

Here is a useful table of the environments, with details on the different variables that may help to configure agents appropriately.

Testing details:

Continuous integration is accomplished by building with Travis. Testing is done with LUAJIT21, LUA51 and LUA52 with compilers gcc and clang.

Tests are defined in the /tests directory, split into a basic unit test set and a Gym integration test set.

Known Issues:

LUA52 and libhash do not work together, so the tilecoding examples fail in LUA52.

Future Work

Automatic policy differentiation with Autograd
Parallel batch sampling
Additional baselines for advantage function computation
Cross Entropy Method (CEM)
Deep Q Learning (DQN)
Double DQN
Asynchronous Advantage Actor-Critic (A3C)
Deep Deterministic Policy Gradient (DDPG)
Trust Region Policy Optimization (TRPO)
Expected-SARSA
True Online-TD

References

Boyan, J., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in neural information processing systems, 369-376.

Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9-44.

Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine learning, 22(1-3), 123-158.

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. Systems, Man and Cybernetics, IEEE Transactions on, (5), 834-846.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.

Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4), 229-256.

License
