torch-twrl is a reinforcement learning (RL) framework from Twitter.
torch-twrl: Reinforcement Learning in Torch
torch-twrl is an RL framework built in Lua/Torch by Twitter.
Installation
Install torch
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
Install torch-twrl
git clone --recursive https://github.com/twitter/torch-twrl.git
cd torch-twrl
luarocks make
Want to play in the gym?
Start a virtual environment (not necessary, but it helps keep your installation clean), then download and install OpenAI Gym, the gym-http-api requirements, and ffmpeg:
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install gym
pip install -r src/gym-http-api/requirements.txt
brew install ffmpeg
Works so far?
You should have everything you need:
Start your gym_http_server with
python src/gym-http-api/gym_http_server.py
In a new console window (or tab), run the example script (a policy gradient agent in the CartPole-v0 environment):
cd examples
chmod u+x cartpole-pg.sh
./cartpole-pg.sh
This script sets the parameters for the experiment; in detail, here is what it calls:
th run.lua \
  -env 'CartPole-v0' \
  -policy categorical \
  -learningUpdate reinforce \
  -model mlp \
  -optimAlpha 0.9 \
  -timestepsPerBatch 1000 \
  -stepsizeStart 0.3 \
  -gamma 1 \
  -nHiddenLayerSize 10 \
  -gradClip 5 \
  -baselineType padTimeDepAvReturn \
  -beta 0.01 \
  -weightDecay 0 \
  -windowSize 10 \
  -nSteps 1000 \
  -nIterations 1000 \
  -video 100 \
  -optimType rmsprop \
  -verboseUpdate true \
  -uploadResults false \
  -renderAllSteps false
Your results should look something like our results on the OpenAI Gym leaderboard.
Doesn't work?
Test the gym-http-api
cd src/gym-http-api/
nose2
Start a Gym HTTP server in your virtual environment
python src/gym-http-api/gym_http_server.py
In a new console window (or tab), run torch-twrl tests
luarocks make; th test/test.lua
Dependencies
Testing RL development is a tricky endeavor; it requires well-established, unified baselines and a large community of active developers. The OpenAI Gym provides a great set of example environments for this purpose. Link: https://github.com/openai/gym
The OpenAI Gym is written in Python, and it expects the algorithms that interact with its environments to be written in Python as well. torch-twrl is compatible with the OpenAI Gym through OpenAI's Gym HTTP API; gym-http-api is included as a submodule of torch-twrl.
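To make the bridge concrete, here is a rough sketch of the HTTP round trip, written with plain LuaSocket and lua-cjson rather than torch-twrl's own client code. The endpoint paths and response fields follow the gym-http-api README, and the post helper and hard-coded action are illustrative assumptions; treat this as a sketch to verify against your gym-http-api version, not as the torch-twrl implementation.

-- Sketch: talk to a locally running gym_http_server over REST.
-- Endpoints per the gym-http-api README; helper names are illustrative only.
local http  = require 'socket.http'
local ltn12 = require 'ltn12'
local json  = require 'cjson'

local base = 'http://127.0.0.1:5000'

-- helper: POST a JSON body to the server and decode the JSON reply
local function post(path, body)
  local payload = json.encode(body or {})
  local out = {}
  http.request{
    url     = base .. path,
    method  = 'POST',
    headers = { ['Content-Type']   = 'application/json',
                ['Content-Length'] = tostring(#payload) },
    source  = ltn12.source.string(payload),
    sink    = ltn12.sink.table(out),
  }
  return json.decode(table.concat(out))
end

-- create a CartPole-v0 instance, reset it, then take a single (arbitrary) action
local env  = post('/v1/envs/', { env_id = 'CartPole-v0' })
local obs  = post('/v1/envs/' .. env.instance_id .. '/reset/')
local step = post('/v1/envs/' .. env.instance_id .. '/step/', { action = 0 })
print(step.reward, step.done)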
All Lua dependencies should be installed on your first build.
Note: if you make changes, you will need to recompile with
luarocks make
Agents
torch-twrl implements several agents; they are located in src/agents. Agents are defined by a model, a policy, and a learning update.
Random
- model: noModel
- policy: random
- learningUpdate: noLearning

TD(Lambda)
- model: qFunction
- policy: egreedy
- learningUpdate: tdLambda - implements temporal difference (Q-learning or SARSA) learning with eligibility traces (replacing or accumulating)

Policy Gradient (Williams, 1992)
- model: mlp - multilayer perceptron; final layer: tanh for continuous, softmax for discrete
- policy: stochasticModelPolicy; normal for continuous actions, categorical for discrete
- learningUpdate: reinforce (a simplified sketch of this combination follows the list)
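As a deliberately simplified illustration of the mlp + categorical + reinforce combination above, here is a sketch written against plain torch/nn rather than torch-twrl's own modules. The layer sizes, function names, and the single-step update are assumptions for illustration only, not the library's actual agent code.

-- mlp model: one hidden layer, softmax output for a discrete action space
require 'torch'
require 'nn'

local nStates, nHidden, nActions = 4, 10, 2   -- CartPole-v0 dimensions

local model = nn.Sequential()
  :add(nn.Linear(nStates, nHidden))
  :add(nn.Tanh())
  :add(nn.Linear(nHidden, nActions))
  :add(nn.SoftMax())

-- categorical policy: sample an action from the softmax probabilities
local function sampleAction(state)
  local probs = model:forward(state)
  return torch.multinomial(probs, 1)[1]
end

-- reinforce update for one step: ascend on return * grad log pi(action | state)
local function reinforceStep(state, action, ret, stepsize)
  local probs = model:forward(state)
  local gradOut = torch.zeros(probs:size())
  gradOut[action] = -ret / probs[action]   -- negated so updateParameters (descent) ascends the objective
  model:zeroGradParameters()
  model:backward(state, gradOut)
  model:updateParameters(stepsize)         -- params <- params - stepsize * gradParams
end

-- usage: a = sampleAction(obs); step the environment; reinforceStep(obs, a, G, 0.01)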
Important note about agent/environment compatibility:
The OpenAI Gym has many environments and is continuously growing. Some agents may be compatible with only a subset of environments. That is, an agent built for continuous action space environments may not work if the environment expects discrete action spaces.
Here is a useful table of the environments, with details on the different variables that may help in configuring agents appropriately.
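If you are unsure which kind of agent an environment expects, one option is to ask the running Gym HTTP server for the environment's action space before picking a policy. The sketch below assumes the /v1/envs/<instance_id>/action_space/ endpoint from gym-http-api and a response with an info field (name, n); both are assumptions to verify against your gym-http-api version.

-- Sketch: query an environment's action space over the Gym HTTP API.
local http = require 'socket.http'
local json = require 'cjson'

local function actionSpaceInfo(base, instanceId)
  local body = http.request(base .. '/v1/envs/' .. instanceId .. '/action_space/')
  return json.decode(body).info
end

-- For CartPole-v0 this is expected to report a Discrete space, which calls for
-- the categorical policy; a Box space would call for the normal (continuous) policy.
local info = actionSpaceInfo('http://127.0.0.1:5000', 'YOUR_INSTANCE_ID')
print(info.name, info.n)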
Testing details:
Continuous integration is accomplished by building with Travis. Testing is done with LUAJIT21, LUA51, and LUA52, using both the gcc and clang compilers.
Tests are defined in the test directory, with a basic unit test set and a separate Gym integration test set.
Known Issues:
libhash does not work with LUA52, so the tilecoding examples fail in LUA52.
Future Work
- Automatic policy differentiation with Autograd
- Parallel batch sampling
- Additional baselines for advantage function computation
- Cross Entropy Method (CEM)
- Deep Q Learning (DQN)
- Double DQN
- Asynchronous Advantage Actor-Critic (A3C)
- Deep Deterministic Policy Gradient (DDPG)
- Trust Region Policy Optimization (TRPO)
- Expected-SARSA
- True Online-TD
References
Boyan, J., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in Neural Information Processing Systems, 369-376.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9-44.
Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, (5), 834-846.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.
License
torch-twrl is released under the MIT License. Copyright (c) 2016 Twitter, Inc.