2022-11-02
MLComp - a distributed DAG (directed acyclic graph) framework for machine learning
The goal of MLComp is to provide tools for training, inference, and building complex pipelines (especially for computer vision) in a rapid, manageable way. MLComp is compatible with Python 3.6+ on Unix operating systems.
Part of Catalyst Ecosystem. Project manifest.
Features
Amazing UI
Catalyst support
Distributed training
Supervisor that controls computational resources
Synchronization of both code and data
Resource monitoring
Full pause and continue functionality in the UI
Automatic control of requirements
Code dumping (with syntax highlighting in the UI)
Kaggle integration
Hierarchical logging
Grid search
Experiment comparison
Customizable layout system
Contents
Screenshots
Installation
UI
Usage
Docs and examples
Environment variables
Screenshots
Dags
Computers
Reports
Code
Graph
More screenshots
Installation
1. Install the MLComp package

    sudo apt-get install -y \
        libavformat-dev libavcodec-dev libavdevice-dev \
        libavutil-dev libswscale-dev libavresample-dev libavfilter-dev
    pip install mlcomp
    mlcomp init
    mlcomp migrate

2. Set up your environment. See the Environment variables section.

3. Run db, redis, mlcomp-server, mlcomp-workers:

Variant 1: minimal (if you have 1 computer)

Run everything necessary (mlcomp-server, mlcomp-workers, redis-server). This variant uses SQLite:

    mlcomp-server start --daemon=True

Variant 2: full

a. Change your Environment variables to use PostgreSQL.

b. Install rsync on each work computer:

    sudo apt-get install rsync

Ensure that every computer is reachable over SSH at the IP/PORT you specified in the Environment variables file.

rsync will perform the following commands.

To upload:

    rsync -vhru -e "ssh -p {target.port} -o StrictHostKeyChecking=no" \
        {folder}/ {target.user}@{target.ip}:{folder}/ --perms --chmod=777

To download:

    rsync -vhru -e "ssh -p {source.port} -o StrictHostKeyChecking=no" \
        {source.user}@{source.ip}:{folder}/ {folder}/ --perms --chmod=777

c. Install apex for distributed learning.

d. To run postgresql, redis-server, and mlcomp-server, execute on your server computer:

    cd ~/mlcomp/configs/
    docker-compose -f server-compose.yml up -d

e. Run on each worker computer:

    mlcomp-worker start
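The two rsync templates can also be filled in programmatically, which makes it easier to check what will be executed for a given work computer. The following is a minimal sketch, not MLComp's own code: the Computer dataclass and the function names are hypothetical, and the functions only build the argument list without running rsync.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Computer:
    """Hypothetical holder for the user/ip/port placeholders of the templates."""
    user: str
    ip: str
    port: int


def _ssh_option(port: int) -> str:
    # Remote-shell string passed to rsync via -e, as in the templates.
    return f"ssh -p {port} -o StrictHostKeyChecking=no"


def upload_command(folder: str, target: Computer) -> List[str]:
    """Build the upload command: local folder/ -> target:folder/."""
    folder = folder.rstrip("/")  # avoid a doubled trailing slash
    return [
        "rsync", "-vhru", "-e", _ssh_option(target.port),
        f"{folder}/", f"{target.user}@{target.ip}:{folder}/",
        "--perms", "--chmod=777",
    ]


def download_command(folder: str, source: Computer) -> List[str]:
    """Build the download command: source:folder/ -> local folder/."""
    folder = folder.rstrip("/")
    return [
        "rsync", "-vhru", "-e", _ssh_option(source.port),
        f"{source.user}@{source.ip}:{folder}/", f"{folder}/",
        "--perms", "--chmod=777",
    ]
```

Returning a list rather than a single string means the result can be passed to subprocess.run directly, with no shell quoting of the ssh option.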
UI
Web site is available at http://{WEB_HOST}:{WEB_PORT}
By default, it is http://localhost:4201
The front end is built with AngularJS.
If you want to change it, see the front end's README page.
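As an illustration of how the site address follows from WEB_HOST and WEB_PORT, the sketch below reads the two values from .env-style text and builds the URL. It assumes the file consists of plain KEY=VALUE lines, which the text does not confirm; parse_env and site_url are hypothetical helper names, and the fallbacks are taken from the default URL stated above.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (assumed format)."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def site_url(env: Dict[str, str]) -> str:
    """Build http://{WEB_HOST}:{WEB_PORT}, falling back to localhost:4201."""
    host = env.get("WEB_HOST", "localhost")
    port = env.get("WEB_PORT", "4201")
    return f"http://{host}:{port}"
```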
Usage
Run
mlcomp dag PATH_TO_CONFIG.yml
This command copies the files of the directory to the database.
The server then schedules the DAG, taking free resources into account.
For more information, see the Docs.
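To make the idea of scheduling a DAG against free resources concrete, here is a toy sketch. It is not MLComp's scheduler: the Task fields, the use of a GPU count as the only resource, and the function name pick_runnable are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Task:
    """Hypothetical task: a name, a GPU requirement, and upstream dependencies."""
    name: str
    gpus: int = 0
    depends_on: List[str] = field(default_factory=list)


def pick_runnable(tasks: List[Task], done: Set[str], running: Set[str],
                  free_gpus: int) -> List[str]:
    """Return the names of tasks that can start now.

    A task may start when all of its dependencies are done and enough
    GPUs are still free after the tasks picked before it in this pass.
    """
    started: List[str] = []
    for task in tasks:
        if task.name in done or task.name in running:
            continue
        if not all(dep in done for dep in task.depends_on):
            continue  # upstream work is not finished yet
        if task.gpus > free_gpus:
            continue  # not enough free resources right now
        free_gpus -= task.gpus
        started.append(task.name)
    return started
```

Calling the function again each time a task finishes, with updated done, running, and free_gpus values, walks the graph in dependency order without ever exceeding the available GPUs.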
Docs and examples
You can find advanced tutorials and MLComp best practices in the examples folder of the repository.
The FileSync tutorial describes the data synchronization mechanism.
Environment variables
The single file used to set up your computer's environment is located at ~/mlcomp/configs/.env
ROOT_FOLDER. Folder to save MLComp files: configs, db, tasks, etc.
TOKEN. Site security token. Please change it to any string of your own
DB_TYPE. Either SQLITE or POSTGRESQL
POSTGRES_DB. PostgreSQL db name
POSTGRES_USER. PostgreSQL user
POSTGRES_PASSWORD. PostgreSQL password
POSTGRES_HOST. PostgreSQL host
PGDATA. PostgreSQL db files location
REDIS_HOST. Redis host
REDIS_PORT. Redis port
REDIS_PASSWORD. Redis password
WEB_HOST. MLComp site host. 0.0.0.0 means it is available from everywhere
WEB_PORT. MLComp site port
CONSOLE_LOG_LEVEL. Log level for output to the console
DB_LOG_LEVEL. Log level for output to the database
IP. IP of a work computer. The work computer must be accessible from other work computers at this IP/PORT
PORT. Port of a work computer. The work computer must be accessible from other work computers at this IP/PORT (SSH protocol)
MASTER_PORT_RANGE. Distributed port range for a work computer. 29500-29510 means that if this work computer is the master in distributed learning, it will use the first free port from this range. Ranges of different work computers must not overlap
NCCL_SOCKET_IFNAME. NCCL network interface
FILE_SYNC_INTERVAL. File sync interval in seconds. 0 means file sync is off
WORKER_USAGE_INTERVAL. Interval in seconds for writing worker usage to the DB
INSTALL_DEPENDENCIES. True/False. Whether to install dependent libraries
SYNC_WITH_THIS_COMPUTER. True/False. If False, the other computers will not sync with this one
CAN_PROCESS_TASKS. True/False. If False, this computer does not process tasks
You can list your network interfaces with the ifconfig command. For NCCL_SOCKET_IFNAME, see the NVIDIA documentation.
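The MASTER_PORT_RANGE rules (first free port from the range, no overlap between work computers) can be sketched as follows. This is an illustration under stated assumptions, not MLComp's implementation: the range is treated as inclusive on both ends, a port counts as free if a TCP socket can bind to it, and all function names are hypothetical.

```python
import socket
from typing import Optional, Tuple

PortRange = Tuple[int, int]


def parse_port_range(value: str) -> PortRange:
    """Parse a string such as '29500-29510' into (first, last), assumed inclusive."""
    first, last = (int(part) for part in value.split("-"))
    if first > last:
        raise ValueError(f"invalid port range: {value}")
    return first, last


def ranges_overlap(a: PortRange, b: PortRange) -> bool:
    """True if two inclusive ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def first_free_port(port_range: PortRange, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in the range that can be bound, or None if all are taken."""
    first, last = port_range
    for port in range(first, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted, try the next one
            return port
    return None
```

Running ranges_overlap over every pair of configured ranges before starting the workers is a cheap way to catch a misconfigured .env file early.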