2022-10-22
Cola: A Distributed Crawler Framework
Cola: high-level distributed crawling framework
Overview
Cola is a high-level distributed crawling framework, used to crawl pages and extract structured data from websites. It provides a simple, fast, yet flexible way to achieve your data acquisition objectives. Users write a single piece of code that runs in both local and distributed modes.
Requirements
Python 2.7 (Python 3+ will be supported later)
Works on Linux, Windows and Mac OS X
Install
The quick way:
pip install cola
Or download the source code and run:
python setup.py install
Write applications
Documentation will be updated soon; for now, refer to the wiki or weibo example applications.
Run applications
For the wiki or weibo app, first install its dependencies. Taking weibo as an example:
pip install -r /path/to/cola/app/weibo/requirements.txt
Local mode
To let your application support local mode, add the following code to its entry point:
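import os

from cola.context import Context

if __name__ == '__main__':
    # Run in local mode, using the job defined in this file's directory
    ctx = Context(local_mode=True)
    ctx.run_job(os.path.dirname(os.path.abspath(__file__)))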
Then run the application:
python __init__.py
Stop the local job with Ctrl+C.
Distributed mode
Start the master:
coca master -s [ip:port]
Start one or more workers:
coca worker -s -m [ip:port]
Then run the application (weibo as an example):
coca job -u /path/to/cola/app/weibo -r
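As a rough mental model only, a master/worker crawler can be pictured as a master feeding a shared task queue that several workers drain. The sketch below is an in-process analogy written for Python 3 with threads; it is not Cola's implementation. In Cola the master and workers are separate processes connected through the addresses passed to coca, and `fake_fetch` here is a hypothetical stand-in for downloading and parsing a page.

```python
import queue
import threading


def fake_fetch(url):
    # Hypothetical placeholder for "download and parse the page".
    return url.upper()


def worker(tasks, results, lock):
    # Each worker keeps pulling URLs until it receives the None sentinel.
    while True:
        url = tasks.get()
        if url is None:
            tasks.task_done()
            break
        page = fake_fetch(url)
        with lock:
            results[url] = page
        tasks.task_done()


def run_master(urls, num_workers=3):
    # The "master" starts the workers, hands out URLs, then signals shutdown.
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Calling `run_master(["http://a", "http://b"])` returns a dict mapping each URL to its "fetched" result; adding workers only changes how the queue is drained, not the code each worker runs.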
Coca command
Coca is a convenient command-line tool for managing the whole Cola environment.
master
Kill the master to stop the whole cluster:
coca master -k
job
List all jobs:
coca job -m [ip:port] -l
Example output:
list jobs at master: 10.211.55.2:11103
====> job id: 8ZcGfAqHmzc, job description: sina weibo crawler, status: stopped
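If you want to script around the job list, a small parser can pull out the fields. This is a sketch that assumes every job line looks like the single sample shown here (`====> job id: ..., job description: ..., status: ...`); the real output format may vary between Cola versions, and `parse_job_list` is a hypothetical helper, not part of coca.

```python
import re

# Pattern inferred from the one sample line; adjust if the real output differs.
JOB_LINE = re.compile(
    r"====> job id: (?P<id>[^,\s]+), "
    r"job description: (?P<desc>.*), "
    r"status: (?P<status>\S+)$"
)


def parse_job_list(output):
    """Return one dict (id, desc, status) per job line found in the output."""
    jobs = []
    for line in output.splitlines():
        m = JOB_LINE.search(line.strip())
        if m:
            jobs.append(m.groupdict())
    return jobs
```

The description group is greedy, so descriptions that themselves contain commas are still captured up to the final `, status:` field.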
You can run a job shown in the list above:
coca job -r 8ZcGfAqHmzc
In fact, you don't have to type the complete job ID:
coca job -r 8Z
An abbreviated job ID is enough as long as it does not match more than one job.
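One plausible way such abbreviation can work is unique-prefix matching, as in the `8Z` example. The sketch below assumes prefix matching; it is a hypothetical helper, not coca's actual code, and the ID `8ZxYz` in the usage note is made up to show the ambiguous case.

```python
def resolve_job_id(prefix, job_ids):
    """Return the single job ID that starts with `prefix`.

    Raises KeyError if no ID matches and ValueError if several do.
    """
    matches = [job_id for job_id in job_ids if job_id.startswith(prefix)]
    if not matches:
        raise KeyError("no job matches %r" % prefix)
    if len(matches) > 1:
        raise ValueError("prefix %r is ambiguous: %s"
                         % (prefix, ", ".join(sorted(matches))))
    return matches[0]
```

With only `8ZcGfAqHmzc` registered, `resolve_job_id("8Z", ...)` returns the full ID; if a hypothetical `8ZxYz` also existed, the same prefix would be rejected and a longer one such as `8Zc` would be needed.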
To check the status of a running job:
coca job -t 8Z
Status information, such as the job's runtime counters, is printed to the terminal.
You can kill a job with the -k option:
coca job -k 8Z
startproject
You can create a new application with this command:
coca startproject colatest
Remember, the help option is always useful:
coca -h
or
coca master -h
Notes
Chinese docs (wiki).
Donation
Cola is a non-profit project, currently maintained by me alone, so any donation will be an encouragement for further improvement of the Cola project.
Alipay & Paypal: qinxuye@gmail.com