plait.py - a program for generating fake data from composable yaml templates

Reader submission, 2022-10-29

plait.py

plait.py is a program for generating fake data from composable yaml templates.

The idea behind plait.py is that it should be easy to model fake data that has an interesting shape. Many fake data generators model their data as a collection of IID (independent and identically distributed) variables; with plait.py we can stitch those variables together into a more coherent model.
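To make the IID point concrete, here is a small plain-Python sketch that does not use plait.py. The 0.5 and 0.8 job probabilities are made-up numbers for illustration; the age distribution and working age mirror the example template further down. Drawing each field independently produces people who are too young to work yet hold a job, while conditioning one field on another rules that out.

```python
import random

WORKING_AGE = 18  # mirrors $working_age in the example template

def iid_person(rng):
    # every field is drawn on its own, ignoring the others
    age = max(10, rng.gauss(25, 5))
    has_job = rng.random() < 0.5  # hypothetical job rate
    return {"age": age, "has_job": has_job}

def stitched_person(rng):
    # has_job is conditioned on age, so the fields stay consistent
    age = max(10, rng.gauss(25, 5))
    has_job = age > WORKING_AGE and rng.random() < 0.8  # hypothetical job rate
    return {"age": age, "has_job": has_job}

def underage_workers(people):
    # count records that break the "no job below working age" rule
    return sum(1 for p in people if p["has_job"] and p["age"] <= WORKING_AGE)

rng = random.Random(0)
iid = [iid_person(rng) for _ in range(1000)]
stitched = [stitched_person(rng) for _ in range(1000)]
print(underage_workers(iid), underage_workers(stitched))
```

The IID generator yields some inconsistent records; the stitched one yields none by construction.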

some example uses for plait.py are:

- generating mock application data in test environments
- validating the usefulness of statistical techniques
- creating synthetic datasets for performance tuning databases

features

- declarative syntax
- use basic faker.rb fields with #{} interpolators
- sample and join data from CSV files
- lambda expressions, switch and mixture fields
- nested and composable templates
- static variables and hidden fields
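The `#{}` interpolators resolve dotted faker.rb keys such as `name.name` to random values. The sketch below shows the general idea with a tiny hypothetical `FAKER_DATA` table and a hypothetical `interpolate` helper; it is not plait.py's implementation, and the real faker.rb data is far larger.

```python
import random
import re

# Hypothetical stand-in for faker.rb locale data; entries may themselves
# contain #{} references, which are resolved recursively.
FAKER_DATA = {
    "name": {
        "first_name": ["Ada", "Grace", "Alan"],
        "last_name": ["Lovelace", "Hopper", "Turing"],
        "name": ["#{first_name} #{last_name}"],
    },
    "job": {"title": ["Engineer", "Analyst", "Designer"]},
}

PATTERN = re.compile(r"#\{([\w.]+)\}")

def interpolate(text, rng, namespace=None, depth=0):
    """Replace each #{ns.key} with a random entry, resolving nested refs."""
    if depth > 10:
        raise ValueError("interpolation nested too deeply")

    def resolve(match):
        path = match.group(1)
        if "." in path:
            ns, key = path.split(".", 1)
        else:
            # bare keys such as #{first_name} refer to the enclosing namespace
            ns, key = namespace, path
        choice = rng.choice(FAKER_DATA[ns][key])
        return interpolate(choice, rng, ns, depth + 1)

    return PATTERN.sub(resolve, text)

rng = random.Random(42)
print(interpolate("#{name.name}, #{job.title}", rng))
```

The depth limit guards against self-referencing entries looping forever.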

an example template

```yaml
# a person generator
define:
  min_age: 10
  minor_age: 13
  working_age: 18

fields:
  age:
    random: gauss(25, 5)
    # minimum age is $min_age
    finalize: max($min_age, value)

  gender:
    mixture:
      - value: M
      - value: F

  name: "#{name.name}"

  job:
    value: "#{job.title}"
    onlyif: this.age > $working_age

  address:
    template: address/usa.yaml

  phone:
    # add a phone if the person is older than the minor age
    template: device/phone.yaml
    onlyif: this.age > ${minor_age}

  # we model our height as a gaussian that varies based on
  # age and gender
  height:
    lambda: this._base_height * this._age_factor

  _base_height:
    switch:
      - onlyif: this.gender == "F"
        random: gauss(60, 5)
      - onlyif: this.gender == "M"
        random: gauss(70, 5)

  _age_factor:
    switch:
      - onlyif: this.age < 15
        lambda: 1 - (20 - (this.age + 5)) / 20
      - default:
          value: 1
```
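A rough plain-Python reading of the numeric part of this template may help when reasoning about what it produces. It is a sketch, not how plait.py evaluates templates: `gen_person` is a hypothetical name, the mixture is assumed to weight `M` and `F` equally, and the faker, sub-template and `onlyif` fields (`name`, `job`, `address`, `phone`) are left out.

```python
import random

# value copied from the template's define: block
MIN_AGE = 10

def gen_person(rng):
    # age: random: gauss(25, 5), then finalize: max($min_age, value)
    age = max(MIN_AGE, rng.gauss(25, 5))

    # gender: a two-way mixture, assumed here to be equally weighted
    gender = rng.choice(["M", "F"])

    # _base_height: a switch whose branch depends on gender
    if gender == "F":
        base_height = rng.gauss(60, 5)
    else:
        base_height = rng.gauss(70, 5)

    # _age_factor: an onlyif branch for children, otherwise the default of 1
    if age < 15:
        age_factor = 1 - (20 - (age + 5)) / 20
    else:
        age_factor = 1

    # height: lambda combining the two underscore-prefixed intermediates,
    # which this sketch keeps out of the returned record
    return {"age": age, "gender": gender, "height": base_height * age_factor}

rng = random.Random(1)
people = [gen_person(rng) for _ in range(1000)]
```

At age 10 the age factor is 1 - (20 - 15) / 20 = 0.75, and it rises linearly to exactly 1 at age 15, so the switch introduces no jump in height.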

how it's different

some specific examples of what plait.py can do:

- generate proportional populations using census data and CSVs
- create realistic zipcodes by state, city or region (also using CSVs)
- create a taxi trip dataset with a cost model based on geodistance
- add seasonal patterns (daily, weekly, etc.) to data
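For the taxi and seasonal-pattern examples, the core pieces are a great-circle (haversine) distance and a weighted choice over hours of the day. The sketch below is plain Python, not plait.py's taxi template; the bounding box, the fare constants and `HOURLY_WEIGHTS` are hypothetical values chosen for illustration.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # clamp guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# hypothetical fare model: flat fee plus a per-kilometre rate
BASE_FARE = 2.50
PER_KM = 1.55

# hypothetical daily pattern: relative trip volume for hours 0..23
HOURLY_WEIGHTS = [1, 1, 1, 1, 1, 2, 4, 8, 9, 6, 5, 5,
                  6, 5, 5, 6, 7, 9, 9, 7, 5, 4, 3, 2]

def gen_trip(rng):
    # pickup and dropoff drawn uniformly from a hypothetical bounding box
    lat1, lon1 = rng.uniform(40.70, 40.80), rng.uniform(-74.02, -73.93)
    lat2, lon2 = rng.uniform(40.70, 40.80), rng.uniform(-74.02, -73.93)
    distance = haversine_km(lat1, lon1, lat2, lon2)
    # fare grows linearly with distance, with +/-5% multiplicative noise
    fare = (BASE_FARE + PER_KM * distance) * rng.uniform(0.95, 1.05)
    # busy hours are sampled proportionally more often than quiet ones
    hour = rng.choices(range(24), weights=HOURLY_WEIGHTS, k=1)[0]
    return {"hour": hour, "distance_km": round(distance, 3), "fare": round(fare, 2)}

rng = random.Random(3)
trips = [gen_trip(rng) for _ in range(5000)]
```

A weekly pattern could be layered on the same way, with a second weight list over days of the week.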

usage

installation

```bash
# install with python
pip install plaitpy

# or with pypy
pypy-pip install plaitpy
```

cloning the repo for development

```bash
git clone https://github.com/plaitpy/plaitpy
cd plaitpy
# get the fakerb repo
git submodule init
git submodule update
```

generating records from command line

specify a template as a yaml file, then generate records from that yaml file.

```bash
# a simple example (if cloning plait.py repo)
python main.py templates/timestamp/uniform.yaml

# if plait.py is installed via pip
plait.py templates/timestamp/uniform.yaml
```

generating records from API

```python
import plaitpy

t = plaitpy.Template("templates/timestamp/uniform.yaml")
print(t.gen_record())
print(t.gen_records(10))
```

looking up faker fields

plait.py also simplifies looking up faker fields:

```bash
# list faker namespaces
plait.py --list
# lookup faker namespaces
plait.py --lookup name

# lookup faker keys
# (-ll is short for --lookup)
plait.py --ll name.suffix
```

documentation

yaml file commands

see docs/FORMAT.md

datasets

see docs/EXAMPLES.md; also see the templates/ directory

troubleshooting

see docs/TROUBLESHOOTING.md

Dependent Markov Processes

To simulate data that comes from many Markov processes (a Markov ecosystem), see the plaitpy-ipc repository.
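plaitpy-ipc's own API is not described here, so as a hedged illustration of the underlying idea, the sketch below samples a single Markov process in plain Python: each next state depends only on the current one. The states and `TRANSITIONS` probabilities are hypothetical; an ecosystem would run many such chains and let them influence one another.

```python
import random

# hypothetical shopping-session states; "leave" has no row, so it is absorbing
TRANSITIONS = {
    "browse": {"browse": 0.6, "cart": 0.3, "leave": 0.1},
    "cart": {"browse": 0.3, "checkout": 0.5, "leave": 0.2},
    "checkout": {"leave": 1.0},
}

def walk(start, rng, max_steps=50):
    """Sample a path of states; each step depends only on the current state."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state not in TRANSITIONS:
            break  # reached an absorbing state
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        path.append(state)
    return path

rng = random.Random(7)
sessions = [walk("browse", rng) for _ in range(200)]
print(sessions[0])
```

The `max_steps` cap keeps a long run of self-transitions from producing an unbounded session.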

future direction

If you have ideas for features to add, open an issue. Feedback is appreciated!

License

MIT