PocketFlow - Tencent's open-source Automated Model Compression (AutoMC) framework - FinClip official site

User submission 632 2022-10-29

PocketFlow

PocketFlow is an open-source framework for compressing and accelerating deep learning models with minimal human effort. Deep learning is widely used in areas such as computer vision, speech recognition, and natural language translation. However, deep learning models are often computationally expensive, which limits their use on mobile devices with limited computational resources.

PocketFlow aims to provide an easy-to-use toolkit for developers to improve inference efficiency with little or no performance degradation. Developers only need to specify the desired compression and/or acceleration ratios, and PocketFlow then automatically chooses proper hyper-parameters to generate a highly efficient compressed model for deployment.

PocketFlow was originally developed by researchers and engineers on the machine learning team at Tencent AI Lab for the purpose of compacting deep neural networks with industrial applications.

For full documentation, please refer to PocketFlow's GitHub Pages. To start with, you may be interested in the installation guide and the tutorial on how to train a compressed model and deploy it on mobile devices.

For general discussions about PocketFlow's development and direction, please refer to the PocketFlow Google Group. If you need general help, please ask on Stack Overflow. You can report bugs and request features on the GitHub Issues page.

Framework

The proposed framework mainly consists of two categories of algorithm components, i.e. learners and hyper-parameter optimizers, as depicted in the figure below. Given an uncompressed original model, the learner module generates a candidate compressed model using some randomly chosen hyper-parameter combination. The candidate model's accuracy and computation efficiency are then evaluated and used by the hyper-parameter optimizer module as the feedback signal to determine the next hyper-parameter combination to be explored by the learner module. After a few iterations, the best of all the candidate models is output as the final compressed model.

Learners

A learner is a model compression algorithm augmented with several training techniques, as shown in the figure above. Below is a list of the model compression algorithms supported in PocketFlow:

Name	Description
`ChannelPrunedLearner`	channel pruning with LASSO-based channel selection (He et al., 2017)
`DisChnPrunedLearner`	discrimination-aware channel pruning (Zhuang et al., 2018)
`WeightSparseLearner`	weight sparsification with dynamic pruning schedule (Zhu & Gupta, 2017)
`UniformQuantLearner`	weight quantization with uniform reconstruction levels (Jacob et al., 2018)
`UniformQuantTFLearner`	weight quantization with uniform reconstruction levels and TensorFlow APIs
`NonUniformQuantLearner`	weight quantization with non-uniform reconstruction levels (Han et al., 2016)

All of the above model compression algorithms can be trained with fast fine-tuning, which derives a compressed model directly from the original one by applying either pruning masks or quantization functions. The resulting model can then be fine-tuned for a few iterations to recover some of the lost accuracy. Alternatively, the compressed model can be re-trained with the full training data, which leads to higher accuracy but usually takes longer to complete.
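
As a minimal sketch of the starting point for fast fine-tuning, the snippet below derives a sparse weight list from a dense one by applying a binary pruning mask. The magnitude-based criterion and the helper names `magnitude_mask` and `apply_mask` are assumptions made for illustration; they are not PocketFlow's API, and the learners listed above use their own selection rules.

```python
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_pruned = int(round(sparsity * len(weights)))
    # Indices sorted by ascending magnitude; the first `num_pruned` are masked out.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:num_pruned]:
        mask[i] = 0
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Element-wise product of the weights and the mask."""
    return [w * m for w, m in zip(weights, mask)]


weights = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
mask = magnitude_mask(weights, sparsity=0.5)
pruned = apply_mask(weights, mask)
print(mask)  # [1, 0, 1, 1, 0, 0]
```

In a real training loop the mask would be kept fixed (or updated on a schedule) while the surviving weights are fine-tuned.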

To further reduce the compressed model's performance degradation, we adopt network distillation to augment its training process with an extra loss term, using the original uncompressed model's outputs as soft labels. Additionally, multi-GPU distributed training is enabled for all learners to speed up the time-consuming training process.
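
The extra distillation term can be sketched as follows. This is an illustrative formulation, not PocketFlow's implementation: the weighting factor `alpha`, the softmax `temperature`, and all function names are assumptions. The loss adds a cross-entropy term against the uncompressed model's softened outputs to the usual hard-label cross-entropy.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], predicted: List[float]) -> float:
    """Cross-entropy H(target, predicted) between two probability vectors."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Hard-label loss plus a weighted soft-label term taken from the teacher."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    soft_labels = softmax(teacher_logits, temperature)
    soft_loss = cross_entropy(soft_labels, softmax(student_logits, temperature))
    return hard_loss + alpha * soft_loss


teacher = [4.0, 1.0, 0.5]          # logits of the uncompressed model
close_student = [3.5, 1.2, 0.4]    # compressed model that agrees with the teacher
far_student = [0.5, 3.0, 1.0]      # compressed model that disagrees
loss_close = distillation_loss(close_student, teacher, label=0)
loss_far = distillation_loss(far_student, teacher, label=0)
print(loss_close < loss_far)  # True
```

A student whose outputs track the teacher's is penalized less by both terms, which is the signal the extra loss term is meant to provide.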

Hyper-parameter Optimizers

For model compression algorithms, there are several hyper-parameters that may have a large impact on the final compressed model's performance. It can be quite difficult to manually determine proper values for these hyper-parameters, especially for developers who are not very familiar with the algorithm details. Recently, several AutoML systems, e.g. Cloud AutoML from Google, have been developed to train high-quality machine learning models with minimal human effort. In particular, the AMC algorithm (He et al., 2018) presents promising results for using reinforcement learning for automated model compression with channel pruning and fine-grained pruning.

In PocketFlow, we introduce the hyper-parameter optimizer module to iteratively search for the optimal hyper-parameter setting. We provide several implementations of hyper-parameter optimizers, based on models including Gaussian Processes (GP, Mockus, 1975), Tree-structured Parzen Estimator (TPE, Bergstra et al., 2013), and Deep Deterministic Policy Gradient (DDPG, Lillicrap et al., 2016). The hyper-parameter setting is optimized through an iterative process. In each iteration, the hyper-parameter optimizer chooses a combination of hyper-parameter values, and the learner generates a candidate model with fast fine-tuning. The candidate model is evaluated to calculate the reward of the current hyper-parameter setting. After that, the hyper-parameter optimizer updates its model to improve its estimate of the hyper-parameter space. Finally, once the best candidate model (and the corresponding hyper-parameter setting) is selected after some iterations, this model can be re-trained with the full data to further reduce the performance loss.
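
The propose, evaluate, update loop described above can be sketched with toy components. Everything in this snippet is hypothetical: `toy_learner` returns a synthetic reward instead of compressing and evaluating a model, and `LocalSearchOptimizer` is a simple hill-climbing stand-in for the GP, TPE, or DDPG optimizers, whose internals are not shown here.

```python
import random
from typing import Callable, List, Tuple


def toy_learner(ratios: List[float]) -> float:
    """Hypothetical stand-in for 'compress, fast fine-tune, evaluate'.

    Returns a synthetic reward that peaks at ratios (0.3, 0.7); a real learner
    would derive the reward from measured accuracy and efficiency.
    """
    target = [0.3, 0.7]
    return -sum((r - t) ** 2 for r, t in zip(ratios, target))


class LocalSearchOptimizer:
    """Toy optimizer that proposes settings near the best one seen so far."""

    def __init__(self, num_params: int, step: float = 0.2, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.step = step
        self.best_setting = [0.5] * num_params
        self.best_reward = float("-inf")

    def propose(self) -> List[float]:
        # Perturb the incumbent and clip every value into [0, 1].
        return [min(1.0, max(0.0, v + self.rng.uniform(-self.step, self.step)))
                for v in self.best_setting]

    def update(self, setting: List[float], reward: float) -> None:
        # The "model" is just the incumbent; GP/TPE/DDPG would refit a richer one.
        if reward > self.best_reward:
            self.best_setting, self.best_reward = setting, reward


def search(learner: Callable[[List[float]], float],
           optimizer: LocalSearchOptimizer,
           num_iters: int) -> Tuple[List[float], float]:
    """Run the iterative search and return the best setting with its reward."""
    for _ in range(num_iters):
        setting = optimizer.propose()       # optimizer chooses hyper-parameters
        reward = learner(setting)           # learner builds and scores a candidate
        optimizer.update(setting, reward)   # optimizer refines its estimate
    return optimizer.best_setting, optimizer.best_reward


best_setting, best_reward = search(
    toy_learner, LocalSearchOptimizer(num_params=2), num_iters=200)
print(best_setting, best_reward)
```

The best setting returned by such a loop is the one that would then be handed to full re-training.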

Performance

In this section, we present some of our results from applying various model compression methods, including channel pruning, weight sparsification, and uniform quantization, to ResNet and MobileNet models on the ImageNet classification task. For complete evaluation results, please refer to the full documentation.

Channel Pruning

We adopt the DDPG algorithm as the RL agent to find the optimal layer-wise pruning ratios, and use group fine-tuning to further improve the compressed model's accuracy:

Model	FLOPs	Uniform	RL-based	RL-based + Group Fine-tuning
MobileNet-v1	50%	66.5%	67.8% (+1.3%)	67.9% (+1.4%)
MobileNet-v1	40%	66.2%	66.9% (+0.7%)	67.0% (+0.8%)
MobileNet-v1	30%	64.4%	64.5% (+0.1%)	64.8% (+0.4%)
MobileNet-v1	20%	61.4%	61.4% (+0.0%)	62.2% (+0.8%)

Weight Sparsification

Compared with the original algorithm (Zhu & Gupta, 2017), which uses the same sparsity for all layers, we incorporate the DDPG algorithm to iteratively search for the optimal sparsity of each layer, which leads to increased accuracy:

Model	Sparsity	(Zhu & Gupta, 2017)	RL-based
MobileNet-v1	50%	69.5%	70.5% (+1.0%)
MobileNet-v1	75%	67.7%	68.5% (+0.8%)
MobileNet-v1	90%	61.8%	63.4% (+1.6%)
MobileNet-v1	95%	53.6%	56.8% (+3.2%)
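
For context, the baseline of Zhu & Gupta (2017) does not prune to the target sparsity in one shot; it raises the sparsity gradually during training along a cubic curve, s_t = s_f + (s_i - s_f) * (1 - (t - t_0) / (n * dt))^3. The sketch below evaluates that schedule. The function and parameter names are hypothetical, and PocketFlow's own implementation may differ in its details.

```python
def sparsity_at_step(step: int, begin_step: int, end_step: int,
                     initial_sparsity: float = 0.0,
                     final_sparsity: float = 0.9) -> float:
    """Cubic sparsity schedule: prunes quickly at first, then levels off."""
    if end_step <= begin_step:
        raise ValueError("end_step must be greater than begin_step")
    # Fraction of the pruning phase completed, clipped to [0, 1].
    progress = min(1.0, max(0.0, (step - begin_step) / (end_step - begin_step)))
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


for step in (0, 250, 500, 1000):
    print(step, round(sparsity_at_step(step, begin_step=0, end_step=1000), 4))
```

Halfway through the pruning phase this schedule has already reached 0.7875 of a 0.9 target, which leaves most of the remaining steps for the network to recover accuracy at high sparsity.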

Uniform Quantization

We show that models with 32-bit floating-point weights can be safely quantized into their 8-bit counterparts without accuracy loss (sometimes the accuracy is even better!). The resulting model can be deployed on mobile devices for faster inference (Device: XiaoMi 8 with a Snapdragon 845 CPU):

Model	Acc. (32-bit)	Acc. (8-bit)	Time (32-bit)	Time (8-bit)
MobileNet-v1	70.89%	71.29% (+0.40%)	124.53	56.12 (2.22x)
MobileNet-v2	71.84%	72.26% (+0.42%)	120.59	49.04 (2.46x)

All reported times are in milliseconds.
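
To make "uniform reconstruction levels" concrete, here is a minimal sketch of 8-bit uniform quantization of a weight list: the range [min, max] is divided into 255 equal steps, each weight is stored as an integer code, and the reconstruction error is bounded by half a step. The helper names are hypothetical, and this is not the quantization-aware training procedure that the learners use.

```python
from typing import List, Tuple


def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats onto 256 evenly spaced levels spanning [min, max]."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for constant inputs
    codes = [min(255, max(0, int(round((w - w_min) / scale)))) for w in weights]
    return codes, scale, w_min


def dequantize(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Reconstruct approximate float values from the integer codes."""
    return [w_min + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
codes, scale, w_min = quantize_uint8(weights)
restored = dequantize(codes, scale, w_min)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes[0], codes[-1])     # 0 255
print(max_error <= scale / 2 + 1e-9)  # True
```

Each code fits in one byte instead of four, which is where the 4x reduction in weight storage comes from; the inference speed-ups in the table above additionally depend on integer kernels on the device.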

Citation

Please cite PocketFlow in your publications if it helps your research:

@incollection{wu2018pocketflow,
  author = {Jiaxiang Wu and Yao Zhang and Haoli Bai and Huasong Zhong and Jinlong Hou and Wei Liu and Junzhou Huang},
  title = {PocketFlow: An Automated Framework for Compressing and Accelerating Deep Neural Networks},
  booktitle = {Advances in Neural Information Processing Systems (NIPS), Workshop on Compact Deep Neural Networks with Industrial Applications},
  year = {2018},
}

Reference

[Bergstra et al., 2013] J. Bergstra, D. Yamins, and D. D. Cox. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. In International Conference on Machine Learning (ICML), pages 115-123, Jun 2013.
[Han et al., 2016] Song Han, Huizi Mao, and William J. Dally. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In International Conference on Learning Representations (ICLR), 2016.
[He et al., 2017] Yihui He, Xiangyu Zhang, and Jian Sun. Channel Pruning for Accelerating Very Deep Neural Networks. In IEEE International Conference on Computer Vision (ICCV), pages 1389-1397, 2017.
[He et al., 2018] Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In European Conference on Computer Vision (ECCV), pages 784-800, 2018.
[Jacob et al., 2018] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2704-2713, 2018.
[Lillicrap et al., 2016] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous Control with Deep Reinforcement Learning. In International Conference on Learning Representations (ICLR), 2016.
[Mockus, 1975] J. Mockus. On Bayesian Methods for Seeking the Extremum. In Optimization Techniques IFIP Technical Conference, pages 400-404, 1975.
[Zhu & Gupta, 2017] Michael Zhu and Suyog Gupta. To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression. CoRR, abs/1710.01878, 2017.
[Zhuang et al., 2018] Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Jiezhang Cao, Qingyao Wu, Junzhou Huang, and Jinhui Zhu. Discrimination-aware Channel Pruning for Deep Neural Networks. In Annual Conference on Neural Information Processing Systems (NIPS), 2018.

Contributing

If you are interested in contributing, check out CONTRIBUTING.md, and consider joining our Tencent OpenSource Plan.
