Real-time forecasting system of PM2.5 concentration based on the Spark framework and a random forest model
HOU Junxiong (侯俊雄), LI Qi (李琦), ZHU Yajie (朱亚杰), FENG Xiao (冯逍), MAO Xi (毛曦)
Abstract:
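To address the problem of real-time forecasting of PM2.5 concentration during China's current heavy-pollution weather, this paper proposes a real-time PM2.5 concentration forecasting method based on the random forest algorithm. The method is applied to ground-level air quality monitoring data and meteorological data for Beijing to build a random-forest-based real-time forecasting model of PM2.5 concentration. Experiments show that the model can forecast PM2.5 concentration up to 72 h ahead in real time with relatively high accuracy, and that running it on the Spark distributed computing framework effectively reduces the algorithm's run time. Building on this model and the Spark distributed computing framework, the paper establishes a real-time PM2.5 forecasting system.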
Keywords: real-time PM2.5 forecasting; distributed computing; random forest; air quality; Spark
DOI: 10.16251/j.cnki.1009-2307.2017.01.001