Analysis of characteristics and influencing factors of atmospheric visibility in Beijing-Tianjin-Hebei region (京津冀大气能见度特征分析及影响因素研究)
Authors: 张杨 (ZHANG Yang), 张福浩 (ZHANG Fuhao), 陈才 (CHEN Cai), 焦冠棋 (JIAO Guanqi), 仇阿根 (QIU Agen), 欧尔格力 (OU Ergeli)
Abstract:
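To study visibility conditions in the Beijing-Tianjin-Hebei region and to analyze how individual features contribute to visibility, this study uses 2019 data from meteorological stations and air-quality monitoring stations in the region to examine the temporal variation of visibility. A random forest algorithm is used to build a visibility estimation model and to assess the overall explanatory power of the influencing factors. The SHAP framework is then combined with the random forest model to build an interpretable model of the influencing factors, which gives a detailed account of the magnitude and direction of each feature's contribution and of single-variable contributions. The results are as follows: (1) visibility is poor during the morning and evening rush hours and best at around 15:00 each day, shows no marked difference between working days and non-working days, and is worst in winter; (2) the random forest model has an explained variance score of 0.8973 and an R² of 0.8978, indicating a good fit; (3) according to the SHAP analysis, PM2.5 is the most important factor affecting visibility and is negatively correlated with it, and the rate of change of its contribution shifts from steep to gentle at a turning point of 100 μg/m³. The experiments show that the SHAP-based visibility interpretation model not only reflects the magnitude of each contribution and the direction of its effect, but also allows the contribution of a single variable to be analyzed in detail, which improves the granularity and accuracy of feature contribution analysis.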
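The idea of attributing a prediction to its input features, with both a magnitude and a direction, can be illustrated with exact Shapley values on a toy problem. The sketch below is not the paper's pipeline, which combines SHAP with a random forest fitted to station data. It assumes a hypothetical closed-form function `toy_visibility_km`, made-up feature values, and a simple value function in which features outside a coalition are set to a baseline value. All names and numbers are illustrative assumptions.

```python
from itertools import combinations
from math import exp, factorial


def toy_visibility_km(pm25, humidity, wind):
    """Hypothetical visibility model in km; not fitted to any data.

    Humidity amplifies the PM2.5 term so that the features interact.
    """
    return 30.0 * exp(-pm25 * (1.0 + humidity / 100.0) / 150.0) + 0.5 * wind


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Assumption: a feature outside the coalition takes its baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Evaluate the model with coalition features taken from x, rest from baseline.
        args = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(**args)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Illustrative inputs: a "typical" baseline and a polluted, humid, calm hour.
baseline = {"pm25": 35.0, "humidity": 50.0, "wind": 2.0}
polluted = {"pm25": 150.0, "humidity": 80.0, "wind": 1.0}

phi = shapley_values(toy_visibility_km, polluted, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f} km")
```

The signed values give the direction of each feature's effect, and they sum to the difference between the prediction for the polluted hour and the prediction for the baseline. Libraries such as `shap` compute comparable attributions efficiently for tree ensembles, whereas this brute-force enumeration is only practical for a few features.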
Keywords: random forest; visibility; SHAP framework; contribution interpretation
Foundation: National Key Research and Development Program of China (2019YFB2102500)
DOI: 10.16251/j.cnki.1009-2307.2021.07.027