Chinese address understanding by integrating neural network and spatial relationship
LIU Xianyin (刘现印), LI Yulin (李玉琳), YIN Bin (尹斌), TIAN Qin (田沁)
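Abstract:
Chinese address parsing is the core problem in Chinese address matching. The currently popular approaches to Chinese address parsing are based on conditional random fields (CRF) or on rules; as an alternative, this paper combines a bidirectional gated recurrent unit network (BiGRU) from deep learning with a CRF to segment Chinese addresses. In place of the prevailing hierarchical address model and four-tag labeling scheme, it adopts a spatial-relationship address model and a five-tag labeling scheme. Comparative experiments with a rule-based model, CRF, BiGRU+SoftMax, and BiGRU+CRF show that the proposed BiGRU+CRF model, combined with the new spatial-relationship address model and labeling scheme, achieves better address parsing results.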
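To make the labeling idea concrete, the following is a minimal sketch of how a segmented address could be turned into per-character training labels for a sequence tagger such as BiGRU+CRF. The abstract does not list the five tags, so the `B`/`B2`/`M`/`E`/`S` set used here (one common five-tag convention) is an assumption, as are the element names `PROV`, `CITY`, `DIST`, `ROAD`, `NUM`, and `DIR` and the helper functions themselves; none of them are taken from the paper.

```python
from typing import List, Tuple

Segment = Tuple[str, str]   # (address element text, element type)
Labeled = Tuple[str, str]   # (single character, "TAG-TYPE" label)


def position_tags(length: int) -> List[str]:
    """Assumed five-tag scheme: S for a single character, otherwise
    B (first), B2 (second), M (remaining middle characters), E (last)."""
    if length <= 0:
        raise ValueError("segment must contain at least one character")
    if length == 1:
        return ["S"]
    if length == 2:
        return ["B", "E"]
    return ["B", "B2"] + ["M"] * (length - 3) + ["E"]


def label_address(segments: List[Segment]) -> List[Labeled]:
    """Expand each segment into one (character, label) pair per character."""
    labeled = []
    for text, element in segments:
        for ch, tag in zip(text, position_tags(len(text))):
            labeled.append((ch, f"{tag}-{element}"))
    return labeled


def decode(labeled: List[Labeled]) -> List[Segment]:
    """Rebuild segments from per-character labels (the inverse step a
    tagger's output would go through)."""
    segments, buf, current = [], "", None
    for ch, label in labeled:
        tag, element = label.split("-", 1)
        if tag in ("B", "S"):
            if buf:  # flush an unterminated segment
                segments.append((buf, current))
            buf, current = ch, element
        else:
            if current is None:  # tolerate a segment with no opening tag
                current = element
            buf += ch
        if tag in ("E", "S"):
            segments.append((buf, current))
            buf, current = "", None
    if buf:
        segments.append((buf, current))
    return segments


# Hypothetical address with a trailing spatial-relation word ("东", east of).
example = [
    ("山东省", "PROV"), ("济南市", "CITY"), ("历下区", "DIST"),
    ("经十路", "ROAD"), ("1号", "NUM"), ("东", "DIR"),
]
labels = label_address(example)
```

In this sketch, `decode(label_address(example))` returns the original segments, which is a quick consistency check on any tag scheme before training. Compared with four tags, the extra `B2` tag gives the tagger an explicit signal for the second character of longer elements.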
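Keywords: spatial-relationship address model; Chinese address parsing; deep learning; bidirectional gated recurrent unit network (BiGRU)
Foundation: Major Science and Technology Innovation Project of Shandong Province (2019JZZY020103); Shandong Province "13th Five-Year" Basic Surveying and Mapping Planning Fund (201605097)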
Author: LIU Xianyin, LI Yulin, YIN Bin, TIAN Qin
DOI: 10.16251/j.cnki.1009-2307.2021.08.023