A Method for Building a Place-Name and Address Spatiotemporal Data Engine Based on HMM
王勇,周松,邢策梅
Abstract:
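Parsing and spatialization of Chinese place names and addresses commonly suffer from inaccurate semantic understanding, low spatialization accuracy, and unsatisfactory matching efficiency. To address these problems and to build a place-name and address spatiotemporal data engine suited to the spatiotemporal big data platform of a smart city, this paper introduces a hidden Markov model (HMM) to construct a method for semantic parsing and spatialization of place names and addresses. The method proceeds through semantic analysis, Chinese word segmentation, index creation, access to a standard address database, matching-degree computation, and result ranking, and achieves high-accuracy, high-efficiency matching of Chinese place names and addresses. In matching experiments on data from Gaoyou City, the sample batch reached a matching success rate of 95.2% and a matching accuracy of 89.1%, with reasonable time and storage overhead. The method substantially improves the efficiency and quality of spatial matching for place names and addresses, and largely meets the requirements of a spatiotemporal big data platform for spatializing place-name and address data.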
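The abstract does not give the model details, so the following is only a minimal sketch of how an HMM can drive the Chinese word segmentation step. It assumes the common BMES tagging scheme (Begin, Middle, End, Single) and Viterbi decoding, which the paper may or may not use. The names `viterbi` and `segment`, the sample address string, and all probabilities are hypothetical toy values chosen for illustration. In practice the parameters would be estimated from a labeled address corpus.

```python
import math

STATES = "BMES"  # Begin, Middle, End of a word, or Single-character word

# Toy parameters (assumed for illustration only, not taken from the paper).
START_P = {"B": 0.6, "S": 0.4}
TRANS_P = {
    "B": {"M": 0.5, "E": 0.5},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.8, "S": 0.2},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT_P = {
    "B": {"高": 0.5, "文": 0.5},
    "M": {"邮": 0.5, "游": 0.5},
    "E": {"市": 0.5, "路": 0.5},
    "S": {},
}


def viterbi(obs, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable BMES tag list for the characters in obs."""
    if not obs:
        return []

    def lg(p):
        # Work in log space; unseen events get a small floor probability.
        return math.log(p if p > 0 else floor)

    # scores[t][s]: best log-probability of a path ending in state s at t.
    scores = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(obs[0], 0))
               for s in STATES}]
    back = [{}]  # back[t][s]: best predecessor of state s at position t
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in STATES:
            prev, best = max(
                ((p, scores[t - 1][p] + lg(trans_p[p].get(s, 0)))
                 for p in STATES),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + lg(emit_p[s].get(obs[t], 0))
            back[t][s] = prev

    # A complete segmentation must finish on E or S.
    tags = [max("ES", key=lambda s: scores[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


def segment(text, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Split text into words by cutting after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, viterbi(text, start_p, trans_p, emit_p)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:  # trailing characters without a closing tag
        words.append(current)
    return words


print(segment("高邮市文游路"))  # ['高邮市', '文游路']
```

With these toy parameters each character is emitted by exactly one state, so the decoded tag sequence is B M E B M E. A real engine would pass the resulting tokens on to the indexing and matching steps listed in the abstract.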
Keywords: HMM; Chinese word segmentation; place-name and address engine; spatiotemporal big data platform; smart city
Authors: 王勇, 周松, 邢策梅
DOI: 10.16251/j.cnki.1009-2307.2020.10.023