Automatic transliteration technology of English geographical names based on deep learning and prior knowledge
WANG Chunmiao, WANG Jizhou, MAO Xi, MA Weijun
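Abstract:
Traditional manual translation of geographical names is inefficient, and its quality varies with the skill of the individual translator. Existing machine translation, in turn, cannot effectively resolve problems such as the overlapping ambiguities among place-name translation rules. To address these problems, this paper proposes a transliteration technology for English geographical names that combines deep learning with prior knowledge. It studies three core steps of place-name translation: phonetic transcription generation, phonetic transcription optimization, and syllable segmentation. For these steps it proposes a deep-learning-based method for generating phonetic transcriptions, a prior-knowledge-based method for optimizing them, and a syllable segmentation method based on bidirectional maximum matching. Together these methods address machine translation's poor recognition of the phonetic transcriptions of place-name words and its non-standard rendering of them into Chinese characters. Experimental results show that the proposed technology for transliterating the specific terms of geographical names is faster, more accurate, and more consistent with translation standards than traditional place-name translation methods.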
Keywords: geographical name translation; machine translation; deep learning; prior knowledge; bidirectional maximum matching algorithm
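Foundation: Major Project of the Ministry of Natural Resources (12113600000018); Key Research and Development Project of the Ministry of Science and Technology (2017-2021) (2017YFB0503600); Basic Scientific Research Fund Project of the Chinese Academy of Surveying and Mapping (AR1912)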
Author: WANG Chunmiao, WANG Jizhou, MAO Xi, MA Weijun
DOI: 10.16251/j.cnki.1009-2307.2020.05.027
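The abstract names bidirectional maximum matching as the basis of the syllable segmentation step but gives no details. The following is a minimal sketch of the generic technique, not the authors' implementation: the function names, the toy dictionary entries, and the tie-breaking rule (fewer segments, then fewer out-of-dictionary pieces, then the backward result) are all assumptions made for illustration, and plain ASCII strings stand in for phonetic transcriptions.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest dictionary entry at each position."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single symbol is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                out.append(piece)
                i += size
                break
    return out


def backward_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest dictionary entry ending at each position."""
    out, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                out.append(piece)
                j -= size
                break
    out.reverse()
    return out


def bidirectional_max_match(text, vocab):
    """Run both scans and keep the segmentation with the lower cost.

    Assumed cost: (number of segments, number of pieces not in the dictionary).
    On a tie the backward result is returned.
    """
    max_len = max(map(len, vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if fwd == bwd:
        return fwd

    def cost(segments):
        unknown = sum(1 for s in segments if s not in vocab)
        return (len(segments), unknown)

    return fwd if cost(fwd) < cost(bwd) else bwd


if __name__ == "__main__":
    # Hypothetical syllable dictionary; a real one would hold transcription units.
    toy_vocab = {"ab", "abc", "cd"}
    print(bidirectional_max_match("abcd", toy_vocab))
```

With this toy dictionary the forward scan greedily takes `abc` and strands `d`, while the backward scan yields two dictionary entries, so the backward segmentation is chosen. Comparing both directions is what lets the method catch overlapping ambiguities that a single greedy scan would miss.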