Information-balanced indexing method for spatiotemporal data oriented to non-relational databases
LI Longhai, XIE Peng, HE Liesong, WU Haotian, FU Shaofeng
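Abstract:
Query efficiency drops markedly when non-relational databases such as HBase store spatiotemporal data with a skewed distribution. To address this, the paper proposes a spatiotemporal data indexing method for non-relational databases based on non-uniform grid encoding. First, the entire spatiotemporal region is recursively partitioned into a uniform three-dimensional grid using the standard STCode encoding method. Next, depending on its data density, each standard grid cell is either merged with its neighboring cells or split into smaller cells, which yields a non-uniform grid; each non-uniform cell is assigned an information-balanced spatiotemporal code (IBSTCode), and a mapping table between STCode and IBSTCode is built. Finally, the IBSTCode is embedded as a prefix in the primary-key index of each spatiotemporal data record. Experimental results show that, when HBase stores non-uniformly distributed spatiotemporal point data, the proposed information-balanced indexing method significantly improves query efficiency.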
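The merge-or-split idea described in the abstract can be illustrated with a small Python sketch. It is not the paper's IBSTCode algorithm: it assumes an octree-style code (one digit 0-7 per level, obtained by bisecting longitude, latitude, and time) as a stand-in for STCode, models "merging" simply as leaving a sparse cell unsplit, and uses hypothetical names (`cell_code`, `build_leaves`, `row_key`, `capacity`, `max_depth`) and a hypothetical `code|record_id` key layout.

```python
def cell_code(point, bounds, depth):
    """Uniform-grid code: one digit (0-7) per level from bisecting lon, lat, time."""
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    digits = []
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2
            if point[axis] >= mid:
                digit |= 1 << axis   # upper half on this axis
                lo[axis] = mid
            else:
                hi[axis] = mid
        digits.append(str(digit))
    return "".join(digits)


def build_leaves(points, bounds, capacity, max_depth):
    """Return the codes of a density-adaptive (non-uniform) partition.

    A cell holding more than `capacity` points is split into 8 children
    until `max_depth`; sparse cells stay coarse (assumed stand-in for merging).
    """
    full = [cell_code(p, bounds, max_depth) for p in points]
    leaves = set()
    frontier = [""]                  # the empty code is the whole region
    while frontier:
        prefix = frontier.pop()
        count = sum(1 for c in full if c.startswith(prefix))
        if count > capacity and len(prefix) < max_depth:
            frontier.extend(prefix + str(d) for d in range(8))
        else:
            leaves.add(prefix)
    return leaves


def row_key(point, record_id, leaves, bounds, max_depth):
    """Prefix the record id with the code of the leaf cell containing the point."""
    full = cell_code(point, bounds, max_depth)
    for n in range(max_depth + 1):
        if full[:n] in leaves:       # map uniform code -> non-uniform cell code
            return f"{full[:n]}|{record_id}"
    raise ValueError("point is not covered by the partition")
```

In this sketch, records in dense regions receive longer prefixes and records in sparse regions shorter ones, so each prefix covers a roughly comparable number of rows, which is the balancing effect the abstract describes.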
Keywords: HBase database; spatiotemporal data; database indexing; spatiotemporal encoding
Foundation: Open Fund Project of the State Key Laboratory of Geo-Information Engineering (SKLGIE2014-M-4-1); National Natural Science Foundation of China (41301527)
DOI: 10.16251/j.cnki.1009-2307.2022.04.021