A Highly Accurate Data Synchronization and Full-text Search Algorithm for Canal and Elasticsearch

2023 IEEE International Conference on Networking, Sensing and Control (ICNSC) (2023)

Abstract
Structured and semi-structured full-text search schemes for large-scale text data currently face several difficult problems, such as long data synchronization delays, inconvenient customized business processing, and low efficiency. To address these issues, this paper proposes an efficient algorithm based on the Canal data synchronization framework and the Elasticsearch full-text search engine. First, we rewrite the Canal adapter component so that business data processing can be configured flexibly, which strengthens the framework's secondary data processing capability and improves the efficiency of data synchronization. Second, we record the synchronization times of recently synchronized data in the Canal framework and apply an exponentially weighted average function that gradually decreases the weight of older time-series data, highlighting the influence of recent data and reflecting data freshness; the synchronization trigger period can then be set dynamically and flexibly, giving effective control over the synchronization interval and duration. Finally, we modify the Elasticsearch tokenizer and propose the configuration of custom extension-word and stop-word dictionaries to filter query data effectively, thereby improving the query hit rate and accuracy. Extensive experiments on traditional Chinese medicine data demonstrate that the designed algorithm achieves high data synchronization efficiency as well as high full-text search speed and accuracy. Hence, we regard the proposed algorithm as a milestone for smart healthcare.
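The exponentially weighted average step can be illustrated with a minimal sketch. The abstract does not give the formula or any parameter values, so everything below is an assumption made for illustration: the class name `SyncPeriodEstimator`, the smoothing factor `alpha`, the clamping bounds, and the choice to use the smoothed synchronization duration directly as the next trigger period. It is not the authors' implementation and does not use any Canal API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncPeriodEstimator:
    """Hypothetical helper that smooths observed sync durations with an EWMA."""

    alpha: float = 0.5        # assumed smoothing factor, 0 < alpha <= 1
    min_period: float = 1.0   # assumed lower bound on the trigger period (seconds)
    max_period: float = 60.0  # assumed upper bound on the trigger period (seconds)
    ewma: Optional[float] = None

    def update(self, duration: float) -> float:
        """Fold one observed sync duration (seconds) into the running average.

        Each update multiplies the weight of all earlier observations by
        (1 - alpha), so older samples decay geometrically and recent
        samples dominate.
        """
        if self.ewma is None:
            self.ewma = duration  # first sample seeds the average
        else:
            self.ewma = self.alpha * duration + (1 - self.alpha) * self.ewma
        return self.ewma

    def next_period(self) -> float:
        """Return the next trigger period, clamped to [min_period, max_period]."""
        if self.ewma is None:
            return self.min_period  # assumption: start with the shortest period
        return max(self.min_period, min(self.max_period, self.ewma))
```

A scheduler could call `update()` after each synchronization run and `next_period()` before scheduling the next one. A larger `alpha` makes the period react faster to recent load, while a smaller one smooths out spikes.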
Keywords
Elasticsearch,Canal,Real-Time Synchronization,Full Text Search