Ar-Spider: Text-to-SQL in Arabic
CoRR (2024)
Abstract
Text-to-SQL semantic parsing is one of the most important tasks in Natural
Language Processing (NLP): it lets users interact with a database in a more
natural manner. Text-to-SQL has made significant progress in recent years, but
most of that work has been English-centric. In this paper, we introduce
Ar-Spider, the first Arabic cross-domain text-to-SQL dataset. Due to the unique
nature of the language, two major challenges were encountered, namely schema
linguistic and SQL structural challenges. To handle these issues and conduct
the experiments, we adopt two baseline models, LGESQL [4] and S2SQL [12], both
of which are tested with two cross-lingual models to alleviate the effects of
the schema linguistic and SQL structure linking challenges. The baselines
demonstrate decent single-language
performance on our Arabic text-to-SQL dataset, Ar-Spider, achieving 62.48% for
S2SQL and 65.57% [...]
the baselines when trained on the English dataset. To achieve better performance
on Arabic text-to-SQL, we propose the context similarity relationship (CSR)
approach, which results in a significant increase in the overall performance of
about 1.52% [...]
and English languages to 7.73%.