SeSG: a search string generator for Secondary Studies with hybrid search strategies using text mining

Empirical Software Engineering (2022)

Abstract
A Secondary Study (SS) is an important research method used in several areas. A crucial step in the Conduction phase of an SS is the search for studies. This step is time-consuming and error-prone, mainly due to the refinement of the search string. The objective of this study is to validate the effectiveness of automatically formulating search strings for SS. Our approach, termed Search String Generator (SeSG), takes as input a small set of studies (used as a Quasi-Gold Standard) and processes them using text mining. SeSG then generates search strings that deliver a high F1-Score on the start set of a hybrid search strategy. To achieve this objective, we (1) generate a structured textual representation of the initial set of input studies as a bag-of-words using Term Frequency and Document Frequency; (2) perform automatic topic modeling using LDA (Latent Dirichlet Allocation) and enrich the terms with a pre-trained dense language representation (embedding) called BERT (Bidirectional Encoder Representations from Transformers); (3) formulate and evaluate the search string using the obtained terms; and (4) use the developed search strings in a digital library. To validate our approach, we conduct an experiment, using several SS as objects, that compares the effectiveness of the search strings automatically formulated by SeSG with the manual search strings reported in these studies. SeSG generates search strings that achieve a better final F1-Score on the start set than the searches reported by these SS. Our study shows that SeSG can effectively replace the manual formulation of search strings in hybrid search strategies, since it removes the need for manual string refinement.
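The steps named in the abstract (Document Frequency counting, search string formulation from grouped terms, and F1-Score evaluation against a known set of studies) can be illustrated with a minimal Python sketch. This is not the SeSG implementation: the function names, the whitespace tokenizer, and the Boolean layout (similar words OR-ed within a group, groups AND-ed within a topic, topics OR-ed) are all assumptions made for illustration. The LDA and BERT steps are omitted, and the topic terms are supplied by hand.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each lowercase token appears.

    Assumption: naive whitespace tokenization, no stop-word removal.
    """
    df = Counter()
    for text in docs:
        df.update(set(text.lower().split()))
    return df


def build_search_string(topics):
    """Join term groups into a Boolean query string.

    `topics` is a list of topics; each topic is a list of groups; each
    group is a list of interchangeable terms. Hypothetical layout, not
    taken from the paper: OR inside a group, AND between groups of one
    topic, OR between topics.
    """
    topic_clauses = []
    for groups in topics:
        group_clauses = [
            "(" + " OR ".join(f'"{term}"' for term in group) + ")"
            for group in groups
        ]
        topic_clauses.append("(" + " AND ".join(group_clauses) + ")")
    return " OR ".join(topic_clauses)


def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall over two sets of study ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Usage with made-up data: two topics, hand-picked terms.
example_topics = [[["search", "query"], ["string"]], [["mining"]]]
query = build_search_string(example_topics)
score = f1_score({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s5"})
```

In this toy run the query matches four studies, two of which are relevant, out of three relevant studies overall, so precision is 1/2, recall is 2/3, and the F1-Score is 4/7. Comparing such scores across candidate strings is one way the generated strings could be ranked before any of them is used in a digital library.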
Keywords
Search string, Text mining, Secondary studies, Systematic literature review, Systematic mapping study