Analysis and parsing of unstructured cyber-security incident data: poster
Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks (2019)
Abstract
The latest threat intelligence platforms use structured protocols to share and analyze cyber-security data. However, most of this data is reported to the platform as unstructured text, such as social media posts, emails, and news articles, which then requires manual conversion to structured form. To bridge the gap between unstructured and structured data, we propose to implement a natural language processing (NLP)-based information extraction (IE) system that takes texts within the cyber-security domain and parses them into a structured format. Our approach targets the VERIS format and makes use of the VERIS Community Database as a source of unstructured texts (primarily news articles) and their structured counterparts (VERIS reports). We propose first to use a supervised machine learning (ML) classifier to discriminate between cyber-related and non-cyber-related texts, and then to use ML classifiers to decide which VERIS parameters are relevant in a given text. Next, we propose to use NLP and IE techniques to extract tuples of grammatically co-dependent words. Finally, these tuples will be passed to domain- and field-specific IE components to fill in the different fields of an output VERIS report.
Keywords
VERIS, cyber-security, information extraction, natural language processing
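
The first two stages described in the abstract (a cyber/non-cyber filter, followed by one relevance classifier per VERIS parameter) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which classifier the authors use, so a small multinomial naive Bayes model stands in; the training sentences are invented rather than taken from the VERIS Community Database; and the names `NaiveBayes`, `parse`, and `relevant_parameters` are hypothetical. Only two VERIS parameters (`action.malware` and `action.hacking`) are modeled, and the later tuple-extraction and field-filling stages are not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {}
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; a real system would train on labeled articles.
CYBER = [
    "Hackers breached the hospital database and stole patient records",
    "Ransomware malware encrypted files on the city servers",
    "Phishing emails installed malware on employee laptops",
    "Attackers used a stolen password to breach the web server",
]
OTHER = [
    "The local team won the championship game on Saturday",
    "Heavy rain flooded several roads in the city",
    "The bakery introduced a new bread recipe this week",
    "The council approved a budget for the new library",
]

# Stage 1: discriminate cyber-related from non-cyber-related texts.
cyber_filter = NaiveBayes().fit(
    CYBER + OTHER, ["cyber"] * len(CYBER) + ["other"] * len(OTHER)
)

# Stage 2: one yes/no classifier per VERIS parameter (labels are invented).
field_classifiers = {
    "action.malware": NaiveBayes().fit(CYBER, ["no", "yes", "yes", "no"]),
    "action.hacking": NaiveBayes().fit(CYBER, ["yes", "no", "no", "yes"]),
}


def parse(text, cyber_filter, field_classifiers):
    """Return the parameters judged relevant, or None for non-cyber text."""
    if cyber_filter.predict(text) != "cyber":
        return None
    relevant = [
        name for name, clf in field_classifiers.items()
        if clf.predict(text) == "yes"
    ]
    return {"relevant_parameters": relevant}


print(parse("Malware encrypted the servers of a hospital",
            cyber_filter, field_classifiers))
print(parse("The team won the game", cyber_filter, field_classifiers))
```

Running the filter first means the per-parameter classifiers only ever see texts already judged to be cyber-related, which mirrors the ordering given in the abstract. With eight training sentences the predictions are not meaningful beyond showing how the stages connect.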