bigNN: An open-source big data toolkit focused on biomedical sentence classification

Ahmad P. Tafti, Ehsun Behravesh, Mehdi Assefi, Eric LaRose, Jonathan Badger, John Mayer, AnHai Doan, David Page, Peggy Peissig

2017 IEEE International Conference on Big Data (Big Data), 2017

Abstract

Every day, a massive amount of text data is generated by medical data sources such as scientific literature, medical web pages, health-related social media, clinical notes, and drug reviews. Processing this wealth of data is a daunting task that calls for smart and scalable computational strategies, including machine intelligence, big data analytics, and distributed architectures. In this contribution, we designed and developed bigNN, an open-source big data neural network toolkit that tackles large-scale biomedical text classification efficiently and facilitates fast prototyping and reproducible text analytics research. bigNN scales up a word2vec-based neural network model over Apache Spark 2.10 and Hadoop Distributed File System (HDFS) 2.7.3, allowing for more efficient big data sentence classification. The toolkit supports big data computing and simplifies rapid application development in sentence analysis by allowing users to configure and examine internal parameters of both Apache Spark and the neural network model. bigNN is fully documented and is publicly and freely available at https://github.com/bircatmcri/bigNN.

Keywords

Big Data Computing, Big Data Biomedical Text Classification, Open-Source Big Data Neural Network
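
The idea of word2vec-based sentence classification mentioned in the abstract can be illustrated with a small, single-machine sketch. This is not bigNN's code and does not use Apache Spark: the toy 2-d word vectors, the labels, and the helper names (`sentence_vector`, `centroids`, `classify`) are all hypothetical, and a nearest-centroid rule stands in for the neural network purely to keep the example short. It only shows how per-word vectors might be averaged into a sentence vector that a classifier can consume.

```python
from statistics import fmean

# Toy 2-d word vectors (hypothetical values standing in for trained
# word2vec embeddings; real embeddings have many more dimensions).
WORD_VECTORS = {
    "tumor": [0.9, 0.1],
    "biopsy": [0.8, 0.2],
    "malignant": [1.0, 0.0],
    "headache": [0.1, 0.9],
    "aspirin": [0.2, 0.8],
    "relief": [0.0, 1.0],
}


def sentence_vector(sentence):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [fmean(dim) for dim in zip(*vectors)]


def centroids(labelled_sentences):
    """Compute the mean sentence vector for each label."""
    by_label = {}
    for sentence, label in labelled_sentences:
        vec = sentence_vector(sentence)
        if vec is not None:
            by_label.setdefault(label, []).append(vec)
    return {label: [fmean(dim) for dim in zip(*vecs)] for label, vecs in by_label.items()}


def classify(sentence, label_centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    vec = sentence_vector(sentence)
    if vec is None:
        return None

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(label_centroids, key=lambda label: sq_dist(label_centroids[label]))


# Hypothetical training sentences, one per class.
TRAIN = [
    ("malignant tumor found on biopsy", "oncology"),
    ("aspirin gave headache relief", "pain"),
]
MODEL = centroids(TRAIN)

if __name__ == "__main__":
    print(classify("biopsy shows tumor", MODEL))
    print(classify("aspirin for headache", MODEL))
```

In a distributed setting such as the one the abstract describes, the embedding and classification steps would run over partitioned data rather than in-memory Python lists; the sketch above leaves that out.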