
Machine Learning Methods for Small Data Challenges in Molecular Science

Chemical Reviews (2023)

Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.