Manifold coordinates with physical meaning

J. Mach. Learn. Res. (2019)

Abstract
One of the aims of both linear and non-linear dimension reduction is to find a reduced set of collective variables that describe the data manifold. While algorithms return abstract coordinates such as spaces spanned by eigenvectors of data-dependent matrices, one can often associate these with features of the data, and hence with domain-related meaning. Usually, finding these domain-related or physical meanings is done via visual inspection by an expert. Our work formulates this problem as sparse, non-parametric, non-linear recovery of the manifold coordinates over a user-defined dictionary of domain-related functions. We show that the original problem can be transformed into a linear Group Lasso problem, and demonstrate the effectiveness of the method on molecular simulation data.

1 Motivation: manifold learning for collective variables

Our motivating application is the understanding of the slow dynamic modes of molecules and other atomic systems from molecular dynamics simulations. In such simulations, the positions of atoms within a molecule are sampled as they proceed through time from some initial conditions. Even though the vector of atomic coordinates can take any value, due to interatomic interactions, the relative positions of atoms within the molecule lie near a low-dimensional manifold. Manifold Learning (ML) methods have become the framework of choice for finding these collective variables in molecular systems in a data-driven way. These variables correspond to macroscopically interesting transformations of the system, and can explain some of its properties [Clementi et al., 2000, Noé and Clementi, 2017].

Figure 1 illustrates several manifolds learned from molecular dynamics simulations. The learned collective variables are, in these cases, identified by visual inspection as corresponding to bond torsions, also known as dihedral angles.

Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada.
Figure 1: Collective coordinates with physical meaning in Molecular Dynamics (MD) simulations. Panels: (a) Toluene, (b) Ethanol, (c) Malonaldehyde, (d) $g_1$, (e) $g_1$, (f) $g_1$, (g) Torsion example, (h) $g_2$, (i) $g_2$. 1a-1c: Diagrams of the toluene (C7H8), ethanol (C2H5OH), and malonaldehyde (C3H4O2) molecules, with the carbon (C) atoms in grey, the oxygen (O) atoms in red, and the hydrogen (H) atoms in white. Bonds defining important torsions $g_j$ are marked in purple and blue. The bond torsion is the angle of the planes inscribing the first three and last three atoms on the line (1g). 1d: Embedding of the configurations of toluene into $m = 2$ dimensions, showing a manifold of $d = 1$. The color corresponds to the values of the purple torsion $g_1$. 1e, 1h: Embedding of the configurations of the ethanol in $m = 3$ dimensions, showing a manifold of dimension $d = 2$, respectively colored by the blue and purple torsions in Figure 1b. 1f, 1i: Embedding of the configurations of the malonaldehyde in $m = 3$ dimensions, showing a manifold of dimension $d = 2$, respectively colored by the blue and purple torsions in Figure 1c. Data is from Chmiela et al. [2017].

2 Problem formulation

We propose to replace such visual interpretation with a statistical procedure. We make the standard assumption that the observed data $\mathcal{D} = \{\xi_i \in \mathbb{R}^D : i \in 1 \ldots n\}$ are sampled i.i.d. from a smooth Riemannian manifold $(\mathcal{M}, \mathrm{id})$ of intrinsic dimension $d$ embedded in a feature space $\mathbb{R}^D$ by the inclusion map, with $\mathrm{id}$ the identity metric with respect to $\mathbb{R}^D$. We assume that the intrinsic dimension $d$ of $\mathcal{M}$ is known. Furthermore, we assume the existence of a smooth embedding map $\varphi : \mathcal{M} \to \varphi(\mathcal{M}) \subset \mathbb{R}^m$, where typically $m \ll D$. That is, $\varphi$ restricted to $\mathcal{M}$ is a diffeomorphism onto its image, and $\varphi(\mathcal{M})$ is a submanifold of $\mathbb{R}^m$. We call the coordinates $\varphi(\xi_i)$ in this $m$-dimensional ambient space the embedding coordinates; let $\Phi = [\varphi(\xi_i)^T]_{i=1:n} \in \mathbb{R}^{n \times m}$.

In practice, the mapping of the data $\mathcal{D}$ onto $\varphi(\mathcal{D})$ represents the output of an embedding algorithm, and we only have access to $\mathcal{M}$ and $\varphi$ via $\mathcal{D}$ and its image $\Phi$. The reader is referred to Lee [2003] for the definitions of the differential geometric terms used in this paper.
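The bond torsion described in the Figure 1 caption, the angle between the plane through the first three atoms and the plane through the last three, can be computed directly from four atomic positions. The following is a minimal sketch in plain Python, not code from the paper: the function name `torsion_angle`, the helper names, and the sign convention of the returned angle are all assumptions made for illustration.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Euclidean inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion (dihedral) angle, in radians, of four atoms.

    Hypothetical helper: p0..p3 are (x, y, z) positions of four atoms
    along a chain. The angle is measured between the plane containing
    (p0, p1, p2) and the plane containing (p1, p2, p3). The sign
    convention is an assumption; other conventions flip the sign.
    """
    b1 = _sub(p1, p0)  # first bond vector
    b2 = _sub(p2, p1)  # central bond, the axis of rotation
    b3 = _sub(p3, p2)  # last bond vector
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    # atan2 form: well-behaved near 0 and pi, unlike acos of a cosine.
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.atan2(y, x)
```

Evaluating such a function on every sampled configuration gives one value per data point, which is the kind of quantity used to color the embeddings in Figure 1. The function is undefined when three consecutive atoms are collinear, since one of the planes is then not determined.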
Keywords
dimension reduction, manifold learning, functional regression, gradient, group lasso