[P] Writing a mission statement for a project on machine learning for catalyst design
I am starting my first scientific job after my PhD in Physics: this summer I will be a research scientist in Beijing on a project related to catalyst design. I am very excited to be part of it, and I am starting to write out the scope of the project. Let me know any tips or advice you may have.
Many fields in physics and chemistry use the Schrödinger equation or density functional theory (DFT) to compute properties of the systems under study and to gain insight into them. Any computational natural scientist will know that in practice the starting point for many of these calculations, in other words what is fed as input to the machine, is the set of atomic numbers and geometries.
(a) Can machine learning help discover new insights and physics in the systems of interest? From knowledge of just the molecular structures, are we able to gain any insight into more efficient and less harmful combustion reactions that use catalysts? In a broader sense, this may also give insight into the computational quantum mechanical approach itself.
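To make "atomic numbers and geometries as input" concrete, here is a minimal sketch of one commonly used way to turn them into a fixed numerical representation, the Coulomb matrix. This is only an illustration, not a descriptor the project has committed to; the function name `coulomb_matrix` and the water coordinates are my own placeholders, and the original definition of the descriptor uses atomic units for distances.

```python
import math

def coulomb_matrix(atomic_numbers, coordinates):
    """Build the Coulomb matrix for one molecule.

    Diagonal entries:     0.5 * Z_i ** 2.4
    Off-diagonal entries: Z_i * Z_j / |R_i - R_j|
    """
    n = len(atomic_numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                distance = math.dist(coordinates[i], coordinates[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / distance
    return matrix

# Water with approximate coordinates (illustrative values only, not reference data).
water_z = [8, 1, 1]
water_xyz = [
    (0.000, 0.000, 0.117),   # O
    (0.000, 0.757, -0.469),  # H
    (0.000, -0.757, -0.469), # H
]
water_cm = coulomb_matrix(water_z, water_xyz)

for row in water_cm:
    print(["%.3f" % value for value in row])
```

The resulting matrix is symmetric and depends only on the atomic numbers and the interatomic distances, which is exactly the information described above.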
(b) The inverse design problem deals with the prediction of novel undiscovered molecules.
These two problems take opposite approaches: in case (a) we move from a known chemical space towards the prediction of physical and chemical quantum properties, whereas in case (b) we start from a desired property and make predictions about the chemical space.
To approach question (a), the work trains supervised learning methods on quantum mechanical data. These have been shown to perform on roughly the same scale as models that use unsupervised feature learning (e.g. convolutional neural networks).
No theoretical method exists that can explore all combinatorially possible alloyed systems. For the smallest known thiolated nanocluster, Au_15(SR)_13, there would be over 32k possibilities, so characterizing every potential structure presents a significant computational challenge.
From a brute-force machine learning perspective, the overall catalyst design problem would therefore be to build stacks or layers of features and predictions, starting from knowledge of just the geometries and atomic numbers of the molecular structures.
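As a rough check on the "over 32k" figure: one reading that reproduces it is to assume each of the 15 gold sites is independently either kept as Au or replaced by Ag, and to ignore symmetry-equivalent structures. That assumption is mine, as are the names `N_METAL_SITES` and `doping_patterns`.

```python
from itertools import product

N_METAL_SITES = 15  # gold sites in Au_15(SR)_13

# Closed-form count under the assumption that each site is independently Au or Ag.
n_configurations = 2 ** N_METAL_SITES

def doping_patterns(n_sites):
    """Lazily yield every Au/Ag assignment as a tuple such as ('Au', 'Ag', ...)."""
    return product(("Au", "Ag"), repeat=n_sites)

# Sanity check of the counting rule by explicit enumeration on a small case.
small_count = sum(1 for _ in doping_patterns(4))

print(n_configurations)  # 32768
print(small_count)       # 16
```

Each of those patterns would still need a geometry optimization and a property calculation, which is where the cost comes from.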
(b) The inverse design problem revolves around finding the best chemical structures with desired properties. One could utilize invertible models from machine learning such as generative models (GANs, autoencoders).
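To illustrate what "start from a desired property and find the structure" means, here is a toy sketch. It is not a generative model: it swaps in the simplest possible baseline, an exhaustive search over a tiny space of Au/Ag doping patterns. The function `toy_property` is a hypothetical stand-in for a trained forward model and has no physical meaning; `inverse_design` is likewise a name made up for this example.

```python
from itertools import product

def toy_property(pattern):
    """Hypothetical stand-in for a forward model (structure -> property).

    The "property" is the sum of the 1-based indices of the Ag sites.
    """
    return sum(site + 1 for site, metal in enumerate(pattern) if metal == "Ag")

def inverse_design(target, n_sites):
    """Return the doping pattern whose predicted property is closest to target."""
    candidates = product(("Au", "Ag"), repeat=n_sites)
    return min(candidates, key=lambda pattern: abs(toy_property(pattern) - target))

best = inverse_design(target=9, n_sites=4)
print(best, toy_property(best))
```

Exhaustive search like this only works while the candidate space can be enumerated. Generative models are attractive precisely because they propose candidates without visiting every point in the space.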
According to Kulik et al., only a tiny fraction (1 part in 10^50) of chemical space has ever been explored. This calls for a machine learning approach rather than designing each molecular system by hand (either experimentally or computationally). A great review paper that provides insight into how the inverse design problem may be approached was written by B. Sanchez-Lengeling and Alan Aspuru-Guzik. One could begin with unsupervised learning on databases of potential catalysts.
Machine-Learning Prediction of CO Adsorption in Thiolated, Ag-Alloyed Au Nanoclusters, J. Am. Chem. Soc. 2018, James P. Lewis et al.
Designing in the Face of Uncertainty: Exploiting Electronic Structure and Machine Learning Models for Discovery in Inorganic Chemistry, Inorg. Chem. 2019, H. Kulik et al.
Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering, Science 2018, B. Sanchez-Lengeling and Alan Aspuru-Guzik.