Volume 11, Issue 16 p. 4096-4107
Full Paper

Data Mining the C−C Cross-Coupling Genome

Boodsarin Sawatlon

Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

These authors contributed equally to this work.

Dr. Matthew D. Wodrich

Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

These authors contributed equally to this work.

Dr. Benjamin Meyer

Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

Alberto Fabrizio

Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

Prof. Clémence Corminboeuf

Corresponding Author

Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

First published: 29 April 2019

Graphical Abstract

Digging with precision: The tandem of dimensionality-reducing data-clustering maps, machine learning, and molecular volcano plots allows a database of ∼25000 catalysts to be mined either to extract overarching chemical trends or to focus on the anticipated behavior of specific classes of catalysts.
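As a rough illustration of how a dimensionality-reducing map turns a table of catalyst properties into a two-dimensional picture, the sketch below projects one feature vector per catalyst onto its two leading principal components and then averages the map coordinates by metal. Principal component analysis (PCA) is swapped in here as the simplest linear stand-in; it is not necessarily the technique behind the maps in this work. The function names `pca_map` and `centroids_by_label`, and the choice of features (for instance, predicted reaction free energies of each catalyst across several cross-coupling reactions), are hypothetical.

```python
import numpy as np


def pca_map(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project each row (one catalyst) onto the leading principal components.

    PCA is only a simple linear stand-in for a dimensionality-reducing map;
    it is not claimed to be the method used in the paper.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def centroids_by_label(coords: np.ndarray, labels: list[str]) -> dict[str, np.ndarray]:
    """Average map position of all catalysts sharing a label (e.g. the metal)."""
    labels_arr = np.asarray(labels)
    return {lab: coords[labels_arr == lab].mean(axis=0) for lab in sorted(set(labels))}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 6 catalysts x 4 made-up free-energy features (kcal/mol).
    features = rng.normal(size=(6, 4))
    metals = ["Pd", "Pd", "Ni", "Ni", "Pt", "Pt"]
    coords = pca_map(features)
    for metal, centre in centroids_by_label(coords, metals).items():
        print(metal, np.round(centre, 2))
```

In practice, plotting the returned coordinates colored by metal (or by ligand) and comparing the per-group centroids is one simple way to look for the metal- and ligand-dependent clustering that such maps are meant to expose.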

Abstract

The speed and precision with which machine-learning (ML) techniques determine quantum chemical properties have produced a considerable computational speed-up relative to traditional quantum chemical methods, and now allow a desired property to be assessed for thousands of molecules virtually instantaneously. The large databases that result from employing ML can, in turn, be mined to uncover relationships that may be missed by the more commonly used small-scale screening procedures. Owing to its prominent place in chemistry, catalysis is a particularly fruitful playground, where drawing connections between the quantum chemical properties of catalysts and their overall catalytic performance may lead to the identification of new, highly functional species. In this spirit, we previously trained ML models to predict, using molecular volcano plots, the performance of 18000 prospective catalysts for a Suzuki coupling reaction. Here, we apply concepts from big data to probe a type of “C−C cross-coupling genome” that explores results from many different named cross-coupling reactions. Interactive dimensionality-reducing data-clustering maps facilitate the identification of relationships between the thermodynamics of different catalysts and the chemical properties of their constituent metal and ligands. Analyzing large numbers of species in this manner not only identifies unexpected catalysts with thermodynamically ideal profiles for C−C cross-coupling reactions, but also reveals a wealth of interesting chemical trends regarding the roles played by different metals and ligands, as well as by their unique combinations.
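To make the volcano-plot idea concrete, the following is a minimal sketch of a thermodynamic molecular volcano for a hypothetical three-step catalytic cycle (for example, oxidative addition, transmetallation, and reductive elimination). It assumes that a single descriptor x, the free energy of the first intermediate, fixes the second intermediate through one linear scaling relation, and it ranks catalysts by −ΔG(pds), the negative of the free energy change of the least exergonic (potential-determining) step. The slope, intercept, overall reaction energy, catalyst names, and helper functions are illustrative placeholders, not values or code from this work.

```python
# Minimal thermodynamic volcano sketch. All numbers are hypothetical
# placeholders (kcal/mol), not values from the paper.
SLOPE = 0.8          # hypothetical scaling slope: intermediate 2 vs. descriptor x
INTERCEPT = -20.0    # hypothetical scaling intercept
DELTA_G_RXN = -40.0  # hypothetical overall reaction free energy


def step_energies(x: float) -> list[float]:
    """Free energy change of each step in a hypothetical three-step cycle.

    States: catalyst + reactants (0), intermediate 1 (x), intermediate 2
    (from the scaling relation), products + regenerated catalyst (DELTA_G_RXN).
    """
    g = [0.0, x, SLOPE * x + INTERCEPT, DELTA_G_RXN]
    return [g[i + 1] - g[i] for i in range(len(g) - 1)]


def volcano_height(x: float) -> float:
    """Return -dG(pds): minus the free energy change of the least exergonic step."""
    return -max(step_energies(x))


def top_of_volcano(descriptors: dict[str, float], window: float) -> list[str]:
    """Names of catalysts whose volcano height is within `window` of the best."""
    heights = {name: volcano_height(x) for name, x in descriptors.items()}
    best = max(heights.values())
    return sorted(name for name, h in heights.items() if best - h <= window)


if __name__ == "__main__":
    # Hypothetical descriptor values for three made-up catalysts.
    catalysts = {"A": -11.0, "B": -5.0, "C": -22.0}
    for name, x in catalysts.items():
        print(name, round(volcano_height(x), 2))
    print(top_of_volcano(catalysts, window=6.5))
```

With these placeholder numbers the volcano peaks at x = −100/9 ≈ −11.1 kcal/mol, where the first and last steps are equally exergonic. Screening an ML-predicted database then amounts to evaluating `volcano_height` for every predicted descriptor and keeping the catalysts that sit near the top.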

Conflict of interest

The authors declare no conflict of interest.