Accelerating the discovery of novel catalysts utilizing machine learning algorithms and computational and experimental data
Katie McCullough

We interact with artificial intelligence and machine learning (ML) algorithms every day, from image and voice recognition to flagging fraudulent credit card transactions. Researchers at the Catalysis Center for Energy Innovation (CCEI) are using ML to accelerate catalyst discovery by predicting new bimetallic catalysts for ethanol reforming. ML is well suited to quickly and accurately identifying the chemical parameters that matter most and to building models that predict catalyst performance. Catalysts are involved in over 80% of all chemical manufacturing processes, and they are constantly being improved to increase the efficiency and economic feasibility of those processes. Accelerating discovery not only saves time and money but also helps sustainable energy technologies reach large, industrial-scale processes sooner. Chemical reactivity in heterogeneous catalysis depends on a large number of intertwined variables; ML algorithms can learn these relationships and predict never-before-seen catalysts using fewer resources than the traditional one-at-a-time approach.

The complexity of catalysis

A metal can catalyze a reaction by lowering the energy barrier (the activation energy) that must be overcome to turn one kind of chemical into another. In heterogeneous catalysis, metal nanoparticles are often dispersed over a high-surface-area support material to create active sites where the reaction occurs. Different metals suit different reactions: cobalt and iron are well suited to synthesizing light hydrocarbons, whereas platinum is better suited to oxidizing toxic carbon monoxide into carbon dioxide. In each case, a reactant adsorbs on the surface of the active metal, reacts, and the product then desorbs from the surface. Which metal works best for a given reaction depends largely on the strength of the bond that forms between the reactant and the metal surface. The bond must be strong enough for the reactant to “stick” to the surface, but not so strong that the product cannot desorb; otherwise, new reactant molecules cannot bind to the surface, and the activity of the catalyst drops. This balance is often referred to as the Sabatier principle. On the way from reactant to product, many intermediate chemical bonds are formed and broken. As a result, there can be many pathways leading to different products, and selectively forming one product over another is not straightforward.

Experimentally identifying catalytically active metals and materials often requires a systematic trial-and-error approach that can take months or years. Take, for example, Alwin Mittasch, whose team, starting in 1909, studied over 2,500 different catalysts and conducted 20,000 experiments to determine how to make ammonia from atmospheric nitrogen and hydrogen. The ammonia synthesis catalyst that emerged from about 10 years of experimentation revolutionized the agricultural industry and is still used for fertilizer production today. Nowadays, computational methods give us a better understanding of the mechanisms by which products form on different metal surfaces. However, performing these calculations for hundreds or thousands of materials is likewise prohibitively time-consuming. This is where ML excels: researchers at CCEI have successfully used ML algorithms to extract knowledge from small amounts of computational and experimental data and make catalyst predictions.

Developing ML algorithms with predictive capabilities

Developing innovative catalysts that transform non-food biomass into fuels and chemicals is CCEI's mission, and integrating ML can accelerate it. ML typically needs hundreds to thousands of data points to produce accurate predictions, whereas consistent experimental data sets are often much smaller. CCEI researchers circumvented this problem by building predictive ML models on the atomic-scale factors that control catalyst activity. They developed a two-step approach that augments a small amount of experimental data with computational data containing the relevant reaction energies for products that can form during ethanol reforming.

Schematic. Flowchart of the combined machine-learning (ML) approach consisting of two ML models. ML Model 1 is a nonlinear model combining random forest regression (RFR) and Gaussian process regression (GPR) trained on extensive reference data from density-functional theory (DFT) calculations to predict transition state energies of ethanol decomposition reaction steps. The predicted transition-state energies enter a second linear model (ML Model 2) that is trained on a smaller data set of catalytic activities and selectivities from published experiments. Artrith et al., ACS Catal. 2020, 10, 9438-9444.
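The two-stage idea in the schematic can be sketched with NumPy. This is a simplified stand-in, not the authors' code: Model 1 here is a plain Gaussian process regressor with fixed hyperparameters (the random-forest component is omitted), Model 2 is ordinary least squares, and every descriptor, energy, and selectivity is a synthetic placeholder. The names `SimpleGPR`, `rbf_kernel`, `fit_linear`, and `predict_linear` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class SimpleGPR:
    """Minimal GP regressor: fixed hyperparameters, constant prior mean."""

    def __init__(self, length_scale=1.0, noise=1e-4):
        self.length_scale = length_scale
        self.noise = noise  # variance added to the kernel diagonal

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()
        K = rbf_kernel(self.X_train, self.X_train, self.length_scale)
        K += self.noise * np.eye(len(y))
        # Solve K @ alpha = (y - mean) instead of inverting K explicitly.
        self.alpha = np.linalg.solve(K, y - self.y_mean)
        return self

    def predict(self, X):
        K_star = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.length_scale)
        return self.y_mean + K_star @ self.alpha


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new inputs."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(0)

# Stage 1 (stand-in for ML Model 1): synthetic descriptors -> synthetic
# "transition-state energies" playing the role of the DFT reference data.
X_dft = rng.uniform(-1.0, 1.0, size=(40, 2))
ts_dft = 1.0 + 0.8 * X_dft[:, 0] - 0.5 * X_dft[:, 1] ** 2
model1 = SimpleGPR(length_scale=0.5, noise=1e-4).fit(X_dft, ts_dft)

# Stage 2 (stand-in for ML Model 2): a handful of "catalysts" whose
# predicted energies feed a linear model of a synthetic selectivity.
X_cat = rng.uniform(-1.0, 1.0, size=(6, 2))
ts_pred = model1.predict(X_cat)        # Model 1 output becomes Model 2 input
selectivity = 0.9 - 0.3 * ts_pred      # synthetic target, exactly linear here
coef = fit_linear(ts_pred, selectivity)
```

The design point the sketch tries to capture is the division of labor: the flexible, data-hungry model is trained where data are plentiful (computed energies), while the small experimental set only has to fit a few linear coefficients.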

Ethanol reforming is one way to sustainably produce hydrogen for use in fuel cells. Ethanol can be produced from waste biomass such as corn husks and then reformed to produce hydrogen. However, decarbonylation can also occur, leading to the formation of undesirable methane, a greenhouse gas. Ethanol can also decompose completely to atomic carbon, which deactivates the catalyst. Artrith et al. set out to predict and develop a suitable catalyst for ethanol reforming using both experimental and computational data, with the goal of maximizing both the fraction of ethanol converted and the amount of hydrogen produced.
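For concreteness, the two undesired routes can be written as balanced overall reactions. These are standard stoichiometric summaries added for illustration, not equations taken from the study; the second line assumes, for bookkeeping only, that the oxygen stays on the surface as an adsorbed atom (the asterisk marks a surface-bound species) while the hydrogen leaves as H2.

```latex
\begin{align}
\text{decarbonylation:}\quad & \mathrm{CH_3CH_2OH} \rightarrow \mathrm{CH_4} + \mathrm{CO} + \mathrm{H_2} \\
\text{complete decomposition:}\quad & \mathrm{CH_3CH_2OH} \rightarrow 2\,\mathrm{C}^{*} + \mathrm{O}^{*} + 3\,\mathrm{H_2}
\end{align}
```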

To teach the ML model the fundamental reaction pathways that transform a reactant into a product, the first ML model was trained on a computational database of carbon–carbon and carbon–oxygen bond-scission energies and the ethanol surface reactivity of metals. This model was then used to predict transition-state energies for ethanol decomposition reactions over various platinum-based bimetallic catalysts. These transition states lie along the different pathways by which ethanol can be transformed into methane, hydrogen, or atomic carbon, as described above. Next, the output of the first ML model was used as the input to a second ML model that predicts the selectivity (how much of one product forms relative to another) and the conversion (how much of the reactant is converted into products) for different bimetallic catalysts.
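The two target quantities can be made concrete with a short calculation. Selectivity has several conventions in catalysis (carbon-based, per mole of reactant converted, or per mole of products), so the product-mole-fraction definition below is an assumption for illustration and not necessarily the one used in the published data; the function names and numbers are hypothetical.

```python
def conversion(ethanol_in: float, ethanol_out: float) -> float:
    """Fraction of the fed ethanol that reacted (between 0 and 1)."""
    if ethanol_in <= 0:
        raise ValueError("ethanol_in must be positive")
    return (ethanol_in - ethanol_out) / ethanol_in


def selectivities(products: dict[str, float]) -> dict[str, float]:
    """Share of each product in the total moles of products formed (assumed convention)."""
    total = sum(products.values())
    if total <= 0:
        raise ValueError("no products formed")
    return {name: moles / total for name, moles in products.items()}


# Hypothetical run: 10 mol ethanol fed, 2.5 mol left unreacted.
print(conversion(10.0, 2.5))                              # 0.75
print(selectivities({"H2": 8.0, "CH4": 1.0, "CO": 1.0}))  # {'H2': 0.8, 'CH4': 0.1, 'CO': 0.1}
```

A catalyst that scores well needs both numbers to be high: high conversion with poor hydrogen selectivity still wastes ethanol on methane or carbon.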

The second model was trained on experimental selectivities and conversions from the literature and used to make the final predictions for new bimetallic catalysts. It was validated against experimental results using leave-one-out cross-validation: one data point is removed from the training set, the model is trained on the remaining data, and the trained model is then used to predict the held-out point. This is repeated for every data point in the training set. The model predicted both conversion and selectivity with high accuracy. The researchers identified four promising compositions (platinum combined with chromium, manganese, cobalt, or zinc) that may yield very high selectivity toward hydrogen in ethanol reforming. They were also able to identify which combinations of metals would be poor choices for producing hydrogen from ethanol.
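Leave-one-out cross-validation is short enough to write out directly. The sketch assumes, for illustration, that the second model is an ordinary-least-squares fit with an intercept (the schematic says only that it is linear); the function names `loo_predictions` and `loo_rmse` and the data are hypothetical.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out predictions from an OLS model with an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                     # treat a 1-D input as a single feature
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i           # hold out data point i
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        # Predict the held-out point with the model trained on the rest.
        preds[i] = coef[0] + X[i] @ coef[1:]
    return preds


def loo_rmse(X, y):
    """Root-mean-square error of the leave-one-out predictions."""
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((loo_predictions(X, y) - y) ** 2)))


# Hypothetical example: the last point lies far off the trend of the others.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 10.0]
print(round(float(loo_predictions(x, y)[3]), 6))  # 4.0, from the line through the first three points
```

Because every prediction comes from a model that never saw that point, the LOO error is an honest estimate of how the model would do on a new catalyst, which matters when only a handful of experimental data points exist.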

Researchers at CCEI demonstrated not only how ML can accelerate catalyst discovery but also how ML and computational calculations can be used to extract knowledge from small amounts of experimental data. Being able to predict both good and bad catalysts without synthesizing and testing hundreds of catalyst combinations dramatically accelerates catalyst discovery and saves time and financial resources. Ultimately, this study is a significant, innovative step forward for CCEI in helping the U.S. meet its future energy demands.

More Information

Artrith et al., “Predicting the Activity and Selectivity of Bimetallic Metal Catalysts for Ethanol Reforming using Machine Learning,” ACS Catal. 2020, 10, 9438-9444.


DFT calculations and ML model construction made use of the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575 (allocation no. DMR14005). Calculations were also performed on the computational resources of the Center for Functional Nanomaterials, which is a U.S. DOE Office of Science Facility, at Brookhaven National Laboratory under Contract No. DE-SC0012704. We also acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by National Institutes of Health (NIH) Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. This article was sponsored by the Catalysis Center for Energy Innovation (CCEI), an Energy Frontier Research Center (EFRC) funded by the U.S. Department of Energy, Office of Basic Energy Sciences under Award Number DE-SC0001004. N.A. thanks Dr. Jose Garrido Torres and Dr. Mark S. Hybertsen for discussions.

About the author(s):

Katie McCullough is a postdoctoral appointee at Argonne National Laboratory, where her work focuses on plastic upcycling and data-driven approaches to catalyst discovery that couple high-throughput experimentation with machine learning algorithms. Katie earned her PhD in Chemical Engineering from the University of South Carolina in 2021. Her research interests include high-throughput experimentation, heterogeneous catalysis, surface science, and sustainability. Katie worked with the Inorganometallic Catalyst Design Center (ICDC). ORCID: 0000-0001-9752-5740.