Accelerating understanding of the protein assembly process holds potential for fast advances
Guomin Zhu

Proteins self-assemble to form a variety of aggregated structures. This self-assembly process is integral to the formation of the highly ordered protein architectures found throughout cellular structural elements and connective tissue, the most abundant tissue in the human body. The assembled structure plays an important role in governing biological function. For instance, collagen is the main structural protein in the body’s connective tissues, making up 25% to 35% of whole-body protein content, and its assembled structure determines whether collagen tissues are healthy. Defective assembled structures can lead to various diseases, e.g., Parkinson’s disease, which is associated with the self-assembly of a condensed protein phase.

Understanding how protein architectures form is important for elucidating how defective structures arise, and it could also help to harness the structural complexity of protein architectures for materials design and synthesis.

However, capturing the formation process of protein structures is difficult. Some characterization techniques allow researchers to study individual proteins, while others can help to observe the final aggregate structures. Rarely, however, is one technique well suited to both. Additionally, monitoring the evolution of a system over time requires taking many snapshots in quick succession, yet most techniques produce only poor-quality images at short acquisition times. One emerging microscopy tool that can meet these requirements is high-speed atomic force microscopy (AFM), which shows potential for imaging protein assembly processes directly in solution at both high spatial and high temporal resolution.

High-speed AFM is promising, but challenges remain. One major problem is the large volume of data generated, which takes time to process and analyze. Recently, scientists from the University of Washington and Oak Ridge National Laboratory in the Center for the Science of Synthesis Across Scales (CSSAS) helped to address this problem by demonstrating a method based on deep learning.

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify weak patterns that are difficult or impossible for researchers to discern, and make decisions with minimal human involvement.

Protein self-assembly usually begins with randomly oriented proteins that gradually assemble into an ordered structure. Each individual protein follows its own trajectory into the final structure. The researchers at CSSAS used data from the latest stage of self-assembly as the training set for the algorithm, because identifying individual particles becomes easier over time: as their motion slows and they form ordered domains, the position and orientation of each protein become relatively well defined. They then applied the trained algorithm to all the other data sets, in which the proteins are more dynamic and their positions and orientations are more uncertain. Finally, the full spatiotemporal diagram of protein trajectories is reconstructed, and the individual particle and domain trajectories are isolated and analyzed, as shown in Figure 1.
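To make the trajectory reconstruction step more concrete, here is a minimal sketch of how particle positions detected in successive AFM frames might be linked into trajectories. It is not the method used in the publication: the function name `link_trajectories`, the `max_step` parameter, and the greedy nearest-neighbor rule are all assumptions chosen for illustration, and the per-frame positions are assumed to come from an upstream detection step.

```python
import math

def link_trajectories(frames, max_step):
    """Greedy nearest-neighbor linking (illustrative assumption, not the paper's method).

    frames: list of frames, each a list of (x, y) particle positions
    max_step: largest displacement allowed between consecutive frames
    Returns a list of trajectories, each a list of (frame_index, x, y).
    """
    if not frames:
        return []
    # Every detection in the first frame starts its own trajectory.
    trajectories = [[(0, x, y)] for x, y in frames[0]]
    active = list(range(len(trajectories)))
    for t in range(1, len(frames)):
        # Collect all admissible (distance, trajectory, detection) pairings.
        candidates = []
        for ti in active:
            _, px, py = trajectories[ti][-1]
            for di, (x, y) in enumerate(frames[t]):
                d = math.hypot(x - px, y - py)
                if d <= max_step:
                    candidates.append((d, ti, di))
        candidates.sort()  # shortest displacements are matched first
        used_traj, used_det = set(), set()
        for d, ti, di in candidates:
            if ti in used_traj or di in used_det:
                continue
            x, y = frames[t][di]
            trajectories[ti].append((t, x, y))
            used_traj.add(ti)
            used_det.add(di)
        # Unmatched trajectories end here; unmatched detections start new ones.
        active = list(used_traj)
        for di, (x, y) in enumerate(frames[t]):
            if di not in used_det:
                trajectories.append([(t, x, y)])
                active.append(len(trajectories) - 1)
    return trajectories

# Two particles drifting slowly over three frames (made-up coordinates).
frames = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 0), (10, 12)]]
trajs = link_trajectories(frames, max_step=3)
```

A greedy rule like this is only reasonable when particles move less than their typical separation between frames, which is closer to the slow, late-stage regime described above than to the early, more dynamic stages.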

Figure 1. The relationship between the deep learning algorithm and the experimental protein assembly process. The protein has a rod-shaped structure, and the algorithm outlines its position and direction to extract kinetic information as the assembly evolves from a structure with randomly oriented proteins to one with more uniformly oriented proteins.
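As a rough illustration of what "position and direction" mean for a rod-shaped particle, the sketch below computes the centroid and long-axis angle of a single particle from the pixels assigned to it, using second-order image moments. This is a standard image-analysis technique swapped in for illustration, not the deep learning model from the publication; the function name `rod_position_and_angle` and the pixel-list input format are assumptions.

```python
import math

def rod_position_and_angle(pixels):
    """Centroid and long-axis angle of one particle (illustrative sketch).

    pixels: list of (x, y) coordinates belonging to the particle
    Returns (cx, cy, angle), with angle in degrees in the range [0, 180).
    """
    n = len(pixels)
    # Position: the mean of the pixel coordinates.
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments describe the spread along each axis.
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Direction of the principal (long) axis of the pixel distribution.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # A rod has no head or tail, so angles are reported modulo 180 degrees.
    return cx, cy, math.degrees(angle) % 180.0

# A five-pixel rod lying along the diagonal (made-up data).
cx, cy, angle = rod_position_and_angle([(i, i) for i in range(5)])
```

Tracking how these angles change from frame to frame is one simple way to see a population of rods move from random orientations toward a common direction.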

This method is an initial step in CSSAS’s effort to fully understand the protein assembly process, which could help explain how defective protein architectures form in human tissue. The assembled structures also have the potential to be harnessed for advanced materials synthesis. The algorithm could also be applied in many other areas that involve large volumes of kinetic data, where automation is needed to process data more efficiently and to identify weak patterns that are difficult to see by hand.

More Information

EFRC Publications

Ziatdinov, Maxim, et al. "Quantifying the dynamics of protein self-organization using deep learning analysis of atomic force microscopy data." Nano Lett. 2021, 21, 1, 158–165.

Other Sources

Gajko-Galicka A. “Mutations in type I collagen genes resulting in osteogenesis imperfecta in humans.” Acta Biochim Pol. 2002;49(2):433-41.

McManus, Jennifer J., et al. "The physics of protein self-assembly." Current opinion in colloid & interface science 22 (2016): 73-79.


This work is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, as part of the Energy Frontier Research Centers program: CSSAS, The Center for the Science of Synthesis Across Scales, under Award Number DE-SC0019288, located at the University of Washington. The design of the proteins (H.P., D.B.) was supported by Grant DE-SC0018940 funded by the U.S. Department of Energy, Office of Science. The machine learning was performed and partially supported (M.Z.) at the Oak Ridge National Laboratory’s Center for Nanophase Materials Sciences (CNMS), a U.S. Department of Energy, Office of Science User Facility. High-speed AFM experiments were performed at the Department of Energy’s Pacific Northwest National Laboratory (PNNL). PNNL is a multiprogram national laboratory operated for the Department of Energy by Battelle under Contract No. DE-AC05-76RL01830. The authors are grateful to Pratyush Tiwari (UMD) for valuable discussions.

About the author(s):

Guomin Zhu is a Ph.D. student in Materials Science and Engineering at the University of Washington and a visiting scholar at Pacific Northwest National Laboratory. Zhu works in the CSSAS EFRC, where he focuses on understanding material nucleation, crystallization, and self-assembly using electron-microscopy-based techniques. Specifically, he studies the formation mechanisms of iron oxide mesocrystals, mainly using transmission electron microscopy. He has a passion for fundamental science and hopes it will pay future dividends for society.