Protein functions are highly regulated within living cells. To carry out many of these functions, proteins interact with other proteins, forming macromolecular protein complexes. Analysis of these protein complexes is necessary to understand how they are organized into functional units, which regulate multiple metabolic pathways.
What Is a Macromolecular Protein Complex?
In 1971, the Protein Data Bank (PDB) was established as an archive for biological macromolecular structures. Initially, the archive only stored twelve protein structures, including myoglobin, carboxypeptidase A, hemoglobin, and subtilisin. By 2005, the number of structures had grown to more than 28,000 entries, and the archive has continued to grow since.
Macromolecular protein complexes are associated with many cellular processes, including mRNA splicing, protein degradation, and protein folding. They include ribosomes, multienzyme complexes, chaperonins, structural proteins, and viruses; of these, viruses were the first to be analyzed at high resolution.
Ribosomes are massive ribonucleoprotein complexes that translate genetic information into protein. Multienzyme complexes (e.g., glutamine synthetase, pyruvate dehydrogenase complex, and proteasome) hold different enzymes in close proximity, which allows them to carry out many biochemical reactions efficiently.
Scientists face immense difficulties in analyzing macromolecular protein complexes. For instance, in some cases a structure can only be obtained at a resolution much lower than that typical of X-ray crystallography, which yields less accurate atomic coordinates. In these cases, electron microscopy is used to visualize the macromolecular protein complex.
Recent and Classical Methods of Macromolecular Protein Complex Analysis
Recombinant DNA techniques have significantly contributed to determining the structures of biological macromolecules. These techniques enable cloning, expressing, and purifying large quantities of protein macromolecules. Advances in automated technologies have made this a time-efficient approach.
Two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) is commonly used for separating and studying protein complexes. This approach provides quantitative information on hundreds of proteins. Recently, scientists have used one-dimensional fractionation of protein complexes by size-exclusion chromatography (SEC) or blue native PAGE (BN-PAGE) to analyze the size distribution of 3400 proteins in a single experiment.
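Fractionation data of this kind are often interpreted by comparing elution profiles: proteins that belong to the same complex tend to peak in the same fraction and rise and fall together. The following is a minimal sketch of that idea in Python, not the method of any particular study. The protein names, intensity values, and the 0.9 correlation cutoff are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length elution profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation; treat it as no evidence.
        return 0.0
    return cov / sqrt(var_x * var_y)


def peak_fraction(profile):
    """Index of the fraction with the highest intensity."""
    return max(range(len(profile)), key=profile.__getitem__)


def coeluting_pairs(profiles, min_r=0.9):
    """Return protein pairs that share a peak fraction and correlate above min_r."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            same_peak = peak_fraction(profiles[a]) == peak_fraction(profiles[b])
            if same_peak and r >= min_r:
                pairs.append((a, b))
    return pairs


# Hypothetical intensities across eight SEC or BN-PAGE fractions.
profiles = {
    "protA": [0, 1, 8, 20, 9, 1, 0, 0],
    "protB": [0, 2, 9, 22, 8, 1, 0, 0],
    "protC": [0, 0, 0, 1, 3, 12, 25, 10],
}

pairs = coeluting_pairs(profiles)
print(pairs)
```

Co-elution is only suggestive: unrelated proteins of similar size can share a fraction, so candidate pairs from a screen like this would normally need independent confirmation.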
Macromolecular protein structures are commonly studied using X-ray crystallography. Besides X-ray macromolecular crystallography, nuclear magnetic resonance (NMR) spectroscopy has also significantly contributed to the rapid growth in PDB holdings. Electron microscopy (EM) is one of the most common tools used to determine complex macromolecular protein structures.
Although large macromolecular protein complexes represent a small portion of the PDB, the number of structures is rapidly growing. Most large macromolecular protein complexes are analyzed using X-ray crystallography, followed by electron microscopy, electron diffraction, and electron tomography. X-ray crystallography with an X-ray free-electron laser (XFEL) has recently been used to determine complex protein structures from sub-micron-sized crystals.
The continual advancements in EM technology and image processing methods have enabled the analysis of macromolecular protein complexes with higher accuracy. For instance, electron cryomicroscopy (cryoEM) has recently been used to analyze the structure of the 50S ribosomal subunit.
Subnanometer-resolution cryoEM maps have enabled the analysis of smaller macromolecular protein complexes, such as γ-secretase, as well as icosahedral viruses. Recently, the combined use of single-molecule fluorescence resonance energy transfer (smFRET) and cryoEM has indicated that ribosomal function involves significant conformational heterogeneity and stochastic dynamics.
Genes encoding G protein-coupled receptors (GPCRs) make up about 2% of the human genome, and GPCRs are among the most common pharmaceutical targets. Extracellular signals are transduced by GPCRs to heterotrimeric G proteins. The structures of these receptors and their complexes are visualized through EM and X-ray crystallographic imaging.
Many computational tools have been developed for the modeling of protein structures. At present, automated software is used to determine the crystal structure of protein complexes, which has significantly reduced the time needed for data processing and analysis. Some of the common tools used to visualize macromolecular protein complexes are VISION, Chimera, DeepView (Swiss-PdbViewer), SenSitus, SAIL, PyMOL, and Amira. Many software packages can dock or fit atomic models into EM volumes.
Analyzing Macromolecular Protein Complexes With Experimental Approaches
Analysis of macromolecular protein complexes is a technically challenging and time-consuming task. Protein complexes are identified through various experimental approaches, such as affinity purification in combination with mass spectrometry (AP-MS) and the classical yeast two-hybrid (Y2H) system.
Both of these experimental approaches have advantages and limitations. For instance, the Y2H system generates a large number of false positive and false negative results. In addition, its readout is binary: it reports whether two proteins interact but provides no information on interaction stability, subunit stoichiometry, or subcellular localization.
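One common way to put numbers on false positives and false negatives is to compare the interacting pairs reported by a screen against a curated reference set. The sketch below shows that comparison in Python under simple assumptions: interactions are treated as unordered pairs, and the protein names and both pair lists are hypothetical. A pair missing from the reference set is not necessarily wrong, since reference sets are themselves incomplete, so the resulting figures are estimates.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def benchmark(screen_pairs, reference_pairs):
    """Compare screen results against a reference set of known interactions."""
    screen = normalize(screen_pairs)
    reference = normalize(reference_pairs)

    true_pos = screen & reference   # reported and in the reference set
    false_pos = screen - reference  # reported but not in the reference set
    false_neg = reference - screen  # in the reference set but not reported

    precision = len(true_pos) / len(screen) if screen else 0.0
    recall = len(true_pos) / len(reference) if reference else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "false_negatives": len(false_neg),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical screen output and reference interactions.
screen_pairs = [("A", "B"), ("B", "C"), ("D", "E")]
reference_pairs = [("B", "A"), ("C", "D")]

result = benchmark(screen_pairs, reference_pairs)
print(result)
```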
The AP-MS approach often requires modifications of the biological system, particularly the introduction of tags. This can alter native protein structure and, through overexpression, protein abundance. In addition, AP-MS is a labor-intensive method that can only analyze a small number of proteins in parallel.
These limitations are often overcome by combining classical protein fractionation methods, such as PAGE and column-based chromatography, with quantitative mass spectrometry.
Artificial Intelligence in Analyzing Macromolecular Protein Complexes
Some common modeling approaches to determine the structure of protein complexes are shape complementarity docking, template-based modeling, and integrative modeling. Recently, AlphaFold, an artificial intelligence program developed by DeepMind, has outperformed the conventional modeling methods in analyzing dimeric protein complexes.
Initially, this deep learning method was designed to predict the structure of single protein chains, and its early applications to complexes were largely limited to pairs of chains. This limitation was addressed in the multimer version of AlphaFold, which can predict protein complexes of up to a few thousand residues. However, scientists have suggested that AlphaFold requires more validation for accurate prediction and that this tool should be used with caution.
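One practical way to apply that caution is to inspect the per-residue confidence score (pLDDT, on a 0 to 100 scale) that AlphaFold writes into the B-factor column of its PDB-format output. The sketch below, in Python, reads that column from fixed-width ATOM records and lists residues below a cutoff. The three-residue model is synthetic, the helper names are hypothetical, and the cutoff of 70 follows the commonly used convention that lower scores indicate low confidence.

```python
def atom_line(serial, res_name, chain, res_seq, b_factor):
    """Build a fixed-column PDB ATOM record for a CA atom (synthetic data)."""
    return (
        f"ATOM  {serial:>5}  CA  {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


def read_plddt(pdb_lines):
    """Map (chain, residue number) to the B-factor value of each CA atom.

    In AlphaFold PDB output the B-factor field (columns 61-66) holds pLDDT.
    """
    scores = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue  # one score per residue is enough
        chain = line[21]
        res_seq = int(line[22:26])
        scores[(chain, res_seq)] = float(line[60:66])
    return scores


def low_confidence(scores, cutoff=70.0):
    """Residues whose pLDDT falls below the cutoff, in sorted order."""
    return sorted(key for key, value in scores.items() if value < cutoff)


# Synthetic three-residue model with made-up pLDDT values.
model = [
    atom_line(1, "MET", "A", 1, 92.5),
    atom_line(2, "GLY", "A", 2, 65.0),
    atom_line(3, "SER", "A", 3, 41.2),
    "TER",
]

plddt = read_plddt(model)
low = low_confidence(plddt)
print(low)
```

A per-residue score of this kind describes local confidence only. It does not by itself say whether the chains of a predicted complex are placed correctly relative to one another, which is one reason experimental validation remains important.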
Bryant, P. et al. (2022) Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications, 13(1), pp.1-14. https://doi.org/10.1038/s41467-022-33729-4
Gorka, M. et al. (2019) Protein Complex Identification and quantitative complexome by CN-PAGE. Scientific Reports, 9(1), pp.1-14. https://doi.org/10.1038/s41598-019-47829-7
Purdy, M. D. et al. (2014) Function and dynamics of macromolecular complexes explored by integrative structural and computational biology. Current Opinion in Structural Biology, 27, 138. https://doi.org/10.1016/j.sbi.2014.08.006
Durand, A. et al. (2013) Structure, assembly and dynamics of macromolecular complexes by single particle cryo-electron microscopy. Journal of Nanobiotechnology, 11. https://doi.org/10.1186/1477-3155-11-S1-S4
Srivastava, S. K. et al. (2012) Analysis of Conformational Variation in Macromolecular Structural Models. PLOS ONE, 7(7), e39993. https://doi.org/10.1371/journal.pone.0039993
Kirshenbaum, N. et al. (2010) Analyzing Large Protein Complexes by Structural Mass Spectrometry. Journal of Visualized Experiments: JoVE, (40). https://doi.org/10.3791/1954
Dutta, S. and Berman, H. M. (2005) Large Macromolecular Complexes in the Protein Data Bank: A Status Report. Structure, 13(3), pp.381-388. https://doi.org/10.1016/j.str.2005.01.008