Preparation of porcine mitochondria and cryo-EM grids
Mitochondria were isolated from porcine hearts following a modified version of a protocol originally described by A. L. Smith57. Before mitochondrial extraction, the pig hearts were subjected to one of three treatment conditions: (1) fresh, immediately placed on ice for all subsequent procedures; (2) mild, incubated at room temperature for 40 min and then placed on ice to cool quickly before isolation; and (3) harsh, incubated at room temperature for more than 4 h before being cooled on ice. The isolated mitochondria were then resuspended in a solution containing 0.25 M sucrose, 10 mM Tris (buffered with H2SO4) and 0.2 mM EDTA at pH 7.8. The suspension was adjusted to a final optical density at 600 nm of 1.3 absorbance units.
For cryo-EM grid preparation, 3.3 μl of the mitochondrial suspension was applied to each Quantifoil holey carbon grid (R2/1, 300 mesh gold). Grids were incubated for 5 s in a Vitrobot Mark IV (Thermo Fisher Scientific) chamber maintained at 8 °C and 95% relative humidity. Excess solution was blotted using standard Vitrobot filter paper before the grids were rapidly plunged into liquid ethane at a temperature of approximately −170 °C.
Cryo-ET data collection
Grids were initially screened for optimal ice conditions using a 200 kV Glacios microscope (Thermo Fisher Scientific) at the Yale Science Hill Electron Microscopy Facility. Selected grids were subsequently transferred to a 300 kV Titan Krios microscope (Thermo Fisher Scientific), equipped with a Bioquantum Energy Filter and a K3 direct electron detector (Gatan), for high-resolution data acquisition at the Yale West Campus Electron Microscopy Facility. Automated data collection was performed using SerialEM software58 and Gatan DigitalMicrograph. All images were captured in superresolution mode, with a physical pixel size of 6.1 Å (effectively 3.05 Å in superresolution). A total of eight tilt series were collected at a relatively high defocus range, from −6 µm to −10 µm, to improve contrast and make the initial reconstruction more reliable. A grouped dose-symmetric scheme, spanning from −60° to 60° in 2° increments, was used for tilt series acquisition, with an accumulated dose of 100 e−/Å2.
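As an illustration of the tilt scheme described above (a minimal sketch, not the acquisition script used in this study), the acquisition order and the nominal dose per tilt can be worked out as follows. The group size of three tilts and the even split of the accumulated dose across tilts are assumptions; neither is stated in the text.

```python
def grouped_dose_symmetric(max_tilt=60.0, step=2.0, group_size=3):
    """Return tilt angles (degrees) in acquisition order.

    Starts at 0 degrees, then alternates between groups of positive and
    negative tilts of increasing magnitude. group_size is an assumption.
    """
    n_side = int(round(max_tilt / step))
    positive = [i * step for i in range(1, n_side + 1)]
    negative = [-angle for angle in positive]
    order = [0.0]
    for start in range(0, n_side, group_size):
        order.extend(positive[start:start + group_size])
        order.extend(negative[start:start + group_size])
    return order


tilts = grouped_dose_symmetric()
# Assumes the accumulated dose of 100 e-/Å^2 is split evenly across tilts.
dose_per_tilt = 100.0 / len(tilts)
print(len(tilts), "tilts,", round(dose_per_tilt, 2), "e-/Å^2 per tilt")
```

With these settings the scheme comprises 61 tilts, so an even split corresponds to roughly 1.6 e−/Å2 per tilt.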
Cryo-ET reconstruction and subtomogram averaging
Tomogram reconstruction was streamlined using custom scripts. Initial frame alignment was performed using MotionCor2 (ref. 59), followed by micrograph binning at a factor of two. Tilt series stacks were generated using in-house scripts. All tilt series were aligned and reconstructed using AreTomo 1.2.5 (ref. 60). Initial contrast transfer function (CTF) parameters were estimated with Gctf61 and cryoSPARC62. Raw micrographs and reconstructed results were visualized and diagnosed using IMOD63 and ChimeraX64.
Individual SC particles were picked in EMAN2 (ref. 65). Metadata preparation yielded 12,000 subtomogram particles in RELION-4.0 (ref. 66) with a binning factor of 2 (pixel size 12.2 Å). Following two rounds of 3D classification, 806 SC particles were selected for final refinement, resulting in a 37 Å subtomogram averaging map. Resolution was assessed using Fourier shell correlation with a threshold of 0.143 in RELION-4.0 (ref. 66). The averaged map was backprojected onto the original tomogram using the subtomo2Chimera code, available at https://github.com/builab/subtomo2Chimera.
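For readers unfamiliar with the resolution criterion, the following is a minimal NumPy sketch of a Fourier shell correlation between two half maps and of reading off the 0.143 threshold. It is not the RELION implementation; the function names, the cubic-box assumption and the absence of masking or phase randomization are simplifications made for illustration.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic half maps of equal size."""
    n = half1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    # Integer shell index (radius in Fourier pixels) for every voxel.
    grid = np.indices(half1.shape) - n // 2
    shell = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)
    n_shells = n // 2
    keep = shell < n_shells
    idx = shell[keep]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep],
                        minlength=n_shells)
    power1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep],
                         minlength=n_shells)
    power2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep],
                         minlength=n_shells)
    return cross / np.sqrt(power1 * power2)


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (Å) of the first shell where the FSC falls below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size  # never crosses: report the Nyquist limit
    first_shell = below[0] + 1
    return box_size * pixel_size / first_shell


# Synthetic check: a map compared with itself correlates perfectly.
rng = np.random.default_rng(0)
volume = rng.normal(size=(16, 16, 16))
self_fsc = fsc_curve(volume, volume)
```

Shell k corresponds to a spatial frequency of k / (box_size × pixel_size), which is why the resolution is reported as box_size × pixel_size / k.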
Single-particle cryo-EM data collection
Automated data acquisition was performed using either a Glacios or a Titan Krios electron microscope (Thermo Fisher Scientific). The Glacios was equipped with a K3 direct electron detector (Gatan) and operated at 200 kV at a pixel size of 0.434 Å in superresolution mode, with an objective aperture of 100 μm. The Titan Krios, also equipped with a K3 direct electron detector, was operated at 300 kV at a pixel size of 0.416 Å in superresolution mode with a Gatan energy filter. Automatic data collection was facilitated using the SerialEM software package58. Multishot acquisition parameters were set at 3 × 3 holes per imaging location, with four exposures per hole at 200 kV and five exposures per hole at 300 kV. The total electron dose was 42 e−/Å2 for the Glacios and 50 e−/Å2 for the Titan Krios, fractionated across 45 frames at 40 ms per frame. Defocus parameters ranged from −1.0 μm to −3.0 μm for the 200 kV dataset and from −1.3 μm to −3.0 μm for the 300 kV datasets. Details of the data collection are summarized in Supplementary Tables 1–6.
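The per-frame dose, exposure time and number of exposures per imaging location implied by these settings can be checked with simple arithmetic. The sketch below uses only the numbers given above; the function and variable names are illustrative.

```python
def exposure_summary(total_dose, n_frames=45, frame_time_s=0.040):
    """Per-frame dose, exposure time and dose rate for one movie."""
    exposure_s = n_frames * frame_time_s
    return {
        "dose_per_frame": total_dose / n_frames,  # e-/Å^2 per frame
        "exposure_s": exposure_s,                 # seconds per movie
        "dose_rate": total_dose / exposure_s,     # e-/Å^2 per second
    }


glacios = exposure_summary(42.0)  # 200 kV
krios = exposure_summary(50.0)    # 300 kV

# 3 x 3 holes per imaging location, 4 or 5 exposures per hole.
exposures_per_location = {"200 kV": 3 * 3 * 4, "300 kV": 3 * 3 * 5}
print(glacios, krios, exposures_per_location)
```

Each movie therefore spans 1.8 s, at roughly 0.93 e−/Å2 per frame on the Glacios and 1.11 e−/Å2 per frame on the Titan Krios.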
Preprocessing
For all datasets, motion correction was performed using MotionCor2 (ref. 59) or cryoSPARC62. The CTF of each motion-corrected micrograph was estimated using Gctf61 or cryoSPARC62. Particles were picked with Gautomatch or cryoSPARC using an iterative sorting strategy as described below. Cryo-EM scripts used for real-time data transfer and on-the-fly preprocessing can be downloaded from https://github.com/JackZhang-Lab.
Overall particle selection and sorting strategy
Owing to the challenges posed by low signal-to-noise ratios and a highly congested macromolecular environment (Extended Data Fig. 1a), traditional particle selection methodologies were insufficient for generating datasets amenable to reliable two-dimensional (2D) classification, ab initio three-dimensional (3D) reconstruction and subsequent local refinement. To address this issue, we implemented an iterative strategy to optimize particle selection and sorting. The approach involved several rounds of iterative 2D particle picking, 2D classification and 3D analyses including ab initio 3D reconstruction, 3D classification and multilevel local refinement. Unlike the conventional particle selection approach, our strategy used Gautomatch and cryoSPARC62 for template matching to gradually increase the resolution of 3D projections as the reconstructions were progressively improved over cycles. We used several independent sources of references to cross-validate the final results. To maximize the yield of high-quality particles, particles from the classes that show clear features of SCs in all cycles were merged for subsequent 3D cross-classification. More details of the strategy are explained in the following sections.
Initial 3D reconstruction with surrounding membranes (type A)
Conventional 2D classification failed to generate meaningful class averages using images selected from our in situ cryo-EM micrographs of mitochondria for three main reasons: (1) thick samples that led to low signal-to-noise ratios and large defocus variations, (2) a crowded environment that affected particle detection and alignment, and (3) strong membrane signals that dominated the alignment, leading to blurred averages of protein regions (Extended Data Fig. 1b).
To address this, we initially used the strong membrane signals and focused on the side views surrounded by membranes using 2D classification. These side views in principle contained sufficient orientational information for a complete 3D reconstruction. At the outset, protein signals were completely averaged out in the 2D classification, whereas the membranes were well aligned owing to the strong side-view signals (Extended Data Fig. 1c). We then conducted several cycles of 2D classification to focus only on particles exhibiting clear membrane signals.
Through comprehensive 2D analyses, we found that regions potentially harbouring mitochondrial SCs exhibited special features of local curvature. Specifically, these regions were characterized by membrane signals that seemed to be concave towards the matrix direction, indicative of the presence of CIII2 (Extended Data Fig. 1c–e). By merging particles from classes with characteristic concave membranes surrounding CIII2 and conducting further 2D classification, we achieved improved 2D averages showing clear membrane features around CIII2 (Extended Data Fig. 1d). Notably, extra protein densities adjacent to CIII2 were obvious, probably representing CI or CIV densities. However, it was unclear how many types of respiratory SC exist in native mitochondria and whether CI, CIII2 and CIV always appear in the form of SCs or just partially.
To further address these observations and obtain unbiased density maps, we used four independent methods to generate initial references: (1) cryo-ET subvolume averaging (Supplementary Fig. 1), (2) ab initio reconstruction using particles assigned to the 2D averages with visible protein densities (Extended Data Fig. 1f) and characteristic CIII2 membrane features (Extended Data Fig. 1g), (3) ab initio 3D reconstruction using particles after membrane signal subtraction (Extended Data Fig. 1h), and (4) models generated from random selection of unsorted particles or random noise (Extended Data Fig. 1i). All these references were combined for 3D classification and subsequently used for local refinement and focused classification (Extended Data Fig. 1i). Given that in situ cryo-EM datasets are more heterogeneous than conventional single-particle datasets, we included ‘false references’ generated from approach (4) for better classification. Finally, particles corresponding to classes showing clear features of type-A SC were re-extracted and merged for further classification and refinement (Extended Data Fig. 1i).
Cross-classification of multiple SCs
Around the reconstructed SC I1III2IV1 (type A) map, we observed extra densities, clearly indicating that more proteins bound to the type-A SC to form larger SCs. We suspected that more types of SC existed in native mitochondrial membranes. Preliminary results from both large single-particle 3D classification at low resolutions and cryo-ET subvolume averaging confirmed this speculation. To further improve the accuracy of 3D classification for high-resolution refinement, we deliberately provided extra false references generated from random subdatasets using discarded particles from previous cycles. These false references served to randomly absorb low-quality and falsely picked particles, leading to a relatively clean dataset for the target class. We then accumulated particles classified into good classes, defined by clear secondary structures, over several cycles. Owing to the crowded mitochondrial environment, misclassified and misaligned particles were always present. To address this, we reorganized the particles by merging those that fell into classes generating similar 3D maps. We selected multiple references from different classes, including those considered ‘bad’ and reperformed 3D classification on each subdataset. Afterwards, we recombined all subsets of different classes that were considered ‘good’ and reclassified them. On the basis of these results, we then merged all the particles belonging to a specific target from previous cycles and performed a further cycle of 3D classification on the merged dataset. This further classification used high-resolution references generated from previous classification cycles and local refinement to discard low-quality or misclassified particles.
After numerous rounds of cross-classification followed by local refinement, we identified various other types of SC, including the three other main classes: type B (I1III2IV2), type O (I2III2IV2) and type X (I2III4IV2). In addition to the four main classes, other classes such as I1III2, I4III4IV4 and even higher-order assemblies were observed; however, they were not subject to further refinement in this study owing to the low population. Subsequently, we cross-validated our classification results by providing a set of references lacking the correct form of the SC for subclassification of each class. We also performed further reference-free 2D classification after 3D classification and refinement to verify different forms of SC. This allowed us to visualize the distinct features of the four main classes from 2D averages directly, without imposing any references. Only datasets converging to the correct form of supercomplex, regardless of the initial references used, were included in the final multilevel local refinement and focused 3D classification.
Multilevel local refinement and focused 3D classification
A hierarchical masking strategy was used for local refinement on all four main types of SC. Specifically, the mask size was incrementally reduced to focus on distinct regions of each type of respiratory SC, ensuring stable local refinement. We partitioned the type-A SC into five principal domains: (1) CI hydrophilic region, (2) CI hydrophobic region, (3) CIII2, (4) CIV and (5) lipid environment.
Before the multilevel local refinement, the type-A SC was refined to 3.39 Å overall using images binned two times (1.664 Å per pixel after binning) with 1,113,902 high-quality particles. This particle set included type-B, type-O and type-X particles, as they all share the type-A region. We recentred and re-extracted these particles, generating 1,050,463 final particles for subsequent local refinement (particles near the edges were excluded after re-extraction). Initially, the resolutions of CI, CIII2 and CIV worsened slightly (to approximately 3.5 Å) after the first cycle of refinement using the unbinned particles (0.832 Å per pixel). Further improvement was achieved by optimizing several local refinement parameters, including mask sizes, global and local CTF refinement, local angular refinement and non-uniform refinement67.
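The pixel sizes quoted here follow from the 0.416 Å superresolution pixel of the Titan Krios datasets, and the binned pixel size sets a Nyquist limit close to the 3.39 Å reported for the binned refinement. The worked arithmetic below is illustrative only. Reading the proximity to Nyquist as the reason for moving to unbinned particles is an assumption, not a statement from the text.

```python
SUPERRES_PIXEL = 0.416  # Å per pixel, Titan Krios superresolution mode


def binned_pixel(pixel, factor):
    """Pixel size after Fourier or real-space binning by an integer factor."""
    return pixel * factor


def nyquist_limit(pixel):
    """Finest resolvable spacing (Å) for a given pixel size."""
    return 2.0 * pixel


unbinned = binned_pixel(SUPERRES_PIXEL, 2)  # physical pixel, 0.832 Å
binned = binned_pixel(unbinned, 2)          # "binned two times", 1.664 Å

nyquist_binned = nyquist_limit(binned)      # 3.328 Å
nyquist_unbinned = nyquist_limit(unbinned)  # 1.664 Å

# Fraction of particles kept after recentring and re-extraction.
retained = 1_050_463 / 1_113_902
print(nyquist_binned, nyquist_unbinned, round(retained, 3))
```

The binned data cannot resolve spacings finer than about 3.33 Å, whereas the unbinned data extend to about 1.66 Å; roughly 94% of particles were retained after re-extraction.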
By iteratively applying these techniques, we refined the maps of the hydrophilic region of CI, the hydrophobic region of CI, CIII2 and CIV to average resolutions of 2.46 Å, 2.58 Å, 2.31 Å and 2.66 Å, respectively (Supplementary Fig. 2 and Supplementary Tables 1–6). Even smaller regional masks, focused on CI and CIII2, further improved local resolutions. Local resolutions in most of the protein regions of CIII2 ranged from 1.8 to 2.4 Å (Supplementary Fig. 2c). Focused classification and refinement for specific subdomains, such as the Q/QH2 binding sites, yielded further improvements that aided in model building. For more complex regions, such as the lipid environment surrounding the transmembrane regions of the SCs and Q/QH2 binding sites, further levels of focused classification and local refinement were performed. To ensure seamless integration of adjacent regions, all local masks were manually created so that pairs of adjacent masks overlapped over sufficiently large areas for the generation of final composite maps from the smaller, individually refined regions. All locally refined segments were integrated into a composite map in ChimeraX64.
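To make the role of overlapping masks concrete, the sketch below blends locally refined maps by a mask-weighted average, so that each voxel in an overlap region is a smooth mixture of its neighbours. This is a generic illustration and not the ChimeraX procedure used in this study; the function name and the weighting scheme are assumptions, and the maps are assumed to be already resampled onto a common grid.

```python
import numpy as np


def composite_map(maps, masks, eps=1e-6):
    """Blend aligned maps using soft masks (values in [0, 1]) as weights."""
    maps = np.asarray(maps, dtype=float)
    masks = np.asarray(masks, dtype=float)
    weight = masks.sum(axis=0)
    blended = (maps * masks).sum(axis=0)
    out = np.zeros_like(weight)
    covered = weight > eps  # voxels outside every mask stay at zero
    out[covered] = blended[covered] / weight[covered]
    return out


# One-dimensional toy example; the same code applies to 3D arrays.
toy_maps = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
toy_masks = [[1.0, 1.0, 0.5, 0.0], [0.0, 0.0, 0.5, 1.0]]
toy_composite = composite_map(toy_maps, toy_masks)
```

In the toy example the third voxel lies in the overlap of both masks and receives the average of the two maps, which is why adjacent masks need to share a sufficiently large region.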
Similar multilevel refinement approaches were used to determine the structures of other forms of respiratory SC. Detailed parameters and refinement results are summarized in Supplementary Figs. 2–5 and Supplementary Tables 1–6.
Membrane signal detection and weakening
One of the critical bottlenecks limiting high-resolution cryo-EM reconstruction of membrane proteins in their native environment is the severe signal interference from surrounding membranes. This interference can significantly affect several steps in cryo-EM data analysis, including ab initio reconstruction, Euler angle determination, and 2D and 3D classification, as well as refinement of alignment parameters. To address this issue, we developed a computational toolkit to detect membrane signals from 2D averages, estimate the local geometry of detected membranes, and suppress or remove these signals to substantially improve the alignment reliability of mitochondrial complexes in native membrane environments.
Initially, we generated a series of 15–30 computationally simulated 2D projections of lipid bilayers, with local curvatures ranging from 0 nm−1 to 0.02 nm−1. These simulated 2D membranes served as templates for detection of the side-view signals of mitochondrial membranes using Gautomatch. Subsequently, three to five cycles of 2D classification were performed to discard low-quality and non-membrane particles, resulting in a subset of particles showing clear side views of lipid bilayers. We then estimated the approximate orientation and centre of each individual lipid bilayer on the basis of its corresponding 2D average using the Radon transform. Local curvature was determined by maximizing the cross-correlation between each 2D average and a series of simulated lipid bilayers. These curves were rotated and translated using alignment parameters from 2D classification generated by cryoSPARC62. Centres of each membrane segment were refined by maximizing the normalized cross-correlation between the raw image and transformed 2D average. Using these estimated parameters, we approximated the principal signals of each membrane segment by locally averaging the image intensities along the membrane curve within a soft mask, which was around 25% larger than the typical lipid bilayer we estimated. Membrane signals that had dominated the alignment in the raw images were weakened to enhance protein signal contributions for subsequent reconstruction, alignment, classification and local refinement. This improved the signal contributions from protein regions for the initial alignment, akin to the critical effects observed in our previously described microtubule signal subtraction method68,69. Finally, alignment and classification parameters were applied to the raw images along with membrane signals for subsequent local refinement and focused classification.
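The curvature-matching step can be sketched as follows: a series of bilayer side views is simulated over the stated curvature range, and the curvature whose template gives the highest normalized cross-correlation with a 2D average is selected. This is a minimal illustration and not the toolkit developed in this study. The two-Gaussian leaflet model, the 4 nm leaflet separation, the 1 nm pixel and the function names are all assumptions, and the Radon-transform orientation estimate and centre refinement are omitted (the average is assumed to be already centred and horizontal).

```python
import numpy as np


def simulate_bilayer(size=64, curvature=0.0, pixel_nm=1.0,
                     thickness_nm=4.0, sigma_nm=0.8):
    """Side-view projection of a bilayer as two Gaussian leaflets on an arc.

    curvature is in nm^-1 and assumed non-negative.
    """
    half = size // 2
    y, x = (np.mgrid[:size, :size] - half) * pixel_nm
    if curvature == 0:
        dist = y  # signed distance from a flat midplane
    else:
        radius = 1.0 / curvature
        # Signed distance from a circle through the origin, centred at (0, radius).
        dist = radius - np.hypot(x, y - radius)
    offset = thickness_nm / 2.0
    return (np.exp(-(dist - offset) ** 2 / (2 * sigma_nm ** 2))
            + np.exp(-(dist + offset) ** 2 / (2 * sigma_nm ** 2)))


def ncc(a, b):
    """Zero-shift normalized cross-correlation of two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def estimate_curvature(average, curvatures, **kwargs):
    """Return the curvature whose simulated bilayer best matches the average."""
    size = average.shape[0]
    scores = [ncc(average, simulate_bilayer(size, c, **kwargs))
              for c in curvatures]
    return curvatures[int(np.argmax(scores))], scores


# 21 templates spanning 0 to 0.02 nm^-1, within the 15-30 range in the text.
curvatures = np.linspace(0.0, 0.02, 21)
rng = np.random.default_rng(0)
target = simulate_bilayer(curvature=curvatures[13])
target = target + rng.normal(0.0, 0.05, target.shape)  # mild synthetic noise
best_curvature, scores = estimate_curvature(target, curvatures)
```

A fuller treatment would also search over in-plane shifts and rotations; here only the curvature is scanned, to keep the example short.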
Membrane modelling and geometry analysis
The in situ mitochondrial respiratory chain complexes largely preserved the native state of the membrane architecture, as evidenced by exceptionally clear density maps (Extended Data Fig. 2) compared with previously published in vitro structures. This high fidelity in density was observable in both the final 3D reconstructions and the post-3D-refinement 2D class averages, enabling direct modelling of native membrane structures.
The model building for the inner membrane structures surrounding the mitochondrial SCs involved a four-step procedure. First, discrete points were sampled from the raw signals in a given density map (for example, that of the type-A SC) on the basis of binarized membrane density. A 2D plane was fitted to these points by least-squares minimization; the normal vector of each SC was estimated from this plane and the coordinate system was rotated so that this vector aligned with the z axis. Second, these sampled discrete points were used to generate two smooth, curved surfaces with a thickness of around 4 nm. Third, planar phospholipid bilayer structures were generated to match the geometry of these estimated surfaces. Finally, the information from the second and third steps was integrated to geometrically deform each planar membrane structure into a smooth, curved surface.
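The first step, fitting a plane to the sampled membrane points and rotating the coordinate system so that the plane normal lies along z, can be sketched with NumPy. This is a minimal illustration under stated assumptions: the plane is fitted by orthogonal least squares via singular value decomposition, which may differ from the minimization used in this study, and the function names are hypothetical.

```python
import numpy as np


def fit_plane_normal(points):
    """Fit a plane to (N, 3) points; return its centroid and unit normal."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:  # choose the orientation pointing towards +z
        normal = -normal
    return centroid, normal


def rotation_to_z(normal):
    """Rotation matrix R such that R @ normal is the +z unit vector."""
    n = normal / np.linalg.norm(normal)
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(n, z)
    sin_t = np.linalg.norm(axis)
    cos_t = float(n @ z)
    if sin_t < 1e-12:  # already parallel or antiparallel to z
        return np.eye(3) if cos_t > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / sin_t
    k_cross = np.array([[0.0, -k[2], k[1]],
                        [k[2], 0.0, -k[0]],
                        [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula.
    return np.eye(3) + sin_t * k_cross + (1.0 - cos_t) * (k_cross @ k_cross)


# Synthetic points lying on a tilted plane through (5, -2, 3).
rng = np.random.default_rng(1)
true_normal = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
pts = rng.normal(size=(200, 3))
pts -= np.outer(pts @ true_normal, true_normal)  # project onto the plane
pts += np.array([5.0, -2.0, 3.0])

centroid, normal = fit_plane_normal(pts)
rotation = rotation_to_z(normal)
aligned = (pts - centroid) @ rotation.T  # membrane plane now lies in z = 0
```

After this transformation, deviations of the sampled points from z = 0 describe the membrane shape relative to its best-fit plane.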
To optimize the initial sampling for membrane model building, we categorized the membrane structures surrounding the protein into three distinct groups: structured lipids, surface-associated lipids and generic bilayer lipids. The first category, structured lipids, included lipids that are closely associated with the transmembrane regions of the protein. This close association enabled identification and direct atomic-level modelling of these specific lipid species, which have also been observed in previously reported structures purified using detergent. The second category, surface-associated lipids, comprised lipids situated around the immediate periphery of the protein, forming a pseudo-lattice structure. Within this lattice, partial phosphatidyl head groups and hydrophobic tails could be discerned. Our in situ density maps allowed us to unambiguously determine the locations of individual lipids in this category; however, the current quality of the density maps does not permit identification of the specific types of lipid present. The third category, generic bilayer lipids, represented a region farther from the protein where only the density features corresponding to the bilayers could be observed. We used a generic phospholipid membrane model to approximate the probable horizontal positions of the phosphatidyl headgroups. Owing to the fluid nature of the lipid bilayer and the high level of noise in the density maps, the central positions of these generic bilayer lipids may still vary among different subclasses even after focused classification. However, the average geometric features and the central locations of the membranes were notably consistent across each of the four main types of SC. Therefore, these generic bilayer lipids were used solely for calibrating the central locations and orientations of the phospholipid bilayer, rather than representing the actual positions of individual phospholipid molecules within the bilayer of each SC. 
This approach facilitated analysis of the overall geometric changes among the SCs, albeit not at the level of individual phospholipid molecule structures.
To achieve a sufficiently smooth model for the generic bilayer lipids, we performed real-space refinement of the initial structures using the Coot software70. The refined structures were subjected to further smoothing using a local Gaussian filter to minimize residual noise in localized membrane regions. This step enabled precise estimation of the contour map and the local curvature at each point (Fig. 2b). We used the CHARMM-GUI web service71 to generate a simulated rectangular planar phospholipid bilayer. This planar structure was then mapped onto the curved surfaces that were obtained after Gaussian smoothing. This mapping process yielded a curved membrane model that optimally fit the density map. From these estimated surfaces, information about the local geometry of the membranes surrounding the mitochondrial SCs could be directly retrieved for subsequent geometry analyses and comparisons.
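One way to obtain the local curvature at each point of a smoothed membrane surface is to treat the surface as a height map z = h(x, y) and evaluate the standard mean-curvature formula with finite differences. The sketch below is illustrative and not the analysis code used in this study. The height-map representation and the function name are assumptions, and the input is assumed to have been smoothed already.

```python
import numpy as np


def mean_curvature(height, spacing=1.0):
    """Mean curvature of a surface z = h(x, y) sampled on a regular grid.

    height has shape (ny, nx); spacing is the grid step (for example, in nm).
    """
    hy, hx = np.gradient(height, spacing)  # first derivatives
    hxy, hxx = np.gradient(hx, spacing)    # second derivatives
    hyy, _ = np.gradient(hy, spacing)
    numerator = ((1 + hy ** 2) * hxx - 2 * hx * hy * hxy
                 + (1 + hx ** 2) * hyy)
    return numerator / (2 * (1 + hx ** 2 + hy ** 2) ** 1.5)


# Check on a spherical bowl of radius 50 nm, sampled every 1 nm.
radius = 50.0
coords = np.arange(-20, 21, dtype=float)
x, y = np.meshgrid(coords, coords)
height = radius - np.sqrt(radius ** 2 - x ** 2 - y ** 2)
curvature_map = mean_curvature(height)
center_curvature = curvature_map[20, 20]
```

For the spherical test surface the value at the centre should be close to 1/radius, which provides a simple sanity check before the function is applied to a fitted membrane surface.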
Model building, refinement and validation
The atomic models were built manually using Coot72. First, high-resolution structures of bovine CI (PDB: 7QSK), bovine CIII2 (PDB: 2A06) and bovine CIV (PDB: 5XDQ) were fitted into the corresponding map as a rigid body using ChimeraX64. Then, the fitted model was manually mutated, adjusted and real-space refined to correct errors in local regions to best match the density maps using Coot72. The final model was refined using phenix.real_space_refine73 with geometric constraints and validated using MolProbity74. Figures were generated using UCSF ChimeraX64 and PyMOL.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.