Single-cell multiregion dissection of Alzheimer’s disease

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

snRNA-seq

Sample selection from ROSMAP

We selected 48 individuals from ROSMAP, both ongoing longitudinal clinical–pathological cohort studies of ageing and dementia, in which all of the participants are brain donors. The studies include clinical data collected annually, detailed post-mortem pathological evaluations, and extensive genetic, epigenomic, transcriptomic, proteomic and metabolomic bulk-tissue profiling²⁸. For the purpose of this study, individuals were selected based on the modified NIA-Reagan diagnosis of AD and the Braak stage score (Braak stages 0, 1 and 2, n = 20; Braak stages 3 and 4, n = 14; Braak stages 5 and 6, n = 14), with 26 individuals having a positive pathologic diagnosis of AD and 22 individuals having a negative pathologic diagnosis of AD⁸³. Details of the clinical and pathological data collection methods have been previously reported^2,5,6,28,84. Individuals were balanced between sexes (male:female ratios 13:13 in AD, 11:11 in NoAD), matched for age (median, 86.6 years (AD) and 86.0 years (no AD)) and post-mortem interval (median, 5.9 h (AD) and 6.3 h (no AD)). Informed consent was obtained from each participant, and the Religious Orders Study and Rush Memory and Aging Project were each approved by an Institutional Review Board (IRB) of Rush University Medical Center. The participants also signed an Anatomic Gift Act, and a repository consent to allow their data to be repurposed.

Dissection criteria

All dissections were done on a bed of dry ice using either a fine-toothed razor saw (for cortical regions) or a jewellers saw with diamond wire (for subcortical regions). Region-specific descriptions are as follows. (1) AG: full thickness cortex from the AG (Brodmann area: BA 39/40); take from the first or second slab posterior to the end of the HC. Minimize white matter. (2) MT: full thickness cortex from the middle temporal gyrus (BA 22); take as close to the level of the anterior commissure as possible. Minimize white matter. (3) PFC: full thickness cortex from the frontal pole (BA 10); take from the lateral side of the first or second slab. Minimize white matter. (4) EC: full thickness cortex from the EC (BA 28); take at the level of the amygdala. Avoid amygdala. Minimize white matter. (5) Posterior HC: take from the last slab containing HC. If the last slab has less than 5 mm of HC, take from the next slab anterior. Collect a full cross section. (6) TH: take from the first slab with thalamus. Include the most medial aspect.

Isolation of nuclei from frozen post-mortem brain tissue

The protocol for the isolation of nuclei from frozen post-mortem brain tissue was adapted from a previous study¹². All of the procedures were performed on ice or at 4 °C. In brief, post-mortem brain tissue was homogenized in 700 µl homogenization buffer (320 mM sucrose, 5 mM CaCl₂, 3 mM Mg(CH₃COO)₂, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA pH 8.0, 0.1% IGEPAL CA-630, 1 mM β-mercaptoethanol and 0.4 U µl⁻¹ recombinant RNase inhibitor (Clontech)) using a Wheaton Dounce tissue grinder (15 strokes with the loose pestle). The homogenized tissue was then filtered through a 40 µm cell strainer, mixed with an equal volume of working solution (50% OptiPrep density gradient medium (Sigma-Aldrich), 5 mM CaCl₂, 3 mM Mg(CH₃COO)₂, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA pH 8.0 and 1 mM β-mercaptoethanol) and loaded on top of an OptiPrep density gradient (750 µl 30% OptiPrep solution (30% OptiPrep density gradient medium, 134 mM sucrose, 5 mM CaCl₂, 3 mM Mg(CH₃COO)₂, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA pH 8.0, 1 mM β-mercaptoethanol, 0.04% IGEPAL CA-630 and 0.17 U µl⁻¹ recombinant RNase inhibitor)) on top of 300 µl 40% OptiPrep solution (40% OptiPrep density gradient medium, 96 mM sucrose, 5 mM CaCl₂, 3 mM Mg(CH₃COO)₂, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA pH 8.0, 1 mM β-mercaptoethanol, 0.03% IGEPAL CA-630 and 0.12 U µl⁻¹ recombinant RNase inhibitor). The nuclei were separated by centrifugation (5 min, 10,000g, 4 °C). A total of 100 µl of nuclei was collected from the 30%/40% interphase and washed with 1 ml of PBS containing 0.04% BSA. The nuclei were centrifuged at 300g for 3 min (4 °C) and washed with 1 ml of PBS containing 0.04% BSA. The nuclei were then centrifuged at 300g for 3 min (4 °C) and resuspended in 100 µl PBS containing 0.04% BSA. The nuclei were counted and diluted to a concentration of 1,000 nuclei per μl in PBS containing 0.04% BSA.

Droplet-based snRNA-seq

For droplet-based snRNA-seq, libraries were prepared using the Chromium Single Cell 3′ Reagent Kits v3 according to the manufacturer’s protocol (10x Genomics). The generated snRNA-seq libraries were sequenced using NextSeq 500/550 High Output v2 kits (150 cycles) or NovaSeq 6000 S2 reagent kits.

snRNA-seq processing, QC, and annotation

snRNA-seq data preprocessing

Gene counts were obtained by aligning reads to the GRCh38 genome using Cell Ranger software (v.3.0.2) (10x Genomics)⁸⁵. To account for unspliced nuclear transcripts, reads mapping to pre-mRNA were counted. After quantification of pre-mRNA using the Cell Ranger count pipeline, the Cell Ranger aggr pipeline was used to aggregate all libraries (without equalizing the read depth between groups) to generate a gene–count matrix. The Cell Ranger v.3.0 default parameters were used to call cell barcodes. We used SCANPY⁸⁶ to process and cluster the expression profiles and infer cell identities. We retained only protein-coding genes and filtered out cells with over 20% mitochondrial or 5% ribosomal RNA, leaving 1.47 million cells over 48 individuals and 283 samples across all regions. We further filtered the dataset to the top 5,000 most variable genes and used them to calculate the low dimensional embedding of the cells (UMAP) (default parameters, using 50 principal components and 15 nearest neighbours) and clusters using the Leiden clustering algorithm at a high resolution (15), giving 337 preliminary clusters⁸⁷. We separately called doublets using DoubletFinder and flagged and removed clusters with strong doublet profiles and clusters showing strong individual-specific batch effects, leaving a final dataset of 1.35 million cells⁸⁸.

Cell type annotations

For the UMAP visualization of individual major cell type classes (excitatory neurons, inhibitory neurons, astrocytes, oligodendrocytes, OPCs, immune cells), the SCTransform-based integration workflow of Seurat was used to align data from individual samples, using the default settings^89,90. We selected the set of relevant principal components on the basis of Elbow plots. We annotated cell types using previously published marker genes and single-cell RNA-seq data^{9,12,33,91,92,93}. To annotate cell types on the basis of previously published single-cell RNA-sequencing data (Allen Institute’s cell types database; https://portal.brain-map.org/atlases-and-data/rnaseq/human-multiple-cortical-areas-smart-seq), we used three separate approaches. First, Spearman rank correlation coefficients between the average expression profiles of neuronal subpopulations previously defined by the Allen Brain Institute³³ and the neuronal subtypes identified in this study were computed using the cor function in R. Second, to project annotations of neuronal subpopulations previously defined by the Allen Brain Institute onto the neuronal cells analysed in this study, we followed the integration and label transfer workflow of Seurat⁹⁰. Third, we determined cell type marker genes based on data published by the Allen Brain Institute³³ using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; P_adj < 0.05) and computed module scores for each cell type marker gene set across all neuronal cells analysed in this study using the AddModuleScore function of Seurat. To further annotate cell types, we determined marker genes using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; P_adj < 0.05). We tested only genes that were detected in a minimum of 25% of the cells within the cell type (min.pct = 0.25) and that showed, on average, at least a 0.25-fold difference (log-scale) between the cells of the cell type and all remaining cells (logfc.threshold = 0.25). Marker genes of the high-resolution cell types or states were determined separately for each major cell type class. We additionally compared the EC excitatory neuron subtypes to cell type annotations previously reported previously⁹⁴, which were computed using ACTIONet⁹⁵, and compared microglial markers to previously reported subtypes^96,97.

Cell cycle scores and global properties of gene expression

G2/M and S phase cell cycle scores were determined using the function CellCycleScoring in Seurat. Histograms showing the distribution of the G2/M- and the S phase module scores in each major cell class were generated using Prism 9 software. The statistical analyses comparing the number of genes detected per cell and the number of unique transcripts (UMIs) detected per cell between cell types was performed using Prism 9 software.

Integration of external data sources

Single-cell transcriptomic data from the human dLGN⁹⁸ were obtained from the Allen Brain Institute (https://portal.brain-map.org/atlases-and-data/rnaseq/comparative-lgn). Single-cell transcriptomic data from multiple cortical areas and the hippocampal formation of the mouse brain⁴³ were obtained from the Allen Brain Institute (https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x). Single-cell transcriptomic data across nine regions in the adult mouse brain³⁹ were obtained from the McCarroll and Macosko Labs (http://dropviz.org/). Single-cell transcriptomic data from the mouse nervous system⁴⁰ were obtained from the Linnarsson laboratory (http://mousebrain.org/adolescent/downloads.html). The external datasets and the human multiregion data presented in this study were integrated using the reciprocal PCA (RPCA) pipeline with the default parameters in Seurat (https://satijalab.org/seurat/articles/integration_rpca.html). The integration of single-cell data was performed separately for astrocytes, excitatory neurons and inhibitory neurons. For the integration of GABAergic neurons, the single-cell transcriptomic data from multiple cortical areas and the hippocampal formation of the mouse brain⁴³ were downsampled to 50,000 GABAergic neurons. For the integration of excitatory neurons, the human multiregion dataset was downsampled to 5,000 neurons per high-resolution cell type. The mouse cortical data⁴³ were downsampled to 50,000 excitatory neurons. The frontal cortex, posterior cortex, HC and thalamus data of the DropViz dataset were combined and downsampled to 50,000 neurons. Downsampling of the data was performed using the Seurat function subset with the default parameters. The comparison of microarray data from different human brain regions was performed using the Differential Search tool of the Allen Brain Atlas data portal (https://human.brain-map.org/microarray/search). The thalamus was selected as the target structure and compared to the cerebral cortex as the contrast structure. The differential search results including the fold change values and P values of the top 2,000 genes were downloaded from the data portal.

Gene expression and regulon modules

Gene expression modules using ZCA (scdemon framework)

We would like to find gene expression modules by calculating gene–gene correlations in single-cell data and using these to detect communities of similarly expressed genes. However, in single-cell data, which often contain an unbalanced composition of cell types, modules computed using this approach will be dominated by the most common cell type markers and pathways. Moreover, correlation values will often be inflated for pairs of sparsely expressed genes. We developed a method which accounts for these pitfalls to call multiresolution gene expression programs from single-cell data using an SVD-based approximation of zero-phase component analysis (ZCA) and gene sparsity-dependent thresholding^99,100.

scdemon (single-cell decorrelated module networks) method

The preprocessing transformations alternately called decorrelation, whitening, or sphering, transform a matrix X with a matrix W such that the covariance of XW is the identity matrix¹⁰¹. In particular, ZCA is the transformation which maximizes the similarity of the transformed data to the original, which is achieved by setting W = C^−1/2, where C is the covariance of X.

In single-cell data, given a count matrix X with n cells (rows) and g genes (columns), we would like to perform ZCA decorrelation on the samples as a preprocessing step for calling modules. Computing and storing the n by n sample-wise covariance C_n = XX^T/g is prohibitively expensive for modern datasets (with n > 1 × 10⁶), even without centring X. Instead, we can analytically approximate the covariance with the SVD of X^T = U_nS_nV_n^T as C_n ≈ (U_nS_nV_n^T)^T(U_nS_nV_n^T)/g = V_nS_n²V_n^T/g and therefore C_n^−1/2 = g^1/2V_nS_n⁻¹V_n^T. The ZCA transformation can then be computed as X_ZCA = C_n^−1/2X = g^1/2V_nS_n⁻¹V_n^TX before calculating the covariance of X_ZCA for downstream analysis. While this approximation is already tractable for small single-cell datasets, we may not be able to compute the matrix multiplications or centred SVD for larger datasets. Here, we can use the SVD of X = USV^T, which is commonly calculated in single-cell analyses, to approximate C_n^−1/2 as g^1/2US⁻¹U^T and X_ZCA = g^1/2US⁻¹U^TX. From this, the non-centred covariance of X_ZCA is C_ZCA = X_ZCA^TX_ZCA/n = g × (US⁻¹U^TX)^T(US⁻¹U^TX)/n. By substituting the SVD in for X, this reduces to C_ZCA = g × VV^T/n, which is a simple and efficient approximation for very large single-cell data. As this approximation commonly drops out the largest identity program in the data due to the decorrelation approach, we allow computing C_ZCA = g × VS^pV^T/n, for any p, to tune the relative involvement of the larger eigenvalue components of the SVD.

To control for inflated correlation estimates in highly sparse genes, we bin the estimated correlations (C_ZCA) for each pair of genes according to their sparsities (fraction of cells expressing the gene, binned on a log₁₀ scale). We calculate the mean and s.d. for each 2D bin and smooth the estimates by fitting bivariate splines to the binned statistics, weighted by the log number of examples in the bin. We use the smoothed estimates to z-score the correlation matrix (C_ZCA), which we then threshold with a single z-score cut-off to create an adjacency matrix for a gene–gene graph. Graphs are laid out using the Fruchterman–Reingold algorithm and we remove connected components with fewer than four genes¹⁰². We then use the leidenalg package and the Leiden algorithm with an RBConfiguration vertex partition to cluster the graph into gene modules⁸⁷. To robustly estimate modules for each set of cells in our analyses, we first performed a grid search for the optimal number of SVD components for cells of that type. We then computed the z-scored matrices for each of 10 bootstraps, selecting 90% of batches for each bootstrap and using only genes expressed in over 5% of cells in the full dataset for the cell type. We thresholded the average of the bootstrapped z-score estimates with z = 4.5 to build a graph. To balance the contributions of modules across the compositional spectrum, we calculated and thresholded separate graphs for eigenvalue powers p = 0, 0.25, 0.5, 0.75 and 1 and combined them using multigraph Leiden clustering to call modules with leiden resolution = 3. Although we identify smaller modules, here we only report modules with at least 10 genes. We also ran the modules method on three published datasets, for which we ran the method with the same parameters on each dataset (k = 100, z = 4.5, resolution = 2.5), used genes with >5% sparsity for the COVID¹⁰³ and brain¹⁶ datasets and genes with >10% sparsity for the Tabula Sapiens dataset¹⁰⁴, and report modules with 10 genes or more.

Module enrichments, network and contour plots

Module enrichments for cell subtypes and brain regions were performed using the hypergeometric test by calculating whether cells with a module score above 1 s.d. from the mean score were significantly enriched in a specific subtype or region. For plotting scores against other modules, averaged modules scores to either the subtype by sample level (within the same major cell type) or at the sample level alone (across cell types) and calculated Pearson correlations and P values using the cor.test function of R. To create the module–module network across microglial and immune modules, we calculated module–module score Pearson correlations using the logged module scores at the aggregated subtype by sample level, using a one-sided test with P_adj < 0.01 as a cut-off¹⁰⁵. To generate contour plots of module expression on a UMAP, we first smoothed cell-level expression on a 500 × 500 grid with a 2D Gaussian kernel (size = 25 × 25 and σ = 1) and then plot contours for smoothed values (0.1 to 0.8).

Gene expression programs using cNMF

Gene expression programs underlying both cell type identity and cellular activities were determined according to the consensus NMF (cNMF) analysis pipeline established previously using the default parameters³⁴. The number of components (K) to use for cNMF was determined on the basis of a diagnostic plot showing the stability of the solution and the Frobenius reconstruction error as a function of K. To reduce runtime and working memory requirements, the data were downsampled using the Seurat function subset with the default parameters. The data were downsampled to 200 cells per major cell type. For the cNMF analysis at the level of high-resolution cell types, the analysis was performed separately on excitatory neurons, inhibitory neurons and astrocytes. For these analyses, the data were downsampled to 2,000 cells per astrocyte subtype and 1,000 cells per excitatory and inhibitory neuron subtype. Statistical significance of the overlap between the top 200 genes of a gene expression program and cell type marker genes was computed using Fisher’s exact tests.

SCENIC analysis and computation of regulon module scores

The gene regulatory network analysis was performed using pySCENIC with the default parameters³⁵. To reduce runtime and working memory requirements, the data were downsampled to 2,000 cells per major cell type. For the SCENIC analysis at the level of high-resolution cell types, the analysis was performed separately on excitatory neurons, inhibitory neurons and astrocytes and the data were downsampled to 1,000 cells per high-resolution cell type. To identify the top cell-type-specific regulons, we calculated regulon specificity scores as described by previously and ranked the regulons based on their regulon specificity score¹⁰⁶. Finally, we calculated the activity of each regulon in each cell using the AddModuleScore function of Seurat. The calculation of regulon module scores for major cell types was performed on a random sample of 50% of the cells (676,537 cells). For the analysis at the level of high-resolution cell types, the regulon module scores were determined based on all the cells of a major cell type class. For the statistical analysis of differences in the activity of regulons between cell types, the average regulon module score per individual and major cell type or high-resolution cell type was computed, respectively. The statistical analyses comparing the regulon module score activity was performed using Prism 9 software.

Analysis of GABAergic and glutamatergic module scores

GABAergic and glutamatergic module scores across all neuronal cell types were determined on the basis of a set of GABAergic and glutamatergic neuron marker genes, respectively, using the AddModuleScore function of Seurat. The sets of GABAergic and glutamatergic neuron marker genes were determined based on the human multiple cortical areas SMART-seq dataset from the Allen Brain Institute (https://portal.brain-map.org/atlases-and-data/rnaseq/human-multiple-cortical-areas-smart-seq). We identified marker genes using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; P_adj < 0.05). We tested only genes that were detected in a minimum of 25% of the cells within the cell type (min.pct = 0.25) and that showed, on average, at least a 0.25-fold difference (log-scale) between the cells of the cell class of interest and all remaining cells (logfc.threshold = 0.25). To quantify the intermediate character of thalamic excitatory and inhibitory neurons, we first computed the average GABAergic and glutamatergic module score values for each neuronal cell type and for each individual. We then used the resulting data to determine the first principal component (PC1) scores (the coordinates of the individual observations on the first principal component axis) using the princomp function in R. The ridgeline plot showing the distribution of PC1 score for each neuronal cell type was generated using the ggplot2 package in R. To determine the association between the average glutamatergic and the GABAergic module score across neuronal cell types, we performed a simple linear regression analysis using Prism 9 software.

Analysis of extratelencephalic projection neuron module scores

Marker genes significantly upregulated in extratelencephalic projection neurons (exc. L5 ET) compared with near-projecting excitatory neurons in layers 5 and 6 (exc. L5/6 NP) were determined using the FindAllMarkers function from Seurat (Wilcoxon rank-sum test with Bonferroni correction for multiple testing; P_adj < 0.05). We tested only genes that were detected in a minimum of 25% of the cells within the cell type (min.pct = 0.25) and that showed, on average, at least a 0.25-fold difference (log-scale) between exc. L5 ET cells and exc. L5/6 NP cells (logfc.threshold = 0.25). The exc. L5 ET module score was computed based on the identified marker genes using the AddModuleScore function of Seurat. To determine the exc. L5 ET module scores across excitatory and inhibitory neurons, cells were downsampled to 2,000 cells per high-resolution cell type.

Cell–cell communication analysis

Cell–cell communication events were predicted using the LIgand-receptor ANalysis frAmework (LIANA)¹⁰⁷ in R. Specifically, the ligand–receptor analysis was performed using liana_wrap(). The methods included were CellPhoneDB¹⁰⁸, NATMI¹⁰⁹ and SingleCellSignalR¹¹⁰. liana_aggregate() with the argument ‘aggregate_how’ set to ‘magnitude’ was run to find consensus ranks of different methods. Only interactions (ligand–receptor pairs) with a robust rank aggregation (RRA) score smaller than 0.05 (aggregate_rank < 0.05) were considered in downstream analyses. The interaction score of ligand–receptor pairs was calculated by applying −log₁₀ transformation to the RRA score (aggregate_rank). To determine the number of interactions and the overlap of interactions between regions, liana_wrap() was run on the pool of cells isolated from all individuals, with separate analyses conducted for each brain region. To determine cell–cell communication events that are brain-region specific, liana_wrap() was run separately for each individual. We then used a linear mixed-effects model to evaluate the association between the interaction scores of individual ligand–receptor pairs obtained from LIANA and the respective brain region serving as the predictor variable. To account for potential confounding factors and individual variability, we included age, sex and post-mortem interval as covariates in the linear mixed-effects model. These variables were added as fixed effects to the model. Moreover, we included a random effect for the individual to capture the participant-specific variability in the data. Linear mixed-effects models were implemented using the R software packages lme4¹¹¹ and lmerTest¹¹². The lmer() function from the lme4 package was employed to fit the models. To obtain P values for the fixed effects in these models, we used the lmerTest package, which incorporated Satterthwaite’s degrees of freedom approximation. To account for multiple hypothesis testing, the obtained P values were further adjusted using the Bonferroni method.

Cell type composition

Analysis of cell type composition differences between brain regions

For comparing the relative abundance of major cell types across brain regions, the fraction of cells of a major cell type class was computed relative to all the cells isolated from a region. For the statistical analysis of cell type composition differences between brain regions, we also computed the relative abundance of major cell type classes separately for each study participant. To this end, the fraction of cells of a major cell type class was computed relative to all the cells isolated from a brain region of an individual. At the level of high-resolution cell types or cell states, two distinct measures of relative abundance were computed. First, the relative abundance of each subtype of a major cell class was computed as the proportion of a subtype relative to all cells of the corresponding major cell class isolated from a brain region. Second, for the statistical analysis of differences between brain regions, the fraction of cells of a subtype was computed relative to all the cells isolated from a brain region of an individual. The statistical analyses comparing the relative abundance of major cell types and subtypes between brain regions was performed using Prism 9 software.

Analysis of cell type composition alterations in AD

We calculated compositional differences in individuals with AD versus individuals without AD (or AD dementia versus no dementia) by modelling the number of cells of a certain cell type or subtype in a specific sample (individual by region) relative to the total number of cells using a quasi-binomial regression model. We modelled AD status by binary ascertainment variables (cogdx 4–5, NIA-Reagan score 1–2, Braak stage 5–6 versus others, as well as any detected presence of NFTs, neuritic plaque or diffuse plaques in the region) while adjusting for brain region and sex. We used the emmeans package in R to assess the significance of the regression contrasts and used p.adjust with the fdr method to adjust P values. We modelled the effects of fraction of cells on cognitive performance in multiple domains with gaussian linear regression of cognitive performance on last visit versus the log₁₀-transformed fraction of cells in the subtype or major cell type jointly with covariates for age, sex, APOE-ε4 and post-mortem interval, with false-discovery rate P-value correction (p.adjust in R). We compared the fractional abundances of pairs of neuronal subtypes between two regions using Kendall’s τ only in individuals with AD (NIA-Reagan score). Significance was assessed using beta regression (R library betareg) controlling for sex, APOE genotype, post-mortem interval and age of death, and we adjusted P values using p.adjust in R with the fdr method.

Differential gene expression

DEGs in AD

We performed differential expression analyses with three separate methods: MAST, Nebula and Wilcoxon testing^113,114. For all methods, we subset the tested genes to only genes present in over 20% of cells. For MAST and Nebula, we calculated and included in the regression the top 10 components of unwanted variation calculated using RUV on the pseudo-bulk-level data (individuals by regions). For these methods, we also included as covariates the individual’s sex, age of death and post-mortem interval, each cell’s counts per gene and number of captured genes and, where applicable, the high-resolution cell subtypes and the brain region. For Nebula we used a Poisson mixed-model on the counts data with an offset of the log₁₀-transformed total counts per cell. For MAST and Wilcoxon, we normalized each cell to a total library size of 10,000 counts. We ran Wilcoxon tests on both the cell and individual levels. We adjusted P values for multiple testing in all cases by using the p.adjust function in R with the fdr method. For our final set of differential genes in each analysis, we took all genes that were significant (P_adj > 0.05) and concordant in both the MAST and Nebula results. We also provide the results for Wilcoxon tests, but did not use these to determine concordant results as they do not correct for many covariates. We computed differential expression against five AD ascertainment variables: continuous measurements of NFT, plaq_n, and plaq_d measured in each region except the thalamus (excluded from these analyses) and binary cognitive impairment (cogdx no dementia = 1 and 2 versus AD dementia = 4 and 5) and NIA-Reagan score classifications (non-AD = 3 and 4 versus AD = 1 and 2). We provide differential expression results for each of the 14 major cell types (with T cells, CNS macrophages, and each vascular subtype separately) for all regions jointly and for each region separately. We also provide results for each of the excitatory subtypes either in its most prevalent region for EC, HC or TH subclasses, or across the neocortex for neocortical subtypes (Supplementary Table 9). We also computed DEGs for the interaction between pathologic diagnosis of AD and sex in each major cell type, both across all regions and in each region separately. For the glial energy metabolism analyses we recomputed all DEGs in glial glycolysis-associated modules separately (keeping all genes, with no cell percentage cut-off). Glycolysis pathway diagram is from the glycolysis and gluconeogenesis pathway from WikiPathways¹¹⁵.

Pathology-biased DEGs

Pathology-biased DEGs were based on neuritic plaque or NFT pathology measurements in each region and were computed in each major cell type across all regions and in each region (except for the TH). Genes were ordered by the residual between NFT effect size and predicted NFT effect size from a regression using plaque effect size and region. Genes were retained if they were consistently up (or down) in 3+ regions for either NFT or plaque but in fewer than 2 regions for the other pathology measurement (shared genes are genes found in 2+ cell types).

Comparison with published DEGs

We compared our differential expression results to results from seven different previously published studies^{11,12,19,47,48,49,50}. We compared the published DEGs both to: (1) cross-region DEGs calculated in each major cell type for individual-level AD status (NIA-Reagan score or clinical diagnosis of AD) and for quantitative measurements of AD pathology (neuritic plaques, diffuse plaques and NFTs); and (2) region-specific DEGs calculated in each major cell type and in endothelial cells, computed relative to pathologic diagnosis of AD (NIA-Reagan score, AD, 1–2; non-AD, 3–4). As some studies reported only the significant genes, we compared the log-transformed fold change estimates for our DEGs and reported DEGs by a Pearson correlation test.

DEG module enrichments

To assess the enrichment of upregulated, downregulated non-differentially expressed genes in each module, we first assigned each tested DEGs to its closest module by correlation to the module’s average expression profile. We then performed a hypergeometric enrichment test for the number of genes in a category (upregulated, downregulated, not differentially expressed) assigned to the module, against the total number of tested genes assigned to the module, the total number in the category and the total number tested and corrected P values using p.adjust (Benjamini–Hochberg). Enrichments of pathology-biased DEGs in modules were performed in the same manner.

Neuronal DEG partitions

To partition neuronal DEGs into non-vulnerable and vulnerable-associated subclasses, we calculated each genes’ average expressions and differential expression effect sizes at the subtype level and compared these to the relative depletion of the subtypes. For each gene that was differentially expressed in late-AD (stratified by Braak stage, late AD, 5–6 versus non-AD or early-AD, 1–4) in at least 25% of all neuronal subtypes, we calculated the correlation of its average subtype expression in late-AD with each subtype’s compositional stability (log₂[OR] in late AD) across excitatory subtypes, separating non-vulnerable-associated genes (correlation > 0.2) from vulnerable-associated genes (correlation < −0.2). We calculated functional enrichments on neuronal DEG partitions using the top 250 genes ordered by effect size in each category. We further separated DEGs with higher effect sizes in vulnerable subtypes from those with similar effect sizes across all neuronal subtypes by calculating the correlation of their differential effect sizes in each subtype with that subtype’s depletion (log₂[OR] in late AD). To perform enrichments along the continuum of genes associated with vulnerability to non-vulnerability, we kept only genes with biased effect sizes (effect size correlation < −0.2) and binned them along the axis of expression correlation (window size 0.2 for at 0.05 intervals) and performed functional enrichments for all bins jointly.

DEG and module pathway enrichments

We performed DEG enrichments for each differential expression run using the gprofiler2 package in R, with multi-query for upregulated and downregulated genes, as unordered queries, a P-value cutoff of 0.05, and using GO, REAC, WP, KEGG and CORUM as annotation sources, and retained enriched terms with fewer than 500 genes. Module and module cluster enrichments were performed in the same manner, using the core genes identified for each module and for genes found in more than two modules within a module cluster.

Markers of neuronal vulnerability

We identified markers associated with excitatory neuron subtype vulnerability by performing linear regression to predict the log₂[OR] of each subtype’s depletion in late AD based on its expression at the subtype-aggregate level (log[X + 1], averaged normalized expression in each subtype by region by individual batch), controlling for age, sex and post-mortem interval and adjusting P values with p.adjust (fdr).

Identification of genes associated with cellular vulnerability in inhibitory neurons

Processed snRNA-seq data (DLPFC, experiment 2) were obtained from Synapse (syn51123521) and integrated with our own PFC snRNA-seq dataset comprising 427 individuals. To identify vulnerable inhibitory neuron subtypes, we examined the association between the relative abundance of cell types and the measure of NFT density (variable tangles). We used a quasi-binomial regression model to model the number of cells belonging to a specific cell type in a given sample (individual study participant), relative to the total number of cells in that sample. We fitted the regression model using the glm function in R, including age, sex and post-mortem interval as covariates. P values were corrected for multiple testing using the Benjamini–Hochberg procedure as implemented in the R function p.adjust. The results are presented in the form of association scores (signed −log₁₀-transformed Benjamini–Hochberg-adjusted P value, where the sign was determined by the direction (positive or negative) of the association). Inhibitory neuron subtypes demonstrating a significant negative association with tangle density (Benjamini–Hochberg-adjusted P value < 0.05) were classified as vulnerable subtypes, while all other subtypes were categorized as non-vulnerable. Genes exhibiting differential expression between vulnerable and non-vulnerable inhibitory neuron subtypes were identified on the basis of our PFC snRNA-seq dataset. This analysis was restricted to individuals without a pathologic diagnosis of AD. The differential expression analysis comparing vulnerable to non-vulnerable inhibitory neuron subtypes was performed using the R package dreamlet (https://diseaseneurogenomics.github.io/dreamlet/). We used the dreamletCompareClusters function with the argument ‘method’ set to ‘fixed’ for this analysis. Adjusted P values for multiple testing were obtained using the topTable function of dreamlet, with the ‘adjust.method’ set to ‘BH’.

GWAS analyses

Intersection of regional expression and pathology-specific DEGs (across all regions) was performed for 149 annotated AD GWAS familial and AD risk loci from recent GWAS^54,56,57,58. We calculated the disease-relevance score of each cell in the dataset against a recent Alzheimer’s GWAS, using scDRS (based on MAGMA)^54,55,116. For the scDRS results, we counted the fraction of cells with significant scDRS scores (FDR < 0.05) in each cell type, subtype and region. To test for overlap with microglia/immune modules, we compared the set of immune cells with significant expression of each module (z score > 2.5) and with the set of cells with significant scDRS scores (FDR < 0.05) and performed a hypergeometric test for significance of the overlap (P_adj < 0.01, Benjamini–Hochberg correction). To identify region-specific GWAS genes, we performed an analysis of variance for the effect of region on average gene expression at the pseudobulk level.

Identification of genes associated with CR

To quantify CR, we computed a CR score as the difference between the observed cognition and the cognition predicted by a linear regression model, given the level of pathology (Fig. 5a). Using this approach, we computed cognitive resilience (CR) scores based on the measure of global cognitive function and CDR scores based on the measure estimating the rate of change of global cognitive function over time (Fig. 5a). Four distinct CR and CDR scores were derived using four distinct measures of AD pathology, namely global AD pathology and, separately, neuritic plaque burden, NFT burden and tangle density.

We performed differential expression analyses using the R package muscat to identify genes associated with CR in the PFC¹¹⁷. Low-expressed genes were excluded and only genes with more than one count in at least ten cells were considered. To take advantage of robust bulk RNA-seq differential expression frameworks, such as edgeR¹¹⁸, in a first step, muscat aggregates measurements for each sample (in each cluster) to obtain pseudobulk data. Using this approach, single-cell measurements were aggregated per study participant and cell type using the sum of raw counts option. Differential expression analysis was run using the edgeR method as implemented in muscat. We included as covariates the individual’s age at death and post-mortem interval. We report adjusted P values for multiple testing in all cases by using the p.adjust function with the Benjamini–Hochberg method as implemented in muscat. The multiple testing correction was performed locally, that is, on each of the cell types separately with the number of tests equal to the number of genes considered. These differential expression analyses were performed on the entire set of 427 individuals except for the group-based differential expression analysis based on our categorical definition of CR. In this case, we focused on comparing two distinct groups determined by their pathologic and clinical diagnoses of AD. First, we identified individuals with a pathologic diagnosis of AD, using the NIA-Reagan pathology criteria. Subsequently, these individuals were further categorized on the basis of their clinical consensus diagnosis of cognitive status at the time of death. Specifically, we compared individuals with no cognitive impairment (NCI, final consensus cognitive diagnosis (cogdx) value of 1) against individuals with a cognitive diagnosis of AD dementia and no other cause of cognitive impairment (cogdx value of 4) among individuals with a pathologic diagnosis of AD.

To confirm the differential gene expression results based on the CR and CDR scores, we also evaluated the association between gene expression and global cognitive function or the rate of change of global cognitive function adjusting for AD pathology as a covariate. The AD pathology measures considered as a covariate were global AD pathology (gpath), neuritic plaque burden (plaq_n), NFT burden (nft), or tangle density (tangles). Thus, together with the DGE analysis based on CR and CDR scores, we performed a total of 16 different tests assessing the association between gene expression and CR.

We used the model-based analysis of single-cell transcriptomics (MAST) tool to investigate whether the CR genes identified in PFC astrocytes were also associated with CR in astrocytes from other regions of the human brain. To ensure robust analysis, we initially filtered the genes under investigation, selecting only those with more than one count in at least 10 cells. The analytical model incorporated the condition variable of interest, as well as several covariates known to influence gene expression. These covariates included the cellular detection rate (cngeneson), age at death (age_death), post-mortem interval (pmi), and sex (msex). We also accounted for potential participant-specific variation in the data by incorporating a random effect term for the individual (1|individual). To account for multiple comparisons, the P values were adjusted using the FDR method as implemented in the p.adjust function.

Bulk RNA-seq differential expression analysis

Differential expression analysis of bulk RNA-seq data from the ROSMAP cohorts was performed using DESeq2¹¹⁹ (plotted) and edgeR¹¹⁸. Age at death and post-mortem interval were converted into z scores and included as covariates in the regression equation. Both approaches (DESeq2 and edgeR) provided similar results.

Permutation test for evaluating the significance of the overlap of DEGs between our dataset and the SEA-AD dataset

The average expression level of each gene within each major cell type was determined using the ‘AverageExpression’ function from the Seurat R package. The genes considered in the differential expression analysis for each major cell type were categorized into ten subsets based on their average expression level within the corresponding cell type. We next intersected the genes in each of the ten subsets with genes identified as significantly associated with either neuritic plaque burden (in our dataset) or the CPS score (in the MTG SEA-AD dataset). This intersection enabled us to determine the number of significant DEGs in each subset. The process was performed separately for genes positively and negatively associated with AD pathology. Subsequently, we randomly sampled the determined number of significant DEGs from each of the 10 subsets, ensuring that the expression level distribution of the DEGs was preserved in the random samples. This random sampling step was repeated for a total of 1,000 iterations. These steps were performed separately for both our dataset and the SEA-AD dataset. For each of the 1,000 random samples, we determined the overlap of genes between datasets and compared it to the observed overlap between the two datasets. To assess the significance of the observed overlap, we computed z scores, which represent the difference between the observed value of overlap and the mean value of overlap based on the permutation results, divided by the s.d. of the permutation results.

Comparison with previously published proteomics study of AD

To further validate our differential expression results, we evaluated the correlation between the effect sizes of gene expression changes observed in our study and those identified through quantitative proteomics⁵¹. We specifically examined the correlation between the effect sizes of genes associated with neuritic plaque burden in our study and the effect sizes of overlapping differentially expressed proteins in the quantitative proteomics analysis of bulk tissue. The correlation was computed using the cor.test function in R with the argument ‘alternative’ set to ‘two.sided’ and the argument ‘method’ set to ‘pearson’. P values were adjusted for multiple testing using the Benjamini–Hochberg method as implemented in the R function p.adjust.

Inter-regional comparison of AD pathology-associated gene sets in glial cells along the spectrum of global AD pathology burden

We determined genes significantly associated with global AD pathology for each glial cell type, using single-nucleus RNA sequencing data from the PFC of 427 participants in the ROSMAP study. We then calculated module scores for these gene sets in astrocytes, microglia, oligodendrocytes, and OPCs using the Seurat ‘AddModuleScore’ function. The module scores were determined separately for genes positively and negatively associated with global AD pathology. To assess the progression of these scores across the spectrum of global AD pathology burden, we averaged the module scores of all cells of a specified cell type isolated from the brain region of interest of an individual. For visualizing the relationship between the global AD pathology burden and mean module scores, we employed Locally Estimated Scatterplot Smoothing (LOESS) using the ggplot2 package in R, with the ‘geom_smooth’ function and the ‘method’ parameter set to ‘loess’. The correlation of mean module scores between regions was determined using the cor.mtest function of the R package corrplot. P values were adjusted for multiple hypotheses testing using the Benjamini–Hochberg method as implemented in the R function p.adjust.

In situ hybridization (RNAscope)

Frozen human post-mortem brain samples were embedded in Tissue-Tek OCT compound (VWR; 25608-930), sectioned on a Leica cryostat at a thickness of 20 μm, and mounted onto Fisherbrand Superfrost Plus microscope slides (Thermo Fisher Scientific; 12-550-15). Slides were fixed in 4% paraformaldehyde at 4 °C for 30 min, and dehydrated in ethanol. The RNAscope 2.5 HD Chromogenic Duplex Detection Kit and RNAscope Multiplex fluorescent V2 Kit (ACDBio) were then used according to the manufacturer’s instructions. Tissue samples were hybridized using the following chromogenic RNAscope probes: GAD2, FOXP2, MEIS2, AQP4, GRM3, ADCY8, PFKP, PNPLA6, GPCPD1 and CHDH (ACDBio). For in situ hybridization of Reelin, tissue samples were hybridized using the following fluorescent RNAscope probes: vGlut and Reelin. Cell nuclei were stained with 50% haematoxylin (for chromogenic experiments) or with Hoechst (for fluorescent experiments). For fluorescence RNAscope analysis, sections were incubated in TrueBlack (Biotium; 23007) for 10 s before Hoechst staining to quench auto-fluorescence. Images were acquired using the Zeiss LSM 900 confocal microscope, with a 63× oil objective. Two images were acquired per tissue sample. For both chromogenic and fluorescence RNAscope experiments, puncta were manually counted by researchers blinded to the experimental group of each image.

Immunohistochemistry

All experiments were performed according to the Guide for the Care and Use of Laboratory Animals and were approved by the National Institute of Health and the Committee on Animal Care at Massachusetts Institute of Technology. Sample sizes were determined on the basis of previous work from our laboratory, without power analysis calculation or randomization. In the experiment comparing App-KI (C57BL/6-App, RBRC06344) and WT mice, the App-KI group consisted of 7 mice (5 male and 2 female) and the WT group included 6 female mice. In the experiment comparing Tau P301S (The Jackson Laboratory, 008169) to WT mice, both groups consisted entirely of male mice, with 6 mice in the Tau(P301S) group and 5 mice in the WT group. Mice were transcardially perfused with ice-cold phosphate-buffered saline, followed by 4% paraformaldehyde for fixation. Brains were dissected out and post-fixed in 4% paraformaldehyde overnight at 4 °C. Brains were sectioned horizontally on the Leica vibratome at a thickness of 40 μm. Slices containing the EC were selected under a dissecting microscope to ensure consistent anatomical structure across all cohorts. Brain sections were incubated in antigen retrieval solution (pH 6, 100 mM sodium citrate buffer, prewarmed to 80 °C) for 20 min, and then cooled to room temperature. The sections were then washed twice with phosphate-buffered saline, and blocked in buffer (0.3% Triton X-100, 2% bovine serum albumin, 10% normal donkey serum in phosphate-buffered saline) for 1 h at room temperature.

The sections were incubated in primary antibodies (anti-Reelin, 1:200; anti-NeuN, 1:200, anti-phospho tau, 1:200; and anti-amyloid-β, 1:500) overnight at 4 °C. After primary antibody incubation, the sections were washed three times with PBS, twice with blocking buffer and incubated in secondary antibody (1:1,000) for 2 h at room temperature. The sections were then washed three times with PBS, incubated with Hoechst (1:1,000) for 10 min and washed once more with PBS.

Confocal tile scans of the EC were acquired on the Zeiss LSM 900 using a 20× objective, with consistent laser setting across all cohorts. Layer II–III EC was identified based on previous criteria¹²⁰. Orthogonal projections of the confocal tile scans were exported to Fiji for signal quantification. In Fiji, layer II–III of the EC was set as a region of interest, and a macro was used to count Reelin-positive cells in the region of interest and quantify the mean fluorescence intensity for each cell. The signal intensity of the Reelin channel was subsequently normalized to the NeuN signal. Researchers were blinded to animal genotype.

External data sources

Processed snRNA-seq data generated by the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) consortium (SEAAD_MTG_RNAseq_final-nuclei.2023-05-05.h5ad) were obtained from the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) (https://sea-ad-single-cell-profiling.s3.amazonaws.com/index.html#MTG/RNAseq/). The SEA-AD DLPFC data (SEAAD_DLPFC_RNAseq_final-nuclei.2023-07-19.h5ad) were downloaded from https://sea-ad-single-cell-profiling.s3.amazonaws.com/index.html#DLPFC/RNAseq/. Additional processed snRNA-seq datasets (specifically the h5ad files Neurons.h5ad and Nonneurons.h5ad) were obtained from the Linnarsson laboratory (https://console.cloud.google.com/storage/browser/linnarsson-lab-human;tab=objects?authuser=0&prefix=&forceOnObjectsSortingFiltering=false).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link