Multiscale topology classifies cells in subcellular spatial transcriptomics

TopACT

TopACT operates on subcellular-resolution spatial transcriptomics data. These data take the form of a grid of spots such that each spot in the grid has an associated expression vector in ${\mathbb{R}}$^D. The output of our method is a cell type annotation for each spot in the grid. First, we use an annotated sc/snRNA-seq reference dataset to construct an automatic single-cell classifier, called a local classifier. This classifier could in principle use any supervised learning approach; for example, a neural network or random forest but a simple and effective choice is the support vector machine (SVM)^35,36,37. The only restriction is that the classifier must take as input an expression vector and output a probability vector over all cell types. We achieve this by using Platt scaling³⁸. Given a local classifier, TopACT then proceeds as follows. Fix a spot s in the grid. For a given radius r ≥ 0, consider a ball of radius r drawn about s. Let X_r be the sum of all the expression vectors of spots in this ball. Feeding X_r as input to the local classifier produces a probability vector v_r over the cell types. For a list r₁ ≤ r₂ ≤ … ≤ r_k of radii, these vectors can be combined into a multiscale cell type confidence matrix A = [${{\bf{v}}}_{{r}_{1}}$, ${{\bf{v}}}_{{r}_{2}}$, …, ${{\bf{v}}}_{{r}_{k}}$]. Now, pick a confidence threshold θ ∈ (0, 1). For a spot s, let j be minimal such that the most likely cell type i at scale r_j has confidence A_ij ≥ θ. We set the cell type of s to be i if such a j exists. In other words, the cell type assigned to the spot s is the cell type predicted by the local classifier at the smallest possible scale at which a confident prediction can be made. For a full technical description of the TopACT method, see Supplementary Methods 1A.

Synthetic data generation

We use a two-stage process to generate synthetic benchmark data, first generating a synthetic cell type map and then imputing gene expression. A synthetic grid of spots with cell type annotations is produced. We sample 625 points uniformly at random from the unit square [0, 1] × [0, 1], taking these points to be cell centres. We draw a Voronoi diagram (computed using the implementation in SciPy³⁹ based on Qhull⁴⁰) on the basis of these points to simulate cell boundaries. Cell types are then assigned at random to each Voronoi region, in proportion to the cell type abundances in the snRNA-seq data. These cell types are then applied to a 500 × 500 grid of spots overlaid on the unit square. The end result is a grid of spots, each annotated with a cell type. Next, we impute the gene expression at each spot using a Poisson process with parameters inferred from the mouse kidney snRNA-seq data described below. This process is based on a simplified version of the model described by ref. ²⁴. In detail, for a cell type T and gene g let λ_Tg denote the mean expression of gene g over all cells in the snRNA-seq dataset with cell type T. If a spot s is assigned the cell type T, we then model the expression v_sg of gene g at s as being distributed by a Poisson distribution

$${{\bf{v}}}_{sg} \sim {\rm{Poisson}}(\alpha {\lambda }_{Tg})$$

where α = exp(−7.3) is a fixed parameter determining the transcriptional abundance. To model zero-inflation, we then select 20% of spots uniformly at random to be assigned zero reads, regardless of the Poisson-modelled expression.

For molecular diffusion experiments we modelled the diffusion effect separately for each synthetic transcript. Given a transcript a, we sampled a diffusion magnitude D_a ~ exp(λ_diff) from an exponential distribution where λ_diff is the mean diffusion for a single transcript. Independently, a diffusion direction θ_a is sampled from a uniform distribution over [0, 2π). This yields coordinate-wise displacements

$${d}_{a}^{x}={D}_{a}\cos {\theta }_{a}\,{d}_{a}^{y}={D}_{a}\sin {\theta }_{a}$$

The original spot coordinates x_a, y_a for the transcript can then be revised accordingly to displaced coordinates

$${x}_{a}^{{\rm{d}}{\rm{i}}{\rm{f}}{\rm{f}}}={x}_{a}+\lfloor {d}_{a}^{x}/{d}_{{\rm{s}}{\rm{p}}{\rm{o}}{\rm{t}}}\rfloor \,{y}_{a}^{{\rm{d}}{\rm{i}}{\rm{f}}{\rm{f}}}={y}_{a}+\lfloor {d}_{a}^{y}/{d}_{{\rm{s}}{\rm{p}}{\rm{o}}{\rm{t}}}\rfloor $$

The rescaling by d_spot = 0.715 accounts for the inter-spot distance of 0.715 μm in Stereo-seq mouse kidney experiments.

Synthetic data analysis

We ran TopACT directly on synthetic data, with an SVM local classifier trained on the same snRNA-seq reference dataset used for generation. For fixed-window bin 20 analysis, we split the 500 × 500 synthetic grid into square bins, each covering a 20 × 20 region of spots. Bin 20 was chosen so that each bin matches the mean area of a synthetic cell. Moreover, at bin 20 the resulting grid approximates 10 μm resolution, which is considered the ‘sweet spot’ for single-cell analysis¹. We then summed the expression over all spots in each region. RCTD²⁴ was run on couplet mode with default settings, using the same snRNA-seq reference dataset and we assigned each bin the RCTD predicted ‘first type’. For the modal cell type classification, we assigned to each bin its most frequent ground truth cell type. For analysis of rare cell type identification, we took rare cell types to be those making up less than 5% of the total samples in the snRNA-seq reference data. To extract cell loci, we first computed a binary image corresponding to each rare cell type. We performed a binary dilation to clean any cells that were split into several adjacent components and then took the centre of each connected component to be a cell centre.

MPH landscapes

MPH tracks how the topological features (here, loops) of a shape evolve as certain parameters are varied. Given an input point cloud, we record the first persistent homology (H₁) of its associated Rips-codensity bifiltration. This information is summarized in a sequence, called an MPH landscape⁹, of functions λ_k: ${\mathbb{R}}$ × ${\mathbb{R}}$ → ${\mathbb{R}}$ for k = 1, 2,…. Given a radius parameter s and a codensity parameter t, the value λ_k(s, t) ∈ ${\mathbb{R}}$ roughly describes the significance of the kth most significant topological feature in the bifiltration at those parameter values. Here, we focus on λ₁: ${\mathbb{R}}$ × ${\mathbb{R}}$ → ${\mathbb{R}}$ which describes the significance of the most significant such feature. For a full introduction to MPH, see Supplementary Methods 1B. Here, we computed average control and treated MPH landscapes for TopACT-predicted immune cell points clouds (Supplementary Methods 2B). MPH was computed with RIVET (https://github.com/rivetTDA/rivet/) and converted to MPH landscapes using the code from ref. ⁷ (https://github.com/MultiparameterTDAHistology/SpatialPatterningOfImmuneCells).

Statistical tests and reproducibility

Statistical tests were performed using the Python packages Scipy (https://www.scipy.org/) and statannotations (https://github.com/trevismd/statannotations). All tests are one-sided Welch’s t-tests.

The snRNA-seq (one control and two lupus nephritis mice, four healthy human kidney samples) and spatial experiments (one control and two lupus nephritis mice, one IgAN patient kidney sample) were not replicated. Mouse histological images are representative of images from six lupus and six control animals and multiplex immunofluorescence images representative of images from three control and three lupus mice.

Animals

Female BALB/cOlaHsd mice were purchased from Envigo at 5 weeks of age. Animals were housed in specific pathogen-free individually ventilated cages, at 20–24 °C and 45–65% humidity, kept on a 12 h light/dark cycle from 08:00 to 20:00, with food and water freely available. All animal experiments were performed under project licence P84582234, with UK Home Office approval and local approval by the Oxford University Clinical Medicine Animal Welfare and Ethical Review Body and were carried out in compliance with UK Home Office Guidelines and the Animals Scientific Procedure Act 1986 (amended 2013) and reported in line with the ARRIVE guidelines. Mice were treated topically with either 5% Aldara (Imiquimod) cream (Meda Pharmaceuticals) or Vaseline (Unilever) control on both ears, three times weekly for 8 weeks.

Mouse kidney processing

Snap-frozen mouse kidneys were obtained from 12 female mice randomized to treatment TLR7 agonist or control (unblinded). Treated mice develop lupus-like renal disease with glomerular endocapillary proliferation, which includes proliferation of circulating immune cells that have migrated to the capillary tuft. The 10 μm cryosections from three mice (four slices from a control sample and two and four slices, respectively, from two treated samples) were successfully processed for spatial transcriptomics with Stereo-seq⁶. Remaining kidney tissue from the above samples was dissociated to single nuclei, partitioned and sequenced to generate snRNA-seq data⁴¹, yielding a matched single-nucleus and subcellular spatial transcriptomics dataset. We cluster cells in the snRNA-seq dataset using Seurat¹⁹ and annotate cell types according to top marker genes. Initial clustering identified 30 populations, shown in Extended Data Fig. 2, alongside key cell type markers. The spatial data for each sample consist of gene expression readings measured on a grid of 220 nm DNA nanoball spots with a centre-to-centre inter-spot distance of 715 nm. These data are represented by a D-dimensional expression vector (D ≈ 25,000) at each spot. Spatial analysis was restricted to a boundary region defined as the convex hull of high expression spots (Supplementary Methods 2 and Supplementary Fig. 1).

Human tissue

We used the Xenium platform (10x Genomics)¹⁷ to generate spatial data from a human kidney IgA nephropathy biopsy core obtained with the assistance of the Oxford Centre for Histopathology Research as an approved project under the Oxford Radcliffe Biobank research tissue bank ethical approval (South Central—Oxford C Research Ethics Committee: 19/C/0193). For snRNA-seq, healthy control kidney tissue was obtained from pre-implantation biopsies in four living donor kidneys through an approved project in the Oxford Transplant Biobank (South Central—Oxford C Research Ethics Committee: 19/SC/0529). All human samples were obtained with informed patient consent for research. Human tissue experiments were carried out under the University of Oxford Human Tissue Act license 12217.

Mouse brain model

We examined publicly available adult mouse spatial data profiled with Stereo-seq⁶. To ensure an unbiased comparison, for scRNA-seq integration we used the same reference mouse brain atlas⁴² as in the original spatial study and trained the TopACT local classifier to identify the PVM1 subcluster. Similar to mouse kidney, we restricted analysis to a boundary region defined as the convex hull of high expression spots. From TopACT spot classifications, we performed a binary dilation and then called PVM cells as connected components of size at least 60 spots. To validate these called cells, we examined the expression level of PVM marker genes (as defined in the same atlas⁴²) in TopACT-predicted PVM cells compared to uniformly sampled background cells.

Spatial RNA-seq

Stereo-seq was performed as previously described⁶. Capture chips were loaded with DNA nanoballs (DNB) generated by rolling circle amplification of random 25 base pair (bp) oligonucleotides. Single end sequencing (MGI DNBSEQ-Tx) was performed to determine the DNB coordinate identity at each spatial location on the chip, followed by ligation of 22 bp polyT and 10 bp molecular identity oligos to the DNB. The 10 μm kidney tissue sections were cryosectioned from optimal cutting temperature (OCT) embedded frozen blocks and adhered to the chip surface, fixed in methanol, stained with nucleic acid dye (Thermo Fisher Scientific, Q10212) for imaging and incubated at 37 °C with 0.1% pepsin (Sigma, P7000) for 12 min to permeabilize. After permeabilization, we performed reverse transcription and cDNA amplification and the Agilent 2100 was used to check the range of cDNA fragments. The cDNA was interrupted by inhouse Tn5 transposase and amplified and the fragments double-selected. After screening, libraries were subjected to Agilent 2100 quality inspection. Finally, the double-selection libraries were constructed into libraries suitable for the MGI DNBSEQ-Tx sequencing platform through circularization steps and were sequenced to collect data (50 bp for read 1 and 100 bp for read 2).

For Xenium (10X Genomics) on formalin fixed paraffin embedded human renal biopsy blocks, tissue was sectioned and 5 μm slides prepared following the manufacturer-recommended protocol (CG000580) onto Xenium slides using the predesigned 377 gene 10X Human Multi-Tissue and Cancer Xenium Pre-Designed Gene Expression Panel (10x, 1000626). Padlock probes were incubated on the tissue overnight before rolling circle amplification and chemical autofluorescence quenching. Slides were imaged on a Xenium Analyzer machine. Cell segmentation, gene transcript by cell and transcript by tissue location data matrices were generated by the Xenium Onboard Analysis pipeline. Supervised cell clustering was performed in Seurat¹⁹ by finding a set of anchors between the healthy human snRNA-seq dataset and the per cell Xenium data and using these to transfer labels to the Xenium segmentation defined cells.

For TopACT, transcripts with Q-score of 20 or more were selected and binned to 1 μm resolution before processing with the standard TopACT pipeline with a minimum radius of 2 μm, a maximum radius of 5 μm and a confidence level of 0.9. The local classifier was trained from the paired snRNA-seq dataset using the procedure detailed in Supplementary Methods 2A.

snRNA-seq

Tissue from the same frozen organs used for mouse kidney spatial transcriptomics was used to perform complementary snRNA-seq. Single nuclei were isolated as previously described with minor modifications⁴³. In brief, kidney tissues were placed into a 2 ml Dounce homogenizer (Sigma) with 2 ml of prechilled homogenization buffer (10 mM Tris pH 8.0 (Thermo Fisher), 250 mM sucrose (Sigma), 1% BSA (Sangon Biotech), 5 mM MgCl₂ (Thermo Fisher), 25 mM KCl (Thermo Fisher), 0.1 mM dithiothreitol (Thermo Fisher), 1X protease inhibitor cocktail (Roche), 0.4 U μl⁻¹ of RNase inhibitor (MGI), 0.1% NP40 (Roche)). After incubation on ice for 10 min, tissues were homogenized by ten strokes of the loose pestle A and filtered with 70 μm cell strainer (Falcon). The homogenate was further homogenized with ten strokes by tight pestle B, filtered using 30 μm cell strainer (Sysmex) into 15 ml conical tube and centrifuged at 500g for 5 min at 4 °C. The pellet was resuspended in 1 ml of blocking buffer (PBS (Thermo Fisher), 1% BSA, 0.2 U μl⁻¹ of RNase inhibitor) and centrifuged at 500g for 5 min; this step was repeated once. The pellet was resuspended using cell resuspension buffer (MGI) at concentration of 1,000 nuclei per μl for further library preparation. The snRNA-seq libraries were prepared using DNBelab C Series Single-Cell Library Prep Set (MGI, no. 1000021082)⁴⁴ Droplets were generated from a single nuclei suspension, followed by emulsion breakage, bead collection, reverse transcription and cDNA amplification to generate barcoded libraries. Indexed libraries were constructed following the manufacturer’s protocol, quantified using Qubit ssDNA Assay Kit (Thermo Fisher Scientific, Q10212) and sequenced using DNBSEQ-T1 at the China National GeneBank (Shenzhen, China) with read length 41 bp for read 1, 100 bp for read 2 and 10 bp for sample index.

For human snRNA-seq, single nuclei were isolated from fresh or liquid nitrogen flash-frozen renal biopsies using the 10X Genomics Chromium Nuclei Isolation Kit with RNase Inhibitor 16 rxns (PN-1000494), following the kit protocol with the following modifications, 0.2 U μl⁻¹ of supplemental RNase inhibitor was added to the lysis buffer and debris removal buffer, polypropylene 1.5 ml Eppendorf collection tubes were coated with 10% BSA (MACS BSA Stock Solution, Miltenyi Biotec, 130-091-376) overnight before use and the final nuclei suspension was filtered through a 40 μm FLOWMI cell strainer (SP Bel-Art, 136800040). Samples with more than 1 million nuclei were then flow cytometry sorted to clean up the sample using Sytox-7AAD Live dead staining (Invitrogen S10349). Samples with less than 1 million nuclei after washing were filtered a second time with the 40 μm FLOWMI cell strainer. Gene libraries from isolated human renal single nuclei were constructed with droplet-based scRNA-seq using the Chromium Next GEM Single Cell 3′ GEM, Library & Gel Bead Kit v.3.1, 4 rxns (PN-1000128). Single nuclei were loaded at 1,000 nuclei per μl for a targeted yield of 10,000 nuclei per sample. Libraries were sequenced on a NovoSeq6000 (Illumina) at the Oxford Genomics Centre or with Novogene. Runs were demultiplexed and the resulting fastq files processed through the 10X Genomics Cellranger pipeline. Filtered gene matrix data were then analysed in R using the Seurat package.

Single-nucleus clustering

The snRNA-seq data were analysed using Seurat v.4.0.2 (mouse) or 4.4.0 (human)¹⁹. Mouse nuclei were filtered on gene count less than 500 or greater than 3,500 and mitochondrial percentage greater than 5. Human nuclei were filtered and selected for gene features greater than 500 and less than 6,000, gene count less than 25,000 and mitochondrial percentage less than 25. Data were log normalized, variable features identified and linear transformation scaling performed. Principal component analysis dimensionality reduction was run before the human snRNA-seq data were Harmony⁴⁵ integrated to remove batch effects. The first 30 principal components were selected and clusters identified using the ‘FindClusters’ method in Seurat with a resolution of 0.6 (mouse) or 0.5 (human). The ‘FindAllMarkers’ function was used to identify genes that characterized each cluster and differential expression of genes was tested between clusters. Cluster annotation was performed manually on the basis of the top markers, applying knowledge of renal physiology with reference to the literature. Visualizations of the annotated snRNA-seq dataset in the form of UMAP and violin plots are available in Extended Data Figs. 1 (human) and 2 (mouse).

Mouse kidney immune subclustering

To investigate the subpopulations of TopACT-identified immune cells in mouse kidney, we performed a supervised annotation of these cells based on snRNA-seq subclusters using Sonar⁴⁶. Annotation resolved subpopulations of B cells, dendritic cells, macrophages and T cells. Examining the proportions of these subpopulations across samples (Extended Data Fig. 5a) shows expansion of T cell, dendritic cell and macrophage populations in IMQ-treated kidneys. Annotating each immune cell according to its proximity to glomeruli further shows acquisition of T cells, dendritic cells and macrophage cells within the glomeruli themselves (Extended Data Fig. 5b). Extended Data Fig. 5c shows the expression of cell type-specific markers by immune subset and condition (IMQ-treated or control). We visualize the spatial distribution of these subpopulations for representative samples in Extended Data Fig. 6. This visualization shows enrichment of T cells (top row) and macrophages (middle row) but not B cells (bottom row), in glomeruli, consistent with multiplex immunofluorescence results in Fig. 5.

Comparison to ssDNA-based segmentation

We used ssDNA imaging of the mouse kidney Stereo-seq samples to compute cell segmentation based on cell nucleus locations, as previously described⁶. For further validation, we compared TopACT-called cell loci for immune and podocyte cells with these cell bins on a single representative IMQ-treated sample. A TopACT-predicted cell was assigned to an ssDNA bin if its centre was in the bin itself or if the centre-to-centre distance of the bin was sufficiently close (within ten spots, 7.15 μm), the latter case dealing with the scenario of two cell bins with overlapping boundaries but non-coincident cell centres. We found that 110 out of 137 (80%) of TopACT-predicted immune cells and 46 out of 50 (92%) of TopACT-predicted podocyte cells coincided with an ssDNA bin. Extended Data Fig. 3 shows the cell bins annotated according to the assigned TopACT cell type. Only three ssDNA bins were found to coincide with more than one TopACT-predicted cell, providing further evidence that TopACT predictions correspond to ground truth cells. Visual inspection of these three examples is suggestive of the underlying ssDNA bins being doublets.

Multiplex immunofluorescence

Multiplex immunofluorescence staining was performed on 4-μm-thick formalin fixed paraffin embedded sections according to the OPAL protocol (Akoya Biosciences) on the Leica BOND RXm auto stainer (Leica Microsystems). Six staining cycles were carried out using the following primary antibody–Opal fluorophore pairs; thereafter the sections were counterstained with diamidino-2-phenylindole (DAPI; FP1490, Akoya Biosciences): Ly6G/Ly6C (Gr-1) 1:400 (MAB1037-SP; R&D Systems)–Opal 480 1:150; CD4 1:500 (ab183685; Abcam)–Opal 520 1:150; CD8 1:800 (98941; Cell Signaling)–Opal 520 1:150; CD68 1:1,200 (ab125212; Abcam)–Opal 570 1:150; CD19 1:600 (90176; Cell Signaling)–Opal 620 1:150; CD11b 1:80,000 (ab133357; Abcam)–Opal 690 1:150; and E-cadherin 1:500 (3195; Cell Signaling)–Opal 780 1:25.

Samples were deparaffinized and rehydrated according to Leica BOND Rx protocol. Antigen retrieval was carried out at 100 °C for 20 min, tissue sections were incubated in either Epitope Retrieval solution 1 or 2, before the application of each primary antibody. Sections were incubated with primary antibody for 1 h; thereafter, Opal fluorophores were substituted for 3,3′‐diaminobenzidine (DAB) and added using the BOND Polymer Refine Detection System, with a 10 min incubation time. After completion of all cycles of staining, sections were counterstained with DAPI and mounted with VECTASHIELD Vibrance Antifade Mounting Medium (H-1700-10; Vector Laboratories). Slides were scanned and multispectral images of tissue sections obtained using the Akoya Bioscience Vectra PolarisTM. Analysis of the images was carried out using Zeiss Arivis v.4.1.1 (Zeiss). Mean intensity was measured across glomerular regions, manually defined on the basis of E-cadherin and nuclei and non-glomerular regions comprising the whole field of view with the glomerular regions subtracted. Mean intensities for each channel were summed per glomerular and non-glomerular region to calculate the total ‘immune’ intensity per glomerular region. Ratios were calculated per glomerulus using the mean intensity which was calculated by subtracting the mean intensity of a central glomerular region from the mean intensity of the whole glomerular region.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link