Study participants and design

Sixteen healthy adults aged 18–30 years with no evidence of previous SARS-CoV-2 infection or vaccination (seronegative) were selected for scRNA-seq sample processing and analysis from the wider cohort (36 participants) enrolled in the human SARS-CoV-2 challenge study, which was pioneered by the government task force, Imperial College London, Royal Free London NHS Foundation Trust, University College London and hVIVO7. These participants were enrolled as part of cohorts 5 and 6 between June and August 2021. Additionally, 20 healthy adults from earlier cohorts of the same study7 were included, and their blood and nasal (mid-turbinate) samples were processed for bulk RNA-seq as previously described12 (see Supplementary Table 1q for an overview of the bulk RNA-seq validation cohort and samples). Of these participants, ten received pre-emptive remdesivir as previously described7. Before enrolment, volunteers were tested for anti-SARS-CoV-2 protein antibodies using a MosaiQ COVID-19 antibody microarray (Quotient) and were excluded if they tested positive or had risk factors identified by clinical history, physical examination or screening assessments. See ref. 7 for the full list of inclusion and exclusion criteria and for further details of the challenge set-up and ethics. In brief, written informed consent was obtained from all volunteers before screening and study enrolment. The clinical study was registered with ClinicalTrials.gov (identifier NCT04865237) and was conducted in accordance with the protocol; consensus ethical principles derived from international guidelines, including the Declaration of Helsinki and the Council for International Organizations of Medical Sciences International Ethical Guidelines; applicable ICH Good Clinical Practice guidelines; and applicable laws and regulations.
The screening protocol and main study were approved by the UK Health Research Authority Ad Hoc Specialist Ethics Committee (references 20/UK/2001 and 20/UK/0002).

Participant 11, who fulfilled the enrolment criteria, was later found to have low pre-inoculation levels of neutralizing and spike-binding antibodies (see the serum antibody titre methods below). On the basis of virus kinetics, this individual was classified as an abortive infection (see ‘Virology’ below). Excluding this individual did not alter any of our conclusions (data not shown).

Participants were followed for 1 year after inoculation, and samples and metadata continued to be collected for use in future studies and to benefit the research community. At this final time point (1 year), which included an interview with a study clinician to assess symptoms and a complete physical examination, no participant presented with any long-COVID symptoms. UPSIT scores for all participants had returned to baseline, no other symptoms were reported, and physiological observations and physical examination findings were all normal (including temperature, heart rate, blood pressure, respiratory rate, peripheral oxygen saturation (SpO2), spirometry and electrocardiogram). Of note, although most symptoms resolved spontaneously, one of the six participants in the single-cell cohort who reported anosmia or dysosmia (participant 2) received additional smell training and a short course of steroids (28 days after inoculation)7. This study, however, focused primarily on the first 28 days after inoculation (except for one participant whose day 28 sample was collected at day 46; see below).

Of note, after discharge from quarantine and before the day 28 follow-up (when additional blood samples were collected), two participants reported either receiving their first SARS-CoV-2 vaccine dose (participant 9) or having a community infection (participant 7). Participant 9 received their first vaccine dose on day 14 after inoculation (2 weeks before the day 28 sample was taken). Participant 7 tested positive before their day 28 visit was due; the follow-up was therefore delayed by 2 weeks, so the day 28 sample for this participant was taken at day 46 after inoculation. ELISpot performed on this participant revealed a response in the day 28 and day 90 samples (data not shown). In addition, participant 8 tested positive on day 29 after inoculation, a day after their day 28 sample was taken; however, for this participant, ELISpot showed no response at day 28 and a small response at day 90. See Extended Data Fig. 1a for an overview of the samples and time points included from each participant. Including these individuals and time points did not alter any of our conclusions.

Challenge virus

Participants were intranasally inoculated at day 0 with a wild-type pre-Alpha SARS-CoV-2 challenge virus (SARS-CoV-2/human/GBR/484861/2020) at a dose of 10 TCID50. A volume of 100 µl per naris was pipetted into each nostril, after which the participant remained supine (face and torso facing up) for 10 min, followed by 20 min in a sitting position wearing a nose clip, to maximize contact time with the nasal and pharyngeal mucosa. Mid-turbinate nasal and throat samples were collected twice daily using flocked swabs and placed in 3 ml of viral transport medium (BSV-VTM-001, Bio-Serv), which was aliquoted and stored at −80 °C to evaluate viral kinetics (infection status) as described in ‘Virology’ below. Participants remained in quarantine for a minimum of 14 days after inoculation and until the following discharge criteria were met: two consecutive daily nose and/or throat swabs with either no viral detection or a qPCR Ct value > 33.5, and no viable virus by overnight incubation viral culture with detection by immunofluorescence. For details of the protocol and ethics of the human SARS-CoV-2 challenge study, see the ‘Challenge virus’ section of the methods in ref. 7.

Sample collection for scRNA-seq cohort

Nasopharyngeal swabs

Samples were collected at the Royal Free Hospital by trained healthcare providers at 7 time points: day –1 (pre-inoculation) and days 1, 3, 5, 7, 10 and 14 after inoculation. Participants were asked to clear any mucus from their nasal cavities, and nasopharyngeal samples were collected using FLOQSwabs (Copan flocked swabs, ref. 501CS01) inserted along the nasal septum, above the floor of the nasal passage, to the nasopharynx until slight resistance was felt. The swab was then rotated in place in both directions for 10 s, slowly removed while still rotating and immediately placed in a pre-cooled cryovial on wet ice containing freezing medium (90% heat-inactivated FBS and 10% dimethyl sulfoxide (DMSO)). The cryovials were kept on wet ice and sent via the hospital chutes to the laboratory (<2 min at room temperature), where they were placed in a slow-cooling device (Mr. Frosty Freezing Container, Thermo Fisher Scientific) and stored at −20 °C until all samples had been collected. They were then moved to −80 °C freezers for at least 48 h for optimum freezing and subsequently transferred to liquid nitrogen for storage until processing.

PBMC isolation from peripheral blood

Peripheral whole blood was collected at the Royal Free Hospital in EDTA tubes at 6 time points: day –1 (pre-inoculation) and days 3, 5, 10, 14 and 28 after inoculation. Each day, the blood was transported at room temperature to Imperial College London, where PBMCs were freshly isolated by Histopaque (Ficoll) density-gradient separation (Merck, H8889-500ML). The whole blood was first diluted 1:1 with 1× PBS (Merck, D8662-500ML) and then gently overlaid onto a maximum of 15 ml Histopaque at a ratio of 2:1 (blood to Histopaque). Samples were centrifuged at 400g for 30 min at room temperature with the brake off, and the PBMC interface layer was collected, washed with about 50 ml PBS and spun down (400g for 10 min at room temperature). The supernatant was then carefully discarded and the cell pellet resuspended in 10 ml PBS. Cells were filtered through a 40 or 70 μm cell strainer, and cell number and viability were assessed using Trypan Blue. Cells were centrifuged again (400g for 10 min), resuspended in the required volume of cell freezing medium (90% FBS (Sigma, F9665-500ML) and 10% DMSO (Sigma, D2650-100ML)) and cryopreserved at −80 °C using a slow-cooling device. Blood and nasopharyngeal samples were collected within 2 h of each other.
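
The Histopaque layering arithmetic can be made explicit. The following is a minimal Python sketch, not the laboratory’s own calculation; it assumes that the 2:1 ratio refers to PBS-diluted blood and that the 15 ml Histopaque maximum applies per tube, and the function name ficoll_layout is hypothetical.

```python
import math

# Protocol values; the ratio interpretation and per-tube limit are assumptions.
DILUTION_FACTOR = 2.0       # whole blood diluted 1:1 with PBS doubles the volume
BLOOD_TO_HISTOPAQUE = 2.0   # 2:1 overlay ratio, assumed to refer to diluted blood
MAX_HISTOPAQUE_ML = 15.0    # assumed maximum Histopaque volume per tube


def ficoll_layout(whole_blood_ml: float) -> dict:
    """Split a whole-blood volume across tubes for Histopaque separation."""
    diluted_ml = whole_blood_ml * DILUTION_FACTOR
    histopaque_ml = diluted_ml / BLOOD_TO_HISTOPAQUE
    n_tubes = max(1, math.ceil(histopaque_ml / MAX_HISTOPAQUE_ML))
    return {
        "n_tubes": n_tubes,
        "diluted_blood_per_tube_ml": diluted_ml / n_tubes,
        "histopaque_per_tube_ml": histopaque_ml / n_tubes,
    }
```

Under these assumptions, a hypothetical 40 ml draw becomes 80 ml of diluted blood needing 40 ml Histopaque, which is spread over three tubes.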

Clinical assessments

Participants were closely monitored and assessed daily using blood tests, spirometry, electrocardiograms and clinical assessments (vital signs, symptom diaries and clinical examination). Full details of all safety and clinical data collected in the human SARS-CoV-2 challenge study are provided in the methods of ref. 7, and an overview of metadata and demographics for the 16 participants enrolled in the scRNA-seq part of this study (up to 28 days after inoculation) is provided in Supplementary Table 1g.

Virology

From 24 h after inoculation, swabs were taken twice daily at 12-h intervals (morning and afternoon) from both the nose (mid-turbinate) and throat (pharyngeal) to assess and quantify the viral kinetics of each participant throughout their quarantine period (minimum 14 days, extended if virus continued to be detected). Virus was measured using two independent assays: (1) RT–qPCR with N gene primers/probes adapted from the Centers for Disease Control and Prevention protocol34 (updated 29 May 2020); and (2) quantitative culture by focus forming assay (FFA). For full details of each assay and the statistical analysis, refer to the methods in ref. 7.

The lower limit of quantification (LLOQ) for RT–qPCR was 3 log10 copies per ml; detectable results below the LLOQ were assigned a value of 1.5 log10 copies per ml, and undetectable samples were assigned 0 log10 copies per ml. Only samples from participants with consecutive positive RT–qPCR results were further tested by FFA. For FFA, the LLOQ was 1.27 log10 FFU per ml; virus detected below the LLOQ was assigned 1 log10 FFU per ml, and undetectable samples were assigned 0 log10 FFU per ml.
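
The value-assignment rules for RT–qPCR and FFA can be summarized in a small Python sketch. The function and constant names are hypothetical, and the FFA LLOQ is treated as 1.27 log10 FFU per ml, which is the reading implied by assigning 1 log10 FFU per ml to detections below it.

```python
from typing import Optional

# RT-qPCR values in log10 copies per ml; FFA values in log10 FFU per ml (assumed).
QPCR_LLOQ, QPCR_BELOW_LLOQ = 3.0, 1.5
FFA_LLOQ, FFA_BELOW_LLOQ = 1.27, 1.0


def code_result(detected: bool, log10_value: Optional[float],
                lloq: float, below_lloq_value: float) -> float:
    """Return the value used in analysis for one assay readout."""
    if not detected:
        return 0.0                  # undetectable sample
    if log10_value is None or log10_value < lloq:
        return below_lloq_value     # detected but not quantifiable
    return log10_value              # quantifiable: keep the measured value
```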

Infection intervals for each participant were calculated from the times of the first and last RT–qPCR tests with detectable virus (in the nose and/or throat). Detections below the LLOQ (assigned 1.5 log10 copies per ml) were counted only if they occurred within 2 days of a quantifiable (>LLOQ) result.
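
A minimal Python sketch of the infection-interval rule follows. It assumes one coded RT–qPCR value per 12-h time point, taken as the maximum of the nose and throat results, and reads “within 2 days” as strictly less than 48 h; infection_interval is a hypothetical helper, not code from ref. 7.

```python
from typing import List, Optional, Sequence, Tuple

QPCR_LLOQ = 3.0   # log10 copies per ml
WINDOW_H = 48.0   # "within 2 days", read as strictly less than 48 h


def infection_interval(times_h: Sequence[float],
                       values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (first, last) counted detection time in hours, or None.

    values are coded log10 copies per ml: 0 = undetected, 1.5 = below LLOQ.
    """
    quantifiable = [t for t, v in zip(times_h, values) if v >= QPCR_LLOQ]
    counted: List[float] = []
    for t, v in zip(times_h, values):
        if v >= QPCR_LLOQ:
            counted.append(t)
        elif v > 0 and any(abs(t - q) < WINDOW_H for q in quantifiable):
            counted.append(t)  # below-LLOQ detection close to a quantifiable one
    if not counted:
        return None
    return min(counted), max(counted)
```

With quantifiable results at 36 h and 48 h, below-LLOQ detections at 24 h and 72 h fall inside the window and extend the interval to 24–72 h, whereas an isolated below-LLOQ detection at 200 h does not.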

An overview of the virology for each of the 16 participants in the single-cell cohort (up to 28 days after inoculation) is provided in Extended Data Fig. 1b,c, with Ct and FFA (virus titre) values in Supplementary Table 1a,b,h,i.

Infection group nomenclature

A sustained laboratory-confirmed infection was defined as quantifiable RT–qPCR detection (>LLOQ) in mid-turbinate and/or throat (pharyngeal) swabs at 2 or more consecutive 12-h time points, from 24 h after inoculation until discharge from quarantine. Participants with only a single quantifiable RT–qPCR result, or two non-consecutive ones, were classified as transient infections. Participants with no quantifiable RT–qPCR results were classified as abortive infections (Extended Data Fig. 1b and Supplementary Table 1a,b,h,i). The names of the sustained, transient and abortive infection groups reflect our hypotheses that viral exposure through inoculation led to sustained, transient and aborted viral replication, respectively, in these participants. Sustained infections resemble typical COVID-19 cases, in which the virus spreads through the upper airway tissues and replicates to readily detectable levels. Transient infections represent a new group of cases in which we propose that successful but limited replicative infection took place, leading to borderline detectable viral loads. Finally, we propose that non-replicative (that is, abortive) viral infection took place in participants in the abortive infection group.
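
The group definitions translate directly into a rule on the sequence of 12-h time points. In this hypothetical Python sketch, each element flags whether any swab at that time point was quantifiable (>LLOQ). The text does not cover three or more non-consecutive quantifiable results, so the sketch returns "unclassified" for that case rather than guessing.

```python
from typing import Sequence


def classify_infection(quantifiable: Sequence[bool]) -> str:
    """Classify a participant from per-time-point quantifiable (>LLOQ) flags.

    quantifiable[i] is True if any swab at the i-th 12-h time point
    (from 24 h after inoculation to discharge) was above the LLOQ.
    """
    n_positive = sum(quantifiable)
    if n_positive == 0:
        return "abortive"
    if any(a and b for a, b in zip(quantifiable, quantifiable[1:])):
        return "sustained"      # two or more consecutive quantifiable results
    if n_positive <= 2:
        return "transient"      # one, or two non-consecutive, quantifiable results
    return "unclassified"       # >= 3 non-consecutive: not covered by the definitions
```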

Nasopharyngeal swab dissociation and processing for scRNA-seq

After freezing, nasopharyngeal swabs were transferred to a category level 3 facility at University College London, where they were stored and processed to single-cell suspensions in batches of 7–8 samples. All work was carried out in a class I microbiological safety cabinet (MSC) in compliance with standard category level 3 safety practices. Cells were dissociated and collected from nasopharyngeal swabs following a previously described protocol35,36 with minor modifications. This approach uses multiple parallel wash and digestion steps on both the swab itself and the collected freezing and wash medium to maximize cell recovery. Samples were first exposed to DTT for 15 min and then digested with Accutase for 30 min, after which cells from the same sample (collected directly from the swab or from the freezing medium and washes of that swab) were quenched, pooled and filtered before cell number and viability were assessed.

In brief, samples were rapidly thawed (tube A) and the liquid was collected in an empty 15 ml Falcon tube (tube B). The cryovial, lid and swab were then carefully rinsed three times with 1 ml warm RPMI 1640 medium, which was added dropwise to the 15 ml tube while gently swirling it, to slowly dilute the DMSO from the freezing medium and help prevent the cells from bursting. After 1 min, tube B was topped up with an extra 2 ml of warm RPMI 1640 medium and centrifuged at 400g for 5 min at 4 °C. The cell pellet was resuspended in RPMI 1640 with 10 mM DTT (Thermo Fisher, R0861), incubated for 15 min on a thermomixer (37 °C, 700 r.p.m.) and centrifuged as above. The supernatant was aspirated, and the cell pellet was resuspended in 1 ml Accutase (Merck, A6964-500ML) and incubated for a further 30 min on the thermomixer (37 °C, 700 r.p.m.).

In parallel with the processing of the freezing medium and washes described above, the swab was moved to a new 1.5 ml Eppendorf tube (tube C) containing 1 ml RPMI 1640 with 10 mM DTT and incubated on the thermomixer (37 °C, 700 r.p.m.) for 15 min. Mirroring the steps above, the swab was then transferred to a new 1.5 ml Eppendorf tube (tube D) containing 1 ml Accutase and incubated at 37 °C with agitation (700 r.p.m.). The 1 ml RPMI 1640 with 10 mM DTT from the swab incubation (tube C) was centrifuged at 400g for 5 min at 4 °C to pellet cells, the supernatant was discarded, and the cell pellet was resuspended in 1 ml Accutase and incubated for 30 min at 37 °C with agitation (700 r.p.m.).

After the Accutase digestion, all cells (tubes B, C and D) were combined and filtered through a 70 μm nylon strainer, pre-wetted with 3 ml quenching medium (RPMI 1640, 10% FBS and 1 mM EDTA (Invitrogen, 1555785-038)), into a 50 ml conical tube (tube E). The filter, tubes and swab were then thoroughly rinsed with quenching medium to collect all remaining cells, and the washes were combined. The dissociated, filtered cells (tube E) were centrifuged at 400g for 5 min at 4 °C and the supernatant was discarded. The cell pellet was resuspended in the residual volume (about 500 µl) and transferred to a new 1.5 ml Eppendorf tube (tube F). Tube E was washed with a further 500 µl RPMI 1640 with 10% FBS, and this wash was combined with tube F. After centrifugation as above, the supernatant was removed and the cells were resuspended in 20 µl RPMI 1640 with 10% FBS. Total cell counts and viability were assessed using Trypan Blue. The cell concentration was adjusted (to between 700 and 1,000 cells per µl) to target recovery of 7,000 cells according to the 10x Chromium manual, and cells were loaded onto a 10x chip and processed immediately for 10x 5′ single-cell capture using a Chromium Next GEM Single Cell V(D)J Reagent kit v.1.1 (Rev E guide). For samples in which fewer than 13,200 total cells were recovered, all cells were loaded.
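
The loading decision can be sketched as follows. This assumes that 13,200 cells is also the number loaded to target about 7,000 recovered cells (the text states it only as the threshold below which all cells were loaded); loading_plan is a hypothetical helper.

```python
# 13,200 is the threshold stated in the text; using it as the loading target
# for ~7,000 recovered cells is an assumption.
LOAD_TARGET_CELLS = 13_200


def loading_plan(total_cells: int, conc_per_ul: float) -> dict:
    """Decide how many cells, and what volume of suspension, to load per channel."""
    cells = min(total_cells, LOAD_TARGET_CELLS)
    return {
        "load_all": total_cells < LOAD_TARGET_CELLS,
        "cells_loaded": cells,
        "volume_ul": cells / conc_per_ul,
    }
```

For a hypothetical sample of 9,000 cells resuspended in 20 µl (450 cells per µl), the whole 20 µl would be loaded.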

Owing to the sample type, the necessary freezing process and the lack of access to a class 3 flow cytometry facility to sort viable cells, most processed samples had low viability (range 5.4% to 57.85%; average 26.89%).

PBMC CITE-seq staining for single-cell proteogenomics

Frozen PBMC samples were thawed and processed in batches of 16 to enable a pooling strategy in which each sample was split across two distinct pools, each containing up to four PBMC samples from mixed time points. Each pool contained at most one sample per donor to facilitate subsequent genotype-based demultiplexing. This design helped to identify and correct for protocol-based batch effects.
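
The pooling constraints stated above (each sample in exactly two pools, at most four samples per pool, at most one sample per donor per pool) can be checked programmatically before thawing. A minimal Python sketch, with hypothetical names and sample identifiers of the form (participant, day):

```python
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (participant ID, day after inoculation)


def check_pool_design(pools: Dict[str, List[Sample]],
                      max_pool_size: int = 4) -> List[str]:
    """Return a list of violated constraints (empty if the design is valid)."""
    problems: List[str] = []
    membership: Counter = Counter()
    for name, samples in pools.items():
        if len(samples) > max_pool_size:
            problems.append(f"{name}: {len(samples)} samples exceeds {max_pool_size}")
        # At most one sample per donor in any pool.
        for donor, n in Counter(p for p, _ in samples).items():
            if n > 1:
                problems.append(f"{name}: donor {donor} appears {n} times")
        membership.update(samples)
    # Every sample must be split across exactly two pools.
    for sample, n in membership.items():
        if n != 2:
            problems.append(f"{sample}: in {n} pool(s), expected 2")
    return problems
```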

In brief, PBMC samples were rapidly thawed in a 37 °C water bath. Warm RPMI 1640 medium containing 10% FBS (RPMI/FBS; 20–30 ml) was added slowly to the cells before centrifugation at 300g for 5 min, followed by a wash in 5 ml RPMI/FBS. The PBMC pellet was collected, and cell number and viability were determined using Trypan Blue.

PBMCs from 4 different donors were then pooled (1.25 × 10⁵ PBMCs per donor; 5.0 × 10⁵ cells in total). The remaining cells were used for DNA extraction (Qiagen, 69504). The pooled PBMCs were resuspended in 22.5 µl cell staining buffer (BioLegend, 420201) and blocked by incubation for 10 min on ice with 2.5 µl Human TruStain FcX block (BioLegend, 422301). The pool was then stained with TotalSeq-C Human Cocktail V1.0 antibodies (BioLegend, 399905) according to the manufacturer’s instructions (1 vial per pool); the full list of TotalSeq-C antibodies (130 antibodies and 7 isotype controls) is given in Supplementary Table 1j. After a 30-min incubation with the antibody cocktail (4 °C, in the dark), the PBMCs were topped up with cell staining buffer and pelleted (500g for 5 min at 4 °C), and the supernatant was discarded. The pellet was then resuspended and washed in the same manner 2 more times using resuspension buffer (0.05% BSA in HBSS), finally resuspended in 20–30 µl resuspension buffer and counted again. The PBMC pools were processed immediately for 10x 5′ single-cell capture (Chromium Next GEM Single Cell V(D)J Reagent kit v.1.1 with Feature Barcoding technology for Cell Surface Protein, Rev D protocol), loading a total of 25,000 cells from each pool onto a 10x chip.

PBMC Dextramer staining for SARS-CoV-2 antigen-specific T cell enrichment and single-cell sequencing

To further validate and investigate SARS-CoV-2 antigen-specific T cell populations in our single-cell dataset, day 10, 14 and 28 post-inoculation PBMC samples from all 16 participants were enriched and processed for single-cell sequencing using a multi-allele panel of 44 SARS-CoV-2 antigen-specific dCODE Dextramers (10x compatible; Immudex; see Supplementary Table 1k for the full panel). This panel includes five antigen-specific T cell populations, spanning four MHC class I alleles and one MHC class II allele (covering a total of 15 participants; see Supplementary Table 1l), and several negative controls. Samples were also stained with FACS antibodies against monocyte and T cell markers and sorted using a MACSQuant Tyto cell sorter (Miltenyi Biotec), after which PE-dCODE Dextramer-positive cells were collected and processed for 10x 5′ single-cell capture. Overlaying single-cell V(D)J data onto dCODE Dextramer-positive cell clusters then linked paired clonal TCR sequences to their antigen specificity.

The Dextramer staining protocol was adapted from the Immudex protocol and optimized for our samples and our pooling and staining strategy. In brief, PBMC samples were thawed in batches of 7–8 samples, and the cell number and viability of each sample were determined using Trypan Blue as described above. All cells from each sample were then pooled in a fresh 1.5 ml Eppendorf tube. Each pool contained only one sample per participant (donor), to enable subsequent demultiplexing by genotype, and a mixture of time points, to help reduce batch effects. To maximize cell recovery, each original sample tube was washed with 200 µl staining buffer (1× PBS pH 7.4 containing 5% heat-inactivated FBS (Thermo Fisher Scientific, 10500064) and 0.1 g l–1 herring sperm DNA (Thermo Fisher Scientific, 15634017)), and the wash was added to the pool. The tube was then topped up to 1.4 ml with staining buffer and centrifuged (400g for 5 min at 4 °C). The supernatant was carefully removed and the cell pellet gently resuspended in 30–40 µl staining buffer, depending on pellet size, ready for staining.

In parallel, the dCODE Dextramer master mix was prepared in the dark according to the manufacturer’s protocol. To help avoid aggregates, each Dextramer reagent was first microcentrifuged at full speed for 5 min; 2 µl of each dCODE Dextramer specificity was then added to a low-bind, nuclease-free 1.5 ml Eppendorf tube (Eppendorf, 30108051) containing 8.8 μl 100 μM d-biotin (Avidity Science, BIO200), that is, 0.2 µl d-biotin per dCODE Dextramer specificity (44 specificities). The master mix was mixed by gentle pipetting, and the total volume (96.8 µl) was added to the resuspended cells. The sample was thoroughly mixed and incubated at room temperature for 30 min in the dark. After addition of anti-human CD14-FITC (BioLegend, 325603) and CD3-APC (BioLegend, 300458) (both at 1:50), the cells were incubated for a further 20 min (room temperature, in the dark) and then topped up to 1.4 ml with wash buffer (1× PBS pH 7.4 containing 5% heat-inactivated FBS). The cells were pelleted (400g for 5 min at 4 °C) and the supernatant discarded. The wash step was repeated 2 more times, the final wash using 1.4 ml wash buffer containing DAPI (Sigma; 1:5,000) as a live/dead stain. The supernatant was removed and the cell pellet resuspended in 4 ml FACS buffer (1× PBS, 1% FBS, 25 mM HEPES (Thermo Fisher Scientific, 15630-056) and 1 mM EDTA). The samples were filtered (35 µm nylon mesh cell strainer), and PE dCODE Dextramer-positive cells were sorted using a MACSQuant Tyto cell sorter according to the manufacturer’s guidelines (settings: mix speed = 800 r.p.m., chamber temperature = 4 °C, pressure = 150 hPa, noise threshold = 14.40, trigger threshold = off). To collect as many cells as possible, the entire sample was run on the MACSQuant Tyto, and the negative fraction was collected and re-run a second time to ensure that no true positives were lost. See Extended Data Fig. 8d for the sorting gating strategy. The PE dCODE Dextramer-positive cells were then collected, centrifuged (400g for 5 min at 4 °C) and resuspended in resuspension medium before counting. The entire sample was processed for 10x 5′ single-cell capture (Chromium Next GEM Single Cell V(D)J Reagent kit v.1.1 with Feature Barcoding technology for Cell Surface Protein, Rev D protocol). When more than 25,000 cells were collected, the sample was split equally and loaded across two lanes.
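
As a check on the master-mix arithmetic, the volumes follow from 2 µl per specificity plus 0.2 µl d-biotin per specificity. A short Python sketch (the function name is hypothetical):

```python
N_SPECIFICITIES = 44          # dCODE Dextramer specificities in the panel
DEXTRAMER_UL_EACH = 2.0       # µl of each specificity
BIOTIN_UL_PER_SPEC = 0.2      # µl of 100 µM d-biotin per specificity


def master_mix_volumes(n_specificities: int = N_SPECIFICITIES) -> dict:
    """Return the Dextramer, d-biotin and total master-mix volumes in µl."""
    dextramer = n_specificities * DEXTRAMER_UL_EACH
    biotin = n_specificities * BIOTIN_UL_PER_SPEC
    return {
        "dextramer_ul": round(dextramer, 2),
        "biotin_ul": round(biotin, 2),
        "total_ul": round(dextramer + biotin, 2),
    }
```

For 44 specificities this gives 88 µl Dextramer and 8.8 µl d-biotin, 96.8 µl in total, matching the volumes stated in the protocol.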

As additional controls, participants with non-matching HLA types, including one volunteer (participant 4) matching none of the HLA alleles in the multi-allele dCODE Dextramer panel, were also processed and used to estimate background noise.

Library generation and sequencing

A Chromium Next GEM Single Cell 5′ V(D)J Reagent kit (v.1.1 chemistry) was used for scRNA-seq library construction for all nasopharyngeal swab samples, and a Chromium Next GEM Single Cell V(D)J Reagent kit v.1.1 with Feature Barcoding technology for cell surface proteins was used for PBMCs stained with either the CITE-seq antibody panel or the dCODE Dextramer (10x compatible) panel. GEX and V(D)J libraries were prepared according to the manufacturer’s protocol (10x Genomics) using individual Chromium i7 sample indices. Additional TCR γ/δ-enriched libraries were generated using an in-house protocol as previously described37. Cell surface protein libraries were created according to the manufacturer’s protocol, with slight modifications for libraries generated from the CITE-seq antibody panel: the SI primer amount per reaction was doubled and the number of index PCR amplification cycles was reduced to 7 to avoid the daisy-chain effect. GEX, V(D)J and CITE-seq-derived cell surface protein indexed libraries were pooled at a ratio of 1:0.1:0.4 and sequenced on a NovaSeq 6000 S4 flow cell (paired-end, 150 bp reads), aiming for a minimum of 50,000 read pairs per cell for GEX libraries and 5,000 read pairs per cell for V(D)J and cell surface protein libraries. Dextramer-derived cell surface protein indexed libraries were submitted at a ratio of 0.1 (relative to GEX).
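
Treating the library pooling ratio as the expected share of sequencing reads (an assumption; actual read shares also depend on library quantification and clustering efficiency), the per-cell targets follow from the 1:0.1:0.4 ratio. A hypothetical Python sketch:

```python
# Pooling ratio GEX : V(D)J : cell surface protein (CITE-seq), from the text.
POOL_RATIO = {"GEX": 1.0, "VDJ": 0.1, "ADT": 0.4}
GEX_TARGET = 50_000                      # read pairs per cell for GEX
MIN_TARGET = {"VDJ": 5_000, "ADT": 5_000}


def reads_per_cell(ratio=POOL_RATIO, gex_target=GEX_TARGET) -> dict:
    """Expected read pairs per cell per library, assuming reads split by pool ratio."""
    return {lib: gex_target * r / ratio["GEX"] for lib, r in ratio.items()}


def meets_minimums(per_cell: dict) -> bool:
    """Check the V(D)J and protein libraries reach their minimum targets."""
    # Small tolerance absorbs floating-point error in the ratio arithmetic.
    return all(per_cell[lib] >= MIN_TARGET[lib] - 1e-6 for lib in MIN_TARGET)


def total_read_pairs(n_cells: int) -> float:
    """Read pairs needed across all three libraries for n_cells recovered cells."""
    return n_cells * sum(reads_per_cell().values())
```

Under this reading, a 50,000 read-pair GEX target implies 5,000 read pairs per cell for V(D)J and 20,000 for the CITE-seq protein library, so both meet the 5,000 minimum.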

Single-cell genomics data alignment

scRNA-seq and CITE-seq data from PBMCs were jointly aligned with CellRanger (v.4.0.0) against the GRCh38 reference provided by 10x Genomics with CellRanger (v.3.0.0). CITE-seq antibody-derived tag (ADT) barcodes were aligned against the barcode reference provided by the supplier, which we annotated with informative protein names and made available in our GitHub repository (https://github.com/Teichlab/COVID-19_Challenge_Study). scRNA-seq data from nasopharyngeal swab samples were aligned against the same reference using STARsolo (v.2.7.3a) and post-processed with an implementation of EmptyDrops extracted from CellRanger (v.3.0.2). To detect viral RNA in infected cells, we added 21 viral genomes, including pre-Alpha SARS-CoV-2 (NC_045512.2), to the reference genomes above for RNA-seq alignment, as previously described6. Single-cell αβ TCR and BCR data were aligned using CellRanger (v.4.0.0) with the accompanying GRCh38 V(D)J reference provided by 10x Genomics. Single-cell γδ TCR data were aligned with CellRanger (v.6.1.2) against the GRCh38 reference provided by 10x Genomics with CellRanger (v.5.0.0).

Single-cell genomics data processing

Both scRNA-seq and ADT data were corrected using SoupX38 to remove ambient (free-floating background) RNA and ADT counts. To correct ADT counts with SoupX (v.1.5.2), the soupQuantile and tfidfMin parameters were initially set to 0.25 and 0.2, respectively, and lowered in decrements of 0.05 until the contamination fraction could be estimated using the autoEstCont function. SoupX was run on RNA data with default settings. To confidently annotate SARS-CoV-2-infected cells, we used SoupX-corrected viral RNA counts to remove false positives arising from ambient RNA of free-floating SARS-CoV-2 virions. However, when quantifying the number of viral reads per cell (Fig. 2h) and their distribution over the viral genome (Fig. 2f), we used the raw counts and sequencing data. To profile the distribution of viral reads, we removed PCR duplicates from the aligned BAM files produced by STARsolo using Picard MarkDuplicates (https://broadinstitute.github.io/picard/) and tallied the start position of each sequencing read within the SARS-CoV-2 genome. Aligned scRNA-seq data were imported from the filtered_feature_bc_matrix folder into Seurat (v.4.1.0) for processing, keeping only cells with at least 200 detected RNA features. Nasopharyngeal cells and PBMCs with more than 50% and 10%, respectively, of counts from mitochondrial genes were excluded. SoupX-corrected gene expression and ADT counts were normalized by dividing by the total counts per cell, multiplying by 10,000, adding one and applying a natural-log transformation (log1p).
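
The cell filters and log-normalization above can be expressed in a few lines of NumPy; the normalization matches the intent of Seurat’s NormalizeData with the LogNormalize method and a scale factor of 10,000. This is a sketch on a dense cells × genes matrix with hypothetical helper names; in practice the matrices are sparse and these steps were run in Seurat.

```python
import numpy as np


def qc_mask(counts: np.ndarray, is_mito: np.ndarray,
            min_features: int = 200, max_mito_frac: float = 0.10) -> np.ndarray:
    """Boolean mask of cells to keep (cells x genes count matrix).

    max_mito_frac was 0.50 for nasopharyngeal cells and 0.10 for PBMCs.
    """
    n_features = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    mito_frac = np.divide(mito, total, out=np.zeros(total.shape, dtype=float),
                          where=total > 0)
    return (n_features >= min_features) & (mito_frac <= max_mito_frac)


def log_normalize(counts: np.ndarray, scale: float = 10_000) -> np.ndarray:
    """Divide by total counts per cell, multiply by scale, then natural log1p.

    Assumes every cell has non-zero counts, which qc_mask guarantees.
    """
    total = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / total * scale)
```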

Demultiplexing and patient identity assignment

Each PBMC sample was split across two distinct pools, each containing up to four PBMC samples, followed by CITE-seq and single-cell V(D)J sequencing as described above. Souporcell (v.2.0)39 was used to demultiplex each pool on the basis of genotype differences between the mixed samples, with the skip_remap parameter enabled and using the common SNP database provided with the software. We used two complementary approaches to confidently assign participant identity to each Souporcell cluster. First, we compared cluster genotypes with SNP array genotyping data generated for all participants using the Affymetrix UK Biobank Axiom Array kit (Cambridge Genomic Services). Second, because the combination of samples within each pool was unique, participant identity could be assigned from the unique, participant-specific combinations of identical genotypes detected across separate pools. This multiplexing and replication strategy furthermore enabled us to distinguish library-specific batch effects from participant-specific effects in downstream analyses.
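
The pool-combination logic behind the second identity-assignment approach can be sketched in Python. The sketch assumes that Souporcell clusters have already been matched across pools into genotype groups (for example, by genotype comparison) and shows only the lookup step; all names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set


def assign_participants(design: Dict[str, Set[str]],
                        observed: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Map genotype groups to participants via their pool signatures.

    design:   participant -> pools that contain one of their samples
    observed: genotype group -> pools in which that genotype was detected
    Returns None for genotype groups whose signature matches no participant.
    """
    by_signature: Dict[FrozenSet[str], str] = {}
    for participant, pools in design.items():
        key = frozenset(pools)
        if key in by_signature:  # the design must make signatures unique
            raise ValueError(f"{by_signature[key]} and {participant} share pools {sorted(key)}")
        by_signature[key] = participant
    return {group: by_signature.get(frozenset(pools)) for group, pools in observed.items()}
```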

Doublet detection

We used the Souporcell output to identify ground-truth doublets in PBMCs, that is, droplets containing genotypes from two different participants. We then included these ground-truth doublets in the iterative rounds of subclustering and cell-state annotation to identify emerging doublet-specific clusters, which were subsequently removed. Doublets in the nasopharyngeal data were removed during iterative rounds of subclustering and cell-state annotation by identifying cell clusters that expressed marker genes of multiple distinct cell types.
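
Flagging clusters enriched for ground-truth doublets can be sketched as below. Per-cell doublet flags would come from the Souporcell assignments (assumed here to be available as one boolean per cell); the minimum cluster size and doublet fraction are illustrative values, not thresholds used in this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def doublet_enriched_clusters(cells: Iterable[Tuple[str, bool]],
                              min_fraction: float = 0.5,
                              min_cells: int = 10) -> List[str]:
    """Return clusters where ground-truth doublets make up >= min_fraction of cells.

    cells yields (cluster label, is_ground_truth_doublet) per cell; both
    thresholds are illustrative.
    """
    total: Dict[str, int] = defaultdict(int)
    doublets: Dict[str, int] = defaultdict(int)
    for cluster, is_doublet in cells:
        total[cluster] += 1
        doublets[cluster] += int(is_doublet)
    return sorted(c for c, n in total.items()
                  if n >= min_cells and doublets[c] / n >= min_fraction)
```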

Clustering and cell-type annotation

Principal component analysis was run on corrected gene expression counts of selected highly variable genes, and the first 30 principal components were used to construct a nearest-neighbour graph and UMAP embedding. We used Harmony40 to correct the PBMC data for sequencing library identity, removing technical batch effects. Leiden clustering41 at resolutions of 0.5, 1, 4 and 32, on nearest-neighbour graphs and embeddings built from 500, 1,000, 2,000, 4,000, 6,000 and 8,000 selected highly variable genes (excluding TCR and BCR genes), was used for iterative rounds of marker-gene-based cell-type annotation and cluster subsetting to obtain a highly granular cell-state annotation. Cell types were defined using previously described marker genes5,6. Our annotation was further guided by cell-type labels predicted using models provided in CellTypist42 and custom models trained on previously described annotations5,6.

Single-cell TCR and BCR data processing

Aligned single-cell BCR and αβ TCR sequencing data were imported into scirpy43 to obtain a cell-by-receptor (TCR or BCR) table, which was then added to the Seurat objects containing gene expression data. Aligned single-cell γδ TCR data were reannotated using Dandelion (v.0.2.4)44.

Differential gene expression and gene ontology analysis

We used DESeq2 (ref. 45) to identify significantly changing genes and gene sets. Samples were pseudobulked by cell state and sample, and adjusted P values were computed from Wald tests. To identify genes associated with infection outcome at day –1, we modelled gene expression in pre-infection samples as a function of cell type, sex and infection outcome. For PBMCs, sequencing library identity was also included as a covariate in the differential expression analyses. To quantify interferon stimulation, we used a previously published gene signature6 and scored its expression per cell with the Seurat function ‘AddModuleScore’. Cells with a module score above 0.5 were classified as interferon stimulated, and significance was determined with a Mann–Whitney U-test on module scores, with Bonferroni correction for multiple testing.
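
Pseudobulking sums raw counts over all cells sharing a cell state and sample before testing with DESeq2. The NumPy sketch below shows that aggregation, together with the Bonferroni adjustment applied to the module-score tests; both helper names are hypothetical, and DESeq2 itself (an R package) is not reproduced.

```python
import numpy as np
from typing import List, Sequence, Tuple


def pseudobulk(counts: np.ndarray, cell_states: Sequence[str],
               samples: Sequence[str]) -> Tuple[List[Tuple[str, str]], np.ndarray]:
    """Sum raw counts (cells x genes) over each (cell state, sample) group."""
    keys = sorted(set(zip(cell_states, samples)))
    index = {key: i for i, key in enumerate(keys)}
    group = np.array([index[key] for key in zip(cell_states, samples)])
    summed = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    np.add.at(summed, group, counts)  # unbuffered row-wise accumulation
    return keys, summed


def bonferroni(p_values: Sequence[float]) -> np.ndarray:
    """Bonferroni-adjusted P values, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)
```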

Integration of five COVID-19 studies

Transcriptomic data from refs. 5,6,31,32,33 were processed using the single-cell analysis Python workflow Scanpy46. Each dataset was filtered individually following best practices outlined in ref. 47. Cells with between 200 and 3,500 detected genes and less than 10% mitochondrial gene expression were retained, genes expressed in fewer than 3 cells were removed, and other parameters were left at default. The gene sets were reduced to their intersection before the datasets were combined. Cells came from a total of 602 individuals: 325 patients with acute COVID-19, 110 patients convalescing from COVID-19, 114 healthy participants and 53 patients in hospital without COVID-19 (controls) (Supplementary Table 1d). This resulted in an integrated embedding containing 946,584 T cells with resolved TCR from 494 samples. These samples were derived from 455 donors: 240 patients with acute COVID-19, 82 patients convalescing from COVID-19, 88 healthy participants and 45 patients in hospital without COVID-19 (Supplementary Table 1e). The number of donors in the integrated object is smaller because only samples with matching V(D)J sequencing data were kept. A probabilistic scVI model (2 hidden layers, 128 hidden nodes, 20-dimensional latent space, negative binomial gene likelihood, other parameters at default48) was trained on the data to map cells to a shared latent space, which was visualized using UMAP.
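
The per-dataset filtering thresholds listed above amount to a small rule set. In Scanpy these steps would normally be run with sc.pp.filter_cells, sc.pp.filter_genes and sc.pp.calculate_qc_metrics. The sketch below instead applies the rules to a toy dictionary-of-counts representation so that the logic is explicit. The mitochondrial prefix "MT-", the use of the mitochondrial fraction of total counts, and the function names are assumptions.

```python
def qc_filter(cells, min_genes=200, max_genes=3500, max_mito_frac=0.10,
              min_cells=3, mito_prefix="MT-"):
    """cells: dict cell_id -> dict gene -> count.
    Returns the filtered cells and the set of genes that survive filtering."""
    kept = {}
    for cell_id, counts in cells.items():
        n_genes = sum(1 for c in counts.values() if c > 0)
        total = sum(counts.values())
        mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
        if min_genes <= n_genes <= max_genes and total > 0 and mito / total < max_mito_frac:
            kept[cell_id] = counts
    # Drop genes detected in fewer than `min_cells` of the retained cells
    detected = {}
    for counts in kept.values():
        for gene, c in counts.items():
            if c > 0:
                detected[gene] = detected.get(gene, 0) + 1
    genes = {g for g, n in detected.items() if n >= min_cells}
    filtered = {cid: {g: c for g, c in counts.items() if g in genes}
                for cid, counts in kept.items()}
    return filtered, genes

def shared_genes(gene_sets):
    """Intersection of gene sets across datasets, applied before combining them."""
    return set.intersection(*map(set, gene_sets))
```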

Identification of activated TCR clonotype groups using Cell2TCR

To identify TCR clonotype groups, we used tcrdist3 (ref. 49) with the provided human references to compute a sparse representation of the distance matrices for all identified TRA and TRB CDR3 sequences, with the radius parameter set to 150. We then summed the TRA and TRB distances to obtain a combined distance matrix. Next, we iterated over TCR distance thresholds between 5 and 150 in increments of 5 and computed TCR clonotype groups at each threshold. For each threshold, we generated an adjacency graph linking TCRs from different T cells whose distance was lower than the threshold. This graph was clustered with Leiden41 clustering through the igraph package50, at a resolution of 1 and using the RBConfigurationVertexPartition partition, to identify TCR clonotype groups. The optimal distance threshold is the one at which only TCRs that recognize the same antigen are grouped together. To find it, we quantified clonotype group contamination at each threshold using two approaches. First, we assumed that T cells annotated as naive should not participate in an expanded clonotype group. We therefore quantified the proportion of naive T cells in each clonotype group to determine the largest threshold at which naive T cell participation was minimal. Second, we assumed that CD4+ T cells and CD8+ T cells should never be part of the same TCR clonotype group. We therefore quantified the proportion of CD4+ and CD8+ mixing in each clonotype group to find the largest threshold at which mixing was minimal. Both approaches identified the same optimal threshold of 35, at which both naive T cell participation and CD4+ and CD8+ mixing were minimal, and we used this threshold for downstream analyses. To identify activated TCR clonotype groups, we assumed that these groups should include activated T cells. We also assumed that they should contain multiple independent TCR clonotypes that appeared to have been raised against the same antigen at the same time. 
We therefore selected clonotype groups that contained at least one participating activated T cell and that contained at least two unique CDR3 nucleotide sequences.
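
The threshold sweep and the activation filter can be illustrated with a small sketch. It assumes that per-chain distances have already been computed (in the study, with tcrdist3) and summed into a single distance function. As a deliberate simplification, it uses connected components of the thresholded graph instead of Leiden clustering. The metadata keys (state, lineage, cdr3_nt), the operationalization of CD4/CD8 mixing and the zero-contamination tolerance are all hypothetical.

```python
from itertools import combinations

def clonotype_groups(cells, dist, threshold):
    """Link cells whose combined TRA+TRB distance is below `threshold` and return
    multi-cell groups. Connected components stand in for Leiden clustering here."""
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(cells, 2):
        if dist(a, b) < threshold:
            parent[find(a)] = find(b)
    groups = {}
    for c in cells:
        groups.setdefault(find(c), []).append(c)
    return [g for g in groups.values() if len(g) > 1]

def contamination(groups, meta):
    """Fraction of grouped cells that are naive, and fraction of groups mixing CD4 and CD8."""
    if not groups:
        return 0.0, 0.0
    n_cells = sum(len(g) for g in groups)
    naive = sum(meta[c]["state"] == "naive" for g in groups for c in g)
    mixed = sum({"CD4", "CD8"} <= {meta[c]["lineage"] for c in g} for g in groups)
    return naive / n_cells, mixed / len(groups)

def best_threshold(cells, dist, meta, thresholds=range(5, 155, 5), tol=0.0):
    """Largest threshold at which both contamination measures stay within `tol`."""
    best = None
    for t in thresholds:
        naive_frac, mixed_frac = contamination(clonotype_groups(cells, dist, t), meta)
        if naive_frac <= tol and mixed_frac <= tol:
            best = t
    return best

def activated_groups(groups, meta, min_unique_nt=2):
    """Keep groups with an activated T cell and enough distinct CDR3 nucleotide sequences."""
    return [g for g in groups
            if any(meta[c]["state"] == "activated" for c in g)
            and len({meta[c]["cdr3_nt"] for c in g}) >= min_unique_nt]
```

Connected components make groups grow monotonically with the threshold, which keeps the sweep easy to reason about. Leiden clustering can split dense components, so the two approaches may choose different thresholds on real data.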

Identification of activated BCR clonotype groups

To identify BCR clonotype groups that were activated during infection, we used an approach similar to that described above for T cells. Instead of using tcrdist to compute distances, we used the Levenshtein distance. We iterated over thresholds between 1 and 20 and selected the optimal threshold by quantifying naive B cell participation. This revealed that a Levenshtein distance of 2 is optimal for identifying BCR clonotype groups that contain only B cells recognizing the same antigen. To identify activated BCR clonotype groups, we assumed that these groups should include antibody-secreting B cells (plasmablasts and plasma cells). We also assumed that they should contain multiple independent BCR clonotypes that appeared to have been raised against the same antigen at the same time. We therefore selected clonotype groups that contained at least one participating antibody-secreting B cell and at least three unique CDR3 nucleotide sequences.
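
The BCR variant can be sketched in the same way. The sketch assumes one CDR3 sequence per B cell and links sequences within a Levenshtein distance of at most 2. Whether the study's cut-off is inclusive, and whether nucleotide or amino acid CDR3s were compared, are assumptions here. Connected components again replace graph clustering, and the state labels and function names are hypothetical.

```python
from itertools import combinations

ASC_STATES = {"plasmablast", "plasma cell"}  # antibody-secreting cell labels (assumed names)

def levenshtein(s, t):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]

def bcr_groups(cells, max_dist=2):
    """cells: dict cell_id -> {"cdr3": str, "state": str}; returns groups of linked cells."""
    ids = list(cells)
    parent = {c: c for c in ids}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(ids, 2):
        if levenshtein(cells[a]["cdr3"], cells[b]["cdr3"]) <= max_dist:
            parent[find(a)] = find(b)
    groups = {}
    for c in ids:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())

def activated_bcr_groups(groups, cells, min_unique=3):
    """Keep groups with an antibody-secreting cell and at least `min_unique` distinct CDR3s."""
    return [g for g in groups
            if any(cells[c]["state"] in ASC_STATES for c in g)
            and len({cells[c]["cdr3"] for c in g}) >= min_unique]
```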

Generation of V(D)J logos

TCR and BCR logos were generated by providing the CDR3 amino acid sequences of each clonotype group to the ggseqlogo R package51 or the logomaker Python package52. When clonotype groups contained CDR3 amino acid sequences of variable lengths, we selected the sequences with the most frequently occurring length within each group for visualization purposes only.
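
The length-selection step can be sketched as below. The sketch keeps the CDR3s of the most frequent length and builds the per-position frequency matrix that a logo tool renders. The tie-breaking rule (shorter length wins) and the function names are assumptions, and no specific ggseqlogo or logomaker call is implied.

```python
from collections import Counter

def modal_length_sequences(seqs):
    """Keep CDR3s of the most frequent length (ties go to the shorter length, an assumption)."""
    lengths = Counter(len(s) for s in seqs)
    top = max(lengths.values())
    modal = min(length for length, n in lengths.items() if n == top)
    return [s for s in seqs if len(s) == modal]

def position_frequencies(seqs):
    """Per-position residue frequencies for equal-length sequences."""
    n = len(seqs)
    return [{aa: k / n for aa, k in Counter(s[i] for s in seqs).items()}
            for i in range(len(seqs[0]))]
```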

GLMMs of cell-state compositional changes over time

The relative abundance of cells per cell type in each sample was modelled using a GLMM with a Poisson outcome. When technical replicates were available (most of the PBMC samples), these were modelled as separate samples. We modelled participant identifiers, days since inoculation and sequencing library identifiers (of multiplexed libraries) as random effects to overcome collinearity between these factors. The effect of each clinical or technical factor on cell-type composition was estimated by its interaction term with the cell type. The model was fitted with the glmer function in the lme4 package in R. The standard error of the variance parameter for each factor was estimated using the numDeriv package. The conditional distribution of the fold-change estimate for each level of each factor was obtained using the ranef function in the lme4 package. Log-transformed fold changes are relative to the pre-inoculation time point (day –1). The significance of each fold-change estimate was measured by the local true sign rate, which is the probability that the estimated direction of the effect is correct. That is, it is the probability that the true log-transformed fold change is greater than 0 if the estimated mean is positive, or less than 0 if the estimated mean is negative. We calculated P values with a two-sample Z-test using the estimated mean and standard deviation of the distribution of the effect (log-transformed fold change). P values were converted into FDRs using the Benjamini–Hochberg method.
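
The significance summaries described above can be derived from the conditional mean and standard deviation of each log-transformed fold change, as sketched below. The sketch treats each effect as normally distributed and computes a z statistic against zero. This is a simplifying assumption made for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def local_true_sign_rate(mean, sd):
    """Probability that the true log fold change has the same sign as the estimated mean."""
    return STD_NORMAL.cdf(abs(mean) / sd)

def z_test_pvalue(mean, sd):
    """Two-sided P value for the effect differing from zero, under a normal approximation."""
    return 2 * (1 - STD_NORMAL.cdf(abs(mean) / sd))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDRs, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    fdr = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        fdr[i] = running_min
    return fdr
```

Under this normal assumption, the two summaries are linked by P = 2 × (1 − LTSR). A sign rate close to 1 therefore corresponds to a small P value.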

Gaussian processes regression and latent variable models to infer time since viral exposure

To infer time from cell-state abundance, we first trained a logistic regression model using CellTypist42. This model predicts PBMC or nasopharyngeal cell states based on the highly detailed, manually annotated cell states presented in this work. CellTypist models were trained and used under default parameters, with check_expression set to false, balance_cell_type set to true, feature_selection set to true, and max_iter set to 150. We next built a predictive model to infer time since viral exposure, using the PBMC data presented in this work as a training dataset. We used the above-mentioned publicly available PBMC data from five studies as a test dataset to predict time since viral exposure. We were specifically interested in comparing time since viral exposure with reported time since symptom onset across disease severities, so we excluded samples for which these features were unknown. To ensure that cell-state proportions in the training and test datasets were comparable, we used our CellTypist model on both datasets to predict relative cell-state frequencies. These frequencies were used as input for our time prediction model. To account for participant-to-participant heterogeneity and continuous variation in the timeline of immune responses, we first constructed a Gaussian process latent variable model53 to smooth the time since viral exposure in the training dataset. We applied the Pyro implementation of this model54 to all predicted cell-state abundances. The model was restricted to 2,000 iterations and a single latent variable initialized on the square-root-transformed time since inoculation. This accurately recapitulated the mean time since inoculation while smoothing outliers. We next used each predicted cell state as a task input to generate a multi-task Gaussian process regression model55 to predict the smoothed time since inoculation using GPyTorch56. 
We used the Adam optimizer and allowed as many iterations as were needed for the loss in marginal log likelihood to reach zero. We next predicted cell-state compositions across the entire tested timeline (day –1 to day 28). We compared these with the cell-state compositions in our query dataset, as predicted by our CellTypist model. Last, we selected the time point at which the predicted cell-state composition had the lowest mean squared error compared with the observed cell-state composition.
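
The final matching step can be sketched as follows. The sketch assumes that the fitted regression model has already been evaluated on a grid of days to give expected cell-state proportions. The grid, the dictionary layout and the function name are hypothetical. Only the selection rule, the lowest mean squared error, comes from the text above.

```python
def infer_time_since_exposure(predicted, observed):
    """predicted: dict mapping day -> expected cell-state proportions (fixed state order);
    observed: proportions for one query sample, in the same order.
    Returns the day whose predicted composition has the lowest mean squared error."""
    if any(len(p) != len(observed) for p in predicted.values()):
        raise ValueError("predicted and observed compositions must cover the same cell states")

    def mse(expected):
        return sum((e - o) ** 2 for e, o in zip(expected, observed)) / len(observed)

    return min(predicted, key=lambda day: mse(predicted[day]))
```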

Matching clonotype groups to antigen–TCR database

We computed the fold-change enrichment of SARS-CoV-2-specific TCRs in activated T cell populations compared with other T cell populations. Across 10 random draws of n = 5,000 unique clones from each population, the median fold change was 4.99 (median P = 0.00044).
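
The resampling procedure can be sketched as below. The sketch assumes that clones are represented by hashable identifiers, for example paired CDR3 sequences. It also assumes that the SARS-CoV-2-specific TCRs from the antigen–TCR database are available as a set. Exact matching, the seed and the function name are assumptions. The statistical test behind the reported P value is not specified here, so the sketch returns only the median fold change.

```python
import random
from statistics import median

def enrichment_fold_change(activated, other, reference, n=5000, draws=10, seed=0):
    """activated, other: lists of unique clone identifiers;
    reference: set of SARS-CoV-2-specific TCRs. Returns the median fold change."""
    rng = random.Random(seed)
    fold_changes = []
    for _ in range(draws):
        a = rng.sample(activated, min(n, len(activated)))
        o = rng.sample(other, min(n, len(other)))
        frac_a = sum(c in reference for c in a) / len(a)  # matched fraction, activated
        frac_o = sum(c in reference for c in o) / len(o)  # matched fraction, other
        fold_changes.append(frac_a / frac_o if frac_o else float("inf"))
    return median(fold_changes)
```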

Bulk TCR sequencing and processing

Total RNA was extracted from whole blood samples collected in Tempus Blood RNA tubes (Thermo Fisher, 4342792) using the manufacturer’s protocol. TCR α and β genes were sequenced using a pipeline that uses single-stranded DNA ligation to attach UMIs to individual cDNA molecules. The UMIs enable correction for sequencing errors and PCR bias, and provide a quantitative and reproducible method of repertoire analysis. Full details of both the experimental TCR sequencing library preparation57,58 and the subsequent TCR annotation (V, J and CDR3 annotation) using Decombinator (v.4)59 have been published. The Decombinator software is freely available on GitHub (https://github.com/innate2adaptive/Decombinator).
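
The idea behind UMI-based counting can be shown with a deliberately simplified sketch. Reads that share a UMI and a TCR rearrangement are treated as PCR copies of one cDNA molecule, so abundance is the number of distinct UMIs rather than the number of reads. Decombinator also handles UMI sequencing errors and other details that are not reproduced here. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def umi_counts(reads):
    """reads: iterable of (umi, tcr) pairs, e.g. tcr = (V, J, CDR3).
    Returns the number of distinct UMIs (input cDNA molecules) per TCR."""
    molecules = defaultdict(set)
    for umi, tcr in reads:
        molecules[tcr].add(umi)  # PCR duplicates of one molecule share a UMI
    return {tcr: len(umis) for tcr, umis in molecules.items()}
```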

Memory formation analysis

T cell phenotypes (naive, activated, effector and memory) were recorded for each antigen-specific TCR clone at different time points throughout infection. TCR clones were retained if they carried an activated label at least once and were observed in at least two samples, one of which had to be from day 28. Unique TCR clones are distinguished by colour and numbered with their clone_id identifier. Error bands are drawn when the same clone appeared with several distinct cell-type labels, and the size of the error band reflects the relative ratios of these labels.
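
The clone filter and the label ratios behind the error bands can be sketched as follows. The sketch assumes one (clone_id, day, phenotype) record per cell and uses distinct days as a proxy for distinct samples. That proxy and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def clones_for_memory_analysis(observations, final_day=28):
    """Keep clones labelled activated at least once and seen on at least two
    distinct days, one of which is `final_day`."""
    by_clone = defaultdict(list)
    for clone_id, day, phenotype in observations:
        by_clone[clone_id].append((day, phenotype))
    kept = []
    for clone_id, obs in by_clone.items():
        days = {day for day, _ in obs}
        if any(p == "activated" for _, p in obs) and len(days) >= 2 and final_day in days:
            kept.append(clone_id)
    return kept

def phenotype_fractions(observations, clone_id):
    """Per-day phenotype proportions for one clone (the ratios an error band summarises)."""
    counts = defaultdict(Counter)
    for cid, day, phenotype in observations:
        if cid == clone_id:
            counts[day][phenotype] += 1
    return {day: {p: n / sum(c.values()) for p, n in c.items()} for day, c in counts.items()}
```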

Quantifying TCR diversity restriction in phenotypic clusters using coincidence analysis

To quantify the diversity of TCRs found within different phenotypic clusters, we determined the probability with which two distinct clonotypes within a cluster share an identical CDR3 amino acid sequence60. For visualization, we normalized these probabilities by the same quantity calculated over the complete data regardless of phenotype. This ratio of coincidence probabilities provides a stringent measure of convergent functional selection of distinct clonotypes that share the same TCR. The analysis is based on clonotypes defined by distinct nucleotide sequences of the hypervariable regions. It does not make direct use of clonal abundance, because abundance can also reflect TCR-independent lineage differences. We focused our analysis on conventional T cells only, considered only cells with at least one valid functional α-chain and β-chain, and kept only a single chain for each cell that had multiple chains. We performed the analysis on the α-chain and β-chain separately, as well as on paired α- and β-chains, in each instance requiring exact matching of the CDR3 amino acid sequences.
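
Under the definition above, consider a set of N distinct clonotypes in which n_k clonotypes carry CDR3 amino acid sequence k. Its coincidence probability is Σ_k n_k(n_k − 1) / [N(N − 1)]. The sketch below implements this, assuming clonotypes are given as (nucleotide, amino acid) CDR3 pairs for one chain or for a concatenated α–β pair. The function names are hypothetical.

```python
from collections import Counter

def coincidence_probability(clonotypes):
    """clonotypes: iterable of (cdr3_nt, cdr3_aa) pairs. Each distinct nucleotide
    clonotype counts once, so clonal expansion does not inflate the estimate."""
    unique = set(clonotypes)
    n = len(unique)
    if n < 2:
        raise ValueError("need at least two distinct clonotypes")
    aa_counts = Counter(aa for _, aa in unique)
    # Ordered pairs of distinct clonotypes that share a CDR3 amino acid sequence
    coincident_pairs = sum(k * (k - 1) for k in aa_counts.values())
    return coincident_pairs / (n * (n - 1))

def coincidence_ratio(cluster, background):
    """Cluster coincidence probability normalized by that of the complete data."""
    return coincidence_probability(cluster) / coincidence_probability(background)
```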

Modelling infection outcome on HLA-DQA2 expression

To test whether cell-type-specific expression of HLA-DQA2 on the day before inoculation was predictive of the infection outcome of the challenge experiment, we performed logistic regression modelling using the ‘glm’ function in R. For each cell type shown, we modelled whether or not a sustained infection occurred as a function of the mean expression of HLA-DQA2 and the fraction of cells expressing HLA-DQA2 at day –1. For cross-validation, we used the ‘roc’ R package and performed five 1:1 test–train splits.
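
The per-cell-type model and its cross-validation can be approximated in plain Python, as in the sketch below. This is a hypothetical stand-in for R's glm with a binomial family. It fits the model by gradient descent rather than iteratively reweighted least squares and computes the AUC by pairwise ranking. The stratified 1:1 splits, learning rate and epoch count are assumptions. Each sample's features would be [mean HLA-DQA2 expression, fraction of cells expressing HLA-DQA2], and the label would be 1 for sustained infection.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def predict(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic log loss; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def auc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, splits=5, seed=0):
    """Stratified 1:1 train/test splits; needs at least two samples per class."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi]
    neg = [i for i, yi in enumerate(y) if not yi]
    aucs = []
    for _ in range(splits):
        rng.shuffle(pos)
        rng.shuffle(neg)
        train = pos[: len(pos) // 2] + neg[: len(neg) // 2]
        test = pos[len(pos) // 2:] + neg[len(neg) // 2:]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([predict(w, X[i]) for i in test], [y[i] for i in test]))
    return aucs
```

Stratifying the splits guarantees that every test half contains both outcomes, which the AUC requires.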

Multi-flow re-analysis

Samples used to assess MAIT cell activation were collected as part of the prospective healthcare worker study Covidsortium. Participant screening, study design, sample collection and sample processing have previously been described in detail61. Participants with available PBMC samples who had PCR-confirmed SARS-CoV-2 infection (Roche cobas diagnostic test platform) at any time point were included as cases. A subset of consecutively recruited participants was included as uninfected controls. These participants had no evidence of SARS-CoV-2 infection on nasopharyngeal swabs and remained seronegative by both Euroimmun anti-S1 spike protein and Roche anti-nucleocapsid protein assays throughout follow-up (16 weeks of weekly PCR and serology). The study was approved by a UK Research Ethics Committee (South Central—Oxford A Research Ethics Committee, ref. 20/SC/0149). All participants provided written informed consent.

Multiparametric flow cytometry was performed as described previously, and data related to immune subsets other than MAIT cells were previously published14. PBMCs were plated in 96-well round-bottomed plates (0.5–1 × 10⁶ cells per sample), washed once in PBS (Thermo Fisher) and then stained with Blue fixable live/dead dye (Thermo Fisher) for 20 min at 4 °C in PBS. Cells were washed again in PBS and incubated for 30 min at 4 °C with saturating concentrations of monoclonal antibodies against surface markers, diluted in 50% Brilliant violet buffer (BD Biosciences) and 50% PBS. After surface antibody staining, cells were resuspended in fix/perm buffer for 45–60 min at 4 °C. This buffer was the fix/perm concentrate from the eBiosciences Foxp3/Transcription Factor staining buffer kit, diluted 1:3 in fix/perm diluent. Cells were then washed in 1× perm buffer (10× perm buffer from the same kit, diluted to 1× in ddH2O). Intranuclear targets (Ki67) were stained with saturating antibody concentrations in 1× perm buffer for 30–45 min at 4 °C. Cells were washed twice in PBS and then analysed using an LSR II flow cytometer (BD Biosciences). Flow cytometry data were analysed using FlowJo (v.10.7.1 for Mac, Tree Star). Single-stain controls were prepared with cells or anti-mouse IgG beads (BD Biosciences). Fluorescence minus one controls (FMOs) were used for gating (see ref. 14 for FMOs and detailed gating related to these stains). Note that the frequency of MAIT cells did not differ between controls and PCR+ participants, as previously reported14.

Immunofluorescence confocal microscopy

As previously described62, SARS-CoV-2-infected and mock-infected human nasal epithelial cultures grown at an air–liquid interface were first fixed using 4% (v/v) paraformaldehyde for 30 min. They were then permeabilized with 0.2% Triton-X (Sigma) for 15 min and blocked with 5% goat serum (Sigma) for 1 h before overnight staining with primary antibody at 4 °C. Secondary antibody incubations were performed the next day for 1 h at room temperature. Cultures were then incubated with Alexa Fluor 555 phalloidin and DAPI (Sigma) for 15 min before mounting with Prolong Gold Antifade reagent (Life Tech). Samples were washed with PBS-T after each incubation step. Images were captured using an LSM710 Zeiss confocal microscope and rendered using Nikon NIS Elements. Human nasal epithelial cell cultures from three individual donors (one child <12 years old, one adult 30–50 years old and one adult >70 years old) were stained, with four technical repeats per donor (mock and SARS-CoV-2 infection conditions). Representative images of immunofluorescence staining of nasal epithelial cell cultures, taken 72 h after infection, are shown in Extended Data Fig. 5b (older adult) and Extended Data Fig. 9a (child).

Transmission electron microscopy

SARS-CoV-2-infected and mock-infected cultured human nasal epithelial cells were fixed with 4% paraformaldehyde and 2.5% glutaraldehyde in 0.05 M sodium cacodylate buffer at pH 7.4 and kept at 4 °C for at least 24 h, as previously described62,63. The samples were incubated in 1% aqueous osmium tetroxide for 1 h at room temperature and then stained en bloc in undiluted UA-Zero (Agar Scientific) for 30 min at room temperature. The samples were dehydrated using increasing concentrations of ethanol (50, 70, 90 and 100%), followed by propylene oxide and a mixture of propylene oxide and araldite resin (1:1). The samples were embedded in araldite and left at 60 °C for 48 h. Ultrathin sections were cut using a Reichert Ultracut E ultramicrotome and stained with Reynolds’ lead citrate for 10 min at room temperature. Images were taken on a JEOL 1400Plus transmission electron microscope equipped with an Advanced Microscopy Technologies (AMT) XR16 charge-coupled device camera, using the AMT Capture Engine software. Human nasal epithelial cell cultures from three individual donors (one child <12 years old, one adult 30–50 years old and one adult >70 years old) at 72 h after infection (mock and SARS-CoV-2 infected) were processed and imaged. Fig. 2 shows representative images from SARS-CoV-2-infected nasal epithelial cell cultures from the older adult (>70 years) at 72 h after infection. Additional images from the child (<12 years), younger adult (30–50 years) and older adult (>70 years) are shown in Extended Data Fig. 9b.

Serum antibody assays

As previously described7, serum samples were taken from each participant and antibody titres were measured using two assays. In brief, SARS-CoV-2 anti-spike IgG concentrations were determined by ELISA (Nexelis) and reported as ELU ml−1 (Supplementary Table 1p). Neutralizing antibody titres against live SARS-CoV-2 virus (lineage Victoria/01/2020) were determined by microneutralization assay at the UK Health Security Agency and reported as the 50% neutralizing antibody titre (NT50). The lower limits of quantification (LLOQ) were 58 for the microneutralization assay and 50.2 ELU ml−1 for the spike protein IgG ELISA. For the median (IQR) per infection group, see the summary study metadata table in Supplementary Table 1g.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


