Experimental procedures

Statistical methods were not used to predetermine sample size. Experiments were not randomized. Investigators were not blinded to allocation during experiments and outcome assessment.

Mice

Wild-type 129SV mice were purchased from Taconic Biosciences. All mouse work was performed in compliance with all the relevant ethical regulations established by the Institutional Animal Care and Use Committee (IACUC) of Boston Children’s Hospital and under protocols approved by the IACUC of Boston Children’s Hospital. Mice were maintained on a 14-h light/10-h dark schedule in a temperature (22 ± 3 °C) and humidity (35% ~ 70% ± 5%)-controlled environment, with food and water provided ad libitum. Male and female mice were used equally for all experiments.

Generation and characterization of the entire Vκ locus inversion mouse model

The CRISPR–Cas9-mediated entire Vκ locus inversion modifications were made on one Igk allele in the TC1 embryonic stem (ES) cell line. Targeting of the ES cells was performed using sgRNA1 and sgRNA2 as previously described41. Positive clones with 3.1 Mb Vκ locus inversion were identified by PCR and confirmed by Sanger sequencing. After testing negative for mycoplasma, the ES clone with Vκ inversion was injected into RAG2-deficient blastocysts to generate chimeras42. The chimeric mice were bred with wild-type 129SV mice for germline transmission of the targeted inversion, and bred to homozygosity. Sequences of all sgRNAs and oligonucleotides mentioned in this section and sections below are listed in Supplementary Table 1.

Generation of VH7-3 Igh pre-rearranged; Rag2−/− mouse model

The heterozygous or homozygous VH7-3 Igh pre-rearranged mice (VH7-3wt/re or VH7-3re/re) were generated through induced pluripotent stem (iPS) cells and maintained in the Alt laboratory. To perform 3C-HTGTS experiments with RAG2-deficient background, VH7-3wt/re or VH7-3re/re mice were crossed with Rag2−/− mice to obtain VH7-3wt/re; Rag2−/− or VH7-3re/re; Rag2−/− mice on the 129SV background.

Purification of bone marrow precursor B cells

For RAG on-target and off-target analysis, single-cell suspensions were derived from the bone marrow of 4- to 6-week-old male and female wild-type and Igk Vκ locus inversion 129SV mice and incubated in Red Blood Cell Lysing Buffer (Sigma-Aldrich, R7757) to deplete erythrocytes. B220+CD43lowIgM− pre-B cells were isolated by staining with anti-B220–APC (1:1,000 dilution; eBioscience, 17-0452-83), anti-CD43–PE (1:400 dilution; BD Biosciences, 553271) and anti-IgM–FITC (1:500 dilution; eBioscience, 11-5790-81) followed by fluorescence-activated cell sorting (FACS). The purified primary pre-B cells were used directly for HTGTS-V(D)J-seq as described21,43.

For 3C-HTGTS experiments, B220-positive primary pre-B cells were purified with anti-B220 MicroBeads (Miltenyi, 130-049-501) from 4- to 6-week-old male and female VH7-3wt/re; Rag2−/− or VH7-3re/re; Rag2−/− mice. Purified pre-B cells from 3 or 4 mice were pooled for each 3C-HTGTS experiment. The genotype of each mouse was confirmed by PCR and Sanger sequencing before use in any assay.

Generation of single Jκ5 v-Abl cell line and its derivatives

Construction of sgRNA–Cas9 plasmids and nucleofection-mediated targeting, for this section and all subsequent sections describing v-Abl line modifications, were performed as previously described7. None of the v-Abl cell lines were tested for mycoplasma contamination.

The initial ‘parental’ Rag2−/−;Eμ-Bcl2+ v-Abl cell line in the 129SV background was generated previously6. Random 1–4 bp indels (barcodes) were introduced into a site ~85 bp downstream of the Jκ5-RSS heptamer and ~40 bp upstream of the Jκ5 bait primer on both alleles in this parental line, similarly to the approach previously described to modify JH46. The resulting ‘Jκ5-barcoded’ v-Abl line was further targeted with sgRNA1 and sgRNA2 to invert the whole Vκ locus on one allele while leaving the other allele intact. Thus, the Igk allele-specific barcode permits the separation of sequencing reads derived from the wild-type allele and the Vκ inverted allele, assayed with the same bait primer in the same cellular context. This barcoded line was used to generate the data in Fig. 1b,e.

To facilitate further modifications on the Igk locus, the Jκ5-barcoded v-Abl line was targeted with sgRNA1 and sgRNA3 that deleted the entire Igk locus on one allele and left the other allele intact. The barcode was not relevant to further studies based on this single Igk allele line or its derivatives. The single Igk allele line was further targeted by another two pairs of sgRNAs to separately delete Jκ1 to Jκ4 (sgRNA4 and sgRNA5) and downstream Igk-RS (sgRNA6 and sgRNA7) to exclude confounding secondary rearrangements and keep the configuration unchanged between Jκ5 and iEκ. This line is referred to as the ‘single Jκ5 allele line’.

The single Jκ5 allele line was further modified by specifically designed Cas9–sgRNA to generate the single Jκ5-Vκ inv line (sgRNA8 and sgRNA9), single Jκ5-inv line (sgRNA10 and sgRNA11), single Jκ5-single Igh line (sgRNA12 and sgRNA13), single Jκ5-PKO line (sgRNA2 and sgRNA14), single Jκ5-Cer knockout (KO) line (sgRNA15 and sgRNA16), single Jκ5-Sis KO line (sgRNA17 and sgRNA18), and single Jκ5-CerSis KO line (sgRNA15 and sgRNA18).

The single Jκ1 allele v-Abl line was generated from the single Igk allele line by separately deleting Jκ2 to Jκ5 (sgRNA10 and sgRNA19) and deleting downstream Igk-RS (sgRNA6 and sgRNA7).

All candidate clones with desired gene modifications were screened by PCR and confirmed by Sanger sequencing.

Generation and analysis of DJH pre-rearranged WAPL-degron v-Abl cell lines

The DJH pre-rearranged v-Abl lines in C57BL/6 background were derived from the previously described WAPL-degron v-Abl line7. The open reading frame sequences of Rag1 and Rag2 were cloned into pMAX-GFP vector (Addgene, 177825) following the standard protocol to generate pMAX-Rag1 and pMAX-Rag2 plasmids. These two plasmids (each 2.5 μg) were nucleofected into 2.0 × 10⁶ WAPL-degron v-Abl cells to allow endogenous D-to-JH rearrangements mediated by transient RAG expression. Cells harbouring the desired DQ52JH4 rearrangement (DJH-WT line) were subsequently identified by PCR screening and verified by Sanger sequencing. The DJH-inv v-Abl line was generated from the DJH-WT line by using Cas9–sgRNA to target sequences downstream of JH4 and upstream of DQ52 (sgRNA20 and sgRNA21). The DJH-WT and DJH-inv lines were treated with IAA and Dox to deplete WAPL as described7.

Generation of Igh–Igk hybrid v-Abl cell line and its derivatives

The Igh–Igk hybrid v-Abl cell line was derived from the single Jκ5 allele v-Abl line. In brief, the single Jκ5 allele line was targeted by sgRNA12 and sgRNA13 to generate the single Jκ5-single Igh line where the entire Igh locus was deleted from one allele. The single Jκ5-single Igh line was then targeted by sgRNA22 (cut 1, upstream of IGCR1 in Igh) and sgRNA8 (cut 2, upstream of Vκ2-137 in Igk) to generate a balanced chromosomal translocation between chromosomes 12 and 6. In the resulting Igh–Igk hybrid v-Abl line, the entire Igk locus along with the rest of chromosome 6 was appended onto chromosome 12 at a point upstream of IGCR1 in Igh, and the Igh VH locus along with the small telomeric portion of chromosome 12 was reciprocally appended onto chromosome 6. To generate the Igh–Igk hybrid-Vκ line, the Igh–Igk hybrid line was sequentially modified to invert the entire Vκ locus (sgRNA15 and sgRNA23), mutate DQ52 RSSs (sgRNA24 and ssODN1) and delete all upstream D segments (sgRNA25 and sgRNA26). To generate the Igh–Igk hybrid-Vκ-JκRSS-PKO line from the Igh–Igk hybrid-Vκ-JκRSS line, sgRNA2 and sgRNA14 were used to delete the proximal Vκ domain.

To generate the Igh–Igk hybrid-D-JH line, the Igh–Igk hybrid line was targeted by sgRNA27 and sgRNA28 to delete IGCR1 and the entire Vκ locus. The Igh–Igk hybrid-D-JH line was further modified to generate the Igh–Igk hybrid-D line where JH1-4 has been deleted (sgRNA29 and sgRNA30).

All candidate clones with desired gene modifications were screened by PCR and confirmed by Sanger sequencing. See Fig. 4a and Extended Data Figs. 5a and 8a for detailed strategy and procedure.

Whole-chromosome painting

Whole-chromosome painting was performed on single Jκ5-single Igh v-Abl line and Igh–Igk hybrid v-Abl line using fluorescent probes tiling the entire chromosome 6 (Chr6-FITC, Applied Spectral Imaging) and chromosome 12 (Chr12-TxRed, Applied Spectral Imaging) according to standard protocol. In brief, cells were treated with colcemid at 0.05 μg ml−1 final concentration for 3 h before being processed for metaphase drop. The slides were dehydrated in ethanol series, denatured at 70 °C for 1.5 min, and hybridized to denatured probe mixture at 37 °C for 12–16 h. The slides were then washed, stained with DAPI, and imaged with Olympus BX61 microscope. ImageJ (1.53q) was used for image processing.

RSS replacement experiments

All RSS replacement modifications were generated via Cas9–sgRNA using short single-stranded DNA oligonucleotide (ssODN) as donor template. In brief, 2.5 μg Cas9–sgRNA plasmid and 5 μl 10 μM ssODN were co-transfected into 2.0 × 10⁶ v-Abl cells. PCR screening was performed sequentially on pooled clones and then single clones, and subsequently verified by Sanger sequencing. Specifically, sgRNA31 and ssODN2 were used to replace JH1-RSS with Jκ5-RSS in Igh–Igk hybrid-Vκ v-Abl line to generate the Igh–Igk hybrid-Vκ-JκRSS line; sgRNA32 and ssODN3 were used to replace Jκ5-RSS with JH1-RSS in single Jκ5-single Igh line to generate the single Jκ5-single Igh-JHRSS line; sgRNA33 and ssODN4 were used to replace DQ52 upstream RSS with Vκ12-44-RSS in Igh–Igk hybrid-D line to generate the Igh–Igk hybrid-D-VκRSS line; sgRNA34 and ssODN5 were used to replace DQ52 upstream RSS with Vκ11-125-RSS in DJH-inv line to generate the DJH-inv-VκRSS line; sgRNA35 and ssODN6 were used to replace VH5-2-RSS with Jκ1-RSS in DJH-inv-VκRSS line to generate the DJH-inv-VκRSS-JκRSS line.

RAG complementation

RAG was reconstituted in RAG1-deficient v-Abl cells via retroviral infection with the pMSCV-RAG1-IRES-Bsr and pMSCV-Flag-RAG2-GFP vectors followed by 3–4 days of blasticidin (Sigma-Aldrich, 15205) selection to enrich for cells with virus integration7. RAG2 was reconstituted in RAG2-deficient v-Abl cells via retroviral infection with the pMSCV-Flag-RAG2-GFP vector followed by two days of puromycin (ThermoFisher, J67236) selection to enrich for cells with virus integration5.

HTGTS-V(D)J-seq and data analyses

HTGTS-V(D)J-seq libraries were prepared as previously described6,7,21,43 with 0.5–2 μg of genomic DNA (gDNA) from sorted primary pre-B cells or 10 μg of gDNA from G1-arrested RAG-complemented RAG-deficient v-Abl cells. The final libraries were sequenced on an Illumina NextSeq550 with control software (2.2.0) or a NextSeq2000 with control software (1.5.0.42699) using a paired-end 150-bp sequencing kit. HTGTS-V(D)J-seq libraries were processed via the pipeline described previously43. For Igh rearrangement analysis in DJH-WT and DJH-inv WAPL-degron v-Abl lines, the data were aligned to the mm9_DQ52JH4 genome and analysed with all duplicate junctions included, as previously described43. For analysis in DJH-inv-VκRSS and DJH-inv-VκRSS-JκRSS v-Abl lines, the data were aligned to the mm9_DQ52JH4_VκRSS genome. For all other rearrangement analyses, the primary pre-B cells and v-Abl cells used were from the 129SV background. Since there is almost no difference in the Igk locus between C57BL/6 and 129SV genomic backgrounds44, the data were aligned to the AJ851868/mm9 hybrid (mm9AJ) genome6 except: data from Igh–Igk hybrid-Vκ-JκRSS and Igh–Igk hybrid-Vκ-JκRSS-PKO v-Abl lines were aligned to the mm9AJ_JH1toJκ5RSS genome, data from single Jκ5-single Igh-JHRSS v-Abl line were aligned to the mm9AJ_Jκ5toJH1RSS genome, and data from Igh–Igk hybrid-D-VκRSS v-Abl line were aligned to the mm9AJ_DQ52uptoVκRSS genome. To show the absolute level of V(D)J recombination, each HTGTS-V(D)J-seq library was down-sampled to 500,000 total reads (junctions + germline reads); to show the relative Vκ usage pattern across the Vκ locus, individual Vκ usage levels were divided by the total Vκ usage level in each HTGTS-V(D)J-seq library to obtain the relative percentage. Such analyses are useful for examining effects of potential regulatory element mutations. For example, differences in absolute rearrangement levels between two samples with the same relative rearrangement patterns would reflect differences in RAG or RC activity without changes in long-range regulatory mechanisms7,26.
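The two normalizations described above (down-sampling to a fixed read total, and conversion of per-Vκ counts to percentages) can be illustrated with a minimal sketch. This is not the published pipeline: the function names, the random seed and the toy segment labels are hypothetical, and down-sampling is assumed to be uniform random sampling without replacement.

```python
import random
from collections import Counter

def downsample(reads, target=500_000, seed=0):
    """Randomly keep `target` reads (junctions + germline) without replacement.

    Assumption: uniform sampling; libraries smaller than `target` are returned whole.
    """
    if len(reads) <= target:
        return list(reads)
    return random.Random(seed).sample(reads, target)

def relative_usage(v_counts):
    """Convert per-V-segment junction counts to percentages of total V usage."""
    total = sum(v_counts.values())
    if total == 0:
        return {v: 0.0 for v in v_counts}
    return {v: 100.0 * n / total for v, n in v_counts.items()}

# Toy library: each read is labelled with its prey segment or "germline".
# The labels below are illustrative only.
reads = ["Vk_A"] * 30 + ["Vk_B"] * 10 + ["germline"] * 60
sub = downsample(reads, target=70, seed=1)            # absolute-level view
counts = Counter(r for r in sub if r != "germline")
usage = relative_usage(counts)                         # relative-pattern view
```

Comparing `counts` across libraries reflects absolute rearrangement levels, whereas comparing `usage` reflects only the distribution across segments.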

RAG off-targets were extracted from corresponding normalized HTGTS-V(D)J-seq libraries by removing on-target junctions on bona fide RSSs. We noticed that the remaining junctions in the Igk locus were skewed to a few very strong RSS sites, which represent unannotated bona fide RSSs not associated with functional Vκ segments. We eliminated these strong RSSs from our cryptic RSS analyses by filtering out RSS sites with a CAC and at least 9 additional bp matching the remaining ideal heptamer positions (AGTG) and the ideal nonamer (ACAAAAACC) in the context of a 12- or 23-bp spacer; that is, at most 4 bp of mismatches to the ideal RSS site. In addition, because coding end junctions are processed and can spread across several bp beyond the CAC cleavage site4, the new pipeline has the advantage of collapsing these coding end junctional signals within 15 bp into one peak mapped to the CAC cleavage site for better visualization of off-target coding junction peaks. The actual distribution of coding end junctions can be visualized through analysis with our prior pipeline. Details of both pipelines used are provided in Code availability. Junctions are denoted as deletional if the prey cryptic RSS is in convergent orientation with the bait RSS and as inversional if the prey cryptic RSS is in the same orientation as the bait RSS.
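As an illustration of the filtering and collapsing rules in this paragraph, the sketch below scores a candidate site against the ideal RSS, collapses junctions onto nearby CAC sites and labels junction orientation. It is a simplified reading, not the pipeline listed in Code availability: all function names are hypothetical, the 15-bp window is assumed to be symmetric, and RSS orientation is assumed to be encoded as a strand sign with "convergent" meaning opposite strands.

```python
import bisect
from collections import Counter

IDEAL_HEPTAMER = "CACAGTG"
IDEAL_NONAMER = "ACAAAAACC"

def rss_matches(seq, spacer):
    """Matches to the 13 non-CAC ideal positions, or None if CAC is absent."""
    need = 7 + spacer + 9
    if len(seq) < need or not seq.startswith("CAC"):
        return None
    observed = seq[3:7] + seq[7 + spacer:need]
    ideal = IDEAL_HEPTAMER[3:] + IDEAL_NONAMER
    return sum(a == b for a, b in zip(observed, ideal))

def is_strong_rss(seq, min_matches=9):
    """True if seq reads as a strong RSS with either a 12- or 23-bp spacer."""
    for spacer in (12, 23):
        m = rss_matches(seq, spacer)
        if m is not None and m >= min_matches:
            return True
    return False

def collapse_junctions(junctions, cac_sites, window=15):
    """Map each junction to the nearest CAC site within `window` bp.

    Junctions with no CAC site in range keep their own coordinate.
    """
    sites = sorted(cac_sites)
    peaks = Counter()
    for pos in junctions:
        i = bisect.bisect_left(sites, pos)
        candidates = sites[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda s: abs(s - pos), default=None)
        if nearest is not None and abs(nearest - pos) <= window:
            peaks[nearest] += 1
        else:
            peaks[pos] += 1
    return peaks

def junction_type(bait_strand, prey_strand):
    """Deletional if bait and prey RSSs are convergent, else inversional."""
    return "deletional" if bait_strand != prey_strand else "inversional"
```

For example, `collapse_junctions([90, 105, 150, 210], [100, 200])` assigns the junctions at 90 and 105 to the site at 100, the junction at 210 to the site at 200, and leaves the junction at 150 uncollapsed.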

3C-HTGTS and data analyses

3C-HTGTS was performed as previously described3 on G1-arrested RAG2-deficient v-Abl cells3,5,6,7,26. Reference genomes were the same as used in HTGTS-V(D)J-seq data analyses described above. To better normalize 3C-HTGTS libraries and reduce the impact of the level of self-ligation (circularization), the high peaks upstream of the bait site were filtered out, following the same rationale as described for 4C-seq45. For iEκ-baited 3C-HTGTS libraries, we removed bait site peaks in the chr. 6:70,675,300–70,675,450 region; for Cer CBE1-baited 3C-HTGTS libraries, we removed bait site peaks in the chr. 6:70,659,550–70,659,700 region; for Sis CBE2-baited 3C-HTGTS libraries, we removed bait site peaks in the chr. 6:70,664,600–70,664,800 region; for IGCR1 CBE1-baited 3C-HTGTS libraries, we removed bait site peaks in the chr. 12:114,740,239–114,740,353 region. Then, only the junctions inside a genomic region encompassing the entire Ig locus were retained (chr. 6:64,515,000–73,877,000 for the entire Igk locus; chr. 12:111,453,935–120,640,000 for the entire Igh locus; chr. 6:64,515,000–70,658,827 and chr. 12:111,453,935–114,824,843 for the Igh–Igk hybrid-Vκ locus; see details in Code availability). After processing as described above, the retained junctions of the 3C-HTGTS libraries were further normalized to a total of 50,827 junctions, which is the junction number recovered from the smallest library in the set of libraries being compared. The sequences of primers used for generating 3C-HTGTS libraries are listed in Supplementary Table 1.
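The filtering and normalization steps above can be sketched as follows, using the iEκ bait window and Igk locus coordinates given in the text. The function names and chromosome labels are hypothetical, and normalization is assumed here to mean random down-sampling to the target junction number; the actual implementation is the one referenced in Code availability.

```python
import random

def in_window(junction, chrom, start, end):
    """True if a (chrom, pos) junction lies inside an inclusive window."""
    c, pos = junction
    return c == chrom and start <= pos <= end

def process_library(junctions, bait_window, locus, target, seed=0):
    """Drop bait-site junctions, keep the locus, then down-sample to `target`."""
    kept = [j for j in junctions
            if in_window(j, *locus) and not in_window(j, *bait_window)]
    if len(kept) > target:
        # Assumption: normalization by uniform sampling without replacement.
        kept = random.Random(seed).sample(kept, target)
    return kept

# Coordinates taken from the text (iEκ bait window and entire Igk locus).
IEK_BAIT = ("chr6", 70_675_300, 70_675_450)
IGK_LOCUS = ("chr6", 64_515_000, 73_877_000)

toy = [("chr6", 70_675_400),    # bait-site peak, removed
       ("chr6", 65_000_000),
       ("chr6", 70_000_000),
       ("chr12", 114_000_000),  # outside the Igk locus, removed
       ("chr6", 72_000_000)]
normalized = process_library(toy, IEK_BAIT, IGK_LOCUS, target=2)
```

In a real comparison, `target` would be the junction count of the smallest library in the set (50,827 in the text).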

Unlike ChIP-seq, the junctions of 3C-HTGTS data are discontinuously distributed on the genome, falling mainly on the enzyme cutting sites (CATG by NlaIII). To call peaks for 3C-HTGTS data, we first collapsed the junction signals to nearby enzyme cutting sites, and discarded signals far away (>10 bp) from enzyme cutting sites. Then, considering only the cutting sites with signals, we calculated the median with a moving window of 101 cutting sites (one centre, 50 left, and 50 right sites). We did a Poisson test for each site, with the median as a conservative over-estimation of the lambda parameter of the Poisson distribution. Based on the raw P values from the Poisson test, we calculated Bonferroni-adjusted P values, called peak summits at the sites with adjusted P value < 0.05, and determined the range of each peak region by progressively extending the two sides to the sites that have a local maximum raw P value and a raw P value ≥ 0.05. Nearby overlapping peak regions were merged as one peak region, and only the ‘best’ (defined by lowest P value) summit was kept after merging. Finally, for each group of multiple repeats, we merged the overlapping peak regions from all repeats, and counted the number of supporting repeats for each merged peak region. We defined and only kept the ‘robust’ peak regions that were supported by >50% of the repeats (that is, ≥ 2 supporting repeats among 2 or 3 repeats, or ≥ 3 supporting repeats among 4 or 5 repeats), and the ‘best’ (defined by lowest P value) summit information was reported.
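The summit-calling step can be sketched as below: a moving median over 101 cutting sites supplies lambda, a Poisson test gives a raw P value per site, and Bonferroni correction selects summits. This is a simplified illustration under stated assumptions, not the authors' code: the test is taken to be one-sided (upper tail), windows are truncated at the ends of the site list, the correction factor is the number of sites with signal, and the region-extension, merging and repeat-support steps are omitted. The hand-written tail sum is adequate for the small rates in this toy; a library routine such as `scipy.stats.poisson.sf` would be more robust.

```python
import math
from statistics import median

def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start from the pmf at k (computed in log space) and sum the tail upward.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i
        if i > lam and term <= total * 1e-16:
            break
    return min(total, 1.0)

def call_summits(site_counts, half_window=50, alpha=0.05):
    """Call summits from (position, count) pairs for cutting sites with signal.

    `site_counts` must be sorted by position.
    """
    counts = [c for _, c in site_counts]
    n_tests = len(counts)
    summits = []
    for i, (pos, count) in enumerate(site_counts):
        window = counts[max(0, i - half_window):i + half_window + 1]
        lam = median(window)                  # conservative lambda estimate
        p_raw = poisson_sf(count, lam)
        p_adj = min(1.0, p_raw * n_tests)     # Bonferroni adjustment
        if p_adj < alpha:
            summits.append((pos, count, p_adj))
    return summits

# Toy track: uniform background of 2 junctions per site with one strong site.
track = [(1000 + 250 * i, 2) for i in range(200)]
track[120] = (track[120][0], 40)
summits = call_summits(track)
```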

We further annotated and quantified the features underlying each robust 3C-HTGTS peak region ±1 kb. We focused on CBEs, E2A-binding sites, and transcription. For CBEs, we first scanned for possible CBEs by MEME-FIMO using the CTCF motif record (MA0139.1) in the JASPAR 2018 core vertebrate database. We applied MACS2 to call peaks in the three repeats of published CTCF ChIP-seq data in the parental v-Abl line6, and only kept ‘reliable’ CBEs with motif score > 13 and overlapping with peaks called in ≥2 repeats. We counted the number of reliable CBEs within each robust 3C-HTGTS peak region ±1 kb, and defined a peak as having an underlying CBE if the number was ≥ 1. For E2A-binding sites, we applied MACS2 to get the signal bigwig file from the published E2A ChIP-seq data46, and then annotated the maximum E2A ChIP-seq signal value within each robust 3C-HTGTS peak region ±1 kb. We defined a peak as having an underlying E2A site if the maximum signal was ≥ 0.5. For transcription, we annotated the maximum and the average signal of the three repeats of published GRO-seq data in the parental v-Abl line6, and defined a peak as having transcription if the maximum signal was ≥40 or the average signal was ≥10 in ≥2 repeats. See details in Code availability.
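The annotation thresholds in this paragraph amount to three boolean calls per peak, which the following sketch makes explicit. The data structures and names are hypothetical, the inputs are assumed to be precomputed (motif scores from the FIMO scan, ChIP-seq peak overlaps and signal maxima from MACS2 output), and the transcription rule is read as "maximum ≥ 40 or average ≥ 10, satisfied in at least 2 repeats", which is one interpretation of the text.

```python
from dataclasses import dataclass

@dataclass
class Cbe:
    pos: int                   # genomic coordinate of the CTCF motif
    motif_score: float         # score from the motif scan
    repeats_with_peak: int     # CTCF ChIP-seq repeats with an overlapping peak

def is_reliable(cbe):
    """Reliable CBE: motif score > 13 and ChIP-seq peaks in >= 2 repeats."""
    return cbe.motif_score > 13 and cbe.repeats_with_peak >= 2

def annotate_peak(start, end, cbes, e2a_max_signal, gro_repeats, flank=1000):
    """Annotate one robust peak region extended by `flank` bp on each side.

    `gro_repeats` holds one (max_signal, mean_signal) pair per GRO-seq repeat.
    """
    lo, hi = start - flank, end + flank
    n_cbe = sum(1 for c in cbes if lo <= c.pos <= hi and is_reliable(c))
    n_tx = sum(1 for mx, avg in gro_repeats if mx >= 40 or avg >= 10)
    return {
        "cbe": n_cbe >= 1,
        "e2a": e2a_max_signal >= 0.5,
        "transcribed": n_tx >= 2,
    }

# Toy example with made-up values.
cbes = [Cbe(10_500, 15.2, 3), Cbe(10_800, 12.0, 3), Cbe(50_000, 20.0, 3)]
calls = annotate_peak(10_000, 10_400, cbes, e2a_max_signal=0.3,
                      gro_repeats=[(45.0, 3.0), (12.0, 11.0), (5.0, 1.0)])
```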

Quantification and statistical analysis

Graphs were generated using GraphPad Prism 10, Origin 2023b and R version 3.6.3. After normalization in each sample, 3C-HTGTS, ChIP-seq and GRO-seq signals of multiple repeats were merged as mean ± s.e.m. of the maximum value in each repeat in each bin, after dividing the plotting region into 1,000 bins (Fig. 2m and Extended Data Fig. 2) or 200 bins (Supplementary Data 1). Unpaired, two-sided Welch’s t-test was used to compare total rearrangement levels between indicated samples, with P values presented in relevant figure legends. Pearson correlation coefficient (r) and the corresponding P value were calculated to determine the similarity in Vκ usage pattern between indicated samples after calculating the average usage among repeats, and are presented in relevant figure legends.
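For readers who want to reproduce these summary statistics outside the listed software, the sketch below shows the per-bin maximum, the mean ± s.e.m. across repeats, the Pearson correlation coefficient and the Welch t statistic with its Welch–Satterthwaite degrees of freedom. The function names are hypothetical and this is not the authors' plotting code; obtaining the two-sided P value requires a t distribution, for which a routine such as `scipy.stats.ttest_ind` with `equal_var=False` could be used.

```python
import math
from statistics import mean, stdev

def bin_max(signal, start, end, n_bins):
    """Maximum signal per bin for one repeat; bins without signal are 0."""
    width = (end - start) / n_bins
    out = [0.0] * n_bins
    for pos, value in signal:
        if start <= pos < end:
            b = min(int((pos - start) / width), n_bins - 1)
            out[b] = max(out[b], value)
    return out

def mean_sem(repeats):
    """Per-bin (mean, s.e.m.) across repeats of equal length."""
    result = []
    for vals in zip(*repeats):
        sem = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else 0.0
        result.append((mean(vals), sem))
    return result

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For instance, `bin_max([(5, 1.0), (15, 3.0), (17, 2.0)], 0, 20, 2)` gives `[1.0, 3.0]`, and `welch_t([1, 2, 3], [4, 5, 6])` gives a t statistic of about −3.67 with 4 degrees of freedom.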

Availability of materials

All plasmids, cell lines and mouse lines generated in this study are available from the authors upon request.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
