Cell-line cultures and genomic editing
HH cutaneous T cell lines (ATCC, CRL-2105), Jurkat E6-1 (ATCC, TIB-152) and Daudi cells (ATCC, CCL-213) were cultured in complete RPMI (cRPMI), RPMI 1640 supplemented with 10% heat-inactivated FBS and 1% non-essential amino acids, sodium pyruvate, HEPES, l-glutamine, penicillin-streptomycin and 0.1% β-mercaptoethanol. HEK293T cells (ATCC, CRL-3216) were cultured in complete DMEM, DMEM supplemented with 10% heat-inactivated FBS and penicillin-streptomycin antibiotics.
To investigate the regulatory regions around HLA-DQB1, HH and Daudi cells were genomically modified as previously described with some modifications21. In brief, 20 µM Cas9 protein (Synthego) was mixed with equal volumes of 40 µM modified sgRNA (Synthego) and incubated at 37 °C for 15 min to form RNP complexes. The sgRNA was designed using SpliceR40. HH and Daudi cells were nucleofected with 2 µl of RNPs in an Amaxa 4D nucleofector and 1 µl of ssODN repair template (SE protocol: CL-120 for HH cells; SF protocol: CA-137 for Daudi cells). Cells were immediately transferred to 24-well plates with pre-warmed media and cultured until sorting.
For Jurkat and Daudi cells targeting FBXO11, 1 µl of mRNA (2 µg µl−1) encoding the base editor ABE8e-NG was mixed with 1 µl of 40 µM modified sgRNA (Synthego). Cells were nucleofected with 2 µl of an mRNA/sgRNA mixture in an Amaxa 4D nucleofector (SE protocol: CL-120 for Jurkat cells; Daudi as above). Cells were incubated as described above in 24-well plates and cultured until sorting.
For multiplexed PAX5 editing in Daudi cells, three sgRNAs were designed for each of the two DNA binding domains (DBD), designated region A and region B. Editing was performed in region A and B separately, as well as combined. Daudi cells were nucleofected with 1 µl of 2 µg µl−1 ABE8e mRNA and 1 µl of each sgRNA (40 µM) using the reagents and protocol described above. Cells were sorted into single-cell plates seven days after nucleofection followed by CRAFTseq.
Healthy individuals, PBMC isolation and CD4+ T-cell magnetic selection
We recruited healthy individuals and processed 40–50 ml of peripheral blood under a M*** General Brigham (MGB) IRB-approved protocol (IRB# 2008P000427). PBMCs were isolated by layering Ficoll Paque (Sigma-Aldrich) underneath 1:1 PBS-diluted blood followed by centrifugation. Buffy coat layers were extracted, washed in PBS and then resuspended in cRPMI. Cells were stored by adding an equal volume of freezing medium (10% DMSO and 50% FBS in cRPMI) and frozen in liquid N2 until use. To isolate naive or total CD4+ T cells, frozen PBMCs were quickly thawed and put immediately in warm cRPMI media. Cells were washed twice and naive or total CD4+ T cells were isolated using a magnetic negative selection kit (Miltenyi, Naive CD4+ T cell isolation kit human, CD4+ T cell Isolation kit) following the manufacturer’s protocols.
CRISPR editing of primary CD4 T cells
After naive CD4+ T cell isolation, cells were stimulated with anti-CD3 and anti-CD28 Dynabeads (ThermoFisher) at a ratio of 1:1 (cell:Dynabead) in the presence of 5 ng ml−1 rhIL-2 and 10 ng ml−1 rhIL-7 (Biolegend) in 96-well U-bottom plates. After one day, cells were extracted and nucleofected with CRISPR reagents. In brief, 250,000–500,000 stimulated CD4+ T cells were nucleofected with 1 µl of mRNA (2 µg µl−1) encoding the base editor and 1 µl of sgRNA (40 µM, Synthego) in an Amaxa 4D nucleofector (P3 protocol: EH-115). After nucleofection, cells were transferred to 96-well U-bottom plates and cultured in cRPMI for 4 h. For experiments targeting PTPRC, cells were cultured for a further 5 days with 5 ng ml−1 rhIL-2 before proceeding to staining and cell sorting. For polarization experiments, cells were split into two wells and cultured with either 10 ng ml−1 rhIL-12, 5 ng ml−1 rhIL-2 and 2 µg ml−1 anti-IL4 antibody for TH1 conditions or 10 ng ml−1 rhIL-2 and 5 ng ml−1 rhTGFβ for Treg conditions.
Cell staining with indexing and oligo-conjugated antibodies
Cells were isolated, washed and resuspended in a cell staining buffer (2% FBS in PBS with 2 mM EDTA). About 500,000 to one million cells were stained with fluorophore-conjugated hashing antibodies for 20 min on ice. After staining, cells were washed and conditions pooled together. Pooled samples were then spun down and resuspended in FcX True Stain (Biolegend) (BioLegend) for 5 min at room temperature, followed by staining with the TotalSeq-A Human Universal Cocktail 1.0 (BioLegend) for 30 min on ice before washing and proceeding to single-cell sorting. A full list of antibodies and barcodes can be found in Supplementary Table 1.
Bulk DNA isolation
Cells were pelleted, resuspended in RLT+ buffer (Qiagen), flash frozen on dry ice and stored at −80 °C until processing. For DNA isolation, samples were thawed, vortexed and incubated for 5 min at room temperature before proceeding to DNA isolation using a Qiagen RNA/DNA extraction kit following the manufacturer’s protocols. After isolation, DNA concentrations were measured by spectrometry (Nanovue) and stored at −20 °C until use. For blockysis of bulk DNA editing, nested PCR reactions were used to amplify 200–400 bp genomic regions of interest with custom Illumina adapters added for sequencing. Bulk samples were sequenced alongside single-cell libraries. Sequences of all primers can be found in Supplementary Table 7.
Bulk flow cytometry
Stimulated and genomically edited total CD4+ T cells were ***ayed for expression of key protein markers by flow cytometry on day 7 after nucleofection with a panel of fluorophore-conjugated antibodies. For all samples, cells were isolated and washed twice in PBS, and FC receptors were blocked with FcX True Stain (Biolegend) for 15 min on ice followed by staining with directly conjugated antibodies for 30 min on ice. Cells were then washed and samples were blockysed on a BD LSR Fortessa. All data were processed using FlowJo before blockysis in R. A full list of antibodies, clones and manufacturers can be found in Supplementary Table 8.
CRAFTseq
Lysis buffer containing 0.2% TritonX (Millipore Sigma), 1.2 U recombinant RNase inhibitor (Takeda), 1.2 mM DTT, 1 M Betaine, 6 mM dNTP and 9 mM dCTP (ThermoFisher) was added into a 384-well PCR plate using a multi-channel pipette. An Agilent Bravo was used to add 2.5 µM of a well-specific OligoDT barcode into each well, then to distribute 1 µl of barcoded lysis buffer to multiple 384-well plates. These lysis plates were sealed and stored at −20 °C until use the next day for sorting. Single cells stained with fluorophore- and oligo-conjugated antibodies were sorted into lysis plates, which were kept sealed on dry ice before being stored at −80 °C until use for a maximum of 4 weeks.
For library generation, plates were thawed and incubated at 72 °C for 3 min, then kept on ice during further processing. RT-PCR mix (4 µl) containing 1.8 µM template switch oligo (Azenta), 0.3 mM dNTP, 0.8 M Betaine, 4.8 mM DTT, 9.7 mM MgCl2, 0.8 U RNase inhibitor, Maxima H Minus reverse transcriptase, 1× KAPA HIFI buffer and 2 U KAPA HiFi HotStart DNA polymerase (Roche) was then added to each well using a MANTIS liquid dispenser (Formulatrix). After first-strand synthesis (50 °C for 1 h and 85 °C for 5 min), a mixture of ADT primers (0.05 µM) and genomic DNA-specific primers (0.2 µM) were added using an I.DOT liquid handler (Dispendix) and amplified using the following program: 98 °C for 5 min then 18–22 cycles of 98 °C for 20 s, 65 °C for 20 s and 72 °C for 6 min, followed by 72 °C for 5 min and then hold at 4 °C. After amplification, the samples were split for processing of cDNA, genomic DNA and ADT.
For genomic DNA, using the Bravo, 1 µl of product per well was taken for further amplification of genomic DNA with nested primers containing a capture oligo with well-specific barcodes. The PCR mix was distributed between plates similarly to the process used for lysis plates and contained 0.3 mM dNTP, 1× KAPA Hifi buffer, 2 U KAPA HIFI HotStart Taq and 0.4 µM region-specific capture primer, 0.4 µM region-specific P7 primer and 0.4 µM capture oligo with well-specific barcode. The DNA was amplified using the following program: 98 °C for 5 min, followed by 20 cycles of 98 °C for 20 s, 65 °C for 20 s and 72 °C for 30 s, followed by 72 °C for 5 min, then hold at 10 °C. We then pooled 2 µl of each well containing nested genomic DNA using the Bravo and purified it with 1.2× solid-phase reverse immobilization (SPRI) beads. In brief, room-temperature SPRI beads were incubated at the given ratio by volume with PCR product for 10 min, then placed on a magnet for 5 min. Supernatant was removed and the beads were washed twice with 80% freshly made ethanol, dried for up to 15 min in air and resuspended in nuclease-free water. The resuspended beads were incubated at room temperature for 5 min, then placed back on the magnet for 5 min, and the supernatant was used for following steps. The cleaned DNA was treated with thermolabile ExoI and ExoVII (NEB) to remove primers (37 °C for 30 min, then 87 °C for 5 min for ExoI denaturation) and re-cleaned with 1.2× SPRI beads.
For multiplexed PAX5 editing, to effectively amplify gDNA amplicons from both region A and region B, two inner PCR reactions were carried out for each region separately, rather than amplifying both regions simultaneously in one reaction.
For cDNA, 2 µl per well of the original amplification product was pooled using the Bravo and cleaned with 0.6× SPRI beads; the supernatant was saved for ADT sequencing. The cleaned cDNA product was treated with ExoI to remove primers (37 °C for 5 min, 85 °C for 5 min for ExoI denaturation) and cleaned again with 0.8× SPRI. cDNA concentrations were measured on a QuBit and 2 ng was used for tagmentation with a NexteraXT kit (Illumina) following the manufacturer’s directions in half-volume reactions (25 µl total).
For ADT, saved supernatant from the 0.6× SPRI clean-up was re-cleaned with 1.4× SPRI (2× total) and ExoI was treated as above. The ADT product was quantified on a QuBit.
For the final amplification of pooled samples, 2 ng of tagmented cDNA and 5 ng of genomic DNA and ADT samples underwent a final amplification step from the 3′ end with custom Illumina-compatible primers. The PCR mix contained 0.2 µM P7 and P5 custom oligos, 1× Q5 buffer, 0.5 µl Q5 Taq, 0.2 µM dNTP and water to a total volume of 25 µl. Amplification used the following programs: cDNA 72 °C for 3 min gap repair, 95 °C for 30 s, followed by 16 cycles of 95 °C for 10 s, 55 °C for 30 s and 72 °C for 30 s, followed by 72 °C for 5 min, then hold at 10 °C. Genomic DNA and ADT: 98 °C for 3 min, followed by 16 cycles of 98 °C for 15 s, 65 °C for 20 s and 72 °C for 45 s, followed by 72 °C for 10 min. All samples were re-cleaned using SPRI beads (0.8× for cDNA, 1.2× for gDNA and 1.6× for ADT) for sequencing. All sequences can be found in Supplementary Tables 1–7. The full protocol can be found in a more readable format deposited on protocols.io (https://www.protocols.io/view/craftseq-d7bk9ikw), in Supplementary Notes and Supplementary Tables 16–23.
Illumina sequencing
Before submission to sequencing at the Genomics Platform at the Broad Institute, sample DNA concentrations were quantified on a QuBit, and distributions were measured using a TapeStation (Agilent) with D5000 high-sensitivity screentapes. All DNA amplicon libraries were sequenced on a MiSeq for 300 cycles (120 R1, 180 R2). ADT amplicon libraries were pooled at 5% with the RNA libraries and sequenced on a NextSeq550 for 75 cycles (26 R1, 50 R2).
Processing and alignment of single-cell data
The I7 and I5 raw FASTQs were merged across different lanes and runs. Transcriptomic reads were then aligned to the human genome (GRCh38 2020-A) using STARsolo41 (v.2.7.6a). The resulting count matrices were imported into R. For all experiments, cells were filtered on at least 500 genes, 1,000 UMIs and fewer than 10% mitochondrial reads. Counts were log(CP10K + 1) normalized and z-score scaled. For visualization, PCA was done on the top 3,000 variable genes, as selected using variance stabilizing transformation. This was followed by batch correction using Harmony42 and dimension reduction using UMAP (uwot package)43. For index flow-cytometry blockysis, raw fluorescence values were exported from index sorting as csv files and imported into R. Values were log10-normalized and blockysed in R.
Kallisto kite (v.0.27.3) was used to build an ADT reference transcriptome corresponding to the ADT panel (Biolegend TotalSeq A) used and to align ADT reads44,45. UMI counts were imported into R, CLR normalized and z-score scaled. For visualization, PCA was done on CLR-normalized and scaled variable ADTs followed by plate correction using Harmony and UMAP dimension reduction on harmonized PCs.
DNA reads were demultiplexed based on cellular barcodes. Then, amplicons were aligned to the amplicon reference using CRISPResso2 (v.2.2.9)46. Allele usage statistics were calculated from CRISPResso2 outputs. For all DNA blockyses, cells were filtered on a minimum number of reads (20–40) and minimum allele frequencies (12–25%) on a per-experiment basis. Assuming dizygosity, only the top two alleles after filtering were included in the blockysis. If after filtering only one allele was recovered, the cell was ***umed to be homozygous. Clustering of DNA editing in the DQB1 experiment (Fig. 2) was performed with supervised k-means clustering with an n of 3. For more information on processing amplicon sequencing data and genotype calling, see Supplementary Notes and Supplementary Figs. 6–23.
Linear modelling of gene and protein expression
Associations of gene and ADT expression to dosage and conditions were made using negative binomial generalized linear modelling. We modelled counts of genes or ADTs with covariates comparing null to full models. For blockyses in cell lines:
$$begin{array}{c}Xsim {rm{N}}{rm{e}}{rm{g}}{rm{a}}{rm{t}}{rm{i}}{rm{v}}{rm{e}},{rm{b}}{rm{i}}{rm{n}}{rm{o}}{rm{m}}{rm{i}}{rm{a}}{rm{l}}({rm{G}}{rm{e}}{rm{n}}{rm{e}}/{{rm{A}}{rm{D}}{rm{T}}}_{i})\ {rm{N}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}},{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}}{{rm{G}}{rm{e}}{rm{n}}{rm{o}}{rm{y}}{rm{t}}{rm{p}}{rm{e}}}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}},{rm{c}}{rm{o}}{rm{n}}{rm{d}}{rm{i}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{{rm{c}}{rm{o}}{rm{n}}{rm{d}}{rm{i}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}}{{rm{C}}{rm{o}}{rm{n}}{rm{d}}{rm{i}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}}_{i}end{array}$$
where gene/ADT counts is raw counts in each cell i, nCounts is total gene or ADT count in cell i, plate is plate identity, genotype is dosage at the edited nucleotide (0,1,2) and condition is culture condition (CRISPR or control). For the DQB1 editing experiment, we modelled gene and ADT counts:
$$begin{array}{c}Xsim {rm{N}}{rm{e}}{rm{g}}{rm{a}}{rm{t}}{rm{i}}{rm{v}}{rm{e}},{rm{b}}{rm{i}}{rm{n}}{rm{o}}{rm{m}}{rm{i}}{rm{a}}{rm{l}}({{rm{G}}{rm{e}}{rm{n}}{rm{e}}/{rm{A}}{rm{D}}{rm{T}}}_{i})\ {rm{N}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{log }_{2}({beta }_{{rm{D}}{rm{e}}{rm{l}}{rm{e}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}{rm{s}}{rm{i}}{rm{z}}{rm{e}}}{{rm{D}}{rm{e}}{rm{l}}{rm{e}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}{rm{s}}{rm{i}}{rm{z}}{rm{e}}}_{i}+1)end{array}$$
where deletion size was calculated per cell i. In all cell-line blockysis, genes were filtered for non-zero expression in greater than 30% of cells and a mean of 2 reads per cell. For the PTPRC editing experiments in primary human cells, we modelled:
$$begin{array}{c}Xsim {rm{N}}{rm{e}}{rm{g}}{rm{a}}{rm{t}}{rm{i}}{rm{v}}{rm{e}},{rm{b}}{rm{i}}{rm{n}}{rm{o}}{rm{m}}{rm{i}}{rm{a}}{rm{l}}({{rm{G}}{rm{e}}{rm{n}}{rm{e}}/{rm{A}}{rm{D}}{rm{T}}}_{i})\ {rm{N}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{{rm{i}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}{{rm{I}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}},{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{{rm{i}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}{{rm{I}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}_{i}+{beta }_{{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}}{{rm{G}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}},{rm{c}}{rm{o}}{rm{n}}{rm{d}}{rm{i}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{{rm{i}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}{{rm{I}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}_{i}+{beta }_{{rm{c}}{rm{o}}{rm{n}}{rm{d}}{rm{i}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}}{{rm{C}}{rm{o}}{rm{n}}{rm{d}}{rm{i}}{rm{t}}{rm{i}}{rm{o}}{rm{n}}}_{i}end{array}$$
where individual was the individual identity in cell i. Genes were filtered for non-zero expression in greater than 15% of cells. For the IL2RA editing experiments, we modelled:
$$begin{array}{c}Xsim {rm{N}}{rm{e}}{rm{g}}{rm{a}}{rm{t}}{rm{i}}{rm{v}}{rm{e}},{rm{b}}{rm{i}}{rm{n}}{rm{o}}{rm{m}}{rm{i}}{rm{a}}{rm{l}}({{rm{G}}{rm{e}}{rm{n}}{rm{e}}/{rm{A}}{rm{D}}{rm{T}}}_{i})\ {rm{N}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+,({1|{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i})+({1|{rm{I}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}_{i})\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)=1+{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+({1|{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i})\ ,,,+(1|{{rm{I}}{rm{n}}{rm{d}}{rm{i}}{rm{v}}{rm{i}}{rm{d}}{rm{u}}{rm{a}}{rm{l}}}_{i})+{beta }_{{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}}{{rm{G}}{rm{e}}{rm{n}}{rm{o}}{rm{t}}{rm{y}}{rm{p}}{rm{e}}}_{i}end{array}$$
Random effects were used for individual and plate in these models with multiple runs. The R package lme4 was used to model fixed effects using the glm.nb function, and the R package MASS was used to model random effects using the glmer.nb function.
To determine the significance of these models, we used an LRT comparing the null and full models and calculated a P-value for the test statistic against the chi-squared distribution with one degree of freedom. When indicated, we corrected for multiple hypothesis testing using the Benjamini–Hochberg correction (FDR).
To model the effect of PAX5 multiplexed editing on gene expression, we used negative binomial regression to model all eight edited variants jointly, which yielded a P-value for each variant and gene. We used the Bonferroni correction to account for multiple testing considering the number of variants and the number of tested genes, using a P-value cut-off equal to 0.05/(8 × the number of genes), which equals 0.05/(8 × 3,125) = 2 × 10−6. To further validate the effect of individual variants, we used a likelihood-ratio test in which the full model contains all eight edited variants and covariates and the null model is the full model minus the specific variant of interest. We report LRT P-values in the relevant figures.
To evaluate the calibration of effect sizes and P-values learnt from the linear model and LRT, we bootstrapped the CD45 experiment using 200 replications for each gene. We used the estimated β values from this approach to define the 95% confidence intervals and nominal significance (confidence interval not containing 0) for each gene.
PAX5 interaction blockysis
To test for interactions between our two edited regions in PAX5, regions A and B, we first summarized the effect of B.29 on genome-wide gene expression as a B.29 gene score: ({S}_{i}={sum }_{g}{beta }_{g}{E}_{i,g}) where Si is the gene score for cell i, βg is the effect size of B.29 on gene g in negative binomial regression and Ei,g is the expression of gene g in cell i. We then tested the interaction effect between B.29 and A edit outcome (defined as any non-synonymous edit in region A) on the B.29 gene score using a likelihood ratio test to compare the two models below:
$$begin{array}{l}{rm{N}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{S}_{i},=,{beta }_{0}{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}\ ,,,,,,+,{beta }_{A}{{rm{A}}{rm{e}}{rm{d}}{rm{i}}{rm{t}}{rm{e}}{rm{d}}}_{i}+{beta }_{B}B{.29}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:,{S}_{i},=,{beta }_{0}{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})\ ,+,{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{A}{{rm{A}}{rm{e}}{rm{d}}{rm{i}}{rm{t}}{rm{e}}{rm{d}}}_{i}+{beta }_{B}B{.29}_{i}+{beta }_{AB}{{rm{A}}{rm{e}}{rm{d}}{rm{i}}{rm{t}}{rm{e}}{rm{d}}}_{i}B{.29}_{i}end{array}$$
To define individual genes with interaction effects, we tested the interaction between B.29 genotype and A editing outcome:
$$begin{array}{l}{rm{N}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)\ ,=,{beta }_{0}{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{A}{{rm{A}}{rm{e}}{rm{d}}{rm{i}}{rm{t}}{rm{e}}{rm{d}}}_{i}+{beta }_{B}B{.29}_{i}\ {rm{F}}{rm{u}}{rm{l}}{rm{l}},{rm{m}}{rm{o}}{rm{d}}{rm{e}}{rm{l}}:{rm{l}}{rm{n}}(X)\ ,=,{beta }_{0}{log }_{10}({{rm{n}}{rm{C}}{rm{o}}{rm{u}}{rm{n}}{rm{t}}{rm{s}}}_{i})+{beta }_{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}{{rm{p}}{rm{l}}{rm{a}}{rm{t}}{rm{e}}}_{i}+{beta }_{A}{{rm{A}}{rm{e}}{rm{d}}{rm{i}}{rm{t}}{rm{e}}{rm{d}}}_{i}+{beta }_{B}B{.29}_{i}\ ,,+,{beta }_{AB}{{rm{A}}{rm{e}}{rm{d}}{rm{i}}{rm{t}}{rm{e}}{rm{d}}}_{i}B{.29}_{i}end{array}$$
In this blockysis, we only tested the genes that are significantly ***ociated with B.29 genotype in our original negative binomial regression models.
Mixscape blockysis
We applied Mixscape12 separately for each cell type to the FBXO11 and CD45 experiments. In each context, we ran Mixscape with three types of input to the model: experimental condition (CRISPR/control, genotyping unaware); dosage of 0 versus 1 or 2 (genotyping aware); and dosage of 0 or 1 versus 2 (genotyping aware). For processing the data for Mixscape, we followed the default RNA processing, including TP10K library size normalization, log transformation, variance stabilizing transformation selection of variable genes, variance scaling and centring of data, and PCA. We provided 40 PCs to Mixscape for calculation of the perturbation signature. To run Mixscape, we set the number of nearest neighbours for the blockysis to 10, defined a log fold change threshold of 0.05 for inclusion of differentially expressed genes, and required differential expression of 2 or more genes to predict perturbation status.
To identify differentially expressed genes between predicted knockout and non-perturbed predictions by Mixscape, we performed blockogous linear modelling of gene expression, as in our model ***ociating genotype dosage with expression. However, in the Mixscape-based approach, expression ***ociation with Mixscape-predicted status was performed as below.
For the PTPRC experiment:
$$begin{array}{c}X sim {rm{Negative}};{rm{binomial}}({{rm{Gene}}}_{i})\ {rm{Null}};{rm{model}}:{rm{ln}}(X)=1+{log }_{10}({{rm{nCounts}}}_{i})+{beta }_{{rm{plate}}}{{rm{plate}}}_{i}+{beta }_{{rm{individual}}}{{rm{Individual}}}_{i}\ {rm{Full}};{rm{model}};{rm{Mixscape}}:{rm{ln}}(X)=1+{log }_{10}({{rm{nCounts}}}_{i})+{beta }_{{rm{plate}}}{{rm{plate}}}_{i}+{beta }_{{rm{individual}}}{{rm{Individual}}}_{i}+{beta }_{{rm{Mixscape}}}{{rm{Mixscape}}}_{i}end{array}$$
For the FBXO11 experiment:
$$begin{array}{c}X sim {rm{Negative}};{rm{binomial}}({{rm{Gene}}}_{i})\ {rm{Null}};{rm{model}}:{rm{ln}}(X)=1+{log }_{10}({{rm{nCounts}}}_{i})+{beta }_{{rm{plate}}}{{rm{plate}}}_{i}\ {rm{Full}};{rm{model}};{rm{Mixscape}}:{rm{ln}}(X)=1+{log }_{10}({{rm{nCounts}}}_{i})+{beta }_{{rm{plate}}}{{rm{plate}}}_{i}+{beta }_{{rm{mixscape}}}{{rm{Mixscape}}}_{i}end{array}.$$
Demultiplexing donor
Samples were genotyped on an Illumina Multi-Ethnic Genotyping Array (MEGA) or Global Screening Array (GSA). Genotyping quality control, haplotype phasing and whole-genome imputation were done for MEGA as previously described47. Analogous steps were performed for GSA, with further genotyping quality control including PHWE > 1 × 10−10, sample relatedness (IBD) < 0.9 and sample heterozygosity rate > 0.217. All imputed variants were filtered on minor allele frequency > 0.01 and imputation accuracy (Rsq) > 0.9 in both datasets. Variants were further pruned to the intersection of filtered variants between the two genotyping arrays. Non-biallelic variants and variants in block chromosomes or non-exonic regions were also removed. Demuxlet (v.1.0) was run with default parameters to predict donors from genotypes. Cell donor identity was ***igned using the output column SNG.1ST, corresponding to the most likely singlet identity. Cells with ambiguous donor ***ignments were excluded.
Gene set enrichment blockysis
We performed gene set enrichment using fgsea on the results from the linear models predicting differentially expressed genes ***ociated with dosage in the CD45 experiment and polarization status in the IL2RA experiment. We included all genes in the enrichment blockysis. Gene ranks were defined according to the sign of the effect size of the ***ociated variable (for directionality) multiplied by −log10P. For the CD45 blockysis, we included the MSigDB immunological signatures (C7) pathways48 that were specific to CD4 T cells. We then filtered to the pathways that included between 15 and 500 genes. For the IL2RA blockysis, we defined a Treg and TH1 pathway by determining differentially expressed genes in bulk RNA-seq from a previously published source of in vitro polarized T cells37. To perform differential expression blockysis between Treg and TH1 samples, we ran DESeq2 with default parameters on the raw counts matrix. We added genes to the polarization pathways if they were nominally significant (P < 0.01) and positively ***ociated (fold change > 1) with the given polarization condition.
Base editor mRNAs
Base editor mRNAs were generated by in vitro transcription using the HiScribe T7 High-Yield RNA synthesis kit (NEB, E2040S) by the method described previously49. NEBnext polymerase was used to PCR amplify template plasmids and install a functional T7 promoter and a 120-nucleotide polyadenine tail. Transcription reactions were set up with complete substitution of uracil by N1-methylpseudouridine (Trilink BioTechnologies, N-1080) and co-transcriptional 5′ capping using the CleanCap AG blockogue (Trilink BioTechnologies, N-7113) to generate a 5′ Cap1 structure. mRNAs were purified using ethanol precipitation according to the kit’s instructions, dissolved in nuclease-free water and normalized to a concentration of 2 µg µl−1 using Nanodrop RNA quantification of diluted test samples.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.