Naert T et al. (2025), Precise, predictable genome integrations by dee...

XB-ART-61465

Nat Biotechnol 2025 Aug 12; doi: 10.1038/s41587-025-02771-0.


Precise, predictable genome integrations by deep-learning-assisted design of microhomology-based templates.

Naert T, Yamamoto T, Han S, Röck R, Horn M, Bethge P, Vladimirov N, Voigt FF, Figueiro-Silva J, Bachmann-Gagescu R, Vleminckx K, Helmchen F, Lienkamp SS.

Abstract
Precise CRISPR-based DNA integration and editing remain challenging, largely because of insufficient control of the repair process. We find that repair at the genome-cargo interface is predictable by deep learning models and adheres to sequence-context-specific rules. On the basis of in silico predictions, we devised a strategy of base-pair tandem repeat repair arms matching microhomologies at double-strand breaks. These repeat homology arms promote frame-retentive cassette integration and reduce deletions both at the target site and within the transgene. We demonstrate precise integrations at 32 loci in HEK293T cells. Germline-transmissible transgene integration and endogenous protein tagging in Xenopus and adult mouse brains demonstrated precise integration during early embryonic cleavage and in nondividing, differentiated cells. Optimized repair arms also facilitated small edits for scarless single-nucleotide or double-nucleotide changes using oligonucleotide templates in vitro and in vivo. We provide the design tool Pythia to facilitate precise genomic integration and editing for experimental and therapeutic purposes for a wide range of target cell types and applications.

PubMed ID: 40796977
Journal: Nat Biotechnol

	Fig. 1: Modeling predicted gene-editing outcomes using inDelphi while providing synthetic µHs. a, Predicted editing outcomes are shown using inDelphi (HEK293T) on synthetic DNA. Adding tandem repeats of the bases left of the CRISPR–Cas cut site to the right of the cut affected the predicted editing outcomes. Cumulative µH repair is defined as the percentage of editing outcomes that mobilize (delete) synthetic µHs during repair. Iterative recutting of products is not computationally modeled. b, Modeling of expected editing outcomes across 250,000 distinct gRNA target sites across human Chr1, when adding the 3 bp flanking the left side of the CRISPR–Cas cut site either as a single repeat (1×) or as tandem repeats (2×–8×). The percentage of repair by µH usage is shown. Box plots show the median, interquartile range (IQR) and whiskers extending to 1.5× the IQR with n = 250,000. c, Heat map highlighting the expected percentage of repair by µH as a function of the length of µH and the number of tandem repeats for 25 gRNAs, demonstrating that there is a sequence-context-specific optimal solution for maximizing the percentage of µH repair outcomes. d, Schematic of the experimental setup: PaqCI digestion releases the linear dsDNA donor, which contains 5× 3-bp µH tandem repeat arms, and is codelivered with RNP targeting AAVS1. e, Sequence of the target locus and 3-bp µH tandem repeat repair arms. f, After 14 days, flow cytometry indicates an increase in stable integration in cells transfected with the linear dsDNA template. g, Integration occurs specifically with PaqCI-linearized templates; circular templates show no detectable on-target integration. h, Quantification of integration efficiency of AAVS1 gRNA compared to a negative control gRNA. Statistical analysis was performed using an unpaired two-tailed t-test; P = 0.021 (n = 3 independent biological replicates). Error bars represent the s.d. i,j, The inDelphi HEK293T model accurately predicts the observed frequency of distinct editing outcomes in the µH tandem repeat arms at both junctions. Data points are the means of three independent biological replicates. A two-sided Pearson correlation was applied (i, r = 0.815, P = 0.00022; j, r = 0.969, P = 1.10 × 10⁻⁸). No multiple comparisons were performed. Some schematics were created with BioRender.com.
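	The in silico construct described in Fig. 1a,b (copies of the bases left of the cut placed directly to its right) can be illustrated with a minimal sketch. The function names, the toy sequence and the cut position are hypothetical and chosen for illustration only; this is not code from the Pythia tool or inDelphi, and it only assembles the sequence that a prediction model would then be given.

```python
def tandem_repeat_arm(target: str, cut_index: int, mh_len: int = 3, repeats: int = 5) -> str:
    """Return `repeats` copies of the `mh_len` bases immediately left of the cut.

    `cut_index` is the 0-based position of the first base to the right of the cut
    (an assumed convention for this sketch).
    """
    if mh_len <= 0 or repeats <= 0:
        raise ValueError("mh_len and repeats must be positive")
    if not mh_len <= cut_index <= len(target):
        raise ValueError("cut_index must leave at least mh_len bases on the left")
    microhomology = target[cut_index - mh_len:cut_index]
    return microhomology * repeats


def with_synthetic_repeats(target: str, cut_index: int, mh_len: int = 3, repeats: int = 1) -> str:
    """Place the repeat arm directly right of the cut, keeping the rest of the target."""
    arm = tandem_repeat_arm(target, cut_index, mh_len, repeats)
    return target[:cut_index] + arm + target[cut_index:]


# Toy example: the cut lies between "ACGTTG" and "CAAGGCTA".
toy_target = "ACGTTGCAAGGCTA"
single = with_synthetic_repeats(toy_target, cut_index=6, repeats=1)
double = with_synthetic_repeats(toy_target, cut_index=6, repeats=2)
```

	Sweeping `repeats` from 1 to 8 and `mh_len` over several values would reproduce the shape of the parameter grid in panels b and c, with the predicted outcome percentages supplied by a separate model.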
	Fig. 2: µH tandem repeat repair arms protect the genome and the relationship between integration efficiencies and local sequence context. a, Schematic representation of the experimental setup comparing NHEJ integration (no repair arms) to 4× 3-bp µH tandem repeat repair arms using PaqMan plasmids. b, Comparison of µH tandem repeat-mediated and NHEJ integration efficiencies (n = 2 independent biological repeats). c, Visualization of genome-editing outcomes on both genome–transgene junctions showing the percentage of reads that trimmed the genome (1), the percentage of reads that trimmed the cassette (2) and specific editing outcomes of reads that trimmed neither the genome nor the cassette (3). d, Quantification of genome-editing outcomes on both genome–transgene junctions demonstrating that NHEJ leads to extensive trimming, while 4× 3-bp µH tandem repeat arms protect both the genome and the transgene cassette. e, In the absence of exogenous DNA, in silico modeling predicts that the nucleotide at position −4 will influence the percentage of repair outcomes that is expected to be driven by MMEJ (total n = 10,813,171; plotted random subselection of 500,000 data points). f,g, The 32 gRNAs designed to target coding exons of nonessential genes with four in each of eight classes covering all possible permutations of strong (G or C) and weak (A or T) bases at 3 bp left of the DSB. Each class was composed of four gRNAs binned across the inDelphi-predicted percentage of repair by MMEJ and had similar expected on-target efficiencies (CRISPRScan scores). h, For each gRNA, a distinct dsDNA repair template was generated with 5× 3-bp µH tandem repeat repair arms matching the gRNA-specific context left of the DSB and 5× 3-bp µH tandem repeat repair arms matching the AGG right of the DSB. These were delivered with nontargeting control RNP (top) or gene-targeting RNP (bottom) to HEK293T cells. Each data point represents an independent biological replicate. i, Integration efficiencies at day 14, determined by flow cytometric quantification of GFP⁺ cells. Statistical analysis was performed using a Mann–Whitney test (two-tailed, exact, P = 6.23 × 10⁻⁷, n = 32). j, Quantification of on-target integration efficiencies comparing the presence of a strong or weak base at position −4, just left of the DSB. Statistical analysis was performed using a Mann–Whitney test (two-tailed, exact, P = 0.0211, n = 16). k, On-target integration efficiencies by base identity at position −4, with guanine showing the highest. Each point represents the mean of three biological replicates. Sample sizes: T, n = 7; A, n = 9; C, n = 11; G, n = 5. Statistical analysis was performed using a Kruskal–Wallis test (P = 0.0445) with Dunn’s post hoc test (two-sided, corrected for six comparisons); T versus G, adjusted P = 0.0397. l, inDelphi modeling of the junction product between the sequence left of the DSB and the dsDNA donor. A higher percentage of predicted editing outcomes that have a +1 insertion will result in a lower on-target integration efficiency. Samples were grouped on the basis of the predicted percentage repair with +1 insertion (>25% and <25%). Statistical analysis was performed using a Mann–Whitney test (two-tailed, exact, P = 0.0092, U = 54, n₁ = 12, n₂ = 20). In i–l, error bars represent the s.d.; the center is the mean and each data point represents the mean of three independent biological replicates. m, NGS of left (5′) junction product and the percentage of reads containing genomic deletions or cassette deletions or neither genomic nor cassette deletions (n = 16 genes, each analyzed by sequencing after equimolar pooling of DNA from three independent biological replicates). Box plots show the median, IQR (box) and whiskers extending to 1.5× the IQR. Some schematics were created with BioRender.com.
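	The eight gRNA classes in Fig. 2f,g follow from two base categories (strong, weak) at each of the 3 bp left of the DSB, giving 2³ = 8 patterns. The sketch below enumerates those patterns and assigns a triplet to its class. The labels "S" and "W", the function name and the example triplets are hypothetical and for illustration only; the binning by predicted MMEJ percentage and CRISPRScan score described in the legend is not reproduced here.

```python
from itertools import product

STRONG_BASES = frozenset("GC")  # strong (G or C), as defined in the legend
WEAK_BASES = frozenset("AT")    # weak (A or T), as defined in the legend

# All strong/weak patterns over three positions: 2 ** 3 = 8 classes.
ALL_CLASSES = ["".join(pattern) for pattern in product("SW", repeat=3)]


def strong_weak_class(triplet: str) -> str:
    """Map the 3 bp left of the DSB to a pattern such as 'SWW'."""
    bases = triplet.upper()
    if len(bases) != 3:
        raise ValueError("expected exactly 3 bases")
    labels = []
    for base in bases:
        if base in STRONG_BASES:
            labels.append("S")
        elif base in WEAK_BASES:
            labels.append("W")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(labels)


# Hypothetical triplets, grouped by class.
examples = ["GAT", "CCG", "TTA", "acg"]
by_class = {t: strong_weak_class(t) for t in examples}
```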
	Fig. 3: µH tandem repeat-mediated integration at stable landing site h11 in X. tropicalis with germline transmission. a, Schematic of the CRISPR–Cas integration strategy. b, Mosaic but stable GFP expression after 5× 3-bp µH tandem repeat-mediated integration of a pCMV:eGFP in F0 founders at various developmental stages. c, Detection of PCR products demonstrating on-target integration into the h11 locus. d, Schematic of the CRISPR–Cas integration strategy, using only the h11-α RNP. e, Unilateral nonmosaic GFP expression in F0 founders because of pCMV:eGFP integration into the h11 locus at the two-cell stage (half-transgenic embryos). f, Nonintegrative mosaic expression pattern in muscle cells. Junction PCR analysis shows that this represents merely transient expression as correct junction products can only be detected in half-transgenic animals shown in e. g, Sequencing of junction products reveals usage of µH tandem repeats in 60% of reads (n = 5). h, Tissue-restricted expression pattern of pax8-CNS1:eGFP knocked in at the h11-α and h11-β loci in the F0 generation by five µH tandem repeats is observed in 7% of the injected embryos (n = 133). i, Benchtop mesoSPIM whole-organ imaging of a kidney from an adult F0 pax8-CNS1:eGFP founder, confirming stable integration and expression in renal tubules amenable for U-Net-based segmentation. j, Reporter expression in the embryonic kidneys of the F2 generation. k, Tissue-restricted expression pattern of CarAct:dsRed knocked in at the h11-α locus in the F0 generation by eight µH tandem repeats is observed in 8.6% of injected embryos (n = 35). l, Benchtop mesoSPIM imaging of F1 and F2 CarAct:dsRed knock-in animals revealing stable and strong tissue-restricted transgene expression.
	Fig. 4: Endogenous fluorescent protein tagging in X. tropicalis. a, Schematic representation of the repair templates for endogenous gene tagging. Coding sequences linked with GSG linkers. b, Unilateral (Myh9 and Acta2) and bilateral (Ncam1) mBaoJin expression in F0 animals because of endogenous gene tagging. Scale bars, 500 μm. c, Imaging of tagged Myh9 in a living stage 45 tadpole. Top left, kidney tubules with a luminal Myh9 layer (*tubular lumen) and Myh9 signal in intertubular fibroblasts. Top right, epidermal cells showcasing the role of Myh9 in cell–cell adhesions. Bottom right, live imaging of actomyosin dynamics within cell–cell boundaries. Scale bars, 10 μm (top) and 5 μm (bottom). d, Imaging of tagged Acta2 in a living stage 45 tadpole. Left, overview showing fluorescence signal in intestinal smooth muscle cells (SMCs), vascular SMCs, heart muscle and skeletal muscle. Line-scanning artifacts in heart muscle because of heartbeat during acquisition. Gamma correction of 0.2 because of strong signal from intestinal SMCs. Top right, vascular SMCs wrapping around developing blood vessels. Bottom right, actomyosin network of the two perpendicular layers of intestinal SMCs. Scale bars, 250 μm (left) and 25 μm (right). e, Imaging of tagged Ncam1 in a living stage 45 tadpole. Expression of Ncam1 in the central and peripheral nervous system. Bottom right, spinal cord with branching motor and sensory neurons. Scale bars, 200 μm (top and bottom left) and 50 μm (bottom right). f, mBaoJin signal (cyan), immunofluorescence staining (red) and overlay in stage 45 fixed tadpoles. Top, intracellular Myh9 network in the epidermis. Middle top, intestinal SMCs in a unilaterally transgenic tadpole. Bilateral origin of SMCs leading to mosaic expression of labeled Acta2. Middle bottom, striated skeletal muscle. Bottom, tail motor neuron. Scale bars, 10 μm. g, Repair outcomes of genome–cassette boundaries. h, Western blots detecting tagged endogenous protein.
	Fig. 5: Endogenous fluorescence tagging of Tubb2a in vivo in adult mouse brains by µH tandem repeat-mediated integration. a, Schematic of AAV constructs for targeted eGFP knock-in at the 3′ CDS of Tubb2a. b, Schematic of the experimental setup and subsequent analysis. c, Histology of brain tissue and immunofluorescence detects eGFP-tagged Tubb2a in individual neurons. d, Benchtop mesoSPIM light-sheet imaging of wildDisco-cleared whole mouse brain shows eGFP-tagged Tubb2a in cortical and hippocampal neurons. e, Representative widefield immunofluorescence images showing GFP and Tubb2a expression in neurons. f, Western blot analysis comparing GFP immunoprecipitation from brains infected with either AAV2 alone, codelivered AAV1 and AAV2 or a control virus constitutively expressing GFP under the control of a CMV promoter. g, Sequence of the targeted Tubb2a locus (gRNA underlined, PAM in bold), the repair template and possible NHEJ and µH tandem repeat-mediated editing outcomes. h, Summary of integration outcomes using NGS reads spanning Tubb2a–eGFP amplified from two mouse hemispheres. i, Frequency of in-frame reads of Tubb2a–eGFP detecting either NHEJ or µH tandem repeat-mediated integration outcomes as defined in g. Each data point represents a single, independently injected brain hemisphere zone from the same mouse. Some schematics were created with BioRender.com.
	Fig. 6: Pythia editing, leveraging predictability to create small point mutations in vitro and in vivo in X. tropicalis. a, eGFP-to-eBFP conversion can be achieved by establishing two point mutations. Schematic representation of Pythia, a bioinformatics pipeline, deploying the inDelphi model to calculate expected editing outcomes on both junctions, which yields a combined Pythia score defined as the binomial co-occurrence of the intended edit. Right, the Pythia scores for different repair arm lengths are depicted as a Pythia matrix. b, Strategy for converting eGFP into eBFP using an 18-bp-long ssODN designed by Pythia (homologous sequences underlined). c, Experimental setup for determining eGFP-to-eBFP conversion efficiencies using three different gRNAs, with 30 distinct ssODN repair templates binned across deciles of Pythia scores. d, Scatter plot showing a direct correlation between Pythia scores and fluorescence conversion, across all three tested gRNAs (Spearman’s two-tailed, exact, P = 3.66 × 10⁻¹⁵, ρ = 0.774, n = 90). Comparison of conversion rates between ssODN repair templates with a predicted Pythia score below and above 30. Samples were grouped on the basis of the predicted percentage repair: <30%, n = 45; >30%, n = 45. Statistical analysis was conducted using a Mann–Whitney test (two-tailed, exact, P = 3.20 × 10⁻¹², U = 148.5, n₁ = 45, n₂ = 45). Box plots show the median, IQR (box) and whiskers extending to 1.5× the IQR. e, The distance between the induced DSB and the site of the intended point mutation influences the median percentage of gene conversion. Statistical analysis was conducted using a one-way two-sided analysis of variance (P < 0.01). Sample sizes: gRNA1, n = 12; gRNA2, n = 14; gRNA3, n = 9. Error bars represent the s.d. In d,e, each data point represents the mean of three independent biological replicates. f, Modeling of potential Pythia editing outcomes for 35 gRNAs targeting the X. tropicalis tyr gene. From top to bottom, the average Pythia score for converting a base to one of the other three bases is shown, plotted first for each destination nucleotide at each position and below for each original nucleotide at each position. Scatter plot of maximum Pythia scores for optimal ssODN design at each position and the length of optimal ssODN (n = 75; each data point represents one in silico simulation). Box plots show the median, IQR (box) and whiskers extending to 1.5× the IQR. g, At-scale modeling of Pythia editing for correcting human RPE65 pathogenic missense variants annotated in ClinVar to restore the wild-type amino acid. For each variant and the closest gRNA, the maximal achievable Pythia score (top) and the length of the optimal ssODN repair template (bottom) are shown. h, Strategy for establishing two silent point mutations in the X. tropicalis tyr gene, using an RNP and a 41-bp ssODN repair template as designed using the Pythia pipeline. i, Schematic of experimental design to detect and quantify successful editing events. j, Evidence of gene editing by restriction digest. k, Quantification of NGS amplicon read analysis. Each data point represents one unique embryo that was individually sequenced (n = 4). Error bars represent the s.d. l, Embryonic survival rates after injection with RNP and 1 µM ssODN template. Increased template length significantly correlates with increased lethality (Pearson’s r = 0.9440, one-sided P = 0.0280). m, Survival rates at a fixed nucleotide concentration. No significant correlation between molarity and lethality (Pearson’s r = 0.6047, one-sided P = 0.0752). n, Predicted repair outcomes (blue) versus sequencing results (green). Left, increasing Pythia score leads to higher perfect repair outcomes in all but one site. Some schematics were created with BioRender.com.
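	Fig. 6a describes the Pythia score as the co-occurrence of the intended outcome at both junctions, tabulated over repair arm lengths as a Pythia matrix. One plausible reading, assumed here and not confirmed by the legend, is that the two junctions are treated as independent, so the combined score is the product of the two per-junction percentages. The sketch below follows that assumption; all function names and all numbers are hypothetical, and the per-junction percentages would in practice come from a prediction model such as inDelphi.

```python
from typing import Dict, Tuple


def combined_score(left_pct: float, right_pct: float) -> float:
    """Joint percentage of the intended outcome at both junctions.

    Assumes the two junctions repair independently (an assumption of this sketch).
    """
    for pct in (left_pct, right_pct):
        if not 0.0 <= pct <= 100.0:
            raise ValueError("percentages must lie in [0, 100]")
    return left_pct * right_pct / 100.0


def score_matrix(
    left_by_len: Dict[int, float], right_by_len: Dict[int, float]
) -> Dict[Tuple[int, int], float]:
    """Tabulate the combined score for every (left arm length, right arm length) pair."""
    return {
        (left_len, right_len): combined_score(left_pct, right_pct)
        for left_len, left_pct in left_by_len.items()
        for right_len, right_pct in right_by_len.items()
    }


def best_design(matrix: Dict[Tuple[int, int], float]) -> Tuple[int, int]:
    """Return the arm-length pair with the highest combined score."""
    return max(matrix, key=matrix.get)


# Hypothetical per-junction predictions (percent), keyed by arm length in bp.
left_predictions = {3: 40.0, 6: 60.0}
right_predictions = {3: 50.0, 6: 80.0}
matrix = score_matrix(left_predictions, right_predictions)
best = best_design(matrix)
```

	Under this reading, a design can only score highly when both junctions are individually predictable, which is consistent with the legend's description of a single combined score per arm-length combination.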