Pythia editing
Precise, predictable genome integrations by deep-learning-assisted design of microhomology-based templates
Nat Biotechnol. 2025 Aug 12. doi: 10.1038/s41587-025-02771-0.
Thomas Naert, Taiyo Yamamoto, Shuting Han, Ruth Röck, Melanie Horn, Philipp Bethge, Nikita Vladimirov, Fabian F. Voigt, Joana Figueiro-Silva, Ruxandra Bachmann-Gagescu, Kris Vleminckx, Fritjof Helmchen & Soeren S. Lienkamp
Summary
Precision CRISPR-Cas9-mediated genome engineering remains challenging, particularly for gene integration/editing in non-dividing cells. Naert et al. present Pythia, a deep learning solution forecasting optimal context-dependent repair templates enabling predictable, accurate genome editing in diverse cellular contexts, both in vivo (Xenopus and adult mouse brains) and in vitro.
Abstract
Precise CRISPR-based DNA integration and editing remain challenging, largely because of insufficient control of the repair process. We find that repair at the genome-cargo interface is predictable by deep learning models and adheres to sequence-context-specific rules. On the basis of in silico predictions, we devised a strategy of base-pair tandem repeat repair arms matching microhomologies at double-strand breaks. These repeat homology arms promote frame-retentive cassette integration and reduce deletions both at the target site and within the transgene. We demonstrate precise integrations at 32 loci in HEK293T cells. Germline-transmissible transgene integration and endogenous protein tagging in Xenopus and adult mouse brains demonstrated precise integration during early embryonic cleavage and in nondividing, differentiated cells. Optimized repair arms also facilitated small edits for scarless single-nucleotide or double-nucleotide changes using oligonucleotide templates in vitro and in vivo. We provide the design tool Pythia to facilitate precise genomic integration and editing for experimental and therapeutic purposes for a wide range of target cell types and applications.
Fig. 1: Modeling predicted gene-editing outcomes using inDelphi while providing synthetic µHs. a, Predicted editing outcomes are shown using inDelphi (HEK293T) on synthetic DNA. Adding tandem repeats of the bases left of the CRISPR–Cas cut site to the right of the cut affected the predicted editing outcomes. Cumulative µH repair is defined as the percentage of editing outcomes that mobilize (delete) synthetic µHs during repair. Iterative recutting of products is not computationally modeled. b, Modeling of expected editing outcomes across 250,000 distinct gRNA target sites across human Chr1, when adding the 3 bp flanking the left side of the CRISPR–Cas cut site either as a single repeat (1×) or as tandem repeats (2×–8×). The percentage of repair by µH usage is shown. Box plots show the median, interquartile range (IQR) and whiskers extending to 1.5× the IQR with n = 250,000. c, Heat map highlighting the expected percentage of repair by µH as a function of the length of µH and the number of tandem repeats for 25 gRNAs, demonstrating that there is a sequence-context-specific optimal solution for maximizing the percentage of µH repair outcomes. d, Schematic of the experimental setup: PaqCI digestion releases the linear dsDNA donor, which contains 5× 3-bp µH tandem repeat arms, and is codelivered with RNP targeting AAVS1. e, Sequence of the target locus and 3-bp µH tandem repeat repair arms. f, After 14 days, flow cytometry indicates an increase in stable integration in cells transfected with the linear dsDNA template. g, Integration occurs specifically with PaqCI-linearized templates; circular templates show no detectable on-target integration. h, Quantification of integration efficiency of AAVS1 gRNA compared to a negative control gRNA. Statistical analysis was performed using an unpaired two-tailed t-test; P = 0.021 (n = 3 independent biological replicates). Error bars represent the s.d. i,j, The inDelphi HEK293T model accurately predicts the observed frequency of distinct editing outcomes in the µH tandem repeat arms at both junctions. Data points are the means of three independent biological replicates. A two-sided Pearson correlation was applied (i, r = 0.815, P = 0.00022; j, r = 0.969, P = 1.10 × 10⁻⁸). No multiple comparisons were performed. Some schematics were created with BioRender.com.
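The synthetic sequences modeled in Fig. 1a,b can be illustrated with a minimal Python sketch: take the bases immediately left of the cut site, repeat them in tandem, and place the repeats to the right of the cut. This is not the Pythia or inDelphi code; the function names, the 0-based cut_index convention and the toy sequence are assumptions made only for illustration, and no repair outcomes are predicted here.

```python
def tandem_repeat_arm(left_flank: str, mh_len: int = 3, n_repeats: int = 5) -> str:
    """Repeat the mh_len bases immediately left of the cut n_repeats times.

    Hypothetical helper: defaults mirror the 5x 3-bp arms named in the legend.
    """
    if mh_len <= 0 or n_repeats <= 0:
        raise ValueError("mh_len and n_repeats must be positive")
    if len(left_flank) < mh_len:
        raise ValueError("left flank is shorter than the microhomology length")
    return left_flank[-mh_len:].upper() * n_repeats


def synthetic_target(seq: str, cut_index: int, mh_len: int = 3, n_repeats: int = 5) -> str:
    """Insert tandem repeats of the left-flanking bases to the right of the cut.

    cut_index is an assumed convention: the cut lies between
    seq[cut_index - 1] and seq[cut_index].
    """
    if not 0 < cut_index < len(seq):
        raise ValueError("cut_index must fall inside the sequence")
    seq = seq.upper()
    left, right = seq[:cut_index], seq[cut_index:]
    return left + tandem_repeat_arm(left, mh_len, n_repeats) + right


if __name__ == "__main__":
    toy = "ACGTTGCAGGTCA"  # toy sequence, not a real locus
    for n in (1, 2, 5, 8):  # a few points from the 1x to 8x range in Fig. 1b
        print(n, synthetic_target(toy, cut_index=6, mh_len=3, n_repeats=n))
```

A sequence built this way would then be passed to a repair-outcome predictor; which microhomology length and repeat number work best is sequence-context specific (Fig. 1c), so the defaults above are only starting values.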
Fig. 3: µH tandem repeat-mediated integration at stable landing site h11 in X. tropicalis with germline transmission. a, Schematic of the CRISPR–Cas integration strategy. b, Mosaic but stable GFP expression after 5× 3-bp µH tandem repeat-mediated integration of a pCMV:eGFP in F0 founders at various developmental stages. c, Detection of PCR products demonstrating on-target integration into the h11 locus. d, Schematic of the CRISPR–Cas integration strategy, using only the h11-α RNP. e, Unilateral nonmosaic GFP expression in F0 founders because of pCMV:eGFP integration into the h11 locus at the two-cell stage (half-transgenic embryos). f, Nonintegrative mosaic expression pattern in muscle cells. Junction PCR analysis shows that this represents merely transient expression as correct junction products can only be detected in half-transgenic animals shown in e. g, Sequencing of junction products reveals usage of µH tandem repeats in 60% of reads (n = 5). h, Tissue-restricted expression pattern of pax8-CNS1:eGFP knocked in at the h11-α and h11-β loci in the F0 generation by five µH tandem repeats is observed in 7% of the injected embryos (n = 133). i, Benchtop mesoSPIM whole-organ imaging of a kidney from an adult F0 pax8-CNS1:eGFP founder, confirming stable integration and expression in renal tubules amenable for U-Net-based segmentation. j, Reporter expression in the embryonic kidneys of the F2 generation. k, Tissue-restricted expression pattern of CarAct:dsRed knocked in at the h11-α locus in the F0 generation by eight µH tandem repeats is observed in 8.6% of injected embryos (n = 35). l, Benchtop mesoSPIM imaging of F1 and F2 CarAct:dsRed knock-in animals revealing stable and strong tissue-restricted transgene expression.
Fig. 4: Endogenous fluorescent protein tagging in X. tropicalis. a, Schematic representation of the repair templates for endogenous gene tagging. Coding sequences linked with GSG linkers. b, Unilateral (Myh9 and Acta2) and bilateral (Ncam1) mBaoJin expression in F0 animals because of endogenous gene tagging. Scale bars, 500 μm. c, Imaging of tagged Myh9 in a living stage 45 tadpole. Top left, kidney tubules with a luminal Myh9 layer (*tubular lumen) and Myh9 signal in intertubular fibroblasts. Top right, epidermal cells showcasing the role of Myh9 in cell–cell adhesions. Bottom right, live imaging of actomyosin dynamics within cell–cell boundaries. Scale bars, 10 μm (top) and 5 μm (bottom). d, Imaging of tagged Acta2 in a living stage 45 tadpole. Left, overview showing fluorescence signal in intestinal smooth muscle cells (SMCs), vascular SMCs, heart muscle and skeletal muscle. Line-scanning artifacts in heart muscle because of heartbeat during acquisition. Gamma correction of 0.2 because of strong signal from intestinal SMCs. Top right, vascular SMCs wrapping around developing blood vessels. Bottom right, actomyosin network of the two perpendicular layers of intestinal SMCs. Scale bars, 250 μm (left) and 25 μm (right). e, Imaging of tagged Ncam1 in a living stage 45 tadpole. Expression of Ncam1 in the central and peripheral nervous system. Bottom right, spinal cord with branching motor and sensory neurons. Scale bars, 200 μm (top and bottom left) and 50 μm (bottom right). f, mBaoJin signal (cyan), immunofluorescence staining (red) and overlay in stage 45 fixed tadpoles. Top, intracellular Myh9 network in the epidermis. Middle top, intestinal SMCs in a unilaterally transgenic tadpole. Bilateral origin of SMCs leading to mosaic expression of labeled Acta2. Middle bottom, striated skeletal muscle. Bottom, tail motor neuron. Scale bars, 10 μm. g, Repair outcomes of genome–cassette boundaries. h, Western blots detecting tagged endogenous protein.
Adapted with permission from Springer Nature on behalf of Nature Biotechnology: Naert et al. (2025). Precise, predictable genome integrations by deep-learning-assisted design of microhomology-based templates. Nat Biotechnol 2025 Aug 12; doi: 10.1038/s41587-025-02771-0.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Last Updated: 2025-09-10