Design a smart mutation library
Provide a wild-type sequence — get an optimized library of multi-mutant variants ranked by predicted fitness, codon-optimized for your expression host, and ready to order from your synthesis vendor.
Provide a sequence
FASTA · SnapGene · GenBank · raw DNA · raw protein. Auto-detected.
Multiple CDS features detected — pick the gene to evolve
The longest CDS in a plasmid is usually the antibiotic resistance gene. Pick your gene of interest.
Tune the search
Sensible defaults — change if you know what you're after.
Generate
Scoring runs locally on this machine. Nothing is uploaded.
Your variant library will appear here
Provide a sequence above and click Generate smart library. Results render as a sortable variant table with a mutation map, PCR primers, and one-click synthesis ordering.
Library
Mutation landscape
Where mutations land across the protein. Height = how many variants share a mutation at this position.
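As a rough illustration of how the bar heights could be derived, the sketch below counts, for each residue position, how many variants carry a mutation there. It assumes mutation labels in the W58L style and counts by position rather than by exact substitution; the function name `landscape_heights` and the example data are hypothetical, not taken from DEE.

```python
import re
from collections import Counter

# Assumed label format: wild-type residue, 1-based position, mutant residue (e.g. W58L).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def landscape_heights(variants):
    """Count, per residue position, how many variants mutate that position."""
    heights = Counter()
    for mutations in variants:
        # A set ensures each variant contributes at most once per position.
        positions = set()
        for label in mutations:
            match = MUTATION_RE.match(label)
            if match is None:
                raise ValueError(f"unrecognized mutation label: {label!r}")
            positions.add(int(match.group(2)))
        heights.update(positions)
    return dict(heights)


# Hypothetical three-variant library.
example = [["W58L", "C49A"], ["W58L"], ["C49S", "K10R"]]
heights = landscape_heights(example)
```

Here positions 58 and 49 each get a height of 2 and position 10 a height of 1.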
| Rank ↕ | Mutations ↕ | Fitness ↕ | GC % ↕ | Tm (°C) ↕ | bp ↕ |
|---|---|---|---|---|---|
Library
Every library you've generated locally · ~/.dee/output/
History
Activity timeline · most recent first
Documentation
How DEE works, what to trust, how to cite it
What this tool does
DEE designs a smart mutation library for an existing protein using ESM-2 zero-shot scoring. You provide a wild-type sequence; you get an optimized library of multi-mutant variants ranked by predicted evolutionary fitness, codon-optimized for your expression host, ready to order from any DNA synthesis vendor.
This is not de novo protein design. It's directed-evolution library generation: improving an existing protein along an existing axis (stability, expression, mild activity tuning).
Four-stage pipeline
- Parse & translate. FASTA · SnapGene · GenBank · EMBL · raw DNA · raw protein. Plasmid files surface a CDS picker; raw DNA with multiple stops triggers 6-frame ORF discovery.
- Zero-shot scoring. ESM-2 computes ΔLL = log P(mutant | x_WT) − log P(WT | x_WT) for every position × 19 substitutions (Meier et al. 2021 wild-type marginal scheme).
- Combinatorial search. Simulated annealing over the top-percentile pool of single-site mutations. Multiple restarts; cumulative ΣΔLL as fitness; stop-codon and duplicate-position penalties.
- Codon optimization. Reverse-translate with host codon-usage table. Synonymously scrub BsaI / BsmBI / NotI sites for Golden Gate compatibility.
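The scoring stage can be sketched in a few lines once per-position log-probabilities are available. The example below is a minimal illustration, not DEE's implementation: `toy_log_probs` is a hypothetical stand-in for the ESM-2 forward pass on the unmasked wild-type sequence, and `wt_marginal_scores` and `variant_fitness` are illustrative names. Rejecting repeated positions outright is a simplification; the pipeline above describes a penalty instead.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def wt_marginal_scores(wt_seq, log_probs):
    """Score every single substitution as log P(mut | x_WT) - log P(WT | x_WT).

    log_probs[i][aa] is assumed to hold the model's log-probability of
    amino acid `aa` at position i, given the unmasked wild-type sequence.
    Returns {(position, wt_aa, mut_aa): delta_ll} with 1-based positions.
    """
    if len(log_probs) != len(wt_seq):
        raise ValueError("need one log-probability row per residue")
    scores = {}
    for i, wt_aa in enumerate(wt_seq):
        for aa in AMINO_ACIDS:
            if aa != wt_aa:
                scores[(i + 1, wt_aa, aa)] = log_probs[i][aa] - log_probs[i][wt_aa]
    return scores


def variant_fitness(mutations, scores):
    """Cumulative sum of delta-LL over a multi-mutant; rejects repeated positions."""
    positions = [pos for pos, _, _ in mutations]
    if len(set(positions)) != len(positions):
        raise ValueError("a variant may mutate each position at most once")
    return sum(scores[m] for m in mutations)


def toy_log_probs(wt_seq, wt_prob=0.6):
    """Stand-in for model output: WT residue gets wt_prob, the rest share 1 - wt_prob."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt_aa else other) for aa in AMINO_ACIDS}
        for wt_aa in wt_seq
    ]


wt = "MKT"
scores = wt_marginal_scores(wt, toy_log_probs(wt))
double_mutant = [(1, "M", "L"), (3, "T", "S")]
fitness = variant_fitness(double_mutant, scores)
```

With the toy distribution every substitution scores below zero, because the wild-type residue is always the most probable one. With real model output, positive ΔLL values mark substitutions the model considers more plausible than the wild type.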
Honest accuracy expectations
At 3–4 simultaneous mutations, expect 50–70% functional retention when screening. Protein language model (PLM)-guided libraries are typically 5–50× more enriched in functional hits than random mutagenesis at equal screening cost.
| Mutations / variant | Approx. functional retention |
|---|---|
| 1–2 | 70–85% |
| 3–4 | 50–70% |
| 5–6 | 30–55% |
| 7–8 | 15–40% |
| 9+ | < 25% |
Stay shallow. Cap Max mutations / variant at 3–4 unless you have a structural reason. Screen, don't trust.
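To turn the table into a screening estimate, multiply the number of variants screened by the retention range for the chosen mutation depth. The sketch below does that arithmetic; the helper name `expected_functional` is hypothetical, and treating the "9+" row as a 0–25% range is an assumption, since the table only gives an upper bound.

```python
# Retention ranges copied from the table above, keyed by (min, max) mutations per variant.
RETENTION = {
    (1, 2): (0.70, 0.85),
    (3, 4): (0.50, 0.70),
    (5, 6): (0.30, 0.55),
    (7, 8): (0.15, 0.40),
}
# The "9+" row only gives an upper bound (< 25%); a lower bound of 0 is an assumption.
DEEP_RETENTION = (0.0, 0.25)


def expected_functional(n_screened, n_mutations):
    """Return (low, high) expected count of functional variants among n_screened."""
    if n_mutations < 1:
        raise ValueError("n_mutations must be at least 1")
    low, high = DEEP_RETENTION
    for (lo_mut, hi_mut), bounds in RETENTION.items():
        if lo_mut <= n_mutations <= hi_mut:
            low, high = bounds
            break
    return n_screened * low, n_screened * high


low, high = expected_functional(96, 3)
print(f"96 variants at 3 mutations: roughly {low:.0f} to {high:.0f} functional")
```

For a 96-well plate at 3 mutations per variant this gives roughly 48 to 67 functional variants, which is why shallow libraries are the safer default.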
What ESM-2 can & can't see
- Sees: evolutionary plausibility from ~65 M UniRef50 sequences. Conservation patterns. Coevolution. Sequence context.
- Doesn't see: structure explicitly. Active-site residues with rare-but-essential roles. Epistatic incompatibilities between selected mutations. Predictions are weakest for membrane proteins, intrinsically disordered proteins (IDPs), and multi-domain assemblies.
What runs locally vs over the network
- Local: ESM-2 inference, simulated annealing, codon optimization, restriction-site scrubbing, primer design, CSV/Excel/GenBank export. Your sequences never leave this machine for these steps.
- Opt-in network: NCBI BLAST identification (sends your sequence to NCBI), AlphaFold-DB structure embed (fetches a public PDB by UniProt accession), synthesis-vendor redirects.
Filter syntax
In the variant filter input above the results table:
- C49: every variant mutating residue 49
- W58L: only variants with exactly that substitution
- gc>50, gc<60, tm>58, fitness>2, bp<800: numeric range filters
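The three filter forms above could be matched against a variant roughly as follows. This is a sketch under stated assumptions, not DEE's parser: the `matches` function, the variant dictionary layout, and the restriction to `<` and `>` operators are all hypothetical, and how multiple terms combine is left out because the text does not specify it.

```python
import operator
import re

SITE_RE = re.compile(r"^[A-Z]\d+$")          # e.g. C49: any mutation at residue 49
SUBST_RE = re.compile(r"^[A-Z]\d+[A-Z]$")    # e.g. W58L: one exact substitution
RANGE_RE = re.compile(r"^(gc|tm|fitness|bp)([<>])(\d+(?:\.\d+)?)$")


def matches(term, variant):
    """Return True if a single filter term matches the variant.

    variant is assumed to be a dict with a 'mutations' list (labels such as
    'W58L') plus numeric 'gc', 'tm', 'fitness', and 'bp' fields.
    """
    term = term.strip()
    range_match = RANGE_RE.match(term)
    if range_match:
        field, op, value = range_match.groups()
        compare = operator.gt if op == ">" else operator.lt
        return compare(variant[field], float(value))
    if SUBST_RE.match(term):
        return term in variant["mutations"]
    if SITE_RE.match(term):
        # Compare wild-type residue plus position, ignoring the mutant residue.
        return any(
            SUBST_RE.match(mut) and mut[:-1] == term
            for mut in variant["mutations"]
        )
    raise ValueError(f"unrecognized filter term: {term!r}")


# Hypothetical variant row.
variant = {"mutations": ["C49A", "W58L"], "gc": 55.0, "tm": 60.1, "fitness": 2.4, "bp": 750}
hits = [t for t in ["C49", "W58L", "W58F", "gc>50", "bp<700"] if matches(t, variant)]
```

For this variant, C49, W58L, and gc>50 match, while W58F and bp<700 do not.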
Citation
Lin et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379:1123–1130.
Meier et al. (2021). Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34.
Allawi & SantaLucia (1997). Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36:10581–10594.
Licenses of bundled components
- ESM-2 weights — MIT (Meta/FAIR)
- Transformers, Accelerate — Apache 2.0 (Hugging Face)
- PyTorch — BSD-style (Meta)
- Biopython — BSD-derived
- Mol* viewer — MIT (PDBe / RCSB)
- Inter, JetBrains Mono — SIL Open Font License
- Lucide icons — ISC