Slide Number

Protein Design

Hagen Fritsch
October 31st, 2011

Rostlab Hauptseminar, WS 2011/12

Outline

Motivation: Why do protein design?

The sky is the limit

Problem introduction


Secondary structure prediction:
still unsolved
  • How then possibly compute complete 3D structures?
  • Solution: build upon prior knowledge (more later)

Image: topsan.org

Problem introduction
Complexity: enormous

  • Cryptography: 128-bit cyphers still secure
  • Searchspace: 2128 ≈ 1038 ≈ 2030

    ⇒ i.e. a protein consisting of 30 residues*
    (20 possible amino acids at each position)

We cannot even enumerate all possible 30-peptid proteins

* not considering different side-chain conformations

Problem introduction
Complexity: enormous

Computational feasible
Cryptographically secure
Theoretical searchspace for simple proteins
Theoretical searchspace for typical proteins

* assuming a small rotamer library with a total of 155 residue conformations

Problem introduction
Complexity: enormous

  • Cryptography: 128-bit cyphers still secure
  • Searchspace: 2128 ≈ 1038 ≈ 2030
  • Solutions?
    • Smart algorithms, non-deterministic tricks
      (DEE, Monte Carlo, Genetic Algorithms…)
    • Complexity reduction
      (rotamer libraries, place-dependent amino acids likelyhoods)

Rotamer libraries (since 1987)

  • seamingly infinite conformational possibilities for one side-chain
  • Fix: build libraries with most frequent observed conformations (or some estimate)

Rotamer libraries (2)

Inverse Folding Problem (IFP) (1983)

  • Folding problem still unsolved
  • Inverse Folding Problem is simpler
    Given a 3D-representation of a protein backbone, find a sequence, that folds to the given structure.

Inverse Folding Problem (IFP) (1983)

Given a 3D-representation of a protein backbone, find a sequence, that folds to the given structure.
  • still: same complexity
  • but: a lot of sequences fold to the same structure
  • Goal: find any! (not: the best)
  • fixed-backbone algorithms

Energy functions

  • quantitative need for measurement of protein stability
  • stability ~ melting point
  • stable proteins are most likely to actually fit the assumed fold (cf. IFP)
  • molecular mechanics potential energy functions (MM-PEF, c.f. Boas & Harbury, 2007)

Energy functions: Protein Design Cycle (Bassil et. al., 1996)

  • Aim: improve energy functions to increase the accuracy and efficiency in computational protein design
  • Protein Design Cycle:
    1. → …RLIEYHTQD…: predict protein sequences likely to achieve a desired fold
    2. peptid synthesis
    3. quantitative analysis (predicted vs. actual stability)
    4. $\displaystyle{ E_{new} = E_{old} + \sum_{i,j} E_{exp}(i, j)} refined energy functions

Energy functions: recent progress (Lippow & Tidor, 2007)

  • improve to increase the accuracy and efficiency in computational protein design
  • recently:
    • DNA-protein interactions (Morozov et. al., 2005)
    • metall center interactions (Spiegel et. al., 2006)
    • solvent and solvent-mediated effects (Marshall et. al., 2005) (including a pairwise approximation)
    • placement of individual water molecules (Jiang et. al., 2005) (rotamer approach)

Smart Algorithms: Monte Carlo Method (Metropolis, 1949)

  • based on repeated random sampling

applied to protein design:

  1. initially: pick rotamers for a sequence at random
  2. next: rotamer replacement at a randomly picked residue
    ⇒ new energy: Enew
  3. accept any move with lower energy
  4. accept other moves based on temperature (→ simulated annealing)
  5. continue with step 2

Based on Desjarlais & Clarke (1998) and Voigt et. al. (2000), Image: Wikipedia

Smart Algorithms: Monte Carlo: Simulated Annealing (Metropolis, 1953)

  • idea stolen from metallurgy: heating and controlled cooling increase size of crystals and reduce defects
  • heat: molecules can move freely (escape local minima)

  • slow cooling: more chances to find lower internal energy configurations
$\displaystyle{ ρ = e^{\frac{E_{new} - E_{old}}{kT}} }
with T being the (virtual) temperature

Images on the right: Wikipedia

Smart Algorithms: Dead End Elimination (Desmet et. al., 1992)

  • deterministic algorithm to find the
    Global Minimum Energy Configuration
  • requires a pairwise energy description
Idea:$\displaystyle{ E(i_r) + \sum_{j\neq i}^{N} \min_{s} E(i_r, j_s) > E(i_t) + \sum_{j\neq i}^{N} \max_{s} E(i_t, j_s) }$
  • eliminate one residue after another ($\displaystyle{ O(n^2) } each)
    (n: total number of rotamers)
  • can be extended for pairs ($\displaystyle{ O(n^3) } each)
  • still slow, but dramatical improvent to $\displaystyle{ O(p^n) }

An example
Create a new enzyme for a specific reaction

Steps to cover

  1. Find a working target fold
  2. Remove all residues (backbone-only)
  3. Place side-chains and ligand (TS)
  4. Fill the rest of the protein (according to IFP)
  5. Generate a couple of target designs for experimental classification

An example
1. Find a working target fold

  • Manually define the catalysis mechanism:
    typically: stabilize a transition state
    ⇒ model of functional groups at active site
  • Find supporting backbone conformations:
    Inverse rotamer tree example (Zanghellini et. al., 2006)
    • DEZYMER program (Hellinga & Richards, 1990)
    • RosettaMatch approach
    • Inverse rotamer tree approach (Zanghellini et. al., 2006)
  • Find/Build an according scaffold

Based on Zanghellini et. al. (2006) and Röthlisberger (2008)

Glimpse to the Future


  • Design new protein folds: TOP7 (Kuhlmann et. al., 2003)

  • enhanced energy functions (Boas & Harbury, 2007; cf. also Lippow & Tidor, 2007)

  • conformational change after ligand binding
    Image: Flynn et. al. (2001)

  • multistep reactions
    (Jiang et. al., 2008)
Comparison of a designed model and the actual crystal structure (Kuhlmann et. al., 2008)

Thank you!