Post Job Free

Resume

Sign in

Design Water

Location:
San Diego, CA
Posted:
November 20, 2012

Contact this candidate

Resume:

J Comput Aided Mol Des (****) **:***-***

DOI **.*007/s10822-007-9159-2

An improved relaxed complex scheme for receptor exibility

in computer-aided drug design

Rommie E. Amaro Riccardo Baron

J. Andrew McCammon

Received: 12 July 2007 / Accepted: 21 November 2007 / Published online: 15 January 2008

Springer Science+Business Media B.V. 2008

Abstract The interactions among associating (macro)mol- to virtual screening, more rigorous characterization of local

ecules are dynamic, which adds to the complexity of and global binding effects, and methods to improve its

molecular recognition. While ligand exibility is well computational ef ciency by reducing the receptor ensem-

accounted for in computational drug design, the effective ble to a representative set of con gurations. The choice of

inclusion of receptor exibility remains an important receptor ensemble, its in uence on the predictive power of

challenge. The relaxed complex scheme (RCS) is a RCS, and the current limitations for an accurate treatment

promising computational methodology that combines the of the solvent contributions are also brie y discussed.

advantages of docking algorithms with dynamic structural Finally, we outline potential methodological improvements

information provided by molecular dynamics (MD) simu- that we anticipate will assist future development.

lations, therefore explicitly accounting for the exibility

Keywords Clustering Docking

of both the receptor and the docked ligands. Here, we

Ensemble-based docking

brie y review the RCS and discuss new extensions and

Kinetoplastid RNA editing ligase 1 Molecular dynamics

improvements of this methodology in the context of ligand

Non-redundant ensemble Protein ligand binding

binding to two example targets: kinetoplastid RNA editing

Relaxed complex method Representative ensemble

ligase 1 and the W191G cavity mutant of cytochrome

Virtual screening W191G cytochrome c peroxidase

c peroxidase. The RCS improvements include its extension

Rommie E. Amaro and Riccardo Baron contributed equally to this

Abbreviations

work.

GA Genetic algorithm

R. E. Amaro R. Baron J. A. McCammon KREL1 Kinetoplastid RNA editing ligase 1

Department of Chemistry and Biochemistry, University of MD Molecular dynamics

California at San Diego, La Jolla, CA 92093-0365, USA

RCS Relaxed complex scheme

e-mail: abpjoj@r.postjobfree.com

RMSD Root-mean-square deviation

R. Baron

W191G W191G cavity mutant of cytochrome

e-mail: abpjoj@r.postjobfree.com

c peroxidase

R. E. Amaro R. Baron J. A. McCammon

Center for Theoretical Biological Physics, University of

California at San Diego, La Jolla, CA 92039-0365, USA Introduction

J. A. McCammon

A full understanding of molecular recognition presents a

Department of Pharmacology, University of California at San

problem of intense interest to the eld of computer-aided

Diego, La Jolla, CA 92093-0365, USA

drug design and molecular sciences in general. The inter-

J. A. McCammon

actions between ligand molecules and their corresponding

Howard Hughes Medical Institute, University of California at

receptors are dynamic and complex. Techniques that best

San Diego, La Jolla, CA 92093-0365, USA

123

694 J Comput Aided Mol Des (2008) 22:693 705

Fig. 1 The problem: how to

distill a few good binders and

characterize their binding

propensity out of a vast database

of compounds

address these issues must account for the conformational Hybrid methods, which are faster but more approximate,

exibility of both the ligand and the receptor and do so in have been developed with the aim of reducing a large

initial set of potential binders (*103 or more), to a reduced

an accurate and ef cient manner. While the ability to

set of promising molecules (*101), for which the binding

explore ligand exibility is well established, computer-

aided drug design methodologies have only recently begun properties can then be investigated in a second phase that

to take receptor exibility into account when searching for uses more rigorous methods to predict binding free ener-

and optimizing functional inhibitors. Since it is widely gies (Fig. 1). Examples of these hybrid methods are linear

accepted that ligands may bind to receptor conformations interaction energy (LIE) [37 39], single-step perturbation

that occur infrequently in the receptor s dynamics, and that [40 45], docking to MD structures [44, 46], docking to

the local motions of active site residues can drastically alter relevant normal modes structures [47 49], induced- t

the binding and speci city of ligands to their target, the docking (IFD) [50], the dynamic pharmacophore model

ability to ef ciently sample these rare dynamics and fur- [51, 52] and the relaxed complex scheme (RCS) [53 55].

thermore, to incorporate the resulting conformations into Alternatively, the receptor ensemble can be gathered from

the drug design protocol, remains an important challenge a collection of independent X-ray or NMR experimentally

(reviewed in references [1 5]). derived structures [56, 57]. These hybrid approaches

A closely related challenge is the development of effec- encompass different levels of accuracy, predictive power,

tive methods to predict the binding propensity for series and level of a priori knowledge required. For example, LIE

of compounds or exible peptides to a given receptor [6 11]. and single-step perturbation can be considered non-

Approaches based on scoring functions or compound empirical in that they are derived from free energy type

libraries require large amounts of data to be available a approaches and based on more complex physical models of

priori. These methods include computational virtual the binding process. Yet, in practice they need precise

screening [12 18], docking [18 23], and similarity search- information on the location of the binding site to be

ing [24, 25]. More advanced treatments of receptor ligand available a priori. Conversely, docking-based techniques

binding can be achieved using molecular dynamics (MD) are appealing for screening purposes because they do not

simulations. Free energy changes can then be estimated necessarily require any information on the location of the

based on coupling-parameter approaches, such as thermo- binding site, and can therefore be employed to predict

dynamic integration (TI) and free energy perturbation (FEP) binding site locations. In principle, these docking-based

[11, 26 36], which describe a higher physical complexity of approaches should also be more easily extendable to bio-

the binding process and include an extensive sampling of logically relevant systems with increasing size and

receptor, ligand, and solvent phase spaces. Importantly, the complexity, such as protein nucleotide and protein protein

latter methods provide not only binding free energy esti- association. However, they cannot supply accurate esti-

mates, but also a reliable measure of their accuracy. Yet, mates of free energy changes upon binding.

they are typically too computationally expensive to be The RCS is a promising computational methodology

applied to extensive sets (*105) of compounds, which is the that combines the advantages of docking algorithms

usual scenario for newly discovered biological targets. with dynamic structural information provided by MD

123

J Comput Aided Mol Des (2008) 22:693 705 695

Fig. 2 An overview of the RCS. Improvements to the RCS are shown available ligands can be found in the Zinc Is Not Commercial

in gray background and those speci cally presented in this paper are (ZINC), National Cancer Institute (NCI), and Available Chemicals

outlined in red. In the Receptor Ensemble box (top left), the Database (ACD), among others. AutoDock is then used to dock the

structures can be generated with classical MD, or a variety of simula- ligand database into the receptor ensemble. In the Post-Processing

tion techniques could be considered in order to enhance the sampling stage, the docked complexes can be rescored or reevaluated using

of the receptor con gurational space, including: Generalized-Born more rigorous protocols than the AutoDock version 4.0 scoring

MD (GB-MD), steered MD (SMD), high temperature MD (High T function (AD4), including molecular-mechanics Poisson Boltzmann

MD), targeted MD (TMD), and accelerated MD (Accl. MD). In the surface area (MM-PBSA), single step perturbation, LIE, and FEP or

Ligand Ensemble box (top right), commercially or publicly TI techniques

simulations, explicitly accounting for the exibility of Materials and methods

both the receptor and docked ligands. This procedure is

Relaxed complex scheme: short overview

appealing as a large variety of conformational changes

may characterize ligand binding processes of biochemical

In the typical RCS (Fig. 2), all-atom MD simulations are

and medical interest and, more generally, molecular rec-

carried out for the target biomolecule of interest, with a

ognition. The RCS has been developed in combination

substrate or inhibitor bound in the active site, starting from

with various MD software packages and AutoDock for the

the crystal structure with a bound ligand (i.e., the holo

ligand docking. Although other docking programs can be

complex). Typical simulation lengths range from 2 ns to

considered, all RCS applications to date have employed

tens of ns, and snapshots of the biomolecule are extracted

AutoDock, a widely distributed and tested docking pro-

at a predetermined time interval (e.g., every 10 ps). RCS

gram that has been shown to be successful in a variety of

calculations based on explicit solvent MD simulations of

docking studies [20 22]. The RCS was rst applied to the

two different systems are presented in this work. First, the

FKBP binding protein [53] and tested using improved re-

scoring functions based on MM-PBSA models [54]. kinetoplastid RNA editing ligase 1 (KREL1), which uses

the NAMD2.6 MD software [58] (freely available at http://

Applications of the RCS identi ed a novel-binding trench

www.ks.uiuc.edu/Research/namd/) with the Charmm27 force

in HIV integrase [55]. In this work, we sketch the phi-

losophy underlying the RCS, describe new improvements, eld [59]. Second, the W191G cavity mutant of cytochrome c

peroxidase (W191G), based on simulations performed with

and present recent applications to exemplify the type of

the GROMOS05 software for biomolecular simulation [60]

problems that can be tackled with this computational

scheme. (available at http://www.igc.ethz.ch/gromos/) using the 45A4

123

696 J Comput Aided Mol Des (2008) 22:693 705

the predicted binding energies, they can be prohibitively

parameter set [61] of the GROMOS force eld [62]. Details

computationally expensive.

of the MD for each system are described in References [63]

and [64], respectively. The resulting set of structures, gen-

erated with a physically based MD force eld, represents the

Improved relaxed complex scheme

receptor ensemble and can be conceptually thought of as a set

of structures de ning approximately its thermodynamic

A rst set of improvements involves the docking algorithm

equilibrium state in solution. This receptor ensemble is sub-

itself as implemented in AutoDock version 4.0: (i) a more

sequently used in the docking experiments, in which a

complete thermodynamic cycle, where the unbound (gas

reduced set of small molecules are docked into the active site

phase) ligand enthalpy is computed, (ii) an improved des-

and the corresponding binding af nities are evaluated.

olvation term that accounts for a larger number of atom

AutoDock is used to carry out the docking experiments

types than in the previous versions, and (iii) a charge model

and full ligand exibility is employed. One of the major

that allows fast calculation of the charge distribution [67]

advantages of AutoDock is its use of a hybrid genetic algo-

and compatibility of partial charges between the ligand and

rithm (GA) to perform an ef cient and effective global

the receptor structures [66]. The studies presented here

search for the ligand [65]. Genetic algorithms are optimi-

employed this new and improved version of AutoDock

zation schemes that use the language of natural genetics and

(freely available at http://www.autodock.scripps.edu/).

evolution, and in the case of AutoDock, the optimization

A second set of improvements involves the RCS meth-

problem is molecular docking between a ligand and a

odology itself, as described in the following, based on

receptor. Typically, the receptor is xed and the translation,

recent applications. The rst extension to the RCS we

orientation, and conformation of the ligand are explored.

present here is the application of the method for virtual

Genetically derived terms such as the AutoDock chromo-

screening, which involves an essential enzyme for the

some, which describes the ligand state, de ne its

protozoan parasite Trypanosoma brucei. The discovery of

genotype and the atomic coordinates of the ligand, which

several new inhibitors was the result of a streamlined RCS

describes its phenotype, undergo genetic events such as

method, providing a concrete example of its success when

selection, crossover, and mutation during the optimiza-

trying to discover new inhibitors from a large database of

tion procedure.

compounds [68]. The second methodological advancement

The AutoDock chromosome consists of a string of real-

for RCS involves accounting for both local-induced and

valued genes containing three cartesian coordinates for

global effects of ligand binding. This is shown with the

ligand translation, four variables de ning a quaternion that

well-characterized binding of a set of heterocyclic cation

speci es the ligand orientation, and one real-value for each

ligands to the W191G cavity mutant of cytochrome c

ligand torsion [65]. The global search is carried out on the

peroxidase [69]. The third improvement attempts to de ne

genotype level and performed with the GA, which allows

two general algorithms to reduce the number of MD tra-

selection, crossover, and mutation. The ligand-receptor

jectory snapshots for the docking experiments, which

tness is evaluated based on a semi-empirical scoring

increases computational ef ciency by orders of magnitude

function including an empirical estimate for the ligand

without decreasing its accuracy. First, the KREL1 appli-

con gurational entropy [66]. The global search is followed

cation uses the QR factorization method available in the

by an adaptive-stepping local search that performs energy

MultiSeq plugin in VMD [70]. Second, the W191G cavity

minimizations on the atomic coordinates. Afterwards, the

mutant of cytochrome c peroxidase (W191G) uses an

optimized phenotype is fed back to the genotype, in

atom-positional root-mean-square deviation (RMSD)

accordance with Lamarckian genetics, from which the

clustering algorithm [71] as implemented in the rmsdmat2

algorithm derives its name. Ultimately, solutions better

and cluster2 programs of the GROMOS++ analysis soft-

suited to speci c interactions have a better score, therefore

ware [60]. Last, we discuss the importance and the

reproduce and persist, whereas poorer suited ones die off.

dif culties of including explicit water molecules within the

The re-docking of the ligands across the ensemble of

binding sites in the RCS docking experiments.

receptor structures results in a range of predicted binding

af nities for each ligand, based on the AutoDock scoring

function. The resulting binding spectrum for each ligand New applications

is then used to reorder the ligands and better predict rela-

RCS as a tool for enhanced virtual screening

tive af nity. Various post-processing options can be

considered beyond the initial af nity estimate provided by

AutoDock, including the application of MM-PBSA, single- Given a novel protein target, the goal of identifying a new

set of potential inhibitors with drug-like properties can be

step perturbation, LIE, FEP, or TI (Fig. 2). Although more

rigorous free energy estimates increase the con dence in achieved using virtual screening type approaches. Typically

123

J Comput Aided Mol Des (2008) 22:693 705 697

(i.e., the reorganization of residues upon induced- t bind-

these large-scale virtual screens are carried out by evalu-

ing) or global (i.e., larger scale conformational changes

ating the predicted af nities of thousands of molecules

occurring also in remote structural elements of the receptor

against a single static crystal structure [15, 72]. Here we

upon binding) effects. Our current structural knowledge of

report on the success of porting the RCS into a virtual

biomolecular association phenomena is predominantly

screen type application in the search for inhibitors against

based on X-ray crystallography ensemble-averaged struc-

an essential kinetoplastid RNA editing ligase 1 (KREL1)

tures. Although these experiments provide critical binding

in T. brucei, the parasite responsible for the devastating

information, they typically capture only one state involved

tropical disease African sleeping sickness [68]. KREL1 is

in the binding process, which may be a dominant con gu-

required for survival of both the insect and bloodstream

ration, but not necessarily exclusive. Dynamic information

forms of the parasite [73], and it is a particularly attractive

at the atomistic level, as provided by MD simulations, is of

drug target as there are no known human homologues. The

fundamental importance and may reveal binding modes and

high-resolution crystal structure [74] provides an excellent

relevant biophysical information otherwise inaccessible to

platform for computer-aided drug design as well as for MD

standard experimental techniques.

simulations and the RCS application.

A relevant example of the importance of predicting

The KREL1 crystal structure revealed a deep active site

receptor- exibility effects resulted from the application of

pocket with several water molecules coordinated to the

RCS to HIV integrase. MD simulations of the integrase

ATP substrate and the protein. Two 20 ns simulations in

protein bound to a known inhibitor revealed a new cavity

explicit solvent were carried out with KREL1, both with

adjacent to the active site [55]. RCS docking of ligands into

and without the bound ATP in order to generate the

this newly discovered pocket indicated favorable binding

receptor ensembles [63]. A screen of the crystal structure

of ligands to this area. This new structural insight was

against the NCI diversity set (containing 1,900 compounds)

exploited in the development of raltegravir (MK-0518), the

using AutoDock version 4.0 was performed, and the top

rst of a new class of antiretroviral agents active against

twenty- ve compounds that obeyed most of Lipinski s

the enzyme integrase that has recently been approved by

rules of 5 [75] were selected for application of the RCS

the FDA [76].

method. The top 25 ligands were then re-docked into the

As the binding propensity de ning a given molecular

full receptor ensemble as well as a reduced representative

association re ects the relative stabilities of the possible

set (discussed in further detail below) and these compounds

conformations of the receptor, effective drug-design pro-

were then re-ranked based on their average binding energy

tocols should be based on a distribution of receptor

of the most populated cluster.

conformations. In this respect, RCS has the advantage of

The results of the RCS virtual screen with KREL1 are

requiring the generation of only one MD ensemble per

particularly promising. Several new inhibitors have been

receptor macro-state (e.g., the open or closed state of a loop

identi ed with the RCS and an in vitro inhibition assay of

gating the binding pocket). This has recently been sys-

the rst step in the binding reaction, the adenylation step,

tematically investigated in the case of the W191G cavity

was used to verify the computational predictions. These

mutant of cytochrome c peroxidase by analyzing the

experiments con rmed two of the eight tested compounds

docking of small ligands into alternative ensembles of

found in the initial screen were inhibitory [68]. Impor-

receptor conformations [69].

tantly, the RCS method resulted in a reordering of the

The binding of heterocyclic cation ligands into the

twenty- ve compounds that identi ed inhibitors that would

W191G engineered cavity has been characterized experi-

not have otherwise been tested, based on their rank from

mentally [77 81]. Mutation of this key tryptophan in the

the static crystal structure screen. Speci cally, the best hit

as experimentally veri ed was initially ranked fteenth, active site creates a ligand-binding cavity and also appears

and after RCS reordering became rst. In the case of to increase local exibility, which opens a loop-gated

limited resources and low-throughput experimental proce- pathway for ligands to reach the buried cavity. Recently,

dures, where only a handful of the best compounds MD simulations suggested the importance of induced- t

identi ed in the screen could be experimentally tested, the effects in the W191G cavity for binding of 2-amino-5-

application of the RCS method provided a measurable and methylthiazole (2a5mt) [64]. X-ray crystallography

important enrichment of the initial ranked set. experiments have elucidated the structures of several

ligand-protein complexes, including those for which the

Accounting for induced t and the global effects loop rearrangement is more pronounced and causes a shift

of ligand binding between the closed- and open-gate structural ensembles.

Benzimidazole (bzi) was suggested to produce a full

In nature, a great number of protein ligand recognition opening of the cavity [78]. MD simulations starting

processes are only possible when accompanied by local from different initial con gurations characterized the

123

698 J Comput Aided Mol Des (2008) 22:693 705

conformational sampling and dominant con gurations of ensemble. Although both the holo and apo receptor

the closed and open alternate states (Fig. 3a). ensembles generate ligand-binding poses (i.e., the geome-

RCS calculations were performed on different gating- try of a docked ligand into the binding site) that are similar

loop and binding states: the closed-gate apo, the closed- to those determined experimentally, the 2a5mt binding

gate holo (i.e., the complex with the best binder), and the af nities are closer to the experimental results for the holo

open-gate apo structure, allowing the investigation of the ensemble (Fig. 3c). This illustrates that the holo ensemble

correlation between each compound s binding af nity and is the best choice to perform RCS calculations for the

the closed/open state of the gating loop (Fig. 3b) [69]. 2a5mt ligand, and suggests the same is likely the case for

Additional in silico experiments evaluated the bene ts of other ligands with similar chemical and electrostatic

using non-standard MD trajectories to enhance the con- properties [69].

formational sampling (e.g., simulations at high temperature A different picture emerges when bzi binds to the same

using atom-positional restraining potentials) or simulate an cavity. In this case, the best agreement between RCS

unphysical generalized-ligand interaction encompassing af nities and the experimental free energies is found when

the characteristics of all potential binders [69]. In the case using the apo-open receptor ensemble. This agrees with the

of 2a5mt, the optimal binding spectrum occurs when experimental observation that bzi shifts the propensity of

docked into the receptor conformations from the holo the gating-loop con gurations towards the open-gate state.

Fig. 3 (a) The W191G cavity mutant of cytochrome c peroxidase conformational states of the gating loop the probability distributions

and its two dominant con gurations extracted using an RMSD of the binding af nities from RCS calculations are shown as based on

conformational clustering analysis for the gating-loop and MD the apo (black), holo (red), and apo open-gate (green) receptor

simulations of the separate states. The closed (blue) and open ensemble simulations. The dashed-vertical lines correspond to the

(yellow) gate states are highlighted, together with Asp 235, the experimental free energies of binding. Docking poses for 2a5mt (c)

residue determining the orientation of the binders in the cavity. The and bzi (d) are displayed from corresponding crystal structures

heme cofactor is shown in red. (b) Binding propensities of the best (yellow) and the RCS calculations based on MD simulations of the

binder (2a5mt) and of the binder suggested to induce the full opening apo (black), holo (red), and apo open-gate (green) receptors

of the gating loop (bzi) are shown [69]. For each of the two

123

J Comput Aided Mol Des (2008) 22:693 705 699

Again, the ligand-binding poses (i.e., the relative orienta- multi-dimensional QR factorization algorithm is applied to

tion of the docked ligand into the W191G arti cial cavity) the encoded alignment of receptor structures, which results

are very similar between the RCS method and the crys- in a reordering of the structures based on increasing linear

tallographic complexes (Fig. 3c). Although a false negative dependence. This reordering subsequently allows the

is found when docking bzi to the ensemble of apo receptor construction of non-redundant sets of structures at some

structures, bzi binds favorably and with a binding mode user-de ned cutoff, representing a certain dynamical con-

similar to experiment when using the closed-gate holo guration space.

ensemble (Fig 3d). These promising results suggest that it The application of this method to the receptor ensemble

may be possible to capture different binding propensities of RNA editing ligase 1 resulted in the initial set of 400

depending on both local and global receptor rearrange- structures (extracted every 50 ps from a 20 ns simulation)

ments upon binding. being reduced to 33, with essentially no loss of binding

spectrum information (Fig. 4a). When docking a large set

of ligands, as is required in a virtual-screen type applica-

Effective reduction of the receptor ensemble tion, the reduction of receptor structures for the

computational dockings can make a signi cant difference

In the original RCS, the computational docking experi- in computational cost. For example, for RNA editing

ments were carried out using snapshots extracted at equal ligase, the number of dockings was reduced from 11,200 to

time intervals from the MD trajectories. As the simulations 924, resulting in a 90% reduction of computational cost

are carried out for several nanoseconds, this typically sums [68].

up to *104 to 105 receptor structures, many of which may An alternative method is clustering based on a matrix of

be conformationally redundant. Two recent studies have all pair-wise RMSD of the aligned structures in the

investigated alternative methods to distill the structural receptor ensemble. If the binding region of the receptor is

information to a reduced, yet meaningful set. known a priori the clustering algorithm can focus on this

A novel method to distill the ensemble of structures to a particular subset of residues that constitute the binding site.

non-redundant set is the so-called QR factorization The employed algorithm was originally developed to cap-

method. This technique was originally developed to ture the dominant con gurations of an ensemble of

remove inherent bias in structure databases and distill, from structures for exible peptides [71] and its application has

a vast quantity of redundant information, a minimal basis been further extended to exible molecules [86, 87] and

set of protein structures that accurately spans the evolu- protein surface loops [64]. After removing overall rotation

tionary phase space of a particular protein [82]. It has also and translations, the atoms of the binding region are

superimposed using their Ca atoms coordinates. A matrix

been applied to an ensemble of NMR structures in order to

determine a small, representative subset of structures from that contains all pair-wise RMSD values among all the

a larger experimental dataset [83] and to create non- structures in the trajectory is created. Next, the matrix is

redundant sequence alignments [84]. This technique has divided into batches corresponding to similar structures

most recently been incorporated into the RCS, where a using the RMSD values, with a clustering algorithm [71]

multiple structural alignment of the receptor ensemble is and a user-de ned cutoff. This clustering allows docking

performed with STAMP [85]. This alignment algorithm trials to be performed on a reduced number of signi cant

operates progressively: all possible pair-wise alignments conformations, while retaining the dominant characteristics

are computed, followed by a hierarchical clustering anal- of the entire spectrum of binding modes.

ysis based on a structural similarity measure to build the In the case of the W191G cavity mutant of cytochrome c

multiple structural alignment. The measure of structural peroxidase, re-docking into the entire ensemble of struc-

tures would require docking to 104 snapshots. However,

similarity applied here is QH, which essentially measures

the distance between all pairs of Ca atoms among all when these trajectory snapshots were clustered into groups

aligned structures. Although the development of QH was of similar con gurations with a RMSD similarity criterion

motivated by the need to include gaps in order to build a of 0.1 nm for the cavity residues, the resulting two most

similarity measure for more distantly related proteins, in dominant clusters of trajectory structures represented 36

the case of aligning the receptor ensemble, the gap term is and 16%, 81 and 10%, 48 and 23% of the structures for the

unnecessary as structures of the same protein are aligned. apo, holo, and apo-open ensembles, respectively (Fig. 4b).

The structural alignment is stored in a multidimensional This RMSD clustering resulted in a 99% reduction of

matrix of dimension maln 9 nreceptor structures 9 d, where d computational cost for the RCS docking stage [69].

encodes the rotated Ca atoms coordinates. In this matrix, In addition to improving the computational ef ciency of

each receptor structure is represented in a column and the RCS, clustering analyses can also supply useful infor-

the rows represent the multiple alignment. Finally, a mation about the exibility of the receptor, by analyzing

123

700 J Comput Aided Mol Des (2008) 22:693 705

Fig. 4 Reducing redundancy in the receptor ensemble. (a) Left (not all KREL1 structures are shown). Right panel: the initial (top) set

panel: Multidimensional QR factorization of KREL1 determines the of structures with the corresponding binding spectrum and the

distance relationship among all pairs of proteins (according to RMSD) reduced set (bottom) is shown. The similarity between the full and

and then reorders them based on increasing linear dependence, reduced binding spectrums indicates that there is virtually no loss of

allowing the distillation of a reduced, representative set of structures information. (b) Dominant con gurations of the W191G cavity region

for docking. At any particular QH threshold (indicated by red dotted as extracted from RMSD conformational clustering. For each separate

line at QH 0.86), at each point of intersection of a branch, the most MD ensemble the corresponding reference crystal con guration is

linearly independent structure is chosen from the group to the right of displayed (red thin lines) superimposed on the central member

the dotted line (each red open circle drawn at the branch intersection structures of the rst (yellow licorice) and second (green licorice)

indicates the choice of one structure to represent all structures to the most populated clusters.

right of the node). For clarity, the structure tree shown here is reduced

123

J Comput Aided Mol Des (2008) 22:693 705 701

possible choice of MD ensemble may depend on the

the number of structures representing a certain QH or

amount of knowledge available case by case. Additional

RMSD threshold cutoff (in the QR factorization method) or

scenarios can be considered as well, for example including

the cluster population versus the number of clusters pop-

homology-modeling type of approaches to generate the

ulated (in the RMSD clustering method). This type of

initial receptor structural con guration.

information gives quantitative insight about the local and

global exibility of the receptor. The computational gain

due to these types of clustering schemes seems particularly

Accurate description of solvent contributions

useful when screening large compound databases.

An effective representation of the solvent during the

docking trials is a major factor limiting the accuracy of

Choice of MD receptor ensemble and RCS predictive

docking calculations. Until recently, RCS calculations have

power

been performed using only the receptor structure and

ligand, even when the MD trajectory structures were gen-

One of the major challenges for hybrid docking techniques

erated in combination with explicit water models.

is the possibility to screen large compound databases and

Accounting for the speci c role of conserved water mole-

extract potential binders based on very limited a priori

cules is highly relevant as they may perturb the exibility

knowledge of the binding process itself. In this context, the

of a bound ligand, signi cantly alter the electrostatic

W191G cytochrome c peroxidase system is used as a

environment experienced by a small molecule, and even

platform to investigate how the choice of MD receptor

occlude potential areas of binding. The explicit inclusion of

ensemble for RCS calculations affects the predictive power

these contributions will certainly improve the accuracy of

of this methodology [69]. To re ect the different amounts

the ligand-binding description, similarly to what recently

of knowledge that may be available on the binding process,

reported for protein protein docking [88]. Here, we present

different typical scenarios in drug discovery were consid-

two new applications of the RCS that tested docking of the

ered, including cases in which: (i) no information is

ligands into MD generated receptor structures with and

available on the location of the binding site, (ii) X-ray

without cavity water molecules. What emerges from both

structures of the protein ligand complexes and knowledge

examples is the important (thermo)dynamic role of speci c

on potential binders is not available, and (iii) the X-ray

waters for the binding process.

structures known for the protein ligand complex do not

In the case of the RNA editing ligase, the KREL1 crystal

de ne unique ligand-binding poses. Corresponding to the

structure suggested three buried water molecules in the

above scenarios: (i) the RCS technique using the holo-

deep end of the ATP binding pocket. Explicitly solvated

receptor ensemble nds true positives using a docking grid

MD simulations of the holo complex (i.e., ATP bound)

that encompasses the entire W191G cytochrome c peroxi-

allowed us to predict the dynamics of the crystal water

dase structure; (ii) the number of true positives and true

molecules. The simulations indicated that these water

negatives can be signi cantly increased versus the number

molecules have different exchange rates, and that one of

of false positives and false negatives by employing an

the water molecules in particular, the one directly inter-

MD receptor ensemble containing a generalized type of

acting with both the protein and ATP, persists in its

unphysical ligand that re ects the main structural proper-

original location for the duration of the 20 ns simulation

ties of the compounds in the database, when compared to

[63]. In terms of the receptor structure, this single coordi-

equivalent docking calculations performed on the apo MD

nated water molecule acts as a structural scaffold that

receptor ensemble; and (iii) the multiple binding orienta-

prevents the localized collapse of surrounding residues

tions characterizing the true positives do not necessarily

while interacting with the bound ATP. By extracting the

correspond to non-accurate docking results when compared

water molecules from the structure before ligand docking,

to the raw electron density data from X-ray crystallogra-

an additional small cavity was open to the ligand. The RCS

phy. The quality of the binding poses can be judged by a

dockings were performed both with and without the three

combined evaluation of (i) the distribution of the docked-

conserved water molecules. Interestingly, the best inhibi-

ligand cluster populations versus the cluster number, (ii)

tors were identi ed when the water molecules were not

RMSD from the corresponding experimental complex after

included in the RCS dockings. The predicted docking poses

superimposing the structures as described above, and (iii)

and binding af nities differ signi cantly depending on

the comparison of the different poses for a same ligand.

whether the three explicit conserved water molecules are

These results open new possibilities for enhancing the

included in the dockings or not (data not shown). Impor-

predictive power of RCS calculations in so-called blind

tantly, at least two of the experimentally veri ed inhibitors

test cases (when information is missing concerning the

were predicted to substitute a functional group into the

binding process), and they also suggest that the best

123

702 J Comput Aided Mol Des (2008) 22:693 705

location where one of the crystal water molecules was static dockings may be an arti cial consequence of the

located (Fig. 5a) [68]. introduction of bound (static) water molecules in the cav-

In the W191G cytochrome c peroxidase application, ity, whereas locally disordered (dynamically swapping

several docking calculations were performed to investigate among the favorable sites) water molecules should be

the in uence of the crystallographic water sites on Auto- considered instead. We note that an accurate sampling of

Dock binding af nities and ligand poses. The tests were receptor, ligand, and solvent phase spaces is, in principle,

performed with rigid-protein docking calculations on dif- reached by more expensive free energy calculations [11,

ferent receptor models, alternatively including or excluding 26 36].

water site 308, which was suggested to be a highly con-

served location for a bound water based on X-ray structures

for a large group of compounds [81]. The results from the Future methodological improvements

docking calculations are in agreement with what was pre-

viously suggested by similar tests based on AutoDock The development of computational tools for computer-

version 3.0, which showed that the introduction of even a aided drug design depends on the critical compromise

single water molecule can signi cantly perturb the binding between accuracy and computational costs. Ideally, the

propensity of most of the ligands [89]. most reliable prediction of molecular af nity can be

While the effect on the predicted binding af nities and obtained through rigorous free energy calculations of the

poses is at variance with the speci c ligand, a systematic ligand-binding process [11, 26 36]. In practice, however,

consequence of the introduction of one water molecule into the CPU time typically required to perform such free

the W191G cavity is the signi cant reduction of the con- energy calculations on a few candidates (bottom of the

gurational space available to the ligand during the funnel diagram in Fig. 1) is comparable to that involved in

docking trials. MD simulations of the W191G cavity pre- rough geometrical recognition over a pool of molecules

dict the location of the highly favorable water sites (within more than ve orders of magnitude larger (top of the funnel

X-ray crystallography resolution and re nement assump- diagram in Fig. 1). Although the theory and methods are

tions) compared with the available experimental data well established for calculating free energy in practice,

(Fig. 5b). Additionally, MD simulations reveal a larger they are still prohibitively expensive to be employed in

number of favorable water sites, and allow the description high-throughput screening of drug-like databases. The

of the dynamic behavior of the solvent, including the future development of hybrid techniques, and especially

swapping of water molecules between highly favorable the RCS, is therefore twofold.

regions [64]. Based on these observations we suggest that First, it will certainly involve the re nement of the

the signi cant effects of including explicit water in the underlying physical models describing ligand-binding

Fig. 5 Solvent contributions in protein ligand binding. (a) The inhibitor replaces the location of a crystal water molecule. (b) For the

KREL1 active site with one of the newly discovered inhibitors in a W191G cavity (gray surface), the crystallographic water sites (solid

predicted docked conformation. KREL1 is shown in orange cartoon, red spheres; diameter corresponding to X-ray resolution) are

with the novel inhibitor shown docked in the active site (licorice, compared to the highly favorable average density regions of water

atom type colors). The three crystallographic water sites (not included molecules in the MD simulations (blue wireframe isosurfaces), for the

in the docking calculation) are shown in licorice with their van der best binder 2a5mt (yellow licorice) and from the 1AEN crystal

Waals surface in transparent. Note that the sulfonic acid group of the structure (red licorice)

123

J Comput Aided Mol Des (2008) 22:693 705 703

that reliably and ef ciently accommodates receptor exi-

(thermo)dynamics in increased detail, especially during the

bility is still lacking. The extensions and methodological

docking stage. Although the results presented here include

improvements to the RCS presented here take important

the unbound (gas phase) ligand enthalpy term at the

steps toward offering such a streamlined procedure. Our

docking stage [66], a complete description of the thermo-

examples indicate that alternative choices of receptor

dynamic cycle of binding is still far from being explicitly

ensembles can signi cantly alter the predictive power of

treated in RCS. Previous studies investigated the bene ts

RCS calculations, and that it is possible to reduce the

of rescoring the docked complexes using a more accurate

receptor ensemble to a non-redundant set of con gurations

(implicit solvent) description of the solvent contributions

by various techniques without losing relevant binding

[54]. More recently, the role of ligand entropy in the

information. Furthermore, both example systems indicate

re nement of protein ligand docking predictions has been

that the role of explicit water molecules in molecular

evaluated [90, 91]. Additionally, accurate con gurational

association remains one of the key components of com-

entropy calculations from MD simulations and a complete

puter-aided drug design methods to be further investigated.

quasi-harmonic analysis have demonstrated that the ther-



Contact this candidate