J Comput Aided Mol Des (****) **:***-***
An improved relaxed complex scheme for receptor exibility
in computer-aided drug design
Rommie E. Amaro Riccardo Baron
J. Andrew McCammon
Received: 12 July 2007 / Accepted: 21 November 2007 / Published online: 15 January 2008
Springer Science+Business Media B.V. 2008
Abstract The interactions among associating (macro)mol- to virtual screening, more rigorous characterization of local
ecules are dynamic, which adds to the complexity of and global binding effects, and methods to improve its
molecular recognition. While ligand exibility is well computational ef ciency by reducing the receptor ensem-
accounted for in computational drug design, the effective ble to a representative set of con gurations. The choice of
inclusion of receptor exibility remains an important receptor ensemble, its in uence on the predictive power of
challenge. The relaxed complex scheme (RCS) is a RCS, and the current limitations for an accurate treatment
promising computational methodology that combines the of the solvent contributions are also brie y discussed.
advantages of docking algorithms with dynamic structural Finally, we outline potential methodological improvements
information provided by molecular dynamics (MD) simu- that we anticipate will assist future development.
lations, therefore explicitly accounting for the exibility
Keywords Clustering Docking
of both the receptor and the docked ligands. Here, we
Ensemble-based docking
brie y review the RCS and discuss new extensions and
Kinetoplastid RNA editing ligase 1 Molecular dynamics
improvements of this methodology in the context of ligand
Non-redundant ensemble Protein ligand binding
binding to two example targets: kinetoplastid RNA editing
Relaxed complex method Representative ensemble
ligase 1 and the W191G cavity mutant of cytochrome
Virtual screening W191G cytochrome c peroxidase
c peroxidase. The RCS improvements include its extension
Rommie E. Amaro and Riccardo Baron contributed equally to this
Abbreviations
work.
GA Genetic algorithm
R. E. Amaro R. Baron J. A. McCammon KREL1 Kinetoplastid RNA editing ligase 1
Department of Chemistry and Biochemistry, University of MD Molecular dynamics
California at San Diego, La Jolla, CA 92093-0365, USA
RCS Relaxed complex scheme
e-mail: abpjoj@r.postjobfree.com
RMSD Root-mean-square deviation
R. Baron
W191G W191G cavity mutant of cytochrome
e-mail: abpjoj@r.postjobfree.com
c peroxidase
R. E. Amaro R. Baron J. A. McCammon
Center for Theoretical Biological Physics, University of
California at San Diego, La Jolla, CA 92039-0365, USA Introduction
J. A. McCammon
A full understanding of molecular recognition presents a
Department of Pharmacology, University of California at San
problem of intense interest to the eld of computer-aided
Diego, La Jolla, CA 92093-0365, USA
drug design and molecular sciences in general. The inter-
J. A. McCammon
actions between ligand molecules and their corresponding
Howard Hughes Medical Institute, University of California at
receptors are dynamic and complex. Techniques that best
San Diego, La Jolla, CA 92093-0365, USA
123
694 J Comput Aided Mol Des (2008) 22:693 705
Fig. 1 The problem: how to
distill a few good binders and
characterize their binding
propensity out of a vast database
of compounds
address these issues must account for the conformational Hybrid methods, which are faster but more approximate,
exibility of both the ligand and the receptor and do so in have been developed with the aim of reducing a large
initial set of potential binders (*103 or more), to a reduced
an accurate and ef cient manner. While the ability to
set of promising molecules (*101), for which the binding
explore ligand exibility is well established, computer-
aided drug design methodologies have only recently begun properties can then be investigated in a second phase that
to take receptor exibility into account when searching for uses more rigorous methods to predict binding free ener-
and optimizing functional inhibitors. Since it is widely gies (Fig. 1). Examples of these hybrid methods are linear
accepted that ligands may bind to receptor conformations interaction energy (LIE) [37 39], single-step perturbation
that occur infrequently in the receptor s dynamics, and that [40 45], docking to MD structures [44, 46], docking to
the local motions of active site residues can drastically alter relevant normal modes structures [47 49], induced- t
the binding and speci city of ligands to their target, the docking (IFD) [50], the dynamic pharmacophore model
ability to ef ciently sample these rare dynamics and fur- [51, 52] and the relaxed complex scheme (RCS) [53 55].
thermore, to incorporate the resulting conformations into Alternatively, the receptor ensemble can be gathered from
the drug design protocol, remains an important challenge a collection of independent X-ray or NMR experimentally
(reviewed in references [1 5]). derived structures [56, 57]. These hybrid approaches
A closely related challenge is the development of effec- encompass different levels of accuracy, predictive power,
tive methods to predict the binding propensity for series and level of a priori knowledge required. For example, LIE
of compounds or exible peptides to a given receptor [6 11]. and single-step perturbation can be considered non-
Approaches based on scoring functions or compound empirical in that they are derived from free energy type
libraries require large amounts of data to be available a approaches and based on more complex physical models of
priori. These methods include computational virtual the binding process. Yet, in practice they need precise
screening [12 18], docking [18 23], and similarity search- information on the location of the binding site to be
ing [24, 25]. More advanced treatments of receptor ligand available a priori. Conversely, docking-based techniques
binding can be achieved using molecular dynamics (MD) are appealing for screening purposes because they do not
simulations. Free energy changes can then be estimated necessarily require any information on the location of the
based on coupling-parameter approaches, such as thermo- binding site, and can therefore be employed to predict
dynamic integration (TI) and free energy perturbation (FEP) binding site locations. In principle, these docking-based
[11, 26 36], which describe a higher physical complexity of approaches should also be more easily extendable to bio-
the binding process and include an extensive sampling of logically relevant systems with increasing size and
receptor, ligand, and solvent phase spaces. Importantly, the complexity, such as protein nucleotide and protein protein
latter methods provide not only binding free energy esti- association. However, they cannot supply accurate esti-
mates, but also a reliable measure of their accuracy. Yet, mates of free energy changes upon binding.
they are typically too computationally expensive to be The RCS is a promising computational methodology
applied to extensive sets (*105) of compounds, which is the that combines the advantages of docking algorithms
usual scenario for newly discovered biological targets. with dynamic structural information provided by MD
123
J Comput Aided Mol Des (2008) 22:693 705 695
Fig. 2 An overview of the RCS. Improvements to the RCS are shown available ligands can be found in the Zinc Is Not Commercial
in gray background and those speci cally presented in this paper are (ZINC), National Cancer Institute (NCI), and Available Chemicals
outlined in red. In the Receptor Ensemble box (top left), the Database (ACD), among others. AutoDock is then used to dock the
structures can be generated with classical MD, or a variety of simula- ligand database into the receptor ensemble. In the Post-Processing
tion techniques could be considered in order to enhance the sampling stage, the docked complexes can be rescored or reevaluated using
of the receptor con gurational space, including: Generalized-Born more rigorous protocols than the AutoDock version 4.0 scoring
MD (GB-MD), steered MD (SMD), high temperature MD (High T function (AD4), including molecular-mechanics Poisson Boltzmann
MD), targeted MD (TMD), and accelerated MD (Accl. MD). In the surface area (MM-PBSA), single step perturbation, LIE, and FEP or
Ligand Ensemble box (top right), commercially or publicly TI techniques
simulations, explicitly accounting for the exibility of Materials and methods
both the receptor and docked ligands. This procedure is
Relaxed complex scheme: short overview
appealing as a large variety of conformational changes
may characterize ligand binding processes of biochemical
In the typical RCS (Fig. 2), all-atom MD simulations are
and medical interest and, more generally, molecular rec-
carried out for the target biomolecule of interest, with a
ognition. The RCS has been developed in combination
substrate or inhibitor bound in the active site, starting from
with various MD software packages and AutoDock for the
the crystal structure with a bound ligand (i.e., the holo
ligand docking. Although other docking programs can be
complex). Typical simulation lengths range from 2 ns to
considered, all RCS applications to date have employed
tens of ns, and snapshots of the biomolecule are extracted
AutoDock, a widely distributed and tested docking pro-
at a predetermined time interval (e.g., every 10 ps). RCS
gram that has been shown to be successful in a variety of
calculations based on explicit solvent MD simulations of
docking studies [20 22]. The RCS was rst applied to the
two different systems are presented in this work. First, the
FKBP binding protein [53] and tested using improved re-
scoring functions based on MM-PBSA models [54]. kinetoplastid RNA editing ligase 1 (KREL1), which uses
the NAMD2.6 MD software [58] (freely available at http://
Applications of the RCS identi ed a novel-binding trench
www.ks.uiuc.edu/Research/namd/) with the Charmm27 force
in HIV integrase [55]. In this work, we sketch the phi-
losophy underlying the RCS, describe new improvements, eld [59]. Second, the W191G cavity mutant of cytochrome c
peroxidase (W191G), based on simulations performed with
and present recent applications to exemplify the type of
the GROMOS05 software for biomolecular simulation [60]
problems that can be tackled with this computational
scheme. (available at http://www.igc.ethz.ch/gromos/) using the 45A4
123
696 J Comput Aided Mol Des (2008) 22:693 705
the predicted binding energies, they can be prohibitively
parameter set [61] of the GROMOS force eld [62]. Details
computationally expensive.
of the MD for each system are described in References [63]
and [64], respectively. The resulting set of structures, gen-
erated with a physically based MD force eld, represents the
Improved relaxed complex scheme
receptor ensemble and can be conceptually thought of as a set
of structures de ning approximately its thermodynamic
A rst set of improvements involves the docking algorithm
equilibrium state in solution. This receptor ensemble is sub-
itself as implemented in AutoDock version 4.0: (i) a more
sequently used in the docking experiments, in which a
complete thermodynamic cycle, where the unbound (gas
reduced set of small molecules are docked into the active site
phase) ligand enthalpy is computed, (ii) an improved des-
and the corresponding binding af nities are evaluated.
olvation term that accounts for a larger number of atom
AutoDock is used to carry out the docking experiments
types than in the previous versions, and (iii) a charge model
and full ligand exibility is employed. One of the major
that allows fast calculation of the charge distribution [67]
advantages of AutoDock is its use of a hybrid genetic algo-
and compatibility of partial charges between the ligand and
rithm (GA) to perform an ef cient and effective global
the receptor structures [66]. The studies presented here
search for the ligand [65]. Genetic algorithms are optimi-
employed this new and improved version of AutoDock
zation schemes that use the language of natural genetics and
(freely available at http://www.autodock.scripps.edu/).
evolution, and in the case of AutoDock, the optimization
A second set of improvements involves the RCS meth-
problem is molecular docking between a ligand and a
odology itself, as described in the following, based on
receptor. Typically, the receptor is xed and the translation,
recent applications. The rst extension to the RCS we
orientation, and conformation of the ligand are explored.
present here is the application of the method for virtual
Genetically derived terms such as the AutoDock chromo-
screening, which involves an essential enzyme for the
some, which describes the ligand state, de ne its
protozoan parasite Trypanosoma brucei. The discovery of
genotype and the atomic coordinates of the ligand, which
several new inhibitors was the result of a streamlined RCS
describes its phenotype, undergo genetic events such as
method, providing a concrete example of its success when
selection, crossover, and mutation during the optimiza-
trying to discover new inhibitors from a large database of
tion procedure.
compounds [68]. The second methodological advancement
The AutoDock chromosome consists of a string of real-
for RCS involves accounting for both local-induced and
valued genes containing three cartesian coordinates for
global effects of ligand binding. This is shown with the
ligand translation, four variables de ning a quaternion that
well-characterized binding of a set of heterocyclic cation
speci es the ligand orientation, and one real-value for each
ligands to the W191G cavity mutant of cytochrome c
ligand torsion [65]. The global search is carried out on the
peroxidase [69]. The third improvement attempts to de ne
genotype level and performed with the GA, which allows
two general algorithms to reduce the number of MD tra-
selection, crossover, and mutation. The ligand-receptor
jectory snapshots for the docking experiments, which
tness is evaluated based on a semi-empirical scoring
increases computational ef ciency by orders of magnitude
function including an empirical estimate for the ligand
without decreasing its accuracy. First, the KREL1 appli-
con gurational entropy [66]. The global search is followed
cation uses the QR factorization method available in the
by an adaptive-stepping local search that performs energy
MultiSeq plugin in VMD [70]. Second, the W191G cavity
minimizations on the atomic coordinates. Afterwards, the
mutant of cytochrome c peroxidase (W191G) uses an
optimized phenotype is fed back to the genotype, in
atom-positional root-mean-square deviation (RMSD)
accordance with Lamarckian genetics, from which the
clustering algorithm [71] as implemented in the rmsdmat2
algorithm derives its name. Ultimately, solutions better
and cluster2 programs of the GROMOS++ analysis soft-
suited to speci c interactions have a better score, therefore
ware [60]. Last, we discuss the importance and the
reproduce and persist, whereas poorer suited ones die off.
dif culties of including explicit water molecules within the
The re-docking of the ligands across the ensemble of
binding sites in the RCS docking experiments.
receptor structures results in a range of predicted binding
af nities for each ligand, based on the AutoDock scoring
function. The resulting binding spectrum for each ligand New applications
is then used to reorder the ligands and better predict rela-
RCS as a tool for enhanced virtual screening
tive af nity. Various post-processing options can be
considered beyond the initial af nity estimate provided by
AutoDock, including the application of MM-PBSA, single- Given a novel protein target, the goal of identifying a new
set of potential inhibitors with drug-like properties can be
step perturbation, LIE, FEP, or TI (Fig. 2). Although more
rigorous free energy estimates increase the con dence in achieved using virtual screening type approaches. Typically
123
J Comput Aided Mol Des (2008) 22:693 705 697
(i.e., the reorganization of residues upon induced- t bind-
these large-scale virtual screens are carried out by evalu-
ing) or global (i.e., larger scale conformational changes
ating the predicted af nities of thousands of molecules
occurring also in remote structural elements of the receptor
against a single static crystal structure [15, 72]. Here we
upon binding) effects. Our current structural knowledge of
report on the success of porting the RCS into a virtual
biomolecular association phenomena is predominantly
screen type application in the search for inhibitors against
based on X-ray crystallography ensemble-averaged struc-
an essential kinetoplastid RNA editing ligase 1 (KREL1)
tures. Although these experiments provide critical binding
in T. brucei, the parasite responsible for the devastating
information, they typically capture only one state involved
tropical disease African sleeping sickness [68]. KREL1 is
in the binding process, which may be a dominant con gu-
required for survival of both the insect and bloodstream
ration, but not necessarily exclusive. Dynamic information
forms of the parasite [73], and it is a particularly attractive
at the atomistic level, as provided by MD simulations, is of
drug target as there are no known human homologues. The
fundamental importance and may reveal binding modes and
high-resolution crystal structure [74] provides an excellent
relevant biophysical information otherwise inaccessible to
platform for computer-aided drug design as well as for MD
standard experimental techniques.
simulations and the RCS application.
A relevant example of the importance of predicting
The KREL1 crystal structure revealed a deep active site
receptor- exibility effects resulted from the application of
pocket with several water molecules coordinated to the
RCS to HIV integrase. MD simulations of the integrase
ATP substrate and the protein. Two 20 ns simulations in
protein bound to a known inhibitor revealed a new cavity
explicit solvent were carried out with KREL1, both with
adjacent to the active site [55]. RCS docking of ligands into
and without the bound ATP in order to generate the
this newly discovered pocket indicated favorable binding
receptor ensembles [63]. A screen of the crystal structure
of ligands to this area. This new structural insight was
against the NCI diversity set (containing 1,900 compounds)
exploited in the development of raltegravir (MK-0518), the
using AutoDock version 4.0 was performed, and the top
rst of a new class of antiretroviral agents active against
twenty- ve compounds that obeyed most of Lipinski s
the enzyme integrase that has recently been approved by
rules of 5 [75] were selected for application of the RCS
the FDA [76].
method. The top 25 ligands were then re-docked into the
As the binding propensity de ning a given molecular
full receptor ensemble as well as a reduced representative
association re ects the relative stabilities of the possible
set (discussed in further detail below) and these compounds
conformations of the receptor, effective drug-design pro-
were then re-ranked based on their average binding energy
tocols should be based on a distribution of receptor
of the most populated cluster.
conformations. In this respect, RCS has the advantage of
The results of the RCS virtual screen with KREL1 are
requiring the generation of only one MD ensemble per
particularly promising. Several new inhibitors have been
receptor macro-state (e.g., the open or closed state of a loop
identi ed with the RCS and an in vitro inhibition assay of
gating the binding pocket). This has recently been sys-
the rst step in the binding reaction, the adenylation step,
tematically investigated in the case of the W191G cavity
was used to verify the computational predictions. These
mutant of cytochrome c peroxidase by analyzing the
experiments con rmed two of the eight tested compounds
docking of small ligands into alternative ensembles of
found in the initial screen were inhibitory [68]. Impor-
receptor conformations [69].
tantly, the RCS method resulted in a reordering of the
The binding of heterocyclic cation ligands into the
twenty- ve compounds that identi ed inhibitors that would
W191G engineered cavity has been characterized experi-
not have otherwise been tested, based on their rank from
mentally [77 81]. Mutation of this key tryptophan in the
the static crystal structure screen. Speci cally, the best hit
as experimentally veri ed was initially ranked fteenth, active site creates a ligand-binding cavity and also appears
and after RCS reordering became rst. In the case of to increase local exibility, which opens a loop-gated
limited resources and low-throughput experimental proce- pathway for ligands to reach the buried cavity. Recently,
dures, where only a handful of the best compounds MD simulations suggested the importance of induced- t
identi ed in the screen could be experimentally tested, the effects in the W191G cavity for binding of 2-amino-5-
application of the RCS method provided a measurable and methylthiazole (2a5mt) [64]. X-ray crystallography
important enrichment of the initial ranked set. experiments have elucidated the structures of several
ligand-protein complexes, including those for which the
Accounting for induced t and the global effects loop rearrangement is more pronounced and causes a shift
of ligand binding between the closed- and open-gate structural ensembles.
Benzimidazole (bzi) was suggested to produce a full
In nature, a great number of protein ligand recognition opening of the cavity [78]. MD simulations starting
processes are only possible when accompanied by local from different initial con gurations characterized the
123
698 J Comput Aided Mol Des (2008) 22:693 705
conformational sampling and dominant con gurations of ensemble. Although both the holo and apo receptor
the closed and open alternate states (Fig. 3a). ensembles generate ligand-binding poses (i.e., the geome-
RCS calculations were performed on different gating- try of a docked ligand into the binding site) that are similar
loop and binding states: the closed-gate apo, the closed- to those determined experimentally, the 2a5mt binding
gate holo (i.e., the complex with the best binder), and the af nities are closer to the experimental results for the holo
open-gate apo structure, allowing the investigation of the ensemble (Fig. 3c). This illustrates that the holo ensemble
correlation between each compound s binding af nity and is the best choice to perform RCS calculations for the
the closed/open state of the gating loop (Fig. 3b) [69]. 2a5mt ligand, and suggests the same is likely the case for
Additional in silico experiments evaluated the bene ts of other ligands with similar chemical and electrostatic
using non-standard MD trajectories to enhance the con- properties [69].
formational sampling (e.g., simulations at high temperature A different picture emerges when bzi binds to the same
using atom-positional restraining potentials) or simulate an cavity. In this case, the best agreement between RCS
unphysical generalized-ligand interaction encompassing af nities and the experimental free energies is found when
the characteristics of all potential binders [69]. In the case using the apo-open receptor ensemble. This agrees with the
of 2a5mt, the optimal binding spectrum occurs when experimental observation that bzi shifts the propensity of
docked into the receptor conformations from the holo the gating-loop con gurations towards the open-gate state.
Fig. 3 (a) The W191G cavity mutant of cytochrome c peroxidase conformational states of the gating loop the probability distributions
and its two dominant con gurations extracted using an RMSD of the binding af nities from RCS calculations are shown as based on
conformational clustering analysis for the gating-loop and MD the apo (black), holo (red), and apo open-gate (green) receptor
simulations of the separate states. The closed (blue) and open ensemble simulations. The dashed-vertical lines correspond to the
(yellow) gate states are highlighted, together with Asp 235, the experimental free energies of binding. Docking poses for 2a5mt (c)
residue determining the orientation of the binders in the cavity. The and bzi (d) are displayed from corresponding crystal structures
heme cofactor is shown in red. (b) Binding propensities of the best (yellow) and the RCS calculations based on MD simulations of the
binder (2a5mt) and of the binder suggested to induce the full opening apo (black), holo (red), and apo open-gate (green) receptors
of the gating loop (bzi) are shown [69]. For each of the two
123
J Comput Aided Mol Des (2008) 22:693 705 699
Again, the ligand-binding poses (i.e., the relative orienta- multi-dimensional QR factorization algorithm is applied to
tion of the docked ligand into the W191G arti cial cavity) the encoded alignment of receptor structures, which results
are very similar between the RCS method and the crys- in a reordering of the structures based on increasing linear
tallographic complexes (Fig. 3c). Although a false negative dependence. This reordering subsequently allows the
is found when docking bzi to the ensemble of apo receptor construction of non-redundant sets of structures at some
structures, bzi binds favorably and with a binding mode user-de ned cutoff, representing a certain dynamical con-
similar to experiment when using the closed-gate holo guration space.
ensemble (Fig 3d). These promising results suggest that it The application of this method to the receptor ensemble
may be possible to capture different binding propensities of RNA editing ligase 1 resulted in the initial set of 400
depending on both local and global receptor rearrange- structures (extracted every 50 ps from a 20 ns simulation)
ments upon binding. being reduced to 33, with essentially no loss of binding
spectrum information (Fig. 4a). When docking a large set
of ligands, as is required in a virtual-screen type applica-
Effective reduction of the receptor ensemble tion, the reduction of receptor structures for the
computational dockings can make a signi cant difference
In the original RCS, the computational docking experi- in computational cost. For example, for RNA editing
ments were carried out using snapshots extracted at equal ligase, the number of dockings was reduced from 11,200 to
time intervals from the MD trajectories. As the simulations 924, resulting in a 90% reduction of computational cost
are carried out for several nanoseconds, this typically sums [68].
up to *104 to 105 receptor structures, many of which may An alternative method is clustering based on a matrix of
be conformationally redundant. Two recent studies have all pair-wise RMSD of the aligned structures in the
investigated alternative methods to distill the structural receptor ensemble. If the binding region of the receptor is
information to a reduced, yet meaningful set. known a priori the clustering algorithm can focus on this
A novel method to distill the ensemble of structures to a particular subset of residues that constitute the binding site.
non-redundant set is the so-called QR factorization The employed algorithm was originally developed to cap-
method. This technique was originally developed to ture the dominant con gurations of an ensemble of
remove inherent bias in structure databases and distill, from structures for exible peptides [71] and its application has
a vast quantity of redundant information, a minimal basis been further extended to exible molecules [86, 87] and
set of protein structures that accurately spans the evolu- protein surface loops [64]. After removing overall rotation
tionary phase space of a particular protein [82]. It has also and translations, the atoms of the binding region are
superimposed using their Ca atoms coordinates. A matrix
been applied to an ensemble of NMR structures in order to
determine a small, representative subset of structures from that contains all pair-wise RMSD values among all the
a larger experimental dataset [83] and to create non- structures in the trajectory is created. Next, the matrix is
redundant sequence alignments [84]. This technique has divided into batches corresponding to similar structures
most recently been incorporated into the RCS, where a using the RMSD values, with a clustering algorithm [71]
multiple structural alignment of the receptor ensemble is and a user-de ned cutoff. This clustering allows docking
performed with STAMP [85]. This alignment algorithm trials to be performed on a reduced number of signi cant
operates progressively: all possible pair-wise alignments conformations, while retaining the dominant characteristics
are computed, followed by a hierarchical clustering anal- of the entire spectrum of binding modes.
ysis based on a structural similarity measure to build the In the case of the W191G cavity mutant of cytochrome c
multiple structural alignment. The measure of structural peroxidase, re-docking into the entire ensemble of struc-
tures would require docking to 104 snapshots. However,
similarity applied here is QH, which essentially measures
the distance between all pairs of Ca atoms among all when these trajectory snapshots were clustered into groups
aligned structures. Although the development of QH was of similar con gurations with a RMSD similarity criterion
motivated by the need to include gaps in order to build a of 0.1 nm for the cavity residues, the resulting two most
similarity measure for more distantly related proteins, in dominant clusters of trajectory structures represented 36
the case of aligning the receptor ensemble, the gap term is and 16%, 81 and 10%, 48 and 23% of the structures for the
unnecessary as structures of the same protein are aligned. apo, holo, and apo-open ensembles, respectively (Fig. 4b).
The structural alignment is stored in a multidimensional This RMSD clustering resulted in a 99% reduction of
matrix of dimension maln 9 nreceptor structures 9 d, where d computational cost for the RCS docking stage [69].
encodes the rotated Ca atoms coordinates. In this matrix, In addition to improving the computational ef ciency of
each receptor structure is represented in a column and the RCS, clustering analyses can also supply useful infor-
the rows represent the multiple alignment. Finally, a mation about the exibility of the receptor, by analyzing
123
700 J Comput Aided Mol Des (2008) 22:693 705
Fig. 4 Reducing redundancy in the receptor ensemble. (a) Left (not all KREL1 structures are shown). Right panel: the initial (top) set
panel: Multidimensional QR factorization of KREL1 determines the of structures with the corresponding binding spectrum and the
distance relationship among all pairs of proteins (according to RMSD) reduced set (bottom) is shown. The similarity between the full and
and then reorders them based on increasing linear dependence, reduced binding spectrums indicates that there is virtually no loss of
allowing the distillation of a reduced, representative set of structures information. (b) Dominant con gurations of the W191G cavity region
for docking. At any particular QH threshold (indicated by red dotted as extracted from RMSD conformational clustering. For each separate
line at QH 0.86), at each point of intersection of a branch, the most MD ensemble the corresponding reference crystal con guration is
linearly independent structure is chosen from the group to the right of displayed (red thin lines) superimposed on the central member
the dotted line (each red open circle drawn at the branch intersection structures of the rst (yellow licorice) and second (green licorice)
indicates the choice of one structure to represent all structures to the most populated clusters.
right of the node). For clarity, the structure tree shown here is reduced
123
J Comput Aided Mol Des (2008) 22:693 705 701
possible choice of MD ensemble may depend on the
the number of structures representing a certain QH or
amount of knowledge available case by case. Additional
RMSD threshold cutoff (in the QR factorization method) or
scenarios can be considered as well, for example including
the cluster population versus the number of clusters pop-
homology-modeling type of approaches to generate the
ulated (in the RMSD clustering method). This type of
initial receptor structural con guration.
information gives quantitative insight about the local and
global exibility of the receptor. The computational gain
due to these types of clustering schemes seems particularly
Accurate description of solvent contributions
useful when screening large compound databases.
An effective representation of the solvent during the
docking trials is a major factor limiting the accuracy of
Choice of MD receptor ensemble and RCS predictive
docking calculations. Until recently, RCS calculations have
power
been performed using only the receptor structure and
ligand, even when the MD trajectory structures were gen-
One of the major challenges for hybrid docking techniques
erated in combination with explicit water models.
is the possibility to screen large compound databases and
Accounting for the speci c role of conserved water mole-
extract potential binders based on very limited a priori
cules is highly relevant as they may perturb the exibility
knowledge of the binding process itself. In this context, the
of a bound ligand, signi cantly alter the electrostatic
W191G cytochrome c peroxidase system is used as a
environment experienced by a small molecule, and even
platform to investigate how the choice of MD receptor
occlude potential areas of binding. The explicit inclusion of
ensemble for RCS calculations affects the predictive power
these contributions will certainly improve the accuracy of
of this methodology [69]. To re ect the different amounts
the ligand-binding description, similarly to what recently
of knowledge that may be available on the binding process,
reported for protein protein docking [88]. Here, we present
different typical scenarios in drug discovery were consid-
two new applications of the RCS that tested docking of the
ered, including cases in which: (i) no information is
ligands into MD generated receptor structures with and
available on the location of the binding site, (ii) X-ray
without cavity water molecules. What emerges from both
structures of the protein ligand complexes and knowledge
examples is the important (thermo)dynamic role of speci c
on potential binders is not available, and (iii) the X-ray
waters for the binding process.
structures known for the protein ligand complex do not
In the case of the RNA editing ligase, the KREL1 crystal
de ne unique ligand-binding poses. Corresponding to the
structure suggested three buried water molecules in the
above scenarios: (i) the RCS technique using the holo-
deep end of the ATP binding pocket. Explicitly solvated
receptor ensemble nds true positives using a docking grid
MD simulations of the holo complex (i.e., ATP bound)
that encompasses the entire W191G cytochrome c peroxi-
allowed us to predict the dynamics of the crystal water
dase structure; (ii) the number of true positives and true
molecules. The simulations indicated that these water
negatives can be signi cantly increased versus the number
molecules have different exchange rates, and that one of
of false positives and false negatives by employing an
the water molecules in particular, the one directly inter-
MD receptor ensemble containing a generalized type of
acting with both the protein and ATP, persists in its
unphysical ligand that re ects the main structural proper-
original location for the duration of the 20 ns simulation
ties of the compounds in the database, when compared to
[63]. In terms of the receptor structure, this single coordi-
equivalent docking calculations performed on the apo MD
nated water molecule acts as a structural scaffold that
receptor ensemble; and (iii) the multiple binding orienta-
prevents the localized collapse of surrounding residues
tions characterizing the true positives do not necessarily
while interacting with the bound ATP. By extracting the
correspond to non-accurate docking results when compared
water molecules from the structure before ligand docking,
to the raw electron density data from X-ray crystallogra-
an additional small cavity was open to the ligand. The RCS
phy. The quality of the binding poses can be judged by a
dockings were performed both with and without the three
combined evaluation of (i) the distribution of the docked-
conserved water molecules. Interestingly, the best inhibi-
ligand cluster populations versus the cluster number, (ii)
tors were identi ed when the water molecules were not
RMSD from the corresponding experimental complex after
included in the RCS dockings. The predicted docking poses
superimposing the structures as described above, and (iii)
and binding af nities differ signi cantly depending on
the comparison of the different poses for a same ligand.
whether the three explicit conserved water molecules are
These results open new possibilities for enhancing the
included in the dockings or not (data not shown). Impor-
predictive power of RCS calculations in so-called blind
tantly, at least two of the experimentally veri ed inhibitors
test cases (when information is missing concerning the
were predicted to substitute a functional group into the
binding process), and they also suggest that the best
123
702 J Comput Aided Mol Des (2008) 22:693 705
location where one of the crystal water molecules was static dockings may be an arti cial consequence of the
located (Fig. 5a) [68]. introduction of bound (static) water molecules in the cav-
In the W191G cytochrome c peroxidase application, ity, whereas locally disordered (dynamically swapping
several docking calculations were performed to investigate among the favorable sites) water molecules should be
the in uence of the crystallographic water sites on Auto- considered instead. We note that an accurate sampling of
Dock binding af nities and ligand poses. The tests were receptor, ligand, and solvent phase spaces is, in principle,
performed with rigid-protein docking calculations on dif- reached by more expensive free energy calculations [11,
ferent receptor models, alternatively including or excluding 26 36].
water site 308, which was suggested to be a highly con-
served location for a bound water based on X-ray structures
for a large group of compounds [81]. The results from the Future methodological improvements
docking calculations are in agreement with what was pre-
viously suggested by similar tests based on AutoDock The development of computational tools for computer-
version 3.0, which showed that the introduction of even a aided drug design depends on the critical compromise
single water molecule can signi cantly perturb the binding between accuracy and computational costs. Ideally, the
propensity of most of the ligands [89]. most reliable prediction of molecular af nity can be
While the effect on the predicted binding af nities and obtained through rigorous free energy calculations of the
poses is at variance with the speci c ligand, a systematic ligand-binding process [11, 26 36]. In practice, however,
consequence of the introduction of one water molecule into the CPU time typically required to perform such free
the W191G cavity is the signi cant reduction of the con- energy calculations on a few candidates (bottom of the
gurational space available to the ligand during the funnel diagram in Fig. 1) is comparable to that involved in
docking trials. MD simulations of the W191G cavity pre- rough geometrical recognition over a pool of molecules
dict the location of the highly favorable water sites (within more than ve orders of magnitude larger (top of the funnel
X-ray crystallography resolution and re nement assump- diagram in Fig. 1). Although the theory and methods are
tions) compared with the available experimental data well established for calculating free energy in practice,
(Fig. 5b). Additionally, MD simulations reveal a larger they are still prohibitively expensive to be employed in
number of favorable water sites, and allow the description high-throughput screening of drug-like databases. The
of the dynamic behavior of the solvent, including the future development of hybrid techniques, and especially
swapping of water molecules between highly favorable the RCS, is therefore twofold.
regions [64]. Based on these observations we suggest that First, it will certainly involve the re nement of the
the signi cant effects of including explicit water in the underlying physical models describing ligand-binding
Fig. 5 Solvent contributions in protein ligand binding. (a) The inhibitor replaces the location of a crystal water molecule. (b) For the
KREL1 active site with one of the newly discovered inhibitors in a W191G cavity (gray surface), the crystallographic water sites (solid
predicted docked conformation. KREL1 is shown in orange cartoon, red spheres; diameter corresponding to X-ray resolution) are
with the novel inhibitor shown docked in the active site (licorice, compared to the highly favorable average density regions of water
atom type colors). The three crystallographic water sites (not included molecules in the MD simulations (blue wireframe isosurfaces), for the
in the docking calculation) are shown in licorice with their van der best binder 2a5mt (yellow licorice) and from the 1AEN crystal
Waals surface in transparent. Note that the sulfonic acid group of the structure (red licorice)
123
J Comput Aided Mol Des (2008) 22:693 705 703
that reliably and ef ciently accommodates receptor exi-
(thermo)dynamics in increased detail, especially during the
bility is still lacking. The extensions and methodological
docking stage. Although the results presented here include
improvements to the RCS presented here take important
the unbound (gas phase) ligand enthalpy term at the
steps toward offering such a streamlined procedure. Our
docking stage [66], a complete description of the thermo-
examples indicate that alternative choices of receptor
dynamic cycle of binding is still far from being explicitly
ensembles can signi cantly alter the predictive power of
treated in RCS. Previous studies investigated the bene ts
RCS calculations, and that it is possible to reduce the
of rescoring the docked complexes using a more accurate
receptor ensemble to a non-redundant set of con gurations
(implicit solvent) description of the solvent contributions
by various techniques without losing relevant binding
[54]. More recently, the role of ligand entropy in the
information. Furthermore, both example systems indicate
re nement of protein ligand docking predictions has been
that the role of explicit water molecules in molecular
evaluated [90, 91]. Additionally, accurate con gurational
association remains one of the key components of com-
entropy calculations from MD simulations and a complete
puter-aided drug design methods to be further investigated.
quasi-harmonic analysis have demonstrated that the ther-