Vol. ** no. ** ****, pages **** ****
BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btr168
Structural bioinformatics Advance Access publication April 5, 2011
ProDy: Protein Dynamics Inferred from Theory and Experiments
Ahmet Bakan, Lidio M. Meireles and Ivet Bahar
Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of
Medicine, University of Pittsburgh, 3064 BST3, 3501 Fifth Ave, Pittsburgh, PA 15213, USA
Associate Editor: Anna Tramontano
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on January 25, 2013

ABSTRACT
Summary: We developed a Python package, ProDy, for structure-based analysis of protein dynamics. ProDy allows for quantitative characterization of structural variations in heterogeneous datasets of structures experimentally resolved for a given biomolecular system, and for comparison of these variations with the theoretically predicted equilibrium dynamics. Datasets include structural ensembles for a given family or subfamily of proteins, their mutants and sequence homologues, in the presence/absence of their substrates, ligands or inhibitors. Numerous helper functions enable comparative analysis of experimental and theoretical data, and visualization of the principal changes in conformations that are accessible in different functional states. The ProDy application programming interface (API) has been designed so that users can easily extend the software and implement new methods.
Availability: ProDy is open source and freely available under the GNU General Public License from http://www.csb.pitt.edu/ProDy/.
Contact: *****@****.***; *****@****.***
To whom correspondence should be addressed.
Received on December 26, 2010; revised on March 9, 2011; accepted on March 27, 2011

1 INTRODUCTION
Protein dynamics plays a key role in a wide range of molecular events in the cell, including substrate/ligand recognition, binding, allosteric signaling and transport. For a number of well-studied proteins, the Protein Data Bank (PDB) hosts multiple high-resolution structures. Typical examples are drug targets resolved in the presence of different inhibitors. These ensembles of structures convey information on the structural changes that are physically accessible to the protein, and the delineation of these structural variations provides insights into structural mechanisms of biological activity (Bakan and Bahar, 2009; Yang et al., 2008).
Existing computational tools and servers for characterizing protein dynamics are suitable for single structures [e.g. Anisotropic Network Model (ANM) server (Eyal et al., 2006), elNémo (Suhre and Sanejouand, 2004), FlexServ (Camps et al., 2009)], pairs of structures [e.g. open and closed forms of enzymes; MolMovDB (Gerstein and Krebs, 1998)], or nuclear magnetic resonance (NMR) models [e.g. PCA_NEST (Yang et al., 2009)]. Tools for systematic retrieval and analysis of ensembles of structures are not publicly available. Ensembles include X-ray structures for a given protein and its complexes; families and subfamilies of proteins that belong to particular structural folds; or a protein and its mutants resolved in the presence of different inhibitors, ligands or substrates. The analysis of structural variability in these ensembles could yield insights into rearrangements selected/stabilized in different functional states (Bahar et al., 2007, 2010), or into the structure-encoded dynamic features shared by protein family or subfamily members (Marcos et al., 2010; Raimondi et al., 2010; Velazquez-Muriel et al., 2009). The lack of software for performing such operations is primarily due to the non-uniform content of structural datasets, such as sequence variations at particular regions, including missing or substituted residues, short segments or loops. We developed ProDy to analyze and retrieve biologically significant information from such heterogeneous structural datasets. ProDy delivers information on the structural variability of target systems and allows for systematic comparison with the intrinsic dynamics predicted by theoretical models and methods, thus helping gain insight into the relation between structure, dynamics and function.

2 DESCRIPTION AND FUNCTIONALITY
2.1 Input for ProDy
The input for ProDy is the set of atomic coordinates in PDB format for the protein of interest, or simply the PDB ID or sequence of the protein. Given a query protein, fast and flexible ProDy parsers are used to BLAST search the PDB, retrieve the corresponding files (e.g. mutants, complexes or sequence homologs with user-defined minimal sequence identity) from the PDB FTP server and extract their coordinates and other relevant data. Additionally, the program can be used to analyze a series of conformers from molecular dynamics (MD) trajectories supplied in PDB file format or programmatically through Python NumPy arrays. More information on the input format is given in the tutorial and examples on the ProDy web site.

2.2 Protein dynamics from experiments
The experimental data refer to ensembles of structures, X-ray crystallographic or NMR. These are usually heterogeneous datasets, in the sense that they have disparate coordinate data arising from sequence dissimilarities, insertions/deletions or missing data due to unresolved disordered regions. In ProDy, we implemented algorithms for optimal alignment of such heterogeneous datasets and for building the corresponding covariance matrices. Covariance matrices describe the mean-square deviations in atomic coordinates from their mean position (diagonal elements) and the correlations between their pairwise fluctuations (off-diagonal elements). The principal modes of structural variation are obtained by principal component analysis (PCA) of the covariance matrix, as described previously (Bakan and Bahar, 2009).
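The covariance and PCA step of Section 2.2 can be illustrated with a minimal sketch in plain Python. This is not ProDy's implementation: the function names (covariance_matrix, principal_mode) and the toy two-atom ensemble are hypothetical, the conformers are assumed to be already superposed and to contain the same atoms, normalization by the number of conformers is an assumption, and power iteration is swapped in for a full diagonalization so that only the top mode (PC1) is extracted.

```python
import math

def covariance_matrix(ensemble):
    """Covariance of coordinate deviations from the mean structure.

    `ensemble` is a list of conformers, each a flat list
    [x1, y1, z1, x2, y2, z2, ...]; conformers are assumed superposed.
    """
    n_conf = len(ensemble)
    n_coord = len(ensemble[0])
    mean = [sum(conf[i] for conf in ensemble) / n_conf for i in range(n_coord)]
    deviations = [[conf[i] - mean[i] for i in range(n_coord)] for conf in ensemble]
    # Diagonal: mean-square deviations; off-diagonal: pairwise correlations.
    return [[sum(d[i] * d[j] for d in deviations) / n_conf
             for j in range(n_coord)] for i in range(n_coord)]

def principal_mode(cov, n_iter=200):
    """Leading eigenvalue and eigenvector of `cov` by power iteration."""
    n = len(cov)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:  # zero covariance: no variation to extract
            break
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the variance along the converged direction.
    eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                     for i in range(n) for j in range(n))
    return eigenvalue, vec

# Toy ensemble (hypothetical data): atom 1 wobbles slightly along z,
# atom 2 moves substantially along y.
ensemble = [
    [0.0, 0.0,  0.1, 3.8, -1.5, 0.0],
    [0.0, 0.0, -0.1, 3.8, -0.5, 0.0],
    [0.0, 0.0, -0.1, 3.8,  0.5, 0.0],
    [0.0, 0.0,  0.1, 3.8,  1.5, 0.0],
]

cov = covariance_matrix(ensemble)
variance_pc1, pc1 = principal_mode(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
fraction_pc1 = variance_pc1 / total_variance

print("variance along PC1:", variance_pc1)
print("fraction of total variance:", fraction_pc1)
```

In this toy case PC1 points along the y coordinate of the second atom, which carries nearly all of the variance. For real ensembles a library eigensolver would normally replace the power iteration, and the superposition and handling of missing residues described above would have to come first.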
© The Author(s) 2011. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
[16:12 12/5/2011 Bioinformatics-btr168.tex] Page: 157*-****-****
Fig. 1. Comparative analysis of p38 dynamics from experiments (PCA) and theory (ANM). (A) Overlay of 150 p38 X-ray structures using ProDy. An inhibitor is shown in space-filling representation. (B) Network model (ANM) representation of p38 (generated using NMWiz and VMD). (C) Comparison of the principal mode PC1 (from experiments; violet arrows) and the softest mode ANM1 from theory (green arrows) and (D) overlap of the top five modes. (E) Distribution of X-ray structures (blue) and ANM-generated conformers (red) in the subspace spanned by PC1–3. The green ellipsoid is an analytical solution predicted by the ANM.

2.3 Protein dynamics from theory and simulations
We have implemented classes for Gaussian network model (GNM) analysis and for normal mode analysis (NMA) of a given structure using the ANM (Eyal et al., 2006). Both models have been widely used in recent years for analyzing and visualizing the dynamics of biomolecular systems (Bahar et al., 2010). The implementation is generic and flexible. The user can (i) build the models for any set of atoms, e.g. the substrate or inhibitor can be explicitly included to study the perturbing effect of binding on dynamics, and (ii) utilize user-defined or built-in distance-dependent or residue-specific force constants (Hinsen et al., 2000; Kovacs et al., 2004). ProDy also offers the option to perform essential dynamics analysis (EDA; Amadei et al., 1993) of MD snapshots, which is equivalent to the singular value decomposition of trajectories to extract principal variations (Velazquez-Muriel et al., 2009).

2.4 Dynamics analysis example
Figure 1 illustrates the outputs generated by ProDy in a comparative analysis of experimental and computational data for p38 kinase (Bakan and Bahar, 2011). Figure 1A displays the dataset of 150 X-ray crystallographically resolved p38 structures retrieved from the PDB and optimally overlaid by ProDy. The ensemble contains the apo and inhibitor-bound forms of p38, thus providing information on the conformational space sampled by p38 upon inhibitor binding. Parsing structures, building and diagonalizing the covariance matrix to determine the principal modes of structural variations takes only 38 s on an Intel CPU at 3.20 GHz. Figure 1C illustrates the first principal mode of structural variation (PC1; violet arrows) based exclusively on the experimental structural dataset for p38.
As to generating computational data, two approaches are taken in ProDy: NMA of a representative structure using its ANM representation (Figure 1B; color-coded such that red/blue regions refer to largest/smallest conformational mobilities); and EDA of MD trajectories, provided that the user supplies an ensemble of snapshots. The green arrows in Figure 1C describe the first (lowest frequency, most collective) mode predicted by the ANM, designated ANM1 for short. The heatmap in Figure 1D shows the overlap (Marques and Sanejouand, 1995) between top-ranking PCA and ANM modes. The cumulative overlap between the top three pairs of modes is 0.73.
An important aspect of ProDy is the sampling of a representative set of conformers consistent with experiments, a feature expected to find wide utility in flexible docking and structure refinement. Figure 1E displays the conformational space sampled by experimental structures (blue dots), projected onto the subspace spanned by the top three PCA directions, which accounts for 59% of the experimentally observed structural variance. The conformations generated using the softest modes ANM1–ANM3, predicted to be intrinsically accessible to p38 in the apo form, are shown by the red dots. The sizes of the motions along these modes obey a Gaussian distribution with standard deviation scaling with the inverse square root of the corresponding eigenvalues. ANM conformers cover a subspace (green ellipsoidal envelope) that encloses all experimental structures. Detailed information on how to generate such plots and figures using ProDy is given in the online documentation, along with several examples of downloadable scripts.

2.5 Graphical interface
We have designed a graphical interface, NMWiz, to enable users to perform ANM and PCA calculations from within a molecular visualization program. NMWiz is designed as a VMD (Humphrey et al., 1996) plug-in, and is distributed within the ProDy installation package. It performs calculations for molecules loaded into VMD, and results are visualized on the fly. The plug-in allows for depicting color-coded network models and normal mode directions (Fig. 1B and C), displaying animations of various PCA and ANM modes, generating trajectories, and plotting square fluctuations.

2.6 Supporting features
ProDy comes with a growing library of functions to facilitate comparative analysis. Examples are functions to calculate, print and plot the overlaps between experiment, theory and computations (Fig. 1D) or to view the spatial dispersion of conformers (Fig. 1E).
For rapid and flexible analysis of large numbers of PDB structures, we designed a fast PDB parser. The parser can handle alternate locations and multiple models, and read specified chains or atom subsets selected by the user. We evaluated the performance of ProDy relative to the Biopython PDB module (Hamelryck and Manderick, 2003) using 4701 PDB structures listed in the PDB SELECT dataset (Hobohm and Sander, 1994): we timed the parsers for reading the PDB files and returning Cα coordinates to the user (see documentation). The standard Biopython PDB parser processed the dataset in 52 min, and ProDy in 11 min. In addition, we implemented an atom selector using the Pyparsing module for rapid access to subsets of atoms in PDB files. This feature reduces the user programming effort to
access any set of atoms down to a single line of code, from the several lines of nested loops and comparisons required with the current Python packages for handling PDB data. The implementation of atom selections follows that in VMD. A full list of selection keywords and usage examples is provided in the documentation. ProDy also offers functions for structural alignment and comparison of multiple chains.

3 DISCUSSION
Several web servers have been developed for characterizing protein dynamics, including elNémo (Suhre and Sanejouand, 2004), ANM (Eyal et al., 2006) and FlexServ (Camps et al., 2009). These servers perform coarse-grained, elastic network model (ENM)-based NMA calculations, and as such are useful for elucidating structure-encoded dynamics of proteins. FlexServ also offers the option to use distance-dependent force constants (Kovacs et al., 2004), in addition to protocols for coarse-grained generation and PCA of trajectories. ProDy differs from these as it allows for systematic retrieval and comparative analysis of ensembles of heterogeneous structural datasets. Such datasets include structural data collected for members of a protein family in complex with different substrates/inhibitors. ProDy further allows for the quantitative comparison of the results from experimental datasets with theoretically predicted conformational dynamics. In addition, ProDy offers the following advantages: (i) it is extensible, interoperable and suitable for use as a toolkit for developing new software; (ii) it provides scripts for automated tasks and batch analyses of large datasets; (iii) it has a flexible API suitable for testing new methods and hypotheses, and benchmarking them against existing methods with minimal effort and without the need to modify the source code; (iv) it allows for producing publication-quality figures when used with the Python plotting library Matplotlib; and (v) it provides the option to supply a user-defined distance-dependent force function or to use more elaborate classes that return force constants based on the type and properties of interacting residues [e.g. based on their secondary structure or sequence separation (Lezon and Bahar, 2010)].

4 CONCLUSION
ProDy is a free, versatile, easy-to-use and powerful tool for inferring protein dynamics from both experiments (i.e. PCA of ensembles of structures) and theory (i.e. GNM, ANM and EDA of MD snapshots). ProDy complements existing tools by allowing the systematic retrieval and analysis of heterogeneous experimental datasets, leveraging the wealth of structural data deposited in the PDB to gain insights into structure-encoded dynamics. In addition, ProDy allows for comparison of the results from experimental datasets with theoretically predicted conformational dynamics. Finally, through a flexible Python-based API, ProDy can be used to quickly test and implement new methods and ideas, thus lowering the technical barriers to applying such methods in more complex computational analyses.

Funding: National Institutes of Health (1R01GM086238-01 to I.B. and UL1 RR024153 to A.B.).
Conflict of Interest: none declared.

REFERENCES
Amadei,A. et al. (1993) Essential dynamics of proteins. Proteins, 17, 412–425.
Bahar,I. et al. (2007) Intrinsic dynamics of enzymes in the unbound state and relation to allosteric regulation. Curr. Opin. Struct. Biol., 17, 633–640.
Bahar,I. et al. (2010) Normal mode analysis of biomolecular structures: functional mechanisms of membrane proteins. Chem. Rev., 110, 1463–1497.
Bakan,A. and Bahar,I. (2011) Computational generation of inhibitor-bound conformers of p38 MAP kinase and comparison with experiments. Pac. Symp. Biocomput., 16, 181–192.
Bakan,A. and Bahar,I. (2009) The intrinsic dynamics of enzymes plays a dominant role in determining the structural changes induced upon inhibitor binding. Proc. Natl Acad. Sci. USA, 106, 143**-*****.
Camps,J. et al. (2009) FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics, 25, 1709–1710.
Eyal,E. et al. (2006) Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics, 22, 2619–2627.
Gerstein,M. and Krebs,W. (1998) A database of macromolecular motions. Nucleic Acids Res., 26, 4280–4290.
Hamelryck,T. and Manderick,B. (2003) PDB file parser and structure class implemented in Python. Bioinformatics, 19, 2308–2310.
Hinsen,K. et al. (2000) Harmonicity in slow protein dynamics. Chem. Phys., 261, 25–37.
Hobohm,U. and Sander,C. (1994) Enlarged representative set of protein structures. Protein Sci., 3, 522–524.
Humphrey,W. et al. (1996) VMD: visual molecular dynamics. J. Mol. Graph., 14, 33–38.
Kovacs,J.A. et al. (2004) Predictions of protein flexibility: first-order measures. Proteins, 56, 661–668.
Lezon,T.R. and Bahar,I. (2010) Using entropy maximization to understand the determinants of structural dynamics beyond native contact topology. PLoS Comput. Biol., 6, e1000816.
Marcos,E. et al. (2010) On the conservation of the slow conformational dynamics within the amino acid kinase family: NAGK the paradigm. PLoS Comput. Biol., 6, e1000738.
Marques,O. and Sanejouand,Y.H. (1995) Hinge-bending motion in citrate synthase arising from normal mode calculations. Proteins, 23, 557–560.
Raimondi,F. et al. (2010) Deciphering the deformation modes associated with function retention and specialization in members of the Ras superfamily. Structure, 18, 402–414.
Suhre,K. and Sanejouand,Y.H. (2004) ElNémo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res., 32, W610–W614.
Velazquez-Muriel,J.A. et al. (2009) Comparison of molecular dynamics and superfamily spaces of protein domain deformation. BMC Struct. Biol., 9, 6.
Yang,L. et al. (2008) Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Structure, 16, 321–330.
Yang,L.W. et al. (2009) Principal component analysis of native ensembles of biomolecular structures (PCA_NEST): insights into functional dynamics. Bioinformatics, 25, 606–614.