Vol. ** no. ** ****, pages **** ****

BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btr168

Structural bioinformatics Advance Access publication April 5, 2011

ProDy: Protein Dynamics Inferred from Theory and Experiments

Ahmet Bakan, Lidio M. Meireles and Ivet Bahar

Department of Computational and Systems Biology, and Clinical & Translational Science Institute, School of Medicine, University of Pittsburgh, 3064 BST3, 3501 Fifth Ave, Pittsburgh, PA 15213, USA

Associate Editor: Anna Tramontano

ABSTRACT

Summary: We developed a Python package, ProDy, for structure-based analysis of protein dynamics. ProDy allows for quantitative characterization of structural variations in heterogeneous datasets of structures experimentally resolved for a given biomolecular system, and for comparison of these variations with the theoretically predicted equilibrium dynamics. Datasets include structural ensembles for a given family or subfamily of proteins, their mutants and sequence homologues, in the presence/absence of their substrates, ligands or inhibitors. Numerous helper functions enable comparative analysis of experimental and theoretical data, and visualization of the principal changes in conformations that are accessible in different functional states. The ProDy application programming interface (API) has been designed so that users can easily extend the software and implement new methods.

Availability: ProDy is open source and freely available under GNU General Public License from http://www.csb.pitt.edu/ProDy/.

Contact: *****@****.***; *****@****.***

To whom correspondence should be addressed.

Received on December 26, 2010; revised on March 9, 2011; accepted on March 27, 2011

Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on January 25, 2013

1 INTRODUCTION

Protein dynamics plays a key role in a wide range of molecular events in the cell, including substrate/ligand recognition, binding, allosteric signaling and transport. For a number of well-studied proteins, the Protein Data Bank (PDB) hosts multiple high-resolution structures. Typical examples are drug targets resolved in the presence of different inhibitors. These ensembles of structures convey information on the structural changes that are physically accessible to the protein, and the delineation of these structural variations provides insights into structural mechanisms of biological activity (Bakan and Bahar, 2009; Yang et al., 2008).

Existing computational tools and servers for characterizing protein dynamics are suitable for single structures [e.g. Anisotropic Network Model (ANM) server (Eyal et al., 2006), elNémo (Suhre and Sanejouand, 2004), FlexServ (Camps et al., 2009)], pairs of structures [e.g. open and closed forms of enzymes; MolMovDB (Gerstein and Krebs, 1998)], or nuclear magnetic resonance (NMR) models [e.g. PCA_NEST (Yang et al., 2009)]. Tools for systematic retrieval and analyses of ensembles of structures are not publicly accessible. Ensembles include X-ray structures for a given protein and its complexes; families and subfamilies of proteins that belong to particular structural folds; or a protein and its mutants resolved in the presence of different inhibitors, ligands or substrates. The analysis of structural variability in these ensembles could open the way to gain insights into rearrangements selected/stabilized in different functional states (Bahar et al., 2007, 2010), or into the structure-encoded dynamic features shared by protein family or subfamily members (Marcos et al., 2010; Raimondi et al., 2010; Velazquez-Muriel et al., 2009). The lack of software for performing such operations is primarily due to the non-uniform content of structural datasets such as sequence variations at particular regions, including missing or substituted residues, short segments or loops. We developed ProDy to analyze and retrieve biologically significant information from such heterogeneous structural datasets. ProDy delivers information on the structural variability of target systems and allows for systematic comparison with the intrinsic dynamics predicted by theoretical models and methods, thus helping gain insight into the relation between structure, dynamics and function.

2 DESCRIPTION AND FUNCTIONALITY

2.1 Input for ProDy

The input for ProDy is the set of atomic coordinates in PDB format for the protein of interest, or simply the PDB id or sequence of the protein. Given a query protein, fast and flexible ProDy parsers are used to Blast search the PDB, retrieve the corresponding files (e.g. mutants, complexes or sequence homologs with user-defined minimal sequence identity) from the PDB FTP server and extract their coordinates and other relevant data. Additionally, the program can be used to analyze a series of conformers from molecular dynamics (MD) trajectories supplied in PDB file format or programmatically through Python NumPy arrays. More information on the input format is given in the tutorial and examples on the ProDy web site.

2.2 Protein dynamics from experiments

The experimental data refer to ensembles of structures, X-ray crystallographic or NMR. These are usually heterogeneous datasets, in the sense that they have disparate coordinate data arising from sequence dissimilarities, insertions/deletions or missing data due to unresolved disordered regions. In ProDy, we implemented algorithms for optimal alignment of such heterogeneous datasets and building corresponding covariance matrices. Covariance matrices describe the mean-square deviations in atomic coordinates from their mean position (diagonal elements) or the correlations between their pairwise fluctuations (off-diagonal elements). The principal modes of structural variation are determined upon principal component analysis (PCA) of the covariance matrix, as described previously (Bakan and Bahar, 2009).
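The covariance and PCA step described in Section 2.2 can be sketched with plain NumPy. This is a minimal illustration, not ProDy's implementation: the function name `ensemble_pca` and the toy ensemble are hypothetical, and the sketch assumes the conformers are already superposed and contain the same atoms in the same order, so it does not address the missing-residue handling that the paper describes.

```python
import numpy as np


def ensemble_pca(coords):
    """PCA of an ensemble given as an array of shape (n_conformers, n_atoms, 3).

    Assumption: conformers are already superposed and complete
    (same atoms, same order, no missing coordinates).
    """
    coords = np.asarray(coords, dtype=float)
    n_conf = coords.shape[0]
    flat = coords.reshape(n_conf, -1)           # (n_conformers, 3N)
    deviations = flat - flat.mean(axis=0)       # deviations from the mean structure
    # Diagonal: mean-square deviations; off-diagonal: pairwise correlations.
    covariance = deviations.T @ deviations / n_conf
    variances, modes = np.linalg.eigh(covariance)
    order = np.argsort(variances)[::-1]         # largest variance first
    return covariance, variances[order], modes[:, order]


# Hypothetical toy ensemble: one mean structure displaced along one direction.
rng = np.random.default_rng(0)
n_atoms = 4
mean_structure = rng.normal(size=(n_atoms, 3))
direction = rng.normal(size=n_atoms * 3)
direction /= np.linalg.norm(direction)
amplitudes = np.linspace(-1.0, 1.0, 21)
ensemble = mean_structure + amplitudes[:, None, None] * direction.reshape(n_atoms, 3)

covariance, variances, modes = ensemble_pca(ensemble)
fraction = variances[0] / variances.sum()       # variance captured by PC1
```

In this toy case all variation lies along a single direction, so the first principal mode recovers that direction up to sign. For a real ensemble, the ratio of the leading eigenvalues to their sum gives the fraction of structural variance captured by the top modes.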

© The Author(s) 2011. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

[16:12 12/5/2011 Bioinformatics-btr168.tex] Page: 157*-****-****


2.3 Protein dynamics from theory and simulations

We have implemented classes for Gaussian network model (GNM) analysis and for normal mode analysis (NMA) of a given structure using the ANM (Eyal et al., 2006). Both models have been widely used in recent years for analyzing and visualizing biomolecular systems dynamics (Bahar et al., 2010). The implementation is generic and flexible. The user can (i) build the models for any set of atoms, e.g. the substrate or inhibitor can be explicitly included to study the perturbing effect of binding on dynamics, and (ii) utilize user-defined or built-in distance-dependent or residue-specific force constants (Hinsen et al., 2000; Kovacs et al., 2004). ProDy also offers the option to perform essential dynamics analysis (EDA; Amadei et al., 1993) of MD snapshots, which is equivalent to the singular value decomposition of trajectories to extract principal variations (Velazquez-Muriel et al., 2009).

2.4 Dynamics analysis example

Figure 1 illustrates the outputs generated by ProDy in a comparative analysis of experimental and computational data for p38 kinase (Bakan and Bahar, 2011). Figure 1A displays the dataset of 150 X-ray crystallographically resolved p38 structures retrieved from the PDB and optimally overlaid by ProDy. The ensemble contains the apo and inhibitor-bound forms of p38, thus providing information on the conformational space sampled by p38 upon inhibitor binding. Parsing structures, building and diagonalizing the covariance matrix to determine the principal modes of structural variations takes only 38 s on Intel CPU at 3.20 GHz. Figure 1C illustrates the first principal mode of structural variation (PC1; violet arrows) based exclusively on the experimental structural dataset for p38.

As to generating computational data, two approaches are taken in ProDy: NMA of a representative structure using its ANM representation (Figure 1B; color-coded such that red/blue regions refer to largest/smallest conformational mobilities); and EDA of MD trajectories, provided that an ensemble of snapshots is supplied by the user. The green arrows in Figure 1C describe the first (lowest frequency, most collective) mode predicted by the ANM, shortly designated as ANM1. The heatmap in Figure 1D shows the overlap (Marques and Sanejouand, 1995) between top-ranking PCA and ANM modes. The cumulative overlap between the top three pairs of modes is 0.73.

An important aspect of ProDy is the sampling of a representative set of conformers consistent with experiments, a feature expected to find wide utility in flexible docking and structure refinement. Figure 1E displays the conformational space sampled by experimental structures (blue dots), projected onto the subspace spanned by the top three PCA directions, which accounts for 59% of the experimentally observed structural variance. The conformations generated using the softest modes ANM1-ANM3, predicted to be intrinsically accessible to p38 in the apo form, are shown by the red dots. The sizes of the motions along these modes obey a Gaussian distribution with standard deviation scaling with the inverse square root of the corresponding eigenvalues. ANM conformers cover a subspace (green ellipsoidal envelope) that encloses all experimental structures. Detailed information on how to generate such plots and figures using ProDy is given in the online documentation, along with several examples of downloadable scripts.

Fig. 1. Comparative analysis of p38 dynamics from experiments (PCA) and theory (ANM). (A) Overlay of 150 p38 X-ray structures using ProDy. An inhibitor is shown in space-filling representation. (B) Network model (ANM) representation of p38 (generated using NMWiz and VMD). (C) Comparison of the principal mode PC1 (from experiments; violet arrows) and the softest mode ANM1 from theory (green arrows) and (D) overlap of the top five modes. (E) Distribution of X-ray structures (blue) and ANM-generated conformers (red) in the subspace spanned by PC1-3. The green ellipsoid is an analytical solution predicted by the ANM.

2.5 Graphical interface

We have designed a graphical interface, NMWiz, to enable users to perform ANM and PCA calculations from within a molecular visualization program. NMWiz is designed as a VMD (Humphrey et al., 1996) plugin, and is distributed within the ProDy installation package. It is used to do calculations for molecules loaded into VMD, and results are visualized on the fly. The plug-in allows for depicting color-coded network models and normal mode directions (Fig. 1B and C), displaying animations of various PCA and ANM modes, generating trajectories, and plotting square fluctuations.

2.6 Supporting features

ProDy comes with a growing library of functions to facilitate comparative analysis. Examples are functions to calculate, print and plot the overlaps between experiment, theory and computations (Fig. 1D) or to view the spatial dispersion of conformers (Fig. 1E). For rapid and flexible analysis of large numbers of PDB structures, we designed a fast PDB parser. The parser can handle alternate locations and multiple models, and read specified chains or atom subsets selected by the user. We evaluated the performance of ProDy relative to the Biopython PDB module (Hamelryck and Manderick, 2003) using 4701 PDB structures listed in the PDB SELECT dataset (Hobohm and Sander, 1994): we timed parsers for reading the PDB files and returning Cα-coordinates to the user (see documentation). The Python standard Biopython PDB parser evaluated the dataset in 52 min; and ProDy in 11 min. In addition, we implemented an atom selector using the Pyparsing module for rapid access to subsets of atoms in PDB files. This feature reduces the user programming effort to


access any set of atoms down to a single line of code from several lines composed of nested loops and comparisons required with the current Python packages for handling PDB data. The implementation of atom selections follows that in VMD. A full list of selection keywords and usage examples is provided in the documentation. ProDy also offers functions for structural alignment and comparison of multiple chains.

3 DISCUSSION

Several web servers have been developed for characterizing protein dynamics, including elNémo (Suhre and Sanejouand, 2004), ANM (Eyal et al., 2006) and FlexServ (Camps et al., 2009). These servers perform coarse-grained ENM based NMA calculations, and as such are useful for elucidating structure-encoded dynamics of proteins. FlexServ also offers the option to use distance-dependent force constants (Kovacs et al., 2004), in addition to protocols for coarse-grained generation and PCA of trajectories. ProDy differs from these as it allows for systematic retrieval and comparative analysis of ensembles of heterogeneous structural datasets. Such datasets include structural data collected for members of a protein family in complex with different substrates/inhibitors. ProDy further allows for the quantitative comparison of the results from experimental datasets with theoretically predicted conformational dynamics. In addition, ProDy offers the following advantages: (i) it is extensible, interoperable and suitable for use as a toolkit for developing new software; (ii) it provides scripts for automated tasks and batch analyses of large datasets; (iii) it has a flexible API suitable for testing new methods and hypotheses, and benchmarking them against existing methods with minimal effort and without the need to modify the source code; (iv) it allows for producing publication quality figures when used with the Python plotting library Matplotlib; and (v) it provides the option to input a user-defined distance-dependent force function or utilize elaborate classes that return force constants based on the type and properties of interacting residues [e.g. based on their secondary structure or sequence separation (Lezon and Bahar, 2010)].

4 CONCLUSION

ProDy is a free, versatile, easy-to-use and powerful tool for inferring protein dynamics from both experiments (i.e. PCA of ensembles of structures) and theory (i.e. GNM, ANM and EDA of MD snapshots). ProDy complements existing tools by allowing the systematic retrieval and analysis of heterogeneous experimental datasets, leveraging the wealth of structural data deposited in the PDB to gain insights into structure-encoded dynamics. In addition, ProDy allows for comparison of the results from experimental datasets with theoretically predicted conformational dynamics. Finally, through a flexible Python-based API, ProDy can be used to quickly test and implement new methods and ideas, thus lowering the technical barriers to apply such methods in more complex computational analyses.

Funding: National Institutes of Health (1R01GM086238-01 to I.B. and UL1 RR024153 to A.B.).

Conflict of Interest: none declared.

REFERENCES

Amadei,A. et al. (1993) Essential dynamics of proteins. Proteins, 17, 412–425.

Bahar,I. et al. (2007) Intrinsic dynamics of enzymes in the unbound state and relation to allosteric regulation. Curr. Opin. Struct. Biol., 17, 633–640.

Bahar,I. et al. (2010) Normal mode analysis of biomolecular structures: functional mechanisms of membrane proteins. Chem. Rev., 110, 1463–1497.

Bakan,A. and Bahar,I. (2011) Computational generation of inhibitor-bound conformers of p38 MAP kinase and comparison with experiments. Pac. Symp. Biocomput., 16, 181–192.

Bakan,A. and Bahar,I. (2009) The intrinsic dynamics of enzymes plays a dominant role in determining the structural changes induced upon inhibitor binding. Proc. Natl Acad. Sci. USA, 106, 143**-*****.

Camps,J. et al. (2009) FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics, 25, 1709–1710.

Eyal,E. et al. (2006) Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics, 22, 2619–2627.

Gerstein,M. and Krebs,W. (1998) A database of macromolecular motions. Nucleic Acids Res., 26, 4280–4290.

Hamelryck,T. and Manderick,B. (2003) PDB file parser and structure class implemented in Python. Bioinformatics, 19, 2308–2310.

Hinsen,K. et al. (2000) Harmonicity in slow protein dynamics. Chem. Phys., 261, 25–37.

Hobohm,U. and Sander,C. (1994) Enlarged representative set of protein structures. Protein Sci., 3, 522–524.

Humphrey,W. et al. (1996) VMD: visual molecular dynamics. J. Mol. Graph., 14, 33–38.

Kovacs,J.A. et al. (2004) Predictions of protein flexibility: first-order measures. Proteins, 56, 661–668.

Lezon,T.R. and Bahar,I. (2010) Using entropy maximization to understand the determinants of structural dynamics beyond native contact topology. PLoS Comput. Biol., 6, e1000816.

Marcos,E. et al. (2010) On the conservation of the slow conformational dynamics within the amino acid kinase family: NAGK the paradigm. PLoS Comput. Biol., 6, e1000738.

Marques,O. and Sanejouand,Y.H. (1995) Hinge-bending motion in citrate synthase arising from normal mode calculations. Proteins, 23, 557–560.

Raimondi,F. et al. (2010) Deciphering the deformation modes associated with function retention and specialization in members of the Ras superfamily. Structure, 18, 402–414.

Suhre,K. and Sanejouand,Y.H. (2004) ElNémo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res., 32, W610–W614.

Velazquez-Muriel,J.A. et al. (2009) Comparison of molecular dynamics and superfamily spaces of protein domain deformation. BMC Struct. Biol., 9, 6.

Yang,L. et al. (2008) Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Structure, 16, 321–330.

Yang,L.W. et al. (2009) Principal component analysis of native ensembles of biomolecular structures (PCA_NEST): insights into functional dynamics. Bioinformatics, 25, 606–614.
