
Amino Acids (****) **:*** ***

DOI **.****/s*****-009-0245-8

ORIGINAL ARTICLE

Identifying protein-protein interaction sites in transient complexes with temperature factor, sequence profile and accessible surface area

Rong Liu · Wenchao Jiang · Yanhong Zhou

Received: 5 October 2008 / Accepted: 21 January 2009 / Published online: 12 February 2009

© Springer-Verlag 2009

Abstract Transient protein-protein interactions play a vital role in many biological processes, such as cell regulation and signal transduction. A nonredundant dataset of 130 protein chains extracted from transient complexes was used to analyze the features of transient interfaces. It was found that besides the two well-known features, sequence profile and accessible surface area (ASA), the temperature factor (B-factor) can also reflect the differences between interface and the rest of protein surface. These features were utilized to construct support vector machine (SVM) classifiers to identify interaction sites. The results of threefold cross-validation on the nonredundant dataset show that when B-factor was used as an additional feature, the prediction performance can be improved significantly. The sensitivity, specificity and correlation coefficient were raised from 54 to 62%, 41 to 45% and 0.20 to 0.29, respectively. To further illustrate the effectiveness of our method, the classifiers were tested with an independent set of 53 nonhomologous protein chains derived from benchmark 2.0. The sensitivity, specificity and correlation coefficient of the classifier based on the three features were 63%, 45% and 0.33, respectively. These results indicate that our classifiers are robust and can be applied to complement experimental techniques in studying transient protein-protein interactions.

Keywords Protein-protein interactions · Transient interface · Sequence profile · Temperature factor · Accessible surface area · Support vector machine

Electronic supplementary material The online version of this article (doi:10.1007/s00726-009-0245-8) contains supplementary material, which is available to authorized users.

R. Liu · W. Jiang · Y. Zhou
Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, China
e-mail: ******@****.***.**

R. Liu
e-mail: **********@***.***

Introduction

Protein-protein interactions are critical to many biological processes. The so-called interaction sites or functional sites play a crucial role in protein-protein interactions. Identifying these pivotal sites is useful to get a better understanding of molecular recognition process at the residue and atomic level, to uncover the mechanism of metabolic and signal transduction networks, and to gain important clues for rational drug design (Chelliah et al. 2004).

According to their lifetime, protein-protein interactions can be divided into permanent interactions and transient interactions (Jones and Thornton 1996). Due to the structural stability of permanent complexes, permanent interactions are much easier to study by experimental methods, such as X-ray crystallography and NMR spectroscopy. On the other hand, since transient interactions often neither form stable crystals nor give good NMR structures, transient complexes are notoriously hard to study experimentally (Szilagyi et al. 2005). Nevertheless, transient interactions are the focus of significant interest owing to their biological importance, particularly with respect to cell regulation and signal transduction (Hoskins et al. 2006). Thus, computational methods are needed to assist in finding the features of transient protein-protein interfaces and identifying residues in these interfaces.


By statistically analyzing different types of protein-protein interfaces, some common features have been attained and used to identify interaction sites. It was found that there are distinct differences in amino acid composition between interface and noninterface, as well as between different types of interfaces. Compared with noninterfaces, permanent interfaces always contain more hydrophobic residues (Glaser et al. 2001). Although some transient interfaces are also hydrophobic, they are rich in aromatic residues and depleted in charged residues (Lo Conte et al. 1999). Evolutionary conservation of residues is another important feature for the identification of interaction sites. Generally, interface residues are more conservative than noninterface residues during evolution. Transient interfaces tend to evolve at a relatively higher rate than permanent interfaces (Mintseris and Weng 2005). Previous studies have demonstrated that interface residues are more solvent accessible than noninterface residues. Solvent accessibility is one of the most effective features used to predict homodimer interfaces (Jones and Thornton 1997a, b). It has been suggested that interface residues have lower temperature factors (B-factors) than the exterior of protein, which contributes to less flexibility of the interfacial regions (Jones and Thornton 1995). In addition, secondary structure (Neuvirth et al. 2004; Ansari and Helms 2005) and side-chain conformational entropy (Cole and Warwicker 2002; Liang et al. 2006) can also be used to distinguish interface residues from noninterface residues. Thus, these features are valuable for identifying interaction sites.

The features mentioned above have been combined to predict interaction sites in different types of complexes, which is based on a wide range of machine learning methods (Zhou and Shan 2001; Koike and Takagi 2004; Landau et al. 2005; Li et al. 2006; Bradford et al. 2006; Friedrich et al. 2006; Li et al. 2007). However, only a few studies have chosen the interaction sites in transient complexes as prediction objects. Ofran and Rost (2003) developed a neural network that identifies transient protein-protein interfaces from local sequence information. Neuvirth et al. (2004) utilized a naive Bayesian method with 13 features to identify the interfaces of unbound structures of transient heterodimers at a patch level. Liang et al. (2006) presented a linear combination of energy score, interface propensity and residue conservation score to predict interface residues of the unbound structures used by Neuvirth et al. (2004). Dong et al. (2007) input binary profile interface propensity, sequence profile and accessible surface area (ASA) to support vector machine (SVM) for recognizing interaction sites in transient complexes. Although the existing prediction methods have achieved success at different levels, the prediction of interface residues in transient complexes is still at its primary stage.

The purpose of our research is to focus on the identification of protein-protein interaction sites in transient complexes. By analyzing the features of transient interfaces, we found that besides the two well-known features, sequence profile and ASA, B-factor can also reflect the differences between interface and the rest of protein surface. Then, B-factor, sequence profile, ASA or the combinations of them were used to construct SVM classifiers to recognize interface residues. The results show that B-factor plays a key role in identifying the interaction sites in transient complexes, and that utilizing the complementarity of the three features is favorable for improving the prediction performance.

Materials and methods

Dataset

The experimental data in this study were derived from the dataset used by Ansari and Helms (2005). This dataset contains 170 transient protein-protein interaction pairs, not including antigen-antibody interactions. The corresponding transient complexes were extracted from the protein data bank (PDB) (Berman et al. 2000). To further advance the quality of experimental data, the dataset was filtered strictly. The complexes having multiple models solved by NMR spectroscopy were discarded. The pairs containing chains less than 50 residues were eliminated to filter out small molecules. For the chains that interact with multiple partners, the one including the most interface residues was selected as a representative. After filtering the dataset, there were 117 transient protein-protein interaction pairs, namely 234 protein chains. Finally, 234 chains were clustered to remove redundant chains using the BLASTCLUST program (Altschul et al. 1990) with identity threshold of 30% and length coverage threshold of 90%. As a result, a nonredundant dataset composed of 130 protein chains was used in this research.

Definition of surface residues and interface residues

In this study, the method of Fariselli et al. (2002) was adopted to define surface residues and interface residues. A residue was considered as a surface residue if its ASA is at least 16% of its nominal maximum area (Rost and Sander 1994). The DSSP program (Kabsch and Sander 1983) was used to calculate the ASA of each residue in unbound chain. The atom coordinates of a single chain were derived from the corresponding PDB file. A surface residue was defined as an interface residue if the distance between its Cα atom and any residue's Cα atom from its partner chain is less than 1.2 nm. According to this


definition, our dataset contained 16,056 surface residues, about 29% of which were interface residues.

Feature extraction

B-factor

B-factor is a measure of atomic thermal motion and disorder. The B-factor of Cα atom was used to represent the flexibility of each residue and normalized by the following equation (Yuan et al. 2003):

$$NB_r = \frac{B_r - \langle B \rangle}{\sigma(B)} \qquad (1)$$

where B_r is the B-factor of residue r, and ⟨B⟩ and σ(B) are the mean value and the standard deviation of the B-factors for the chosen chain, respectively.

Sequence profile

Sequence profile was generated by three iterations of PSI-BLAST searches (Altschul et al. 1997) against NCBI nonredundant database with the BLOSUM62 substitution matrix and E-value threshold of 0.001. The profile value was scaled between 0 and 1 by the following equation (Kim and Park 2003):

$$f(x) = \begin{cases} 0.0 & \text{if } x \le -5 \\ 0.5 + 0.1x & \text{if } -5 < x < 5 \\ 1.0 & \text{if } x \ge 5 \end{cases} \qquad (2)$$

where x is the original profile value.

Accessible surface area (ASA)

ASA was calculated in the process of defining surface residues with the DSSP program and scaled between 0 and 1 by the following equation (Wang et al. 2008):

$$NASA_r = \frac{ASA_r}{\max(ASA_r)} \qquad (3)$$

where ASA_r is the ASA of residue r, max(ASA_r) is the nominal maximum area of residue r.

Classifier construction

In our experiment, SVM classifiers (Vapnik 1995) were used to identify whether a surface residue was located at the interface or not. SVM classifiers were constructed using B-factor, sequence profile, ASA or the combinations of them. Each classifier input a window containing a target residue and its ten spatially nearest surface residues. As a result, each residue was represented by an 11-component vector if B-factor or ASA was used and by a 220-component vector if sequence profile was used. In this study, SVM classifiers were implemented using the LIBSVM package (Chang and Lin 2001) with the radial basis function as kernel. For each classifier, we used a grid search to determine the optimal values of C and γ so as to maximize the correlation coefficient (CC) of cross-validation.

Cross-validation

Threefold cross-validation was used to train and test the classifiers. The whole dataset was randomly divided into three subsets with an approximately equal number of chains. In each validation, one subset was used for testing while the rest were used for training. In our dataset, only 29% of surface residues were interface residues. If all noninterface residues were used for training, the classifiers would prefer to classify a target residue as a noninterface residue. Therefore, for each run, the classifiers were trained using all interface residues and an equal number of noninterface residues extracted randomly from the training set and this procedure was repeated five times. A residue was classified as an interface residue if it was predicted to be positive at least three times, otherwise a noninterface residue.

Evaluation measures of classifier performance

In this study, four widely used measures, sensitivity, specificity, accuracy and CC, were adopted to evaluate the performances of different classifiers. These evaluation measures are defined as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$

$$\text{Specificity} = \frac{TP}{TP + FP} \qquad (5)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \qquad (6)$$

$$\text{CC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}} \qquad (7)$$

where TP, FP, TN and FN represent the numbers of true positives, false positives, true negatives and false negatives, respectively.

Results and discussion

Features of transient protein-protein interface

The residue distributions in the interface and noninterface are shown in Fig. 1a. It is clear that 11 residue types were


enriched in the interface, six types (Phe, Ile, Met, Leu, Val, Trp) of which were hydrophobic residues. In addition, Tyr and Arg that are potential hot spots were also overrepresented in the interface. Similar phenomena have been observed by Ansari and Helms (2005). The overrepresented 11 residue types in our study included the seven types in their research. Moreover, there were more hydrophobic residues in our results, which was probably owing to the different definitions of interface residues and the different sizes of datasets.

Residues exhibiting relatively low B-factors are generally those participating in forming secondary structures, neighboring disulfide bridges, or are involved in ligand binding (Tseng and Liang 2007). As shown in Fig. 1b, by calculating and comparing the mean values of the B-factors of residues in the interface and noninterface, we found that the mean values of the interface residues were all significantly lower than those of the noninterface residues.

It has been long demonstrated that interface residues are more conservative than noninterface residues during evolution (Mintseris and Weng 2005). We followed the method of Zhou and Shan (2001) to average the diagonal elements of sequence profile for all residue types in the interface and compared them against the corresponding averages in the noninterface. Figure 1c shows that except for Thr and Trp, the averages over the interface residues were all higher than those over the noninterface residues. The results confirm that residues in the transient interface are more conservative.

Previous findings have suggested that interface residues are more solvent accessible than noninterface residues (Jones and Thornton 1997a, b). From Fig. 1d, the average ASAs of the interface residues and noninterface residues reveal that except for Asp, Gly and Ala, the solvent accessibilities of the other 17 residue types in the transient interface were higher.

Fig. 1 Comparison between interface residues and noninterface residues. a residue distributions, b B-factors, c conservation scores, d accessible surface areas (bar charts of interface versus noninterface values by residue type; y-axes: percentage of residue, B-factor, conservation score and accessible surface area)

Performance of SVM classifiers

The results of threefold cross-validation on 130 chains are given in Table 1. As can be seen from Table 1, all the classifiers can predict significantly better than the random predictions (shown in parentheses). When single feature was used, SVM_B can identify residues in the transient interfaces most effectively, SVM_P was second to SVM_B, and SVM_A was relatively inferior. For the classifiers using the combination of two features, SVM_{B+P} obtained the best CC of 0.262. Although SVM_{P+A} and SVM_{B+A} did not perform as well as SVM_{B+P}, they were still superior to the classifiers with single feature. However, SVM_{B+P+A} achieved a much better performance than the above classifiers based on two features. Especially, compared with SVM_{P+A}, the CC was raised from 0.198 to 0.290. These results indicate that B-factor plays a vital role in identifying


interaction sites in transient complexes, and that utilizing the complementarity of the three features is favorable for improving the prediction performance.

For each evaluation measure, setting different cutoffs from 0 to 1 with a 0.1 increment each time, the corresponding numbers of chains were obtained. The distributions of evaluation measure values of SVM_{B+P+A} are exhibited in Fig. 2. It can be observed that the sensitivity values of 99 (76%) chains exceeded 50%, and 69 (53%) chains had the specificity values exceeding the same cutoff. The distributions of accuracy values show that the accuracy values were greater than 20% for all chains, 117 (90%) chains of which achieved the values over 50%. In addition, it can be found that the CC values were not less than 0 for 113 (87%) chains, which suggests that our method is effective.

Table 1 The results of threefold cross-validation on 130 chains

Classifier     Sensitivity (%)   Specificity (%)   Accuracy (%)   CC
SVM_A          43.4 (55.3)a      33.7 (28.7)       58.5 (47.0)    0.077 (-0.011)
SVM_P          52.8 (51.4)       39.5 (29.6)       62.5 (50.2)    0.181 (0.010)
SVM_B          59.7 (47.7)       40.7 (28.2)       63.0 (49.8)    0.220 (-0.015)
SVM_{P+A}      53.9 (51.4)       40.6 (29.5)       63.3 (49.9)    0.198 (0.007)
SVM_{B+A}      59.3 (47.4)       41.9 (29.3)       64.0 (51.7)    0.234 (0.008)
SVM_{B+P}      60.3 (50.1)       43.6 (28.9)       65.7 (50.4)    0.262 (0.005)
SVM_{B+P+A}    61.8 (49.2)       45.4 (28.8)       67.1 (49.7)    0.290 (-0.008)

The subscripts are defined as follows: B B-factor, P sequence profile, A ASA
a Random predictions were obtained by randomly shuffling the labels of samples in training sets and retraining the classifiers to predict test sets

Fig. 2 The distributions of evaluation measure values of SVM_{B+P+A} for 130 chains (x-axis: cutoff, from >=0.0 to >=0.9; y-axis: number of proteins; series: sensitivity, specificity, accuracy, correlation coefficient)

Evaluation of the predictions using three-dimensional structure

To further illustrate the effectiveness of our method, the prediction results of the protein complex 1ABR (PDB ID) chosen from our dataset were visualized using the PyMOL package (DeLano 2002). The complex 1ABR that is a type II ribosome-inactivating protein is composed of an A-chain (1ABR:A) linked by a disulfide bond to a B-chain (1ABR:B) (Tahirov et al. 1995). As can be seen from Fig. 3, the classifiers with single feature can identify part of interface residues in 1ABR:B, but incorrectly predicted many false positives and false negatives. However, when the three features were combined, the classifier not only identified more interface residues, but also reduced the number of false predictions. This again indicates that combining the three features can improve the prediction performance.

Independent testing

Benchmark 2.0 is a nonredundant dataset for testing protein-protein docking algorithms (Mintseris et al. 2005). This dataset (excluding antigen-antibody) contains 62 transient protein complexes. The chains sharing more than 30% sequence identity with any one of the 130 chains in our dataset were eliminated. After this process, we got a nonhomologous set consisting of 37, 11 and 5 chains with minor (rigid body), medium (medium difficult) and large (difficult) conformational changes. Then, we used our dataset as a training set to train the classifiers with combined features, and predicted the interaction sites contained by all chains and nonhomologous chains in benchmark 2.0, respectively. In order to balance positive and negative samples, all interface residues and the same number of randomly sampled noninterface residues from our dataset were extracted for training and this procedure was repeated five times.

The results of different classifiers tested on the whole set (shown in parentheses) and the nonhomologous set are displayed in Table 2. It can be seen that the prediction abilities of the four classifiers were consistent with the results attained by threefold cross-validation on 130 chains. SVM_{B+P+A} got the best performance not only for all chains, but also for nonhomologous chains. In the classifiers based on two features, SVM_{B+P} was the best, SVM_{B+A} was second to SVM_{B+P}, and SVM_{P+A} was relatively inferior. Especially, as the magnitude of conformational changes increases, combining only sequence profile and ASA cannot favorably identify the interface residues. However, when B-factor was input as an additional feature, the prediction performance was obviously improved. It is interesting that the classifiers utilizing B-factor as a feature got better performance on the difficult set than on the other


two sets. The results further illuminate that B-factor is crucial to identify the interaction sites of the chains with large conformational changes. In addition, except for SVM_{B+A}, the performances of the other three classifiers tested on the nonhomologous set were not so good as on the whole set. Even so, SVM_{B+P+A} achieved a CC of 0.331 on the nonhomologous set, which was better than the value of threefold cross-validation on 130 chains. The prediction results confirm that our classifiers are robust, and that using more training samples can acquire better performance.

Fig. 3 Visualization of prediction results for complex 1ABR (PDB ID). a SVM_B, b SVM_P, c SVM_A, d SVM_{B+P+A}. The colors of different residues are defined as follows: green denotes true positives (TP), red denotes false positives (FP), yellow denotes false negatives (FN)

Table 2 The results of independent testing on benchmark 2.0

Subset            No. of chains   Classifier     Sensitivity (%)   Specificity (%)   Accuracy (%)   CC
Rigid body        37 (86)a        SVM_{P+A}      55.4 (59.6)       38.6 (38.9)       63.4 (66.0)    0.200 (0.248)
                                  SVM_{B+A}      65.8 (61.8)       40.2 (40.7)       63.6 (67.4)    0.257 (0.278)
                                  SVM_{B+P}      65.9 (66.6)       44.3 (44.9)       67.8 (70.8)    0.312 (0.348)
                                  SVM_{B+P+A}    67.6 (67.8)       44.9 (46.7)       68.3 (72.2)    0.328 (0.374)
Medium difficult  11 (24)         SVM_{P+A}      34.5 (44.7)       35.7 (35.4)       69.3 (67.6)    0.149 (0.180)
                                  SVM_{B+A}      58.8 (58.0)       42.8 (38.5)       71.2 (68.1)    0.308 (0.259)
                                  SVM_{B+P}      50.0 (56.0)       45.1 (41.7)       73.3 (71.1)    0.297 (0.290)
                                  SVM_{B+P+A}    49.0 (56.3)       47.9 (42.9)       74.9 (71.9)    0.318 (0.303)
Difficult         5 (14)          SVM_{P+A}      27.6 (43.5)       31.1 (29.9)       73.3 (62.6)    0.129 (0.107)
                                  SVM_{B+A}      85.2 (73.3)       39.4 (40.5)       70.8 (68.2)    0.423 (0.343)
                                  SVM_{B+P}      73.9 (69.3)       40.2 (40.5)       72.8 (68.6)    0.385 (0.326)
                                  SVM_{B+P+A}    70.0 (68.4)       39.3 (40.7)       72.4 (68.9)    0.359 (0.325)
All               53 (124)        SVM_{P+A}      46.9 (54.1)       37.4 (37.0)       66.5 (65.8)    0.187 (0.214)
                                  SVM_{B+A}      66.6 (62.6)       40.6 (40.2)       66.7 (67.7)    0.294 (0.284)
                                  SVM_{B+P}      63.1 (64.7)       43.8 (43.6)       70.0 (70.5)    0.320 (0.332)
                                  SVM_{B+P+A}    63.4 (65.4)       44.6 (45.0)       70.6 (71.6)    0.331 (0.351)

The subscripts are defined as follows: B B-factor, P sequence profile, A ASA
a The numbers of all chains in different categories are in parentheses
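The four measures reported in Tables 1 and 2 follow directly from the confusion counts. Below is a minimal Python sketch, assuming the definitions given under "Evaluation measures of classifier performance"; the function name `evaluation_measures` and the zero-denominator fallbacks are illustrative choices, not part of the original method. Note that specificity as defined in this paper, TP / (TP + FP), is the quantity usually called precision elsewhere.

```python
import math


def evaluation_measures(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy, cc) from confusion counts.

    'specificity' follows this paper's definition TP / (TP + FP).
    Returning 0.0 for an empty denominator is an assumption of this sketch.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Correlation coefficient between predicted and true labels
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, cc
```

For example, `evaluation_measures(50, 50, 150, 50)` gives a sensitivity and specificity of 0.5 and a CC of 0.25, which illustrates how a classifier can look moderate on the first two measures while CC still reflects the large number of correctly rejected noninterface residues.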
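The three input features compared in the tables are each rescaled before being passed to the classifiers, as described under "Feature extraction". The sketch below illustrates those three scalings in Python. It rests on assumptions not stated in the text: the population standard deviation is used for the B-factor normalization, the relative ASA is capped at 1.0, a chain with constant B-factors maps to zeros, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Z-score the C-alpha B-factors of one chain."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)  # assumption: population standard deviation
    if sigma == 0:
        # assumption: a chain with constant B-factors carries no signal
        return [0.0 for _ in b_factors]
    return [(b - mu) / sigma for b in b_factors]


def scale_profile_value(x):
    """Piecewise-linear scaling of one profile (PSSM) entry to [0, 1]."""
    if x <= -5:
        return 0.0
    if x >= 5:
        return 1.0
    return 0.5 + 0.1 * x


def normalize_asa(asa, max_asa):
    """ASA of a residue divided by its nominal maximum area."""
    return min(asa / max_asa, 1.0)  # assumption: cap values above 1.0
```

With a window of a target residue plus its ten nearest surface neighbours, the B-factor and ASA functions each contribute 11 values per residue, and the profile scaling is applied to each of the 11 × 20 = 220 profile entries.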
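The residue labels behind all of these results come from two thresholds given in the Methods: a residue is on the surface if its ASA is at least 16% of its nominal maximum area, and a surface residue is at the interface if its Cα atom lies within 1.2 nm of any Cα atom of the partner chain. A minimal Python sketch of that labeling follows. It assumes coordinates are already in nanometres (PDB files store ångströms, so 1.2 nm corresponds to 12 Å), and the function names, the tuple layout and the "buried" label are hypothetical.

```python
import math

SURFACE_CUTOFF = 0.16       # ASA at least 16% of nominal maximum area
INTERFACE_CUTOFF_NM = 1.2   # C-alpha to C-alpha distance threshold


def is_surface_residue(asa, max_asa):
    """True if the residue's relative ASA reaches the surface cutoff."""
    return asa / max_asa >= SURFACE_CUTOFF


def is_interface_residue(ca_xyz, partner_ca_xyz, cutoff_nm=INTERFACE_CUTOFF_NM):
    """True if any partner C-alpha lies closer than the cutoff."""
    return any(math.dist(ca_xyz, other) < cutoff_nm for other in partner_ca_xyz)


def label_chain(residues, partner_ca_xyz):
    """Label each residue given as (asa, max_asa, ca_xyz), coordinates in nm."""
    labels = []
    for asa, max_asa, ca_xyz in residues:
        if not is_surface_residue(asa, max_asa):
            labels.append("buried")  # not a candidate for prediction
        elif is_interface_residue(ca_xyz, partner_ca_xyz):
            labels.append("interface")
        else:
            labels.append("noninterface")
    return labels
```

Only residues labeled "interface" or "noninterface" would enter training and testing, since the classifiers are applied to surface residues.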


Comparison with cons-PPISP

A direct comparison with other methods is difficult due to the differences in the definitions of surface residues and interface residues and the preparations of datasets. We made an attempt to compare our method with cons-PPISP, because they were both tested on the protein-protein docking benchmark set. Cons-PPISP that used sequence profile and solvent accessibility as input to neural networks was developed by Chen and Zhou (2005). Their method was tested on 68 unique chains of 40 complexes in benchmark 1.0. The sensitivity and specificity were 50% and 50% for the enzyme-inhibitor category, 28 and 31% for other category, and 38 and 42% for the whole 68 chains, respectively. Our method was tested on 95 unique chains of 62 complexes in benchmark 2.0, which achieved a sensitivity and specificity of 69 and 51% for the enzyme-inhibitor category, 64 and 42% for other category, and 67 and 44% for the whole 95 chains, respectively. Our method achieving better performance probably depends on three factors. First, B-factor was used as an additional feature in our method, which led to the obvious improvement of prediction performance. When sequence profile and ASA were combined, the sensitivity and specificity of our method for 95 chains were only 55 and 36%. Second, we used a balanced training set to train the classifiers, which may result in a relatively high sensitivity. Third, owing to the training set of Chen and Zhou including some other types of complexes, the features extracted from these complexes may not be suitable for predicting interface residues in transient complexes.

Conclusion

Transient protein-protein interactions play a vital role in many biological processes. Due to the limitation of experimental methods, the knowledge of these interactions is inadequate. In this research, transient interfaces were chosen as study objects and the features of these interfaces were analyzed. It was found that besides sequence profile and ASA, B-factor can also distinctly reflect the differences between interface and noninterface. We converted these features into input vector and used SVM classifiers to predict residues in the interface. It is indicated that the incorporation of B-factor is important to identify interaction sites in transient complexes, and that the information contained within these features are complementary. Therefore, our method can complement experimental techniques in studying transient protein-protein interactions. Incorporation of our method with more physicochemical properties and structural attributes will prompt the study of protein-protein interactions.

Acknowledgments This work was supported by the National Natural Science Foundation of China (Grant Nos. 90608020, 30370354, and 90203011), NCET-060651, the National Platform Project of China (Grant No. 2005DKA64001), and the Ministry of Education of China (Grant Nos. 200******** and 505010).

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402. doi:10.1093/nar/25.17.3389

Ansari S, Helms V (2005) Statistical analysis of predominantly transient protein-protein interfaces. Proteins 61:344-355. doi:10.1002/prot.20593

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235-242. doi:10.1093/nar/28.1.235

Bradford JR, Needham CJ, Bulpitt AJ, Westhead DR (2006) Insights into protein-protein interfaces using a Bayesian network prediction method. J Mol Biol 362:365-386. doi:10.1016/j.jmb.2006.07.028

Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at: (http://www.csie.ntu.edu.tw/~cjlin/libsvm)

Chelliah V, Chen L, Blundell TL, Lovell SC (2004) Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 342:1487-1504. doi:10.1016/j.jmb.2004.08.022

Chen H, Zhou HX (2005) Prediction of interface residues in protein-protein complexes by a consensus neural network method: test against NMR data. Proteins 61:21-35. doi:10.1002/prot.20514

Cole C, Warwicker J (2002) Side-chain conformational entropy at protein-protein interfaces. Protein Sci 11:2860-2870. doi:10.1110/ps.0222702

DeLano WL (2002) The PyMOL molecular graphics system. Software available at: (http://www.pymol.org)

Dong Q, Wang X, Lin L, Guan Y (2007) Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins. BMC Bioinformatics 8:147. doi:10.1186/147*-****-*-***

Fariselli P, Pazos F, Valencia A, Casadio R (2002) Prediction of protein-protein interaction sites in heterocomplexes with neural networks. Eur J Biochem 269:1356-1361. doi:10.1046/j.1432-1033.2002.02767.x

Friedrich T, Pils B, Dandekar T, Schultz J, Muller T (2006) Modelling interaction sites in protein domains with interaction profile hidden Markov models. Bioinformatics 22:2851-2857. doi:10.1093/bioinformatics/btl486

Glaser F, Steinberg DM, Vakser IA, Ben-Tal N (2001) Residue frequencies and pairing preferences at protein-protein interfaces. Proteins 43:89-102. doi:10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-H

Hoskins J, Lovell S, Blundell TL (2006) An algorithm for predicting protein-protein interaction sites: abnormally exposed amino acid residues and secondary structure elements. Protein Sci 15:1017-1029. doi:10.1110/ps.051589106

Jones S, Thornton JM (1995) Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol 63:31-65. doi:10.1016/0079-6107(94)00008-W


Jones S, Thornton JM (1996) Principles of protein-protein interactions. Proc Natl Acad Sci USA 93:13-20. doi:10.1073/pnas.93.1.13

Jones S, Thornton JM (1997a) Analysis of protein-protein interaction sites using surface patches. J Mol Biol 272:121-132. doi:10.1006/jmbi.1997.1234

Jones S, Thornton JM (1997b) Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 272:133-143. doi:10.1006/jmbi.1997.1233

Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637. doi:10.1002/bip.360221211

Kim H, Park H (2003) Protein secondary structure prediction based on an improved support vector machines approach. Protein Eng 16:553-560. doi:10.1093/protein/gzg072

Koike A, Takagi T (2004) Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 17:165-173. doi:10.1093/protein/gzh020

Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N (2005) ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic Acids Res 33:W299-W302. doi:10.1093/nar/gki370

Li JJ, Huang DS, Wang B, Chen P (2006) Identifying protein-protein interfacial residues in heterocomplexes using residue conservation scores. Int J Biol Macromol 38:241-247. doi:10.1016/j.ijbiomac.2006.02.024

Li MH, Lin L, Wang XL, Liu T (2007) Protein-protein interaction site prediction based on conditional random fields. Bioinformatics 23:597-604. doi:10.1093/bioinformatics/btl660

Liang S, Zhang C, Liu S, Zhou Y (2006) Protein binding site prediction using an empirical scoring function. Nucleic Acids Res 34:3698-3707. doi:10.1093/nar/gkl454

Lo Conte L, Chothia C, Janin J (1999) The atomic structure of protein-protein recognition sites. J Mol Biol 285:2177-2198. doi:10.1006/jmbi.1998.2439

Mintseris J, Weng Z (2005) Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 102:109**-*****. doi:10.1073/pnas.050*******

Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z (2005) Protein-protein docking benchmark 2.0: an update. Proteins 60:214-216. doi:10.1002/prot.20560

Neuvirth H, Raz R, Schreiber G (2004) ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 338:181-199. doi:10.1016/j.jmb.2004.02.040

Ofran Y, Rost B (2003) Predicted protein-protein interaction sites from local sequence information. FEBS Lett 544:236-239. doi:10.1016/S0014-5793(03)00456-3

Rost B, Sander C (1994) Conservation and prediction of solvent accessibility in protein families. Proteins 20:216-226. doi:10.1002/prot.340200303

Szilagyi A, Grimm V, Arakaki AK, Skolnick J (2005) Prediction of physical protein-protein interactions. Phys Biol 2:S1-S16. doi:10.1088/1478-3975/2/2/S01

Tahirov TH, Lu TH, Liaw YC, Chen YL, Lin JY (1995) Crystal structure of abrin-a at 2.14 Å. J Mol Biol 250:354-367. doi:10.1006/jmbi.1995.0382

Tseng YY, Liang J (2007) Predicting enzyme functional surfaces and locating key residues automatically from structures. Ann Biomed Eng 35:1037-1042. doi:10.1007/s10439-006-9241-2

Vapnik VN (1995) The nature of statistical learning theory. Springer, New York

Wang Y, Xue Z, Shen G, Xu J (2008) PRINTR: prediction of RNA binding sites in proteins using SVM and profiles. Amino Acids 35:295-302. doi:10.1007/s00726-007-0634-9

Yuan Z, Zhao J, Wang ZX (2003) Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng 16:109-114. doi:10.1093/proeng/gzg014

Zhou HX, Shan Y (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 44:336-343. doi:10.1002/prot.1099
