ORIGINAL ARTICLE
Identifying protein-protein interaction sites in transient complexes with temperature factor, sequence profile and accessible surface area
Rong Liu · Wenchao Jiang · Yanhong Zhou
Received: 5 October 2008 / Accepted: 21 January 2009 / Published online: 12 February 2009
© Springer-Verlag 2009
Abstract Transient protein-protein interactions play a vital role in many biological processes, such as cell regulation and signal transduction. A nonredundant dataset of 130 protein chains extracted from transient complexes was used to analyze the features of transient interfaces. It was found that besides the two well-known features, sequence profile and accessible surface area (ASA), the temperature factor (B-factor) can also reflect the differences between the interface and the rest of the protein surface. These features were utilized to construct support vector machine (SVM) classifiers to identify interaction sites. The results of threefold cross-validation on the nonredundant dataset show that when B-factor was used as an additional feature, the prediction performance improved significantly. The sensitivity, specificity and correlation coefficient were raised from 54 to 62%, 41 to 45% and 0.20 to 0.29, respectively. To further illustrate the effectiveness of our method, the classifiers were tested with an independent set of 53 nonhomologous protein chains derived from benchmark 2.0. The sensitivity, specificity and correlation coefficient of the classifier based on the three features were 63%, 45% and 0.33, respectively. These results indicate that our classifiers are robust and can be applied to complement experimental techniques in studying transient protein-protein interactions.

Keywords Protein-protein interactions · Transient interface · Sequence profile · Temperature factor · Accessible surface area · Support vector machine

Electronic supplementary material The online version of this article (doi:10.1007/s00726-009-0245-8) contains supplementary material, which is available to authorized users.

R. Liu · W. Jiang · Y. Zhou
Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, China
e-mail: ******@****.***.**
R. Liu
e-mail: **********@***.***

Introduction

Protein-protein interactions are critical to many biological processes. The so-called interaction sites or functional sites play a crucial role in protein-protein interactions. Identifying these pivotal sites is useful to get a better understanding of the molecular recognition process at the residue and atomic levels, to uncover the mechanism of metabolic and signal transduction networks, and to gain important clues for rational drug design (Chelliah et al. 2004).

According to their lifetime, protein-protein interactions can be divided into permanent interactions and transient interactions (Jones and Thornton 1996). Due to the structural stability of permanent complexes, permanent interactions are much easier to study by experimental methods, such as X-ray crystallography and NMR spectroscopy. On the other hand, since transient interactions often neither form stable crystals nor give good NMR structures, transient complexes are notoriously hard to study experimentally (Szilagyi et al. 2005). Nevertheless, transient interactions are the focus of significant interest owing to their biological importance, particularly with respect to cell regulation and signal transduction (Hoskins et al. 2006). Thus, computational methods are needed to assist in finding the features of transient protein-protein interfaces and identifying residues in these interfaces.
Statistical analyses of different types of protein-protein interfaces have revealed some common features, which have been used to identify interaction sites. It was found that there are distinct differences in amino acid composition between interface and noninterface, as well as between different types of interfaces. Compared with noninterfaces, permanent interfaces always contain more hydrophobic residues (Glaser et al. 2001). Although some transient interfaces are also hydrophobic, they are rich in aromatic residues and depleted in charged residues (Lo Conte et al. 1999). Evolutionary conservation of residues is another important feature for the identification of interaction sites. Generally, interface residues are more conserved than noninterface residues during evolution. Transient interfaces tend to evolve at a relatively higher rate than permanent interfaces (Mintseris and Weng 2005). Previous studies have demonstrated that interface residues are more solvent accessible than noninterface residues. Solvent accessibility is one of the most effective features used to predict homodimer interfaces (Jones and Thornton 1997a, b). It has been suggested that interface residues have lower temperature factors (B-factors) than the exterior of the protein, which contributes to less flexibility of the interfacial regions (Jones and Thornton 1995). In addition, secondary structure (Neuvirth et al. 2004; Ansari and Helms 2005) and side-chain conformational entropy (Cole and Warwicker 2002; Liang et al. 2006) can also be used to distinguish interface residues from noninterface residues. Thus, these features are valuable for identifying interaction sites.

The features mentioned above have been combined, using a wide range of machine learning methods, to predict interaction sites in different types of complexes (Zhou and Shan 2001; Koike and Takagi 2004; Landau et al. 2005; Li et al. 2006; Bradford et al. 2006; Friedrich et al. 2006; Li et al. 2007). However, only a few studies have chosen the interaction sites in transient complexes as prediction objects. Ofran and Rost (2003) developed a neural network that identifies transient protein-protein interfaces from local sequence information. Neuvirth et al. (2004) utilized a naive Bayesian method with 13 features to identify the interfaces of unbound structures of transient heterodimers at a patch level. Liang et al. (2006) presented a linear combination of energy score, interface propensity and residue conservation score to predict interface residues of the unbound structures used by Neuvirth et al. (2004). Dong et al. (2007) input binary profile interface propensity, sequence profile and accessible surface area (ASA) to a support vector machine (SVM) for recognizing interaction sites in transient complexes. Although the existing prediction methods have achieved success at different levels, the prediction of interface residues in transient complexes is still at its primary stage.

The purpose of our research is to focus on the identification of protein-protein interaction sites in transient complexes. By analyzing the features of transient interfaces, we found that besides the two well-known features, sequence profile and ASA, B-factor can also reflect the differences between the interface and the rest of the protein surface. Then, B-factor, sequence profile, ASA or combinations of them were used to construct SVM classifiers to recognize interface residues. The results show that B-factor plays a key role in identifying the interaction sites in transient complexes, and that utilizing the complementarity of the three features is favorable for improving the prediction performance.

Materials and methods

Dataset

The experimental data in this study were derived from the dataset used by Ansari and Helms (2005). This dataset contains 170 transient protein-protein interaction pairs, not including antigen-antibody interactions. The corresponding transient complexes were extracted from the protein data bank (PDB) (Berman et al. 2000). To further improve the quality of the experimental data, the dataset was filtered strictly. The complexes having multiple models solved by NMR spectroscopy were discarded. The pairs containing chains of fewer than 50 residues were eliminated to filter out small molecules. For the chains that interact with multiple partners, the one including the most interface residues was selected as a representative. After filtering the dataset, there were 117 transient protein-protein interaction pairs, namely 234 protein chains. Finally, the 234 chains were clustered to remove redundant chains using the BLASTCLUST program (Altschul et al. 1990) with an identity threshold of 30% and a length coverage threshold of 90%. As a result, a nonredundant dataset composed of 130 protein chains was used in this research.

Definition of surface residues and interface residues

In this study, the method of Fariselli et al. (2002) was adopted to define surface residues and interface residues. A residue was considered a surface residue if its ASA is at least 16% of its nominal maximum area (Rost and Sander 1994). The DSSP program (Kabsch and Sander 1983) was used to calculate the ASA of each residue in the unbound chain. The atom coordinates of a single chain were derived from the corresponding PDB file. A surface residue was defined as an interface residue if the distance between its Cα atom and any residue's Cα atom from its partner chain is less than 1.2 nm. According to this
definition, our dataset contained 16,056 surface residues, about 29% of which were interface residues.

Feature extraction

B-factor

B-factor is a measure of atomic thermal motion and disorder. The B-factor of the Cα atom was used to represent the flexibility of each residue and normalized by the following equation (Yuan et al. 2003):

$$NB_r = \frac{B_r - \langle B \rangle}{\sigma(B)} \tag{1}$$

where $B_r$ is the B-factor of residue $r$, and $\langle B \rangle$ and $\sigma(B)$ are the mean value and the standard deviation of the B-factors for the chosen chain, respectively.

Sequence profile

Sequence profile was generated by three iterations of PSI-BLAST searches (Altschul et al. 1997) against the NCBI nonredundant database with the BLOSUM62 substitution matrix and an E-value threshold of 0.001. The profile value was scaled between 0 and 1 by the following equation (Kim and Park 2003):

$$f(x) = \begin{cases} 0.0 & \text{if } x \le -5 \\ 0.5 + 0.1x & \text{if } -5 < x < 5 \\ 1.0 & \text{if } x \ge 5 \end{cases} \tag{2}$$

where $x$ is the original profile value.

Accessible surface area (ASA)

ASA was calculated in the process of defining surface residues with the DSSP program and scaled between 0 and 1 by the following equation (Wang et al. 2008):

$$NASA_r = \frac{ASA_r}{\max(ASA_r)} \tag{3}$$

where $ASA_r$ is the ASA of residue $r$ and $\max(ASA_r)$ is the nominal maximum area of residue $r$.

Classifier construction

In our experiment, SVM classifiers (Vapnik 1995) were used to identify whether a surface residue was located at the interface or not. SVM classifiers were constructed using B-factor, sequence profile, ASA or the combinations of them. Each classifier input a window containing a target residue and its ten spatially nearest surface residues. As a result, each residue was represented by an 11-component vector if B-factor or ASA was used and by a 220-component vector if sequence profile was used. In this study, SVM classifiers were implemented using the LIBSVM package (Chang and Lin 2001) with the radial basis function as kernel. For each classifier, we used a grid search to determine the optimal values of $C$ and $\gamma$ so as to maximize the correlation coefficient (CC) of cross-validation.

Cross-validation

Threefold cross-validation was used to train and test the classifiers. The whole dataset was randomly divided into three subsets with an approximately equal number of chains. In each validation, one subset was used for testing while the rest were used for training. In our dataset, only 29% of surface residues were interface residues. If all noninterface residues were used for training, the classifiers would prefer to classify a target residue as a noninterface residue. Therefore, for each run, the classifiers were trained using all interface residues and an equal number of noninterface residues extracted randomly from the training set and this procedure was repeated five times. A residue was classified as an interface residue if it was predicted to be positive at least three times, otherwise a noninterface residue.

Evaluation measures of classifier performance

In this study, four widely used measures, sensitivity, specificity, accuracy and CC, were adopted to evaluate the performances of different classifiers. These evaluation measures are defined as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{Specificity} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{6}$$

$$\text{CC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}} \tag{7}$$

where TP, FP, TN and FN represent the numbers of true positives, false positives, true negatives and false negatives, respectively.

Results and discussion

Features of transient protein-protein interface

The residue distributions in the interface and noninterface are shown in Fig. 1a. It is clear that 11 residue types were
enriched in the interface, six types (Phe, Ile, Met, Leu, Val, Trp) of which were hydrophobic residues. In addition, Tyr and Arg, which are potential hot spots, were also overrepresented in the interface. Similar phenomena were observed by Ansari and Helms (2005). The 11 overrepresented residue types in our study included the seven types in their research. Moreover, there were more hydrophobic residues in our results, which was probably owing to the different definitions of interface residues and the different sizes of datasets.

Residues exhibiting relatively low B-factors are generally those participating in forming secondary structures, neighboring disulfide bridges, or involved in ligand binding (Tseng and Liang 2007). As shown in Fig. 1b, by calculating and comparing the mean values of the B-factors of residues in the interface and noninterface, we found that the mean values of the interface residues were all significantly lower than those of the noninterface residues.

It has long been demonstrated that interface residues are more conserved than noninterface residues during evolution (Mintseris and Weng 2005). We followed the method of Zhou and Shan (2001) to average the diagonal elements of sequence profile for all residue types in the interface and compared them against the corresponding averages in the noninterface. Figure 1c shows that except for Thr and Trp, the averages over the interface residues were all higher than those over the noninterface residues. The results confirm that residues in the transient interface are more conserved.

Previous findings have suggested that interface residues are more solvent accessible than noninterface residues (Jones and Thornton 1997a, b). As Fig. 1d shows, except for Asp, Gly and Ala, the average ASAs of the other 17 residue types were higher in the transient interface than in the noninterface.

Fig. 1 Comparison between interface residues and noninterface residues. a residue distributions, b B-factors, c conservation scores, d accessible surface areas
[Figure not reproduced: four bar charts by residue type for interface and noninterface residues; y-axes are percentage of residue (a), B-factor (b), conservation score (c) and accessible surface area (d).]

Performance of SVM classifiers

The results of threefold cross-validation on 130 chains are given in Table 1. As can be seen from Table 1, all the classifiers predict significantly better than the random predictions (shown in parentheses). When a single feature was used, SVM_B identified residues in the transient interfaces most effectively, SVM_P was second to SVM_B, and SVM_A was relatively inferior. For the classifiers using the combination of two features, SVM_B+P obtained the best CC of 0.262. Although SVM_P+A and SVM_B+A did not perform as well as SVM_B+P, they were still superior to the classifiers with single feature. However, SVM_B+P+A achieved a much better performance than the above classifiers based on two features. In particular, compared with SVM_P+A, the CC was raised from 0.198 to 0.290. These results indicate that B-factor plays a vital role in identifying
interaction sites in transient complexes, and that utilizing the complementarity of the three features is favorable for improving the prediction performance.

Table 1 The results of threefold cross-validation on 130 chains

Classifier   Sensitivity (%)   Specificity (%)   Accuracy (%)   CC
SVM_A        43.4 (55.3)a      33.7 (28.7)       58.5 (47.0)    0.077 (-0.011)
SVM_P        52.8 (51.4)       39.5 (29.6)       62.5 (50.2)    0.181 (0.010)
SVM_B        59.7 (47.7)       40.7 (28.2)       63.0 (49.8)    0.220 (-0.015)
SVM_P+A      53.9 (51.4)       40.6 (29.5)       63.3 (49.9)    0.198 (0.007)
SVM_B+A      59.3 (47.4)       41.9 (29.3)       64.0 (51.7)    0.234 (0.008)
SVM_B+P      60.3 (50.1)       43.6 (28.9)       65.7 (50.4)    0.262 (0.005)
SVM_B+P+A    61.8 (49.2)       45.4 (28.8)       67.1 (49.7)    0.290 (-0.008)

The subscripts are defined as follows: B B-factor, P sequence profile, A ASA
a Random predictions (in parentheses) were obtained by randomly shuffling the labels of samples in training sets and retraining the classifiers to predict test sets

For each evaluation measure, cutoffs from 0 to 1 were set with a 0.1 increment each time, and the corresponding numbers of chains were obtained. The distributions of evaluation measure values of SVM_B+P+A are exhibited in Fig. 2. It can be observed that the sensitivity values of 99 (76%) chains exceeded 50%, and 69 (53%) chains had specificity values exceeding the same cutoff. The distributions of accuracy values show that the accuracy values were greater than 20% for all chains, 117 (90%) chains of which achieved values over 50%. In addition, the CC values were not less than 0 for 113 (87%) chains, which suggests that our method is effective.

Fig. 2 The distributions of evaluation measure values of SVM_B+P+A for 130 chains
[Figure not reproduced: number of proteins (y-axis) reaching each cutoff from >=0.0 to >=0.9 (x-axis) for sensitivity, specificity, accuracy and correlation coefficient.]

Evaluation of the predictions using three-dimensional structure

To further illustrate the effectiveness of our method, the prediction results of the protein complex 1ABR (PDB ID) chosen from our dataset were visualized using the PyMOL package (DeLano 2002). The complex 1ABR, a type II ribosome-inactivating protein, is composed of an A-chain (1ABR:A) linked by a disulfide bond to a B-chain (1ABR:B) (Tahirov et al. 1995). As can be seen from Fig. 3, the classifiers with single feature can identify part of the interface residues in 1ABR:B, but produced many false positives and false negatives. However, when the three features were combined, the classifier not only identified more interface residues, but also reduced the number of false predictions. This again indicates that combining the three features can improve the prediction performance.

Independent testing

Benchmark 2.0 is a nonredundant dataset for testing protein-protein docking algorithms (Mintseris et al. 2005). This dataset (excluding antigen-antibody) contains 62 transient protein complexes. The chains sharing more than 30% sequence identity with any one of the 130 chains in our dataset were eliminated. After this process, we got a nonhomologous set consisting of 37, 11 and 5 chains with minor (rigid body), medium (medium difficult) and large (difficult) conformational changes. Then, we used our dataset as a training set to train the classifiers with combined features, and predicted the interaction sites contained by all chains and nonhomologous chains in benchmark 2.0, respectively. In order to balance positive and negative samples, all interface residues and the same number of randomly sampled noninterface residues from our dataset were extracted for training and this procedure was repeated five times.

The results of different classifiers tested on the whole set (shown in parentheses) and the nonhomologous set are displayed in Table 2. It can be seen that the prediction abilities of the four classifiers were consistent with the results attained by threefold cross-validation on 130 chains. SVM_B+P+A got the best performance not only for all chains, but also for nonhomologous chains. Among the classifiers based on two features, SVM_B+P was the best, SVM_B+A was second to SVM_B+P, and SVM_P+A was relatively inferior. In particular, as the magnitude of conformational changes increases, combining only sequence profile and ASA cannot favorably identify the interface residues. However, when B-factor was input as an additional feature, the prediction performance was obviously improved. It is interesting that the classifiers utilizing B-factor as a feature got better performance on the difficult set than on the other
two sets. The results further illuminate that B-factor is crucial to identify the interaction sites of the chains with large conformational changes. In addition, except for SVM_B+A, the performances of the other three classifiers tested on the nonhomologous set were not so good as on the whole set. Even so, SVM_B+P+A achieved a CC of 0.331 on the nonhomologous set, which was better than the value of threefold cross-validation on 130 chains. The prediction results confirm that our classifiers are robust, and that using more training samples can acquire better performance.

Fig. 3 Visualization of prediction results for complex 1ABR (PDB ID). a SVM_B, b SVM_P, c SVM_A, d SVM_B+P+A. The colors of different residues are defined as follows: green denotes true positives (TP), red denotes false positives (FP), yellow denotes false negatives (FN)

Table 2 The results of independent testing on benchmark 2.0

Subset             No. of chains   Classifier   Sensitivity (%)   Specificity (%)   Accuracy (%)   CC
Rigid body         37 (86)a        SVM_P+A      55.4 (59.6)       38.6 (38.9)       63.4 (66.0)    0.200 (0.248)
                                   SVM_B+A      65.8 (61.8)       40.2 (40.7)       63.6 (67.4)    0.257 (0.278)
                                   SVM_B+P      65.9 (66.6)       44.3 (44.9)       67.8 (70.8)    0.312 (0.348)
                                   SVM_B+P+A    67.6 (67.8)       44.9 (46.7)       68.3 (72.2)    0.328 (0.374)
Medium difficult   11 (24)         SVM_P+A      34.5 (44.7)       35.7 (35.4)       69.3 (67.6)    0.149 (0.180)
                                   SVM_B+A      58.8 (58.0)       42.8 (38.5)       71.2 (68.1)    0.308 (0.259)
                                   SVM_B+P      50.0 (56.0)       45.1 (41.7)       73.3 (71.1)    0.297 (0.290)
                                   SVM_B+P+A    49.0 (56.3)       47.9 (42.9)       74.9 (71.9)    0.318 (0.303)
Difficult          5 (14)          SVM_P+A      27.6 (43.5)       31.1 (29.9)       73.3 (62.6)    0.129 (0.107)
                                   SVM_B+A      85.2 (73.3)       39.4 (40.5)       70.8 (68.2)    0.423 (0.343)
                                   SVM_B+P      73.9 (69.3)       40.2 (40.5)       72.8 (68.6)    0.385 (0.326)
                                   SVM_B+P+A    70.0 (68.4)       39.3 (40.7)       72.4 (68.9)    0.359 (0.325)
All                53 (124)        SVM_P+A      46.9 (54.1)       37.4 (37.0)       66.5 (65.8)    0.187 (0.214)
                                   SVM_B+A      66.6 (62.6)       40.6 (40.2)       66.7 (67.7)    0.294 (0.284)
                                   SVM_B+P      63.1 (64.7)       43.8 (43.6)       70.0 (70.5)    0.320 (0.332)
                                   SVM_B+P+A    63.4 (65.4)       44.6 (45.0)       70.6 (71.6)    0.331 (0.351)

The subscripts are defined as follows: B B-factor, P sequence profile, A ASA
a The numbers of all chains in different categories are in parentheses
Comparison with cons-PPISP

A direct comparison with other methods is difficult due to the differences in the definitions of surface residues and interface residues and the preparations of datasets. We made an attempt to compare our method with cons-PPISP, because they were both tested on the protein-protein docking benchmark set. Cons-PPISP, which uses sequence profile and solvent accessibility as input to neural networks, was developed by Chen and Zhou (2005). Their method was tested on 68 unique chains of 40 complexes in benchmark 1.0. The sensitivity and specificity were 50 and 50% for the enzyme-inhibitor category, 28 and 31% for the other category, and 38 and 42% for the whole 68 chains, respectively. Our method was tested on 95 unique chains of 62 complexes in benchmark 2.0, and achieved a sensitivity and specificity of 69 and 51% for the enzyme-inhibitor category, 64 and 42% for the other category, and 67 and 44% for the whole 95 chains, respectively. The better performance of our method probably depends on three factors. First, B-factor was used as an additional feature in our method, which led to the obvious improvement of prediction performance. When only sequence profile and ASA were combined, the sensitivity and specificity of our method for 95 chains were only 55 and 36%. Second, we used a balanced training set to train the classifiers, which may result in a relatively high sensitivity. Third, because the training set of Chen and Zhou included some other types of complexes, the features extracted from these complexes may not be suitable for predicting interface residues in transient complexes.

Conclusion

Transient protein-protein interactions play a vital role in many biological processes. Due to the limitation of experimental methods, the knowledge of these interactions is inadequate. In this research, transient interfaces were chosen as study objects and the features of these interfaces were analyzed. It was found that besides sequence profile and ASA, B-factor can also distinctly reflect the differences between interface and noninterface. We converted these features into input vectors and used SVM classifiers to predict residues in the interface. The results indicate that the incorporation of B-factor is important to identify interaction sites in transient complexes, and that the information contained within these features is complementary. Therefore, our method can complement experimental techniques in studying transient protein-protein interactions. Incorporation of our method with more physicochemical properties and structural attributes will promote the study of protein-protein interactions.

Acknowledgments This work was supported by the National Natural Science Foundation of China (Grant Nos. 90608020, 30370354, and 90203011), NCET-060651, the National Platform Project of China (Grant No. 2005DKA64001), and the Ministry of Education of China (Grant Nos. 200******** and 505010).

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402. doi:10.1093/nar/25.17.3389
Ansari S, Helms V (2005) Statistical analysis of predominantly transient protein-protein interfaces. Proteins 61:344-355. doi:10.1002/prot.20593
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235-242. doi:10.1093/nar/28.1.235
Bradford JR, Needham CJ, Bulpitt AJ, Westhead DR (2006) Insights into protein-protein interfaces using a Bayesian network prediction method. J Mol Biol 362:365-386. doi:10.1016/j.jmb.2006.07.028
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at: (http://www.csie.ntu.edu.tw/~cjlin/libsvm)
Chelliah V, Chen L, Blundell TL, Lovell SC (2004) Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 342:1487-1504. doi:10.1016/j.jmb.2004.08.022
Chen H, Zhou HX (2005) Prediction of interface residues in protein-protein complexes by a consensus neural network method: test against NMR data. Proteins 61:21-35. doi:10.1002/prot.20514
Cole C, Warwicker J (2002) Side-chain conformational entropy at protein-protein interfaces. Protein Sci 11:2860-2870. doi:10.1110/ps.0222702
DeLano WL (2002) The PyMOL molecular graphics system. Software available at: (http://www.pymol.org)
Dong Q, Wang X, Lin L, Guan Y (2007) Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins. BMC Bioinformatics 8:147. doi:10.1186/147*-****-*-***
Fariselli P, Pazos F, Valencia A, Casadio R (2002) Prediction of protein-protein interaction sites in heterocomplexes with neural networks. Eur J Biochem 269:1356-1361. doi:10.1046/j.1432-1033.2002.02767.x
Friedrich T, Pils B, Dandekar T, Schultz J, Muller T (2006) Modelling interaction sites in protein domains with interaction profile hidden Markov models. Bioinformatics 22:2851-2857. doi:10.1093/bioinformatics/btl486
Glaser F, Steinberg DM, Vakser IA, Ben-Tal N (2001) Residue frequencies and pairing preferences at protein-protein interfaces. Proteins 43:89-102. doi:10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-H
Hoskins J, Lovell S, Blundell TL (2006) An algorithm for predicting protein-protein interaction sites: abnormally exposed amino acid residues and secondary structure elements. Protein Sci 15:1017-1029. doi:10.1110/ps.051589106
Jones S, Thornton JM (1995) Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol 63:31-65. doi:10.1016/0079-6107(94)00008-W
Jones S, Thornton JM (1996) Principles of protein-protein interactions. Proc Natl Acad Sci USA 93:13-20. doi:10.1073/pnas.93.1.13
Jones S, Thornton JM (1997a) Analysis of protein-protein interaction sites using surface patches. J Mol Biol 272:121-132. doi:10.1006/jmbi.1997.1234
Jones S, Thornton JM (1997b) Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 272:133-143. doi:10.1006/jmbi.1997.1233
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637. doi:10.1002/bip.360221211
Kim H, Park H (2003) Protein secondary structure prediction based on an improved support vector machines approach. Protein Eng 16:553-560. doi:10.1093/protein/gzg072
Koike A, Takagi T (2004) Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 17:165-173. doi:10.1093/protein/gzh020
Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N (2005) ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic Acids Res 33:W299-W302. doi:10.1093/nar/gki370
Li JJ, Huang DS, Wang B, Chen P (2006) Identifying protein-protein interfacial residues in heterocomplexes using residue conservation scores. Int J Biol Macromol 38:241-247. doi:10.1016/j.ijbiomac.2006.02.024
Li MH, Lin L, Wang XL, Liu T (2007) Protein-protein interaction site prediction based on conditional random fields. Bioinformatics 23:597-604. doi:10.1093/bioinformatics/btl660
Liang S, Zhang C, Liu S, Zhou Y (2006) Protein binding site prediction using an empirical scoring function. Nucleic Acids Res 34:3698-3707. doi:10.1093/nar/gkl454
Lo Conte L, Chothia C, Janin J (1999) The atomic structure of protein-protein recognition sites. J Mol Biol 285:2177-2198. doi:10.1006/jmbi.1998.2439
Mintseris J, Weng Z (2005) Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 102:109**-*****. doi:10.1073/pnas.050*******
Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z (2005) Protein-protein docking benchmark 2.0: an update. Proteins 60:214-216. doi:10.1002/prot.20560
Neuvirth H, Raz R, Schreiber G (2004) ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 338:181-199. doi:10.1016/j.jmb.2004.02.040
Ofran Y, Rost B (2003) Predicted protein-protein interaction sites from local sequence information. FEBS Lett 544:236-239. doi:10.1016/S0014-5793(03)00456-3
Rost B, Sander C (1994) Conservation and prediction of solvent accessibility in protein families. Proteins 20:216-226. doi:10.1002/prot.340200303
Szilagyi A, Grimm V, Arakaki AK, Skolnick J (2005) Prediction of physical protein-protein interactions. Phys Biol 2:S1-S16. doi:10.1088/1478-3975/2/2/S01
Tahirov TH, Lu TH, Liaw YC, Chen YL, Lin JY (1995) Crystal structure of abrin-a at 2.14 Å. J Mol Biol 250:354-367. doi:10.1006/jmbi.1995.0382
Tseng YY, Liang J (2007) Predicting enzyme functional surfaces and locating key residues automatically from structures. Ann Biomed Eng 35:1037-1042. doi:10.1007/s10439-006-9241-2
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Wang Y, Xue Z, Shen G, Xu J (2008) PRINTR: prediction of RNA binding sites in proteins using SVM and profiles. Amino Acids 35:295-302. doi:10.1007/s00726-007-0634-9
Yuan Z, Zhao J, Wang ZX (2003) Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng 16:109-114. doi:10.1093/proeng/gzg014
Zhou HX, Shan Y (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 44:336-343. doi:10.1002/prot.1099