Resume

Training Plant

Location: Lincoln, NE

Posted: November 21, 2012


Resume:

Open Access

Genome Biology 2006, Volume 7, Issue 10, Article R96 Moriyama et al.

Method

Mining the Arabidopsis thaliana genome for highly-divergent seven transmembrane receptors

Etsuko N Moriyama*, Pooja K Strope*, Stephen O Opiyo, Zhongying Chen and Alan M Jones

Addresses: *School of Biological Sciences and Plant Science Initiative, University of Nebraska-Lincoln, Lincoln, NE 68588-0660, USA. Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583-0915, USA. Departments of Biology and Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.


Correspondence: Etsuko N Moriyama. Email: abpy2j@r.postjobfree.com

Received: 28 June 2006

Revised: 24 August 2006

Accepted: 25 October 2006

Published: 25 October 2006

Genome Biology 2006, 7:R96 (doi:10.1186/gb-2006-7-10-r96)

The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2006/7/10/R96


© 2006 Moriyama et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arabidopsis putative seven transmembrane proteins

A combination of multiple protein classification methods is described and used to identify a minimum set of 54 candidate seven transmembrane receptors in Arabidopsis thaliana.

Abstract

To identify divergent seven-transmembrane receptor (7TMR) candidates from the Arabidopsis thaliana genome, multiple protein classification methods were combined, including both alignment-based and alignment-free classifiers. This resolved problems in optimally training individual classifiers using limited and divergent samples, and increased stringency for candidate proteins. We identified 394 proteins as 7TMR candidates and highlighted 54 with corresponding expression patterns for further investigation.

Background

Seven-transmembrane (7TM)-region containing proteins constitute the largest receptor superfamily in vertebrates and other metazoans. These cell-surface receptors are activated by a diverse array of ligands and are involved in various signaling processes, such as cell proliferation, neurotransmission, metabolism, smell, taste, and vision. They are the central players in eukaryotic signal transduction. They are commonly referred to as G protein-coupled receptors (GPCRs) because most transduce extracellular signals into cellular physiological responses through the activation of heterotrimeric guanine nucleotide binding proteins (G proteins) [1]. However, an increasing number of alternative 'G protein-independent' signaling mechanisms have been associated with groups of these 7TM proteins [2-5]. Thus, for precision and clarity, we refer to these proteins simply as 7TM receptors (7TMRs), and candidate proteins in organisms greatly divergent from humans are designated here as 7TM putative receptors (7TMpRs).

The human genome encodes approximately 800 or more 7TMRs, both with and without known cognate ligands (the latter are so-called orphan GPCRs); they thus constitute >1% of the gene complement [6,7]. More than 1,000 genes, or 5% of the Caenorhabditis elegans genome, are predicted to encode 7TMRs; the majority of them appear to be chemoreceptors [8]. Approximately 300 7TMR-encoding genes (about 1% to 2% of the genome) have been recognized in the Drosophila melanogaster genome [6,7]. Compared to such large numbers of 7TMRs found in animal genomes, very few 7TMpRs have been reported in plants and fungi. Only 22 Arabidopsis 7TMpRs have been described so far. Fifteen of them constitute the 'mildew resistance locus O' (MLO) family, whose direct interaction with the G-protein α subunit (Gα) has not been shown [9,10]. Another 7TMpR, GCR1 [11], directly interacts with the plant Gα subunit GPA1 [12], but it has been shown that GCR1 can also act independently of the heterotrimeric G-protein complex [2]. Hsieh and Goodman [13] recently reported five expressed proteins predicted to have 7TM regions (heptahelical transmembrane proteins (HHPs) 1 to 5), but these, like the other 16, do not have candidate ligands. Finally, an unusual Regulator of G-protein Signaling (RGS) protein (designated AtRGS1) has been predicted to have 7TM regions [14]. RGS proteins function as GTPase activating proteins (GAPs) that desensitize signaling by deactivating the Gα subunits of the heterotrimeric complex. Because Arabidopsis seedlings lacking AtRGS1 have reduced sensitivity to D-glucose [2,14,15], the possibility exists that AtRGS1 is a novel D-glucose receptor with an agonist-regulated GAP function. Although we designate them 7TMpRs here, it should be noted that neither a ligand nor a full signaling cascade has yet been demonstrated for any of these plant proteins, and the 7TM topology has been experimentally confirmed only for a barley MLO protein [9].

None of the reported Arabidopsis 7TMpR proteins share substantial sequence similarity with known metazoan GPCRs, which constitute six different subfamilies. It appears that plant 7TMpRs diverged dramatically from known metazoan GPCRs over the 1.6 billion years since the plant and metazoan lineages bifurcated. It should be noted that Arabidopsis GCR1 shares weak but significant similarity with the cyclic AMP receptor, CAR1, found in the slime mold [2,11,16]. There is also very weak similarity to the Class B Secretin family GPCRs. However, other than GCR1, currently used search methods have not robustly identified plant 7TMpR proteins as candidate GPCRs. This great sequence divergence highlights the need for new approaches to identify divergent 7TMR candidates in non-metazoan genomes.

The human genome contains 16 Gα, 5 Gβ, and 12 Gγ genes. In stark contrast, both fungi and plants have much simpler G-protein coupled signaling systems. For example, the Arabidopsis genome contains one canonical Gα, one Gβ, and two Gγ genes [17]. Similarly, a small number of G-proteins are found in fungi: there are two Gα, one Gβ, and one Gγ in Saccharomyces cerevisiae [18-20], while Neurospora crassa and some other fungi have more genes encoding these subunits [21-23]. Therefore, it may be reasonable to assume that plants and fungi have fewer GPCRs than humans. Approximately 200 Arabidopsis proteins were predicted to have 7TM regions, but sequence divergence precludes unequivocal assignment of any of them as an orphan GPCR [24,25]. However, at least 61 7TMpRs have recently been predicted from the genome of the plant pathogenic fungus Magnaporthe grisea [26]. This raises the possibility that more divergent groups of 7TMpR proteins remain undiscovered in non-metazoan taxa.

In this report, we describe our comprehensive computational strategy for identifying 7TMpR candidates from the entire protein sequence set predicted from the A. thaliana genome. We also compile their tissue-specific expression and co-expression patterns with G-proteins. To take advantage of different approaches, we combined multiple protein classification methods. These included more specific (conservative) alignment-based classifiers and more sensitive alignment-free classifiers, and the combination was used to predict candidate 7TMpRs in divergent genomes more effectively.

Results and discussion

Identifying 7TMpR candidates using various protein classification methods

Among the many protein classification methods in common use, the current state of the art, and the most widely used, is the profile hidden Markov model (profile HMM) [27]. It is used to construct protein family databases such as Pfam [28,29], SMART [30,31], and Superfamily [32]. However, profile HMMs and other currently used classification methods such as PROSITE [33,34] and PRINTS [35,36] share an important weakness: they rely on multiple alignments to generate their models (patterns, profile HMMs, and so on). Generating robust multiple alignments is difficult or impossible when extremely diverged sequences are included in the analysis. 7TMRs are one such protein family, with sequence similarities between subgroups that can be lower than 25%. Furthermore, alignments are generated only from known related proteins (positive samples). Therefore, no information from negative samples (unrelated protein sequences) is directly incorporated in the model building process. Identifiable 'hits' are thus constrained by the initial sampling bias, which becomes reinforced when models are iteratively rebuilt from accumulated sequences. Consequently, the predictive power of these classifiers, especially their sensitivity, decreases when they are applied to extremely diverged protein families.

To overcome this disadvantage and to increase sensitivity to such non-alignable similarities, several 'alignment-free' methods have been proposed recently. These methods quantify various properties of amino acid sequences and convert them into a descriptor array. Once multiple sequences with different lengths are transformed into a uniform matrix, various multivariate analysis methods can be applied. Kim et al. [37] and Moriyama and Kim [38] used parametric and non-parametric discriminant function analysis methods. Karchin et al. [39] combined profile HMMs with support vector machines (SVMs) using the Fisher kernel (SVM-Fisher), so that negative sample information can be taken into account when training the classifier. SVMs can also be applied with completely 'alignment-free' sequence descriptors, for example, amino acid and dipeptide compositions. Such alignment-free classifiers have been shown to outperform profile HMMs as well as Karchin et al.'s SVM-Fisher [40,41] (PK Strope and EN Moriyama, submitted). Another multivariate method, partial least squares (PLS) regression, was used by Lapinsh et al. [42] with physico-chemical properties of amino acids. We recently re-evaluated the descriptors used with PLS and optimized them to discriminate 7TMRs from other proteins [43].

We applied these methods to the entire predicted protein sequence set derived from the A. thaliana genome. As
shown in Table 1, among the 28,952 protein sequences, the Sequence Alignment and Modeling system (SAM), a profile HMM method, predicted only 16 (excluding one alternatively spliced gene sequence) as 7TMpR candidates. In The Arabidopsis Information Resource (TAIR) [44,45], fifteen of them are identified as MLO or similar to MLO, and one as GCR1. This clearly shows that SAM is highly specific (discriminating), with no false positives, assuming that current annotations are correct. SAM failed to identify only one known MLO (MLO4: At1g11000). This protein, AtRGS1, and the five recently predicted 7TM proteins (HHP1-5) were among the 16 previously predicted Arabidopsis 7TMpRs not included in the randomly sampled 500 7TMR training sequences (see Materials and methods). Thus, we concluded that the predictive power of SAM alone is insufficient to identify highly diverged and potentially novel 7TMpR sequences.

The results obtained by SAM were compared with those obtained by alignment-free methods. These methods were LDA, QDA, LOG, KNN, SVM with amino acid composition (SVM-AA), SVM with dipeptide composition (SVM-di), and PLS with amino acid properties (PLS-ACC). As shown in Table 1, they predicted 2,000 to 3,400 proteins as 7TMpR candidates. This is about 10% of the entire predicted Arabidopsis proteome and about 30% to 50% of all possible transmembrane proteins (6,475 proteins) [24,25]. These alignment-free methods clearly call many false positives and need further optimization to improve their discrimination power.

One advantage of alignment-free methods worth noting is their sensitivity to short or partial sequences [37,38]. Many of the 28,952 protein sequences used in this study are based only on ab initio gene prediction results and hence are likely to contain various types of errors. If only part of a 7TMR protein is predicted correctly, alignment-free methods could have a better chance of identifying it.

Table 1 lists the numbers of Arabidopsis proteins predicted to have five to ten transmembrane regions, binned by the number of transmembrane regions. HMMTOP 2.0 [46,47] predicted 201 proteins as having 7TM regions. This number is close to a previous prediction (184 proteins) [24,25]. We should note, however, that no single method predicts 7TM regions exactly for all known 7TMRs (see Materials and methods). As mentioned above, it is also possible that some deduced Arabidopsis proteins we analyzed do not contain the entire correct coding region. There were 952 Arabidopsis proteins predicted to have five to nine TM regions. Based on the distribution of predicted TM numbers obtained from all GPCRDB entries, this range (5 to 9 TM regions) could cover almost all of the 7TMR candidates (99.1%; see Figure 1 and Materials and methods). The 22 previously predicted Arabidopsis 7TMpRs were predicted to have seven to ten TM regions (Figure 1). If we extend the range to 5 to 10 TM regions, the number of Arabidopsis 7TMpR candidates becomes 1,179 proteins.

Table 1

Numbers of 7TMpR candidates identified by various methods from the A. thaliana genome

Method            Number of 7TMpR candidates*
HMMTOP
  7 TMs           236 (201)
  6-8 TMs         633 (545)
  5-9 TMs         1,091 (957)
  5-10 TMs        1,343 (1,179)
SAM               16 (15)
LDA               3,211 (2,935)
QDA               2,006 (1,820)
LOG               2,626 (2,394)
KNN (K = 5)       3,125 (2,839)
KNN (K = 10)      3,202 (2,906)
KNN (K = 15)      3,298 (3,004)
KNN (K = 20)      3,347 (3,043)
SVM-AA            2,263 (2,043)
SVM-di            2,004 (1,807)
PLS-ACC           2,671 (2,466)

*The numbers in parentheses show 7TMpR candidates after removing proteins derived from alternative splicing. The numbers of TM regions were predicted by HMMTOP.

Choosing 7TMpR candidates by combining prediction results

Among the ten alignment-free classifiers, LOG misclassified seven previously predicted Arabidopsis 7TMpRs. KNN with K set at 5, 10, and 15 missed one, while KNN with K set at 20 classified them all correctly (see Materials and methods on KNN). We then examined combinations of the prediction results from the remaining six alignment-free methods (LDA, QDA, KNN with K = 20, SVM-AA, SVM-di, and PLS-ACC). The aims were to reduce false positives (non-7TMRs predicted as 7TMRs) and false negatives (7TMRs predicted as non-7TMRs), and to obtain a set of 7TMpR candidates with higher confidence. There were 652 proteins predicted as 7TMpR candidates by all six methods (the strict intersection). Restricting the number of predicted TM regions to 5 to 10 left 394 proteins as 7TMR candidates (342 after removing duplicated entries due to alternative splicing). These Arabidopsis proteins are listed in Additional data file 1. Of the 22 previously predicted 7TMpRs, 20 were found in this list. HHP4 and HHP5 were not included in this list, although both were identified by two of the alignment-free methods: KNN and SVM-AA. Note that RGS1 and the five HHPs (as well as nine MLO proteins and GCR1) were excluded from the training set, and these six were not identified as candidate 7TMpRs by SAM.
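The combination step described above can be sketched in Python as follows. It takes the strict intersection of the six retained classifiers, applies a 5 to 10 TM window, and then counts splice variants once. This is a minimal illustration under stated assumptions, not the authors' code. The `Prediction` record, the classifier keys, and the "geneA.1"-style IDs are hypothetical. The sketch also assumes that splice variants share a locus ID and differ only in a ".N" suffix, and that per-classifier calls and HMMTOP TM counts were computed beforehand.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# The six classifiers kept after LOG and KNN (K = 5, 10, 15) were dropped.
RETAINED = ("LDA", "QDA", "KNN20", "SVM-AA", "SVM-di", "PLS-ACC")


@dataclass
class Prediction:
    protein_id: str         # hypothetical "locus.isoform" ID, e.g. "geneA.1"
    votes: Dict[str, bool]  # classifier name -> True if called a 7TMR
    tm_count: int           # TM regions predicted by HMMTOP (precomputed)


def select_candidates(preds: Iterable[Prediction], classifiers=RETAINED,
                      tm_min: int = 5, tm_max: int = 10) -> List[Prediction]:
    """Strict intersection of classifier calls, then a TM-count window."""
    return [
        p for p in preds
        if all(p.votes.get(name, False) for name in classifiers)
        and tm_min <= p.tm_count <= tm_max
    ]


def collapse_isoforms(protein_ids: Iterable[str]) -> List[str]:
    """Count each locus once, assuming variants differ only after the first '.'."""
    return sorted({pid.split(".")[0] for pid in protein_ids})


all_yes = dict.fromkeys(RETAINED, True)
preds = [
    Prediction("geneA.1", all_yes, 7),
    Prediction("geneA.2", all_yes, 8),                    # splice variant of geneA
    Prediction("geneB.1", {**all_yes, "QDA": False}, 7),  # one classifier disagrees
    Prediction("geneC.1", all_yes, 12),                   # outside the TM window
]
chosen = select_candidates(preds)
print([p.protein_id for p in chosen])                   # ['geneA.1', 'geneA.2']
print(collapse_isoforms(p.protein_id for p in chosen))  # ['geneA']
```

A missing vote is treated as "no", which keeps the intersection strict. Replacing `all` with a minimum vote count would trade specificity for sensitivity.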


Figure 1

Distribution of transmembrane numbers predicted by HMMTOP (black bars) and TMHMM (gray bars) from the 500 7TMR sample sequences. The proportions of proteins predicted by HMMTOP to have six to eight and five to nine TM regions are shown at the top (97.6% and 99.8%, respectively). The percentages in parentheses (97.1% and 99.1%) were obtained from the entire 7,674 7TMR dataset in GPCRDB. The numbers on top of the black bars are the numbers of the 22 previously predicted Arabidopsis 7TMpR proteins. [Plot: x axis, number of TMs (0 to 10); y axis, counts (0 to 400).]

A further restriction to a topology of exactly seven TM regions with an extracellular amino terminus reduced the candidate number to 64 (54 excluding duplications due to alternative splicing). This set included nine of the 22 previously predicted 7TMpRs. These 54 7TMpR candidates are the first targets for our further analysis and are summarized in Table 2 (also listed in Additional data file 2). Eighteen are described simply as 'expressed proteins' in the TAIR database (except for AT3G26090, which encodes RGS1). Interestingly, one of them (AT5G27210) is known to have weak similarity to a mouse orphan 7TMR. Others are known to belong to certain protein families (for example, the MtN3 family), but in many cases their molecular functions have not been identified. Further investigation of these 7TMpR candidates is warranted.

The 54 proteins were grouped into families based on similarities to known protein sequences. Eight of the 54 7TMpR candidates, including GCR1 and RGS1, are encoded by single-copy genes. Besides the seven MLO proteins identified, the candidates include eight MtN3 family members and two proteins of an unnamed family consisting of six expressed proteins. They also include multiple (two to three) members from smaller gene families (five or fewer members). The list contains all members of the TOM3 family and the Perl1-like family. It also contains the majority of the GNS1/SUR4 family and of an unnamed family consisting of five expressed proteins (expressed protein family 2). The identification of multiple members of these gene families by our alignment-free methods supports the consistency of this approach. However, for most of these families, not all members were found. The list also contains eight single representatives of small protein families (two to five members) and four single representatives of large protein families. Some of these proteins, especially those from large protein families, may be false positives as 7TMpR candidates. This 7TMR mining method can be refined, for example, by re-training models and by using more flexible hierarchical classification.

The five predicted heptahelical proteins (HHP1-5) reported by Hsieh and Goodman [13] were identified by sequence similarity to human adiponectin receptors (AdipoRs) and membrane progestin receptors (mPRs). These receptors share little sequence similarity with known GPCRs. HHP1-3 were identified in our initial list of 394 but were culled from the final list of 54 Arabidopsis 7TMpR candidates. This is because HMMTOP predicted HHP1, HHP2, HHP4, and HHP5 to have seven TM regions and intracellular amino termini, in contrast to known GPCRs. This unusual structural topology was also found in AdipoRs [13,48]. HHP3 had eight predicted TM regions. Of the 15 MLO proteins, 8 were also predicted by HMMTOP to have 8 to 10 TM regions (Figure 1). Recently, Benton et al. [49] experimentally showed that Drosophila odorant receptors, another extremely diverged 7TMR family, have intracellular amino termini. In our list of 394 candidates, 23 proteins were predicted to have seven TM regions and intracellular amino termini (Additional data file 1). Therefore, we consider these 54 as a minimum working set of 7TMpR candidates. Many of the other proteins in the list of 394 should be examined in the second stage.

Expression patterns of genes encoding the 7TMpR candidates and G-protein subunits

We used the Meta-Analyzer server of the Genevestigator web site to study the spatial expression patterns of Arabidopsis genes encoding the 7TMpR candidates and G-protein subunits. MLO genes were not included in this analysis because we reported their expression recently [50]. As shown in Figure 2, the expression patterns of the analyzed 7TMpR candidates fall into two major groups. About half of them show distinct tissue specificity, whereas the other half exhibit either less distinct expression patterns or ubiquitous expression. All genes encoding G-protein subunits fall into the latter group. Because these genes are expressed ubiquitously, their expression overlaps with genes in both groups. In principle, this makes co-functioning of G-proteins with these 7TMpR candidates spatially and temporally possible. All eight genes encoding the MtN3 family proteins appear to have distinct tissue-specific expression. Among them, At3g48740 and At4g25010 have the highest sequence similarities to At5g23660 and At5g50800, respectively. Both pairs of genes share similar or overlapping expression patterns, suggesting relatedness or
similarity of their functions. Confirming the actual functions of the 7TMpR candidates as GPCRs requires further extensive testing. A possible involvement of these candidate proteins in 'G protein-independent' signaling mechanisms also needs to be explored.

Table 2

Summary of the 54 7TMpR candidates identified in this study

Groups*: TAIR locus IDs

Multiple members from gene families
  Nodulin MtN3 family proteins (8/17): At1g21460, At3g16690, At3g28007, At3g48740, At4g25010, At5g13170, At5g23660, At5g50800
  MLO proteins (7/15): At1g11000 (MLO4), At1g26700 (MLO14), At1g42560 (MLO9), At2g33670 (MLO5), At2g44110 (MLO15), At4g24250 (MLO13), At5g53760 (MLO11)
  Expressed protein family 1 (2/6): At1g77220, At4g21570
  GNS1/SUR4 membrane family proteins (3/4): At1g75000, At3g06470, At4g36830
  Perl1-like family protein (2/2): At1g16560, At5g62130
  TOM3 family proteins (3/3): At1g14530, At2g02180, At4g21790
  Expressed protein family 2 (3/5): At1g10660, At2g47115, At5g62960
  Expressed protein family 3 (2/4): At3g09570, At5g42090
  Expressed protein family 4 (2/5): At1g49470, At5g19870
  Expressed protein family 5 (2/5): At3g63310, At4g02690
Single copy genes (8): At1g48270 (GCR1), At1g57680, At2g41610, At2g31440, At3g04970, At3g26090 (RGS1), At3g59090, At4g20310
Single member from small gene families (8): At2g01070, At3g19260, At2g35710, At2g16970, At1g15620, At1g63110, At4g36850, At5g27210
Single member from big gene families (4): At1g71960, At3g01550, At5g23990, At5g37310

*The number of candidates identified in this study belonging to each group is shown in parentheses; for gene families, the number of all proteins in the group is given after the slash. More detailed information is given in Additional data file 2.

Conclusion

We show that the profile HMM protein classification method, currently one of the most widely used, is overly specific (conservative) when applied to extremely diverged 7TMpR proteins. Our premise is that more 7TMpRs remain to be identified in the A. thaliana genome and in other genomes divergent from humans. There were two limitations. First, the lack of available samples limits the effectiveness of profile HMM methods. Second, alignment-free methods are more sensitive but have high false-positive rates. The candidate 7TMpR proteins provided in this study, for example, could be added to expand the training set. Re-iterating with such refined training sets could then reduce false-positive rates. However, this is possible only after these new candidates are confirmed experimentally as true positives.

The strategy we described here overcomes the 'chicken-or-egg' problem. Predictions by multiple protein classification methods, together with the number of predicted transmembrane regions, were used to identify a reduced set of more likely 7TMR candidates. By setting up various methods as hierarchical multiple filters, one can prioritize target protein sets for further experimental confirmation of their functions.

Materials and methods

Arabidopsis protein data

We downloaded 28,952 protein sequences from TIGR (Arabidopsis thaliana database release 5, dated 10 June 2004) [51]. Among the 28,952 proteins, 2,760 are derived from alternative splicing.

Training data preparation for protein classification

Positive training samples (known 7TMR sequences) were obtained from GPCRDB (Information System for G Protein-Coupled Receptors, Release 9.0, last updated on 28 June 2005) [6,7]. In GPCRDB, 2,030 7TMRs, originally collected from the Swiss-Prot protein database, were grouped into six major classes and six putative families. The major classes are classes A to E plus the Frizzled/Smoothened family. The putative families are ocular albinism proteins, insect odorant receptors, plant MLO receptors, nematode chemoreceptors, vomeronasal receptors, and taste receptors. Five hundred 7TMR sequences were randomly sampled and used as the positive samples. Note that 'putative/unclassified' (orphan) 7TMRs and bacteriorhodopsins were not included in this dataset. These 500 7TMRs included six of the 15 known Arabidopsis MLO proteins.

Of the 22 currently known Arabidopsis 7TMpRs, several were not included in the random sample of 500 7TMRs. These were the nine remaining MLO proteins, GCR1, and six recently identified Arabidopsis 7TMpRs (AtRGS1 and HHP1-5; GPCRDB does not list these proteins). The 16 Arabidopsis 7TMpRs not included in the training set can therefore be used as test cases to assess classifier performance.
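A minimal Python sketch of this preparation step follows. It draws a fixed-size random positive sample and derives the held-out test set (in this study, the unsampled MLO proteins, GCR1, AtRGS1, and HHP1-5). It then converts each sequence into the fixed-length amino acid (20 values) and dipeptide (400 values) composition descriptors on which SVM-AA and SVM-di are based. Several details are illustrative assumptions, not taken from the authors' pipeline: the function names, the placeholder IDs and sequences, the seed, and the choice to normalize counts to relative frequencies.

```python
import random
from itertools import product

# Standard 20 amino acids; descriptor order follows this string (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def sample_positives(pool, n, seed=0):
    """Draw n IDs without replacement; sorting first makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(sorted(pool), n)


def aa_composition(seq):
    """20 relative frequencies; non-standard residues (X, B, Z, ...) are skipped."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400 relative frequencies of overlapping residue pairs (i, i + 1)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical placeholders standing in for GPCRDB entries and known plant proteins.
pool = {"pos1", "pos2", "pos3", "pos4", "pos5"}
known_plant = {"pos2", "pos5", "plantX"}
train_ids = sample_positives(pool, 3, seed=1)
test_ids = known_plant - set(train_ids)  # never seen during training

# Every sequence, whatever its length, becomes one row of equal width.
features = [aa_composition(s) + dipeptide_composition(s) for s in ("MKTAYIAK", "GLLV")]
print(len(features), len(features[0]))  # 2 420
```

Because every row has the same width regardless of sequence length, the resulting matrix can be passed directly to any multivariate classifier. This is the property the alignment-free methods rely on.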

For negative samples, 500 non-7TMR sequences longer than 100 amino acids were randomly sampled from the Swiss-Prot


[Figure 2: tissue expression heat map for genes encoding 7TMpR candidates (row labels include At1g14530 and At1g10660; tissue column labels not recoverable from the extraction); color scale from 0% to ≥95% in 5% steps.]
