Guidelines to HYDROPHOBIC CLUSTER ANALYSIS (HCA)
The HCA method is based on the use of a bidimensional plot, called the HCA plot, the principles of which are illustrated below (Figure 1).
The bidimensional plot originates from drawing the 1D sequence on an alpha helix (3.6 residues per turn; connectivity distance of 4, i.e. the number of residues separating two different clusters), a representation which has been shown to offer the best correspondence between clusters and regular secondary structures.
Examination of the HCA plot of a protein sequence makes it easy to distinguish globular regions from non-globular ones and, within globular regions, to identify secondary structures. This 2D signature, which is much more conserved than the 1D sequence and which can be enriched by comparing families of highly divergent sequences, allows relevant similarities to be successfully detected at low levels of sequence identity.
For more details about the methodology and applications, see our publications.
Figure 1 (adapted from Figure 1 of Ref. 1)
Illustration of the principles of the HCA diagram
The protein linear sequence (1D) (here the human alpha1 antitrypsin) is shown at the top of the figure with hydrophobic amino acids coloured. This sequence is written on an alpha helix displayed along a cylinder. The cylinder is then cut parallel to its axis and unrolled into a bidimensional diagram (2D). This diagram is compacted and duplicated in order to restore the full environment of each amino acid. Hydrophobic amino acids are not distributed randomly but form clusters. The positions of these clusters have been shown to correspond to the positions of regular secondary structures (alpha helices and beta strands). This is illustrated by the corresponding experimental structure (3D). The form of the clusters is generally indicative of the type of secondary structure (vertical clusters are often associated with beta strands, whereas horizontal ones often correspond to alpha helices).
Special symbols are used for some amino acids: a star for proline, a square and a dotted square for threonine and serine, respectively, and a diamond for glycine.
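The construction described in this caption (sequence written on a helix, cylinder cut and unrolled, diagram duplicated) can be sketched as a coordinate computation. This is only an illustrative sketch: the function name `helical_net`, the axis orientation, the arbitrary axial unit of one step per residue and the textual symbol labels are assumptions, not the layout used by any actual HCA drawing program. The 100 degree rotation per residue follows from 3.6 residues per turn (360 / 3.6).

```python
# Rotation per residue on an alpha helix with 3.6 residues per turn.
DEGREES_PER_RESIDUE = 100

# Special symbols named in the caption; the labels are plain strings here.
SYMBOLS = {"P": "star", "T": "square", "S": "dotted square", "G": "diamond"}


def helical_net(seq, copies=2):
    """Place each residue on the unrolled cylinder.

    Returns a list of (index, label, x, y) tuples where
      x is the position around the circumference, in turns (0 <= x < 1 for
        the first copy; each further copy is shifted by one full turn, which
        mimics the duplication of the diagram),
      y is the position along the helix axis, in arbitrary units
        (one step per residue, an assumption of this sketch).
    """
    points = []
    for i, aa in enumerate(seq.upper()):
        # Integer arithmetic keeps the angle exact before the final division.
        x = ((i * DEGREES_PER_RESIDUE) % 360) / 360
        label = SYMBOLS.get(aa, aa)
        for copy in range(copies):
            points.append((i, label, x + copy, i))
    return points


if __name__ == "__main__":
    for point in helical_net("PGTSA"):
        print(point)
```

With this mapping, residues i and i + 18 share the same x position (five full turns apart), and the duplicated copy lets residues near the cut edge be seen next to their neighbours on the other side of the cut.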
A detailed list of the percentages of alpha, beta and coil structures associated with each cluster (as deduced from experimental structures) is in preparation. Conversely, sequence stretches between clusters mainly correspond to loops. The 2D structure of a protein sequence can therefore be easily deduced from examination of the HCA plot.