Project IX: Protein classification via motif counting and analysis.
Proteins are
typically classified into families according to functional and sequence
similarities. Sequence similarity across the members of each
family is far from uniform. While some regions
are clearly conserved, others display little sequence similarity. Often the
conserved regions are crucial to the protein’s function, as in the catalytic
sites of enzymes. Such conserved regions can be used to probe an uncharacterized
sequence for indications of its function.
The description of a
protein family by its conserved regions focuses on the family’s characteristic
and distinctive sequence features. Databases
of conserved features of protein families can be utilized to classify sequences
from proteins, cDNAs and genomic DNA.
This project will
concentrate on classification methods derived from marginal and sequential motif
counting. The statistical tests
used will rely on approximate normal theory and on simulation-based methods,
such as permutation tests.
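
As a rough illustration of motif counting combined with a permutation test, the sketch below counts occurrences of a motif in each sequence and asks whether the mean count differs between two families. It rests on several assumptions that are not part of the project description: the motif is an exact string, overlapping occurrences are counted, the test statistic is the difference in mean counts, and the sequences and the names count_motif and permutation_test are hypothetical.

```python
import random


def count_motif(sequence, motif):
    """Count occurrences of an exact motif string, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def permutation_test(seqs_a, seqs_b, motif, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in mean motif count.

    Family labels are shuffled across the pooled counts; the p-value is the
    share of shuffles whose absolute mean difference is at least as large
    as the observed one (with an add-one correction).
    """
    rng = random.Random(seed)
    counts = [count_motif(s, motif) for s in seqs_a + seqs_b]
    n_a = len(seqs_a)
    n_b = len(seqs_b)

    def mean_diff(values):
        return sum(values[:n_a]) / n_a - sum(values[n_a:]) / n_b

    observed = mean_diff(counts)
    extreme = 0
    for _ in range(n_perm):
        shuffled = counts[:]
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical toy sequences, for illustration only.
family_a = ["MKGSGAGSGT", "AGSGLLGSGK", "GSGGSGPPAA"]
family_b = ["MKLLPPTTAA", "AAPLKTTVVV", "GSGPLKAAAT"]

observed, p_value = permutation_test(family_a, family_b, "GSG")
print("observed difference in mean count:", observed)
print("permutation p-value:", p_value)
```

With samples this small the permutation distribution has few distinct values, so the p-value cannot become very small; a normal approximation to the same statistic would only be reasonable with many more sequences per family.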