Protein Motif and Protein Domain Prediction

Written by Super User. Posted in Protein

Information about protein motif and protein domain prediction.

Also See Our Link Database of Protein Motif Bioinformatic Tools

Protein motifs and domains are important because they provide clues to the possible functions of a protein and to its possible interactions. For example, if a protein has a DNA-binding domain, you would predict that it may act by binding to DNA sequences.

A note on protein motif and protein domain database searches: most proteins are modular, with several domains. If your unknown protein is most similar to a protein kinase, this does not necessarily mean that it is a kinase. The two proteins may instead share other domains (e.g., several SH3 domains) rather than the kinase domain itself.


Protein Motif Databases:

CDD The Conserved Domain Database. Proteins often contain several modules or domains, each with a distinct evolutionary origin and function. NCBI's Conserved Domain Database is a collection of multiple sequence alignments for ancient domains and full-length proteins. The CD-Search service may be used to identify the conserved domains present in a protein query sequence.

CDD Keyword Search Search the Conserved Domain Database by keyword through NCBI Entrez.

CDART: Conserved Domain Architecture Retrieval Tool

InterPro InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.

PROSite PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
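PROSITE patterns are written in a regular-expression-like notation, so a minimal sketch of pattern scanning can translate a pattern into a Python regular expression and search with it. The function names are hypothetical, and the translator assumes only the common elements of the syntax: x (any residue), [...] (allowed residues), {...} (forbidden residues), (n) or (n,m) repeat counts, and the < and > terminal anchors. It is not ScanProsite, which also scans profiles and handles rarer syntax such as > inside brackets.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex.

    Assumption: only x, [..], {..}, (n)/(n,m) repeats and the '<'/'>' terminal
    anchors are supported; '>' inside brackets is not handled.
    """
    pattern = pattern.strip().rstrip(".")  # the trailing period ends a pattern
    parts = []
    for element in pattern.split("-"):
        # Optional '<', the residue spec, an optional (n) or (n,m), optional '>'.
        m = re.fullmatch(r"(<?)(.+?)(?:\((\d+)(?:,(\d+))?\))?(>?)", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"      # any residue except those listed
        else:
            regex = core                         # single residue or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(parts)

def scan(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, overlaps included."""
    # Wrapping the regex in a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    # PS00001, the N-glycosylation site pattern.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
    print(scan("MKNGSAVNPTLNSTQ", "N-{P}-[ST]-{P}."))  # [(3, 'NGSA'), (12, 'NSTQ')]
```

The example sequence is made up; in it the N at position 8 is not reported because it is followed by P, which the {P} element forbids.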

Pfam Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Pfam version 7.7b (October 2002) contains alignments and models for 4832 protein families, based on the Swiss-Prot 40 and SP-TrEMBL sequence databases.

ProDom Protein Domain Database

PairsDB The Automatic Domain Decomposition Algorithm (ADDA) is used to generate a database of protein domain families with complete coverage of all protein sequences. Sequences are split into domains, and domains are grouped into protein domain families, in a completely automated process. The current database contains domains for more than 1.5 million sequences in more than 40 000 domain families. In particular, there are 3828 novel domain families that do not overlap with the curated domain databases Pfam, SCOP and InterPro. Data is freely available for downloading and querying via a web interface. See also:
PairsDB Domain Families Search
PairsDB Domain Families Browse

PRINTS PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs, full diagnostic potency deriving from the mutual context provided by motif neighbours.
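To make the idea of an ordered group of motifs concrete, here is a deliberately simplified sketch that reports which motifs of a fingerprint occur, in order, along a sequence. The motifs, the sequence and the function name are hypothetical, and writing motifs as regular expressions is an assumption made for brevity: PRINTS motifs are derived from aligned sequence segments rather than written as regular expressions.

```python
import re
from typing import Optional

def fingerprint_hits(sequence: str, motifs: list[str]) -> list[Optional[tuple[int, int]]]:
    """For each motif, in fingerprint order, return its 1-based (start, end) span or None.

    Each motif is searched only after the end of the previous hit, reflecting the
    idea that fingerprint motifs are separated along the sequence in a fixed order.
    """
    hits: list[Optional[tuple[int, int]]] = []
    pos = 0
    for motif in motifs:
        m = re.compile(motif).search(sequence, pos)
        if m is None:
            hits.append(None)  # motif missing: only a partial fingerprint match
            continue
        hits.append((m.start() + 1, m.end()))  # 0-based exclusive end == 1-based inclusive end
        pos = m.end()
    return hits

if __name__ == "__main__":
    seq = "MKGAGSSGLLAHWEHPPCRSC"          # hypothetical sequence
    fingerprint = ["G.G..G", "H..H", "C..C"]  # hypothetical three-motif fingerprint
    print(fingerprint_hits(seq, fingerprint))  # [(3, 8), (12, 15), (18, 21)]
    print(fingerprint_hits(seq, ["C..C", "H..H"]))  # [(18, 21), None]
```

The second call shows why order matters: both motifs occur in the sequence, but not in the order the fingerprint requires, so the later motif is reported as missing.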

HITS Motif Scan HITS is a free database devoted to protein domains. It is also a collection of tools for investigating the relationships between protein sequences and the motifs described on them. These motifs are defined by a heterogeneous collection of predictors, which currently includes regular expressions, generalized profiles and hidden Markov models.

SMART Simple Modular Architecture Research Tool. Goal: to allow automatic identification and annotation of domains in user-supplied protein sequences.

PROClass The ProClass database is a non-redundant protein database organized according to family relationships as defined collectively by ProSite patterns and PIR superfamilies. The ProClass database can facilitate protein family information retrieval, unveil domain and family relationships, and classify multi-domained proteins, by combining global and motif similarities into a single family organization scheme.

ExPASY SWISS PROT Searchable index Molecular Biology Server of the University of Geneva: contains a searchable index of SWISS-PROT.

ExPASY PROSITE The PROSITE database (described above) as served from ExPASy.

SYSTERS Protein Family Database by Motifs

TIGRFAM A database of protein families based on hidden Markov models (HMMs).

eMotif Search The eMOTIFS are derived from the multiple sequence alignments in the BLOCKS+ database.
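BLOCKS entries are ungapped multiple alignments, so a motif can be read off column by column. The sketch below is a greatly simplified illustration in that spirit, not the eMOTIF algorithm: the residue groups, the example block and the function names are hypothetical. Each column becomes a conserved residue, a bracketed group that covers every residue seen in that column, or the wildcard x, written in PROSITE-style syntax. The published eMOTIF method is considerably more involved, weighing many alternative groupings by how specific the resulting motif is.

```python
# Hypothetical substitution groups, chosen for illustration only.
GROUPS = ["ILMV", "FWY", "KR", "DE", "ST", "NQ", "AG"]

def column_element(column: str) -> str:
    """Summarize one alignment column as a PROSITE-style pattern element."""
    residues = set(column)
    if len(residues) == 1:
        return column[0]                  # fully conserved position
    for group in GROUPS:
        if residues <= set(group):
            return "[" + group + "]"      # every residue falls in one group
    return "x"                            # no group covers the column: wildcard

def derive_pattern(block: list[str]) -> str:
    """Derive a pattern from an ungapped block of equal-length aligned segments."""
    if len({len(row) for row in block}) != 1:
        raise ValueError("block rows must have equal length (ungapped block)")
    return "-".join(column_element("".join(col)) for col in zip(*block))

if __name__ == "__main__":
    block = ["GKLDEA", "GRIDEP", "GKVEEW"]  # hypothetical aligned segments
    print(derive_pattern(block))            # G-[KR]-[ILMV]-[DE]-E-x
```

Larger groups make a motif match more sequences (more sensitive but less specific), which is the trade-off a real motif-derivation method has to balance.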

MOTIF: Can search multiple databases including: PROSITE Pattern, PROSITE Profile, BLOCKS, ProDom, PRINTS, Pfam, Pfam_fs for fragments, and a user-defined profile library.

HMM Search - Motif Search of a Protein Sequence using the Pfam-A Database of Protein Domains - Sanger Centre (UK).


HMM Search - Motif Search of a Protein Sequence using the Pfam-A Database of Protein Domains - WUSTL (US).

MAST - Motif Alignment and Search Tool - Pasteur Inst (France).

MEME - Multiple EM for Motif Elicitation - Pasteur Institute (France).

MOTIF - Pattern Search Service - Iowa State U (US).

PatternFind - Sequence Pattern Search - Pattern search of several databases, including Swiss-Prot, TrEMBL, Swiss-Prot splice variants, trEST, trGEN, trome, current ENSEMBL peptides for all species, microbial complete proteomes, RefSeq Release and RefSeq weekly updates.

PATTINPROT - Query Sequence Pattern Search - Pole Bio-Informatique Lyonnaise (France).

Pratt2.1 - Pattern Discovery Tool: Pratt - U Helsinki (Finland), Pratt - Pasteur Inst (France), Pratt - Inst Informatik-U Bergensis (Norway), Pratt - EBI (UK).


PROSITE - Profile Motifs Searches of ProSite Database.

PROSITE Search - Genestream - IGH Montpellier (France).


ScanProsite - Scan a Protein Sequence against the PROSITE DB - ExPASy (Switzerland).
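MAST, listed above, searches sequences with motifs represented as position-specific scoring matrices (for example, motifs found by MEME). The sketch below shows only the core idea, under simplifying assumptions: a base-2 log-odds matrix is built from a few hypothetical aligned sites with a pseudocount of 1 and a uniform background, then slid along a query sequence to find the best-scoring window. The function names are hypothetical, and MAST itself goes further by converting match scores into p-values and combining them across motifs, none of which is reproduced here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(sites: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build a base-2 log-odds matrix from equal-length, ungapped aligned sites."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for column in zip(*sites):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_hit(sequence: str, pssm: list[dict[str, float]]) -> tuple[int, float]:
    """Return (1-based start, score) of the highest-scoring window.

    Returns (0, -inf) if the sequence is shorter than the motif.
    Residues outside AMINO_ACIDS (e.g. 'X') contribute a score of 0.
    """
    width = len(pssm)
    best_start, best_score = 0, float("-inf")
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if score > best_score:
            best_start, best_score = i + 1, score
    return best_start, best_score

if __name__ == "__main__":
    sites = ["WDGR", "WDGK", "WEGR"]  # hypothetical aligned motif occurrences
    pssm = build_pssm(sites)
    print(best_hit("MSTLLWDGRAAPK", pssm))  # best window starts at position 6
```

The pseudocount keeps residues never seen at a position from scoring minus infinity, so a single unusual residue lowers a window's score instead of ruling it out.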

Plant Protein Motif Databases:

PlantP 

PHYTOPROT The general procedure behind PHYTOPROT consists of performing an "all-by-all" comparison of (putative) protein sequences with the BIOFACET software, building clusters of sequences based on their similarity (Mohseni-Zadeh et al., Recomb 2003), and finally displaying, à la ProDom, the domains shared by the sequences in each cluster.

Integr8 Arabidopsis thaliana Provides protein information through keyword search; search results include domain information.

Pfam Arabidopsis Protein Families

Parasite Protein Motif Databases:

Parasite Motif Search Parasite Motif Database. Databases of: All Leishmania, Leishmania major Friedlin, Trypanosoma brucei, Trypanosoma cruzi, All Crithidia, All kinetoplastids, Plasmodium falciparum, All Plasmodium, Toxoplasma gondii, All Apicomplexa, All Schistosoma, All Filarioidea.

Other Species Protein Motif Databases:

Other Species at Integr8

Pfam Other Species