In the past, gene annotation was carried out laboriously by individual scientists interested in particular genes, but the process has now been largely automated. The usual approach is to use software to scan the stored sequences for transcriptional and translational start and stop signals, for RNA-splicing sites, and for other telltale signs of protein-coding genes. The software also looks for short sequences that match known mRNAs. Thousands of such sequences, called expressed sequence tags (ESTs), have been obtained by sequencing cDNAs and are cataloged in computer databases. Because a genomic region that matches an EST must be transcribed, this type of analysis can identify sequences that may be previously unknown protein-coding genes.
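As a rough illustration of the two kinds of signal described above, the Python sketch below scans one DNA strand for open reading frames (an ATG start codon followed by an in-frame TAA, TAG, or TGA stop codon) and reports which ESTs occur exactly within the sequence. It is a deliberately simplified, hypothetical example: the function names `find_orfs` and `matching_ests` and the `min_codons` threshold are illustrative assumptions, and real annotation pipelines also model splice sites, promoters, both strands, and sequencing errors, none of which this sketch attempts.

```python
from typing import Dict, List, Tuple

# Standard stop codons of the universal genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start, end) half-open coordinates of ORFs
    on the given strand only. Each ORF runs from the first ATG in a reading
    frame to the next in-frame stop codon, stop codon included. Splicing and
    the reverse strand are ignored."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # examine all three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # translational start signal
            elif start is not None and codon in STOP_CODONS:
                # Keep only ORFs long enough to plausibly encode a protein.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


def matching_ests(seq: str, ests: Dict[str, str]) -> List[str]:
    """Hypothetical helper: names of ESTs found verbatim in the sequence,
    i.e. evidence that this stretch of DNA is transcribed."""
    seq = seq.upper()
    return [name for name, est in ests.items() if est.upper() in seq]


if __name__ == "__main__":
    genomic = "CCATGGCTGCTGCTTAAGG"  # toy sequence, not real data
    print(find_orfs(genomic, min_codons=3))
    print(matching_ests(genomic, {"est1": "GCTGCT", "est2": "TTTT"}))
```

On the toy input, the scan reports a single five-codon ORF starting at position 2, and only `est1` matches. Exact substring matching is used here only for brevity; practical EST alignment tolerates mismatches and must span the introns that splicing removes from the mRNA.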
