Data structures for indexing genome and protein sequences




A data structure is a tool for storing and retrieving information: a logical and mathematical way of organizing specific data. The growing number of gene and protein sequences from different organisms keeps increasing the volume of data in genome databases, and finding suitable data structures and indexing methods has been the subject of many studies. String data structures are the general-purpose structures for genome indexing, and this article reviews the literature on the three most widely used types: the suffix tree, the suffix array, and the Directed Acyclic Word Graph (DAWG). The findings show that suffix trees and DAWGs need a large amount of space, whereas suffix arrays need less. Unlike the DAWG, a suffix array can be stored on a memory stick. Suffix trees and DAWGs are dynamic structures, but because a suffix array is a sorted structure, it is difficult to modify.
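To make the suffix array concrete, the following is a minimal sketch in Python, not a method taken from the reviewed literature. The function names `build_suffix_array` and `find_occurrences` and the sample string are illustrative assumptions. The construction is the naive one (sorting all suffixes), which is only reasonable for short strings; practical genome indexes use far more efficient construction algorithms.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction (assumption: short input). Each suffix is compared
    # as a full string, so this is far too slow for real genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text, via binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in suffix_array[start:lo] begin with pattern.
    return sorted(suffix_array[start:lo])


genome = "GATTACA"  # hypothetical sample sequence
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "TA")
```

The index itself is just one integer per character of the text, which is consistent with the lower space requirement noted above. It also illustrates why the structure is hard to modify: inserting or deleting a character shifts suffix positions and can change the sorted order, so in this naive sketch the array would simply be rebuilt.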