Bioinformatics applies the principles of Information Technology to Biotechnology. A relatively young science, it uses Information Technology techniques to understand and manipulate biological data. Tools and methods usually applied in computing are adapted to manage, store and analyze biological data.
The biological sciences now produce a huge amount of data, driven by the explosion of tools and techniques in genomics and in cell and molecular biology. Managing all this information is difficult, yet the data can be analyzed and manipulated only if it is properly managed. Bioinformatics is therefore an interdisciplinary science that requires knowledge of mathematics, computer science and statistics, and it relies mostly on computers to store, manage and analyze the data. Its goal is to analyze biological data successfully and uncover what it reveals about organisms and about life itself. This knowledge is expected to drive improvements in agriculture, health, the environment, energy and biotechnology.
Although the term Bioinformatics dates back to around 1970, it came into wide use in the 1990s, when the field dealt mainly with DNA, mRNA and protein sequence data. The speed with which biological data is now generated, however, demands efficient analytical techniques of the kind bioinformatics provides. Other kinds of biological data analyzed with bioinformatics techniques include protein structures, microarray and gene expression data, and data used in drug design.
Some of the areas of Biology that can be covered by Bioinformatics are:
DNA Microarrays- A DNA microarray is an array of thousands of DNA probes. It is used to measure gene expression levels in cells, to study gene expression in various diseases and to detect SNPs. Hybridization of the target to each probe is detected and analyzed, and because there are thousands of probes, many genetic tests can be performed at the same time.
Comparative Genomics- The complete genomes of different organisms can be compared and studied, as can the genomes of different strains of the same organism, for both structure and function. Given the vast amount of information involved, there is no choice but to automate the analysis.
Structural Genomics- This involves predicting the three-dimensional structures and functions of proteins.
Functional Genomics- Identification of genes and their specialized functions.
Medical Informatics- This involves the management of biomedical and medical data with respect to biomolecules and assays.
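The microarray entry above describes measuring gene expression levels. A common first summary of such data is the log2 ratio (fold change) of each probe's signal between two conditions, where +1 means expression doubled and -1 means it halved. The sketch below is a minimal illustration that assumes intensities have already been background-corrected and normalized and are stored in plain dictionaries. The probe names and values are made up, and the pseudocount is an illustrative choice rather than a standard setting.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {probe: log2(treated / control)} for probes measured in both conditions.

    A small pseudocount is added to every intensity so that probes with zero
    signal do not cause a division by zero or log(0).
    """
    ratios = {}
    for probe, control_signal in control.items():
        if probe not in treated:
            continue  # skip probes missing from the other array
        treated_signal = treated[probe]
        ratios[probe] = math.log2(
            (treated_signal + pseudocount) / (control_signal + pseudocount)
        )
    return ratios


# Made-up, already-normalized intensities for three hypothetical probes.
control = {"probe_A": 99.0, "probe_B": 399.0, "probe_C": 49.0}
treated = {"probe_A": 399.0, "probe_B": 99.0, "probe_C": 49.0}

for probe, lfc in sorted(log2_fold_changes(control, treated).items()):
    status = "up" if lfc >= 1 else "down" if lfc <= -1 else "unchanged"
    print(f"{probe}\t{lfc:+.2f}\t{status}")
```

With these values, probe_A prints +2.00 (up), probe_B prints -2.00 (down) and probe_C prints +0.00 (unchanged). Real analyses normalize across arrays and apply statistical tests over replicates before calling a gene differentially expressed.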
Biological Databases
Biological data are stored in different databases according to the kind of information they hold. Each database may offer its own set of tools for analyzing the data, and most are publicly accessible.
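Many of these public databases can also be queried programmatically. For example, NCBI (the National Center for Biotechnology Information) offers the E-utilities web API, whose efetch endpoint returns records in formats such as FASTA. The sketch below builds the request URL with the Python standard library and optionally downloads the record. The helper names are hypothetical, and the accession in the usage line is a placeholder to replace with a real one. Check NCBI's E-utilities documentation and usage guidelines, including request-rate limits, before running it against the live service.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that asks for one record as plain-text FASTA.

    efetch_fasta_url is a hypothetical helper; db, id, rettype and retmode
    are standard efetch parameters.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_fasta(accession, db="nuccore", timeout=30):
    """Download the record (requires network access; hypothetical helper)."""
    with urlopen(efetch_fasta_url(accession, db), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "YOUR_ACCESSION" is a placeholder, not a real identifier.
    print(efetch_fasta_url("YOUR_ACCESSION"))
```

Keeping URL construction separate from the network call makes the request easy to inspect and test offline.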
Protein sequence databases are of two types: primary and secondary. Primary databases hold raw data such as protein sequences, while secondary databases combine data from primary databases to give a more complete set of information. Examples include SWISS-PROT, which holds protein sequences (many translated from nucleotide sequences) annotated with protein structure, domain structure and post-translational modifications; PIR-International, a well-classified and cross-referenced protein sequence database provided by the National Biomedical Research Foundation (NBRF), USA; ALIGN, a compilation of sequence alignments; and ProDom, which compiles homologous domains. Many more protein databases exist, and they can be used to analyze protein structure, function and sequence, and to predict protein structure.
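The alignments compiled in databases such as ALIGN are produced by sequence alignment algorithms. As a small illustration of the idea, and not of how ALIGN itself was built, the sketch below computes the optimal global alignment score of two sequences using the Needleman-Wunsch dynamic-programming algorithm. The match, mismatch and gap scores are arbitrary assumptions. Protein comparisons normally use a substitution matrix such as BLOSUM62 together with affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best global alignment score of sequences a and b.

    Uses a simple linear gap penalty; the scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(global_alignment_score("HEAGAWGHEE", "PAWHEAE"))
```

Only the score is returned here. Recovering the aligned sequences themselves requires a traceback through the same matrix.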
PROSITE and Pfam are protein sequence motif databases that provide information on protein families and protein domains.
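PROSITE describes many motifs as patterns in its own syntax:

- elements are separated by '-';
- 'x' stands for any residue;
- [..] lists allowed residues and {..} lists forbidden ones;
- (n) or (n,m) gives a repeat count;
- '<' or '>' anchors the pattern at the N- or C-terminus.

Because this syntax maps closely onto regular expressions, a pattern can be searched with Python's re module. The translator below is a minimal sketch that covers only the syntax just listed. It does not handle '>' inside brackets. The function names and the test sequence are made up, while N-{P}-[ST]-{P} is PROSITE's N-glycosylation site pattern (PS00001).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue part from an optional repeat count: (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, low, high = match.groups()
        if residues == "x":              # any amino acid
            regex = "."
        elif residues.startswith("{"):   # any amino acid except those listed
            regex = "[^" + residues[1:-1] + "]"
        else:                            # single residue or [..] set, same in regex
            regex = residues
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return 1-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets matches overlap, as motif sites can.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]


# PS00001 (N-glycosylation site) searched in a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))               # N[^P][ST][^P]
print(find_motif("MKNGSANPTLNVTA", "N-{P}-[ST]-{P}."))   # [3, 11]
```

The asparagine at position 7 is not reported because the residue after it is proline, which the {P} element forbids.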
Nucleotide sequence data submitted by genome sequencing labs is stored in GenBank, EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan). GenBank is maintained by the National Center for Biotechnology Information (NCBI), USA, and the EMBL database by the European Bioinformatics Institute (EBI), UK. These databases exchange data and are synchronized almost every day. UniGene holds clusters of GenBank nucleotide sequences. NCBI provides completed genome sequences, while EBI Genomes offers information and statistics on completed genomes and on other ongoing genome projects.
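Records from GenBank, EMBL and DDBJ can be downloaded in several formats. One of the simplest is FASTA, in which a header line starting with '>' is followed by one or more lines of sequence. The sketch below parses FASTA text in plain Python and computes each sequence's GC content (the fraction of G and C bases). The function names and records are made-up examples. For real work, a maintained parser such as Biopython's Bio.SeqIO is usually the better choice.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)           # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Two made-up records; the first sequence is split over two lines.
example = """>seq1 hypothetical example
ATGCGC
GGTA
>seq2 another made-up record
AATT
"""

for header, seq in parse_fasta(example):
    print(f"{header}: length {len(seq)}, GC {gc_content(seq):.2f}")
```

This prints a length of 10 and GC of 0.60 for seq1, and a length of 4 and GC of 0.00 for seq2.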
Molecular Structure Databases compile the three-dimensional structures of molecules determined by Nuclear Magnetic Resonance (NMR) spectroscopy or X-ray crystallography. Examples include the Protein Data Bank (PDB) and the Structural Classification of Proteins (SCOP).
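Structures in the PDB are distributed as atomic coordinates. The wwPDB now uses PDBx/mmCIF as its standard archive format, but the older fixed-column PDB format is still widely used. In that format, each ATOM or HETATM line stores the atom name, residue, chain and x, y, z coordinates (in angstroms) at fixed character positions. The sketch below reads those columns and computes the distance between two atoms; it needs Python 3.8 or later for math.dist. The function names are hypothetical and the two ATOM lines use made-up coordinates, not a real entry. For real files, a library such as Biopython's Bio.PDB is the usual choice.

```python
import math


def parse_atoms(pdb_text):
    """Extract ATOM/HETATM records from legacy PDB-format text by column position."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue                              # skip HEADER, REMARK, etc.
        atoms.append({
            "name": line[12:16].strip(),          # columns 13-16
            "res_name": line[17:20].strip(),      # columns 18-20
            "chain": line[21],                    # column 22
            "res_seq": int(line[22:26]),          # columns 23-26
            "xyz": (float(line[30:38]),           # x: columns 31-38
                    float(line[38:46]),           # y: columns 39-46
                    float(line[46:54])),          # z: columns 47-54
        })
    return atoms


def distance(atom_a, atom_b):
    """Euclidean distance between two atoms, in the file's units (angstroms)."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


# Two made-up ATOM records (not from a real PDB entry), truncated after z.
example_lines = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  MET A   1      12.560   6.134  -6.504",
]

n_atom, ca_atom = parse_atoms("\n".join(example_lines))
print(f"{n_atom['name']}-{ca_atom['name']}: {distance(n_atom, ca_atom):.3f} angstroms")
```

The two made-up atoms differ only in x, so the printed distance is 1.456 angstroms.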
Other databases include KEGG, which provides regularly updated information on metabolic pathways and on the macromolecules and genes that participate in them.
Organizing and cross-referencing biological data in easily accessible databases has drastically cut the time required to sift through it. Without such databases, merely handling this much biological data, let alone analyzing it, would have been impossible.