Consequences of Amino Acid Substitutions on Protein Function
Authors: Deepak V Pawar1, Mahesh Mahajan1, Rakesh Kumar Prajapat1, Kishor U Tribhuvan1
1ICAR-NRCPB, I.A.R.I, New Delhi-12


Ever since the first protein sequences and structures were determined, it has been clear that the position and properties of amino acids are key in determining the biological function of proteins. For example, haemoglobin, one of the first protein structures to be determined, provided a molecular basis for understanding the genetic disease sickle cell anaemia. A single nucleotide mutation that leads to the substitution of glutamate with valine is the basis of the disease. The substitution results in lower solubility of haemoglobin and causes the molecules to form long fibres within red blood cells, which leads to the unusual sickle-shaped cells. Haemoglobin is just one of many examples now known where single mutations can have radical consequences for protein structure, function and thereby the protein-associated phenotype. With thousands or even millions of DNA and protein sequences now available, we have knowledge of many mutations, either naturally occurring or artificially induced.

Protein Features Relevant to Amino Acid Behavior

The function of a protein is determined by a number of general properties:

  1. Protein Environments
The most important feature is the cellular location of the protein. Different subcellular locations have different chemical environments, with the consequence that many amino acids behave differently as their chemical environment changes. The most significant difference is between soluble and membrane proteins. While soluble proteins tend to be surrounded by water molecules, membrane proteins are surrounded by lipids. Roughly speaking, this means that these two classes of proteins behave in an ‘inside-out’ fashion relative to each other. Soluble proteins have polar or hydrophilic residues on their surfaces, whereas membrane proteins have hydrophobic residues on the surface that interact with the membrane. Soluble proteins are further categorized into two types, extracellular and intracellular (cytosolic). The cytosolic environment is quite different from the more aqueous environment outside the cell; the density of proteins and other molecules changes the behavior of some amino acids quite drastically, the foremost among these being cysteine. Outside the cell, cysteines in proximity to one another are oxidized to form disulphide bonds, sulphur-sulphur covalent linkages that are important for protein folding and stability. However, because of the reducing environment inside the cell, these bonds form only with great difficulty in cytosolic proteins.

Cells also contain numerous compartments, the organelles, which can also have slightly different environments from each other. Proteins in the nucleus often interact with DNA, meaning they have different preferences for amino acids on their surfaces (e.g. positively charged amino acids or those containing amides, which are most suitable for interacting with the negatively charged sugar-phosphate backbone). Some organelles, such as mitochondria or chloroplasts, are quite similar to the cytosol, while others, such as lysosomes and Golgi bodies, are more akin to the extracellular environment. Therefore, it is important to consider the likely cellular location of any protein before considering the consequences of amino acid substitutions.

  2. Protein Structure
Proteins themselves also contain different microenvironments. For soluble proteins, the surface lies at the interface with water and thus tends to contain more polar or charged amino acids than one finds in the core of the protein, which is more likely to comprise hydrophobic amino acids. Proteins also contain regions that are directly involved in protein function, such as active sites or binding sites, in addition to regions that are less critical to the protein function and where mutations are likely to have fewer consequences.
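
The contrast between a hydrophobic core and a polar surface can be made concrete with a simple hydropathy score. The following is only a minimal sketch: it assumes the Kyte-Doolittle hydropathy scale, which is not part of this article, and the helper name mean_hydropathy and the two example peptides are hypothetical, chosen purely for illustration.

```python
# Kyte-Doolittle hydropathy values (assumption: this scale is not taken from
# the article; positive = more hydrophobic, negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Return the average hydropathy of a one-letter amino acid sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


# Hypothetical peptides: one rich in aliphatic residues (core-like),
# one rich in charged and polar residues (surface-like).
core_like = "ILVFLA"
surface_like = "DEKRSN"
print(mean_hydropathy(core_like) > mean_hydropathy(surface_like))
```

An average over a short stretch of sequence is a crude proxy; whether a residue is actually buried or exposed depends on the three-dimensional structure.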

  3. Protein Evolution
Proteins are nearly always members of homologous families. Knowledge of the family a protein belongs to will generally give insights into its possible function. The processes that give rise to homologous protein families are speciation and duplication. Proteins related by speciation only are referred to as orthologues; these proteins typically have the same function in different species. Proteins related by duplication are referred to as paralogues. Sequential rounds of speciation and intra-genomic duplication can lead to confusing situations where it becomes difficult to say whether proteins are paralogous or orthologous. To be maintained in a genome over time, paralogous proteins are likely to evolve different functions (or have a dominant negative phenotype and so resist decay by point mutation). Differences in function can range from subtle differences in substrate (e.g. malate versus lactate dehydrogenases), to only weak similarities in molecular function (e.g. hydrolases), to complete differences in cellular location and function (e.g. an intracellular signaling domain homologous to a secreted growth factor (Schoorlemmer and Goldfarb, 2001)). At the other extreme, the molecular function may be identical but the cellular function may be altered, as in the case of enzymes with differing tissue specificities.

  4. Protein Function
Protein function is key to any understanding of the consequences of amino acid substitution. Enzymes, such as trypsin, tend to have highly conserved active sites involving a handful of polar residues. In contrast, proteins that function primarily by interacting with other proteins, such as fibroblast growth factors, interact over a large surface, with virtually any amino acid potentially being important in mediating the interaction. In other cases, multiple functions make the situation even more complex; for example, a protein kinase (Hanks et al., 1988) can both catalyse a phosphorylation event and bind specifically to another protein, such as cyclin (Jeffrey et al., 1995).

  5. Post-translational Modification
Although there are only 20 possible types of amino acid that can be incorporated into a protein sequence upon translation, there are many more variations that can occur through subsequent modification. In addition, the gene-specified protein sequence can be shortened by proteolysis, or lengthened by the addition of amino acids at either terminus. Two common modifications, phosphorylation and glycosylation, are discussed in the context of the amino acids where they most often occur (tyrosine, serine, threonine and asparagine). The main conclusion is that modifications are highly specific, with specificity provided by primary, secondary and tertiary protein structure, although the detailed mechanisms remain obscure. The biological functions of modified proteins range from the reversible phosphorylation of serine, threonine and tyrosine residues that occurs in signaling, through the formation of disulphide bridges and other cross-links that stabilize tertiary structure, to the covalent attachment of lipids that allows anchorage to cell membranes. More detail on biological effects is given by Parekh and Rohlff (1997), especially where it concerns possible therapeutic applications. Many diseases arise from abnormalities in post-translational modification, and these are not necessarily apparent from genetic information alone.
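
As a rough illustration of how primary structure narrows down where such modifications can occur, the sketch below lists candidate residues in a sequence. It rests on stated assumptions that are not from this article: any serine, threonine or tyrosine is treated as a possible phosphorylation site, and N-linked glycosylation is limited to the commonly cited N-X-S/T motif (X being any residue except proline). The function names and the example peptide are hypothetical. Because real specificity also depends on secondary and tertiary structure, these positions are candidates only.

```python
import re


def phosphorylation_candidates(sequence: str) -> list[int]:
    """Return 1-based positions of serine, threonine and tyrosine residues."""
    return [i + 1 for i, residue in enumerate(sequence.upper()) if residue in "STY"]


def n_glycosylation_candidates(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines in an N-X-S/T motif (X != P).

    A lookahead is used so that overlapping motifs are all reported.
    """
    pattern = re.compile(r"N(?=[^P][ST])")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Hypothetical peptide, invented for illustration only.
peptide = "MNGSAYNPTK"
print(phosphorylation_candidates(peptide))
print(n_glycosylation_candidates(peptide))
```

In the example peptide, the second asparagine is followed by proline and is therefore not reported as a glycosylation candidate.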

How Mutations Affect Protein Function

Several studies have attempted to decipher general principles governing the association between mutations and protein structure and function. Single nucleotide polymorphisms (SNPs) are point mutations that are present at a measurable frequency in a population. They can occur in either coding or non-coding DNA. They may influence regulatory mechanisms such as promoter activity (gene expression), messenger RNA (mRNA) conformation (stability), and the subcellular localization of mRNAs and/or proteins. Coding SNPs can be further divided into two main categories: synonymous (where there is no change in the amino acid they code for) and non-synonymous (where the encoded amino acid changes). Synonymous SNPs tend to occur much more frequently than non-synonymous SNPs. The main reason is natural selection, which keeps the deleterious effects of non-synonymous mutations in check. Site-directed mutagenesis is a powerful tool for discovering the importance of an amino acid in the function of a protein. Gross changes in amino acid type can reveal sites that are important in maintaining the structure of the protein. Peracchi (2001) has reviewed the use of site-directed mutagenesis to investigate mechanisms of enzyme catalysis.
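
The distinction between synonymous and non-synonymous coding SNPs can be sketched in a few lines of Python. This is a minimal illustration assuming the standard genetic code and DNA codons written on the coding strand; the function name classify_coding_snp is hypothetical. The codons used for the sickle cell example (GAG to GTG, glutamate to valine) are supplied here for illustration and are not stated in the article itself.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> tuple[str, str, str]:
    """Classify a single-base codon change.

    Returns (category, reference amino acid, alternative amino acid).
    A change to or from a stop codon is reported as non-synonymous.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    category = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return category, ref_aa, alt_aa


print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val, as in sickle cell anaemia
print(classify_coding_snp("GAG", "GAA"))  # Glu -> Glu, no amino acid change
```

The classification says nothing about how damaging a non-synonymous change is; that depends on the properties of the residues involved and on where the site lies in the protein.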

Sr No. | Amino acid | Substituted by | Amino acid property | Function in protein
1 | Alanine (Ala, A) | Other small amino acids | Hydrophobic and nonpolar | Plays a role in substrate recognition or specificity, particularly in interactions with other non-reactive atoms such as carbon.
2 | Isoleucine (Ile, I) | Other hydrophobic, particularly aliphatic, amino acids | Hydrophobic | The isoleucine side chain is very non-reactive and is thus rarely directly involved in protein functions like catalysis, although it can play a role in substrate recognition. In particular, hydrophobic amino acids can be involved in binding/recognition of hydrophobic ligands such as lipids.
3 | Leucine (Leu, L) | Other hydrophobic, particularly aliphatic, amino acids | Hydrophobic | Same as isoleucine.
4 | Valine (Val, V) | Other hydrophobic, particularly aliphatic, amino acids | Hydrophobic | Same as isoleucine.
5 | Methionine (Met, M) | Other hydrophobic, particularly aliphatic, amino acids | Hydrophobic | Binding/recognition of hydrophobic ligands such as lipids. The sulphur atom of methionine can be involved in binding to metal atoms.
6 | Phenylalanine (Phe, F) | Other aromatic or hydrophobic amino acids; prefers to exchange with tyrosine | Hydrophobic | Aromatic residues can be involved in stacking interactions with non-protein ligands that themselves contain aromatic groups.
7 | Tryptophan (Trp, W) | Other aromatic residues | Hydrophobic | Same as phenylalanine.
8 | Tyrosine (Tyr, Y) | Other aromatic amino acids | Partially hydrophobic | Tyrosine contains a reactive hydroxyl group, which helps in interactions with non-carbon atoms. A common role for tyrosines (and serines and threonines) within intracellular proteins is in phosphorylation reactions.
9 | Histidine (His, H) | Being a polar amino acid, it does not substitute particularly well with any other amino acid | pKa near physiological pH | Among the most common amino acids in protein active or binding sites. Also very common in metal binding sites (e.g. zinc), often acting together with cysteines.
10 | Arginine (Arg, R) | Polar amino acids | Amphipathic | Arginines are quite frequent in protein active or binding sites. The positive charge helps in interactions with negatively charged non-protein atoms (e.g. anions or carboxylate groups). Arginine contains a complex guanidinium group on its side chain whose geometry and charge distribution are ideal for binding negatively charged groups on phosphates.
11 | Lysine (Lys, K) | Arginine or other polar amino acids | Amphipathic | Lysines are quite frequent in protein active or binding sites. Lysine contains a positively charged amino group on its side chain, which helps in forming hydrogen bonds with negatively charged non-protein atoms (e.g. anions or carboxylate groups).
12 | Aspartate (Asp, D) | Glutamate or other polar amino acids | Polar | Frequently found in protein active or binding sites. The negative charge means that it can interact with positively charged non-protein atoms.
13 | Glutamate (Glu, E) | Aspartate or other polar amino acids | Polar | Frequently involved in protein active or binding sites of proteases or lipases.
14 | Asparagine (Asn, N) | Other polar amino acids, especially aspartate | Polar | Frequently involved in protein active or binding sites. The polar side chain is good for interactions with other polar or charged atoms. Asparagine can play a similar role to aspartate in some proteins; the best example is found in certain cysteine proteases.
15 | Glutamine (Gln, Q) | Other polar amino acids, especially glutamate | Polar | Frequently involved in protein active or binding sites. The polar side chain is good for interactions with other polar or charged atoms.
16 | Serine (Ser, S) | Other polar or small amino acids, in particular threonine | Polar | Serines are quite common in protein functional centres. The hydroxyl group is fairly reactive, being able to form hydrogen bonds with a variety of polar substrates.
17 | Threonine (Thr, T) | Other polar amino acids, particularly serine | Polar | Threonines are quite common in protein functional centres. The hydroxyl group is fairly reactive, being able to form hydrogen bonds with a variety of polar substrates. Intracellular threonines can also be phosphorylated (see Tyrosine), and in the extracellular environment they can be O-glycosylated.
18 | Cysteine (Cys, C) | No particular preference for any other amino acid | Polar | Cysteines are very common in protein active and binding sites. Binding to metals can also be important in enzymatic functions.
19 | Glycine (Gly, G) | Other small amino acids | Hydrophobic | Glycines can play a distinct functional role, such as using the backbone (glycine has no side chain) to bind to phosphates.
20 | Proline (Pro, P) | Other small amino acids, although because of its unique properties it often does not substitute well | Hydrophobic | The proline side chain is very non-reactive. This, together with its difficulty in adopting many protein main-chain conformations, means that it is very rarely involved in protein active or binding sites.
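
The table above can be turned into a simple lookup for a first-pass judgement of a substitution. The sketch below is only a rough heuristic under stated assumptions: the property labels are copied from the table's property column, the grouping into a dictionary and the function name same_property_class are hypothetical, and sharing a label is at best a hint that a substitution may be tolerated.

```python
# Property labels taken from the table above, keyed by one-letter code.
# The grouping is an illustrative simplification, not a validated scheme.
GROUPS = {
    "hydrophobic": "AILVMFWGP",
    "partially hydrophobic": "Y",
    "pKa near physiological pH": "H",
    "amphipathic": "RK",
    "polar": "DENQSTC",
}
PROPERTY = {
    amino_acid: label for label, members in GROUPS.items() for amino_acid in members
}


def same_property_class(original: str, substitute: str) -> bool:
    """Return True if both residues carry the same property label in the table."""
    return PROPERTY[original.upper()] == PROPERTY[substitute.upper()]


# Glutamate -> valine (the sickle cell substitution) crosses property classes,
# whereas isoleucine -> leucine stays within the hydrophobic class.
print(same_property_class("E", "V"))
print(same_property_class("I", "L"))
```

Context still matters: a substitution within one class can be harmful in an active site, and a change of class can be harmless on a tolerant surface position.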


References:

Hanks SK, Quinn AM, Hunter T. (1988). The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241: 42–52.

Jeffrey PD, Russo AA, Polyak K, Gibbs E, Hurwitz J, Massague J, et al. (1995). Mechanism of CDK activation revealed by the structure of a cyclinA–CDK2 complex. Nature 376: 313–320.

Parekh RB, Rohlff C. (1997). Post-translational modification of proteins and the discovery of new medicine. Curr Opin Biotechnol 8: 718–723.

Peracchi A. (2001). Enzyme catalysis: removing chemically ‘essential’ residues by site-directed mutagenesis. Trends Biochem Sci 26: 497–503.

Schoorlemmer J, Goldfarb M. (2001). Fibroblast growth factor homologous factors are intracellular signaling proteins. Curr Biol 11: 793–797.



About Author / Additional Info:
I am a PhD research scholar at IARI, New Delhi, in the discipline of Molecular Biology and Biotechnology. I am working on blast disease resistance in O. sativa.