Diabetes mellitus causes major health problems worldwide and affects
a growing number of people. Today nearly 200 million people have
been diagnosed, and the number of new cases is rising rapidly,
especially in the Western world. The reasons for this increase are
not simple, but many clinical studies indicate that changes in
lifestyle and environmental factors play an important role.
In general, diabetes mellitus is a severe, life-threatening disease.
Its secondary complications are many and well described. Examples
include a highly increased risk of obesity, retinopathy, kidney
dysfunction, accelerated atherosclerosis, and increased
susceptibility to infections such as mucormycosis.
Diabetes mellitus types 1 and 2 are characterized by primary
hyperglycemia caused by dysregulation of the plasma glucose
concentration. Glucose uptake by somatic cells is regulated by
insulin. Insulin is produced by the β-cells of the pancreas, and
dysfunction or apoptosis of these cells affects insulin production
and, in turn, the plasma concentration of glucose.
Studying diabetes mellitus
Understanding the genetic basis of diabetes mellitus and the
influence of genetic variation on phenotypic traits is of great
importance for medical research and drug development. Sequence
analysis tools such as the CLC bio workbenches help researchers gain
a better understanding of the complexity of diabetes pathogenesis.
Many different animal models have been used in pre-clinical studies.
The pig is a valuable model organism that is relevant to human
biomedical research in areas such as obesity, cardiovascular
disease, and nutrition.
This case study describes how the CLC Combined Workbench is used to
identify sequence polymorphisms in the pig genome at positions that
may be related to diabetes, and then demonstrates how to analyze
these polymorphisms using bioinformatics tools.
Genetics of Diabetes Mellitus
Diabetes mellitus is a complex, multifactorial and polygenic
disease, likely to be caused by one or more gene alterations acting
in combination with non-genetic factors [Morwessel, 1998]. Since
obese phenotypic traits are often seen in connection with a diabetes
diagnosis, genetic analysis (including sequence analysis) of genes
known to be related to obesity could help clarify some of the
phenotypic relations to the disease.
Identifying uncoupling proteins as diabetes type 2 candidate genes
The candidate gene approach for type 2 diabetes mellitus tests for
association between particular gene variants and the disease.
Candidate genes encode proteins involved in insulin synthesis,
insulin secretion pathways, or insulin action, where defects disturb
glucose regulation [So et al., 2000], and polymorphisms in these
genes may be important risk factors for type 2 diabetes mellitus.
Uncoupling proteins
Uncoupling proteins (UCPs) are located in the inner mitochondrial
membrane, and one suggested function of UCPs is uncoupling: acting
as a channel for proton entry into the mitochondrial matrix.
Figure 1: Location of the uncoupling protein in the inner mitochondrial membrane [Gura, 1998].
Figure 2: The yellow arrow represents the uncoupling protein's function as a channel for proton entry into the matrix [Rousset et al., 2004].
Respiration and ADP phosphorylation in mitochondria are coupled, and
the uncoupling proteins appear to regulate the degree of this
coupling. Uncoupling occurs when a protein acts as a proton carrier:
by transporting protons from the intermembrane space into the
matrix, it creates a shunt between the respiratory chain and ATP
synthase, so that part of the proton gradient is dissipated without
producing ATP [Rousset et al., 2004]. Through this mechanism, UCPs
may play basic roles in human physiology. In addition, UCP2 and UCP3
decrease the membrane potential and increase thermogenesis
[Dalgaard and Pedersen, 2001], and the genes encoding these proteins
are therefore regarded as candidate genes for studies of type 2
diabetes, which often accompanies obese phenotypic traits.
Uncoupling proteins are highly conserved among species, and the
human and pig uncoupling proteins show significant similarity, with
more than 90 percent identity at the amino acid level. These genes
are therefore interesting candidates for research using the pig as
an animal model.
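Percent identity figures such as the one above depend on how identity is counted. The sketch below uses one common convention, assumed here rather than taken from the Workbench: identical residues divided by the number of aligned columns, ignoring columns where both sequences have a gap. The function name and the toy fragments in the usage line are hypothetical and are not real UCP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two gapped sequences of equal aligned length.

    Assumed convention: identical non-gap columns divided by all columns
    in which at least one of the two sequences has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both (e.g. from a multiple alignment): skip
        columns += 1
        if a == b:
            identical += 1
    return 100.0 * identical / columns if columns else 0.0


# Hypothetical toy fragments, not real human or porcine UCP sequences
print(percent_identity("MVGFKPSD", "MVGLKPS-"))  # 6 of 8 columns identical -> 75.0
```

Other conventions divide by the length of the shorter sequence or exclude gapped columns entirely, so identity values reported by different tools are not always directly comparable.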
Work flow
CLC bio provides bioinformatics software for the detection,
identification, and characterization of polymorphisms in type 2
diabetes candidate genes, such as those encoding the uncoupling
proteins. The use of CLC Combined Workbench, which integrates all
analyses in one program, is exemplified below.
Human and porcine DNA encoding UCP2 and UCP3 is retrieved from local
and online databases using BLAST. The sequences are aligned to
check the degree of identity between the species. Next, primers are
designed for the porcine DNA. The primers are used for PCR, and the
products are subsequently sequenced. The sequencing data are
assembled against the reference sequences used for designing the
primers, and putative polymorphisms are identified. Using the
integrated SNP annotation functionality of the Combined Workbench,
the possible polymorphisms are characterized and compared to known
SNPs in the SNP database. Finally, the coding regions of the DNA
sequences are translated into protein and subjected to a number of
predictions to determine the impact of the polymorphisms on the UCP
proteins.
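A first step in judging the impact of a polymorphism in a coding region is to check whether it changes the encoded amino acid. The sketch below does this with the standard genetic code (appropriate for nuclear-encoded genes such as the UCPs). It is a hypothetical helper for illustration, not the Workbench's SNP annotation function; positions are assumed to be 0-based within the coding sequence, and the short toy CDS is made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_cds_snp(cds: str, pos: int, alt: str) -> tuple[str, str]:
    """Classify a single-base substitution at 0-based position `pos` of a CDS.

    Returns a protein-level label such as 'V2I' (1-based residue number)
    and an effect: synonymous, missense, nonsense or stop_lost.
    """
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds):
        raise ValueError("position outside the coding sequence")
    start = pos - pos % 3                    # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete final codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "missense"
    return f"{ref_aa}{start // 3 + 1}{alt_aa}", effect


toy_cds = "ATGGTTGGATTC"  # hypothetical: encodes M V G F
print(classify_cds_snp(toy_cds, 5, "C"))  # ('V2V', 'synonymous')
print(classify_cds_snp(toy_cds, 3, "A"))  # ('V2I', 'missense')
```

Missense changes are the natural candidates for the structural predictions described later in this case study, since synonymous changes leave the protein sequence unchanged.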
In the next sections, three of the steps in the work flow are
described in further detail.
Zooming in on Annotations
Throughout the entire work flow, the same sequences are used for
alignments, as the basis for primer design, and as reference
sequences in assembly and SNP identification. This means that
annotations of genes, coding regions, etc. are preserved during all
the analyses. They can then be used to guide the inspection of
alignments and BLAST hits, the placement of primers, the
interpretation of sequencing data, etc. Small snapshots of the role
of annotations in the different parts of the work flow are shown
below:
Figure 3: Inspecting the result of a BLAST search. The yellow annotation represents the coding region of the UCP3 gene. The annotations make it easy to get an overview of where the hits align to the query sequence.
Figure 4: An alignment with a yellow annotation representing the coding region of the UCP3 gene for humans (top) and pigs (bottom). The translation is shown to visualize differences in the amino acid sequence between the two species.
Figure 5: Designing primers to bind just upstream of the UCP3 coding region. The annotations eliminate the need to remember positions.
Figure 6: Assembly against the reference sequence, where both the primer binding site (red annotation) and the coding sequence (yellow annotation) are shown.
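Because annotations travel with the sequence, any position found later (a primer site, a putative SNP) can be related back to the annotated features. The sketch below models annotations as simple 1-based, inclusive intervals and looks up which ones cover a given position. The `Annotation` class, the feature kinds and the coordinates are hypothetical and do not reflect the Workbench's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    name: str
    kind: str   # hypothetical feature kind, e.g. "CDS" or "primer_bind"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def annotations_at(position: int, annotations: list[Annotation]) -> list[Annotation]:
    """Return every annotation whose interval covers the 1-based position."""
    return [a for a in annotations if a.start <= position <= a.end]


# Hypothetical coordinates on a reference sequence
ucp3_annotations = [
    Annotation("UCP3 CDS", "CDS", 120, 500),
    Annotation("forward primer", "primer_bind", 80, 99),
]
print([a.name for a in annotations_at(150, ucp3_annotations)])  # ['UCP3 CDS']
```

A linear scan is enough for the handful of features on one gene; much larger annotation sets would call for a sorted or interval-tree index instead.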
Zooming in on SNP annotation using BLAST
Sequencing data from the genes encoding the uncoupling proteins are
searched for polymorphisms. At positions where a polymorphism is
identified, a BLAST database search is performed for SNPs in human
genes similar to the genes of interest. This helps verify the
results.
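The search for polymorphisms in the assembled sequencing data can be illustrated with a simple column-wise count: at each reference position, tally the bases from the reads and flag a non-reference base that reaches a minimum depth and frequency. This is a deliberately simplified, hypothetical sketch (reads placed without indels, no base-quality weighting, made-up thresholds), not the detection method used by the Workbench.

```python
from collections import Counter


def candidate_snps(reference, reads, min_depth=2, min_alt_fraction=0.3):
    """Flag positions where a non-reference base is frequent enough.

    reads: list of (offset, sequence) pairs, each read placed on the
    reference at a 0-based offset with no insertions or deletions.
    Returns (1-based position, ref base, alt base, alt count, depth) tuples.
    """
    columns = [Counter() for _ in reference]
    for offset, seq in reads:
        for i, base in enumerate(seq.upper()):
            pos = offset + i
            # ignore bases outside the reference and ambiguous calls such as N
            if 0 <= pos < len(reference) and base in "ACGT":
                columns[pos][base] += 1
    calls = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything at this position
        ref_base = reference[pos].upper()
        for base, n in counts.most_common():
            if base != ref_base and n / depth >= min_alt_fraction:
                calls.append((pos + 1, ref_base, base, n, depth))
    return calls


# Hypothetical toy reference and reads
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTGCGT"), (4, "GCGTAC")]
print(candidate_snps(reference, reads))  # [(5, 'A', 'G', 2, 3)]
```

The two thresholds trade sensitivity against false positives from sequencing errors; real data from diploid animals would also need to distinguish heterozygous from homozygous positions.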
Results of the SNP annotation using BLAST can be shown graphically
(see Figure 7), in a tabular view, and as annotations on the input
sequence(s).
Figure 7: The graphical view of the SNP BLAST. At the top is the sequence used as reference in the assembly, with possible SNPs annotated with red arrows. Below are the hits from the SNP database. One SNP is highlighted (M).
When annotating SNPs using BLAST, you can select which database you
want to search against, e.g. human, mouse, or rat. You can also set
the BLAST parameters such as filtering and gap costs.
In the graphic view you can see sequence matches between your query
sequence and hit sequences in the chosen database. From here you can
easily zoom in on specific regions of interest, or open a hit region
in a new view.
Annotations on the query sequence will indicate any match from the
database where a polymorphism has previously been identified, and
the tabular view provides an overview of e.g. identity and positions
of matching regions between query and hit sequences. From the
tabular view you can easily open any of the hit sequences at the
NCBI web page.
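Relating a polymorphism in the query to a known SNP in a hit requires mapping the query coordinate through the gapped alignment of the match. The sketch below does this for a single BLAST high-scoring pair, assuming both sequences are on the plus strand and that coordinates are 1-based, as BLAST reports them. The function name and the toy alignment are hypothetical.

```python
def map_query_to_subject(q_aln, s_aln, q_start, s_start, q_pos):
    """Return the subject coordinate aligned to query coordinate q_pos.

    q_aln / s_aln are the gapped aligned strings of one plus/plus HSP,
    q_start / s_start their 1-based start coordinates. Returns None if
    q_pos is opposite a gap in the subject or outside the alignment.
    """
    q, s = q_start, s_start  # coordinates of the next residue in each sequence
    for qa, sa in zip(q_aln, s_aln):
        if qa != "-" and q == q_pos:
            return s if sa != "-" else None
        if qa != "-":
            q += 1
        if sa != "-":
            s += 1
    return None


# Hypothetical HSP: query 100-104 aligned to subject 500-504 with one gap each
print(map_query_to_subject("ACG-TA", "AC-ATA", 100, 500, 103))  # 503
```

A mapped position can then be compared with the positions of known SNPs in the hit sequence to decide whether a putative polymorphism coincides with one that has been reported before.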
Zooming in on transmembrane helix prediction and secondary structure prediction
After the sequenced genes have been translated and annotations have
been added where polymorphisms were identified, transmembrane
regions in the porcine uncoupling proteins 2 and 3 are predicted in
CLC Combined Workbench. This localizes each identified polymorphism
either within a transmembrane segment or in a loop facing the
intermembrane space or the matrix.
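To give a feel for how transmembrane segments can be predicted from sequence alone, the sketch below uses the classic Kyte-Doolittle hydropathy plot: a sliding window average with window 19 and a threshold of 1.6, the settings commonly quoted for that method. This is a swapped-in textbook technique for illustration; the source does not state which prediction method the Workbench uses, and the function names and toy protein are hypothetical.

```python
# Kyte-Doolittle hydropathy index
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each full window; entry i is the window starting
    at 0-based residue i. Non-standard residues (e.g. X) score 0.0."""
    seq = protein.upper()
    return [
        sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(protein, window=19, threshold=1.6):
    """Merge overlapping windows above threshold into 1-based residue ranges."""
    segments = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments


# Hypothetical toy protein: a hydrophobic stretch (residues 11-30) between polar ends
toy = "D" * 10 + "L" * 20 + "D" * 10
print(candidate_tm_segments(toy))  # [(6, 35)]
```

Reporting the union of qualifying windows overestimates segment boundaries by up to a window length, as the toy result shows; dedicated predictors such as hidden Markov model based tools usually locate helix ends more precisely.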
In a similar way, the secondary structure of the proteins was
predicted in CLC Combined Workbench. From the suggested locations of
the polymorphisms, possible effects on specific domains, and thereby
on protein structure and function, can be identified.
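Once a per-residue prediction is available, locating a polymorphism amounts to finding the structural element that contains its residue number. The sketch below assumes a hypothetical one-letter-per-residue prediction string (H for helix, E for strand, C for coil) and returns the element type and its span. The function name, labels and toy prediction are assumptions, not the Workbench's output format.

```python
from itertools import groupby

# Hypothetical per-residue state codes; unknown codes are returned unchanged
LABELS = {"H": "helix", "E": "strand", "C": "coil"}


def structural_context(prediction, position):
    """Return (label, start, end) of the element containing a 1-based residue."""
    if not 1 <= position <= len(prediction):
        raise ValueError("position outside the predicted sequence")
    pos0 = position - 1
    offset = 0
    for state, run in groupby(prediction):
        length = sum(1 for _ in run)
        if offset <= pos0 < offset + length:
            return LABELS.get(state, state), offset + 1, offset + length
        offset += length
    raise AssertionError("unreachable: position was range-checked")


toy_prediction = "CCHHHHHHCCEEEE"  # hypothetical prediction for a 14-residue fragment
print(structural_context(toy_prediction, 5))   # ('helix', 3, 8)
print(structural_context(toy_prediction, 12))  # ('strand', 11, 14)
```

The same helper works unchanged on a membrane topology string (for example one code for membrane residues and one for each side of the membrane), so a missense variant can be placed in both its secondary-structure and topological context.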
Given the putative topology of uncoupling protein 3 in the inner
mitochondrial membrane, the suggested location of each identified
polymorphism may help predict the possible impact of the identified
genetic variation.
Figure 8: Prediction of secondary structure for the analyzed uncoupling protein 3.