Emory University scientists have identified and created a map of more than 400,000 insertions and deletions (INDELs) in the human genome that signal a little-explored type of genetic difference among individuals. INDELs are an alternative form of natural genetic variation that differs from the much-studied single nucleotide polymorphisms (SNPs). Both types of variation are likely to have a major impact on humans, including their health and susceptibility to disease.
The INDEL research, led by Scott Devine, PhD, assistant professor of biochemistry at Emory University School of Medicine, has been posted online and will be published in the September issue of the journal Genome Research.
The human genome sequence in our DNA contains three billion base pairs of four chemical building blocks--adenine, thymine, cytosine, and guanine (A, T, C, G), strung together in different combinations in long chains within 23 pairs of chromosomes. When the first human genome was being sequenced, it became apparent that additional human genomes would have to be sequenced to identify the places in the genetic code that account for human variation. Scientists now know that humans share about 97-99 percent of the genetic code, and the remaining 1-3 percent dictates individual differences. These naturally occurring differences, called polymorphisms, help explain differences in appearance, susceptibility to diseases, and responses to the environment.
SNPs are differences in single chemical bases in the genome sequence, and INDELs result from the insertion and deletion of small pieces of DNA of varying sizes and types. If the human genome is viewed as a genetic instruction book, then SNPs are analogous to single letter changes in the book, whereas INDELs are equivalent to inserting and deleting words or paragraphs.
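The distinction between the two kinds of variation can be illustrated with a minimal Python sketch. It is not the method used in the study; the function name classify_variant and the convention of comparing a reference allele with an alternate allele are assumptions made for illustration only.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing reference and alternate alleles.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"        # a single base swapped for another
    if len(alt) > len(ref):
        return "insertion"  # the alternate allele carries extra bases
    if len(alt) < len(ref):
        return "deletion"   # the alternate allele is missing bases
    return "other"          # identical alleles or same-length substitutions


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATCT"), ("ATCT", "A")]:
        print(ref, alt, classify_variant(ref, alt))
```

In the instruction-book analogy, the first pair is a single-letter change, while the second and third add or remove a short word.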
Most polymorphism discovery projects have focused on SNPs, resulting in the International HapMap Project--a catalog and map of more than 10 million SNPs derived from diverse individuals throughout the globe. Dr. Devine and postdoctoral researcher Ryan Mills, PhD, focused instead on INDELs, using a computational approach to examine DNA re-sequences that originally were generated for SNP discovery projects. Thus far they have identified and mapped 415,436 unique INDELs, but they expect to expand the map to between 1 and 2 million by continuing their efforts with additional human sequences.
Dr. Devine says INDELs can be grouped into five major categories, depending on their effect on the genome: (1) insertions or deletions of single base pairs; (2) expansions by only one base pair (monomeric base pair expansions); (3) multi-base pair expansions of 2 to 15 repeats; (4) transposon insertions (insertions of mobile elements); and (5) random DNA sequence insertions or deletions. INDELs already are known to cause human diseases. For example, cystic fibrosis is frequently caused by a three-base-pair deletion in the CFTR gene, and DNA insertions called triplet repeat expansions are implicated in fragile X syndrome and Huntington's disease. Transposon insertions have been identified in hemophilia, muscular dystrophy and cancer.
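As a rough illustration of how the five categories might be told apart computationally, the Python sketch below sorts an inserted or deleted sequence by its repeat structure. This is a simplified sketch, not the researchers' pipeline. The names smallest_repeat_unit and categorize_indel, the flank argument (the reference sequence immediately following the INDEL), and the known_transposon flag are all hypothetical, and the sketch assumes that "2 to 15" refers to the length of the repeated unit in base pairs. Recognizing a real transposon would require comparison against a library of mobile elements, which is outside this example.

```python
def smallest_repeat_unit(seq: str) -> str:
    """Return the shortest unit that, repeated, reproduces seq exactly."""
    for size in range(1, len(seq) + 1):
        if len(seq) % size == 0 and seq[:size] * (len(seq) // size) == seq:
            return seq[:size]
    raise ValueError("sequence must not be empty")


def categorize_indel(seq: str, flank: str, known_transposon: bool = False) -> str:
    """Assign an INDEL to one of the five categories (illustrative rules only).

    seq   -- the inserted or deleted bases
    flank -- reference bases immediately after the INDEL (assumed input)
    """
    if known_transposon:
        # Assumption: the caller has already matched seq to a mobile element.
        return "transposon insertion"
    unit = smallest_repeat_unit(seq)
    extends_repeat = flank.startswith(unit)  # INDEL lengthens or shortens a repeat
    if extends_repeat and len(unit) == 1:
        return "monomeric base pair expansion"
    if extends_repeat and 2 <= len(unit) <= 15:
        return "multi-base pair expansion"
    if len(seq) == 1:
        return "single base pair insertion or deletion"
    return "random DNA sequence insertion or deletion"


if __name__ == "__main__":
    print(categorize_indel("A", "AAAAG"))      # extra A next to a run of A's
    print(categorize_indel("CAG", "CAGCAGT"))  # extra CAG next to CAG repeats
    print(categorize_indel("ACGTTG", "TTTT"))  # no repeat context
```

The second call mirrors the triplet repeat expansions mentioned above, in which a three-base unit is added to an existing run of the same unit.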
"We're entering an exciting new era of predictive health where an individual's personal genetic code will provide guidance on healthcare decisions," says Dr. Devine. "Our maps of insertions and deletions will be used together with SNP maps to create one big unified map of variation that can identify specific patterns of genetic variation to help us predict the future health of an individual. The next phase of this work is to figure out which changes correspond to changes in human health and develop personalized health treatments. This could include specific drugs tailored to each individual, given their specific genetic code."
Ultimately, each person's genome could be re-sequenced in a doctor's office and the genetic code analyzed to make predictions about that person's future health. Dr. Devine believes the technology holds the promise of predicting whether a person will develop diabetes, mental disorders, cancer, heart disease and a range of other conditions.
Dr. Devine and his colleagues used data from the University of California, Santa Cruz, the SNP Consortium, the HapMap Consortium, and dbSNP. All the INDELs identified in the study have been deposited into dbSNP--a publicly available SNP database hosted by the National Center for Biotechnology Information. The National Human Genome Research Institute of the National Institutes of Health funded the research.