First ‘truly complete human genome’ sequenced; Indian software plays key role

BENGALURU: In what could be the largest improvement to the human reference genome since its initial release 20 years ago, researchers from the Telomere-to-Telomere (T2T) consortium, an international collaboration of around 30 institutions, have sequenced the “first truly complete human reference genome”.
This could mark a new era of genomics in which no region of the genome, the entire human genetic code, is beyond reach. It opens up previously unsequenced regions of human DNA and could improve understanding of a wide variety of disorders. It could also lead to better genetic screening, enabling quick and specific diagnostic tests for various maladies.
TOI accessed the preprint paper titled ‘The complete sequence of a human genome’ which dubs the new sequence “T2T-CHM13”.
Final validation of the sequence was also aided by software co-developed by Chirag Jain, assistant professor in the Department of Computational and Data Sciences at the Indian Institute of Science (IISc).
Gap filling
In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published the first drafts of the human genome, revolutionising genomics. But there were gaps: according to Nature, the sequence was not truly complete, and about 15% was missing owing to technological limitations. Scientists subsequently filled many of these gaps, but the most recent reference genome, which geneticists have used since 2013, still lacked 8% of the full sequence.
The human genome is the complete set of human DNA. DNA is written in a four-letter language: four chemical units, or bases (A, T, C and G), form the alphabet. Each letter pairs specifically with a partner letter on the opposite strand (A with T, C with G) to form a word, or base pair (bp), encoding information. All these words are stored in the chromosomes of human cells.
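The pairing rule in the analogy above can be made concrete with a short Python sketch. It assumes only the standard pairs A-T and C-G; the function names are hypothetical, chosen for illustration, and this is not code from the T2T project.

```python
# Standard base pairing: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner base for each letter of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def count_base_pairs(strand: str) -> int:
    """Each letter on one strand forms exactly one base pair with the opposite strand."""
    return len(strand)


if __name__ == "__main__":
    top = "ATGCGTAC"
    bottom = complementary_strand(top)
    print(top)
    print(bottom)
    print(count_base_pairs(top), "bp")
```

Real DNA strands run in opposite directions, so bioinformatics tools usually work with the reverse complement (the complementary strand read backwards); this sketch keeps the simple position-by-position pairing to match the book analogy.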
If a human genome were a history book, it would have around 3 billion words (bp) across 23 chapters (chromosomes). It would tell the story of the human journey through time and provide a detailed blueprint for building every human cell, one that could give health care providers new powers to treat, prevent and cure diseases.
So the 8% gap meant that some pages of this book were missing: not all of the more than 3 billion base pairs in a human genome had been sequenced.
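A quick back-of-the-envelope calculation shows the scale of that gap. Assuming a round figure of 3 billion bp (the exact total is slightly higher), 8% works out to roughly 240 million bp:

$$0.08 \times 3\times10^{9}\ \text{bp} = 2.4\times10^{8}\ \text{bp} \approx 240\ \text{million bp}$$

That is the same order of magnitude as the roughly 200 million bp of novel sequence the T2T paper reports.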
“…Addressing this 8% gap, T2T has completed the first truly complete sequence of a human genome,” the paper reads.
The reference sequence includes gapless assemblies for all 22 autosomes (the chromosomes that look the same in females and males) plus Chromosome X. It also corrects errors and introduces about 200 million bp of novel sequence containing 2,226 gene copies, 115 of which are predicted to be protein coding. These could be important for understanding disease.
Newly completed regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes.
Satellite arrays, which are known to vary extensively across the human population, could aid medical genomics by improving understanding of the inherited variation that underlies human physiology, evolution and disease.
Similarly, a better understanding of the acrocentric chromosomes, which are linked to disorders such as Down syndrome, could prove useful.
Final validation
“The genome construction involved many newly designed computer algorithms and software for processing sequencing data and turning it into a complete human genome. One such software tool (Winnowmap2) was developed and contributed by me with collaborators. Winnowmap2 was critical in the final validation of the genome,” Jain told TOI.
Explaining that the software takes genome sequencing data as input and maps it to the genome assembly, he added that the mapping method had to account for a large number of repetitive segments.
“The presence of repeats in a genome makes it challenging because there are many possible alignment candidates for a sequence, and the correct one is rarely obvious. Once the data was correctly aligned, differences found between the genome and the sequencing data exposed a few mistakes, which were corrected by T2T before the final genome release,” he said.
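The difficulty Jain describes can be illustrated with a toy example. The Python sketch below is not Winnowmap2's method; it is a naive exact-match search, assumed here only to show why a read that falls inside a repeat has several equally good candidate positions while a read from a unique region has one. The function name `candidate_positions` and the toy reference are hypothetical; TTAGGG is used simply because it is the repeat found at human telomeres.

```python
def candidate_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based offset where `read` occurs exactly in `reference`.

    Naive exact matching for illustration only; real long-read mappers
    tolerate sequencing errors and use an index instead of a full scan.
    """
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences are also found.
        start = reference.find(read, start + 1)
    return positions


# A toy reference: a unique stretch, a sequence repeated three times, another unique stretch.
UNIQUE = "GATTACAGCT"
REPEAT = "TTAGGG"
reference = UNIQUE + REPEAT * 3 + "CCGATA"

for read in ["ACAGCT", "TTAGGG"]:
    hits = candidate_positions(reference, read)
    status = "unambiguous" if len(hits) == 1 else "ambiguous"
    print(f"{read}: {len(hits)} candidate(s) at {hits} -> {status}")
```

Here the read from the unique stretch maps to a single position, while the repeat-derived read matches three positions equally well, so an exact-match search alone cannot say where it truly belongs. Real mappers must use additional evidence, such as longer reads that span into unique sequence, to break such ties.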
Not the last word
T2T-CHM13 represents a single person’s genome; T2T has now teamed up with the Human Pangenome Reference Consortium to sequence more than 300 genomes from people around the world.
According to the scientific journal Nature, the new sequence is not the last word on the human genome, “as T2T had trouble resolving a few regions on chromosomes, and estimates about 0.3% of the genome might contain errors.”
In their paper, T2T researchers note that one limitation of CHM13 is the lack of a Y chromosome: “In order to finish a T2T reference sequence for all human chromosomes, we are in the process of sequencing and assembling the Y chromosome.”
