A team led by Stanford scientists has set the first Guinness World Record for the fastest DNA sequencing technique, sequencing a human genome in just 5 hours and 2 minutes. The Stanford University-led research team collaborated with NVIDIA, Oxford Nanopore Technologies, Google, Baylor College of Medicine, and the University of California at Santa Cruz to use AI to speed up the end-to-end process, from collecting a blood sample to sequencing the entire genome and identifying disease-linked variants. The record was certified by the Genome in a Bottle group of the National Institute of Standards and Technology, and it is documented by Guinness World Records.
Sequencing a genome starts from the roughly 6 billion base pairs of DNA we inherit from our parents, built from four nucleobases: adenine (A), thymine (T), guanine (G), and cytosine (C). Conventional methods break the DNA into short fragments, copy and read them, and then piece the fragments back together using a typical human genome as a reference. This method, however, does not always capture the full genome of a patient, and the data it gives can leave out variations in genes that point to a diagnosis. In particular, mutations that span a wide stretch of DNA can be difficult, if not impossible, to locate. Hence the researchers used long-read sequencing, which preserves significantly longer segments of the patient’s genome, increasing the chances of finding mutations, minimizing errors, and correctly diagnosing the patient.
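To make the difference between short and long reads concrete, here is a minimal Python sketch. The region coordinates, read lengths, and read positions are hypothetical values chosen for illustration, not figures from the study, and `reads_spanning` is an invented helper name. It only checks which reads cover a wide variant from end to end.

```python
def reads_spanning(region_start, region_end, reads):
    """Return the reads that fully cover the half-open region [region_start, region_end).

    Each read is a (start, length) pair in reference coordinates.
    """
    return [(s, n) for s, n in reads if s <= region_start and s + n >= region_end]


# Hypothetical variant covering 2,500 bases of the reference (illustrative only).
variant = (1000, 3500)

# Hypothetical short reads: 150 bases each, one starting every 100 bases.
short_reads = [(start, 150) for start in range(0, 5000, 100)]

# Hypothetical long reads: 10,000 bases each.
long_reads = [(0, 10_000), (500, 10_000), (3000, 10_000)]

short_hits = reads_spanning(*variant, short_reads)
long_hits = reads_spanning(*variant, long_reads)
print(len(short_hits), len(long_hits))  # prints: 0 2
```

No 150-base read can contain a 2,500-base variant in one piece, so the short reads would have to be stitched together across it, whereas two of the three long reads cover it completely.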
Genome sequencing is a vital tool for clinicians diagnosing rare genetic diseases. It helps them determine whether their patients’ genes are mutated and, if so, what genetic disorders such mutations could cause. However, it is not an easy task because of differences in sequencing techniques and technologies, as well as in data storage formats and data exchange protocols. Machine learning and deep learning are two AI technologies already well known for their data processing and pattern recognition capabilities. As a result, AI frameworks are used in healthcare research to interpret massive, complicated datasets such as genomes efficiently.
The researchers reached the record-breaking speed by refining each step of the sequencing process. They used Oxford Nanopore Technologies’ PromethION sequencing platform and its flow cells. The device reads genomes by pulling long strands of DNA through pores that are similar in size and composition to the openings in biological cell membranes. It detects the DNA sequence by reading small electrical changes specific to each DNA letter as a DNA strand travels through the pore. Thousands of these pores are spread over each flow cell. The researchers sequenced a single patient’s genome simultaneously across 48 flow cells, allowing them to read the full genome in a record 5 hours and 2 minutes (7 hours and 18 minutes in total, including diagnosis). The device also supports “long-read sequencing.”
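The idea of turning electrical readings into DNA letters can be sketched in a few lines of Python. This is a deliberately simplified toy, not how the PromethION or any production base caller works: the current levels in `LEVELS` are invented numbers in arbitrary units, and `decode_signal` is a hypothetical helper that assigns each reading to the nearest level. Real base callers rely on neural networks, in part because the measured signal depends on several neighbouring bases at once.

```python
# Invented reference levels in arbitrary units (assumption, for illustration only).
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def decode_signal(samples):
    """Map each reading to the letter whose reference level is closest to it."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - reading))
        for reading in samples
    )


# Hypothetical noisy readings, one per base.
decoded = decode_signal([1.1, 3.9, 2.8, 2.2, 0.9])
print(decoded)  # prints: ATGCA
```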
They generated more than 100 gigabases of data every hour (one gigabase is one billion nucleotides) using high-throughput nanopore sequencing on Oxford Nanopore’s PromethION Flow Cells, then accelerated base calling and variant calling using NVIDIA GPUs on Google Cloud. In base calling, the device’s raw signal data are converted into a string of A, T, G, and C nucleotides, which are then aligned in near real time. The scientists quickly realized that sending the data directly to a cloud-based storage system gave them enough computational power to handle all of the data generated by the nanopore device. Distributing the data among cloud GPUs immediately reduced latency.
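A quick back-of-the-envelope calculation helps put the throughput figure in perspective. The sketch below takes the article's 100 gigabases per hour and 48 flow cells as inputs; the reference length of about 3.1 billion bases is an assumption added here (the approximate size of one copy of the human genome), and `coverage_depth` is a hypothetical helper name.

```python
GIGABASE = 1_000_000_000  # one billion nucleotides

# Figures quoted in the article.
bases_per_hour = 100 * GIGABASE
flow_cells = 48

# Assumption: one copy of the human reference genome is about 3.1 billion bases.
REFERENCE_LENGTH = 3.1 * GIGABASE


def coverage_depth(total_bases, reference_length):
    """Average number of times each reference position is read."""
    return total_bases / reference_length


per_flow_cell = bases_per_hour / flow_cells
coverage_per_hour = coverage_depth(bases_per_hour, REFERENCE_LENGTH)

print(f"{per_flow_cell / GIGABASE:.2f} gigabases per flow cell per hour")
print(f"about {coverage_per_hour:.0f}x coverage of the reference per hour")
```

Under these assumptions, each flow cell contributes roughly 2 gigabases per hour, and every hour of sequencing reads each position of the reference about 32 times on average.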
The next step was to look for small variations in the DNA sequence that could lead to a hereditary disease. The researchers used the NVIDIA Clara Parabricks computational genomics application framework for both base calling and variant calling. Clara Parabricks used a GPU-accelerated version of PEPPER-Margin-DeepVariant, a pipeline developed by UC Santa Cruz’s Computational Genomics Laboratory in partnership with Google, to speed up this stage. DeepVariant employs convolutional neural networks for highly accurate variant calling. Clara Parabricks’ GPU-accelerated DeepVariant Germline Pipeline software produces results at ten times the speed of native DeepVariant instances, reducing the time needed to find disease-causing variants.
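To illustrate what variant calling means, here is a minimal Python sketch that stacks aligned reads over a reference and reports positions where enough reads disagree with it. This is a plain counting heuristic swapped in for illustration, not DeepVariant's neural-network approach or the Clara Parabricks pipeline. The reference, the reads, the thresholds, and the name `call_variants` are all hypothetical, and the reads are assumed to be already aligned with no insertions or deletions.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.3):
    """Report single-base differences supported by enough reads.

    aligned_reads is a list of (start, sequence) pairs, aligned without gaps.
    Returns (position, reference_base, alternate_base, support, depth) tuples.
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    variants = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything about this position
        for base, support in counts.items():
            if base != reference[position] and support / depth >= min_fraction:
                variants.append((position, reference[position], base, support, depth))
    return variants


# Hypothetical toy data: three of the four reads carry a T at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAT"), (2, "GTATGT"), (3, "TACGTAC"), (4, "ATGTAC")]

variants = call_variants(reference, reads)
print(variants)  # prints: [(5, 'C', 'T', 3, 4)]
```

A fixed threshold like this struggles with noisy reads, which is one reason learned models are used for this step in practice.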
Using this rapid genome sequencing approach, scientists sequenced a 3-month-old patient’s entire genome in just eight and a half hours. They discovered that the baby’s CSNK2B gene was altered. CSNK2B is a gene linked to Poirier-Bienvenu syndrome, a rare neurodevelopmental condition characterized by early-onset epilepsy. Doctors diagnosed the patient with Poirier-Bienvenu syndrome within a few days, administered the appropriate antiseizure medication, and provided disease-specific counseling and a prognosis to the patient’s family. By contrast, an epilepsy gene panel (which did not include CSNK2B) had been ordered at the time of presentation; its findings, which arrived two weeks later, revealed only several nondiagnostic variants of uncertain significance.
This marks a major milestone in genome sequencing and health diagnostics. With the ability to sequence a person’s entire DNA in just hours, ultra-rapid genome testing could become a life-saving technology for detecting heritable disorders in humans. Detecting certain disorders early could also improve patient outcomes. Rapid genome sequencing could also be the key to identifying and classifying the conditions of adult patients with undiagnosed genetic disorders.