UAB IT’s Research Computing team is bringing the university's supercomputer to the next level with many new capabilities — including a tool that speeds analysis of genomic data, which has been helpful for the Worthey lab in UAB’s Heersink School of Medicine.
NVIDIA Clara Parabricks is software that helps with secondary analysis of DNA and RNA sequences. It enables researchers to analyze whole human genome data sets in only thirty minutes. It is similar to the open-source Genome Analysis Toolkit (GATK) software that runs on traditional CPU (Central Processing Unit) architectures. Parabricks, however, is optimized to run on the NVIDIA A100 GPU nodes installed by Research Computing. GPUs, or Graphics Processing Units, are designed for parallel processing. GPUs are increasingly used to run the data-intensive workflows needed for analytics and AI applications as software for them becomes readily available.
Brandon Wilk and James Scherer work in Dr. Liz Worthey's Center for Computational Genomics and Data Science in UAB’s Heersink School of Medicine. Their lab specializes in the study of rare genetic conditions, working on a myriad of diseases that require whole genome sequencing. The duo has worked with Research Computing for several years and knew that when they needed a faster process, they could work collaboratively with the team to develop a solution.
“Our lab uses whole genome sequencing analysis,” Wilk said. “This is big data requiring a lot of compute and time to analyze even a single person's genome. We needed something to reduce us from the 30 hours of processing time per sample to something more manageable.”
Parabricks still utilizes the GATK tool, implementing its parallelizable algorithms in GPU-enabled CUDA code. When the A100s and Parabricks were brought into Cheaha, Wilk and Scherer were able to run tests and see how the new software would improve their research.
“We saw massive gains,” Scherer said. “With access to 4 GPUs on Cloud.RC we are seeing about 1 hour and 40 minutes processing time for a 30X whole genome.”
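To put the two quoted figures side by side, the short Python sketch below works out the approximate speedup. It uses only the numbers given in the quotes above (about 30 hours per sample before, about 1 hour and 40 minutes after); the variable and function names are illustrative, and the result is a rough comparison, not a benchmark.

```python
from datetime import timedelta

# Figures quoted in the article; names here are illustrative only.
cpu_time = timedelta(hours=30)             # processing time per sample before
gpu_time = timedelta(hours=1, minutes=40)  # 30X whole genome on 4 GPUs


def speedup(before: timedelta, after: timedelta) -> float:
    """Return how many times faster the new run is than the old one."""
    return before / after


ratio = speedup(cpu_time, gpu_time)
hours_saved = (cpu_time - gpu_time) / timedelta(hours=1)

print(f"Speedup: about {ratio:.0f}x")
print(f"Time saved per sample: about {hours_saved:.1f} hours")
```

By this rough arithmetic, each sample finishes about 18 times faster, saving a little over 28 hours per genome.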
Bringing Parabricks into the new RC ecosystem was not as easy as one might expect. Scherer says the project took six months to complete, with the original project running from February to October. Two people from the group worked on the project, and they were in constant contact with Prema Soundararajan, a scientist within Research Computing, and her team.
“One of the biggest roadblocks we had to overcome was the licensing of Parabricks. Originally, you had to purchase the license and from there we would have to figure out who could use it, and when it was applicable,” Wilk said. “When they released version four it became free for all of us to use.”
If you are interested in learning more about the power of Parabricks, or curious about what Research Computing can help with, visit our website.
Even though the initial project has wrapped up, Parabricks continues to evolve. New software updates and hardware configurations keep the Research Computing team and the investigators on their toes.
“We found that looking through NVIDIA’s documentation on Parabricks and the Germline Pipeline was extremely helpful when it came to troubleshooting issues. Prema and her group were right there with us, eager to help and answer any questions we had,” Wilk said.