UAB IT is redesigning its research computing infrastructure to make it more robust, scalable and reliable in response to growing demand for services. The redesign follows a $2 million investment that brought a 225% increase in the computing power of Cheaha, the state's fastest supercomputer.
Even during the COVID pandemic and a period of remote work at UAB, UAB IT Research Computing saw marked growth in the use of its services. More than 1,700 users are registered for research computing services, a nearly five-fold increase in the past five years.
The next step in redesigning the research computing infrastructure will be migrating equipment from aging data centers to new state-of-the-art facilities on Oct. 18 and Nov. 8. Among the destinations are the new Technology Innovation Center and DC BLOX, a locally based data center.
"One driver for the Technology Innovation Center was to house UAB's high-performance computer, Cheaha," said Ralph Zottola, Ph.D., assistant vice president for Research Computing. "But we quickly realized that we needed a bigger house. The new facilities will be more reliable, thus increasing system availability for the research community."
An Intel Innovation Fund award helped drive a $2 million investment in Dell equipment that more than doubled research computing power. Ultimately, Research Computing will go from 4,352 compute processing cores and 72 GPUs to 9,840 compute processing cores and 136 GPUs, or a final estimated 1,240,000,000,000,000 floating point operations per second (1.24 petaFLOPS), another measure of computing speed.
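For readers who want to check the arithmetic behind those figures, the short Python sketch below recomputes the growth factors from the core and GPU counts quoted above and converts the raw operations-per-second figure to petaFLOPS. The `growth` helper is a name chosen for this illustration, not part of any UAB tooling.

```python
# Figures quoted in the article.
cores_before, cores_after = 4352, 9840
gpus_before, gpus_after = 72, 136
flops = 1_240_000_000_000_000  # floating point operations per second


def growth(before, after):
    """Return (multiplicative factor, percent increase) between two counts."""
    factor = after / before
    pct_increase = (after - before) / before * 100
    return factor, pct_increase


core_factor, core_pct = growth(cores_before, cores_after)
gpu_factor, gpu_pct = growth(gpus_before, gpus_after)

# One petaFLOPS is 10**15 floating point operations per second.
petaflops = flops / 10**15

print(f"Cores: x{core_factor:.2f} ({core_pct:.0f}% increase)")
print(f"GPUs:  x{gpu_factor:.2f} ({gpu_pct:.0f}% increase)")
print(f"Peak:  {petaflops:.2f} petaFLOPS")
```

By this calculation the core count grows to roughly 2.26 times its earlier size and the GPU count to roughly 1.89 times.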
"These new resources accelerated COVID-19 computational research on campus," Zottola said.
Research Computing also received funding from the UAB Education Foundation that enabled the design and implementation of two new computing systems: the UAB Research Cloud and the Kubernetes Research Automation cluster. The UAB Research Cloud will allow researchers to be creative: it provides a development sandbox in which they can create new analysis workflows, present data and figures to the world, and test ways to scale their analyses to massive sizes.
Kubernetes allows researchers to automate their complex analyses, coordinating the steps much as a conductor directs an orchestra; the approach is fittingly called "orchestration." Orchestration can make scientific analysis more reproducible and scalable.
"I call Kubernetes orchestration 'rotisserie science,'" said Blake Joyce, Ph.D., new data science manager at UAB. "You tell the computer how many samples you have, exactly what you want it to do, and then you leave it to do the easy stuff. Researchers are then free to write manuscripts, grants, or experiment while the computers analyze data. When the analysis is ready, researchers can spend precious time assessing the data and results, actual science, instead of fighting with computers."
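The "set it and leave it" pattern Joyce describes can be illustrated with a minimal Python sketch. This is only an analogy that runs on a single machine with the standard library; it is not Kubernetes and not UAB's actual setup, and the `analyze` and `run_batch` functions are hypothetical placeholders invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(sample_id: int) -> str:
    # Hypothetical stand-in for a real analysis step on one sample.
    return f"sample-{sample_id}: done"


def run_batch(n_samples: int, workers: int = 4) -> list[str]:
    # Say how many samples there are and what to do with each one,
    # then let the worker pool schedule the work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(analyze, range(n_samples)))


results = run_batch(8)
for line in results:
    print(line)
```

An orchestration system applies the same idea across many machines, and typically also handles scheduling and restarting failed jobs, which this sketch does not attempt.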
That expansion meant UAB IT outgrew the Technology Innovation Center almost as soon as it was built.
"UAB Research computing resources grew so quickly that we required more space, power and cooling than the TIC data center was originally designed to provide two years ago," said Joyce. "It's a great problem to have. And fortunately, a new data center facility opened a mile from the UAB campus, DC BLOX."
Ultimately, the TIC data center will host the Cheaha HPC compute nodes and new tiered research data storage services. DC BLOX will host the UAB Research Cloud, the Kubernetes Research Automation cluster, and all the graphics processing units, including the newest NVIDIA DGX A100 nodes. A 200 gigabit per second network between the TIC and DC BLOX data centers will ensure seamless, near-instantaneous interaction between the different research computing and storage resources.
"We are grateful to President Ray Watts, Dr. Curt Carver, Dr. Chris Brown, and the UAB research computing community for their investment in research computing and the trust placed in us to guide this expansion," Zottola said. "Our No. 1 priority during the data center migration project is to minimize negative impact to ongoing research by avoiding significant interruption to Cheaha HPC and RC services."
UAB IT has more than 1,700 researchers registered for RC services and 624 active Cheaha users this past year alone.
During the week of Oct. 18, all equipment in the old 936 Building data center will be moved to the TIC and DC BLOX data centers. On Nov. 8, the last of the data storage servers and the ScienceDMZ data transfer services will be moved to TIC and DC BLOX, and the campus and DC BLOX network connections will be moved to the TIC data center.
"We expect this last phase of migration to be unnoticed by users," said John-Paul Robinson, HPC Architect, "but migration details and updates will be announced via the regular HPC-Announce listserv process. All Cheaha users are automatically subscribed to this list."
Please contact us at