25 Getting Started with HiPerGator
Matthew A. Gitzendanner, Ph.D.
Getting Started with HiPerGator
Each user on HiPerGator needs a sponsor. Faculty who are investors in HiPerGator can sponsor themselves. Students, post-docs, staff, and collaborators within or outside of UF can request accounts under the sponsorship of a UF faculty member. Visit the UF Research Computing website to fill out the HiPerGator account application form. All users must have a GatorLink account; non-UF users sponsored by a UF faculty member must submit a request for a GatorLink account.
Training
All new users need to complete the HiPerGator Training. This course acquaints users with HiPerGator and the policies for its use. Additional training opportunities are available: the training page also lists upcoming training and events, as well as links to prerecorded training sessions and short how-to videos covering commonly asked questions. The Help and Documentation site has a wide range of resources, including a Getting Started section and a Quick Start Guide.
What Is so Special about HiPerGator AI?
Given that most users will not use the entire HiPerGator AI cluster for a single analysis, what makes it so special? The first thing is the individual GPUs that make up HiPerGator AI: the NVIDIA A100. As of 2021, this is the latest and most advanced GPU available. In addition to the sheer number of tensor cores and its advanced architecture, the A100 stands out for its 80GB of GPU memory; a standard gaming GPU may have only 8 to 12GB. That extra memory makes training larger models much more efficient.
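As a quick illustration (not part of the official documentation), a short PyTorch snippet like the one below can confirm which GPU a job received and how much memory it offers; it assumes PyTorch is available in your environment, for example through a module or container on HiPerGator.

import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # On a HiPerGator AI node this should report an NVIDIA A100 with roughly 80GB.
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
else:
    print("No GPU is visible to this job.")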
The second thing is the networking bandwidth of the DGX servers, which is especially important as applications scale past a single GPU. Within each DGX server, the eight A100 GPUs have exceptionally fast interconnects, allowing multiple GPUs in a server to work together efficiently on larger problems. Scaling even further, each A100 GPU has its own dedicated connection to the switch that links the DGX servers within a rack and to the other DGX servers in the “Scalable Unit” (SU). Again, these connections provide unparalleled scalability and allow large calculations to run across multiple A100 GPUs.
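To sketch how that multi-GPU scaling is typically used, the hedged PyTorch example below sets up data-parallel training with DistributedDataParallel; the NCCL backend can take advantage of the fast GPU-to-GPU links within a DGX server and the per-GPU network connections between servers. The placeholder model and the launch command (for example, torchrun --nproc_per_node=8 train.py for the eight A100s in one server) are illustrative assumptions, not a prescribed HiPerGator workflow.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by the torchrun launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    # ... training loop goes here: each process works on its own shard of the
    # data, and DDP averages gradients across all participating GPUs ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()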
Lastly, providing data to these GPUs at the rates they are capable of consuming it is not trivial. HiPerGator AI includes a 2.5PB all-flash parallel filesystem, called /red, that is capable of feeding data to the GPUs at extraordinary rates.
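As a minimal sketch of keeping a GPU fed from /red (the directory path and dataset class below are hypothetical), a data loader with several worker processes can read files in parallel and stage batches in pinned memory for fast copies to the GPU.

import glob
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RedArrayDataset(Dataset):
    # Hypothetical dataset: each sample is a .npy array stored on /red.
    def __init__(self, pattern):
        self.files = sorted(glob.glob(pattern))
    def __len__(self):
        return len(self.files)
    def __getitem__(self, idx):
        return torch.from_numpy(np.load(self.files[idx]))

loader = DataLoader(
    RedArrayDataset("/red/mygroup/dataset/*.npy"),  # hypothetical path on /red
    batch_size=64,
    num_workers=8,       # parallel reader processes pulling files from /red
    pin_memory=True,     # page-locked buffers speed up host-to-GPU copies
)

for batch in loader:
    batch = batch.cuda(non_blocking=True)  # overlap transfer with compute
    # ... model forward/backward would go here ...
    break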