NVIDIA
HPC Infrastructure Administrator
This job is now closed
Job Description
- Req#: JR1973918
We are now seeking an HPC Infrastructure Engineer!
NVIDIA’s Compute Architecture Group is growing our team of HPC Infrastructure Engineers who run our internal cluster for accelerated AI and HPC software development. As part of this team, you will help to manage a diverse cluster of GPU-accelerated systems. Your contributions will enable engineers to work efficiently with a wide variety of forward-looking hardware configurations as they vigilantly seek out opportunities for performance optimization and continuously deliver high-quality software.
Our ideal candidate is versatile enough to apply expertise from many domains: system administration, performance analysis, automation, and architecture. Your work will enable the groundbreaking experimentation that allows us to design the world’s most powerful systems for the most demanding computing applications. You will have a meaningful impact at a fast-moving company that is spearheading the next wave in computing technology. Join our technically diverse team of GPU architects, software engineers, and infrastructure experts to unlock unprecedented performance in every domain!
What you'll be doing:
- Administer an HPC cluster composed of Linux systems ranging from the world’s most powerful servers to embedded systems
- Maintain the configuration of our resource management system (Slurm) to keep resource allocation efficient and aligned with organizational priorities
- Automate configuration management, software updates, and maintenance of system availability using modern DevOps tools (Ansible, GitLab, etc.)
- Plan and maintain new systems that support the NVIDIA software stack
- Work directly with developers and hardware architects to debug issues, identify new requirements, and improve workflows
- Actively communicate with users and management regarding resource planning and allocation
What we need to see:
- 5+ years of experience deploying and administering HPC clusters
- BA, BS, or MS in CS, EE, CE, or equivalent experience
- Deep knowledge of distributed resource scheduling systems (Slurm preferred, LSF, etc.)
- Demonstrated ability to script in Bash and at least one high-level language (Python preferred)
- Experience with container technologies (Docker, Singularity, etc.)
- Deep understanding of operating systems, computer networks, and high-performance hardware
- Ability to work well with developers, hardware architects, and test engineers
- Passionate dedication to providing quality support for users
Ways to stand out from the crowd:
- Prior work experience managing high-performance fabrics and parallel file systems
- Familiarity with CUDA and managing GPU-accelerated computing systems
- Basic knowledge of deep learning frameworks and algorithms
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
About the company
Nvidia Corporation is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.