NVIDIA

Site Reliability Engineer


Pay: Competitive
Location: Santa Clara, California
Employment type: Full-Time

This job is now closed

  • Job Description

      Req#: JR1988504

      We are seeking a highly motivated Senior Site Reliability Engineer (SRE) to join our Embedded organization. This team is responsible for automating, deploying, and maintaining infrastructure for various NVIDIA AI workflows and applications such as Metropolis, ACE, and Riva hosted in the cloud. The SRE role focuses on ensuring production health to prevent outages by defining and developing robust software engineering solutions and practices. These efforts simplify the operating environment, enhance the reliability of NVIDIA cloud services, and expedite feature rollouts.

      What You'll Be Doing:

      • Develop and integrate new software, tools, and analytics to improve the availability, scalability, latency, and efficiency of our cloud services.

      • Manage upgrades and automated rollbacks across all clusters.

      • Maintain Service Level Agreements (SLAs) by collaborating with developers to define Service Level Indicators (SLIs) and design stable, secure services.

      • Guide the Change Advisory Board and Root Cause Corrective Action (RCCA) processes.

      • Collaborate with engineering, DevOps, and product leads across the GPU cloud services stack to build fast, reliable, and durable production systems.

      • Drive process changes to enhance the reliability and performance of cloud services.

      • Debug production issues across services and levels of the stack.

      • Improve operational processes.

      What We Need to See:

      • Bachelor's degree in Computer Science or a related field, or equivalent experience.

      • 5+ years of experience in system design, complexity analysis, software design in Unix/Linux systems, performance tuning, and application issue resolution.

      • 5+ years of experience in authoring and debugging software written in C++ and Python.

      • Hands-on experience with Kubernetes-based cloud environments.

      • Multi-cloud experience.

      • Experience working with partners across multiple teams.

      • Background in operating production systems.

      Ways to Stand Out from the Crowd:

      • Background with Software as a Service (SaaS) offerings.

      • Experience resolving application issues and working with algorithms and data structures.

      With competitive salaries and a generous benefits package (www.nvidiabenefits.com), we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!

      The base salary range is 140,000 USD - 258,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

      You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

      NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

  • About the company

      Nvidia Corporation is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.
