Cadence is a pivotal leader in electronic design, building upon more than 30 years of computational software expertise. The company applies its underlying Intelligent System Design strategy to deliver software, hardware and IP that turn design concepts into reality.
Cadence customers are the world’s most innovative companies, delivering extraordinary electronic products from chips to boards to systems for the most dynamic market applications, including consumer, hyperscale computing, 5G communications, automotive, aerospace, industrial and health.
At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.
Job Title: Sr Systems Engineer (Data Center Operations)
Location: Munich, Germany
Reports to: IT Group Director
Job Overview:
The Data Center Operations Engineer plays a critical role in maintaining and expanding Cadence’s global data center infrastructure, with a strong focus on Linux-based systems and GPU server environments. In this hands-on role, you will ensure the reliability, performance, and scalability of compute, network, and storage platforms that underpin some of the world’s most advanced electronic design workloads. Working closely with global infrastructure, development, and operations teams, you will drive everything from daily health monitoring and incident resolution to full GPU cluster bring-up and large-scale hardware deployments.
Job Responsibilities:
Deploy and maintain Linux-based compute, GPU, and storage infrastructure across data center environments, ensuring high availability and consistent performance.
Configure and bring up InfiniBand fabric and GPU clusters, including switch configuration, subnet management, and end-to-end validation testing.
Install, rack, label, and cable server hardware — including CPUs, memory, NICs, HDDs, and RAID components — in line with approved design specifications and quality standards.
Troubleshoot and resolve complex operational issues across Linux systems, GPU platforms, networking equipment, and storage infrastructure.
Conduct daily health checks of systems and infrastructure components, proactively identifying and mitigating risks before they affect service delivery.
Monitor the data center environment using established alerting frameworks, escalate issues appropriately, and drive timely service restoration in line with SLAs.
Coordinate with vendors and onsite staff for hardware delivery, diagnostics, replacement, and warranty fulfilment.
Maintain accurate operational documentation, system configurations, and runbooks to support consistency and knowledge sharing across the team.
Participate in an on-call rotation and provide on-site or remote support during maintenance windows and operational incidents.
Collaborate with global infrastructure and operations teams to support data center builds, migrations, refresh programmes, and process improvement initiatives.
Job Qualifications:
Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
3–6 years of hands-on experience in Linux system administration, troubleshooting, and performance validation.
Proficiency with Linux command-line tools and shell scripting (Bash or equivalent).
Experience with cluster bring-up, GPU server deployment, driver installation, and system-level configuration.
Hands-on experience setting up and validating GPU servers in clustered environments, including end-to-end GPU testing in InfiniBand-based clusters.
Working knowledge of InfiniBand networking, including switch configuration and subnet management.
Solid understanding of networking fundamentals including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP).
Experience installing, configuring, and troubleshooting routers, switches, and terminal servers for out-of-band management.
Familiarity with fibre and copper cabling in IP and SAN environments.
Strong organisational skills with meticulous attention to detail in data center environments.
Clear verbal and written communication skills, with the ability to work effectively across cross-functional and global teams.
Additional Skills/Preferences:
Experience supporting HPC, AI, or large-scale GPU environments.
Exposure to data center monitoring and alerting platforms.
Experience documenting operational processes and maintaining technical runbooks.
Familiarity with large-scale data center buildouts or refresh programmes.
Cadence is committed to equal employment opportunity and employment equity throughout all levels of the organization. We strive to attract a qualified and diverse candidate pool and encourage diversity and inclusion in the workplace.