Posted on Jun 6, 2024

Systems Engineer, Kernel/Linux

Roseland
Mid-Senior ICs
Engineering, IT
CoreWeave
Private
101-250
Software, Security & Developer Tools

CoreWeave is a specialized cloud provider focused on GPU-accelerated use cases including VFX, AI/ML, batch processing, and real-time experiences. We support many AI/ML services in text-to-image, NLP, and the broader AI/ML space, reducing clients' infrastructure management requirements with our Kubernetes-based serverless GPU cloud offerings.

Job Description

CoreWeave is seeking a highly skilled and motivated Senior Systems Engineer to join our Kernel HAVOCK Team. In this role, you will play a crucial part in the design, development, and optimization of our bare-metal systems from POST through joining a Kubernetes cluster. The team's primary responsibilities include maintaining a custom Linux kernel, various Ubuntu-based OS images, the virtualization stack (KubeVirt/QEMU/vfio), and the container/pod runtime stack (containerd/nydus/kubelet). You will collaborate closely with cross-functional teams, up-stack engineering teams, and stakeholders to ensure the successful delivery of highly performant and reliable software solutions.

Kernel: Hardware Acceleration - Virtualization - Operating Systems - Containerization - Kubelet

Our Team’s Stack:

  • Linux Kernel (custom build, currently tracking Ubuntu HWE)
  • Intel/AMD CPUs, NVIDIA GPUs, DPUs, InfiniBand and Ethernet NICs
  • KubeVirt, QEMU, SR-IOV, vfio-pci
  • Ubuntu 22.04
  • containerd, kubelet

Responsibilities:

  • Develop and maintain tooling to build custom Linux kernels and stateless OS images
  • Automate packaging of critical components (drivers, microcode, components with out-of-tree patches, etc.)
  • Serve as a senior point of contact for hardware issue escalation and troubleshooting
  • Collaborate with cross-functional teams to define Linux and OS requirements, specifications, and system architecture
  • Analyze and optimize the performance of bare-metal and virtualized systems, identify bottlenecks, and propose improvements for enhanced efficiency

Requirements:

  • Must have at least 5 years of professional experience maintaining large fleets of Linux servers
  • Deep professional experience with troubleshooting and debugging hardware, OS, and kernel issues
  • History of improving system efficiency within different subsystems (network, storage, security)
  • Strong familiarity with sysctls, cgroups, IOMMU, init systems, seccomp/AppArmor
  • Ability to effectively prioritize and communicate proposed features and fixes
  • Strong passion for automation, with a commitment to automating processes comprehensively
  • Excellent documentation skills and attention to detail
  • Strong analytical and problem-solving abilities

Nice-to-haves:

  • Experience with kexec, kpatch, kdump
  • Experience building CI/CD pipelines (GitHub or GitLab)
  • Opinions about software version control and team collaboration
  • Experience writing software tests

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $130,000/year in our lowest geographic market up to $210,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.  

CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry's fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.
