Senior Software Engineer, HPC Network
CoreWeave is a specialized cloud provider focused on GPU-accelerated use cases including VFX, AI/ML, Batch Processing and Real Time Experiences. We support countless AI/ML services in the text-to-image, NLP and broader AI/ML space, reducing clients’ infrastructure management requirements with our Kubernetes-based serverless GPU cloud offerings.
Job Description
About the Role:
Our HPC Network teams have a maniacal focus on delivering world-class network infrastructure through top-notch automation built on modern architectural and design concepts. Our goal is to build the most resilient, high-performance network fabrics possible to accelerate our customers' unique, bleeding-edge AI, ML, and VFX workloads, and to have fun doing it! CoreWeave maintains and runs numerous cloud-scale datacenter fabrics, which are central to the success of our customers' workloads.
As our next amazing Engineer, you will be responsible for helping design, develop, and implement tooling to integrate our InfiniBand fabrics with the rest of our stack. Your day-to-day will consist of writing code in close cooperation with our HPC Network Engineering team to build, operate, and monitor CoreWeave’s InfiniBand fabrics. Your goal is to make our network so highly automated and intelligent that you forget it’s even there.
Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams, even if you aren't a 100% skill or experience match. Here are some qualities we’ve found compatible with our team. If a portion of this resonates with you, we’d love to talk.
4+ years of experience in the following areas:
- A basic understanding of computer networks and networking.
- Experience supporting large infrastructure projects.
- Familiarity with cloud native tooling and protocols such as:
+ Helm, ArgoCD, Prometheus, Grafana, Alert Manager, REST/gRPC APIs.
- Good understanding and working knowledge of Linux.
- Python & Shell scripting are required; Go is a big plus.
- Using Kubernetes to automate all things is second nature to you:
+ You can create controllers and operators at any time
+ You can recite "Kubernetes API Conventions" by heart
- A great attitude, and a willingness to help those more junior, and learn from those more senior.
Nice to Have:
Previous experience with the following would be greatly appreciated:
- CI/CD
- Software Development
- Monitoring
- HPC
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $160,000-$210,000. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.
CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.