As a Site Reliability Engineer, you will enhance system reliability, scalability, and performance, ensuring quality releases and serving as a trusted leader across teams.
As a Site Reliability Engineer, you'll drive strategies and solutions for complex and unique challenges that impact the reliability, scalability, and performance of our systems. You will communicate best practices, set standards, and gain alignment across teams on approaches to improve system health and operational excellence. By serving as an expert and owner in multiple areas of our infrastructure and production environment, you will successfully deliver improvements to architecture, automation, and resilience, from conception through deployment.
You will play a key role in overseeing our weekly release process. This involves coordinating across multiple teams and organizations spanning several time zones to ensure releases are executed smoothly and reliably. You will be responsible for driving alignment, communicating status and blockers, and ensuring that all stages of the release run efficiently from planning through rollout, while maintaining a strong focus on quality, reliability, and customer impact.
You understand the tradeoffs between reliability, velocity, and product requirements, and you work closely with engineering and product stakeholders to deliver solutions that balance these needs. You are passionate about growing others, providing technical guidance, and mentoring teammates. In turn, you are looked to as a trusted technical leader within the team and across engineering when it comes to reliability, scalability, and operational best practices.
About the Team
Our Product and Engineering team works with our award-winning products to create a single experience that helps customers at over 11,000 organizations assess risk, detect threats, and automate their security programs. These teams use best-in-class technology, leading-edge research, and broad, strategic expertise to develop new products and features, and enhance existing features, in order to create value for customers across the world.
The stakes for creating a safer digital world are greater than ever. At Rapid7 we believe it's our responsibility to show up every day and give our best for our customers and the entire security community. Our global engineering teams are at the centre of this mission and are dedicated to building a complete suite of industry-leading products which provide a cohesive platform for our customers. Our range of solutions spans vulnerability management, detection, automation, cloud security, and penetration testing; in order to build these products our teams work with an array of technologies including Java, Python, AWS and Go, just to name a few.
Join our engineering team to help us build and innovate great products on our Insight platform using the latest technologies to make the world a safer digital space.
About the Role
We are seeking a Site Reliability Engineer (SRE) to join our growing engineering team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our on-premises products and supporting services. You'll combine software engineering principles with systems engineering to build and maintain highly available, resilient, and efficient systems.
This is a hands-on role where you will collaborate closely with product engineering, infrastructure, and operations teams to design and support mission-critical systems that serve our customers worldwide. Specifically, your focus will be to:
- Ensure the availability, performance, scalability, and security of production systems.
- Design, build, and maintain monitoring, alerting, and automation tools to reduce manual work and improve reliability.
- Collaborate with development teams to design systems that are fault-tolerant, resilient, and observable.
- Manage incident response: triage, troubleshoot, and resolve production issues quickly, driving postmortems and follow-up improvements.
- Optimize release deployment pipelines and CI/CD processes to deliver faster, more efficient, and stable releases with improved safety.
- Contribute to capacity planning, cost optimization, and performance tuning.
- Document runbooks, best practices, and knowledge to improve team efficiency.
- Participate in on-call rotation to ensure 24/7 service reliability.
- Lead and coordinate the weekly release process across multiple teams and time zones, ensuring smooth execution, clear communication, and minimal customer impact.
- Manage observability for product releases. Implement release-specific monitoring to catch issues early.
The skills and qualities you'll bring include:
- Proven experience as an SRE, DevOps Engineer, Systems Engineer, or similar role.
- Strong skills in Linux/Unix systems administration.
- Proficiency in at least one programming language (e.g., Python, Go, Java).
- Experience with cloud platforms (AWS, GCP, or Azure).
- Hands-on experience with infrastructure as code (Terraform, Ansible, etc.).
- Knowledge of containers and orchestration tools (Docker, Kubernetes).
- Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK, Datadog, etc.).
- Strong understanding of networking, security, and system design fundamentals.
- Excellent problem-solving, communication, and collaboration skills.
- Core Value Embodiment: Embody our core values to foster a culture of excellence that drives meaningful impact and collective success.
We know that the best ideas and solutions come from multi-dimensional teams. That's because these teams reflect a variety of backgrounds and professional experiences. If you are excited about this role and feel your experience can make an impact, please don't be shy - apply today.
About Rapid7
At Rapid7, we are on a mission to create a secure digital world for our customers, our industry, and our communities. We do this by embracing tenacity, passion, and collaboration to challenge what's possible and drive extraordinary impact.
Here, we're building a dynamic workplace where everyone can have the career experience of a lifetime. We challenge ourselves to grow to our full potential. We learn from our missteps and celebrate our victories. We come to work every day to push boundaries in cybersecurity and keep our 11,000 global customers ahead of whatever's next.
Join us and bring your unique experiences and perspectives to tackle some of the world's biggest security challenges.