Datadog

Software Engineer - Incident Management

Reposted 5 Days Ago
Hybrid
2 Locations
Mid level
The role involves steering the on-call experience, defining incident responses, contributing to post-mortems, training on-callers, and engaging in cross-functional collaborations to enhance incident management.
The summary above was generated by AI

The Incident Management SRE team at Datadog fosters a resilient culture by using incidents as learning opportunities and catalysts for growth. We collaborate closely with teams across departments to enhance on-call experience, incident response, and post-incident analysis, reducing friction and optimizing tooling and processes. Our efforts empower Datadog to navigate unexpected failures confidently, efficiently, and with a commitment to continuous learning and systems improvement.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

 

What You’ll Do: 

  • Steer the on-call experience for the company by establishing best practices and building platforms to support on-call rotations and compensation.
  • Define how we respond to incidents and write software to streamline the process, collaborating with product teams as needed. Our aim is to fully support our incident responders in dealing with complexity.
  • Contribute to the company's post-mortem process by collaborating with teams on writing post-mortems and identifying opportunities to reduce friction and enhance learning value for the organization. Our team also runs a weekly post-mortem reading group.
  • Support various teams in facilitating incident reviews that emphasize learning and blamelessness. Help them share their learnings across the organization to improve the resilience of our people.
  • Train our on-callers in incident and post-mortem processes, both introducing newcomers to on-call responsibilities and refreshing the knowledge of existing engineers.
  • Engage in cross-functional collaborations with teams across the organization, embedding in their group for a few weeks to either learn how work is performed or help them improve on-call practices.

Who You Are: 

  • At least 3 years of experience building software that solves real user problems, designing new features with RFCs as well as reviewing others’ code and documents collaboratively. We develop in Go and Python and a bit of TypeScript.
  • Familiarity with Kubernetes and distributed systems, along with an understanding of their potential failure scenarios.
  • Interest in analyzing incidents, identifying broader risk patterns, and effectively sharing findings for others to understand and learn from.
  • Experience being on-call and responding to incidents, iteratively improving incident response processes.
  • Empathy, collaboration, and communication skills in English to cultivate strong relationships across various teams in the organization.
  • Willingness to teach and train other engineers on best practices. Experience driving cross-functional change and leading through influence, or a strong interest in doing so.

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.


Benefits and Growth:

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.


About Datadog: 

Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another. Learn more about #DatadogLife on Instagram, LinkedIn, and Datadog Learning Center.

Equal Opportunity at Datadog:

Datadog is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and other characteristics protected by law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Here are our Candidate Legal Notices for your reference. 

Datadog endeavors to make our Careers Page accessible to all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please complete this form. This form is for accommodation requests only and cannot be used to inquire about the status of applications. 

Your Privacy:

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.

Top Skills

Go
Kubernetes
Python
TypeScript


