DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

How an AI Terminal Assistant Became My Team's Most Productive Engineer - Opencode + Claude + MCP

2
Comments 2
13 min read
The SRE's Guide to Surviving Tool Sprawl

Comments
2 min read
I built an AI incident copilot that does not store your production logs

Comments
6 min read
Protective Computing: Software Should Fail Safely Under Stress

Comments
5 min read
99.9% uptime is 43 minutes a month. Do you know your number?

Comments
4 min read
I Reduced Our Alert Volume by 90% — Here's the Playbook

Comments
2 min read
When Everything Is Broken, Where Do You Start?

1
Comments
5 min read
A single probe saying "down" shouldn't wake you at 3am

Comments
2 min read
Auto-verifying your AI-SRE's fixes (Part II): HolmesGPT end-to-end on a real cluster

17
Comments 1
5 min read
Introducing Nova AI Ops: The AI-Native Operating System for SRE Teams

Comments
3 min read
Postmortem: When AI Meets Resilience — AWS Resilience Hub and SRE

Comments
10 min read
Capacity Planning Without ML: The 80/20 Approach

Comments
2 min read
EBS gp2 burst credits ran dry and our builds slowed to a crawl

1
Comments
4 min read
Amazon Linux 2 is EOL on June 30, 2026 — here's everything that breaks

Comments
2 min read
Semantic caching our flaky-test summariser: 58% fewer LLM calls

Comments
4 min read