SLAs, SLOs, and SLIs Explained for Developers
SLA, SLO, SLI: these acronyms get thrown around in meetings, but many developers struggle to explain the difference. That's a problem, because these concepts directly affect how you build, monitor, and operate your services. Let's fix that.
SLI: Service Level Indicator
An SLI is a measurement. It's a specific metric that quantifies how well your service is performing. Think of it as the raw data.
Common SLIs include:
- Availability — Percentage of successful requests (status code 2xx or 3xx)
- Latency — Percentage of requests completed within a threshold (e.g., under 200ms)
- Error rate — Percentage of requests that resulted in errors
- Throughput — Requests processed per second
The key: SLIs should be measured from the user's perspective, not the server's. A load balancer reporting 100% uptime doesn't matter if users are seeing errors due to a misconfigured backend.
How to measure SLIs
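// Calculate availability SLI: share of all requests that succeeded
const totalRequests = 1_000_000;
const successfulRequests = 997_500;
const availability = (successfulRequests / totalRequests) * 100;
// availability = 99.75%
// Calculate latency SLI (p95) using the nearest-rank method
// getAllRequestDurations() is a placeholder for your own data source;
// it is expected to return a non-empty array of durations in milliseconds
const requests = getAllRequestDurations();
const sorted = [...requests].sort((a, b) => a - b); // copy first, because sort() mutates the array in place
const p95Index = Math.ceil(sorted.length * 0.95) - 1;
const p95Latency = sorted[p95Index];
// e.g. p95Latency = 180 (ms): 95% of requests finished in 180ms or less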
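The p95 calculation above yields a single duration, while the latency SLI in the list earlier is phrased as a percentage of requests under a threshold. A minimal sketch of that percentage form, assuming each request is available as a record with `status` and `durationMs` fields (hypothetical names, not taken from any particular logging library):

```javascript
// Hypothetical request records; in practice these would come from access logs or metrics.
const sampleRequests = [
  { status: 200, durationMs: 120 },
  { status: 200, durationMs: 250 },
  { status: 301, durationMs: 40 },
  { status: 500, durationMs: 30 },
  { status: 200, durationMs: 450 },
];

// Percentage of items for which the predicate holds.
function percentageWhere(items, predicate) {
  if (items.length === 0) return null; // no traffic: the SLI is undefined, not 0% or 100%
  return (items.filter(predicate).length * 100) / items.length;
}

// Availability: 2xx or 3xx counts as success.
const availabilitySli = percentageWhere(sampleRequests, (r) => r.status >= 200 && r.status < 400);

// Latency: share of requests that completed in under 200ms.
const latencySli = percentageWhere(sampleRequests, (r) => r.durationMs < 200);

// Error rate: everything that was not a success.
const errorRateSli = 100 - availabilitySli;

console.log({ availabilitySli, latencySli, errorRateSli });
```

Returning `null` for an empty window is a design choice in this sketch: reporting 100% (or 0%) when there was no traffic would distort any target you later compare against.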
SLO: Service Level Objective
An SLO is a target. It's the goal you set for an SLI. "Our availability SLI should be above 99.9% over a 30-day rolling window" is an SLO.
SLOs are internal goals set by your engineering team. They should be:
- Achievable — Don't set 99.999% if your infrastructure can't support it
- Meaningful — Tied to actual user experience
- Measurable — Based on SLIs you can actually track
Common SLO examples
| Service | SLI | SLO Target |
|---|---|---|
| API | Availability | 99.9% over 30 days |
| API | Latency (p95) | Under 200ms |
| Dashboard | Availability | 99.5% over 30 days |
| Webhook delivery | Success rate | 99.95% over 7 days |
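One way to make targets like these checkable is to keep them as data next to the code that computes the SLIs. The sketch below assumes hypothetical field names (`service`, `sli`, `comparison`, `target`, `windowDays`) and made-up measurements; the table gives no window for the latency row, so it is left as `null` here.

```javascript
// SLO targets from the table above, expressed as data. Field names are illustrative.
const sloTargets = [
  { service: "API", sli: "availability", comparison: "atLeast", target: 99.9, windowDays: 30 },
  { service: "API", sli: "latencyP95Ms", comparison: "below", target: 200, windowDays: null },
  { service: "Dashboard", sli: "availability", comparison: "atLeast", target: 99.5, windowDays: 30 },
  { service: "Webhook delivery", sli: "successRate", comparison: "atLeast", target: 99.95, windowDays: 7 },
];

// Returns true when the measured SLI value satisfies the SLO.
function meetsSlo(slo, measured) {
  return slo.comparison === "atLeast" ? measured >= slo.target : measured < slo.target;
}

// Made-up measurements keyed by "service/sli", only to show the call shape.
const measuredSlis = {
  "API/availability": 99.87,
  "API/latencyP95Ms": 180,
  "Dashboard/availability": 99.7,
  "Webhook delivery/successRate": 99.97,
};

const sloReport = sloTargets.map((slo) => {
  const measured = measuredSlis[`${slo.service}/${slo.sli}`];
  return { service: slo.service, sli: slo.sli, measured, ok: meetsSlo(slo, measured) };
});

console.table(sloReport);
```

With these sample numbers, only the API availability row fails its target, because 99.87 is below 99.9.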
Error budgets
The most powerful concept in SLO-based reliability is the error budget. If your SLO is 99.9% availability over 30 days, your error budget is 0.1% — that's about 43 minutes of downtime per month.
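The arithmetic behind that figure: a 30-day window has 30 × 24 × 60 = 43,200 minutes, and 0.1% of that is 43.2 minutes. A small sketch of the same calculation for any availability target and window (the function name is illustrative):

```javascript
// Allowed downtime, in minutes, for a given availability SLO over a window.
function errorBudgetMinutes(sloPercent, windowDays) {
  const windowMinutes = windowDays * 24 * 60;
  return (windowMinutes * (100 - sloPercent)) / 100;
}

console.log(errorBudgetMinutes(99.9, 30).toFixed(1)); // "43.2"
console.log(errorBudgetMinutes(99.5, 30).toFixed(1)); // "216.0"
```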
Error budgets create healthy tension between shipping features and maintaining reliability. When you've consumed your error budget, you slow down deployments and focus on stability. When you have budget remaining, you can ship with confidence.
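To act on this, you need to know how much of the budget is already spent. A rough sketch based on request counts rather than minutes; the function names and the wording of the two policy outcomes are assumptions for illustration, not a standard:

```javascript
// Fraction of the error budget already used. 1.0 means the budget is fully spent.
function budgetConsumed(sloPercent, totalCount, failedCount) {
  const allowedFailures = (totalCount * (100 - sloPercent)) / 100;
  if (allowedFailures === 0) return failedCount > 0 ? Infinity : 0; // no traffic or a 100% target
  return failedCount / allowedFailures;
}

// Two-state policy mirroring the paragraph above.
function releasePolicy(consumed) {
  return consumed >= 1
    ? "budget spent: slow down deployments and focus on stability"
    : "budget remaining: keep shipping";
}

// Example: 99.9% SLO, 1,000,000 requests so far in the window, 400 of them failed.
const consumedSoFar = budgetConsumed(99.9, 1_000_000, 400);
console.log(`${(consumedSoFar * 100).toFixed(0)}% of budget used -> ${releasePolicy(consumedSoFar)}`);
```

Teams often add intermediate thresholds (for example, an alert when half the budget is gone), but where to put them is a judgment call for each service.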
SLA: Service Level Agreement
An SLA is a contract. It's a formal agreement with your customers that defines consequences when you fail to meet service levels. SLAs typically include financial penalties — credits or refunds when availability drops below the agreed threshold.
Important: Your SLA should always be less strict than your internal SLO. If your SLO is 99.9%, your SLA might be 99.5%. This gives you a buffer to catch issues before they become contractual violations.
SLA tiers in practice
| Availability | Monthly downtime (30-day month) | Typical use |
|---|---|---|
| 99% | ~7.2 hours | Internal tools, dev environments |
| 99.9% | ~43 minutes | Most SaaS products |
| 99.95% | ~22 minutes | Business-critical APIs |
| 99.99% | ~4.3 minutes | Financial, healthcare |
Putting it all together
Here's how SLIs, SLOs, and SLAs relate in practice:
- You measure SLIs (availability is currently 99.87%)
- You set internal SLOs (target 99.9% availability)
- You make external SLA commitments (guarantee 99.5% to customers)
- You monitor SLIs against SLOs to catch problems before they breach SLAs
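The relationship in the list above can be sketched as a single comparison. The function name and status strings are made up for illustration; the numbers are the ones used in the list.

```javascript
// Classify a measured availability against the internal SLO and the external SLA.
function reliabilityStatus(measured, slo, sla) {
  if (sla > slo) throw new RangeError("SLA should not be stricter than the SLO");
  if (measured < sla) return "SLA breached";
  if (measured < slo) return "SLO missed, SLA still met";
  return "within SLO";
}

console.log(reliabilityStatus(99.87, 99.9, 99.5)); // "SLO missed, SLA still met"
```

The middle state is the buffer at work: the team has a signal to act on while the contractual commitment is still intact.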
How to start tracking
You don't need complex tooling to start with SLIs. Begin with uptime monitoring — track whether your endpoints are returning successful responses from multiple regions. This gives you your availability SLI immediately. If you're looking for a straightforward way to track endpoint availability and response times, PingGuard monitors your services from 3 regions and gives you real-time uptime percentages — the foundation of your SLI measurement. Free for up to 5 endpoints.
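If you want to see what a basic availability check involves, here is a minimal sketch. It assumes a runtime with a global `fetch` and `AbortSignal.timeout` (Node 18+ or a modern browser); the function names and the example URL are hypothetical. It checks from one location only, so covering multiple regions would mean running it from a machine in each region.

```javascript
// One HTTP GET per endpoint. response.ok is true for a 2xx status after redirects are followed.
async function httpProbe(url, timeoutMs = 5000) {
  try {
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return response.ok;
  } catch {
    return false; // a network error or timeout counts as a failed check
  }
}

// Runs one round of checks and reports availability as a percentage of successful probes.
// The probe is injectable so the logic can be exercised without network access.
async function checkAvailability(urls, probe = httpProbe) {
  const results = await Promise.all(urls.map((url) => probe(url)));
  const successes = results.filter(Boolean).length;
  const availability = results.length > 0 ? (successes * 100) / results.length : null;
  return { checked: results.length, successes, availability };
}

// Usage (placeholder URL):
// checkAvailability(["https://example.com/health"]).then(console.log);
```

A single round says little on its own. The availability SLI comes from running rounds like this on a schedule and aggregating the results over your SLO window.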