🛠️ SRE & DevOps Tools — Free, Browser-based

Free SRE & DevOps Tools for Site Reliability Engineers

Nine free, browser-based tools for SRE and DevOps engineers — SLA uptime calculator, error budget tracker, CIDR subnet calculator, curl builder, HTTP status codes, P99 latency percentiles, bandwidth calculator and more. All client-side. No sign-up. No data sent to servers.

🛠️ Try My SRE Assistant → · Browse all 72 tools

✦ AI-POWERED — Ask in plain English, get exact numbers

My SRE Assistant

"Our SLA is 99.9% and we've been down 28 minutes this month — how much error budget is left?"

Open Assistant →

All SRE & DevOps Tools

9 tools · All free · Client-side only · No sign-up required

📊

SLA / Uptime Calculator

Calculate allowed downtime for 99.9%, 99.95%, 99.99%, and 99.999% SLAs. Daily, weekly, monthly, yearly breakdowns with error budget context.

SLA · Reliability

Open →

🎯

Error Budget Calculator

Track error budget consumption and remaining budget for any SLA. Visual burn rate indicator — red when you are burning too fast.

SRE · SLO

Open →

🌐

CIDR / Subnet Calculator

Network address, broadcast, subnet mask, wildcard mask, first and last usable IP, and host count from any CIDR notation.

Curl Builder

Build curl commands visually: select method, add headers, set auth, paste body. Generates a ready-to-run curl command.

API · Debugging

Open →

💾

Byte & Bandwidth Calculator

Convert between bytes, KB, MB, GB, TB and PB instantly. Calculate file transfer times at any bandwidth speed.

Port Reference

Searchable reference of 40+ common network ports, including SSH, HTTP, MySQL, Redis, Kafka, Kubernetes, Elasticsearch and more.

HTTP Status Code Reference

Complete reference with plain-English descriptions and fix guidance for every HTTP status code. Filter by 1xx through 5xx.

API · Debugging

Open →

🔄

YAML ↔ JSON Converter

Convert JSON to YAML for Kubernetes manifests, Docker Compose, and GitHub Actions — or YAML back to JSON. Instant, browser-based.

Config · Kubernetes

Open →

📈

Percentile Calculator

Calculate P50, P75, P90, P95, P99, P99.9 from any latency dataset. Outlier detection flags tail latency issues automatically.

Latency · SLO

Open →

What are SRE Tools and Why Do You Need Them?

Site Reliability Engineering (SRE) is a discipline that applies software engineering principles to infrastructure and operations problems. SRE teams are responsible for the availability, latency, performance, efficiency, and reliability of production services.

Every SRE team needs to calculate SLAs, track error budgets, plan network capacity, analyse latency distributions, and debug API failures. These calculations are typically done in spreadsheets or by mental arithmetic; our tools make them instant, accurate, and accessible from any browser.

Whether you are in the middle of an incident calculating your remaining error budget, planning a new VPC with the right CIDR blocks, or debugging a 502 Bad Gateway error at 2am — these tools give you the correct numbers immediately without opening a spreadsheet or writing code.

All tools are browser-based. No data leaves your machine. No sign-up or account required. Use them on any device — desktop, tablet, or mobile.

Common SRE Workflows

🚨

During an Incident

Open Error Budget Calculator
Enter your SLA % and window (30 days)
Enter minutes of downtime so far
See remaining budget and burn rate
Or use My SRE Assistant for all in one chat
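
The arithmetic behind these steps can be sketched in a few lines of Python. This is a minimal illustration assuming a time-based availability target, not the calculator's actual implementation; the function name `budget_status` and its parameters are hypothetical. Burn rate is taken here to mean the fraction of budget consumed divided by the fraction of the window elapsed, so a value above 1.0 means the budget would run out before the window ends.

```python
def budget_status(target_pct, window_days, downtime_min, elapsed_days):
    """Return (budget_min, remaining_min, burn_rate) for a time-based target.

    Hypothetical helper for illustration only.
    """
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be between 0 and 100 (exclusive)")
    if not 0 < elapsed_days <= window_days:
        raise ValueError("elapsed_days must be within the window")

    window_min = window_days * 24 * 60
    budget_min = window_min * (100 - target_pct) / 100
    remaining_min = budget_min - downtime_min

    # Burn rate: share of budget used relative to share of window elapsed.
    consumed_fraction = downtime_min / budget_min
    elapsed_fraction = elapsed_days / window_days
    burn_rate = consumed_fraction / elapsed_fraction
    return budget_min, remaining_min, burn_rate


# 99.9% over 30 days, 28 minutes down, 15 days into the window (assumed).
budget, remaining, burn = budget_status(99.9, 30, downtime_min=28, elapsed_days=15)
print(f"budget {budget:.1f} min, remaining {remaining:.1f} min, burn rate {burn:.2f}x")
```

With these inputs the budget is 43.2 minutes, 15.2 minutes remain, and the burn rate is roughly 1.3, which would exhaust the budget before day 30 if it continued.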

🏗️

Planning a New VPC

Open CIDR / Subnet Calculator
Enter your VPC CIDR (e.g. 10.0.0.0/16)
Plan subnets per availability zone
Verify usable host counts
Use /24 for standard subnets (254 hosts; 251 on AWS, which reserves 5 addresses per subnet)

📊

Analysing API Latency

Collect latency samples from your APM
Open Percentile Calculator
Paste values (one per line or comma separated)
Read P50, P95, P99, P99.9 instantly
Check outlier warning for tail latency issues

🔍

Debugging API Errors

Open HTTP Status Code Reference
Search for the status code you are seeing
Read plain-English description and fix guidance
Use Curl Builder to reproduce the failing request
Check Port Reference for firewall rules

Frequently Asked Questions

What is an SRE tool?

SRE (Site Reliability Engineering) tools help engineers measure, track, and improve service reliability. Common SRE tools calculate SLA uptime allowances, track error budget consumption, analyse latency percentiles (P50/P95/P99), and plan network capacity. These calculators are essential for teams following Google's SRE practices of setting SLOs, measuring SLIs, and managing error budgets.

What is an error budget and how do I calculate it?

An error budget is the maximum allowed unreliability for a service within a rolling time window. If your target is 99.9% over 30 days, your error budget is 43.2 minutes of downtime (0.1% of 43,200 minutes). Error budget consumed = downtime experienced / total budget × 100%. When the budget is exhausted, many SRE teams freeze feature releases and focus on reliability work. Use our Error Budget Calculator to track consumption in real time.
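
As a worked example of the formula above, take a 99.9% target over a 30-day window and an assumed 10 minutes of downtime so far (the 10-minute figure is hypothetical). Note that the commonly quoted 43.8 minutes corresponds to an average calendar month of about 30.44 days; a strict 30-day window gives 43.2 minutes.

```latex
\begin{aligned}
\text{budget} &= (1 - 0.999) \times 30 \times 24 \times 60\ \text{min} = 43.2\ \text{min} \\
\text{consumed} &= \frac{10\ \text{min}}{43.2\ \text{min}} \times 100\% \approx 23.1\% \\
\text{remaining} &= 43.2\ \text{min} - 10\ \text{min} = 33.2\ \text{min}
\end{aligned}
```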

What is the difference between SLA, SLO, and SLI?

SLI (Service Level Indicator) is a quantitative measure of service behaviour, for example request success rate or latency. SLO (Service Level Objective) is an internal reliability target, for example 99.9% of requests succeed within 200 ms. SLA (Service Level Agreement) is an external contract with consequences, for example customer refunds if availability drops below 99.5%. Error budgets are derived from SLOs, not SLAs.

Why do SRE teams use P99 instead of average latency?

Average latency hides the worst user experiences. A service with 200 ms average latency might have a P99 latency of 8 seconds, meaning 1 in 100 requests takes 8 seconds or longer. P99 is the 99th percentile: 99% of requests complete at or below this value. For an API handling 1,000 requests/second, that is about 10 requests every second in the slow tail. SRE teams typically set SLOs on P95 or P99 rather than on averages.
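
To see how an average can mask the tail, here is a small Python sketch. It uses the nearest-rank definition of a percentile, which is one of several common definitions and may differ from what your APM or our Percentile Calculator uses. The dataset is invented for illustration.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical dataset: 980 fast requests at 120 ms, 20 slow ones at 8,000 ms.
latencies_ms = [120] * 980 + [8000] * 20

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.0f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

The mean comes out under 300 ms and the median at 120 ms, while P99 is 8,000 ms. Only the percentile reveals that 2% of requests are very slow.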

How do I calculate subnets for an AWS VPC?

For an AWS VPC, a common starting point is a /16 CIDR (65,536 IPs) divided into /24 subnets (256 IPs each). AWS reserves 5 addresses in every subnet (the first four and the last), so a /24 has 251 usable addresses rather than the usual 254. A common pattern: one public /24 and one private /24 per AZ, giving 6 subnets across 3 AZs. Use our CIDR Calculator to verify network addresses, broadcast addresses, and host counts before creating subnets in the AWS console.
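
The public/private-per-AZ pattern can be sketched with Python's standard `ipaddress` module. The zone names and the decision to take the first six /24 blocks are assumptions for illustration. A /24 has 254 host addresses in general, and AWS additionally reserves the first four addresses and the last one in each subnet, which leaves 251.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one

vpc = ipaddress.ip_network("10.0.0.0/16")
zones = ["az-a", "az-b", "az-c"]  # placeholder AZ names

# Take the first six /24 blocks: one public and one private per zone.
blocks = list(islice(vpc.subnets(new_prefix=24), 2 * len(zones)))

plan = {}
for i, zone in enumerate(zones):
    plan[f"public-{zone}"] = blocks[2 * i]
    plan[f"private-{zone}"] = blocks[2 * i + 1]

for name, net in plan.items():
    usable = net.num_addresses - AWS_RESERVED_PER_SUBNET
    print(f"{name:<14} {str(net):<14} "
          f"{net.num_addresses} addresses, {usable} usable on AWS")
```

This leaves 250 further /24 blocks in the /16 free for later growth.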

What is a good SLA target for a production API?

99.9% (three nines) is a common target for production APIs; it allows 43.8 minutes of downtime per month (using an average month of about 30.44 days). 99.95% allows 21.9 minutes per month and is appropriate for business-critical APIs with paying customers. 99.99% (four nines) allows only 4.38 minutes per month and requires redundancy, automated failover, and zero-downtime deployments. Five nines (99.999%) is rare and typically only for telecom infrastructure.
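
The monthly allowances quoted above can be reproduced with a short Python sketch. It assumes an average month of 365.25 / 12 days; the constant and function names are illustrative.

```python
# Average month length in minutes, assuming a 365.25-day year.
AVG_MONTH_MIN = 365.25 * 24 * 60 / 12


def allowed_downtime_min(target_pct, period_min=AVG_MONTH_MIN):
    """Minutes of downtime permitted by an availability target over a period."""
    return period_min * (100 - target_pct) / 100


for target in (99.9, 99.95, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_min(target):.2f} min per month")
```

The results are approximately 43.8, 21.9, 4.38 and 0.44 minutes. Passing `period_min=30 * 24 * 60` gives figures for a strict 30-day window instead, for example 43.2 minutes at 99.9%.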

What HTTP status codes mean there is a server problem?

5xx status codes indicate server-side errors. 500 Internal Server Error means an unexpected bug or crash on the server. 502 Bad Gateway means the upstream server returned an invalid response to the proxy — check your backend service health. 503 Service Unavailable means the server is overloaded or in maintenance. 504 Gateway Timeout means the upstream did not respond in time — check for slow database queries or network issues.
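
As a rough triage aid, the mapping above can be written as a small lookup in Python. The standard library's `http.HTTPStatus` supplies the reason phrases; the `FIRST_CHECKS` table and the `describe` helper are hypothetical, and the hints only paraphrase the descriptions above.

```python
from http import HTTPStatus

# First things to check, paraphrased from the descriptions above.
FIRST_CHECKS = {
    500: "server logs for an unexpected bug or crash",
    502: "health of the backend service behind the proxy",
    503: "server load and any maintenance in progress",
    504: "slow database queries or network issues upstream",
}


def describe(code):
    """Return a one-line summary for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not in the standard registry
        phrase = "Unknown status"
    side = "server-side" if 500 <= code <= 599 else "not a server error"
    line = f"{code} {phrase} ({side})"
    hint = FIRST_CHECKS.get(code)
    return f"{line}: check {hint}" if hint else line


for code in (500, 502, 503, 504, 404):
    print(describe(code))
```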

Related Developer Tools

SRE and DevOps engineers also use these tools regularly — JSON formatting, diff checking, regex testing, JWT decoding, and more. All free, all browser-based.

JSON Formatter · Diff Checker · Regex Tester · JWT Decoder · Base64 Encoder · Cron Generator · Unix Timestamp · UUID Generator · Hash Generator

Want to use all these tools in one conversation?

My SRE Assistant lets you ask any reliability question in plain English and chains these tools automatically.

🛠️ Open My SRE Assistant · View all AI Agents
