Interactive Demo
VALET SRE Dashboard
VALET Metrics Overview
V - Volume: 12.4K req/s | SLO: 15K peak
A - Availability: 99.92% | SLO: 99.95% | At Risk
L - Latency (p99): 142ms | SLO: <200ms p99
E - Error Rate: 0.08% 5xx | SLO: <0.1%
T - Open Tickets: 3 | SLO: <5/week
SLO Status: AT RISK
Error Budget
Budget Remaining: 34.4% of 30-day budget
[Time series: Error Budget Burn Rate. Series: Actual Burn, Ideal Burn, Budget Limit]
Budget Stats
Consumed 65.6%
Remaining 34.4%
Days Left 9
Burn Rate 1.4x
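
The budget figures above follow from standard error-budget arithmetic. The sketch below is one common way to compute a burn rate and project when the budget runs out; it is not the dashboard's own code, and the function names are hypothetical. The dashboard does not define "Days Left", so the projection here (time until the remaining budget is spent at the current burn rate) may differ from the figure shown.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Allowed failure ratio for an SLO target given as a fraction (0.9995 = 99.95%)."""
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    return error_ratio / error_budget_fraction(slo_target)


def days_until_exhausted(remaining_fraction: float, rate: float,
                         window_days: float = 30.0) -> float:
    """Days until the remaining budget is gone if the burn rate stays constant.

    Assumption: at a burn rate of 1.0 the full budget lasts exactly window_days.
    """
    if rate <= 0:
        return float("inf")
    return remaining_fraction * window_days / rate


# Illustrative inputs: a 0.07% error ratio against a 99.95% SLO burns at about 1.4x.
rate = burn_rate(error_ratio=0.0007, slo_target=0.9995)
projected = days_until_exhausted(remaining_fraction=0.344, rate=1.4)
print(f"burn rate: {rate:.2f}x, projected exhaustion in {projected:.1f} days")
```

A burn rate above 1.0 means the budget will be spent before the 30-day window closes unless the error ratio drops.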
Latency Analysis
[Time series: Latency Percentiles Over Time. Series: p50 (45ms), p90 (98ms), p99 (142ms), SLO (200ms)]
[Histogram: Latency Distribution. Buckets: 0ms, 50ms, 100ms, 150ms, 200ms, 250ms+]
Traffic & Errors
[Time series: Request Volume, in requests/min]
[Time series: Error Rate & Count. Series: 5xx Errors, 4xx Errors]
Service Health
All Services VALET Status
Service | Volume | Availability | Latency (p99) | Error Rate | Tickets | Status
Cart Service | 2.1K | 99.98% | 95ms | 0.02% | 0 | OK
Inventory API | 3.5K | 99.96% | 175ms | 0.04% | 1 | At Risk
Checkout Service | 1.8K | 99.97% | 88ms | 0.03% | 0 | OK
Product Catalog | 4.2K | 99.89% | 215ms | 0.08% | 3 | Breached
Search Service | 2.8K | 99.95% | 112ms | 0.05% | 1 | OK
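
The dashboard does not say how the Status column is derived. As a rough sketch, the rule below reproduces the five statuses above: a service is "Breached" when any metric violates its SLO, and "At Risk" when latency or error rate reaches 85% of its threshold. The 85% margin and all names are assumptions chosen for illustration, not the dashboard's documented logic.

```python
from dataclasses import dataclass

# SLO targets taken from the dashboard.
AVAILABILITY_SLO_PCT = 99.95
LATENCY_P99_SLO_MS = 200.0
ERROR_RATE_SLO_PCT = 0.1
# Assumption: "At Risk" starts at 85% of a threshold (not stated by the dashboard).
AT_RISK_FRACTION = 0.85


@dataclass
class ServiceRow:
    name: str
    availability_pct: float
    latency_p99_ms: float
    error_rate_pct: float


def classify(row: ServiceRow) -> str:
    """Return 'Breached', 'At Risk', or 'OK' for one table row."""
    if (row.availability_pct < AVAILABILITY_SLO_PCT
            or row.latency_p99_ms >= LATENCY_P99_SLO_MS
            or row.error_rate_pct >= ERROR_RATE_SLO_PCT):
        return "Breached"
    if (row.latency_p99_ms >= AT_RISK_FRACTION * LATENCY_P99_SLO_MS
            or row.error_rate_pct >= AT_RISK_FRACTION * ERROR_RATE_SLO_PCT):
        return "At Risk"
    return "OK"


rows = [
    ServiceRow("Cart Service", 99.98, 95, 0.02),
    ServiceRow("Inventory API", 99.96, 175, 0.04),
    ServiceRow("Checkout Service", 99.97, 88, 0.03),
    ServiceRow("Product Catalog", 99.89, 215, 0.08),
    ServiceRow("Search Service", 99.95, 112, 0.05),
]
statuses = {row.name: classify(row) for row in rows}
for name, status in statuses.items():
    print(f"{name}: {status}")
```

Under this rule, Inventory API is flagged because its 175ms p99 is above 170ms (85% of 200ms), and Product Catalog is breached on both availability and latency.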
[State timeline: Service Health Timeline. Rows: Cart, Inventory, Checkout, Catalog, Search. Range: 24h ago to now]
SLO Events
Recent SLO Events
Time | Service | Type | Metric | Value | Threshold
14:32 | Product Catalog | BREACH | Availability | 99.89% | 99.95%
12:15 | Inventory API | WARNING | Latency | 185ms | 200ms
09:48 | Cart Service | RECOVERED | Errors | 0.05% | 0.1%
08:22 | Search Service | WARNING | Volume | 14.2K | 15K
Breaches (7 Days): Catalog 4 | Inventory 2 | Search 1 | Cart 0 | Checkout 0
Mean Time to Recovery: 23 minutes | ↓ 8 min from last week

VALET Framework Reference

Letter | Metric | Description | CloudWatch Source | SLO Target
V | Volume | Request throughput capacity: ensures the system can handle expected load | ALB RequestCount | Peak: 15K req/s
A | Availability | Service uptime percentage: measures the ratio of successful requests | 1 - (5xx / Total) | 99.95%
L | Latency | Response time at p99: ensures a consistent user experience | ALB TargetResponseTime | p99 < 200ms
E | Errors | 5xx error rate: tracks server-side failures | ALB HTTPCode_ELB_5XX_Count | < 0.1%
T | Tickets | Manual intervention count: measures operational toil | CloudWatch Logs / Custom | < 5/week
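
To make the Availability formula in the table concrete, here is a minimal sketch that derives volume, availability, and error rate from raw request counts. The `WindowCounts` type and its field names are hypothetical, and the counts are illustrative values chosen to match the overview figures; in practice they would come from sums of metrics such as `RequestCount` and `HTTPCode_ELB_5XX_Count`.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts summed over one measurement window (hypothetical shape)."""
    total_requests: int
    server_errors_5xx: int
    window_seconds: float


def volume_rps(c: WindowCounts) -> float:
    """V: average requests per second over the window."""
    return c.total_requests / c.window_seconds


def error_rate(c: WindowCounts) -> float:
    """E: share of requests that ended in a 5xx, as a fraction."""
    if c.total_requests == 0:
        return 0.0
    return c.server_errors_5xx / c.total_requests


def availability(c: WindowCounts) -> float:
    """A: 1 - (5xx / Total), as in the reference table."""
    return 1.0 - error_rate(c)


# Illustrative counts: 1.24M requests in 100 seconds, 992 of them 5xx.
sample = WindowCounts(total_requests=1_240_000, server_errors_5xx=992,
                      window_seconds=100.0)
print(f"V = {volume_rps(sample):.0f} req/s")
print(f"A = {availability(sample):.2%}")
print(f"E = {error_rate(sample):.2%}")
```

Because availability is defined here as the complement of the 5xx ratio, the 99.95% availability target and the < 0.1% error target are not independent: a 5xx rate above 0.05% already misses the availability SLO.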