Datavail Infotech
Senior Site Reliability Engineer
Job Description
Responsibilities
- Define and maintain SLIs/SLOs; monitor compliance and error budget consumption
- Lead incident response and postmortems; implement corrective measures
- Automate operational tasks through tooling (e.g., auto-remediation, scaling rules)
- Build, improve, and maintain CI/CD pipelines, including canary and blue/green deployment strategies
- Lead technical discussions with customers to align on reliability, scalability, and performance requirements
- Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
- Implement and extend observability systems (metrics, tracing, log aggregation)
- Optimize performance and cost by tuning cloud services, autoscaling, and resource rightsizing
- Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
- Collaborate with dev teams to integrate ...