
Site Reliability Engineer
Link Group
Hexjobs Insights
Senior Site Reliability Engineer responsible for the reliability of AI-driven applications. Requires 5+ years of experience and expertise in Azure DevOps, Kubernetes, and CI/CD.
About the Role

We are looking for a Senior Site Reliability Engineer who will take end-to-end ownership of reliability for AI-driven applications and pipelines. This is a hands-on engineering role, not a coordination or ticket-driven position. The ideal candidate actively diagnoses, resolves, and automates production issues rather than only designing solutions.

Requirements

- 5+ years as SRE / Production / Platform Engineer
- Strong incident management & RCA experience
- Hands-on with: Azure DevOps, Kubernetes, Datadog, Azure, CI/CD
- Proactive, ownership mindset, self-driven
- Experience in production environments
- Nice to have: AI/LLM pipelines, Grafana

Responsibilities

- Build and maintain monitoring, alerting, dashboards
- Lead incident response & root cause analysis
- Ensure reliability and performance of AI pipelines
- Standardize telemetry (latency, failures, throughput)
- Optimize CI/CD and release quality
- Reduce recurring incidents with engineering teams

| Published | 6 days ago |
| Expires | in 3 months |