AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for an SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform. You’ll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS. The role is hands-on and execution-focused, with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.
WHAT YOU WILL DO
- Monitor and support production and staging environments to ensure availability, performance, and stability;
- Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts;
- Participate in on-call rotations with defined SLAs;
- Handle operational requests from internal teams;
- Maintain and improve monitoring, alerting, dashboards, logs, and metrics;
- Support CI/CD pipelines, production releases, and GitOps workflows;
- Contribute to automation initiatives to reduce operational overhead;
- Maintain and improve Kubernetes-based infrastructure and containerized workloads;
- Support Infrastructure as Code practices and environment improvements.
MUST HAVES
- 2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
- Experience with AWS supporting production environments;
- Experience supporting production SaaS applications;
- Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI);
- Experience with GitOps and Git fundamentals;
- Experience using GitHub, Jira, and Confluence;
- Experience with Kubernetes (EKS, kOps or similar);
- Experience with Docker and containerization;
- Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty);
- Proficiency in scripting (Bash, Python, or Go);
- Experience with Infrastructure as Code (Terraform, Helm);
- Ability to work within structured operational processes and SLAs;
- Strong written and verbal English communication skills;
- Self-driven with a growth mindset.
NICE TO HAVES
- AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
- Experience with multi-tenant SaaS environments;
- Experience working in globally distributed teams;
- Familiarity with ChatOps practices;
- Experience improving monitoring quality and reducing alert fatigue.
PERKS AND BENEFITS
- Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: Modern solutions with Fortune 500 and top product companies.
- Flextime: Flexible schedule with remote and office options.
Meet Our Recruitment Process
It includes the following main stages: Application → Coding Challenge → Video Interview → Technical Interview or Interview with the Hiring Manager(s).
Each step helps us understand your skills and overall fit.
If it’s a match, you’ll receive an offer.