JD - Engineer Sr Lead, Site Reliability
What you will be doing:
The Software Engineer/Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments and Capital Markets business. In this role, the candidate will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive banking, payments and investment landscape. Specifically, the Site Reliability Engineer will be responsible for the following:
• Design and maintain monitoring solutions for infrastructure, application performance, and user experience.
• Implement automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments.
• Ensure application reliability, availability, and performance, minimizing downtime and optimizing response times.
• Lead incident response, including identification, triage, resolution, and post-incident analysis.
• Conduct capacity planning, performance tuning, and resource optimization.
• Collaborate with security teams to implement best practices and ensure compliance.
• Manage deployment pipelines and configuration management for consistent and reliable app deployments.
• Develop and test disaster recovery plans and backup strategies.
• Collaborate with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.
• Participate in on-call rotations and provide 24/7 support for critical incidents.
What you bring:
• Proficiency in development technologies, architectures, and platforms (web, API).
• Experience with cloud platforms (AWS, Azure, Google Cloud) and IaC tools.
• Hands-on experience with Docker and Kubernetes.
• Knowledge of monitoring tools (Prometheus, Grafana, DataDog) and logging frameworks (Splunk, ELK Stack).
• Experience in incident management and post-mortem reviews.
• Strong troubleshooting skills for complex technical issues.
• Proficiency in scripting languages (Python, Bash) and automation tools (Terraform, Ansible).
• Experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
• Ownership approach to engineering and product outcomes.
• Excellent interpersonal communication, negotiation, and influencing skills.