Operate, monitor, and
support Google Cloud Platform (GCP) environments to ensure high availability,
performance, and security of cloud services.
Manage day-to-day cloud
operations including provisioning, configuration, maintenance, and
troubleshooting of GCP resources.
Support Apigee API Management operations, including API deployment, policy
configuration, security enforcement, monitoring, and analytics reporting.
Assist in the operational support of Vertex AI and Gen AI platforms, including
model lifecycle activities, environment support, governance controls, and
integration monitoring.
Monitor and support BigQuery integrations, ensuring data pipelines, access
controls, and performance are maintained.
Perform incident management, root cause analysis, and problem resolution in line
with ITIL processes; participate in on-call and escalation support as required.
Implement and maintain observability and monitoring solutions (logging, metrics,
alerts, dashboards) to proactively identify and resolve issues.
Manage and support Kubernetes clusters and Linux-based systems, including
patching, upgrades, capacity management, and security hardening.
Execute routine operational
tasks such as backups, recovery testing, system health checks, and environment
validation.
Support FinOps and cost optimization initiatives by monitoring cloud spend,
implementing tagging standards, right-sizing resources, and generating cost
reports.
Create and maintain
operational documentation, runbooks, SOPs, and compliance reports to support
audit and governance requirements.
Collaborate with cloud
engineers, security teams, and service delivery managers to support change
management, releases, and continuous improvement initiatives.
Ensure adherence to
organizational security policies, compliance standards, and best practices
across all cloud operations.
Requirements
Qualifications
Bachelor’s degree in Computer Science, Computer
Engineering, Information Systems, or a related field. Relevant certifications
are a plus.
Key Requirements
6 to 10 years of experience in IT
infrastructure or cloud operations.
Working knowledge of Google Cloud Platform (GCP) services, especially Apigee
(API management, security policies, analytics), Vertex AI and Gen AI (model
lifecycle, governance), and BigQuery integration.
Hands-on experience with observability and monitoring, incident response, and
routine cloud operations and maintenance.
Hands-on experience managing Kubernetes clusters and Linux operating systems.
Familiarity with basic scripting (e.g., Shell, Python) would be an advantage.
Understanding of FinOps practices and cost optimization techniques
(right-sizing, tagging, cost governance) in GCP.
Strong documentation and reporting
skills; adherence to ITIL processes.
Preferred certifications: Google Cloud Associate
Cloud Engineer, Apigee API Engineer, Linux/Kubernetes certifications.