Responsibilities
Architecture & Technical Leadership
Own the end-to-end architecture for RAG + agentic workflows (Plan → Execute → Verify) across enterprise use cases (contracts, PDFs, knowledge bases).
Define architecture standards for multi-tenant isolation, API design, service boundaries, and integration patterns.
Lead technical decision-making: build vs buy, model strategy (hosted vs open-weights), tooling selection, and performance/cost tradeoffs.
Drive architecture reviews, mentor engineers/researchers, and raise the overall bar for engineering quality and research rigor.
RAG & Retrieval Systems (Enterprise-grade)
Design retrieval pipelines that optimize grounded accuracy: chunking strategy, hybrid retrieval, reranking, query rewriting, and context construction.
Define document ingestion patterns (PDF parsing, OCR, structured extraction, metadata enrichment) and index lifecycle strategies.
Establish retrieval evaluation and regression frameworks (ground truth, offline/online evaluation, drift tracking).
Enable async and event-driven architectures for long-running tasks using queues/streams (Kafka/RabbitMQ/Redis Streams) and/or durable workflow engines (Temporal).
Inference & Platform Engineering
Architect model serving for high throughput and low latency using engines like vLLM / TGI / Triton / TorchServe (as applicable).
Define GPU orchestration and capacity strategy on Kubernetes (AKS/EKS/GKE), including scale-to-zero, scheduling, and quota-based governance.
Design platform-level controls for rate limiting, caching, backpressure, and cost containment (tenant quotas, token budgets, throttling).
Safety, Guardrails, Security & Compliance
Own guardrail architecture for prompt injection defense, tool safety, policy enforcement, and PII handling (redaction patterns).
Define secure-by-default patterns: secrets management, data protection, audit logs, and safe prompt/tool execution boundaries.
Partner with security/compliance teams to meet enterprise standards (e.g., SOC 2/GDPR expectations where relevant).
Observability, Reliability & Operational Excellence
Establish SLOs and production readiness standards: error budgets, runbooks, incident response patterns.
Define observability strategy across LLM calls and agent tools: tracing, metrics, logs, cost dashboards, and token usage reporting.
Build reliability patterns for dependency failure (model provider downtime, throttling): circuit breakers, fallbacks, degradation strategies.
Qualifications
13+ years of experience in ML systems / platform engineering / architecture roles, with ownership of production-grade systems.
Strong software engineering fundamentals: APIs, distributed systems patterns, testing, versioning, CI/CD, and operational readiness.
Hands-on experience with Kubernetes, Docker, and cloud-native design (Azure/AWS/GCP).
Strong experience designing event-driven and async architectures with durable execution patterns (queues/workflows).
Proven ability to lead architecture for complex systems involving ML/LLMs, data pipelines, and multi-service integration.
Strong Python proficiency; comfortable with async patterns and structured validation (e.g., Pydantic-style design).
Preferred Qualifications
Deep experience with RAG (retrieval + grounding + reranking) and evaluation techniques for hallucinations and answer quality.
Experience with agent frameworks and multi-step tool execution patterns (plan/execute/verify, tool routing, loop prevention).
Experience with open-weight models and adaptation methods (e.g., PEFT/LoRA), plus evaluation-driven iteration.
Experience with model inference optimization (throughput, batching, caching) and GPU efficiency management.
Experience operating observability stacks (OpenTelemetry, Prometheus/Grafana, Datadog) and LLM tracing tools.
Icertis is the global leader in AI-powered contract intelligence. The Icertis platform revolutionizes contract management, equipping customers with powerful insights and automation to grow revenue, control costs, mitigate risk, and ensure compliance, the pillars of business success. Today, more than one-third of the Fortune 100 trust Icertis to realize the full intent of millions of commercial agreements in 90+ countries.