Job description

NIO AGI Super Star Program (蔚来AGI超星计划) - Agentic Semantic Search

San Jose, Shanghai, Beijing | Campus recruitment | Internship | Digital Technology | Master's degree or above | NIO AGI Super Star Program | Job ID: A34979

Project introduction

About the position

Join our AI Platform team to build intelligent agents that unlock the full value of our company's internal knowledge and data. You will work at the intersection of large language models, enterprise data systems, and agentic workflow design, creating tools that allow employees and systems to query, interpret, and act on everything from unstructured documents to live operational databases.

This internship is hands-on from day one. You will own a real workstream, collaborate with senior engineers and data architects, and ship features that are used in production before the summer ends.

Project scope

The core challenge: our company generates enormous volumes of internal data (design specifications, operational runbooks, relational databases, event streams, and data catalogs), yet most of it remains siloed and hard to access programmatically. Your work will focus on building an AI agent layer that bridges natural language with these data sources. Specifically, you will tackle three interconnected areas:

- Document intelligence: indexing and retrieving relevant content from design and operations documents using semantic search and RAG pipelines.
- Structured data querying: enabling the agent to generate, validate, and execute SQL or API queries against both static (data warehouse) and real-time structured vehicle datasets.
- Metadata-aware reasoning: integrating data catalog metadata so the agent understands schema context, data freshness, ownership, and access policies before surfacing results.

What you will learn

- How to architect multi-step AI agents with tool use, memory, and planning using frameworks such as LangChain, LlamaIndex, or custom orchestration layers.
- Practical techniques for reasoning-based retrieval-augmented generation (RAG) over heterogeneous document collections.
- Text-to-SQL generation, query validation, and safe execution against live databases with schema introspection and error-handling loops.
- How to work with data catalogs and metadata stores to ground agent reasoning in authoritative schema information and governance constraints.
- Evaluation methodology for agentic systems: how to measure accuracy, latency, tool-call efficiency, and hallucination rates in enterprise settings.
- Collaborative engineering practices: code review, technical writing, and cross-functional communication with data, product, and operations stakeholders.

Expected deliverables

- Document retrieval agent. A working RAG pipeline over internal design and ops documents with a conversational query interface.
- Data query module. A text-to-SQL agent component capable of querying both static warehouse tables and real-time data sources.
- Metadata integration layer. A connector that links the agent to the data catalog so it can reason about schema, lineage, and freshness.
- Evaluation report. A benchmark suite and written analysis covering accuracy, latency, and failure modes of the agent system.
- End-of-internship demo. A live presentation to engineering and data leadership showcasing the system and key findings.

Job requirements

Required qualifications

- Currently enrolled in a graduate program in Computer Science, Data Science, Software Engineering, or a closely related technical field.
- Solid Python programming skills, including familiarity with standard data and ML libraries (pandas, NumPy, PyTorch, or similar).
- Working knowledge of SQL: the ability to write, read, and debug queries against relational databases.
- Familiarity with large language model APIs (OpenAI, Anthropic, or equivalent) and at least one hands-on project using them.
- Understanding of basic information retrieval concepts: similarity search, relevance search, and the fundamentals of RAG.
- Ability to read and understand REST API documentation and integrate third-party services in code.
- Strong written and verbal communication skills, with the ability to document technical work clearly for both technical and non-technical audiences.

Preferred qualifications

- Current enrollment in a PhD program is preferred.
- Experience with agentic AI frameworks such as LangChain, LlamaIndex, AutoGen, or comparable libraries (strong plus).
- Familiarity with data catalog or metadata management tools (DataHub, Amundsen, Apache Atlas, dbt docs).
- Exposure to real-time or streaming data systems such as Apache Kafka, Spark Structured Streaming, or Flink.
- Experience with cloud data platforms: Redshift, BigQuery, or Databricks.
- Knowledge of enterprise data governance concepts: data lineage, access control, PII classification, and schema versioning.
- Prior internship or research experience in an AI, data engineering, or ML platform role (strong plus).
- Contributions to open-source AI or data tooling projects.