As a Senior Data Engineer focused on AI and machine learning, you will lead the development of advanced data architectures that empower autonomous AI agents. Your work will center on creating scalable, intelligent systems that support context-aware retrieval, multi-step reasoning, and adaptive knowledge processing across diverse enterprise domains.
Key Responsibilities
- Architect and optimize data pipelines tailored for generative and agentic AI, enabling real-time context retrieval, planning, and decision workflows
- Design and maintain knowledge bases, vector databases, and graph-based storage systems to support cross-domain data integration
- Build retrieval-augmented reasoning pipelines using embedding models, contextual segmentation, and retrieval orchestration for LLM-powered agents
- Integrate structured and unstructured data sources—including documents, emails, and APIs—into searchable, continuously updated knowledge stores
- Develop feedback mechanisms that allow AI agents to refine and validate their knowledge over time
- Collaborate with AI/ML, data science, and product teams to align data strategies with evolving agent capabilities
- Implement schema validation, metadata enrichment, and lineage tracking to ensure data consistency and auditability
- Establish monitoring systems to evaluate retrieval accuracy, coverage, and alignment between models and data
- Work closely with governance and security teams to uphold Responsible AI principles and access controls
- Document data models, pipelines, and system designs to support long-term maintainability and team collaboration
- Stay current with emerging trends in graph-based reasoning, embedding techniques, and autonomous agent memory systems
- Follow internal security protocols and adhere to organizational data handling standards
Qualifications
You bring deep technical experience in knowledge base engineering, retrieval-augmented generation (RAG) and reasoning, and generative AI data infrastructure. You have a proven track record of building intelligent data systems that support dynamic AI behaviors, connecting disparate enterprise data into unified, updatable knowledge environments. You're skilled in developing high-fidelity, auditable pipelines and collaborating across technical teams. A strong grasp of data quality, traceability, and schema management is essential, along with familiarity with corporate security practices.
