Responsibilities
- Support end-to-end data needs for all AI modalities, including classic ML, GenAI/LLMs, and agentic AI systems.
- Build robust, scalable data pipelines for structured, semi-structured, and unstructured data, including text, documents, images, audio, video, and logs.
- Develop feature engineering pipelines for classic ML, including feature extraction, transformation, and feature store management.
- Build and optimize GenAI and LLM data pipelines, including embedding generation, vectorization, chunking, metadata extraction, and document enrichment for RAG and context retrieval.
- Develop data ingestion and orchestration workflows that support agentic AI, including memory stores, event-driven pipelines, tool-use data flows, and real-time retrieval services.
- Design and implement advanced data solutions using AWS (S3, Glue, Lambda, EMR, Kinesis), Databricks (Spark, Delta Lake, Vector Search), and Dataiku to enable intelligent systems at scale.
- Implement data governance, quality, lineage, monitoring, and observability to support high-performance, trustworthy AI.
- Partner with data scientists, ML engineers, and AI product teams to deliver datasets for model development, fine-tuning, evaluation, and production inference.
- Optimize pipelines for latency, cost, reliability, and throughput so that AI systems, from batch ML to real-time agents, have the data they need.
Requirements
- Bachelor’s degree in a technical field (CS, Engineering, Math, or a related discipline).
- Experience supporting AI at scale across classic ML, GenAI/LLM, and agentic AI systems.
- Experience with vector databases and semantic search (Databricks Vector Search, Pinecone, FAISS, Milvus, OpenSearch).
- Experience with unstructured data technologies (OCR, NLP pipelines, computer vision data processing).
- Hands-on experience with Dataiku for automation, workflow orchestration, and AI project management.
- Knowledge of MLOps tooling: MLflow, Delta Lake, experiment tracking, CI/CD for ML.
- Understanding of agentic AI system patterns, such as memory architectures, tool APIs, event-driven workflows, and reasoning chain data requirements.
- Strong analytical mindset, attention to detail, and commitment to high data quality.
- Ability to thrive in a fast-paced, evolving AI environment and collaborate across cross-functional teams.
Benefits
- Employer-subsidized Medical, Dental, Vision, and Life Insurance
- Short-Term and Long-Term Disability
- 401(k) match, Flexible Spending Accounts, Health Savings Accounts, EAP, and Educational Assistance
- Parental Leave, Paid Time Off (for vacation, personal business, sick time, and parental leave), and 12 Paid Holidays
Work Arrangement
Hybrid
Additional Information
- Must be a U.S. citizen due to contractual requirements.
- The application period for the job is estimated to be 40 days from the job posting date; however, this may be shortened or extended depending on business needs and the availability of qualified candidates.
