Shape the future of AI evaluation by designing and building core infrastructure for testing and measuring large language models. You'll work across the stack to create reliable, high-performance systems that support structured experimentation and real-time analytics.
What You’ll Do
- Develop and maintain scalable APIs using Django Ninja and build responsive user interfaces with Next.js and React.
- Design intuitive workflows for managing experiments, including input handling, evaluation logic, and multi-provider execution.
- Build interactive dashboards that surface key insights such as rating trends, regression patterns, and compliance metrics.
- Enhance a Python-based SDK to streamline integration into AI development pipelines, enabling seamless experiment creation and result retrieval.
- Optimize asynchronous processing, database queries, and containerized deployments to support high-volume evaluation workloads.
- Collaborate on user experience decisions that make batch testing, scheduling, and team reviews accessible and efficient.
- Improve code quality through robust CI/CD pipelines, containerization practices, and architectural consistency.
- Explore practical applications of emerging AI technologies to enhance developer tooling and evaluation accuracy.
What We Need From You
- Proven experience shipping end-to-end features using modern web frameworks like React, Next.js, Django, or FastAPI.
- Strong foundation in software engineering principles: API design, data modeling, testing, and performance optimization.
- Familiarity with Docker, Kubernetes, and CI/CD workflows.
- Interest in AI and its role in software development workflows.
- Ability to work on-site at least three days per week in Berlin or Bremen, with initial travel to Bremen for onboarding.
- Fluency in English (B2 or higher).
- Valid authorization to work in the EU.
What Sets You Apart
- Hands-on experience with LLM applications, evaluation frameworks, or developer SDKs.
- Background in building data-intensive dashboards, or in running async task queues such as Celery and PostgreSQL databases at scale.
- German language proficiency.
- Experience with privacy-conscious or on-premises deployment scenarios.
Technology Stack
Python, Django, Django Ninja, React, Next.js, PostgreSQL, Docker, Kubernetes, Celery, LLM providers, CI/CD pipelines
Why This Matters
You’ll have full ownership of critical features, working closely with AI researchers and ML engineers to solve complex challenges in distributed systems and real-time tracking. Your work will directly influence how teams evaluate, refine, and deploy AI applications.
Compensation includes a competitive salary and participation in a Virtual Stock Option Program (VSOP), giving you a tangible stake in the company’s growth. You’ll also get fast access to cutting-edge AI tools, modern hardware, and a productive work environment designed to support innovation.
Work Environment
This is a hybrid role requiring at least three days per week on-site in either Berlin or Bremen. The Bremen office offers waterfront views, refreshments, and even a boat on-site. A new Berlin office opens in late 2025. Initial onboarding includes travel to the Bremen headquarters.
Our Culture
We value engineering excellence, ownership, collaboration, and continuous learning. We embrace diverse perspectives and are committed to building an inclusive environment where innovation thrives. If you're passionate about AI infrastructure and want to help define industry standards, this is your opportunity to grow with transformative technology.


