About the Role
The role involves building and maintaining backend systems that power AI training workflows, with a focus on reliability, scalability, and integration across data pipelines.
Responsibilities
- Develop and maintain backend services for AI model training infrastructure
- Design scalable APIs to support machine learning workflows
- Optimize data processing pipelines for efficiency and throughput
- Collaborate with data scientists to implement training logic
- Ensure system reliability and fault tolerance under heavy load
- Monitor and improve system performance metrics
- Integrate third-party tools and services into the training pipeline
- Write clean, maintainable, and well-documented code
- Troubleshoot production issues across services and platforms
- Support deployment and configuration of training environments
- Implement security best practices in backend systems
- Work closely with frontend teams to align on data requirements
- Manage database schemas and query performance
- Contribute to architectural decisions and technical planning
- Automate repetitive tasks in the training lifecycle
Nice to Have
- Experience with AI/ML training frameworks
- Exposure to MLOps practices and tooling
- Contributions to open-source projects
- Familiarity with message queues such as Kafka or RabbitMQ
- Knowledge of monitoring and observability tools
- Experience in high-throughput computing environments
Compensation
Competitive salary and benefits package
Work Arrangement
Remote position with flexible hours
Team
Collaborative engineering team focused on AI-driven solutions
Technology Stack
- Primary languages: Python, Go
- Cloud infrastructure: AWS, GCP
- Containerization: Docker, Kubernetes
- Databases: PostgreSQL, MongoDB
- Monitoring: Prometheus, Grafana
- CI/CD: GitHub Actions, Jenkins
Application Process
- Submit resume and cover letter
- Complete a technical screening
- Participate in coding and system design interviews
- Final review by hiring team
Available for qualified candidates


