About the Role
The role involves developing data pipelines, ensuring data quality, and supporting analytics initiatives through robust infrastructure and tools.
Responsibilities
- Design and implement data pipelines for batch and real-time processing
- Optimize data storage and retrieval across distributed systems
- Ensure data accuracy, consistency, and accessibility
- Collaborate with analysts and scientists to understand data needs
- Develop and maintain ETL workflows
- Monitor system performance and troubleshoot issues
- Support data governance and compliance standards
- Integrate data from multiple sources into centralized repositories
- Write clean, maintainable code for data processing tasks
- Participate in architecture reviews and technical planning
- Document data models, pipelines, and system configurations
- Improve data security and access controls
- Work with cloud-based data platforms and services
- Automate routine data operations and validation checks
- Contribute to testing and deployment of data solutions
Nice to Have
- Master’s degree in a technical discipline
- Experience in healthcare or life sciences data domains
- Knowledge of machine learning pipelines
- Contributions to open-source data projects
- Certifications in cloud or data engineering platforms
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model with flexible scheduling
Team
Collaborative environment within a technology-driven team
About the Team
The team focuses on delivering reliable, high-performance data infrastructure to support business intelligence and advanced analytics.
Technology Stack
The team uses modern tools, including Python, SQL, Apache Airflow, Spark, and cloud-native data services on Google Cloud Platform.