About the Role
The role involves building data pipelines, ensuring data quality, and giving analysts and data scientists efficient access to data.
Responsibilities
- Design and implement reliable data pipelines for large-scale datasets
- Collaborate with analytics and machine learning teams to understand data needs
- Build and maintain data storage solutions optimized for query performance
- Ensure data accuracy, consistency, and integrity across systems
- Monitor pipeline health and troubleshoot issues as they arise
- Optimize data workflows for speed and cost efficiency
- Work with streaming and batch data processing technologies
- Support the integration of new data sources into existing infrastructure
- Document data models, pipelines, and system architecture
- Contribute to data governance and security practices
- Improve data accessibility for non-engineering teams
- Evaluate and integrate new data tools and technologies
- Participate in code reviews and system design discussions
- Stay current with data engineering best practices
- Assist in scaling systems to meet growing data demands
- Work closely with product teams to instrument data collection
- Ensure compliance with data privacy standards
- Automate routine data operations and monitoring tasks
- Support the deployment and maintenance of ETL processes
- Collaborate on schema design and database optimization
- Help onboard team members to data platforms
- Contribute to incident response for data-related outages
- Refactor legacy data systems for improved reliability
- Participate in capacity planning for data infrastructure
- Assist in defining metrics and KPIs for data platform performance
Compensation
Competitive salary, equity, and a benefits package
Work Arrangement
Hybrid work model with flexibility to work from the office or remotely
Team
Collaborative team focused on data infrastructure and analytics
What We Value
- Ownership of projects from design to deployment
- Clear communication across technical and non-technical roles
- Continuous learning and skill development
- Building systems that are both scalable and maintainable
- A culture of feedback and iterative improvement
Technology Stack
- Python, SQL, and Scala for data processing
- Apache Airflow for workflow orchestration
- Spark for large-scale data transformation
- BigQuery and Redshift for data warehousing
- AWS for cloud infrastructure
- Docker and Kubernetes for deployment
- Git and GitHub for version control
- Datadog and PagerDuty for monitoring
Available for qualified candidates