Responsibilities
- Design and develop scalable batch and near-real-time ETL/ELT pipelines using Databricks (AWS) and Apache Spark (PySpark, Spark SQL, Structured Streaming).
- Modernize legacy SQL, Hive, and stored-procedure workflows by migrating them to distributed, Spark-native architectures.
- Perform Spark performance tuning.
- Build streaming pipelines using Kafka and Spark Structured Streaming.
- Design dimensional data models (Fact/Dimension, SCD Type 2).
- Orchestrate pipelines using Databricks Workflows / Apache Airflow.
- Integrate CI/CD pipelines using Jenkins and Git (Bitbucket/GitHub) for automated deployment across DEV, UAT, and PROD.
Requirements
- Experience in financial services, regulatory reporting, or enterprise data platforms.
- Hands-on experience with Delta Lake optimization and incremental processing strategies.
- Experience with Snowflake data warehousing.
- Strong understanding of distributed computing principles.
Nice to Have
- Databricks Certification (Professional level preferred).
