Key Responsibilities
- Design, develop, and maintain scalable data pipelines for ingestion, transformation, and validation of large datasets
- Work with Apache Spark and the broader big data ecosystem to process structured and unstructured data
- Build and optimize ETL/ELT workflows for performance and reliability
- Implement data quality checks, validations, and automation processes
- Develop and manage data solutions using Azure Data Factory, Azure Synapse, and ADLS
- Apply data warehousing concepts (star schema, snowflake schema, dimensional modeling)
- Collaborate with cross-functional teams including data analysts, data scientists, and business stakeholders
- Ensure adherence to data security, governance, and compliance best practices
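The pipeline responsibilities above (ingestion, transformation, validation with data-quality checks) can be sketched in miniature as below. This is a simplified, hypothetical illustration in plain Python: in practice each stage would operate on a Spark DataFrame in Azure Synapse or Databricks rather than on lists of dicts, and the function names and rules shown are assumptions, not a prescribed design.

```python
# Minimal sketch of a three-stage pipeline: ingest -> transform -> validate.
# Plain Python stands in for Spark; the stage boundaries are the point.

def ingest(raw_rows):
    """Parse raw records into typed rows; malformed rows are dropped
    (a real pipeline would quarantine and log them)."""
    rows = []
    for raw in raw_rows:
        try:
            rows.append({"id": int(raw["id"]), "amount": float(raw["amount"])})
        except (KeyError, ValueError):
            continue
    return rows

def transform(rows):
    """Example business rule: keep positive amounts, add a derived column."""
    return [
        {**r, "amount_cents": round(r["amount"] * 100)}
        for r in rows
        if r["amount"] > 0
    ]

def validate(rows):
    """Data-quality checks: non-null keys and uniqueness of id."""
    assert all(r["id"] is not None for r in rows), "null id"
    assert len({r["id"] for r in rows}) == len(rows), "duplicate id"
    return rows

raw = [{"id": "1", "amount": "19.99"}, {"id": "2", "amount": "-5"}, {"id": "x"}]
result = validate(transform(ingest(raw)))
# result keeps only the valid, positive-amount row, with the derived column
```

The same shape maps directly onto a Spark job: `ingest` becomes a read with an explicit schema, `transform` a chain of DataFrame operations, and `validate` a set of assertion queries run before the write.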
Required Skills (Primary)
- Strong proficiency in Python and SQL
- Hands-on experience with Apache Spark and big data frameworks
- Experience building data pipelines, transformations, validations, and automation
- Solid understanding of data warehousing concepts
- Experience with:
  - Azure Data Factory
  - Azure Synapse Analytics
  - Azure Data Lake Storage (ADLS)
Secondary / Good-to-Have Skills
- Experience with Azure DevOps
- Strong knowledge of Git version control
- Hands-on experience with CI/CD pipelines
- Familiarity with agile development methodologies
Experience & Qualifications
- 5+ years of experience in Data Engineering or a related field
- Bachelor’s/Master’s degree in Computer Science, Engineering, or related discipline
- Experience working with cloud-based data platforms (preferably Azure)