Role: Lead Data Engineer
Location: Mumbai
Experience: 6+ years
Technologies / Skills:
Advanced SQL, Python and associated libraries such as pandas and NumPy, PySpark, shell scripting, data modeling, big data, Hadoop, Hive, ETL pipelines.
Responsibilities:
• Proven success in communicating with users, other technical teams, and senior management to collect requirements, describe data modeling decisions, and develop data engineering strategy.
• Ability to work with business owners to define key business requirements and convert them into user stories with the required technical specifications.
• Communicate results and business impacts of insight initiatives to key stakeholders to collaboratively solve business problems.
• Work closely with the Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.
• Ensure that quality, security, and compliance requirements are met for the supported area.
• Design and create fault-tolerant data pipelines running on a cluster.
• Excellent communication skills, with the ability to influence client business and IT teams.
• Should have designed data engineering solutions end to end, with the ability to come up with scalable and modular solutions.
• Support team members daily with work prioritization, work allocation, and resource utilization.
Required Qualifications:
• 5+ years of hands-on experience designing and developing data pipelines for data ingestion or transformation using Python (PySpark)/Spark SQL in the AWS cloud.
• Experience in design and development of data pipelines and processing of data at scale.
• Advanced experience in writing and optimizing efficient SQL queries with Python and Hive, handling large data sets in big data environments.
• Experience in debugging, tuning, and optimizing PySpark data pipelines.
• Should have implemented, and have good knowledge of, PySpark concepts such as DataFrames, joins, caching, memory management, partitioning, and parallelism.
• Understanding of the Spark UI, event timelines, DAGs, and Spark configuration parameters in order to tune long-running data pipelines.
• Experience working in Agile implementations.
• Experience building data pipelines in streaming and batch modes.
• Experience with Git and CI/CD pipelines to deploy cloud applications.
• Good knowledge of designing Hive tables with partitioning for performance.
• Minimum of 2 years of experience managing a team of 3+ members, including work prioritization, work allocation, and resource utilization.
Desired Qualifications:
• Experience in data modeling.
• Hands-on experience creating workflows in a scheduling tool such as Autosys or CA Workload Automation.
• Proficiency in using SDKs to interact with native AWS services.
• Strong understanding of ETL, ELT, and data modeling concepts.
Full time