
Lead Data Engineer

Company: Go Digital Technology Consulting LLP

Location: Mumbai, Maharashtra, India

Posted: May 02, 2024

Description:

Role: Lead Data Engineer

Location: Mumbai

Experience: 6+ years

Technologies / Skills:

Advanced SQL, Python and associated libraries such as Pandas and NumPy, PySpark, shell scripting, data modeling, big data, Hadoop, Hive, ETL pipelines.

Responsibilities:

• Proven success in communicating with users, other technical teams, and senior management to collect requirements, describe data modeling decisions, and develop data engineering strategy.

• Ability to work with business owners to define key business requirements and convert them to user stories with the required technical specifications.

• Communicate results and business impacts of insight initiatives to key stakeholders to collaboratively solve business problems.

• Work closely with the overall Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.

• Ensure quality, security, and compliance requirements are met for the supported area.

• Design and create fault-tolerant data pipelines running on a cluster.

• Excellent communication skills, with the ability to influence client business and IT teams.

• Should have designed data engineering solutions end to end, with the ability to come up with scalable and modular solutions.

• Support team members daily on work prioritization, work allocation, and resource utilization.

Required Qualifications:

• 5+ years of hands-on experience designing and developing data pipelines for data ingestion or transformation using Python (PySpark)/Spark SQL in the AWS cloud.

• Experience in the design and development of data pipelines and the processing of data at scale.

• Advanced experience in writing and optimizing efficient SQL queries with Python and Hive, handling large data sets in big-data environments.

• Experience in debugging, tuning, and optimizing PySpark data pipelines.

• Should have implemented, and have good knowledge of, PySpark DataFrames, joins, caching, memory management, partitioning, parallelism, etc.

• Understanding of the Spark UI, event timelines, DAGs, and Spark configuration parameters in order to tune long-running data pipelines.

• Experience working in Agile implementations.

• Experience building data pipelines in streaming and batch modes.

• Experience with Git and CI/CD pipelines to deploy cloud applications.

• Good knowledge of designing Hive tables with partitioning for performance.

• Minimum of 2 years' experience managing a team of 3+ members, covering work prioritization, work allocation, and resource utilization.

Desired Qualifications:

• Experience in data modeling.

• Hands-on experience creating workflows in a scheduling tool such as Autosys or CA Workload Automation.

• Proficiency in using SDKs to interact with native AWS services.

• Strong understanding of ETL, ELT, and data modeling concepts.

Full time
