Lead Data Engineer

Company:

Go Digital Technology Consulting LLP

Location:

Mumbai, Maharashtra, India

Posted:

May 02, 2024

Description:

Role: Lead Data Engineer

Location: Mumbai

Experience: 6+ years

Technologies / Skills:

Advanced SQL, Python and associated libraries such as Pandas and NumPy, PySpark, shell scripting, data modelling, big data, Hadoop, Hive, ETL pipelines.

Responsibilities:

• Proven success in communicating with users, other technical teams, and senior management to collect requirements, describe data modeling decisions and develop data engineering strategy.

• Ability to work with business owners to define key business requirements and convert them into user stories with the required technical specifications.

• Communicate results and business impacts of insight initiatives to key stakeholders to collaboratively solve business problems.

• Work closely with the overall Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.

• Ensure quality, security and compliance requirements are met for the supported area.

• Design and create fault-tolerant data pipelines running on a cluster.

• Excellent communication skills with the ability to influence client business and IT teams.

• Should have designed data engineering solutions end to end, with the ability to come up with scalable and modular solutions.

• Support team members daily on work prioritization, work allocation and resource utilization.

Required Qualification:

• 5+ years of hands-on experience designing and developing data pipelines for data ingestion or transformation using Python (PySpark)/Spark SQL in AWS cloud.

• Experience in design and development of data pipelines and processing of data at scale.

• Advanced experience in writing and optimizing SQL queries with Python and Hive, handling large data sets in big data environments.

• Experience in debugging, tuning and optimizing PySpark data pipelines.

• Should have hands-on implementation experience with, and good knowledge of, PySpark DataFrames, joins, caching, memory management, partitioning, parallelism, etc.

• Understanding of the Spark UI, event timelines, DAGs and Spark config parameters in order to tune long-running data pipelines.

• Experience working in Agile implementations.

• Experience with building data pipelines in streaming and batch mode.

• Experience with Git and CI/CD pipelines to deploy cloud applications.

• Good knowledge of designing Hive tables with partitioning for performance.

• Minimum 2 years of experience managing a team of 3+ members, covering work prioritization, work allocation and resource utilization.

Desired Qualification:

• Experience in data modelling.

• Hands-on experience creating workflows on any scheduling tool such as Autosys or CA Workload Automation.

• Proficiency in using SDKs for interacting with native AWS services.

• Strong understanding of ETL, ELT and data modelling concepts.

Full time
