
Data Engineer / Senior Developer

Location:
Roanoke, TX
Posted:
May 14, 2024



Maninder Singh Nanda

ad5pky@r.postjobfree.com

817-***-****

Profile

15+ years of experience designing and developing data pipelines in Snowflake, Databricks, Dataiku, FiveTran, AWS (EMR, Glue, Lambda), Cloudera, and Hadoop environments; have worked as Tech Lead, Senior Developer, and Tech Manager on development, maintenance, and production-support projects.

Good experience with Big Data technologies such as Hive, Pig, Apache Spark, Apache Beam, Python, and Java.

Worked with Airflow, Tidal, Autosys, and Databricks Workflows for orchestration.

Good experience creating ETL pipelines; have ingested data into Snowflake from S3 buckets in various file formats such as CSV, XLS, and XML. Also created ETL pipelines using Snowpark and Snowpipe.

Currently building ETL pipelines using Snowflake, Airflow, Python, and Amazon S3. My role involves creating Airflow DAGs that read flat files from S3, transform the data, and stitch it with existing Snowflake tables; these tables are then published and consumed in Tableau for reporting (a minimal illustrative DAG sketch follows this profile).

Good experience in Java/J2EE, web services, and frontend technologies such as Flex and ReactJS; frameworks including Struts, Spring, and PureMVC for Flex; and IBM MQ.

Defined, reviewed, and enhanced the architecture and design of complex, business-critical products and applications.

Good experience working on and leading projects using Agile methodology.

Good experience in the Medical/Healthcare, Insurance, and Airlines domains.

Visa Status – Green Card
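
A minimal illustrative sketch of the kind of Airflow DAG described in the profile: it reads a flat file from S3, applies a simple transformation, and stitches the result into an existing Snowflake table. The bucket, keys, table names, connection IDs, and the MERGE statement are hypothetical placeholders, not actual project code.

import csv
import io

import pendulum
from airflow.decorators import dag, task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def s3_to_snowflake_example():

    @task
    def transform_file() -> str:
        """Read a CSV from S3, keep only well-formed rows, and re-upload the cleaned file."""
        s3 = S3Hook(aws_conn_id="aws_default")
        raw = s3.read_key(key="inbound/sales.csv", bucket_name="example-bucket")
        rows = [r for r in csv.DictReader(io.StringIO(raw)) if r.get("id")]
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        s3.load_string(out.getvalue(), key="clean/sales.csv",
                       bucket_name="example-bucket", replace=True)
        return "clean/sales.csv"

    # Stitch the cleaned data into the existing target table. STAGING.STG_SALES is
    # assumed to be populated separately (e.g. by a COPY INTO from the S3 stage).
    merge = SnowflakeOperator(
        task_id="merge_into_target",
        snowflake_conn_id="snowflake_default",
        sql="""
            MERGE INTO ANALYTICS.SALES t
            USING STAGING.STG_SALES s ON t.ID = s.ID
            WHEN MATCHED THEN UPDATE SET t.AMOUNT = s.AMOUNT
            WHEN NOT MATCHED THEN INSERT (ID, AMOUNT) VALUES (s.ID, s.AMOUNT)
        """,
    )

    transform_file() >> merge


s3_to_snowflake_example()

The published tables are then read by Tableau; the DAG itself only handles ingestion and stitching.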

Experience Summary

Languages, Tools, and Libraries – Years of Experience

Python – 8 yrs

Java – 10 yrs

Snowflake – 5 yrs

Databricks – 1 yr

Cloudera – 5 yrs

SQL – 10 yrs

AWS Cloud – 8 yrs

Apache Spark – 8 yrs

Airflow – 1 yr

Dataiku – 6 months

FiveTran – 6 months

Git repository – 8 yrs

DBT – 1 month

Project Experience:

Turnberry Solutions (Client: Pfizer) Oct 2023 – Present

AWS Snowflake – Created new data pipelines using Snowflake and Airflow in an AWS environment.

Location – Remote, TX

Role – Principal Data Engineer

Responsibilities:

• Currently creating data pipelines for various Pfizer medication brands.

• Created a generic framework to process data arriving in various formats from various sources, ingesting and flattening the data so that it is accessible in Snowflake.

• Created Snowpark stored procedures in Python to read and process XLS and CSV files from S3 buckets and ingest the data into Snowflake tables (see the Snowpark sketch after this list).

• Created a framework for schema validation between producer files/tables and consumer tables (a schema-validation sketch follows the Environment line below).

• Used Airflow for orchestration and scheduling. Created DAGs and tasks to define dependencies and flows, and used Snowflake/SQL-executor operators to call Snowflake stored procedures from Airflow.

• Stitched data from various tables to generate consolidated reports; the data is stored in fact tables used by reporting tools.

• Used Dataiku to create ETL pipelines that read data from SharePoint and save it in Snowflake.

• Created Python recipes in Dataiku to process Excel and CSV files loaded from SharePoint into Snowflake.

• Participating in migrating the Dataiku Python ETLs to Snowflake and Airflow.

• Worked on a POC for FiveTran.

• Helped the team deploy Airflow locally using Docker on Windows so that they could test DAGs and run Airflow on their own machines.
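
As referenced above, a minimal sketch of a Snowpark (Python) stored procedure that reads CSV files from an S3-backed external stage and appends them into a Snowflake table. The stage, table, column, and procedure names are hypothetical placeholders under assumed schemas, not the client's actual objects.

from snowflake.snowpark import Session
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType


def load_csv_from_stage(session: Session, stage_path: str, target_table: str) -> str:
    """Handler: read CSV files under stage_path and append them to target_table."""
    schema = StructType([
        StructField("id", IntegerType()),
        StructField("brand", StringType()),
        StructField("amount", StringType()),
    ])
    df = (session.read
                 .schema(schema)
                 .option("skip_header", 1)
                 .csv(stage_path))  # e.g. "@RAW.PUBLIC.S3_STAGE/brand_x/"
    df.write.mode("append").save_as_table(target_table)
    return f"loaded rows into {target_table}"


def register(session: Session) -> None:
    # Register the handler as a permanent stored procedure so it can be invoked
    # with CALL LOAD_CSV_FROM_STAGE('<stage path>', '<table>') from SQL or Airflow.
    session.sproc.register(
        func=load_csv_from_stage,
        name="LOAD_CSV_FROM_STAGE",
        packages=["snowflake-snowpark-python"],
        is_permanent=True,
        stage_location="@RAW.PUBLIC.CODE_STAGE",
        replace=True,
    )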

Environment: Snowflake, AWS S3 storage, Airflow, Python, git, Dataiku, FiveTran.
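
A simplified sketch of the schema-validation idea mentioned above: compare the columns of a producer (source) table against a consumer (target) table via INFORMATION_SCHEMA using the Snowflake Python connector. Connection parameters, database names, and table names are placeholders; the actual framework also covered file-to-table validation.

import snowflake.connector


def table_columns(cur, database: str, schema: str, table: str) -> dict:
    """Return {column_name: data_type} for one table from INFORMATION_SCHEMA."""
    cur.execute(
        f"""
        SELECT column_name, data_type
        FROM {database}.INFORMATION_SCHEMA.COLUMNS
        WHERE table_schema = %s AND table_name = %s
        ORDER BY ordinal_position
        """,
        (schema, table),
    )
    return {name: dtype for name, dtype in cur.fetchall()}


def validate(conn, producer: tuple, consumer: tuple) -> list:
    """Return human-readable mismatches (missing columns, data-type drift)."""
    cur = conn.cursor()
    src = table_columns(cur, *producer)
    dst = table_columns(cur, *consumer)
    issues = []
    for col, dtype in src.items():
        if col not in dst:
            issues.append(f"column {col} missing in consumer table")
        elif dst[col] != dtype:
            issues.append(f"column {col}: producer {dtype} vs consumer {dst[col]}")
    return issues


if __name__ == "__main__":
    conn = snowflake.connector.connect(
        account="xy12345", user="ETL_USER", password="***",
        warehouse="ETL_WH", role="ETL_ROLE")
    problems = validate(conn, ("RAW", "PUBLIC", "ORDERS_SRC"),
                        ("ANALYTICS", "PUBLIC", "ORDERS"))
    print("\n".join(problems) or "schemas match")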

Capgemini (Client: The Hartford) Sept 2022 – Aug 2023

AWS Snowflake – Created new pipelines using Python, Snowflake, and EMR in an AWS environment.

Location – Remote, TX

Role – Principal Data Engineer

Responsibilities:

• Worked for The Hartford on various use cases related to Auto and Property insurance.

• Migrated Java and Python applications to an AWS EMR cluster.

• Created new pipelines in Python and Snowflake for the billing bot and SMS cancellation.

• Created a framework to process call-center data arriving in zip files, ingesting and flattening it so that it is accessible in a structured way in Snowflake tables (see the sketch after this section).

• Worked on new use cases to ingest and process data for reporting call volume.

• Created Snowpark stored procedures in Python for ingesting data from external stages.

• Created pipelines to measure the effectiveness of the bots.

• Used Liquibase to create and update the DDL.

• Used Autosys for orchestration.

• Used Git for version control.

Environment: AWS Cloud, EMR, Snowflake, S3 storage, Python, Java, Autosys, Liquibase, Jenkins and uDeploy for CI/CD pipelines, Git.
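
A simplified illustration of the zip-file ingestion described above: extract JSON call records from a zip archive, flatten the nested payloads, and write a CSV that can then be copied into a Snowflake table. The archive layout and field names are assumptions for the sketch, not the actual call-center feed.

import csv
import json
import zipfile


def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts: {"caller": {"id": 1}} -> {"caller_id": 1}."""
    out = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name, sep))
        else:
            out[name] = value
    return out


def zip_to_csv(zip_path: str, csv_path: str) -> int:
    """Extract every *.json member, flatten its records, and write one CSV."""
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith(".json"):
                with archive.open(member) as fh:
                    rows.extend(flatten(r) for r in json.load(fh))
    fieldnames = sorted({k for r in rows for k in r})
    with open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    print(zip_to_csv("call_center_2023_01.zip", "call_center_flat.csv"), "rows written")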

Christus Health Jan 2020 – Aug 2022

AWS Snowflake – Worked on various use cases for loading data to Snowflake.

Location – Irving, TX

Role – Principal Data Engineer

Responsibilities:

• Created a data pipeline in Snowflake using Python scripts to load CSV and JSON files.

• Created new pipelines for patient data using Python, Snowflake, Snowpipe, and Snowpark.

• Created a framework for schema validation between producer files and consumer tables.

• Used Snowflake Snowpark to stitch data between different tables for reporting and data analysis.

• Used AWS Glue and Lambda to load JSON files into the Snowflake database: Lambda determines the file type and moves files to the correct folders, and AWS Glue reads, transforms, and loads the data into Snowflake (see the Lambda routing sketch after this list).

• Used AWS Step Functions for orchestration.

• Ingested streaming data from Kafka into Snowflake. Used the Snowflake Sink connector, configured to automate the creation of the tables, stages, and Snowpipes that ingest the JSON messages into Snowflake tables.

• Also created Snowpipes to load data directly from the S3 bucket (a Snowpipe setup sketch follows the Environment line below).
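
A minimal sketch of the Lambda routing step referenced above: triggered by an S3 put event, it inspects the object's extension and copies the file into the prefix that the downstream Glue job or Snowpipe watches. Bucket layout, prefixes, and routing rules are placeholders.

from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

# Route each file type to the folder the downstream loader expects.
ROUTES = {".json": "incoming/json/", ".csv": "incoming/csv/"}


def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        suffix = key[key.rfind("."):].lower() if "." in key else ""
        prefix = ROUTES.get(suffix, "incoming/other/")
        dest = prefix + key.split("/")[-1]
        # Copy into the routed prefix, then remove the original landing object.
        s3.copy_object(Bucket=bucket, Key=dest,
                       CopySource={"Bucket": bucket, "Key": key})
        s3.delete_object(Bucket=bucket, Key=key)
    return {"status": "ok"}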

Environment: Apache Spark, Snowflake, Python, AWS Glue, Lambda, Step Functions, CloudFormation, Kafka.
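
As referenced in the last bullet, a minimal sketch of creating a Snowpipe that auto-loads JSON files from an S3 stage into a VARIANT landing table, executed through the Snowflake Python connector. The object names, storage integration, and bucket URL are placeholders; in practice the pipe's notification channel is then wired to the bucket's S3 event notifications.

import snowflake.connector

DDL = [
    "CREATE FILE FORMAT IF NOT EXISTS RAW.PUBLIC.JSON_FMT TYPE = JSON",
    """CREATE STAGE IF NOT EXISTS RAW.PUBLIC.CALLS_STAGE
         URL = 's3://example-bucket/calls/'
         STORAGE_INTEGRATION = S3_INT
         FILE_FORMAT = RAW.PUBLIC.JSON_FMT""",
    "CREATE TABLE IF NOT EXISTS RAW.PUBLIC.CALLS_RAW (payload VARIANT)",
    """CREATE PIPE IF NOT EXISTS RAW.PUBLIC.CALLS_PIPE AUTO_INGEST = TRUE AS
         COPY INTO RAW.PUBLIC.CALLS_RAW
         FROM @RAW.PUBLIC.CALLS_STAGE
         FILE_FORMAT = (FORMAT_NAME = 'RAW.PUBLIC.JSON_FMT')""",
]


def main() -> None:
    conn = snowflake.connector.connect(
        account="xy12345", user="ETL_USER", password="***",
        warehouse="ETL_WH", role="ETL_ROLE")
    try:
        cur = conn.cursor()
        for stmt in DDL:
            cur.execute(stmt)
    finally:
        conn.close()


if __name__ == "__main__":
    main()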

Christus Health March 2018 – Jan 2020

AWS Databricks – Designed and developed ETL pipelines in Databricks.

Location – Irving, TX

Role – Principal Data Engineer

Responsibilities:

• Created ETL pipelines in Databricks for various use cases.

• Used the Delta Lake library for Python to create Delta tables.

• Created Delta Live Tables (DLT) pipelines in Databricks.

• Ingested data from CSV files and created tables with automatic schema evolution.

• Used Auto Loader to process new files from cloud storage for incremental loads (see the sketch after this section).

• Used Databricks Workflows for orchestration.

Environment: Apache Spark, AWS, Databricks, Python.
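
A minimal sketch of the Delta Live Tables / Auto Loader pattern referenced above: Auto Loader incrementally picks up new CSV files from cloud storage with schema evolution, and a second table deduplicates and timestamps the records. The S3 path, table names, and the encounter_id column are placeholders; dlt and spark are supplied by the Databricks DLT runtime, so this only runs inside a DLT pipeline.

import dlt
from pyspark.sql import functions as F

SOURCE_PATH = "s3://example-bucket/landing/encounters/"


@dlt.table(comment="Raw records ingested incrementally via Auto Loader")
def encounters_raw():
    # `spark` is injected by the DLT runtime; cloudFiles = Auto Loader source.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", "true")
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
        .load(SOURCE_PATH)
    )


@dlt.table(comment="Deduplicated records with a load timestamp")
def encounters_clean():
    return (
        dlt.read_stream("encounters_raw")
        .dropDuplicates(["encounter_id"])
        .withColumn("_loaded_at", F.current_timestamp())
    )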

NIIT Technologies (Client: Sabre Holdings) June 2012 – March 2018

Data Pipeline project – Designed and developed the Data Pipeline framework for ETL.

Role – Principal Data Engineer

Responsibilities:

• Designed and created a generic data-pipeline library using Apache Spark and Python in a Big Data (Cloudera) environment.

• Designed and created a reusable, generic PySpark application that can read and transform data from any source and save it to the HDFS file system (a configuration-driven sketch follows this section). Some of its features are listed below:

1) The framework is generic and can read from any source.

2) Supports multiple file formats such as CSV, TXT, XML, JSON, and XLS.

3) Handles different types of transformations (dedupe, arithmetic operations, adding new columns, lookups, filtering, etc.) in a generic way.

4) Saves the data as Parquet on the Hadoop file system.

5) The saved data can be accessed from Hive by creating external tables over the Parquet file locations in HDFS.

6) Can perform operations such as append, merge, or truncate-and-save.

7) Most cases can be handled by creating a configuration file alone, without any coding.

8) Capable of filtering the files to be loaded on a case-by-case basis.

9) Sends notifications if a file fails to load.

10) Logs exceptions and failures.

11) Capable of retrieving and processing data using Hive, Impala, or any SQL database.

12) Created a streaming application using Java and Flume; the Avro files produced by Flume are processed by a Java consumer application and saved as Parquet files.

13) Ingested streaming data from Flume.

Environment: ServiceNow, Java JDK 1.8, Apache Spark, Apache Hadoop (Cloudera), HDFS, Pig, Hive, Shell Scripting, MySQL, Linux, Flume, IBM MQ Java library, Tidal.
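
A highly simplified sketch of the configuration-driven idea behind the generic pipeline described above: a JSON config names the source, format, transformation steps, and target, and a single driver applies them without job-specific code. The config keys, paths, and expressions are hypothetical; the actual framework supported many more formats, transformations, and save modes.

import json

from pyspark.sql import SparkSession, functions as F


def run(config_path: str) -> None:
    with open(config_path) as fh:
        cfg = json.load(fh)
    spark = SparkSession.builder.appName(cfg.get("name", "generic-pipeline")).getOrCreate()

    # 1. Read: the source format and reader options come entirely from the config.
    df = (spark.read.options(**cfg.get("read_options", {}))
                    .format(cfg["source_format"])
                    .load(cfg["source_path"]))

    # 2. Transform: apply each configured step in order.
    for step in cfg.get("transforms", []):
        if step["type"] == "dedupe":
            df = df.dropDuplicates(step.get("keys"))
        elif step["type"] == "filter":
            df = df.filter(step["condition"])          # e.g. "amount > 0"
        elif step["type"] == "add_column":
            df = df.withColumn(step["name"], F.expr(step["expression"]))

    # 3. Write: Parquet on HDFS, with the save mode (append/overwrite) from the config.
    df.write.mode(cfg.get("mode", "append")).parquet(cfg["target_path"])

    spark.stop()


if __name__ == "__main__":
    run("pipeline_config.json")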

NIIT Technologies (Client: Sabre Holdings) April 2007 – June 2012

Location – Southlake, TX

Role – Principal Developer

I worked on various projects from April 2007 until 2014; the details of each project are given below.

MySabre – an airline Internet booking engine for travel agencies. It allows agents to check flight availability and prices and to confirm an itinerary by creating a PNR. It also provides many more features, such as a graphical seat map for selecting and booking seats online and a graphical view for connecting to multiple hosts.

AirGUI – the next-generation version of MySabre, with similar and additional airline search features, developed in Flex using the PureMVC and Spring frameworks.

Self Service Check-in and Admin Tool application – the Admin Tool is a configuration tool. The three applications (Web Check-in, Mobile Check-in, and Kiosk) are administered and configured through the Self Service Admin Tool. Each airline has a separate profile that defines where a kiosk device is located, which features are available to the user, and how it functions (e.g., default language, timeouts, printing of bag tags). Similarly, each airline has separate profiles for the Web Check-in and Mobile Check-in applications. Based on the Admin Tool user's role, profiles can be added, modified, or deleted. The tool is developed using the Flex PureMVC framework, and remote connectivity between the client and server side is handled using BlazeDS.

Self Service Check-in – a check-in application for airline passengers. A passenger can check in via the Web application or at the airport via the Kiosk application. Both applications are configurable and are administered through the Self Service Admin Tool. Broadly, the applications provide options for seat selection and baggage check-in and generate a boarding pass, so a checked-in passenger can proceed straight to the boarding gate.

Environment: Flex Builder 3.0 & SDK 3.0, JDK 1.5.2, JAXB 2.0, Eclipse 3.2, Oracle 9i, Spring framework, PVCS Tracker, ClearCase, Subversion 2.0, JUnit, FlexUnit, FindBugs code-review plugin 1.3.3, Jetty 6.1, JBoss 4.2, Maven 2.0, REST API.

From January 1998 until April 2007, I worked with NIIT Technologies for various clients in the Finance and Airlines domains.

Clients: Virgin America, Sabre Holdings, Toyota, System and Software, Home Services, Tokyo Stock Exchange.

Environment: JDK 1.4.1, JAAS, XML, XSL, JSP, SOAP web services, REST API, Applets, JDBC, HTML, JavaScript, Tomcat, Oracle 9i, IntelliJ 3.0, Windows 2000 Professional, PVCS Tracker, ClearCase.

Education:

Degree / Diploma – College / University

Masters (IT) – Karnataka University (INDIA)

Bachelor of Commerce – Delhi University (INDIA)

3.5-year Diploma in Software Development – NIIT (INDIA)


