
Data Engineer Azure Cloud

Location:
North Fayette Township, PA, 15071
Posted:
May 09, 2025

Resume:

CHETAN BEZAWADA

****************@*****.*** 412-***-****

Summary

Over 9 years of Big Data experience building highly scalable data analytics applications.

Strong experience working with Hadoop ecosystem components like HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.

Good hands-on experience with various Hadoop distributions like Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.

Good understanding of distributed systems architecture and the design principles behind parallel computing.

Expertise in developing production-ready Spark applications utilizing Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs.

Experience working with the Azure cloud and its services, including Azure Data Factory (ADF), Azure Synapse Analytics, Snowflake, ETL/ELT, and Azure Databricks.

Experience working with Amazon Web Services (AWS) and its services, including EC2, S3, RDS, EMR, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, Aurora, Glue, Athena, and other AWS services.

Deep understanding of performance tuning and partitioning for building scalable data lakes.

Worked on building real-time data workflows using Kafka, Spark Streaming, and HBase.

Extensive knowledge of NoSQL databases like HBase.

Solid experience working with CSV, text, SequenceFile, Parquet, ORC, and JSON data formats.

Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.

Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.

Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.

Strong understanding of the Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).

Skills

Scala

PySpark

Python

SQL

Hadoop

HDFS

Hive

Sqoop

Spark

Kafka

Microsoft Tools

Oracle

MySQL

HBase

Spark SQL

Spark Streaming

Impala

Text

XML

JSON

CSV

ORC

AWS

Azure

Microsoft Power BI

Experience

Priceline Dover, DE

Sr. Data Engineer

07/2023 - Current

Designed a data model using Draw.io based on customer requirements

Deployed resource groups using Azure services and Cloud Shell in Azure

Created a Cosmos DB account and loaded data from Azure Data Lake into Cosmos DB using Azure Data Factory (ADF)

Used Azure Functions to read data from Blob Storage containers and deliver the output to Cosmos DB

Consolidated all customer interactions and data into a single, comprehensive profile, enabling more personalized and targeted marketing efforts

Utilized Microsoft CDP tools for data integration, management, and analysis, and leveraged advanced features of Microsoft CDP to optimize customer data workflows and processes

Utilized Microsoft's analytics tools, including Power BI and Azure Synapse, to derive actionable insights from customer data and support data-driven decision-making for customer insights

Integrated Microsoft CDP with other enterprise systems and third-party applications to extend functionality and streamline business processes

Developed an integration solution to move data from AWS to Azure using Azure Data Factory (ADF) S3 connectors and loaded it into Synapse dedicated SQL pools and Snowflake

Built and deployed production Azure data infrastructure for various Azure cloud services using Terraform as Infrastructure as Code (IaC)

Created Databricks notebooks to load .csv files from ADLS and convert them to Delta tables (a minimal PySpark sketch follows this section)

Created linked services to connect to Azure Blob Storage and Azure Synapse database tables, and used these linked services when creating datasets

Created managed tables, external tables, temp views, and global temp views using Spark SQL

Developed production-ready Spark applications utilizing Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs

Reviewed technical specifications and design documents associated with the data warehouse

Worked on dedicated SQL pool workload management, including workload groups and concurrency performance

Scaled dedicated SQL pools up and down in lower and production environments during non-business hours

Used GitHub to promote code from one environment to another

Environment: Microsoft Customer Data Platform (CDP), Azure Cloud, Azure Data Factory (ADF), Azure Synapse Analytics, Snowflake, ETL/ELT, Azure Databricks, Python, SQL, T-SQL, GraphQL, stored procedures, Azure Functions, Azure SQL, SQL Server, SSIS, SSRS, MongoDB, Cosmos DB, AWS S3, Azure Storage (Blobs, Queues, Files), Azure Logic Apps
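
For illustration, a minimal PySpark sketch of the ADLS-to-Delta notebook pattern described above; the storage account, container, file, and table names are hypothetical placeholders, and the cluster is assumed to already have ADLS credentials configured.

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook

# Hypothetical ADLS Gen2 source path; replace account/container/folder with real values
raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/customers/customers.csv"

# Read the .csv file with a header row and inferred schema, stamping an ingestion time
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(raw_path)
      .withColumn("ingested_at", current_timestamp()))

# Write the data out as a Delta table (assumes the 'processed' database already exists)
df.write.format("delta").mode("overwrite").saveAsTable("processed.customers")

# Temp view for ad hoc Spark SQL in the same session
df.createOrReplaceTempView("customers_vw")
spark.sql("SELECT COUNT(*) AS row_count FROM customers_vw").show()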

HGS Digital Chicago, IL

Data Engineer

09/2022 – 05/2023

Delivered Azure cloud solutions, with strong experience in Azure Databricks notebooks using PySpark, Spark SQL, Azure Synapse, Azure Data Factory, and Spark

Spent extensive time designing, developing, debugging, administering, monitoring, and tuning Teradata and Netezza databases

Worked with the Azure Portal, Azure Blob Storage, Azure Data Factory, Azure Data Lake Gen2, Delta Lake, and dedicated SQL pools

Created data pipelines in Azure Data Factory and scheduled jobs using ADF triggers

Extracted raw data in the form of CSV, JSON, and Parquet files using DataFrames and the DataFrameReader and DataFrameWriter APIs

Transformed the data using PySpark and Spark SQL notebooks in Azure Databricks

Converted notebook code to use Delta Lake instead of Data Lake

Mounted Azure Data Lake Storage (ADLS Gen2) to Azure Databricks at different paths (raw, processed, and presentation layers) to read the data efficiently and securely using Azure Key Vault and a Databricks secret scope (a mount sketch follows this section)

Designed and developed ETL solutions through mappings and workflows to collect data from multiple data sources based on mapping specifications and load it into Teradata, Netezza, and the Azure cloud

Performed unit testing to ensure the code's functionality matched the requirement specification

Environment: Azure Cloud, Azure Data Factory (ADF), Azure Synapse Analytics, Snowflake, ETL/ELT, Azure Databricks, Python, SQL, T-SQL, stored procedures, Azure Functions, Azure SQL, SQL Server, SSIS, SSRS, MongoDB, Cosmos DB, AWS S3, Azure Storage (Blobs, Queues, Files), Azure Logic Apps
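
For illustration, a hedged sketch of the ADLS Gen2 mount pattern referenced above, using a Key Vault-backed Databricks secret scope; the scope, secret, storage account, and container names are placeholders, and dbutils is only available inside a Databricks notebook.

# Fetch the service principal credentials from the Key Vault-backed secret scope
client_id = dbutils.secrets.get(scope="kv-scope", key="sp-client-id")
client_secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")
tenant_id = dbutils.secrets.get(scope="kv-scope", key="sp-tenant-id")

# OAuth configuration for ABFS access with a service principal
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": client_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token",
}

# Mount each layer (raw, processed, presentation) at its own path if not already mounted
for layer in ["raw", "processed", "presentation"]:
    mount_point = "/mnt/" + layer
    if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.mount(
            source="abfss://" + layer + "@examplestorageacct.dfs.core.windows.net/",
            mount_point=mount_point,
            extra_configs=configs,
        )

# Example read from the mounted raw layer
raw_df = spark.read.option("header", "true").csv("/mnt/raw/sales/")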

Couth Infotech Pvt. Ltd, India

Data Engineer/Hadoop Developer

12/2019 - 04/2022

Worked on building a centralized data lake on the AWS cloud utilizing primary services like S3, EMR, Redshift, and Athena

Migrated datasets and ETL workloads from on-prem systems to AWS cloud services

Designed and implemented ETL pipelines using PySpark, integrating with various data sources such as databases, APIs, and file systems

Worked extensively with AWS services like EC2, S3, VPC, AppFlow, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS

Developed Python scripts to parse XML and JSON files and load the data into the Snowflake data warehouse on AWS (a parsing-and-load sketch follows this section)

Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like S3 (Parquet/text files) into AWS Redshift

Developed BI data lake POCs using AWS services including Athena, S3, EC2, and AWS QuickSight

Developed and deployed a microservice application utilizing much of the AWS stack, including S3, EC2, and DynamoDB

Applied required transformations using AWS Glue and loaded the data back into Redshift and S3

Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud

Worked on a full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption

Designed and developed data integration/engineering workflows on big data technologies and platforms - Hadoop, Spark, MapReduce, Hive, HBase

Imported data into HDFS and Hive using Sqoop, created Hive tables, loaded them with data, and wrote Hive queries

Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns

Excellent communication skills and ability to explain complex technical concepts to non-technical stakeholders

Experience in designing and developing Power BI dashboards, reports, and visualizations using different data sources

Created Hive external tables on top of datasets loaded into S3 buckets and wrote various Hive scripts to produce a series of aggregated datasets for downstream analysis

Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift (a streaming sketch follows this section)

Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Power BI, Scala, Python, Hive, Kafka
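
For illustration, a minimal sketch of the JSON-parse-and-load step mentioned above, assuming the snowflake-connector-python package; the file name, connection values, and table names are placeholders.

import json
import snowflake.connector

# Parse a hypothetical JSON extract into (event_id, campaign, amount) tuples
with open("campaign_events.json") as fh:
    events = json.load(fh)
rows = [(e["event_id"], e["campaign"], e["amount"]) for e in events]

# Connect to Snowflake (placeholder account, credentials, and objects)
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS CAMPAIGN_EVENTS "
        "(EVENT_ID STRING, CAMPAIGN STRING, AMOUNT NUMBER(18,2))"
    )
    # The connector's default paramstyle is pyformat, so %s placeholders are used
    cur.executemany(
        "INSERT INTO CAMPAIGN_EVENTS (EVENT_ID, CAMPAIGN, AMOUNT) VALUES (%s, %s, %s)",
        rows,
    )
    conn.commit()
finally:
    conn.close()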
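
For illustration, a hedged sketch of the real-time pipeline pattern above, written with Spark Structured Streaming; the broker, topic, schema, and Redshift connection details are placeholders, and the Kafka source package plus a Redshift-compatible JDBC driver are assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("clickstream-to-redshift").getOrCreate()

# Expected shape of the JSON messages on the topic (illustrative schema)
schema = StructType([
    StructField("user_id", StringType()),
    StructField("page", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic and parse the JSON payload into columns
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_redshift(batch_df, batch_id):
    # Micro-batch sink: append each batch to a Redshift staging table over JDBC
    (batch_df.write
     .format("jdbc")
     .option("url", "jdbc:redshift://example-cluster:5439/analytics")
     .option("dbtable", "staging.clickstream_events")
     .option("user", "etl_user")
     .option("password", "********")
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_to_redshift)
         .option("checkpointLocation", "s3://example-bucket/checkpoints/clickstream/")
         .start())
query.awaitTermination()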

Maisa Solutions Private Limited Hyderabad, India

Data Engineer

05/2017 - 10/2019

Performed analytics on AWS S3 data using Spark, applying transformations and actions as per business requirements

Developed and managed ETL jobs and data pipelines

Optimized PySpark jobs for performance and scalability, including tuning Spark configurations and leveraging Spark RDDs and DataFrames

Applied strong Python programming skills, using libraries such as NumPy, pandas, and Matplotlib for data analysis and visualization

Tuned Hive jobs by applying partitioning, bucketing, and optimized joins on Hive tables

Performed data transformations and analytics on a large dataset using Spark

Excellent problem-solving and analytical skills, and ability to work in a fast-paced and collaborative environment

Built end-to-end data pipelines using Spark and stored the final data in a data warehouse

Designed Hive schemas and designed and developed normalized and denormalized data models

Tuned and optimized Spark jobs with partitioning/bucketing and driver and executor memory management

Worked on data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark

Developed PySpark code for AWS Glue jobs and for EMR (a Glue job sketch follows this section)

Created complex SQL queries to extract data from databases and other data sources for Tableau

Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory

Collaborated with cross-functional teams to understand business requirements and design Tableau solutions that meet those needs

Developed Hive queries and used Sqoop to move data from RDBMS into the data lake staging area

Optimized Hive queries using various file formats like Parquet, JSON, and Avro

Wrote shell scripts to export log files to the Hadoop cluster through an automated process

Involved in the planning process of iterations under the Agile Scrum methodology

Environment: Spark, AWS S3, Python, Sqoop, Azure Cloud, Azure Synapse, ADF, Hive, Kafka, Hadoop, HDFS, Agile, UHC, AWS Redshift, AWS QuickSight, Snowflake
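
For illustration, a hedged sketch of a Glue PySpark job along the lines described above: read a cataloged dataset, aggregate it, and write partitioned Parquet back to S3. The database, table, column, and bucket names are placeholders, and the awsglue modules are only available in the Glue job runtime.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source table registered in the Glue Data Catalog (hypothetical names)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="adobe_raw", table_name="web_events")

# Aggregate daily activity per campaign using Spark SQL functions
daily = (dyf.toDF()
         .groupBy("event_date", "campaign_id")
         .agg(F.count("*").alias("events"),
              F.sum("revenue").alias("revenue")))

# Write partitioned Parquet back to S3 for downstream consumption
(daily.write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3://example-curated-bucket/adobe/daily_campaign/"))

job.commit()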

Asistmi solutions Pvt. Ltd, India

SQL Developer (Intern)

06/2015 - 04/2017

Involved in interacting with the end-user (client) to gather business requirements and modelling the business requirements

Wrote SQL queries using DDL, DML, joins, aggregations, and window functions

Created database objects like tables, views, sequences, synonyms, stored procedures, functions, and triggers

Developed database triggers to enforce data integrity and additional referential integrity

Created indexes on table columns to increase query performance

Developed and modified SQL code to make new enhancements or resolve problems as per the customer requirements

Used a test-driven approach to develop the application and implemented unit tests using Python's unittest framework (a unit-test sketch follows this section)

Ensured complete data integrity

Worked on report writing using SQL Server Reporting Services (SSRS), creating various report types such as table, matrix, and chart reports, as well as web reporting by customizing URL access

Performed API testing with the Postman tool, issuing GET, POST, PUT, and DELETE requests against each URL to check responses and error handling

Analyzed tables and indexes for performance tuning

Applied knowledge of normalization, data warehousing, data transfer, documentation, preventive maintenance, code review, automation, stored procedures, and triggers

Coordinated with the front-end design team to provide the necessary stored procedures, packages, and data, and wrote complex ad hoc queries as requested by users and management

Applied good knowledge of indexes and query optimization for large-volume data; performed troubleshooting and problem-solving using Python scripts and other programming languages

Performed SQL database sharding and indexing procedures as required to handle heavy traffic loads

Performed database tuning, simplified complex SQL statements to make them easy to read, and made queries more efficient

Created Dashboards with interactive views, trends, and drill downs

Environment: MySQL, SQL server, Power BI, ETL-Tools, PostgreSQL, Python
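
For illustration, a small sketch of the test-driven approach mentioned above using Python's unittest framework; the function under test (normalize_phone) is a hypothetical example, not taken from the project.

import unittest

def normalize_phone(raw):
    # Hypothetical helper: strip formatting and return a 10-digit US phone number
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError("invalid phone number: " + repr(raw))
    return digits

class NormalizePhoneTests(unittest.TestCase):
    def test_strips_formatting(self):
        self.assertEqual(normalize_phone("(412) 555-0199"), "4125550199")

    def test_drops_leading_country_code(self):
        self.assertEqual(normalize_phone("+1 412 555 0199"), "4125550199")

    def test_rejects_short_numbers(self):
        with self.assertRaises(ValueError):
            normalize_phone("555-0199")

if __name__ == "__main__":
    unittest.main()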

Education and Training

Robert Morris University

Master's in Management Information Systems

05/2023

Saveetha Engineering College, Anna University

Bachelor's in Computer Science

01/2017

Area Of Expertise

Scala, PySpark, Python, SQL, Hadoop, HDFS, Hive, Sqoop, Spark, Kafka, Microsoft Tools, Oracle, MySQL, HBase, Spark SQL, Spark Streaming, Impala, Text, XML, JSON, CSV, ORC, AWS, Azure, Microsoft Power BI


