CHETAN BEZAWADA
****************@*****.*** 412-***-****
Summary
9+ years of Big Data experience in building highly scalable data analytics applications.
Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume and Kafka.
Good hands-on experience with various Hadoop distributions, including Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
Good understanding of distributed systems architecture and the design principles behind parallel computing.
Expertise in developing production-ready Spark applications utilizing Spark Core, DataFrames, Spark SQL, Spark ML and Spark Streaming APIs.
Experience working with Azure cloud and its services, including Azure Data Factory (ADF), Azure Synapse Analytics, Azure Databricks, Snowflake and ETL/ELT workloads.
Experience working with Amazon Web Services (AWS) and its services, including EC2, S3, RDS, EMR, Redshift, Glue, Athena, VPC, IAM, Elastic Load Balancing, Lambda, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS and Aurora.
Deep understanding of performance tuning and partitioning for building scalable data lakes.
Worked on building real-time data workflows using Kafka, Spark Streaming and HBase.
Extensive knowledge of NoSQL databases like HBase.
Solid experience working with CSV, text, Sequence, Parquet, ORC and JSON data formats.
Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and storing data.
Development experience with RDBMS, including writing SQL queries, views, stored procedures and triggers.
Strong understanding of the Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
Skills
Scala
PySpark
Python
SQL
Hadoop
HDFS
Hive
Sqoop
Spark
Kafka
Microsoft Tools
Oracle
MySQL
HBase
Spark SQL
Spark Streaming
Impala
Text
XML
JSON
CSV
ORC
AWS
Azure
Microsoft Power BI
Experience
Priceline Dover, DE
Sr. Data Engineer
07/2023 - Current
Designed a data model using Draw.io based on customer requirements
Deployed resource groups using Azure services and Cloud Shell in Azure
Created a Cosmos DB instance and loaded data from Azure Data Lake into Cosmos DB using Azure Data Factory (ADF)
Used Azure Functions to read data from Blob storage containers and deliver the output to Cosmos DB
Consolidated all customer interactions and data into a single, comprehensive profile, enabling more personalized and targeted marketing efforts
Utilized Microsoft CDP tools for data integration, management and analysis, leveraging advanced Microsoft CDP features to optimize customer data workflows and processes
Utilized Microsoft's analytics tools, including Power BI and Azure Synapse, to derive actionable insights from customer data and support data-driven decision-making for customer insights
Integrated Microsoft CDP with other enterprise systems and third-party applications to extend functionality and streamline business processes
Developed an integration solution to bring data from AWS into Azure using Azure Data Factory (ADF) S3 connectors, loading it into Synapse dedicated SQL pools and Snowflake
Built and deployed to production the Azure data infrastructure for various Azure cloud services using Terraform as Infrastructure as Code (IaC)
Created Databricks notebooks to load .csv files from ADLS and convert them to Delta tables (see the sketch below)
Created linked services to connect to Azure Blob Storage and Azure Synapse database tables, and used these linked services while creating datasets
Created managed tables, external tables, temp views and global temp views using Spark SQL
Developed production-ready Spark applications utilizing Spark Core, DataFrames, Spark SQL, Spark ML and Spark Streaming APIs
Reviewed technical specifications and design documents associated with the data warehouse
Configured Dedicated SQL pool workload management, including workload groups and concurrency, to improve performance
Scaled the Dedicated SQL pools in lower and production environments up and down during non-business hours
Used GitHub to promote code from one environment to another
Environment: Microsoft Customer Data Platform (CDP), Azure Cloud, Azure Data Factory (ADF), Azure Synapse Analytics, Snowflake, ETL/ELT, Azure Databricks, Python, SQL, T-SQL, GraphQL, Stored Procedures, Azure Functions, Azure SQL, SQL Server, SSIS, SSRS, MongoDB, Cosmos DB, AWS S3, Azure Storage, Blobs, Queues, Files, Azure Logic Apps
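A minimal sketch of the kind of Databricks notebook logic referenced above for loading .csv files from ADLS into Delta tables; the storage account, container, path and table name are hypothetical placeholders, not the actual project values.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-delta-sketch").getOrCreate()

    # Hypothetical ADLS Gen2 location; in Databricks this could also be a /mnt mount point
    raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/customers/*.csv"

    # Read the raw .csv files with a header row, letting Spark infer column types
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(raw_path))

    # Persist as a managed Delta table for downstream Synapse / reporting workloads
    (df.write
       .format("delta")
       .mode("overwrite")
       .saveAsTable("curated.customers"))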
HGS Digital Chicago, IL
Data Engineer
09/2022 - 05/2023
Delivered Azure cloud solutions with strong experience in Azure Databricks notebooks using PySpark, Spark SQL, Azure Synapse, Azure Data Factory and Spark
Designed, developed, debugged, administered, monitored and tuned Teradata and Netezza databases
Experience in Azure Portal, Azure Blob Storage, Azure Data Factory, Azure Data Lake Gen2, Azure Delta Lake & Dedicated SQL pools
Created data pipelines in Azure Data Factory and scheduled jobs using triggers in ADF
Extracted raw data in the form of CSV, JSON and Parquet files using DataFrames and the DataFrameReader and DataFrameWriter APIs
Transformed the data using PySpark and Spark SQL notebooks in Azure Databricks
Converted notebook code to use Delta Lake instead of the raw Data Lake
Mounted Azure Data Lake Storage (ADLS Gen2) to Azure Databricks at separate paths (raw, processed and presentation layers) to read data efficiently and securely using Azure Key Vault and a Databricks secret scope (see the sketch below)
Designed and developed ETL solutions through mappings and workflows to collect data from multiple data sources based on mapping specifications and load it into Teradata, Netezza and the Azure cloud
Performed unit testing to ensure that the code's functionality matches the requirement specification
Environment: Azure Cloud, Azure Data Factory (ADF), Azure Synapse Analytics, Snowflake, ETL/ELT, Azure Databricks, Python, SQL, T-SQL, Stored Procedures, Azure Functions, Azure SQL, SQL Server, SSIS, SSRS, MongoDB, Cosmos DB, AWS S3, Azure Storage, Blobs, Queues, Files, Azure Logic Apps
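A minimal sketch of the ADLS Gen2 mount pattern described above, using OAuth credentials pulled from a Key Vault-backed Databricks secret scope; the scope name, secret keys, storage account and container are hypothetical, and dbutils is only available inside a Databricks workspace.

    # Runs inside a Databricks notebook, where dbutils is predefined
    tenant_id = dbutils.secrets.get(scope="kv-scope", key="tenant-id")

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope="kv-scope", key="client-id"),
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="kv-scope", key="client-secret"),
        "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

    # One mount per layer (raw / processed / presentation), each readable via /mnt/<layer>
    dbutils.fs.mount(
        source="abfss://raw@examplestorage.dfs.core.windows.net/",
        mount_point="/mnt/raw",
        extra_configs=configs,
    )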
Couth Infotech Pvt. Ltd, India
Data Engineer/Hadoop Developer
12/2019 - 04/2022
Worked on building centralized Data lake on AWS Cloud utilizing primary services like S3, EMR, Redshift and Athena
Worked on migrating datasets and ETL workloads from On-prem to AWS Cloud services
Expertise in designing and implementing ETL pipelines using PySpark, and integrating with various data sources such as databases, APIs, and file systems
Worked extensively with AWS services like EC2, S3, VPC, AppFlow, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS
Developed Python scripts to parse XML and JSON files and load the data into the Snowflake data warehouse on AWS
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (Parquet/text files) into AWS Redshift
Developed BI data lake POCs using AWS services including Athena, S3, EC2 and Amazon QuickSight
Developed and deployed a microservice application utilizing much of the AWS stack, including S3, EC2 and DynamoDB
Applied the required transformations using AWS Glue and loaded the data back into Redshift and S3
Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to cloud
Worked on a full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption
Designed and Developed data integration/engineering workflows on big data technologies and platforms - Hadoop, Spark, MapReduce, Hive, HBase
Involved in importing data into HDFS and Hive using Sqoop and involved in creating Hive tables, loading with data, and writing Hive queries
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
Excellent communication skills and ability to explain complex technical concepts to non-technical stakeholders
Experience in designing and developing Power BI dashboards, reports, and visualizations using different data sources
Created Hive external tables on top of datasets loaded in S3 buckets and created various hive scripts to produce a series of aggregated datasets for downstream analysis
Built a real-time streaming pipeline utilizing Kafka, Spark Streaming and Redshift (see the sketch below)
Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Power BI, Scala, Python, Hive, Kafka
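A minimal sketch of the kind of real-time pipeline described above: reading events from Kafka with Spark Structured Streaming and staging micro-batches to S3 for a downstream Redshift COPY; the broker address, topic, schema and bucket paths are hypothetical, and the Kafka connector package must be on the Spark classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

    # Hypothetical event schema for the JSON payloads on the topic
    schema = (StructType()
              .add("event_id", StringType())
              .add("amount", DoubleType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Stage each micro-batch as Parquet on S3; a scheduled COPY then loads it into Redshift
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/staging/events/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
             .start())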
Maisa Solutions Private Limited Hyderabad, India
Data Engineer
05/2017 - 10/2019
Performed analytics on data in AWS S3 using Spark, applying transformations and actions per business requirements
Built ETL jobs and developed and managed data pipelines
Optimized PySpark jobs for performance and scalability, including tuning Spark configurations and leveraging Spark RDDs and DataFrames
Strong programming skills in Python, and experience in using Python libraries such as NumPy, Pandas, and Matplotlib for data analysis and visualization
Tuned Hive jobs by applying partitioning, bucketing and optimized joins on Hive tables
Performed data transformations and analytics on a large dataset using Spark
Excellent problem-solving and analytical skills, and ability to work in a fast-paced and collaborative environment
Built end-to-end data pipelines using Spark and stored the final data in the data warehouse
Involved in designing Hive schemas, designing and developing normalized and denormalized data models
Tuned and optimized Spark jobs with partitioning/bucketing and driver/executor memory management (see the sketch below)
Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark
Developed PySpark code for AWS Glue jobs and for EMR
Knowledge of SQL and experience in creating complex SQL queries to extract data from databases and other data sources for Tableau
Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory
Ability to collaborate with cross-functional teams to understand business requirements and design Tableau solutions that meet those needs
Developed Hive queries and Sqoop data from RDBMS to data lake staging area
Optimized Hive queries using various file formats like Parquet, JSON and Avro
Wrote shell scripts to export log files to the Hadoop cluster through an automated process
Involved in the planning process of iterations under the Agile Scrum methodology
Environment: Spark, AWS S3, Python, Sqoop, Azure Cloud, Azure Synapse, ADF, Hive, Kafka, Hadoop, HDFS, Agile, UHC, AWS Redshift, AWS QuickSight, Snowflake
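A minimal sketch of the Spark tuning pattern mentioned above: setting driver/executor memory and shuffle partitions, then partitioning the output so downstream queries can prune files; the memory values, column name and S3 paths are hypothetical and depend on the actual cluster and dataset.

    from pyspark.sql import SparkSession

    # Illustrative memory and shuffle settings; real values depend on cluster size and data volume
    spark = (SparkSession.builder
             .appName("spark-tuning-sketch")
             .config("spark.driver.memory", "4g")
             .config("spark.executor.memory", "8g")
             .config("spark.sql.shuffle.partitions", "200")
             .getOrCreate())

    df = spark.read.parquet("s3a://example-bucket/raw/orders/")

    # Repartition on the partition column and write partitioned Parquet so queries can prune by date
    (df.repartition("order_date")
       .write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet("s3a://example-bucket/curated/orders/"))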
Asistmi Solutions Pvt. Ltd, India
SQL Developer (Intern)
06/2015 - 04/2017
Involved in interacting with the end-user (client) to gather business requirements and modelling the business requirements
Wrote SQL queries using DDL, DML, joins, aggregations and window functions
Created database objects like tables, views, sequences, synonyms, stored procedures, functions and triggers (see the sketch below)
Developed Database Triggers to enforce Data integrity and additional Referential Integrity
Created indexes on columns of tables to increase the performance of the queries
Developed and modified SQL code to make new enhancements or resolve problems as per the customer requirements
Used a test-driven approach for developing the application and implemented the unit tests using the Python unittest framework
Ensured complete data integrity
Worked on report writing using SQL Server Reporting Services (SSRS) and in creating various types of reports like table, matrix, and chart report, web reporting by customizing URL Access
Performed API testing using the Postman tool with request methods such as GET, POST, PUT and DELETE on each URL to check responses and error handling
Analyzed tables and indexes for performance tuning
Knowledge of normalization, data warehousing, data transfer, documentation, preventive maintenance, code review, automation, stored procedures and triggers
Coordinated with the front-end design team to provide the necessary stored procedures, packages and data, and wrote ad hoc complex queries as requested by users and management
Good knowledge of indexes and query optimization for large data volumes; troubleshooting and problem-solving using Python scripts and other programming languages
Performed SQL database sharding and indexing procedures as required to handle heavy traffic loads
Performed database tuning, simplified complex SQL statements for readability, and made queries more efficient
Created Dashboards with interactive views, trends, and drill downs
Environment: MySQL, SQL server, Power BI, ETL-Tools, PostgreSQL, Python
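A minimal, generic sketch of the kinds of database objects described above (a table, an index and an integrity-enforcing trigger), shown with Python's built-in sqlite3 module for portability; the actual work used SQL Server / MySQL, and the table and object names are hypothetical.

    import sqlite3

    # In-memory database standing in for the SQL Server / MySQL environments above
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    cur.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL NOT NULL,
        created_at  TEXT NOT NULL
    );

    -- Index to speed up lookups by customer
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    -- Trigger enforcing a simple integrity rule: no negative amounts
    CREATE TRIGGER trg_orders_amount
    BEFORE INSERT ON orders
    WHEN NEW.amount < 0
    BEGIN
        SELECT RAISE(ABORT, 'amount must be non-negative');
    END;
    """)

    cur.execute("INSERT INTO orders VALUES (1, 100, 25.0, '2016-01-01')")
    conn.commit()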
Education and Training
Robert Morris University
Master's in Management Information Systems
05/2023
Saveetha Engineering College, Anna University
Bachelor's in Computer Science
01/2017
Area Of Expertise
Scala, PySpark, Python, SQL, Hadoop, HDFS, Hive, Sqoop, Spark, Kafka, Microsoft Tools, Oracle, MySQL, HBase, Spark SQL, Spark Streaming, Impala, Text, XML, JSON, CSV, ORC, AWS, Azure, Microsoft Power BI