Chetan Bezawada
Sr. Data Engineer
Email: ****************@*****.***
PH: 412-***-****
https://www.linkedin.com/in/chetan-b-5ba90a21/
Professional Summary:
8+ years of Big Data experience building highly scalable data analytics applications. Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.
Good hands-on experience working with various distributions of Hadoop like Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
Good understanding of Distributed Systems architecture and design principles behind Parallel Computing.
Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs.
Strong experience troubleshooting failures in Spark applications and fine-tuning Spark applications and Hive queries for better performance.
Experience working with Amazon Web Services (AWS) cloud and its services such as EC2, S3, RDS, EMR, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, Aurora, and other AWS services.
Hands-on experience with GCP services including BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Data Fusion, Pub/Sub, Cloud Shell, and Cloud Composer (Airflow as a service).
Hands-on experience with the Snowflake cloud data warehouse on AWS and with S3 buckets for integrating data from multiple source systems.
Worked extensively on Hive for building complex data analytical applications.
Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
Worked with Apache NiFi to automate and manage the flow of data between systems.
Good experience working with AWS services like S3, EMR, Redshift, Glue, and Athena.
Deep understanding of performance tuning and partitioning for building scalable data lakes.
Worked on building real-time data workflows using Kafka, Spark Streaming, and HBase.
Extensive knowledge on NoSQL databases like HBase, Cassandra and MongoDB.
Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
Designed and implemented Hive and Pig UDFs using Java for evaluating, filtering, loading, and storing data.
Experience in connecting various Hadoop sources like Hive, Impala, Phoenix to Tableau for reporting.
Good knowledge in the core concepts of programming such as algorithms, data structures, collections.
Developed core modules in large cross-platform applications using JAVA, JSP, Servlets, Hibernate, RESTful, JDBC, JavaScript, XML, and HTML.
Experience in working with big data technologies such as Hadoop, Spark, and NoSQL databases and integrating them with Tableau for visualization purposes.
Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
Worked in fast-paced Agile environments as well as traditional waterfall-based development environments.
Passionate about working and gaining more expertise on a variety of cutting-edge Big Data technologies.
Ability to adapt quickly to evolving technology, strong sense of responsibility and accomplishment.
Eager to update my knowledge base constantly and learn new skills according to business needs.
TECHNICAL SKILLS:
●Programming languages: Python, PySpark, Shell Scripting (UNIX Bash), SQL, PL/SQL, Windows PowerShell
●Big Data: Hadoop, Sqoop, Apache Spark, NiFi, Kafka, Snowflake, Cloudera, Hortonworks, PySpark, Scala, Spark SQL, HDFS, MapReduce, Hive
●Data Modeling Tools: Erwin Data Modeler, ER Studio v17
●Operating Systems: UNIX, LINUX, Solaris, Mainframes
●Databases: Oracle, SQL Server, MySQL, DB2, Sybase, Netezza, Hive, Impala
●Cloud Technologies: AWS, Azure, GCP
●IDE Tools: Aginity for Hadoop, PyCharm, Toad, SQL Developer, SQL*Plus, Sublime Text, VI Editor
●OLAP Tools: Tableau, SSAS, Business Objects, and Crystal Reports 9
●ETL/Data warehouse Tools: Informatica PowerCenter 9.x/8.x/7.x (9.6/9.1), Informatica Cloud, Talend Open Studio & Integration Suite, Tableau
●Scheduling Tools: ESP Job Scheduler, Autosys, Windows scheduler
●Others: AutoSys, Crontab, ArcGIS, Clarity, Informatica, Business Objects, IBM MQ, Splunk
Professional Experience:
Client: Priceline – Dover, DE Aug 2021 – till now
Sr. Data Engineer
Responsibilities:
Worked on building centralized Data lake on AWS Cloud utilizing primary services like S3, EMR, Redshift and Athena.
Strong experience in PySpark, including data processing, data manipulation, and data transformation using PySpark APIs.
Worked on migrating datasets and ETL workloads from On-prem to AWS Cloud services.
Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
Proficiency in working with big data technologies such as Hadoop Distributed File System (HDFS), Apache Hive, and Apache HBase.
Expertise in designing and implementing ETL pipelines using PySpark, and integrating with various data sources such as databases, APIs, and file systems.
Worked extensively with AWS services like EC2, S3, VPC, AppFlow, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
Developed Python scripts to parse XML and JSON files and load the data into the Snowflake data warehouse on AWS.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (Parquet/text files) into AWS Redshift (a PySpark sketch follows this section).
Developed BI data lake POCs using AWS services including Athena, S3, EC2, and QuickSight.
Experience in developing and deploying a microservice application utilizing much of the AWS stack, including S3, EC2, and DynamoDB.
Modeled data in Databricks using techniques such as schema design, data normalization, and dimensional modeling.
Applied required transformation using AWS Glue and loaded data back to Redshift and S3.
Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
Worked closely with business teams and data science teams and ensured all the requirements are translated accurately into our data pipelines.
Worked on a full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption.
Knowledge of data warehousing concepts and experience in designing and developing ETL processes for data integration.
Strong understanding of data governance and data management best practices, including data quality, data security, and data privacy.
Excellent communication skills and ability to explain complex technical concepts to non-technical stakeholders.
Experience in designing and developing Tableau dashboards, reports, and visualizations using different data sources.
Expertise in optimizing data sources for Tableau, including data cleaning, data transformation, and data integration from multiple sources.
Worked on automating the infrastructure setup, including launching and terminating EMR clusters.
Created Hive external tables on top of datasets loaded in S3 buckets and created various hive scripts to produce a series of aggregated datasets for downstream analysis.
Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift.
Worked on creating Kafka producers using the Kafka Java Producer API to connect to external REST live-stream applications and produce messages to Kafka topics (a Python producer sketch follows this section).
Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Tableau, Scala, Python, Java, Hive, Kafka, Databricks.
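Below is a minimal PySpark sketch of the Glue-style ETL pattern referenced above: reading raw campaign JSON from S3, applying light transformations, and staging curated Parquet for a downstream Redshift load. All bucket names, paths, and column names are hypothetical placeholders, and staging Parquet for a COPY is one common pattern rather than the exact production implementation.

```python
# Sketch only: bucket names, paths, and columns below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("campaign-etl-sketch").getOrCreate()

# Read raw campaign JSON landed in S3 (multiLine handles records spanning lines).
raw = spark.read.option("multiLine", "true").json("s3://example-raw-bucket/campaigns/")

# Light cleanup: drop rows missing the key and stamp a load date for partitioning.
cleaned = (
    raw.filter(F.col("campaign_id").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Stage curated Parquet in S3; a Redshift COPY (or Glue job) loads it downstream.
(cleaned.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("s3://example-curated-bucket/campaigns/"))
```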
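The Kafka producer work above used the Kafka Java Producer API; as an illustration in Python, the sketch below shows the same pattern with the kafka-python client: polling a REST live-stream endpoint and publishing each record to a topic. The endpoint URL, topic name, and broker address are hypothetical.

```python
# Sketch only: endpoint, topic, and broker are hypothetical; production used the Java Producer API.
import json
import requests
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Poll a REST live-stream endpoint and publish each event to a Kafka topic.
response = requests.get("https://example.com/live/events", timeout=10)
for event in response.json():
    producer.send("clickstream-events", value=event)

producer.flush()  # block until all buffered messages are sent
```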
Client: AT&T – Dallas, TX Mar 2019 – Jul 2021
Data Analytics Engineer
Responsibilities:
Performed analytics on data in AWS S3 using Spark; performed transformations and actions as per business requirements.
Experience in ETL jobs and developing and managing data pipelines.
Experience in optimizing PySpark jobs for performance and scalability, including tuning Spark configurations and leveraging Spark RDDs and DataFrames.
Familiarity with cloud-based data platforms such as Amazon Web Services (AWS) and Microsoft Azure, and experience in deploying PySpark jobs on these platforms.
Strong programming skills in Python, and experience in using Python libraries such as NumPy, Pandas, and Matplotlib for data analysis and visualization.
Experience in tuning Hive jobs by performing partitioning, bucketing, and optimized joins on Hive tables.
Performed data transformations and analytics on a large dataset using Spark.
Excellent problem-solving and analytical skills, and ability to work in a fast-paced and collaborative environment.
Worked on building end-to-end data pipelines using Spark and storing the final data in the data warehouse.
Involved in designing Hive schemas, designing and developing normalized and denormalized data models.
Tuned and optimized Spark jobs with partitioning/bucketing and driver/executor memory management (a tuning sketch follows this section).
Worked on Data Extraction, aggregations, and consolidation of Adobe data within AWS Glue using PySpark.
Developed PySpark code for AWS Glue jobs and for EMR.
Knowledge of SQL and experience in creating complex SQL queries to extract data from databases and other data sources for Tableau.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Ability to collaborate with cross-functional teams to understand business requirements and design Tableau solutions that meet those needs.
Developed Hive queries and used Sqoop to move data from RDBMS sources to the data lake staging area.
Optimized Hive queries using various file formats such as Parquet, JSON, and Avro.
Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
Involved in the planning process of iterations under the Agile Scrum methodology.
Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables.
Worked closely with the data science team to understand requirements clearly and created Hive tables on HDFS.
Automated end to end data processing pipelines and scheduled various data workflows.
Scheduled Spark jobs using Oozie workflows in the Hadoop cluster.
Actively participated in code reviews and meetings, and troubleshot and resolved technical issues.
Environment: Spark, AWS S3, Python, Sqoop, Hive, Kafka, Tableau, Hadoop, HDFS, Agile, Unix, AWS Redshift, AWS QuickSight, Snowflake.
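As a rough illustration of the Spark/Hive tuning work mentioned above, the sketch below shows driver/executor memory settings and a partitioned, bucketed Hive table write in PySpark. The configuration values, database, table, and column names are hypothetical examples, not the production settings.

```python
# Sketch only: config values, table, and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.driver.memory", "4g")
    .config("spark.executor.memory", "8g")           # executor heap size
    .config("spark.executor.cores", "4")
    .config("spark.sql.shuffle.partitions", "400")   # size shuffle width to the data volume
    .enableHiveSupport()
    .getOrCreate()
)

events = spark.table("staging.adobe_events")

# Partitioning prunes scans by date; bucketing on the join key reduces shuffle in later joins.
(events.write
       .mode("overwrite")
       .partitionBy("event_date")
       .bucketBy(32, "visitor_id")
       .sortBy("visitor_id")
       .saveAsTable("analytics.adobe_events_curated"))
```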
Client: HGS Digital, Chicago, IL Jul 2017 – Feb 2019
Hadoop Developer
Responsibilities:
Designed and Developed data integration/engineering workflows on big data technologies and platforms - Hadoop, Spark, MapReduce, Hive, HBase.
Experience in performance tuning Tableau dashboards and reports for faster response time and improved user experience.
Familiarity with Tableau Server and Tableau Online, including administration, deployment, and troubleshooting.
Worked in Agile methodology and actively participated in standup calls, PI planning.
Involved in Requirement gathering and prepared the Design documents.
Involved in importing data into HDFS and Hive using Sqoop and involved in creating Hive tables, loading with data, and writing Hive queries.
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a Spark SQL sketch follows this section).
Experience in implementing ML algorithms using distributed paradigms of Spark/Flink, in production, on Azure Databricks/AWS SageMaker.
Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.
Created S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
Implemented and delivered MSBI platform solutions to develop and deploy ETL, data analytics, reporting, and scorecard/dashboard solutions on SQL Server using SSIS and SSRS, alongside AWS and Azure analytical services.
Developed Hive queries and Sqoop data from RDBMS to Hadoop staging area.
Handled importing of data from various data sources, performed transformations using Hive, and loaded data into data lake.
Experience with AWS NoSQL databases such as Amazon DynamoDB or Amazon DocumentDB.
Proficiency in designing, deploying, and managing NoSQL databases on AWS (a boto3 DynamoDB sketch follows this section).
Ability to optimize NoSQL database performance on AWS by leveraging caching, indexing, and other techniques.
Experience with AWS security and compliance requirements for NoSQL databases, including encryption and access controls.
Strong understanding of NoSQL data modeling principles and best practices on AWS.
Experienced in handling large datasets using partitioning and Spark in-memory capabilities.
Processed data stored in data lake and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
Developed dataflows and processes for data processing using SQL (Spark SQL and DataFrames).
Designed and developed MapReduce (Hive) programs to analyze and evaluate multiple solutions, considering multiple cost factors across the business as well as operational impact.
Involved in the planning process of iterations under the Agile Scrum methodology.
Worked on Hive Metastore backups and on partitioning and bucketing techniques in Hive to improve performance.
Scheduled Spark jobs using Oozie workflows in the Hadoop cluster and generated detailed design documentation for the source-to-target transformations.
Environment: Spark, Python, Sqoop, Hive, Hadoop, SQL, Tableau, HBase, MapReduce, HDFS, Oozie, Agile, AWS Redshift, AWS S3.
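The Spark SQL work described above (multiple file formats, external tables over the data lake) can be illustrated with the minimal PySpark sketch below. Paths, table names, and columns are hypothetical, and the schemas are assumed compatible for the union.

```python
# Sketch only: paths, tables, and columns are hypothetical; schemas assumed compatible.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-insights-sketch").enableHiveSupport().getOrCreate()

# Read the same logical dataset from different formats and combine it.
json_df = spark.read.json("s3://example-lake/usage/json/")
parquet_df = spark.read.parquet("s3://example-lake/usage/parquet/")
usage = json_df.unionByName(parquet_df, allowMissingColumns=True)

usage.createOrReplaceTempView("usage_raw")

# Aggregate with Spark SQL and persist the result as Parquet in the lake.
spark.sql("""
    SELECT customer_id, COUNT(*) AS sessions, SUM(bytes_used) AS total_bytes
    FROM usage_raw
    GROUP BY customer_id
""").write.mode("overwrite").parquet("s3://example-lake/curated/usage_summary/")

# Expose the curated data as a Hive external table over the S3 location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.usage_summary (
        customer_id STRING, sessions BIGINT, total_bytes BIGINT)
    STORED AS PARQUET
    LOCATION 's3://example-lake/curated/usage_summary/'
""")
```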
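For the DynamoDB items above, here is a minimal boto3 sketch of key-based access, the pattern the performance bullets refer to (index-friendly reads rather than scans). The table name, key schema, region, and sample values are hypothetical.

```python
# Sketch only: table name, key schema, region, and values are hypothetical placeholders.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("customer_events")  # assumed: partition key customer_id, sort key event_ts

# Write one item, then query the partition with a sort-key condition (no full table scan).
table.put_item(Item={"customer_id": "C123", "event_ts": "2020-06-01T10:00:00Z", "type": "login"})

resp = table.query(
    KeyConditionExpression=Key("customer_id").eq("C123") & Key("event_ts").begins_with("2020-06")
)
for item in resp["Items"]:
    print(item)
```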
Maisa Solutions Private Limited Hyderabad, India May 2015 to Oct 2016
SQL Developer
Responsibilities:
Involved in interacting with the end-user (client) to gather business requirements and modeling the business requirements.
Expertise in writing SQL queries using DDL, DML, joins, aggregations, and window functions (a sketch follows this section).
Created database objects such as tables, views, sequences, synonyms, stored procedures, functions, and triggers.
Developed database triggers to enforce data integrity and additional referential integrity.
Created indexes on columns of tables to increase the performance of the queries.
Developed and modified SQL code to make new enhancements or resolve problems as per the customer requirements.
Analyzed tables and indexes for performance tuning.
Knowledge of normalization, data warehousing, data transfer, documentation, preventive maintenance, code review, automation, stored procedures, and triggers.
Coordinated with the front-end design team to provide the necessary stored procedures, packages, and data, and wrote complex ad-hoc queries as requested by users and management.
Good knowledge of indexes and query optimization for large data volumes; troubleshooting and problem-solving using Python scripts and other programming languages.
Performed SQL database sharding and indexing procedures as required to handle heavy traffic loads.
Performed database tuning, simplified complex SQL statements for readability, and made queries more efficient.
Created Dashboards with interactive views, trends, and drill downs.
Environment: MySQL, SQL server, Power BI, ETL-Tools, PostgreSQL
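The SQL work above (window functions, indexing for query performance) is illustrated below with a small Python sketch that runs the statements through mysql-connector. The schema, credentials, and index name are hypothetical, and MySQL 8+ is assumed for window-function support.

```python
# Sketch only: connection details, schema, and index name are hypothetical; assumes MySQL 8+.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="sales")
cur = conn.cursor()

# Index the column used in the heaviest filters and joins.
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Window function: rank each customer's orders by amount.
cur.execute("""
    SELECT customer_id, order_id, amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS amount_rank
    FROM orders
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```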
Couth Infotech Pvt. Ltd, Hyderabad, India June 2014 to April 2015
Python Developer
Responsibilities:
Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework (a unit-test sketch follows this section).
Successfully migrated the Django database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
Worked on report writing using SQL Server Reporting Services (SSRS) and in creating various types of reports like table, matrix, and chart report, web reporting by customizing URL Access.
Performed API testing using the Postman tool for various request methods such as GET, POST, PUT, and DELETE on each URL to check responses and error handling.
Created Python and Bash tools to increase the efficiency of the retail management application system and operations, including data conversion scripts and AMQP/RabbitMQ, REST, JSON, and CRUD scripts for API integration.
Performed debugging and troubleshooting the web applications using Git as a version-controlling tool to collaborate and coordinate with the team members.
Developed and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database package.
Designed and maintained databases using Python and developed a Python-based API (RESTful web service) using SQLAlchemy and PostgreSQL.
Created a web application using Python scripting for data processing, MySQL for the database, and HTML, CSS, jQuery, and Highcharts for data visualization of the served pages.
Generated property lists for every application dynamically using Python modules such as math, glob, random, itertools, functools, NumPy, Matplotlib, Seaborn, and pandas.
Added navigation, pagination, column filtering, and the ability to add and remove columns from views utilizing Python-based GUI components.
Environment: Python, MySQL, HTML, PostgreSQL, Postman, jQuery, CSS, AMQP/RabbitMQ, REST, JSON, and CRUD scripts.
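As a small illustration of the unit-test and API-testing bullets above, the sketch below uses Python's unittest with the requests library to exercise GET, POST, and DELETE against a REST endpoint. The base URL and expected status codes are hypothetical placeholders (the original verification was done interactively in Postman).

```python
# Sketch only: the endpoint and expected responses are hypothetical placeholders.
import unittest
import requests

BASE_URL = "https://example.com/api/items"  # placeholder REST endpoint


class ItemApiTests(unittest.TestCase):
    def test_get_returns_ok(self):
        resp = requests.get(BASE_URL, timeout=10)
        self.assertEqual(resp.status_code, 200)

    def test_post_creates_item(self):
        resp = requests.post(BASE_URL, json={"name": "widget"}, timeout=10)
        self.assertIn(resp.status_code, (200, 201))

    def test_delete_missing_item_returns_not_found(self):
        resp = requests.delete(f"{BASE_URL}/does-not-exist", timeout=10)
        self.assertEqual(resp.status_code, 404)


if __name__ == "__main__":
    unittest.main()
```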
EDUCATION:
Master’s in Management Information Systems - Robert Morris University - 2018
Bachelor’s - Saveetha Engineering College, Anna University - 2014