
Data Engineer Business Intelligence

Location:
Atlanta, GA
Posted:
June 02, 2025


Varshith Son Dubba

Data Engineer

More than *+ years of IT experience in Data Warehousing and Business Intelligence. Experience in analyzing requirements, designing, implementing, and unit testing various Data Warehousing projects.

+1-302-***-**** ************@*****.***

PROFILE SUMMARY

• Around 6+ years of professional experience in the full life cycle of system development, involving analysis, design, development, testing, documentation, implementation, and maintenance of application software in web-based and client/server environments.

• Proficiency with Scala, Apache HBase, Hive, Pig, Sqoop, Zookeeper, Spark, Spark SQL, Spark Streaming, Kinesis, Airflow, Yarn, and Hadoop (HDFS, MapReduce).

• Expert in utilizing Spark SQL and Spark Streaming and developing DataFrames using Spark with Snowflake.

• Expertise in Azure infrastructure management (Azure Web Roles, Worker Roles, SQL Azure, Azure Storage, Azure AD Licenses, Office365).

• Worked on production support, investigating logs and applying hot fixes, and used Splunk along with AWS CloudWatch for log monitoring.

• Experience with story grooming, sprint planning, and daily standups under Agile and SAFe methodologies. Good working knowledge of Azure Databricks clusters, notebooks, jobs, and autoscaling.

• Proven problem-solving abilities, including the capacity to identify, break down, and address complex data challenges.

• Practical experience using KTables, GlobalKTables, and KStreams in Apache Kafka and Confluent environments for Kafka stream processing.

• Experience working with front-end technologies such as HTML, CSS, JavaScript, and ReactJS.

• Practical knowledge in setting up and designing large-scale data lakes, pipelines, and effective ETL (extract, transform, load) procedures to collect, organize, and standardize data. Converted an existing on-premises application to use Azure cloud databases and storage.

• Experience in migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure Backup. Designed and implemented a scalable data architecture on AWS using Kubernetes, Terraform, and Snowflake, enabling seamless data integration and processing across multiple data sources.

• Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables. Practical knowledge of data analytics services such as QuickSight, Glue Data Catalog, and Athena.

• Experience in automating day-to-day activities by using Windows PowerShell.

• Deployed Docker Engine on virtualized platforms for containerization of multiple apps. Proficient in building CI/CD pipelines in Jenkins using pipeline syntax and Groovy libraries.

• Experience developing sophisticated SQL queries, procedures, and triggers using RDBMS like Oracle and MySQL.

TECHNICAL SKILLS

Big Data Eco-system: HDFS, MapReduce, Spark, YARN, Hive, Pig, HBase, Sqoop, Flume, Kafka, Oozie, ZooKeeper, Impala

Hadoop Technologies: Apache Hadoop 2.x/1.x, Cloudera CDH4/CDH5, Hortonworks

Operating Systems: Windows, Linux (Ubuntu, CentOS)

NoSQL Databases: HBase, Cassandra, MongoDB

Databases: RDBMS, MySQL, Teradata, DB2, Oracle

Container/Cluster Managers: Docker, Kubernetes

BI Tools: Tableau, Power BI

Cloud Environments: AWS (Amazon Web Services), Azure

Web Development: HTML, CSS, JavaScript

IDE Tools: Eclipse, Jupyter, Anaconda, PyCharm

Development Methodologies: Agile, Waterfall

EDUCATION

Masters from Wilmington University, USA

WORK EXPERIENCE

Client: Barclays Bank Delaware, Wilmington, Delaware, USA (Aug 2023 – Present)
Role: Azure Data Engineer

Description: Barclays Bank Delaware is a personal banking company. It offers credit cards, personal loans, and savings solutions to individuals. Played a pivotal role in designing, implementing, and optimizing data processing workflows and infrastructure for Barclays Bank Delaware's data operations.

Responsibilities:

• Experience in using different types of stages like Transformer, Aggregator, Merge, Join, Lookup, Sort, Remove Duplicates, Funnel, Filter, and Pivot for developing jobs. Designed and deployed a Kubernetes-based containerized infrastructure for data processing and analytics, leading to a 20% increase in data processing capacity.

• Worked on SQL and PL/SQL for backend data transactions and validations.

• Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, as well as to write data back to those sources.

• Stored different configs in the NoSQL database MongoDB and manipulated them using PyMongo.

• Using Azure cluster services and Azure Data Factory V2, ingested a large amount and variety of data from diverse source systems into Azure Data Lake Storage Gen2. Involved in the entire lifecycle of the projects, including design, development, deployment, testing, implementation, and support.

• Imported real-time weblogs using Kafka as a messaging system, ingested the data into Spark Streaming, performed data quality checks in Spark Streaming, and flagged records as bad or passable (see the sketch below). Experience in creating Kubernetes replication controllers, clusters, and label selectors/services to deploy microservices in Docker.
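
A minimal sketch of the Kafka-to-Spark-Streaming quality-check pattern described in the bullet above; the topic name, broker address, flag logic, and console sink are illustrative assumptions, not the project's actual configuration:

    # Sketch: ingest a weblog stream from Kafka and flag bad vs. passable records.
    # Assumes the spark-sql-kafka connector package is available on the cluster.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("weblog-quality-check").getOrCreate()

    # Read the raw weblog stream from Kafka (topic/broker names are placeholders).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "weblogs")
           .load())

    # Parse the payload and flag records that fail a basic quality check.
    events = raw.selectExpr("CAST(value AS STRING) AS line")
    flagged = events.withColumn(
        "quality_flag",
        F.when(F.col("line").isNull() | (F.length("line") == 0), "bad")
         .otherwise("passable"),
    )

    # Write flagged records out; a console sink stands in for the real target here.
    query = (flagged.writeStream
             .format("console")
             .outputMode("append")
             .start())
    query.awaitTermination()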

• Worked on Big Data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods.

• Managed relational database services in which Azure SQL handles reliability, scaling, and maintenance, and integrated data storage solutions. Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications.

• Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop. Implemented AJAX, JSON, and JavaScript to create interactive web screens. Supported development of web portals, completed database modelling in PostgreSQL, and provided front-end support in HTML/CSS and jQuery.

• Implemented navigation rules for the application and page outcomes, and wrote controllers using annotations.

• Created DataStage and ETL jobs for continuously populating the data warehouse from different source systems such as ODS, flat files, and Parquet. Created clusters to classify control and test groups.

• Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.

• Designed and implemented infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure. Used ML techniques for predictions with statistical libraries in Python.

• Created complex stored procedures, Slowly Changing Dimension Type 2 logic, triggers, functions, tables, views, and other T-SQL code and SQL joins to ensure efficient data retrieval.

• Worked on Angular JS to augment browser applications with MVC capability.

• Analyzed and developed a modern data solution with Azure PaaS services to enable data visualization. Understood the application's current production state and the impact of new installations on existing business processes.

• Worked with Python ORM classes in SQLAlchemy to retrieve and update data from the database (a brief sketch follows below). Experienced in tuning Spark applications for the proper batch interval, parallelism level, and memory settings.
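
A brief sketch of the SQLAlchemy ORM read/update pattern mentioned above; the Claim class, column names, and SQLite connection string are hypothetical stand-ins, not the project's actual schema:

    # Sketch: map a table with the ORM, fetch a row, update it, and commit.
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import declarative_base, Session

    Base = declarative_base()

    class Claim(Base):                      # hypothetical mapped class
        __tablename__ = "claims"
        id = Column(Integer, primary_key=True)
        status = Column(String)

    engine = create_engine("sqlite:///demo.db")   # placeholder connection string
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        # Fetch a row by key, update it, and commit the change.
        claim = session.query(Claim).filter_by(id=1).one_or_none()
        if claim is not None:
            claim.status = "processed"
            session.commit()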

• Worked on creating MapReduce programs to parse the data for claim report generation and running the JARs in Hadoop. Coordinated with the Java team in creating MapReduce programs.

Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob Storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, Ansible scripts, Kubernetes, Docker, Jenkins, Azure SQL DW (Synapse), Azure SQL DB

Client: Incyte, Wilmington, Delaware, USA (Oct 2022 - Jul 2023)
Role: AWS Data Engineer

Description: Incyte Corporation is an American multinational pharmaceutical company. Involved in implementing and adhering to data governance policies, ensuring data quality, security, and compliance with regulations.

Responsibilities:

• Wrote AWS Lambda functions in Spark with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud. Performed raw data ingestion that triggered a Lambda function and loaded refined data into ADLS.

• Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application.

• Consulted with leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest Big Data based analytical solutions.

• Analyzed the SQL scripts and redesigned them using Spark SQL for faster performance.

• Worked on CI/CD tools such as Jenkins and Docker in the DevOps team, setting up the application process end to end using continuous deployment for lower environments and continuous delivery with approvals for higher environments.

• Handled importing data from various data sources, performed transformations, loaded data into HDFS, and extracted data from SQL into HDFS using Sqoop.

• Automated AWS infrastructure to be deployed periodically using Docker images and CloudFormation templates.

• Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence/business analytics development. Performed unit and system integration testing.

• Expertise in creating and developing applications for the Android operating system using Android Studio, Eclipse IDE, SQLite, Java, XML, the Android SDK, and the ADT plugin.

• Created job flows using Airflow in Python and automated the jobs; Airflow has a separate stack for developing DAGs and runs jobs on an EMR or EC2 cluster (a minimal sketch follows below).
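
A minimal sketch of the Airflow pattern described above, assuming Airflow 2.x; the DAG id, schedule, and spark-submit commands are illustrative placeholders rather than the actual job definitions:

    # Sketch: a Python-defined DAG with two dependent tasks submitted to a cluster.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="etl_daily",                 # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            bash_command="spark-submit extract_job.py",   # placeholder command
        )
        load = BashOperator(
            task_id="load",
            bash_command="spark-submit load_job.py",      # placeholder command
        )
        extract >> load                     # run extract before load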

• Designed and developed ETL pipelines for real-time data integration and transformation using Kubernetes and Docker.

• ZooKeeper was utilized to manage synchronization, serialization, and coordination throughout the cluster after migrating from JMS Solace to Kinesis. Developed multiple notebooks using PySpark and Spark SQL in Databricks for data extraction, analyzing and transforming the data according to the business requirements.

• Utilized Elasticsearch and Kibana for indexing and visualizing the real-time analytics results, enabling stakeholders to gain actionable insights quickly. Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.

• Developed triggers, stored procedures, functions, and packages using cursors associated with the project in PL/SQL.

• Extracted and analyzed data from various sources; performed data wrangling and cleanup using Python Pandas.

• Worked extensively with Python in optimization of the code for better performance.

• Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.

Environment: Python, Pandas, NumPy, SSIS, PySpark, AWS, AWS S3, AWS DynamoDB, Amazon RDS, Amazon Redshift, AWS Glue, SQL Server, PostgreSQL, Tableau, Power BI, Apache Airflow.

Client: (Accenture) Synchrony Financial, Mumbai, India (Aug 2019 - Jul 2022)
Role: Application Developer / Data Engineer

Description: Synchrony Financial is an American consumer financial services company. My role was to design and manage Extract, Transform, Load (ETL) processes to prepare data for analysis, transforming raw data into formats suitable for analysis and reporting.

Responsibilities:

• Wrote scripts to import and export data in CSV and Excel formats from different environments using Python, and created a Celery task invoked through a REST API call. Integrated Azure Data Factory with Blob Storage to move data through Databricks for processing and then to Azure Data Lake Storage and Azure SQL Data Warehouse.

• Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud to serve the analytical needs of the company.

• Assessed the infrastructure needs of each application and deployed it on the Azure platform.

• Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.

• Designed and implemented data integration solutions using Azure Data Factory to move data between various data sources, including on-premises and cloud-based systems. Wrote queries in MySQL and native SQL.

• Analyzed the functional requirement documents from the business.

• Developed automated monitoring and alerting systems using Kubernetes and Docker, ensuring proactive identification and resolution of data pipeline issues. Created pipelines to load the data using ADF.

• Designed and configured databases, back-end applications, and programs. Managed large datasets using Pandas DataFrames and SQL.

• Analyzed existing systems and proposed improvements in processes and systems, adopting modern scheduling tools such as Airflow and migrating legacy systems into an enterprise data lake built on Azure Cloud.

• Involved in monitoring and scheduling the pipelines using triggers in Azure Data Factory.

• Configured Spark Streaming to receive ongoing information from Kafka and store the stream data in DBFS.

• Used a continuous delivery pipeline. Deployed microservices, including provisioning Azure environments, and developed modules using Python scripting and shell scripting.

Environment: Hadoop, HDFS, MapReduce, Hive, PySpark, Flume, ETL, AWS, Oozie, Sqoop, Oracle, Pig, Eclipse, MySQL, Java

Client: Aditya Birla Fashion and Retail, Mumbai, India (Jun 2017 - Jul 2019)
Role: Data Engineer

Description: Aditya Birla Fashion and Retail Limited (ABFRL) is an Indian fashion retail company. Played a key role in data acquisition, pipeline development, and analytics to meet diverse business needs at Aditya Birla Fashion and Retail.

Responsibilities:

• Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver on the business needs of different teams. Developed data connectors to extract data from social media APIs and integrated them into the pipeline using Python and Apache Kafka Connect.

• Built data pipelines using Python and Apache Airflow for ETL-related jobs inserting data into Oracle.

• Implemented Airflow for workflow automation and task scheduling, and created DAGs and tasks.

• Responsible for building and testing applications. Experience handling database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring various Python packages (Teradata, MySQL, MySQL connector, PyMongo, and SQLAlchemy).

• Imported real-time weblogs using Kafka as a messaging system, ingested the data into Spark Streaming, performed data quality checks, and flagged records as bad or passable.

• Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insight into customer usage patterns.

• Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers. Designed Git branching strategies, merging per the needs of release frequency by implementing the Gitflow workflow on Bitbucket.

• Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity, and created UNIX shell scripts for database connectivity and executing queries in parallel.

• Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.

• Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries. Processed the image data through the Hadoop distributed system using MapReduce, then stored it in HDFS. Used Docker for managing the application environments.

Environment: Azure Data Factory, Apache Airflow, Talend, Spark, Azure SQL Database, Snowflake, SQL Server, Hive, T-SQL, Kafka, Azure Data Lake, Python, Scala, Tableau, Git.


