
Sr. Data Engineer

Location:
Deerfield, IL
Posted:
April 15, 2025


DHEERAJ REDDY GONNURU

Sr. Data Engineer / AWS / Azure

Phone: +1-464-***-****

E-Mail: ************@*****.***

PROFESSIONAL SUMMARY:

Over 10 years of professional experience in developing and maintaining web and client/server applications using Microsoft technologies.

Hands-on experience in Azure cloud services (PaaS & IaaS), Storage, Web Apps, Active Directory, U-SQL, Application Insights, and Logic Apps.

Experience working with Azure Monitor, Data Factory, Traffic Manager, Service Bus, and Key Vault.

Experienced in Azure Data Factory and in preparing CI/CD scripts for development and deployment on the Azure cloud platform.

Solid experience building ETL ingestion flows using Azure Data Factory.

Experience building Azure Stream Analytics ingestion specs, enabling users to get sub-second results in real time.

Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL.
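
As an illustration of this pattern, the following is a minimal PySpark sketch of a lake-to-lake ETL step (the storage account, container, and column names are hypothetical, not taken from any of the projects below):

```python
# Minimal PySpark ETL sketch: read raw data, transform with Spark SQL, write curated output.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-etl-sketch").getOrCreate()

# Read raw CSV landed in the data lake (path is illustrative only).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/"))

# Use Spark SQL for the transformation layer.
raw.createOrReplaceTempView("sales_raw")
curated = spark.sql("""
    SELECT customer_id,
           CAST(order_date AS DATE) AS order_date,
           SUM(amount)              AS total_amount
    FROM sales_raw
    GROUP BY customer_id, CAST(order_date AS DATE)
""")

# Write the curated layer back to the lake as partitioned Parquet.
(curated.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/"))
```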

Experience building orchestration in Azure Data Factory for scheduling pipelines.

Hands-on experience in Azure analytics services: Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB), etc.

Experience working with the Azure Logic Apps integration tool.

Experience implementing Azure Log Analytics to provide platform-as-a-service analysis of SD-WAN firewall logs.

Experience building data pipelines leveraging Azure Data Factory.

Expertise working with databases such as Azure SQL DB and Azure SQL DW.

Orchestrated data integration pipelines in ADF using activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, and Until.

Knowledge and experience in basic ADF administration activities, such as granting access to ADLS using a service principal, installing the Integration Runtime (IR), and creating services such as ADLS and Logic Apps.

Good experience with PolyBase external tables in SQL DW.

Experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.

Sound experience with AWS services such as Amazon EC2, S3, EMR, Amazon RDS, VPC, Elastic Load Balancing, IAM, Auto Scaling, CloudWatch, SNS, SES, SQS, and Lambda to trigger resources.

Capable of using Amazon S3 to support data transfer over SSL, with data encrypted automatically once uploaded. Skilled in using Amazon Redshift to perform large-scale database migrations.

Ingested data into the Snowflake cloud data warehouse using Snowpipe. Extensive experience with micro-batching to ingest millions of files into Snowflake as they arrive in the staging area.
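
A hedged sketch of how such a Snowpipe might be defined through the Python connector (account, stage, table, and file-format details are placeholders; the same DDL could equally be run from SnowSQL or a worksheet):

```python
# Sketch of defining a Snowpipe for micro-batch ingestion from an external stage.
# Account, stage, and table names are placeholders; credentials would normally
# come from a secrets store rather than literals.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

cur = conn.cursor()
# AUTO_INGEST lets cloud storage event notifications trigger loads as files land.
cur.execute("""
    CREATE PIPE IF NOT EXISTS STAGING.ORDERS_PIPE
      AUTO_INGEST = TRUE
    AS
      COPY INTO STAGING.ORDERS
      FROM @STAGING.ORDERS_STAGE
      FILE_FORMAT = (TYPE = 'JSON')
""")
cur.close()
conn.close()
```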

Expertise in setting up processes for Hadoop-based application design and implementation.

Experience importing and exporting data with Sqoop between HDFS and relational databases.

Experience in managing and reviewing Hadoop log files.

Experienced in processing big data on the Apache Hadoop framework using MapReduce programs.

Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.

Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
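
For illustration, a minimal PySpark/HiveQL sketch of a partitioned external table plus a dynamic-partition load (database, table, and path names are assumed):

```python
# Sketch of a partitioned external Hive table and a dynamic-partition insert.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_ext (
        order_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/orders_ext'
""")

# Allow dynamic partitioning so each order_date value lands in its own partition.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_db.orders_ext PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales_db.orders_staging
""")
```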

Proficiency in SQL across several dialects (MySQL, PostgreSQL, Redshift, SQL Server, and Oracle).

Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.

TECHNICAL SKILLS:

Big Data Technologies and Tools: Apache Hadoop, MapR, Cloudera (CDH4/CDH5), Hortonworks distribution (v2.1/v2.2), HDFS, MapReduce, Hadoop 2/YARN, Pig, Hive, HiveQL, Impala, Flume, Oozie, HUE, Informatica, Kafka, Spark, Spark Streaming, Spark SQL, Apache NiFi, ZooKeeper, Avro

Cloud Services: AWS, DynamoDB, SQS, SNS, CloudWatch, X-Ray, Lambda, EC2, Route 53, ELB, EBS, IAM, KMS, Amazon S3, Elastic Beanstalk, CloudFormation, CloudFront, Azure Data Factory, Azure Databricks, Snowflake, Logic Apps, Function App, Azure DevOps, Azure Service Bus

Languages: C, Java, PHP, Scala, Python, SQL, HiveQL

Web Technologies: HTML, XML, CSS, JavaScript, JSP, JDBC, Maven, AJAX

Databases: Oracle, MySQL, Microsoft SQL Server, Snowflake, Teradata, DB2

NoSQL Databases: HBase, DynamoDB, Cassandra

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu

Cloud Computing Services: Google Cloud Platform, Azure, AWS (EMR, S3, CloudWatch, EC2, Elasticsearch)

IDEs/Tools: IntelliJ IDEA, Eclipse, VS Code, PuTTY, Anaconda, Xshell, Ambari, Cygwin, WinSCP, Tableau

WORK EXPERIENCE:

Client: PNC Bank Mar 2023 to Present

Cloud/Azure Data Engineer

Company Description:

PNC Bank is one of the largest financial services institutions in the U.S., offering a wide range of banking products and services, including retail banking, corporate and institutional banking, asset management, and wealth management. They provide services to individuals, small businesses, and large corporations, helping with everything from personal banking, loans, mortgages, and credit cards to investment strategies and corporate finance solutions.

Responsibilities:

Involved in all SDLC phases, including requirements gathering, analysis, design, development, and testing of the application using Agile methodology.

Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.

Created linked services for multiple source systems (e.g., Azure SQL Server, ADLS, Blob Storage, REST API).

Created pipelines to extract data from on-premises source systems to Azure cloud Data Lake Storage; extensively worked on Copy activities, implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy, and implemented error handling through the Copy activity.

Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.

Configured Logic Apps to handle email notifications to end users and key stakeholders through the Web activity.

Created dynamic pipelines to extract from multiple sources to multiple targets and extensively used Azure Key Vault to configure connections in linked services.

Configured and implemented Azure Data Factory triggers, scheduled the pipelines, monitored the scheduled pipelines, and configured alerts to get notified of pipeline failures.

Extensively worked on Azure Data Lake analytics with Azure Databricks to implement SCD Type 1 and SCD Type 2 approaches.
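
A simplified sketch of the SCD Type 2 idea using the Delta Lake MERGE API on Databricks (table and column names such as dim_customer, is_current, and address are assumptions; `spark` is the session provided by a Databricks notebook):

```python
# Simplified SCD Type 2 sketch with Delta Lake on Databricks.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Assumes `updates` holds only new or changed customer records.
updates = spark.table("staging.customer_updates")

dim = DeltaTable.forName(spark, "dw.dim_customer")

# Step 1: expire the current version of rows whose tracked attributes changed.
(dim.alias("t")
 .merge(updates.alias("s"),
        "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",
     set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the latest version of each incoming record as the new current row.
(updates
 .withColumn("is_current", F.lit(True))
 .withColumn("start_date", F.current_date())
 .withColumn("end_date", F.lit(None).cast("date"))
 .write.format("delta").mode("append").saveAsTable("dw.dim_customer"))
```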

Created Azure Stream Analytics jobs to replicate real-time data into Azure SQL Data Warehouse.

Implemented delta-logic extractions for various sources using a control table; implemented data frameworks to handle deadlocks, recovery, and pipeline data logging.

Kept up with the latest Microsoft Azure features (Azure DevOps, OMS, NSG rules, etc.) and applied them to existing business applications.

Developed Kafka producers and consumers to publish and subscribe to data streams, respectively, and to handle data routing and partitioning.
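
A minimal producer/consumer sketch with the kafka-python client, illustrating key-based routing to partitions (broker address, topic, and payload are hypothetical):

```python
# Minimal Kafka producer/consumer sketch using kafka-python.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# Keying by account ID routes all events for an account to the same partition.
producer.send("transactions", key="account-123", value={"amount": 42.5})
producer.flush()

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="broker1:9092",
    group_id="fraud-scoring",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.key, message.value)
```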

Integrated Kafka with various data processing technologies, such as Spark, Hadoop, and Flink, to build end-to-end data processing solutions.

Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).

Deployed code to multiple environments through the CI/CD process, worked on code defects during SIT and UAT testing, and provided support for data loads during testing; implemented reusable components to reduce manual intervention.

Developing Spark (Scala) notebooks to transform and partition the data and organize files in ADLS.

Working on Azure Databricks to run Spark-Python notebooks through ADF pipelines.

Using the Databricks utility called widgets to pass parameters at runtime from ADF to Databricks.
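
For example, a short sketch of reading ADF-supplied parameters in a Databricks notebook via widgets (widget names are assumptions; `dbutils` and `spark` are provided by the notebook environment):

```python
# Read runtime parameters passed from an ADF Databricks Notebook activity.
dbutils.widgets.text("load_date", "")       # created if not already defined
dbutils.widgets.text("source_system", "")

load_date = dbutils.widgets.get("load_date")
source_system = dbutils.widgets.get("source_system")

# Use the parameters to scope the load (path and column names are illustrative).
df = (spark.read.format("delta")
      .load(f"/mnt/raw/{source_system}/")
      .where(f"ingest_date = '{load_date}'"))
```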

Created Triggers, PowerShell scripts and the parameter JSON files for the deployments.

Worked with VSTS for the CI/CD Implementation.

Reviewing individual work on ingesting data into Azure Data Lake and providing feedback based on the reference architecture, naming conventions, guidelines, and best practices.

Implemented end-to-end logging frameworks for Data Factory pipelines.

Developed and maintained automated data processing and ETL pipelines using IBM Tivoli.

Optimized Unix performance by analyzing system metrics and tuning system settings such as kernel parameters and file system configurations.

Managed data engineering projects using Jira to track tasks, bugs, and project progress.

Managed GitLab repositories for data engineering projects, including version control, pull requests, and code reviews.

Worked in Agile environments and participated in Agile ceremonies such as daily stand-ups, sprint planning, and retrospectives.

Environment: Azure Data Factory, Azure Databricks, PolyBase, Azure DW, ADLS, Azure DevOps, BLOB, Azure SQL Server, Azure Synapse, Hortonworks, Oracle, MySQL, PySpark, Scala, Erwin, IBM Tivoli, Airflow, Unix, GitHub, Git, Agile.

Client: MetLife Insurance Aug 2021 to Feb 2023

Data Engineer / AWS.

Company Description:

MetLife is a leading global provider of insurance, annuities, and employee benefit programs. Founded in 1868, the company operates in over 40 markets worldwide, including the United States, Japan, Latin America, Asia, Europe, and the Middle East. MetLife offers a diverse range of products and services, such as life insurance, dental and vision plans, disability income insurance, annuities, and asset management.

Responsibilities:

Used Python to implement simple and complex Spark jobs for data analysis across various data formats.

Worked on AWS services such as S3, EC2, IAM, and RDS, with orchestration and data pipeline services such as AWS Step Functions, Data Pipeline, and Glue.

Created AWS Lambda functions and API Gateways so that data submitted through API Gateway is processed by a Lambda function.
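
A minimal sketch of such a Lambda handler behind an API Gateway proxy integration (bucket name and key layout are hypothetical):

```python
# Lambda handler invoked through API Gateway (proxy integration).
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string.
    payload = json.loads(event.get("body") or "{}")

    # Land the submitted record in S3 for downstream processing (bucket is a placeholder).
    s3.put_object(
        Bucket="example-ingest-bucket",
        Key=f"incoming/{context.aws_request_id}.json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    return {"statusCode": 200, "body": json.dumps({"status": "received"})}
```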

Generated Scala and Java classes from the respective APIs so they could be incorporated into the overall application.

Created AWS Lambda functions using Python for deployment management in AWS; designed, investigated, and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.

Created external tables with partitions using Hive, AWS Athena, and Redshift; developed PySpark code for AWS Glue jobs and for EMR.
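
A skeleton of a Glue PySpark job of this kind (catalog database, table, and S3 output path are placeholders):

```python
# Skeleton of an AWS Glue PySpark job.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog, transform with plain Spark, write to S3.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders_raw")
df = dyf.toDF().dropDuplicates(["order_id"])

df.write.mode("overwrite").parquet("s3://example-curated-bucket/orders/")
job.commit()
```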

Created external and normal tables and views in Snowflake database.

Responsible for delivering datasets from Snowflake to the One Lake data warehouse; built a CI/CD pipeline using Jenkins and AWS Lambda; imported data from DynamoDB to Redshift in batches using AWS Batch with the TWS scheduler.

Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.

Experience developing microservices with Spring Boot using Java and Scala.

Developed code in Spark SQL for implementing Business logic with Python as programming language.

Used Spark Streaming to divide streaming data into micro-batches fed to the Spark engine for batch processing.
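
A hedged sketch of the micro-batch pattern using Spark Structured Streaming reading from Kafka (broker, topic, schema, and paths are illustrative; the spark-sql-kafka connector package is assumed to be available):

```python
# Micro-batch stream processing sketch with Spark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-batches-sketch").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()
          .selectExpr("CAST(value AS STRING) AS json_value"))

# Each trigger interval becomes one micro-batch handed to the Spark engine.
query = (events
         .withColumn("event", F.from_json("json_value", "user_id STRING, url STRING"))
         .select("event.*")
         .writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/clickstream/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
         .trigger(processingTime="1 minute")
         .start())
```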

Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, and used the Spark engine and Spark SQL for data analysis.

Applied Python Spark scripts to categorize data according to various categories of records, and assisted with Spark cluster monitoring.

Installed and configured Apache Airflow for an AWS S3 bucket and created DAGs to run in Airflow.
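
An illustrative Airflow 2.x DAG of this shape, waiting for new files in S3 before processing (bucket, key pattern, and connection IDs are assumptions; the Amazon provider package supplies the sensor):

```python
# Airflow DAG sketch: sense a new file in S3, then run a Python processing step.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

def process_new_file(**context):
    # Placeholder for the real load/transform logic.
    print("new file landed, kicking off downstream load")

with DAG(
    dag_id="s3_ingest_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="example-landing-bucket",
        bucket_key="incoming/*.csv",
        wildcard_match=True,
        aws_conn_id="aws_default",
    )
    process = PythonOperator(
        task_id="process_new_file",
        python_callable=process_new_file,
    )

    wait_for_file >> process
```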

Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks running on Amazon SageMaker.

Worked on the code transfer of a quality monitoring application from AWS EC2 to AWS Lambda, as well as the construction of logical datasets to administer quality monitoring on Snowflake warehouses.

Developed upgrade and downgrade scripts to move data from tables into Spark-Redis for quick access by a large client base without sacrificing performance.

In accordance with project proposals, coordinated with end users to build and deploy analytics solutions for Python-based user-based recommendations.

Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to Snowflake.

Worked with Python and Scala to transform Hive/SQL queries into Spark (RDDs, DataFrames, and Datasets).

Expertise in using the Scala programming language to build microservices.

Expertise utilizing Spark SQL to manage Hive queries in an integrated Spark environment.

Participated in daily scrum sessions and the story-driven agile development process.

Creating data frames and datasets using Spark and Spark Streaming, then performing transformations and actions.

Experience with Kafka publish-subscribe messaging as a distributed commit log.

Environment: Hadoop, Scala, Spark, Hive, Teradata, Tableau, Linux, Python, Java, Kafka, AWS S3, AWS Glue, NiFi, Postgres, Snowflake, AWS EC2, Oracle PL/SQL, Flink, development toolkit (Jira, Bitbucket/Git, ServiceNow, etc.)

Client: Walmart, Bentonville, Arkansas. June 2019 to July 2021

Azure Data Engineer

Company Description:

Walmart is a multinational retail corporation that operates a chain of discount department stores, supermarkets, and warehouse clubs. Walmart has grown to become one of the world's largest retailers. As of 2023, the company reported revenues of $611.3 billion and employed approximately 2.1 million associates worldwide.

Responsibilities:

Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, and a write-back tool.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and U-SQL Azure Data Lake Analytics.

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks. Worked on Azure service models (IaaS, PaaS) and storage such as Blob (page and block) and SQL Azure.

Implemented OLAP multi-dimensional functionality using Azure SQL Data Warehouse. Retrieved data using Azure SQL and Azure ML, which was used to build, test, and score predictive models.

Worked on cloud databases such as Azure SQL Database, SQL Managed Instance, SQL elastic pools on Azure, and SQL Server.

Architected and implemented medium- to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Designed and developed Azure Data Factory pipelines to extract, load, and transform data from different source systems (Mainframe, SQL Server, IBM DB2, shared drives, etc.) to Azure data storage services using a combination of Azure Data Factory, Azure Databricks (PySpark, Spark SQL), Azure Stream Analytics, and U-SQL Azure Data Lake Analytics. Ingested data into various Azure storage services such as Azure Data Lake, Azure Blob Storage, and Azure Synapse Analytics (formerly Azure SQL Data Warehouse).

Configured and deployed Azure Automation scripts for a multitude of applications utilizing the Azure stack (including Compute, Web & Mobile, Blobs, ADF, Resource Groups, Azure Data Lake, HDInsight clusters, Azure SQL, Cloud Services, and ARM), with services and utilities focused on automation.

Involved in migrating objects from Teradata to Snowflake and created Snowpipe for continuous data loading.

Increased consumption of solutions including Azure SQL Databases, Azure Cosmos DB and Azure SQL.

Created continuous integration and continuous delivery (CI/CD) pipeline on Azure that helps to automate steps in the software delivery process.

Deployed and managed applications in the datacenter, virtual environments, and the Azure platform.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.

Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HQL queries.

Handled importing of data from various data sources, performed transformations using Hive, and loaded data into HDFS.

Design, development, and implementation of performant ETL pipelines using PySpark and Azure Data Factory.

Worked with Git version control system and experience in managing code repositories using Azure DevOps Git repositories.

Developed and maintained Jira plugins and integrations with other tools such as GitHub, Jenkins, or Slack.

Worked with Agile project management tools such as Jira, Trello, or Asana to track project progress and update stakeholders.

Environment: Azure Data Factory (V2), Azure Databricks (PySpark, Spark SQL), Azure Data Lake, Azure Blob Storage, Azure ML, Azure SQL, Hive, Git, Jira, HQL, Snowflake, Teradata, Unix, PowerShell, GitHub, Agile.

Prudential Financial, New Jersey. Nov 2018 to May 2019

Data Engineer

Company Description:

Prudential Financial, Inc. is a prominent American financial services company offering a diverse range of products and services, including insurance, retirement planning, investment management, and asset management. The company operates primarily in the United States, Asia, Europe, and Latin America, serving both individual and institutional clients.

Responsibilities:

Designed and Configured Azure Cloud relational servers and databases analyzing current and future business requirements.

Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).

Good experience setting up separate application and reporting data tiers across servers using geo-replication functionality.

Implemented Disaster Recovery and Failover servers in Cloud by replicating data across regions.

Extensive experience creating pipeline jobs, scheduling triggers, and mapping data flows using Azure Data Factory (V2), and using Key Vault to store credentials.

Good experience creating elastic pool databases and scheduling elastic jobs to execute T-SQL procedures.

Used Kusto Explorer for log analytics and better query response, and created alerts using Kusto Query Language (KQL).

Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.

Good experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW).

Designed SSIS Packages using Business Intelligence Development Studio (BIDS) to extract data from various data sources and load into SQL Server database for further Data Analysis and Reporting by using multiple transformations.

Worked on creating correlated and non-correlated sub-queries to resolve complex business queries involving multiple tables from different databases.

Developed business intelligence solutions using SQL Server Data Tools 2015 and 2017 and loaded data into SQL and Azure cloud databases.

Performed data quality analyses and applied business rules in all layers of the extraction, transformation, and loading process.

Performed validation and verification of software across all testing phases, including Functional Testing, System Integration Testing, End-to-End Testing, Regression Testing, Sanity Testing, User Acceptance Testing, Smoke Testing, Disaster Recovery Testing, Production Acceptance Testing, and Pre-prod Testing.

Good experience logging defects in Jira and Azure DevOps.

Involved in planning the cutover strategy and go-live schedule, including the scheduled release dates of Portfolio Central data mart changes.

Environment: Microsoft SQL Server, SSDT 2012 & 2015, Azure Synapse Analytics, Azure Data Lake & Blob, Azure SQL, Azure Data Factory, Azure Analysis Services.

Cigna, Dallas TX. June 2016 to Oct 2018

Hadoop Developer

Company Description:

Cigna is a global health service company offering a wide range of health insurance products and services. In recent developments, Cigna has focused on expanding its health services division, Evernorth, to provide comprehensive health solutions beyond traditional insurance offerings. This includes acquiring companies like Express Scripts to enhance pharmacy services and divesting certain Medicare businesses to streamline operations. Additionally, the company has faced scrutiny over its claims processing practices and is implementing measures to improve customer satisfaction, including linking executive compensation to customer service outcomes.

Responsibilities:

•Designed and implemented Big Data solutions that facilitated data-driven decision-making for the business and technology teams, with a focus on customer acquisition and business solutions.

•Installed, configured, and managed various components of the Hadoop ecosystem, such as Hive, Sqoop, and Pig.

•Loaded unstructured and semi-structured data from multiple sources into the Hadoop cluster using Flume and managed it effectively.

•Developed MapReduce programs to cleanse and parse data stored in HDFS, obtained from various sources, and performed joins on the Map side using distributed cache.

•Utilized Hive data warehousing tool to analyze the data in HDFS and created internal and external tables with properly defined static and dynamic partitions for enhanced efficiency.

•Leveraged Avro SerDe's packaged with Hive for serialization and de-serialization to parse the contents of streamed log data.

•Developed custom Hive UDFs to perform comprehensive data analysis.

•Used Pig to develop ad-hoc queries and extracted business-relevant information to RDBMS using Sqoop to make it available for the BI team to generate reports based on data.

•Implemented a daily workflow for the extraction, processing, and analysis of data using Oozie.

•Troubleshot MapReduce jobs by reviewing log files, ensuring smooth functioning of the system.

Environment: Hadoop, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, Cloudera, Spark 1.6.0

Equifax Inc, Atlanta, GA. April 2014 – May 2016

ETL Developer.

Company Description:

Equifax Inc. is a leading global data, analytics, and technology company specializing in consumer credit reporting and related services. Founded in 1899 and headquartered in Atlanta, Georgia, Equifax operates in 24 countries, serving businesses, government agencies, and consumers worldwide. Equifax continues to play a crucial role in the global economy by providing data-driven insights and services that facilitate informed decision-making across various industries.

Responsibility:

•Developed and implemented ETL workflows for a large healthcare organization, integrating data from various sources such as EMR systems, billing systems, and pharmacy databases using Informatica PowerCenter.

•Designed and implemented a data integration solution for a financial services company, leveraging Informatica PowerExchange for real-time data extraction and loading into a centralized data warehouse.

•Built an ETL pipeline for a retail company, combining data from various systems such as point-of-sale, inventory management, and customer relationship management using Informatica Cloud Data Integration.

•Implemented a real-time data replication solution for a manufacturing company, using Informatica Change Data Capture to capture and integrate transactional data from multiple databases.

•Automated data extraction and loading processes for a global logistics company, using Informatica Cloud Integration Hub to manage data flow between on-premise and cloud-based applications.

•Developed and maintained ETL workflows for a large insurance company, ensuring data accuracy and completeness using Informatica Data Validation and Informatica Metadata Manager.

•Designed and implemented a data warehousing solution for a government agency, integrating data from multiple departments using Informatica PowerCenter and providing ad-hoc reporting capabilities using Informatica PowerExchange.

•Optimized performance of existing ETL workflows for a media company, improving data processing speed and reducing errors using Informatica PowerCenter and Informatica Big Data Management.

Environment: Informatica PowerCenter, SQL, ETL workflows.


