NITHIN KUMAR YELUGURI
United States
***********@*****.***
EDUCATION
Master of Science
Information Technology and Management
Campbellsville University, Campbellsville, KY
SUMMARY
Sr. Cloud Data Engineer with around 9 years of IT experience in data engineering, system analysis, design, development, implementation, testing, and support of databases, data warehouse applications, and data visualization, with a proven track record at multiple clients specializing in real-time analytics and ETL development, from on-premises environments (Microsoft, Oracle) to cloud platforms (Azure, AWS, and Google Cloud (GCP)).
Works across Azure Synapse Analytics, Data Factory, Data Lake, Blob Storage, Postgres, AWS tools, Google Cloud, SQL DW, SQL, Power BI, SSRS, SSAS, and SSIS; expert in data migration and data quality assurance, having led a 20% reduction in processing times through performance tuning.
Designs and implements cloud data architectures, ETL pipelines, and data warehouses to support business-critical applications.
Experienced with Amazon Web Services (Amazon EC2, S3, SimpleDB, RDS, Redshift, DynamoDB, Glue, Elastic Load Balancing, Elasticsearch, Lambda, and IAM); proficient in writing complex AWS Lambda, Glue, EMR, Python, and Spark jobs to process and transform large datasets in real time.
Strong development skills with Azure Data Factory, Azure Databricks, Azure SQL Data Warehouse, Azure SQL Database, and Azure Storage Explorer; extensive knowledge of developing Spark SQL jobs with DataFrames; executed complex HiveQL queries to extract data from Hive tables and wrote Hive UDFs; hands-on experience with Azure Stream Analytics, Event Hubs, Azure SQL Database, and Azure Synapse.
Excels at high-level design of ETL and SSIS packages that integrate data over OLE DB connections from heterogeneous sources (Excel, CSV and flat files, MS Access, SAS, Oracle, DB2, CRM) using SSIS transformations such as Data Conversion, Conditional Split, Bulk Insert, Derived Column, Merge, Merge Join, and Union All; implemented Copy activity, Data Flow, and Triggers in Azure Data Factory pipelines for on-cloud ETL processing.
Strong understanding of database structures, theories, principles, and practices, with a knack for writing complex SQL queries; background in designing, testing, and maintaining data management systems, database design, and data mining, using real-time data to improve business decision making, with prior work that optimized data retrieval processes and improved system efficiency.
Excels in problem-solving, collaboration, and adaptability, working with cross-functional teams to deliver solutions aligned with business objectives; noteworthy contributions include streamlining data processes, improving system performance, and data visualization with Power BI and SSRS.
TECHNICAL SKILLS
Databases, ETL & Reporting - SQL Server 2019/2017/2016/2014/2012/2008 R2, Azure SQL Data Warehouse, MySQL, Oracle 12c/11g, PostgreSQL 11.5/10.12/9.5, MS Access 2000/8.0, Teradata 14.0, DB2, SQL Server Management Studio, SSDT 2016/2014/2012, BIDS 2008 R2/2008, SQL Server Integration Services (SSIS 2019/2017/2016/2014/2008 R2), SQL Server Reporting Services (SSRS 2016/2008 R2), Power BI.
Cloud Technologies & Programming Languages - Azure Data Warehouse, Azure SQL, Azure Data Lake, Azure Data Factory, Azure Databricks, AWS (Lambda, Glue, EMR, Redshift, S3), Google Cloud (GCP); SQL, T-SQL, PL/SQL, PL/pgSQL, PySpark, Spark SQL, Python, Scala, Linux, Bash, Shell Scripting.
Data Engineering - ETL, Data Modeling, Data Warehousing, Real-time Data Processing, Real-time Analytics, Data Migration, Data Governance, Data Quality Assurance, Performance Tuning, Metadata Management, Big Data Processing, Data Security, Data Integration, Data Pipeline Design and Control, NoSQL Databases, SQL Expertise, Scripting Languages, Spark Framework.
EXPERIENCE
SR. CLOUD DATA ENGINEER
CME (Chicago Mercantile Exchange) Chicago, IL March 2024 - Current
Developed complex data transformation jobs using Python and Spark for real-time and batch processing
Wrote and optimized AWS Lambda functions for handling event-driven ETL processes
Worked with business stakeholders to define KPIs and metrics for customer behavior and sales trends
Led the migration of on-premises BI infrastructure to AWS, optimizing storage and computing resources
Worked with external vendors/partners (e.g., Salesforce) to onboard external data into target S3 buckets
Developed Spark jobs to clean data from various feeds, making it suitable for ingestion and analysis
Utilized AWS EMR clusters to process large datasets and ensure scalability in data operations
Imported data from various sources into Spark RDD for analysis
Used SQL against company-generated reports to assess the complexity of the data and its data types
Created DynamoDB tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios
Used AWS Glue crawlers to create tables on raw data in AWS S3
Loaded processed data into AWS Redshift
Developed scalable data pipelines in AWS Glue and Python to extract, transform, and load data from multiple sources to Snowflake
Implemented data orchestration using AWS Lambda functions and managed real-time data processing using AWS EMR
Used Python (boto3) to configure AWS services such as Glue, EC2, and S3 (see the boto3 sketch below)
Used the AWS Glue Data Catalog with crawlers to surface data in S3 and performed SQL query operations against it
Wrote manual SQL procedures for ETL and data-cleansing tasks in the ETL load to the ODS
Created DataStage ETL coding/naming standards and designed DataStage Parallel jobs, reusing or forklifting existing jobs where appropriate
Loaded the data into Spark RDD and performed in-memory computation to generate the output
Performed transformations and actions on the RDD to meet the business requirements
Involved in loading data from UNIX file system to HDFS
Used Spark SQL to load Parquet data, created Datasets defined by case classes, managed structured data with Spark SQL, and stored the results into Hive tables for downstream consumption (see the PySpark-to-Hive sketch below)
Developed Data Pipelines using Apache Kafka to stream and process real time data
Migrated HiveQL queries into Spark SQL to improve performance
Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded the results into HBase
Managed version control and deployment of Python-based ETL scripts in Git and Jenkins
Utilized Amazon S3 for data lake storage and AWS Redshift for certain reporting workloads
Designed and implemented data governance processes to ensure data quality and compliance
Conducted performance tuning of Spark jobs and AWS Glue scripts, reducing processing times by 20%
Automated data integration tasks using AWS Lambda, reducing manual intervention by 40%
Established real-time reporting systems, enabling executives to access up-to-date financial data and key performance indicators
Migrated existing feeds from Hive to Spark to reduce the latency of the existing HiveQL workloads
Delivered technical documentation for ETL processes and cloud-based architecture
Actively participated in sprint planning and agile ceremonies to ensure timely delivery of data engineering tasks
Coordinated the migration of Oracle databases to GCP Cloud SQL for PostgreSQL
Performed data copy from on-premises Oracle databases to GCP Cloud SQL
Configured CDC for primary schemas to ensure data consistency and real-time updates
Conducted extensive query performance tuning to optimize database performance post-migration
Collaborated with development teams to troubleshoot and resolve performance-related issues
Refactored and rewrote SQL code to enhance performance and maintainability in PostgreSQL
Utilized GCP tools and services (e.g., BigQuery) for efficient database management and monitoring
Developed automated scripts for database backup and recovery processes
Implemented security measures to protect sensitive data during migration
Provided training and support to junior developers on best practices in database development and migration
Analyzed and optimized hardware configurations to improve database performance
Environment: ETL, SQL Developer, AWS, Redshift, Lambda, Glue, EMR, Spark, MySQL, SQL*Loader, Data Migration Services (DMS), Oracle 12c/11g, PostgreSQL 11.5/10.12/9.5, DataStage, Hive, Unix, Linux, Windows, Google Cloud, Microsoft Visual Studio, MS Excel, Python 3.8, Shell Scripting, PySpark, Spark 3.0, Spark SQL, Spark DF/DS.
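A minimal boto3 sketch of the Glue-crawler-plus-Athena flow referenced in this role's bullets; the crawler, database, and bucket names are hypothetical placeholders, not the actual project resources.

import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")
athena = boto3.client("athena", region_name="us-east-1")

# Run the crawler that catalogs raw files landed in S3 (name is hypothetical).
glue.start_crawler(Name="raw-trades-crawler")
while glue.get_crawler(Name="raw-trades-crawler")["Crawler"]["State"] != "READY":
    time.sleep(30)  # poll until the crawl finishes

# Query the crawled table through Athena; results land in a scratch S3 prefix.
run = athena.start_query_execution(
    QueryString="SELECT symbol, COUNT(*) AS n FROM raw_trades GROUP BY symbol",
    QueryExecutionContext={"Database": "raw_zone"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(run["QueryExecutionId"])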
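A minimal PySpark sketch of the Parquet-to-Hive load described above; Datasets defined by case classes are Scala constructs, so this shows the equivalent DataFrame flow in Python, with hypothetical paths and table names.

from pyspark.sql import SparkSession

# Hive support lets saveAsTable persist managed tables in the metastore.
spark = (
    SparkSession.builder.appName("parquet-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

# Load raw Parquet files (path is a placeholder).
trades = spark.read.parquet("s3://example-bucket/raw/trades/")

# Manage the structured data with Spark SQL before publishing.
trades.createOrReplaceTempView("trades")
daily = spark.sql("""
    SELECT trade_date, symbol, SUM(qty) AS total_qty
    FROM trades
    GROUP BY trade_date, symbol
""")

# Store into a Hive table for downstream consumption.
daily.write.mode("overwrite").saveAsTable("analytics.daily_trades")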
SR. DATA ENGINEER
UPMC (University of Pittsburgh Medical Center) Pittsburgh, PA March 2022 - February 2024
Remote
Gathered, analyzed, and documented the business requirements and business rules by working directly with Business Analysts
Involved in the entire SDLC (Software Development Life Cycle) process that includes implementation, testing, deployment, documentation, training, and maintenance
Generated on-demand and scheduled reports for business analysis or management decision using SQL Server Reporting Services
Designed and developed AWS-based data pipeline solutions using AWS Lambda and Glue to process large volumes of healthcare data
Worked on ETL migration services by developing and deploying AWS Lambda functions that drive a serverless data pipeline, writing to the Glue Catalog so the data can be queried from Athena (see the Lambda sketch below)
Worked with healthcare data/claims in 835 and 837 formats for analytical purposes, X12 Electronic Data Interchange (EDI), and PHI
Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
Developed complex ETL Packages using SQL Server 2019 Integration Services to load data from various sources like SQL Server/DB2 to Staging Database and then to Data Warehouse
Built the entire infrastructure required for optimal ETL from a variety of data sources using AWS, mainly with PySpark
Used Spark SQL to process data on the Spark engine
Used Spark to improve performance and optimize existing algorithms via Spark SQL
Used the AWS Glue Data Catalog with crawlers to read data from S3 and performed SQL query operations using AWS Athena
Created external tables with partitions using AWS Athena and Redshift
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them
Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs
Created PySpark DataFrames to bring data from DB2 to Amazon S3 (see the DB2-to-S3 sketch below)
Created AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) as per data retention requirements
Created and enforced policies to achieve HIPAA compliance
Performed data mapping, migration, and conversion to the data warehouse platform
Defined and deployed monitoring, metrics, and logging systems on AWS
Performed logical and physical data modeling using Erwin for the data warehouse database in a star schema
Developed an end-to-end ETL pipeline using Spark SQL, Python, and Salesforce on the Spark engine
Developed Spark jobs to clean data from various feeds, making it suitable for ingestion and analysis
Imported data from various sources into Spark RDDs for analysis
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift
Created Python and UNIX shell scripts while interacting with different AWS services
Developed Complex mappings to feed Data warehouse & Data marts by extensively using Mapplets, and Transformations like Lookup, Filter, Router, Expression, Aggregator, Joiner, Stored Procedure and Update Strategy
Implemented Event Handlers, Package Configurations, Logging, System and User-defined Variables, Check Points and Expressions for SSIS/ ETL Packages
Created AWS S3 buckets, performed folder management in each bucket, managed cloud trail logs and objects within each bucket
Completed POC on usability of AWS using EC2 and S3 storage and lambda functions
Developed ETL processes to clean and standardize data and metadata before loading into the data warehouse
Created external tables with partitions using Hive, AWS Athena, and Redshift
Optimized SQL queries using indexes and execution plans for maximum efficiency and performance
Automated mappings to run using UNIX shell scripts, which included Pre- and Post-session jobs and extracted data from Transaction System into Staging Area
Wrote scripts and indexing strategy for migration to AWS Redshift from SQL Server and MySQL databases
Modified SSIS ETL packages to reflect changed business requirements and load data from various sources into SQL Server tables in the data warehouse
Performed Developer Testing, Functional testing, Unit testing and created Test Plans and Test Cases.
Environment: ETL, MS SQL Server 2019/2017/2016/2012/2008 R2, MySQL, SSIS, Data Warehouse, SQL Server 2019/2017/2016/2014, Agile, PySpark, AWS Glue, AWS Batch, AWS S3, Athena, Redshift, Python 3.8, Shell Scripting, Spark 3.0, Spark SQL, Spark DF/DS.
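A minimal sketch, under assumed names, of the Lambda-driven serverless pattern described above: an S3 object event triggers a Glue job whose curated output is cataloged and queryable from Athena. The job name and argument keys are hypothetical.

import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Triggered by S3 ObjectCreated events; starts a Glue job per new file."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # The Glue job (a placeholder name) writes curated output and updates
        # the Glue Catalog so the results are queryable from Athena.
        glue.start_job_run(
            JobName="claims-curation-job",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )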
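A minimal PySpark sketch of the DB2-to-S3 copy mentioned above; the JDBC URL, credentials, table, and paths are hypothetical, and the IBM DB2 JDBC driver jar must be on the Spark classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

# Read a DB2 table over JDBC (connection details are placeholders).
claims = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/CLAIMSDB")
    .option("dbtable", "SCHEMA.CLAIMS_835")
    .option("user", "etl_user")
    .option("password", "****")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .load()
)

# Land the data in S3 as Parquet, partitioned for downstream Athena queries.
(claims.write.mode("append")
    .partitionBy("claim_year")
    .parquet("s3://example-bucket/staging/claims_835/"))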
SSIS/SQL/AZURE SQL DATA INTEGRATION ENGINEER
Gallagher Bassett Rolling Meadows, IL April 2018 - March 2022
Gathered, analyzed, and documented the business requirements and business rules by working directly with end users/clients
Created and maintained documents like Documentation Roadmap, DATA Models, ETL Execution Plan, Configuration Management Procedures, and Project Plan
Designed and developed various SSIS packages (ETL) to extract and transform data and involved in scheduling and deploying SSIS Packages
Extended SSIS capabilities by creating custom SSIS components for SharePoint
Implemented various tasks and transformations for data cleansing and performance tuning of SSIS packages
Extracted and Loaded SharePoint Lists and Surveys Data into SQL Server Database with SSIS
Created complex stored procedures, triggers, functions (UDFs), indexes, tables, views, and other T-SQL code and SQL joins for SSIS packages and SSRS reports
Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB)
Configured SSIS packages with XML configuration files, environment variables, parent package variables, and SQL Server tables
Worked with SSIS Data Flow tasks, transformations, For Loop containers, Fuzzy Lookup, and Data Conversion, and configured data in slowly changing dimensions
Performed SSIS development and support; developed ETL solutions for integrating data from multiple sources like flat files (delimited, fixed width), Excel, SQL Server, raw files, and DB2 into the central OLTP database
Worked on creating dependencies between activities in Azure Data Factory
Propagated XML data through a policy engine to the master and data lineage databases
Created stored procedures and scheduled them in the Azure environment
Monitored produced and consumed datasets in ADF
Created data factories in Azure Data Factory
Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database
Created multiple datasets in Azure; worked on migration of SQL Server 2008/2012/2014 to SQL Server 2017
Moved CSV files from Azure Blob Storage to Azure SQL Server (see the sketch below)
Analyzed existing databases, tables, and other objects to prepare to migrate to Azure Synapse
Worked on on-prem data warehouse migration to Azure Synapse using PolyBase and ADF
Migrated on-prem ETLs from MS SQL Server to Azure using Azure Data Factory and Databricks
Involved in configuring Azure platform for data pipelines, ADF, Azure Blob Storage and Data Lakes and building workflows to automate data flow using ADF
Developed a metadata management program with business and technical definitions, data lineage, life cycle, and identification of authoritative sources
Analyzed data from different sources using the Hadoop big data ecosystem, implementing Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop
Forecast data space requirements using capacity planning, mapped source-to-target data using the data lineage feature, and calculated impact analysis
Created Azure data factories for loading data into Azure SQL Database from the Cosmos platform
Developed and documented data modeling standards to establish across the organization
Created and managed backend process to meet diverse business needs
Performed data loading, data import/export, data transmission
Involved in data modeling of star schemas according to business requirements using Erwin 6.3.8
Worked extensively on system analysis, design, development, testing and implementation of projects
Assisted with load testing activities by setting up database counters to monitor backend performance
Pulled data into MS SQL Server from other DBMSs such as Oracle and Teradata using ETL packages
Created and scheduled various SQL jobs using SQL Server Agent to perform administrative tasks
Environment: ETL, MS SQL Server 2017/2016/2012/2008, PDW, MS SQL Server Master Data Services, SQL Server Integration Services (SSIS), MS Visio, Azure Data Factory, Azure Databricks, MS Visual Studio/.NET 2008, TFS, VSTS, Azure DevOps, SharePoint.
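A minimal PySpark (Databricks) sketch of the Blob-to-Azure-SQL copy referenced above; the storage account, container, server, database, and credential values are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("blob-csv-to-azuresql").getOrCreate()

# Authenticate to the storage account (key value is a placeholder).
spark.conf.set(
    "fs.azure.account.key.examplestore.blob.core.windows.net", "<account-key>")

# Read the CSV files landed in Blob Storage.
claims = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv("wasbs://landing@examplestore.blob.core.windows.net/claims/")
)

# Write into Azure SQL over JDBC (server/database/user are placeholders).
(claims.write.format("jdbc")
    .option("url", "jdbc:sqlserver://example-srv.database.windows.net:1433;"
                   "database=ClaimsDW;encrypt=true")
    .option("dbtable", "dbo.Claims_Staging")
    .option("user", "etl_user")
    .option("password", "****")
    .mode("append")
    .save())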
SQL DATABASE DEVELOPER
Techno Vision Solutions Farmington Hills, MI January 2017 - April 2018
Created Complex ETL Packages using SSIS for extracting, cleaning, transforming, and loading data from staging tables to partitioned tables with incremental load
Developed, deployed, and monitored SSIS Packages
Automated data processes for information input using SSIS
Developed SSIS packages to extract, transform and load data from Oracle and SQL Server databases into Data Warehouse
Deployed SSIS packages into various Environments (Dev, UAT and Prod) using Deployment Utility
Implemented data validation using T-SQL queries to confirm that the data returned by SSRS reports was correct
Designed Access databases for reporting and claims data analysis
Created tables and wrote updates and inserts to manipulate the data
Created packages using various control flow tasks such as Data Flow Task, Execute SQL Task, Execute Package Task, and File System Task
Designed and implemented multiple dashboards for internal metrics using Azure Synapse - PowerPivot & Power Query tools
Involved in creation of Data Warehouse database Physical Model, Logical Model using Erwin data modeling tool
Designed the Data warehouse and done the mappings from Source to the Target tables
Created Indexes for faster data retrieval from the database
Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting (see the sketch below)
Wrote stored procedures and functions for better performance and flexibility
Created SQL Server jobs and scheduled them to load data periodically using SQL Server Agent
Transformed data from DB2 to SQL Server
Developed Aggregations, partitions, and calculated members for cube as per business requirements
Defined appropriate measure groups and KPIs and deployed cubes
Created parameterized, cross-tab, drill-down, and summary reports using SSRS, and created report snapshots to improve SSRS performance.
Environment: ETL, MS SQL Server 2012/2008, MS SQL Server Master Data Services 2012, SQL Server Integration Services (SSIS), SSRS, MS Visual Studio/.NET 2008, BI Extractor, Microsoft Excel, Add-In Excel with Master data.
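A minimal Databricks PySpark sketch of the transform-then-load-to-Synapse flow described above, using the Databricks Synapse connector; all paths, server names, table names, and credentials are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-to-synapse").getOrCreate()

# Transform and aggregate the source data before publishing.
claims = spark.read.parquet("/mnt/landing/claims/")
monthly = (
    claims.groupBy("member_id", F.trunc("service_date", "month").alias("month"))
    .agg(F.sum("paid_amount").alias("total_paid"),
         F.count("*").alias("claim_count"))
)

# Load into Azure Synapse via the Databricks connector; it stages data in
# Blob Storage and loads it with PolyBase/COPY under the hood.
(monthly.write.format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://example-ws.database.windows.net:1433;"
                   "database=ReportingDW;encrypt=true;user=etl_user;password=****")
    .option("tempDir", "wasbs://tmp@examplestore.blob.core.windows.net/synapse")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.MonthlyClaims")
    .mode("overwrite")
    .save())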