NITHIN KUMAR YELUGURI
United States
***********@*****.***
EDUCATION
Master of Science
Information Technology and Management
Campbellsville University, Campbellsville, KY
SUMMARY
Sr. Cloud Data Engineer with around 9 years of IT experience in data engineering, system analysis, design, development, implementation, testing, and support of databases, data warehouse applications, and data visualization, with a proven track record at multiple clients specializing in real-time analytics and ETL development, from on-premises environments (Microsoft, Oracle) to cloud platforms (Azure, AWS, and Google Cloud (GCP)).
Works across Azure Synapse Analytics, Data Factory, Data Lake, Blob Storage, Postgres, AWS tools, Google Cloud, SQL DW, SQL, Power BI, SSRS, SSAS, and SSIS; expert in data migration and data quality assurance, having led a 20% reduction in processing times through performance tuning.
Designs and implements cloud data architectures, ETL pipelines, and data warehouses to support business-critical applications.
Experienced with Amazon Web Services (Amazon EC2, S3, SimpleDB, RDS, Redshift, DynamoDB, Glue, Elastic Load Balancing, Elasticsearch, Lambda, and IAM); proficient in writing complex AWS Lambda, Glue, EMR, Python, and Spark jobs to process and transform large datasets in real time.
Strong development skills with Azure Data Factory, Azure Databricks, Azure SQL Data Warehouse, Azure SQL Database, and Azure Storage Explorer; extensive knowledge of developing Spark SQL jobs with DataFrames; executed complex HiveQL queries to extract data from Hive tables and wrote Hive UDFs; hands-on experience with Azure Stream Analytics, Event Hubs, Azure SQL Database, and Azure Synapse.
Excels at high-level design of ETL and SSIS packages that integrate data over OLE DB connections from heterogeneous sources (Excel, CSV and flat files, MS Access, SAS, Oracle, DB2, CRM) using SSIS transformations such as Data Conversion, Conditional Split, Bulk Insert, Derived Column, Merge, Merge Join, and Union All; implemented Copy activity, Data Flow, and Triggers in Azure Data Factory pipelines for on-cloud ETL processing.
Strong understanding of database structures, theories, principles, and practices, with a knack for writing complex SQL queries; background in designing, testing, and maintaining data management systems, database design, and data mining, using real-time data to improve business decision making, with prior work that optimized data retrieval processes and improved system efficiency.
Excels in problem-solving, collaboration, and adaptability, working with cross-functional teams to deliver solutions aligned with business objectives; noteworthy contributions include streamlining data processes, improving system performance, and data visualization with Power BI and SSRS.
TECHNICAL SKILLS
Databases, ETL & Reporting - SQL Server 2019/2017/2016/2014/2012/2008 R2, Azure SQL Data Warehouse, MySQL, Oracle 12c/11g, PostgreSQL 11.5/10.12/9.5, MS Access 2000/8.0, Teradata 14.0, DB2, SQL Server Management Studio, SSDT 2016/2014/2012, BIDS 2008 R2/2008, SQL Server Integration Services (SSIS 2019/2017/2016/2014/2008 R2), SQL Server Reporting Services (SSRS 2016/2008 R2), Power BI.
Cloud Technologies & Programming Languages - Azure Data Warehouse, Azure SQL, Azure Data Lake, Azure Data Factory, Azure Databricks, AWS (Lambda, Glue, EMR, Redshift, S3), Google Cloud (GCP); SQL, T-SQL, PL/SQL, PL/pgSQL, PySpark, Spark SQL, Python, Scala, Linux, Bash, Shell Scripting.
Data Engineering - ETL, Data Modeling, Data Warehousing, Real-time Data Processing, Real-time Analytics, Data Migration, Data Governance, Data Quality Assurance, Performance Tuning, Metadata Management, Big Data Processing, Data Security, Data Integration, Data Pipeline Design and Control, NoSQL Databases, SQL Expertise, Scripting Languages, Spark Framework.
EXPERIENCE
SR. CLOUD DATA ENGINEER
CME (Chicago Mercantile Exchange) Chicago, IL March 2024 - Current
Developed complex data transformation jobs using Python and Spark for real-time and batch processing
Wrote and optimized AWS Lambda functions for handling event-driven ETL processes
Worked with business stakeholders to define KPIs and metrics for customer behavior and sales trends
Led the migration of on-premises BI infrastructure to AWS, optimizing storage and computing resources
Worked with external vendors/partners (e.g., Salesforce) to onboard external data into target S3 buckets
Developed Spark jobs to clean data from various feeds, making it suitable for ingestion and analysis
Utilized AWS EMR clusters to process large datasets and ensure scalability in data operations
Imported data from various sources into Spark RDD for analysis
Used SQL against company-generated reports to assess the complexity of the data and its data types
Created DynamoDB tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios
Used AWS Glue crawlers to create tables on raw data in AWS S3
Loaded processed data into AWS Redshift
Developed scalable data pipelines in AWS Glue and Python to extract, transform, and load data from multiple sources to Snowflake
Implemented data orchestration using AWS Lambda functions and managed real-time data processing using AWS EMR
Used Python (boto3) to configure AWS services such as Glue, EC2, and S3 (see the boto3 sketch below)
Used the AWS Glue Data Catalog with crawlers to surface data in S3 and performed SQL query operations against it
Wrote manual SQL procedures for ETL and data-cleansing tasks in the ETL load to the ODS
Created DataStage ETL coding/naming standards and designed DataStage Parallel jobs, reusing or forklifting existing jobs where appropriate
Loaded the data into Spark RDD and performed in-memory computation to generate the output
Performed transformations and actions on the RDD to meet the business requirements
Involved in loading data from UNIX file system to HDFS
Used Spark SQL to load Parquet data, created Datasets defined by case classes, managed structured data with Spark SQL, and stored the results into Hive tables for downstream consumption (see the PySpark-to-Hive sketch below)
Developed Data Pipelines using Apache Kafka to stream and process real time data
Migrated HiveQL queries into Spark SQL to improve performance
Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded the results into HBase
Managed version control and deployment of Python-based ETL scripts in Git and Jenkins
Utilized Amazon S3 for data lake storage and AWS Redshift for certain reporting workloads
Designed and implemented data governance processes to ensure data quality and compliance
Conducted performance tuning of Spark jobs and AWS Glue scripts, reducing processing times by 20%
Automated data integration tasks using AWS Lambda, reducing manual intervention by 40%
Established real-time reporting systems, enabling executives to access up-to-date financial data and key performance indicators
Migrated existing feeds from Hive to Spark to reduce the latency of the existing HiveQL workloads
Delivered technical documentation for ETL processes and cloud-based architecture
Actively participated in sprint planning and agile ceremonies to ensure timely delivery of data engineering tasks
Coordinated the migration of Oracle databases to GCP Cloud SQL for PostgreSQL
Performed data copy from on-premises Oracle databases to GCP Cloud SQL
Configured CDC for primary schemas to ensure data consistency and real-time updates
Conducted extensive query performance tuning to optimize database performance post-migration
Collaborated with development teams to troubleshoot and resolve performance-related issues
Refactored and rewrote SQL code to enhance performance and maintainability in PostgreSQL
Utilized GCP tools and services (e.g., BigQuery) for efficient database management and monitoring
Developed automated scripts for database backup and recovery processes
Implemented security measures to protect sensitive data during migration
Provided training and support to junior developers on best practices in database development and migration
Analyzed and optimized hardware configurations to improve database performance
Environment: ETL, SQL Developer, AWS, Redshift, Lambda, Glue, EMR, Spark, MySQL, SQL*Loader, Data Migration Services (DMS), Oracle 12c/11g, PostgreSQL 11.5/10.12/9.5, DataStage, Hive, Unix, Linux, Windows, Google Cloud, Microsoft Visual Studio, MS Excel, Python 3.8, Shell Scripting, PySpark, Spark 3.0, Spark SQL, Spark DF/DS.
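A minimal boto3 sketch of the Glue-crawler-plus-Athena flow referenced in this role's bullets; the crawler, database, and bucket names are hypothetical placeholders, not the actual project resources.

import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")
athena = boto3.client("athena", region_name="us-east-1")

# Run the crawler that catalogs raw files landed in S3 (name is hypothetical).
glue.start_crawler(Name="raw-trades-crawler")
while glue.get_crawler(Name="raw-trades-crawler")["Crawler"]["State"] != "READY":
    time.sleep(30)  # poll until the crawl finishes

# Query the crawled table through Athena; results land in a scratch S3 prefix.
run = athena.start_query_execution(
    QueryString="SELECT symbol, COUNT(*) AS n FROM raw_trades GROUP BY symbol",
    QueryExecutionContext={"Database": "raw_zone"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(run["QueryExecutionId"])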
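A minimal PySpark sketch of the Parquet-to-Hive load described above; Datasets defined by case classes are Scala constructs, so this shows the equivalent DataFrame flow in Python, with hypothetical paths and table names.

from pyspark.sql import SparkSession

# Hive support lets saveAsTable persist managed tables in the metastore.
spark = (
    SparkSession.builder.appName("parquet-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

# Load raw Parquet files (path is a placeholder).
trades = spark.read.parquet("s3://example-bucket/raw/trades/")

# Manage the structured data with Spark SQL before publishing.
trades.createOrReplaceTempView("trades")
daily = spark.sql("""
    SELECT trade_date, symbol, SUM(qty) AS total_qty
    FROM trades
    GROUP BY trade_date, symbol
""")

# Store into a Hive table for downstream consumption.
daily.write.mode("overwrite").saveAsTable("analytics.daily_trades")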
SR. DATA ENGINEER
UPMC (University of Pittsburgh Medical Center) Pittsburgh, PA March 2022 - February 2024
Remote
Gathered, analyzed, and documented the business requirements and business rules by working directly with Business Analysts
Involved in the entire SDLC (Software Development Life Cycle) process that includes implementation, testing, deployment, documentation, training, and maintenance
Generated on-demand and scheduled reports for business analysis or management decision using SQL Server Reporting Services
Designed and developed AWS-based data pipeline solutions using AWS Lambda and Glue to process large volumes of healthcare data
Worked on ETL migration services by developing and deploying AWS Lambda functions that drive a serverless data pipeline, writing to the Glue Catalog so the data can be queried from Athena (see the Lambda sketch below)
Worked with healthcare data/claims in 835 and 837 formats for analytical purposes, X12 Electronic Data Interchange (EDI), and PHI
Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
Developed complex ETL Packages using SQL Server 2019 Integration Services to load data from various sources like SQL Server/DB2 to Staging Database and then to Data Warehouse
Built the entire infrastructure required for optimal ETL from a variety of data sources using AWS, mainly with PySpark
Used Spark SQL to process data on the Spark engine
Used Spark to improve performance and optimize existing algorithms via Spark SQL
Used the AWS Glue Data Catalog with crawlers to read data from S3 and performed SQL query operations using AWS Athena
Created external tables with partitions using AWS Athena and Redshift
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them
Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs
Created PySpark DataFrames to bring data from DB2 to Amazon S3 (see the DB2-to-S3 sketch below)
Created AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) as per data retention requirements
Created and enforced policies to achieve HIPAA compliance
Performed data mapping, migration, and conversion to the data warehouse platform
Defined and deployed monitoring, metrics, and logging systems on AWS
Performed logical and physical data modeling using Erwin for the data warehouse database in a star schema
Developed an end-to-end ETL pipeline using Spark SQL, Python, and Salesforce on the Spark engine
Developed Spark jobs to clean data from various feeds, making it suitable for ingestion and analysis
Imported data from various sources into Spark RDDs for analysis
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift
Created Python and UNIX shell scripts while interacting with different AWS services
Developed Complex mappings to feed Data warehouse & Data marts by extensively using Mapplets, and Transformations like Lookup, Filter, Router, Expression, Aggregator, Joiner, Stored Procedure and Update Strategy
Implemented Event Handlers, Package Configurations, Logging, System and User-defined Variables, Check Points and Expressions for SSIS/ ETL Packages
Created AWS S3 buckets, performed folder management in each bucket, managed cloud trail logs and objects within each bucket
Completed POC on usability of AWS using EC2 and S3 storage and lambda functions
Developed ETL processes to clean and standardize data and metadata before loading into the data warehouse
Created external tables with partitions using Hive, AWS Athena, and Redshift
Optimized SQL queries using indexes and execution plans for maximum efficiency and performance
Automated mappings to run using UNIX shell scripts, which included Pre- and Post-session jobs and extracted data from Transaction System into Staging Area
Wrote scripts and indexing strategy for migration to AWS Redshift from SQL Server and MySQL databases
Modified SSIS ETL packages to reflect changed business requirements and load data from various sources into SQL Server tables in the data warehouse
Performed Developer Testing, Functional testing, Unit testing and created Test Plans and Test Cases.
Environment: ETL, MS SQL Server 2019/2017/2016/2012/2008 R2, MySQL, SSIS, Data Warehouse, SQL Server 2019/2017/2016/2014, Agile, PySpark, AWS Glue, AWS Batch, AWS S3, Athena, Redshift, Python 3.8, Shell Scripting, Spark 3.0, Spark SQL, Spark DF/DS.
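A minimal sketch, under assumed names, of the Lambda-driven serverless pattern described above: an S3 object event triggers a Glue job whose curated output is cataloged and queryable from Athena. The job name and argument keys are hypothetical.

import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Triggered by S3 ObjectCreated events; starts a Glue job per new file."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # The Glue job (a placeholder name) writes curated output and updates
        # the Glue Catalog so the results are queryable from Athena.
        glue.start_job_run(
            JobName="claims-curation-job",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )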
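A minimal PySpark sketch of the DB2-to-S3 copy mentioned above; the JDBC URL, credentials, table, and paths are hypothetical, and the IBM DB2 JDBC driver jar must be on the Spark classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

# Read a DB2 table over JDBC (connection details are placeholders).
claims = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/CLAIMSDB")
    .option("dbtable", "SCHEMA.CLAIMS_835")
    .option("user", "etl_user")
    .option("password", "****")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .load()
)

# Land the data in S3 as Parquet, partitioned for downstream Athena queries.
(claims.write.mode("append")
    .partitionBy("claim_year")
    .parquet("s3://example-bucket/staging/claims_835/"))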
SSIS/SQL/AZURE SQL DATA INTEGRATION ENGINEER
Gallagher Bassett Rolling Meadows, IL April 2018 - March 2022
Gathered, analyzed, and documented the business requirements and business rules by working directly with end users/clients
Created and maintained documents like Documentation Roadmap, DATA Models, ETL Execution Plan, Configuration Management Procedures, and Project Plan
Designed and developed various SSIS packages (ETL) to extract and transform data and involved in scheduling and deploying SSIS Packages
Extended SSIS capabilities by creating custom SSIS components for SharePoint
Implemented various tasks and transformations for data cleansing and performance tuning of SSIS packages
Extracted and Loaded SharePoint Lists and Surveys Data into SQL Server Database with SSIS
Created complex stored procedures, triggers, functions (UDFs), indexes, tables, views, and other T-SQL code and SQL joins for SSIS packages and SSRS reports
Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB)
Configured SSIS packages with XML configuration files, environment variables, parent package variables, and SQL Server tables
Worked with SSIS Data Flow tasks, transformations, For Loop containers, Fuzzy Lookup, and Data Conversion, and configured data in slowly changing dimensions
Performed SSIS development and support; developed ETL solutions for integrating data from multiple sources like flat files (delimited, fixed width), Excel, SQL Server, raw files, and DB2 into the central OLTP database
Worked on creating dependencies between activities in Azure Data Factory
Propagated XML data through a policy engine to the master and data lineage databases
Created stored procedures and scheduled them in the Azure environment
Monitored produced and consumed datasets in ADF
Created data factories in Azure Data Factory
Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database
Created multiple datasets in Azure; worked on migration of SQL Server 2008/2012/2014 to SQL Server 2017
Moved CSV files from Azure Blob Storage to Azure SQL Server (see the sketch below)
Analyzed existing databases, tables, and other objects to prepare to migrate to Azure Synapse
Worked on on-prem data warehouse migration to Azure Synapse using PolyBase and ADF
Migrated on-prem ETLs from MS SQL Server to Azure using Azure Data Factory and Databricks
Involved in configuring Azure platform for data pipelines, ADF, Azure Blob Storage and Data Lakes and building workflows to automate data flow using ADF
Developed a metadata management program with business and technical definitions, data lineage, life cycle, and identification of authoritative sources
Analyzed data from different sources using the Hadoop big data ecosystem, implementing Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop
Forecast data space requirements using capacity planning, mapped source-to-target data using the data lineage feature, and calculated impact analysis
Created Azure data factories for loading data into Azure SQL Database from the Cosmos platform
Developed and documented data modeling standards to establish across the organization
Created and managed backend process to meet diverse business needs
Performed data loading, data import/export, data transmission
Involved in data modeling of star schemas according to business requirements using Erwin 6.3.8
Worked extensively on system analysis, design, development, testing and implementation of projects
Assisted with load testing activities by setting up database counters to monitor backend performance
Pulled data into MS SQL Server from other DBMSs such as Oracle and Teradata using ETL packages
Created and scheduled various SQL jobs using SQL Server Agent to perform administrative tasks
Environment: ETL, MS SQL Server 2017/2016/2012/2008, PDW, MS SQL Server Master Data Services, SQL Server Integration Services (SSIS), MS Visio, Azure Data Factory, Azure Databricks, MS Visual Studio/.NET 2008, TFS, VSTS, Azure DevOps, SharePoint.
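A minimal PySpark (Databricks) sketch of the Blob-to-Azure-SQL copy referenced above; the storage account, container, server, database, and credential values are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("blob-csv-to-azuresql").getOrCreate()

# Authenticate to the storage account (key value is a placeholder).
spark.conf.set(
    "fs.azure.account.key.examplestore.blob.core.windows.net", "<account-key>")

# Read the CSV files landed in Blob Storage.
claims = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv("wasbs://landing@examplestore.blob.core.windows.net/claims/")
)

# Write into Azure SQL over JDBC (server/database/user are placeholders).
(claims.write.format("jdbc")
    .option("url", "jdbc:sqlserver://example-srv.database.windows.net:1433;"
                   "database=ClaimsDW;encrypt=true")
    .option("dbtable", "dbo.Claims_Staging")
    .option("user", "etl_user")
    .option("password", "****")
    .mode("append")
    .save())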
SQL DATABASE DEVELOPER
Techno Vision Solutions Farmington Hills, MI January 2017 - April 2018
Created Complex ETL Packages using SSIS for extracting, cleaning, transforming, and loading data from staging tables to partitioned tables with incremental load
Developed, deployed, and monitored SSIS Packages
Automated data processes for information input using SSIS
Developed SSIS packages to extract, transform and load data from Oracle and SQL Server databases into Data Warehouse
Deployed SSIS packages into various Environments (Dev, UAT and Prod) using Deployment Utility
Implemented data validation using T-SQL queries to confirm that the data returned by SSRS reports was correct
Designed Access databases for reporting and claims data analysis
Created tables and wrote updates and inserts to manipulate the data
Created packages using various control flow tasks such as Data Flow Task, Execute SQL Task, Execute Package Task, and File System Task
Designed and implemented multiple dashboards for internal metrics using Azure Synapse - PowerPivot & Power Query tools
Involved in creation of Data Warehouse database Physical Model, Logical Model using Erwin data modeling tool
Designed the Data warehouse and done the mappings from Source to the Target tables
Created Indexes for faster data retrieval from the database
Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting (see the sketch below)
Wrote stored procedures and functions for better performance and flexibility
Created SQL Server jobs and scheduled them to load data periodically using SQL Server Agent
Transformed data from DB2 to SQL Server
Developed Aggregations, partitions, and calculated members for cube as per business requirements
Defined appropriate measure groups and KPIs and deployed cubes
Created parameterized, cross-tab, drill-down, and summary reports using SSRS, and created report snapshots to improve SSRS performance.
Environment: ETL, MS SQL Server 2012/2008, MS SQL Server Master Data Services 2012, SQL Server Integration Services (SSIS), SSRS, MS Visual Studio/.NET 2008, BI Extractor, Microsoft Excel, Add-In Excel with Master data.
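A minimal Databricks PySpark sketch of the transform-then-load-to-Synapse flow described above, using the Databricks Synapse connector; all paths, server names, table names, and credentials are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-to-synapse").getOrCreate()

# Transform and aggregate the source data before publishing.
claims = spark.read.parquet("/mnt/landing/claims/")
monthly = (
    claims.groupBy("member_id", F.trunc("service_date", "month").alias("month"))
    .agg(F.sum("paid_amount").alias("total_paid"),
         F.count("*").alias("claim_count"))
)

# Load into Azure Synapse via the Databricks connector; it stages data in
# Blob Storage and loads it with PolyBase/COPY under the hood.
(monthly.write.format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://example-ws.database.windows.net:1433;"
                   "database=ReportingDW;encrypt=true;user=etl_user;password=****")
    .option("tempDir", "wasbs://tmp@examplestore.blob.core.windows.net/synapse")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.MonthlyClaims")
    .mode("overwrite")
    .save())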