Krishnaveni Yennawar
E-mail: ***********.***@*****.*** Visa: H4 EAD Location: Charlotte, NC Cell: 980-***-****
LinkedIn: www.linkedin.com/in/krishnaveni-bkd
Azure Data Engineer
EXECUTIVE SUMMARY
11+ years of experience as an Azure Data Engineer, with extensive experience designing, developing, and implementing Big Data applications using Microsoft Azure Cloud, AWS, and big data technologies such as Apache Hive, Apache Spark, PySpark, and Spark SQL.
Experienced in building ETL pipelines in Azure Synapse Analytics and Data Factory, taking data from Azure Data Lake and databases and loading it into star/snowflake schemas in Azure Synapse Dedicated & Serverless SQL Pools and Delta Lake.
Independently led projects through design, implementation, automation, and maintenance of large-scale enterprise ETL processes for a global client base.
Experience working with Azure services such as Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, and SQL Data Warehouse, and GCP services such as BigQuery, Dataproc, and Pub/Sub.
Involved end to end in all projects handled during my tenure, including the Agile process, requirements gathering, dimensional model design, data warehousing, database modelling, documentation, package and BI report development, implementation, acceptance testing, and production support.
Experience creating Spark applications on Databricks using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Technical expertise in Microsoft technologies such as .NET, ASP.NET, ASP.NET MVC 3.0/4.5, Web Services, Web API, LINQ, SOAP, XML, HTML, JavaScript, Visual Studio, AJAX, and ADO.NET.
Proficient in utilizing data engineering services such as Azure Blob Storage, Azure Functions, Azure Data Factory, Apache Airflow, Key Vault, Apache Kafka, Azure Synapse Analytics, Snowflake, and Azure Databricks.
Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
Experience working in SAS Grid environments (High Performance Analytics Suite) and SAS Unix for automation.
Worked extensively on data warehouse projects using Azure Data Factory (ADF) and SQL Server Integration Services (SSIS), involving extraction, transformation, and loading of data between source and target across multiple servers and heterogeneous sources.
Expert in creating SSIS packages to extract and load data applying various transformations, handling errors using event handlers, deploying packages, maintaining error log audit information in SQL tables, creating SQL jobs to schedule SSIS packages, and adding notifications for job failures.
Very good experience with data warehouse services such as Azure Synapse Analytics, Azure SQL Database, Snowflake, and Teradata. Strong experience migrating other databases to Snowflake.
Experience in data mining, including predictive behavior analysis, Optimization and Customer Segmentation analysis using SAS and SQL. Experience in using various version control systems like Git, GitHub.
Skilled with Big Data - Hadoop (MapReduce & Hive), Spark (SQL, Streaming), Azure Cosmos DB, SQL Data warehouse, Azure DMS, Azure Data Factory, AWS Redshift, Athena, Lambda, Step Function and SQL.
Expertise in developing reports using Power BI, SQL Server Reporting Services (SSRS), and MS SQL Server with stored procedures, expressions (including nested functions), matrix and tabular layouts, groups, charts, and other report items.
Skilled in Terraform in managing resource scheduling, disposable environments and multitier application. Extensive work experience in ETL processes consisting of data sourcing, data transformation.
Deploy and manage security permissions for SSRS reports on Report Server, use data-driven subscriptions to schedule SSRS reports over e-mail, invoke deployed SSRS reports from .NET applications using Report Viewer, and work with parameterized, cascading-parameter, sub-report, drill-down, and drill-through SSRS reports.
Proficient in writing complex SQL and T-SQL queries and creating SQL objects such as tables, views, indexes, stored procedures, triggers, and functions on Azure SQL Server 2016 and Oracle databases.
Experienced in custom software development using Python, SQL, and Microsoft technologies such as C#.NET. Experience in data cleansing with MDM and data profiling.
Strong experience with data warehousing concepts and SSAS: dimensions, attributes, hierarchies, cubes, DAX queries, KPIs, calculated members, and partitions in SQL Server Analysis Services (SSAS).
A proactive approach, strong analysis and troubleshooting skills, good presentation skills, and team collaboration have earned client appreciation and rewards. Able to quickly adapt to new applications and platforms.
Strong project management skills; keen to provide training and assistance to junior developers.
Technical Skills
Cloud Data Engineering: Azure Synapse Analytics, Data Factory, Data Lake, Delta Lake, Azure Databricks, PySpark notebooks
Business Intelligence Tools (ETL, Reporting): Power BI, MS BI (SSIS, SSRS, SSAS), Tableau
Programming: PySpark, Python, T-SQL, SQL, MDX, DAX
Databases: Azure SQL, Dedicated SQL Pool, Serverless SQL Pools, Teradata, MS SQL Server, Oracle, NoSQL
Technology & Tools: SSMS, BIDS, SSDT, ADO.NET
Operating Systems: Microsoft Windows 10, 11
Professional Experience
Senior Azure Data Engineer
Regions Bank, USA Nov 2021 to Present
Project description:
This DW/BI Reporting project helps asset managers and management better implement business strategies and plan and drive corporate performance.
Responsibilities:
Designed the dimensional model and built pipelines in Azure Data Factory to move big data/large datasets from on-premises sources into the Azure SQL data warehouse model or Azure Data Lake/Blob Storage.
Developed PySpark scripts in Databricks for data cleaning and analysis and performed schema validation and data profiling (see the PySpark sketch at the end of this section).
Built Databricks notebooks to transform data, load it as Parquet into Azure Data Lake, and read it back.
Involved in the creation and review of functional requirement specifications and supporting documents for business systems; experienced in the database design and data modeling processes.
Designed and implemented robust ETL pipelines leveraging Delta Lake on Databricks, ensuring high-performance data processing and enabling ACID transactions for reliable data consistency.
Utilized Delta Lake’s schema evolution and data versioning features to handle dynamic data structures seamlessly, supporting both batch and streaming data ingestion for scalable solutions.
Utilize Azure Synapse Analytics to streamline ETL workflows for ingesting and transforming banking data from multiple sources (e.g., transactional systems, customer profiles, and external feeds). Leverage Spark and T-SQL pools to ensure efficient data processing.
Responsible for bridging technology and business and for driving the design strategy for data sourcing.
Worked on multiple data marts in an Enterprise Data Warehouse (EDW) project and was involved in designing OLAP data models. Created packages as per business needs to pull data from MDM.
Implement partitioning, materialized views, and workload management strategies to enhance query performance for large-scale banking datasets. Optimize Synapse Dedicated SQL pools and Serverless pools to balance cost and performance, ensuring real-time analytics for client behavior analysis and risk assessment.
Design and maintain a scalable Synapse-based data warehouse with proper data governance, RBAC, and data masking to comply with financial regulations (e.g., GDPR, PCI DSS). Integrate with Azure Purview and Defender for Cloud to enhance security and lineage tracking for sensitive banking data.
Implemented Delta Lake optimizations such as data compaction, indexing, and Z-Order clustering to enhance query performance and reduce latency for analytical workloads (see the Delta Lake sketch at the end of this section).
Experience connecting the SAS interface to Hadoop and querying results into SAS BI client tools.
Performed end-to-end development of Actimize models for the bank's trading compliance solutions.
Developed data processing pipelines in Azure Data Factory such as reading data from external sources, merging the obtained data, performing data enrichment, and loading into data warehouses.
Using Power BI, developed various types of reports such as list reports, dashboard reports, charts and sub-reports, drill-down reports, and linked reports.
Worked in Teradata developing ETL with complex, tuned queries, including analytical functions and BTEQ scripts. Used the GitHub version control tool to coordinate team development.
Worked on data analysis and fixing the data for over 35 key indicator fields in the MDM.
Developed an optimized and efficient code base for the User Interface servicing fixed income traders for a leading asset management system.
Experienced in data access layer development using Entity Framework, Entity Framework Core, and ADO.NET.
Used the Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
Led the migration of legacy banking data systems to Azure, leveraging Data Factory for data ingestion and Azure SQL Database for centralized data management.
Used various techniques from SAS/Base and SAS/Advanced for descriptive analytics and SAS/STAT for statistical analysis. Used PowerShell for DevOps on Windows-based systems.
Responsible for code standardization, optimization, and performance techniques that improve the performance of ETL Data Factory pipelines and Synapse pipelines.
Worked on different levels of ETL loads, extracting data from source systems to staging and then loading from staging into production database tables.
Migrated the on-premises Teradata database and its objects to Azure Synapse Serverless and Dedicated SQL pools and Azure SQL databases, and migrated SSIS and legacy ETL packages to Azure Synapse and Data Factory in the cloud.
Performed defect analysis and fixes per requirements, along with testing and documentation of ETL changes for each cloud migration release. Created analysis and test scripts to verify data on the newly migrated database.
Created and monitored SQL and Azure ETL jobs in SQL Server Agent and resolved issues in failed jobs and Report Manager reports. Implemented MapReduce jobs using the Java API and Pig Latin as well as HiveQL.
Automated all the jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System on Cloudera.
Created and delivered detailed technical presentations in PowerPoint to communicate complex data engineering concepts, project progress, and results to cross-functional teams and senior management.
Utilized advanced features such as custom animations, interactive elements, and data-driven charts to present intricate data models and analytics outcomes effectively.
Developed self-service Power BI reports and used report caching, snapshots, and processing options in Report Manager for report performance. Created reports using SQL, MS Excel, UNIX, and shell scripting.
Implemented the Actimize Anti-Money Laundering (AML) system to monitor suspicious transactions and enhance regulatory compliance. Created subscriptions and data-driven subscriptions for report scheduling.
Experience with ADO.NET controls such as GridView, DataList, FormView, and Repeater.
Performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake. Helped individual teams set up their repositories in Bitbucket, maintain their code, and set up jobs that make use of the CI/CD environment. Wrote UDFs in Scala and PySpark to meet specific business requirements. Analyzed large structured data sets using Hive queries.
Performed extensive data validations using shell scripting and data-integrity checks before delivering data to operations.
Extracted data from SQL Server, APIs, unstructured sources, JSON, XML, txt, and csv flat files, staged it into the staging area, applied business logic to load it into the analytics SQL Server database, and then processed it into the SSAS cube.
Designed the compliance data warehouse using PowerDesigner (conceptual, logical, and physical models) based on subject areas: Equity, Fixed Income, Listed Derivatives, Index, Equity Transactions, Cost Centre, HR Employee data, Asset Management, and FIX protocol. Defined virtual warehouse sizing for Snowflake for different types of workloads.
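The PySpark data-cleaning and schema-validation work noted above can be illustrated with a minimal, Databricks-style sketch; the ADLS path and column names are placeholders, not the project's actual datasets:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("profile-transactions").getOrCreate()

    # Illustrative ADLS path and column names only.
    df = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/transactions/")

    # Schema validation: fail fast if the extract drifts from the expected layout.
    expected_cols = {"account_id", "txn_date", "amount"}
    missing = expected_cols - set(df.columns)
    if missing:
        raise ValueError(f"Source extract is missing expected columns: {missing}")

    # Basic profiling: row count plus null and distinct counts per column.
    profile = df.select(
        F.count(F.lit(1)).alias("row_count"),
        *[F.sum(F.col(c).isNull().cast("int")).alias(f"{c}_nulls") for c in df.columns],
        *[F.countDistinct(c).alias(f"{c}_distinct") for c in df.columns],
    )
    profile.show(truncate=False)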
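Likewise, a minimal sketch of the Delta Lake compaction and Z-Order clustering mentioned above, assuming a Databricks notebook where spark is already available; the table and column names are placeholders:

    from delta.tables import DeltaTable

    # Compact small files and co-locate rows on a frequently filtered column.
    # OPTIMIZE ... ZORDER BY is Databricks SQL; table/column names are placeholders.
    spark.sql("OPTIMIZE silver.transactions ZORDER BY (account_id)")

    # Drop data files no longer referenced by the Delta log (retention given in hours).
    DeltaTable.forName(spark, "silver.transactions").vacuum(168)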
Sr. Business Intelligence/Data Engineer
Yash Technologies (Smith & Nephew, USA) May 2019 to Oct 2021
Project description:
Smith & Nephew is a leading national provider of technology-enabled healthcare services designed to help physicians and hospitals better engage patients throughout the entire healthcare continuum. The BI suite of solutions includes a range of patient access & communications, revenue cycle management, and consulting & analytics services, including billing, coding, patient balances, eligibility & enrollment, third-party liability, and mobile-first engagement and communication software for patients and providers.
Responsibilities:
Developed dynamic Synapse and Azure Data Factory pipelines using parameters and triggered them as desired via events such as file availability on Blob Storage, on a schedule, and through Logic Apps (see the pipeline-run sketch at the end of this section).
Utilized Azure's ETL service, Azure Data Factory (ADF), to ingest data from disparate legacy data stores – SAP (HANA), SFTP servers, and Hadoop HDFS – into Azure Data Lake Storage (Gen2).
Converted SAS programs to Python for efficiency, leveraging Python packages.
Developed SSIS packages using control flow tasks such as For Loop Container, Foreach Loop Container, Sequence Container, Execute Process Task, Execute Package Task, File System Task, and Execute SQL Task, and data flow transformations such as Derived Column, Conditional Split, Multicast, and Merge.
Created end-to-end Spark applications using Scala to perform data cleansing, validation, and transformation according to requirements. Defined virtual warehouse sizing for Snowflake for different types of workloads.
Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server 2012 and Flat files.
Unit tested the data between Redshift and Snowflake.
Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape. Created reports in Looker based on Snowflake connections. Used GitHub and Jenkins for automated code builds and deployments.
Worked on batch performance for data processing using Perl, shell scripting, and UNIX.
Led the data analytics team for the infrastructure analytics program. Responsible for data delivery, from data sourcing onward.
Wrote templates for Azure infrastructure as code using Terraform to build staging and production environments.
Created and maintained various DevOps-related tools for the team, such as provisioning scripts.
Worked in Transformation and Loading of data from multiple sources such as Oracle, MS SQL Server, Legacy systems, Flat Files and XML Files into EDW (Enterprise Data Warehouse) systems.
Using Power BI, developed various types of reports such as tabular reports, dashboard reports, charts and sub-reports, drill-down, drill-through, and linked reports.
Developed PySpark applications in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats, transforming the data to uncover business insights (see the PySpark sketch at the end of this section).
Worked on different levels of ETL loads, extracting data from source systems to staging and then loading from staging into production database tables.
Developed comprehensive data models and reports in Excel, utilizing advanced functions, PivotTables, and Power Query to support business decision-making.
Used report caching, snapshots, and processing options in Report Manager for report performance.
Extracted data from SQL Server and from txt and csv flat files, staged it into the staging area, applied business logic to load it into the analytics SQL Server database, and then processed it into the SSAS cube.
Created subscriptions and data-driven subscriptions for report scheduling. Drove the project from the data modeling role by building enterprise-wide conceptual data models.
Wrote complex DAX queries to extract data from the cube for Power BI report generation. Used various Teradata load utilities for data loads and UNIX shell scripting for the file validation process.
Trained and guided new team members in Database/MS BI technologies and project deliverables.
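As a rough illustration of the multi-format PySpark work referenced above (not the project's actual code; paths and column names are placeholders):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-aggregation").getOrCreate()

    # Illustrative ADLS Gen2 paths and column names only.
    csv_df = spark.read.option("header", True).csv("abfss://landing@datalake.dfs.core.windows.net/claims_csv/")
    json_df = spark.read.json("abfss://landing@datalake.dfs.core.windows.net/claims_json/")

    # Align both sources to a common column set, then union them.
    common_cols = ["claim_id", "provider_id", "claim_amount", "service_date"]
    claims = (
        csv_df.select(*common_cols)
        .unionByName(json_df.select(*common_cols))
        .withColumn("claim_amount", F.col("claim_amount").cast("double"))
        .withColumn("service_date", F.to_date("service_date"))
    )

    # Aggregate billed amounts per provider per month for downstream reporting.
    monthly = (
        claims.withColumn("service_month", F.date_trunc("month", F.col("service_date")))
        .groupBy("provider_id", "service_month")
        .agg(F.sum("claim_amount").alias("total_billed"), F.count(F.lit(1)).alias("claim_count"))
    )

    monthly.write.mode("overwrite").format("delta").save("abfss://curated@datalake.dfs.core.windows.net/claims_monthly/")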
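And a hedged sketch of starting one of the parameterized ADF pipelines mentioned above via the azure-mgmt-datafactory Python SDK; the event trigger itself is configured in ADF/Logic Apps, and all resource, pipeline, and parameter names here are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    # Subscription, resource group, factory, and pipeline names are placeholders.
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Kick off a parameterized pipeline run, e.g. pointing it at a file that just landed on Blob Storage.
    run = client.pipelines.create_run(
        resource_group_name="rg-dw",
        factory_name="adf-dw",
        pipeline_name="pl_ingest_sap_extract",
        parameters={"sourceFileName": "sap_extract_20211001.csv", "targetFolder": "raw/sap"},
    )
    print("Started pipeline run:", run.run_id)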
Data Engineer
SEI Investments, Oaks, PA May 2016 to Apr 2019
Project description:
SEI is a global financial services firm. As a Data Engineer there, I spearheaded the design and implementation of sophisticated data solutions to manage extensive volumes of both structured and unstructured data. My role encompassed a broad spectrum of responsibilities aimed at ensuring the seamless flow of data and its transformation into actionable insights.
Responsibilities:
Designed and implemented scalable data pipelines using Python and PySpark to process and transform large volumes of structured and unstructured data.
Designed and implemented data models in SQL-based databases, including Snowflake, PostgreSQL, and Oracle.
Developed serverless architectures using AWS Lambda and Step Functions for real-time data processing and event-driven workflows (see the Lambda sketch at the end of this section).
Identified and documented Functional/Non-Functional and other business related decisions for implementing Actimize-SAM to comply with AML Regulations.
Redesigned views in Snowflake to increase performance.
Developed stored procedures/views in Snowflake and used them in Talend for loading dimensions and facts.
Orchestrated automated data pipelines using Apache Airflow to schedule, monitor and manage ETL/ELT processes across various platforms.
Utilized Snowflake as a cloud-based data warehousing solution, managing large-scale data sets and performing complex data manipulations.
Implemented a distributed log processing system using Apache Spark, enabling real-time analysis and monitoring of application logs.
Collaborated with data analysts, data scientists, and business stakeholders to understand data requirements and provide optimized solutions.
Implemented ETL/ELT processes and data transformations using AWS Glue and dbt, ensuring high-quality data for downstream analytics and reporting.
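A minimal sketch of the Lambda and Step Functions pattern referenced above, assuming an S3 ObjectCreated trigger and a placeholder state machine ARN (illustrative only, not the project's code):

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Hypothetical state machine ARN; in practice this comes from configuration.
    STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:trade-file-pipeline"

    def handler(event, context):
        """Triggered by an S3 ObjectCreated event; starts a Step Functions execution
        that carries the bucket/key of the new file through the downstream ETL steps."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            sfn.start_execution(
                stateMachineArn=STATE_MACHINE_ARN,
                input=json.dumps({"bucket": bucket, "key": key}),
            )
        return {"statusCode": 200}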
Data Engineer
UHG, India Aug 2011 to Oct 2014
Project description:
UHG provides resources that combine connectivity with intelligence for major participants in the health industry, enabling them to make better business and health care decisions. This DW/BI project helps management better implement business strategies and plan and drive corporate performance.
Responsibilities:
Developed and maintained PL/SQL packages, triggers, and data migration scripts, ensuring seamless database operations and system functionality.
Designed and implemented high availability and fault-tolerant solutions, such as database replication and clustering strategies, ensuring data reliability.
Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes. Utilized various supervised and unsupervised machine learning algorithms and software to perform NLP tasks and compare performance.
Used Apache Airflow in the GCP Cloud Composer environment to build data pipelines, leveraging various Airflow operators such as the Bash operator, Hadoop operators, Python callables, and branching operators (see the DAG sketch at the end of this section).
Used R to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
Created instances in AWS and migrated data to AWS from the data center using AWS migration services, including Kinesis Firehose, AWS Snowball, and S3 Transfer Acceleration.
Worked on multiple AWS instances; set up security groups, Elastic Load Balancers, and AMIs.
Created macros to generate reports on a daily and monthly basis and to move files from Test to Production.
Handled database data operations including CRUD (Create, Read, Update, Delete), filtering, and sorting.
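A minimal Cloud Composer (Airflow 2.x) DAG sketch of the operator usage referenced above; the DAG name, script path, and branch condition are illustrative placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator, BranchPythonOperator

    def _choose_load_path(ds, **_):
        # Branch on the logical date: full reload on the 1st of the month,
        # otherwise an incremental load (illustrative condition only).
        return "full_load" if ds.endswith("-01") else "incremental_load"

    with DAG(
        dag_id="claims_etl",               # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            bash_command="bash /home/airflow/gcs/data/scripts/extract_claims.sh",  # placeholder script
        )

        branch = BranchPythonOperator(task_id="choose_load_path", python_callable=_choose_load_path)

        full_load = PythonOperator(task_id="full_load", python_callable=lambda: print("full reload"))
        incremental_load = PythonOperator(task_id="incremental_load", python_callable=lambda: print("incremental load"))

        extract >> branch >> [full_load, incremental_load]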
Education:
Bachelor's degree, Osmania University, 2008
Master's degree, Osmania University, 2011
Certifications:
Microsoft Certified: Azure Data Engineer (DP-203)