AZURE DATA ENGINEER
Name: Vinaya Unnithan Email: ******************@*****.*** Cell: +1-309-***-****
PROFESSIONAL SUMMARY
Data Engineer with 8+ years of experience in the software industry, including Azure cloud services and Big Data technologies, and 5 years of experience in data warehouse implementations, seeking a challenging Data Engineer position.
Developed ELT and ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL.
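An illustrative sketch of the Snowflake loading pattern described above. The table and stage names are hypothetical placeholders, and the connector call is shown only as a comment because it requires live credentials:

```python
def build_copy_into(table: str, stage: str, file_format: str = "CSV") -> str:
    """Compose a Snowflake COPY INTO statement for loading staged files.

    Table and stage names here are illustrative, not from a real project.
    """
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (TYPE = {file_format} SKIP_HEADER = 1) "
        f"ON_ERROR = 'CONTINUE'"
    )

# In a real pipeline this statement would be executed with
# snowflake-connector-python, roughly:
#   import snowflake.connector
#   with snowflake.connector.connect(**conn_params) as conn:
#       conn.cursor().execute(build_copy_into("SALES_RAW", "sales_stage"))
print(build_copy_into("SALES_RAW", "sales_stage"))
```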
Developed ETL/ELT pipelines using Azure Data Factory (ADF) to orchestrate data integration from various sources, including MongoDB, Azure SQL, Snowflake, and on-premises databases.
Hands-on experience with data processing and transformation using PySpark in Azure Databricks.
Expertise in Azure DevOps, CI/CD pipelines, version control tools (GitHub, GitLab, Azure DevOps), and ARM templates for automated deployments.
Experience in designing and optimizing data lake solutions using Azure Data Lake Storage (ADLS) with effective partitioning and indexing.
Implemented real-time data ingestion and integration with MongoDB for event-driven architectures using Azure Event Hub.
Developed CI/CD pipelines using GitHub Actions and Azure DevOps to automate deployments, reducing manual interventions by 30%.
Worked extensively with Azure DevOps and GitHub Actions to implement CI/CD pipelines for both .NET applications and data pipelines.
Experience in building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse.
Expertise in developing Platform as a Service (PaaS) applications within the Microsoft Azure cloud environment.
Designed and implemented cloud-native solutions, leveraging Azure services to maximize scalability, reliability, and cost-efficiency.
Demonstrated proficiency in Azure services such as Azure App Service, Azure SQL Database, and Azure Functions to build fully managed applications.
Successfully delivered PaaS applications that align with industry best practices and Azure architectural guidelines.
Developed ETL transformations and validations using Spark SQL and Spark DataFrames with Azure Databricks and Azure Data Factory.
Experience with Apache Kafka, Apache NiFi, and Spark integration for real-time data processing, and with Azure Data Factory for creating data pipelines that orchestrate data into SQL databases.
Hands-on experience implementing data pipeline solutions using Hadoop, Azure, ADF, Synapse, Spark, MapReduce, Hive, Tez, Python, Scala, Azure Functions, Azure Logic Apps, StreamSets, Azure Data Lake Storage Gen2, and Snowflake.
Strong expertise in optimizing Spark jobs and leveraging Azure Synapse Analytics for big data processing and analytics. Proven track record in performance optimization and capacity planning to ensure scalability and efficiency.
Experienced in developing CI/CD frameworks for data pipelines and collaborating with DevOps teams for automated pipeline deployment. Proficient in scripting languages such as Python and Scala.
Skilled in working with Hive, Spark SQL, Kafka, and Spark Streaming for ETL tasks and real-time data processing.
Implemented advanced data manipulation and analysis techniques using window functions, resulting in optimized and efficient data processing and analytics pipelines.
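A minimal, runnable sketch of the window-function pattern described above, shown on an in-memory SQLite database (the same SQL applies in Spark SQL or Snowflake); the table and column names are hypothetical:

```python
import sqlite3

# Build a tiny in-memory table to demonstrate a window function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 300), ('west', 200), ('west', 50);
""")

# Rank each sale within its region by amount, highest first.
rows = conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
""").fetchall()

for region, amount, rnk in rows:
    print(region, amount, rnk)
```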
Using the Informatica Salesforce plugin, facilitated the seamless transfer of data from an Oracle Exadata database to the Salesforce cloud environment.
TECHNICAL SKILLS
Azure Services
Azure Data Factory, Azure Databricks, Azure Data Explorer, Snowflake, Logic Apps, Function Apps, Azure DevOps, Delta Lake, Data Warehouse
Big Data Technologies
MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, ZooKeeper
Hadoop Distribution
Cloudera, Hortonworks
Languages
SQL, PL/SQL, MongoDB Query Language, Python, .NET, HiveQL, Scala
Web Technologies
HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
Operating Systems
Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Build Automation tools
Ant, Maven
Version Control
GIT, GitHub.
IDE & Build Tools
Eclipse, Visual Studio.
Databases
MongoDB, MS SQL Server 2012/2014/2016, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB
EDUCATION
Bachelor's in Computer Science and Engineering from Avanthi Institute of Engineering and Technology, Hyderabad, India.
WORK EXPERIENCE
United Health Group, Chicago, IL
Nov 2023 – Present
Data Engineer
Responsibilities:
Developed and orchestrated complex ETL pipelines using Azure Data Factory (ADF) to automate data integration workflows across various sources like ADLS, Snowflake, and on-prem databases.
Implemented scalable data engineering solutions on Databricks for data transformation, cleansing, and analytics using PySpark, optimizing performance for large datasets.
Integrated Delta Lake with Databricks to manage big data storage and enforce data versioning, ensuring consistency and reliability for analytics and machine learning workloads.
Designed and maintained data lakes on Azure Data Lake Storage (ADLS) with optimized partitioning strategies for improved query performance.
Utilized GitHub for version control and collaboration, ensuring code quality and continuous integration/continuous deployment (CI/CD) through automated pipelines.
Designed and implemented Collibra Data Catalog, enabling efficient metadata management and data asset discovery.
Developed ETL pipelines using SSIS, ADF, AWS Glue, and Python to extract, transform, and load data from diverse sources into Snowflake, Azure Data Lake, and AWS Redshift.
Implemented CI/CD pipelines using GitHub Actions and AWS CodePipeline to automate data pipeline deployments, reducing manual errors by 30%.
Developed ETL pipelines to extract, transform, and load data from Oracle ERP into enterprise data warehouses.
Integrated MongoDB with Azure Data Factory (ADF) to automate data ingestion workflows from multiple sources.
Developed ETL transformations using Spark-SQL and Spark Data Frames in Databricks, ensuring efficient data movement between MongoDB, Azure SQL, and Snowflake.
Optimized MongoDB performance by implementing indexing strategies and partitioning techniques.
Developed API integrations using .NET and Azure Functions to expose real-time data insights stored in MongoDB.
Created data governance workflows and business glossaries to enhance data quality and compliance.
Experience in data cataloging, data stewardship, and governance frameworks to ensure data quality and compliance.
Strong understanding of data governance principles related to Enterprise Resource Planning (ERP) platforms.
Experience working with business glossaries, data dictionaries, and governance workflows for structured data governance.
Environment: Azure Databricks, SSIS, Data Factory, Azure Data Explorer, Delta Lake, Function Apps, Snowflake, MongoDB, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, BigQuery, Data Warehouse, data integration, data modeling, data pipelines, production support, Shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipelines, ADLS.
Johnson & Johnson, Jersey City, NJ
May 2022 – Sep 2023
Senior Azure Data Engineer
Responsibilities:
Integrated on-premises MySQL and Cassandra data with cloud-based Blob Storage and Azure SQL DB data using Azure Data Factory, applying transformations and loading the data into Snowflake. Created ETL transformations and validations using Spark SQL and Spark DataFrames with Azure Databricks and Azure Data Factory.
Collaborated with Azure Logic Apps administrators to monitor and resolve issues related to process automation and data processing pipelines.
Optimized code for Azure Functions to extract, transform, and load data from diverse sources, including databases, APIs, and file systems.
Orchestrated seamless data movement into SQL databases using Data Factory's data pipelines.
Developed data warehousing techniques, including data cleansing, Slowly Changing Dimension (SCD) handling, surrogate key assignment, and change data capture, for Snowflake modeling.
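A plain-Python sketch of the SCD Type 2 logic referenced above (in production this ran via ADF and Snowflake; the field names here are hypothetical):

```python
from datetime import date

def apply_scd2(dimension, changes, today):
    """Apply Slowly Changing Dimension Type 2 logic.

    dimension: list of dicts with keys id, attr, valid_from, valid_to, current
    changes:   list of dicts with keys id, attr (latest source values)
    Changed ids get a new current row; the old row is kept as history.
    """
    by_id = {row["id"]: row for row in dimension if row["current"]}
    out = list(dimension)
    for change in changes:
        current = by_id.get(change["id"])
        if current is None:  # brand-new entity: insert first version
            out.append({"id": change["id"], "attr": change["attr"],
                        "valid_from": today, "valid_to": None, "current": True})
        elif current["attr"] != change["attr"]:
            current["valid_to"] = today      # close the old version
            current["current"] = False
            out.append({"id": change["id"], "attr": change["attr"],
                        "valid_from": today, "valid_to": None, "current": True})
    return out

dim = [{"id": 1, "attr": "gold", "valid_from": date(2020, 1, 1),
        "valid_to": None, "current": True}]
dim = apply_scd2(dim, [{"id": 1, "attr": "platinum"}], date(2021, 6, 1))
print(len(dim))  # 2 rows: one expired, one current
```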
Designed and implemented MongoDB-based data models for scalable data storage solutions.
Developed data pipelines in Azure Data Factory (ADF) to extract, transform, and load data from MongoDB, Azure SQL, and Snowflake.
Utilized MongoDB Atlas for cloud-based deployment and performance tuning.
Developed and optimized data ingestion processes using Azure Databricks and PySpark.
Integrated MongoDB with Apache Kafka for real-time data streaming and analytics.
Created CI/CD pipelines for .NET applications using Azure DevOps, automating deployment and testing processes.
Environment: Azure Databricks, .NET, Data Factory, MongoDB, Azure Data Explorer, Delta Lake, Logic Apps, Function Apps, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, Data Warehouse, data integration, data modeling, data pipelines, production support, Shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipelines, Power BI.
PNC Financial Services, Pittsburgh, PA
Feb 2021 – Apr 2022
Senior Azure Data Engineer
Responsibilities:
Developed a data pipeline using Kafka.
Enhanced Spark performance by optimizing data processing algorithms, leveraging techniques such as partitioning, caching, and broadcast variables.
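A pure-Python illustration of the broadcast-join idea behind the Spark optimization mentioned above: the small dimension table is copied (broadcast) to each worker as a hash map, so the large fact side never needs to be shuffled. In real Spark code this corresponds to wrapping the small DataFrame in `pyspark.sql.functions.broadcast`; the data here is hypothetical:

```python
def broadcast_join(partitions, small_table):
    """Join each partition of a large fact dataset against a small
    dimension table that has been 'broadcast' (copied) to every worker.

    partitions:  list of partitions, each a list of (key, value) tuples
    small_table: list of (key, dim_value) tuples
    """
    lookup = dict(small_table)  # one copy per worker in real Spark
    joined = []
    for part in partitions:     # each partition is joined locally, no shuffle
        for key, value in part:
            if key in lookup:
                joined.append((key, value, lookup[key]))
    return joined

facts = [[(1, 10), (2, 20)], [(2, 5), (3, 7)]]   # two "partitions"
dims = [(1, "east"), (2, "west")]
print(broadcast_join(facts, dims))
```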
Implemented efficient data integration solutions to seamlessly ingest and integrate data from diverse sources, including databases, APIs, and file systems, using tools like Apache Kafka, Apache NiFi, and Azure Data Factory.
Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processed the data in Azure Databricks.
Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.
Experience in cost optimization strategies in GCP, including reserved slots, query tuning, and storage lifecycle policies. Skilled at integrating on-premises SQL Server databases with Azure services such as Azure SQL Database and Azure Cosmos DB.
Performed ETL using Azure Databricks and migrated an on-premises Oracle ETL process to Azure Synapse Analytics.
Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse.
Utilized .NET best practices to optimize application performance, implementing efficient algorithms and coding patterns to achieve high-speed data processing and responsiveness.
Proficient in utilizing Azure Data Explorer (ADX) for advanced querying and analytics on large-scale datasets, enabling efficient data exploration and insights generation.
Designed and implemented robust data models and schemas to support efficient data storage, retrieval, and analysis using technologies like Apache Hive, Apache Parquet, or Snowflake.
Worked on RDDs and DataFrames (Spark SQL) using PySpark for analyzing and processing data.
Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
Environment: Azure Databricks, BigQuery, .NET, Data Factory, Azure Data Explorer, Data Warehouse, Delta Lake, Logic Apps, Function Apps, Snowflake, MongoDB, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, data integration, data modeling, data pipelines, production support, Shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipelines, Power BI.
Humana, Charlotte, NC
Dec 2019 – Jan 2021
Data Engineer
Responsibilities:
Designed and set up an Enterprise Data Lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
Responsible for maintaining quality reference data in source systems by performing operations such as cleansing and transformation, and ensuring integrity in a relational environment by working closely with stakeholders and the solution architect.
Worked on creating tabular models on Azure analytic services for meeting business reporting requirements.
Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processed the data in Azure Databricks as part of cloud migration.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
Capable of integrating APIs with various Azure services and data sources, ensuring seamless data exchange.
Experienced in defining custom policies in Azure API Management to implement advanced API behavior, such as request/response transformation.
Designed and set up an enterprise data lake while developing .NET-based applications for data ingestion and transformation.
Developed .NET applications to monitor data pipelines, providing real-time error tracking and resolution.
Integrated .NET applications with Azure Data Explorer for real-time log analysis, monitoring, and data processing.
Developed a reusable framework for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects.
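A minimal, runnable stand-in for the extract step of such a framework, using sqlite3 in place of the source RDBMS (the real framework used Spark's JDBC reader; table and column names are hypothetical):

```python
import csv
import io
import sqlite3

def extract_table(conn, table, out_stream):
    """Generic extract step of an RDBMS-to-data-lake framework:
    dump any table to delimited text with a header row."""
    cur = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out_stream)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)                                 # data rows

# Demo against an in-memory database standing in for the source RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE patients (id INTEGER, name TEXT);"
    "INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bo');"
)
buf = io.StringIO()
extract_table(conn, "patients", buf)
print(buf.getvalue())
```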
Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS packages).
Environment: Azure, Azure Data Factory, MongoDB, Delta Lake, Azure Data Explorer, Data Warehouse, Databricks, PySpark, Python, Apache Spark, HBase, Hive, Sqoop, Snowflake, SSRS, Tableau.
eClinicalWorks India - Mumbai, Maharashtra
Aug 2016 – Sep 2019
Data Engineer
Responsibilities:
Designed and developed applications on the data lake to transform data according to business users' requirements for analytics.
In-depth knowledge of Hadoop architecture and its components, including HDFS, Resource Manager, Node Manager, Application Master, NameNode, DataNode, and MapReduce concepts.
Capable of implementing security measures and compliance standards for on-premises databases hosted in Azure.
Experienced in monitoring the health and performance of on-premises databases in Azure and performing routine maintenance tasks.
Built Azure Data Factory (ADF) pipelines to move data between MongoDB, Azure Data Lake, and Snowflake.
Developed NoSQL-based applications leveraging MongoDB for flexible data storage.
Used MongoDB’s aggregation framework to perform complex queries and transformations.
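An illustrative MongoDB aggregation pipeline of the kind described above, expressed as plain Python data; the collection and field names are hypothetical, and in production it would run via pymongo's `collection.aggregate(pipeline)`:

```python
# Group closed encounter documents by department, sum their charges,
# and sort departments by total charges, highest first.
pipeline = [
    {"$match": {"status": "closed"}},
    {"$group": {"_id": "$department", "total": {"$sum": "$charges"}}},
    {"$sort": {"total": -1}},
]

# With pymongo this would be executed as:
#   results = db.encounters.aggregate(pipeline)
print([next(iter(stage)) for stage in pipeline])  # stage operators, in order
```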
Optimized MongoDB queries for high-performance analytics and reporting.
Integrated MongoDB with Power BI for interactive dashboards and business intelligence solutions.
Utilized Azure services for real-time data processing and analysis, enabling timely insights and informed decision-making.
Implemented Azure security and compliance measures, ensuring data privacy and governance in cloud-based environments.
Collaborated on Azure-based projects, optimizing resource allocation and cost management for efficient cloud operations.
Environment: Azure, Cloudera CDH 3/4, Azure Databricks, Data Warehouse, Hadoop, Python, HDFS, MapReduce, Hive, Oozie, Pig, Shell scripting, MySQL.
Sandhata Technologies, Pune
Jun 2014- Jul 2016
Data Engineer
Responsibilities:
Create and maintain a database for Server Inventory, Performance Inventory.
Worked in Agile Scrum methodology with daily stand-up meetings; used Visual SourceSafe with Visual Studio 2010 for source control and Trello for project tracking.
Skilled at configuring backup and restore strategies for on-premises databases using Azure Backup services.
Knowledgeable about optimizing costs associated with running on-premises databases in Azure, including resource scaling and utilization. Proficient in Microsoft Azure cloud services, adept at designing and deploying data solutions using Azure's wide array of services.
Utilized Databricks notebooks to explore and visualize data, facilitating exploratory data analysis and model development.
Involved in creating SSIS jobs to automate report generation and cube-refresh packages.
Expertise in deploying SSIS packages to production, using package configurations to externalize package properties and keep packages environment-independent.
Environment: Windows Server, Python, Azure, MS SQL Server 2014, Azure Databricks, SSIS, SSAS, SSRS, SQL Profiler, Power BI, C#, PerformancePoint Server, MS Office, SharePoint