Location: McKinney, TX

SUMEDH BANGARU

Microsoft-certified Azure Data Engineer

Phone: 940-***-**** | E-Mail: ad2m1u@r.postjobfree.com

PROFESSIONAL SUMMARY

11+ years of experience in ETL development, data warehousing, Azure with Snowflake, and scalable data ingestion pipelines. Skilled in Azure Data Factory architecture, enabling seamless integration between on-premises systems and the Azure cloud using Python, PySpark, and Microsoft Azure cloud services.

Hands-on experience working with Azure Cloud and its components, including Azure Data Factory, Azure Data Lake Storage Gen2, Azure Blob Storage, Azure Databricks, Azure Synapse Analytics, Logic Apps, Function Apps, and Azure Key Vault.

Experience in Database Design and development with Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP Cubes, Star Schema and Snowflake Schema.

Proficient in managing and configuring Azure Blob Storage, File Storage, Queue Storage, and Table Storage.

Skilled in developing robust Data Lake data ingestion pipelines, performing data extraction, transformation, and loading (ETL) processes to ensure data quality and availability.

Implemented data ingestion pipelines using Azure Synapse Analytics to efficiently extract, transform, and load (ETL) large volumes of structured and unstructured data into the data warehouse.

Collaborated with data scientists and analysts to deploy machine learning models within Azure Synapse Analytics, enabling predictive analytics and automated decision-making based on historical and real-time data.

Hands-on experience in working on real-time data processing solutions using Azure Synapse Analytics, leveraging its capabilities to handle streaming data and perform near real-time analytics on high-velocity data streams.

Proficient at using Databricks notebooks for data exploration with PySpark/Scala, scripting using Python/SQL, and deploying APIs for the analytics team.

Developed data processing workflows using Azure Databricks, leveraging Spark for distributed data processing and transformation tasks.

Designed and implemented scalable and automated data integration workflows using Azure Logic Apps, enabling seamless data transfer and synchronization between various systems, applications, and data sources.

Proficient in working with Hadoop ecosystem technologies such as HDFS, MapReduce, YARN, Sqoop, Cassandra, Pig, Kafka, Zookeeper, and Hive.

Expertise in large-scale data processing, machine learning, and real-time analytics using Apache Spark.

Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.

Strong expertise in loading unstructured and semi-structured data into Hadoop clusters coming from different sources using Flume.

Orchestrated complex data workflows using Apache Oozie for efficient data processing and workflow automation.

Strong understanding of developing MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.

Demonstrated ability to troubleshoot issues, diagnose bottlenecks, and provide support for Delta Lake implementations on Azure, ensuring the reliability and availability of data lakes.

Utilized Azure Delta Lake to establish a robust and efficient data lake architecture, ensuring data integrity, reliability, and optimal analytics performance.

Skilled in implementing data quality checks and governance policies within Delta Lake to maintain data accuracy, enforce compliance, and facilitate data discovery and lineage tracking.

Participated in the development, improvement, and maintenance of Snowflake database applications.

Strong expertise in optimizing Spark jobs and leveraging Azure Synapse Analytics for big data processing and analytics.

Proven track record in performance optimization and capacity planning to ensure scalability and efficiency.

Implemented CI/CD pipelines using Azure DevOps to streamline data engineering processes and ensure efficient and reliable delivery of data solutions.

TECHNICAL SKILLS

Azure Services

Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps

Big Data Technologies

MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark, Oozie, Sqoop, Zookeeper, Cassandra, Flume, Pig

Hadoop Distribution

Cloudera, Hortonworks

Languages

SQL, PL/SQL, Python, HiveQL, Scala, U-SQL, and NoSQL.

Web Technologies

HTML, CSS, JavaScript, XML, JSP, Restful, SOAP

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Build Automation tools

Ant, Maven, PowerShell scripts

Version Control

GIT, GitHub.

IDE & Build Tools, Design

Eclipse, Visual Studio.

Databases

MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB, MongoDB, K-12, Milvus Vector DB

WORK EXPERIENCE

Sr. Azure Databricks Engineer (ETL) Mar 2022 – Present

UHG (Optum), TX

Responsibilities:

Designed and implemented scalable data ingestion pipelines using Azure Data Factory, ingesting data from various sources such as SQL databases, CSV files, and REST APIs.

Designed and developed an enterprise data catalog leveraging Unity Catalog on Databricks to centralize metadata and provide a business glossary.

Managed and optimized OLTP systems to ensure real-time, high-speed processing of transactional data, enhancing the efficiency and responsiveness of critical business operations.

Designed and implemented Delta Live Tables (DLT) pipelines in Databricks to build efficient enterprise data pipelines, delivering curated datasets to the Snowflake cloud data platform.
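
For illustration, a minimal sketch of the kind of DLT table definitions involved, written for a Databricks DLT pipeline (paths, table names, and columns are hypothetical placeholders):

    import dlt
    from pyspark.sql import functions as F

    # Bronze table: raw claim files from a landing path (path is illustrative;
    # `spark` is provided by the Databricks DLT runtime).
    @dlt.table(comment="Raw claims loaded from the landing zone")
    def claims_bronze():
        return spark.read.format("json").load("/mnt/landing/claims/")

    # Silver table: typed, de-duplicated claims ready for downstream use (e.g., export to Snowflake).
    @dlt.table(comment="Cleansed claims with typed columns")
    def claims_silver():
        return (
            dlt.read("claims_bronze")
            .withColumn("claim_amount", F.col("claim_amount").cast("double"))
            .dropDuplicates(["claim_id"])
        )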

Proficient in designing, implementing, and managing complex data workflows and pipelines using Apache Airflow, automating end-to-end data processing, and orchestrating tasks with dependencies.

Experienced with popular RDBMS platforms, such as MySQL, PostgreSQL, Oracle, and Microsoft SQL Server, enabling effective database management and development.

Proficient in using PySpark, a Python library for distributed data processing, to perform data transformations, aggregations, and analytics on large-scale datasets.
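
As a minimal illustration of such a PySpark aggregation (table and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims_aggregation").getOrCreate()

    # Hypothetical source table; in practice this would be a Delta table in the lakehouse.
    claims = spark.read.table("silver.claims")

    # Aggregate paid amounts per member and month, then persist for downstream reporting.
    monthly_totals = (
        claims
        .withColumn("claim_month", F.date_trunc("month", F.col("service_date")))
        .groupBy("member_id", "claim_month")
        .agg(F.sum("paid_amount").alias("total_paid"),
             F.count("claim_id").alias("claim_count"))
    )

    monthly_totals.write.mode("overwrite").saveAsTable("gold.member_monthly_claims")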

Proficient in writing and optimizing queries using Hive Query Language (HQL) for efficient data retrieval and analysis.

Advanced proficiency in Python programming with a focus on data analysis, manipulation, and visualization, leveraging the extensive capabilities of libraries such as NumPy and Pandas for efficient and scalable solutions.

Proficient in designing, implementing, and maintaining RDBMS solutions, including data modeling, schema design, and database optimization, ensuring efficient data storage and retrieval.

Managed and maintained relational databases using SQL Server Management Studio (SSMS), performing tasks such as database creation, backup and recovery, and user access management.

Proficient in Azure Data Lake Storage Gen2 (ADLS Gen2), designing and implementing scalable data solutions, optimizing performance, and ensuring data integrity for efficient data processing and analysis.

Established and maintained comprehensive data governance practices within Databricks, ensuring data quality, integrity, and compliance with regulatory standards.

Implemented data governance practices and data quality checks using Azure Data Factory and Snowflake, ensuring data accuracy and consistency.

Implemented OLAP systems to provide multidimensional data analysis capabilities, enabling users to explore and analyze complex datasets for strategic decision-making.

Designed and implemented CI/CD pipelines using Azure DevOps, automating the build, test, and deployment processes for applications, resulting in improved release efficiency and software quality.

Designed and developed reports using SQL Server Reporting Services (SSRS), transforming raw data into visually appealing and actionable insights.

Proficient in implementing and managing Azure Event Hubs for real-time event streaming and data ingestion in cloud-based solutions, ensuring scalability and reliability.

Proficient in utilizing Matillion ETL for cloud-based data integration, transformation, and loading, to streamline data workflows and enhance data analytics capabilities.

Designed and implemented robust ETL (Extract, Transform, Load) processes using SSIS, facilitating seamless data integration across diverse sources and destinations.

Proficient in Power BI for data visualization and business intelligence, leveraging its capabilities to create interactive dashboards and reports that drive data-driven decision-making.

Demonstrated expertise in T-SQL for designing and implementing database schemas, including tables, views, and indexes, to support data integrity and meet specific application requirements.

Managed and implemented platform-as-a-service (PaaS) solutions, leveraging cloud technologies to enhance scalability, reliability, and flexibility in healthcare IT systems.

Developed and optimized Spark jobs to perform data transformations, aggregations, and machine learning tasks on big data sets.

Integrated third-party tools and services, such as CI/CD pipelines, code analysis tools, and issue trackers, with GitHub to enhance the development and release processes.

Implemented Azure Blob Storage security measures, including access control policies and encryption, to ensure data integrity and confidentiality.

Utilized Azure Cosmos DB to create a globally distributed and highly responsive NoSQL database solution, ensuring seamless data access and low-latency performance for healthcare applications.

Environment: Azure Databricks, Data Factory, Snowflake, Logic Apps, Function App, MS SQL, Oracle, Spark, Hive, SQL, Python, Scala, PySpark, Power BI, PowerShell

Azure Snowflake Data Engineer Nov 2020 – Feb 2022

American Express, AZ

Responsibilities:

Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data from diverse sources into Snowflake.
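
For illustration, the Snowflake-side bulk load that such a pipeline typically issues, sketched with the Python connector (credentials, stage, and table names are placeholders):

    import snowflake.connector

    # Connection parameters are placeholders; in practice these come from Azure Key Vault.
    conn = snowflake.connector.connect(
        account="<account>",
        user="<user>",
        password="<password>",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    # Bulk-load files that ADF has copied into an external Azure stage (stage name is illustrative).
    conn.cursor().execute("""
        COPY INTO staging.transactions_raw
        FROM @azure_landing_stage/transactions/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    conn.close()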

Utilized Azure Policy and Compliance Center to enforce and monitor regulatory compliance, ensuring that data handling practices align with HIPAA standards.

Designed and implemented data processing workflows using Azure Databricks, leveraging Spark for large-scale data transformations.

Extensive experience in leveraging Hive for big data processing and analysis within Hadoop ecosystems, handling large datasets with scalability and performance considerations.

Leveraged Matillion's capabilities to seamlessly integrate data with popular cloud platforms such as Azure, ensuring data consistency and reliability.

Developed OLAP cubes using technologies such as Microsoft SQL Server Analysis Services (SSAS) or other OLAP engines, creating a foundation for interactive and dynamic data analysis.

Successfully implemented and managed partitioning strategies in Azure Cosmos DB to distribute data efficiently across multiple regions, ensuring horizontal scalability and improved performance.
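
A minimal sketch of such a partitioned container definition using the azure-cosmos Python SDK (account, names, and the /accountId partition key are illustrative assumptions):

    from azure.cosmos import CosmosClient, PartitionKey

    # Endpoint, key, and object names below are placeholders for illustration only.
    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    database = client.create_database_if_not_exists("payments")

    # Partitioning on a high-cardinality key (e.g., /accountId) spreads data and throughput
    # evenly across physical partitions, which is what enables horizontal scale-out.
    container = database.create_container_if_not_exists(
        id="transactions",
        partition_key=PartitionKey(path="/accountId"),
        offer_throughput=4000,
    )

    container.upsert_item({"id": "txn-001", "accountId": "acct-42", "amount": 125.50})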

Demonstrated proficiency in designing, implementing, and optimizing data integration solutions using Talend, including ETL (Extract, Transform, Load) processes.

Optimized performance and reliability of data workflows by fine-tuning SSIS packages, ensuring efficient data movement and transformation within the SQL Server environment.

Implemented and maintained HIPAA-compliant data solutions on Azure, ensuring the confidentiality, integrity, and availability of healthcare data.

Implemented OLTP systems and maintained database structures, ensuring data integrity and reliability for day-to-day transactional activities, while actively participating in troubleshooting and resolution of performance issues.

Proficient in healthcare data exchange standards including X12, HL7, and FHIR for streamlined interoperability and data integration in healthcare systems.

Designed and implemented data integration pipelines to collect, cleanse, and transform diverse healthcare data sources, such as Electronic Health Records (EHRs), claims data, and medical devices.

Implemented security measures for DLT networks, including encryption, key management, and access controls, ensuring the integrity and confidentiality of distributed ledger data.

Skilled in using SnowSQL to develop and maintain data workflows, ensuring data integrity and accessibility for informed decision-making.

Implemented product navigation, search functionality, and interactive features within Unity Catalog, enhancing user engagement and satisfaction.

Demonstrated expertise in designing and maintaining Snowflake data warehouses, implementing data security best practices, and collaborating with cross-functional teams to ensure seamless data integration, storage, and retrieval for organizational needs.

Integrated ADF with Azure Logic Apps for orchestrating complex data workflows and triggering actions based on specific events.

Designed and implemented security measures in Azure data solutions to meet HIPAA requirements, including encryption, access controls, and audit trails.

Successfully integrated Fast Healthcare Interoperability Resources (FHIR) standards into Azure data pipelines for seamless healthcare data exchange and interoperability.

Implemented secure mechanisms for handling and processing Protected Health Information (PHI) in compliance with HIPAA regulations within Azure data environments.

Adept at simplifying ETL pipelines by incorporating Snowpipe, reducing manual intervention and enhancing data integration efficiency, thereby contributing to data-driven decision-making within the organization.
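
As a hedged sketch, the one-time Snowpipe setup behind such automation might look like the following (pipe, stage, table, and notification integration names are hypothetical):

    import snowflake.connector

    # Placeholder credentials; typically sourced from a secrets manager.
    conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>")
    cur = conn.cursor()

    # A pipe that auto-loads new files as they land in the external Azure stage.
    # The notification integration is assumed to be created beforehand.
    cur.execute("""
        CREATE PIPE IF NOT EXISTS analytics.staging.claims_pipe
          AUTO_INGEST = TRUE
          INTEGRATION = 'AZURE_NOTIFICATION_INT'
        AS
          COPY INTO analytics.staging.claims_raw
          FROM @analytics.staging.azure_landing_stage/claims/
          FILE_FORMAT = (TYPE = 'JSON')
    """)
    conn.close()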

Fine-tuned Snowflake data warehouses to optimize query performance and reduce costs, delivering faster access to critical healthcare data.

Collaborated closely with medical professionals, data scientists, and analysts to understand their data requirements and deliver tailored solutions using Snowflake.

Implemented Delta Lake architecture to enable robust data management, reliability, and high-performance analytics within a Microsoft Azure environment.

Integrated Snowflake with Power BI and Azure Analysis Services for creating interactive dashboards and reports, enabling self-service analytics for business users.

Environment: Azure Databricks, Data Factory, Logic Apps, Snowflake, Function App, MS SQL, Oracle, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Tableau, shell scripting, Kafka

Azure Data Engineer Aug 2017 – Oct 2020

Tufts Health Plan, CA

Responsibilities:

Proficient in deploying and managing Azure VMs for scalable and flexible computing solutions.

Extensive experience with Azure Blob Storage for efficient and secure data storage, retrieval, and management.

Implemented Azure AD for seamless user authentication, authorization, and identity management in cloud applications.

Extensive experience integrating Apache Airflow with various data technologies and services, including databases, cloud platforms, and external APIs, to facilitate seamless data movement and processing.
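
For illustration, a minimal Airflow DAG of the kind described (DAG ID, schedule, and task bodies are placeholders):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_members():
        # Placeholder task body; in practice this would call a source API or database.
        print("extracting member records")

    def load_to_lake():
        print("writing extracted records to ADLS Gen2")

    # A minimal two-task DAG: extract, then load.
    with DAG(
        dag_id="member_ingestion",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_members", python_callable=extract_members)
        load = PythonOperator(task_id="load_to_lake", python_callable=load_to_lake)
        extract >> load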

Implemented ETL processes using Hive for seamless data integration, transformation, and loading, ensuring data quality and reliability for downstream analytics and reporting.

Developed and deployed web applications using Azure App Services, ensuring high availability and scalability.

Proficient in designing and implementing scalable and flexible data models using Azure Cosmos DB, accommodating diverse data types and optimizing for high-performance NoSQL storage.

Extensive experience in data manipulation and analysis using Pandas, including data cleaning, transformation, and exploration.
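
A small illustrative Pandas cleaning and exploration snippet (file and column names are hypothetical):

    import pandas as pd

    # Hypothetical extract of claim records with missing and badly typed values.
    claims = pd.read_csv("claims_extract.csv", parse_dates=["service_date"])

    # Basic wrangling: drop duplicates, standardize codes, and impute missing amounts.
    claims = claims.drop_duplicates(subset=["claim_id"])
    claims["diagnosis_code"] = claims["diagnosis_code"].str.upper().str.strip()
    claims["paid_amount"] = claims["paid_amount"].fillna(claims["paid_amount"].median())

    # Quick exploration: claim counts and totals per month.
    summary = (
        claims.groupby(claims["service_date"].dt.to_period("M"))
              .agg(claim_count=("claim_id", "count"), total_paid=("paid_amount", "sum"))
    )
    print(summary.head())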

Proficient in writing and optimizing complex T-SQL queries for efficient data retrieval, aggregation, and manipulation, ensuring optimal database performance.

Managed Azure SQL Database for robust and scalable relational database solutions, optimizing performance and security.

Collaborated with BI developers and analysts to create compelling dashboards and reports that leverage the power of OLAP for insightful data presentation.

Implemented data imputation techniques as part of the data wrangling process, systematically addressing missing values to improve overall dataset completeness.

Developed and optimized ETL workflows to handle data transformations, cleansing, and loading into the star schema, ensuring data accuracy and consistency.

Applied functional programming principles in Scala to write modular and reusable code, emphasizing immutability and higher-order functions.

Configured and maintained Azure networking components, including Virtual Networks, Subnets, and Network Security Groups for secure communication.

Utilized Azure DevOps for continuous integration and continuous deployment (CI/CD) pipelines, automating software delivery processes.

Designed serverless applications using Azure Functions for efficient execution of event-triggered code without the need for infrastructure management.
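
For illustration, a minimal HTTP-triggered Azure Function in Python (the request handling shown is a hypothetical example, not a production endpoint):

    import json
    import logging
    import azure.functions as func

    # HTTP-triggered function (Python v1 programming model).
    def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info("Received eligibility check request")
        try:
            payload = req.get_json()
        except ValueError:
            return func.HttpResponse("Invalid JSON body", status_code=400)

        member_id = payload.get("member_id")
        # Real logic would look the member up in a downstream store; this just echoes the request.
        result = {"member_id": member_id, "eligible": member_id is not None}
        return func.HttpResponse(json.dumps(result), mimetype="application/json")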

Implemented Azure Monitor and Azure Log Analytics for real-time insights into application performance, security, and operational health.

Designed and implemented dynamic and reusable workflows in Airflow, integrating with various data sources, processing engines, and external systems to ensure efficient data orchestration and pipeline automation.

Experienced in deploying SSIS packages, managing configurations, and scheduling package execution, ensuring timely and reliable data integration across the organization.

Environment: Azure Databricks, Data Factory, Logic Apps, Snowflake, Function App, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Tableau, shell scripting, Kafka

Big Data Engineer (Hadoop Developer) Apr 2015 - Jul 2017

Physicians Mutual, TX

Responsibilities:

Designed and implemented scalable and efficient data processing pipelines using technologies such as Apache Hadoop and Apache Spark.

Demonstrated proficiency in the Hadoop ecosystem, including HDFS (Hadoop Distributed File System), MapReduce, Hive, and HBase, for scalable storage and processing of large datasets.

Successfully implemented and optimized data processing workflows using Databricks, leveraging Apache Spark for large-scale data analytics and machine learning tasks.

Hands-on experience in Apache Spark for large-scale data processing, analytics, and machine learning, utilizing Spark's RDDs (Resilient Distributed Datasets) and DataFrame APIs.

Implemented robust monitoring and logging solutions within Apache Airflow, ensuring real-time visibility into workflow execution, detecting issues promptly, and optimizing performance for enhanced reliability and scalability.

Utilized data visualization tools such as Tableau and Power BI to create meaningful visual representations of big data insights, enabling effective communication of complex information to stakeholders.

Implemented data warehousing solutions for efficient storage, retrieval, and analysis of structured and unstructured data.

Implemented distributed event streaming architectures using Apache Kafka, ensuring reliable, fault-tolerant, and scalable data streaming for real-time analytics.
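
A minimal sketch of the producer/consumer pattern involved, using the kafka-python client (broker address, topic, and payload are illustrative):

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: publish policy events with full in-sync replica acknowledgement for durability.
    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        acks="all",
        retries=3,
    )
    producer.send("policy-events", {"policy_id": "P-1001", "event": "renewed"})
    producer.flush()

    # Consumer: a downstream analytics group reads the same topic in near real time.
    consumer = KafkaConsumer(
        "policy-events",
        bootstrap_servers="broker:9092",
        group_id="analytics",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )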

Extensive experience utilizing Scala in the context of big data processing frameworks such as Apache Spark.

Created and optimized scripts for data extraction, transformation, and loading (ETL) processes.

Extensive experience with big data technologies, including Apache Hadoop ecosystem components (HDFS, MapReduce) and Apache Spark for large-scale data processing.

Utilized tools like Apache Hive and Apache Pig for data transformation and analysis.

Expertise in working with NoSQL databases such as MongoDB and Cassandra for efficient and flexible storage and retrieval of unstructured or semi-structured data in big data applications.

Developed and maintained data schemas and structures for optimal performance and scalability.

Environment: SQL Server, Cosmos DB, Informatica, SSIS, Sqoop, MySQL, HDFS, Apache Spark (Scala), Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, ZooKeeper, Oozie, data pipelines, RDBMS, Python, PySpark, shell scripting, Ambari, ETL, JIRA.

Data Warehouse Developer Feb 2012 – Mar 2015

Bank of America, TX

Responsibilities:

Designed, developed, and maintained end-to-end ETL processes, ensuring seamless data extraction, transformation, and loading from source to target systems.

Proficient in dimensional data modeling for data mart design, identifying facts and dimensions, and developing fact and dimension tables using Slowly Changing Dimension (SCD) techniques.
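
For illustration, the SCD Type 2 pattern described can be sketched as two T-SQL statements issued from Python via pyodbc (a stand-in for the equivalent SSIS/Informatica logic; server, schema, and column names are hypothetical):

    import pyodbc

    # Connection string and object names are illustrative only.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dwserver;DATABASE=EDW;Trusted_Connection=yes"
    )
    cur = conn.cursor()

    # Step 1: expire the current dimension row when a tracked attribute has changed.
    cur.execute("""
        UPDATE d
        SET d.EndDate = GETDATE(), d.IsCurrent = 0
        FROM dbo.DimCustomer AS d
        JOIN stg.Customer AS s ON s.CustomerID = d.CustomerID
        WHERE d.IsCurrent = 1
          AND (d.Address <> s.Address OR d.Segment <> s.Segment)
    """)

    # Step 2: insert a fresh current row for new customers and for those just expired.
    cur.execute("""
        INSERT INTO dbo.DimCustomer (CustomerID, Address, Segment, StartDate, EndDate, IsCurrent)
        SELECT s.CustomerID, s.Address, s.Segment, GETDATE(), NULL, 1
        FROM stg.Customer AS s
        LEFT JOIN dbo.DimCustomer AS d
               ON d.CustomerID = s.CustomerID AND d.IsCurrent = 1
        WHERE d.CustomerID IS NULL
    """)
    conn.commit()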

Proficient in designing, developing, and deploying ETL solutions using SQL Server Integration Services (SSIS), leveraging its powerful features for data extraction, transformation, and loading.

Extensive experience in Informatica PowerCenter, creating robust ETL workflows for extracting, transforming, and loading data across diverse data sources and destinations.

Implemented complex data transformations and cleansing routines using SSIS and Informatica to ensure data quality and integrity throughout the ETL process.

Successfully optimized ETL processes by fine-tuning SSIS packages and Informatica workflows, improving overall performance and reducing load times for large datasets.

Integrated ETL processes seamlessly with various databases (e.g., SQL Server, Oracle) and data warehouses to facilitate efficient data movement and consolidation.

Orchestrated data workflows to support business intelligence, analytics, and reporting requirements.

Demonstrated expertise in ETL tools such as Informatica, Talend, or Apache NiFi, utilizing their functionalities for data integration and transformation.

Developed and optimized ETL jobs and workflows to align with business objectives and data quality standards.

Created and maintained data models and mappings, defining the transformation logic to ensure accurate and consistent data representation.

Collaborated with data architects to design efficient and scalable data structures for ETL processes.

Conducted performance-tuning activities to optimize ETL job execution times, resource utilization, and overall system efficiency.

Implemented indexing, partitioning, and caching strategies to enhance ETL process performance.

Established and implemented data quality checks within ETL workflows to identify and address anomalies or discrepancies in the data.

Collaborated with data stewards and business users to define data quality rules and metrics.


