Azure Data Engineer

Location:
St. Louis, MO
Posted:
September 21, 2023


Resume:

Sai Deepika Gummadi

adzuzj@r.postjobfree.com

Phone: 314-***-****

Data Engineer

Professional Summary:

• 9+ years of experience with Azure Cloud, Azure App Services, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), Databricks, Event Hub, and Event Grid.

• Experience reading continuous JSON data from different source systems via Kafka into Databricks Delta, processing it with Apache Spark Structured Streaming and PySpark, and writing the output as Parquet files (see the sketch following this summary).

• Created data pipelines for batch, micro-batch streaming, and continuous streaming processing in Databricks, serving high-latency, low-latency, and ultra-low-latency data respectively using built-in Apache Spark modules.

• Well versed in creating pipelines in Azure Cloud ADF v2 using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.

• Providing Azure technical expertise, including strategic design and architectural mentorship, assessments, and POCs, in support of the overall sales lifecycle and consulting engagement process.

• Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, MapReduce, Pig, Oozie, Kafka, Storm, and HBase.

• In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.

• Experience in designing and implementing data pipelines, data storage solutions, and data warehousing systems using AWS tools such as S3, RDS, DynamoDB, Redshift, and Athena.

• Experience in implementing data security and privacy policies to ensure the confidentiality and integrity of data using AWS tools such as IAM and VPC.

• Strong understanding of data architecture and design principles and the ability to develop and implement scalable data solutions using AWS services such as EC2, Glue, and Lambda.

• Ability to perform data analytics, predictive modeling, and data-driven decision-making using Azure tools such as HDInsight, Data Factory, and Synapse Analytics.

• Experience in working with data streams and real-time data processing systems using Azure tools such as Event Hubs, Stream Analytics, and Service Bus.

• Skilled in automating data migration processes using Azure Data Factory and scheduling pipelines for timely data updates.

• Experienced in cloud monitoring and performance optimization using tools such as AWS CloudWatch and Azure Monitor to ensure the availability and stability of cloud-based data solutions.

• Strong knowledge of big data platforms, including Hadoop, Spark, and Cassandra, and their applications in data processing and analytics.

• Strong experience writing Python applications using libraries such as Pandas, NumPy, SciPy, and Matplotlib.

• Good understanding of NoSQL databases and hands-on experience writing applications against NoSQL databases such as Cosmos DB.

• Extensive experience in data modeling: designing conceptual and logical data models and translating them into physical data models for high-volume datasets from sources such as Oracle, Teradata, Vertica, and SQL Server using the Erwin tool.

• Expert knowledge and experience in business intelligence data architecture, data management, and modeling to integrate multiple, complex data sources that are transactional and non-transactional, structured and unstructured.

• Design and develop relational databases for collecting and storing data, and build and design data input and data collection mechanisms.

• Well versed with relational and dimensional modeling techniques such as star and snowflake schemas, OLTP, OLAP, normalization, and fact and dimension tables.

• Good knowledge of creating SQL queries, collecting statistics, and Teradata SQL query performance tuning using the optimizer/EXPLAIN plan.
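
The Kafka-to-Delta streaming pattern described in this summary can be sketched roughly as below. This is a minimal illustration only: the broker address, topic, schema, and storage paths are assumed placeholders, not details of any engagement listed in this resume.

```python
# Minimal PySpark Structured Streaming sketch: continuous JSON from Kafka into a
# Delta table. All connection details, the schema, and the paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical schema of the incoming JSON payload.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("amount", DoubleType()),
])

# Read the continuous JSON stream from Kafka (the connector is built into Databricks).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                        # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Parse the Kafka value column from JSON into typed columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Micro-batch trigger; trigger(availableNow=True) gives batch-style catch-up runs,
# and a shorter interval lowers latency. Delta storage is Parquet-backed.
query = (parsed.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/mnt/checkpoints/events")  # placeholder path
         .trigger(processingTime="1 minute")
         .start("/mnt/delta/events"))                              # placeholder path

query.awaitTermination()
```

Adjusting only the trigger moves the same job between the batch, micro-batch, and lower-latency tiers mentioned above.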

Technical Skills:

Azure Cloud Platform: ADFv2, Blob Storage, ADLS2, Azure SQL DB, SQL Server, Azure Synapse, Azure Analysis Services, Databricks, Mapping Data Flow (MDF), Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning, App Services, Logic Apps, Event Grid, Service Bus, Azure DevOps, Git Repository Management, ARM Templates

Reporting and BI Tools: Power BI, Tableau, Cognos

ETL Tools: ADFv2, Informatica PowerCenter 10.x/9.x, DataStage 11.x/9.x, SSIS

Programming Languages: PySpark, Python, U-SQL, T-SQL, Linux Shell Scripting, Azure PowerShell, C#, Java

Big Data Technologies: Hadoop, HDFS, Hive, Apache Spark, Apache Kafka, Pig, ZooKeeper, Sqoop, Oozie, HBase, YARN

Databases: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB (NoSQL), Oracle, Microsoft SQL Server

IDE and Tools: Code, Eclipse, SSMS, Maven, SBT, MS Project, GitHub, Microsoft Visual Studio

Cloud Stack: AWS, GCP, Azure, Snowflake

Methodologies: Waterfall, Agile/Scrum, SDLC

Professional Experience:

Client: MasterCard, O’Fallon, MO Aug 2021 to Present

Role: Azure Data Engineer

Responsibilities:

• Collaborated with Business Analysts and Solution Architects to gather client requirements and translated them into Azure-based design architectures.

• Designed and maintained data pipelines using Azure Data Factory, achieving a reduction in data processing time.

• Created High-Level Technical Design and Application Design documents, ensuring clear and comprehensive documentation for stakeholders.

• Developed data flow specifications and mappings for developers, facilitating efficient data processing.

• Designed and implemented interfaces using Azure Data Share for seamless file transfers.

• Engineered complex data transformations and manipulations using ADF and PySpark with Databricks (see the PySpark sketch following these responsibilities).

• Implemented real-time data streaming with Azure Stream Analytics and Event Hub, enabling timely analytics for critical operations.

• Optimized compute time, reducing cluster runtime and cutting costs.

• Monitored and automated data engineering solutions, ensuring reliability and efficiency.

• Developed and optimized SQL views and stored procedures in Azure SQL DW for enhanced reporting capabilities.

• Integrated data from diverse sources, including databases, APIs, flat files, and streaming platforms, using Python connectors and libraries.

• Designed, developed, and maintained end-to-end data pipelines for efficient ETL processes using Python.

• Leveraged U-SQL scripts to ingest and transform data into Azure Data Warehouse.

• Designed and implemented a real-time market data processing solution using Azure Stream Analytics, Azure Event Hub, and Service Bus Queue.

• Managed data ingestion using Azure Data Lake Storage and created pipelines with Azure Data Factory v2, extracting data from diverse sources.

• Created numerous Databricks Spark jobs with PySpark for various data operations.

• Implemented a Power BI integration module for canned reports from ADLS2.

• Proficiently used the SQL Server Import and Export Data tool.

• Collaborated with cross-functional teams and assisted in troubleshooting, risk identification, and resolution.

• Led team members in technical issue resolution and mentored them for skill development.

• Developed and maintained ETL processes using SQL Server Integration Services (SSIS) to extract, transform, and load data into a centralized data warehouse.

• Collaborated with stakeholders to gather data requirements and optimize SQL queries, improving query performance.

• Implemented data quality checks and validation procedures to ensure data accuracy.

• Managed database performance through index optimization and maintenance tasks.
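
The ADF-plus-PySpark transformation work above can be illustrated with a minimal Databricks-style batch job. The mount paths, column names, and aggregation are assumptions for illustration, not MasterCard specifics.

```python
# Illustrative PySpark batch transformation: deduplicate, filter, join, aggregate,
# and persist a curated output. Paths, columns, and table shapes are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate-transactions").getOrCreate()

transactions = spark.read.parquet("/mnt/raw/transactions")   # placeholder input
merchants = spark.read.parquet("/mnt/raw/merchants")          # placeholder input

curated = (transactions
           .dropDuplicates(["transaction_id"])
           .filter(F.col("amount") > 0)                       # basic data quality rule
           .join(merchants, "merchant_id", "left")
           .withColumn("txn_date", F.to_date("event_ts"))
           .groupBy("merchant_name", "txn_date")
           .agg(F.sum("amount").alias("daily_amount"),
                F.count("transaction_id").alias("txn_count")))

# Write the curated layer partitioned by day for downstream reporting.
(curated.write
 .mode("overwrite")
 .partitionBy("txn_date")
 .parquet("/mnt/curated/daily_merchant_totals"))               # placeholder output
```

In practice a job like this would be invoked from an ADF Databricks activity as one step of the pipeline.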

Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Teradata Utilities, Windows Remote Desktop, Unix, Azure PowerShell, Databricks, Python, PySpark, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Power BI.

Client: Wells Fargo - Charlotte, NC Nov 2019 to July 2021

Role: Sr. Data Engineer

Responsibilities:

• Ensured Wells Fargo's customer eligibility system complied with Securities and Exchange Commission (SEC) regulations for data collection, storage, and usage within the Azure environment.

• Spearheaded data privacy and security measures, maintaining full compliance with SEC guidelines for handling securities-related data.

• Developed and implemented robust data governance policies and procedures within the Azure ecosystem, improving data quality, accuracy, and reliability.

• Established a culture of data stewardship, ensuring data usage remained aligned with SEC regulations.

• Led the development and implementation of a transformative customer eligibility project within the Azure platform.

• Loaded transformed data into target systems, including databases, data warehouses, and cloud storage, employing Python libraries and custom scripts (see the sketch following these responsibilities).

• Utilized cutting-edge data analytics techniques to analyze customers' financial and personal data, revolutionizing the process of determining eligibility for auto financing.

• Designed and implemented an efficient Extract, Transform, Load (ETL) architecture using Azure services for seamless data transfer from source servers to the Data Warehouse.

• Implemented automated data cleansing and integration processes, resulting in improved data quality.

• Implemented real-time event processing of data from multiple servers within the Azure environment, enabling rapid response to critical data events.

• Actively participated in designing and developing CI/CD pipelines for data engineering within the Azure ecosystem.

• Implemented automation from code commit to deployment using Azure DevOps.

• Managed a cloud data warehouse on Azure, facilitating batch processing and streaming.

• Enhanced data visualization for customer data using Power BI.
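
The Python load step described above might look like the following sketch, using pandas and SQLAlchemy as one plausible toolchain; the server, credentials, file, and table names are placeholders rather than the actual Wells Fargo configuration.

```python
# Sketch of a Python load into an Azure SQL target with pandas + SQLAlchemy.
# Connection details and object names are placeholders; ODBC Driver 17 for
# SQL Server is assumed to be installed where this runs.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://etl_user:password@myserver.database.windows.net/eligibility"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Transformed output produced by an upstream step (placeholder file).
df = pd.read_parquet("/tmp/customer_eligibility.parquet")

# Light cleansing before load: drop rows without a key and remove duplicates.
df = df.dropna(subset=["customer_id"]).drop_duplicates(subset=["customer_id"])

# Append into a staging table in manageable batches.
df.to_sql("customer_eligibility", engine, schema="stage",
          if_exists="append", index=False, chunksize=1000)
```

For larger volumes this step is usually replaced by a bulk copy (for example an ADF copy activity); to_sql is shown only to keep the sketch self-contained.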

Environment: Azure Data Factory, Azure Databricks, Azure Event Hubs, Azure SQL Data Warehouse, Power BI.

Client: Repco - Hyderabad, INDIA Aug 2017 to Feb 2019

Role: Data Engineer

Responsibilities:

• Implemented data validation, cleansing, and enrichment processes to ensure compliance with regulatory requirements, including SEC and FINRA rules.

• Designed and implemented data architectures for managing large volumes of home loans data while adhering to GDPR and CCPA data privacy and security regulations.

• Developed and automated multiple ETL jobs using Amazon EMR, facilitating seamless data transfer from HDFS to S3.

• Created batch data pipelines for extracting data from S3 and loading it into Redshift using Glue jobs (see the sketch following these responsibilities).

• Utilized PySpark and Scala to automate data ingestion from various sources, including APIs, AWS S3, and Redshift.

• Configured Spark streaming to store and process real-time data from Kafka.

• Leveraged AWS EMR to store structured data in Hive and unstructured data in HBase.

• Cleaned and transformed data in HDFS using MapReduce (YARN) programs for ingestion into Hive schemas.

• Developed and maintained data reporting and analytics solutions to support regulatory reporting and compliance monitoring.

• Created a data lake in Snowflake using Stitch; handled application testing and production support.

• Managed S3 buckets, implemented policies, and utilized S3 and Glacier for storage and backup on AWS.

• Generated reports for the BI team by exporting analyzed data with Sqoop to relational databases for visualization.

• Created custom User Defined Functions (UDFs) to extend Hive and Pig core functionality.

• Enabled ODBC/JDBC data connectivity to Hive tables and worked with tools such as Tableau and Flink.
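
The S3-to-Redshift Glue jobs mentioned above follow a pattern along these lines. The bucket, Glue connection name, database, and table names are assumptions, and the awsglue modules are only available inside the Glue job runtime.

```python
# Rough AWS Glue job sketch: read Parquet from S3 and load it into Redshift through
# a pre-configured Glue catalog connection. All names and paths are placeholders.
import sys
from pyspark.context import SparkContext
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: Parquet files landed in S3 (placeholder path).
loans = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/loans/"]},
    format="parquet",
)

# Target: Redshift table reached via a Glue connection (placeholder names).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=loans,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "stage.loans", "database": "analytics"},
    redshift_tmp_dir="s3://my-bucket/tmp/",
)

job.commit()
```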

Environment: AWS S3, Glue, AWS EMR, Glacier, Redshift, Snowflake, Spark SQL, Sqoop, Flink, YARN, Kafka, MapReduce, Hadoop, HDFS, Hive, Tableau, Spotfire, HBase.

Client: Colruyt Group – Hyderabad, INDIA Jan 2014 to July 2017

Role: Data Engineer

Responsibilities:

• Examined claims and supporting documentation to ensure policy compliance before processing.

• Developed a strong understanding of claim processing from both client and service provider perspectives, identifying key metrics for each.

• Managed policy servicing and maintenance operations, including coverage changes, beneficiary data updates, and premium payments.

• Processed claims data efficiently through the system.

• Possess a good understanding of Electronic Health Record (EHR) systems, including their functionalities, data models, data elements, and data privacy and security regulations.

• Designed high-performance batch ETL pipelines using Azure cloud services.

• Extracted data from relational databases and APIs with Azure Data Factory and stored it in Azure Data Lake Storage.

• Utilized PySpark scripts in Azure Databricks for data transformations and conversions.

• Designed data warehousing solutions using Azure Synapse Analytics for storing and analyzing transformed data.

• Implemented and designed Python microservices in the healthcare domain (see the sketch following these responsibilities).

• Monitored productivity and resources using Azure Log Analytics.

• Implemented CI/CD pipelines with Azure DevOps for automated build, test, and deployment processes.

• Utilized Azure Event Hub to capture real-time data streams and route them to the appropriate data stores.

• Monitored data pipeline performance using Azure monitoring and analytics tools to ensure seamless data flow and identify potential bottlenecks.

• Played a critical role in a data migration project involving EHR, ensuring accurate, efficient, and secure data migration.

• Ensured data pipeline security using Azure security features, including role-based access control and encryption, to safeguard data privacy and confidentiality.

• Managed encryption keys and passwords through Azure Key Vault.

• Utilized Azure Logic Apps to orchestrate complex business processes and workflows.

• Implemented serverless computing solutions using Azure Functions for cost-effective and scalable data processing.

• Designed visualization dashboards for data analytics using Power BI.

• Proficient in practicing Agile methodology to update workflows and manage project lifecycles and sprints.
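
A Python microservice of the kind mentioned above can be sketched as a small HTTP service; FastAPI is used here purely for illustration (the framework actually used is not stated), and the claim fields and in-memory store are placeholders.

```python
# Minimal claims-service sketch with FastAPI and Pydantic. The data model and the
# in-memory store are illustrative stand-ins for a real schema and database.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="claims-service")

class Claim(BaseModel):
    claim_id: str
    member_id: str
    amount: float
    status: str = "RECEIVED"

# In-memory store standing in for a persistent database.
claims: dict[str, Claim] = {}

@app.post("/claims", response_model=Claim)
def submit_claim(claim: Claim) -> Claim:
    # Reject duplicate submissions before accepting the claim for processing.
    if claim.claim_id in claims:
        raise HTTPException(status_code=409, detail="claim already exists")
    claims[claim.claim_id] = claim
    return claim

@app.get("/claims/{claim_id}", response_model=Claim)
def get_claim(claim_id: str) -> Claim:
    if claim_id not in claims:
        raise HTTPException(status_code=404, detail="claim not found")
    return claims[claim_id]
```

Saved as app.py, this runs locally with `uvicorn app:app --reload`.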

Environment: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Python, Azure Event Hub, Logic Apps, Key Vault, Log Analytics, Scala, Power BI.


