
Azure Data Engineer, Big Data Engineer, Data Engineer

Location:
Chesapeake, VA, 23324
Posted:
February 08, 2024


NAMRATHA

Azure Data Engineer

Employer Details

ad3hla@r.postjobfree.com

972-***-****

PROFESSIONAL SUMMARY

9.5 years of professional IT experience, with 5+ years in Data Engineering and 4 years as a Data Warehouse Developer.

Hands-on experience working with Azure Cloud and its components, including Azure Data Factory, Azure Databricks, Logic Apps, Azure Function Apps, and Event Hubs.

Hands-on experience with the Azure stack, moving data from Azure Data Lake to Azure Blob Storage.

Strong background in data load and integration using Azure Data Factory.

Experience building ETL pipelines in Azure Databricks, leveraging PySpark and Spark SQL.

Experience working with Azure Logic Apps as an integration tool.

Implemented Azure Storage, Azure Functions, and Service Bus queues for large enterprise-level ERP integration systems.

Built logical and physical data models for Snowflake and worked with Snowflake databases, schemas, and tables.

Knowledge of Apache Kafka and Azure Event Hubs for messaging and streaming applications.

Experience developing pipelines in Spark using PySpark.

Good working experience with the Hadoop ecosystem.

Experience in optimizing query performance in Hive using bucketing and partitioning techniques.
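
A minimal sketch of the kind of partitioned and bucketed Hive table this refers to, issued through Spark SQL (database, table, and column names are illustrative):

```python
from pyspark.sql import SparkSession

# Hive-enabled Spark session; assumes a Hive metastore is configured on the cluster.
spark = SparkSession.builder.appName("hive-tuning").enableHiveSupport().getOrCreate()

# Partition by a low-cardinality column so queries prune whole partitions,
# and bucket by a frequent join/filter key so bucketed joins avoid full shuffles.
spark.sql("""
    CREATE TABLE IF NOT EXISTS claims_db.claims_fact (
        claim_id     STRING,
        member_id    STRING,
        claim_amount DOUBLE
    )
    PARTITIONED BY (claim_year INT)
    CLUSTERED BY (member_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```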

Optimized data ingestion, data modeling, data encryption, and performance by tuning ETL workflows.

Extensive hands-on experience in performance tuning of Spark jobs.

Developed data ingestion workflows to read data from various sources and write it to Avro, Parquet, Sequence, JSON, and ORC file formats for efficient storage and retrieval.
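
As an illustration of writing out the formats listed above, a hedged PySpark sketch (paths and the source file are placeholders; the Avro writer additionally requires the spark-avro package):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-formats").getOrCreate()

# Illustrative source; real pipelines read from databases, APIs, or landing zones.
df = spark.read.option("header", "true").csv("/data/landing/orders.csv")

df.write.mode("overwrite").parquet("/data/curated/orders_parquet")
df.write.mode("overwrite").orc("/data/curated/orders_orc")
df.write.mode("overwrite").json("/data/curated/orders_json")

# Avro support requires the spark-avro package on the cluster classpath.
df.write.mode("overwrite").format("avro").save("/data/curated/orders_avro")
```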

Well versed in using ETL methodology for supporting corporate-wide solutions using Informatica.

Optimized Spark jobs and workflows by tuning Spark configurations, partitioning, and memory allocation settings.
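
A representative sketch of the kind of configuration and partitioning tuning meant here (values are illustrative and depend on cluster size and data volume):

```python
from pyspark.sql import SparkSession

# Illustrative tuning knobs; real values are derived from executor sizing and data skew.
spark = (
    SparkSession.builder.appName("tuned-job")
    .config("spark.sql.shuffle.partitions", "400")  # align shuffle parallelism with data size
    .config("spark.executor.memory", "8g")          # executor heap allocation
    .config("spark.executor.cores", "4")
    .config("spark.sql.adaptive.enabled", "true")   # let AQE coalesce/split skewed partitions
    .getOrCreate()
)

df = spark.read.parquet("/data/curated/orders_parquet")

# Repartition on the join key before a wide transformation to reduce shuffle skew.
df = df.repartition(200, "customer_id")
```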

Extensive experience in developing, maintaining, and implementing EDW, ODS, and data warehouse solutions with star and snowflake schemas.

Hands-on experience with GitHub for pushing code and maintaining versions.

Comprehensive knowledge of the Software Development Life Cycle; worked with Agile methodology.

EDUCATION

Bachelor’s in Computer Science and Engineering from Jawaharlal Nehru Technological University, Kakinada, India.

TECHNICAL SKILLS

Azure Services: Azure Data Factory, Azure Databricks, Azure Synapse, ADLS, Logic Apps, Function Apps, Azure DevOps

Big Data Technologies: MapReduce, Hive, Python, PySpark, Kafka, Spark Streaming

Reporting Tools: Power BI, Advanced Excel, Tableau

Languages: SQL, PL/SQL, Python, HiveQL

Web Technologies: HTML, CSS, JavaScript, XML, JSP

Operating Systems: Windows (XP/7/8/10), UNIX, Linux

Version Control: Git, GitHub

Methodology: Agile, Scrum

IDE & Build Tools: Eclipse, Visual Studio

Databases: MS SQL Server, T-SQL, Azure SQL DB, Azure Synapse, MS Excel, MS Access

PROFESSIONAL EXPERIENCE

Client: Elevance Health, Norfolk, VA

Role: Azure Snowflake Data Engineer Jul 2022 – Present

Responsibilities:

Analyzed data from Azure data storages using Databricks and Spark cluster capabilities, extracting valuable insights from large datasets.

Developed and maintained end-to-end ETL data pipelines and worked with large data sets in Azure Data Factory.

Migrated SQL databases to Azure Data Lake, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, including managing database permissions.

Designed and implemented data archiving and data retention strategies using Azure Blob Storage.

Worked with Azure Storage services for storing data and integrating it with other services for further processing.

Increased the efficiency of data retrieval by optimizing and indexing queries.

Experienced in Snowflake modeling using data warehousing techniques: data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture.

Developed custom activities using Azure Functions, Azure Databricks, and PowerShell scripts to perform data transformations, data cleaning, and data validation.

Created pipelines in ADF using linked services and datasets to extract, load, and transform data from different sources.

Developed ETL pipelines to transform data from the Snowflake data store using Python and Snowflake SnowSQL.
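
A minimal sketch of such a Python-driven Snowflake transformation using the snowflake-connector-python package (account, credentials, and object names are placeholders):

```python
import snowflake.connector

# Connection parameters are placeholders; real values come from a secret store.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

cur = conn.cursor()
try:
    # Example transformation step: populate a reporting table from a staging table.
    cur.execute("""
        INSERT INTO REPORTING.DAILY_CLAIMS
        SELECT claim_id, member_id, SUM(claim_amount)
        FROM STAGING.CLAIMS
        GROUP BY claim_id, member_id
    """)
finally:
    cur.close()
    conn.close()
```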

Utilized the data stored in storage accounts and analyzed data using Azure Synapse Analytics to perform advanced analytics.

Worked with Azure Data Factory to integrate data from both on-premises (MySQL) and cloud services and applied transformations before loading the data back to Snowflake.

Wrote T-SQL queries for data manipulation and query performance tuning.

Developed data ingestion pipelines using Azure Event Hubs and Azure Functions to enable real-time data streaming into Snowflake.
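
One plausible shape for the Azure Functions piece of such a pipeline, using the Python Event Hub trigger binding (names are placeholders; the actual Snowflake load step is only indicated):

```python
import json
import logging

import azure.functions as func


def main(event: func.EventHubEvent):
    # Invoked per event (or per batch, depending on the cardinality set in function.json).
    payload = json.loads(event.get_body().decode("utf-8"))
    logging.info("Received claim event: %s", payload.get("claim_id"))

    # Downstream step (not shown): stage the record and load it into Snowflake,
    # e.g. via the Snowflake Python connector or Snowpipe.
```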

Developed custom monitoring and alerting solutions using Azure Monitor for proactive identification and resolution of performance issues.

Developed end-to-end data pipelines, integrating multiple data sources such as DB2, SQL, Oracle, flat files (CSV, delimited), APIs, XML, and JSON, for comprehensive data processing.

Worked closely with the data engineering team to enhance and optimize data pipelines, improving data processing speed and efficiency.

Worked with Azure Logic Apps administrators to monitor and troubleshoot issues related to process automation and data processing pipelines.

Worked with JIRA to report on projects and create sub-tasks for development, QA, and partner validation.

Experience with the full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated planning.

Environment:

Azure Databricks, Azure Data Factory, Azure Data Lake Storage, SnowSQL, Azure Synapse, Logic Apps, Azure Event Hub, Spark Streaming, Azure DevOps, SQL, Python, PySpark, Git, JIRA, Kafka, Power BI

Client: State Farm, Richardson, TX

Role: Azure Data Engineer Feb 2021 – Jun 2022

Responsibilities:

Designed and implemented data pipelines using Azure EventHub for real-time streaming and processing of large volumes of data.

Designed and implemented data models, including schema design, table structures, and distribution strategies, optimizing for query performance in Azure Synapse Analytics.

Integrated data from various sources and built data lakes within Synapse Analytics for unified analytics.

Implemented Spark Streaming for real-time data processing, enabling quick insights and analytics on streaming data.

Worked with Azure Data Lake and performed transformations using Azure Databricks for further processing and analysis.

Designed and implemented end-to-end data pipelines, ensuring efficient data flow, transformation, and integration across multiple systems.

Developed Azure SQL databases using T-SQL for managing data sources.

Performed analytics on the data that was collected from various sources using Azure Synapse.

Worked with Azure Data Studio for managing different databases and performed integration tasks efficiently.

Maintained T-SQL stored procedures and user-defined functions for data manipulation and business logic.

Implemented data ingestion and transformation processes for diverse data sources, ensuring data quality and integrity.

Collaborated with cross-functional teams to gather requirements, design data models, and develop solutions that meet business needs.

Performed data analysis and profiling to identify data patterns, anomalies, and insights for business decision-making.

Implemented data governance and security measures to ensure compliance with industry regulations and protect sensitive data.

Conducted performance tuning and optimization of data pipelines and queries to improve overall system efficiency.

Provided technical guidance and support to team members, fostering knowledge sharing and collaboration.

Worked on RDDs and DataFrames using PySpark for analyzing and processing data.

Documented technical specifications, data flow diagrams, and process documentation to ensure clear communication and knowledge transfer within the team.

Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
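
A small sketch of this micro-batching model with the classic DStream API, where the batch interval is what slices the stream into batches for the Spark engine (source and interval are illustrative):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="stream-batches")

# The 10-second batch interval cuts the incoming stream into micro-batches
# that are handed to the Spark engine for batch processing.
ssc = StreamingContext(sc, batchDuration=10)

lines = ssc.socketTextStream("localhost", 9999)  # illustrative source
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()

ssc.start()
ssc.awaitTermination()
```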

Used JIRA to manage issues and project workflow.

Used Git as a version control tool to maintain the code repository.

Environment:

Azure Databricks, Data Factory, Azure Data Lake Storage, Azure Synapse, Logic Apps, Azure Event Hub, Spark Streaming, T-SQL, Python, PySpark, Git, JIRA, Kafka, Power BI

Client: Change Healthcare, Nashville, TN

Role: Big Data Engineer Sep 2019 – Jan 2021

Responsibilities:

Imported data from MySQL to Hadoop Distributed File System (HDFS) on a regular basis, ensuring seamless data integration.

Performed aggregations on large volumes of data using Apache Spark and stored the processed data in Hive warehouse for further analysis.
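
A hedged sketch of this kind of Spark aggregation persisted to the Hive warehouse (table and column names are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("agg-to-hive").enableHiveSupport().getOrCreate()

# Illustrative source table registered in the Hive metastore.
claims = spark.table("staging.claims_raw")

daily = (
    claims.groupBy("claim_date", "provider_id")
          .agg(F.sum("claim_amount").alias("total_amount"),
               F.count("*").alias("claim_count"))
)

# Persist the aggregate into the Hive warehouse for downstream analysis.
daily.write.mode("overwrite").saveAsTable("analytics.daily_claims_summary")
```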

Expertise in creating PL/SQL procedures, functions, triggers, and cursors.

Successfully loaded and transformed large sets of structured, semi-structured, and unstructured data, enabling effective analysis and insights generation.

Developed Hive queries to analyze data and meet specific business requirements, utilizing Hive Query Language (HiveQL) to simulate MapReduce functionalities.

Built HBase tables by leveraging HBase integration with Hive on the Analytics Zone, facilitating efficient storage and retrieval of data.

Applied Kafka and Spark Streaming to process streaming data in specific use cases, enabling real-time data analysis and insights generation.
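
As one illustration, a Structured Streaming read from Kafka of the kind such use cases rely on (broker, topic, and paths are placeholders; the spark-sql-kafka connector package must be available on the cluster):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
         .option("subscribe", "claim-events")                # placeholder topic
         .load()
)

# Kafka values arrive as bytes; cast to string before parsing/processing.
parsed = events.select(F.col("value").cast("string").alias("json_value"))

query = (
    parsed.writeStream.format("parquet")
          .option("path", "/data/streaming/claims")
          .option("checkpointLocation", "/checkpoints/claims")
          .start()
)
query.awaitTermination()
```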

Utilized pandas in Python for data cleansing and validating the source data.
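
A small illustrative pandas cleansing pass of the sort described (file and column names are hypothetical):

```python
import pandas as pd

# Illustrative source extract.
df = pd.read_csv("source_extract.csv")

# Basic cleansing: trim whitespace, normalize case, drop duplicates and null keys.
df["member_id"] = df["member_id"].astype(str).str.strip()
df["state"] = df["state"].str.upper()
df = df.drop_duplicates(subset=["member_id", "claim_id"])
df = df.dropna(subset=["member_id", "claim_amount"])

# Simple validation: claim amounts must be non-negative.
assert (df["claim_amount"] >= 0).all(), "Negative claim amounts found in source data"
```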

Experience in fact and dimension data modeling with various schemas.

Utilized Power Query in Power BI to pivot and unpivot the data model for data cleansing.

Wrote complex SQL using joins, subqueries, and correlated subqueries; expertise in SQL queries for cross-verification of data.

Created calculations, key metrics, and key performance indicators (KPIs).

Implemented data validation checks and error handling mechanisms in PySpark to ensure the reliability of the ETL pipeline and minimize data anomalies.
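
A minimal sketch of such validation and error handling in PySpark (rules, thresholds, and paths are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-validation").getOrCreate()
df = spark.read.parquet("/data/curated/claims")  # illustrative input

# Route rows failing basic rules to a quarantine path instead of failing the whole load.
rules = (F.col("claim_id").isNotNull()) & (F.col("claim_amount") >= 0)
valid, invalid = df.filter(rules), df.filter(~rules)

invalid.write.mode("append").parquet("/data/quarantine/claims")

# Fail the pipeline only if the rejection rate is unexpectedly high.
total, bad = df.count(), invalid.count()
if total and bad / total > 0.05:
    raise ValueError(f"{bad}/{total} rows failed validation; aborting load")

valid.write.mode("overwrite").parquet("/data/validated/claims")
```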

Migrated existing data from RDBMS (Oracle) to Hadoop, facilitating efficient data processing and leveraging Hadoop's capabilities.

Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes, ensuring data accuracy and quality.

Utilized JIRA to manage project workflows, track issues, and collaborate effectively with cross-functional teams.

Worked with Spark using PySpark and Spark SQL for faster data testing and processing, enabling efficient data analysis and insights generation.

Employed Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing, facilitating real-time data processing and analytics.

Leveraged Git as a version control tool to maintain code repositories, ensuring efficient collaboration, version tracking, and code management.

Collaborated closely with the team to identify and resolve JVM-related issues, ensuring smooth execution and performance optimization.

Environment:

MySQL, HDFS, Apache Spark, Cloudera, HBase, Kafka, Data Pipelines, RDBMS, Python, PySpark, Power BI, JIRA.

Client: Credit Suisse, New York, NY

Role: Big Data Developer Apr 2018 – Aug 2019

Responsibilities:

Developed ETL jobs using Spark to migrate data from Oracle to new MySQL tables.

Worked with Spark (RDDs, DataFrames, Spark SQL) and connector APIs for various tasks (data migration, business report generation, etc.).

Developed Spark Streaming application for real time sales analytics.

Wrote complex SQL using joins, subqueries, and correlated subqueries; expertise in SQL queries for cross-verification of data.

Analyzed the source data and handled it efficiently by modifying data types. Used Excel sheets, flat files, and CSV files to generate Power BI ad-hoc reports.

Developed comprehensive source-to-target mappings to document data transformations, rules, and dependencies, facilitating clear understanding and transparency.

Implemented automated data validation checks for financial reports, reducing manual errors by 25% and enhancing data accuracy.

Analyzed the SQL scripts and designed the solution to implement using PySpark.

Extracted data from other data sources and loaded it into cloud storage.

Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.

Conducted data quality assessments and implemented data cleansing and validation processes to ensure data accuracy and consistency.

Participated in all Agile ceremonies including sprint planning, refinement, and sprint demo.

Implemented Data classification algorithms using MapReduce design patterns.

Worked extensively on creating combiners, partitioning, and distributed cache to improve the performance of MapReduce jobs.

Used Git to maintain source code in GitHub repositories.

Environment:

Spark, HDFS, Power BI, PySpark, SQL, GitHub, ETL, Agile Methodology

Client: Nu Technology, Chennai, India

Role: Developer Aug 2013 – Aug 2017

Responsibilities:

Worked on SQL Server, querying data for a better understanding of requirements.

Created jobs, SQL Mail Agent notifications, and alerts, and scheduled them using packages.

Recommended structural changes and enhancements to systems and Databases.

Conducted Design reviews and technical reviews with other project stakeholders.

Used MS Excel, MS Access, and SQL to write and run various queries.

Optimized the query performance with indexing, aggregation, and materialized view tasks.

Worked extensively on creating tables, views, and SQL queries in MS SQL Server.

Managed and updated models (logical/physical data modeling) to develop schemas and the reference DB according to user requirements.

Worked with Informatica PowerCenter for data integration tasks with various sources of data.

Exported current data models into target systems to model data according to requirements and visualize it.

Worked with DAX to calculate various measures on the data.

Wrote triggers, stored procedures, and functions using Transact-SQL (T-SQL), and created and maintained physical structures.

Deployed scripts in different environments according to configuration management requirements; created and managed files/filegroups and table/index associations; performed query tuning and performance tuning.

Tracked and closed defects using Quality Center; maintained users, roles, and permissions.

Environment:

SQL Server, T-SQL, Windows, Visual Studio, MS Excel.


