
Sheshadri Yogith Yamala

Azure Data Engineer

****************@*****.*** +1-417-***-****

PROFESSIONAL SUMMARY:

Overall, 10.5 years of professional experience as an Azure Data Engineer, with working proficiency in Azure and Big Data technologies.

4+ years of focused experience with Azure data services, including Azure Data Factory, Azure Data Lake Storage, Blob Storage, Delta tables, Key Vault, Azure Databricks, Azure Synapse Analytics, PolyBase, Event Hub, Function Apps, Logic Apps, Data Flow, Power Query, and Cosmos DB.

2 years of focused experience with big data technologies such as Spark, Kafka, HBase, Scala, Oozie, and Zookeeper, including work with Spark SQL and the core Spark API to investigate Spark capabilities and build data pipelines.

4 years of data warehouse experience with a dedicated focus on designing, implementing, and optimizing solutions, including advanced SQL queries, ETL development, and cloud-based solutions, actively driving data integration and migration projects.

Developed and deployed numerous ETL pipelines using Azure Data Factory.

Strong knowledge of Azure Databricks, building scalable and effective ETL pipelines for data processing and transformation projects using PySpark and SQL to create solid data workflows.

Hands-on experience with Azure Key Vault to secure and manage private encryption keys and secrets, ensuring strong data protection in cloud-based systems.
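
As an illustration of this pattern, here is a minimal Python sketch of retrieving a secret from Azure Key Vault with the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are placeholder assumptions, not details from this resume.

```python
# Minimal sketch: reading a secret from Azure Key Vault with the Python SDK.
# Vault URL and secret name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://<your-key-vault-name>.vault.azure.net"   # hypothetical vault
credential = DefaultAzureCredential()                          # uses the ambient Azure identity
client = SecretClient(vault_url=vault_url, credential=credential)

# Fetch a connection string stored as a secret and use it downstream in a pipeline.
sql_conn_str = client.get_secret("sql-connection-string").value
```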

Developed data workflows using Azure Logic Apps, Azure Functions, and serverless solutions.

Worked with Azure Event Hubs to ingest streaming data in real time.
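
A minimal sketch, assuming the azure-eventhub (v5) Python SDK, of consuming events from an Event Hub; the connection string, hub name, and consumer group are placeholders.

```python
# Minimal sketch: consuming streaming events from Azure Event Hubs.
from azure.eventhub import EventHubConsumerClient

client = EventHubConsumerClient.from_connection_string(
    conn_str="<event-hub-namespace-connection-string>",   # placeholder
    consumer_group="$Default",
    eventhub_name="<event-hub-name>",
)

def on_event(partition_context, event):
    # Process one streaming event; here we simply print the payload.
    print(partition_context.partition_id, event.body_as_str())

with client:
    # Read from the beginning of each partition ("-1"); blocks until interrupted.
    client.receive(on_event=on_event, starting_position="-1")
```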

Expert in organizing and developing intricate workflows for data integration and transformation using Azure Synapse Pipelines.

Significant expertise in managing structured, semi-structured, and unstructured data in Azure Blob Storage.

Hands-on experience in designing SSIS packages for handling SQL Server databases and objects between SQL Server instances.

Proficient in Python, Scala, PySpark, SQL, and PL/SQL for designing scalable and efficient data solutions.

Adept at designing cutting-edge, cloud-based data warehousing solutions using Snowflake on Azure, optimizing schemas, tables, and views for streamlined data storage and retrieval.

Implemented real-time streaming applications using Apache Kafka on distributed Hadoop Clusters.

Hands-on experience with Azure Logic Apps, focusing on triggers, including event-based triggers for real-time feedback and scheduled triggers for time-based automation workflows.

Hands-on experience dealing with a variety of file formats, such as CSV, JSON, Parquet, Binary, and ORC.

Experienced in leveraging Big Data technologies, including Hive, for efficient data querying and analysis.

In-depth knowledge of HDFS for reliable and scalable storage and retrieval of large datasets.

Adept at working with the Hadoop ecosystem, including Map Reduce, Yarn, and Sqoop, for seamless integration and processing of diverse data sources.

Architected and deployed data solutions on Azure Fabric, ensuring high availability and disaster recovery for mission-critical applications.

Implemented performance tuning by optimizing data processing and increasing system efficiency for both OLAP and OLTP environments.

Demonstrated expertise in implementing advanced data organization techniques, with effective Partitioning and Repartitioning strategies.
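
An illustrative PySpark sketch of the kind of partitioning and repartitioning strategies mentioned above; paths and column names are assumptions.

```python
# Minimal sketch: repartitioning for balanced processing and partitioned output.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-demo").getOrCreate()
df = spark.read.parquet("/mnt/raw/events")            # hypothetical source path

# Repartition by a high-cardinality key to balance work across executors before a wide join.
balanced = df.repartition(200, "customer_id")

# Write partitioned by date so downstream queries can prune partitions.
(balanced.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("/mnt/curated/events"))
```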

Experienced in orchestrating data workflows using industry-standard scheduling tools such as Control-M and Apache Airflow; implemented effective scheduling mechanisms to ensure timely execution of data pipelines, enhancing overall workflow efficiency.

Expert in version control systems, including Git and GitHub, ensuring collaborative development and effective code management.

Streamlined software development and deployment procedures using Azure DevOps and automated CI/CD pipelines for efficient software delivery.

Strong adherence to Agile methodologies, using collaborative and flexible development practices.

Proven experience in cross-functional teamwork, iterative development, and delivery of high-quality solutions aligned with business objectives.

EDUCATION

Master's in Computer Science, USA.

Bachelor's degree from Jawaharlal Nehru Technological University, India.

TECHNICAL SKILLS

Azure Services

Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Azure DevOps, Azure Key Vault

BigData Technologies

Hadoop, Hive, PySpark, Scala, Kafka, PySpark streaming, Oozie, Sqoop, Zookeeper

Languages

Python, SQL, PL/SQL, Scala.

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Version Control

GIT, GitHub.

IDE & Build Tools, Design

Visual Studio Code, PyCharm

Databases & Data warehouse

MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB, Snowflake

Client: Nebraska Department of Health and Human Services Feb 2022 to Present

Role: Azure Data Engineer

Job Description:

As an Azure Data Engineer at the Nebraska Department of Health and Human Services, I played a key role in pioneering the migration of legacy data systems to modern Azure data platforms. Utilizing my expertise in Azure data processing and ETL, I improved data integrity and security, and led the design, development, and implementation of scalable data pipelines to ensure seamless data transfer while maintaining compliance.

Responsibilities:

Worked as an Azure Data Engineer, responsible for building and deploying scalable data ingestion pipelines using Azure Data Factory.

Accessed and integrated data from multiple sources, including SQL databases, CSV files, and REST APIs, to support the organization's data-driven initiatives.

Designed and optimized data processing workflows using Azure Databricks, leveraging PySpark and Spark SQL for highly distributed processing and transformation.
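
As an illustration of this kind of Databricks workflow, the sketch below combines PySpark and Spark SQL; file paths, the view, and column names are assumptions, not the project's actual schema.

```python
# Minimal sketch of a Databricks-style PySpark + Spark SQL transformation.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-etl").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("/mnt/landing/claims/*.csv"))           # hypothetical landing path

cleaned = (raw
    .dropDuplicates(["claim_id"])
    .withColumn("claim_amount", F.col("claim_amount").cast("double"))
    .filter(F.col("claim_amount").isNotNull()))

cleaned.createOrReplaceTempView("claims_clean")

# Spark SQL aggregation over the cleaned view.
summary = spark.sql("""
    SELECT provider_id, COUNT(*) AS claim_count, SUM(claim_amount) AS total_amount
    FROM claims_clean
    GROUP BY provider_id
""")

summary.write.mode("overwrite").parquet("/mnt/curated/claims_summary")
```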

Built robust ETL pipelines with linked services in Azure Data Factory, enabling straightforward extraction, transformation, and loading of data.

Ensure data quality and integrity through data manipulation, cleansing, and transformation using Azure Data Factory and Databricks.

Design and optimize data warehouse systems, including Azure SQL Data Warehouse structures, tables, and instances, for efficient data storage, retrieval, and analysis.

Work closely with data analysts and business stakeholders to understand their needs and implement appropriate data models and processes within Snowflake.

Implemented and optimized Spark tasks for data transformation, integration, and machine learning tasks on large data sets.

Deployed and managed data solutions on Azure Fabric, facilitating smooth data operations and supporting scalable application deployments.

Engineered and deployed end-to-end data solutions on Azure Fabric, utilizing microservices architecture to enhance scalability and reliability of data processing workflows.

Designed and implemented data masking solutions to anonymize sensitive data, ensuring compliance with privacy regulations.

Utilized data masking tools such as Informatica Data Masking, IBM InfoSphere Optim, and Oracle Data Masking to protect sensitive information.

Hands-on experience with Snowflake, combining large-scale data processing and analytics capabilities for simple and efficient data analysis.
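
A minimal sketch of querying Snowflake from Python with the snowflake-connector-python package; the account, warehouse, database, and query are illustrative placeholders.

```python
# Minimal sketch: running an analytics query against Snowflake from Python.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",   # placeholders, not real connection details
    user="<user>",
    password="<password>",            # in practice, pull this from Azure Key Vault
    warehouse="ANALYTICS_WH",
    database="HEALTH_DB",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT county, COUNT(*) FROM encounters GROUP BY county")
    for county, cnt in cur.fetchall():
        print(county, cnt)
finally:
    conn.close()
```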

Configure event-based trigger mechanisms through Azure Logic Apps to optimize data pipelines and business processes.

Built a metadata management solution using Azure services to manage and analyze data flows and changes.

Identify and resolve performance bottlenecks in data processing and storage by optimizing query performance and reducing data storage.

Developed advanced Tableau and Power BI visualizations to transform raw data into insightful charts for informed decision-making.

Developed interactive and insightful data visualizations using Tableau, enabling stakeholders to make data-driven decisions.

Actively participating in ETL projects, ensuring data integrity, and maintaining pipeline stability.

Expertise in data processing, integration, and management projects using shell scripting for responsive and flexible data collaboration.

Hands-on experience with Git for version control and collaborative development, ensuring effective code management, collaboration, and change tracking.

Environment Skills: Azure Data Factory, Azure Databricks, Azure Event Hubs, Azure Functions, Azure Data Lake Storage, Azure Key Vault, Azure Blob Storage, Azure SQL, Azure Logic Apps, Azure DevOps, Azure Monitor, Event Grid, Power BI, Snowflake features (zero-copy cloning, time travel, shared data management).

Client: State of Michigan, Grand Rapids, Michigan Sep 2019 to Jan 2022

Role: Azure Data Engineer

Job Description:

As an Azure Data Engineer at the State of Michigan, I played a vital role in leveraging cloud-based technologies to design, develop, and implement scalable and efficient data solutions to support various state government initiatives and projects. Working closely with stakeholders across different departments, I contributed to the development of robust data pipelines, ensuring the integrity, reliability, and accessibility of data for critical decision-making processes.

Responsibilities:

Built and implemented scalable data ingestion pipelines through Azure Data Factory, accessing data from various sources such as SQL databases, CSV files, and REST APIs.

Developed scripts for data transformations using Azure Databricks-PySpark and Spark SQL for distributed parallel processing.

Applied various data masking techniques including substitution, shuffling, encryption, and tokenization to secure sensitive data.
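
For illustration only, the sketch below shows two of the techniques named above (deterministic tokenization as a form of substitution, and shuffling) in plain Python; it is not a description of the masking tooling actually used on this engagement.

```python
# Minimal sketch of simple data masking techniques: tokenization and shuffling.
import hashlib
import random

def tokenize(value: str, salt: str) -> str:
    """Replace a sensitive value with a deterministic, irreversible token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

def shuffle_column(values: list, seed: int = 42) -> list:
    """Shuffle a column's values so records no longer line up with real identities."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled

ssns = ["123-45-6789", "987-65-4321"]          # fake example values
print([tokenize(s, salt="demo-salt") for s in ssns])
print(shuffle_column(ssns))
```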

Connected pipelines using linked services in Azure Data Factory to establish connections between source and target systems.

Designed and implemented scalable data solutions utilizing Azure Fabric, including setting up distributed computing frameworks to handle large datasets and high-throughput workloads.

Designed a real-time data processing pipeline using Azure Stream Analytics, Azure Event Hubs, and Apache Airflow, enabling real-time insights and decision-making for the business.

Managed and orchestrated large-scale data workflows using Apache Airflow, ensuring timely and accurate data processing.
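
A minimal Apache Airflow sketch of a daily two-task workflow of the kind described above; the DAG id, schedule, and task bodies are assumptions.

```python
# Minimal sketch: an Airflow DAG orchestrating a daily extract -> transform run.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from source systems")       # placeholder task logic

def transform():
    print("run Spark / Databricks transformation job")

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task                # transform runs only after extract succeeds
```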

Used Azure Data Factory and Databricks for data cleansing and transformation to ensure data quality and integrity.

Implemented a cloud-based data warehousing solution using Azure Synapse Analytics to take advantage of its scalability and efficiency.

Used Azure SQL Database to store relational structured data, including schemas, tables, and views, providing efficient data storage, retrieval, and analysis for analytics and reporting purposes.

Implemented appropriate data models and processes within Azure Synapse Analytics and collaborated with data analysts and business stakeholders to understand their needs.

Optimized Spark applications for data transformation, aggregation, and machine learning tasks on large datasets.

Used Azure Synapse Analytics to combine large-scale data processing and analytics capabilities for complex data analysis.

Configured event-based trigger mechanisms using Azure Data Factory for a more efficient data pipeline and business processes.

Adept at Snowflake's unique features such as zero-copy cloning, time travel, and shared data management.

Azure Synapse Analytics and Azure Data Factory were combined to build a robust ETL pipeline, simplifying the migration of data from source to Azure-based data warehouses.

Collaborated on ETL projects, ensuring data integrity and pipeline stability.

Developed and deployed a Kafka-based data pipeline using Zookeeper, Spark, and Hive for data optimization, management, and analysis.

Expert in optimizing data processing, integration, and management projects, using shell scripting for responsive and flexible data collaboration.

Git has been used for version control and collaborative development, facilitating efficient code management, collaboration, and change tracking.

Environment Skills: Azure Data Factory, Databricks, Snowflake, Azure DevOps, PySpark, Spark SQL, Azure Logic Apps, Azure Synapse Analytics, Tableau, Power BI, shell scripting, Kafka, Zookeeper, Git.

Client: Toyota Financial Services, Dallas, TX July 2017 to Aug 2019

Role: Big Data Engineer

Job Description:

As a Big Data Engineer at Toyota Financial Services in Dallas, TX, I was responsible for designing, developing, and maintaining scalable data pipelines and analytic solutions to support business processes. Using my expertise in big data technologies and programming languages, I played a key role in transforming raw data into actionable insights and in supporting data-driven decision-making processes within the organization.

Responsibilities:

Implemented Sqoop, Kafka, and Spark to develop and maintain end-to-end data pipelines, facilitating seamless data migration from MySQL and Oracle to HDFS for mortgage risk analysis.
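
For the Spark leg of such a migration, the minimal PySpark sketch below reads a relational table over JDBC and lands it in HDFS as Parquet; connection details and table names are placeholders (Sqoop, being a CLI tool, is not shown).

```python
# Minimal sketch: copying a relational table into HDFS as Parquet with Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-to-hdfs").getOrCreate()

loans = (spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mortgage")   # placeholder connection
    .option("dbtable", "loans")
    .option("user", "etl_user")
    .option("password", "etl_password")   # in practice, never hard-code credentials
    .load())

loans.write.mode("overwrite").parquet("hdfs:///data/raw/loans")
```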

Worked extensively with HDFS, utilizing MapReduce to retrieve data supporting loan performance analysis.

Wrote custom Hive and Spark SQL queries tailored to business needs, improving performance in data-driven decision-making processes.

Optimized HBase tables with Hive integration for storage and retrieval, ensuring fast access to critical market information.

Performed real-time data analytics using Kafka and Spark Streaming, providing immediate insights from streaming data for credit risk analysis.
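
A minimal sketch of reading a Kafka topic with Spark Structured Streaming for near-real-time analytics; broker addresses, the topic name, and the console sink are illustrative assumptions (the job also requires the spark-sql-kafka connector package).

```python
# Minimal sketch: Spark Structured Streaming over a Kafka topic.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("credit-risk-stream").getOrCreate()

events = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")   # placeholders
    .option("subscribe", "loan-events")
    .load())

# Kafka delivers key/value as binary; cast the value to string for downstream parsing.
parsed = events.select(F.col("value").cast("string").alias("payload"))

query = (parsed.writeStream
    .format("console")        # console sink for the sketch; real jobs write to HDFS/HBase
    .outputMode("append")
    .start())

query.awaitTermination()
```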

Used Spark and PySpark to quickly review and process data, speeding up data processing for accurate analysis.

Deployed and managed Apache Airflow on Azure Kubernetes Service (AKS) to ensure scalability, high availability, and robust performance for orchestration workloads.

Developed custom automation scripts in Oracle PL/SQL to ensure data accuracy and quality, increasing data integrity.

Implemented CI/CD pipelines in the Hadoop environment, simplifying the development and deployment of applications.

Improved project management by increasing collaboration and efficiency across cross-functional teams.

Developed structured, customized workflows to optimize data processing for mortgage market research, using Zookeeper with the Oozie workflow engine.

Maintained code repositories with Git, improving version tracking and enabling effective teamwork.

Environment Skills: Sqoop, MySQL, HDFS, Apache Spark (Scala), Hadoop, Hive, Cloudera, HBase, Kafka, MapReduce, Zookeeper, Oozie, Python, PySpark, CI/CD pipelines, Oracle PL/SQL.

Client: Physicians Mutual, Texas Apr 2015 to Jan 2017

Role: Hadoop Developer

Job description:

As an experienced Hadoop developer with hands-on experience, I designed, built, and maintained Hadoop-based solutions that supported data processing, analytics, and reporting needs. My expertise lies in using Hadoop to handle large volumes of structured and unstructured data reliably.

Responsibilities:

Led ETL processes using Spark-Scala to migrate big data from Oracle to MySQL, which increased data availability for critical financial analysis projects.

Collaborated on multi-source data integration projects, seamlessly transferring data from various sources to HDFS using Sqoop.

Hands-on experience with Kafka Streaming to perform real-time data analytics, gain instant insights from streaming data, and contribute to credit risk analysis.

Implemented deployment automation using YAML scripts, accelerating the creation and release of new customer-suggested systems.

Integrated diverse data sources into Domo, performing data blending, transformation, and visualization to provide comprehensive business insights.

Skilled in data visualization and reporting using Domo, creating insightful dashboards and reports to support data-driven decision-making.

Orchestrated SSIS package deployment and automated execution through job scheduling, enhancing the efficiency and reliability of data integration processes.

Proficient in deploying and scheduling Alteryx workflows, monitoring job execution, and troubleshooting errors to ensure seamless data processing.

Utilized the data warehouse to build a data mart, serving as a foundation for generating downstream reports.

Developed a user access tool to empower users to design ad-hoc reports and run queries, enabling efficient data analysis within the proposed cube.

Extensive experience in data manipulation, transformation, and cleansing using Alteryx Designer, automating workflows to streamline processes and improve efficiency.

Experience with MS SQL Server Reporting Services (SSRS) to author, manage, and deliver both paper-based and interactive web-based reports.

Collaborated effectively with cross-functional teams, including data analysts and business stakeholders, to understand and address their data requirements, ensuring alignment with business objectives.

Environment Skills: Hadoop, HDFS, Spark-Scala, Sqoop, Kafka, Spark Streaming, ETL Services, Data Migration, Real-time Data Analytics, Multi-source Data Integration, YAML Scripting, Financial Analysis, Oracle, MySQL.

Client: Bank of America, Plano, TX Feb 2013 to Mar 2015

Role: Data Warehouse Developer

Job Description:

I was a key player in the design, development, and upkeep of data warehouse systems at Bank of America in Plano, Texas, helping the company fulfil its reporting and data analytics needs. My duties included every stage of the data warehousing lifecycle, including requirements analysis, design, development, deployment, and continuing maintenance.

Responsibilities:

Developed and deployed ETL data flows with SSIS, facilitating seamless data migration and transformation from sources such as SQL Server, Access, and Excel.

Optimized SQL Server performance through the creation of stored procedures, triggers, and functions, as well as the implementation of indexing and tracking techniques.

Applied dimensional data modeling knowledge to design and develop data marts, including the definition of facts and dimensions.

Monitored ETL processes, quickly detecting and resolving errors and incidents using techniques such as precedence constraints, breakpoints, checkpoints, and logging.

Hands-on experience creating SSAS cubes, defining aggregations and KPIs, and helping implement data mining models for SSAS products.

Developed various reports, such as parameterized, chart, graph, linked, dashboard, scorecard, and drill-down/drill-through reports, on SSAS cubes using SSRS.

Implemented ETL data flows using SSIS that facilitate data migration and transformation from sources such as SQL Server, Access, and Excel.

Collaborated with cross-functional teams to gather requirements, design data solutions, and deliver actionable insights through SQL queries and Alteryx workflows.

Gained experience in dimensional data modeling, specifying facts and dimensions for data mart design, and developing fact tables and dimension tables using slowly changing dimension (SCD) techniques.
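
As an illustration of the SCD Type 2 pattern referenced above, the sketch below expresses the expire-then-insert logic in T-SQL executed through Python's pyodbc; the table, column, and connection details are hypothetical, and this is not the project's actual SSIS implementation.

```python
# Minimal sketch: a two-step SCD Type 2 load (expire changed rows, insert new versions).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dw-host;DATABASE=FinanceDW;"
    "UID=etl_user;PWD=etl_password"   # placeholders only
)
cur = conn.cursor()

# 1) Close out current dimension rows whose tracked attributes changed in the staging feed.
cur.execute("""
    UPDATE d
    SET d.is_current = 0, d.end_date = GETDATE()
    FROM dim_customer d
    JOIN stg_customer s ON s.customer_id = d.customer_id
    WHERE d.is_current = 1 AND s.address <> d.address;
""")

# 2) Insert a new current version for new or changed customers.
cur.execute("""
    INSERT INTO dim_customer (customer_id, address, start_date, end_date, is_current)
    SELECT s.customer_id, s.address, GETDATE(), NULL, 1
    FROM stg_customer s
    LEFT JOIN dim_customer d
        ON d.customer_id = s.customer_id AND d.is_current = 1
    WHERE d.customer_id IS NULL OR s.address <> d.address;
""")

conn.commit()
```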

Monitored errors and incidents in ETL processes, using techniques such as precedence constraints, breakpoints, checkpoints, and logging.

Gained experience developing SSAS cubes, implementing aggregations, defining KPIs, partitioning cubes, and developing and supporting the implementation of data mining models for SSAS products.

Developed reports such as parameterized, chart, graph, linked, dashboard, scorecard, and drill-down/drill-through reports on SSAS cubes using SSRS.

Environment Skills: MS SQL Server, Visual Studio (legacy versions), SSIS, SharePoint, MS Access, Git, SQL Server 2008/2012 Enterprise, SSRS, T-SQL, Windows Server 2003, PerformancePoint Server 2007, Oracle 10g, Visual Studio 2010.


