
Azure Data Engineer

Location:
Dallas, TX
Posted:
August 01, 2023

SAMYUKTHA RAVULA

(Azure Data Engineer)

Phone: 682-***-**** Email: adynlu@r.postjobfree.com

LinkedIn ID: www.linkedin.com/in/samyuktha-ravula

PROFESSIONAL SUMMARY

Over 9 years of professional IT experience, including 5+ years in Data Engineering and 4 years in Data Warehousing.

Experienced data professional with a strong background in end-to-end management of ETL data pipelines, ensuring scalability and smooth operations.

Proficient in optimizing query techniques and indexing strategies to enhance data fetching efficiency.

Skilled in utilizing SQL queries, including DDL, DML, and various database objects, for data manipulation and retrieval.

Expertise in integrating on-premises and cloud-based data sources using Azure Data Factory, applying transformations, and loading data into Snowflake.

Strong knowledge of data warehousing techniques, including data cleansing, Slowly Changing Dimension handling, surrogate key assignment, and change data capture for Snowflake modeling.

Experienced in designing and implementing scalable data ingestion pipelines using tools such as Apache Kafka, Apache Flume, and Apache Nifi.

Proficient in developing and maintaining ETL/ELT workflows using technologies like Apache Spark, Apache Beam, or Apache Airflow for efficient data extraction, transformation, and loading processes.

Skilled in implementing data quality checks and cleansing techniques to ensure data accuracy and integrity throughout the pipeline.

Experienced in building and optimizing data models and schemas using technologies like Apache Hive, Apache HBase, or Snowflake for efficient data storage and retrieval for analytics and reporting.

Strong proficiency in developing ELT/ETL pipelines using Python and Snowflake SnowSQL.

Skilled in creating ETL transformations and validations using Spark-SQL/Spark Data Frames with Azure Databricks and Azure Data Factory.

Collaborative team member, working closely with Azure Logic Apps administrators and DevOps engineers to monitor and resolve issues related to process automation and data processing pipelines.

Experienced in optimizing code for Azure Functions to extract, transform, and load data from diverse sources.

Strong experience in designing, building, and maintaining data integration programs within Hadoop and RDBMS environments.

Proficient in implementing CI/CD frameworks for data pipelines using tools like Jenkins, ensuring efficient automation and deployment.

Skilled in executing Hive scripts through Hive on Spark and Spark SQL to address various data processing needs.

Collaborative team member, ensuring data integrity and stable data pipelines while collaborating on ETL tasks.

Strong experience in utilizing Kafka, Spark Streaming, and Hive to process streaming data, developing robust data pipelines for ingestion, transformation, and analysis.

Proficient in utilizing Spark Core and Spark SQL scripts using Scala to accelerate data processing capabilities.

Experienced in utilizing JIRA for project reporting, task management, and ensuring efficient project execution within Agile methodologies.

Actively participated in Agile ceremonies, including daily stand-ups and PI Planning, demonstrating effective project management skills.

TECHNICAL SKILLS

Azure Services

Azure Data Factory, Airflow, Azure Databricks, Logic Apps, Function App, Snowflake, Azure DevOps

Big Data Technologies

MapReduce, Hive, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper

Hadoop Distribution

Cloudera, Hortonworks

Languages

Java, SQL, PL/SQL, Python, HiveQL, Scala.

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Build Automation tools

Ant, Maven

Version Control

GIT, GitHub.

IDE & Build Tools, Design

Eclipse, Visual Studio.

Databases

MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB

EDUCATION

Bachelor's in Computer Science Engineering from CVR College of Engineering, India.

WORK EXPERIENCE

Role: Azure Data Engineer Sep 2022 – Present

Client: Walmart, Dallas, TX

Responsibilities:

Managed end-to-end operations of ETL data pipelines, ensuring scalability and smooth functioning.

Implemented optimized query techniques and indexing strategies to enhance data fetching efficiency.

Utilized SQL queries, including DDL, DML, and various database objects (indexes, triggers, views, stored procedures, functions, and packages) for data manipulation and retrieval.

Integrated on-premises (MySQL, Cassandra) and cloud-based (Blob storage, Azure SQL DB) data using Azure Data Factory, applying transformations and loading data into Snowflake.

Orchestrated seamless data movement into SQL databases using Data Factory's data pipelines.

Applied data warehousing techniques, including data cleansing, Slowly Changing Dimension (SCD) handling, surrogate key assignment, and change data capture for Snowflake modeling.

Designed and implemented scalable data ingestion pipelines using tools such as Apache Kafka, Apache Flume, and Apache Nifi to collect and process large volumes of data from various sources.

Developed and maintained ETL/ELT workflows using technologies like Apache Spark, Apache Beam, or Apache Airflow, enabling efficient data extraction, transformation, and loading processes.

Implemented data quality checks and data cleansing techniques to ensure the accuracy and integrity of the data throughout the pipeline.

Built and optimized data models and schemas using technologies like Apache Hive, Apache HBase, or Snowflake to support efficient data storage and retrieval for analytics and reporting purposes.

Developed ELT/ETL pipelines using Python and Snowflake SnowSQL to facilitate data movement to and from the Snowflake data store.
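
As an illustration of this kind of Python-to-Snowflake load step, here is a minimal sketch using the snowflake-connector-python client; the account, credentials, file path, and table name are placeholders rather than project details:

# Minimal sketch of a Python ELT step into Snowflake (all names/credentials are hypothetical).
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder account identifier
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="SALES_DB",
    schema="STAGING",
)
cur = conn.cursor()
try:
    # Stage a local file into the table stage, then load it with COPY INTO, as SnowSQL would.
    cur.execute("PUT file:///data/orders.csv @%ORDERS_STG AUTO_COMPRESS=TRUE")
    cur.execute("""
        COPY INTO ORDERS_STG
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    """)
finally:
    cur.close()
    conn.close()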

Created ETL transformations and validations using Spark-SQL/Spark Data Frames with Azure Databricks and Azure Data Factory.
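
A small hedged PySpark sketch of this style of transformation with a basic validation step; the storage paths and column names are hypothetical:

# Illustrative Spark DataFrame transformation plus a simple data-quality check.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-transform").getOrCreate()

raw = spark.read.parquet("abfss://raw@storageaccount.dfs.core.windows.net/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .filter(F.col("amount") > 0)
)

# Basic validation: fail the job if required keys are missing.
null_keys = cleaned.filter(F.col("order_id").isNull()).count()
if null_keys > 0:
    raise ValueError(f"{null_keys} rows missing order_id")

cleaned.write.mode("overwrite").parquet("abfss://curated@storageaccount.dfs.core.windows.net/orders/")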

Collaborated with Azure Logic Apps administrators to monitor and resolve issues related to process automation and data processing pipelines.

Optimized code for Azure Functions to extract, transform, and load data from diverse sources, including databases, APIs, and file systems.

Designed, built, and maintained data integration programs within Hadoop and RDBMS environments.

Implemented a CI/CD framework for data pipelines using the Jenkins tool, enabling efficient automation and deployment.

Collaborated with DevOps engineers to establish automated CI/CD and test-driven development pipelines using Azure, aligning with client requirements.

Demonstrated proficiency in scripting languages like Python and Scala for efficient data processing.

Executed Hive scripts through Hive on Spark and SparkSQL to address diverse data processing needs.

Collaborated on ETL tasks, ensuring data integrity and maintaining stable data pipelines.

Utilized Kafka, Spark Streaming, and Hive to process streaming data, developing a robust data pipeline for ingestion, transformation, and analysis.
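
A hedged sketch of this kind of streaming ingest, written here with Spark Structured Streaming reading from Kafka; the broker address, topic, and output paths are placeholders, and the spark-sql-kafka package is assumed to be on the classpath:

# Kafka-to-table streaming ingest sketch (placeholder broker, topic, and paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "clickstream")
         .load()
)

# Kafka delivers key/value as binary; cast to strings for downstream parsing.
parsed = events.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp"),
)

query = (
    parsed.writeStream
          .format("parquet")
          .option("path", "/data/warehouse/clickstream_raw")
          .option("checkpointLocation", "/checkpoints/clickstream_raw")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()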

Utilized Spark Core and Spark SQL scripts using Scala to accelerate data processing capabilities.

Utilized JIRA for project reporting, creating subtasks for development, QA, and partner validation.

Actively participated in Agile ceremonies, including daily stand-ups and internationally coordinated PI Planning, ensuring efficient project management and execution.

Developed and implemented data integration solutions for Salesforce, enabling seamless data flow between various systems and Salesforce platform.

Proficient in AWS and Ab Initio, experienced in building scalable data engineering solutions.

Skilled in Ab Initio ETL tool, with expertise in complex data integration and data quality processes.

Developed end-to-end data pipelines using AWS and Ab Initio, improving data accuracy and performance.

Certified Azure Data Engineer Associate skilled in Microsoft BI solutions, T-SQL, and SSIS, with strong data modeling and visualization skills; a collaborative, results-driven team member with a track record of delivering successful projects, implementing new features, optimizing processes, and achieving cost savings.

Proficient in designing and maintaining databases, ensuring referential integrity and utilizing ER diagrams.

Experienced in data warehousing, including developing and maintaining data warehouses.

Skilled in Azure Data Factory (ADF) for data pipelines, incorporating Python scripting and Spark API for advanced data processing.

Snowflake enables efficient analysis of customer and sales data in retail projects. It supports integration of structured and semi-structured data from various sources.

Automatic performance optimization ensures fast query execution for real-time insights.

Secure data sharing capabilities facilitate collaboration with suppliers and partners.

Snowflake can be used for demand forecasting, inventory management, and supply chain optimization.

Snowflake integrates with ecosystem tools for seamless data integration. It supports large-scale data analytics, enabling data-driven decision-making.

Retail projects can leverage Snowflake to deliver personalized shopping experiences to customers.

Environment: Azure Databricks, Azure Data Factory, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, T-SQL, Python, Scala, PySpark, Spark performance tuning, relational databases, ETL (extraction, transformation, and loading), report development, Azure Cosmos DB, PowerShell, data integration, data modeling, data pipelines, production support, Ab Initio, Shell scripting, GIT, JIRA, Jenkins, Kafka, ADF Pipelines, Power BI, Azure DevOps, Data Lake.

Role: Azure Data Engineer Nov 2020 – Aug 2022

Client: Novartis Pharmaceuticals, East Hanover, NJ

Responsibilities:

Enhanced Spark performance by optimizing data processing algorithms, leveraging techniques such as partitioning, caching, and broadcast variables.
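
For example, a minimal PySpark sketch of the partitioning, caching, and broadcast-join techniques mentioned above; the datasets and join key are illustrative:

# Repartition on the join key, cache the reused DataFrame, and broadcast the small side.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("perf-tuning").getOrCreate()

facts = spark.read.parquet("/data/claims")      # large fact table (placeholder path)
dims = spark.read.parquet("/data/providers")    # small dimension table (placeholder path)

# Repartitioning reduces shuffle skew; caching avoids recomputation across actions;
# broadcasting the small table lets the join skip a full shuffle.
facts = facts.repartition(200, "provider_id").cache()
joined = facts.join(broadcast(dims), "provider_id")

joined.write.mode("overwrite").parquet("/data/claims_enriched")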

Implemented efficient data integration solutions to seamlessly ingest and integrate data from diverse sources, including databases, APIs, and file systems, using tools like Apache Kafka, Apache NiFi, and Azure Data Factory.

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Worked on Microsoft Azure services such as HDInsight clusters, Blob storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.

Performed ETL using Azure Databricks and migrated the on-premises Oracle ETL process to Azure Synapse Analytics.

Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.

Controlled and granted database access, and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.

Transferred data using Azure Synapse and PolyBase.

Deployed and optimized Python web applications through Azure DevOps CI/CD pipelines, allowing the team to focus on development.

Developed enterprise-level solutions using batch processing and streaming frameworks (Spark Streaming, Apache Kafka).

Designed and implemented robust data models and schemas to support efficient data storage, retrieval, and analysis using technologies like Apache Hive, Apache Parquet, or Snowflake.

Developed and maintained end-to-end data pipelines using Apache Spark, Apache Airflow, or Azure Data Factory, ensuring reliable and timely data processing and delivery.

Collaborated with cross-functional teams to gather requirements, design data integration workflows, and implement scalable data solutions.

Provided production support and troubleshooting for data pipelines, identifying and resolving performance bottlenecks, data quality issues, and system failures.

Processed schema-oriented and non-schema-oriented data using Scala and Spark.

Created partitions and buckets based on State to enable further processing using bucket-based Hive joins (see the sketch below).
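
A hedged sketch of what such state-partitioned, bucketed Hive DDL can look like when issued through Spark's Hive support; the table and column names are assumptions, not details from the project:

# Illustrative Hive DDL: partition by state, bucket by the join key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS claims_bucketed (
        claim_id STRING,
        member_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (state STRING)
    CLUSTERED BY (member_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Bucket map joins can then be enabled for tables bucketed on the join key.
spark.sql("SET hive.optimize.bucketmapjoin = true")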

Created Hive generic UDFs to process business logic that varies by policy.

Worked with Data Lakes and big data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).

Loaded and transformed large sets of structured, semi-structured, and unstructured data.

Wrote Hive queries for data analysis to meet business requirements, creating Hive tables and working with HiveQL to simulate MapReduce functionality.

Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.

Worked on RDDs and DataFrames (Spark SQL) using PySpark for analyzing and processing data.

Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.

Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing, as sketched below.
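
A minimal sketch of the classic DStream micro-batching pattern referred to above; the socket source and batch interval are placeholders:

# Group a live stream into 10-second micro-batches and run a word count on each batch.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-word-count")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # placeholder source
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()

ssc.start()
ssc.awaitTermination()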

Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.

Used JIRA to manage issues and project workflow.

Strong understanding of data visualization principles and best practices to effectively communicate data-driven insights to stakeholders.

Skilled in data modeling and creating relationships between multiple data sources within Power BI to enable comprehensive analysis and reporting.

Proficient in utilizing Power BI's advanced features such as DAX (Data Analysis Expressions) for complex calculations and measures.

Ability to customize and format visualizations in Power BI to meet specific business requirements and enhance user experience.

Demonstrated expertise in transforming raw data into meaningful visual representations using Power Query Editor and data cleansing techniques.

Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.

Environment: Azure Databricks, Data Factory, Data Marts, Logic Apps, Function App, Snowflake, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, data integration, data modeling, data visualization, Power BI, data pipelines, production support, Shell scripting, GIT, JIRA, Jenkins, Kafka, ADF Pipeline, PL/SQL.

Role: Data Engineer Jul 2019 – Oct 2020

Client: CGI, Dallas, TX

Responsibilities:

Designed and set up an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.

Maintained quality reference data in source systems by performing cleaning and transformation operations and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect.

Created tabular models in Azure Analysis Services to meet business reporting requirements.

Ingested data into one or more Azure cloud services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks as part of cloud migration.

Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.

Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (SQL DW).

Developed Python, PySpark, and Bash scripts to transform and load data across on-premises and cloud platforms.

Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing.

Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.

Used Spark SQL's Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.

Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output (see the sketch below).
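
A minimal sketch of reading HDFS data into an RDD and aggregating it with PySpark; the path and record layout are assumptions:

# Read delimited records from HDFS and total a numeric field per key.
from pyspark import SparkContext

sc = SparkContext(appName="hdfs-rdd-aggregation")

# Each line is assumed to be "user_id,page,duration_seconds".
lines = sc.textFile("hdfs:///data/weblogs/2020/")

durations = (
    lines.map(lambda line: line.split(","))
         .filter(lambda parts: len(parts) == 3)
         .map(lambda parts: (parts[0], int(parts[2])))
         .reduceByKey(lambda a, b: a + b)   # total time per user
)

durations.saveAsTextFile("hdfs:///output/user_durations")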

Implemented various performance optimization techniques, such as using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

Good knowledge of Spark platform parameters such as memory, cores, and executors.

Developed a reusable framework, leveraged for future migrations, that automates ETL from RDBMS systems to the data lake using Spark data sources and Hive data objects (sketched below).
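
A hedged sketch of one reusable ingestion step of this kind, using Spark's JDBC data source; the JDBC URL, credentials, table list, and target database are placeholders:

# Generic RDBMS-to-data-lake ingestion step: read a table over JDBC, land it as a Hive table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-to-lake").enableHiveSupport().getOrCreate()

def ingest_table(jdbc_url, table, target_db):
    df = (
        spark.read.format("jdbc")
             .option("url", jdbc_url)
             .option("dbtable", table)
             .option("user", "etl_user")        # placeholder credentials
             .option("password", "********")
             .option("fetchsize", "10000")
             .load()
    )
    df.write.mode("overwrite").saveAsTable(f"{target_db}.{table}")

for table in ["customers", "orders", "payments"]:
    ingest_table("jdbc:sqlserver://dbhost:1433;databaseName=sales", table, "lake_raw")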

Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.

Extensive experience with Big Data technologies and DevOps, including Hadoop, Kafka, RabbitMQ, MySQL, and cloud deployment.

Experience with database management systems such as Microsoft SQL Server, Oracle, or MySQL.

Familiarity with SnapLogic, a data integration tool, for seamless data flow between systems.

Hands-on experience in designing and implementing data integration workflows using SnapLogic's visual interface.

Strong troubleshooting and problem-solving skills in data integration.

Proficient in Agile development methodologies, delivering projects with multiple priorities and minimal supervision.

Strong technical proficiency in programming languages (Python, Java, Scala), API development (REST, SOAP), scripting (Perl, Shell), and infrastructure automation (Docker, Kubernetes).

Environment: Azure, Azure Data Factory, Databricks, PySpark, RabbitMQ, RDBMS, Jenkins, Docker, SnapLogic, Kubernetes, Python, Apache Spark, HBase, Hive, Sqoop, Snowflake, SSRS, Tableau.

Role: Big data Developer May 2018 – Jun 2019

Client: Rockwell Collins, Chicago, IL

Responsibilities:

Designed and developed applications on the data lake to transform data according to business users' requirements for analytics.

In-depth understanding of Hadoop architecture and its components, such as HDFS, Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.

Involved in developing a Map Reduce framework that filters bad and unnecessary records.

Involved heavily in setting up the CI/CD pipeline using Jenkins, Maven, Nexus, GitHub, and AWS.

Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.

Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, handling structured data with Spark SQL (see the sketch below).
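
A small sketch of this JSON-to-Hive flow; the input path, database, and table names are assumed:

# Infer a schema from JSON records and persist them as a Hive table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-hive").enableHiveSupport().getOrCreate()

# Spark infers the schema from the JSON records.
events = spark.read.json("hdfs:///data/raw/events/")

events.createOrReplaceTempView("events_stg")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events
    STORED AS PARQUET
    AS SELECT * FROM events_stg
""")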

Used Hive for transformations, event joins, and pre-aggregations before storing the data in HDFS.

Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.

Implemented workflows using the Apache Oozie framework to automate tasks.

Developed design documents considering all possible approaches and identifying the best of them.

Wrote MapReduce code that takes log files as input and parses and structures them into a tabular format to facilitate effective querying of the log data.

Developed scripts that automated end-to-end data management and synchronization between all the clusters.

Implemented the Fair Scheduler on the JobTracker to share cluster resources across users' MapReduce jobs.

Environment: Cloudera CDH 3/4, Hadoop, HDFS, MapReduce, Hive, Oozie, Pig, Shell Scripting, MySQL.

Role: Data warehouse Developer Jul 2013 – Jul 2017

Client: Hudda InfoTech Private Limited, Hyderabad, India

Responsibilities:

Created and maintained databases for Server Inventory and Performance Inventory.

Worked in Agile Scrum methodology with daily stand-up meetings, used Visual SourceSafe with Visual Studio 2010, and tracked projects using Trello.

Generated drill-through and drill-down reports with drop-down menu options, data sorting, and subtotals in Power BI.

Used the data warehouse to develop a data mart that feeds downstream reports, and developed a user access tool with which users can create ad-hoc reports and run queries to analyze data in the proposed cube.

Deployed the SSIS Packages and created jobs for efficient running of the packages.

Created ETL packages using SSIS to extract data from heterogeneous databases, then transform and load it into the data mart.

Created SSIS jobs to automate report generation and cube refresh packages.

Deployed SSIS packages to production and used different types of package configurations to export package properties, making packages environment-independent.

Used SQL Server Reporting Services (SSRS) to author, manage, and deliver both paper-based and interactive web-based reports.

Developed stored procedures and triggers to facilitate consistent data entry into the database.

Shared data externally using Snowflake, enabling quick setup of data sharing without transferring data or developing pipelines.

Environment: Windows server, MS SQL Server 2014, SSIS, SSAS, SSRS, SQL Profiler, Power BI, C#, Performance Point Server, MS Office, SharePoint


