
Azure Data Engineer

Location:
Texas City, TX
Posted:
February 02, 2024

Rama chandra Karnati

Azure Data Engineer

Phone: 210-***-**** Email: ad3bj0@r.postjobfree.com

Professional Summary

Highly skilled and experienced Azure Data Engineer with 10.3 years of expertise in Azure Cloud and Big Data.

Proficient in Azure services such as Data Factory, Databricks, Synapse, Data Lake Storage Gen2, PolyBase, Event Hubs, Blob Storage, Azure Cosmos DB, Logic Apps, Key Vault, Azure DevOps, and Azure Monitor, as well as Scala and Python.

Worked across the Big Data ecosystem, including Hadoop, Sqoop, Kafka, Apache Spark, Apache Airflow, Hive, MapR, Oozie, ZooKeeper, HDFS, and shell scripting.

Worked on data warehouse development, building data marts that feed downstream reports using SSIS, SSRS, Power BI, and MS SQL Server 2014.

Developed data ingestion pipelines to efficiently extract data from diverse sources, including on-premises systems, cloud-based platforms, databases, and APIs, and load it into target data repositories using Azure Data Factory.

Implemented ETL transformations and validations using Spark SQL and Spark DataFrames with Azure Databricks and Azure Data Factory.
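
For context, a minimal PySpark sketch of this kind of DataFrame-based transformation and validation on Databricks; the storage paths and column names (order_id, amount, order_date) are illustrative assumptions rather than the actual project code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-transform-validate").getOrCreate()

    # Read raw data from the lake (path is a placeholder)
    raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/sales/")

    # Transformations: type casting, derived columns, de-duplication
    clean = (
        raw.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
           .dropDuplicates(["order_id"])
    )

    # Validations: keep rows with a key and a positive amount, quarantine the rest
    valid = clean.filter(F.col("order_id").isNotNull() & (F.col("amount") > 0))
    rejected = clean.filter(
        F.col("order_id").isNull() | F.col("amount").isNull() | (F.col("amount") <= 0)
    )

    valid.write.mode("overwrite").parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/")
    rejected.write.mode("overwrite").parquet("abfss://quarantine@examplelake.dfs.core.windows.net/sales/")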

Integrated Azure Logic Apps with other Azure services such as Azure Functions, Azure Service Bus, and Azure Storage, leveraging their capabilities to enhance data processing, message handling, and storage within the workflows.

Designed and optimized data models and schemas within Azure Synapse Analytics to ensure efficient storage, retrieval, and querying of data, resulting in improved performance and reduced costs.

Developed and deployed scalable and managed Azure HDInsight clusters to process and analyze large volumes of data, enabling efficient data processing and analytics.

Designed and implemented scalable and distributed data solutions using Azure Cosmos DB, a globally distributed NoSQL database service, to efficiently store semi-structured and unstructured data.
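
A small sketch, using the azure-cosmos Python SDK, of how such a container might be created, written to, and queried; the account endpoint, key, database and container names, and document shape are placeholder assumptions.

    from azure.cosmos import CosmosClient, PartitionKey

    # Endpoint, key, and names below are placeholders; credentials would come from Key Vault
    client = CosmosClient("https://example-account.documents.azure.com:443/", credential="<account-key>")
    db = client.create_database_if_not_exists("telemetry")
    container = db.create_container_if_not_exists(
        id="events",
        partition_key=PartitionKey(path="/deviceId"),
    )

    # Upsert a semi-structured document
    container.upsert_item({"id": "evt-001", "deviceId": "dev-42", "payload": {"temp": 21.5}})

    # Query with the SQL-like API
    for item in container.query_items(
        query="SELECT * FROM c WHERE c.deviceId = @d",
        parameters=[{"name": "@d", "value": "dev-42"}],
        enable_cross_partition_query=True,
    ):
        print(item["id"])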

Leveraged Azure Blob Storage's durability and high availability features to ensure data reliability and accessibility, meeting stringent data governance and compliance requirements.

Designed and developed data solutions to efficiently process both real-time and batch data using Azure Stream Analytics, Azure Event Hub, Service Bus Queue, and ADLS Gen2.

Wrote Python and Scala code for gathering data from Snowflake, data pre-processing, feature extraction, feature engineering, modeling, and evaluation.
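
A hedged Python sketch of this flow using the Snowflake connector's pandas support; the account, warehouse, table, and feature definitions are illustrative assumptions.

    import snowflake.connector

    # Connection parameters are placeholders; in practice they would come from a secrets store
    conn = snowflake.connector.connect(
        account="example_account",
        user="example_user",
        password="<secret>",
        warehouse="ANALYTICS_WH",
        database="SALES_DB",
        schema="PUBLIC",
    )

    cur = conn.cursor()
    cur.execute(
        "SELECT customer_id, order_ts, amount FROM orders "
        "WHERE order_ts >= DATEADD(day, -90, CURRENT_DATE)"
    )
    df = cur.fetch_pandas_all()  # returns a pandas DataFrame with uppercase column names

    # Simple pre-processing and feature engineering
    df["AMOUNT"] = df["AMOUNT"].fillna(0)
    features = (
        df.groupby("CUSTOMER_ID")["AMOUNT"]
          .agg(total_spend="sum", order_count="count", avg_order="mean")
          .reset_index()
    )
    conn.close()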

Developed Python, PySpark, and Bash scripts to extract, transform, and load data across on-premises and cloud platforms.

Expertise in Snowflake data warehousing: demonstrated proficiency in designing and implementing data warehousing solutions using Snowflake, including schema design, data modeling, and query performance optimization to meet business requirements.

Strong experience in utilizing Kafka, Spark Streaming, and Hive to process streaming data, developing robust data pipelines for ingestion, transformation, and analysis.

Administered and optimized Kafka clusters, ensuring high availability and reliability. Proficient in troubleshooting and fine-tuning configurations for optimal performance.

Strong experience in designing, building, and maintaining data integration programs within Hadoop and RDBMS environments.

Experienced in building and optimizing data models and schemas using technologies like Apache Hive, Apache HBase for efficient data storage and retrieval for analytics and reporting.

Proficient in developing and maintaining ETL/ELT workflows using technologies such as Apache Spark, Apache Beam, and Apache Airflow for efficient data extraction, transformation, and loading.

Skilled in executing Hive scripts through Hive on Spark and Spark SQL to address various data processing needs.

Proficient in Cassandra data modeling, optimization, and performance tuning, resulting in improved query efficiency and overall system performance.

Demonstrated proficiency in configuring and managing HDFS, implementing data replication strategies, and optimizing storage to accommodate diverse big data workloads.

Extensive experience in the Hadoop ecosystem, including HDFS, MapReduce, YARN, and Sqoop, contributing to the efficient storage, processing, and transfer of large-scale data.

Strong experience in the design and development of Business Intelligence solutions using data modeling, dimensional modeling, ETL processes, data integration, and OLAP/OLTP client/server applications. Skilled in performance tuning to optimize system responsiveness and efficiency.

Optimized data processing and analytics performance by utilizing Azure Data Lake Storage Gen2's capabilities, such as parallel data processing, optimized file formats (Parquet, ORC, JSON, AVRO, Delimited), and efficient data indexing, enabling faster query execution and reduced latency.
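
A brief PySpark sketch of writing date-partitioned Parquet into ADLS Gen2 as described above; the abfss:// paths and column names are placeholder assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("adls-parquet-layout").getOrCreate()

    # Land raw JSON events, then persist them as date-partitioned Parquet so that
    # queries filtering on event_date prune files (container/path names are placeholders)
    events = spark.read.json("abfss://landing@examplelake.dfs.core.windows.net/events/")

    (events
        .withColumn("event_date", F.to_date("event_ts"))
        .repartition("event_date")
        .write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/events_parquet/"))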

Skilled in implementing data quality checks and cleansing techniques to ensure data accuracy and integrity throughout the pipeline.

Collaborated with Azure DevOps to optimize CI/CD pipeline, resulting in faster upgrades to customer-facing applications and shorter time-to-market for software releases.

Proficient in Spark Core and Spark SQL scripts using Scala to accelerate data processing capabilities.

Participated in a Waterfall-to-Agile transformation and worked on Scrum teams following Agile methodologies.

Technical Skills:

Azure Services

Azure Data Factory, Azure Databricks, ADLS Gen2, Logic Apps, Function Apps, Azure DevOps, Azure Key Vault, Azure HDInsight, Azure Event Hubs, Azure Monitor.

Big Data Technologies

Hadoop, MapR, Hive, Spark (PySpark), Kafka, Oozie, Sqoop, ZooKeeper, Airflow, YARN, Cassandra, MongoDB.

Programming

Python, SQL, PL/SQL, HiveQL, Scala.

Databases & SQL Server Tools

MS SQL Server, Azure SQL DB, Oracle 11g/12c, MongoDB, PostgreSQL, SSMS, SSRS, SSIS.

Data Warehouse

Snowflake, Azure Synapse Analytics.

Version Control Tools

GIT, GitHub, Azure DevOps

IDEs & Build Tools

PyCharm, Visual Studio, SSMS.

Methodologies

Agile, Waterfall

Work Experience:

Role: Sr. Azure Data Engineer Mar 2022 – Present

Client: Love’s Travel Stops – Oklahoma.

Responsibilities:

Designed and implemented Extract, Transform, Load (ETL) pipelines leveraging Azure Databricks and Azure Data Factory.

Enhanced Azure Functions code to efficiently extract, transform, and load data from various sources such as databases, RESTful APIs, and file systems, resulting in optimized data processing and integration.
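
As an illustration of this pattern, a sketch of a timer-triggered Azure Function (Python v1 programming model) that pulls from a REST API and lands the raw payload in Blob Storage; the API URL, connection string, and container name are hypothetical placeholders rather than the project's actual endpoints.

    import datetime
    import json

    import azure.functions as func
    import requests
    from azure.storage.blob import BlobClient

    def main(mytimer: func.TimerRequest) -> None:
        # Pull data from a source REST API (URL is a placeholder)
        resp = requests.get("https://api.example.com/v1/orders", timeout=30)
        resp.raise_for_status()

        # Land the raw payload in Blob Storage, partitioned by date/time in the blob name
        blob = BlobClient.from_connection_string(
            conn_str="<storage-connection-string>",
            container_name="landing",
            blob_name=f"orders/{datetime.datetime.utcnow():%Y/%m/%d/%H%M%S}.json",
        )
        blob.upload_blob(json.dumps(resp.json()), overwrite=True)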

Integrated on-premises (MySQL, SQL Server) and cloud-based (Blob storage, Azure SQL DB) data using Azure Data Factory.

Wrote and executed various MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages.
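
For illustration, a minimal example of running a parameterized query from Python with MySQL Connector/Python; the host, schema, and column names are placeholder assumptions.

    import mysql.connector

    # Connection details are placeholders; credentials would be read from a secrets store
    conn = mysql.connector.connect(
        host="onprem-mysql.example.com",
        user="etl_user",
        password="<secret>",
        database="sales",
    )
    cur = conn.cursor(dictionary=True)

    # Parameterized query to avoid SQL injection
    cur.execute(
        "SELECT order_id, customer_id, amount FROM orders WHERE order_date >= %s",
        ("2022-01-01",),
    )
    for row in cur.fetchall():
        print(row["order_id"], row["amount"])

    cur.close()
    conn.close()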

Developed event-driven data pipelines using Azure Event Hubs to ingest and process large volumes of events and messages in real-time, facilitating real-time analytics, monitoring, and alerting.

Experience in loading on-premises SQL and Netezza data to Azure Blob Storage using Azure Data Factory, and then to Snowflake using SnowSQL.

Worked on loading data into Snowflake in the cloud from various sources.

Experience with Snowflake multi-cluster warehouses.

Designed and implemented data storage solutions using Azure services such as Azure SQL Database, Azure Cosmos DB, and Azure Data Lake Storage.

Extensive experience leading and mentoring cross-functional teams of data engineers.

Proven ability to provide technical guidance, set best practices, and drive collaboration to achieve project goals.

Utilized Azure Synapse and PolyBase for seamless data transfer, enabling efficient movement of data between different systems and achieving streamlined data integration.

Implemented monitoring and logging mechanisms within Azure Logic Apps to track workflow execution, troubleshoot issues, and ensure the reliability and performance of data integration processes.

Created Hive tables to store the processed results in a tabular format. Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning and system monitoring.

Proficiently utilized scripting languages such as Python, PySpark, and Scala to streamline data processing tasks, achieving efficient and effective data manipulation and analysis.

Implemented Kafka, Spark Streaming, and Hive to process streaming data, resulting in the creation of a reliable data pipeline that effectively ingests, transforms, and analyzes real-time data.

Developed and designed system to collect data from multiple portals using Kafka and then process it using spark.
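
A minimal PySpark Structured Streaming sketch of such a Kafka-to-Spark pipeline (requires the spark-sql-kafka connector package); broker addresses, the topic name, the JSON schema, and the output path are assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("portal-kafka-stream").getOrCreate()

    schema = StructType([
        StructField("portal", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read JSON events from Kafka and parse the value payload
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
             .option("subscribe", "portal-events")
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*")
    )

    # Write micro-batches as Parquet; a Hive external table can be defined over this path
    query = (
        events.writeStream
              .format("parquet")
              .option("path", "/warehouse/portal_events")
              .option("checkpointLocation", "/checkpoints/portal_events")
              .outputMode("append")
              .start()
    )
    query.awaitTermination()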

Implemented optimized query techniques and indexing strategies to enhance data fetching efficiency.

Utilized Spark Core and Spark SQL scripts using Scala to accelerate data processing capabilities.

Actively participated in Agile ceremonies, including daily stand-ups and internationally coordinated PI Planning, ensuring efficient project management and execution.

Implemented a CI/CD framework for data pipelines using Azure DevOps, enabling efficient automation and deployment.

Environment: Azure Data Factory, Azure Databricks, Azure Synapse, Azure Data Lake Storage Gen2, Azure DevOps, Logic Apps, Azure Function App, Azure Cosmos DB, MS SQL, Oracle, SQL, Python, Scala, PySpark, data integration, data modeling, data pipelines, production support, shell scripting, Kafka, Power BI.

Role: Azure Data Engineer Mar 2020 – Feb 2022

Client: T-Mobile – Texas.

Responsibilities:

Implemented efficient data integration solutions to seamlessly ingest and integrate data from diverse sources, including databases, APIs, and file systems, using tools like Apache Kafka, Apache NiFi and Azure Data Factory.

Ingested data into Azure services including Azure Data Lake and processed the data within Azure Databricks, achieving seamless data integration and enabling advanced data processing and analysis.

Worked on Microsoft Azure services such as HDInsight clusters, Blob Storage, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.

Designed and set up an Enterprise Data Lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.

Enhanced Spark performance by optimizing data processing algorithms, leveraging techniques such as partitioning, caching and broadcast variables.
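
A short sketch of the kind of tuning described here (broadcast join for a small dimension, caching a reused DataFrame, repartitioning before the write); table paths and column names are illustrative assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark-tuning").getOrCreate()

    facts = spark.read.parquet("/data/facts")        # large fact table (placeholder path)
    stores = spark.read.parquet("/data/dim_stores")  # small dimension table

    # Broadcast the small dimension to avoid shuffling the large fact table for the join
    joined = facts.join(F.broadcast(stores), "store_id")

    # Cache a DataFrame that several downstream aggregations reuse
    joined.cache()

    daily = joined.groupBy("store_id", "sale_date").agg(F.sum("amount").alias("revenue"))
    monthly = joined.groupBy("store_id", F.trunc("sale_date", "month").alias("month")) \
                    .agg(F.sum("amount").alias("revenue"))

    # Repartition on the partition column so output files align with the directory layout
    daily.repartition("sale_date").write.mode("overwrite") \
         .partitionBy("sale_date").parquet("/data/daily_revenue")
    monthly.write.mode("overwrite").parquet("/data/monthly_revenue")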

Led the migration of SQL databases to Azure Data Lake Gen2, Azure Blob Storage, Azure SQL Database, and Databricks, facilitating seamless data transfer and storage in the Azure ecosystem.

Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.

Implemented database access control and authorization, ensuring secure data handling practices.

Orchestrated the seamless migration of on-premises databases to Azure Data Lake Store using Azure Data Factory, ensuring efficient data transfer and storage in the cloud environment.

Deployed and optimized Python web applications through Azure DevOps CI/CD pipelines, freeing the team to focus on development.

Developed enterprise-level solutions for batch processing and streaming using Spark Streaming and Apache Kafka.

Developed and maintained end-to-end data pipelines using Apache Spark, Apache Airflow, and Azure Data Factory, ensuring reliable and timely data processing and delivery.

Collaborated with cross-functional teams to gather requirements, design data integration workflows, and implement scalable data solutions.

Provided production support and troubleshooting for data pipelines, identifying, and resolving performance bottlenecks, data quality issues, and system failures.

Developed Python, PySpark, and Bash scripts to extract, transform, and load data across on-premises and cloud platforms.

Processed both schema-oriented and non-schema-oriented data using Scala and Spark, enabling efficient data manipulation and analysis.

Implemented partitioning and bucketing techniques based on state to optimize data processing, specifically utilizing bucket-based Hive joins.
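
A PySpark sketch of the partition-plus-bucket layout described above, persisting both sides of a join as bucketed tables so bucket-based (shuffle-free) joins become possible; the table names, bucket count, and join key are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bucketing").enableHiveSupport().getOrCreate()

    policies = spark.read.parquet("/data/policies")   # paths and column names are placeholders
    claims = spark.read.parquet("/data/claims")

    # Persist both tables partitioned by state and bucketed on the join key
    (policies.write
        .partitionBy("state")
        .bucketBy(32, "policy_id")
        .sortBy("policy_id")
        .mode("overwrite")
        .saveAsTable("warehouse.policies_bucketed"))

    (claims.write
        .partitionBy("state")
        .bucketBy(32, "policy_id")
        .sortBy("policy_id")
        .mode("overwrite")
        .saveAsTable("warehouse.claims_bucketed"))

    # Joining the bucketed tables on the bucket key can now avoid a full shuffle
    joined = spark.table("warehouse.policies_bucketed").join(
        spark.table("warehouse.claims_bucketed"), ["policy_id", "state"]
    )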

Created Hive Generic UDFs to handle varying business logic based on policy requirements, enhancing data processing capabilities.

Demonstrated proficiency in working with Data Lakes and big data ecosystems such as Hadoop, Spark, Hortonworks, and Cloudera.

Loaded and transformed large volumes of structured, semi-structured, and unstructured data, ensuring smooth data integration and processing.

Developed Hive queries for data analysis, aligning them with business requirements and leveraging Hive tables and HiveQL to simulate MapReduce functionality.

Constructed a robust data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data, facilitating streamlined data processing workflows.

Environment: Azure Databricks, Data Factory, Logic Apps, Function App, MS SQL, Oracle, Hadoop, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, production support, shell scripting, Azure DevOps, Kafka, Power BI.

Role: Data Engineer Nov 2017 – Jan 2020

Client: Global Atlantic Financial Group – Indiana.

Responsibilities:

Designed and developed applications on the data lake to transform data according to business users' analytics requirements.

In-depth understanding of Hadoop architecture and its components, including HDFS, the Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.

Constructed a robust data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data, facilitating streamlined data processing workflows.

Utilized PySpark for RDDs and DataFrames, enabling comprehensive data analysis and processing tasks.

Implemented Spark scripts using Scala and Spark SQL to seamlessly access Hive tables within Spark, resulting in accelerated data processing.

Employed Spark Streaming to segment streaming data into batches, providing input to the Spark engine for efficient batch processing.

Leveraged PySpark and Spark SQL for accelerated testing and processing of data within Spark, ensuring efficient data handling and analysis.

Involved in developing a MapReduce framework that filters bad and unnecessary records.

Involved heavily in setting up the CI/CD pipeline using Maven, Git and GitHub.

Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.

Used Spark SQL to load JSON data and create schema RDDs, loaded them into Hive tables, and handled structured data using Spark SQL.
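
A compact PySpark sketch of that flow (JSON load, Spark SQL handling, Hive table write); the file path, view name, and columns are assumptions for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-to-hive").enableHiveSupport().getOrCreate()

    # Load JSON; Spark infers the schema into a structured DataFrame (path is a placeholder)
    events = spark.read.json("/data/raw/events.json")
    events.createOrReplaceTempView("events_stage")

    # Use Spark SQL for the structured handling before persisting to a Hive table
    cleaned = spark.sql("""
        SELECT user_id,
               CAST(event_ts AS timestamp) AS event_ts,
               event_type
        FROM events_stage
        WHERE user_id IS NOT NULL
    """)
    cleaned.write.mode("overwrite").saveAsTable("analytics.events")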

Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.

The Hive tables, created as per requirements, were internal or external tables defined with appropriate static and dynamic partitions for efficiency.

Implemented the workflows using the Apache Oozie framework to automate tasks.

Developed design documents considering all possible approaches and identifying the best of them.

Wrote MapReduce code that takes log files as input, parses them, and structures them in a tabular format to facilitate effective querying of the log data.
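
The original MapReduce code may well have been Java; as a hedged illustration of the same idea in Python, here is a Hadoop Streaming style mapper that parses access-log lines (the log format is an assumed example) into tab-separated columns for downstream querying. It would be submitted with the hadoop-streaming jar alongside a reducer or an identity reducer.

    #!/usr/bin/env python
    # Hadoop Streaming mapper: parses raw access-log lines into tab-separated
    # columns (IP, timestamp, method, path, status, bytes) so the reducer or
    # Hive can query them. The log format is an illustrative assumption.
    import re
    import sys

    LOG_PATTERN = re.compile(
        r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+|-)'
    )

    for line in sys.stdin:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip bad / unparseable records
        ip, ts, method, path, status, size = match.groups()
        print("\t".join([ip, ts, method, path, status, size]))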

Developed scripts to automate end-to-end data management and synchronization between all clusters.

Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.

Environment: Cloudera CDH 3/4, Hadoop, HDFS, Airflow, Kafka, ZooKeeper, MapR, Hive, Oozie, Pig, Shell Scripting, MySQL.

Role: Big Data Engineer Apr 2015 – Feb 2017

Client: Silicon Valley Bank – California.

Responsibilities:

Worked with the sourcing team to understand the format and delimiters of the data file.

Ran periodic MapReduce jobs to load data from Cassandra into Hadoop.

Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.

In-depth knowledge of Hadoop architecture and its components, including HDFS, the Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.

Involved in developing a MapReduce framework that filters bad and unnecessary records.

Wrote Hive queries to extract the processed data.

The Hive tables, implemented as per requirements, were internal or external tables defined with appropriate static and dynamic partitions for efficiency.

Implemented the workflows using the Apache Oozie framework to automate tasks.

Developed design documents considering all possible approaches and identifying the best of them.

Used Git to maintain source code in Git and GitHub repositories.

Performed all necessary day-to-day Git support for different projects; responsible for maintaining the Git repositories and their access control strategies.

Environment: Cloudera Distribution, Hadoop, HDFS, MapR, Cassandra, Hive, Oozie, Pig, Shell Scripting, MySQL, Git.

Role: Data Warehouse Developer Feb 2013 – Mar 2015

Client: Cigna HealthCare – Connecticut.

Responsibilities:

Created and maintained databases for Server Inventory and Performance Inventory.

Worked in Agile Scrum methodology with daily stand-up meetings; gained strong working knowledge of Visual SourceSafe with Visual Studio 2010 and tracked projects using Trello.

Worked with SSMS in conjunction with SQL Server Integration Services (SSIS) for designing, deploying, and managing ETL processes.

Proficient in using SSMS for troubleshooting and resolving database issues, including error identification and resolution.

Generated drill-through and drill-down reports with drop-down menu options, data sorting, and defined subtotals in Power BI.

Used the data warehouse to develop data marts that feed downstream reports, and developed a User Access Tool with which users can create ad-hoc reports and run queries to analyze data in the proposed cube.

Created logical and physical designs of the database and ER Diagrams for Relational and Dimensional databases using Erwin.

Deployed the SSIS Packages and created jobs for efficient running of the packages.

Expertise in creating ETL packages using SSIS to extract data from heterogeneous databases, then transform and load it into the data mart.

Involved in creating SSIS jobs to automate report generation and cube refresh packages.

Thorough knowledge of features, structure, attributes, hierarchies, and star and snowflake schemas of data marts.

Experienced with SQL Server Reporting Services (SSRS) to author, manage, and deliver both paper-based and interactive Web-based reports.

Developed stored procedures and triggers to facilitate consistent data entry into the database.

Environment: Windows server, MS SQL Server 2014, SSIS, SSAS, SSMS, SSRS, SQL Profiler, Dimensions, snowflake, Power BI, Performance Point Server, MS Office, SharePoint.


