Power BI Data Factory

Location: Irving, TX
Salary: 110k
Posted: November 09, 2023


Resume:

Name: Bharath M

Phone: 469-***-****

Email ID: ad0zrq@r.postjobfree.com

Professional Summary:

Highly experienced professional with 9+ years of expertise in Azure Databricks and a strong background in big data analytics and cloud engineering. Proficient in leveraging Azure Databricks for efficient data acquisition, processing, and analysis. Skilled in designing and implementing scalable data solutions using Databricks within the Azure ecosystem.

Demonstrated expertise in working with various Azure cloud components, including Storage Explorer, SQL Data Warehouse, Cosmos DB, HDInsight, Databricks, Data Factory, and Blob Storage. Well-versed in Spark technology, utilizing Spark Core, Spark SQL, and Spark Streaming to build high-performance data processing pipelines. Experienced in data modeling, ETL automation, and SQL/NoSQL databases. Strong proficiency in developing interactive dashboards and visualizations using tools such as Tableau, Power BI, and other data visualization platforms. A results-oriented professional with a passion for harnessing the power of Azure Databricks to drive data-driven insights and business value.

Education:

Bachelor's in Computer Science from Nagarjuna University, 2014

GE Healthcare – Chicago, IL Dec 2022 – Present

Databricks Developer

Responsibilities:

Worked with Spark SQL, DataFrames, and Spark Structured Streaming.

Productionized Python Spark applications that transform the ETL layer to feed data into downstream machine learning models and Power BI visualizations.
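
A minimal PySpark sketch of this kind of ETL step, producing a curated table that downstream models and Power BI can read; the paths, columns, and table names are hypothetical:

# Read raw events, derive simple features, and publish a curated table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-curated-orders").getOrCreate()

raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/")

curated = (
    raw.filter(F.col("status").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
       .groupBy("customer_id", "order_date")
       .agg(F.sum("amount").alias("daily_spend"),
            F.count("*").alias("order_count"))
)

# Downstream ML jobs and Power BI both read from this curated table.
curated.write.mode("overwrite").saveAsTable("analytics.daily_customer_spend")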

Worked on orchestration and automation of the workflows via Azure Data Factory with Databricks.

Built pipelines in ADF with Databricks notebook activities and scheduled triggers, as well as in Azure Synapse Analytics.

Worked with data lake in Azure Data Lake Storage (ADLS) Gen2.

Processed streaming data with SQL and performed Spark SQL analysis.

Experience with Delta Lake ingestion and data transformation.
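
A sketch of Delta Lake ingestion with an upsert-style transformation on Databricks; the storage path, key column, and table names are placeholders:

# Land new records and merge them into an existing Delta table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = (spark.read.format("json")
           .load("abfss://landing@examplelake.dfs.core.windows.net/customers/")
           .withColumn("ingested_at", F.current_timestamp()))

target = DeltaTable.forName(spark, "bronze.customers")  # assumes the table already exists

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())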

Implemented complex Spark workloads in Azure Databricks, along with dependency management and Git integration.

Created preprocessing jobs using Spark DataFrames to flatten nested JSON documents into flat records.
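
One way such flattening can look in PySpark, using explode for arrays and dot notation for nested structs; the schema and column names are illustrative only:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# e.g. documents shaped like {order_id, customer: {...}, items: [...]}
orders = spark.read.json("/mnt/raw/orders.json")

flat = (
    orders
    .withColumn("item", F.explode("items"))            # one row per array element
    .select(
        "order_id",
        F.col("customer.id").alias("customer_id"),      # pull nested struct fields up
        F.col("customer.address.city").alias("city"),
        F.col("item.sku").alias("sku"),
        F.col("item.qty").alias("qty"),
    )
)

flat.write.mode("overwrite").parquet("/mnt/processed/orders_flat")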

Developed Spark Scala notebooks to perform data cleaning and transformation on various tables.

Responsible for data mapping and data mediation between the source data table and WMO data tables using MS Access and MS Excel.

Experience in data mapping, logic, and data modelling; created class diagrams and ER diagrams, and used SQL queries and PL/SQL stored procedures to filter data within the database.

Built and maintained Docker/Kubernetes container clusters on GCP using Linux, Bash, Git, and Docker.

Experience using various Python libraries such as Pandas, SciPy, TensorFlow, Keras, and Scikit-learn.

Experience tuning Spark applications, including memory tuning, to improve processing time and efficiency.
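
An illustrative set of memory and shuffle settings of the kind adjusted during such tuning; the values shown are assumptions that depend on cluster size and workload (on Databricks these are usually set in the cluster's Spark config rather than in code):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-job")
    .config("spark.executor.memory", "8g")          # heap per executor
    .config("spark.executor.memoryOverhead", "2g")  # off-heap headroom
    .config("spark.sql.shuffle.partitions", "400")  # sized to data volume
    .config("spark.memory.fraction", "0.7")         # execution/storage share of heap
    .getOrCreate()
)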

Used relational and non-relational technologies such as SQL and NoSQL to create data warehouse, Data Lake, and ETL systems.

Implemented numerous performance optimizations, including leveraging a distributed cache for small datasets, partitioning, bucketing, and map-side joins.
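
A short sketch of two of these techniques in PySpark: broadcasting a small dimension table (a map-side join) and writing a partitioned, bucketed table. Table and column names are hypothetical:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

facts = spark.table("raw.transactions")
dims = spark.table("raw.store_dim")          # small lookup table

# Broadcast hint replicates the small side to all executors and avoids a shuffle.
joined = facts.join(F.broadcast(dims), "store_id")

(joined.write
 .partitionBy("txn_date")                    # enables partition pruning at read time
 .bucketBy(32, "customer_id")                # co-locates rows for later joins
 .sortBy("customer_id")
 .mode("overwrite")
 .saveAsTable("curated.transactions_bucketed"))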

Tools & Technologies: Azure Databricks, Spark SQL, Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Storage (ADLS) Gen2, Azure Event Hubs, NiFi, Spark, SQL, PL/SQL, Docker, Kubernetes, Linux, Bash, Git, GCP, Python, NoSQL, Apache Kafka, Jenkins, Terraform.

Ernst & Young – Hoboken, NJ May 2021 – Nov 2022

Databricks Developer

Provided production support as part of a 12-member onshore team and engaged in discussions with business partners and application development and support teams to derive solutions to problems.

Created and walked business, client, and end-user stakeholders through High-Level Design (HLD), Low-Level Design (LLD), requirements, and other technical documents such as change-management implementation plans and audit records, using SharePoint version control.

Prepared detailed flowcharts and process diagrams using Microsoft Visio.

Used Relational Database Management System (RDBMS) concepts to propose design changes to database objects while maintaining normalized tables and defined integrity constraints; wrote SQL scripts to purge or modify corrupted and redundant data within the DB2 database, and suggested solutions and measures to provide end-to-end data integrity.

Integrated GCP Pub/Sub for real-time event streaming with Spark Streaming to perform real-time data processing and analytics.

Implemented data pipelines using Apache Spark and Scala on Azure Databricks, integrating with GCP's BigQuery for large-scale data processing and analysis.
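
A hedged sketch of reading from and writing to BigQuery from Spark via the spark-bigquery connector (the connector library must be installed on the cluster); the project, dataset, and bucket names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = (spark.read.format("bigquery")
          .option("table", "example-project.analytics.events")
          .load())

daily = events.groupBy("event_date").count()

(daily.write.format("bigquery")
 .option("table", "example-project.analytics.daily_event_counts")
 .option("temporaryGcsBucket", "example-temp-bucket")  # staging bucket for indirect writes
 .mode("overwrite")
 .save())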

Reviewed root cause analyses with the development team and suggested technical solutions to mitigate deficiencies and bugs/defects in the programs.

Performed code reviews with cross-functional team members and client IT personnel.

Created project turnover kits using the Aldon change-management tool and prepared rollback plans. Obtained approvals from business, client, and IT management and set up appropriate configurations for successful implementation.

Tools & Technologies: Spark, Kafka, GitLab, Hadoop, Python, Tableau, Snowflake, SQL, Sqoop, Java.

Walgreens – Deerfield, IL Oct 2020 – Apr 2021

Data Engineer

Responsibilities:

Gathered business requirements, defined and designed data sourcing, and collaborated with the data warehouse architect on the development of logical data models.

Automated cloud deployments using Python and Azure Resource Manager Templates.

Collaborated with current application teams to understand existing applications and made migration recommendations and designed target architectures in Azure.

Created and managed Azure Databricks clusters, leveraging Spark, PySpark, Python, Scala, Hive, and Hadoop.

Utilized Azure Blob Storage for data storage and implemented IAM role-based policies.

Implemented event-driven and scheduled Azure Functions in Python to trigger on events and perform actions in response.
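
A minimal sketch of scheduled and event-driven Azure Functions, assuming the Python v2 programming model; the CRON schedule, blob container, and function names are made up for illustration:

import logging
import azure.functions as func

app = func.FunctionApp()

# Scheduled trigger: runs nightly at 02:00.
@app.timer_trigger(schedule="0 0 2 * * *", arg_name="timer", run_on_startup=False)
def nightly_refresh(timer: func.TimerRequest) -> None:
    logging.info("Scheduled refresh fired; past due: %s", timer.past_due)

# Event-driven trigger: fires when a new file lands in the 'landing' container.
@app.blob_trigger(arg_name="blob", path="landing/{name}", connection="AzureWebJobsStorage")
def on_new_file(blob: func.InputStream) -> None:
    logging.info("New blob %s (%d bytes) landed", blob.name, blob.length)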

Developed and deployed data ingestion pipelines in Azure Databricks, leveraging Spark and PySpark for processing and transforming data.

Conducted data analysis, cleaning, and modeling using PySpark, Python, and Scala.

Utilized Azure Databricks in conjunction with GCP's BigQuery and Google Cloud Storage to develop and deploy scalable data processing pipelines, leveraging the power of Spark and Scala for data transformation and analysis.

Developed and implemented data ingestion and ETL processes using Scala and PySpark on Azure Databricks, integrating data from GCP services such as Google Cloud Storage and BigQuery.

Implemented various types of data visualizations using Python libraries such as Matplotlib, Seaborn, and PySpark.
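
A small illustration of this kind of visualization work: a PySpark aggregate pulled into pandas and plotted with Seaborn/Matplotlib. The table and column names are assumptions for the example:

import matplotlib.pyplot as plt
import seaborn as sns
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

monthly = (spark.table("curated.claims")
           .groupBy("month")
           .agg(F.sum("amount").alias("total_amount"))
           .orderBy("month")
           .toPandas())

sns.barplot(data=monthly, x="month", y="total_amount")
plt.title("Total claim amount by month")
plt.tight_layout()
plt.savefig("claims_by_month.png")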

Utilized Hive for data storage, querying, and analysis within Azure Databricks.

Worked with large-scale data processing and storage using Hadoop in Azure Databricks.

Collaborated with cross-functional teams to gather requirements and deliver actionable insights through data analysis and visualization.

Tools & Technologies: Azure Databricks, Spark, PySpark, Python, Scala, Hive, Hadoop, Blob Storage, Azure Functions, Virtual Machines, Virtual Network (VNet), Data Lake Storage, Synapse Analytics.

Kaiser Permanente – Pleasanton, CA Jul 2019 – Sept 2020

Data Engineer - II

Involved in setting up a 25-node Hadoop cluster from scratch using Cloudera Manager (CDH 5.0).

Implemented a Spark Streaming job that integrated Flume with Spark Streaming and loaded the data into HBase, processing the data with Spark Streaming using Scala.

Implemented a Spark Streaming application that receives data from a Kafka producer and loads it into HBase after processing with Spark Streaming.

Implemented a Spark Streaming application that processes data from multiple files in HDFS and stores the data in HBase.
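
For illustration, a hedged PySpark Structured Streaming sketch of the Kafka-to-sink flow described above (the original work used Scala DStreams; the spark-sql-kafka package must be on the cluster, and the HBase write is left as a placeholder because the connector choice varies by environment):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "patient-events")
          .load()
          .select(F.col("key").cast("string"), F.col("value").cast("string")))

def write_batch(batch_df, batch_id):
    # In the original pipeline each micro-batch was persisted to HBase;
    # here we only log row counts as a stand-in.
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (events.writeStream
         .option("checkpointLocation", "/tmp/checkpoints/patient-events")
         .foreachBatch(write_batch)
         .start())
query.awaitTermination()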

Worked on integration of Hive with HBase using HBase Storage Handlers.

Experience in debugging and monitoring MapReduce jobs, Spark jobs, and the health of the Hadoop cluster using the Cloudera Manager UI, Hadoop log files, and the Job History server.

Involved in developing a Java application to load Hive data from the Hive warehouse directory into a MySQL database using JDBC.

Developed Python modules to test data compression and decompression offloading to the Cavium security processor under a Hadoop environment.

Good working experience with Jira ticketing system for project development.

Documented User Requirement Specifications and System Requirement Specifications.

Tools & Technologies: Spark, YARN, Hive, Pig, Scala, Mahout, NiFi, Snowflake, Python, Hadoop, Azure, DynamoDB, NoSQL, Sqoop, MySQL

Accenture – India Jul 2014 – May 2019

Project - 1

Senior Software Engineer

Worked with Spark SQL, DataFrames, and Spark Structured Streaming.

Productionized Python Spark applications that transform the ETL layer to feed data into downstream machine learning models and Power BI visualizations.

Worked on orchestration and automation of the workflows via Azure Data Factory with Databricks.

Built pipelines in ADF with Databricks notebook activities and scheduled triggers, as well as in Azure Synapse Analytics.

Worked with data lake in Azure Data Lake Storage (ADLS) Gen2.

Processed streaming data with SQL and performed Spark SQL analysis.

Implemented real-time, low-cost, low-latency workflows with Azure Event Hubs scheduling.
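
A hedged sketch of low-latency Event Hubs ingestion using the hub's Kafka-compatible endpoint with Structured Streaming; the namespace, topic, paths, and connection-string handling are placeholders, and Databricks deployments often use the dedicated Event Hubs connector instead:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

connection = "Endpoint=sb://example-ns.servicebus.windows.net/;..."  # normally read from a secret scope
sasl = (
    'org.apache.kafka.common.security.plain.PlainLoginModule required '
    f'username="$ConnectionString" password="{connection}";'
)

stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "example-ns.servicebus.windows.net:9093")
          .option("subscribe", "telemetry")
          .option("kafka.security.protocol", "SASL_SSL")
          .option("kafka.sasl.mechanism", "PLAIN")
          .option("kafka.sasl.jaas.config", sasl)
          .load())

(stream.selectExpr("CAST(value AS STRING) AS body")
 .writeStream.format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/telemetry")
 .start("/mnt/bronze/telemetry"))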

Created a multi-layered ELT platform consisting of a raw/bronze ingestion layer, a current/silver processed layer, and a mapped/gold presentation layer.
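
A sketch of the bronze/silver/gold layering described above, written with Delta tables; the table names, paths, and transformations are illustrative assumptions:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: raw ingestion, stored as-is with load metadata.
raw = (spark.read.json("/mnt/landing/orders/")
       .withColumn("_ingested_at", F.current_timestamp()))
raw.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: cleaned and conformed records.
silver = (spark.table("bronze.orders")
          .dropDuplicates(["order_id"])
          .filter(F.col("amount") > 0))
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: presentation-level aggregates for reporting.
gold = (spark.table("silver.orders")
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("lifetime_value")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_value")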

Balanced compute costs by spinning up clusters on demand versus persisting them, and by enabling autoscaling for Spark jobs.

Implemented shell scripts to automate Spark job submission via spark-submit and Spark configuration settings.

Performed code reviews with cross-functional team members and client IT personnel.

Worked with the business team, client business analysts, and client partners to understand detailed requirements, translate them into business requirement documents and technical specs, and obtain stakeholder approvals, keeping in mind time constraints, resource allocation, and cost dependencies.

Tools & Technologies: Python, PySpark, Kafka, Snowflake, Hive, Apache NiFi, Java, SQL, Sqoop, Oracle, SQL Server, HBase

Project - 2

Software Engineer - 1

Responsibilities:

Collected, analyzed, and extracted data from a variety of sources to create reports, dashboards, and analytical solutions, and assisted with debugging Tableau dashboards.

Used Power BI as a front-end BI tool and MS SQL Server as a back-end database to design and create dashboards, workbooks, and complex aggregate calculations.

Extensively used Informatica client tools: PowerCenter Designer, Workflow Manager, Workflow Monitor, and Repository Manager. Extracted data from various heterogeneous sources such as Oracle and flat files.

Worked with Impala for massively parallel processing of queries for ad hoc analysis.

Designed and developed complex queries using Hive and Impala for a logistics application.

Analyzed and improved relevant data stored in Snowflake using PySpark and Spark SQL.
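
A hedged sketch of reading a Snowflake table into Spark with the Snowflake Spark connector (the connector must be installed, and credentials would come from a secrets store rather than literals); the account, warehouse, and table names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "<from-secret-store>",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

# "net.snowflake.spark.snowflake" is the connector's source name; Databricks also
# registers the shorter "snowflake" alias.
shipments = (spark.read.format("net.snowflake.spark.snowflake")
             .options(**sf_options)
             .option("dbtable", "SHIPMENTS")
             .load())

shipments.createOrReplaceTempView("shipments")
late = spark.sql(
    "SELECT carrier, COUNT(*) AS late_count FROM shipments WHERE delayed = TRUE GROUP BY carrier"
)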

Configured and monitored resource utilization throughout the cluster using Cloudera Manager, Search, and Navigator.

Used Apache Flume to collect and aggregate huge volumes of log data, then staged the data in HDFS for later analysis.

Used relational and non-relational technologies such as SQL and NoSQL to create data warehouse, data lake, and ETL systems, translating business demands into technical design documents.

Created Bash scripts to get log files from an FTP server and run Hive tasks to parse them.

Managed data and databases used to support performance-improvement initiatives.

Tuned Hadoop performance with high availability and assisted in Hadoop cluster recovery.

Used Jenkins builds and continuous-integration technologies; responsibilities included writing Groovy scripts to automate integration and delivery in Jenkins pipelines.

Tools & Technologies: Python, Spark, Kafka, GitLab, Hadoop, Tableau, Snowflake, Hive, Java, Shell scripting, SQL, Sqoop, Oozie, SQL Server


