
Azure Data Cloud

Location:
Hyderabad, Telangana, India
Posted:
December 07, 2023


Akhil Niranjan

ad1rt1@r.postjobfree.com 704-***-****

PROFESSIONAL SUMMARY

Around 5 years of experience in the software industry, including extensive experience with Azure cloud services and 4 years of experience in data warehousing.

Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), and Databricks.

Extensive experience using Azure Data Factory activities such as Get Metadata, Azure Function, Data Flow, Lookup, Webhook, Copy Data, and ForEach.

Implemented event-driven messaging using Azure Event Hubs.

Experience in building data pipelines and processing large volumes of data using Azure Data Factory.

Strong knowledge of Extraction, Transformation, and Loading (ETL) processes using UNIX shell scripting and SQL*Loader, and handled metadata for data analysis.

Used JIRA to track tickets worked on under the Agile methodology.

Experienced in working within the SDLC under both Agile and Waterfall methodologies.

Hands-on experience with visualization tools like Tableau and Power BI.

Collaborated with cross-functional teams to design and implement effective data solutions using Azure Purview and other Azure services, fostering a data-driven culture and improving overall data management.

Experienced with Data Mesh, utilizing its capabilities to streamline data governance, integration, and domain-oriented data management.

Experience in developing, supporting, and maintenance for the ETL (Extract, Transform, and Load) processes using Informatica.

Hands on experience on Hadoop, HDFS, Hive, Sqoop, Spark, YARN, Kafka, PySpark, Airflow, Snowflake, SQL, Python.

Experience in developing very complex mappings, reusable transformations, sessions, and workflows using the Informatica ETL tool to extract data from various sources and load it into targets.

Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.

Experience in Developing Spark applications using Spark - SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats for analyzing and transforming the data to uncover insights into customer usage patterns.

Used various file formats like Parquet, CSV, and JSON for loading, parsing, gathering, and transforming data, as in the sketch below.
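
A minimal PySpark sketch of this kind of multi-format extraction and aggregation; the paths, tables, and column names (usage_events, user_id, plan_id) are hypothetical placeholders, not the project's actual schema.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    # Each source lands in a different format.
    events = spark.read.parquet("/mnt/raw/usage_events/")
    users = spark.read.option("header", True).csv("/mnt/raw/users.csv")
    plans = spark.read.json("/mnt/raw/plans/")

    # Join and aggregate to surface usage patterns per plan.
    usage_by_plan = (
        events.join(users, "user_id")
              .join(plans, "plan_id")
              .groupBy("plan_name")
              .agg(F.count("*").alias("event_count"),
                   F.countDistinct("user_id").alias("active_users"))
    )
    usage_by_plan.write.mode("overwrite").parquet("/mnt/curated/usage_by_plan/")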

Extensive experience working with the Hortonworks Hadoop distribution and fully leveraging new Hadoop features.

Designed and created Hive external tables using a shared metastore with static and dynamic partitioning, bucketing, and indexing.
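
A minimal HiveQL sketch of such a table, issued here through spark.sql on a Hive-enabled SparkSession; the table, columns, and location are hypothetical, and depending on the Spark version the bucketed insert may need to run in Hive itself.

    # Hypothetical external table with partitioning and bucketing.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales (
            order_id BIGINT,
            amount   DOUBLE,
            store_id INT
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (store_id) INTO 16 BUCKETS
        STORED AS PARQUET
        LOCATION '/warehouse/external/sales'
    """)

    # Dynamic partitioning derives the partition value from each row.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE sales PARTITION (order_date)
        SELECT order_id, amount, store_id, order_date FROM staging_sales
    """)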

Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and RDDs.

Extensive hands-on experience tuning Spark jobs.

Experienced in working with structured data using HiveQL, and optimizing Hive queries.

Familiarity with libraries like PySpark, NumPy, Pandas, and Matplotlib in Python.

Writing complex SQL queries using joins, group by, and nested queries.
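
For instance, a query of this shape (a join, GROUP BY in a derived table, and a nested scalar subquery), shown via spark.sql; the customers and orders tables are hypothetical.

    # Customers whose total spend exceeds the average order amount.
    top_customers = spark.sql("""
        SELECT customer_id, region, total_spend
        FROM (
            SELECT c.customer_id, c.region, SUM(o.amount) AS total_spend
            FROM customers c
            JOIN orders o ON o.customer_id = c.customer_id
            GROUP BY c.customer_id, c.region
        ) t
        WHERE total_spend > (SELECT AVG(amount) FROM orders)
        ORDER BY total_spend DESC
    """)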

Experience with HBase, loading data using connectors and writing NoSQL queries.

Experience with solid capabilities in exploratory data analysis, statistical analysis, and visualization using R, Python, SQL, and Tableau.

In-depth understanding of Snowflake cloud technology.

Hands-on experience with Kafka and Flume to load the log data from multiple sources directly into HDFS.

TECHNICAL SKILLS

Data Warehousing:

Azure Synapse Analytics (DW), Snowflake, Amazon Redshift, BigQuery, Oracle, Teradata.

BigData Technologies:

Hadoop, Map Reduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, Yarn, Apache Spark.

ETL Tools:

Azure Data Factory, SQL Server Integration Services (SSIS), Informatica PowerCenter, IBM DataStage

Databases:

SQL Server, MySQL, PostgreSQL, Oracle, DB2, Cassandra

Languages:

Python, Java, Scala, C, C#, R, JavaScript, PHP

Visualization Tools:

Power BI, Tableau

Data Modeling:

ER diagrams, Dimensional data modeling, Star and Snowflake Schema

Scripting:

Shell scripting (Linux/Unix), Python scripting

Version Control:

Git, GitHub, Azure DevOps, and TFS.

Other Tools and Technologies:

Microsoft Visual Studio, Jupyter Notebook, WinSCP, FileZilla, SQL Developer, Toad, Anaconda, PyCharm, Apache Airflow, Docker, Kubernetes

EXPERIENCE

Data Engineer

Lowe's Companies, Inc - Mooresville, NC

Sep 2021 to Present

Responsibilities:

Built a scalable technical architecture and data processing layer on Azure Cloud to solve business problems for a leading health insurance company.

Performed ETL activities using Azure Data Factory and Databricks.

Set up a development environment on IntelliJ to develop code in Python.

Structured code in different files as per organization standards and following best practices.

Performed text cleansing by applying various transformations using Spark DataFrames (see the sketch below).
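
A minimal sketch of such cleansing, assuming a string column named notes (a hypothetical placeholder): lowercase, strip punctuation, collapse whitespace.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("  Mixed CASE, punct!!  ",)], ["notes"])

    cleansed = (
        df.withColumn("notes", F.lower(F.col("notes")))
          .withColumn("notes", F.regexp_replace("notes", r"[^a-z0-9\s]", " "))
          .withColumn("notes", F.regexp_replace("notes", r"\s+", " "))
          .withColumn("notes", F.trim(F.col("notes")))
    )
    cleansed.show(truncate=False)   # -> "mixed case punct"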

Developed a validation framework in PySpark to ensure data quality and check the accuracy and completeness of data.
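
The framework itself is not reproduced here; below is a hedged, minimal sketch of the kind of completeness check it might perform, with hypothetical column names.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    def validate(df, required_cols, not_null_cols):
        """Return basic completeness metrics for a DataFrame."""
        missing = [c for c in required_cols if c not in df.columns]
        null_counts = {c: df.filter(F.col(c).isNull()).count()
                       for c in not_null_cols if c in df.columns}
        return {
            "row_count": df.count(),
            "missing_columns": missing,
            "null_counts": null_counts,
            "passed": not missing and all(v == 0 for v in null_counts.values()),
        }

    # Toy usage with hypothetical columns.
    df = spark.createDataFrame([(1, "A"), (2, None)], ["member_id", "plan"])
    print(validate(df, ["member_id", "claim_id"], ["plan"]))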

Built test functions to locally test the developed functions.

Experience migrating SQL databases to Azure Data Lake, Azure Synapse Analytics, Databricks Delta Lake, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Storage using Azure Data Factory.

Performed integration tests to validate the end-to-end ETL pipeline.

Developed and optimized data pipelines and ETL processes using Azure Synapse Analytics and Data Factory.

Used Azure Data Factory to run Databricks JARs and schedule ETL jobs.

Leveraged Azure Fabric to facilitate seamless data integration and orchestration across diverse data sources, systems, and domains. Proficient in designing data pipelines and data movement processes.

Used Azure DevOps release pipelines to deploy ADF from one environment to another using ARM templates.

Utilized Spark Scala to distribute data processing on large streaming datasets to improve the ingestion and processing speed of the data.

Used stored procedure, lookup, execute pipeline, data flow, copy data, and Azure Function activities in ADF.

Created standard templates in Databricks to load data from REST APIs.
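
A hedged sketch of what such a template might look like; the endpoint URL, token handling, and JSON shape are hypothetical assumptions, not the actual API.

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def load_rest_api(url, token):
        """Fetch a JSON array from a REST endpoint into a DataFrame."""
        resp = requests.get(url,
                            headers={"Authorization": f"Bearer {token}"},
                            timeout=30)
        resp.raise_for_status()
        return spark.createDataFrame(resp.json())  # expects a list of objects

    # Hypothetical usage:
    df = load_rest_api("https://api.example.com/v1/items", token="<secret>")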

Have experience implementing Spark jobs performance tuning.

Performed monitoring and management of clusters using Azure HDInsight.

Wrote complex SQL queries using joins, group by, and nested queries.

Involved in solving complex problems, applying strong problem-solving skills to day-to-day data engineering work.

Implemented Azure Purview to establish robust data governance practices, ensuring data quality, compliance, and security; proficient in data source connection, data classification, and metadata management.

Experience with designing and implementing data pipelines using Azure data services.

Familiarity with the principles of Data Mesh, employing domain-oriented data ownership and decentralized architecture to enhance data scalability, autonomy, and flexibility.

Experience with data modeling and data management in Azure.

Experience with working with large datasets and designing scalable solutions.

Troubleshooting and debugging data-related issues.

Developing and maintaining data integration solutions using Azure Data Factory.

Worked with data warehouses using Azure Synapse Analytics.

Tools: Azure SQL Server, Azure Data Factory, ADLS Gen2, Databricks, Python, PySpark, SQL, Spark SQL, IntelliJ IDEA, Git, Azure DevOps, Teradata, Azure Synapse Analytics.

Data Engineer

Visa - Austin, TX

Jan 2020 to Sep 2021

Responsibilities:

Understood business requirements, analyzed them, and translated them into application and operational requirements.

Designed a one-time load strategy for moving large databases to Azure SQL DWH.

Extracted, transformed, and loaded data from source systems to Azure data storage services using Azure Data Factory and HDInsight.

Created a framework for data profiling, cleansing, automatic restartability of batch pipelines, and rollback handling.

Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.

Implemented masking and encryption techniques to protect sensitive data.

Implemented the SSIS integration runtime (IR) to run SSIS packages from ADF.

Configured Databricks jobs and refactored ETL Databricks notebooks.

Transferred and transformed data with Azure Synapse Analytics pipelines.

Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.

Used AWS Athena extensively to ingest structured data from S3 into other systems such as Redshift or to produce reports.

Created tables along with sort and distribution keys in AWS Redshift.

Developed a mapping document to map columns from source to target.

Created Azure Data Factory (ADF) pipelines using Azure Blob Storage.

Performed ETL using Azure Databricks. Migrated on-premises Oracle ETL processes to Azure Synapse Analytics.

Involved in migrating large amounts of data from OLTP to OLAP using ETL Packages.

Worked on Python scripting to automate script generation. Performed data curation using Azure Databricks.

Worked on Azure Databricks, PySpark, HDInsight, Azure SQL DW, and Hive to load and transform data.

Implemented and Developed Hive Bucketing and Partitioning.

Implemented Kafka and Spark Structured Streaming for real-time data ingestion (see the sketch below).
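
A minimal Structured Streaming sketch of that pattern; the broker address, topic, schema, and output paths are hypothetical, and the Delta sink assumes a Databricks-style environment with the Kafka connector available.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    schema = (StructType()
              .add("event_id", StringType())
              .add("amount", DoubleType()))

    # Parse the Kafka value payload from JSON into typed columns.
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "transactions")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema)
                       .alias("e"))
              .select("e.*"))

    query = (stream.writeStream.format("delta")
             .option("checkpointLocation", "/mnt/chk/transactions")
             .outputMode("append")
             .start("/mnt/bronze/transactions"))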

Used Azure Data Lake as the source and pulled data using Azure Blob Storage.

Good experience working with analysis tools like Tableau and Splunk for regression analysis, pie charts, and bar graphs.

Developed reports and dashboards using Tableau for quick reviews presented to business and IT users.

Used stored procedure, lookup, execute pipeline, data flow, copy data, and Azure Function activities in ADF.

Worked on creating a star schema for drill-down analysis. Created PySpark procedures, functions, and packages to load data.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Created Databricks notebooks using SQL and Python, and automated notebooks using jobs.

Creating Spark clusters and configuring high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.

Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.

Technologies: ADF, Databricks, ADLS, Spark, Hive, HBase, Sqoop, Flume, Blob Storage, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, Apache Kafka, Azure

Data Engineer

Komodo Health - Chicago, IL

June 2018 to Dec 2019

Responsibilities:

Developed data pipeline using Spark, Hive, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.

Working experience with the Azure Databricks cloud, organizing data into notebooks and making it easy to visualize using dashboards.

Performed ETL on data from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Created database tables and stored procedures as required for reporting and ETL needs.

Configured Databricks jobs and refactored ETL Databricks notebooks.

Worked on managing Spark Databricks clusters.

Implemented data ingestion from various source systems using Sqoop and PySpark.

Performed end-to-end architecture and implementation assessments of various AWS services like Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.

Hands-on experience implementing Spark and Hive jobs performance tuning.

Maintained cluster health through proper troubleshooting, estimation, and monitoring of the clusters.

Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python.

Performed monitoring and management of the Hadoop cluster by using Azure HDInsight.

Involved in extraction, transformation, and loading of data directly from different source systems (flat files, Excel, Oracle, SQL) using SAS/SQL and SAS macros.

Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances.

Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.

Good experience working with analysis tools like Tableau and Splunk for regression analysis, pie charts, and bar graphs.

Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions, and Triggers using SQL and PL/SQL.

Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.

Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.

Wrote Python scripts to parse XML documents and load the data into the database (sketched below).
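
A small illustrative sketch of that pattern, using sqlite3 as a stand-in database; the XML layout and target table are hypothetical.

    import sqlite3
    import xml.etree.ElementTree as ET

    # Parse <order><id>..</id><amount>..</amount></order> elements.
    tree = ET.parse("orders.xml")
    rows = [(o.findtext("id"), o.findtext("amount"))
            for o in tree.getroot().iter("order")]

    conn = sqlite3.connect("orders.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    conn.close()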

Used Hive, Impala, and Sqoop utilities and Oozie workflows for data extraction and data loading.

Created HBase tables to store various data formats of data coming from different sources.

Responsible for importing log files from various sources into HDFS using Flume.

Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.

Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.

Technologies: ADF, Databricks, ADLS, Spark, Hive, HBase, Sqoop, Flume, Blob Storage, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, Apache Kafka, Azure, Python, Power BI, Unix, SQL Server

Education: Master's in Data Science, SUNY Buffalo


