ad2nsy@r.postjobfree.com +1-289-***-**** www.linkedin.com/in/sharmar
Canadian Citizen Federal Reliability Clearance
CLOUD DATA SOLUTION ARCHITECT
A data professional with more than 23 years of experience working with data management systems across a breadth of industries.
A professional equally comfortable working with the old and the new:
o On-premises and cloud solutions (Solaris/Unix, Azure, AWS)
o Legacy and modern technology stacks (RDBMS, Hadoop, Spark, Kafka, Elasticsearch/Solr)
o Data warehouses and data lakes (Oracle, Azure, Snowflake, Synapse)
Data Architect / Modeler with experience across a breadth of patterns, ranging from denormalized data marts and key-value stores to 3NF OLTP solutions; exposure to Data Mesh and Data Fabric approaches.
Experience and expertise in data governance: implementing lineage-enabling solutions and models.
Experience developing business glossaries and data ontologies.
Experience in business transformation and platform modernization: strategy and roadmap development.
Experience with application and data migration across technologies:
o legacy to modern
o on-premises to cloud
PROFESSIONAL DEVELOPMENT
TOGAF 9 Workshop – Capgemini Inc.
Azure Databricks Training Program – Databricks Inc.
Advanced Data Architecture Program – Accenture Inc.
Big Data on AWS – Global Knowledge
Talend – Talend Inc.
SUMMARY OF SKILLS
Big Data Ecosystem Tools
HDFS, Hive, Impala, Spark, Python (PySpark), Scala, Solr, Elasticsearch, Unix Shell Scripting, Apache NiFi, Kafka
Cloud - Azure
Azure Synapse, Azure SQL, ADLS Gen2, Azure Web Apps, Azure Data Factory, Synapse Notebooks (PySpark), Azure Batch, Azure Key Vault, Azure Container Registry, Azure Container Apps, Azure Storage, Azure Logic Apps, Azure Functions
Cloud- AWS
Redshift, RDS
RDBMS
Snowflake, Oracle, Teradata, MySQL, PostgreSQL and SQL Server
NoSQL Data Stores
MongoDB, Neo4j, HBase
Data Science
R, RStudio, sparklyr, Python, pandas, NumPy, scikit-learn
Data Governance
Business Glossary Creation, CDE Identification / Design, Data Lineage Creation, Talend Data Catalogue, Ontology Design, IBM Information Governance Catalogue (IGC)
File Formats
Parquet, Avro, JSON, XML, ORC, Delimited Files
Web Frameworks
Django, Angular, Node.js, Ruby on Rails
ETL / Data Management
Talend Open Studio, NiFi, Azure Data Factory, Pentaho, Alteryx
Machine Learning / AI
Supervised / Unsupervised learning, Linear / Non-Linear Regression, Natural Language Processing, Image Processing, Computer Vision
Others
Scrum/Agile, Performance Optimization, Data Replication (Oracle GoldenGate), OpenAI APIs, custom ChatGPT, LLM customization
Orchestration
Azure Data Factory, Tivoli Workload Scheduler, Control-M, AutoSys, Redwood Cronacle, Docker
Domain Exposure
Financial Services (Capital Markets, Wealth Management, Retail, Fraud & AML), Insurance, Public Sector, Transportation, Telecom
PROFESSIONAL EXPERIENCE
ViprSha Inc. 02/2023 – 12/2023
Azure Data Engineer / Architect – IBM
Worked with the team to formulate a cloud migration plan and to develop guidelines and a roadmap for the migration.
Conducted proofs of technology for various options:
o Python – ADF – Azure Batch
o Python – Databricks
o Python – Azure Synapse
o Python – LLM extensions / enhancements
Designed & developed data pipelines for ingesting / transforming data from legacy sources.
Developed a metadata-driven rule engine for executing data validation rules on incoming data, managing a hierarchy of dependencies and a decision tree.
Developed an API engine using Azure Web Apps and Swagger to invoke data validation rules on an attribute.
Developed parsing scripts to extract meaningful information from a corpus of code (a code repository).
Conceptualized, designed, and implemented a data management framework.
Worked with OpenAI APIs to extract the business logic of a given piece of code.
Technical Stack
Microsoft Azure
Python, Shell Scripting
ViprSha Inc. 04/2022 – 02/2023
Data Technical Architect - Cargill
Designed data pipelines for the Cargill Protein & Salt (CPS) data lake, ingesting data from different types of data sources, transforming it, and exposing it to internal consumers.
Conceptualized, designed, and implemented a data lake processing framework to track and manage data and processes on the platform, working with teams in India, Costa Rica, the US, and Canada across multiple time zones. The data and metadata are managed in Apache Impala, with APIs exposed in three languages: Python, Scala, and Bash shell scripting. This simplified developing and maintaining processes on the data lake.
Designed a solution for ingesting data into the data lake from an externally hosted web service.
Ingested time-series data from manufacturing plant-floor machines into the CPS data lake; correlated and analyzed this data with other data streams.
Designed and implemented a data lake pipeline to process and load data from SAP into Salesforce (SFDC), integrating manufacturing data with sales and marketing data for one of the company's business units.
Established design and development best practices for current and upcoming initiatives, significantly increasing code and component reusability.
Technical Stack
Cloudera Hadoop Distribution
Python, Scala, Shell Scripting
Sqoop, Spark, Hive, Impala
ViprSha Inc. 10/2020 – 04/2022
Business Data Architecture Consultant – ScotiaBank
Developed a business data glossary with the DGO for the US Finance & Trade Surveillance team. Responsibilities included researching, analyzing, and developing term definitions and registering them in IGC.
Helped develop components of the LoanIQ integration layer by building data pipelines using Scala/Python, Spark SQL, Kafka, and Elasticsearch in Jupyter notebooks.
Worked with the ontology development team to define processes for the eventual migration of the business glossary into ontologies.
Technical Stack
Azure
Python, Scala, Jupyter
Spark, Kafka, Elasticsearch
IBM Information Governance Console (IGC), Protege
ViprSha Inc. 07/2020 – 09/2020
Talend/Snowflake/Data Governance Consultant – Coop Financial [Through Capgemini Inc.]
Helped define the implementation and rollout strategy for Talend Data Catalogue (TDC) at Coop. The platform enabled the Data Governance team to collect and manage metadata for data stored across various tools.
Created a sample business glossary and data lineage using Talend Data Catalogue as a proof of concept and usability demonstration.
o Worked with Talend Professional Services.
Designed and developed a data ingestion framework using Talend Open Studio to ingest data from ADLS Gen2 storage into the Snowflake cloud data warehouse.
Technical Stack
Azure, ADLS G2
Hadoop
Python, Shell Scripting
Spark, Hive, Snowflake, SnowPipe
Talend Open Studio, Talend Data Catalogue
Accenture Inc. - Toronto, Canada 11/2017 – 05/2020
Architect / Lead Developer – CMHC Data Exchange (2019-2020)
Led the greenfield design and development of a big data lake implementing a data exchange platform based on Azure PaaS services at Canada Mortgage and Housing Corporation (CMHC). The platform enabled mortgage industry partners (e.g., banks, lenders) to share data and collaborate on new data products, helping disseminate value from the data.
o Modelled and articulated the metadata-driven architecture using industry best practices, e.g., instrumented application design, error monitoring/logging, and automated code deployment using Git.
o Designed and developed data pipelines using PySpark (on Azure Databricks), orchestrated with Azure Data Factory (ADF), loading data into Azure Synapse for eventual consumption by Power BI.
o Developed the data model for Azure Synapse to support the Power BI environment.
o The solution is primarily based on the Azure PaaS platform: Azure Databricks, ADLS Gen2, ADF, Azure Synapse, and Power BI.
Technical Stack
Azure, ADLS G2, Azure DataBricks, Azure Data Factory
Scala, Python, Shell Scripting
Spark, Hive, Azure Synapse
Lead Data Developer – CMHC AIP Data Lake (2019)
Led improvement initiatives on the CMHC Data Platform to deliver new data supply pipelines supporting various business cases.
Developed standards and guidelines for development, deployment, operational support, access control, data sensitivity classification, and version control/management to increase environment stability, efficiency, reusability, and performance.
Developed a data pipeline to ingest data extracts from the CMHC Human Resources department into the data lake. The lake is primarily based on the Azure IaaS platform, involving Cloudera, Kerberos, PySpark, sparklyr, and Jupyter.
Led proof-of-concept initiatives, including Alteryx evaluation and optimization of R code on the Cloudera Hadoop platform (using sparklyr).
Technical Stack
Azure IaaS
Hadoop
R, Scala, Python, Shell Scripting, SparklyR
Spark, Hive
Nifi, Alteryx
Lead Data Developer/Designer – Metrolinx (2018)
Designed and developed data pipelines on the Hadoop platform to visualize train delays for the Toronto suburban transit authority (GO Transit).
Developed algorithms to analyze delays, perform root cause analysis, and identify delay hotspots.
Designed and implemented a data management model for organizing various types of data in the data lake, including emails, CSV files, and structured data.
Implemented a data management framework for ingesting and processing unstructured data from emails, Excel documents, and RDBMS sources.
Worked with Cloudera Distribution of Hadoop with Scala & Python.
Technical Stack
Cloudera Hadoop Distribution
Scala, Python, Shell Scripting
Spark, Hive
Capgemini Inc. 11/2010 – 11/2017
Senior Developer / Architect – RBC Fraud & Analytics (2017)
Toronto, Canada
Designed and implemented data pipelines supporting the Fraud IT application landscape at RBC.
Designed and implemented a near-real-time anomaly detection model for card transaction data: ingested data using Kafka and Spark Streaming into HBase on the data lake through NiFi. The anomaly detection rules were configurable and ran against each transaction on the stream.
Designed and implemented a multi-core architecture for Solr to manage different schemas for multiple data sources.
Worked with Spark, Kafka, NiFi, Scala, Maven, Nexus, and GitHub on the Hortonworks Hadoop platform.
Technical Stack
Hortonworks Hadoop Distribution
Scala, Shell Scripting
Solr
Spark, Hive, Hbase
Nifi
Solution Designer– RBC Wealth (2016-2017)
Toronto, Canada
Worked on an initiative to develop the conceptual architecture for an information hub for RBC Wealth.
Partnered with business and IT leadership to develop a conceptual architecture for a central data repository (data lake) for RBC Wealth's internal and external data.
Worked with vendors to evaluate, compare, and prototype Change Data Capture (CDC) products, including Oracle GoldenGate and Attunity Replicate.
Lead Developer / Designer – Barclays (2014-2016)
London, United Kingdom
Designed and developed data supply chains for PPI remediation and related data products.
Created a 3NF data model for managing data from different source systems.
Helped design and review an initiative to collate and analyze card transaction data on a Hadoop data lake to enable visualizations.
Helped design and implement an extensible data pipeline for calculating the PPI remediation amount for a customer. Worked closely with business stakeholders and delivered a simple, user-friendly design. The pipeline looked up data from a reference database (Teradata) and extracted transaction information for the accounts. Worked extensively on shell scripts, BTEQ scripts, and data modelling.
Technical Stack
Cloudera Hadoop Distribution
Scala, Shell Scripting
Spark
Teradata, Oracle
SQL, Teradata BTEQ Scripting
ERStudio
Lead Developer / Architect – Verizon (2012-2014)
Bangalore, India
Designed and developed a data migration solution supporting a billing application consolidation.
Architected an extensible solution enabling data ingestion and transformation over a data pipeline from mainframe to Oracle.
Developed reusable components in shell scripts, Informatica PowerExchange, PowerCenter, and Oracle PL/SQL.
A metadata-driven design ensured a stable, repeatable, reusable, instrumented solution.
Technical Stack
Shell Scripting
Informatica PowerCenter, PowerExchange
Oracle
SQL, PL/SQL Scripting
OTHER RELEVANT EXPERIENCE
DWH Design Lead – Sony Network Entertainment Inc. (2010-2012)
Bangalore, India / Los Angeles, USA
Designed and developed ETL pipelines to ingest and transform clickstream data from Sony's PlayStation website into their multi-terabyte data warehouse, optimizing batch jobs and improving the performance of analytics processes.
Implemented a highly reusable Informatica PowerCenter pipeline, reducing the ingestion effort for a group of files by a factor of 10.
Technical Stack
Shell Scripting
Informatica PowerCenter
Oracle, Oracle Exadata
SQL, PL/SQL Scripting
EDUCATION
Master of Computer Management (1998-2000) Dr B R Ambedkar University, Agra, India
o Major – Computer Applications
o Elective – International Finance Management
Raghvendra Sharma
CERTIFICATIONS
1. Snowflake Certified: SnowPro Core
2. Microsoft Certified: Azure Fundamentals
3. Cloudera Certified Hadoop Developer
4. Informatica Certified Administrator & Architect
5. Certified SAFe® 4 Agilist