ad2nsy@r.postjobfree.com +1-289-***-**** www.linkedin.com/in/sharmar
Canadian Citizen Federal Reliability Clearance
CLOUD DATA SOLUTION ARCHITECT
A data professional with more than 23 years of experience working with data management systems across a breadth of industries.
A professional equally comfortable working with the old and the new:
o On-premises and cloud solutions (Solaris/Unix, Azure, AWS)
o Legacy and modern technology stacks (RDBMS, Hadoop, Spark, Kafka, Elasticsearch/Solr)
o Data warehouses and data lakes (Oracle, Azure, Snowflake, Synapse)
Data Architect / Modeler with experience across a breadth of patterns, ranging from denormalized data marts and key-value stores to 3NF OLTP solutions; exposure to Data Mesh and Data Fabric approaches.
Experience and expertise in data governance: implementing lineage-enabling solutions and models.
Experience developing business glossaries and data ontologies.
Experience in business transformation and platform modernization: strategy and roadmap development.
Experience with application and data migration across technologies:
o legacy to modern
o on-premises to cloud
PROFESSIONAL DEVELOPMENT
TOGAF 9 Workshop – Capgemini Inc.
Azure Databricks Training Program – Databricks Inc.
Advanced Data Architecture Program – Accenture Inc.
Big Data on AWS – Global Knowledge
Talend – Talend Inc.
SUMMARY OF SKILLS
Big Data Ecosystem Tools
HDFS, Hive, Impala, Spark, Python (PySpark), Scala, Solr, Elasticsearch, Unix Shell Scripting, Apache NiFi, Kafka
Cloud - Azure
Azure Synapse, Azure SQL, ADLS Gen2, Azure Web Apps, Azure Data Factory, Synapse Notebooks (PySpark), Azure Batch, Azure Key Vault, Azure Container Registry, Azure Container Apps, Azure Storage, Azure Logic Apps, Azure Functions
Cloud- AWS
Redshift, RDS
RDBMS
Snowflake, Oracle, Teradata, MySQL, PostgreSQL and SQL Server
NoSQL Data Stores
MongoDB, Neo4j, HBase
Data Science
R, RStudio, sparklyr, Python, pandas, NumPy, scikit-learn
Data Governance
Business Glossary Creation, CDE Identification / Design, Data Lineage Creation, Talend Data Catalogue, Ontology Design, IBM Information Governance Catalogue (IGC)
File Formats
Parquet, Avro, JSON, XML, ORC, Delimited Files
Web Frameworks
Django, Angular, Node.js, Ruby on Rails
ETL / Data Management
Talend Open Studio, NiFi, Azure Data Factory, Pentaho, Alteryx
Machine Learning / AI
Supervised / Unsupervised learning, Linear / Non-Linear Regression, Natural Language Processing, Image Processing, Computer Vision
Others
Scrum/Agile, Performance Optimization, Data Replication (Oracle GoldenGate), OpenAI APIs, custom ChatGPT, LLM customization
Orchestration
Azure Data Factory, Tivoli Workload Scheduler, Control-M, AutoSys, Redwood Cronacle, Docker
Domain Exposure
Financial Services (Capital Markets, Wealth Management, Retail, Fraud & AML), Insurance, Public Sector, Transportation, Telecom
PROFESSIONAL EXPERIENCE
ViprSha Inc. 02/2023 – 12/2023
Azure Data Engineer / Architect – IBM
Worked with the team to formulate a cloud migration plan and to develop guidelines and a roadmap for the migration.
Conducted proofs of technology for various options:
o Python – ADF – Azure Batch
o Python – Databricks
o Python – Azure Synapse
o Python – LLM extensions / enhancements
Designed & developed data pipelines for ingesting / transforming data from legacy sources.
Developed a metadata-driven rule engine for executing data validation rules on incoming data, managing a hierarchy of dependencies and a decision tree.
Developed an API engine using Azure Web Apps and Swagger to invoke data validation rules on an attribute.
Developed parsing scripts to extract meaningful information from a corpus of code (a code repository).
Conceptualized, designed, and implemented a data management framework.
Worked with OpenAI APIs to extract the business logic of a given piece of code.
Technical Stack
Microsoft Azure
Python, Shell Scripting
ViprSha Inc. 04/2022 – 02/2023
Data Technical Architect - Cargill
Designed data pipelines for the Cargill Protein & Salt (CPS) data lake, ingesting data from different types of data sources, transforming it, and exposing it to internal consumers.
Conceptualized, designed, and implemented a data lake processing framework to track and manage data and processes on the platform, working with teams in India, Costa Rica, the US, and Canada across multiple time zones. The data and metadata are managed in Apache Impala, with APIs exposed in three languages: Python, Scala, and Bash shell scripting. This simplified developing and maintaining processes on the data lake.
Designed a solution for ingesting data into the data lake from an externally hosted web service.
Ingested time-series data from manufacturing plant-floor machines into the CPS data lake; correlated and analyzed this data with other data streams.
Designed and implemented a data lake pipeline to process and load data from SAP into Salesforce (SFDC), integrating manufacturing data with sales and marketing data for one of the company's business units.
Established design and development best practices for current and upcoming initiatives, significantly increasing code and component reusability.
Technical Stack
Cloudera Hadoop Distribution
Python, Scala, Shell Scripting
Sqoop, Spark, Hive, Impala
ViprSha Inc. 10/2020 – 04/2022
Business Data Architecture Consultant – ScotiaBank
Developed a business data glossary with the DGO for the US Finance & Trade Surveillance team. Responsibilities included researching, analyzing, and developing term definitions and registering them in IGC.
Helped develop components of the LoanIQ integration layer by building data pipelines using Scala/Python, Spark SQL, Kafka, and Elasticsearch in Jupyter notebooks.
Worked with the ontology development team to define processes for the eventual migration of the business glossary into ontologies.
Technical Stack
Azure
Python, Scala, Jupyter
Spark, Kafka, Elasticsearch
IBM Information Governance Console (IGC), Protege
ViprSha Inc. 07/2020 – 09/2020
Talend/Snowflake/Data Governance Consultant – Coop Financial [Through Capgemini Inc.]
Helped define the implementation and rollout strategy for Talend Data Catalogue (TDC) at Coop. The platform enabled the Data Governance team to collect and manage metadata for data stored across various tools.
Created a sample business glossary and data lineage using Talend Data Catalogue as a proof of concept and usability demonstration.
o Worked with Talend Professional Services.
Designed and developed a data ingestion framework using Talend Open Studio to ingest data from ADLS Gen2 storage into the Snowflake cloud data warehouse.
Technical Stack
Azure, ADLS G2
Hadoop
Python, Shell Scripting
Spark, Hive, Snowflake, SnowPipe
Talend Open Studio, Talend Data Catalogue
Accenture Inc. - Toronto, Canada 11/2017 – 05/2020
Architect / Lead Developer – CMHC Data Exchange (2019-2020)
Led the greenfield design and development of a big data lake implementing a data exchange platform based on Azure PaaS services at Canada Mortgage and Housing Corporation (CMHC). The platform enabled mortgage industry partners (e.g., banks, lenders) to share data and collaborate on new data products, helping disseminate value from the data.
o Modelled and articulated the metadata-driven architecture using industry best practices, e.g., instrumented application design, error monitoring/logging, and automated code deployment using Git.
o Designed and developed data pipelines using PySpark (on Azure Databricks), orchestrated with Azure Data Factory (ADF), loading data into Azure Synapse for eventual consumption by Power BI.
o Developed the data model for Azure Synapse to support the Power BI environment.
o The solution is primarily based on the Azure PaaS platform: Azure Databricks, ADLS Gen2, ADF, Azure Synapse, and Power BI.
Technical Stack
Azure, ADLS G2, Azure DataBricks, Azure Data Factory
Scala, Python, Shell Scripting
Spark, Hive, Azure Synapse
Lead Data Developer – CMHC AIP Data Lake (2019)
Led improvement initiatives on the CMHC Data Platform to deliver new data supply pipelines supporting various business cases.
Developed standards and guidelines for development, deployment, operational support, access control, data sensitivity classification, and version control/management to increase environment stability, efficiency, reusability, and performance.
Developed a data pipeline to ingest data extracts from the CMHC Human Resources department into the data lake. The lake is primarily based on the Azure IaaS platform, involving Cloudera, Kerberos, PySpark, sparklyr, and Jupyter.
Led proof-of-concept initiatives, including Alteryx evaluation and optimization of R code on the Cloudera Hadoop platform (using sparklyr).
Technical Stack
Azure IaaS
Hadoop
R, Scala, Python, Shell Scripting, SparklyR
Spark, Hive
Nifi, Alteryx
Lead Data Developer/Designer – Metrolinx (2018)
Designed and developed data pipelines on the Hadoop platform to visualize train delays for the Toronto suburban transit authority (GO Transit).
Developed algorithms to analyze delays, perform root cause analysis, and identify delay hotspots.
Designed and implemented a data management model for organizing various types of data in the data lake, including emails, CSV files, and structured data.
Implemented a data management framework for ingesting and processing unstructured data from emails, Excel documents, and RDBMS sources.
Worked with Cloudera Distribution of Hadoop with Scala & Python.
Technical Stack
Cloudera Hadoop Distribution
Scala, Python, Shell Scripting
Spark, Hive
Capgemini Inc. 11/2010 – 11/2017
Senior Developer / Architect – RBC Fraud & Analytics (2017)
Toronto, Canada
Designed and implemented data pipelines supporting the Fraud IT application landscape at RBC.
Designed and implemented a near-real-time anomaly detection model for card transaction data: ingested data using Kafka and Spark Streaming into HBase on the data lake through NiFi. The anomaly detection rules were configurable and ran against each transaction on the stream.
Designed and implemented a multi-core architecture for Solr to manage different schemas for multiple data sources.
Worked with Spark, Kafka, NiFi, Scala, Maven, Nexus, and GitHub on the Hortonworks Hadoop platform.
Technical Stack
Hortonworks Hadoop Distribution
Scala, Shell Scripting
Solr
Spark, Hive, Hbase
Nifi
Solution Designer– RBC Wealth (2016-2017)
Toronto, Canada
Worked on an initiative to develop the conceptual architecture for an information hub for RBC Wealth.
Partnered with business and IT leadership to develop a conceptual architecture for a central data repository (data lake) for RBC Wealth's internal and external data.
Worked with vendors to evaluate, compare, and prototype Change Data Capture (CDC) products, including Oracle GoldenGate and Attunity Replicate.
Lead Developer / Designer – Barclays (2014-2016)
London, United Kingdom
Designed and developed data supply chains for PPI remediation and related data products.
Created a 3NF data model for managing data from different source systems.
Helped design and review an initiative to collate and analyze card transaction data on a Hadoop data lake to enable visualizations.
Helped design and implement an extensible data pipeline for calculating the PPI remediation amount for a customer. Worked closely with business stakeholders and delivered a simple, user-friendly design. The pipeline looked up data from a reference database (Teradata) and extracted transaction information for the accounts. Worked extensively on shell scripts, BTEQ scripts, and data modelling.
Technical Stack
Cloudera Hadoop Distribution
Scala, Shell Scripting
Spark
Teradata, Oracle
SQL, Teradata BTEQ Scripting
ERStudio
Lead Developer / Architect – Verizon (2012-2014)
Bangalore, India
Designed and developed a data migration solution supporting a billing application consolidation.
Architected an extensible solution enabling data ingestion and transformation over a data pipeline from mainframe to Oracle.
Developed reusable components in shell scripts, Informatica PowerExchange, PowerCenter, and Oracle PL/SQL.
A metadata-driven design ensured a stable, repeatable, reusable, instrumented solution.
Technical Stack
Shell Scripting
Informatica PowerCenter, PowerExchange
Oracle
SQL, PL/SQL Scripting
OTHER RELEVANT EXPERIENCE
DWH Design Lead – Sony Network Entertainment Inc. (2010-2012)
Bangalore, India / Los Angeles, USA
Designed and developed ETL pipelines to ingest and transform clickstream data from Sony's PlayStation website into their multi-terabyte data warehouse, optimizing batch jobs and improving the performance of analytics processes.
Implemented a highly reusable Informatica PowerCenter pipeline, reducing the ingestion effort for a group of files by a factor of 10.
Technical Stack
Shell Scripting
Informatica PowerCenter
Oracle, Oracle Exadata
SQL, PL/SQL Scripting
EDUCATION
Master of Computer Management (1998-2000) Dr B R Ambedkar University, Agra, India
o Major – Computer Applications
o Elective – International Finance Management
Raghvendra Sharma
CERTIFICATIONS
1. Snowflake Certified: SnowPro Core
2. Microsoft Certified: Azure Fundamentals
3. Cloudera Certified Hadoop Developer
4. Informatica Certified Administrator & Architect
5. Certified SAFe® 4 Agilist