
Big Data Platform

Location:
Mississauga, ON, Canada
Posted:
July 23, 2025


Resume:

ABDUL SABOOR

Address: Mississauga, Ontario, CANADA Cell: 905-***-**** Legal Status: Canadian Citizen

E-Mail: *************@*******.*** LinkedIn: https://www.linkedin.com/in/abdulsaboor11479/

Professional experience and primary focus in building big data platform architectures on cloud platforms (Microsoft Azure, AWS, GCP) and developing big data application frameworks and solutions with modern technologies from the Apache Hadoop and Spark ecosystems. Hands-on development, deployment/implementation and testing experience with Python, Scala, SQL and Java. Hands-on experience and forward-looking vision in building modern data platform architectures (Lambda, Kappa) and application frameworks for data processing: data pipelines, data quality (profiling and DQ rules), data management and governance, MDM, metadata management, data lineage, semantic layers and APIs. Hands-on experience with modern approaches such as Data Mesh, Data Fabric, microservices and containerization. Proficient in cloud platforms and services across Microsoft Azure, AWS and GCP, including compute, storage (data lakehouse), security, networking, DevOps and advanced data analytics (AI, ML and DL tools/libraries). Secondary interest and proficiency in data science and AI/ML application and model architecture: model development and implementation, data preparation, training, monitoring, serving and MLOps (ML pipelines, scaling, automated testing and production deployment), with a current focus on generative AI and LLMs.

Work Experience

Cloud Solutions Architect – Data and Artificial Intelligence (Senior)

June 2024 – Present

Microsoft: Cloud Platforms and Services – Microsoft Fabric (Big Data Analytics), Microsoft Purview (Data Governance, Record Management, Information Protection & Sensitivity Labels), Databricks, Microsoft Copilot, Azure AI Foundry, Azure ML Studio, Model Catalog, Azure OpenAI Services, etc.

Work on various customer engagements to give customers insight into cloud platforms and services and to deliver solutions across business domains, particularly in Data and AI: Microsoft Fabric, Microsoft Purview, Azure AI/ML, Azure OpenAI, Azure Databricks, Azure Synapse Analytics, Azure Data Factory, and other Azure cloud platform services.

Work on multiple deliveries to solve customers' business problems, including data platform architecture, solution and reference architecture, and strategic, operational and, above all, technical challenges.

Provide comprehensive training and conduct workshops with customer teams in 1-day, 3-day or 5-day sessions, including proofs of concept, upskilling execution, and solution/reference architecture build and deployment, across areas such as Microsoft Fabric, Azure OpenAI, Microsoft Purview, Databricks, and databases (Azure SQL, SQL Server, Cosmos DB, etc.).

Attend training and workshop sessions on modern Azure cloud platform and services concepts and techniques, with lab exercises to gain hands-on experience with new tools, technologies, SDKs, etc.; also complete training on strategic planning and execution, decision-making, customer engagement, problem solving, privacy and security when handling customer and internal business documents and data, achieving business goals and financial targets/quotas, and other mandatory business and operational topics.

Drive technical sales with decision makers and Microsoft CSAMs, using demos and PoCs to influence solution design and enable production deployments.

Build trusted relationships with platform leads, co-designing secure, scalable architectures and solutions. Resolve technical blockers and objections, collaborating with engineering to share insights and improve products.

Participate in various meetings to learn about new products, partner tools, apps and open AI models, covering Microsoft Fabric, Microsoft Purview, Azure OpenAI, generative AI, LLMs, RAG, prompt engineering, vector/semantic/hybrid search, embeddings, the AI model catalog, fine-tuning, performance optimization, etc.

Assist team members facing challenges in Microsoft Fabric, Purview Data Governance, Information Protection, Sensitivity Labels, Data Loss Prevention, AI/ML and Azure OpenAI (Copilot, Azure AI Foundry, Azure ML Studio), AI Search, Cognitive Search, Document Intelligence, etc.

Expertise in AI Foundry and app architecture, AI/ML models (agentic AI frameworks, Semantic Kernel, Foundry SDK, Responsible AI) and cloud-native development (APIs, containerization, microservices, event-driven design, Python, .NET).

Lead hands-on engagements (hackathons, PoCs, upskilling executions, code-with sessions, MVPs and architecture workshops) to accelerate adoption of Microsoft's developer tools and cloud platforms and to drive production-scale outcomes.

Build GenAI apps, chatbots and AI/ML models on both low-code and pro-code platforms such as Azure AI Foundry, Azure Machine Learning Studio, Copilot Studio, Microsoft Fabric Data Science & AI/ML, and Databricks, using Python and libraries such as Pandas, NumPy, SciPy, scikit-learn, Keras, PyTorch, Matplotlib, NLTK, Seaborn, Plotly, OpenCV, TensorFlow, Theano, the OpenAI Python SDK, LangChain, Semantic Kernel, AutoGen, etc.
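
Illustrative sketch (not project code): a minimal Azure OpenAI chat call of the kind such a chatbot makes through the OpenAI Python SDK (v1.x); the endpoint, key, API version, deployment name and prompts below are placeholders, not values from any specific engagement.

    # Minimal sketch of an Azure OpenAI chat call (openai Python SDK v1.x).
    # Endpoint, key, API version and deployment name are placeholders.
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    response = client.chat.completions.create(
        model="gpt-4o-deployment",  # Azure deployment name (hypothetical)
        messages=[
            {"role": "system", "content": "You are a helpful assistant for data platform questions."},
            {"role": "user", "content": "Explain what a lakehouse is in two sentences."},
        ],
    )
    print(response.choices[0].message.content)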

Sr. Application Consultant – Design and Development (Lead - Level 8)

Apr 2021 – Oct 2023

CIBC Bank – Project: EDGE Platform, Data Migration, Data Management, Data Quality, Data Consumption and Advanced Data Analytics – Dev & Prod Environments

Gathered requirements from clients, designed business data solutions, and implemented those solutions using Azure-native platform and data-centric tools.

Involved in key strategic solution designs and technical development, including data migration, data consumption and data quality, along with data privacy and security on the cloud platform.

Responsible for creating project stories and tasks in Jira for both design and development, and for creating Confluence pages to document those solutions.

Built a Data Quality product covering ingestion, metadata validation, operational and consumption metadata management, and primarily data profiling and data quality rules, implemented through a main driver program and trigger functions, with an ADF data pipeline for scheduled execution.

Built a Data Quality library focused on business data insight through data profiling and data quality rule execution, driven by a HOCON configuration utility that provides a dynamic, hybrid approach for handling a variety of datasets and tables with different kinds of attributes.
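
Illustrative sketch (not the actual library): a minimal example of the config-driven pattern described above, assuming pyhocon for the HOCON file and PySpark for execution; the file name, dataset path, columns and rule names are hypothetical.

    # Sketch: read data-quality rules from a HOCON file and apply them with PySpark.
    # dq_rules.conf, the dataset path and the rule names are hypothetical examples.
    from pyhocon import ConfigFactory
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-library-sketch").getOrCreate()
    conf = ConfigFactory.parse_file("dq_rules.conf")  # e.g. rules = [{column = "customer_id", check = "not_null"}]

    df = spark.read.parquet(conf.get_string("dataset.path"))
    results = []
    for rule in conf.get_list("rules"):
        col, check = rule["column"], rule["check"]
        if check == "not_null":
            failed = df.filter(F.col(col).isNull()).count()
        elif check == "unique":
            failed = df.count() - df.select(col).distinct().count()
        else:
            continue  # unsupported rule types skipped in this sketch
        results.append((col, check, failed))

    spark.createDataFrame(results, ["column", "rule", "failed_rows"]).show()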

Built data pipelines to migrate customer data held in AWS cloud storage (S3) using ADF, Databricks, ADLS Gen2, Autosys, Git, etc., pushing it to external vendors' servers. Created application design documents, STMs and IA documents for various reports on the Customer Rewards and Consolidated Leads Reporting projects.

Built applications using Azure Databricks (PySpark, Scala Spark, Spark SQL), Python libraries, DQ libraries, etc.

Data Modeling, Schema creation, Data extraction through SQL Queries using MS SQL Server Management Studio

Used technologies such as core SQL, SSIS, SSRS, SSAS, Power BI, SQL Azure, Azure Data Lake, Azure Blob Storage, Azure Data Factory, PolyBase, Azure Synapse Analytics & SQL, Azure Analysis Services, Azure Databricks, Azure ML Studio, etc., plus AWS cloud tools: AWS Glue, EMR, Kinesis, Kinesis Data Firehose, S3, Redshift, QuickSight, Athena, Glacier, SageMaker, Lake Formation, etc.

Contributed to machine learning/AI methodologies, techniques and tools such as Azure ML, Databricks, and Python libraries (Pandas, NumPy, SciPy, scikit-learn, Matplotlib, Keras, Plotly, TensorFlow, etc.).

Proprietary and open-source tools: Ataccama, Confluence/Jira, Apache Hudi, Iceberg, Doris, Dash, AWS Deequ, and the Cloudera platform technology stack (Sqoop, Hive, HBase, Kafka, etc.).

Sr. Business Intelligence Developer IV – Team Lead

Mar 2021 – Apr 2021

TD Bank – Project: Data Conversion/Transformation (SAS, Python Based Reports), Data Marts, Data Migration, Business Intelligence and Reporting, Testing – Dev & Prod Environments

Built high-level and low-level report development plans for the team and work trackers for all assigned reports.

Worked with various LOBs and stakeholders to obtain code, business logic, report templates/samples, and storage locations such as data warehouses, data marts, data cubes or network drives, in formats such as Excel and CSV or shared through email.

Responsible for resolving report blockers and data acquisition issues, and for understanding the reports as well as code complexity and business logic.

Performed data conversion/transformation from SAS and Python code to SQL or stored procedures, storing results in data warehouses, data marts, data cubes, and Excel and CSV files.

Data Modeling, Schema creation, Data extraction through SQL Queries using MS SQL Server Management Studio

Data migration using SSIS; BI, data analytics and reporting using SSAS, SSRS and Tableau Server.

For automation and job scheduling – using Alteryx, Bitbucket, Jira, and for documentation – Confluence

Sr. System Developer – ETL, Relational Databases, Business Intelligence & Analytics (Microsoft Azure Platform)

Mar 2020 – Sep 2020

The Cooperators Insurance – Projects: Data Migration, Management and Modernization (ETL to Microsoft Azure Platform, BI & Analytics on Azure – Dev & Prod Environments)

COOP Project 2: Created ETL jobs in Informatica (Workflow Manager, Designer and Workflow Monitor) from data sources such as DB2, Oracle DB and MS SQL Server legacy schemas as well as flat files; performed complex transformations on attributes/data elements of insurance and financial data tables using regular expressions, conditions, etc., storing results into new schemas/targets while capturing data changes (CDC) from the backend in the Dev environment and promoting to the Prod environment.

Debugged, fixed and extended existing ETL code for various insurance and financial jobs/tasks, including mappings, transformations, sessions and workflows; created new ETL jobs, tested both existing and new jobs, and fixed logic for attribute/data element calculations, aggregation, filtering and joins across tables and schemas, with lookups from flat files and databases.

Created BI reports on a daily, weekly and monthly basis and shared them with different stakeholders, running the relevant workflows on data input files provided per requirements and delivering output in CSV format.

Introduced a new development platform and environment using Python and the Hadoop & Spark ecosystems for creating ETL jobs and data pipelines, unit testing, high-performance data mining and extraction, reporting, and visual and graph-based analytics of insurance and financial data, using Anaconda and libraries such as NumPy, scikit-learn, SciPy, Pandas, Keras, Dask, Matplotlib, Plotly, Scrapy, etc.
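
Illustrative sketch (not project code): a small pandas extract-transform-report step of the kind this environment supported; file names and columns are invented for the example.

    # Sketch of a pandas ETL/reporting step: load, cleanse, aggregate, export.
    # File names and column names are illustrative, not actual project data.
    import pandas as pd

    claims = pd.read_csv("claims_extract.csv", parse_dates=["claim_date"])

    # Basic cleansing before aggregation.
    claims["claim_amount"] = pd.to_numeric(claims["claim_amount"], errors="coerce")
    claims = claims.dropna(subset=["policy_id", "claim_amount"])

    # Monthly aggregate for a recurring report.
    monthly = (
        claims.groupby(claims["claim_date"].dt.to_period("M"))["claim_amount"]
        .agg(["count", "sum", "mean"])
        .rename(columns={"count": "claims", "sum": "total_paid", "mean": "avg_paid"})
    )
    monthly.to_csv("monthly_claims_report.csv")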

COOP Project 1: Created ETL jobs in Informatica (Workflow Manager, Designer and Workflow Monitor) from data sources such as Netezza, DB2 and Oracle DB legacy schemas; performed transformations on table attributes/data elements using regular expressions, conditions, etc., and stored results into new schemas while capturing data changes (CDC) from the backend for the Dev and Prod environments.

Designed a new enterprise hybrid data platform architecture and data models on Azure, integrated with existing systems and other technologies, and created detailed data standards and policies covering privacy, security, data governance, management and other data-related aspects, to meet business and operational needs for the 2020-21 technology modernization plan.

Used Java for development and JAR files for ETL jobs and workflow execution, plus Bash/shell scripting to write ETL job scripts, including parameter files for complete ETL process/job execution in the DEV and PROD environments.

Used Aginity Workbench for change data capture (CDC), schema creation and query execution, and IBM Data Studio for migration into new schemas and tables, running and testing workflows and jobs.

Built data pipelines and designed bulk loads into Microsoft Azure Blob Storage, MS SQL Server and Azure Data Lake Storage Gen2 using Azure Data Factory, Snowflake, SSIS, Azure Synapse and Databricks, for data migration, transformation and analytics in custom solutions built on the existing data platform.

Completed Microsoft Azure training covering Azure Synapse Analytics, SQL Data Warehouse, cloud data warehousing, Azure Data Factory and Databricks, plus a training/exercise session on Attunity for CDC and data migration into the cloud.

Sr. Data & Cloud Engineer – Big Data, ETL, Data Architecture, AI & Machine Learning, Graph Databases (Google Cloud Platforms)

Sep 2019 – Jan 2020

Scotiabank – Projects: PLATO (AURORA) ETL, Data Platform Architecture, ML & Graph Based Data Analytics, Orchestration – Production Environment

Built custom applications for petabyte-scale data migration, processing, transformation and storage on Google Cloud Platform from various data sources, driven by data platform projects, client requirements and Cloud Pub/Sub message orchestration.

Worked with Python, Python unit tests and PySpark for data migration (ETL), transformation and unit testing of Pub/Sub message orchestration, using DevOps tooling (Confluence, JIRA, Bitbucket), Jenkins pipelines, Jinja and YAML.

Worked in Python to build custom applications, with unit testing, Git repositories and automation; created tables in BigQuery, used Cloud Functions and Cloud Pub/Sub packages, and ran deployment tests before deploying to production.
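
Illustrative sketch (not project code): publishing an orchestration message to Cloud Pub/Sub with the google-cloud-pubsub client, as this kind of Python application does; the project ID, topic name and message fields are placeholders.

    # Sketch: publish an ETL orchestration event to Cloud Pub/Sub.
    # Project ID, topic name and message contents are placeholders.
    import json
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-gcp-project", "etl-events")

    message = {"table": "transactions", "status": "LOADED", "rows": 125000}
    future = publisher.publish(topic_path, json.dumps(message).encode("utf-8"))
    print("Published message id:", future.result())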

Executed jobs through Jenkins pipelines for testing in the DEV and QA environments before promotion to PROD, and through Control-M for scheduling and final testing in DEV and QA.

Built data pipelines in Google Cloud Data Fusion, Autoflow, NiFi, Cloud Dataflow, Dataproc and custom-built tools to migrate and transform data into Google Cloud Platform.

Built high-level and low-level architecture designs for various information systems, including orchestration, ETL, the data platform, a graph-based analytics platform, and machine learning architectures for use cases such as fraud detection, anti-money laundering, classification and recommender systems.

For continuous delivery, integration and orchestration – Jenkins, Confluence, JIRA, Bitbucket, Jinja, YAML, Kubernetes, Cloud Functions and services, etc.

Worked with stakeholders' product design, operations and development teams on application builds, data modeling and schemas, and data strategies such as data migration, data transformation, data governance, data quality, metadata management, data analytics and reporting.

Developed applications for graph-based data visualization use cases using Neo4j and associated graph analytics tools, such as anti-money laundering (visualizing origin-to-destination money transfers with amounts and flagging over-limit transfers by ID) and web clickstreams (visualizing and analyzing consumer preferences for various products and services).
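
Illustrative sketch (not project code): the kind of over-limit transfer query behind such an AML graph view, using the Neo4j Python driver; connection details, node labels, relationship type and the threshold are hypothetical.

    # Sketch: flag origin-to-destination transfers above a limit in Neo4j.
    # URI, credentials, labels and the 10,000 threshold are illustrative.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    query = """
    MATCH (src:Account)-[t:TRANSFERRED_TO]->(dst:Account)
    WHERE t.amount > $limit
    RETURN src.id AS origin, dst.id AS destination, t.amount AS amount
    ORDER BY t.amount DESC
    """

    with driver.session() as session:
        for record in session.run(query, limit=10000):
            print(record["origin"], "->", record["destination"], record["amount"])
    driver.close()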

Developed ML applications in Jupyter (Anaconda, PySpark, LookML, and ML/DL/AI libraries such as NumPy, SciPy, scikit-learn, Keras, Pandas, Dask, Scrapy, Matplotlib, Plotly, pydot, Seaborn, Bokeh, etc.) for use cases such as fraud detection, anti-money laundering and recommender systems. Manipulated use-case data with these libraries and built models for scientific calculations and predictions using supervised, semi-supervised and unsupervised learning algorithms: regression, classification, clustering, decision trees, anomaly detection, etc.
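
Illustrative sketch (not project code): a minimal supervised classification setup of the kind used for fraud detection, with scikit-learn on synthetic data; the features, labels and parameters are invented for the example.

    # Sketch: train and evaluate a fraud-style classifier on synthetic data.
    # The features and the rare positive label are generated, not real data.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(42)
    X = rng.normal(size=(5000, 6))  # e.g. amount, frequency, geo-derived features
    y = (X[:, 0] + rng.normal(scale=0.5, size=5000) > 2).astype(int)  # rare "fraud" label

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))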

Sr. Application Consultant – Big Data Solution Architect & Development Lead (Azure & AWS Cloud Platforms)

Apr 2019 – July 2019

Capgemini – Client Sites at Brampton – Projects: Microsoft Azure, Hadoop and Hive Data Platform in Development & Production – Environment

Built custom applications for petabyte-scale data migration, processing, transformation and storage on the on-premises HDP platform and Microsoft Azure, sourcing from Oracle Exadata and other systems per project requirements, including building Sqoop jobs for on-premises data migration.

Worked on a PySpark data migration and transformation project for cable, wireless and CRM user data, performing ETL and transformation of a number of structured tables from Oracle Exadata.
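
Illustrative sketch (not project code): a PySpark JDBC extract from Oracle and load to HDFS as Parquet, the general shape of such a migration job; the JDBC URL, credentials, table and target path are placeholders.

    # Sketch: extract a table from Oracle (Exadata) over JDBC and land it as Parquet on HDFS.
    # JDBC URL, credentials, source table and target path are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("exadata-migration-sketch").enableHiveSupport().getOrCreate()

    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//exadata-host:1521/SERVICE")
        .option("dbtable", "CRM.CUSTOMERS")
        .option("user", "etl_user")
        .option("password", "***")
        .option("driver", "oracle.jdbc.OracleDriver")
        .load()
    )

    # Light transformation before landing in the data lake.
    df = df.dropDuplicates(["CUSTOMER_ID"]).filter("STATUS = 'ACTIVE'")
    df.write.mode("overwrite").parquet("hdfs:///data/landing/crm/customers")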

Worked on a custom-built Sqoop framework, Java applications, shell scripts and HQL scripts for data migration and transformation from Oracle Exadata to HDP in the DEV, QA and PROD environments, and performed testing.

Maintained Sqoop and PySpark job code repositories in GitLab, merged into stable DEV, QA and PROD branches, and created unique tags and YML files for copying and executing HQL and PySpark code, parameter files and shell scripts in the DEV, QA and PROD environments.

Executed jobs through Jenkins pipelines for testing in the DEV and QA environments before promotion to PROD, and through Control-M for scheduling and final testing in DEV and QA.

Created STMs (Source & Target Matrix) for various tables and data attributes/fields for identifying and preparing the final attributes, filters and transformation for the Sqoop and PySpark jobs.

Built data pipelines in Azure using ADF, Snowflake and Azure Databricks to move and transform data into the Microsoft Azure cloud (Azure Blob Storage, Azure Data Lake Gen2, Cosmos DB, Azure SQL Database), with HDInsight Interactive Query (Hive LLAP), SSIS, Azure Stream Analytics and Azure Data Studio for analytics.

Used Databricks notebooks for batch and stream data processing and advanced data analytics in the Azure cloud.

Created new and upgraded schemas (tables and columns) using Spark with Scala and PySpark, writing them into Azure Blob Storage and the SQL Server warehouse for data extraction, analytics and management reporting (production environment).

Debugged Scala and PySpark code, wrote utilities and functions for various applications, performed code reviews (pull requests) in Git, and merged code from branch to master (production environment).

For Continuous Delivery, Integration, Orchestration – Jenkins, Control-M, GitLab, Azure DevOps Services

Worked with stakeholders' product design, operations and development teams on application builds, data modeling and schemas, and data strategies such as data migration, data transformation, data governance, data quality, metadata management, data analytics and reporting.

Sr. Data Engineer (Big Data – Google Cloud Platform)

Nov 2018 – Jan 2019

BBMTek (Creative Media Works) at Mississauga – Project: Data Platform for Mobile Apps Services (Android and iOS) Users in Production Environment

Built custom applications for petabyte-scale data migration, processing, transformation and storage in GCP, loading data from various sources into the data platform per daily requirements and technical challenges for both Android and iOS app users.

Technologies used: Apache Spark 2.3.2, Scala, ScalaTest, UnitTest, Maven 3.6.0, Jenkins, JFrog, Google Cloud Platform (Google Cloud Storage, BigQuery, Dataproc, Pub/Sub, Kubernetes), Apache Zeppelin, Apache Airflow, Tableau, NoSQL databases (Cassandra), HDFS, file format and storage tools (Parquet files, Avro files, Hive Metastore), Git, GitHub, IntelliJ IDEA (IDE), batch and stream processing (Spark & MapReduce), etc.

95% of the work involved writing Spark/Scala code and ScalaTest/UnitTest tests for ETL, migrating data from different data stores and sources (e.g., Plenty, Kochava, Freshdesk, Parquet and Avro files) to Google BigQuery.
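
Illustrative sketch of the ETL-to-BigQuery pattern, shown here in PySpark rather than the Scala used on the project; the bucket, dataset and table names are placeholders, and the spark-bigquery connector is assumed to be on the classpath.

    # Sketch: read a landed extract, aggregate, and write to BigQuery via the spark-bigquery connector.
    # Bucket, dataset and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bq-load-sketch").getOrCreate()

    events = spark.read.parquet("gs://app-landing-bucket/events/")  # source extract
    daily = events.groupBy("event_date", "platform").count()        # simple transform

    (
        daily.write.format("bigquery")
        .option("table", "analytics_dataset.daily_events")
        .option("temporaryGcsBucket", "bq-staging-bucket")
        .mode("overwrite")
        .save()
    )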

Created new and upgraded schemas (tables and columns) using Spark and Scala, writing them into BigQuery for data extraction, analytics and management reporting (production environment).

Wrote transformation code in Spark and Scala along with Scala test cases for the transformations, using IntelliJ IDEA and Zeppelin for testing and analysis (production environment).

Debugged Scala code, wrote utilities and functions for various applications, performed code reviews (pull requests) in Git, and merged code from branch to master (production environment).

For Continuous Delivery, Integration, Infrastructure, Orchestration – Jenkins, Apache Airflow (DAG) at Google Cloud Platform using Google SDK, and other Google Cloud Services.

Worked with stakeholders' product design, operations and development teams on application builds, data modeling and schemas, and data strategies such as data migration, data transformation, data governance, data quality, metadata management, data analytics and reporting.

Data Manager (Big Data Solutions, Cloud Architecture, NoSQL DBs and Analytics – Google Cloud Platform)

Aug 2018 – Oct 2018

Nobul Corporation at Toronto – Project: Database and Data Modeling, Migration, Management, and Integration with Web App and Mobile App on Google Cloud

Built Data Migration Strategies on Google Cloud from Firebase buckets for Web App releases V1.0 to V2.0

Defined data migration, mobile app integration, governance and analytics strategies, and built data architecture and database schemas to migrate data from Firebase (Realtime Database) to NoSQL databases/platforms such as BigQuery, Cloud Firestore, MongoDB, Cassandra, HBase, etc.

Ingested structured, semi-structured and unstructured data (e.g., CSV and JSON files, Firebase) into NoSQL databases for scalability and real-time processing, automating jobs for a Canada and USA real estate platform covering buying and selling of residential and commercial properties and a bidding platform for agents acting on behalf of buyers and sellers. Tools used: Talend, Apache Airflow, Matillion ETL, Cloud Data Transfer Service, Pub/Sub and Cloud Shell on GCP, with data migrated into BigQuery, MongoDB and Google Cloud Storage buckets.
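
Illustrative sketch (not project code): loading newline-delimited JSON exports from Cloud Storage into BigQuery with the google-cloud-bigquery client, one of the ingestion shapes described above; the project, bucket, dataset and table names are placeholders.

    # Sketch: load newline-delimited JSON files from GCS into a BigQuery table.
    # Project, bucket, dataset and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")
    table_id = "my-gcp-project.real_estate.listings"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(
        "gs://firebase-exports/listings/*.json", table_id, job_config=job_config
    )
    load_job.result()  # wait for the load to finish
    print("Loaded rows:", client.get_table(table_id).num_rows)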

Built a hybrid environment for data ingestion from Firebase and file formats using JavaScript and Google data transfer and storage services, with processing (MapReduce, Hadoop, Spark) and storage (MongoDB, Redis, Bigtable, HBase), integrated with the Nobul web and mobile apps and with business intelligence and analytics tools for data visualization and reporting for both business and customer intelligence.

Built custom Spark applications for buyers, sellers and agents, surfacing the best deals while processing real-time requests from multiple brokers, and performing real-time analytics on buyers' and sellers' properties and agents' deals (completed or broken), with both batch and real-time processing of Canada and USA property data, including polygons and referrals.

Used the Google Maps Platform (Maps and Places) for location data on USA and Canada properties, and the Google Maps data layer for geospatial data, storing custom data and displaying GeoJSON on Google Maps (through APIs) in our platform so that web and mobile app consumers and brokers could draw polygons on maps for various properties.

Integrated Google Cloud Platform and NoSQL real-time databases with Web App V2.0 and the Android and iOS mobile apps for buyers, sellers and agents, handling property data and deals with real-time streaming.

Built a cloud-native data warehouse supporting data quality, data governance, and data and metadata management; implemented Lambda and Kappa architectures for both batch and real-time data processing jobs using engines such as Hadoop and Spark, with workflow orchestration and stream analytics.

For continuous delivery, integration, infrastructure and orchestration – Nagios, Jenkins, Apache Maven. For repository – GitHub. Cloud tools – Visual Studio, Google Maps Platform.

Worked with stakeholders' product design, operations, QA, front-end and back-end development teams on database management systems (modeling, schemas) and data strategies such as data migration, data transformation, data governance, data quality, metadata management and data analytics.

Performed visual analytics, creating dashboards and reports in Data Studio and doing data mining and analytics in BigQuery; used Datalab to explore, transform and visualize data and to build machine learning models, for example for predictive analysis, buyer and seller consumer behaviour, and market forecasting for individual and business consumers.

Big Data Architect (Enterprise Solutions and Data Architecture – AWS)

Jan 2018 – Apr 2018

Ontario Ministry of Education at Toronto – Project: Case & Grant Management Solutions Branch Employment Ontario Solutions (EOS) Community Service Cluster (CSC)

Built big data and data lake architectures after analyzing the existing system's technical problems, challenges and requirements; designed a model for data scope, consumption patterns, services, and data governance with security (masking, column- and row-level security), metadata management and data lineage.

Deployed a Hortonworks HDP/HDF cluster in AWS (RHEL/CentOS cloud environment); administered and monitored the HDP/HDF environment using Ambari; installed and configured services and frameworks, managed resources, and installed and configured third-party tools with HDP, e.g., Informatica (ETL), Crystal Reports and Tableau (for BI reporting and analytics).

Hortonworks HDP/HDF in AWS Cloud – Ambari, Hadoop, HDFS, YARN, Slider, HBase, Hive, Tez, Phoenix, MapReduce, Spark, Storm, Solr, Pig, Hue, Sqoop, Flume, Oozie, Zookeeper, Zeppelin, NiFi, Cloudbreak, Ranger, Knox, Falcon, Atlas, NFS, HDFS Encryption, WebHDFS. Amazon – AWS Glue (ETL), EMR, Kinesis, Kinesis Data Firehose, Streams and Analytics, Redshift (DWH), Athena (Interactive Query Service), Elasticsearch Service, DynamoDB, Lambda, S3 (Object Store), SageMaker (ML), Lake Formation, QuickSight, Glacier, etc.

Ingested unstructured data (e.g., log files/syslogs) using Flume, semi-structured data (file formats such as CSV and JSON), and structured data into HDFS and the Hive data warehouse from Oracle DB and from various data warehouses and data marts; moved RDBMS data to HDFS and Hive using Sqoop and NiFi; automated all jobs for extracting data from Oracle DB and file formats into HDFS and Hive, performing transformations using Spark.
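
Illustrative sketch (not project code): a Spark step that cleans CSV files landed on HDFS and loads them into a Hive table, the transformation stage of the flow described above; the paths, database, table and column names are hypothetical.

    # Sketch: transform CSV files on HDFS and load them into a Hive table with Spark.
    # Paths, database, table and column names are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("hdfs-to-hive-sketch").enableHiveSupport().getOrCreate()

    cases = spark.read.option("header", True).csv("hdfs:///landing/eos/cases/")

    cleaned = (
        cases.withColumn("grant_amount", F.col("grant_amount").cast("double"))
        .filter(F.col("case_id").isNotNull())
    )

    cleaned.write.mode("overwrite").saveAsTable("eos_dw.case_grants")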

Built a hybrid environment for data ingestion from various data warehouses and data marts, with processing (MapReduce, Tez, Spark) and storage (HDFS, Hive, HBase), integrated with multiple business intelligence and data visualization tools (Crystal Reports, IBM Cognos, Tableau Server/Desktop, QlikView, etc.).

Built Hadoop- and Hive-based data warehouses and integrated Hadoop with Oracle data warehouse systems; developed HBase schemas to migrate data from HDFS, a big data solution for handling millions of records.

Worked on microservices frameworks and deployed Lambda and Kappa architectures for both batch and real-time data processing jobs; designed various Oozie workflows with coordinator jobs to execute different data pipelines.

Metadata management, data quality, data governance and security using Apache Atlas, Falcon, Ranger and Knox for data-, column- and record-level access control with data masking.

For automation and virtualization (containers) – Docker, Virtual Box, JBoss (SW). For Continuous Delivery, Integration, Infrastructure, Orchestration – Nagios, Jenkins, Bamboo, Puppet, Apache Maven, NewRelic

IDEs – Eclipse, Pycharm, IntelliJ Idea, Spark-Shell (Scala, SBT, Python), Testing Tools – JTest, Junit, PyTest, PyUnit, ScalaTest, UnitTest, JMeter, Selenium, TestStudio, etc. Version Control System – Git/Github, GitLab.

Developed Hive queries using Beeline, Hue, Tez and Spark SQL; used Crystal Reports and IBM Cognos for BI reporting and data analysis, and Tableau for data visualization.

Big Data and Metadata Management Consultant – Dev. & Admin.

Feb 2017 – July 2017

Canadian Imperial Bank of Commerce (CIBC) at Toronto – Project: Data Consumption Services Enterprise Data Technology Corporate Centre Technology

Administered Microsoft Azure and AWS: created and managed VMs, e.g., a terminal server (Windows Server 2016) and Cloudera VMs on CentOS (master, worker, and gateway/edge nodes); managed Windows Server 2016 Active Directory (created users and groups); created RSA key pairs (public and private) and provided secure access through Kerberos tickets; and used LDAP, group policy administration and role-based access security management.

Deployed various vendor products, e.g., Datameer, Oracle Financial Services Analytical Applications, Vertica, IBM InfoSphere IGC & DataStage, Tableau, Syncsort (ETL), and Schedule1 DataPassports.

Managed and developed in Cloudera Technology Stack – Cloudera – Director, Manager and Navigator, Hadoop, HDFS, YARN, HBase, Kudu, Hive, Impala, MapReduce, Spark, Parquet, Avro, HCatalog, Solr, Pig, Cloudera Search, Sentry, Hue, Sqoop, Flume, Confluent, Kafka, Oozie, Zookeeper, Docker, Kubernetes

Microsoft – Data Factory, Snowflake, HDInsight, ADLS Gen 1 & Gen 2, Azure Stream Analytics, Data Lake Analytics, Data Explorer, Machine Learning, Databricks, SQL Data Warehouse, SSIS, SSRS, SSAS.

Protected data in transit via a data protection strategy using SSL/TLS protocols to exchange data across locations, with appropriate safeguards such as HTTPS or VPN, e.g., Azure VPN Gateway to send encrypted traffic from on-premises locations over the public internet.

Certified Trained Cloudera Developer for Spark and Hadoop, Certified Administrator for Cloudera technologies

Worked in agile environment with agile methodologies, Designed and built financial applications (Corporate,


