
Data Quality Migration

Location:
Mississauga, ON, Canada
Salary:
$150,000 - $170,000
Posted:
November 07, 2023

Resume:

ABDUL SABOOR

Address: **** ******* ******, ***********, *******, L5N 5Y2, CANADA.

Cell: 905-***-**** E-Mail: ad0xgc@r.postjobfree.com

Professional working experience in the Big Data technology stack – the Apache Hadoop and Apache Spark ecosystems as well as the Cloudera (CDH) and Hortonworks (HDP, HDF) development platforms. Hands-on coding and testing in Scala, Python, Java, and APIs; development using microservices architecture, containerized applications, Lambda, Kappa, and Zeta architectures, and the CAP theorem. Proficient in cloud platforms & services – Microsoft Azure, Amazon AWS, GCP – including compute, storage, security, networking, CI/CD, data lake, DWH, and advanced analytics – AI, ML & DL.

Work Experience

Sr. Application Consultant – Design and Development (Lead - Level 8)

Apr 2021 – Present

CIBC Bank – Project: EDGE Platform, Data Migration, Data Management, Data Quality, Data Consumption and Advanced Data Analytics – Dev & Prod Environments

Gather requirements from clients, design business data solutions, and implement those solutions using the Azure-native platform and data-centric tools.

Involved in key strategic solution designs as well as technical development, including data migration, data consumption, and data quality, along with data privacy and security on the cloud platform.

Responsible for creating project stories and tasks in Jira for both design and development, and for creating Confluence pages documenting those solutions.

Built a Data Quality product covering ingestion, metadata validation, operational and consumption metadata management, and, centrally, data profiling and data quality rules implemented through a main driver program and trigger functions; also built the ADF data pipeline for scheduled execution.

Built a Data Quality library focused on business data insight through data profiling and data quality rule execution, driven by a HOCON configuration utility for a dynamic, hybrid approach to handling a variety of datasets and tables with different kinds of attributes (illustrative sketch below).
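
A minimal sketch of the configuration-driven pattern described above, assuming pyhocon and PySpark; the table name, path, and rules are hypothetical and not the actual library.

# Hypothetical, configuration-driven data-quality checks (not the original library).
from pyhocon import ConfigFactory
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-library-sketch").getOrCreate()

conf = ConfigFactory.parse_string("""
  tables = [
    { name = "customer", path = "/mnt/raw/customer", rules = [
        { column = "customer_id", check = "not_null" },
        { column = "email",       check = "unique"   }
      ] }
  ]
""")

for table in conf["tables"]:
    df = spark.read.parquet(table["path"])      # hypothetical path
    total = df.count()
    for rule in table["rules"]:
        col, check = rule["column"], rule["check"]
        if check == "not_null":
            failed = df.filter(F.col(col).isNull()).count()
        elif check == "unique":
            failed = total - df.select(col).distinct().count()
        else:
            continue                            # unknown rule types are skipped
        print(f"{table['name']}.{col} {check}: {failed} failing rows of {total}")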

Building data pipelines to migrate customer data to AWS cloud storage (S3) using ADF, Databricks, ADLS Gen2, Autosys, Git, etc., and pushing it to external vendors' servers. Created application design documents, STMs, and IA documents for various reports for the Customer Rewards and Consolidated Leads Reporting projects.

Build apps using Azure Databricks – PySpark, Scala-Spark, SparkSQL, and Python Libraries, DQ Libraries, etc.

Data Modeling, Schema creation, Data extraction through SQL Queries using MS SQL Server Management Studio

Using various technologies such as Core SQL, SSIS, SSRS, SSAS, Power BI, SQL Azure, Azure Data Lake, Azure Blob Storage, Azure Data Factory, Polybase, Azure Synapse, Azure Analytics Services, Azure Databricks, etc.

Contributing to Machine Learning/AI methodologies, techniques, and tools such as Azure ML, Databricks, and Python libraries – Pandas, NumPy, SciPy, Scikit-Learn, Matplotlib, Keras, Plotly, TF, etc.

Sr. Business Intelligence Developer IV – Team Lead

Mar 2021 – Apr 2021

TD Bank – Project: Data Conversion/Transformation (SAS, Python Based Reports), Data Marts, Data Migration, Business Intelligence and Reporting, Testing – Dev & Prod Environments

Built high- and low-level report development plans for the team and work trackers for all assigned reports.

Worked with various LOBs/stakeholders to obtain code, business logic, report templates/samples, and storage locations such as data warehouses, data marts, data cubes, or network drives, with formats such as Excel, CSV, etc., or shared through email.

Responsible for resolving report blockers and data acquisition issues, and for understanding the reports as well as code complexity and business logic.

Performed data conversion/transformation from SAS and Python code to SQL or stored procedures, storing results in data warehouses, data marts, data cubes, and Excel/CSV files.

Data Modeling, Schema creation, Data extraction through SQL Queries using MS SQL Server Management Studio

Data migration using SSIS; BI, data analytics, and reporting using SSAS and SSRS; Tableau Server for reporting.

For automation and job scheduling – using Alteryx, Bitbucket, Jira, and for documentation – Confluence

Sr. System Developer – ETL, Relational Databases, Business Intelligence & Analytics (Microsoft Azure Platform)

Mar 2020 – Sep 2020

The Cooperators Insurance – Projects: Data Migration, Management and Modernization (ETL to Microsoft Azure Platform, BI & Analytics on Azure – Dev & Prod Environments)

COOP-Project 2: Created ETL jobs from various data sources such as DB2, Oracle DB, and MS SQL Server (old schemas) as well as flat files in Informatica – using the Designer, Workflow Manager, and Workflow Monitor – performing complex transformations on various attributes/data elements of insurance and financial data tables using regular expressions, conditions, etc., storing into new schemas/targets while capturing data changes (CDC) from the backend in the Dev environment and promoting to the Prod environment.

Debugged, fixed, and extended existing ETL code for various insurance and financial jobs/tasks, including mappings, transformations, sessions, and workflows; created new ETL jobs and tested both existing and new jobs; and fixed logic for attribute/data element calculations, aggregation, filtering, and joining of tables across schemas, with lookups from flat files and DBs.

Created BI reports on a daily, weekly, and monthly basis and shared them with different stakeholders, executing the relevant workflows against the input files provided per requirements and delivering output in CSV format.

Introduced a new development platform and environment using Python and the Hadoop & Spark ecosystems for creating ETL jobs and data pipelines, unit testing, high-performance data mining and extraction, reporting, and visual and graph-based analytics of insurance and financial data using Anaconda and libraries such as NumPy, Scikit-Learn, SciPy, Pandas, Keras, Dask, Matplotlib, Plotly, Scrapy, etc.

COOP-Project 1: Created ETL jobs from various data sources such as Netezza, DB2, and Oracle DB (old schemas) in Informatica – Designer, Workflow Manager, and Workflow Monitor – performing transformations on various attributes/data elements of tables using regular expressions, conditions, etc., and storing into new schemas while capturing data changes (CDC) from the backend for the Dev & Prod environments.

Designed a new enterprise hybrid data platform architecture and data models on Azure, with integration to existing systems and various other technologies, and created detailed data standards and policies covering privacy, security, data governance, management, and numerous other data-related aspects to fulfill business and operational needs for the 2020–21 technology modernization plan.

Used Java for development and JAR files for ETL jobs and workflow execution; wrote Bash/shell scripts for ETL jobs, including param files, for complete ETL process/job execution in the DEV & PROD environments.

Used Aginity Workbench for Change Data Capture (CDC), schema creation, and query execution, and IBM Data Studio for migrations into new schemas and tables, running and testing workflows and jobs.

Built data pipelines and designed bulk loads to Microsoft Azure Blob Storage, MS SQL Server, and Azure Data Lake Storage Gen2 using Azure Data Factory, Snowflake, SSIS, Azure Synapse, and Databricks for data migration, transformation, and analytics for custom solutions built on the existing data platform (illustrative sketch below).
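
A minimal PySpark sketch of the Databricks-style bulk-load pattern described above; the mount point, storage account, container, and column names are hypothetical, and cluster credentials for ADLS Gen2 are assumed to be configured separately.

# Hypothetical bulk load: staged CSVs -> cleaned Parquet in ADLS Gen2.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bulk-load-adls-sketch").getOrCreate()

src = spark.read.option("header", True).csv("dbfs:/mnt/staging/policies/*.csv")
cleaned = src.dropDuplicates(["policy_id"]).filter("status IS NOT NULL")

cleaned.write.mode("overwrite").parquet(
    "abfss://curated@examplestorageacct.dfs.core.windows.net/insurance/policies"
)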

Completed Microsoft Azure training, including Azure Synapse Analytics, SQL Data Warehouse, cloud data warehousing, Azure Data Factory, and Databricks, as well as a training/exercise session on Attunity for CDC and data migration to the cloud.

Sr. Data & Cloud Engineer – Big Data, ETL, Data Architecture, AI & Machine Learning, Graph Databases (Google Cloud Platforms)

Sep 2019 – Jan 2020

Scotiabank – Projects: PLATO (AURORA) ETL, Data Platform Architecture, ML & Graph Based Data Analytics, Orchestration – Production Environment

Built custom applications for petabyte-scale data migration, processing, transformation, and storage on Google Cloud Platform from various data sources, based on data platform project needs, client requirements, and Cloud Pub/Sub message orchestration.

Worked in Python, Python unit tests, and PySpark for data migration (ETL), transformation, and unit testing of Pub/Sub message orchestration, using DevOps tooling (Confluence, JIRA, Bitbucket), Jenkins pipelines, Jinja, and YAML.

Worked in Python to build custom applications, with unit testing, Git repositories, and automation; created tables using the BigQuery, Cloud Functions, and Cloud Pub/Sub packages; performed deployment unit testing and deployed to production (illustrative sketch below).
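
A minimal sketch of the Pub/Sub-plus-BigQuery pattern above using the Google Cloud Python clients; the project, topic, dataset, table, and schema are hypothetical.

# Hypothetical orchestration message and target-table creation (not the bank's code).
import json
from google.cloud import bigquery, pubsub_v1

PROJECT = "example-project"

# Publish an orchestration message to a topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, "ingest-events")
publisher.publish(topic_path, json.dumps({"table": "trades", "run_id": 42}).encode("utf-8")).result()

# Create the target BigQuery table if it does not already exist.
bq = bigquery.Client(project=PROJECT)
schema = [
    bigquery.SchemaField("trade_id", "STRING"),
    bigquery.SchemaField("amount", "NUMERIC"),
]
bq.create_table(bigquery.Table(f"{PROJECT}.curated.trades", schema=schema), exists_ok=True)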

Executed jobs through the Jenkins data pipeline for testing in the DEV and QA environments before moving to PROD, and through Control-M for scheduling and final testing in DEV & QA.

Built data pipelines in Google Cloud Data Fusion, Autoflow, NiFi, Cloud Dataflow, Dataproc, and custom-built tools to migrate and transform data into Google Cloud Platform.

Built high-level and low-level architecture designs for various information systems, including orchestration, ETL, data platform, graph-based analytical platform, and machine learning architectures for use cases such as fraud detection, anti-money laundering, classification, recommender systems, etc.

For continuous delivery, integration, and orchestration – Jenkins, Confluence, JIRA, Bitbucket, Jinja, YAML, Kubernetes, Cloud Functions and services, etc.

Worked with stakeholders' product design, operations, and development teams on application builds and on modeling, schema, and data strategies such as data migration, data transformation, data governance, data quality, metadata management, data analytics, and reporting.

Developed graph-based data visualization and analytics applications using Neo4j and associated tools, e.g., an anti-money laundering use case visualizing origin-to-destination money transfers with amounts and flagging over-limit transfers by ID, and a web clickstream use case visualizing and analyzing consumer preferences for various products and services (illustrative sketch below).
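
A minimal sketch of the anti-money-laundering style query described above, using the Neo4j Python driver; the connection URI, credentials, labels, and threshold are hypothetical.

# Hypothetical AML query: flag transfers above a limit, origin to destination.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

AML_QUERY = """
MATCH (src:Account)-[t:TRANSFER]->(dst:Account)
WHERE t.amount > $limit
RETURN src.id AS origin, dst.id AS destination, t.amount AS amount
ORDER BY t.amount DESC
"""

with driver.session() as session:
    for record in session.run(AML_QUERY, limit=10000):
        print(record["origin"], "->", record["destination"], record["amount"])

driver.close()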

Developed ML applications using Jupyter (Anaconda, PySpark, and ML/DL/AI libraries such as NumPy, SciPy, Scikit-Learn, Keras, Pandas, Dask, Scrapy, Matplotlib, Plotly, pydot, Seaborn, Bokeh, etc.) for use cases such as fraud detection, anti-money laundering, and recommender systems. Manipulated use-case data with these libraries and built models to perform scientific calculations and predictions based on supervised, semi-supervised, and unsupervised learning algorithms such as regression, classification, clustering, decision trees, and anomaly detection (illustrative sketch below).
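
A minimal scikit-learn sketch of the supervised classification approach mentioned above; a synthetic, imbalanced dataset stands in for real transaction data, and the features and class ratio are assumptions.

# Hypothetical fraud-detection classifier trained on synthetic, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))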

Big Data & Data Science Expert – Data Architecture, DL, DWH, BI, Analytics. Data & Cloud Platforms – Cloudera & Hortonworks, Azure, AWS, GCP

Jan 2018 – Present

CTO (Co-Founder) – Creative Spree Inc. at Mississauga

Specialties in Data Engineering – Building Big Data Platform Architecture, Creating Data Pipelines, Data Transformation & Cleansing, Data Migration and Storage into High Performance & Scalable Storage Tools, Creating Data Lake & Data Warehouse, Performing Advanced Analytics.

Specialties in Data Science – building ML models for various use cases and kinds of problems using supervised, semi-supervised, and unsupervised learning algorithms for predictive analysis.

Expertise in Cloud Platforms, Technologies and Services – Microsoft Azure, Google Cloud Platform, Amazon AWS including platforms Compute, Storage, Databases, Developer Tools, DevOps, Containers, Networking, Identity, Integration, Migration, Analytics, AI and Machine Learning applications, tools and services.

Expertise in Big Data Platforms – Cloudera and Hortonworks Big Data, Data Migration, Data Transformation, Data Storage, Data Governance and Management, Data Privacy and Security, and also Data Science Platforms or Environments and Tools

Gather business requirements from key stakeholders and translate them into comprehensive data solutions involving data migration and transformation strategies into the cloud platform, creating a data lake as well as a data warehouse along with data pipelines for batch and stream processing.

Amazon – RedShift (DWH), Kinesis Firehose, Streams and Analytics, Athena (Interactive Query Service), EMR, ElasticSearch Service, DynamoDB, Lambda, Amazon Machine Learning, S3 (Object Store)

Microsoft – Data Factory, Snowflake, Azure Blob Storage, HDInsight, ADLS Gen2, Azure Stream Analytics, Data Lake Analytics, Data Explorer, Machine Learning, Databricks, SQL Data Warehouse, SSIS, SSRS, SSAS.

Protect data in transit through a data protection strategy using SSL/TLS protocols; to exchange data across different locations, apply appropriate safeguards such as HTTPS or a VPN, e.g., Azure VPN Gateway, which sends encrypted traffic from on-premises locations over the public internet.

Ingested unstructured data (using Flume, Kafka), semi-structured data (NiFi, Talend, StreamSets, Kafka, CLI, etc.), and structured RDBMS data (using Sqoop) into HDFS from MySQL, Oracle, and NoSQL data sources. Automated all jobs for extracting data, and created data pipelines, from sources such as MySQL, Oracle, and PostgreSQL, from schema-based file formats (Parquet, Avro), and from JSON, XML, and CSV files into HDFS; used ETL tools for ingestion – Talend, Syncsort, Informatica, NiFi, Pentaho, Apache Airflow/Azkaban – and performed transformations using Spark, Hive, and Pig while loading into HDFS/HBase (illustrative sketch below).
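
A minimal PySpark Structured Streaming sketch of one ingestion path mentioned above – semi-structured JSON events from Kafka landed in HDFS as Parquet; the broker, topic, schema, and paths are hypothetical, and the Spark Kafka connector package is assumed to be on the classpath.

# Hypothetical Kafka -> HDFS (Parquet) streaming ingestion.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

schema = StructType([StructField("event_id", StringType()), StructField("payload", StringType())])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

(events.writeStream
 .format("parquet")
 .option("path", "hdfs:///data/raw/events")
 .option("checkpointLocation", "hdfs:///checkpoints/events")
 .start()
 .awaitTermination())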

Deployed a Cassandra cluster in the cloud, designed data models, and worked with Cassandra Query Language. Handled Cassandra DB modeling, data storage, administration, performance tuning, backup and recovery, and DB maintenance, and migrated several databases from on-premises to Cassandra. Created, ran, and monitored Spark jobs on the Cassandra cluster for validation and loading from various sources, e.g., Oracle DB (illustrative sketch below).
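
A minimal PySpark sketch of the validate-then-load pattern into Cassandra described above; the keyspace, table, and source path are hypothetical, and the spark-cassandra-connector is assumed to be on the classpath.

# Hypothetical validation and load of staged data into Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cassandra-load-sketch").getOrCreate()

accounts = spark.read.parquet("/data/staging/accounts")  # e.g., previously extracted from Oracle DB

# Simple validation: reject rows with a null partition key before loading.
valid = accounts.filter(F.col("account_id").isNotNull())
print("rejected rows:", accounts.count() - valid.count())

(valid.write.format("org.apache.spark.sql.cassandra")
 .options(keyspace="core", table="accounts")
 .mode("append")
 .save())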

Write Scala-Spark and PySpark code for data migration (ETL), data transformation, and unit tests, and MapReduce Java programs to analyze log data for large-scale datasets; use Oozie to orchestrate the MapReduce jobs that extract the data on a schedule.

Write Hive queries for analyzing data in the Hive warehouse using Hive Query Language, as well as Impala queries.

Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Apache Pig and Java-based MapReduce; tools used for data transformation and analysis included Pig and Hive.

Designed a real-time analytics and ingestion platform using Storm and Kafka. Hands-on experience with multiple NoSQL databases including HBase, MongoDB, Redis, Neo4j, OrientDB, and Cassandra. Catalog, index, and search – HCatalog, Solr, Elasticsearch, Apache Atlas (metadata management and data lineage).

For automation and virtualization – Kubernetes, Docker, VMware, VirtualBox, Hyper-V. For Continuous Delivery & Integration, Infrastructure, Orchestration – Nagios, Puppet, Chef, Ansible, Vagrant, CircleCI, TeamCity, Microsoft Cloud Monitoring Services. For Testing – JTest, Junit, PyTest, PyUnit, ScalaTest, UnitTest, etc.

Built Hive queries for data analysis; used Pentaho, Power BI, JasperReports Server, SpagoBI, MicroStrategy, IBM Cognos, and BIRT for advanced analytics, reporting, and dashboard creation; Tableau, SAS Visual Analytics, and QlikView for visual reporting; AtScale; and R & RStudio for statistical calculations and analysis.

Developed ML applications using Jupyter (Anaconda, PySpark, and ML/DL/AI libraries such as NumPy, SciPy, Scikit-Learn, Keras, Pandas, Dask, Scrapy, Matplotlib, Plotly, pydot, Seaborn, Bokeh, etc.) for use cases such as fraud detection, anti-money laundering, customer churn analysis, web click search, and recommender systems. Manipulated use-case data with these libraries and built models to perform scientific calculations and predictions based on supervised, semi-supervised, and unsupervised learning algorithms such as regression, classification, clustering, decision trees, and anomaly detection.

Sr. Application Consultant – Big Data Technical Architect Lead (Azure & AWS Cloud Platforms)

Apr 2019 – July 2019

Capgemini – Client Sites at Brampton – Projects: Microsoft Azure, Hadoop and Hive Data Platform in Development & Production – Environment

Built custom applications for petabyte-scale data migration, processing, transformation, and storage on the on-premises HDP platform and Microsoft Azure from Oracle Exadata and other sources, based on project requirements and activities such as building Sqoop jobs for on-premises data migration.

Worked on a PySpark data migration and transformation project for cable, wireless, and CRM user data, performing ETL and transformation of a number of structured tables from Oracle Exadata (illustrative sketch below).
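
A minimal PySpark sketch of the Oracle Exadata to HDP/Hive migration pattern described above; the host, service, credentials, and table names are hypothetical, and the Oracle JDBC driver is assumed to be available on the cluster.

# Hypothetical Oracle Exadata extract via JDBC, persisted as a Hive table on HDP.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("exadata-to-hdp-sketch")
         .enableHiveSupport()
         .getOrCreate())

crm_users = (spark.read.format("jdbc")
             .option("url", "jdbc:oracle:thin:@//exadata-host:1521/CRMPDB")
             .option("dbtable", "CRM.USERS")
             .option("user", "etl_user")
             .option("password", "***")
             .option("driver", "oracle.jdbc.OracleDriver")
             .load())

# Light transformation, then persist to a Hive staging database.
crm_users.dropDuplicates(["USER_ID"]).write.mode("overwrite").saveAsTable("crm_stage.users")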

Worked on a custom-built Sqoop framework, Java application, shell scripts, and HQL scripts for data migration and transformation from Oracle Exadata to HDP in the DEV, QA, and PROD environments, and performed testing.

Maintained Sqoop and PySpark job code repositories in GitLab, merged into the stable DEV, QA, and PROD branches, and created unique tags and YML files for copying and executing HQL, PySpark code, param files, and shell scripts in the DEV, QA & PROD environments.

Executed jobs through the Jenkins data pipeline for testing in the DEV and QA environments before moving to PROD, and through Control-M for scheduling and final testing in DEV & QA.

Created STMs (Source & Target Matrix) for various tables and data attributes/fields for identifying and preparing the final attributes, filters and transformation for the Sqoop and PySpark jobs.

Built data pipelines in Azure using Azure Data Factory, Snowflake, and Azure Databricks to move and transform data into the Microsoft Azure cloud – Azure Blob Storage, Azure Data Lake Gen2, Cosmos DB, Azure SQL Database – and, for analytics, HDInsight Interactive Query (Hive LLAP), SSIS, Azure Stream Analytics, and Azure Data Studio.

Performed batch and stream data processing in the Azure cloud using Databricks notebooks for advanced data analytics.

Created new and upgraded schemas – tables and columns – using Spark with Scala and PySpark, writing them into Azure Blob Storage and the SQL Server data warehouse for data extraction, analytics, and management reporting (Production environment).

Debugged Scala & PySpark code, created/wrote utilities and functions for various applications, performed code reviews (pull requests) in Git, and merged code from branch to master (Production environment).

For Continuous Delivery, Integration, Orchestration – Jenkins, Control-M, GitLab, Azure DevOps Services

Worked with stakeholders' product design, operations, and development teams on application builds and on modeling, schema, and data strategies such as data migration, data transformation, data governance, data quality, metadata management, data analytics, and reporting.

Sr. Data Engineer (Big Data – Google Cloud Platform)

Nov 2018 – Jan 2019

BBMTek (Creative Media Works) at Mississauga – Project: Data Platform for Mobile Apps Services (Android and iOS) Users in Production Environment

Built custom applications for petabyte-scale data migration, processing, transformation, and storage in GCP from various data sources into the data platform, based on per-day requirements and technical challenges for both Android and iOS app users.

Technologies Used: Apache Spark 2.3.2, Scala, ScalaTest, UnitTest, Maven 3.6.0, Jenkins, JFrog, (Google Cloud Platform – Google Cloud Storage, BigQuery, Dataproc, Pub/Sub, Kubernetes), Apache Zeppelin, Apache Airflow, Tableau, NoSQL databases – Cassandra, HDFS, file-format storage – Parquet files, Avro files, Hive Metastore, Git, GitHub, IntelliJ IDEA (IDE), batch and stream processing – Spark & MapReduce, etc.

95% of the work involved writing Spark/Scala code and ScalaTest/UnitTest tests for ETL, migrating data from different data stores and databases, e.g., Plenty, Kochava, Freshdesk, and Parquet/Avro files, to Google BigQuery.

Created new and upgraded schemas – tables and columns – using Spark and Scala, writing them into BigQuery for data extraction, analytics, and management reporting (Production environment; illustrative sketch below).
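
The project work was in Scala; as an illustration only, here is the equivalent PySpark pattern for writing a reshaped table into BigQuery via the spark-bigquery connector, which is assumed to be on the classpath. The bucket, dataset, table, and columns are hypothetical.

# Hypothetical schema upgrade written to BigQuery (PySpark equivalent of the Scala jobs).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigquery-load-sketch").getOrCreate()

users = (spark.read.parquet("gs://example-bucket/raw/app_users")
         .selectExpr("user_id", "platform", "CAST(last_seen AS TIMESTAMP) AS last_seen"))

(users.write.format("bigquery")
 .option("table", "analytics.app_users")
 .option("temporaryGcsBucket", "example-temp-bucket")
 .mode("overwrite")
 .save())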

Wrote transformation code in Spark and Scala along with Scala test cases for the transformations, using IntelliJ IDEA and Zeppelin for testing and analysis (Production environment).

Debugged Scala code, created/wrote utilities and functions for various applications, performed code reviews (pull requests) in Git, and merged code from branch to master (Production environment).

For Continuous Delivery, Integration, Infrastructure, Orchestration – Jenkins, Apache Airflow (DAG) at Google Cloud Platform using Google SDK, and other Google Cloud Services.

Worked with stakeholders' product design, operations, and development teams on application builds and on modeling, schema, and data strategies such as data migration, data transformation, data governance, data quality, metadata management, data analytics, and reporting.

Data Manager (Big Data Solutions, Cloud Architecture, NoSQL DBs and Analytics – Google Cloud Platform)

Aug 2018 – Oct 2018

Nobul Corporation at Toronto – Project: Database and Data Modeling, Migration, Management, and Integration with Web App and Mobile App on Google Cloud

Built Data Migration Strategies on Google Cloud from Firebase buckets for Web App releases V1.0 to V2.0

Defined data migration, mobile-app integration, governance, and analytics strategies; built the data architecture and database schemas to migrate data from Firebase (Realtime Database) to NoSQL databases/platforms such as BigQuery, Cloud Firestore, MongoDB, Cassandra, HBase, etc.

Ingested structured, semi-structured, and unstructured data (e.g., CSV and JSON files and Firebase) into NoSQL databases for scalability and real-time processing, automating jobs for real estate in both Canada and the USA covering buying and selling of residential and commercial properties and a bidding platform for agents on behalf of buyers and sellers. Tools used: Talend, Apache Airflow, Matillion ETL, Cloud Data Transfer Service, Pub/Sub, and Cloud Shell on GCP; data migrated into BigQuery, MongoDB, and Google Cloud Storage buckets.

Built a hybrid environment for data ingestion from Firebase and file formats using JavaScript and Google data transfer and storage services, with processing (MapReduce, Hadoop, Spark) and storage environments (MongoDB, Redis, Bigtable, HBase), integrated with the Nobul web and mobile apps and with business intelligence and analytics tools for data visualization and reporting for both business and customer intelligence.

Built custom Spark applications for buyers, sellers, and agents to offer the best deals while processing real-time requests from a number of brokers, and performed real-time analytics on buyers' and sellers' properties and agents' deals (completed or broken), with both hybrid and real-time processing of Canada and USA property data including polygons and referrals.

Used Google Maps Platform maps and places for location data for both USA and Canada properties, as well as the Google Maps data layer for geospatial data, to store custom data and display GeoJSON on Google Maps (through APIs) within our platform for web and mobile app consumers and brokers, drawing polygons on our maps for various properties.

Integrated Google Cloud Platform and NoSQL real-time databases with Web App V2.0 and the Android and iOS mobile apps for buyers', sellers', and agents' property data, including real-time streaming of deals.

Built a cloud-native data warehouse for data quality, data governance, and data and metadata management; implemented Lambda and Kappa architectures for both batch and real-time data processing jobs through data processing engines such as Hadoop and Spark, with workflow orchestration and stream analytics.

For Continuous Delivery, Integration, Infrastructure, Orchestration – Nagios, Jenkins, Apache Maven. For repository – Github, Cloud Tools – Visual Studio, Google Map Platform

Worked with stakeholders' product design, operations, QA, and front-end and back-end development teams to build database management systems (modeling, schemas) and data strategies such as data migration, data transformation, data governance, data quality, metadata management, and data analytics.

Performed visual analytics, creating dashboards and reports in Data Studio, and data mining and analytics in BigQuery; used Datalab to explore, transform, visualize, and build machine learning models, for example models for predictive analysis, consumer behaviour of both buyers and sellers, and market forecasting for individual and business consumers.

Big Data Architect (Enterprise Solutions and Data Architecture – AWS)

Jan 2018 – Apr 2018

Ontario Ministry of Education at Toronto – Project: Case & Grant Management Solutions Branch Employment Ontario Solutions (EOS) Community Service Cluster (CSC)

Built big data and data lake architectures after analyzing the existing system's technical problems, challenges, and requirements; designed a model for data scope, consumption patterns, services, and data governance with security (masking – column & row level security), metadata management, and data lineage.

Deployed a Hortonworks HDP/HDF cluster in AWS (cloud environment – RHEL/CentOS); administered and monitored the HDP/HDF environment using Ambari; installed and configured services and frameworks, managed resources, and installed and configured third-party tools with HDP, e.g., Informatica (ETL), Crystal Reports, and Tableau (for BI reporting and analytics).

Hortonworks HDP/HDF in AWS Cloud – Ambari, Hadoop, HDFS, YARN, Slider, HBase, Hive, Tez, Phoenix, MapReduce, Spark, Storm, Solr, Pig, Hue, Sqoop, Flume, Oozie, Zookeeper, Zeppelin, NiFi, Cloudbreak, Ranger, Knox, Falcon, Atlas, NFS, HDFS Encryption, WebHDFS, Amazon RedShift (DWH), Kinesis Firehose, Streams and Analytics, Athena (Interactive Query Service), EMR, ElasticSearch Service, DynamoDB, Lambda

Ingested unstructured data (e.g., log files/syslogs using Flume), semi-structured data (file formats such as CSV, JSON), and structured data into HDFS and the Hive DWH from Oracle DB and from various data warehouses and data marts. Moved RDBMS data to HDFS & the Hive DWH using Sqoop & NiFi. Automated all jobs for extracting data from Oracle DB and file formats to HDFS & the Hive DWH, performing transformations using Spark.

Built a hybrid environment for data ingestion from various DWHs and data marts, with processing (MR, Tez, Spark) and storage environments (HDFS, Hive, HBase), integrated with multiple business intelligence and data visualization tools (Crystal Reports, IBM Cognos, Tableau Server/Desktop, QlikView, etc.).

Built Hadoop/Hive-based data warehouses and integrated Hadoop with Oracle data warehouse systems. Developed HBase schemas to migrate data from HDFS – a big data solution for handling millions of records.

Worked on microservices frameworks and deployed Lambda & Kappa architectures for both batch and real-time data processing jobs; designed various Oozie workflows with coordinator jobs to execute different data pipelines.

Metadata Management, Data Quality, Data Governance and Security using Apache Atlas, Falcon and Ranger, Knox for data, column and record level accessibility while masking data

For automation and virtualization (containers) – Docker, Virtual Box, JBoss (SW). For Continuous Delivery, Integration, Infrastructure, Orchestration – Nagios, Jenkins, Bamboo, Puppet, Apache Maven, NewRelic

IDEs – Eclipse, Pycharm, IntelliJ Idea, Spark-Shell (Scala, SBT, Python), Testing Tools – JTest, Junit, PyTest, PyUnit, ScalaTest, UnitTest, JMeter, Selenium, TestStudio, etc. Version Control System – Git/Github, GitLab.

Developed Hive queries using Beeline, Hue, Tez, and Spark SQL; used Crystal Reports and IBM Cognos for BI reporting and data analysis, and Tableau for data visualization (illustrative sketch below).
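
A minimal PySpark sketch of running a HiveQL-style aggregation through Spark SQL with Hive support; the database, table, and column names are hypothetical.

# Hypothetical grant-reporting query against the Hive warehouse.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-query-sketch")
         .enableHiveSupport()
         .getOrCreate())

report = spark.sql("""
    SELECT program_code, COUNT(*) AS case_count, SUM(grant_amount) AS total_grants
    FROM eos.case_grants
    WHERE fiscal_year = 2018
    GROUP BY program_code
    ORDER BY total_grants DESC
""")
report.show(20, truncate=False)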

Big Data and Metadata Management Consultant – Dev. & Admin.

Feb 2017 – July 2017

Canadian Imperial Bank of Commerce (CIBC) at Toronto – Project: Data Consumption Services Enterprise Data Technology Corporate Centre Technology

Administered Microsoft Azure Cloud & AWS – created and managed VMs, e.g., a terminal server (Windows Server 2016) and Cloudera VMs on CentOS (master, worker, and gateway/edge nodes); in Windows Server 2016 Active Directory, created users & groups and RSA key pairs (public & private), provided secure access through Kerberos tickets, and used LDAP, group policy administration, and role-based access security management.

Deployed various Vendor’s products e.g., Datameer, Oracle Financial Service Analytical Application, Vertica, IBM InfoSphere IGC & DataStage, Tableau, Syncsort (ETL), Schedule1 – DataPassports

Managed and developed in Cloudera Technology Stack – Cloudera – Director, Manager and Navigator, Hadoop, HDFS, YARN, HBase, Kudu, Hive, Impala, MapReduce, Spark, Parquet, Avro, HCatalog, Solr, Pig, Cloudera Search, Sentry, Hue, Sqoop, Flume, Confluent, Kafka, Oozie, Zookeeper, Docker, Kubernetes

Microsoft – Data Factory, Snowflake, HDInsight, ADLS Gen 1 & Gen 2, Azure Stream Analytics, Data Lake Analytics, Data Explorer, Machine Learning, Databricks, SQL Data Warehouse, SSIS, SSRS, SSAS.

Protect data in transit through a data protection strategy using SSL/TLS protocols; to exchange data across different locations, apply appropriate safeguards such as HTTPS or a VPN, e.g., Azure VPN Gateway, which sends encrypted traffic from on-premises locations over the public internet.


