
Big Data Software Development

Location:
Celina, TX

Sankarapandian Chandrasekaran

Contact# +1-224-***-****

Email: *************@*****.***

SUMMARY:

Experienced IT professional with over 20 years in software development across the Banking & Financial Services (BFS), Information Media & Entertainment, and Healthcare domains.

Strong hands-on experience implementing big data solutions using a technology stack that includes Hadoop, Databricks, MapReduce, Hive, HDFS, Spark, Sqoop, Flume, Nifi, and Oozie.

Experience with multiple Big Data distributions, including Cloudera 5.x and HDP 3.0.

Experience in Data modelling using Erwin

Experience in implementing Medallion Architecture

Implemented CI/CD pipeline using GIT, Jenkins and Unix shell scripting

Extensive experience building metadata-driven data frameworks

Extensive experience in object-oriented programming

Extensive experience in Unix shell scripting

Experience with Crontab and Maestro job scheduling

Proficiency in application design

Proficiency in Waterfall and Agile methodologies

Sensitive data remediation and data loading using Snowflake

Technical Skills:

Big Data Technologies: Hadoop, MapReduce, Pig, Hive, HDFS, Spark, YARN, Zookeeper, Sqoop, Flume, Oozie, Nifi, HBase, Cassandra, AWS Services (EMR, EC2, S3, RDS, Glue, Redshift), Snowflake, Tableau, Databricks

Programming Languages: C++, Scala, Java, Python, Unix shell scripting

Databases: Oracle, PostgreSQL

Tools: Git, ClearCase, ClearQuest, Clarity, Fusion, ION, GTest/GMock, SMARTS (Time Series Database), Stonebranch scheduler

Methodologies: Waterfall, Agile

Certifications:

Databricks Certified Data Engineer Associate

Databricks Certified Fundamentals of the Databricks Lakehouse Platform Accreditation

Databricks Delivery Specialization: Unity Catalog Upgrade

Scrum Alliance Certified Scrum Master

Project Management Professional

IIBF Certification in Banking

NSE Certification in Financial Markets (Securities)

NSE Certification in Financial Markets (Derivatives)

Brainbench certification in Advanced C++

Organizations worked:

HCL Technologies Ltd – March 2005 to June 2010

Cognizant Technology Solutions – June 2010 to Sep 2021

Perficient Inc – Sep 2021 to present

Projects Handled in Perficient Inc

Project Title

Data4u application Oct 2024 – Present

Client

Johnson & Johnson

Role

Data Architect

Functional Area

Clinical Data

Technologies Used

AWS, PostgreSQL, REST API, Databricks, Spark (Python), Spark (Scala)

Objective

Web portal development

Description:-

The objective of the project is to develop a web portal that enables data users to ingest, query, transform, and view clinical data. Clinical data is processed through the Medallion Architecture, and clinical data reports are generated from the Gold layer.
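A minimal Spark (Scala) sketch of the metadata-driven transformation pattern described above. The parameter table, column names, source tables, and JDBC connection details are illustrative placeholders, not the project's actual code.

import org.apache.spark.sql.SparkSession

object MetadataDrivenTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Data4uTransform").getOrCreate()

    // Read UI-supplied transformation parameters from a PostgreSQL table (names are hypothetical).
    val params = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<db>")   // placeholder connection
      .option("dbtable", "ui_transform_params")              // hypothetical parameter table
      .option("user", "<user>").option("password", "<password>")
      .load()
      .collect()

    // Bronze-layer source table (placeholder name).
    val bronze = spark.table("clinical_bronze.lab_results")

    // Apply each requested transformation (filter, union, join) in order.
    val silver = params.foldLeft(bronze) { (df, row) =>
      row.getAs[String]("transform_type") match {
        case "filter" => df.filter(row.getAs[String]("expression"))
        case "union"  => df.unionByName(spark.table(row.getAs[String]("other_table")))
        case "join"   => df.join(spark.table(row.getAs[String]("other_table")),
                                 Seq(row.getAs[String]("join_key")), "left")
        case _        => df
      }
    }

    // Persist the result to the Silver layer of the Medallion Architecture.
    silver.write.mode("overwrite").saveAsTable("clinical_silver.lab_results")
  }
}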

Responsibilities:-

Designed Databricks applications that read UI parameters from a PostgreSQL table and perform data transformations such as filter, join, transpose, and union.

Data modelling using Erwin Data Modeler

Determining load type of tables

Performance tuning

Project Title

UC Migration June 2024 – Oct 2024

Client

Midtown Athletic Club

Role

Data Architect

Functional Area

Member Data Integration

Technologies Used

Azure Data Factory (ADF), Databricks, Spark (Python), Spark (Scala)

Objective

Unity Catalog migration

Description:-

The objective of the project is to migrate the catalog from hive_metastore to Unity Catalog for better data governance, which enables centralized access control across workspaces, data lineage, enhanced security, and audits to meet security compliance requirements.
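A simplified Spark (Scala) sketch of the per-table upgrade step involved in such a migration, assuming a Databricks workspace with Unity Catalog enabled; the catalog, schema, table, and volume names are placeholders, and the actual migration relied on Databricks tooling rather than this exact code.

import org.apache.spark.sql.SparkSession

object UnityCatalogUpgrade {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    // Recreate a hive_metastore table under the new Unity Catalog (three-level namespace).
    spark.sql("CREATE SCHEMA IF NOT EXISTS main.member_data")
    spark.sql(
      """CREATE TABLE main.member_data.memberships
        |AS SELECT * FROM hive_metastore.member_data.memberships""".stripMargin)

    // Ingestion now reads from a Unity Catalog volume path instead of a DBFS mount.
    val landed = spark.read
      .option("header", "true")
      .csv("/Volumes/main/member_data/landing/memberships/")  // placeholder volume path

    landed.write.mode("append").saveAsTable("main.member_data.memberships")
  }
}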

Responsibilities:-

Migrated all schemas and tables under the new Unity Catalog

Replaced ingestion paths with Unity Catalog volume paths

Regression tested the notebook jobs to confirm they write to tables in Unity Catalog.

Project Title

Blue Cross Coordinated Care June 2022 – Sep 2024

Client

Blue Cross Blue Shield - Michigan

Role

Data Architect

Functional Area

Member Data Integration

Technologies Used

AWS (S3, EC2, Relational Database Service (RDS), Redshift), Spark (Scala), Databricks, PostgreSQL, Unix shell scripting, Stonebranch scheduler, MuleSoft API

Objective

Integrate member data from Data Lake tables in Databricks into the Operational Data Hub in PostgreSQL for web portal development

Description:-

The objective of the project is to migrate data from Data Lake tables on the Databricks platform to the Operational Data Hub in PostgreSQL for the AM360 (Advocacy Member 360) web portal. ETL applications integrate member data such as member demographics, primary care physician, clinical information, and member gaps (vision, dental, HEDIS, etc.) from various sources and write it into ODH PostgreSQL tables. Member data is pulled and displayed in the AM360 web portal through MuleSoft APIs.
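A condensed Spark (Scala) sketch of the Databricks-to-PostgreSQL integration pattern described above; the table names, join keys, and JDBC options are illustrative placeholders rather than the project's actual schema.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object MemberOdhLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AM360MemberLoad").getOrCreate()

    // Source Data Lake tables in Databricks (names are placeholders).
    val demographics = spark.table("member_lake.demographics")
    val pcp          = spark.table("member_lake.primary_care_physician")
    val gaps         = spark.table("member_lake.care_gaps")

    // Integrate the member data on a common member key.
    val odhMember = demographics
      .join(pcp, Seq("member_id"), "left")
      .join(gaps, Seq("member_id"), "left")
      .filter(col("active_flag") === "Y")

    // Write into the Operational Data Hub (PostgreSQL on RDS) through JDBC.
    odhMember.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://<rds-endpoint>:5432/odh")  // placeholder endpoint
      .option("dbtable", "odh.member_360")
      .option("user", "<user>").option("password", "<password>")
      .mode("append")
      .save()
  }
}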

Responsibilities:-

End-to-end design (Data → API → UI)

Determining the natural and foreign keys

Providing the Logical Data Model (LDM) using Erwin

Determining the encryption algorithms

Determining load type of the incremental data

Performance tuning of DB tables and ETL applications

Project Title

Legacy Modernization – Nasco feeds Sep 2021 – June 2022

Client

Blue Cross Blue Shield - Michigan

Role

Data Architect

Functional Area

Medical Claims

Technologies Used

AWS (S3, EC2, Relational Database Service (RDS), Redshift), Spark (Scala), Databricks, PostgreSQL, Unix shell scripting, Stonebranch scheduler

Objective

Data standardization of Medical claims data in Operational Data Hub and migration of data into Analytical Data Hub

Description:-

The objective of the project is to standardize the medical claims data from the Nasco data source in the EDW and the Rich Claims Extract (mainframe) feeds. Data standardization is performed by a Spark (Scala) application in the Databricks environment; the resulting dataset is stored in PostgreSQL in the Operational Data Hub (ODH) layer using AWS RDS, and the medical claims data is then migrated into AWS Redshift in the Analytical Data Hub (ADH) layer. Application scheduling is handled through the Stonebranch scheduler.

Responsibilities:-

End-to-end design

Determining the natural and foreign keys

Providing the Logical Data Model (LDM) using Erwin

Determining the encryption algorithms

Determining the load type of the incremental data

Performance tuning of PostgreSQL tables and ETL applications

Projects Handled in Cognizant Technology Solutions

Project Title

Internal Services - Auto Finance Mar 2020 – Sep 2021

Client

Capital One

Role

Data Lead

Functional Area

Auto Loans

Technologies Used

AWS (EMR, S3, EC2), Snowflake, Tableau, Unix shell scripting

Objective

Facilitating data services and reports for other experiences in Auto Finance, such as Vehicle Resolution, Set Up My Loan, and the Catch Up team

Description:-

The objective of the project is to provide data services to other experiences, including sensitive data remediation, data migration, creation of new tables in the D2A and D2B layers, and Tableau report creation and modification.
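A hedged Scala/JDBC sketch of the classification-driven view pattern mentioned above, assuming the Snowflake JDBC driver is on the classpath; the account URL, database, table, and masking rule are illustrative placeholders, not the client's actual objects or policies.

import java.sql.DriverManager
import java.util.Properties

object SensitiveDataViews {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "<user>")          // placeholder credentials
    props.setProperty("password", "<password>")
    props.setProperty("warehouse", "<warehouse>")

    // Snowflake JDBC connection (account identifier is a placeholder).
    val conn = DriverManager.getConnection(
      "jdbc:snowflake://<account>.snowflakecomputing.com/", props)

    // Expose a view that masks a sensitive column for non-privileged roles.
    val stmt = conn.createStatement()
    stmt.execute(
      """CREATE OR REPLACE VIEW AUTO_D2A.PUBLIC.LOAN_ACCOUNT_V AS
        |SELECT loan_id,
        |       CASE WHEN CURRENT_ROLE() = 'PII_READER' THEN ssn
        |            ELSE 'XXX-XX-' || RIGHT(ssn, 4) END AS ssn,
        |       balance
        |FROM   AUTO_D2A.PUBLIC.LOAN_ACCOUNT""".stripMargin)

    stmt.close()
    conn.close()
  }
}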

Responsibilities:-

Analyzing and Remediating sensitive data using Snowflake.

Classifying the data

Creating views as per data classification

Creating new tables in the Snowflake D1, D2A, and D2B layers

Creating and modifying Tableau reports for other experiences in Auto Finance.

Project Title

Locus 2.0 Aug 2019 – Mar 2020

Client

Verizon

Role

Engineering Manager

Functional Area

Communication

Technologies Used

AWS EMR, S3, Hadoop, Spark (Scala), Postgres, Unix shell scripting

Objective

Standardization of Addresses

Description:-

The objective of the project is to standardize the addresses in data collected from various sources. Data is ingested as CSV files into AWS S3. A Spark application standardizes the data in the AWS EMR environment and loads it into Postgres tables.
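A minimal Spark (Scala) sketch of the ingest-standardize-load flow described above; the S3 bucket, column names, and the simple normalization rules shown are placeholders, not the project's actual standardization logic.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, trim, upper}

object AddressStandardization {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Locus2Addresses").getOrCreate()

    // Raw address files landed in S3 as CSV (bucket and path are placeholders).
    val raw = spark.read.option("header", "true")
      .csv("s3://<bucket>/locus/raw_addresses/")

    // Simplified standardization: trim, uppercase, expand a common abbreviation, clean the ZIP.
    val standardized = raw
      .withColumn("street", regexp_replace(upper(trim(col("street"))), "\\bST\\b", "STREET"))
      .withColumn("zip", regexp_replace(col("zip"), "[^0-9]", ""))

    // Load the standardized addresses into Postgres.
    standardized.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/locus")  // placeholder connection
      .option("dbtable", "public.standardized_address")
      .option("user", "<user>").option("password", "<password>")
      .mode("overwrite")
      .save()
  }
}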

Responsibilities:-

Choosing technologies and deliverables

Logical Data modelling using Erwin

Developing and delivering the Spark applications

Orchestrating the execution of ETL applications using Shell script

Implementing CI/CD pipelines

Project Title

Data Harmonization May 2019 – Jul 2019

Client

ACI Universal Payments

Role

Data Lead

Functional Area

Banking & Financial Services (Cards & Payments)

Technologies Used

Hadoop, Spark (Scala), Hive, Unix shell scripting

Objective

Migrating streaming data into the Hadoop Data Lake

Description:-

The objective of the project is to migrate streaming data into the Hadoop Data Lake. Streaming data is ingested into the Hadoop environment using Kafka in Avro format. A Spark Structured Streaming application ingests this data into a Hive table, and a Spark SQL application transforms the data and saves it into three BI tables.
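A minimal Spark Structured Streaming (Scala) sketch of the Kafka-to-data-lake flow described above, assuming Spark 3.x with the spark-sql-kafka and spark-avro packages; the broker, topic, schema file, and output paths are placeholders, and a Parquet path sink stands in here for the Hive table used in the project.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

object PaymentsStreamIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PaymentsStream").getOrCreate()

    // Avro schema of the incoming events (file path and contents are placeholders).
    val avroSchema = scala.io.Source.fromFile("/conf/payment_event.avsc").mkString

    // Read the Avro-encoded stream from Kafka.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "<broker>:9092")
      .option("subscribe", "payment_events")
      .load()
      .select(from_avro(col("value"), avroSchema).alias("event"))
      .select("event.*")

    // Append the decoded records to the data lake location backing the target table.
    events.writeStream
      .format("parquet")
      .option("checkpointLocation", "/tmp/checkpoints/payment_events")
      .option("path", "/warehouse/payments/payment_events")  // placeholder warehouse path
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}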

Responsibilities:-

Performance tuning of the ETL application.

Developing a Spark Structured Streaming application to convert Avro data into JSON format.

Reducing the execution time of the ETL applications

Project Title

Data Harmonization May 2018 – Apr 2019

Client

Vantiv (now Worldpay)

Role

Data Lead & Engineering Manager

Functional Area

Banking & Financial Services (Cards & Payments)

Technologies Used

Hadoop, Spark, Hive, Oozie, DataStage, Unix shell scripting, TWS scheduling

Objective

Migrate relational databases to the Hadoop Data Lake and harmonize the data

Description:-

The objective of the project is to migrate Oracle and DB2 databases to the Hadoop Data Lake and harmonize the data using Hive and Spark. IBM DataStage is used to ingest the data into the Hadoop Data Lake, and Hive and Spark are used extensively to harmonize it. The harmonized data is consumed by business teams and the data science team.

Responsibilities:-

Choosing technologies and deliverables

Implemented Spark applications to harmonize the data in Hadoop Data Lake.

Implemented Hive scripts to harmonize the data in Hadoop Data Lake.

Implemented Oozie workflows to execute the Hive and Spark Jobs.

Developed Shell Scripts to automate the jobs

Scheduled the TWS Jobs through Oozie workflows and Shell scripting.

Participated in PI planning events and helped the team define PI objectives.

Effectively followed Agile scrum methodologies and processes

Used Rally for tracking sprint user stories

Project Title

Dataflow Framework Aug 2017 – Apr 2018

Client

Discover Financial Services

Role

Technical Lead

Functional Area

Banking & Financial Services

Technologies Used

Nifi, Spark (Scala), Python, Unix shell scripting, Proteus tool, AWS (S3, EC2), Maestro

Objective

Develop a dataflow framework to automate the dataflow

Description:-

The objective of the project is to develop a dataflow framework to automate the dataflow of card applications at Discover. Apache Nifi is used extensively to automate the dataflow from the Teradata database to the SOR, SOR-Usable, and SOT-Source layers in HDFS. The SOT-Source tables are eventually stored in AWS, using services such as S3, EC2, EMR, and Lambda.

Responsibilities:-

Implemented Nifi processors to create dataflows for SOR, SOR-Usable, and SOT-Source

Developed Spark applications using Scala to process Avro tables.

Developed Python scripts to parse XML content

Developed Custom Nifi Processor to launch Spark Jobs.

Implemented fraud investigation flows using Nifi and Proteus.

Scheduled the jobs through Maestro

Implemented Nifi flows to process real-time data.

Developed shell scripts to create FTP scripts and triggers for Nifi flows.

Participated in PI planning events and helped define PI objectives.

Effectively followed Agile scrum methodologies and processes

Developed data flows in Nifi to ingest real-time data into HDFS.

Used Jenkins for building and deploying the JAR files

Used Rally for tracking sprint user stories

Project Title

Global Trade Incentives June 2016 – Aug 2017

Client

Dun & Bradstreet

Role

Technical Lead & Senior Developer

Functional Area

Information Media & Entertainment

Technologies Used

Spark (Scala), Hadoop, Pig, Hive, HBase, Sqoop, Oozie, AWS EC2

Objective

To analyze and display supplier data in a user interface

Description:-

Global Trade Incentives is a D&B initiative to provide additional supplier information through a web application. Data from the Oracle warehouse is imported using Sqoop into HDFS on an AWS EC2 cluster. Data in HDFS is processed by Pig and stored in HBase. The HBase tables are accessed through Hive external tables, which are joined and processed into denormalized tables by Scala Spark applications. The denormalized table data is retrieved by a web application using REST services.
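A short Spark SQL (Scala) sketch of the denormalization step described above, joining Hive external tables that sit on top of HBase; the database, table, and column names are illustrative placeholders.

import org.apache.spark.sql.SparkSession

object SupplierDenormalization {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GTISupplierDenorm")
      .enableHiveSupport()
      .getOrCreate()

    // Join the Hive external tables (backed by HBase) into one wide view of a supplier.
    val denormalized = spark.sql(
      """SELECT s.duns_number,
        |       s.supplier_name,
        |       t.trade_volume,
        |       i.incentive_code
        |FROM   gti.supplier        s
        |JOIN   gti.trade_activity  t ON s.duns_number = t.duns_number
        |LEFT JOIN gti.incentives   i ON s.duns_number = i.duns_number""".stripMargin)

    // Persist the denormalized result for the web application's REST services to read.
    denormalized.write.mode("overwrite").saveAsTable("gti.supplier_denormalized")
  }
}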

Responsibilities:-

Developed Sqoop jobs to import data from Oracle into HDFS

Developed Pig scripts to store data into HBase for incremental loads

Created Hive external tables on top of HBase tables to enable the Spark applications to access HBase

Performed a POC in Spark to convince the client to use Spark SQL instead of Hive queries.

Created Spark jobs for HBase bulk upload, which drastically reduced the time to load data into HBase

Developed Spark applications to process data in Hive tables, replacing slow-running Hive queries

Extensively used Spark SQL to perform join, filter, and pivot operations when translating the queries

Improved the performance of the Spark applications through efficient tuning.

Repartitioned the HBase tables to improve the performance of Spark applications.

Considerably reduced the turnaround time of slow-running Spark applications.

Used JUnit for unit testing the Spark applications.

Used the Cucumber test plugin to perform integration testing.

Effectively followed Agile methodologies and processes

Used Jira for tracking the sprint user stories.

Project Title

Stars Rewrite Jun 2014 – May 2016

Client

Aetna Inc.

Role

Senior Developer

Functional Area

Health Measures

Technologies Used

C++, Unix shell scripting, Java MapReduce, Hive, Oozie, HDFS

Objective

Calculate Health Care Measures and provide Star Rating

Description:-

The objective of this project is to develop the STARS application in C++, which calculates health care measures such as OMW (Osteoporosis Management for Women) and AAP (Adults' Access to Preventive/Ambulatory Services). Lookup files are provided to the application and are changed as requirements change. Output files of the STARS application are copied into the HDFS environment and processed using Java MapReduce and Hive to calculate the health care measures, and a star rating is assigned based on the calculation.

Responsibilities:-

Developing the measures as C++ applications

Developing MapReduce applications to calculate health care measures.

Developing Hive tables to calculate health care measures.

Generating and Submitting periodic metrics for the Stars project

Coordinating the team

Project Title

Analytics & Reporting [Liquidity Risk] Oct 2012 – May 2014

Client

Royal Bank of Canada

Role

Business Analyst

Functional Area

Liquidity Risk in investment banking

Tool Used

ION

Objective

Generate Regulatory Reports

Description:-

The objective of the project is to generate regulatory reports such as Basel reporting [LCR and NSFR], FSA reporting, and EBA reporting. The project also implements all regulatory norm changes in the reports. These changes impact other areas such as CFLA [Cost of Funding and Liquidity Attribution] and CT [Corporate Treasury] reports, and the A&R team verifies and validates those changes.

Responsibilities:-

Validating the Regulatory Reports in ION tool.

Validating the CFLA data.

Generating CT Reports for the regulatory requirements

Project Title

Vendor Isolation [Application development] Apr 2012 – Sep 2012

Client

Paypal

Role

Senior Developer

Environment

FinSys Framework

Languages& Tools

C++, Git, BullseyeCoverage, GTest/GMock

Objective

Developing Credit card applications

Description:-

The objective of the project is to develop the applications required to make credit card payments through PayPal. The application receives transaction information through value objects, which are parsed and processed with the appropriate vendors. Depending on the accept/reject response from the vendors, PayPal sends its own accept/reject response.

Responsibilities:-

Designing the application within the existing framework

Developing the C++ application in the FinSys framework

Extensively used GMock/GTest for test-driven development.

Project Title

Pricing Plus [Application development] Jan 2012 – Apr 2012

Client

Paypal

Role

Senior Developer

Environment

Unix

Languages & Tools

C++, Git, BullseyeCoverage, GTest/GMock

Objective

Changing Pricing Model of PAYPAL

Description:-

The objective of the project is to change the pricing model of PayPal, enabling it to generate significantly more revenue. The pricing APIs are crucial APIs called by various internal teams, such as the front-end, Risk, and Money teams, to calculate cross-border fees and foreign exchange fees. All the pricing APIs are modified to impose fees based on region instead of country.

Responsibilities:-

Designing the application within the existing framework

Developing the C++ application in the FinSys framework

Extensively used GMock/GTest for test-driven development.

Project Title

Pay After Purchase [Application development] Feb 2011 – Dec 2011

Client

Paypal

Role

Senior Developer

Environment

Unix

Languages & Tools

C++, Git, BullseyeCoverage, GTest/GMock

Objective

Enable the user to choose a funding source and pay after purchase

Description:-

The objective of the project is to generate revenue in the offline market. The project enables the user to choose a funding source after making a purchase, within a predefined period of 3 to 7 days. This flexibility in choosing the funding source allows PayPal to generate significant revenue in the offline market.

Responsibilities:-

Designing the application within the existing framework

Developing the C++ application in the FinSys framework

Extensively used GMock/GTest for test-driven development.

Project Title

Admin Correction Tool [Application development] Jun 2010 – Feb 2011

Client

Paypal

Role

Senior Developer

Environment

Unix

Languages & Tools

C++, Git, BullseyeCoverage, GTest/GMock

Objective

Enable the PayPal Admin to move funds

Description:-

The objective of the project is to enable the PayPal admin to move funds between PayPal accounts and external funding sources such as credit cards, banks, and PayPal Credit. It provides the flexibility to choose among credit cards, debit cards, bank accounts, and PayPal Credit.

Responsibilities:-

Designing the application within the existing framework

Developing the C++ application in the FinSys framework

Extensively used GMock/GTest for test-driven development.

Projects Handled in HCL Technologies – Mar 2005 to Jun 2010

Project Title

SEBI – Integrated Market Surveillance System

Client

Securities and Exchange Board of India [Capital Market Regulator]

Role

Developer

Environment

Solaris

Third Party Product

SMARTS, Australia [Time Series Database]

Languages

C, C++, Shell scripting

Markets

National Stock Exchange of India, Bombay Stock Exchange of India

Objective

To find risks associated with securities and derivatives, bring transparency to the Indian stock exchanges, and control manipulation taking place in the stock exchanges

Description:-

This project is an interface application between the stock exchanges of India and the third-party product SMARTS, Australia, which builds market surveillance products for various countries. The role of the application is to receive data daily from stock exchanges such as the National Stock Exchange of India and the Bombay Stock Exchange and convert it to the .FAV format that SMARTS accepts. The objective of the project is to bring transparency to the Indian stock exchanges and control all types of manipulation taking place in them. The SMARTS product defines various types of alerts and raises them when the data shows manipulation in terms of stock quantities or sudden changes in stock prices.

Responsibilities:-

Developing a C++ application to convert the exchange data to the SMARTS format

Developing Unix shell scripts to automate the execution

Crontab scheduling to execute the application on daily basis

Unit testing the C++ applications

Project Title

SEBI – Integrated Market Surveillance System – Depositories

Client

Securities and Exchange Board of India [Capital Market Regulator]

Role

Developer

Environment

Solaris

Third Party Product

SMARTS, Australia [Time Series Database]

Languages

C, C++, Shell scripting

Markets

Central Depository Services Ltd, National Securities Depository Ltd

Objective

To find risks associated with off-market and inter-depository transactions and to bring transparency to the Indian stock exchanges by including depositories along with the exchanges.

Description:

The role of the application is to receive off-market and inter-depository transaction data daily from depositories such as CDSL and NSDL and convert it to the .FAV format that SMARTS, Australia accepts. The objective of the project is to include depository transactions along with the Indian stock exchanges to control all types of manipulation taking place in them. The SMARTS product defines various types of alerts to identify manipulated off-market and inter-depository transactions and raises them when a trade shows manipulation in terms of stock quantities or sudden changes in stock prices.

Responsibilities:-

Developing a C++ application to convert the depository data to the SMARTS format

Developing Unix shell scripts to automate the execution

Crontab scheduling to execute the application on daily basis

Unit testing the C++ applications

EDUCATION

Bachelor of Engineering, Information Technology (2000–2004)

Bharathiyar University 75.70%

Class XII

Thiagarajar model high school 85.08%

Class X

Thiagarajar model high school 84.80%


