Big Data Engineer

Location: Hoffman Estates, IL


Sankaralingam Amirthalingam Email: adz01f@r.postjobfree.com

Contact No: 847-***-****

Brief Overview

More than 18 years of rich experience in Software Development, Maintenance and Support of various applications in the Industrial Supply Distribution, Pharma, Healthcare, Credit Card/Banking, Wealth Management, Insurance, Banking and Retail domains.

10 years of experience in Big Data with the Amazon AWS cloud, Snowflake, Hadoop, Spark, Scala, Hive, Impala, HDFS, Pig scripts, Parquet, Oozie, Cloudera Manager, HBase, Sqoop, Azure, Cassandra, Ganglia, Kafka, Python and S3.

Technical expertise in Agile and the SDLC, involving requirement analysis, project scoping, effort estimation, risk analysis, development and quality management, as per the specified guidelines and norms.

Associated with several prestigious projects for Grainger, Abbvie, BCBS, Vantiv and UnitedHealth Group, as well as Pricing Hub, IMPACT, Targeted Interactions and SAS Markdown Optimization for Sears Holdings; ECRRM for Northern Trust; VTRS for VISA; AIWCS for AIG; the Actuarial System for Prudential, NJ; QA for MNBS for Prudential; the Claims System for Ohio Casualty Group, Ohio; and the Advanced Loan System for Citibank.

Ability to relate to people at any level of business and management across the globe, with significant experience working with customers, project managers and technical teams to execute large-scale projects.

Organizational Experience

SDLC Planning & Management

Develop, implement and provide all kinds of support for business application software for clients.

Achieve customer satisfaction by ensuring service quality norms and building the brand image by exceeding customer expectations.

Actively involved in the design phase for the client’s requirements.

Resolve support/operational issues in liaison with project managers & business group.

Handle the testing phase of the project and ensure all necessary data and metrics are generated and maintained.

AWS /Hadoop/DataStage ETL Experience

18 years of US IT experience architecting, designing and leading end-to-end Data Warehousing/ETL/Integration/Mapping solutions for various clients, covering the complete SDLC.

Used AWS, Hadoop Pig scripts, Hive, Impala, Parquet, Snappy compression, Cloudera Manager, HDFS, HBase, Sqoop, Salesforce, Spark, Scala and Python to create the data lake/store for the PDW, Impact and Pricing Hub projects.

Worked extensively on the Spark core and Spark SQL modules, primarily using Scala.

Worked on creating RDDs, DataFrames and Datasets for the required input data and performed data transformations and actions using Spark Scala (a brief sketch of this pattern appears at the end of this section).

Hierarchy, Sales, Inventory, Price and other data sources created with Pig scripts, Hive, HDFS, HBase, Cassandra, Impala, Scala and Python were loaded into MySQL daily.

Converted DataStage jobs to Hadoop for the Impact project, which processes files with millions of records, to reduce run time.

Experience in designing, reviewing, implementing and optimizing data transformation processes in the Hadoop ecosystem. Able to consolidate, validate and cleanse data from a vast range of sources, from applications and databases to files.

Output created by the Pig Scripts was compared with DataStage data for approval.

Used DataStage 11.5, 8.5 & 8.1 to extract data from various sources to implement the ECRRM data warehouse and the Dynamic Pricing project.

Used IBM DataStage Designer and Information Analyzer to develop parallel jobs to extract, cleanse, transform, integrate and load data into the Data Warehouse.

Developed jobs in DataStage 8.1 using different stages like Transformer, Aggregator, Lookup, Join, Merge, Remove Duplicate, Sort, Row Generator, Sequential File and Data Set.

Used Director Client to validate, run, and monitor the jobs that are run by WebSphere DataStage server.

Worked on various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie etc.

5+ years of design and development experience with Teradata SQL Assistant, BTEQ, FastLoad and FastExport ETL processes.
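For illustration only, the following is a minimal sketch of the RDD/DataFrame pattern referenced above, written in PySpark rather than the Scala used on these projects; the paths, column names and schema are hypothetical, not taken from any client code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-transform-sketch").getOrCreate()

# Hypothetical input: a raw sales extract landed on HDFS as CSV.
raw_rdd = spark.sparkContext.textFile("hdfs:///data/raw/sales/*.csv")

# RDD-level parsing and filtering (transformations), then an action (count).
parsed = raw_rdd.map(lambda line: line.split(",")).filter(lambda cols: len(cols) == 3)
print("parsed rows:", parsed.count())

# Promote to a DataFrame for SQL-style transformations.
sales_df = parsed.toDF(["item_id", "store_id", "sales_amt"])
daily_totals = (sales_df
                .withColumn("sales_amt", F.col("sales_amt").cast("double"))
                .groupBy("item_id")
                .agg(F.sum("sales_amt").alias("total_sales")))

daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/sales_totals")
```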

Skill Set

Databases

Snowflake, Hive, Impala, Parquet, HBase, Oracle Exadata, Oracle 11g, Teradata, Cassandra, IBM DB2, UDB, Netezza, Infobright, MySQL, Command Centre, MS Access, dBase IV.

Special Software

Hadoop, SPSS, Oozie, Hive, Pig scripts, HBase, AWS, S3, DataStage 8.5, 8.1 & 7.5, IBM Information Analyzer, Talend, UNICA, SAS Revenue Optimization 2.2, Cyber Fusion, STROBE, Insert Examiner, MQ Series, Hiperstation, Test Director, Rational ClearQuest, DocumentDirect, ViewDirect, Control M, Snowflake data warehouse.

Configuration Tools

Bitbucket, Circle CI, PVCS Version Manager v7.5, ChangeMan, PANVALET, Endevor.

App. Development Tools

File Aid, ISPF, TSO, CICS, QMF, VSAM, Teradata SQL Assistant.

Programming Languages

SAS, JAVA, COBOL, JCL, SQL, Shell Script, Perl, REXX, C, C++, VB, Pascal, Fortran, EZTRIEVE Plus.

Others

Report Program Interface (RPI) on IBM, Application Program Interface (API) on IBM.

Domain

Insurance, Banking and Retail.

Environments

IBM 3090, ES9000, Windows NT/2K, AIX, UNIX on MC-68000, MS DOS, MVS.

Professional Experience Details

Hartford Insurance

Duration: May 2022

Senior Data Engineer

Responsibilities:

Migrating the model process from Oracle procedure-based code to Snowflake tables using a Python-based approach, so that a single code base can be used (see the sketch at the end of this list).

Perform detailed analysis/design of current Oracle packages and code, and of functional and technical requirements, and translate them to solutions in the Snowflake environment.

Creating mapping/transformation processes that can be used by the model process with minimal technical experience required from users.

Participate in regular status meetings to track progress, resolve issues, mitigate risks and escalate concerns in a timely manner.
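A minimal, illustrative sketch of the kind of Python-based Snowflake step described above; the connection parameters, tables and mapping names are placeholders, not the actual client code.

```python
import snowflake.connector

# Placeholder credentials/account; in practice these would come from a secrets store.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="********",
    warehouse="MODEL_WH",
    database="MODEL_DB",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Apply a mapping/transformation step that previously lived in an Oracle procedure.
    cur.execute("""
        INSERT INTO model_input (policy_id, exposure, mapped_segment)
        SELECT s.policy_id,
               s.exposure,
               m.segment
        FROM   staging_policies s
        JOIN   segment_mapping  m ON m.source_code = s.product_code
    """)
    conn.commit()
finally:
    conn.close()
```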

Environment:

Snowflake, Python, Oracle, Toad, Astronomer Airflow DAGs, CDH, HDFS, Hadoop, AWS, Hive, WinScp, Putty, UNIX Shell Scripting, GIT, Sqoop, Visual Code etc.

Grainger, Chicago, IL

Duration: October 2019 – April 2022

Big Data Architect / Senior Big Data Engineer

Responsibilities:

Perform detailed analysis/design of functional and technical requirements and translate them to solutions in the Big Data and Data Science Lab environments.

Develop robust and reusable data acquisition and processing routines to ingest data into the Data Science Lab environment.

Migrating current Hadoop processes to AWS, and data to Snowflake tables, using a Bitbucket and Circle CI environment.

Support program and project managers in the planning, estimation and implementation of projects and document use cases, solutions and recommendations.

Created Tableau Dashboard reports from Hive Data for business users.

Participate in regular status meetings to track progress, resolve issues, mitigate risks and escalate concerns in a timely manner.

Maintain and assist in data model documentation, data dictionary, data flow, data mapping and other MDM and Data Governance documentation.

Optimize data ingestion using various Big Data technologies such as Hive, Python, Flume, Sqoop, Spark and HBase, as in the sketch below.
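As a rough illustration of this ingestion-optimization work (not the actual pipeline), the PySpark sketch below lands a Hive-sourced extract as partitioned, Snappy-compressed Parquet; the database, table, column and bucket names are invented.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ingest-optimize-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical source table registered in the Hive metastore.
orders = spark.sql("SELECT order_id, order_date, region, amount FROM raw_db.orders")

# Reduce small-file pressure and keep reads selective by partitioning on a query column.
(orders
 .repartition("region")
 .write
 .mode("overwrite")
 .option("compression", "snappy")
 .partitionBy("region")
 .parquet("s3a://datalake-bucket/curated/orders/"))
```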

Environment:

CDH, HDFS, Hadoop, AWS, Snowflake, Bitbucket, Circle CI, Docker, Kubernetes, Hive, SPSS, Python, Pig, Spark, SQL Server, Oracle, Teradata, Salesforce, WinScp, Putty, UNIX Shell Scripting, GIT, Zookeeper, Mapreduce2, YARN, HBase, Tableau, Sqoop, JIRA and Confluence etc.

UPTAKE, Chicago, IL

Duration: March 2019 – September 2019

Senior Data Engineer

Uptake is an industrial artificial intelligence (AI) software company that aims to help companies digitally transform with open, purpose-built software. Built around a foundation of data science and machine learning, Uptake’s core products include an Asset Performance Management application and a fully managed platform.

Responsibilities:

Understand requirements from business and come up with the technical design and data strategy.

Responsible for creation of data models and ingesting data to platform to support application development and data science teams.

Built near-real-time ingestion pipelines using StreamSets.

Building data pipelines to ingest data from RDS into the platform's raw data layer, and applying ETL processes using APIs and StreamSets data collectors to promote data from the raw layer to the business layer.

Understanding the platform's industrial data model to leverage platform capabilities for easy ingestion and retrieval of data to and from the platform.

Environment:

StreamSets, SQL, Docker, Python, AWS, Shell scripts, Postgres, Bitbucket, Jenkins

Abbvie, Chicago, IL

Duration: August 2018 – Feb 2019

Hadoop Big Data Architect/Lead Developer

Description: These applications receive data from SAP, various packaging line systems such as Antares, and TIBCO feeds from outside sources. They also send data for the US DSCSA Act, a government act intended to streamline the transaction summary between buyer and seller in the pharmaceutical area. There is also a module that sends data to the EMVS hub (Europe hub) to eradicate counterfeit drugs and to track and trace genuine and decommissioned products in the European market.

Worked on process improvements, Hadoop optimization and vehicle load projects.

From freight and other data sources, created the Logistics Dashboard, which is displayed in Qlik for the business to query and view various charts.

Worked on creating RDDs, DataFrames and Datasets for the required input data and performed data transformations and actions using Spark Scala.

Designed and was involved in development of each module as a POC to prove that the code works for the requirement, including POCs for small-file issues using HAR and other parameters.

Worked on AWS S3 cloud services for storing data from older partitions, which keeps new data on HDFS for faster access and uses S3 for old data (a sketch of this archival step follows this list).

Involved in design and architectural decisions for implementation at each layer: the data lake, integration and semantic layers.

Worked with multiple offshore teams and communicated status to the client.
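A hedged sketch of the "older partitions to S3" idea above; the partition layout, bucket name and retention window are hypothetical, and the copy is driven from Python via the standard hadoop distcp and hdfs dfs commands.

```python
import subprocess
from datetime import date, timedelta

# Hypothetical layout: daily partitions under /data/events/dt=YYYY-MM-DD on HDFS.
RETENTION_DAYS = 90
cutoff = date.today() - timedelta(days=RETENTION_DAYS)

old_partition = f"hdfs:///data/events/dt={cutoff.isoformat()}"
archive_target = f"s3a://archive-bucket/events/dt={cutoff.isoformat()}"

# Copy the aged partition to S3 with distcp, then remove it from HDFS,
# so recent data stays on HDFS for fast access and old data lives in S3.
subprocess.run(["hadoop", "distcp", old_partition, archive_target], check=True)
subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", old_partition], check=True)
```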

Environment:

AWS Hadoop, Cloudera, Cloudera Manager, HDFS, Spark, Scala, Sqoop, Hue, HBase, Hive, Solr, Impala, Autosys, Pig scripts, Oozie, Java, UNIX, Parquet, Snappy compression, Python, Shell Scripting, SQL, S3.

Blue Cross Blue Shield (Highmark), Pittsburgh, PA

Duration: March 2018 – July 2018

Hadoop Big Data Consultant / Lead Developer

Scope: Build of healthcare data domains on the Big Data platform. One of the main processes is the weekly claims process, which brings data from the source and ingests it into the Hadoop platform.

Design, build and deploy data pipelines supporting the data domains. Design and development of the ABC (Audit, Balance and Control) process in Spark Scala.

The ABC process measures the time of each job and records the source and target record counts. Based on that, the control can be set up; the control piece can be configured to handle certain thresholds, and a decision on the next process can be made (a sketch of this pattern follows this list).

Worked on the ingestion and transformations to meet the business requirements and tested them. Integrated the ABC process into the ingestion and other jobs.

Hands-on experience with Spark Scala programming and a good understanding of its in-memory processing capability.

Worked on creating RDDs, DataFrames and Datasets for the required input data and performed data transformations using Spark Scala.
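A simplified, hypothetical sketch of the audit/balance/control idea described above, written here in PySpark (the actual implementation was in Spark Scala); the table names, log table and tolerance are placeholders.

```python
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("abc-sketch").enableHiveSupport().getOrCreate()

COUNT_TOLERANCE = 0          # allowed difference between source and target counts
start = time.time()

# Audit: capture source and target record counts for a weekly claims load.
source_count = spark.sql("SELECT COUNT(*) AS c FROM raw_db.claims_weekly").first()["c"]
target_count = spark.sql("SELECT COUNT(*) AS c FROM curated_db.claims").first()["c"]
elapsed_sec = time.time() - start

# Balance: compare the counts against the configured tolerance.
balanced = abs(source_count - target_count) <= COUNT_TOLERANCE

# Control: persist the outcome so the next job in the chain can decide whether to run.
spark.createDataFrame(
    [("claims_weekly_load", source_count, target_count, elapsed_sec, balanced)],
    ["job_name", "source_count", "target_count", "elapsed_sec", "balanced"],
).write.mode("append").saveAsTable("audit_db.abc_log")

if not balanced:
    raise RuntimeError("ABC check failed: source/target counts out of tolerance")
```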

Environment:

Hortonworks, Spark, HDFS, Scala, Sqoop, Hue, Hive, Oozie, Java, UNIX, Python, Shell Scripting, SQL.

Vantiv, Mason, Ohio

Duration: July 2015 – Feb 2018

Hadoop Big Data Architect/Lead Developer

Scope: The financial and non-financial data from various sources is generated and then ingested into the Raw region in BDA. The Data Prep tool (a data profiling/data wrangling tool) validates the data ingested into the Raw region of BDA against the actual source and feeds it into the Cleansed region of BDA. The data loaded in the Cleansed region is accessible by the data preparation tool for any data analysis or further data preparation.

The cleansed data is then harmonized for each identified domain (e.g., Customer, Billing) with all the required data at the lowest granularity possible. Based on finance requirements, this data is extracted (aggregated and filtered as needed) into a pre-Exadata stage layer and prepared for Exadata ingestion. The data is also moved to a high-performance store with a columnar feature (e.g., Impala) for specific reporting requirements; this high-performance store leverages the column-family feature for reads and writes.

Responsibilities:

Data Ingestion within Hadoop Environment.

Develop and implement business reports and data extraction procedures from various sources.

Created and Automated Ingest mechanism using shell script for various data sources with validation.

Worked on POCs for performance improvement initiative. Created Parquet tables and compressed using Snappy compression. Implemented this for settlement feed which saved space and improved query performance.

Worked on creating the RDD's, DF's and Datasets for the required input data and performed the data transformations using Spark Scala.

Experience with Kafka producers, consumers, brokers, topics and partitions (see the sketch after this list).

Experienced with batch processing of data sources using Apache Spark with RDDs to create the ingestion framework.

Creating jobs to load data from Informatica into the data lake environment.

Develop and review test scripts based on test cases, and document and communicate with stakeholders.

Monitoring and scheduling the jobs using the Oozie scheduler. Analyse and fix production issues by coordinating with the team.

POCs in Azure HDInsight using Apache Hadoop and Spark.
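For illustration only, a minimal producer/consumer sketch of the Kafka usage mentioned above; the broker addresses, topic name and payload are invented, and kafka-python is just one possible client.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092", "broker2:9092"]   # hypothetical brokers
TOPIC = "settlement-feed"                    # hypothetical topic

# Producer: publish a settlement record as JSON to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"txn_id": "12345", "amount": 19.99})
producer.flush()

# Consumer: read records from the topic's partitions as part of an ingestion job.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="ingest-framework",
    auto_offset_reset="earliest",
)
for message in consumer:
    record = json.loads(message.value.decode("utf-8"))
    print(message.partition, message.offset, record)
    break  # sketch only: stop after one record
```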

Environment:

BDA 4.8, CDH 5.10.1, Cloudera Manager, Hadoop, Spark, HDFS, Hadoop Pig Scripts, Scala, Sqoop, Hue, Hive, Impala, Oracle Exadata, Oozie, Java, Rally, DB2, Datastage 11.5, Talend, UNIX, Parquet, Snappy compression, Python, Shell Scripting, SQL.

Optum UnitedHealth Group, Eden Prairie, MN

Duration: Feb 2015 – June 2015

Hadoop Lead Developer

Scope: The goal of Project Tango is to improve patient outcomes, improve coding accuracy and documentation, acquire and consolidate data supporting coding, provide batch and real-time analytics capabilities to know where we stand with respect to various measures such as STARS, HEDIS and RAF, and provide web-based access for various healthcare participants.

Responsibilities:

Data Ingestion within Hadoop Environment.

Develop and implement data extraction procedures from Hadoop, Netezza and other systems.

Create and Automate Ingest mechanism for various data sources.

Coordinated with technical team for installation of Hadoop and production deployment of software applications for maintenance.

Develop and review test scripts based on test cases, and document and communicate them to stakeholders.

Used Hive SQL to analyze the data sources and apply business rules for reports (a brief sketch follows).
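A small, hypothetical example of the kind of Hive SQL analysis described above, run here through PySpark with Hive support purely for illustration; the database, tables, columns and rule are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-report-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Apply a simple business rule over a Hive table and summarize for a report:
# count open claims per measure category for the most recent load.
report = spark.sql("""
    SELECT measure_category,
           COUNT(*) AS open_claims
    FROM   warehouse_db.claims
    WHERE  claim_status = 'OPEN'
      AND  load_date = (SELECT MAX(load_date) FROM warehouse_db.claims)
    GROUP BY measure_category
    ORDER BY open_claims DESC
""")
report.show(20, truncate=False)
```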

Environment:

Hadoop, HDFS, Hive, Hadoop Pig Scripts, HBASE, Java, Rally, GIT, Teradata, UNIX, Cassandra, Ganglia, Shell Scripting, SQL.

Sears Holdings Corporation, Chicago, IL

Oct 2011 – Jan 2015

Sr. Hadoop Developer, ETL Data Modeler

Pricing Hub

For this project we sourced data from various systems to create the data repository for Pricing Hub.

Pricing Hub maintains inventory, sales, price and other data for Kmart and Sears. Business users can log in to Pricing Hub and look at the data at the item level.

Impact

IMPACT (Integrated Multi-channel Planning and Collaborative Technology) is an integrated suite of applications designed and built for the Marketing, Planning and Pricing divisions within Sears. Impact is a multi-million-dollar enterprise transformation initiative undertaken by Sears to ensure they have the right structure, right processes and right tools to produce strong business results and build long-term customer loyalty.

Targeted Interactions

A platform that can deliver batch and real-time personalized campaigns across all customer touchpoints and self-learn to improve offer performance, with the ability to distribute offers to multiple formats, channels and platforms.

Responsibilities:

Identified and verified source data for Pricing Hub as per business need.

Created Sqoop jobs to extract data from DB2 tables. Designed and developed Hive and Pig jobs and unit tested the ETL components.

HDFS data was taken for the Item Hierarchy, Sales, Price and Inventory feeds. This was done in Hive and Pig and loaded into staging MySQL tables before loading into the final tables (a sketch of this staging load follows this list).

Designed and created MySQL load scripts used by the Pricing Hub UI.

For the Impact project, analyzed the DataStage jobs and developed conversion jobs for the Datasets used in the Hadoop process.

For the Pricing Hub project, performed data design, defined data structures and created tables.

Developed Pig scripts and Java UDFs for each DataStage job and tested them using Hadoop input files.

Tested the Pig scripts' output by comparing it with the DataStage files and documented any data issues.

Unica scheduling and maintenance of campaign flowcharts.

Developed UNIX scripts for FastLoad and FastExport for various vendor formats.
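As an illustrative stand-in for the MySQL staging-load step mentioned above (the real work used load scripts against the Pricing Hub schema; the connection details, file layout and table names here are invented), a small Python sketch that loads an HDFS-exported delimited file into a staging table and promotes it.

```python
import csv
import mysql.connector

# Hypothetical connection details for the Pricing Hub staging schema.
conn = mysql.connector.connect(
    host="pricinghub-db", user="etl_user", password="********", database="pricing_stage"
)
cur = conn.cursor()

# The Hive/Pig output was exported from HDFS as a pipe-delimited file (assumed 3 columns).
with open("item_sales_daily.txt", newline="") as fh:
    rows = [(r[0], r[1], float(r[2])) for r in csv.reader(fh, delimiter="|")]

cur.executemany(
    "INSERT INTO stg_item_sales (item_id, store_id, sales_amt) VALUES (%s, %s, %s)",
    rows,
)
conn.commit()

# Promote the staged rows into the final table used by the Pricing Hub UI.
cur.execute("INSERT INTO item_sales SELECT * FROM stg_item_sales")
conn.commit()
conn.close()
```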

Environment:

Hadoop, HDFS, Hive, Hadoop Pig Scripts, HBASE, Sqoop, Oozie, Python, Zookeeper, Map-reduce, Java, IBM DataStage 8.5, UNICA, DB2, Teradata, UNIX, MYSQL, Shell Scripting, Control M, SQL.

ECRRM for Northern Trust, Chicago

Feb 2011 – Sep 2011

Sr. DataStage Developer

The Enterprise Credit Risk Reporting Mart (ECRRM) project is a strategic reporting solution used in managing credit risk at Northern Trust. The project meets regulatory expectations related to credit risk management activities and also responds to growing demands for credit risk information from a diverse group of interested parties, including Corporate Risk Management, Regulators, Investor Relations, Controllers/Finance, business unit senior management, Internal Audit, outside auditors, etc.

Responsibilities:

Designed, developed and unit tested the ETL components.

Created and maintained technical ETL documentation

Developed ETL process flow diagram including sourcing, extraction, transformation and loading into Enterprise Data Warehouse.

Documented, tracked and communicated data issues to the business team.

Led the effort and partnered with the business team to identify and document data quality issues.

Created ETL code based on the business mapping and worked with the team to ensure that data rules are supported and properly maintained.

Environment:

IBM DataStage 8.5, Oracle, UNIX, SQL, Shell Scripting, Rapid SQL, Control M and Database Development.

Markdown Management System for Sears Holdings, IL

August 2007 – January 2011

Systems Engineer

This application is used to send markdown prices for apparel line of business.

Data feeds such as Sales and Inventory are received from various sources, then transformed and ETL'ed into the SAS Revenue Optimization application. The inbound process uses heavy SQL queries. The business then creates Group Plans for every season for different divisions and categories. Based on the historical data fed to SAS, recommendations are produced on a weekly basis. Committee reports are then created from the data mart.

Responsibilities:

Worked extensively on SQL and the Teradata utilities FastExport and FastLoad.

DataStage ETL development for Dynamic Pricing. Reports are created from this data warehouse for the business to analyse sales.

Extracted information from Teradata, DB2, MySQL, Netezza, files, etc.

Maintenance of Inbound process which uses complex Teradata queries for ETL.

Enhancement, Maintenance and on call support for Markdown Management System.

Developing code, test plans and test case preparation.

Analysing data and systems using Teradata & SAS to create reports and answer queries raised by the business.

Environment:

DataStage 8.1, Teradata, Netezza, MySQL, UNIX, Perl, VS-COBOL II, JCL, SQL, TSO/ISPF, Endevor, ChangeMan, File-Aid, DB2, Control M, SAS Revenue Optimization 2.2, SAS 9.1, SAS tables.

VTRS (VIP Transaction Research Service) for VISA, CA

August 2006 – August 2007

Scope: VIP Transaction Research Service (VTRS) is an easy-to-use online research tool designed to simplify research for Member and internal VIP authorization transaction queries. The VTRS system gets feeds from various upstream applications (FTL, Base II, IARS and VB) and stores the data in DB2 and ViewDirect. From the data warehouse, many reports are also generated by VTRS and sent to DocumentDirect. VTRS data is used by Members who access it from various online applications such as TLC (Transaction Life Cycle), VOL (VISA Online), VROL (VISA Resolve Online) and VEX (VISA Exception).

Responsibilities:

Analysis and coding for releases and maintenance work.

Handled queries from various users and provided them with the needed data and access. Production support for the VTRS application.

Environment: ES-9000, Pentium, OS/390, Win NT, VS-COBOL II, JCL, TSO/ISPF, CICS, File-Aid, DB2, Endevor, VIEWDIRECT, DOCUMENTDIRECT, CONTROL M, INFOMAN, CLEAR QUEST, EZTRIEVE, ASSEMBLER.

AIWCS (American International Workers Compensation System) for AIG (India)

March 2005 to June 2006

Scope: AIWCS is an underwriting system for the workers' compensation line of business. The project entails maintenance activities and support on a 24x7 basis for all open systems and mainframe jobs running at the client location. Tickets (issues/problems) as well as tasks (requests put forth by the underwriter or broker) are received for resolution. Root cause analysis is performed for tickets received, in order to ensure that the same problems do not surface in the future.

Responsibilities:

All tickets were resolved within the SLAs, which is 5 days for Severity 3 & 4 tickets.

Doing root cause analysis for all the tickets on a monthly basis.

Preparation of metrics, status reports and estimate reviews for patches and enhancements.

Team management and onsite/offsite coordination.

Preparation of project documents and process to align with the ISO standards.

Environment: ES-9000, Pentium, OS/390, Win NT, VS-COBOL II, VB, SQL 2000, TSO/ISPF, File-Aid, DB2, MQ Series, PVCS Version Manager & Endevor.

Actuarial System for Prudential, New Jersey

Aug 2002 to February 2005

Description of Actuarial Systems: The Actuarial system includes PFMC (Product Financial Management Cycle), VMF (Vendor Master File), GAAP (Generally Accepted Accounting Principles), HLVS (Health Lives Valuation System) and IWP (Intermediate Weekly Premium).

Responsibilities:

Preparation of software requirements and design document.

Developing code, code reviews, test plan and test case preparation, and UAT support.

Preparation of project estimates and schedules for new initiatives, status reporting to the client.

Environment: ES-9000, Pentium, UNIX, OS/390, Win NT, VS-COBOL II, SQL, TSO/ISPF, ChangeMan, File-Aid, VSAM, PVCS Version Manager v7.5, IBM DB2 Command Centre, Secure Shell, STROBE

QA for MNBS for Prudential, New Jersey

Jan 2001 to July 2002

Scope: A CICS region was being maintained for the MNBS System. The region had to be updated with each new release, involving loading of packages and updating of tables, transactions and screens. The region also needed to be monitored during the daytime when testers executed test plans.

MNBS Maintenance for Prudential, New Jersey

March 2000 to December 2000

Scope: The project involved the enhancement and maintenance of MNBS for assisting the New Business Division with enhancements, maintenance and support on the application.

Environment: ES-9000, Pentium, ES/390, Windows NT, VS-COBOL II, TSO/ISPF, DB2, IDMS, CICS, ChangeMan, SAS, File-Aid, VSAM, Expeditor, REXX

Claims System for Ohio Casualty Group, Ohio

June 1998 to Feb 2000

Environment: ES-9000, Pentium, ES/390, Windows NT, VS-Cobol II, EZTRIEVE plus, SSDF, File-Aid, VSAM, PANVALET

Advanced Loan System for Citibank

Dec '96 to June '98

Environment: ES-9000 (located in Singapore), MVS, VS-COBOL II, CICS, RPI, API

Education

MCA (Master of Computer Application), Anna University, Chennai (Madras), India.


