Etl Developer Data Engineer

Location:

Germantown, MD

Posted:

May 25, 2023

Resume:

Abhishek Kumar

Email: *************.**@*****.***

Phone: +1-914-***-****

LinkedIn: https://www.linkedin.com/in/abhishek-kumar-620ba597/

SUMMARY:

●12+ years of experience in data warehouse and data integration ETL development and maintenance in the IT industry, spanning banking and financial services and the food industry, with clients including FDA, UBS, Bridgewater Associates, and JPMC.

●Certified Data Engineer and Spark Developer from Databricks.

●Redesign and implement legacy data integrations as optimized pipelines using PySpark.

●Present and implement machine learning solutions using embedding models, MLflow, and Spark ML for logistic regression models. Fine-tune existing models on specific data through hyperparameter tuning.

●Create Spark jobs and configure the necessary Spark job clusters.

●Expertise in working in the agile model, requirement gathering, analysis, and end-to-end implementation, with strong skills in the latest technologies, application servers, and databases.

●Performance-oriented team player with excellent communication skills and good analytical, design, and problem-solving abilities.

●Experienced with databases such as SQL Server 2012/2014 and Oracle 10g/11g; ETL tools such as Informatica PowerCenter, Clarity Transaction Control, and SSIS; Tableau; Unix/Linux shell scripting; and XML scripting.

●Strong Informatica, Oracle, and scripting skills.

●Experience in ETL solution implementation, data warehousing, ODS design, ETL design, ETL performance tuning, database design, Data Processors, data profiling, data management, database performance tuning, and audit data.

●Involved in all phases of the Software Development Life Cycle (SDLC) for data warehouse projects.

●Also involved in estimation, competency development, knowledge transfer, and team management.

●Good understanding of quality processes; worked in the onsite/offshore model.

TECHNICAL SKILLS:

●Operating Systems / Cloud: Windows, Unix, Linux, AWS.

●Languages / Frameworks: PySpark, Unix/Linux scripting, SQL, PL/SQL, Autosys, ActiveBatch.

●Relational databases (RDBMS): Oracle 10g/11g, SQL Server (all versions from 2012 onward).

●Tools/Services: Databricks, S3, DMS, IAM, RDS, Informatica PowerCenter 8.1.x, 8.6.x, 9.1.x, 10.x, Informatica Data Quality (IDQ) 10.x, Informatica Big Data Management 10.x, CTC Recon, SSIS, Tableau.

●FTP: WinSCP, SmartFTP.

●Other Tools/Tech: Visual SourceSafe (VSS), Subversion, Git.

●Big Data: Hadoop HDFS architecture, HBase, Hive SQL.

CERTIFICATIONS:

●Databricks Certified Data Engineer: Databricks Corporation

●Databricks Certified Apache Spark 3.0 Developer

●Completed the S - PowerCenter 8 Mapping Design certification exam in Informatica.

●Completed the R - PowerCenter 8 Architecture and Administration certification exam in Informatica.

WORK EXPERIENCE:

Company : Apptad Inc.

Projects : CDER - Integrity - DSC

Client : Food and Drug Administration (FDA): Silver Spring (MD)

Senior Data Engineer / AWS - Databricks - Informatica IDC, PC, Machine Learning

Team Size: 11

March 2020 - Present [Evolving from on-prem to AWS cloud platform]

Description:

The purpose of this project is to provide information on the Center for Drug Evaluation and Research's (CDER) strategic informatics capabilities and guidance on how to apply them in addressing CDER user and system needs.

CDER's Strategic Capabilities Map depicts the capabilities that OBI is bringing to CDER users through the implementation of their use cases in order to support their business needs. The map captures the four strategic capabilities: (1) Electronic Submissions, (2) Integrated Data Management (Integrity), (3) Integrated Work Management (Panorama), and (4) Business Intelligence and Publishing (Mercado).

Technologies used: AWS Cloud Platform, PySpark, Python, Databricks, S3, AWS DMS, RDS, Informatica PowerCenter 10.2.0, Informatica Data Quality 10.2.0, Oracle, SQL/PL-SQL, Toad, Linux scripting, Java web services, message queues, Fixlite adapter, Operating System: Linux 2.6

Responsibilities :

Phase II

●Identifying and analyzing business requirements, translating requirements to technical specifications and mapping documents.

●Perform and maintain data modeling using erwin, and maintain key project documents.

●Manage the Jira application for the team.

●Design and develop optimized Databricks notebooks using Python, PySpark, S3, and RDS.

●Redesign and implement legacy data integrations as optimized pipelines using PySpark.

●Present and implement machine learning solutions using embedding models, MLflow, and Spark ML for logistic regression models. Fine-tune existing models on specific data through hyperparameter tuning.

●Create Spark jobs and configure the necessary Spark job clusters.

●Design, develop, and monitor high-performance Spark jobs for data enrichment and cleansing.

●Provide machine learning solutions using NLP, NER, transformers, and pipelines.

●Run POCs for machine learning solutions, including train/test data set creation and concept innovation.

●Implement streaming Spark jobs using Databricks notebooks, Auto Loader, and Delta Live Tables.

●Use Git-based CI/CD pipelines to deploy code from dev to higher environments.

Phase I

●Designing and developing ETL and IDQ solutions using Informatica PowerCenter and IDQ that follow the in-house ETL development standards; managing program coding for the extraction of data from all existing source systems, transforming data to industry standards, and generating extract files.

●Creating Data Processor services to consume and generate XML, JSON, and other file formats, and using them in Informatica PowerCenter in Unstructured Data transformations.

●Developing and maintaining scorecards and profiles with audit logging and alert management.

●Establishing testing/quality assurance environments; working with clients and systems to ensure that programs and modifications are error-free.

●Monitoring system performance and other aspects of systems to identify technical problems; monitoring data integrity to identify data quality problems; designing and enhancing the data quality improvement process, including feedback to system stewards.

●Troubleshooting production issues.

●Providing technical support to users; assisting clients in analyzing functional problems; working with clients to prioritize data quality concerns and to communicate the need for accurate data by other constituents at the university; producing technical specifications and data mapping documents; documenting program modifications.

●Documenting the solutions, the programming changes, and the problems and resolutions.

Company : Capgemini America Inc.

Projects :

Client : UBS, Stamford, Connecticut, USA

ETL Lead / Senior Developer

Team Size: 16 (Core + Book of Work)

Data Volume: Huge

Oct 2018 – March 2020

Description:

●FnO master: derivatives data for all markets, serving clients across the globe.

●Data is sourced from various internal and external upstream systems such as DML, Denali, Reuters, Bloomberg, and GMI, and from regulatory bodies such as the OCC (Options Clearing Corporation), Chicago.

Responsibilities:

●Gathering requirements from clients in coordination with Ops and BAs.

●Scoping the project in terms of timelines and resourcing.

●Designing the batch and real-time data integration flows to handle huge volumes of data.

●Conducting technical interview rounds for new team members.

●Creating complex Informatica code and mentoring other team members to align with good coding practices.

●Creating procedures, triggers, functions, etc. using PL/SQL.

●Writing tuned SQL queries used by the reporting application.

●Preparing all the necessary documents such as the run book, low-level design, deployment sheets, and delivery packages.

●Doing ETL design and scheduling using Autosys, cron, and the Informatica scheduler.

Technologies used: Informatica PowerCenter 10.1.1, Autosys (scheduler), Oracle, SQL/PL-SQL, SVN, shell scripting, Java, message queues, Fixlite adapter, Operating System: Linux 2.6

Company : Cognizant Technology Solutions

Projects:

Client : Bridgewater Associates, Connecticut, USA

ETL/Informatica Lead

Team Size: 10

Apr 2016 – Sept 2018

Description:

●The Corp data solution provides HR data warehouse management; the warehouse consumes data from various streams of the company and consolidates it in a data layer that constitutes an ODS and a warehouse.

●It is a data migration project integrating multiple source systems such as Workday, ADP, and Splunk.

Responsibilities:

●Led a team with a peak size of 6 members on various data integration projects for Bridgewater Associates.

●Designed and developed the error handling and the audit, balance, and control framework for the end-to-end data warehouse platform.

●Designed and developed mappings with optimal performance, following industry best practices.

●Created IDQ plans to cleanse and standardize the source system data.

●Designed and developed IDQ plans with source-system-specific requirements using Rule Based Analyzer, Weight Based Analyzer, Address Validator, Bigram, etc.

●Attended team meetings as an ETL/Informatica technical advisor, offering opinions and guidance to help managers select the best approach at project startup meetings.

●Designed and developed mappings with optimal performance using Aggregator, Joiner, Normalizer, Rank, Sequence Generator, uncached and various kinds of cached Lookup, connected, unconnected, and source/target pre- and post-load Stored Procedure transformations, Update Strategy, Union, etc.

●Used partitioning wherever possible to handle huge volumes of data and improve mapping performance.

●Prepared all the necessary documents such as the run book, low-level design, deployment sheets, and delivery packages.

●Provided application maintenance and production support.

●Worked on the Workday to Informatica integration setup.

Technologies used: Informatica PowerCenter 10.1.1, Informatica Cloud - Workday Cloud Integration, Informatica IDQ, Windows PowerShell scripting, ActiveBatch (scheduler), SQL Server, SQL, TeamCity

Client : Amex, Pune, India

ETL Solution Architect

Product Development Team - Data Migration platform

Nov 2015 – Apr 2016

Description:

●For Temenos data migration, Cognizant currently purchases a license for the proprietary product Validata. A home-grown migration product would remove the need to pay license fees to an external party, which led to building the FasData migration framework.

●Project Abstract

●FasData: An automated, metadata-driven, dynamic data integration platform that facilitates data migration for core banking platforms such as Temenos or FICO. FasData enables migration of business data across different systems: users describe the nature of the data, write transformation rules, and then execute the rules. The highly secure, highly available system ensures that data is seamlessly migrated from the source system to generate inception messages for T24 Open Financial Service (OFS), which loads data into the Temenos T24 platform, a product lifecycle management software platform that powers core banking operations.

Responsibilities:

●Worked as a Senior ETL Developer / ETL Architect at BFS-PSP, coordinating with the UI team on development activities on a day-to-day basis.

●Analyzed business requirements to infer and document functional requirements.

●Worked on technical specifications, design, and development of Informatica mappings and workflows for data-centric applications, augmenting and strengthening the team's capabilities in this space.

●Managed functional modules independently and merged work as required by the release schedule.

●Managed and coordinated work items and deliverables across the team.

●Wrote and executed the test cases for the ETL build.

●Designed, documented, and developed solutions based on functional requirements using the Generic ETL Framework.

Technologies/Tools Used: Informatica 9.6.1, Informatica PowerCenter (8.6.1), Linux shell scripting, Linux, SQL, PL/SQL, Java 1.8, Oracle 11g.

Client : UBS, Zurich, Switzerland

Sr. ETL Developer

Team Size: 10

Apr 2013 – Oct 2015

Description:

●GRU provides reconciliation solutions to various wealth management and investment banking systems worldwide across UBS, with automated reconciliation for positions, transactions, and instruments.

●Recon: Understanding requirements, CTC XML programming, segregating the processing to be done via ETL and via the reconciliation tool (CTC), and handling end-to-end production releases in coordination with various teams at the onshore location. Also production release support and documentation, early-life support strategy for the production environment, and team handling.

Responsibilities:

●Development and implementation of extraction, transformation, and loading of data using Informatica.

●Analyzing requirements and raising issues on the requirements received.

●Estimating project effort for the defined scope and creating the Work Breakdown Structure; allocating tasks to resources and tracking project status.

●Design, development, and testing of mappings using Informatica PowerCenter 8.6.1.

●Creating sessions and configuring workflows to run the mappings.

●Tuning the process at mapping and session level for better performance.

●Interacting with the customer on a daily basis to resolve queries and ensure on-time delivery.

●Defect tracking and fixing.

●Handled the UNIX to Linux migration for the project.

●Plan: Generating, planning, and analyzing the requirements.

●Scheduling ETL jobs by creating Autosys JIL files.

Technologies used: Windows XP, Informatica PowerCenter (8.6.1, 9.5), Linux shell scripting, SQL, CTC, Autosys

Client : JP Morgan Chase (JPMC), Pune, India

ETL Developer

Oct 2009 – Apr 2013

Description:

●JPMC – IB initiated a program called Credit Infrastructure Transformation (CIT) to consolidate most of the existing Credit Infrastructure applications into a robust, scalable strategic infrastructure. The primary objective of this program is to capture the most accurate risk information and provide it in a timely manner to various people in the GCRM group. The CIT program has two major areas: Traditional Credit Products (TCP) and Data Acquisition & Control (DAC). DAC is envisaged to collect trades and collateral data from various systems within the credit environment. The TCP infrastructure is intended to reduce overlapping TCP application scope, enhance existing TCP applications, and build new applications to address TCP data requirements.

●For this initiative, TCP and DAC are managing a project to produce an ETL solution using Informatica for the GCRM group. The requirement is to process around 2000 feeds distributed between approximately 270 work packages (a work package is a grouping of feeds based on functionality or feed type/format).

Project #1: JPMC – IB CIT-DAC TCP

Project Abstract:

●TCP (Traditional Credit Products) is one of the asset classes for CIT-DAC. In this process, the supplier data elements are verified, validated, and transformed by the DAC and provided to both strategic and legacy consumers. The suppliers of data are Line of Business transaction systems in the Investment Bank (IB) areas of JPMC.

Project #2: JPMC – IB CIT-DAC RECONCILIATION

Project Abstract:

●JPMC IB DAC has multiple consumers, with multiple data extraction and data insertion jobs running between them. The nature of the data changes as it passes from one consumer to another. DAC Reconciliation is the process of reconciling, or tying up, the data between two consumers and generating reports for operations purposes.

Project #3: JPMC – IB CIT-DAC EBR (Maintenance and Enhancement)

Project Abstract:

●The aim of the project is to understand the existing process and modify it as per the requirements. The EBR module handles enhancements and bug fixes across multiple applications in DAC. EBR mainly caters to data quality enhancements for multiple LOBs; it was an initiative of JPMC operations, with a plan to support changes to the base code after production release. The main feeds handled were TCP (Traditional Credit Products), Cash Securities, and Market Risk Investment.

Responsibilities:

●Detailed analysis of the system and design of the application.

●Developing Informatica mappings and sessions based on user requirements.

●Unit and system testing; worked as Defect Prevention Controller on the project to minimize defects.

●Developing Unix scripts based on user requirements.

●Resolving technical issues faced by team members.

●Communicating with the onsite coordinator.

●Upgrading Informatica components from version 8.1 to version 8.6.

●Scheduling ETL jobs by creating Autosys JIL files.

Tools: Informatica PowerCenter (8.1.1, 8.6.1), Linux

EDUCATION:

●Bachelor's degree in Computer Science and Engineering, under the affiliation of Rajeev Gandhi University of Technology, Bhopal, Madhya Pradesh.

ACHIEVEMENTS:

●Received the “Special contribution to project excellence” award at Cognizant.

●Project awarded as best project in BFS, Cognizant Switzerland.

●Published various white papers within Cognizant on topics related to data integration and upcoming technologies (cloud computing).

●Published papers within the organization on topics such as Informatica features (pipeline processing) and SQL/PL-SQL tuning practices.
