Datastage Developer Data Warehouse

Location:
Chicago, IL
Posted:
March 27, 2024

Bharat Kumar

Phone: 312-***-**** | E-mail: ad4l9r@r.postjobfree.com

Professional Summary:

7+ years of IT experience across multiple domains, SDLC models and technologies.

About 7 years of experience building data warehouses with extraction, transformation and loading (ETL) processes using IBM InfoSphere Information Server DataStage (11.7, 11.5, 9.1), Informatica PowerCenter (7.1.3/8.6.1)/Informatica Cloud (IICS), Talend Open Studio (7.2, 7.3), SQL Server 2012, Sybase and Teradata.

Experienced in Requirement Analysis, Test Design, Test Preparation, Test Execution, Defect Management, and Management Reporting.

Strong hands-on experience with Teradata utilities such as BTEQ, FastLoad, MultiLoad and TPump, and as an analyst using SQL Server and Teradata (V14, V15.10, V16.20).

Strong working experience in all phases of development, including extraction, transformation and loading (ETL) of data from various sources into data warehouses and data marts using Informatica Cloud IICS (CIH, CDI, CAI) and PowerCenter (Repository Manager, Designer, Server Manager, Workflow Manager and Workflow Monitor).

Extensive experience in MicroStrategy report development, testing and tuning, ad-hoc reporting, scheduled report delivery and alerts.

Expertise in PySpark, Unix Shell scripting and Python.

Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Scala, Impala, Kafka, Yarn, Oozie, and Zookeeper.

Experience in developing OLAP applications using Cognos 8 BI (Framework Manager, Cognos Connection, Report Studio, Query Studio and Analysis Studio) and extracting data from the enterprise data warehouse to support analytics and reporting for corporate business units.

Extensive knowledge of Software Development Life Cycle (SDLC), having thorough understanding of various phases like Requirements, Analysis/Design, Development and Deployment.

Experience in migrating existing databases from on-premises to AWS Redshift using various AWS services.

Strong experience with data migration on AWS Redshift, S3 data lake and EMR.

Expertise in Web technologies using Core Java, J2EE, Servlets, EJB, JSP, JDBC, Java Beans, and Design Patterns.

Experience with Hortonworks Hadoop distribution components and custom packages.

Experience in managing and scheduling batch jobs on a Hadoop cluster using Oozie and managing metadata with Big Data.

Experience in developing complex Teradata SQL code in BTEQ scripts using OLAP and aggregate functions.

Extensively worked with PMON/Viewpoint for Teradata performance monitoring and performance tuning.

Efficient in incorporating various data sources such as Oracle, MS SQL Server, Sybase, JSON and flat files into the staging area.

Strong knowledge of data warehouse architecture, including star schema, snowflake schema and dimension tables, and physical and logical data modeling.

Experienced in migrating various data applications from Legacy to New Version Systems.

Knowledge of query performance tuning using EXPLAIN, COLLECT STATISTICS, compression, NUSI and join indexes, including sparse join indexes.

Worked in different roles such as Data Analyst, Developer and Production Support.

Experience in Banking and Health Care Domains.

Competent in handling all levels of data warehouse life cycle, including requirements gathering, analysis, preparing business specifications, designing technical specifications, and system requirement specifications.

EDUCATIONAL QUALIFICATION

Bachelor's in Computer Science and Engineering, JNTU, India, 2014.

Master's in Computer Science, Governors State University, IL, 2017.

TECHNICAL SKILLS

Languages : C, C++, Java, SQL, PL/SQL, Scala, Unix shell scripting, Python.

Hadoop Ecosystem : Hadoop 1.x/2.x (YARN), StreamSets, Cloudera, MapR, PySpark, HDFS, MapReduce, MongoDB, HBase, Hive, Pig, ZooKeeper, Sqoop, Oozie, Spark, Python, Scala, Impala, Flume, Avro, Talend, Eclipse, Cloudera Desktop.

APIs : Servlets, EJB, REST, Java Naming and Directory Interface (JNDI), MapReduce.

Database Tools : Teradata SQL Assistant, SQL*Loader, DbVisualizer, Oracle Designer, Sybase, SQL Server, XML, Salesforce.

Teradata Tools & Utilities: SQL Assistant, BTEQ; Load & Export: FastLoad, MultiLoad, TPump, TPT, FastExport, Data Mover.

ETL Tools : IBM Information Server 11.7, 11.5 and 9.1, IBM Information Analyzer 11.7 server programming, DataStage 11.7, 11.5, Informatica PowerCenter (7.1.3/8.6.1)/IICS, SSIS, Autosys, Control-M.

BI Tools : PowerBI, MicroStrategy ONE.

RDBMS Servers : Oracle 10g, Teradata, Sybase, MS SQL Server 2012.

Cloud Services : AWS Services (S3, EC2, EMR, RDS, Amazon RedShift), Snowflake.

Operating Systems : UNIX Solaris 8/2.x, Red Hat Linux 7.x, Windows 10, Ubuntu 13.X, Mac OSX.

Development Tools : Eclipse, RAD/RSA (Rational Software Architect), SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access), DbVisualizer, VMware, Teradata SQL Assistant.

Data Modeling : Dimensional Data Modeling, Star Schema Modeling, Snowflake Modeling, Fact and Dimension Tables.

PROFESSIONAL EXPERIENCE

Client – OPTUM/UHG

Location – Eden Prairie, MN

Year - Oct 2022 to Present

Role - Senior ETL/Datastage Developer

Project: PAU/ UDW – Polaris Project/Under Writing Reports

Responsibilities (High Level):

Designed Parallel jobs involving complex business logic, update strategies, transformations, filters, lookups and source-to-target data mappings to load the target using DataStage designer jobs – 45%

Design, Analysis, Maintenance, Performance Tuning, documenting and testing of Data Stage Jobs – 25%

Gathering business and application requirements, providing feedback on missing or contradictory requirements, and estimating, planning and executing Sprints – 10%

Organizing and managing all phases of the software development life cycle, i.e., development, testing and deployment of code into production, including test creation, test execution and defect management – 12%

Responsible for unit, system and integration testing; developed test scripts, test plans and test data; participated in UAT (User Acceptance Testing) – 8%

Responsibilities in Detail:

Applied Data Warehouse/ODS and ETL concepts and modeling principles in a Snowflake environment, with knowledge of Snowflake database, schema and table structures.

Created Database Design documents using requirements and functional specification document.

Used relational sources and flat files to populate the data mart. Translated the business processes into Informatica mappings for building the data mart.

Coordinated with the team to review ETL code and provided development support.

Also working in ETL L1 and L2 production support, debugging run failures and providing temporary fixes to keep the runs on schedule.

Developed various Python applications in the Azure cloud environment as part of the Teradata-to-Snowflake cloud migration.

Performed the initial history loads for the Snowflake migration project using Snowpipe.

Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate data from multiple source systems, including loading nested JSON-formatted data.
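
As an illustration of that stage-and-flatten pattern, below is a minimal sketch using the snowflake-connector-python driver. It assumes a landing table with a single VARIANT column (PAYLOAD), an external S3 stage, and the account, credential, stage and table names shown, all of which are hypothetical placeholders rather than the actual project objects.

```python
import snowflake.connector

# Account, credentials, stage and table names are placeholders for illustration.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="ETL_USER",
    password="***",
    warehouse="LOAD_WH",
    database="UDW",
    schema="RAW",
)
cur = conn.cursor()
try:
    # Land raw JSON documents from the S3 external stage into a table
    # assumed to have a single VARIANT column named PAYLOAD.
    cur.execute("""
        COPY INTO RAW.MEMBER_CLAIMS_JSON
        FROM @RAW.S3_LANDING_STAGE/claims/
        FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE)
    """)

    # Flatten the nested JSON into relational columns for downstream use.
    cur.execute("""
        INSERT INTO RAW.MEMBER_CLAIMS_FLAT (MEMBER_ID, DIAGNOSIS_CODE)
        SELECT c.PAYLOAD:member_id::STRING,
               d.value:code::STRING
        FROM RAW.MEMBER_CLAIMS_JSON c,
             LATERAL FLATTEN(INPUT => c.PAYLOAD:diagnoses) d
    """)
finally:
    cur.close()
    conn.close()
```

Snowpipe, as used for the history loads above, automates the same COPY INTO statement as new files arrive in the stage.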

Developed BTEQ scripts to load data from Teradata Staging area to Teradata data mart.

Created load scripts using the Teradata FastLoad and MultiLoad utilities, and procedures in SQL Assistant.

Performed query optimization with the help of explain plans, collected statistics, and primary and secondary indexes.

Used volatile tables and derived queries to break complex queries into simpler ones.
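
To make the volatile-table technique concrete, here is a brief sketch using the teradatasql Python driver. The real work was done in BTEQ and SQL Assistant rather than Python, and the host, credentials and the staging, mart and volatile table names are hypothetical.

```python
import teradatasql

# Host, credentials and table names below are placeholders for illustration.
con = teradatasql.connect(host="td-prod.example.com", user="etl_user", password="***")
cur = con.cursor()

# Stage a narrowed intermediate result in a session-scoped volatile table
# instead of repeating the filter inside one large query.
cur.execute("""
    CREATE VOLATILE TABLE vt_active_members AS (
        SELECT member_id, plan_id, eff_dt
        FROM   stg.member_stage
        WHERE  member_status = 'A'
    ) WITH DATA
      PRIMARY INDEX (member_id)
      ON COMMIT PRESERVE ROWS
""")

# Help the optimizer with statistics on the join column.
cur.execute("COLLECT STATISTICS ON vt_active_members COLUMN (member_id)")

# Load the data mart from the simpler intermediate set.
cur.execute("""
    INSERT INTO mart.member_plan_fact (member_id, plan_id, eff_dt, load_ts)
    SELECT v.member_id, v.plan_id, v.eff_dt, CURRENT_TIMESTAMP
    FROM   vt_active_members v
    JOIN   mart.plan_dim p ON p.plan_id = v.plan_id
""")

cur.close()
con.close()
```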

Streamlined the Teradata scripts and shell scripts migration process on the UNIX box.

Performed error handling and performance tuning for Teradata queries and utilities.

Maintained all artifacts and components of the Microsoft Azure cloud migration process in GitHub Enterprise.

Created coordinator file-watcher jobs in the Microsoft Azure cloud environment to trigger downstream jobs when a trigger file is updated or a new trigger file is created.

Used AzCopy to copy data from HDFS to Microsoft Azure Blob Storage containers.
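
The HDFS-to-Blob copy step can be pictured with the small Python wrapper below, which stages files locally with the hdfs CLI and then pushes them with azcopy. The paths, storage account, container and SAS token are hypothetical placeholders.

```python
import subprocess

# Paths, container URL and SAS token are placeholders.
HDFS_DIR = "/data/udw/member_extracts/2024-03-01"
LOCAL_STAGING = "/tmp/member_extracts"
BLOB_URL_WITH_SAS = "https://mystorageacct.blob.core.windows.net/udw-landing/member_extracts?<SAS-token>"

# 1. Pull the files out of HDFS into a local staging directory.
subprocess.run(["hdfs", "dfs", "-get", HDFS_DIR, LOCAL_STAGING], check=True)

# 2. Push the staged files to the Azure Blob Storage container with AzCopy.
subprocess.run(
    ["azcopy", "copy", LOCAL_STAGING, BLOB_URL_WITH_SAS, "--recursive"],
    check=True,
)
```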

Worked on member engagement and member segmentation programs that categorize members into different segments to support member outreach.

Created interactive and visually appealing dashboards and reports using Power BI to present complex data insights to stakeholders.

Designed and customized reports to meet business requirements, including drill-down functionalities and dynamic filtering.

Utilized Power Query Editor within Power BI to perform data cleansing, transformation, and aggregation tasks.

Designed and implemented ETL processes to extract data from various sources, transform it, and load it into Power BI datasets.

Environment: Python 3.11, IBM InfoSphere DataStage 11.7/11.5, UNIX, Teradata SQL Assistant, DB2 Squirrel, XML files, AWS S3, AWS Redshift, Teradata Viewpoint, TWS, Airflow, Snowflake, Databricks, SQL Developer, Unix Shell Scripting, Power BI, Eclipse, Postman.

Client – USAA

Location – San Antonio, TX

Year - Feb 2022 to Oct 2022

Role – Sr. Datastage Developer

Project – RCSA 3.0

Responsibilities in Detail:

Extensively used Data Stage Parallel Extender Designer to develop various jobs to extract, cleanse, transform, integrate and load data into data warehouse database.

Production ETL consisted of a combination of DataStage ETL jobs, table restructuring and UNIX scripting.

Extensively worked on Job Sequences to Control the Execution of the job flow using various Activities & Triggers (Conditional and Unconditional) like Job Activity, wait for file, Email Notification, Sequencer, Exception handler activity and Execute Command.

Worked on Integrating APIs from Snowflake to Salesforce using Data Ingestion pipelines.

Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.

Designed and implemented data structures and created business processes and process flows.

Involved in day-to-day Scrum calls (agile methodology) with the onsite team, providing updates on testing and defect status.

Used relational sources and flat files to populate the data mart. Translated the business processes into Informatica mappings for building the data mart.

Experienced in connecting cloud-based sources such as Azure topics, Kafka and Azure catalogs using REST API web services.

Experienced in Python scripting for automating various tasks in the projects.

Created Airflow scheduling jobs (DAGs) using Python scripts.
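
A minimal sketch of such an Airflow DAG is shown below, assuming Airflow 2.x. The DAG id, schedule and task callables are hypothetical stand-ins for the actual ingestion logic.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_source(**context):
    # Placeholder: pull data from the upstream source (e.g. an API or database).
    print("extracting source data")


def load_to_warehouse(**context):
    # Placeholder: load the extracted data into the target warehouse.
    print("loading data into the warehouse")


with DAG(
    dag_id="daily_ingest",          # hypothetical DAG name
    schedule_interval="0 2 * * *",  # run daily at 02:00
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load  # load runs only after extract succeeds
```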

Worked on Integrating APIs from Salesforce to Snowflake using Data Ingestion pipelines.

Utilized DB2 database management system for storing and managing enterprise data.

Implemented database schema designs, tables, indexes, and stored procedures in DB2 to support ETL processes and data storage requirements.

Applied Data Warehouse/ODS and ETL concepts and modeling principles in a Snowflake environment, with knowledge of Snowflake database, schema and table structures.

Documented technical specifications, data mappings, and ETL workflows for DataStage jobs and DB2 database structures.

Generated reports and documentation to communicate ETL process designs, data lineage, and system configurations to stakeholders and team members.

Environment: IBM InfoSphere DataStage 11.7, UNIX, DB2 Squirrel, AWS S3, Control-M, SnapLogic, Snowflake, Salesforce, Python.

Client – OPTUM/UHG

Location – Eden Prairie, MN

Year - June 2019 to Feb 2022

Role – Datastage Developer

Project: PAU/ UDW – Polaris Project/Under Writing Reports

Responsibilities in Detail:

●Reverse-engineered legacy SQL queries to determine where the tables and columns used by the legacy SQL extracts exist in the data lake.

●Developed a PySpark application to summarize suspect-gap information on members' gap history status.

●Worked on migrating our project's applications from the Big Data platform (BDPaaS) to the Microsoft Azure cloud environment.

●Created UNIX shell scripts to build the input for the PySpark programs invoked from Oozie workflow actions.

●Created and executed Oozie workflows and sub-workflows containing several actions of Hive, Shell, Java, PySpark, Sqoop, etc.

●Developed and implemented Databricks jobs to run Databricks notebooks performing data loads in Databricks workspaces.

●Created tables in Databricks to store data in Microsoft Azure Storage in Delta format (see the sketch after this list).

●Implemented Airflow DAGs using Python for data load and analytics processes from BDPaaS to the Microsoft Azure Delta Lake.

●Ran various PySpark applications in the Azure cloud environment, alongside the Oozie web console, as part of the cloud migration of our PySpark projects.

●Created metadata json files for the delta tables created in Databricks.

●Created various Hive tables (external and internal) in BDPaaS environment to store and view the data, also created Hive tables on the top of HBase tables.

●Designed and implemented Azure Terraform scripts to create Microsoft Azure Databricks Resource Groups, Workspaces, Virtual Networks and Subnets.

●Created workflows, trigger files and shell scripts in Microsoft Azure Cloud environment.

●Created decision actions in Oozie workflows to determine whether an AzCopy run is required based on data availability.

●Designed and developed a Delta Lake application to pull incremental or full-refresh data, based on the input, from HBase and MySQL databases into HDFS and then into cloud storage, landing in the partitioned Delta tables created in Databricks (see the sketch after this list).

●Implemented data cleanup processes in Databricks using PySpark.

●Designed and implemented a PySpark application to gather, analyze and generate detailed gap information for members of member groups with different conditions, and stored the results in a MySQL database through Sqoop.

●Provided continuous production support and monitored scheduled job runs.
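
As referenced in the bullets above on Delta tables and the incremental/full-refresh pull, the sketch below shows the general shape of that load in PySpark on Databricks. The JDBC URL, credentials, table names, watermark column and load-type flag are hypothetical, and the HBase leg of the pipeline is omitted for brevity.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("member_gap_delta_load").getOrCreate()

load_type = "incremental"          # "incremental" or "full"; passed in as a job parameter
watermark = "2022-01-01 00:00:00"  # last successful load timestamp, tracked outside this job

# Pull the source rows over JDBC (MySQL shown; requires the MySQL JDBC driver on the cluster.
# The HBase source would go through its own connector).
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://mysql-host:3306/memberdb")   # hypothetical host/database
      .option("dbtable", "member_gap_history")                  # hypothetical source table
      .option("user", "etl_user")
      .option("password", "***")
      .load())

if load_type == "incremental":
    # Keep only rows changed since the last run (hypothetical watermark column).
    df = df.filter(F.col("updated_ts") > F.lit(watermark))

# Land the data as a partitioned Delta table backed by cloud storage.
(df.withColumn("load_date", F.current_date())
   .write.format("delta")
   .mode("append" if load_type == "incremental" else "overwrite")
   .partitionBy("load_date")
   .saveAsTable("udw.member_gap_history_delta"))                # hypothetical schema.table
```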

Environment: PySpark, Python, Hive, HBase, Sqoop, MySQL, DbVisualizer, MapR, Microsoft Azure, Microsoft Azure Storage Explorer, Databricks, SQL Developer, Unix Shell Scripting, Eclipse, Postman, Spark, Big Data, Airflow, Terraform scripting.

Client – JP Morgan Chase

Location – Chicago, IL

Year - Sep 2018 to June 2019

Position: ETL Datastage Developer

Complexity: If the migration load fails for one client, the process should not abort for the other clients and must continue; this was resolved by making all jobs run as multiple instances. Rapid issue resolution was essential, as clients were expected to start using the new system the next day.

Responsibilities in Detail:

Analyzed business processes and coordinated with business analysts and users to get specificity on requirements to establish ETL standards and build Data Marts.

Wrote technical documentation on how to design and execute parallel jobs, batch jobs and sequencers for data migration and how to schedule the jobs.

Developed ETL procedures to ensure conformity, compliance with standards and lack of redundancy, translated business rules and functionality requirements into ETL procedures using Data Stage.

Extensively used Data Stage Parallel Extender Designer to develop various jobs to extract, cleanse, transform, integrate and load data into data warehouse database.

Designed and developed HP Vertica anchor tables and projections; analyzed query logs and made corrections to projections.

Developed HP Vertica vsql scripts for bulk loading and delta loading of stage and target tables using IICS Cloud Data Integration.

Extracted raw data from SharePoint, SQL Server, MySQL and flat files into staging tables, and loaded data to topics in Cloud Integration Hub (CIH), flat files, SharePoint and SQL Server using Informatica Cloud.

Hands-on experience loading customer data, teller transaction data and account data to Salesforce; performed upserts with the help of an External ID (hash key) to maintain SCD Type 1 within CDI using task flows, and used both the Bulk API and the Standard API.
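
The upsert-by-External-ID pattern above was implemented in IICS CDI task flows; purely as an illustration of the same idea in Python, here is a hedged sketch using the simple-salesforce library, where the Salesforce object, custom field names, hash-key derivation and credentials are all hypothetical.

```python
import hashlib

from simple_salesforce import Salesforce

# Credentials are placeholders.
sf = Salesforce(username="etl.user@example.com", password="***", security_token="***")

source_rows = [
    {"AccountNumber": "1001", "TellerTxnCount": 42},
    {"AccountNumber": "1002", "TellerTxnCount": 7},
]

# Derive a deterministic hash key to use as the External ID, so re-running the load
# simply overwrites the previous values (SCD Type 1 behaviour).
records = []
for row in source_rows:
    ext_id = hashlib.sha256(row["AccountNumber"].encode()).hexdigest()[:18]
    records.append({
        "External_Id__c": ext_id,                      # hypothetical External ID field
        "Teller_Txn_Count__c": row["TellerTxnCount"],  # hypothetical custom field
    })

# Bulk API upsert keyed on the External ID field.
results = sf.bulk.Account.upsert(records, "External_Id__c", batch_size=10000)
print(results)
```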

Implemented Type 1, Type 2, CDC and incremental load strategies, and used exceptional mappings to generate dynamic parameter files after the staging loads.

Used shared folders to create reusable components (sources, targets, mapplets, email tasks) and wrote PowerShell scripts to support the smooth flow of files and the ETL process.

Extracted client data from varied sources like flat files, XML files, Teradata tables and mapped the data into Operational Data Stores (ODS).

Created BTEQ scripts to perform ELT operations as well as to load data from multiple sources.

Collaborated with end-users to gather requirements and translate them into effective Power BI solutions.

Implemented row-level security and data-level security measures to ensure controlled access to sensitive information.

Conducted training sessions for end-users on Power BI usage and self-service reporting capabilities.

Stayed abreast of the latest Power BI features and updates, incorporating them into existing reporting solutions.

Worked closely with the development team to optimize data models and queries for Power BI performance.

Integrated Power BI reports with various data sources, including SQL Server, Excel, and cloud-based platforms.

Environment: IBM Information Server DataStage/QualityStage 11.5, Informatica PowerCenter (7.1.3/8.6.1)/IICS, SQL Server, Unix Shell Script, MS Access, HP ALM, GitHub, Jenkins and CI/CD, Salesforce, Power BI.

Client - AIM/Anthem

Location - Deerfield, IL

Year – Aug 2016 to Aug 2018

Role – ETL Datastage Developer

Project - 1: Implement Legacy Project - Premera

Project - 2: EIM Provider Project – Staging & Enterprise

Responsibilities in Detail:

Designed DataStage jobs using the Sequential File, Complex Flat File, Modify, Surrogate Key Generator, Pivot, Filter, Funnel, Join, Lookup, Transformer, Copy, Aggregator and Change Capture stages.

Performed all ETL testing techniques, including metadata testing, data completeness testing, data accuracy testing, data transformation testing, data quality testing and data integration testing.

Worked on data validation, constraints, record counts, source-to-target row counts, random sampling and error processing.
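
A simple way to picture the row-count and random-sampling checks above is the generic Python sketch below. It works against any pair of DB-API connections; the table and key-column names, and the '?' parameter style (which varies by driver), are assumptions for illustration.

```python
import random


def row_count(conn, table):
    """Return COUNT(*) for a table through any DB-API connection."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


def sample_keys(conn, table, key_col, n=100):
    """Take a random sample of key values from the source table for spot checks."""
    cur = conn.cursor()
    cur.execute(f"SELECT {key_col} FROM {table}")
    keys = [row[0] for row in cur.fetchall()]
    return random.sample(keys, min(n, len(keys)))


def validate(source_conn, target_conn, src_table, tgt_table, key_col):
    """Source-to-target row-count check plus a random-sample existence check."""
    src_count = row_count(source_conn, src_table)
    tgt_count = row_count(target_conn, tgt_table)
    print(f"rows: source={src_count} target={tgt_count} match={src_count == tgt_count}")

    cur = target_conn.cursor()
    missing = []
    for key in sample_keys(source_conn, src_table, key_col):
        # '?' is the paramstyle for drivers like pyodbc/sqlite3; others use '%s'.
        cur.execute(f"SELECT 1 FROM {tgt_table} WHERE {key_col} = ?", (key,))
        if cur.fetchone() is None:
            missing.append(key)
    print(f"sampled keys missing from target: {missing or 'none'}")
```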

Verified session and view logs to identify errors that occurred during ETL execution in the DataStage tool.

Debugged and scheduled ETL workflows and mappings and monitored error logs in the DataStage ETL tools.

Verified and validated BI reports and BusinessObjects reports.

Involved in day-to-day Scrum calls (agile methodology) with the onsite team, providing updates on testing and defect status.

Used relational sources and flat files to populate the data mart. Translated the business processes into Informatica mappings for building the data mart.

Used browse object feature on SFDC stage to import table definitions of SFDC objects.

Extensively used Quality Stage to convert data from legacy sources into consolidated high-quality information within the enterprise Warehouse and Data Mart.

Designed Parallel jobs involving complex business logic, update strategies, transformations, filters, lookups and source-to-target data mappings to load the target using DataStage designer.

Implemented Dimensions (SCD) Type1, Type 2 to capture the changes using DataStage.

Extracted data from various databases like DB2, SQL Server, Oracle.

Coordinated with the team for reviewing the ETL code and provided development support.

Environment: IBM InfoSphere DataStage 11.3, Oracle 11g, flat files, UNIX, MS SQL Server database, Sybase, XML files and TortoiseSVN.


