Azure Data Factory

Location:

Bloomsburg, PA

Salary:

135000

Posted:

February 20, 2025

Contact this candidate

Resume:

Navin Kumar Vishwakarma

https://www.linkedin.com/in/navin-vishwakarma-8087171a/

469-***-****

PROFILE SUMMARY:

With an impressive 18-year track record across diverse data warehousing platforms, I excel in harnessing the power of tools like Informatica PowerCenter, Intelligent Cloud Services IICS/IDMC, DBT (Data Built tool), IBM InfoSphere Data Stage, SAP-BODS, HVR and more. Hands-on experience with Azure Data Factory, AWS services, including S3, AWS Glue, RDS, Athena, Lambda, and Step Functions.

My expertise extends to databases such as Snowflake, DB2-UDB, Teradata, Oracle, SQL Server, Greenplum, Vector wise. Orchestrated the integration of countless source system and syncing data between on-premises and cloud environments. Experience in data migration across various environment between on-premise and clouds and Indexing. Acquired solution in snowflake and cloud application for features like Separation of Storage and Compute, Zero-Copy Cloning, Snow pipe, Data Sharing, Streams, Tasks, High-Performance Query Execution.

Formulate, construct, and uphold multiple ETL dedicated to data transformation, guaranteeing precision and efficacy. Engage in collaborative efforts with data engineers and analysts to comprehend data necessities and provide scalable resolutions. Clear understanding in data ware house with ralph Kimble methodology and have clear understanding of DWH concepts and dimensional modelling and data normalization.

My proficiency spans cloud data warehouse, Azure Data Factory and AWS. Auto ingestion

Proficient in Azure (ADF) worked on many data flows and SCD-1, SCD-2 data flows and pipelines.

Extensive experience with BTEQ scripting in Teradata, optimizing Fast Export, Fast Load, and Multi-Load scripts for performance.

Proficient in Big Data technologies, including Hadoop, HDFS, Hive, and Sqoop, with expertise in data modeling tools like Erwin and Microsoft Visio. Experienced in leveraging BI tools such as Tableau and Business Objects to deliver actionable insights. Skilled in working across UNIX, Linux, and Windows environments, with hands-on experience in modern development and collaboration platforms like Azure Devops, GitHub, Jira.

Create, maintain, and document a repository of functional test cases and supporting test artifacts, such as test data, data validation procedures, and automated scripts. Collaborate closely with QA engineers to design comprehensive test plans. Working closely with the Change Management Team to flow the SDLC/Azile process smoothly and ensure that all changes are properly documented and tested.

TECHNICAL SKILLS:

Azure Data Engineer, Informatica Intelligent Cloud Services IICS/IDMC, Informatica PowerCenter 10.5.2/9.5, IBM InfoSphere DataStage and Quality Stage, SSIS, SAP Business Object Data Integrator, DBT (Data build Tool), HVR, TALEND Open Studio, HDFS, HIVE and SPARK SQL, SCALA, SPARK-CORE and Data Frames.

Project Management, Migration Lead, Integration Expert. RDBMS Green plum, Snowflake, Teradata, Oracle, SQL Server Management Studio (SSMS), DB2 Vector wise. Performance Tuning (Partition, Bucketing, Map side join, Broad cast join in Spark-SQL. Scheduler Tool Control – M 6.3, Tidal. UNIX, Dimensional modeling from conceptual to Logical and physical data modeling.

EDUCATION:

2013: MBA Sikkim Manipal University (Project Management)

2002: B. Com from IGNOU, Dhanbad, Jharkhand (Finance)

2001: Diploma in Computer Application from NIIT Dhanbad (Information technology)

Project: Ware house Management Period: May’24-till Date

Environment: Snowflake, Informatica PC, Azure Data Factory, SAP BODS, DBT, Talend, Power-BI, DB2, Oracle, Data vault

Role: ETL Lead Data Engineer Customer: Knights of Columbus

Domain: Insurance Location: New Haven, CT USA

Description: It’s a migration engagement project from SAP BODS to Talend and DB2-UDB to Snowflake and Web- Focus to Power BI. Scope is to migrate legacy warehouse into Snowflake data ware house for high performance in snowflake including the ETL and reporting. Continuous integration of all the of all the sources and design the ETL technical specification in SQL to load custom view and tables to build the warehouse.

Responsibilities:

Used Azure Data Factory extensively for ingesting data from disparate source systems.

Experience in migrating SQL database to On-Premise using Azure Data factory, Azure SQL Database and Azure SQL Data warehouse and controlling and granting database access and Migrating On-premises databases to Azure Data Lake store using Azure Data factory.

Analyzed the data flow from different sources to target to provide the corresponding design Architecture in the Azure environment

While Migration from Informatica to ADF created the Data flows, Pipeline and setup triggers.

Created numerous pipelines in Azure Data Factory to get the data from disparate source systems by using different Azure Activities like Copy, filter, Derived, Lookup, select and other major activity

In Azure Data factory worked on multiple Input/output Transformer like Join, Condition Split, Exists, Union, Lookup, Schema Modifier: Derived, Select, Aggregate, Surrogate key.

Worked on Mapping document to define the logic for development.

Experience in Data Modelling to build dimension, Facts and maintain Data Integrity,

Row Filter: Filter Row, Sort, Alter Row, Assert Destination: Sink, Azure Blob Storage.

Ware house stability and lineage DBT is also used for data preparation.

Lead the design, development, and maintenance of Snowflake data warehouses.

Oversee data modeling, schema design, and data pipeline development.

Experience in using multiple features of cloning, time travel, tasks, Stream and procedure.

Handled data discrepancy issue using CTE, Temporary, Transient and View has been built to handle complex scenario. Monitor and tune Snowflake environments for performance and cost efficiency.

Worked on different types of CTE Simple, Recursive, Multiple, Inline and CTE with Window Functions.

Coordinate with cross functional teams and engage the turnaround for (Extract, Transform, Load) processes.

Data vault is being used to data modelling to create satellite and hub tables for optimal warehouse solutions.

Integrate various data sources, including on-premises databases and third-party data providers.

Develop and maintain data ingestion processes to ensure timely and accurate data availability.

Source system is in Oracle which has legacy data for the policy holder was migrated in snowflake.

Replicated Oracle indexes and Keys in snowflake to right data with integrity.

Implement indexing, partitioning, and clustering strategies to enhance query performance.

Achieving the end goal to generate more precise and meaning full reports in Power-BI with new data hub created in snowflake data ware house.

Collectively worked in support engagement in informatica and SAP BODS.

Created python script for data validation to verify existing XML and Json file for navigator.

Ensure compliance with data governance and regulatory requirements.

Work closely as a data engineers, data scientists, and business stakeholders to understand data requirements and cross question for more in-depth clarification.

Provide technical leadership and mentorship to team members.

Worked on onshore off shore model Lead from onshore to project planning clients connect, task allocation, and progress tracking. Diagnose and resolve issues related to data warehousing and ETL processes.

Provide technical support and guidance to end-users and stakeholders.

Extensively involved in creating PL/SQL Stored Procedures, Functions, Packages, Triggers, Cursors, and Indexes with Query optimizations as part of ETL Development process.

Project: ETL Managed Services Period: Jul’22-Apr’24

Environment: Informatica, Informatica IICS/IDMC, SSIS, SQL Server, Azure SQL Server, HIVE-HQL, Python, Tidal

Role: ETL Lead Customer: Geisinger Health

Domain: Health Care Location: Danville PA USA

Description: Geisinger is one of the leading hospitals in health care and plan in Pennsylvania. CMS/Behavioral Health, HEDIS is the regulatory body responsible for managing hospital KPI metrics and performance. Therefore, we need to share key points with CMS/HEDIS/EPIC to evaluate hospital compliance and the type of services or delivery hospitals are providing.

Responsibilities:

Led data integration efforts for the GHP Project, focusing on multiple PBI (Project Backlog Items), primarily managing Facets/Medicaid/Medicare/CHIP data. Responsible for generating Member/Medical/Dental/Cotiviti and RX Claim and Eligibility files.

The CMS Interoperability and Patient Access Rule was introduced by the Centers for Medicare & Medicaid Services (CMS) to improve patient access to health information and facilitate better data sharing across healthcare systems.

Medicare Advantage, Medicaid, CHIP, and Qualified Health Plan (QHP) issuers must provide patient data access through APIs using the HL7, FHIR (Fast Healthcare Interoperability Resources) standard

OBERD – Outcomes Based Electronic Research Database – Pulled the data from EPIC Cadence system (Surgical Scheduling Application) and feed data to OBERD application via Rhapsody interface engine.

Worked as a ETL lead for the upgrade of EPIC Clarity to facilitate successful and seamless migration.

Electronic Data Interchange (EDI) system used in health care to exchange information provider and payers.

EDI 837 Healthcare Claim Submission, EDI 835 Payment & Remittance Advice, EDI 277 – Claim Status Response, EDI 997 – Functional Acknowledgment,

As this is an integration and migration engagement in which we built the Informatica mappings was code in Bigdata.

Developed mapping documents to define logic for implementation.

Experienced in data modeling to design dimensions, facts, and ensure data integrity.

Created simple to Informatica mappings using different transformations available in Informatica PowerCenter 10.1/9.5.1 (Source Qualifier, Expression, Lookups (Connected/Unconnected), Joiner, Aggregator, Union, Normalizer, Update Strategy, Filter, Router, Sequence, etc.,)

Created workflows in Informatica PowerCenter using Parameter Files, Session Task, Email Task, Command Task, Decision Task, etc.

As a Migration involved in setup the environment with admins and build the road map cloud data integration.

Experience working with Informatica IICS/IDMC tool effectively using it for Data Integration and Data Migration from multiple source systems in Data warehouse.

Worked on data quality and data profiling in and create a score card.

Challenges to convert the HIVE-HQL to ANSI SQL-Server code with best performance.

Load Multiple sources data into same table maintaining the data integrity and referential integrity.

Maintain restart ability and make sure avoiding the data duplicity in incremental and history load.

Migrated Power center code in Informatica IICS/IDMC and able to configure all the parameter with the help informatica corporation and test both the code and data.

Using Python convert the existing Unix Shell scripting since the code is migrated in Informatica IICS/IDMC Cloud.

Worked on Replication and synchronization task in Informatica IICS/IDMC for data replication and data sync.

Support/Development existing SSIS code for Amisys and converting code in Informatica.

Convection of SSIS Package with enhancement to informatica

Set the Major dependencies in Tidal.

Project: NexGen Period: Jun’21 – Jun’22

Environment: DBT, Oracle, HVR, Snowflake, Tidal

Role: Lead BI Engineer Customer: General Electric

Domain: Aviation Location: Bangalore India

Description: GE was not able to drive operational decision due to multiple operational definitions with no Cross-functionality reporting & lack of drill down facility to its finance users. This implementation of BI platform NexGen would provide a near real time consistent global view & true analysis of operational & financial data to support business drivers & save manual efforts. NexGen Provides more accurate reporting based on one single table as opposed in earlier Guide.

Responsibilities:

Support decision-making processes that align with organizational goals. Provide guidance to project managers, sharing insights and best practices to build project management competencies across the team.

With DBT Macros, I create reusable components, seamlessly integrating them across the Data Pipeline for streamlined operations.

Redesigned the Views in snowflake to increase the performance.

Code detailed CTE to write complex SQL queries and model data efficiently using DBT.

HVR is used for data replication to ingest data from Oracle to Snowflake.

Worked on HVR Admin talk like Version update to source HVR-HUB and Target Database.

Have the solution to build HVR-refresh the job with larger volume data without failure.

Proficient is create different channels in HVR and setup the HVR connection.

Managed multiple channels in HVR and handled more than 2200 Tables replication

Have clear idea about the tables Migration from DEV-HVR to Prod-HVR.

Better channel Management in HVR for Business important tables to high volume data.

Have experience in creating HVR Capture and Integration to schedule as per business needs.

The lineage diagram provided by DBT proves invaluable in constructing a comprehensive data model pipeline, enhancing visualization and understanding of data flow.

Delving into the intricacies of the existing business model, I've adeptly worked on Seeds, Models, Snapshots, analyses, and Macros, tailoring them to specific requirements.

Bringing hands-on expertise to the table, I've successfully implemented cloud data warehouses and data mart using Snowflake, ensuring seamless and efficient operations.

Worked on PLSQL to create procedure to trigger multiple view load to physical tables using snowflake task

Extensively involved in creating PL/SQL Stored Procedures, Functions, Packages, Triggers, Cursors, and Indexes with Query optimizations as part of ETL Development process.

In a pivotal role, I've been actively involved in designing and developing data warehouse and data mart and analytics solutions using a diverse range of techniques, showcasing versatility in approach.

Possessing a deep understanding of data architecture design, data modeling, and physical database design and tuning, I've proven my success in optimizing data structures for optimal performance.

Project: Enterprise data platform services Period: Apr’18 – Jun’21

Environment: Informatica, Informatica IICS, HIVE, SQL Server (SSMS), Oracle, Control-M

Role: Lead BI Engineer Customer: Cox Communication

Domain: Telecom Location: Bangalore India

Description: COX Communications is the largest private broadband company in America, providing advanced digital video, Internet, telephone and home security and automation services over its own nationwide IP network. Cox serves more than 6.5 million residences and businesses across 18 states.

Responsibilities:

As a part of production support engagement project 830+ workflow is scheduled using Informatica PC. 24/7/365 support.

Created a knowledge base and understand internal and external dependencies and their fixes.

Supervised end-to-end project lifecycles, from initiation to monitoring, control, and closure.

Led planning, estimation, and scheduling efforts, ensuring information updates for all stakeholders.

Integrated change control processes, controlled baselines, and implemented risk responses and contingency planning.

Successfully executed project plans within preset budgets and deadlines.

Demonstrated success in leading a team of 27 members, serving as the primary point of contact for ongoing support.

Experienced in data modeling to design dimensions, facts, and ensure data integrity.

Developed mapping documents to define logic for implementation.

Ensured the adequacy of the project delivery team structure and enforced compliance with best practices and technical direction.

Project is tracked on Agile methodology on 2 weeks of Sprints.

Project planning involves estimating several characteristics of a project and then plan the project activities based on these estimations.

Migrate Power center code in Informatica IICS and test both code for data lineage and integrity.

Widely used Replication and synchronization task in Informatica IICS.

Assemble a qualified project team, establishing your role and authority as the project manager.

Build consensus among stakeholders, ensuring alignment with the project's scope, objectives, mandates, timeline, budget, and requirements.

Contribute to the performance evaluations and goal setting for team members involved in assigned projects.

Support decision-making processes that align with organizational goals.

Provide guidance to less experienced project managers, sharing insights and best practices to build project management competencies across the team.

Designed, Developed and Implemented ETL processes using Informatica IICS Data integration.

Created IICS connections using various cloud connectors in IICS administrator for Testing file issues.

Tidal job scheduling is with time variant and input and output job dependencies.

Designed, Developed and Implemented ETL processes using Informatica IICS Data integration.

Built a data integration platform for heterogeneous sources, including ORC, Parquet, XML, and RDBMS.

Led migration projects, transferring data from data warehouses on Hive and SQL Server.

Applied parsing techniques for structured and unstructured files using Informatica PowerCenter.

Orchestrated on loading data into Hive for data lake implementation.

Orchestrated on BDQ tickets open the end customer and resolve the open items.

Project: BHGE Application Migration Period: Nov’16 – Apr’18

Environment: Informatica, Azure Data factory, Snowflake, Oracle, SQL Server

Role: Lead BI Engineer Customer: General Electric

Domain: Oil and Gas Location: Bangalore India

Description: BHGE has a large strategic synergy initiative to move Analytics applications from legacy on premise platforms to the new Data Lake, hosted in SQL Server. The strategic Data integration technology. The purpose of the project is move to one data lake from many today and drive a common technology stack to improve operations, reduce costs, and improve uptime for users

Responsibilities:

Support all stages of our product’s life cycle, including analyzing requirements, estimating and planning tests, creating test cases, setting up traceability, and running tests.

Have knowledge and experience with Azure Cloud and Azure Data Factory (ADF).

Proficient in designing and implementing end-to-end (E2E) data testing strategies and building automated testing frameworks.

Familiar with databases like SQL Server, Snowflake and able to write complex queries and stored procedures.

Collaborate with project teams, developers, operations, and infrastructure teams to resolve data issues.

Experienced in managing the defect life cycle, including logging, replicating, prioritizing, verifying, closing issues, and reporting metrics.

Able to gather large amounts of test data needed for test execution.

Able to handle multiple projects at once and work across offshore and onsite teams.

Shows strong character, builds good relationships, balances work and personal life, takes responsibility for self-improvement, and demonstrates leadership qualities like motivation, inspiration, passion, and trust.

Supports and cares for team members, builds effective teams, helps people reach their potential, values diversity, and gives honest, helpful feedback.

Questions the usual way of doing things, embraces new technology, suggests creative ideas, supports and implements process improvements, listens to others’ ideas, and chooses the best ones.

Used proficiency of Informatica and IDQ know experience and knowledge for product development and delivery.

In Azure Data factory worked on multiple Input/output Transformer like Join, Condition Split, Exists, Union, Lookup, Schema Modifier: Derived, Select, Aggregate, Surrogate key

Row Filter: Filter Row, Sort, Alter Row, Assert Destination: Sink

Snow SQL, Snowflake TASK, CDC using streams, Tasks, Time travel, Data sharing, Zero-copy cloning, and Materialized views.

Informatica power center Designer, Workflow Manager, Workflow Monitor and Repository Manager.

Used Informatica Power Center to migrate the data from different source systems.

Project: OFS Billing Period: Feb’14 – Nov’16

Environment: SAP, SAP BODS, HVR, Oracle, Vector wise and Tableau

Role: BI Engineer Customer: General Electric

Domain: Energy Location: Bangalore India

Description: General Electric Company, doing business as GE Aerospace is an American aircraft engine supplier that is headquartered in Even dale, Ohio, outside Cincinnati. It is the legal successor to the original General Electric Company. The division operated under the name of General Electric Aircraft Engines (GEAE) until September 2005.

Responsibilities:

As a project lead, I analyzed and designed the technical design document and discussed it with the project management and business manager.

Proactively managed quality and delivery in SAP-BODS. By identifying and fixing issues even before they are reported it's a thing of joy to see the bug inflow go down under my watch.

Prepare implementing artifacts - good review/ unit test case documents developer release notes.

analysis of different defects in QA and resolve them on immediate effect.

improving the turnaround time and accuracy of bug fix to streamline the project.

But as a project leader in iGATE Global Solutions and lead the development team to achieve the project goals.

worked on complex performance Tuning bugs Race by customer and fix them efficiently by modifying the SQL query and SQL optimization techniques

As per past experience in debugging execution error using data services log (Trace Statistics and Error).

Used the look up EXT functions to derive the column in BODS ETL by lookup the value in lookup table.

Experience in Case, SQL, History Preserve, Merge, Validation, Query, Transformation

Communicated Code Review/Migration team to find the issues and solutions.

Experience of Data Flow and Work flow.

HVR is used for data replication to ingest data from Oracle to Vector wise.

Worked on HVR Admin talk like Version update to source HVR-HUB and Target Database.

Have the solution to build HVR-refresh the job with larger volume data without failure.

Proficient is create different channels in HVR and setup the HVR connection.

Managed multiple channels in HVR and handled more than 2200 Tables replication

Have clear idea about the tables Migration from DEV-HVR to Prod-HVR.

Better channel Management in HVR for Business important tables to high volume data

Ensured data consistencies is being maintain in each layer of Data.

Assisted Project Managers in establishing plans, risk assessments and milestone deliverables

Designed data models using Oracle Designer; designed programs for data extraction and loading into Oracle database

Experience with SAP BODS transforms such as Query, Validation, Merge, Case transforms.

Developed complex reports using multiple data providers, user defined objects, aggregate aware objects, charts, and synchronized queries.

Job scheduling from management console

Target Corporation

Project

Responsibility

Technology Used

To - From

EGP

ETL Lead