SANAP SURAJ
E-Mail: *****.*****@*******.***
Contact: +91-888*******
Objective:
Looking for a challenging position with ample opportunities in Software Engineering / Data Warehousing operations in a growth-oriented organization.

Summary:
● Offering over 3.8 years of experience in Data Warehousing, encompassing development, testing, and maintenance of ETLs on Oracle, Informatica, and Databricks.
● Specializing in building and optimizing data pipelines on the Databricks platform.
● Good understanding of the SDLC, with proficiency in mapping business requirements, technical documentation, application design, development, and testing.
● Good communication and analytical skills; self-motivated, with the ability to conceptualize, document, and communicate project ideas and plans.
● Skilled in the design and development of mappings with varied transformation logic using filter, expression, rank, router, aggregator, joiner, lookup, sequence generator, and update strategy transformations.
● Good understanding of Data Warehousing, ETL concepts, and dimensional modeling (star schema and snowflake schema).
● Experience in implementing SCD, CDC, and performance-tuning techniques.
● Comfortable interacting with people across hierarchical levels for smooth project execution.

Technical Skills:
Cloud Technologies : Databricks, Airflow, Trino
Tools : Jira, Git, CI/CD pipelines, ALM, IICS, Informatica 9.6.1/10.2
Data Visualization : Power BI
Languages : SQL and PL/SQL
RDBMS : Oracle 11g
Operating Systems : Windows, UNIX
Scheduling Tools : Airflow, Informatica Scheduler

Education:
Bachelor of Science in Information Technology (B.Sc. (IT)) from Dr. BAMU University.
Higher Secondary Certificate (HSC) from Maharashtra State Board with 66.67%.
Secondary School Certificate (SSC) from Maharashtra State Board with 75.38%.
Work Experience:
Currently working as a Software Engineer with Cognizant Technologies since Dec 2021.

Project #1:
Client : Santhosh Thangappan, Rahway, US.
Project Name : Merck.
Duration : Dec 2021 to present.
Role : Data Engineer, Data Analyst, ETL Developer.
Environment : Databricks, Power BI, Trino, Airflow, IICS, Informatica 9.x/10.2, Oracle 11g, UNIX.

Description:
The Enterprise Shared Area (ESA) is an enterprise-wide data repository spanning data sources and divisions; applications receive their data via data subscription. ESA comprises a number of source systems spanning HR, Finance, GWES, and Learning. The largest data source with the most subscriptions is Workday, which holds all person data across Merck.
The ESA modernization program aims to enhance our data infrastructure and technology by transitioning to a Data Lakehouse (Databricks/Trino), and to revolutionize data sharing and governance by adopting Merck Discover products (Discover Marketplace, EDAC, and Collibra DAC).

Responsibilities:
● Responsible for data analysis and for developing and interpreting ETL specifications, ensuring smooth data flow between systems using Informatica, Databricks, and other tools.
● Monitored and managed Airflow and Informatica workflow jobs, ensuring seamless data transfer between multiple systems.
● Developed and optimized data pipelines and Spark jobs for large-scale data processing and transformation in Databricks. Worked with Delta Lake for efficient data storage and management.
● Integrated ETL outputs into Power BI for data visualization and reporting. Developed dashboards, reports, and data models to support business intelligence needs.
● Orchestrated and automated ETL workflows using Apache Airflow, ensuring robust scheduling, error handling, and monitoring for data pipeline jobs.
● Leveraged Trino for fast querying and federated data access across multiple data sources. Optimized queries and managed data retrieval tasks for analytics and reporting purposes (a sample federated query follows this list).
● Responded to Airflow and Informatica job failures by analyzing root causes and resolving issues according to standard processes and best practices.
● Designed and developed mappings (Dimensions, Soft Deletes, etc.) and Mapplets to extract data from sources like Oracle, Flat Files, and other databases.
● Implemented Type I & II changes in Slowly Changing Dimension (SCD) tables using Informatica.
● Applied mapping parameters, variables, and tasks to enhance ETL flexibility.
● Created reusable transformations like Expression, Lookup, Filter, Update Strategy, Joiner, Sequence Generator, Aggregator, Sorter, Normalizer, etc., for the design and development of complex business rules.
● Developed workflows, worklets, and sessions within Informatica to automate data pipeline processes.
● Performed performance tuning for Informatica mappings, transformations, sessions, and workflows, ensuring optimal ETL processing time.
● Conducted testing and validation of Informatica mappings, debugging session logs, and resolving any ETL issues.
● Worked on the creation of parameter files, relational database connections, and scheduled jobs in Informatica (a sample parameter file follows this list).
● Utilized Informatica Repository Manager for validation, export, and import of repository objects across development, test, and production environments.
● Coordinated with the BI Reporting team to address data discrepancies by analyzing ETL queries and ensuring accurate reporting.
● Identified, researched, and resolved root causes of ETL production issues in real time.
● Managed the daily operations of ETL batches using run books (checklists) and ensured smooth data processing.
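For illustration, a minimal sketch of the Type II SCD logic described above, written here as a Delta Lake SQL MERGE on Databricks rather than in Informatica. The table and column names (dim_employee, stg_employee, emp_id, dept, title) are hypothetical placeholders, not the project's actual schema:

    -- Stage each changed row twice: once to expire the current dimension
    -- row, once (with a NULL key) to fall through to the INSERT branch.
    MERGE INTO dim_employee AS tgt
    USING (
      SELECT s.emp_id AS join_key, s.* FROM stg_employee s
      UNION ALL
      SELECT NULL AS join_key, s.* FROM stg_employee s
      JOIN dim_employee d
        ON s.emp_id = d.emp_id
       AND d.is_current = true
       AND (s.dept <> d.dept OR s.title <> d.title)
    ) AS src
    ON tgt.emp_id = src.join_key AND tgt.is_current = true
    WHEN MATCHED AND (tgt.dept <> src.dept OR tgt.title <> src.title) THEN
      -- Close out the old version of the row.
      UPDATE SET is_current = false, end_date = current_date()
    WHEN NOT MATCHED THEN
      -- Insert brand-new keys and the new versions of changed rows.
      INSERT (emp_id, dept, title, start_date, end_date, is_current)
      VALUES (src.emp_id, src.dept, src.title, current_date(), NULL, true);

The double pass through the staging table lets a single MERGE both expire the current row and insert its replacement; a Type I change would instead be a plain UPDATE that overwrites attributes in place.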
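Likewise, a sketch of the kind of federated query Trino supports, joining an Oracle source with lakehouse data in one statement. The catalog, schema, and table names (oracle_hr, lakehouse, employees, training_history) are hypothetical:

    -- Join live Oracle data with curated lakehouse tables in one query.
    SELECT e.emp_id,
           e.dept,
           COUNT(t.course_id) AS courses_completed
    FROM oracle_hr.hr.employees e               -- Oracle connector catalog
    JOIN lakehouse.learning.training_history t  -- Delta/Hive catalog
      ON e.emp_id = t.emp_id
    WHERE t.completed_date >= DATE '2024-01-01'
    GROUP BY e.emp_id, e.dept
    ORDER BY courses_completed DESC
    LIMIT 50;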
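Finally, a sample of the PowerCenter parameter-file format referenced above, with placeholder folder, workflow, session, connection, and parameter names:

    [ESA_Folder.WF:wf_daily_load.ST:s_m_load_dim_employee]
    $DBConnection_SRC=Oracle_HR_Src
    $DBConnection_TGT=Oracle_DWH_Tgt
    $$LoadDate=2024-01-01
    $PMSessionLogFile=s_m_load_dim_employee.log

The bracketed header scopes the values to a single session within a workflow; the $DBConnection_* entries point at relational connections defined in Workflow Manager, $$LoadDate is a user-defined mapping parameter, and $PMSessionLogFile is a built-in session variable.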
Personal Information:
Date of Birth : 22-June-1993
Current Address : Flat No. 5, 3rd Floor, Sachin Apartment, Survey No. 5, Erandavana Gaothan, Near Achank Mitra Mandal, Pune 411004