
Data Engineer Warehouse

Location:
Columbus, OH
Posted:
April 23, 2025

Madhusudhan Reddy Madhire

816-***-****

*********@***.***

PROFESSIONAL SUMMARY

Over 13 years of IT experience in system analysis, design, development, implementation, and testing of databases and data warehouse applications on client-server technologies in the Pharma, Healthcare, Banking, and Finance domains.

Experience in Extraction, Transformation and Loading (ETL) of data into data warehouses using Informatica PowerCenter 9.x. Designed workflows and mappings, created sessions, and scheduled them using Informatica PowerCenter 9.x.

Experienced Senior Data Engineer with a strong background in building scalable cloud-native data platforms, automating model deployment pipelines, and enabling advanced analytics.

Proven ability to collaborate with Data Scientists and cross-functional teams to develop end-to-end ML data pipelines, infrastructure for model training, and production-grade workflows.

Skilled in AWS, Python, Spark, Snowflake, and modern DevOps practices using CI/CD tools such as Jenkins and Azure DevOps (ADO).

Experience in designing, developing, and managing robust ETL/ELT pipelines using Informatica IICS, SQL, Python, and Shell scripting.

Proven expertise in working across cloud data platforms like Snowflake and AWS S3, integrating data from APIs, and orchestrating builds through DevOps tools like Jenkins and Bitbucket.

Experience in designing and developing complex mappings using various transformations such as Source Qualifier, Joiner, Aggregator, Router, Filter, Expression, Lookup, Sequence Generator, Java Transformation, Update Strategy, XML transformations, and Web Services.

Experience with relational and dimensional data modeling, fact constellations, and snowflake schemas.

Hands-on experience with PRDs (Process Requirement Documents) in the SDLC.

Hands-on experience working with Hadoop ecosystem components such as HDFS, MapReduce programming, Hive, Pig, Sqoop, HBase, Impala, Kafka, and Spark.

Experience with Hadoop distributions such as Cloudera 5.3 and Amazon AWS.

Adept at driving end-to-end data delivery for enterprise-level clients in dynamic, fast-paced environments.

Experienced in writing MapReduce programs in Java to process large datasets using map and reduce tasks.

Experience using Spark SQL with various data sources such as JSON, Parquet, and Hive.
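
As a minimal illustration of this kind of Spark SQL usage, the sketch below reads JSON and Parquet sources and joins them with a Hive table; the paths, view names, and the default.customers table are hypothetical.

# Minimal Spark SQL sketch; paths and table names are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-sql-sources")
         .enableHiveSupport()          # needed to query Hive tables
         .getOrCreate())

# Read semi-structured and columnar sources
json_df = spark.read.json("hdfs:///data/raw/events.json")
parquet_df = spark.read.parquet("hdfs:///data/curated/orders.parquet")

# Register temp views and join them with a Hive table via Spark SQL
json_df.createOrReplaceTempView("events")
parquet_df.createOrReplaceTempView("orders")

result = spark.sql("""
    SELECT o.order_id, e.event_type, c.customer_name
    FROM orders o
    JOIN events e ON o.order_id = e.order_id
    JOIN default.customers c ON o.customer_id = c.customer_id   -- Hive table
""")
result.write.mode("overwrite").parquet("hdfs:///data/output/order_events")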

Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.

Designed logical and physical databases using Erwin and developed data mappings between source systems and target components using mapping design documents.

Experienced with the components of Informatica Data Quality (IDQ).

Involved in creating complex physical/logical data objects and in creating profiles and scorecards.

Experience in debugging and performance-tuning sessions and mappings, implementing complex business rules, and optimizing mappings.

Implemented data warehousing techniques for data cleansing, Slowly Changing Dimensions (SCD), and Change Data Capture (CDC).
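
For illustration only, the sketch below shows a simplified Slowly Changing Dimension Type 2 pattern in PySpark; the actual implementations used Informatica mappings, and the dimension/staging table names and columns here are hypothetical.

# Simplified SCD Type 2 sketch (hypothetical tables and columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

dim = spark.table("edw.dim_customer")          # dimension with history
stg = spark.table("stage.customer_updates")    # incoming changes
dim_current = dim.filter(F.col("is_current") == 1)

# Identify records whose tracked attributes changed
changed = (stg.alias("s")
           .join(dim_current.alias("d"), "customer_id")
           .filter("s.address <> d.address OR s.segment <> d.segment")
           .select("s.*"))

# Expire the old current versions
expired = (dim_current.join(changed.select("customer_id"), "customer_id")
           .withColumn("is_current", F.lit(0))
           .withColumn("end_date", F.current_date()))

# Insert the new current versions
new_rows = (changed
            .withColumn("is_current", F.lit(1))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))

# In practice these rows would be merged back into the dimension table
(expired.unionByName(new_rows, allowMissingColumns=True)
        .write.mode("append").saveAsTable("edw.dim_customer_changes"))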

Experience in creating, deploying and troubleshooting SSIS packages.

Experience in managing OLAP data warehouses using SQL Server Analysis Services (SSAS).

Experience in creating, automating, scheduling and monitoring SQL jobs with SQL Server Agent.

Created user permissions and logins and maintained database security.

Extensive development, support and maintenance experience working in all phases of the Software Development Life Cycle (SDLC) especially in Data warehousing.

Hands-on experience with Alteryx Designer and Alteryx Server.

Used Alteryx to build workflows for data visualization and data science.

Good understanding and working experience in Logical and Physical data models that capture current/future state data elements and data flows using Erwin.

Experience in working in an Agile Environment.

Experience in production support for technical and performance issues.

Hands-on knowledge of Python; developed test frameworks using Python.
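
As a hedged sketch of the kind of Python test framework referred to above, the checks below use pytest-style tests; the framework name, counts, and keys are illustrative assumptions, not the original code.

# Minimal pytest-style sketch of ETL validation checks (hypothetical data).
import pytest

def row_counts_match(source_count: int, target_count: int, tolerance: int = 0) -> bool:
    """Return True when the target row count is within tolerance of the source."""
    return abs(source_count - target_count) <= tolerance

def test_staging_load_row_counts():
    # In a real framework these counts would come from SQL queries
    source_count = 1_000_000
    target_count = 1_000_000
    assert row_counts_match(source_count, target_count)

def test_no_null_business_keys():
    loaded_keys = ["C001", "C002", "C003"]   # stand-in for a query result
    assert all(k is not None and k != "" for k in loaded_keys)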

Basic understanding and knowledge of Snowflake.

KEY SKILLS

CORE COMPETENCIES

Languages & Libraries: Python, Pandas, PySpark, SQL, Dask

Cloud Platforms: AWS (S3, EC2, Lambda), Azure (optional)

Databases: PostgreSQL, Snowflake, NoSQL (MongoDB, DynamoDB)

Data Engineering: Apache Spark, ETL/ELT Pipelines, AWS Glue

ML & MLOps: Model packaging, deployment, inference pipelines

DevOps & Orchestration: Jenkins, Azure DevOps, Docker, Terraform

Tools: Git, Jupyter, Databricks, Airflow (or similar orchestration tools)

Architecture: Object stores, APIs, streaming sources, batch/real-time integration

TECHNICAL ENVIRONMENT:

Tools: IICS, Jenkins, Bitbucket, Git, Informatica PowerCenter, JIRA

Languages: SQL, Python, Shell

Cloud: AWS S3, Snowflake, Azure (basic), Lambda

Databases: Snowflake, SQL Server, Oracle

PROFESSIONAL EXPERIENCE

Client: U.S. Bancorp, OH Oct 2023 – Present

Role: Sr. Data Engineer

Description: U.S. Bancorp is an American multinational financial services company and one of the major banks in the USA. The goal of this project is to modernize legacy systems, enhance customer experience, and improve operational efficiency. The bank uses Microsoft Azure cloud services, which are widely adopted in the banking sector due to their scalability, security, regulatory compliance, and AI-driven analytics.

Responsibilities:

Responsible for designing, implementing, and optimizing big data solutions on the Azure Databricks platform, which is crucial for data engineering, analytics, machine learning, and ETL processes.

Designed, developed, and maintained scalable and efficient data pipelines using Azure Data Factory (ADF), Azure Databricks, and Azure Stream Analytics.

Engineered data ingestion workflows from diverse sources into Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure SQL DB, ensuring consistency, quality, and SLA adherence.

Built and maintained FHIR-compliant data feeds using Azure Data Factory for healthcare interoperability.

Partnered with Data Scientists to develop and deploy production-grade machine learning models, ensuring seamless data integration and efficient access to training data.

Designed and implemented scalable data pipelines in AWS using Python, Spark, and Pandas to process structured and semi-structured data from cloud storage, relational DBs, and external APIs.
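
A hedged sketch of what such an AWS ingestion pipeline can look like appears below; the bucket names, JDBC connection, and API endpoint are hypothetical placeholders, not the client's actual systems.

# Illustrative PySpark ingestion sketch (hypothetical sources and targets).
import requests
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aws-ingest-sketch").getOrCreate()

# 1. Semi-structured data from S3
events = spark.read.json("s3a://example-raw-bucket/events/2024/*.json")

# 2. Relational source over JDBC
accounts = (spark.read.format("jdbc")
            .option("url", "jdbc:postgresql://example-host:5432/core")
            .option("dbtable", "public.accounts")
            .option("user", "etl_user")
            .option("password", "****")
            .load())

# 3. Small external API payload via Pandas, promoted to a Spark DataFrame
rates = pd.DataFrame(requests.get("https://api.example.com/fx-rates").json())
rates_df = spark.createDataFrame(rates)

# Join sources (assumes account_id and currency columns) and land to S3
curated = (events.join(accounts, "account_id")
                 .join(rates_df, "currency"))
curated.write.mode("overwrite").parquet("s3a://example-curated-bucket/events_enriched/")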

Built infrastructure to support model training workloads, integrating S3, EC2, and containerized environments using Docker.

Created reusable data ingestion frameworks to streamline processing from RDBMS, NoSQL, and streaming data sources.

Packaged ML models using Python, built deployment workflows with CI/CD pipelines in Jenkins and Azure DevOps, enabling automated testing and production rollout.
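
For illustration, a minimal packaging step that a Jenkins or Azure DevOps stage could invoke is sketched below; scikit-learn, joblib, and the artifact paths are assumptions used for the example, not the actual project code.

# Hedged sketch of a model packaging script a CI/CD stage might call.
import json
import os
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_and_package(output_path: str = "artifacts/model.joblib") -> str:
    os.makedirs("artifacts", exist_ok=True)
    X = np.random.rand(100, 3)              # stand-in for curated training data
    y = (X[:, 0] > 0.5).astype(int)
    model = LogisticRegression().fit(X, y)

    joblib.dump(model, output_path)         # serialized artifact for deployment
    with open("artifacts/metadata.json", "w") as f:
        json.dump({"features": ["f1", "f2", "f3"], "version": "1.0.0"}, f)
    return output_path

if __name__ == "__main__":
    # A Jenkins/ADO pipeline stage can run this script, then archive artifacts/
    print(f"Packaged model at {train_and_package()}")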

Led collaborative requirements reviews with Data Scientists and Application teams to ensure design alignment with business and technical needs.

Utilized Terraform for infrastructure provisioning and version control in AWS environments.

Developed orchestration logic for model scoring pipelines using Airflow and Jenkins.
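
A minimal Airflow DAG sketch for a daily scoring pipeline is shown below; the task callables, DAG id, and schedule are illustrative assumptions rather than the production definitions.

# Hedged Airflow sketch of a daily model-scoring pipeline.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_features(**context):
    print("pull scoring features from the warehouse")

def score_batch(**context):
    print("load the packaged model and score the extracted batch")

def publish_results(**context):
    print("write scores back to the serving table / S3")

with DAG(
    dag_id="model_scoring_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    score = PythonOperator(task_id="score_batch", python_callable=score_batch)
    publish = PythonOperator(task_id="publish_results", python_callable=publish_results)

    extract >> score >> publish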

Client: State of Ohio, OH Oct 2021 – Sep 2023

Role: Sr. ETL Developer

Description: The Ohio Bureau of Workers' Compensation (BWC) is the exclusive provider of workers' compensation insurance in Ohio and serves the majority of public and private employers. The goal of the project is to deliver consistently excellent experiences for each BWC customer every day by ensuring IT plans stay on track with current technologies and follow state standards.

Responsibilities:

Working with business analysts and, where needed, business customers to verify the high-level design requirements.

Following design documents involving source and destination databases.

Interpreting design documents to understand mapping and transformational requirements.

Designed and implemented end-to-end ETL pipelines in IICS to support large-scale data movement between SQL Server, Snowflake, and AWS S3.

Built robust, modular ETL/ELT pipelines for ingestion into data lakes and warehouses (Snowflake, Redshift).

Delivered solutions that integrated external APIs, flat files, and object stores into centralized analytics platforms.

Improved data processing performance by 35% by refactoring transformation logic using PySpark and optimizing cluster configurations.
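
The sketch below illustrates the kind of PySpark refactoring involved, replacing a row-at-a-time Python UDF with built-in expressions and tuning shuffle partitions; the column names and settings are hypothetical and do not reproduce the actual 35% workload.

# Illustrative before/after of a PySpark transformation refactor.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("refactor-sketch").getOrCreate()
df = spark.range(1_000_000).withColumn("amount", F.rand() * 100)

# Before: a Python UDF forces row-by-row serialization between the JVM and Python
bucket_udf = F.udf(lambda x: "high" if x > 50 else "low", StringType())
slow = df.withColumn("bucket", bucket_udf(F.col("amount")))

# After: the same logic with built-in expressions stays inside the JVM,
# and shuffle partitions are sized to the cluster instead of the default 200
spark.conf.set("spark.sql.shuffle.partitions", "64")
fast = df.withColumn("bucket", F.when(F.col("amount") > 50, "high").otherwise("low"))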

Assisted in migrating batch and streaming workflows from on-prem Hadoop to AWS S3 and Snowflake with performance monitoring via AWS CloudWatch and Datadog.

Supported BI and analytics teams by developing reusable datasets for reporting and dashboards.

Measuring and tuning the performance of batch systems in a Windows-based environment.

Using organizational tools and methods to perform day-to-day operations in the project.

Client: Penn State Health Sep 2020 – Oct 2021

Role: Informatica/IDMC Developer

Description: Penn State Health is a major multi-hospital health system serving patients and communities across central Pennsylvania. The scope of the project is to convert legacy jobs from DataStage to Informatica PowerCenter, export the jobs to Informatica Cloud (IDMC), and break the complex SQL down into smaller chunks of code.

Responsibilities:

Working in a fast-paced environment under minimal supervision, providing technical guidance to the Analytics team.

Responsible for DB schema design, integration testing, and other project tasks necessary to help the team achieve the project goals.

Implemented performance-tuning logic on all critical mappings wherever a bottleneck was involved.

Developed batch and real-time ETL workflows using Informatica and Shell scripting to process healthcare data from various source systems.

Migrated legacy ETL logic from on-prem SQL Server to Snowflake, reengineering queries for cloud-native performance.

Designed data lake ingestion frameworks using AWS S3, Lambda triggers, and Glue (supporting team-wide POCs).

Wrote complex SQL queries for profiling, validation, and reporting logic; supported regulatory data submission processes.

Supported Jenkins jobs for nightly refreshes and monitored job health with alerting tools.

Reviewed and analyzed functional requirements and mapping documents; performed problem solving and troubleshooting.

Performed unit testing at various levels of the ETL and actively involved in team code reviews.

Identified problems in existing production data and developed one-time scripts to correct them.

Fixed invalid mappings and troubleshot technical problems in the database.

Client: Mount Carmel, OH Mar 2019 – Sep 2020

Role: Informatica cloud- IDMC Developer

Description: The goal of the project with Mount Carmel is to consolidate all the data marts related to claims, medical claims, and pharmacy data and build an enterprise data warehouse called MDH (Medigold Data Hub) to get factual information and run reports through Tableau and Business Objects.

Responsibilities:

Developed and maintained data Extraction, Transformation, and Loading mappings using Informatica Designer 10.2 to extract data from multiple source systems, comprising databases such as Teradata and SQL Server as well as flat files, into the staging area and then into the Enterprise Data Warehouse.

Involved in a full Software Development Life Cycle (SDLC) - Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Logical and Physical database design, Coding, Testing, Implementation, and deployment to business users.

Worked on data cleansing and normalization processes using SQL and Informatica PowerCenter.

Assisted in building dashboards powered by transformed data from healthcare, finance, and CRM systems.

Created scripts to automate schema comparisons and perform ad hoc data exports to S3.
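
A hedged sketch of what such scripts can look like follows; the connection string, schema names, query, and bucket are placeholders, and the use of SQLAlchemy, pandas, and boto3 is an assumption for the example.

# Illustrative schema comparison and ad hoc export to S3 (hypothetical names).
import io
import boto3
import pandas as pd
from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql://etl_user:****@example-host:5432/warehouse")

# Schema comparison: flag columns present in one environment but not the other
dev_cols = {c["name"] for c in inspect(engine).get_columns("monthly_summary", schema="claims_dev")}
qa_cols = {c["name"] for c in inspect(engine).get_columns("monthly_summary", schema="claims_qa")}
print("Only in DEV:", dev_cols - qa_cols, "| Only in QA:", qa_cols - dev_cols)

# Ad hoc export: query result written to S3 as CSV
df = pd.read_sql("SELECT * FROM claims_dev.monthly_summary", engine)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
boto3.client("s3").put_object(
    Bucket="example-adhoc-exports",
    Key="claims/monthly_summary.csv",
    Body=buffer.getvalue(),
)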

Designed structures for the transformation of data sources into data warehouses.

Developed and tested the shared objects in Informatica for source/target/lookup transformations, developed complex mappings, sessions/workflows/worklets, database connections.

Worked with normal forms including first, second and third normal forms and maintained historical data.

Generated queries using SQL to check for validity of the data in the tables and updated the tables according to the Business requirements document.

Resolved troubleshooting problems relating to ETL applications and data issues.

Client: OhioHealth, OH Feb 2018 – Mar 2019

Role: ETL/IICS/IDMC Developer

Description: The purpose of the project is to sunset legacy applications due to the migration to CareConnect. The project addresses accessibility of this data and long-term maintenance of the data through the implementation of a single, standardized, secure, and compliant enterprise data archiving solution. In essence, it replaces the need to maintain each of the legacy systems found to be in scope, and the associated support personnel, for the next 22 years.

Responsibilities:

Extracted the data from legacy systems such as Mainframe tables, Netezza, Flat files and IBM DB2.

Used Informatica as the ETL tool to design and develop parallel jobs that process more than 40 million records on a weekly basis.

Designed and developed DataStage extraction, transformation, and loading jobs from three source systems (Oracle tables, SQL Server, and a mainframe DB) into the data warehouse.

Extracted the raw data from SQL Server to staging tables using Informatica Cloud.

Developed Cloud mappings to extract the data for different regions and hospitals.

Refreshing the mappings for any changes/additions to CRM source attributes.

Developed the audit activity for all the cloud mappings.

Automated/Scheduled the cloud jobs to run daily with email notifications for any failures.

Generated automated stats for the staging loads, comparing present-day counts to previous-day counts.

Created file-watcher jobs to set up the dependency between Cloud and PowerCenter jobs.

Created and developed mappings to load the data from staging tables to EDW DataMart tables based on Source to Staging mapping design document.

Responsible for building ETL code, preparing unit test case scenarios, functional testing, cross-product integration testing, EDU/UAT testing, production deployment, and production support of current ETL components and future deployments.

Responsible for the ongoing operational stability of the ETL processes ensuring they are properly monitored and audited to provide data integrity and timeliness of delivery.

Extensively worked on building DataStage jobs using various stages like Oracle Connector, Funnel, Transformer stage, Sequential file stage, Look Up, Join and Peek Stages.

Used Alteryx for XML comparison for DEV, QA, EDU environments.

Worked on upgrading DataStage jobs from v9.1 to v11.5.

Client: State of Nebraska, NE Mar 2017 – Feb 2018

Role: ETL/MDM Developer

Description: The State of Nebraska decided to implement a new configurable, flexible, and integrated Medicaid Eligibility and Enrollment solution based on Commercial Off-The-Shelf (COTS) software called NTRAC (Nebraska Timely, Responsive, Accurate and Customer Service), which will meet the CMS Seven Standards and Conditions. The State of Nebraska's requirement for Enterprise Data Management is to have a mixture of OLTP, Operational Data Store (ODS), Data Warehouse, and Data Mart technologies that support transaction-processing systems, information integration, and reporting.

Responsibilities:

Involved in requirements gathering for BI projects by interacting with business users and upstream data producers.

Created source to target mapping sheet to migrate the Medicaid eligibility and enrollment data to a new system.

Involved in the data migration of Medicaid eligibility and enrollment data from the existing system to a new system, IBM Curam, which has a completely different data model.

Used the Secure file transfer protocol (SFTP) to connect to external Servers to get the files or send the files to Master Data Management (MDM).

Extensively worked on building DataStage jobs using various stages like Funnel, Transformer stage, Sequential file stage, Lookup, Join and Peek Stages.

Worked on Data Quality (DQ) jobs in Informatica Developer by creating Data Rule Definitions and Data Rule Set Definitions.

Involved in automating the entry of business glossary terms into the Information Governance Catalog (IGC).

Design, develop and test ETL applications consistent with application architecture guidelines.

Review detailed test cases and provide technical support during all phases of testing in UAT and implementation.

Involved in production implementation activities and validated the production system after implementation until the system stabilized.

Client: Nationwide, OH Jul 2016 – Feb 2017

Role: Informatica Developer

Description: The goal of the project at Nationwide is to build a unique identifier, other than the Social Security number, for all users and vendors across all applications in the Enterprise Data Warehouse. The project was intended to migrate data from the source systems to the company's data warehouse. The data was made available through Teradata, Netezza, and other sources, with an Oracle database as the target.

Responsibilities:

Involved in designing Logical and Physical Modeling using Erwin Data Modeler tool.

Involved in the Development of Informatica mappings and mapplets and tuned them for Optimum performance, Dependencies and Batch Design.

Implemented the extraction process for loading flat files into the target database.

Migrated Mappings from Development to Testing and Testing to Production.

Deployed SSIS packages to SQL Server and automated them.

Loaded data from different views and created data mappings to load the data from the views into a single table.

Used Parameter Files for multiple DB connections for the sources and defining variable values.

Developed the documentation and plan for transforming and loading the warehouse.

Analyzing the data from various sources for data quality issues.

Participated in all facets of the software development cycle including providing input on requirement specifications, high-level design documents, and user’s guides.

Writing Oracle stored procedures and SQL scripts and calling them at pre- and post-session.

Created concurrent tasks in the workflows using Workflow Manager and monitored jobs in Workflow Monitor.

Version-controlled documents using PVCS.

Designing mapping templates to specify the high-level approach.

Involved in unit testing and documentation; production defects were tracked in HP Quality Center.

Client: First Hawaiian Bank, OH Oct 2014 – Jul 2016

Role: ETL/IDQ Developer

Description: FHB is a commercial bank and a subsidiary of BancWest Corporation, which is itself a subsidiary of the French banking company BNP Paribas. FHB has disparate data mart solutions (including CSD, Search space (an anti-money-laundering system), and a risk management system) on different technology platforms. The bank intends to collocate and integrate all disparate data marts by designing and building an Enterprise Data Warehouse (EDW) to consolidate and streamline data flows, de-complex the environment, reduce cost, improve data management and data governance, and introduce business intelligence and analytics in the CCAR project.

Responsibilities:

Designed and developed ETL processes based on business rules using Informatica Power center.

Developed mappings in Informatica Power Center to load the data from various sources using transformations like Source Qualifier, Expression, Lookup (connected and unconnected), Aggregator, Update Strategy, Filter, Router etc.

Implemented the CCAR methodology, which involves some of the most severe, comprehensive, and rigorous tests conducted by major US banks.

Solid experience with Oracle, PL/SQL, Informatica, XML, Erwin, Visio, Unix/Linux shell scripting, Autosys, and Quality Center.

Extensively worked on performance tuning by identifying bottlenecks in sources, targets, mappings, and transformations, and enhanced performance for Informatica sessions with large data files by using partitions.

Created Health check queries for all the audit fields and SSN fields in the stage, data store and EDW applications.

Experienced in working with Informatica IDQ 9.5.1 HF3 and have a complete understanding of IDQ components.

Responsible for the development of complex data quality rules and for index design, development, and implementation patterns covering cleansing, parsing, standardization, validation, scorecards, exceptions, notifications, and reporting with the ETL tool.

Extracted data from various sources like SQL Server, Oracle, .CSV, Excel and Text file from Client servers through FTP.

Created various multi-dimensional cubes (SSAS), processed facts, dimensions and populated data for front end users.

Created Physical data objects, logical data objects in Data Quality in Informatica Developer.

Created Profiles and Scorecards in IDQ and provided the Scorecards (pictorial representation) in the Informatica Analyst tool.

Have used Informatica Metadata Manager, which is a Power Center web application used to browse, analyze, and manage metadata from disparate metadata Repositories.

Experienced in working with Metadata Manager, which uses Power Center workflows to extract metadata from metadata sources and load it into a centralized metadata warehouse.

Created and configured data lineage runs, performed data profiling on the metadata, and handled metadata sharing and report extraction in Metadata Manager.

Created Business Terms and categories through Business Glossary desktop and results were successfully displayed in Analyst tool.

Client: Independent Health, NY Jan 2014 – Sep 2014

Role: Informatica Developer

Description: Independent Health is one of the largest healthcare organizations providing health care services. The goal of the project is to analyze operational data sources, define the data warehouse schema, and develop ETL processes for the creation, administration, maintenance, and overall support of the data warehouse.

Responsibilities:

Designed and developed ETL processes based on business rules using Informatica Power center.

Developed mappings in Informatica Power Center to load the data from various sources using transformations like Source Qualifier, Expression, Lookup (connected and unconnected), Aggregator, Update Strategy, Filter, Router etc.

Responsible for identifying reusable logic to build several Mapplets, which would be used in several mappings.

Created mappings to extract and de-normalize (flatten) data from XML files using multiple joiners with Informatica Power Center.

Worked on performance tuning by identifying bottlenecks in sources, targets, mappings, and transformations, and enhanced performance for Informatica sessions with large data files by using partitions.

Created the transformation routines to transform and load the data.

Worked closely with the end users and Business Analysts to understand the business and develop the transformation logic to be used in Informatica Power Center.

Was part of the scripting team that wrote shell scripts to automate data migration from the ODS to the data warehouse.

Written UNIX Korn shell scripts along with Control M for scheduling the sessions and workflows.

Used the Informatica Cloud platform, which allows partners to build complex integrations and run them as custom Informatica Cloud services.

Developed Test Plans and written Test Cases to cover overall quality assurance testing.

Used global temporary tables and volatile temporary tables in Teradata, along with FastLoad (Fload), MultiLoad (Mload), and error tables.

Upgraded SQL Servers from SQL 2008 to SQL 2008R2 version.

Migrated databases from SQL Server 2008 to SQL Server 2008R2 by performing in-place and side-by-side migration of existing critical databases.

Client: CS Medical Center, TX Jan 2013 – Dec 2013

Role: Informatica Developer

Description: CS Medical Center operates as a hospital that provides outpatient care, inpatient care, biomedical research, graduate and undergraduate medical education, and community services. The goal of the project was to track outpatient, inpatient, and drug history, insurance, and billing methods.

Responsibilities:

Collaborated with Business analysts for requirements gathering, analysis, ETL design and development for extracting data from the heterogeneous source systems like MS SQL Server, Oracle, flat files, XML files and loading into Staging and Data Ware House Star Schema.

Extensively used ERWIN for Logical / Physical data modeling and Dimensional data modeling, and designed Star and Snowflake schemas.

Maintained warehouse metadata, naming standards and warehouse standards for future application development.

Extensively used Informatica client tools Source Analyzer, Warehouse designer, Mapping Designer, Mapplet Designer, Transformation Developer, Informatica Repository Manager and Informatica Workflow Manager.

Involved in massive data cleansing prior to data staging.

Designed and developed ETL routines using Informatica PowerCenter; within the Informatica mappings, extensively used Lookups, Aggregator, Java, XML, and Rank transformations, Mapplets, connected and unconnected stored procedures/functions/lookups, SQL overrides in Lookups, source filters in Source Qualifiers, and Router-based data flow management into multiple targets.

Created complex mappings with shared objects/Reusable Transformations/Mapplets using mapping/Mapplet Parameters/Variables.

Designed and developed the logic for handling slowly changing dimension table loads by flagging records with the Update Strategy transformation to populate the desired targets.

Extensively used pmcmd commands on command prompt and executed Unix Shell scripts to automate workflows and to populate parameter files.

Worked with UNIX shell scripts extensively for job execution and automation.

Experience in writing several SQL scripts for finding long-running queries and resolving blocked sessions, and for archiving old and unwanted data from the production server to the archive server.

Experience in creating clustered, non-clustered, and columnstore indexes and managing them based on fragmentation levels.

Scheduled Informatica workflows using AppWorx to run at regular intervals.

Used SQL tools like Query Analyzer and TOAD to run SQL queries and validate the data.

Tuning the Mappings for Optimum Performance, Dependencies and Batch Design.

Involved in Unit testing, User Acceptance Testing to check whether the data is loading into target, which was extracted from different source systems according to the user requirements.

Client: Federal Reserve Bank of Richmond, VA Aug 2011 – Dec 2012

Role: Informatica Developer

Description: The Federal Reserve Bank of Richmond is the headquarters of the Fifth District of the Federal Reserve that deploys a full range of corporate & banking services including capital raising, market making and financial advisory services. The aim of the project was to create a Data Warehouse that would involve source data from different departments like Finance, Sales and provide complete analytical solutions.

Responsibilities:

Involved in the design, development, and maintenance of procedures for moving data from all systems into the data warehousing system; the data was standardized and stored in tables for the various business units.

Worked with business users to create/modify existing reports using the reporting tool.

Parsed high-level design specifications to simple ETL coding and mapping standards.

Used Power Center for Extraction, Transformation and Loading data from heterogeneous source systems into the target database.

Created DTS Packages using SQL Server 2000.

Used stored procedures, views, and functions for faster processing of bulk volumes of source data.

Upgraded existing complex DTS packages to corresponding SSIS packages using Integration Services Control Flow and Data Flow transformation tasks (e.g., File System, For Each Loop, Derived Column, Union All, Merge Join, Conditional Split, Aggregate).

Developed ETL processes to replicate data from multiple platforms into reporting databases.

Created and scheduled SSIS packages for transferring running feeds from various departments, servers, and resources to the development servers.

Responsible for unit testing and Integration testing.

Assisted in mentoring internal staff on Informatica best practices and skills.

Responsible for Performance Tuning of Informatica Mapping and Tuned SQL Queries.

Responsible for multiple projects with cross-functional teams and business processes.

Responsible for the development and support of ETL routines; designed the Informatica loads required for populating the data warehouse; loaded high-volume data; tuned and troubleshot mappings; and created documentation to support the application.

Developed ETL process for the integrated data repository from external sources.

Provided production support by monitoring the processes that run on a daily basis.

Created Functional Specifications for the different Problem Logs to get the approval of the work.

Created Remedy Tickets for the work approval from different users.

Certifications

AWS Certified Solutions Architect – Associate

Snowflake SnowPro Core Certification


