PROFESSIONAL SUMMARY:
Data Warehousing: Fourteen plus years of experience in implementing Data Warehousing projects with emphasis in analyzing the business needs of clients, developing effective and efficient solutions, and ensuring client deliverables within committed timelines and involved in all stages of Software Development Life Cycle from analysis and planning to development & deployment and followed the agile methodology & scrum process.
ETL: Proficient in the ETL (Extract, Transform, Load) process for analyzing, designing, and developing DWH projects using AWS Glue, Informatica, IICS, Talend Open Studio, Teradata utilities, and Oracle PL/SQL.
Data Modeling: Good understanding of Conceptual, Logical, and Physical Data Modeling (Dimensional and Relational) concepts and of designing Enterprise DWH/Data Marts and database schemas (Star and Snowflake).
Databases: Expertise in using Oracle, PostgreSQL, IBM DB2, Teradata, AWS database services, and Snowflake.
Others: AWS services, exposure to UNIX and Python, Data Governance, Data Lineage, Snowflake Cloud Data Warehousing.
TECHNICAL SKILLS:
ETL Tools : AWS - Glue/DMS, Informatica - IICS, PowerCenter 10.2/10.1/9.6/9.x, Developer 10.2, MDM 10.2,
IDQ 10.1, PowerExchange, Talend Open Studio, DataStage, SSIS, Oracle Warehouse Builder
DBMS/DWH : AWS - RDS, Redshift, Oracle 11g/10g, PostgreSQL, IBM DB2, SQL Server, Teradata 15/14/13,
Snowflake Cloud Data Warehousing, Mainframes
BI Tools : SAP BusinessObjects, PowerBI
Languages : Python, Pandas, PySpark, SQL, PL/SQL, SQL Assistant, PostgreSQL, COBOL, UNIX
Others : AWS Services, GitHub, Putty, WinSCP, Tectia, Jira, Jenkins, IBM UrbanCode Deploy, Consul,
Confluence, Erwin, Dataguise, Tortoise SVN, BTEQ, FLoad, MLoad, FExport, TPump
PROFESSIONAL EXPERIENCE:
LEIDOS - Baltimore, MD July ’22 - Current
Client: Centers for Medicare and Medicaid Services (CMS) - NDW Project
Designation: Lead ETL/Data Engineer
Responsibilities:
Participated in meetings to gather requirements from the customer, and created ETL logic to extract data from files and relational tables, then transform and load it into the data warehouse.
Wrote Python/PySpark scripts for ETL processes that migrate data from S3/RDS into RDS and Redshift instances using AWS Glue (see the Glue sketch after this list).
Took the initiative to migrate the existing Informatica ETL logic to AWS Glue.
Wrote CloudFormation templates (CFT) to build AWS services following the infrastructure-as-code paradigm.
Extensively used Informatica Cloud (IICS) transformations such as Address Validator, Exception, and Parser; solid experience debugging and troubleshooting sessions using the Debugger and Workflow Monitor.
Built directed acyclic graphs (DAGs) in Apache Airflow to orchestrate the ETL process, visualize data pipelines running in production, monitor progress, and troubleshoot issues when needed (see the Airflow sketch at the end of this section).
Onboarded the code for deployment via a Continuous Integration and Continuous Deployment (CI/CD) process using GitHub, Consul, and Jenkins.
Tested and debugged the ETL logic to evaluate performance and confirm the code met business requirements, and tuned the ETL to optimize performance.
Developed new IICS taskflows and mappings with run-time target generation in the cloud database, and migrated PowerCenter workflows to the IICS platform.
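A minimal sketch of the kind of AWS Glue PySpark job referenced above, assuming a cataloged S3 source and a preconfigured Redshift JDBC connection; the catalog database, table, connection, and bucket names are placeholders for illustration, not the project's actual objects.

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job bootstrap
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a source table cataloged over S3 (hypothetical catalog database/table)
    source_dyf = glue_context.create_dynamic_frame.from_catalog(
        database="claims_raw_db",
        table_name="claims_incoming",
    )

    # Cast/rename columns on the way in (hypothetical columns)
    mapped_dyf = ApplyMapping.apply(
        frame=source_dyf,
        mappings=[
            ("claim_id", "string", "claim_id", "string"),
            ("claim_amount", "string", "claim_amount", "double"),
            ("service_date", "string", "service_date", "date"),
        ],
    )

    # Write to Redshift through a preconfigured Glue connection (hypothetical names)
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped_dyf,
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "edw.claims_fact", "database": "edw"},
        redshift_tmp_dir="s3://example-temp-bucket/glue-tmp/",
    )

    job.commit()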
Environment: AWS Services – Glue / EC2 / RDS / S3 / Redshift / CloudWatch / CFT, Athena, Informatica, Python, PySpark, PostgreSQL, GitHub, Jenkins, Consul, UNIX.
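An illustrative Apache Airflow DAG of the shape used to orchestrate the ETL steps above; the DAG id, schedule, and task callables are hypothetical placeholders, not the production pipeline.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder: pull source files / query source tables
        pass

    def transform():
        # Placeholder: apply business rules
        pass

    def load():
        # Placeholder: load into the warehouse
        pass

    with DAG(
        dag_id="ndw_example_etl",          # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task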
VIRTUSA - New York City, NY June ’21 - July ’22
Client: HealthFirst
Designation: Sr. ETL Developer / Data Engineer
Responsibilities:
Participated in customer journey meetings to understand the business rules and implemented them accordingly.
Wrote PySpark scripts for ETL processes that migrate data from S3 into RDS instances using AWS Glue.
Utilized AWS Glue to refresh data in the AWS OpenSearch (Elasticsearch) cluster, enabling faster search functionality for the application.
Created Glue Data Catalog databases and tables so that Athena could query AWS S3 data quickly.
Wrote CloudFormation templates to build AWS services following the infrastructure-as-code paradigm.
Exposure to Snowflake utilities: Snowpipe, SnowSQL, Resource Monitors, Role-Based Access Controls, Data Sharing, virtual warehouse sizing, query performance tuning, and zero-copy cloning.
Created Snowpipe for continuous data loading and used COPY INTO for bulk loads (see the Snowpipe sketch after this list).
Utilized internal and external stages and transformed data during load.
Migrated legacy PowerCenter workflows into IICS and implemented parameterized taskflows.
Designed and optimized data models in Snowflake, and enforced security via RBAC policies.
Developed Informatica mappings, tasks, sessions, and workflows to move data at scheduled intervals using Workflow Manager, and modified existing mappings for new business requirements to load the staging tables and then the target tables in the EDW.
Tested and debugged the ETL and database objects to evaluate performance and confirm the code met business requirements, and tuned the ETL to optimize performance.
Took the initiative to perform POCs in AWS services, Informatica, and Snowflake as needed to reach project goals.
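A simplified Python sketch of setting up a Snowpipe for continuous loading, assuming the Snowflake Connector for Python and an existing external stage; the account, stage, and table names are placeholders.

    import snowflake.connector

    # Placeholder credentials/identifiers
    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="***",
        warehouse="LOAD_WH",
        database="RAW",
        schema="CLAIMS",
    )
    cur = conn.cursor()

    # One-time bulk load from the external stage (hypothetical stage/table)
    cur.execute("""
        COPY INTO RAW.CLAIMS.MEMBER_EVENTS
        FROM @RAW.CLAIMS.MEMBER_STAGE
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    """)

    # Pipe that auto-ingests new files landing in the stage
    cur.execute("""
        CREATE OR REPLACE PIPE RAW.CLAIMS.MEMBER_EVENTS_PIPE AUTO_INGEST = TRUE AS
        COPY INTO RAW.CLAIMS.MEMBER_EVENTS
        FROM @RAW.CLAIMS.MEMBER_STAGE
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    """)

    cur.close()
    conn.close()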
Environment: AWS Services – Glue / EC2 / RDS / S3 / OpenSearch / CloudWatch / CFT, Athena, Informatica IICS, PySpark, PostgreSQL, Snowflake Cloud Data Warehousing, GitHub, UNIX, SoapUI.
FIDELITY INVESTMENTS - Durham, NC July ’20 - June ’21
Designation: Sr. ETL Developer / Data Engineer
Responsibilities:
Participated in customer journey meetings to gain an understanding of the business.
Implemented complex business rules in Informatica by creating reusable transformations and robust mappings/mapplets, utilizing transformations such as Source Qualifier, Joiner, Rank, Update Strategy, Expression, Union, Sorter, Lookup, Sequence Generator, and Router.
Identified performance issues in existing sources, targets, and mappings by analyzing the data flow and evaluating transformations, and tuned them accordingly for better performance.
Tested and debugged the ETL and database objects to evaluate performance and confirm the code met business requirements.
Tuned Informatica mappings, transformations, and workflow sessions to optimize performance.
Collaborated with the Data Engineering team on code reviews, performance tuning, and Snowflake migration initiatives.
Developed Oracle PL/SQL objects and worked with middleware engineers to integrate the code for API calls (see the sketch after this list).
Reviewed existing Oracle PL/SQL code and modified packages, stored procedures, and functions according to the business requirements.
Participated in Data Engineering team meetings for ETL Code Reviews and architectural designs.
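A brief sketch of how a PL/SQL object might be invoked from Python middleware, using the python-oracledb driver; the connection details and procedure name are hypothetical.

    import oracledb

    # Placeholder connection details
    conn = oracledb.connect(user="etl_user", password="***", dsn="dbhost/ORCLPDB1")
    cur = conn.cursor()

    # Call a hypothetical packaged procedure that processes one staging batch
    batch_id = 12345
    cur.callproc("stage_pkg.process_customer_batch", [batch_id])

    conn.commit()
    cur.close()
    conn.close()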
Environment: Informatica 10.2, Oracle - SQL, PL/SQL, AWS, Bitbucket, IBM UrbanCode Deploy, Confluence, Jenkins, Snowflake Cloud Data Warehousing, Python.
MANTECH INTERNATIONAL CORPORATION - Owings Mills, MD Feb ’19 - May ’20
Client: Centers for Medicare and Medicaid Services (CMS) - EQRS Project
Designation: Lead ETL Specialist / Data Engineer
Responsibilities:
Analyzed the business requirements and created ETL logic to extract data from vendor files and relational tables, then transform and load it into the data warehouse.
Designed and developed ETL pipelines in AWS Glue to load data from RDS and S3 buckets into Redshift.
Wrote Python scripts for ETL purposes and performed data analysis using Python libraries.
Utilized Python libraries such as pandas, PySpark, SQLAlchemy, and boto3 for data analysis (see the pandas sketch after this list).
Utilized AWS components such as DMS, Glue, RDS, S3, EC2, Redshift, CloudWatch, and CloudFormation.
Developed Informatica mappings, tasks, sessions, and workflows to move data at scheduled intervals using Workflow Manager, and modified existing mappings for new business requirements to load the staging tables and then the target tables in the EDW.
Implemented complex business rules in Informatica by creating reusable transformations and robust mappings/mapplets, utilizing transformations such as Source Qualifier, Joiner, Rank, Update Strategy, Expression, Union, Sorter, Lookup, Sequence Generator, and Router.
Used Informatica debugging techniques to debug mappings, and used session log files and bad files to trace errors in target data loads.
Worked on Informatica partitions to handle huge volumes of data and load them successfully into target tables.
Identified performance issues in existing sources, targets, and mappings by analyzing the data flow and evaluating transformations, and tuned them accordingly for better performance.
Developed Talend jobs for initial and incremental loads to move data from on-premises databases to AWS RDS databases using Talend Open Studio.
Created Informatica mappings and Talend jobs implementing Slowly Changing Dimensions (SCD Type 1 and Type 2) CDC logic to capture changes and preserve historical data.
Performed data de-identification to mask sensitive PHI/PII information before loading it into cloud environments for application testing in lower environments.
Performed unit testing and supported migration of ETL code from lower to higher life cycles.
Created unit test cases to test data loads and check the components adhere to the technical design.
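A small sketch of the kind of pandas/boto3 data analysis mentioned above, used to profile a vendor feed landed in S3; the bucket, key, and column names are placeholders.

    import io

    import boto3
    import pandas as pd

    # Pull a hypothetical vendor file from S3
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="example-vendor-bucket", Key="incoming/facility_claims.csv")
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Quick profiling before the feed is loaded downstream
    print(df.shape)
    print(df.isnull().sum())
    print(df.groupby("facility_id")["claim_amount"].agg(["count", "sum", "mean"]))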
Environment: Informatica PC 10.2, Talend Open Studio, Dataguise, Oracle, PostgreSQL, UNIX, Python, WinSCP, Confluence, AWS - Glue, RDS, Redshift, S3, EC2, EMR, CloudWatch, CloudFormation, Athena, GitHub, Jenkins.
CONDUENT - Elkridge, MD Nov ’18 - Feb ’19
Client: State of Maryland
Designation: Sr. Informatica Developer
Responsibilities:
Understood the business rules based on high-level design specifications and implemented the data transformation methodologies.
Performed data quality checks, cleansed incoming data feeds, and profiled source system data per business rules using IDQ; participated in developing IDQ mappings using transformations such as Labeler, Standardizer, Case Converter, Match, and Address Validator.
Worked on ETL process for bringing in the data into MDM landing tables.
Defined source-data trust scores based on data quality analysis and discussions with stakeholders, and developed validation rules based on the profiled data quality and data analysis.
Finalized key fields after discussions with the Data Stewards, and defined match rules in the Match and Merge settings of the base tables by creating Match Path components, Match Columns, and Rule Sets.
Configured match rule set filters to handle different data scenarios, performed match/merge, ran match rules to check the effectiveness of MDM on the data, and fine-tuned the match rules.
Configured Informatica Data Director (IDD) for data governance, to be used by business users, managers, and Data Stewards.
Environment: Informatica PC 10.2 / MDM 10.2 / IDQ 10.2, Mainframes, PostgreSQL, WinSCP.
CSRA (a GDIT Company) - Windsor Mill, MD Apr ’17 - Sept ’18
Client: Centers for Medicare and Medicaid Services (CMS) - EDSC Project
CSRA - Columbia, MD Dec ’12 - Mar ’17
Client: Centers for Medicare and Medicaid Services (CMS) - DECC Project
Designation: Informatica / Teradata Developer
Responsibilities:
Understood business requirements and converted them into technical specifications; reverse-engineered existing systems and processes for documentation.
Extensively worked on ETL processes to load data from and into Oracle, Teradata, DB2, flat files, etc.
Created complex mappings in Informatica PowerCenter to transform the data according to the business rules.
Designed CDC mappings to capture the delta records and mappings which involved SCD Type 1 & Type 2.
Worked on workflows and sessions and monitored them to ensure data was properly loaded into the target tables.
Identified the bottlenecks in ETL process and improved the performance issues in existing sources, targets and mappings by analyzing the data flow, evaluating transformations and tuned objects for optimum execution timelines using Partitioning, Index Usage, Optimization concepts.
Worked on Static, Dynamic, Persistent Cache to improve the performance of the lookup transformations.
Tested and debugged all ETL and database objects to evaluate performance and confirm the code met business requirements.
Wrote complex SQL queries to pull required information from different databases; handled millions of rows and ran queries against the databases efficiently with no performance issues.
Wrote BTEQ, FLoad, and MLoad scripts to load huge volumes of data from legacy systems into the target data warehouse, and developed TPump scripts to load low-volume data into Teradata.
Created PL/SQL objects such as stored procedures, functions, and packages to move data from staging to the data mart.
Extensively used BULK COLLECT in PL/SQL objects to improve performance.
Worked on data verification and validation to ensure the data generated met the requirements and was consistent.
Used exception handling extensively for ease of debugging and to display error messages in the application.
Created indexes on the tables for faster retrieval of the data to enhance database performance.
Performed performance analysis and query tuning using EXPLAIN plans and COLLECT STATISTICS (see the Teradata sketch after this list).
Involved in Data Modeling, System/Data Analysis, Design and Development for OLTP and OLAP systems.
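A short sketch of the Teradata tuning workflow described above (refreshing statistics, then reviewing the optimizer plan), using the Teradata SQL Driver for Python; the host, credentials, table, and column are placeholders.

    import teradatasql

    # Placeholder connection parameters
    con = teradatasql.connect(host="tdhost", user="etl_user", password="***")
    cur = con.cursor()

    # Refresh optimizer statistics on a hypothetical join column
    cur.execute("COLLECT STATISTICS ON edw.claims_fact COLUMN claim_id")

    # Review the optimizer plan for a query being tuned
    cur.execute(
        "EXPLAIN SELECT claim_id, claim_amount "
        "FROM edw.claims_fact WHERE service_date > DATE '2016-01-01'"
    )
    for row in cur.fetchall():
        print(row[0])

    cur.close()
    con.close()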
Environment: Informatica PC 10.1/9.x, Teradata 15/14, BTEQ, MLOAD, FLOAD, TPUMP, UNIX, Oracle 11g, SQL, PL/SQL, ERWIN.
GVK Solutions - Clarksburg, MD Jan ’12 - Nov ’12
Designation: Teradata Developer
Responsibilities:
Involved in requirement gathering, analysis, development, testing and implementation of business rules.
Used Informatica PC Designer to create mappings using varied transformations to pipeline data to DataMart.
Performed performance tuning at the source, target, mapping, session, and system levels.
Fixed issues with existing FLoad/MLoad scripts to load data into the warehouse effectively.
Loaded data from the staging tables to the base tables using BTEQ scripts, and wrote MLOAD scripts to load data from flat files into the staging tables.
Performed query optimization with the help of EXPLAIN plans, COLLECT STATISTICS, and primary and secondary indexes. Used volatile tables and derived queries to break up complex queries into simpler ones.
Worked with PPI Teradata tables and was involved in Teradata specific SQL fine-tuning to increase performance of the overall ETL process.
Environment: Informatica 9.1, Teradata 13, SQL Assistant, BTEQ, MLOAD, FLOAD, TPUMP, UNIX, Oracle 10g, SQL.
EDUCATIONAL QUALIFICATIONS:
MS in Electrical and Computers - University of Dayton, Dayton, OH Jan ’09 - Dec ’11
M.Sc in Electronics - National Institute of Technology, Warangal, INDIA July ’04 - Sept ’07
B.Sc (Mathematics, Physics and Computers) - Osmania University, Hyderabad, INDIA Aug ’00 - May ’03