Sriharsha Kotte
Senior Data Engineer
***************@*****.***
LinkedIn: ***************@*****.***
SUMMARY:
Experienced Data Engineer with 12+ years of experience as a data warehouse developer using Informatica ETL, Autosys, SQL Server, Oracle, HDFS, GitHub, Unix, and Microsoft Power BI; specialized in developing and implementing data warehouse projects.
Strong background in ETL, data warehousing, and data analytics with proven success in operational risk and compliance domains.
Expertise in requirement gathering, documentation, development, testing, and deployment; coordinate with different teams to understand their scope and help them achieve their goals by providing the right solutions.
Deep hands-on experience in Hadoop ecosystems (HDFS, YARN, Hive, Spark) for high-volume batch and real-time processing.
Experience in developing and implementing ETL processes using Informatica PowerCenter (10.5.8) to support banking, finance, and insurance claims projects and enhancements.
Experience with Autosys, including creating efficient JIL workflows for job scheduling.
Experience in the design, development, and implementation of database systems using Oracle and MS SQL Server for both OLTP and data warehousing applications.
Provided troubleshooting of Hadoop and ETL jobs, ensuring quick turnaround during production incidents.
Collaborated with administrator teams to identify recurring issues and plan for future scalability.
Coordinated with offshore teams to drive BAU support activities, including log analysis, bug fixes, and enhancements.
Demonstrated strong analytical skills and coordinated with downstream partners during critical production failures.
Experience in building ETL data pipelines and loading data into SQL data warehouses.
Worked on SQL joins and functions across Teradata, Oracle, and Azure SQL for compliance and risk reporting.
Created Power BI dashboards and charts for analytical purposes.
Expertise working in ServiceNow, creating Change Requests, Incidents, RITM requests, and Problem Tasks based on project needs and assigning them to different teams.
Worked on the SDLC process of migrating code from lower environments to higher environments (PROD/BCP).
Designed and optimized HDFS-based data pipelines for payments flows, ensuring high availability and performance.
Developed Hive queries, partitioning, and indexing strategies to accelerate reporting and reduce query execution time.
Built and maintained MapReduce jobs for large-scale batch processing, supporting mission-critical ETL workflows.
Automated workflows to orchestrate jobs, minimizing manual intervention and improving SLA compliance.
Expertise in creating JIL jobs and automating end-of-day batch jobs to trigger on schedule using Autosys.
Worked on migration of PowerCenter workflows from 10.4 to 10.5.1, 10.5.3, and 10.5.8.
Skilled in writing test cases and automating regression test cycles using HyperExecute workflows.
Published test case results in Octane; performed performance and load testing using BlazeMeter.
Defined user stories, drove the Agile board during project execution, and participated in sprint demos and retrospectives.
Developed reusable unix scripts for data validation, archival, and anomaly detection.
Used Shell scripting to automate data archival, job monitoring, and failure recovery across different environments.
Experience in Performance Tuning and Query Optimization using profilers and performance monitor.
Implemented Git-based version control workflows including feature branching, pull requests, and code reviews to streamline team collaboration.
Designed and maintained CI/CD pipelines using tools like Jenkins, GitHub Actions to automate build, test, and deployment processes.
Integrated automated testing into CI workflows to ensure code quality and reduce regression issues before deployment.
Integrated S3 buckets for file upload and download with different downstream partners (a sketch follows below).
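As a minimal illustration of the S3 file exchange mentioned above, the following Python sketch uses boto3; the bucket name, key prefixes, and local paths are hypothetical placeholders, not actual project values.

# Hedged sketch: S3 file exchange with a downstream partner using boto3.
# Bucket name, prefixes, and file paths are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")  # credentials resolved from environment or instance profile

def upload_extract(local_path: str, bucket: str = "partner-exchange-bucket",
                   prefix: str = "outbound/") -> None:
    """Upload a generated extract file for a downstream partner."""
    key = prefix + local_path.split("/")[-1]
    s3.upload_file(local_path, bucket, key)

def download_feed(key: str, local_path: str,
                  bucket: str = "partner-exchange-bucket") -> None:
    """Download an inbound partner file for ETL processing."""
    s3.download_file(bucket, key, local_path)

if __name__ == "__main__":
    upload_extract("/data/outbound/payments_extract.csv")
    download_feed("inbound/risk_feed.csv", "/data/inbound/risk_feed.csv")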
TOOLS & TECHNOLOGIES:
ETL & Reporting: Azure Data Factory, HDFS, Informatica PowerCenter, Microsoft Power BI, Python, Unix, Autosys
Hadoop Ecosystem: HDFS, YARN, MapReduce, Hive, Pig
File Formats: Flat File, XML, JSON, CSV, EDI
API Integration: RESTful APIs, Kafka Streaming
Databases: Oracle, Teradata, SQL Server, DB2, Azure SQL
DevOps/Monitoring: Jenkins, GitHub, CI/CD, UCD, Harness
Testing & ITSM: ServiceNow, HyperExecute, BlazeMeter, Octane, UCD deployment, Jenkins, Harness
EXPERIENCE:
Wells Fargo, Charlotte, NC
Data Engineer
Jan 2018 – Present
Designed and implemented distributed ETL pipelines using Informatica PowerCenter for processing financial transactions.
Interacted with business teams and end users to understand requirements and convert business specifications into technical specifications for ETL processes.
Participated in data modeling meetings and helped identify the relationships between tables according to the requirements.
Analyzed functional requirements, created related feature stories, and planned release migrations per timelines.
Collaborated with DevOps teams to integrate Hadoop jobs into containerized environments, streamlining deployment across hybrid cloud setups.
Built data pipelines using Spark and Hive that take payment and risk data from OBPM and OBLM systems, store it in Hadoop and Oracle, and then prepare clean, ready-to-use data in SQL Server for reports and analysis (see the pipeline sketch after this list).
Integrated multi-source feeds (Oracle, Kafka, MQ, flat files, REST APIs) with schema drift handling and late-arriving data logic.
Worked with the Teradata source system WDM (Warehouse Data Mart) to read required data and provide it to different partner teams.
Implemented full, delta, and snapshot loads with resilient design, checkpointing, and data handling to guarantee repeatable runs and fast recoveries.
Designed ETL mappings and provided transformed data to different systems.
Worked with the XGI team on integrating ETL with real-time MQ messages, reading from MQ, transforming the data per business rules, and presenting the transformed data to business users.
Led code reviews to ensure design standards were followed and no performance issues were introduced; supported project meetings by collecting status reports from onsite and offsite teams.
Built and scheduled workflows using the Autosys scheduler and created the required JIL files.
Collaborated with data governance teams to ensure data compliance and masking of sensitive information.
Created REST API integrations for external systems to ingest compliance data into the big data platform.
Participated in Agile ceremonies and worked closely with business teams to refine data engineering requirements.
Built ETL workflows using Hadoop and Oozie, automating data ingestion and transformation from multiple sources.
Collaborated with cross-functional teams to migrate legacy systems into Hadoop-based data lakes, ensuring scalability and cost efficiency.
Implemented data replication and recovery strategies in HDFS to maintain business continuity and minimize downtime.
Designed and implemented HDFS-based data pipelines to store and process multi-terabyte datasets with high availability and fault tolerance.
Developed and optimized MapReduce jobs for batch data processing, improving performance by reducing execution time through efficient partitioning and tuning.
Integrated Hadoop ecosystem tools such as Hive, Pig, and Impala for querying and analytics, enabling faster insights from raw data.
Leveraged YARN resource management to schedule and monitor distributed jobs across large clusters.
Interacted with different business teams to understand requirements and prepared mapping documents required for development.
Actively participated in team discussions, contributing insights that led to successful resolution of technical and operational challenges.
Collaborated cross-functionally with diverse teams to identify and resolve complex issues, improving project delivery timelines and closing defects on time.
Deployed big data workloads on Kubernetes clusters, enabling auto-scaling and high availability for analytics pipelines.
Implemented CI/CD pipelines with GitHub Actions + Docker, reducing release cycle time by 50%.
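The Spark/Hive pipeline described earlier in this list can be illustrated with a minimal PySpark sketch; the JDBC URLs, schemas, table names, and credentials below are hypothetical placeholders under assumed connectivity (Oracle and SQL Server JDBC drivers on the classpath), not the production implementation.

# Hedged sketch: read payment data from Oracle, land it in a Hive table on
# Hadoop, and publish a curated slice to SQL Server for reporting.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("payments_risk_pipeline")
         .enableHiveSupport()
         .getOrCreate())

# Read source payment data from Oracle over JDBC (placeholder DSN/table).
payments = (spark.read.format("jdbc")
            .option("url", "jdbc:oracle:thin:@//oracle-host:1521/OBPM")
            .option("dbtable", "OBPM.PAYMENTS")
            .option("user", "etl_user")
            .option("password", "****")
            .load())

# Basic cleanup/standardization before landing in the Hadoop data lake.
curated = (payments
           .filter(F.col("status").isNotNull())
           .withColumn("load_dt", F.current_date()))

# Persist to a partitioned Hive table backed by HDFS.
(curated.write.mode("append")
        .partitionBy("load_dt")
        .saveAsTable("payments_lake.curated_payments"))

# Publish the reporting-ready data to SQL Server for downstream analysis.
(curated.write.format("jdbc")
        .option("url", "jdbc:sqlserver://sql-host:1433;databaseName=Reporting")
        .option("dbtable", "dbo.curated_payments")
        .option("user", "report_user")
        .option("password", "****")
        .mode("append")
        .save())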
NYU Langone Medical Center, New York, NY
Senior Data Engineer
July 2017 – January 2018
Developed robust ETL pipelines using Informatica PowerCenter and Airflow, supporting healthcare data ingestion, transformation, and compliance reporting.
Built batch processing pipelines using PowerCenter and integrated them with Power BI for downstream reporting.
Created and maintained Hive tables using ORC format to optimize query performance and storage.
Automated complex ETL workflows using the Autosys scheduler for daily health insurance data loads.
Collaborated with actuarial and business intelligence teams to deliver reporting datasets.
Implemented dynamic schema evolution handling to manage evolving health insurance data feeds.
Transformed raw data into structured formats and performed aggregations enabling faster reporting SLAs.
Worked in Agile teams using Jira to manage sprints and user stories for healthcare data products.
Provided technical documentation and walkthroughs for onboarding data analysts and engineers.
Ensured HIPAA compliance, security, and lineage tracking through metadata-driven frameworks and logging automation.
Delivered data-driven insights via Power BI improving reporting efficiency for business users.
Tuned performance of long-running sessions by analyzing session logs, optimizing transformations, and leveraging cache settings.
Created parameterized sessions and workflows to promote flexibility across environments and streamline deployment.
Automated post-load validation, log monitoring, and notifications using UNIX shell scripts.
Validated data loads in the Oracle DWH using advanced PL/SQL scripts to ensure data accuracy and completeness (a validation sketch follows this list).
Participated in defect triage meetings with QA and business teams; provided UAT support and resolved issues within defined SLAs.
Monitored nightly batch ETL job executions and provided L2 production support to minimize downtime and data delays.
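A minimal sketch of the post-load validation idea referenced above, written in Python with python-oracledb for illustration (the actual validation used PL/SQL scripts); the DSN, credentials, and table names are hypothetical placeholders.

# Hedged sketch: reconcile staging vs. target row counts after a nightly load.
import oracledb

def rowcount(conn, table: str) -> int:
    # Table names here are trusted, illustrative constants.
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

def validate_load(staging_table: str, target_table: str) -> bool:
    """Compare staging and target row counts and report the result."""
    conn = oracledb.connect(user="etl_user", password="****",
                            dsn="dwh-host:1521/DWHPDB")
    try:
        src, tgt = rowcount(conn, staging_table), rowcount(conn, target_table)
        print(f"{staging_table}={src}, {target_table}={tgt}")
        return src == tgt
    finally:
        conn.close()

if __name__ == "__main__":
    ok = validate_load("STG_CLAIMS", "DW_CLAIMS")
    raise SystemExit(0 if ok else 1)  # non-zero exit signals the scheduler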
Kelley Blue Book (KBB), Irvine, CA
ETL Developer
January 2013 – December 2016
Designed and implemented ETL workflows in Informatica PowerCenter to integrate dealer, vehicle pricing, and market intelligence data from diverse sources.
Collaborated with cross-functional business and technical teams to translate analytical requirements into scalable ETL designs.
Created comprehensive source-to-target mappings and reusable transformation logic to support consistent data integration processes.
Participated in the successful migration of Informatica servers from Windows to Linux environments, ensuring compatibility and performance.
Developed custom shell scripts to automate file validation and data ingestion workflows, enhancing operational reliability (see the file-validation sketch after this list).
Resolved performance issues by debugging complex workflows and tuning long-running sessions using SQL overrides and cache optimization techniques.
Managed job scheduling and dependency chains using Autosys, ensuring reliable and timely data pipeline executions.
Worked with the Teradata source system to read required data and load it into the Oracle database.
Built audit and control frameworks to monitor data quality, enforce validation rules, and support compliance requirements.
Supported the development of data marts for business intelligence and analytics reporting.
Collaborated with QA teams to provide support during ETL testing phases, data migration validation, and defect resolution.
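The file-validation step referenced above was implemented with shell scripts; as an equivalent illustration, here is a short Python sketch. The landing directory, expected header layout, and file naming are assumptions for the example only.

# Hedged sketch: validate inbound flat files (header layout, non-empty record
# count) before handing them to the ETL ingestion workflow.
import csv
import sys
from pathlib import Path

EXPECTED_HEADER = ["dealer_id", "vin", "price", "effective_date"]  # hypothetical layout

def validate_feed(path: Path) -> bool:
    """Check the header layout and record count of one inbound CSV file."""
    with path.open(newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            print(f"{path.name}: unexpected header {header}")
            return False
        records = sum(1 for _ in reader)
    if records == 0:
        print(f"{path.name}: no data records")
        return False
    print(f"{path.name}: {records} records OK")
    return True

if __name__ == "__main__":
    inbound = Path("/data/inbound")  # hypothetical landing directory
    results = [validate_feed(p) for p in sorted(inbound.glob("*.csv"))]
    sys.exit(0 if results and all(results) else 1)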
EDUCATION:
MS in Electrical Engineering, Gannon University - Erie, PA, USA - 2012
Bachelor of Technology (ECE), VIT University - Vellore, India - 2010
Certifications:
• Microsoft Azure Fundamentals (AZ-900)