Senior Data Engineer – ETL, Oracle, Python, Airflow

Location:
Oakwood Glen, TX, 75025
Posted:
April 30, 2026

Sai Charan Gangili — Senior Data Engineer

813-***-**** | ***********.**@*****.***

PROFESSIONAL SUMMARY:

Around 5 years of experience as a Data Warehouse Engineer, implementing and managing robust Linux-based data infrastructure.

Proven expertise in Shell Scripting and Oracle development, consistently enhancing data warehousing processes and optimizing system performance.

Highly skilled in developing and enhancing ETL processes and database load and extract operations, using tools such as Informatica for efficient data flow.

Proficient in Python programming, with practical knowledge of Perl, for developing advanced data solutions and automation scripts.

Extensive hands-on experience with Linux environment setup and Unix file systems, including mount types, permissions, and standard tooling.

Demonstrated ability to identify and implement critical system and architecture improvements, ensuring high availability and scalability of data platforms.

Strong background in relational databases, with hands-on experience in Oracle environments, including performance tuning and schema management.

Experienced in leveraging orchestration tools such as Apache Airflow with Python for automating complex data pipelines and workflow management.

Committed to continuous process improvement and automation, driving efficiencies in data management and operational workflows within data warehousing.

Solid understanding and practical application of the Agile methodology, contributing effectively to cross-functional teams and project delivery.

Expert in designing and implementing end-to-end data warehousing solutions, ensuring data quality, integrity, and timely delivery for analytical insights.

Excellent written and oral communication skills, facilitating clear articulation of technical concepts and fostering collaborative working environments.

EDUCATION:

Master of Science in Data Science @ Illinois Institute of Technology

TECHNICAL SKILLS:

Operating Systems & Scripting: Linux, Unix, Shell Scripting, Bash, Perl

Databases & Warehousing: Oracle Exadata, Oracle, PostgreSQL, MySQL, MS SQL Server, Amazon Redshift, Azure Synapse Analytics, Data Warehousing, Dimensional Modeling

ETL & Orchestration: Informatica PowerCenter, Azure Data Factory, AWS Glue, Apache Airflow, AWS Step Functions

Programming Languages: Python, Scala, SQL

Cloud & Big Data: AWS (S3, EMR, Athena), Azure (ADLS, Databricks), Apache Spark, Hadoop, Hive

Version Control & DevOps: GitHub, Jenkins, Docker, Kubernetes

Methodologies: Agile, SDLC, Data Modeling

WORK EXPERIENCE:

Senior Data Engineer @ Tenet Healthcare, Dallas, TX | Aug 2024 – Present

Implemented, configured, and managed robust Linux-based processes and infrastructure crucial for advanced healthcare data warehousing initiatives.

Developed and enhanced complex Shell Scripts for automating data extraction, transformation, and loading (ETL) into Oracle and cloud data warehouses.

Identified and implemented critical system and architecture improvements, optimizing data flow and enhancing the performance of large-scale data platforms.

Enhanced various Linux-based toolsets, scripts, jobs, and processes, significantly reducing manual intervention and improving operational efficiency.

Designed and deployed scalable ETL pipelines utilizing PySpark on AWS EMR, integrating with Oracle databases for comprehensive data processing.
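
For illustration, a minimal PySpark sketch of this kind of Oracle-to-S3 extract; the host, credentials, and table names are hypothetical placeholders:

    # Extract an Oracle table over JDBC and land it in S3 as Parquet.
    # All connection details and names below are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("oracle_to_s3_extract").getOrCreate()

    claims = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
        .option("dbtable", "EDW.CLAIMS")
        .option("user", "etl_user")
        .option("password", "***")
        .option("driver", "oracle.jdbc.OracleDriver")
        .load()
    )

    # Partitioned Parquet keeps downstream EMR jobs from full-table scans.
    (claims.write.mode("overwrite")
           .partitionBy("SERVICE_DATE")
           .parquet("s3://example-bucket/raw/claims/"))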

Managed and optimized Oracle database interactions, ensuring seamless integration of healthcare data and high availability for analytical reporting.

Ingested diverse structured and semi-structured data from various sources into Amazon S3, applying Unix file system principles for data organization.
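
A minimal boto3 sketch of this ingestion pattern, with a hypothetical bucket and date-partitioned prefix:

    # Upload structured (CSV) and semi-structured (JSON) extracts into S3
    # under an ingest-date prefix. Bucket and file names are hypothetical.
    import datetime
    import boto3

    s3 = boto3.client("s3")
    today = datetime.date.today().isoformat()

    for local_file, fmt in [("claims.csv", "csv"), ("events.json", "json")]:
        key = f"raw/{fmt}/ingest_date={today}/{local_file}"
        s3.upload_file(local_file, "example-bucket", key)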

Performed extensive data transformation and cleansing using Spark and SQL, maintaining data quality and consistency across all data assets.

Orchestrated complex data workflows and scheduling using Apache Airflow with Python, ensuring timely delivery of critical data for business intelligence.
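
A minimal Airflow sketch of such a workflow; the DAG id, schedule, and task callables are hypothetical placeholders:

    # A daily DAG chaining extract, transform, and load steps (Airflow 2.x).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        """Pull source data (placeholder)."""

    def transform():
        """Apply business rules (placeholder)."""

    def load():
        """Load into the warehouse (placeholder)."""

    with DAG(
        dag_id="daily_claims_pipeline",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2 >> t3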

Built data warehouse solutions using Amazon Redshift, leveraging deep understanding of data flows and dimensional modeling techniques.
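
A sketch of the kind of Redshift load this involves, wrapped in Python via psycopg2 for consistency with the other examples; the cluster endpoint, IAM role, and table names are hypothetical:

    # COPY staged Parquet from S3 into a Redshift fact table.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="edw",
        user="etl_user",
        password="***",
    )
    # The connection context manager commits the transaction on success.
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY analytics.fact_claims
            FROM 's3://example-bucket/raw/claims/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
            FORMAT AS PARQUET;
        """)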

Containerized data processing applications using Docker and deployed them efficiently via Kubernetes, enhancing scalability and resource utilization.

Automated continuous integration and continuous deployment (CI/CD) pipelines using Jenkins, streamlining development cycles for data warehousing solutions.

Technologies Used: Linux, Shell Scripting, Oracle, AWS (S3, EMR, Redshift, Athena), PySpark, Hadoop, Hive, Apache Airflow, Python, Docker, Kubernetes, Jenkins

Data Engineer @ PNC Financial Services, Pittsburgh, PA | Apr 2022 – Jun 2023

Implemented and managed Linux-based infrastructure to support data warehousing operations, ensuring high performance and reliability for financial data.

Developed and optimized Shell Scripts for automating data pipeline orchestrations and system maintenance tasks within the data warehouse environment.

Enhanced ETL/database load and extract processes for financial data, achieving significant improvements in data ingestion and processing speeds.

Contributed to system and architecture improvements, focusing on optimizing data storage and retrieval mechanisms for large transactional datasets.

Developed robust data processing pipelines using Azure Databricks and Spark, integrating with Oracle databases for comprehensive data handling.

Designed and implemented advanced ETL workflows using Azure Data Factory, focusing on efficient batch processing of critical financial information.

Ingested data from various relational databases, including Oracle and SQL Server, into Azure Data Lake Storage (ADLS) using secure Linux-based transfers.

Performed complex data transformations using Spark (Scala) and SQL, preparing data for reporting and analytics in Azure Synapse Analytics.

Implemented sophisticated data warehousing solutions utilizing Azure Synapse Analytics, providing a scalable platform for enterprise-wide data analysis.

Applied stringent data validation and quality checks throughout ETL processes, ensuring the accuracy and integrity of financial data assets.
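
A minimal sketch of this kind of post-load validation, shown in Python for consistency with the other examples (this role used Spark with Scala for transformations); the checks and names are hypothetical:

    # Reconcile row counts and reject null keys on a Spark DataFrame.
    def validate(df, key_cols, expected_rows):
        actual = df.count()
        if actual != expected_rows:
            raise ValueError(f"Row count mismatch: expected {expected_rows}, got {actual}")
        for col in key_cols:
            nulls = df.filter(df[col].isNull()).count()
            if nulls:
                raise ValueError(f"{nulls} null values in key column {col}")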

Used GitHub for version control of data engineering code and Jenkins for continuous integration and automated deployments of data solutions.

Collaborated within an Agile framework, actively participating in sprint planning and daily stand-ups to deliver data warehousing enhancements efficiently.

Technologies Used: Linux, Shell Scripting, Oracle, Azure (ADLS, ADF, Synapse, Databricks), Spark (Scala), SQL Server, Informatica, Apache Airflow, Python, GitHub, Jenkins

Junior Data Engineer @ Dollar General, Goodlettsville, TN | Nov 2019 – Mar 2022

Designed and developed comprehensive ETL processes using Informatica PowerCenter, facilitating data movement for retail data warehousing.

Enhanced database load and extract processes, optimizing performance for large datasets extracted from Oracle and other relational sources.

Managed Unix-based environments, utilizing Shell Scripting for automating various data processing tasks, file transfers, and job scheduling.

Extracted data efficiently from Oracle databases and loaded it into staging and warehouse tables, adhering to dimensional data modeling principles.

Developed and optimized complex SQL queries for data transformation, validation, and reporting within the Oracle database environment.
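
A sketch of the kind of staging-to-warehouse MERGE this involved, wrapped in Python via the oracledb driver for consistency with the other examples (in this role such SQL ran through Informatica and shell jobs); schema and table names are hypothetical:

    # Upsert a product dimension from a staging table with an Oracle MERGE.
    import oracledb

    conn = oracledb.connect(user="etl_user", password="***", dsn="oracle-host/ORCLPDB")
    with conn.cursor() as cur:
        cur.execute("""
            MERGE INTO dw.dim_product d
            USING stg.products s
               ON (d.product_id = s.product_id)
            WHEN MATCHED THEN UPDATE SET
                d.product_name = s.product_name,
                d.category     = s.category
            WHEN NOT MATCHED THEN INSERT (product_id, product_name, category)
                VALUES (s.product_id, s.product_name, s.category)
        """)
    conn.commit()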

Implemented robust data cleansing and transformation rules to ensure high data quality and consistency across all retail data systems.

Created intricate workflows and mappings within Informatica PowerCenter for batch processing, handling large volumes of transactional data.

Monitored ETL jobs proactively, managing error logging and exception handling to ensure continuous data availability and integrity.
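
A minimal sketch of this monitoring pattern: run an ETL step with logging and bounded retries before surfacing the failure; the wrapper and its defaults are hypothetical:

    import logging
    import time

    log = logging.getLogger("etl")

    def run_with_retry(step, retries=3, delay=60):
        """Run a callable ETL step, logging failures and retrying."""
        for attempt in range(1, retries + 1):
            try:
                step()
                log.info("%s succeeded on attempt %d", step.__name__, attempt)
                return
            except Exception:
                log.exception("%s failed (attempt %d/%d)", step.__name__, attempt, retries)
                if attempt < retries:
                    time.sleep(delay)
        raise RuntimeError(f"{step.__name__} failed after {retries} attempts")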

Utilized Unix shell scripting extensively for the automation of daily, weekly, and monthly ETL jobs, significantly improving operational efficiency.

Ensured data integrity and consistency across disparate systems, maintaining reliable data for critical business intelligence and analytical needs.

Maintained rigorous version control for all ETL code and scripts using GitHub, fostering collaborative development and streamlined code management.

Worked effectively within the Hadoop ecosystem, particularly Hive, for processing and analyzing large semi-structured datasets, complementing Oracle solutions.
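
A sketch of the kind of Hive aggregation this involved, issued through PySpark's Hive support for consistency with the other examples; table and column names are hypothetical:

    # Aggregate point-of-sale events from a Hive table.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive_profile")
        .enableHiveSupport()
        .getOrCreate()
    )

    daily_sales = spark.sql("""
        SELECT store_id, to_date(event_ts) AS sale_date, SUM(amount) AS total
        FROM retail.pos_events
        GROUP BY store_id, to_date(event_ts)
    """)
    daily_sales.show()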

Technologies Used: Linux (Unix), Shell Scripting, Oracle, Informatica PowerCenter, SQL, Hive, Jenkins, GitHub


