
Senior Data Engineer - Linux/ETL Focused Expert

Location: Plano, TX
Posted: April 30, 2026


Resume:

Sai Krupa Reddy — Senior Data Engineer

717-***-**** *********@*****.***

PROFESSIONAL SUMMARY:

Accomplished Data Warehouse Engineer with approximately 5 years of proven experience in implementing, configuring, and managing Linux-based processes and infrastructure for robust data warehousing solutions.

Adept at enhancing complex ETL and database load/extract processes, ensuring optimal performance and data integrity for critical business operations.

Specialized in developing and optimizing Shell Scripts and various Linux-based toolsets, jobs, and processes to streamline data operations and automation.

Extensive practical experience in Linux environment setup, including deep understanding of Unix file systems, permissions, standard tools, and piping mechanisms.

Proficient in Oracle development and administration, with hands-on experience in managing relational databases, including Oracle Exadata environments.

Skilled in Python programming, leveraging it for data processing, automation, and the enhancement of data warehouse functionalities and orchestration workflows.

Demonstrated expertise in designing and implementing system and architecture improvements to enhance the scalability and efficiency of data warehouse platforms.

Experienced in working with leading ETL tools, including Informatica, to facilitate complex data integration and transformation across diverse sources.

Proven ability in utilizing orchestration tools such as Apache Airflow with Python to manage and schedule intricate data pipelines and workflows.

Committed to fostering automation and continual process improvement within data engineering practices to maximize operational efficiency and reliability.

Strong understanding and practical application of Agile methodology, contributing effectively to sprint planning and collaborative team environments.

Excellent written and oral communication skills, with a proven ability to convey complex technical information to both technical and non-technical stakeholders.

EDUCATION:

Master of Science in Software Engineering @ University of Maryland, Baltimore County

TECHNICAL SKILLS:

Operating Systems: Linux, Unix

Programming Languages: Python, Shell Scripting, SQL, Perl

Databases & Data Warehousing: Oracle Exadata, Oracle, MySQL, PostgreSQL, Hive, Data Warehousing

ETL & Orchestration: Informatica PowerCenter, Apache Airflow, AWS Glue

Big Data Technologies: Apache Spark, Hadoop, Databricks, Snowflake

Cloud Platforms: AWS (S3, EMR, Lambda, Athena)

Version Control & DevOps: GitHub, Jenkins, Docker

Methodologies: Agile, Scrum

WORK EXPERIENCE:

Senior Data Engineer @ Abacus Insights, Boston, MA | May 2025 – Present

Implemented and managed robust Linux-based processes and infrastructure specifically designed for scalable data warehousing solutions.

Developed and enhanced advanced Shell Scripts for automating critical data load/extract processes, ensuring efficient data flow within the data warehouse.

Administered and optimized Oracle Exadata environments, focusing on performance tuning and data management for high-volume data warehousing needs.

Engineered complex ETL workflows using Informatica to integrate diverse healthcare datasets from various sources into the centralized data warehouse.

Designed and implemented system and architecture improvements to enhance the efficiency and reliability of data warehouse operations on Linux platforms.

Utilized Python extensively for developing data processing scripts, enhancing existing toolsets, and integrating with orchestration systems.

Managed and configured Unix file systems, including mount types and permissions, to maintain secure and organized data storage within the environment.

Orchestrated intricate data pipelines using Apache Airflow with Python, automating scheduling, monitoring, and error handling for critical ETL jobs (see the DAG sketch after this role's technology list).

Collaborated with cross-functional teams to identify and implement automation opportunities, significantly reducing manual intervention in data processes.

Ensured high data quality and consistency by implementing robust validation and reconciliation frameworks for all ingested and transformed data.

Provided technical leadership in troubleshooting and resolving complex data warehousing issues, maintaining high availability of data services.

Developed comprehensive technical documentation in Confluence covering data warehouse architecture, ETL processes, and Linux environment configurations.

Technologies Used: Linux, Shell Scripting, Oracle Exadata, Informatica PowerCenter, Python, Apache Airflow, SQL, Unix File Systems, GitHub, Jenkins
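The Airflow orchestration described above could look roughly like the following minimal sketch. It assumes Airflow 2.4+; the DAG name, schedule, script paths, and validation rule are hypothetical illustrations, not taken from the resume.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def validate_extract(**context):
    """Fail the run if the extract step reported zero rows (hypothetical check)."""
    row_count = context["ti"].xcom_pull(task_ids="extract_members")
    if not row_count or int(row_count) == 0:
        raise ValueError("extract produced no rows; aborting downstream load")


with DAG(
    dag_id="warehouse_member_load",                  # hypothetical pipeline name
    start_date=datetime(2025, 5, 1),
    schedule="0 2 * * *",                            # nightly run at 02:00
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    # Shell-driven extract, mirroring the Linux/shell tooling described above;
    # the script path is illustrative. BashOperator pushes the last line of
    # stdout (a row count here) to XCom for the validation task.
    extract_members = BashOperator(
        task_id="extract_members",
        bash_command="bash /opt/etl/bin/extract_members.sh",
    )

    validate = PythonOperator(
        task_id="validate_extract",
        python_callable=validate_extract,
    )

    load_warehouse = BashOperator(
        task_id="load_warehouse",
        bash_command="bash /opt/etl/bin/load_members.sh",
    )

    extract_members >> validate >> load_warehouse

The chain extract >> validate >> load keeps a failed or empty extract from ever reaching the warehouse load, which is the kind of error handling the bullet refers to.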

Data Engineer @ Capital One, McLean, VA | Apr 2021 – Dec 2022

Designed and implemented scalable ETL workflows within a Linux environment to process large financial datasets for risk analytics and reporting.

Developed and maintained robust Shell Scripts for automating data extraction, transformation, and loading processes into the data warehouse.

Managed and optimized Oracle databases, focusing on query performance tuning and data integrity for enterprise-level financial data.

Implemented data ingestion pipelines to load transactional data from various sources into a centralized Oracle-based data warehouse.

Utilized Python for developing data processing modules, automating routine tasks, and integrating with external data services.

Configured and managed Unix file systems, ensuring proper permissions, data security, and efficient storage allocation for data warehousing components.

Enhanced existing Linux-based toolsets and jobs, improving overall data pipeline efficiency and reducing processing times.

Scheduled and monitored complex batch processing jobs using Apache Airflow with Python, ensuring timely and accurate data delivery.

Collaborated with data analysts and business intelligence teams to understand data requirements and deliver optimized datasets.

Implemented data validation and reconciliation frameworks to ensure high data quality and consistency across all financial data assets (see the reconciliation sketch after this role's technology list).

Contributed to the development of reusable Python libraries for common data transformation and aggregation tasks within the data warehouse.

Maintained version control of all scripts and code using GitHub and automated deployments through Jenkins CI/CD pipelines.

Technologies Used: Linux, Shell Scripting, Oracle, Python, Apache Airflow, SQL, Unix File Systems, GitHub, Jenkins, PostgreSQL, MySQL
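A reconciliation check of the kind referenced above might look like this minimal Python sketch. The table name, row counts, and tolerance are illustrative assumptions; in practice the counts would come from queries against the source system and the warehouse rather than literal values.

from dataclasses import dataclass


@dataclass
class ReconciliationResult:
    table: str
    source_rows: int
    target_rows: int

    @property
    def delta(self) -> int:
        return self.target_rows - self.source_rows

    def within_tolerance(self, tolerance: float = 0.0) -> bool:
        """True when the loaded row count matches the source within a relative tolerance."""
        if self.source_rows == 0:
            return self.target_rows == 0
        return abs(self.delta) / self.source_rows <= tolerance


def reconcile(table: str, source_rows: int, target_rows: int,
              tolerance: float = 0.0) -> ReconciliationResult:
    """Raise if the target count drifts outside tolerance; otherwise return the result."""
    result = ReconciliationResult(table, source_rows, target_rows)
    if not result.within_tolerance(tolerance):
        # In a real pipeline this would alert or fail the load job.
        raise ValueError(
            f"{table}: target has {target_rows} rows vs {source_rows} in source "
            f"(delta {result.delta}), outside tolerance {tolerance:.1%}"
        )
    return result


if __name__ == "__main__":
    # Example run with made-up counts.
    print(reconcile("transactions", source_rows=1_000_000, target_rows=1_000_000))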

Junior Data Engineer @ Fred Meyer, Portland, OR | Nov 2019 – Mar 2021

Designed and developed comprehensive ETL workflows using Informatica PowerCenter to integrate retail transactional data from diverse sources.

Extracted and transformed data from relational databases, including Oracle and MySQL, loading it into a centralized data warehouse.

Implemented complex data transformations, mappings, and data cleansing rules within Informatica to ensure high data quality and consistency.

Developed and optimized advanced SQL queries for data validation, performance tuning, and efficient data retrieval in Oracle and MySQL.

Utilized Shell Scripting to automate routine tasks, including file transfers, data pre-processing, and ETL job execution on Linux servers.

Built incremental data loading strategies to efficiently process large retail datasets, minimizing load times and system resource utilization (see the watermark sketch after this role's technology list).

Collaborated with business analysts to gather detailed data requirements and translate them into effective ETL solutions and data models.

Maintained and monitored ETL workflow schedules, ensuring reliable data delivery for business intelligence and reporting purposes.

Participated in the design of data warehouse schemas and data models to support analytics and reporting needs.

Assisted in troubleshooting and resolving data integration issues, ensuring continuous availability of data for business users.

Managed source code for ETL mappings and scripts using GitHub, ensuring version control and collaborative development practices.

Contributed to the automation of deployment processes for ETL workflows using Jenkins, streamlining releases and reducing manual effort.

Technologies Used: Linux, Shell Scripting, Informatica PowerCenter, Oracle, MySQL, SQL, GitHub, Jenkins
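One common way to implement the incremental loading mentioned above is a high-watermark pattern, sketched below. sqlite3 stands in for the Oracle/MySQL sources named in the resume so the example runs without external drivers; the table, columns, and state file are illustrative assumptions.

import json
import sqlite3
from pathlib import Path

STATE_FILE = Path("last_watermark.json")       # hypothetical watermark store


def read_watermark() -> str:
    """Return the timestamp of the last successfully loaded row."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_loaded_at"]
    return "1970-01-01 00:00:00"                # first run loads everything


def write_watermark(value: str) -> None:
    STATE_FILE.write_text(json.dumps({"last_loaded_at": value}))


def incremental_extract(conn: sqlite3.Connection) -> list:
    """Pull only rows changed since the last successful load and advance the watermark."""
    watermark = read_watermark()
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if rows:
        write_watermark(rows[-1][2])            # newest updated_at just loaded
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, updated_at TEXT)")
    conn.execute("INSERT INTO sales VALUES (1, 19.99, '2021-01-05 10:00:00')")
    print(incremental_extract(conn))            # first run returns the new row
    print(incremental_extract(conn))            # second run finds nothing newer

Because only rows newer than the stored watermark are read, each run touches a small slice of the source table, which is what keeps load times and resource use down on large datasets.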


