
Senior Data Engineer - Linux Data Warehousing Expert

Location:
Ridgeview, TX, 75025
Posted:
April 30, 2026


Resume:

Madhumitha Kadari — Senior Data Engineer

716-***-**** ********************@*****.***

PROFESSIONAL SUMMARY:

Seasoned Data Warehouse Engineer with 5 years of experience implementing and managing robust Linux-based data warehousing processes and infrastructure.

Expertise in enhancing ETL, database load, and extract processes for optimal data flow and system performance using advanced shell scripting capabilities.

Proficient in Shell Scripting, Python, and SQL for developing sophisticated data solutions and automating critical data operations.

Strong practical experience in designing and developing solutions for relational databases, specifically Oracle, including Oracle Exadata environments.

Skilled in identifying and implementing system/architecture improvements, ensuring scalability and efficiency of complex data platforms and systems.

Adept at utilizing ETL tools like Informatica and orchestration platforms such as Apache Airflow with Python for complex workflow management.

Comprehensive understanding of Unix file systems, encompassing mount types, permissions, standard tools, and pipe operations for robust data handling.

Dedicated to automation and continual process improvement, delivering high-quality data solutions within Agile development methodologies.

Proven ability to manage metadata, ensure data quality, and build secure, scalable data pipelines in diverse on-premise and cloud environments.

EDUCATION:

Master of Science in Data Sciences and Applications @ University at Buffalo

TECHNICAL SKILLS:

Operating Systems & Scripting: Linux, Shell Scripting, Unix, Bash

Databases & Data Warehousing: Oracle, Oracle Exadata, PostgreSQL, Hive, Redshift, Data Modeling

Programming Languages: Python, SQL, PL/SQL

ETL & Orchestration: Informatica PowerCenter, AWS Glue, Apache Airflow, Data Pipeline

Cloud Platforms: AWS (S3, EMR, Glue, Lambda, Redshift, Athena)

Big Data Technologies: Hadoop, Spark, PySpark, Kafka

Version Control & CI/CD: Git, Jenkins, Docker

Tools & Methodologies: JIRA, Confluence, Tableau, Agile, Scrum

WORK EXPERIENCE:

Senior Data Engineer @ Aetna Hartford, CT Sep 2025 – Present

Architected and managed Linux-based data warehousing infrastructure, optimizing processes for healthcare data processing and storage efficiency.

Developed intricate Shell Scripts to automate critical ETL workflows, enhancing database load and extract processes within Oracle Exadata environments.

Implemented robust data pipelines using AWS Glue and PySpark, ensuring secure and scalable ingestion from diverse healthcare data sources into Amazon S3.

Designed and deployed system/architecture improvements to existing data platforms, boosting query performance and data availability by 25% for analytics.

Managed and configured Oracle Exadata databases, executing complex SQL and PL/SQL scripts for data manipulation and performance tuning of data warehouses.

Utilized Apache Airflow with Python to orchestrate and schedule hundreds of complex data pipelines, ensuring timely delivery of critical business intelligence.

Implemented comprehensive data validation and quality checks using custom Python frameworks within Linux environments, maintaining data integrity across systems.
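
A minimal sketch of what such a custom Python validation framework can look like; the check names and claim fields below are hypothetical illustrations, not actual Aetna rules:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    """One named data-quality rule, applied row by row."""
    name: str
    predicate: Callable[[dict], bool]

def run_checks(rows, checks):
    """Return {check_name: [indices of rows that failed the check]}."""
    failures = {c.name: [] for c in checks}
    for i, row in enumerate(rows):
        for c in checks:
            if not c.predicate(row):
                failures[c.name].append(i)
    return failures

# Hypothetical claims feed and rules, for illustration only.
rows = [
    {"member_id": "A1", "claim_amount": 120.0},
    {"member_id": "", "claim_amount": -5.0},
]
checks = [
    Check("member_id_present", lambda r: bool(r["member_id"])),
    Check("claim_amount_non_negative", lambda r: r["claim_amount"] >= 0),
]
print(run_checks(rows, checks))
# {'member_id_present': [1], 'claim_amount_non_negative': [1]}
```

Returning failed row indices per rule, rather than a single pass/fail flag, lets a pipeline quarantine bad records while loading the rest.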

Optimized data retrieval and analytical queries on Amazon Redshift and Athena, providing faster insights for business stakeholders and reporting dashboards.

Administered Unix file systems, managing permissions, storage, and leveraging standard tools and pipes for efficient data handling and processing.
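
As an illustrative sketch of the permission management described above (file name and mode are examples, not a specific production policy), Python's stdlib can apply and verify Unix modes directly:

```python
import os
import stat
import tempfile

# Example: restrict a data extract to owner read/write plus group read (0o640),
# a common hygiene step for files shared with a reporting group.
fd, path = tempfile.mkstemp(suffix=".dat")
os.close(fd)
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o640 on POSIX systems
os.remove(path)
```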

Collaborated with cross-functional teams to define data requirements, design optimal data models, and integrate new data sources into the enterprise data warehouse.

Applied CI/CD practices using Jenkins and Git for automated deployment of data solutions, ensuring rapid and reliable delivery of enhancements.

Developed insightful dashboards using Tableau for real-time monitoring and reporting on healthcare data, supporting strategic decision-making and operational efficiency.

Technologies Used: Linux, Shell Scripting, Oracle Exadata, Informatica PowerCenter, Apache Airflow, Python, SQL, AWS (S3, Glue, Redshift, Athena), PySpark, Git, Jenkins, JIRA

Data Engineer @ JPMorgan Chase New York, NY Oct 2022 – Jul 2024

Engineered and optimized Linux-based data solutions using Hadoop and Spark to efficiently process vast volumes of financial transaction data.

Developed advanced Shell Scripts for automating data ingestion, transformation, and database load processes into Oracle Exadata financial data warehouses.

Designed and implemented ETL processes using Informatica PowerCenter to integrate diverse data sources into the enterprise data warehouse for reporting.

Managed the setup and configuration of Linux environments for data processing clusters, ensuring secure and performant operation of data pipelines.

Built sophisticated data ingestion pipelines to load structured and semi-structured data into HDFS, enhancing data availability for analytical purposes.

Optimized Hive tables and executed complex SQL queries for financial reporting, significantly improving data retrieval speeds and analytical accuracy.

Processed real-time streaming data using Kafka and integrated it with Spark Streaming, enabling near real-time analytics for market trends.

Administered Unix file systems, utilizing tools and commands for managing data storage, permissions, and ensuring data integrity across various platforms.

Migrated on-premise financial data to AWS S3, applying optimal storage formats like ORC to improve query performance and reduce storage costs.

Collaborated with stakeholders to gather requirements and translate them into robust data warehousing solutions, adhering to financial industry compliance standards.

Implemented data masking and encryption techniques, ensuring strict data security and compliance with regulatory requirements within the financial sector.

Automated CI/CD pipelines using Jenkins for efficient deployment of data engineering artifacts, accelerating release cycles and maintaining code quality.

Technologies Used: Linux, Shell Scripting, Oracle Exadata, Informatica PowerCenter, Apache Hadoop, Spark (Scala), Hive, Kafka, AWS S3, Python, SQL, Unix, Git, Jenkins, Agile

Data Engineer @ Meijer Grand Rapids, MI Jul 2020 – Sep 2022

Developed and maintained ETL pipelines using Informatica PowerCenter for retail data integration, ensuring consistent and accurate data flow into data warehouses.

Designed and implemented data models in Oracle databases, creating efficient schemas and transformations to support business intelligence requirements.

Wrote and optimized complex SQL and PL/SQL scripts for data extraction, cleansing, and transformation, ensuring high data quality for reporting.
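
A small sketch of that cleanse-and-transform pattern. SQLite stands in here for the Oracle environment, and the table and column names are illustrative, not from an actual retail schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_raw (store TEXT, amount TEXT)")
con.executemany(
    "INSERT INTO sales_raw VALUES (?, ?)",
    [(" GR-01 ", "10.50"), ("gr-01", "2.25"), (None, "bad")],
)
# Cleansing and transformation in one pass: trim/upper-case the key,
# cast text amounts to numbers, and filter out unusable rows.
rows_out = con.execute("""
    SELECT UPPER(TRIM(store)) AS store,
           SUM(CAST(amount AS REAL)) AS total
    FROM sales_raw
    WHERE store IS NOT NULL
      AND amount GLOB '[0-9]*'
    GROUP BY UPPER(TRIM(store))
""").fetchall()
print(rows_out)  # [('GR-01', 12.75)]
```

Normalizing the grouping key inside the query keeps " GR-01 " and "gr-01" from landing in the warehouse as two different stores.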

Extracted data from various sources including flat files and relational databases, facilitating a centralized view of retail operational data.

Developed batch processing workflows and scheduled jobs, ensuring timely updates to the data warehouse for critical business operations.

Assisted in the initial adoption and setup of the Hadoop ecosystem for large-scale retail data processing, exploring distributed computing solutions.

Created and managed Hive tables, performing querying for basic reporting and contributing to the development of the data warehouse architecture.

Utilized Shell Scripting for automating routine tasks such as file transfers and data preparation, improving operational efficiency within Linux environments.
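
The same file-transfer-and-prep routine can be sketched in Python's stdlib (directory layout and file names below are invented for the example):

```python
import shutil
import tempfile
from pathlib import Path

def stage_files(landing: Path, staged: Path, archive: Path):
    """Copy landing-zone CSVs into a staging area with lower-cased names,
    then move the originals to an archive; returns the staged file names."""
    staged.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)
    staged_names = []
    for f in sorted(landing.glob("*.csv")):
        shutil.copy2(f, staged / f.name.lower())
        shutil.move(str(f), str(archive / f.name))
        staged_names.append(f.name.lower())
    return staged_names

# Demo against a throwaway directory tree.
root = Path(tempfile.mkdtemp())
landing = root / "landing"
landing.mkdir()
(landing / "STORE_SALES_20220901.csv").write_text("store,amount\nGR-01,10.5\n")
(landing / "notes.txt").write_text("ignored: not a csv\n")

result = stage_files(landing, root / "staged", root / "archive")
print(result)  # ['store_sales_20220901.csv']
```

Archiving the original after a successful copy gives the job an audit trail and makes reruns idempotent, since processed files no longer match the landing-zone glob.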

Ensured data consistency and quality across disparate systems, actively monitoring data flows and resolving discrepancies in the retail data warehouse.

Participated in requirement gathering and system design discussions, contributing to the development of robust and scalable data solutions.

Managed version control of all data engineering code using Git, collaborating effectively with development teams on project deliverables.

Generated insightful reports and dashboards using Tableau for various business units, providing key metrics and trends for strategic decision-making.

Technologies Used: Linux, Shell Scripting, Oracle, Informatica PowerCenter, SQL, Hadoop, Hive, Tableau, Git, Jenkins, Agile


