Senior Data Engineer - ETL, Lake/Warehouse Expert

Location:
Hyderabad, Telangana, India
Salary:
110000
Posted:
April 30, 2026

Resume:

Harshitha Sunkara — Senior Data Engineer

414-***-**** ***********@*****.***

PROFESSIONAL SUMMARY:

Senior Data Engineer with over 5 years of experience designing, implementing, and managing Linux-based data warehousing infrastructure and processes. Expert in developing and optimizing complex ETL/database load/extract processes, with strong proficiency in Shell Scripting and Oracle development, including Oracle Exadata. Proven ability to identify and implement robust system and architecture improvements that enhance data flow efficiency and overall data warehouse performance. Highly skilled in Python for data manipulation, automation, and workflow orchestration with Apache Airflow. Practical experience with various relational databases, advanced knowledge of Unix file systems, and expertise in ETL tools such as Informatica. Committed to automation, continuous process improvement, and delivering scalable data solutions within an Agile framework to drive critical business intelligence and analytics initiatives.

EDUCATION:

Master of Information Technology Management @ University of Wisconsin

WORK EXPERIENCE:

Senior Data Engineer @ Blue Cross Blue Shield (BCBS) — Chicago, Illinois | Jun 2024 – Present

Implemented and managed robust Linux-based processes and infrastructure crucial for enterprise data warehousing operations on AWS.

Designed and deployed highly efficient ETL/database load and extract processes using Informatica and Oracle Exadata, handling complex healthcare data.

Developed advanced Shell Scripts to automate critical data ingestion, transformation, and validation tasks, reducing manual effort by 35%.

Identified and implemented significant system and architecture improvements within the data warehouse, enhancing data flow performance by 40%.

Utilized Python extensively to develop custom data pipelines and orchestrate complex workflows via Apache Airflow on AWS infrastructure.
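
Illustrative only (not code from the BCBS project): a minimal Airflow 2.x DAG sketch of the orchestration pattern described above. The DAG id, task names, and callables are hypothetical placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # pull raw files from the landing zone (placeholder logic)
        pass

    def transform():
        # apply validation and conforming transformations (placeholder logic)
        pass

    with DAG(
        dag_id="claims_pipeline",          # hypothetical name
        start_date=datetime(2024, 6, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task     # run transform only after extract succeeds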

Managed Oracle Exadata environments, ensuring optimal performance and availability for large-scale analytical queries and reporting.

Collaborated with cross-functional teams using Agile methodology to deliver scalable data solutions, supporting key BI and ML initiatives.

Ensured strict data governance, quality, and compliance (HIPAA) by implementing robust validation and masking policies.
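
As a sketch of one common masking approach (an assumption, not the actual BCBS policy code): deterministic salted hashing hides the raw identifier while keeping masked tables joinable.

    import hashlib

    def mask_member_id(member_id: str, salt: str) -> str:
        # Deterministic pseudonymization: the same input and salt always yield
        # the same token, so masked tables can still be joined on member_id.
        return hashlib.sha256((salt + member_id).encode("utf-8")).hexdigest()[:16]

In practice the salt would live in a secrets manager, and any reversible need would use tokenization or encryption instead of hashing.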

Optimized PySpark jobs on AWS EMR for processing millions of claims and enrollment records, significantly reducing runtime and cost.
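
A minimal PySpark sketch of the kind of claims rollup described above; the S3 paths, column names, and aggregation are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims_rollup").getOrCreate()

    claims = spark.read.parquet("s3://example-bucket/claims/")   # hypothetical path
    monthly = (
        claims
        .withColumn("month", F.date_trunc("month", F.col("service_date")))
        .groupBy("member_id", "month")
        .agg(F.sum("paid_amount").alias("paid_total"),
             F.count("*").alias("claim_count"))
    )
    # Partitioning the output by month keeps downstream scans narrow.
    monthly.write.mode("overwrite").partitionBy("month").parquet(
        "s3://example-bucket/claims_monthly/"
    )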

Implemented real-time data streaming solutions using Kafka on AWS, providing immediate insights for fraud detection and alerts.
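
A hedged sketch of the consumer side of such a stream, using the kafka-python client; the topic, broker, and threshold rule are hypothetical stand-ins for the real fraud logic.

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "claim-events",                                   # hypothetical topic
        bootstrap_servers=["broker:9092"],
        group_id="fraud-alerts",
        auto_offset_reset="latest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    for message in consumer:
        claim = message.value
        # Toy rule for illustration; real fraud detection would score features.
        if claim.get("paid_amount", 0) > 50_000:
            print("possible anomaly:", claim.get("claim_id"))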

Enhanced various Linux-based toolsets, scripts, and jobs to streamline operational procedures and improve system reliability.

Provided leadership in designing data models for Snowflake on AWS, incorporating partitioning and clustering strategies to reduce costs by 20%.
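
Snowflake micro-partitions tables automatically; the tunable lever is the clustering key, which guides partition pruning. A minimal sketch with the Snowflake Python connector (table and key names hypothetical):

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="...", user="...", password="...",        # credentials elided
        warehouse="ANALYTICS_WH", database="CLAIMS_DB", schema="PUBLIC",
    )
    # A clustering key on common filter columns improves micro-partition pruning.
    conn.cursor().execute(
        "ALTER TABLE claims_fact CLUSTER BY (service_month, plan_id)"
    )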

Technologies Used: Linux, Oracle Exadata, Informatica, Shell Scripting, Python, Apache Airflow, AWS (S3, EMR, Glue), Snowflake, PySpark, Kafka, Jenkins, Docker

Data Engineer @ Humana — Louisville, Kentucky | Jul 2021 – Aug 2023

Managed Linux-based data warehousing environments, configuring and maintaining the underlying infrastructure for data processing.

Developed and enhanced ETL/database load and extract processes using Informatica for critical EMR, claims, and provider data.

Authored sophisticated Shell Scripts to automate routine data management tasks, ensuring data integrity and timely delivery.

Applied practical knowledge of Unix file systems, including mount types and permissions, to secure and manage data effectively.

Utilized Python to develop and optimize data pipelines within an Azure environment, focusing on scalability and performance.

Administered Oracle databases, performing tuning and optimization to support high-volume data operations and complex queries.

Implemented system and architecture improvements to enhance the efficiency of data flows from various source systems into Azure Synapse.

Orchestrated end-to-end data workflows using Apache Airflow with Python, reducing manual intervention by 40% for ETL jobs.

Collaborated with analytics teams using Agile principles to deliver reliable datasets for patient risk prediction and population health dashboards.

Ensured HIPAA compliance through rigorous data encryption and masking techniques for all sensitive patient information.

Migrated on-prem data lake into Azure Data Lake and Synapse, leveraging PySpark transformations to improve job performance by 25%.

Developed validation scripts to improve data accuracy for regulatory submissions, significantly reducing error rates.

Technologies Used: Linux, Oracle, Informatica, Shell Scripting, Python, Apache Airflow, Azure Data Factory, Azure Synapse, PySpark, Snowflake, Kafka

Junior Data Engineer @ RELEX Solutions — Atlanta, Georgia | Feb 2020 – Jun 2021

Supported the implementation and management of Linux-based data processes for supply chain data warehousing initiatives.

Assisted in developing and enhancing ETL/database load and extract processes for sales and inventory data using Informatica.

Created and maintained Shell Scripts to automate data ingestion and preliminary data quality checks from global retailers.

Gained practical knowledge of Unix file systems and standard tools, contributing to effective data storage and retrieval strategies.

Developed Python scripts for data manipulation and transformation, supporting demand forecasting and inventory optimization models.
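
As a small illustration of the kind of transformation involved (file and column names hypothetical), a rolling-demand feature in pandas:

    import pandas as pd

    sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])   # hypothetical input
    features = (
        sales.sort_values(["sku", "date"])
             .assign(
                 # 7-day rolling mean of units sold, computed per SKU.
                 demand_7d=lambda df: df.groupby("sku")["units_sold"]
                                        .transform(lambda s: s.rolling(7, min_periods=1).mean())
             )
    )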

Worked with Oracle databases to manage and query large datasets, ensuring data availability for analytical reports.

Contributed to system improvements by optimizing existing data flows and assisting in the migration to GCP BigQuery.

Automated data ingestion workflows using Apache Airflow with Python, improving data freshness and reliability for real-time applications.

Participated in an Agile team, delivering data solutions that enabled scalable analytics for supply chain planners.

Implemented partitioning and clustering strategies in GCP BigQuery to significantly improve query performance for large datasets.
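
A minimal sketch of creating a partitioned and clustered BigQuery table with the google-cloud-bigquery client; the project, dataset, and schema are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client()
    table = bigquery.Table(
        "my-project.retail.sales_daily",                 # hypothetical table id
        schema=[
            bigquery.SchemaField("sale_date", "DATE"),
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("units_sold", "INTEGER"),
        ],
    )
    # Date partitioning plus clustering on sku prunes both partitions and blocks.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="sale_date"
    )
    table.clustering_fields = ["sku"]
    client.create_table(table)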

Built Tableau dashboards visualizing SKU-level demand, replenishment, and stockouts, providing actionable insights to stakeholders.

Developed quality validation scripts, ensuring high data accuracy across integrated feeds from multiple ERP systems.

Technologies Used: Linux, Oracle, Informatica, Shell Scripting, Python, Apache Airflow, GCP BigQuery, PySpark, Kafka, SQL, Tableau

TECHNICAL SKILLS:

Programming & Scripting: Python, Shell Scripting, SQL, Scala, Java, Perl

Data Warehousing & ETL: Oracle Exadata, Informatica, Snowflake, Apache Spark, PySpark, Databricks, Hive, Hadoop, Amazon Redshift, Azure Synapse, Teradata

Orchestration & Workflow: Apache Airflow, AWS Glue, Azure Data Factory, Oozie

Cloud Platforms: AWS (S3, Glue, Redshift, EMR), Azure (Data Factory, Synapse, Databricks), Google Cloud Platform (BigQuery, Dataflow)

Databases: Oracle, PostgreSQL, MySQL, SQL Server, MongoDB, Cassandra

Streaming: Apache Kafka, AWS Kinesis, Spark Streaming

DevOps & CI/CD: Git, Jenkins, Docker, Kubernetes, Terraform

BI & Analytics Tools: Power BI, Tableau, Looker

Concepts & Methodologies: Unix File Systems, Data Modeling, Data Governance, Agile Methodology, Data Quality, MDM, Security & Compliance (HIPAA, SOC2)


