SUPRAJA YADAV GUNDU
Sr Data Engineer
*****************@*****.*** 314-***-****
PROFESSIONAL SUMMARY
Data Warehouse Engineer with 6+ years of experience designing and maintaining ETL/database load and extract processes, Linux-based infrastructure, and shell scripting automation across healthcare, banking, and logistics domains. Strong working knowledge of Python, shell scripting, Oracle, and Airflow orchestration, with proven success enhancing data warehousing toolsets, scripts, and processes. Skilled in Unix file systems, relational databases, and Agile methodology, with a passion for automation and continual process improvement.
TECHNICAL SKILLS
Languages: Python, Shell Scripting, Bash, Perl, SQL, Java, R
Databases: Oracle, Oracle Exadata, SQL Server, BigQuery, Redshift, DynamoDB, MongoDB, Snowflake
Data Warehouse: Data Warehousing, ETL, Database Load/Extract, Star Schema, 3NF, Medallion, dbt, Delta Lake, Apache Iceberg
ETL Tools: Informatica, AWS Glue, SSIS, Talend, Apache Spark, PySpark, Hive, Kafka
Orchestration: Apache Airflow, Cloud Composer, AWS Step Functions, Cloud Scheduler
OS & Systems: Linux, Unix file systems (mount types, permissions, pipes, standard tools), Windows Servers
Cloud: GCP (BigQuery, Dataflow, Pub/Sub, Dataproc), AWS (Lambda, S3, EC2, Redshift, EMR, CloudWatch), Azure (ADF, Synapse, Fabric)
DevOps: Terraform, GitHub Actions, Jenkins, CloudFormation, Docker, Kubernetes, Cloud Build, Bitbucket, Git
Methodologies: Agile, Scrum, CI/CD, Automation, Continual Process Improvement, Code Reviews, RCA
BI & Viz: Power BI, Tableau, Looker, LookML, Data Studio, Qlik
Other Tools: JIRA, Confluence, ServiceNow, Postman, REST APIs, JSON, YAML, Avro, Parquet, CSV
PROFESSIONAL EXPERIENCE
Sr Data Engineer Banner Health Jul 2024 – Present
• Engineered Linux-based ETL pipelines and database load/extract processes on Kubernetes clusters, enabling scalable ingestion and transformation of clinical and insurance datasets in compliance with HIPAA.
• Automated infrastructure provisioning and continual process improvement using Terraform with YAML templates, standardizing IaaS and PaaS deployments across GCP and Azure healthcare environments.
• Enhanced shell scripting toolsets, jobs, and processes for managing containerized PySpark workloads, ensuring fault-tolerant operation across Linux-based Kubernetes clusters.
• Architected reusable BigQuery datasets and applied data warehouse design patterns (medallion, 3NF, star schema) to optimize healthcare analytics performance and data flows.
• Migrated legacy Hadoop workloads to Databricks and BigQuery, identifying system and architecture improvements that reduced pipeline runtime and operational cost.
• Implemented Delta Lake with Unity Catalog and Apache Iceberg for schema enforcement, versioning, and centralized governance across the healthcare data lakehouse.
• Collaborated with data scientists and BI teams in Agile sprints, delivering curated datasets, parameterized SQL, and reusable PySpark modules that powered Power BI executive dashboards.
• Communicated architecture decisions and trade-offs through written documentation and stakeholder reviews, enabling cross-functional alignment on the healthcare data platform roadmap.
Sr Data Engineer Regions Bank Dec 2020 – Aug 2023
• Designed and implemented ETL/database load and extract pipelines using Python and Apache Spark on Amazon EMR to process large-scale banking transaction data efficiently.
• Ingested data from Oracle and SQL Server relational databases into AWS S3 and Amazon Redshift, enabling faster fraud detection, financial reporting, and regulatory analytics.
• Applied Change Data Capture (CDC) techniques against Oracle and SQL Server to track incremental updates, enabling near real-time refreshes in banking data warehouse pipelines.
• Redesigned legacy ETL jobs to reduce runtime by 40% and automated manual file ingestion using AWS Lambda, Step Functions, and shell scripts on Linux environments.
• Built CI/CD pipelines using Terraform, GitHub Actions, and AWS CloudFormation to automate infrastructure provisioning and continual process improvement across banking data platforms.
• Developed modular SQL models using dbt on Redshift, enabling version-controlled transformations and improving reproducibility across data warehouse analytics pipelines.
• Operated within Agile delivery cycles alongside data architects and software engineers, defining schemas, enforcing governance, and communicating progress in sprint reviews and stand-ups.
• Streamlined automation by writing Python utilities for metadata extraction, data quality validation, and pipeline alerting, reducing failures by 15% and improving operational reliability.
Data Engineer FedEx Express Jan 2018 – Dec 2020
• Built ETL/database load and extract pipelines using Google Dataflow, BigQuery, and shell scripting to process real-time shipment tracking data from multiple logistics sources.
• Automated ETL job orchestration and monitoring using Cloud Composer (Airflow with Python) to improve reliability of logistics data flows and reduce manual intervention.
• Managed Linux/Unix servers for Spark Streaming and BigQuery ingestion, working with Unix file systems, mount types, permissions, pipes, and standard tools to ensure 24/7 reliability of shipment tracking systems.
• Performed system-level pipeline monitoring, failure debugging, and infrastructure automation using Python and shell scripts, identifying architecture improvements that strengthened logistics platforms.
• Conducted Root Cause Analysis (RCA) on pipeline and ETL failures, implementing long-term toolset and process fixes in collaboration with engineering teams to prevent recurrence.
• Conducted peer code reviews and guided junior engineers on Linux scripting, data modeling, and pipeline optimization best practices, fostering written and oral communication across teams.
• Provisioned BigQuery datasets, Cloud Storage, and service accounts using Terraform and YAML, applying Infrastructure-as-Code patterns for geographically redundant logistics analytics services.
• Delivered work within Agile sprints, completed ServiceNow tickets for cloud workloads, and willingly took on problems outside existing skill sets to support enterprise logistics data initiatives.
EDUCATION
Saint Louis University – Master's in Information Systems Aug 2023 – May 2025