UDAY KIRAN BUMA
Senior Data Warehouse Engineer • Linux Systems • ETL Architecture • Python • Oracle
*****************@*****.*** 786-***-****
PROFILE
Data Warehouse Engineer with 5+ years of experience designing and maintaining large-scale data infrastructure across financial systems, specializing in Linux-based environments, shell-scripting automation, and high-performance ETL pipelines. Proven ability to modernize legacy architectures, optimize batch and streaming data workflows, and improve system reliability through automation and process engineering. Strong expertise in Oracle databases, Python-based data processing, and distributed data ecosystems, with a consistent track record of improving SLA adherence, reducing operational overhead, and enhancing data quality at scale.
CORE TECHNICAL SKILLS
Languages: Python, SQL, Shell (Bash), Perl (working knowledge), Java
Data Warehousing: Oracle, Amazon Redshift, Snowflake, Azure Synapse
ETL & Orchestration: Apache Airflow, AWS Glue, Azure Data Factory, dbt
Linux & Systems: Unix/Linux, File Systems, Shell Scripting, Process Automation
Big Data: Apache Spark, PySpark, Hadoop, Hive, Kafka
Cloud: AWS (S3, EMR, Lambda), Azure (Data Lake, Databricks)
Databases: Oracle, PostgreSQL, MySQL, SQL Server
Monitoring: CloudWatch, Grafana
DevOps: Docker, Kubernetes, CI/CD, Git
PROFESSIONAL EXPERIENCE
Data Warehouse Engineer — Prudential Financial
Aug 2024 – Present
Designed and implemented a Linux-based data warehouse architecture using Azure Data Lake and distributed processing frameworks, consolidating over 10 enterprise data sources into a unified platform and reducing data fragmentation across actuarial systems by over 70% while improving accessibility for downstream analytics teams.
Re-engineered legacy ETL workflows by replacing SQL Server-based pipelines with Python- and shell-driven orchestration, enabling better control over job scheduling, reducing manual interventions by 60%, and improving system reliability through standardized scripting practices across environments.
Built and maintained robust shell scripting frameworks to automate daily batch processes, including file ingestion, validation, transformation, and archival, improving execution efficiency by 40% and eliminating recurring operational delays caused by manual job handling.
Developed and optimized ETL pipelines handling 100GB+ daily data loads, ensuring consistent data movement across staging and warehouse layers while improving load performance by 35% through indexing strategies and partition tuning.
Managed Linux-based infrastructure for data processing, including cron scheduling, process monitoring, and job recovery mechanisms, resulting in a 99.9% pipeline success rate and significantly reduced system downtime.
Architected and enhanced Python-based data transformation workflows, integrating them with shell scripts for seamless execution, enabling modular and reusable pipeline design that accelerated new data onboarding by 50%.
Designed Airflow DAGs to orchestrate complex ETL workflows with dependency management and failure handling, improving job scheduling efficiency and reducing SLA breaches from 6% to less than 1%.
Implemented data validation and reconciliation frameworks using Python and SQL, ensuring data integrity across multiple layers and reducing data discrepancies from 5% to below 0.5% in reporting systems.
Automated database load and extract processes using shell scripts integrated with Oracle utilities, significantly improving throughput and reducing manual DBA intervention for routine operations.
Tuned SQL queries and optimized Oracle database performance by analyzing execution plans and indexing strategies, reducing query latency by up to 45% for critical reporting workloads.
Enhanced system logging and monitoring using Linux tools and custom scripts, enabling faster root cause analysis and reducing mean time to resolution (MTTR) from hours to under 20 minutes.
Designed scalable data pipelines to support near real-time processing using streaming frameworks, reducing data latency from overnight batch cycles to under 10 minutes.
Led automation initiatives to eliminate repetitive operational tasks, resulting in a 30% improvement in team productivity and allowing engineers to focus on higher-value development work.
Integrated data quality checks into ETL workflows, ensuring compliance with business rules and reducing stakeholder-reported issues to near zero over a six-month period.
Collaborated with cross-functional teams to align data architecture with business requirements, improving reporting accuracy and enabling faster decision-making for actuarial and finance teams.
Implemented secure data handling practices within Linux and database environments, ensuring compliance with regulatory requirements and passing internal audits without any data-related findings.
Optimized file system usage and storage management strategies, reducing storage costs by 20% while maintaining performance for large-scale data processing workloads.
Built reusable shell-based utilities for file parsing, transformation, and validation, significantly reducing development time for new pipelines and standardizing engineering practices.
Provided production support and incident management for critical data pipelines, ensuring continuous availability and maintaining zero missed SLA commitments over multiple quarters.
Continuously evaluated and improved system architecture, introducing enhancements that increased scalability, maintainability, and overall resilience in a high-volume data environment.
Data Warehouse Engineer — Tata Consultancy Services (TCS)
Jan 2021 – Dec 2023
Developed and maintained large-scale ETL pipelines loading 50M+ records daily into a centralized data warehouse using Python, SQL, and shell scripting, ensuring consistent and reliable data delivery for business-critical reporting systems.
Designed shell-based automation scripts for end-to-end pipeline execution, including data extraction, transformation, and loading, reducing manual intervention by 70% and improving operational efficiency.
Managed Linux-based environments for ETL processing, including job scheduling, file system management, and process monitoring, ensuring stable and high-performance system operations.
Built and optimized Oracle-based data models, improving query performance by 50% and enabling faster analytics for downstream business users.
Enhanced ETL workflows by implementing modular Python scripts, improving maintainability and reducing development effort for new data sources by 40%.
Diagnosed and resolved performance bottlenecks in data pipelines, reducing overall processing time by 45% through query optimization and efficient resource utilization.
Implemented data validation frameworks to ensure accuracy across pipeline stages, improving data quality metrics from 96% to 99.8%.
Designed and implemented monitoring dashboards for pipeline health using logging and alerting mechanisms, reducing issue detection time from hours to minutes.
Automated file ingestion processes using shell scripts and Linux utilities, improving data availability and reducing delays in batch processing cycles.
Collaborated with stakeholders to understand data requirements and translated them into efficient ETL designs, improving reporting turnaround time by 35%.
Integrated multiple data sources, including APIs, flat files, and databases, into the warehouse, ensuring seamless data flow and consistency.
Supported migration initiatives to modern data platforms, contributing to improved scalability and reduced infrastructure costs.
Implemented scheduling and orchestration improvements using Airflow, enhancing workflow visibility and reliability across multiple pipelines.
Drove continuous improvement initiatives by identifying inefficiencies in existing processes and implementing automation solutions that reduced operational overhead.
Provided on-call support for production systems, ensuring quick resolution of issues and maintaining high system uptime across critical business operations.
NOTABLE PROJECTS
Financial Data Warehouse Modernization
Built scalable ETL pipelines using Python and shell scripting, improving processing efficiency by 60% and enabling faster data delivery across business units.
Designed robust data validation and monitoring frameworks ensuring high data accuracy and reliability across multiple systems.
Real-Time Data Processing System
Implemented streaming data pipelines using Kafka and Spark, reducing latency from hours to minutes and enabling near real-time insights.
Built monitoring dashboards to track pipeline health and performance metrics, improving system observability.
EDUCATION
M.S. in Computer Science — Florida International University
B.S. in Computer Science — JNTUH
CERTIFICATIONS
NPTEL – Programming in C
Infosys – Python Certification