
Data Engineer Azure

Location:
Hyderabad, Telangana, India
Posted:
October 15, 2025


Resume:

SHIVAKUMAR THUPPALA

+1-216-***-**** | *******************@*****.*** | LinkedIn | Open to Relocate

SUMMARY

Data Engineer with 4+ years of experience designing and optimizing large-scale data systems across Azure, AWS, and Snowflake. Hands-on expertise building 50+ data pipelines, migrating 10+ TB of enterprise datasets, and processing 500M+ records daily for the finance, healthcare, and supply chain domains. Skilled in Azure Data Factory, Databricks (PySpark), Synapse, Delta Lake, AWS Glue, and Lambda, with a proven record of reducing ETL runtimes by 60%+, cutting cloud costs by 35%, and accelerating BI adoption by 25%. Adept at implementing data governance, monitoring, and Medallion architecture frameworks to deliver reliable, cost-efficient, and business-ready insights.

PROFESSIONAL SKILLS

Technical: SQL, Python, Data Warehousing, Database Design

Frameworks & Tools: Hadoop, Spark, Databricks, Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Data Factory, Azure Synapse, Azure Logic Apps, AWS S3, AWS Glue, AWS Redshift, AWS Athena, AWS Lambda, Power BI, Tableau

Databases & Cloud: Microsoft SQL Server, Oracle DB, MySQL, AWS Aurora, PostgreSQL, HBase, Snowflake, Azure, AWS

EXPERIENCE

Stripe (San Francisco), Data Engineer July 2024 – Present

• Led the migration of 10+ TB of FP&A data from Teradata to Microsoft Azure, building enterprise-grade data pipelines with Azure Data Factory and Azure Databricks and implementing the Medallion (Bronze–Silver–Gold) Lakehouse architecture, enabling 99.9% reliable, scalable analytics across finance functions (a brief Bronze-to-Silver sketch follows this section).

• Modernized legacy Teradata stored procedures into modular, PySpark-driven Databricks notebooks, improving transformation speed by 65%, reducing technical debt, and establishing a standardized re-usable processing framework adopted across multiple business units.

• Optimized Databricks infrastructure costs by 35% by implementing autoscaling clusters, leveraging spot instances, and introducing calendar-aware job scheduling, directly aligning compute usage with financial close cycles, saving hundreds of compute hours monthly.

• Engineered proactive monitoring & observability by integrating Azure Logic Apps with ADF pipelines, deploying real-time SLA breach alerts, anomaly detection, and automated escalation workflows, cutting MTTR by 75% and improving operational resilience.

• Designed and deployed a high-performance Azure Synapse Analytics consumption layer, reducing BI refresh latency by 40%, enabling self-service analytics, and accelerating executive FP&A reporting cycles from days to hours.

• Collaborated with cross-functional finance, data science, and BI teams to define governance standards, data quality rules, and security controls, ensuring SOX compliance and building a trusted single source of truth for financial data.

• Championed cloud cost governance & performance tuning best practices, conducting performance benchmarking, schema optimization, and storage tiering strategies, contributing to long-term scalability and sustainability of the data platform.
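
The Bronze-to-Silver Medallion step referenced in the first bullet above can be illustrated with a minimal PySpark / Delta Lake sketch; the table names (fpa_bronze.gl_transactions, fpa_silver.gl_transactions) and columns are hypothetical placeholders, not the actual pipeline code.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook

# Read raw (Bronze) data landed by Azure Data Factory
# (table and column names below are illustrative only)
bronze = spark.read.table("fpa_bronze.gl_transactions")

# Standardize types, deduplicate, and drop invalid rows before promoting to Silver
silver = (
    bronze
    .withColumn("posting_date", F.to_date("posting_date", "yyyy-MM-dd"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["transaction_id"])
    .filter(F.col("amount").isNotNull())
)

# Write a Delta table partitioned by posting date for downstream Gold aggregations
(
    silver.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("posting_date")
    .saveAsTable("fpa_silver.gl_transactions")
)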

Innovaccer, Data Engineer March 2022 – May 2023

• Designed and deployed 15+ enterprise-grade data pipelines using AWS Glue and Lambda, processing 300M+ daily patient records with HIPAA-compliant architectures, enabling real-time clinical decision support and reducing data ingestion latency by 70%.

• Optimized a mission-critical Snowflake data warehouse, reducing query runtime by 48% via partitioning, clustering, and micro-partition pruning, which drove a $12K/month reduction in cloud spend while improving scalability for analytics teams.

• Modernized a legacy on-prem Oracle claims processing system by migrating to AWS S3 + Snowflake, leveraging AWS Athena for ad-hoc exploration. Designed incremental Lambda-based ingestion frameworks that reduced pipeline runtime from 8 hours to 90 minutes, improving SLA compliance and system reliability (a brief ingestion sketch follows this section).

• Revamped the Tableau analytics layer by restructuring SQL queries, optimizing Snowflake schemas, and pre-aggregating datasets, which improved dashboard refresh times by 60% and increased clinical and operational reporting adoption by 25%.

• Implemented robust data governance and security controls including role-based access (RBAC), data masking, and audit logging across Snowflake and AWS S3, ensuring HIPAA and SOC2 compliance while strengthening data trust.

• Collaborated with cross-functional teams (data science, product, and compliance) to establish data quality standards, monitoring frameworks, and CI/CD pipelines, ensuring 99.9% uptime for analytics workloads and accelerating feature delivery for healthcare insights.

• Championed cost optimization practices by benchmarking compute usage, auto-suspending idle warehouses, and adopting storage tiering strategies, driving long-term savings while maintaining query performance.
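
A minimal sketch of the event-driven, incremental ingestion pattern referenced in the Lambda bullet above; the bucket tagging scheme and the Glue job name (claims_incremental_load) are illustrative assumptions rather than the production framework.

import json
import urllib.parse

import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

def lambda_handler(event, context):
    # Triggered by S3 ObjectCreated events: register only the newly arrived claim
    # files so downstream jobs process increments instead of full nightly reloads.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Tag the new object so the scheduled load picks up only pending files
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": "ingest_status", "Value": "pending"}]},
        )

    # Kick off the incremental load job (job name is hypothetical)
    glue.start_job_run(JobName="claims_incremental_load")
    return {"statusCode": 200, "body": json.dumps("increment registered")}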

EonSpace Labs, Data Engineer March 2020 – Feb 2022

• Designed and deployed 20+ enterprise-grade data pipelines processing 500M+ records daily using ADLS Gen2, Azure Data Factory, Azure Databricks, Azure Synapse, Python, and Spark, delivering real-time supply chain insights and improving operational efficiency across global logistics networks.

• Diagnosed and eliminated performance bottlenecks in Databricks notebooks by applying query optimization, caching strategies, and parallelization techniques, reducing ETL runtime by 63% and freeing up thousands of compute hours annually.

• Enhanced Power BI analytics adoption by 20% through optimizing SQL Server data warehouse fact table loading strategies (switched from full loads to incremental/partition-based loads), reducing dashboard refresh times by 57%, and enabling faster executive reporting cycles.

• Engineered a robust multi-source ingestion framework to integrate data from REST APIs, JSON, and CSVs into Delta Lake, ensuring schema evolution support, data validation, and metadata-driven automation, which increased data reliability and auditability (a brief ingestion sketch follows this section).

• Implemented end-to-end data governance and quality checks across pipelines using Data Factory + Databricks validation layers, improving data accuracy by 30% and aligning with GDPR and SOX compliance requirements.

• Collaborated with cross-functional stakeholders (supply chain, BI, and product teams) to define data modeling standards and establish a single source of truth in Azure Synapse, enabling self-service analytics for 200+ business users.

• Championed cost optimization initiatives by tuning Spark jobs, optimizing storage tiers in ADLS Gen2, and configuring autoscaling clusters, leading to a 25% reduction in Azure spend without compromising performance.
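
A minimal sketch of the metadata-driven, multi-source ingestion into Delta Lake with schema evolution referenced above; the source list, storage account, and container paths are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative metadata describing each source feed (a real framework would read this from config)
sources = [
    {"name": "shipments", "format": "json", "path": "abfss://raw@examplelake.dfs.core.windows.net/shipments/"},
    {"name": "orders", "format": "csv", "path": "abfss://raw@examplelake.dfs.core.windows.net/orders/"},
]

for src in sources:
    reader = spark.read.format(src["format"])
    if src["format"] == "csv":
        reader = reader.option("header", "true").option("inferSchema", "true")
    df = reader.load(src["path"])

    # Append to the Bronze Delta path, letting new columns evolve the schema instead of failing
    (
        df.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save("abfss://bronze@examplelake.dfs.core.windows.net/" + src["name"] + "/")
    )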

EDUCATION

Master of Science in Computer Technology, Eastern Illinois University, Charleston, IL, USA


