Harshitha Papani
Email: ********@*****.***
Mobile: 512-***-****
LinkedIn: www.linkedin.com/in/harshitha-papani-
Senior Data Engineer
PROFESSIONAL SUMMARY
Data Engineer with 5+ years delivering cloud data platforms, reliable pipelines, and governed analytics across healthcare, payments, and global enterprise domains for decision-ready insights.
Specialize in building batch and streaming ingestion, modeling curated layers, and optimizing SQL performance to support scalable reporting, reconciliation, monitoring, and downstream analytics workloads.
Experienced across Azure, AWS, and GCP, applying automation, orchestration, and CI/CD practices to improve release consistency, proactive data quality checks, and overall platform resilience.
Partner with stakeholders to translate requirements into secure datasets, standardized metrics, and self-service dashboards, reducing manual effort and accelerating trustworthy data consumption with measurable outcomes.
Facilitated effective team communication through strong written and oral communication skills, enhancing project collaboration.
Leveraged a passion for automation and continual process improvement to streamline workflows, boosting productivity by 30%.
Enhanced system performance by implementing system and architecture improvements using Agile methodology, resulting in a 30% increase in scalability and a 20% reduction in latency.
TECHNICAL SKILLS
Cloud Platforms - AWS (EC2, Lambda, Glue, S3, Kinesis, IAM, EKS, Redshift), Azure (ADF, Synapse, Azure SQL, Entra ID, Key Vault), GCP (BigQuery, GKE, Cloud Storage), Microsoft Fabric
Infrastructure as Code (IaC) - Terraform, Ansible, ARM Templates, Bicep, CloudFormation, Jenkins, Azure DevOps
Security and Compliance - IAM, Encryption, NIST 800-53, CIS Benchmarks, PCI-DSS, RBAC, Key Vault, Audit Logging
Monitoring and Incident Response - New Relic, AWS CloudWatch, Azure Monitor, ServiceNow, RCA, SLA Management
CI/CD and DevOps - Jenkins, GitHub Actions, Git, GitLab, CodePipeline, CI/CD Pipelines, Shell Scripting
Programming & Scripting - Python, SQL, Bash, PowerShell
Databases - Redshift, Snowflake, Azure SQL, PostgreSQL, MongoDB, MySQL, Oracle, Oracle Exadata
Dashboards and Visualization - Power BI, Tableau, Looker, AWS QuickSight
Data Engineering - AWS Glue, Azure Data Factory, DBT, Apache Kafka, Spark, Hive, GCP Dataflow, Informatica, Airflow
Programming Languages - Perl
System Administration - Linux-based processes, Unix file systems
PROFESSIONAL EXPERIENCE
Cardinal Health June 2024 – Present
Senior Data Engineer
Architected Azure Data Factory pipelines landing secure source extracts into ADLS Gen2, enabling governed lake zones and reducing batch failures through standardized retry controls.
Automated incremental loads with Azure Databricks and PySpark, transforming raw healthcare datasets into curated tables that improved downstream reporting timeliness and overall data consistency.
Integrated Azure Synapse SQL pools with dimensional models, accelerating complex analytics queries and supporting finance and operations teams with reliable, reconciled daily enterprise datasets.
Optimized storage and compute with partitioned Parquet layouts on ADLS, lowering query latency for high-volume claims processing while sustaining strict long-term audit-ready retention requirements.
Validated pipeline outputs with Great Expectations checks and Azure DevOps CI/CD, preventing schema drift and improving release confidence across multiple environments and teams.
Enhanced data processing efficiency by building Perl-based data warehousing routines, resulting in a 30% reduction in data retrieval latency.
Optimized system reliability by implementing Linux-based processes and managing Unix file systems with orchestration tools, achieving a 99.9% uptime in operations.
Streamlined data integration using Oracle Exadata and Oracle to manage data flows, leading to a 40% improvement in data processing speed.
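The incremental-load pattern described in the bullets above can be sketched in plain Python. This is a minimal, self-contained sketch with hypothetical row shapes and a numeric watermark; the production pipelines used PySpark DataFrames on Azure Databricks rather than in-memory lists.

```python
def incremental_load(source_rows, curated_rows, watermark):
    """Upsert only rows newer than the last successful watermark.

    source_rows / curated_rows: lists of dicts with "id" and "updated_at"
    keys (hypothetical shapes standing in for DataFrame rows).
    Returns the merged curated rows and the advanced watermark.
    """
    # Filter the raw extract down to rows changed since the last run.
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]

    # Upsert into the curated layer, keyed by primary key.
    merged = {r["id"]: r for r in curated_rows}
    for r in new_rows:
        merged[r["id"]] = r

    # Advance the watermark only if new data actually arrived.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return list(merged.values()), new_watermark
```

The watermark is persisted between runs so each batch touches only the delta, which is what keeps reprocessing cost and batch failure exposure low.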
American Express March 2023 – May 2024
Data Engineer
Engineered AWS Glue jobs to ingest diverse payment feeds into S3, delivering standardized datasets that supported fraud analytics and improved onboarding of new sources.
Streamlined transformations with Spark on EMR and SQL, producing curated tables for Redshift consumption and significantly reducing manual reconciliation effort across settlement workflows.
Configured event-driven ingestion with Kinesis, Lambda, and SQS, capturing near-real-time transactions and enabling faster anomaly detection through reliable, continuously monitored, low-latency streaming data pipelines.
Hardened access patterns with IAM roles, encryption, and CloudWatch alarms, strengthening compliance posture while improving operational visibility, alerting, and response for production data services.
Monitored data quality and freshness with automated checks and dashboards, reducing incident triage time and improving trust in executive reporting for key business stakeholders.
Increased ETL throughput by integrating Airflow with Informatica, resulting in a 25% enhancement in data pipeline automation.
Improved data integrity by managing Unix file systems, including mount types, permissions, standard tools, and pipes, and by overseeing ETL/database load/extract processes, reducing errors by 20%.
IBM April 2020 – July 2022
Data Engineer
Analyzed enterprise datasets in BigQuery, delivering governed marts and semantic layers that improved self-service exploration and substantially reduced recurring manual ad hoc extract requests.
Modernized batch processing with Dataflow and Apache Beam, scaling transformations efficiently and improving end-to-end SLA adherence for critical multi-region analytics and daily reporting pipelines.
Orchestrated workflows with Cloud Composer, Terraform, and Git-based CI/CD, enabling repeatable deployments and faster iteration on data products across global distributed agile delivery teams.
Standardized metadata and lineage with data catalog practices and tagging, improving discoverability and audit readiness while aligning analytics assets to consistent enterprise business definitions.
Visualized KPI trends with Looker dashboards sourced from curated BigQuery models, enabling executives to track performance and make timely, data-backed strategic decisions.
Boosted ETL performance on Linux platforms, resulting in a 35% increase in data processing efficiency.
Enhanced database automation by integrating Oracle with Perl, achieving a 50% reduction in manual data handling tasks.
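The SLA-adherence monitoring mentioned in the bullets above can be sketched as a simple freshness check. This is a minimal sketch with hypothetical table names and an in-memory load log; a production version would read last-load timestamps from BigQuery metadata or a Cloud Composer task rather than a dict.

```python
from datetime import datetime, timedelta

def check_freshness(last_loaded, sla_hours, now=None):
    """Return the names of tables whose last load breaches the SLA window.

    last_loaded: dict mapping table name -> datetime of last successful load
    (hypothetical structure standing in for pipeline metadata).
    """
    now = now or datetime.utcnow()
    cutoff = timedelta(hours=sla_hours)
    # A table is stale when its last load is older than the SLA window.
    return sorted(t for t, ts in last_loaded.items() if now - ts > cutoff)
```

Wiring a check like this into an orchestrator task lets stale tables page the on-call before a downstream dashboard shows out-of-date numbers.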
EDUCATION
Master's in Information Technology and Management - The University of Texas at Dallas
Bachelor's in Computer Science - Panimalar College