SOWMIKA KAPPALA
+1-940-***-**** **************@*****.*** LinkedIn
SUMMARY
Data Engineer with 3+ years of experience building scalable ETL and ELT pipelines, data lakehouse solutions, and real-time analytics across AWS and Azure. Skilled in orchestrating pipelines with Airflow and Databricks, large-scale processing using Spark and Kafka, and data warehousing on Snowflake and Redshift. Proficient in Python and SQL for automation, modeling, and transformation, with expertise in CI/CD using Jenkins, Docker, and Terraform. Partnered with cross-functional teams of engineers, analysts, and business stakeholders to deliver governed, cost-efficient, and insight-driven data systems that improve data quality and decision-making.
TECHNICAL SKILLS
Programming Languages: Python, SQL, Scala, Java, R, Shell Scripting
Big Data & ETL: Apache Spark, Kafka, Airflow, Databricks, SSIS, Hadoop, Hive, DBT (Data Build Tool), Flink, GoldenGate CDC
Cloud Platforms: AWS (Glue, Redshift, EMR, Kinesis, S3, Lambda, Athena, CloudWatch), Azure (Data Factory, Synapse, Databricks, Data Lake Storage, Key Vault, Microsoft Fabric, OpenAI), GCP (BigQuery, Dataflow)
Data Modeling & Warehousing: Star/Snowflake Schemas, Delta Lake, Iceberg, Snowflake, Redshift, Synapse, Oracle Exadata, PostgreSQL, MySQL, MongoDB
Governance & Quality: Great Expectations, Apache Atlas, Collibra, Metadata Management
DevOps & CI/CD: Git, Jenkins, Azure DevOps, GitHub Actions, Terraform, Docker, Kubernetes, Jira
Monitoring & Observability: Grafana, Prometheus, Splunk
Project Management & Methodologies: Agile, Scrum, SDLC, Cross-Functional Collaboration
AI/ML & Analytics: MLflow, Databricks ML Pipelines, Feature Engineering, Model Tracking, REST APIs
Visualization & Analytics: Power BI (DAX), Tableau, Looker
WORK EXPERIENCE
Kaiser Permanente Data Engineer Oakland, CA May 2024 - Present
• Built HIPAA-compliant data ingestion and transformation frameworks using AWS Glue, Spark on EMR, and S3 to process 10M+ patient records monthly, powering analytics and compliance reporting.
• Delivered real-time streaming ingestion with Kafka and Kinesis for 20+ medical device sources, maintaining 99.95% uptime and enabling reliable telemetry analytics.
• Integrated HL7 FHIR APIs with Airflow and Lambda for end-to-end pipeline orchestration, ensuring secure and compliant cross-system data exchange.
• Optimized Redshift schema design and query performance using advanced data-modeling best practices, reducing reporting time by 25% for 300+ clinicians and analysts.
• Automated CI/CD deployments with Jenkins and Docker and standardized AWS infrastructure with Terraform for cost-optimized, reproducible analytics environments.
• Enhanced observability with CloudWatch, Splunk, and Grafana, reducing incident-response time by 30%, and integrated Databricks with MLflow for predictive ML initiatives.
• Standardized AWS Glue and Terraform workflows adopted by 20+ engineers across data teams, enabling faster provisioning and cost-optimized analytics deployments.
ICICI Bank Data Engineer India June 2021 - May 2023
• Modernized ETL workflows using Hadoop, Informatica, and SSIS on Oracle Exadata to meet RBI and AML compliance standards, improving reliability and reporting accuracy.
• Migrated critical ETL and analytics workloads from Exadata to Snowflake and Spark, boosting reporting speed by 20% and optimizing storage and compute costs.
• Engineered real-time fraud-detection pipelines using Kafka and Azure Event Hubs, reducing detection latency by 40% and preventing $2M in annual fraud losses.
• Automated data reconciliation and validation in Python and Scala, reducing batch latency by 50% and streamlining manual verification across banking datasets.
• Designed Power BI dashboards using DAX to automate compliance and audit insights, accelerating reporting turnaround by 25% for auditors and risk analysts.
• Led the Azure migration with GoldenGate CDC and Data Factory and containerized Airflow pipelines using Docker and Jenkins for orchestrated CI/CD, achieving 99.9% reliability and 35% faster releases.
• Collaborated with cross-functional teams to scale Spark, Snowflake, and Azure pipelines enterprise-wide, improving data delivery SLAs by 25% and ensuring governance across business units.
EDUCATION
University of North Texas, Denton, TX - Master of Science in Computer Science (GPA: 4.0/4.0)
Osmania University, Hyderabad, India - Bachelor's in Electronics & Communication Engineering (GPA: 8.47/10)