Kiran Ranganalli
***********.****@*****.*** | 415-***-**** | San Francisco, CA | Portfolio / LinkedIn / GitHub
EDUCATION
San Francisco State University – M.S. Business Analytics (GPA 3.7) Jan 2023 - Dec 2024
VNR Vignana Jyothi Institute of Technology – Bachelor of Science (GPA 8.2/10.0) Jul 2016 - Jul 2020
WORK EXPERIENCE
Data Engineer – Humana, USA January 2025 - Present
● Built a Python-based data profiling engine for schema drift, anomaly detection, and null analysis across Snowflake, BigQuery, Amazon S3, JSON, and Parquet datasets, reducing manual validation by 65% and enforcing governance standards.
● Integrated profiling into batch ETL and streaming pipelines with Apache Spark on Databricks, Delta Lake, and Kafka, orchestrated through Airflow DAGs with SLA monitoring, retries, and lineage tracking, improving reliability and reducing runtime by 30%.
● Deployed profiling services on AWS Lambda and ECS Fargate using IAM, autoscaling, and cost-aware scheduling, cutting compute costs by 35% while maintaining SLA compliance for critical ingestion pipelines.
● Automated deployments via CI/CD pipelines (GitHub Actions, Jenkins, Terraform, Docker) ensuring consistent packaging, infrastructure-as-code provisioning, and reducing release cycle time by 40% across environments.
● Captured profiling metrics and lineage logs in InfluxDB, CloudWatch, and ELK, with Grafana dashboards for anomaly trends, enabling proactive monitoring and reducing incident resolution time (MTTR) by 50%.
● Embedded profiling checks into machine learning pipelines for fraud and credit risk scoring, validating feature store inputs and reducing model drift by 20%, improving decision accuracy and ensuring compliance with SOX, GDPR, and BCBS 239.
Data Engineer Intern – DeliveredKorea Startup (San Francisco) August 2024 - December 2024
● Engineered real-time data ingestion pipelines using Apache Kafka, Spark Structured Streaming, and Delta Lake on Databricks, enabling processing of 2M+ marketing and behavioral telemetry events daily with <2s latency, directly improving product analytics accuracy and real-time campaign reporting.
● Developed and productionized dbt models with modular macros, tests, and documentation, establishing semantic layers and data contracts that standardized KPI definitions across departments, improving data trust and analytics adoption by 45% among business stakeholders.
● Designed and automated ETL workflows in Airflow with SLA-driven DAGs, retries, lineage tracking, and proactive alerting, which reduced data pipeline failures by 60%, cut manual intervention time by 35%, and improved operational reliability across analytics teams.
● Built CI/CD pipelines with Azure DevOps + Terraform to provision cloud infra, validate schema changes, and automate environment deployments, reducing release cycle time by 40% and lowering failed deployments by 60%.
● Partnered with compliance and data governance teams to implement HIPAA-compliant logging, PII redaction, and SLA monitoring; introduced automated incident postmortems and deployment checklists, ensuring zero data loss and zero audit violations.
Software Engineer, Analytics – Capgemini (AXA Insurance) July 2021 - December 2022
● Architected a cloud-native data warehouse solution on AWS Redshift Spectrum and S3, ingesting 300M+ transactional records daily with 99.95% SLA uptime, supporting enterprise-scale sales, promotions, and supply chain analytics.
● Automated data pipelines with AWS Glue, EMR, Step Functions, and Lambda, reducing campaign performance reporting time from T+3 days to T+6 hours, enabling real-time marketing optimization and contributing to $12M+ uplift in ROI.
● Designed Kimball-style dimensional models (fact and SCD Type 2 dimensions) for retail sales, product catalog, and customer loyalty programs; optimized queries with partitioning, distribution keys, and sort keys, improving query performance by 70% and reducing compute costs by 22%.
● Implemented GDPR and CCPA-compliant governance frameworks using IAM roles, KMS encryption, lineage tracking, and automated audit logs in AWS Lake Formation, resulting in zero audit findings and reducing compliance risk by 80%.
● Mentored and led a 6-member pod of junior engineers and analysts, creating coding standards, onboarding documentation, and automated testing frameworks; improved onboarding speed by 30% and maintained 100% team retention rate.
● Collaborated with cross-functional stakeholders (Finance, Marketing, Ops, Supply Chain) to align KPI definitions for conversion, churn, and loyalty metrics, ensuring consistent analytics across $75M+ transformation initiatives.
Data Analyst – INCOIS May 2020 - July 2021
● Automated multi-source ETL workflows integrating satellite imagery, GIS datasets, and telemetry signals using Python, GCP Cloud Functions, SSIS, and BigQuery, improving data freshness from T+2 days to T+3 hours and ensuring timely availability of climate data for research scientists.
● Built Tableau and Power BI dashboards with governed datasets, supporting early-warning systems for extreme weather patterns, cutting manual reporting by 50% and reducing ad-hoc data requests by 60% across 12 regional centers.
● Designed SQL-based anomaly detection frameworks to identify sea surface temperature and salinity anomalies; reduced false positives by 30%, enabling researchers to focus on high-value anomalies.
● Developed compliance alerting workflows with automated audit logging, exception tracking, and escalation pipelines, aligning with ISO data governance standards, reducing regulatory response times by 40%.
● Conducted statistical analysis on vessel resource utilization data, leading to optimized scheduling models that reduced downtime by 18% and cut operational costs by 12%.
TECHNICAL SKILLS
● Programming & Data: Python, SQL, Java
● Data Engineering & Pipelines: ETL Development, Data Modeling, Data Warehousing, Orchestration (Airflow, Databricks, AWS Glue, Step Functions)
● Cloud Platforms: AWS (Redshift, S3, EMR, Lambda, Lake Formation), GCP (BigQuery, Dataflow), Azure Data Factory
● Streaming & Real-Time Processing: Apache Kafka, Kinesis, Spark Structured Streaming, Flink
● Databases: Relational (PostgreSQL, Oracle, MySQL), NoSQL (MongoDB, DynamoDB)
● DevOps & Infrastructure: CI/CD, Docker, Kubernetes, Terraform, CloudFormation
● Data Governance & Quality: Data Contracts, Great Expectations, Lineage Tracking, Security & Compliance (GDPR, HIPAA, SOX)
● Visualization & BI: Tableau, Power BI, Looker
● Certifications: AWS Certified Data Engineer (DEA-C01) (Link), Google Data Analytics Professional Certificate (Link)
PROJECTS
Airflow Replatform GitHub
● Migrated 120+ cron jobs into standardized Airflow DAGs with retries, backoffs, lineage, and proactive monitoring; reduced p95 runtime by 55%, failures by 70%, and on-call incidents by 60%.
● Established automated DAG deployment with CI/CD validation, ensuring idempotency, observability, and data contract compliance across multiple environments.
Semantic Search with Hybrid Retrieval GitHub
● Delivered modular, idempotent Airflow DAGs with automated backfills and comprehensive data quality tests, cutting runtime from 8h to 90m and reducing failures by 95%.
● Replaced legacy ETL scripts with modern, reusable workflows, enabling faster iteration cycles and saving 30 hrs/week on ad-hoc queries.
Real-Time Streaming Pipeline with Kafka & Kinesis GitHub
● Ingested and processed 10M+ crypto transactions/day via Kafka with Spark; reconciled ledgers, reducing mismatches by 95%.
● Modeled an optimized time-series schema in Redshift powering low-latency ML-based fraud detection, improving anomaly identification and prevention accuracy.