KALYAN KASINADHUNI
Canton, SD (Open to Relocation) +1-605-***-**** *****************@*****.*** LinkedIn Portfolio
PROFESSIONAL SUMMARY
Results-driven Data Engineer with 2+ years of experience designing, building, and optimizing cloud-native data platforms, pipelines, and analytics solutions for healthcare and fintech enterprises. Skilled in ETL/ELT development, real-time streaming (Kafka, Flink, Spark), Delta Lakehouse architecture, and feature stores (Feast), delivering high-performance analytics, cost savings, and AI/GenAI adoption. Experienced in AWS, Azure, GCP, Databricks, Snowflake, and proficient in MLOps, data governance, and data lineage frameworks. Strong knowledge of Data Mesh, Medallion Architecture, and DataOps practices, enabling scalable, secure, and compliant data engineering solutions.
TECHNICAL SKILLS
Cloud & Data Platforms: AWS (Glue, Redshift, EMR, Kinesis, Lambda, S3), Azure (Databricks, Synapse, Purview, Data Factory), GCP (BigQuery, Dataflow, Pub/Sub, Vertex AI).
Data Engineering & Big Data: Apache Spark (PySpark/Scala), Kafka, Flink, Airflow, dbt, Snowflake, Delta Lake, Apache Iceberg, Apache Hudi.
Data Architecture & Pipelines: ETL/ELT Development, Data Mesh, Medallion Architecture, Change Data Capture (Debezium), Data Vault 2.0, Feature Stores (Feast), Data Governance & Lineage (OpenLineage, Marquez).
Databases & SQL: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Redis, NoSQL, OLTP/OLAP Optimization, Advanced SQL & Query Performance Tuning.
MLOps & Generative AI: MLflow, LangChain, LlamaIndex, Pinecone, Hugging Face, PyTorch, TensorFlow, Vector Databases, Human-in-the-Loop AI Systems.
DevOps & DataOps: Terraform, Kubernetes, Docker, Jenkins, GitHub Actions, ArgoCD, CI/CD Automation, Infrastructure as Code, Policy-as-Code, Data Observability.
Business Intelligence & Analytics: Power BI, Tableau, Looker, Apache Superset, Metabase, Data Modeling (Star Schema, Snowflake Schema), DAX.
PROFESSIONAL EXPERIENCE
Data Engineer Aug 2025 - Present
Global Varahi AI Solutions USA
• Building high-volume ETL pipelines in Databricks and Delta Lake, processing 10TB+ daily and reducing data refresh timelines from 24 hours to 3 hours.
• Delivering real-time event ingestion by integrating Kafka and Flink with Azure Synapse, enabling sub-3-second latency for 2M+ telemetry events per day.
• Optimizing Snowflake and BigQuery workloads with advanced SQL tuning and partition strategies, boosting query speed by 70% and cutting costs by $15K monthly.
• Automating 500+ data quality checks using Airflow and Great Expectations, raising reliability from 78% to 99.6% and minimizing analyst rework.
• Partnering with data scientists to operationalize MLflow pipelines, reducing retraining cycles by 35% and improving consistency in production ML models.
• Strengthening compliance by deploying Azure Purview with Unity Catalog, achieving 97% automated detection of PII/PHI and ensuring HIPAA/GDPR adherence.
Data Engineer May 2024 - May 2025
Go Audits Delaware, USA
• Migrated 22 legacy SAS pipelines into Databricks Delta Lake with incremental MERGE logic, processing 12TB/day and reducing FDA reporting timelines from 14 days to 72 hours.
• Streamlined ICU monitoring by streaming 800+ Oracle tables into Azure Synapse via Kafka Connect and Debezium, achieving real-time ingestion with sub-5-second latency.
• Optimized Synapse performance with caching, materialized views, and workload management, tripling query throughput and cutting monthly cloud spend by $22K.
• Built a data quality framework using Great Expectations with PagerDuty alerts, covering 500+ validation rules and raising data reliability from 78% to 99.4%.
• Re-engineered 150 SSIS ETL workflows into Azure Data Factory with parameterized templates and GitOps, reducing migration effort by 40% while enabling automated CI/CD.
• Partnered with Bayer Germany to map US/EU datasets to OMOP CDM, achieving 98% schema alignment and enabling secure cross-border clinical research collaborations.
Data Engineer I Aug 2021 - Aug 2022
Capgemini Hyderabad, India
• Migrated 22 SAS pipelines into Databricks Delta Lake with incremental MERGE logic, reducing FDA submission prep from 14 days to 72 hours while ensuring compliance.
• Delivered real-time ICU monitoring by streaming 800+ Oracle tables into Azure Synapse via Kafka Connect and Debezium, achieving <5-second latency.
• Boosted Synapse query performance 3x using materialized views, caching, and workload management, lowering Azure costs by $22K monthly.
• Automated 500+ validation checks with Great Expectations and PagerDuty alerts, raising data accuracy from 78% to 99.4% and cutting analyst rework by 85%.
• Modernized 150 SSIS packages into Azure Data Factory pipelines with GitOps templates, reducing migration effort by 40% and accelerating CI/CD deployments.
• Standardized global datasets into OMOP CDM for Bayer Germany, reaching 98% mapping accuracy and enabling secure cross-border clinical research.
PROJECTS
Real-time Data Streaming & Fraud Analytics
• Designed a high-throughput pipeline with Kafka, Spark Structured Streaming, and Flink handling 500K+ events/sec, which reduced fraudulent transaction approvals by 40% and protected annual revenue.
• Implemented a feature store with Feast, Redis, and dbt, enabling reproducible ML training and real-time feature retrieval and improving fraud detection model accuracy by 18%.
• Deployed monitoring using Prometheus, Grafana, and OpenLineage that cut incident resolution time from 9 hours to under 1 hour, improving overall SLA compliance.
Cloud Data Lakehouse Modernization
• Migrated legacy ETL jobs into Databricks Delta Lake with optimized partitioning and Z-ordering, enabling 12TB/day processing and reducing reporting latency from 3 days to 6 hours.
• Rebuilt 150+ SSIS jobs into Azure Data Factory pipelines with Terraform CI/CD, streamlining deployments and reducing maintenance overhead by 35%.
• Standardized data governance with Great Expectations, Unity Catalog, and Azure Purview, ensuring 98% schema compliance and automated detection of sensitive fields across multi-source data.
EDUCATION
Master's in Computer Science Aug 2023 - May 2025
University of South Dakota Vermillion, SD
CERTIFICATIONS
• IBM Data Engineering Professional Certificate - Coursera
• Google Cloud Professional Data Engineer - Coursera
• Microsoft Azure Data Engineer Associate (DP-203) - LinkedIn Learning
• AWS Data Analytics Specialty - LinkedIn Learning