Post Job Free

Data Engineer - Cloud, Spark, Airflow, ETL Expert

Location:
Noida, Uttar Pradesh, India
Posted:
March 19, 2026


Resume:

SAI RAMYA KASUMURTHY

DATA ENGINEER

USA | +1-732-***-**** | *****************@*****.*** | LinkedIn: https://www.linkedin.com/in/sai-ramya-kasumurthy-a403801a4/

SUMMARY

Results-driven Data Engineer with 4+ years of experience building scalable data pipelines, distributed batch/streaming systems, and cloud-native architectures across AWS, Azure, and GCP. Skilled in Spark, Kafka, BigQuery, Redshift, Snowflake, Airflow, Dataflow, Databricks, and ADF for end-to-end data processing, modeling, orchestration, and optimization. Strong background in performance tuning, data warehousing, and real-time analytics supporting mission-critical decisions in finance, healthcare, and enterprise environments.

SKILLS

Methodologies: Agile (Scrum/Kanban), SDLC, CI/CD, Waterfall

Programming Languages: Python, SQL, R, Bash

Data Processing & Analytics: PySpark, Pandas, NumPy, Dask, Azure Data Explorer

Cloud Platforms: Azure (Data Factory, Synapse, Functions, Blob Storage, Cosmos DB), AWS (S3, Lambda, EC2, SageMaker), GCP (BigQuery, Cloud Functions)

Big Data & Streaming: Apache Spark, Apache Kafka, Hadoop, Snowflake, Delta Lake

ETL & Orchestration Tools: Azure Data Factory, Apache Airflow, Databricks, Informatica, SSIS, Azure Key Vault, Logic Apps, Confluence

Version Control & Collaboration: Git, GitHub, GitLab, Bitbucket, Azure DevOps

Visualization & BI Tools: Power BI, Tableau, Matplotlib, Seaborn

Databases: SQL Server, MySQL, PostgreSQL, MongoDB, Redis, Azure Cosmos DB

Other Tools & Frameworks: Jupyter Notebook, Flask, FastAPI, Docker, Kubernetes

Operating Systems: Windows, Linux (Ubuntu/CentOS), macOS

EXPERIENCE

Citibank, USA Data Engineer Jan 2025 – Present

Built scalable real-time streaming pipelines using Pub/Sub and Cloud Composer, processing 500K+ events/minute and reducing pipeline development time from 5 days to 2 days.

Engineered automated ETL workflows using Dataflow, Python, and SQL, processing 1M+ records/day and eliminating 8 hours/week of manual intervention.

Developed operational monitoring dashboards using Power BI & Tableau for pipeline health, KPIs, and model performance.

Implemented BigQuery ML–based anomaly detection, reducing incident response time by 30 minutes per event.

Automated daily metrics tracking with SQL + Cloud Monitoring, increasing system uptime to 99.98%.

Performed data preprocessing and feature engineering for analytical datasets using Pandas and NumPy.

Tech Stack: GCP (BigQuery, Dataflow, Pub/Sub, Composer), Python, SQL, Airflow, Monitoring

CVS Health, USA Data Engineer Jun 2024 – Dec 2024

Optimized SQL/PL-SQL workloads, boosting transaction processing efficiency by 40%.

Built real-time streaming pipelines using Oracle GoldenGate + Kafka, processing 2M+ events/min and reducing event latency by 30%.

Automated ingestion of structured/unstructured data into OCI Object Storage, reducing manual effort by 60%.

Designed ETL flows using Python, SQL, and Hadoop, improving structured dataset processing by 40%.

Improved Spark batch-processing on AWS EMR, reducing pipeline execution time by 30%.

Implemented Kafka schema validation, enhancing data quality by 30%.

Tech Stack: Oracle, PL/SQL, Kafka, GoldenGate, Hadoop, Python, AWS EMR

Cognizant Technology Solutions, India Data Engineer Aug 2021 – Aug 2023

Designed large-scale ETL pipelines with ADF + Databricks + PySpark, processing 5TB+ healthcare data daily.

Implemented Delta Lake on ADLS and AWS S3, cutting storage costs by 30% and improving data versioning.

Automated CI/CD workflows using GitLab CI/CD, Jenkins, Terraform, reducing deployment cycles by 50%.

Developed Snowflake/BigQuery data models, reducing query execution time by 35%.

Ensured HIPAA-compliant data movement, increasing data access speed for clinicians and enabling real-time insights.

Tech Stack: Azure (ADF, ADLS, Synapse), Databricks, PySpark, Snowflake, BigQuery, GitLab, Jenkins, Terraform

Nefroverse, India Data Engineer May 2020 – Jul 2021

Optimized ETL/ELT pipelines with Airflow, dbt, and AWS Glue, reducing data processing time by 30%.

Improved Glue job efficiency by 25%, accelerating real-time financial workflows.

Processed large-scale financial datasets using Kafka + Spark, boosting ingestion rates by 40%.

Managed cloud data warehouses (Snowflake, Redshift, BigQuery), reducing storage costs by 25%.

Automated data workflows using Python, improving operational efficiency by 40% and ensuring timely analytics delivery.

EDUCATION

Master's in Data Science - Pace University, New York, USA

Bachelor of Technology in Computer Science & Engineering - GITAM University, India


