
Data Engineer Processing

Location:
Ahmedabad, Gujarat, India
Salary:
85000
Posted:
October 15, 2025

Contact this candidate

Resume:

Bala Sai

Data Engineer

Cleveland, OH - ***** 216-***-**** ***************@*****.*** linkedin.com/in/bala-sai-388722348/

PROFESSIONAL SUMMARY

Data Engineer with 5+ years of experience designing and building scalable data pipelines across AWS, Azure, and GCP.

Developed high-performance ETL/ELT workflows, reducing data processing time by 40% and improving query speed by 30% in Snowflake and Redshift.

Managed petabyte-scale datasets, ensuring 99.9% pipeline uptime while reducing storage costs by 25%.

Leveraged Apache Spark, Kafka, Airflow, and Terraform to support real-time and batch data processing for analytics and business intelligence.

Designed and implemented secure, cloud-optimized data architectures, maintaining compliance with industry regulations.

Automated CI/CD deployments, increasing release efficiency by 50% and lowering infrastructure costs by 18%.

Collaborated with cross-functional teams to transform complex data challenges into scalable solutions that drive business impact.

EDUCATION

Master's in Information Technology 08/2024

Campbellsville University, KY, USA

TECHNICAL SKILLS

Languages: Python (Pandas, NumPy, SQLAlchemy), R Programming, SQL

Databases & Warehousing: SQL Server, PostgreSQL, MySQL, Oracle, MongoDB, Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics

Cloud & Infrastructure: AWS (S3, Redshift, Glue, Lambda, EMR, EC2, Route53, Elastic Beanstalk), Azure (Data Factory, Databricks, Blob Storage, Azure Functions, Azure Data Lake Storage), GCP (BigQuery), Infrastructure as Code (IaC), Terraform

Big Data Technologies: Apache Spark (PySpark, Spark SQL), Apache Kafka, Apache Airflow, Apache Hive, Apache NiFi, Apache Flink, Hadoop (HDFS, YARN, MapReduce)

ETL / ELT & Data Pipelines: AWS Glue, Talend, Informatica, APIs

Data Modeling & Architecture: Dimensional Modeling, Star Schema, OLAP, Relational Modeling

DevOps & Infrastructure: CI/CD (Jenkins, GitHub Actions), Version Control (Git, GitHub, GitLab, Bitbucket), Monitoring (Prometheus, Grafana)

Data Visualization & Analytics: Tableau, Power BI, Amazon QuickSight

Machine Learning & AI: Supervised and Unsupervised Learning

Methodologies: SDLC, Agile, Scrum, Kanban, Waterfall

PROFESSIONAL EXPERIENCE

Data Engineer 10/2024 – PRESENT

Cigna Healthcare, USA

Designed ETL/ELT pipelines using AWS (S3, Redshift, Glue, Elastic Beanstalk), reducing ingestion time by 35% for datasets from APIs, databases, and SFTP, while configuring Elastic Beanstalk for scalable, high-availability deployments.

Reengineered Apache Airflow workflows with DAGs, task dependencies, and parallel processing, cutting execution time by 30% and enhancing reliability.
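The DAG pattern this bullet describes can be sketched without Airflow itself, using the standard-library graphlib to show how declared task dependencies yield batches that run in parallel. The task names and edges below are hypothetical, not taken from the actual pipelines.

```python
from graphlib import TopologicalSorter

# Hypothetical task dependency graph: each key lists the upstream tasks it
# must wait for, mirroring how an Airflow DAG declares dependencies.
deps = {
    "extract_api": set(),
    "extract_db": set(),
    "transform": {"extract_api", "extract_db"},
    "load_warehouse": {"transform"},
}

def parallel_batches(graph):
    """Yield groups of tasks whose upstreams are done, so each group can run concurrently."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # tasks with no pending upstreams
        batches.append(ready)
        ts.done(*ready)
    return batches

print(parallel_batches(deps))
# The two extract tasks land in the same batch, so they can run in parallel.
```

In Airflow the same structure would be expressed with operators and `>>` dependency arrows; the batching behavior shown here is what lets independent extracts overlap and cut total execution time.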

Developed Snowflake data models with multi-cluster warehouses, materialized views, and partitioning, reducing query costs by 40%.

Built a centralized AWS S3 data lake with partitioned storage and lifecycle automation, decreasing storage costs by 20% and boosting retrieval speeds.
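The lifecycle automation mentioned here typically takes the form of an S3 lifecycle configuration that tiers objects to cheaper storage classes as they age. The sketch below builds such a rule as a plain dict; the prefix, day thresholds, and bucket name are hypothetical.

```python
# Illustrative S3 lifecycle rule: transition aging objects to cheaper
# storage classes, then expire them. All names and thresholds are invented.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-and-expire-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Applied with boto3 (requires AWS credentials), e.g.:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-data-lake", LifecycleConfiguration=lifecycle_config
# )
```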

Applied Python (Pandas, SQLAlchemy) in AWS Glue and Lambda, automating ingestion and transformations to save 15+ hours weekly and improve efficiency by 45%.
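A minimal sketch of the Pandas-plus-SQLAlchemy ingestion pattern described, with SQLite standing in for the warehouse. Table and column names, and the sample records, are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the target warehouse.
engine = create_engine("sqlite:///:memory:")

raw = pd.DataFrame(
    {"patient_id": [1, 2, 2, 3], "charge": ["100.5", "200.0", "200.0", None]}
)

cleaned = (
    raw.drop_duplicates()              # drop exact duplicate rows
       .dropna(subset=["charge"])      # discard rows missing a charge
       .assign(charge=lambda d: d["charge"].astype(float))  # cast to numeric
)

cleaned.to_sql("charges", engine, index=False, if_exists="replace")
loaded = pd.read_sql("SELECT COUNT(*) AS n FROM charges", engine)
print(loaded["n"].iloc[0])
```

In AWS Glue or Lambda the same transform logic runs inside the job handler, with the engine pointed at Redshift or another managed store instead of SQLite.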

Implemented CI/CD with Jenkins, GitHub Actions, and Terraform, cutting deployment cycles by 50%, provisioning time by 60%, and configuration errors by 85%.

Partnered with business teams to deliver real-time insights via Tableau, Power BI, and QuickSight dashboards, supporting executive decision-making.

Established security best practices with encryption and access controls, ensuring GDPR, CCPA, HIPAA compliance.

Mentored junior engineers on SQL, AWS, and ETL, improving team efficiency by 25% through workshops.

Monitored and optimized pipeline performance, reducing cloud compute costs by 18%.

Data Engineer 02/2020 – 07/2022

Transol Systems, India

Explored data processing strategies for Generative AI models (GPT, BERT, DALL-E), refining data storage efficiency by 30%.

Structured SQL-based data pipelines to ingest, clean, and normalize research data from 200+ sources, improving dataset accessibility in AWS S3.

Executed Python-based preprocessing workflows (Pandas, NumPy), cutting data preparation time by 40% while maintaining dataset integrity.
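One common form of the NumPy preprocessing described is min-max scaling a feature matrix so each column lies in [0, 1]; the values below are invented for illustration.

```python
import numpy as np

# Hypothetical feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])

col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min
# Guard against constant columns to avoid division by zero.
X_scaled = (X - col_min) / np.where(col_range == 0, 1, col_range)

print(X_scaled)
```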

Deployed cloud-based AI research environments in AWS, Azure, and GCP, optimizing resource allocation to reduce computational costs by 25%.

Assembled BI dashboards using Tableau and Power BI, providing real-time tracking of AI model accuracy and performance trends.

Integrated Flask and Streamlit applications with structured logging and real-time data retrieval, improving experiment monitoring.

Analyzed unstructured text data using Amazon Transcribe, applying SQL-based indexing and trend analysis, extracting insights that enhanced research accuracy.

Assessed AI-driven decision-making effectiveness using statistical modeling (ANOVA) and database queries, uncovering a 25% improvement in research outcomes.

Jr. Data Engineer 01/2018 – 01/2020

Kranion Technologies, India

Constructed Azure-based data pipelines, processing terabytes of healthcare data from on-premise and cloud sources, improving data accessibility for analytics teams.

Coordinated the implementation of Azure Data Factory ETL workflows, enhancing parallel execution and dependency management, cutting data latency by 50%.

Refined Azure Synapse Analytics performance by rewriting SQL queries, indexing heavily queried tables, and partitioning datasets, increasing speed by 60%.

Leveraged Apache Spark-based transformations in Azure Databricks, using Delta Lake and adaptive query execution, ensuring fault-tolerant processing of terabyte-scale data.

Enhanced automated data validation and anomaly detection systems, improving data accuracy by 30% while reducing manual intervention.
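A simple rule-based anomaly check of the kind this bullet describes can be written with the standard library alone: flag readings whose z-score exceeds a threshold. The readings and the threshold below are hypothetical.

```python
import statistics

def flag_anomalies(values, z_threshold=2.0):
    """Return values more than z_threshold population standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # constant series: nothing to flag
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

# Hypothetical sensor readings with one obvious outlier.
readings = [98.1, 97.9, 98.4, 98.0, 185.0, 98.2]
print(flag_anomalies(readings))
```

Production validation would layer schema and range checks on top of this, but the z-score rule captures the core idea of replacing manual review with an automated threshold.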

Secured Azure Data Lake Storage (ADLS) by enforcing RBAC policies and encryption standards, aligning with compliance requirements.

Migrated legacy on-premise ETL pipelines to a cloud-native architecture, increasing scalability while cutting infrastructure costs by 22%.

Engineered machine learning feature stores, streamlining data retrieval and model training workflows across teams.

Strengthened security frameworks by automating compliance checks for GDPR, HIPAA, and CCPA, improving audit readiness and minimizing risks.

Developed system monitoring tools using Azure Monitor and Log Analytics, detecting failures 30% faster and reducing downtime.


