Post Job Free

Data Engineer Real-Time

Location:
Los Angeles, CA
Posted:
September 10, 2025


Resume:

Disha Kunjadia

+1-917-***-**** ***************@*****.***

SUMMARY

Accomplished Data Engineer with 6+ years of experience designing and implementing scalable data architectures, real-time processing pipelines, and hybrid cloud solutions across AWS, Azure, and GCP; recognized for delivering high-performance, secure, and cost-efficient data platforms in enterprise environments.

Optimized SQL queries and indexing strategies across Amazon Redshift, BigQuery, Athena, and MySQL, resulting in 30% faster query performance and reduced compute costs.

Architected data lake solutions using AWS S3 and Redshift Spectrum, enabling seamless integration of structured and unstructured data and improving analytics efficiency by 50%.

Implemented real-time data streaming workflows using Apache Kafka, AWS Kinesis, and Apache Flink, achieving sub-second data processing latency for operational dashboards.

Designed and automated ETL pipelines using AWS Glue, Apache Airflow, and Talend, processing over 8 TB of structured and semi-structured data from diverse sources while reducing pipeline failure rates by 35%.

Integrated AI/ML models (PyTorch, TensorFlow, Azure ML) into data pipelines to drive predictive analytics, improving customer churn prediction accuracy to 85%+.

Developed Spark-based batch and stream processing jobs on AWS EMR and Azure HDInsight, improving data processing throughput by 40% for high-volume ingestion pipelines.

Implemented CI/CD pipelines using GitLab, Docker, and Kubernetes (EKS), accelerating data deployment cycles by 60% and enabling consistent infrastructure delivery.

Created interactive dashboards in Tableau, Power BI, and Looker, providing real-time insights to business stakeholders and reducing manual reporting time by 70%.

Built hybrid cloud data workflows across AWS and GCP, integrating services like Redshift, BigQuery, and Athena to enable cross-cloud analytics and reduce data silos.

Engineered data validation and quality assurance frameworks using Python, Pandas, and SQL, increasing data reliability and reducing error rates by over 90%.

Enforced data security and compliance through IAM policies, encryption, and data masking techniques, ensuring alignment with GDPR and HIPAA standards.

Leveraged OpenAI API and Hugging Face Transformers to integrate NLP into reporting pipelines, automating sentiment analysis and summarization tasks for customer feedback.

Built automated monitoring and alerting systems using CloudWatch, Datadog, and ELK Stack, ensuring 99.9% pipeline uptime and proactive failure detection.

Collaborated cross-functionally with analysts, ML engineers, and business stakeholders to deliver end-to-end data solutions, reducing time-to-insight by 40% and driving data-driven decision-making.

SKILLS

Scripting Languages: Python, SQL, Bash, PowerShell, Java, Scala
Databases: SQL, MySQL, SAP HANA, CosmosDB, MongoDB
AWS Services: AWS S3, AWS Redshift, AWS RDS, AWS DynamoDB, AWS Glue, AWS Data Pipeline, Kinesis, EMR, Amazon Aurora
Visualization Tools: Tableau, Power BI, Looker, Microsoft Excel, Amazon QuickSight
ETL Tools: Alteryx, Apache NiFi, Apache Airflow, Azure Data Factory (ADF), Apache Spark, Talend
Packages: Pandas, NumPy, Matplotlib
AI/ML Tools: Azure ML Studio, TensorFlow, PyTorch, Keras, AWS SageMaker, Azure ML, Scikit-learn
GenAI: OpenAI API, ChatGPT, Hugging Face Transformers
Big Data: Apache Hadoop, Apache Kafka, Apache Flink, HBase
MLOps, Cloud Ecosystem & DevOps: AWS, Azure, GCP (BigQuery), Snowflake, Docker, Kubernetes, CI/CD Pipelines, AWS DevOps, Azure Data Lake Storage (ADLS), Databricks on Azure and AWS, Lakehouse Architecture
Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Star Schema, Data Modeling
Version Control: Git, GitHub, GitLab
Monitoring & Logging: CloudWatch, Datadog, ELK Stack, Prometheus, Grafana
Other: SDLC, Agile Methodologies, Data Cleaning, Automation, Problem Solving, Critical Thinking, Root Cause Analysis, A/B Testing, Bugzilla

PROFESSIONAL EXPERIENCE

HEALTH CAROUSEL – SAN JOSE, CA  May 2022 – Present

Data Engineer

Architected and deployed an advanced analytics platform using AWS DynamoDB, RDS, and Redshift, supporting scalable data storage and real-time decision-making for 5+ departments.

Built and optimized ETL pipelines using AWS Glue, Apache Spark, and Python, ingesting and transforming over 6TB of data daily from MongoDB, S3, and external APIs.

Created interactive dashboards in Tableau and Power BI, leveraging preprocessed data via Python (Pandas, NumPy), reducing manual reporting efforts by 65%.

Engineered Apache Kafka streaming pipelines and configured producer-consumer topics, supporting real-time analytics and reducing processing delays by 80%.

Developed serverless data workflows using AWS Lambda, automating routine transformations and reducing infrastructure costs by 30%.

Designed a cloud-native data lake architecture on AWS S3, integrating structured and unstructured data to support ML and advanced analytics across business units.

Integrated NLP models (Hugging Face, OpenAI API) into reporting pipelines for sentiment analysis, automating customer feedback classification with 90% accuracy.

Wrote complex SQL queries to aggregate data across AWS DynamoDB, RDS, and Redshift, enabling unified customer intelligence and cross-platform reporting.

Automated data ingestion from AWS RDS and MongoDB using Apache Airflow, improving data freshness and reducing batch latency by 40%.

Developed Kafka producer-consumer applications and integrated with Spark jobs for distributed data processing, ensuring 99.95% uptime of streaming workflows.

Established CI/CD pipelines with GitLab, Docker, and Kubernetes, decreasing deployment time by 60% and increasing environment consistency.

Implemented data validation and aggregation workflows using AWS Glue and Spark, reducing ETL execution time by 50% on high-volume datasets.

Ensured secure access and data compliance (GDPR, HIPAA) through IAM policies, data encryption, and masking, achieving 100% audit-readiness.

Collaborated in Agile sprints with cross-functional teams—data scientists, analysts, and product owners—to align engineering output with business KPIs, reducing analytics lead time by 35%.

HDR – SAN FRANCISCO  Jan 2021 – April 2022

Cloud Data Engineer

Architected and automated scalable ETL pipelines using Azure Data Factory and AWS Glue, integrating over 20 data sources into centralized S3 and Synapse-based data lakes, improving data ingestion efficiency by 55%.

Designed a secure and scalable data lake on AWS S3 to store and retrieve structured and unstructured data, enabling 30% faster access for analytics and ML workflows across 3 departments.

Built real-time analytics dashboards using AWS Kinesis, Power BI, and Tableau, visualizing live KPIs from streaming data and reducing time-to-decision for stakeholders by 40%.

Integrated advanced machine learning models into pipelines using TensorFlow, PyTorch, and Azure ML, automating model training/inference and enhancing predictive analytics capabilities.

Automated cloud monitoring and alerts with AWS CloudWatch and Azure Monitor, maintaining 99.95% uptime and enabling proactive issue resolution in production workflows.

Developed robust SQL-based validation frameworks using Python (Pandas, NumPy) and SQL, ensuring 99.9% data integrity across multi-cloud environments (AWS & Azure).

Orchestrated data workflows with Apache Airflow, reducing pipeline failure rates by 70% through automated retries, dependency handling, and failure notifications.

Integrated GenAI solutions using OpenAI APIs and Hugging Face Transformers for NLP-driven tasks, boosting operational efficiency in reporting by 45%.

Optimized Apache Spark jobs on Azure HDInsight and AWS EMR, reducing large-scale data transformation times by 60% and cutting resource usage by 35%.

Collaborated in Agile environments alongside data scientists, DevOps, and product managers, ensuring timely delivery of cloud-native data solutions that aligned with business goals.
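The automated-retry and failure-notification behavior described in the Airflow orchestration bullet above can be sketched in plain Python (a minimal illustration of the pattern, not the actual pipeline code; `flaky_extract` and all names are hypothetical):

```python
import time
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(task, max_retries=3, delay_seconds=0, on_failure=None):
    """Run `task` (a zero-argument callable), retrying on failure.

    Mirrors what Airflow provides declaratively via default_args
    such as `retries`, `retry_delay`, and `on_failure_callback`.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_retries, exc)
            if attempt == max_retries:
                if on_failure is not None:
                    on_failure(exc)  # e.g. send an email/Slack alert
                raise
            time.sleep(delay_seconds)

# Example: a task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source error")
    return "8 rows loaded"

result = run_with_retries(flaky_extract, max_retries=3)
```

In a real DAG the same effect comes from operator-level retry settings rather than a hand-rolled loop; the sketch only shows why retries with a failure callback reduce pipeline failure rates.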

EVERIS – NEW YORK, NY  Aug 2019 – Dec 2020

Data Engineer

Architected and deployed cloud-native analytics platforms on AWS S3 and GCP BigQuery, enabling real-time access to structured and unstructured data across 8+ business units.

Built scalable ETL pipelines using Apache Airflow, AWS Glue, and Python, automating ingestion from MySQL, AWS RDS, and REST APIs, improving pipeline efficiency by 45%.

Integrated serverless transformations using AWS Lambda, reducing manual overhead and lowering operational costs by 30% across multiple workflows.

Engineered real-time streaming solutions using GCP Dataflow, Apache Flink, and Amazon Kinesis, enabling sub-second latency for critical data ingestion pipelines.

Optimized SQL queries in GCP BigQuery and AWS Redshift, reducing average query execution time by 60% and accelerating BI reporting cycles.

Developed Kafka-based producer-consumer applications to support distributed real-time analytics with 99.9% uptime, reducing lag in high-throughput environments.

Automated data validation and testing processes using Pandas and SQL, increasing data quality assurance coverage by 80% across hybrid cloud systems.

Created dynamic dashboards using Power BI, Tableau, and Matplotlib, reducing manual reporting by 70% and delivering actionable insights for executive teams.

Provisioned infrastructure using AWS CloudFormation and Terraform, ensuring consistent, repeatable deployment pipelines with zero manual configuration errors.

Implemented real-time anomaly detection workflows by integrating Flink, Kinesis, and Python logic, enabling 24/7 monitoring and reducing fraud risk by 35%.

Migrated legacy batch jobs to Spark on AWS EMR and Azure HDInsight, increasing data processing speed by 50% and reducing runtime failures.

Automated A/B testing workflows using Python and SQL, improving test result turnaround time by 40% and supporting product optimization decisions.

Deployed NLP models with Hugging Face and OpenAI APIs, automating sentiment analysis pipelines with 92% classification accuracy.

Established end-to-end monitoring and alerting with AWS CloudWatch and GCP Stackdriver, reducing downtime incidents by 60% and improving pipeline SLA compliance.

Collaborated with cross-functional Agile teams, aligning KPIs with engineering output and reducing analytics delivery timelines by 30% per sprint.

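The real-time anomaly-detection workflow mentioned in the Everis bullets (Flink, Kinesis, and Python logic) can be illustrated with a rolling z-score check in plain Python (a simplified sketch of one common approach; the data and threshold are illustrative assumptions):

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(values, window=10, threshold=3.0):
    """Flag indices whose value deviates more than `threshold`
    standard deviations from the rolling mean of the previous
    `window` points."""
    history = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(v)
    return anomalies

# A steady signal with a single spike at index 15.
stream = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.1,
          10.0, 10.2, 9.9, 10.1, 10.0, 55.0, 10.0, 9.9]
print(detect_anomalies(stream))  # the spike at index 15 is flagged
```

In a streaming deployment the same per-record logic would run inside a Flink operator or a Kinesis consumer instead of a Python loop over a list.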

EDUCATION

Bachelor in Computer Engineering  Aug 2016 – June 2019
Sanghvi College of Engineering, Mumbai, India


