Data Engineer - AWS/Azure - ETL, Databricks, Snowflake, CI/CD

Location:

Hatfield, Hertfordshire, AL10 9GW, United Kingdom

Salary:

40000

Posted:

December 12, 2025

Resume:

Manne Sai Ramya

Data Engineer

Location: UK Phone: +44-735*-****** Email: ************@*****.*** LinkedIn

Professional Summary

Data Engineer with 2 years of experience designing, building, and optimizing large-scale data platforms across AWS and Azure ecosystems, supporting real-time analytics, ETL automation, and high-volume data processing.

Proficient in developing scalable data pipelines using AWS Glue, Azure Data Factory, Databricks, and PySpark, integrating diverse data sources and transforming over 30M records daily for enterprise clients in healthcare and finance.

Skilled in data warehousing and modeling with Snowflake, Redshift, and Synapse, leveraging star and snowflake schemas to enhance BI and reporting performance across Power BI, Tableau, and QuickSight dashboards.

Strong expertise in DevOps and automation, implementing CI/CD pipelines with Terraform, Jenkins, and Azure DevOps, improving deployment reliability, system uptime (99.9%+), and end-to-end data pipeline efficiency.

Recognized for delivering reliable, secure, and optimized data solutions, ensuring data integrity, governance, and compliance through robust monitoring, validation frameworks, and proactive performance tuning.

Experience

Trinity Technolabs – Data Engineer UK (Remote) Jan 2025 – Current

Architected and deployed automated ETL pipelines in Azure Data Factory and Databricks, processing over 25M records daily across healthcare and finance datasets, reducing data latency from 5 hours to 30 minutes.

Developed Delta Lake architecture in Synapse Analytics to ensure ACID transactions, enabling seamless incremental data loading and improving reliability for real-time analytical dashboards.

Integrated multiple data sources using REST APIs, Kafka, and Snowflake connectors, ensuring secure and consistent ingestion of structured and semi-structured data across 10+ systems.

Optimized Databricks cluster configurations, partitioning, and caching strategies, cutting compute costs by 28% and enhancing pipeline throughput for high-volume batch workloads.

Automated CI/CD deployment of data pipelines using Terraform, Jenkins, and Azure DevOps, increasing deployment speed by 3x and minimizing manual configuration errors.

Implemented comprehensive data validation frameworks using Python and SQL, detecting anomalies across 40+ data sources, ensuring data integrity and regulatory compliance.

Collaborated with data analysts and architects to design star and snowflake schemas, improving query performance for Power BI and Tableau dashboards, accelerating report generation by 60%.

Monitored production pipelines with Grafana, Prometheus, and Azure Monitor, establishing automated alerts and achieving 99.97% pipeline reliability, preventing downtime and SLA breaches.

Trinity Technolabs – Data Engineer India Jun 2022 – Aug 2023

Architected end-to-end data ingestion pipelines using AWS Glue, Lambda, and S3, automating extraction from 20+ enterprise data sources, improving pipeline efficiency by 4.7x and ensuring consistent daily refreshes.

Developed distributed data processing workflows in PySpark on AWS EMR, transforming 15+ terabytes of raw data daily, reducing compute runtime from 6 hours to 75 minutes while maintaining data accuracy across workloads.

Designed and optimized Redshift data warehouse schemas using star and snowflake modeling, accelerating complex analytical queries by 3.5x and supporting 250+ BI reports across departments.

Integrated Kinesis Data Streams for real-time ingestion and analytics, enabling sub-second latency for operational dashboards used by 10 business teams handling high-frequency event data.

Automated validation workflows in Python and SQL, scanning 40+ tables daily to detect anomalies, schema drift, and data mismatches, achieving 99.97% reliability in data quality checks.

Implemented S3 lifecycle policies and Athena partitioning, lowering monthly storage and query overhead by $3,200 and improving ad-hoc query response times by 2.4x.

Configured CI/CD pipelines through AWS CodePipeline and Jenkins, automating 65+ ETL deployments, ensuring consistent versioning and zero manual configuration drift across environments.

Deployed serverless data-access APIs with AWS API Gateway and Lambda, supporting 40+ concurrent applications and improving data retrieval speed from 2.8 seconds to 0.9 seconds per request.

Monitored production workloads with CloudWatch, Grafana, and SNS alerts, achieving 99.98% uptime across 24/7 mission-critical pipelines and reducing incident response time by 90 minutes per event.

Migrated legacy ETL processes from on-prem SSIS to AWS Glue and Redshift, retiring 12 outdated servers and improving job orchestration reliability to 99.95%, while cutting infrastructure costs by $5,000 per month.

Technical Skills

Programming & Scripting: Python, SQL, T-SQL, PySpark, Scala, Bash, Shell Scripting, REST API Integration

Databases & Management: SQL Server, Oracle, MySQL, PostgreSQL, MongoDB, Cassandra; Query Optimization, Database Design, and Administration

Data Warehousing: Snowflake, Azure Synapse Analytics, Amazon Redshift, Google BigQuery

ETL & Data Pipelines: Azure Data Factory (ADF), AWS Glue, Apache Airflow, Talend, Informatica, SSIS, Alteryx; ETL Migration and Orchestration

Big Data & Distributed Systems: Apache Spark, Hadoop, Hive, Kafka, HDFS, Flink, Delta Lake (ACID Transactions, Incremental Loading)

Cloud Platforms: Azure (Data Lake, Synapse, Databricks, Fabric), AWS (S3, EC2, Lambda, Glue, EMR), GCP (BigQuery, Dataflow)

Data Modeling & Integration: Star & Snowflake Schema, Dimensional Modeling, JSON, XML

Performance Optimization: Data Partitioning, Bucketing, Cluster Tuning, and Resource Optimization in Databricks

Migration & Modernization: On-Prem to Cloud (Azure & Snowflake) Data Migration, Synapse to Fabric Pipeline Migration, SSIS to ADF Conversion

Containerization & Orchestration: Docker, Kubernetes

DevOps & CI/CD: Jenkins, Git, Terraform, Ansible, GitHub Actions, Azure DevOps

Monitoring & Logging: Grafana, Prometheus, AWS CloudWatch, Azure Monitor, OpenSearch

Data Visualization & BI: Power BI, Tableau, Looker, QuickSight, Advanced Excel, SSRS

Machine Learning & Analytics: Pandas, NumPy, scikit-learn, TensorFlow, Spark MLlib

Education

University of East London, United Kingdom

Master of Science in Data Science. Sep 2023 – Sep 2024

Vignan Nirula Institute of Technology and Science for Women, India

Bachelor of Technology in Electronics and Communication Engineering.

Projects

Real-Time Retail Sales Data Pipeline on AWS

Designed and implemented a real-time ETL pipeline using AWS Kinesis, Glue, and Redshift, processing over 2 million sales records daily from transactional data streams.

Developed data transformation logic with PySpark and AWS Lambda, enabling near real-time analytics for inventory tracking and pricing optimization.

Automated data validation and reporting through Amazon QuickSight dashboards, improving data accuracy and reducing manual reconciliation time by 6 hours per batch cycle.

Healthcare Data Lakehouse using Azure Databricks & Snowflake

Built an end-to-end data lakehouse architecture using Azure Data Factory, Databricks, and Snowflake, integrating 5+ disparate healthcare datasets for unified analytics.

Implemented Delta Lake for incremental data loading and ACID compliance, improving refresh efficiency and ensuring reliability across analytical workloads.

Orchestrated ETL workflows with Apache Airflow, enabling automated scheduling, monitoring, and pipeline alerting with 99.9% execution reliability.
