
Data Engineer Google Cloud

Location:
Wilmington, Devon, United Kingdom
Posted:
June 17, 2025


DWARAKA N

Data Engineer

Email: ***.*****@*****.*** LinkedIn: https://www.linkedin.com/in/dwaraka-n-9a59a7196/

Mobile: 610-***-****

PROFESSIONAL SUMMARY

Over 6 years of experience in the Data Engineering field across major clients, including CVS Health, Chase, and AT&T.

Extensive experience with ETL development, designing data pipelines, and optimizing processes to handle large-scale datasets.

Deep knowledge of cloud platforms including AWS, Azure, and Google Cloud for scaling data solutions.

Expertise in big data tools including Apache Hadoop, Apache Spark, Kafka, Hive, and Flume to process and analyze large datasets.

Proficient in SQL, Python, and Java for scripting and querying large datasets efficiently.

Skilled in using data warehousing solutions such as AWS Redshift, Google BigQuery, Snowflake, and Teradata.

Experience in data orchestration using Apache Airflow and Apache Nifi for managing workflows and scheduling ETL jobs.

Developed real-time data processing pipelines using Apache Kafka and Apache Spark Streaming for data ingestion and transformation.
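
For illustration, a minimal PySpark Structured Streaming sketch of this pattern follows; the broker address, topic name, schema, and S3 paths are hypothetical placeholders rather than details from any engagement listed here.

# Minimal sketch: Kafka -> Spark Structured Streaming -> Parquet (illustrative names only)
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_stream_example").getOrCreate()

# Hypothetical schema for the incoming JSON messages
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events_topic")                # placeholder topic
       .load())

# Parse the Kafka value payload and keep only well-formed records
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .filter(col("event_id").isNotNull()))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")                     # placeholder path
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .start())
query.awaitTermination()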

Created and managed shell scripts on Unix to support job scheduling, file transfers, and monitoring within on-prem and hybrid ETL frameworks.

Managed collaborative codebases using GitHub and Bitbucket, with version-controlled deployment pipelines in Jenkins and Terraform.

Expertise in cloud storage solutions like AWS S3, Azure Blob Storage, and Google Cloud Storage for efficient data storage and retrieval.

Built and maintained scalable ETL pipelines for continuous data processing using Python, Apache Nifi, and Apache Spark.

Tuned the performance of data pipelines and SQL queries to improve processing times and optimize resource usage.

Automated data refreshes and reporting tasks using Apache Airflow, improving efficiency by 30%.
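
A minimal sketch of the kind of Airflow DAG this describes, with a hypothetical DAG id, schedule, and task callables:

# Minimal Airflow DAG sketch: scheduled data refresh followed by report generation
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def refresh_data(**context):
    # Placeholder: pull the latest source extracts and reload the staging tables
    pass

def build_report(**context):
    # Placeholder: regenerate the downstream reporting tables and dashboards
    pass

with DAG(
    dag_id="daily_data_refresh",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",        # run every day at 06:00
    catchup=False,
) as dag:
    refresh = PythonOperator(task_id="refresh_data", python_callable=refresh_data)
    report = PythonOperator(task_id="build_report", python_callable=build_report)
    refresh >> report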

Hands-on experience in containerization technologies like Docker and Kubernetes for deploying and managing data applications.

Expertise in building data visualization dashboards using Tableau and Power BI for real-time insights and reporting.

Implemented data security practices ensuring compliance with industry regulations (HIPAA, GDPR) and protecting sensitive data.

Led data migration projects to move legacy systems to modern cloud-based infrastructures, improving scalability and performance.

Collaborated closely with data scientists to deploy machine learning models and support predictive analytics use cases.

Actively participated in Agile/Scrum teams, providing timely updates and coordinating efforts with business stakeholders.

Technical Skills

Programming Languages

Python, SQL, Java, Shell Scripting

Big Data Technologies

Apache Hadoop, Apache Spark, Apache Kafka, Hive, HBase, Flume, Pig

ETL Tools

Apache Nifi, Talend, Informatica, SSIS, Python (Pandas, Dask), Airflow

Cloud Platforms

AWS (S3, Redshift, EMR, Lambda, EC2, Glue), Azure, Google Cloud (BigQuery, Dataflow, Pub/Sub)

Data Warehousing Solutions

Snowflake, Amazon Redshift, Teradata, Google BigQuery

Data Orchestration & Workflow Automation

Apache Airflow, Apache Nifi, Apache Oozie

Databases

SQL Server, MySQL, PostgreSQL, MongoDB, Cassandra

Version Control

Git, GitHub, Bitbucket

Containerization & Virtualization

Docker, Kubernetes

Data Visualization & Reporting Tools

Tableau, Power BI, QlikView

Data Security & Compliance

HIPAA, GDPR, Data Encryption, Role-based Access Control (RBAC)

Monitoring Tools

AWS CloudWatch, Prometheus, Grafana

DevOps Tools

Jenkins, Terraform, Ansible, CI/CD pipelines

Professional Experience

CVS Health, Data Engineer, September 2023 – Present

Developed and optimized ETL pipelines using Apache Spark and Python to ingest and transform healthcare data for reporting and analytics.
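
As a sketch of this kind of batch ETL step, assuming a hypothetical raw claims extract with illustrative column names and S3 paths:

# Minimal batch ETL sketch: read raw claims, standardize, and write curated output
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim, upper

spark = SparkSession.builder.appName("claims_etl_example").getOrCreate()

# Hypothetical raw claims extract landed as CSV
raw = spark.read.option("header", True).csv("s3a://example-raw/claims/")

curated = (raw
           .withColumn("claim_date", to_date(col("claim_date"), "yyyy-MM-dd"))
           .withColumn("member_id", trim(col("member_id")))
           .withColumn("claim_status", upper(col("claim_status")))
           .dropDuplicates(["claim_id"])
           .filter(col("claim_amount").cast("double") >= 0))

(curated.write
 .mode("overwrite")
 .parquet("s3a://example-curated/claims/"))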

Designed and implemented real-time data processing solutions using Apache Kafka and AWS Lambda, enabling instant insights into patient data.
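
A simplified sketch of a Lambda handler consuming Kafka-sourced events; the event shape shown assumes a Kafka event source mapping, and the payload handling is a placeholder:

# Minimal Lambda handler sketch for Kafka-sourced events (simplified event shape)
import base64
import json

def handler(event, context):
    processed = 0
    # With a Kafka event source mapping, records arrive grouped by topic-partition
    for _, records in event.get("records", {}).items():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            # Placeholder transformation/routing of the decoded event
            processed += 1
    return {"processed": processed}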

Built scalable cloud architectures on AWS to manage large healthcare datasets with high availability and low-latency access.

Developed and maintained data transformation pipelines using DBT (Data Build Tool), enforcing modular SQL practices and enabling version-controlled, testable ELT workflows in the Snowflake environment.

Automated data workflows using Apache Airflow, reducing the need for manual intervention and increasing pipeline reliability.

Integrated AWS Glue with AWS Lake Formation and S3, enabling fine-grained access control and centralized data governance for secure data sharing.

Integrated data from multiple healthcare systems, creating a unified data lake on AWS S3 for more accessible data storage.

Optimized Snowflake warehouse performance by implementing clustering keys, result caching, and materialized views, reducing query response times and improving data accessibility for analytics teams.
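
A minimal sketch of applying a clustering key and a materialized view through the Snowflake Python connector; the warehouse, table, and column names are illustrative, and credentials are assumed to come from environment variables:

# Minimal sketch: clustering key and materialized view in Snowflake (illustrative names)
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",          # hypothetical warehouse
    database="HEALTHCARE_DB",          # hypothetical database
    schema="CURATED",                  # hypothetical schema
)

try:
    cur = conn.cursor()
    # Cluster the large fact table on the columns most queries filter by
    cur.execute("ALTER TABLE CLAIMS_FACT CLUSTER BY (CLAIM_DATE, PLAN_ID)")
    # Precompute a frequently used aggregate as a materialized view
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS DAILY_CLAIM_TOTALS AS
        SELECT CLAIM_DATE, PLAN_ID, SUM(CLAIM_AMOUNT) AS TOTAL_AMOUNT
        FROM CLAIMS_FACT
        GROUP BY CLAIM_DATE, PLAN_ID
    """)
finally:
    conn.close()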

Developed custom data validation scripts in Python to ensure the accuracy and quality of incoming healthcare data.
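
A minimal pandas-based validation sketch of the sort described here; the required columns and rules are hypothetical examples:

# Minimal data validation sketch using pandas (column names are illustrative)
import pandas as pd

REQUIRED_COLUMNS = ["member_id", "claim_id", "claim_date", "claim_amount"]

def validate(df: pd.DataFrame) -> list:
    """Return a list of human-readable validation failures for a claims extract."""
    errors = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        errors.append(f"missing columns: {missing}")
        return errors
    if df["claim_id"].duplicated().any():
        errors.append("duplicate claim_id values found")
    if df["claim_amount"].lt(0).any():
        errors.append("negative claim_amount values found")
    if df["member_id"].isna().any():
        errors.append("null member_id values found")
    return errors

if __name__ == "__main__":
    frame = pd.read_csv("claims_extract.csv")     # placeholder input file
    for problem in validate(frame):
        print(f"VALIDATION FAILURE: {problem}")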

Enabled GDPR and HIPAA compliance by incorporating data masking, PII filtering, and encryption in AWS Glue pipelines before data distribution.
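
A simplified PySpark masking step of the kind that might run inside such a pipeline; the column names, hashing choice, and redaction pattern are illustrative assumptions:

# Minimal PII-masking sketch for a PySpark (Glue-style) job; columns are illustrative
from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2, col, regexp_replace

spark = SparkSession.builder.appName("pii_masking_example").getOrCreate()

df = spark.read.parquet("s3a://example-raw/patients/")   # placeholder input

masked = (df
          # One-way hash direct identifiers so downstream joins still work
          .withColumn("member_id", sha2(col("member_id"), 256))
          # Redact free-text phone numbers before distribution
          .withColumn("notes", regexp_replace(col("notes"), r"\d{3}-\d{3}-\d{4}", "[REDACTED]"))
          # Drop columns that downstream consumers do not need at all
          .drop("ssn", "date_of_birth"))

masked.write.mode("overwrite").parquet("s3a://example-shared/patients_masked/")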

Built and maintained data models to support healthcare reporting and analytics for various business units.

Worked closely with data scientists to deploy predictive models for patient behavior analysis and clinical decision-making.

Designed optimized SQL queries and stored procedures for high-performance data retrieval in Teradata.

Conducted data mining on large-scale healthcare and financial datasets to identify patterns and derive actionable insights for business stakeholders.

Developed custom dashboards in Tableau to provide real-time insights for decision-makers in the healthcare industry.

Optimized the performance of SQL queries for data reporting, reducing query times by 25%.

Led a data migration project to transition legacy healthcare systems to the cloud, improving scalability and reducing costs by 30%.

Delivered training sessions for business stakeholders, helping them understand data pipelines and reporting tools.

Implemented automated error detection and alerts using AWS CloudWatch to monitor ETL jobs.
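
A minimal boto3 sketch of this kind of alarm; it assumes a hypothetical custom metric emitted by the ETL jobs, and the job name and SNS topic ARN are placeholders:

# Minimal sketch: a CloudWatch alarm on ETL job failures (names/ARNs are placeholders)
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="etl-job-failures",
    Namespace="ETL/Monitoring",          # hypothetical custom metric namespace
    MetricName="FailedJobs",             # hypothetical custom metric emitted by the jobs
    Dimensions=[{"Name": "JobName", "Value": "claims_etl_job"}],   # placeholder job name
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:etl-alerts"],  # placeholder SNS topic
)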

Created documentation for data workflows, providing clarity and transparency for cross-functional teams.

Led a team of data engineers to enhance data pipeline performance and reliability.

Supported disaster recovery and backup strategies, ensuring business continuity for patient data.

Delivered regular status updates and reports on project progress to senior leadership.

Participated in Agile sprints and worked with cross-functional teams to deliver data-driven solutions on time.

Chase, Data Engineer, May 2020 – December 2022

Developed ETL pipelines to ingest financial data using Apache Spark and Snowflake, streamlining the data integration process and enabling faster, scalable analytics within the cloud data warehouse.

Built real-time data processing pipelines using Apache Kafka, ensuring continuous data ingestion and transformation for financial applications.

Integrated data from multiple internal and external APIs to create a unified data lake in AWS S3, making data more accessible for analysis.
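
A minimal sketch of an API-to-S3 landing step; the endpoint URL, bucket, and key layout are hypothetical:

# Minimal sketch: pull records from an API and land them in the S3 data lake
import json
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://api.example.com/v1/transactions"   # placeholder endpoint
BUCKET = "example-data-lake"                           # placeholder bucket

def ingest():
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    records = response.json()

    # Partition the landing zone by ingestion date for easy downstream pruning
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"raw/transactions/ingest_date={today}/part-0000.json"

    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records).encode("utf-8"))

if __name__ == "__main__":
    ingest()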

Designed and developed Snowflake-based data warehouses to support financial analytics, reporting, and compliance tracking across business units.

Designed data models for customer transaction data to enable efficient reporting and analytics.

Implemented automated data quality checks and validation scripts using Python to ensure clean and accurate data.

Improved data processing time by 30% through the optimization of SQL queries and data transformation scripts.

Optimized Glue job performance by partitioning large datasets and fine-tuning Spark parameters, reducing job run times by up to 60%.
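
A minimal PySpark sketch of partitioning a large dataset on write so downstream jobs can prune data; the paths and partition column are illustrative:

# Minimal sketch: partitioned write so downstream readers can prune by date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioned_write_example").getOrCreate()

transactions = spark.read.parquet("s3a://example-raw/transactions/")

(transactions
 .repartition("transaction_date")          # co-locate rows for each partition value
 .write
 .mode("overwrite")
 .partitionBy("transaction_date")          # one folder per day enables partition pruning
 .parquet("s3a://example-curated/transactions/"))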

Led cloud migration efforts to move banking data systems to AWS, improving system scalability and reducing costs.

Developed dashboards and reports in Power BI and Tableau, enabling stakeholders to make informed financial decisions.

Built and maintained CI/CD pipelines using Jenkins to streamline data pipeline deployment and ensure faster release cycles.

Collaborated with data scientists to implement machine learning models for fraud detection and customer segmentation.

Provided ongoing support for cloud-based data environments, ensuring availability and reliability of financial systems.

Worked with the compliance team to ensure that financial data handling met GDPR and other data protection regulations.

Developed modular SQL models and implemented version-controlled transformations using Data Build Tool (DBT) to streamline ELT workflows.

Developed scripts to automate the extraction of historical data for reporting purposes, saving the reporting team 20% of their time.

Optimized Teradata query performance using statistics collection, join indexing, and workload management for high-frequency financial reports.

Participated in Agile development processes, providing regular updates and feedback during sprint reviews.

Created data processing documentation for internal teams, improving team knowledge sharing and operational efficiency.

Reduced data pipeline failure rates by 25% through improved monitoring and error handling.

Played a key role in improving the data warehouse structure, leading to faster reporting and improved customer insights.

Utilized Teradata to support enterprise-level financial reporting and data modeling initiatives.

Designed and implemented data refresh strategies to ensure real-time data availability for critical financial systems.

Provided training and mentoring to junior data engineers, helping them to develop skills in cloud technologies and big data tools.

Worked closely with business analysts and stakeholders to translate business requirements into data engineering solutions.

AT&T, Data Engineer, June 2018 – May 2020

Designed ETL architecture using Python, Apache Nifi, and Hadoop to process telecom data into an Azure-based data lake.

Optimized advanced SQL queries and procedures across Snowflake, PostgreSQL, and Azure Databricks for performance and analytics.

Developed data pipelines using Azure Data Factory and Informatica, automating workflows for consistent and scalable ETL processing.

Refactored legacy SQL and Snowflake logic into DBT and PySpark in Databricks, enhancing performance with Delta Lake.
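
A minimal PySpark sketch of writing a refactored aggregation as a Delta table; it assumes a Databricks/Delta runtime where the target schema exists, and the storage paths, columns, and table name are hypothetical:

# Minimal sketch: refactored aggregation written as a Delta table (illustrative names)
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("delta_refactor_example").getOrCreate()

usage = spark.read.parquet("abfss://raw@exampleaccount.dfs.core.windows.net/usage/")

daily_usage = (usage
               .withColumn("usage_date", to_date(col("event_ts")))
               .groupBy("usage_date", "subscriber_id")
               .sum("data_mb"))

(daily_usage.write
 .format("delta")
 .mode("overwrite")
 .saveAsTable("telecom.daily_usage"))     # registered Delta table for downstream queries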

Built real-time stream processing pipelines using Apache Kafka and Spark Streaming to deliver instant insights.

Engineered real-time ingestion of multi-format data (Parquet, Avro, JSON, XML) using Event Hubs and PySpark transformations.

Migrated large-scale data from on-prem Snowflake to Azure Databricks via Blob Storage and automated ADF pipelines.

Used GitHub for version control and collaborative code management in Agile environments.

Implemented Jenkins CI/CD pipelines for ETL job deployment and automated testing across development stages.

Integrated relational databases such as Oracle and PostgreSQL with NoSQL stores to support cross-system data synchronization.

Ensured system reliability through performance tuning, disaster recovery planning, and detailed technical documentation.

Conducted regular system audits and performance assessments to ensure the integrity and efficiency of telecom data systems.

Supported the development of API-based data interfaces for integrating external data with AT&T’s internal systems.

Delivered comprehensive technical documentation and training to ensure smooth transitions between project phases and team members.

Education

Master’s in Information Technology, Wilmington University


