Hetal Vaghela Ph. +1-617-***-**** Email: **********@*****.***
Boston, MA LinkedIn: https://www.linkedin.com/in/hetal42/
SUMMARY
Innovative and performance-oriented Data Engineer with 4+ years of experience designing, building, and optimizing large-scale data solutions in cloud and hybrid environments. Proven expertise in crafting robust ETL pipelines, real-time streaming, and data lakes using AWS, GCP, Hadoop, and Snowflake. Adept at transforming raw data into business-ready insights through automation, machine learning models, and advanced analytics. Strong background in healthcare and financial domains, delivering secure, cost-effective, and compliant solutions that improve efficiency, reduce latency, and support strategic decision-making.
PROFESSIONAL EXPERIENCE
Data Engineer CVS Health USA Jun 2024 – Present
● Implemented Agile methodologies, collaborating with cross-functional teams to enhance project flexibility and accelerate development cycles by 20%, reducing time-to-market for data-driven healthcare initiatives.
● Designed and deployed end-to-end ETL pipelines using Python and SQL within the Hadoop ecosystem to automate the data transformation and ingestion of structured data, improving data processing efficiency by 40%.
● Migrated ETL workflows from SSIS to AWS Glue, integrating relational databases with cloud storage, enabling a seamless transition to cloud-based processing, enhancing scalability, and lowering infrastructure maintenance costs.
● Streamlined Spark-based data pipelines to process large volumes of data warehouse and supply chain data, reducing batch processing times by 30% and ensuring accurate data models for downstream analytics and reporting.
● Engineered real-time data pipelines using Apache Kafka to stream pharmacy inventory and order data while integrating data modeling and data governance practices such as schema validation to ensure compliance and improve data quality by 30%.
● Accelerated data analysis workflows by leveraging Amazon Athena, executing SQL queries directly on S3 data, and reducing query response times by 20% for business decisions.
● Devised advanced CI/CD pipelines, incorporating DevOps best practices to automate the deployment of healthcare data engineering workflows while optimizing data structures and minimizing deployment errors by 25%.
● Crafted interactive dashboards in Amazon QuickSight to visualize critical metrics such as prescription trends and inventory levels in real time, enhancing decision-making and improving operational efficiency by 25%.
Data Engineer Magna Infotech India Jun 2018 – Jul 2022
● Developed and optimized cloud-based data pipelines using Google Cloud Dataflow on GCP to unify transactional and structured data, ensuring seamless operations in production through testing and data validation.
● Integrated Hadoop and Spark with SQL databases to enhance financial data warehousing and accelerate OLAP processes, reducing data retrieval times by 50% for faster business insights.
● Spearheaded the adoption of Snowflake for advanced data warehousing, improving scalability and reducing query execution times by 45%, enabling faster insights for credit risk and portfolio analysis.
● Built data marts and data models using Google BigQuery to streamline the management of financial datasets, improving reporting accuracy by 35% and enabling the development of data-driven financial solutions.
● Executed MapReduce jobs to process and transform large-scale datasets, improving data management and accuracy by 30% while supporting critical reporting and analytics workflows.
● Applied machine learning models using TensorFlow and Scikit-learn to forecast financial market trends with 20% higher accuracy, enhancing risk management strategies and supporting data-driven decision-making.
● Orchestrated containerized deployments with Docker and Kubernetes to support scalable data integration and financial data processing workflows, ensuring operational efficiency in cloud environments.
● Created customized financial dashboards using Google Data Studio, enabling real-time insights, enhancing strategic planning, and improving decision-making efficiency by 40%.
PROJECT
● Built a healthcare data platform processing patient records from 1,000+ locations, improving efficiency by 40% using AWS Glue and Apache Kafka.
● Migrated SSIS to cloud ETL workflows, integrating EHR and pharmacy data with HIPAA compliance, reducing costs by 35%.
● Developed ML models using TensorFlow to predict healthcare costs and detect insurance fraud, achieving 20% higher accuracy.
● Optimized Apache Spark pipelines for patient data processing, reducing batch times by 30% for clinical analytics.
● Implemented real-time streaming with Kafka for pharmacy inventory and prescription processing.
● Migrated from Oracle to a Snowflake data warehouse, achieving 45% faster query performance for healthcare financial data.
● Created automated CI/CD pipelines for healthcare data workflows, reducing deployment errors by 25% with integrated testing.
● Built a data quality framework with Python validation scripts, improving healthcare data accuracy by 30% across patient systems.
TECHNICAL SKILLS
● Programming Languages: Python, R, SQL, Java, C
● Databases: MySQL, PostgreSQL, SQL Server, MongoDB, NoSQL
● Big Data Technologies: Hadoop (HDFS), Spark, Snowflake, BigQuery
● ETL/ELT Tools: Apache Airflow, AWS Glue, DBT, SSIS
● Cloud Platforms: AWS (EC2, S3, RDS, EMR, Lambda, Kinesis, Redshift, Glue, IAM), Azure (Data Factory, Databricks, Data Lake, SQL Database), GCP (Cloud Dataflow, BigQuery, Data Studio)
● Machine Learning & Data Analysis: TensorFlow, Scikit-learn, NumPy, Pandas, SciPy, Matplotlib, Seaborn, ggplot2
● Visualization Tools: Tableau, Power BI, QuickSight, Google Data Studio
● Development Tools & IDEs: PyCharm, Jupyter Notebook, Visual Studio Code, Git, GitHub
● Containerization & Orchestration: Docker, Kubernetes
● Methodologies: Agile, Scrum, DevOps, CI/CD, Jira
EDUCATION
M.S. Information Technology - The University of Massachusetts Boston GPA – 3.78/4.0 Aug 2022 – May 2024
Bachelor's in Information Technology - Gujarat Technological University GPA – 8.04/10.0 Jul 2014 – May 2018