
Data Engineer Machine Learning

Location:
Hillsboro, OR
Posted:
October 15, 2025


Resume:

Areej Asad

Data Engineer

***********@*****.*** 551-***-**** Houston, Texas, 77002

LinkedIn: https://www.linkedin.com/in/areej-asad-408aa3376/

SUMMARY

Data Engineer with 5 years of experience in building end-to-end data pipelines, migrating ETL processes to cloud platforms like Azure and AWS, and implementing scalable data solutions.

Expertise in leveraging Python and R for data transformation, statistical analysis, and machine learning model integration within data workflows.

Proficient in designing and optimizing cloud-based data warehouses and data lakes, ensuring efficient storage and retrieval using SQL and NoSQL databases.

Strong background in developing and deploying machine learning models using Scikit-learn, TensorFlow, and MLlib for predictive analytics and data-driven insights.

Experienced in using Azure Data Factory, Databricks, and Apache Spark for orchestrating and executing large-scale data processing and ETL operations.

Skilled in advanced SQL programming, including complex queries, stored procedures, and performance tuning for high-volume data environments.

Hands-on experience with data integration from diverse sources, data cleansing, and feature engineering to support analytics and business intelligence.

Implemented data security protocols and compliance measures to meet HIPAA regulations and financial-industry compliance standards.

Adept at collaborating with data scientists and analysts to operationalize models and deliver integrated data products in agile settings.

SKILLS

Programming Languages & Packages: Python (Pandas, NumPy, PySpark, Scikit-learn), SQL, R (dplyr, ggplot2), Java

Project Methodologies: Agile, Scrum, Kanban, Waterfall

Big Data Technologies: Apache Spark, Hadoop (HDFS, Hive), Kafka, Airflow, HBase, Cassandra

Cloud Platforms: AWS (S3, Redshift, Glue, Lambda), Azure (Data Factory, Databricks, Synapse), GCP (BigQuery, Dataflow)

Databases & Data Warehouses: Snowflake, PostgreSQL, MySQL, SQL Server, Oracle, MongoDB

BI & Visualization Tools: Tableau, Power BI, SSRS, Looker, QlikView

Data Engineering Tools: dbt, SSIS, Talend, Informatica, Apache NiFi, Erwin Data Modeler

DevOps & Infrastructure: Docker, Kubernetes, Jenkins, Git, Terraform, Ansible

Project Management Tools: Jira, Confluence, Trello, Asana

Other Technologies: Linux, Unix, Windows Server, REST APIs, Microservices

EDUCATION

Bachelor of Engineering in Computer Science, Aug 2017 – Nov 2021

NED University of Engineering and Technology (NEDUET), Karachi, Pakistan

CERTIFICATION

AWS Certified Cloud Practitioner (CCP)

EXPERIENCE

Data Engineer, CVS Health Corporation, Irving, Texas, Aug 2023 – Present

Integrated SaaS-based applications and APIs into enterprise data pipelines, ensuring seamless data ingestion and synchronization.

Developed complex SQL scripts and optimized queries in Azure Synapse Analytics for analyzing patient claims data from Medicare and Medicaid systems, improving query performance.

Designed and implemented ETL processes for the Clinical Outcomes Analytics initiative, extracting and transforming data from EHR systems, mainframes, and flat files into Azure Data Lake Storage.

Utilized PySpark for large-scale processing of patient demographic and clinical data, converting SAS scripts to cloud-native solutions for the Population Health Management program.

Created and maintained healthcare data models in Erwin Data Modeler, implementing FHIR-compliant logical and physical designs to support clinical analytics requirements.

Designed, optimized, and executed complex SQL queries, stored procedures, and views for high-volume data environments.

Implemented data cleansing and profiling procedures for the Healthcare Quality Metrics reporting system using Azure Data Lake Storage and Azure Purview, ensuring HIPAA compliance.

Designed and implemented scalable data warehouses in Snowflake, enabling high-performance analytics on large datasets.

Migrated a legacy clinical data warehouse to Azure Synapse Analytics for the Enterprise Healthcare Analytics platform, preserving the integrity of patient claims data.

Data Engineer, KPMG, Karachi, Pakistan, Jan 2021 – Jun 2023

Built and scheduled automated workflows in Apache Airflow, orchestrating ETL jobs across cloud and on-premises systems.

Developed interactive Power BI dashboards for transaction monitoring and AML compliance reporting, improving reporting efficiency.

Built cloud data solutions using AWS S3 for data lake storage alongside serverless processing, reducing infrastructure costs.

Optimized complex SQL queries for AML detection systems, improving query performance.

Processed large-scale banking data using PySpark to develop customer risk scoring models, achieving 95% accuracy in identifying high-risk accounts.

Implemented data quality frameworks with AWS Glue DataBrew, reducing data discrepancies in monitoring systems.

Migrated legacy AML systems to a cloud architecture, improving system reliability.

Documented data governance policies and compliance procedures in Confluence, ensuring adherence to KYC and AML regulations.

Integrated Airflow with Snowflake and cloud platforms (AWS/Azure) to manage end-to-end data processing pipelines.