
FAIZA SADAQAT

Email: **************@*****.*** Contact: 443-***-**** Location: Sterling, VA 20165

LinkedIn: https://www.linkedin.com/in/faiza-sadaqat-09ba85248/

Status: Green Card Holder | Active Public Trust Clearance Holder

Sr. Data Engineer

Professional Summary:

Senior Data Engineer with 11+ years of experience delivering enterprise-scale data platforms, analytics, and machine learning solutions across healthcare, banking, and government sectors.

Specializes in building and optimizing cloud-native data ecosystems leveraging Azure Data Factory, Azure Data Lake, Synapse, Databricks, and Terraform to design automated pipelines, govern data environments, and enable scalable analytics.

Extensive expertise with the Medallion Architecture (bronze, silver, gold layers), engineering structured and unstructured data pipelines in Databricks with PySpark, and implementing governance frameworks with Unity Catalog.

Proven track record of migrating multi-terabyte data environments from on-premises to cloud, optimizing ETL/ELT with Delta Lake and partitioning, and enabling real-time streaming pipelines with Kafka, Event Hubs, and Spark Structured Streaming.

Brings advanced data lakehouse engineering experience combined with ML/AI deployment using MLflow, predictive modeling, and fraud detection. Certified in AWS, Databricks, and Generative AI, with strong multi-cloud expertise (AWS, Azure, GCP) and the ability to bridge data engineering, governance, and advanced analytics for modern enterprise data platforms.

Excellent communication skills; works successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
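
Illustrative sketch (hypothetical, for context only): a minimal PySpark bronze-to-silver refinement step of the Medallion Architecture described above, as it might look in Databricks. The Delta paths, table name, and claim columns are assumptions for illustration, not drawn from any actual engagement.

# Hypothetical bronze-to-silver Medallion step; paths and columns are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically inside Databricks

# Bronze layer: raw ingested records, landed as-is in Delta format
bronze_df = spark.read.format("delta").load("/mnt/lake/bronze/claims")

# Silver layer: deduplicated, typed, and filtered records ready for analytics
silver_df = (
    bronze_df
    .dropDuplicates(["claim_id"])
    .withColumn("claim_date", F.to_date("claim_date"))
    .filter(F.col("claim_amount").isNotNull())
)

(silver_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("claim_date")
    .save("/mnt/lake/silver/claims"))

Partitioning the silver table by date is one common choice for downstream query pruning; the actual partition key would depend on the workload.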

HIGHLIGHTS

Programming: Python (Pandas, NumPy, Scikit-learn, TensorFlow, Keras, Matplotlib), SQL, Shell Script, R

Big Data & ETL: Apache Spark, Hadoop, Kafka, Hive, Delta Lake, Airflow, AWS Glue, Talend, Data Wrangling, Pig Latin, Flume

Cloud Platforms: AWS (S3, Lambda, EMR, Redshift, Glue, DMS, RDS, EC2, CloudWatch), Azure (Data Factory, Data Lake, SQL DW, Synapse, HDInsight, Stream Analytics), GCP (BigQuery)

Data Lakes & Architecture: Medallion Architecture (Bronze, Silver, Gold layers), Azure Data Lake, Databricks, Unity Catalog, Delta Lakehouse, Terraform

Databases: PostgreSQL, MongoDB, Redshift, Snowflake, NoSQL, SQL Server, MySQL, MS Access, DB2, Impala, Oracle, Teradata, ODS

Machine Learning & AI: Linear/Logistic Regression, Decision Trees, Random Forest, K-Means, SVM, XGBoost, Ensemble Methods, Deep Learning (TensorFlow, Keras, PyTorch), NLP, SMOTE, Fraud Detection, Predictive Modeling, Feature Engineering, Hyperparameter Tuning, MLflow

Visualization & BI Tools: Tableau, Power BI, Qlik, Excel, QlikView, Business Objects

DevOps & CI/CD Tools: Docker, Jenkins, Git, GitHub, Bitbucket, Tortoise SVN, DataDog, CI/CD Pipelines, Copado, Jira, Confluence, SharePoint, ProjectLibre

ETL & Data Engineering: ETL/ELT, Data Pipelines, Data Mining, Data Quality Management, Principal Component Analysis, Statistical Analysis, Data Modeling, ER Diagrams, Schema Design, Data Governance, Metadata Management

Other Tools/Frameworks: Microsoft SSIS, Informatica, Autosys, PySpark, API Integrations (REST, SOAP, Shopify, WordPress, Squarespace), Salesforce Data Integration

Soft Skills & Practices: Agile (Scrum), Requirement Gathering, Gap Analysis, Stakeholder Communication, Cross-functional Collaboration, Technical Documentation, Testing & Troubleshooting

EDUCATION:

MS in Computer Science, Punjab University, 2013.

CERTIFICATIONS:

Cloud & Data Engineering:

•AWS Certified Developer – Associate

•Databricks Certified Professional Data Engineer

•Databricks Accredited Generative AI Fundamentals

AI & Machine Learning:

•AI & Machine Learning Engineer Certification – Google

•Certified Python Programming Professional

•Python Automation Testing

•Text Classification with Natural Language Processing

•Data Sources in R

•AI Algorithms from Scratch

Marketing Technology & Productivity:

•Braze Certified Developer – Level 2

•Productivity & Project Management

•Project Management Certification

•Excel Formulas and Functions

•MS Word 2021 Essential Training

PROFESSIONAL EXPERIENCE

Sr. Data Engineer Mar 2025-Present

Brilliant Corporation - Reston, VA

Responsibilities:

Developed and deployed predictive ML models, deep learning solutions, and AI-powered analytics platforms for customer experience, revenue optimization, and marketing insights.

Designed, implemented, and automated data pipelines using Apache Spark, Kafka, and Airflow for structured and unstructured datasets.

Managed large data lakes and cloud data warehouses on AWS, Azure, and GCP, utilizing S3, Redshift, and Glue for scalable analytics.

Designed and deployed enterprise-scale ETL pipelines using Databricks, Spark, and Delta Lake, delivering reduced cloud costs and improved processing speeds.

Implemented data governance and fine-grained access control utilizing Unity Catalog.

Built and managed the full ML model lifecycle with MLflow on Databricks, including automated registration and deployment; see the illustrative sketch at the end of this role.

Optimized performance of distributed jobs and workflows for large-scale analytics.

Engineered ETL workflows (Python/SQL/cloud-native services) processing 15TB+ monthly, reducing processing times by 65%.

Implemented real-time fraud detection algorithms, improving detection rates from 89% to 96% and decreasing false positives by 30%.

Optimized model performance via feature engineering and hyperparameter tuning, achieving an average F1-score improvement of 0.09 across three production deployments.

Developed interactive Tableau dashboards, BI reports, and analytics for stakeholders.

Secured big data systems and implemented metadata management and data quality monitoring.

Performed metadata clean-up and processing at regular intervals to improve data quality.

Monitored and debugged model performance using PyTorch utilities, logging tools, and visualization libraries such as TensorBoard.

Applied various machine learning algorithms and methods to datasets for credit risk prediction and fraud detection.

Generated daily, weekly, and monthly reports per the specifications and requirements of business users.

Environment: Python, Hive, AWS, Linux, Tableau Desktop, Microsoft Excel, NLP, deep learning frameworks such as TensorFlow and Keras, and boosting algorithms.
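
Illustrative sketch (hypothetical): a minimal example of the MLflow lifecycle bullet above, logging and registering a scikit-learn model on Databricks. The synthetic dataset, model choice, metric value, and the registry name "fraud_detection" are assumptions for illustration only.

# Hypothetical MLflow logging/registration sketch; data and names are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for a real fraud table
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    mlflow.log_metric("f1_score", f1_score(y_test, model.predict(X_test)))
    # registered_model_name creates (or versions) an entry in the model registry
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="fraud_detection",
    )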

Sr. Data Engineer Dec 2020 - Feb 2025

UHG - Fairfax, VA

Responsibilities:

Collaborated with data scientists and analysts to ensure data availability, quality, and accessibility for machine learning model training and business intelligence.

Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.

Designed and deployed enterprise-scale ETL pipelines using Databricks, Spark, and Delta Lake, delivering reduced cloud costs and improved processing speeds.

Implemented data governance and fine-grained access control utilizing Unity Catalog.

Built and managed full ML model lifecycle with MLflow on Databricks, including automated registration and deployment.

Implemented data quality checks and monitoring frameworks to ensure consistency, accuracy, and completeness of the data being ingested and stored.

Tuned SQL queries and improved indexing, leading to a 60% reduction in query runtimes and faster business insights delivery.

Designed and launched a Power BI suite for revenue management analytics, leading to 15% more accurate business forecasts, validated against historical performance metrics.

Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.

Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Created and maintained reports in Tableau to display the status and performance of deployed models and algorithms.

Developed PySpark scripts to encrypt raw data by applying hashing algorithms to client-specified columns; see the illustrative sketch at the end of this role.

Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

Involved in big data requirements analysis and in developing and designing solutions for ETL and Business Intelligence platforms.

Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.

Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.

Performed data cleaning and imputation of missing values using R.

Involved in solving technical queries, driving the product initiatives and metric collection and analysis.

Regularly used JIRA and other internal issue trackers for project development.

Gathered the required data from multiple data sources and created datasets used in the analysis.

Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.

Environment: R, Python, Informatica, JIRA, Teradata, ODS, MySQL, DB2, MS Excel, JSON, machine learning algorithms.
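
Illustrative sketch (hypothetical): a minimal version of the column-hashing bullet above, using PySpark's built-in sha2 function. The input path, output path, and list of client-specified columns are assumptions for illustration.

# Hypothetical PySpark column-hashing sketch; paths and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hash-sensitive-columns").getOrCreate()

raw_df = spark.read.parquet("/data/raw/members")      # illustrative input path
sensitive_columns = ["member_id", "ssn", "email"]     # client-specified columns (assumed)

hashed_df = raw_df
for col_name in sensitive_columns:
    # sha2 with 256 bits yields a one-way hash, preserving joinability without exposing values
    hashed_df = hashed_df.withColumn(col_name, F.sha2(F.col(col_name).cast("string"), 256))

hashed_df.write.mode("overwrite").parquet("/data/curated/members_hashed")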

Data Engineer Feb 2018 to Nov 2020

USANA - Salt Lake City, UT

Responsibilities:

Involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.

Performed exploratory data analysis and principal component analysis on noisy information scraped from the internet, and presented findings through an interactive data visualization application.

Analyzed and cleaned 7+ TB of noisy external data, improving downstream model accuracy by 17% through advanced sampling and SMOTE.

Developed scalable fraud detection models, boosting recall from 71% to 89% and reducing manual review workload by 40%.

Extracted data from a SQL Server database, copied it into HDFS, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.

Created Tableau reports with complex calculations and worked on ad-hoc reporting using Power BI.

Performed data cleaning, including transforming variables and handling missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.

Tackled a highly imbalanced fraud dataset using sampling techniques such as undersampling and oversampling with SMOTE (Synthetic Minority Over-sampling Technique) in Python scikit-learn; see the illustrative sketch at the end of this role.

Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods and deployed the model on AWS.

Used GitHub and Jenkins for CI/CD (DevOps operations). Also familiar with Tortoise SVN, Bitbucket, JIRA, and Confluence.

Designed and implemented cross-validation and statistical tests, including ANOVA and Chi-square tests, to verify the models' significance.

Implemented, tuned and tested the model on AWS EC2 with the best performing algorithm and parameters.
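
Illustrative sketch (hypothetical): a minimal example of the SMOTE oversampling bullet above, using the imbalanced-learn library on a synthetic stand-in dataset. The class ratio, sample counts, and split parameters are assumptions for illustration.

# Hypothetical SMOTE rebalancing sketch; the synthetic dataset is a placeholder.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the real fraud table (~2% positives)
X, y = make_classification(n_samples=10000, weights=[0.98], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Oversample only the training split so the test set keeps the true class balance
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(f"Positives before: {int(sum(y_train))}, after SMOTE: {int(sum(y_res))}")

Oversampling is applied after the train/test split so evaluation still reflects the real-world class imbalance.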

Data Analytics Jan 2014 to Dec 2017

BANK OF AMERICA - Dover, DE

Responsibilities:

Provided management support by identifying trends and developing strategies to assist management in decision-making processes.

Performed data engineering functions: data extraction, transformation, loading, and integration in support of enterprise data infrastructures (data warehouse, operational data stores, and master data management).

Led the consolidation of structured and unstructured data from 25+ sources, enabling scalable analytics and business intelligence.

Used the Tableau business intelligence tool to generate reports and dashboard overviews for trend analysis.

Responsible for data services and data movement infrastructures, with strong experience in ETL concepts, building ETL solutions, and data modeling.

Integrated structured and unstructured data from APIs, streaming services (Kafka, Kinesis), cloud storage (S3, GCS), and enterprise systems (SAP, Salesforce, etc.); see the illustrative sketch at the end of this role.

Acted as product owner in several data science company agile projects, providing detailed guidance including statistical quality assurance and dashboard design.

Documented data flow diagrams, pipeline logic, and metadata management practices to maintain transparency and ease of knowledge transfer.

Created SQL views, triggers, and stored procedures, and handled defect management for ETL and BI systems.

Developed and published reports to provide operational and analytical insights to management via Excel and PowerPoint.

Modified existing data visualizations and made adjustments as per requirements.

Developed and supported analytic solutions aimed at improving outcomes, efficiencies, costs, and patient experience.

Ensured design quality by creating, conducting, and documenting testing.

Identified technical roadblocks and troubleshot and resolved functional and performance-related issues.

Environment: Jira, SharePoint, Confluence, Tableau, Oracle, Microsoft Excel, SQL Server, Power BI
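
Illustrative sketch (hypothetical): a minimal example of the streaming-integration bullet above, reading a Kafka topic with Spark Structured Streaming. The broker address, topic name, record schema, sink paths, and the availability of the Spark Kafka connector package are all assumptions for illustration.

# Hypothetical Kafka ingestion sketch; broker, topic, schema, and paths are placeholders.
# Assumes the spark-sql-kafka connector package is available on the cluster.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("transaction-stream").getOrCreate()

schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("merchant", StringType()),
])

transactions = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "transactions")                # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("txn"))
    .select("txn.*"))

query = (transactions.writeStream
    .format("parquet")
    .option("path", "/lake/raw/transactions")           # placeholder sink
    .option("checkpointLocation", "/chk/transactions")  # required for fault tolerance
    .start())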


