
Data Engineer Machine Learning

Location:
Bayonne, NJ
Posted:
September 10, 2025


Resume:

MUSKAAN MAHINDRAKAR

551-***-**** Bayonne, Jersey City, United States **********@*****.*** linkedin.com/in/muskaan-mahi/

DATA ENGINEER

Data Engineer with 7 years of experience building cloud-native data platforms on AWS and GCP. Experienced in Python, SQL, Spark, Kafka, Airflow, and Snowflake. Designed and deployed ETL/ELT pipelines for real-time and batch systems. Improved data quality, observability, and governance across large-scale environments. Delivered scalable solutions supporting machine learning workflows and analytics, with a focus on performance, reliability, and automation.

STRENGTHS AND EXPERTISE

Programming Languages: Python, SQL, Java, Scala, C++, Shell Scripting, R
Big Data Technologies: Hadoop, Spark, Kafka, Hive, Snowflake, Airflow, MongoDB, Docker, Git
Cloud Platforms: AWS (S3, EMR, Lambda, Glue, Redshift), Azure (Data Factory, Synapse, Databricks), GCP (BigQuery, Dataflow, Pub/Sub), Vertex AI, Azure AI
Data Analysis Tools: Tableau, Power BI, QuickSight
Machine Learning Tools: Scikit-learn, TensorFlow, SageMaker, MLflow, Keras, PyTorch, LightGBM, XGBoost, OpenCV, NLTK, spaCy, Seaborn, SciPy
Methodologies: Agile, Scrum, CI/CD, SDLC

Gen AI: Fine-tuning LLMs (GPT, BERT, and T5), Hugging Face Transformers, OpenAI API
Leadership: Sprint Planning, Retrospectives, Backlog Grooming, Scrum Master Support

PROFESSIONAL EXPERIENCE

Data Engineer

AT&T, Bedminster, New Jersey

March 2024 - Present

●Architected and deployed ETL pipelines using BigQuery, Dataproc, and Spark, improving query performance by 25% and reducing storage costs by 20%.

●Constructed scalable Kafka-Snowflake pipelines for multi-terabyte financial data, reducing latency by 40% and increasing throughput by 3x.
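A Kafka-to-Snowflake path like the one described above typically micro-batches streamed records before bulk loading. A toy, in-memory sketch of that batching logic (no real Kafka or Snowflake client; all names are hypothetical):

```python
def micro_batches(records, batch_size=3):
    """Group a stream of records into fixed-size batches, as a stand-in
    for buffering Kafka messages before a bulk COPY into Snowflake."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Seven records with batch_size=3 yield two full batches and one partial
batches = list(micro_batches(range(7), batch_size=3))
# -> [[0, 1, 2], [3, 4, 5], [6]]
```

Larger batches raise throughput at the cost of end-to-end latency, which is the trade-off tuned in pipelines like this.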

●Developed fraud analytics platforms on AWS and Azure with Redshift, Glue, and IAM controls, reducing audit risks by 30% under SOX and GDPR.

●Automated infrastructure provisioning with Terraform and CloudFormation, accelerating infrastructure delivery by 40% and ensuring high availability.

●Deployed RESTful APIs using Flask and FastAPI for real-time predictions; connected MongoDB and DynamoDB to reduce latency by 35%.

●Integrated anomaly detection in ETL workflows, enhancing fraud detection accuracy by 15% and reducing false positives.
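An anomaly-detection stage of the kind described above could, in greatly simplified form, look like the following pure-Python sketch (the z-score rule and thresholds are illustrative stand-ins; production pipelines would use model-based scoring):

```python
from statistics import mean, stdev

def flag_anomalies(amounts, z_threshold=3.0):
    """Flag values whose z-score against the batch exceeds the threshold.

    A simplified stand-in for an anomaly-detection step inside an ETL
    workflow: returns one boolean per input value.
    """
    if len(amounts) < 2:
        return [False] * len(amounts)
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return [False] * len(amounts)
    return [abs(x - mu) / sigma > z_threshold for x in amounts]

# One clearly outlying transaction among routine ones
flags = flag_anomalies([10, 12, 11, 9, 10, 11, 500], z_threshold=2.0)
# -> [False, False, False, False, False, False, True]
```

Tightening or loosening the threshold is the lever that trades detection rate against false positives.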

●Created real-time dashboards using Tableau and QuickSight, improving KPI visibility and analyst efficiency by 30%.

●Built CI/CD pipelines with GitHub Actions, Airflow, and Snowflake Tasks, reducing deployment errors by 40% and enabling zero-downtime releases.

Data Engineer

North Highland, Atlanta, Georgia

June 2023 - January 2024

● Implemented end-to-end MLOps workflows using MLflow and Docker, improving deployment speed by 35% and ensuring consistent, scalable model delivery.

●Executed distributed data processing using Spark DataFrames and reusable Airflow DAGs, improving pipeline speed by 85%.

●Engineered cloud-native architectures with AWS Redshift, Lambda, and S3, increasing data availability and scalability by 40%.

●Optimized SQL and Athena queries through partitioning and compression strategies, reducing query execution costs by 35%.
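Partition pruning of the sort described above depends on data being laid out under Hive-style key paths that the query engine can skip wholesale. A minimal illustration of that layout and the pruning idea (bucket prefix and column names are hypothetical):

```python
from datetime import date

def partition_key(base_prefix, event_date, region):
    """Build a Hive-style partitioned S3 key prefix, e.g.
    raw/events/dt=2024-01-15/region=us-east-1/."""
    return f"{base_prefix}/dt={event_date.isoformat()}/region={region}/"

def prune(prefixes, dt_filter):
    """Keep only prefixes matching the date filter -- the same idea a
    query engine like Athena applies to skip non-matching partitions
    and reduce the bytes scanned (and thus the cost) of a query."""
    return [p for p in prefixes if f"/dt={dt_filter}/" in p]

keys = [partition_key("raw/events", date(2024, 1, d), "us-east-1")
        for d in (14, 15, 16)]
matched = prune(keys, "2024-01-15")
# -> ["raw/events/dt=2024-01-15/region=us-east-1/"]
```

A query filtered to one day then touches one partition out of three, which is where the scan-cost savings come from.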

●Generated synthetic datasets using Gen AI and Python, increasing model accuracy by 18% and improving robustness across edge cases.

Data Engineer

Bharat Heavy Electricals Limited, Bengaluru, India

December 2018 - June 2022

●Formulated ETL workflows using Apache Airflow, AWS Glue, and BigQuery, reducing data latency by 50% and improving data reliability.

●Orchestrated ML models (XGBoost, LSTM, BERT) into production using SageMaker and GCP AutoML, cutting inference latency by 40% for real-time scoring.

● Implemented CI/CD pipelines for ML workflows using Terraform, GitHub Actions, and Docker, reducing deployment time by 40% and enabling automated monitoring.

●Created dashboards in Tableau and Power BI for real-time data visibility, reducing manual reporting effort by 35%.

●Managed feature pipelines using Snowflake and Feature Store, accelerating model training by 30% and improving consistency across versions.

●Leveraged GCP AutoML and SageMaker Pipelines to reduce manual effort by 50% and accelerate model lifecycle deployment.

●Built multi-region failover systems in AWS, achieving 99.9% availability during peak loads and reducing downtime risk.

●Facilitated Agile processes, boosting collaboration across data science and engineering teams and increasing sprint velocity by 25%.

KEY PROJECTS: Real-Time Transaction Monitoring for Fraud Detection, Credit Risk Scoring Pipeline - AutoML, Cloud Migration for Regulatory Data Warehousing

CERTIFICATIONS:

● Python: 60 Hour Training Program (ISO-Certified)

● AWS Partner Courses: Completed Cloud Practitioner, Data Lake, and others

● Dataiku Academy Certifications: Core Designer, ML Practitioner, Advanced Designer, Developer, MLOps Practitioner

● Databricks Certifications: ML/Data Science

● Java Certification: ISO Certified Core and Advanced

● Big Data Hadoop: Course Completion Certificate

● Machine Learning: Internship Training (ISO-Certified)

EDUCATION:

Master of Science in Computer Science - Stevens Institute of Technology
Relevant Coursework: Deep Learning, Machine Learning, Natural Language Processing, Knowledge Discovery & Data Mining, Data Acquisition, DBMS

Bachelor of Technology in Computer Science - NIIT University
Relevant Coursework: ML, Cloud, NLP, Data Structures, Big Data, Information Retrieval, Image Processing, Design & Analysis of Algorithms


