
Data Engineer Machine Learning

Location:
Plano, TX
Posted:
October 15, 2025


Resume:

Afsaruddin Mohammed

Email: ********************@*****.*** | LinkedIn: LinkedIn/In/Afsaruddin | Mobile: +1-469-***-**** | Location: United States

EDUCATION

Master of Science in Business Analytics (Minor in Data Science)
The University of Texas at Dallas, Richardson, Texas | August 2022 – May 2024

Bachelor of Engineering in Electronics and Communication Engineering
Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, India | August 2015 – April 2019

PROFESSIONAL EXPERIENCE

Adept Technical Services, Inc.

Data Engineer | January 2024 – Present

• Engineered scalable ETL pipelines with Apache Spark and Python, integrating data from multiple sources (JSON, XML, CSV, Parquet) into a centralized Data Platform, improving data availability by 40%.

• Optimized data models in Snowflake and Amazon Redshift, reducing query latency by 50% and enhancing storage efficiency.

• Orchestrated data ingestion workflows using Apache Airflow and AWS Glue, processing 10M+ records daily from digital platforms, transactional systems, and Salesforce CRM.

• Transformed raw data using dbt & SQL, ensuring consistency & standardization for cross-channel analytics & segmentation.

• Designed a scalable AWS S3 data lake, managing 5TB+ of structured and unstructured data to support enterprise-wide analytics.

• Implemented real-time streaming and batch processing pipelines with Apache Kafka, ZooKeeper, and Hadoop, handling 10K+ events/sec and 2M+ records daily for analytics and machine learning workloads (streaming sketch at the end of this section).

• Developed interactive Power BI dashboards, visualizing financial trends, customer behavior, and market movements, delivering actionable insights to stakeholders.
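
A minimal sketch of the kind of Kafka-to-Spark Structured Streaming pipeline described in the streaming bullet above. The broker address, topic name, event schema, and output paths are hypothetical placeholders, not details of the actual platform, and the job assumes Spark's Kafka connector package is available at submit time.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("event-stream-sketch").getOrCreate()

    # Hypothetical event schema, for illustration only
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read the raw event stream from a Kafka topic (placeholder broker and topic)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "events")
           .load())

    # Parse the Kafka message value from JSON into typed columns
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("e"))
              .select("e.*"))

    # Land micro-batches as Parquet for downstream analytics and ML workloads
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/events/")
             .option("checkpointLocation", "/tmp/checkpoints/events")
             .trigger(processingTime="1 minute")
             .start())
    query.awaitTermination()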

Wipro Limited

Data Engineer | June 2019 – July 2022

• Created scalable data ingestion pipelines in Databricks using PySpark, integrating CSV, JSON event streams & NoSQL data.

• Built scalable ETL and batch processing pipelines in Databricks using Spark, Python, and Scala, processing 5PB+ production data. Optimized workflows with parallel execution, cutting resource costs by 20%.

• Developed real-time data pipelines with Apache Kafka and Spark Streaming, ensuring low-latency ingestion into Apache Pinot (OLAP datastore), reducing query response time to ~3-10ms.

• Implemented data quality checks using Python & SQL, optimizing data validation processes and improving data accuracy.

• Designed and automated end-to-end data pipelines using Apache Airflow, orchestrating dynamic DAGs to enable seamless data integration, while automating data ingestion, transformations, and monitoring for improved reliability and scalability.

• Led the Customer 360 initiative, developing RFM-based segmentation for 35M+ customers and analytical datasets for Power BI and Tableau, providing 10K+ business users with actionable insights for targeted marketing and customer analytics (RFM scoring sketch at the end of this section).

• Migrated on-premises workloads to AWS, leveraging S3, EC2, EMR, optimizing big data processing & improving scalability.

• Collaborated with Data Scientists and Analysts to design high-performance datasets supporting ML models and business intelligence.
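
A minimal sketch of RFM (recency, frequency, monetary) scoring of the sort used in the Customer 360 work above. The transactions table and column names are hypothetical stand-ins, and a real build at 35M+ customers would partition the window functions rather than rely on the global ordering shown here.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rfm-sketch").getOrCreate()

    # Hypothetical transactions table: customer_id, order_ts (date), amount
    tx = spark.table("transactions")

    snapshot = F.current_date()
    rfm = (tx.groupBy("customer_id")
           .agg(F.datediff(snapshot, F.max("order_ts")).alias("recency_days"),
                F.count("*").alias("frequency"),
                F.sum("amount").alias("monetary")))

    # Quartile scores per dimension; score 1 is the strongest quartile in each case
    rfm_scored = (rfm
        .withColumn("r_score", F.ntile(4).over(Window.orderBy(F.col("recency_days").asc())))
        .withColumn("f_score", F.ntile(4).over(Window.orderBy(F.col("frequency").desc())))
        .withColumn("m_score", F.ntile(4).over(Window.orderBy(F.col("monetary").desc()))))

    # Persist the segments for BI tools such as Power BI or Tableau (placeholder table name)
    rfm_scored.write.mode("overwrite").saveAsTable("customer_rfm_segments")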

PROJECTS

Predicting Call Success Using Machine Learning Models

• Analyzed call success trends using Tableau, uncovering key demographic and time-based insights to optimize outreach.

• Developed an ML classification model (Logistic Regression, Decision Tree, Random Forest, GBT) to predict call success.

• Optimized the GBT model, achieving 70.5% accuracy and a 0.364 F1 score after correcting class imbalance through sub-sampling (sketch below).
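
A minimal, self-contained sketch of majority-class sub-sampling ahead of a gradient-boosted-tree classifier, as described in the bullet above. The synthetic dataset, split, and default hyperparameters are illustrative only; the resulting metrics will not match the project's reported 70.5% accuracy and 0.364 F1.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split

    # Synthetic imbalanced dataset standing in for the call-outcome data
    X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    # Sub-sample the majority class in the training set so both classes are equally represented
    rng = np.random.default_rng(42)
    maj_idx = np.where(y_train == 0)[0]
    min_idx = np.where(y_train == 1)[0]
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    balanced = np.concatenate([keep_maj, min_idx])

    # Gradient-boosted trees fit on the balanced sample, evaluated on the untouched test set
    model = GradientBoostingClassifier(random_state=42)
    model.fit(X_train[balanced], y_train[balanced])

    pred = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, pred))
    print("F1:", f1_score(y_test, pred))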

Procurement Data Warehousing in Snowflake

• Engineered a procurement data warehouse in Snowflake, designing fact and dimension tables using Kimball’s methodology.

• Implemented Slowly Changing Dimensions (Types 0–7) to track procurement data evolution and enhance business insights (Type 2 sketch below).

• Architected and implemented fact table strategies in Snowflake, balancing ETL efficiency with analytical scalability.
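
A minimal sketch of Type 2 slowly changing dimension maintenance, the most common of the SCD types listed above. The supplier tables, column names, and single tracked attribute are hypothetical stand-ins; a production build in Snowflake would more likely use MERGE statements or Streams and Tasks rather than the full table rewrite shown here.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

    # Hypothetical dimension: supplier_id, supplier_name, valid_from, valid_to, is_current
    dim = spark.table("dim_supplier")
    # Hypothetical staging snapshot: supplier_id, supplier_name
    stg = spark.table("stg_supplier")
    today = F.current_date()

    # Suppliers whose tracked attribute changed versus their current dimension row
    changed_ids = (dim.filter("is_current").alias("d")
                   .join(stg.alias("s"), "supplier_id")
                   .filter(F.col("d.supplier_name") != F.col("s.supplier_name"))
                   .select("supplier_id"))

    # Expire the current versions of changed suppliers
    to_expire = dim.filter("is_current").join(changed_ids, "supplier_id", "left_semi")
    expired = (to_expire.withColumn("valid_to", today)
               .withColumn("is_current", F.lit(False)))

    # Keep all other rows untouched: every version of unchanged suppliers,
    # plus the historical (non-current) versions of changed suppliers
    kept = (dim.join(changed_ids, "supplier_id", "left_anti")
            .unionByName(dim.filter("NOT is_current")
                         .join(changed_ids, "supplier_id", "left_semi")))

    # Insert fresh current versions for changed suppliers and brand-new suppliers
    new_ids = stg.join(dim, "supplier_id", "left_anti").select("supplier_id")
    inserts = (stg.join(new_ids.unionByName(changed_ids).distinct(), "supplier_id", "left_semi")
               .withColumn("valid_from", today)
               .withColumn("valid_to", F.lit(None).cast("date"))
               .withColumn("is_current", F.lit(True)))

    new_dim = kept.unionByName(expired).unionByName(inserts)
    new_dim.write.mode("overwrite").saveAsTable("dim_supplier_rebuilt")  # placeholder target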

TECHNICAL SKILLS

• Programming Languages: SQL, Python, Scala, R, SAS, Java

• Databases & SQL Tools: MySQL, Oracle, PL/SQL, MS SQL Server, PostgreSQL, MongoDB, Snowflake, Redshift, T-SQL, SSMS, DB2

• Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP)

• Data Visualization Tools: Tableau, Power BI, SSRS, Amazon QuickSight, Microsoft Excel, Looker

• Big Data Technologies: Spark, Hive, Hadoop, Apache Pinot, MapReduce, Pig

• Data Engineering Tools: ADF (Azure Data Factory), Databricks, Docker, Kubernetes, Apache Airflow

• Libraries: NumPy, Pandas, Scikit-learn, Matplotlib, TensorFlow, PyTorch

• APIs & Frameworks: FastAPI, REST APIs

• ETL & Data Management: ETL, Data Ingestion, Data Modeling, Data Warehousing

CERTIFICATIONS & TRAINING

• AWS Certified Cloud Practitioner

• Machine Learning certification – UTD


