Afsaruddin Mohammed
Email: ********************@*****.*** | LinkedIn: linkedin.com/in/Afsaruddin | Mobile: +1-469-***-**** | Location: United States

EDUCATION
Master of Science in Business Analytics (Minor in Data Science)
The University of Texas at Dallas, Richardson, Texas | August 2022 – May 2024

Bachelor of Engineering in Electronics and Communication Engineering
Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, India | August 2015 – April 2019

PROFESSIONAL EXPERIENCE
Adept Technical Services, Inc.
Data Engineer | January 2024 – Present
• Engineered scalable ETL pipelines with Apache Spark and Python, integrating data from multiple sources (JSON, XML, CSV, Parquet) into a centralized data platform, improving data availability by 40%.
• Optimized data models in Snowflake and Amazon Redshift, reducing query latency by 50% and enhancing storage efficiency.
• Orchestrated data ingestion workflows using Apache Airflow and AWS Glue, processing 10M+ records daily from digital platforms, transactional systems, and Salesforce CRM.
• Transformed raw data using dbt and SQL, ensuring consistency and standardization for cross-channel analytics and segmentation.
• Designed a scalable AWS S3 data lake, managing 5TB+ of structured and unstructured data to support enterprise-wide analytics.
• Implemented real-time streaming and batch processing pipelines with Apache Kafka, Zookeeper, and Hadoop, handling 10K+ events/sec and 2M+ records daily for analytics and Machine Learning workloads.
• Developed interactive Power BI dashboards, visualizing financial trends, customer behavior, and market movements, delivering actionable insights to stakeholders.
Wipro Limited
Data Engineer | June 2019 – July 2022
• Created scalable data ingestion pipelines in Databricks using PySpark, integrating CSV files, JSON event streams, and NoSQL data.
• Built scalable ETL and batch processing pipelines in Databricks using Spark, Python, and Scala, processing 5 PB+ of production data; optimized workflows with parallel execution, cutting resource costs by 20%.
• Developed real-time data pipelines with Apache Kafka and Spark Streaming, ensuring low-latency ingestion into Apache Pinot (OLAP datastore), reducing query response time to ~3–10 ms.
• Implemented data quality checks using Python & SQL, optimizing data validation processes and improving data accuracy.
• Designed and automated end-to-end data pipelines using Apache Airflow, orchestrating dynamic DAGs for data ingestion, transformation, and monitoring, improving the reliability and scalability of data integration.
• Led the Customer 360 initiative, developing RFM-based segmentation for 35M+ customers and analytical datasets for Power BI & Tableau, enabling 10K+ business users with actionable insights for targeted marketing and customer analytics.
• Migrated on-premises workloads to AWS, leveraging S3, EC2, and EMR to optimize big data processing and improve scalability.
• Collaborated with data scientists and analysts to design high-performance datasets supporting ML models and business intelligence.

PROJECTS
Predicting Call Success Using Machine Learning Models
• Analyzed call success trends using Tableau, uncovering key demographic and time-based insights to optimize outreach.
• Developed an ML classification model (Logistic Regression, Decision Tree, Random Forest, GBT) to predict call success.
• Optimized the GBT model, achieving 70.5% accuracy and a 0.364 F1 score after correcting class imbalance through sub-sampling.

Procurement Data Warehousing in Snowflake
• Engineered a procurement data warehouse in Snowflake, designing fact & dimension tables using Kimball’s methodology.
• Implemented Slowly Changing Dimensions (Types 0–7) to track procurement data evolution and enhance business insights.
• Architected and implemented fact table strategies in Snowflake, balancing ETL efficiency with analytical scalability.

TECHNICAL SKILLS
• Programming Languages: SQL, Python, Scala, R, SAS, Java
• Databases & Query Tools: MySQL, Oracle (PL/SQL), MS SQL Server (T-SQL, SSMS), PostgreSQL, MongoDB, Snowflake, Redshift, DB2
• Cloud Platforms: Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP)
• Data Visualization Tools: Tableau, Power BI, SSRS, Quicksight, Microsoft Excel, Looker
• Big Data Technologies: Spark, Hive, Hadoop, Apache Pinot, MapReduce, Pig
• Data Engineering Tools: ADF (Azure Data Factory), Databricks, Docker, Kubernetes, Apache Airflow
• Libraries: NumPy, Pandas, Scikit-learn, Matplotlib, TensorFlow, PyTorch
• APIs & Frameworks: FastAPI, REST APIs
• ETL & Data Management: ETL, Data Ingestion, Data Modeling, Data Warehousing

CERTIFICATIONS & TRAININGS
• AWS Certified Cloud Practitioner
• Machine Learning Certification – The University of Texas at Dallas (UTD)