Adnan Ali Mohammed
Senior Data Scientist
*****.*********@*****.*** | +1-267-***-**** | LinkedIn: www.linkedin.com/in/adnan-ali-mohammed-1907ab337
SUMMARY
10+ years of leadership experience as a Data Scientist in data analytics, infrastructure development, application support, and security optimization, driving innovation and efficiency.
Expert in translating complex technical requirements into actionable business strategies, delivering customer-centric solutions that align with organizational goals.
Hands-on experience in machine learning (ML), deep learning, and statistical analysis using Python, R, and SQL to solve real-world business problems.
Proficient in ML algorithms such as logistic regression, neural networks, XGBoost, KNN, SVM, and clustering techniques (k-means, DBSCAN, etc.), improving decision-making and predictions.
Skilled in Natural Language Processing (NLP) using BERT, GPT models, text classification, sentiment analysis, and named entity recognition for extracting insights from unstructured data.
Hands-on experience deploying ML models on cloud platforms such as AWS SageMaker, Azure Machine Learning, and Google Cloud AI, ensuring scalability and efficiency.
Extensive work with transformers and Graph Neural Networks for use cases such as fraud detection, social network analysis, and recommendation systems.
Expertise in cloud-based data engineering tools including AWS Lambda, Google BigQuery, and Azure Data Factory, enabling seamless data processing and integration.
Strong background in SQL and NoSQL databases (MySQL, MongoDB), with expertise in ETL processes, data integration and validation, and data quality controls.
Skilled in data visualization tools like Tableau, Power BI, and Python libraries (Matplotlib, Pandas), creating interactive dashboards and reports to drive business insights.
Experience automating recurring reports and processes using SQL, Python, and BI tools to streamline workflows and improve operational efficiency.
Expertise in predictive modeling and business intelligence using advanced regression techniques, correlation, and multivariate analysis to support strategic decision-making.
Proven track record of risk mitigation and process optimization across industries such as telecommunications, e-commerce, and logistics, driving profitability and reducing costs.
Extensive work with credit services in the telecommunications industry, applying predictive models for credit risk mitigation and improving credit scoring systems.
Expertise in e-commerce analytics, focusing on predicting event rates, cross-sell opportunities, and customer segmentation.
Adept in leveraging machine learning to enhance logistics processes and optimize business operations, reducing losses and improving efficiency.
Proficient in using tools like Anaconda, Jupyter Notebooks, H2O, TensorFlow, Keras, and MATLAB to develop and deploy models, as well as manage complex data science projects.
Strong foundation in linear algebra, probability theory, and calculus, applying mathematical concepts to build more effective and accurate models.
Recognized for strong communication skills and ability to foster a collaborative team culture in fast-paced, dynamic environments.
SKILLS
•Data Science & Analytics: Machine Learning, Deep Learning, Statistical Analysis, Predictive Modeling, Data Mining, Natural Language Processing, Sentiment Analysis, Named Entity Recognition, Transformers, Graph Neural Networks
•Programming & Scripting: Python (NumPy, Pandas, Matplotlib, Scikit-learn), R (ggplot2, Caret), SQL (MySQL), NoSQL (MongoDB), MATLAB, TensorFlow, Keras, Theano, Anaconda, Jupyter Notebooks
•Machine Learning Algorithms: Logistic Regression, Neural Networks (CNNs, RNNs), XGBoost, KNN, SVM, Clustering (k-means, DBSCAN), Ensemble Methods (Bagging, Boosting), Decision Trees, Random Forest, Naïve Bayes
•Cloud Platforms & Data Engineering: AWS SageMaker, Azure Machine Learning, Google Cloud AI, AWS Lambda, Google BigQuery, Azure Data Factory
•Data Visualization & Reporting: Tableau, Power BI, Python Visualization Libraries (Matplotlib, Seaborn), Dashboard Creation and Interactive Reporting.
•Business Intelligence & Optimization: Business Intelligence Tools, Cross-Sell Analysis, Credit Risk Mitigation, Fraud Detection, Process Optimization, Risk Mitigation.
•Statistical & Mathematical Expertise: Advanced Regression Models, Multivariate Analysis, Correlation, ANOVA, Hypothesis Testing, Probability Theory, Calculus, Linear Algebra.
•Automation & ETL: Report Automation (SQL, Python), ETL Processes, Data Integration Validation, Data Quality Control.
•Collaboration & Leadership: Cross-Functional Team Collaboration, Project Management, Strong Communication Skills, Team Leadership in Fast-Paced Environments.
EDUCATION
Bradley University
MS, Computer Science
Osmania University
Bachelor of Engineering in Information Technology
PROFESSIONAL EXPERIENCE
Penn Mutual, Philadelphia, PA March 2023 - Present
Senior Data Scientist
Implemented machine learning, computer vision, deep learning, and neural network algorithms using TensorFlow and Keras.
Designed prediction models using data mining techniques with Python and libraries like NumPy, SciPy, Matplotlib, Pandas, and Scikit-learn.
Rebuilt existing models, improving accuracy from 68% to 89% by applying advanced machine learning techniques.
Utilized text feature engineering techniques such as n-grams, TF-IDF, and Word2Vec for NLP tasks.
Implemented ethical AI practices and ensured compliance with data privacy regulations (like GDPR or CCPA) during data collection and model development.
Applied Support Vector Machines (SVMs) with polynomial and RBF kernels to non-linear classification problems (see the first sketch below).
Addressed class imbalance in datasets using suitable resampling techniques and imbalance-aware evaluation metrics.
Developed and deployed transformer-based models for NLP tasks like text classification, named entity recognition (NER), and sentiment analysis using frameworks like Hugging Face (see the second sketch below).
Integrated ML pipelines with MLOps frameworks such as Kubeflow and MLflow for tracking, reproducibility, and continuous deployment.
Leveraged reinforcement learning to optimize recommendation systems and dynamic pricing models in real-time environments.
Implemented AutoML platforms such as H2O.ai and Google AutoML to streamline model prototyping, reducing development time by 40%.
Utilized Federated Learning to perform privacy-preserving machine learning on decentralized data sources.
Applied Graph Neural Networks (GNNs) for fraud detection, social network analysis, and knowledge graph completion.
Worked with CNNs and RNNs for image recognition and sequence data modeling.
Developed low-latency applications and interpretable models using SHAP and LIME for model transparency.
Utilized synthetic data generation techniques to create large datasets for robust training in data-scarce environments.
Explored and implemented Large Language Models (LLMs) such as GPT and BERT for insights from unstructured data.
Designed and implemented MLflow workflows for use cases like fraud detection, customer churn prediction, and recommendation systems.
Visualized findings using Power BI and created reports on SSAS Tabular via the Analysis Services connector.
Handled data from sources like JSON and XML, applying machine learning algorithms in Python.
Updated Python scripts to integrate AWS CloudSearch for classification tasks, matching training data with databases.
Performed data imports, applied transformations using Hive and MapReduce, and loaded data into HDFS.
Applied Agile methodology for building internal applications and managing development workflows.
Conducted data manipulation and aggregation using tools like Nexus, Toad, Business Objects, Powerball, and Smart View.
Worked with edge computing frameworks to deploy machine learning models on IoT edge devices for real-time analytics.
Collaborated with Business Analysts, SMEs, and Data Architects to understand business requirements and design solutions.
Researched, evaluated, and deployed tools and frameworks to build sustainable big data platforms for clients.
Managed end-to-end data mining, from data collection and cleaning to model validation, visualization, and gap analysis.
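A minimal sketch of the kind of TF-IDF plus RBF-kernel SVM text pipeline referenced above; the corpus, labels, and hyperparameter values are invented stand-ins, not project data or settings.

# Hypothetical sketch: unigram/bigram TF-IDF features feeding an RBF-kernel SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy labeled snippets (1 = positive sentiment, 0 = negative).
texts = [
    "claim processed quickly and the agent was helpful",
    "policy renewal was smooth and painless",
    "hold times are terrible and nobody calls back",
    "premium increased with no explanation, very frustrating",
    "support answered every question clearly",
    "the portal keeps losing my documents",
]
labels = [1, 1, 0, 0, 1, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("svm", SVC(kernel="rbf", C=10, gamma="scale", class_weight="balanced")),
])

# 3-fold cross-validated accuracy on the toy corpus.
print(cross_val_score(pipeline, texts, labels, cv=3).mean())

The same pipeline accepts a polynomial kernel by swapping in kernel="poly".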
Environment:
Python, MLlib, regression, PCA, t-SNE, cluster analysis, SQL, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, CNNs, RNNs, Oracle 12c, Netezza, MySQL Server, SSRS, T-SQL, Tableau, Teradata, random forest, OLAP, Azure, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS, Linux.
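A hedged illustration of the transformer-based sentiment workflow mentioned above, using the Hugging Face pipeline API; the model (the library's default English sentiment checkpoint) and the input texts are illustrative assumptions.

# Hypothetical sketch: off-the-shelf transformer sentiment scoring via Hugging Face.
from transformers import pipeline

# Downloads a default English sentiment model on first use.
sentiment = pipeline("sentiment-analysis")

docs = [
    "The claims process was fast and transparent.",
    "I have been waiting three weeks for a callback.",
]
for doc, result in zip(docs, sentiment(docs)):
    print(f"{result['label']:>8} ({result['score']:.2f})  {doc}")

The same API exposes "ner" and "text-classification" tasks; production use would pin a specific model checkpoint rather than rely on the default.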
HonorHealth, Scottsdale, AZ Jan 2021 – Feb 2023
Senior Data Scientist
Developed predictive models using machine learning algorithms such as Logistic Regression, Decision Trees, Random Forests, and XGBoost to solve business challenges.
Analyzed structured and unstructured data using Python and R to derive insights that support data-driven decision-making.
Engaged in feature engineering, applying techniques like one-hot encoding, scaling, and PCA to enhance model performance.
Implemented classification and regression models with Scikit-learn and XGBoost, improving predictive performance by 20%.
Designed and optimized A/B testing frameworks, analyzing product features and recommending improvements based on statistical significance.
Worked with SQL and NoSQL databases (MySQL, PostgreSQL, and MongoDB) for data extraction, cleaning, and transformation to support model development.
Created data visualizations using Tableau and Matplotlib to communicate insights to stakeholders and business teams effectively.
Applied natural language processing (NLP) techniques such as tokenization, stemming, lemmatization, and sentiment analysis for customer review analysis.
Collaborated with data engineers and software developers to deploy end-to-end machine learning pipelines into production systems.
Automated ETL processes for large datasets using Apache Spark and PySpark to achieve scalable, fast data processing.
Conducted time series analysis with models like ARIMA and Prophet for forecasting demand and revenue trends (see the second sketch below).
Explored unsupervised learning techniques such as K-means, DBSCAN, and hierarchical clustering to segment customers and improve targeting strategies.
Designed and built dashboards in Power BI and Tableau to monitor KPIs and project metrics across various business functions.
Integrated machine learning models into production using APIs, ensuring continuous updates through MLOps best practices.
Utilized cross-validation techniques such as k-fold to tune and validate model performance, ensuring robustness and reliability (see the first sketch below).
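A compact sketch of the k-fold tuning loop described above, assuming the xgboost package's scikit-learn wrapper; the synthetic dataset and parameter grid are placeholders.

# Hypothetical sketch: stratified k-fold grid search over an XGBoost classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Synthetic stand-in for the proprietary modeling dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
}

search = GridSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=42),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))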
Environment:
Python, R, Scikit-learn, XGBoost, Pandas, NumPy, Matplotlib, Seaborn, SQL, PostgreSQL, MongoDB, MySQL, Tableau, Power BI, Apache Spark, PySpark, Hadoop, TensorFlow, ARIMA, Prophet, K-means, DBSCAN, NLTK, APIs, Git, Jupyter Notebooks, AWS, Azure, Linux
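The forecasting work above reduces to a pattern like the following; the monthly series is synthetic and the ARIMA(1, 1, 1) order is an illustrative choice, not the production configuration.

# Hypothetical sketch: ARIMA(1,1,1) fit and six-month forecast on a synthetic series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly demand series with a linear trend plus noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=36, freq="MS")
demand = pd.Series(100 + 2.5 * np.arange(36) + rng.normal(0, 5, 36), index=idx)

model = ARIMA(demand, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))  # next six months of demand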
University of Rochester, Rochester, NY March 2018 – Dec 2020
Data Scientist / Machine Learning Engineer
Extracted data from Hive tables by writing efficient Hive queries for large-scale data analysis.
Performed preliminary data analysis using descriptive statistics and handled data anomalies, including removing duplicates and imputing missing values.
Created impactful visualizations using Tableau Desktop and Power BI, including cross-tabs, scatter plots, geographic maps, pie charts, and density charts for business insights.
Applied a wide range of machine learning algorithms and statistical modeling techniques such as decision trees, NLP, supervised/unsupervised learning, regression models, SVM, clustering, and deep learning with Scikit-learn and MATLAB.
Conducted data cleaning and feature selection using MLlib in PySpark and utilized deep learning frameworks like Caffe and Keras for model building.
Developed big data solutions using Spark/Scala, Python, and R in a Hadoop/Hive environment, employing K-Means clustering to identify outliers and classify unlabeled data.
Customized MLflow components for tracking machine learning models, including logging parameters and artifacts, to fit project-specific requirements (see the first sketch below).
Collaborated with CRM teams to analyze retention strategies and quantify their impact, optimizing communication without sacrificing effectiveness.
Designed and developed dimension and fact tables, creating data flow architecture for a data warehouse environment.
Evaluated models using cross-validation, log loss function, ROC curves, AUC, and leveraged Elasticsearch and Kibana for feature selection and performance monitoring.
Utilized NLTK for NLP data processing and pattern recognition, addressing overfitting through L1/L2 regularization methods.
Employed Principal Component Analysis (PCA) to analyze high-dimensional data for feature engineering (see the second sketch below).
Extracted data from the Hadoop cluster using Hive, accessed Oracle databases through SQL, and transformed data with ETL processes.
Built and evaluated machine learning models with MLlib, enhancing analytics performance.
Collaborated with cross-functional teams, including data engineers and business stakeholders, to integrate MLflow into operational workflows.
Developed a MapReduce pipeline for feature extraction using Hive and Pig for scalable data processing.
Tuned model performance using frameworks like Signal Hub, AWS SageMaker, and Azure Databricks.
Conducted data cleaning, feature scaling, and engineering with pandas and NumPy for enhanced model performance.
Created data quality validation scripts with SQL and Hive to ensure data integrity and consistency across datasets.
Generated advanced NLP models for tasks like text summarization and sentiment analysis using frameworks such as Hugging Face and TensorFlow.
Implemented automated ML pipelines with Kubeflow for seamless model deployment and monitoring.
Utilized reinforcement learning techniques to optimize dynamic pricing models and real-time decision-making.
Explored Graph Neural Networks (GNNs) for applications like fraud detection and social network analysis.
Applied synthetic data generation techniques to augment datasets, improving model accuracy when data was scarce.
Designed experiments with Large Language Models (LLMs) to derive insights from unstructured data for specific business applications.
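A minimal sketch of the MLflow tracking workflow described above; the experiment name, model, and dataset are placeholders rather than the project's actual configuration.

# Hypothetical sketch: logging params, metrics, and a model artifact with MLflow.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mlflow.set_experiment("retention-baseline")  # placeholder experiment name
with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 5000}
    model = LogisticRegression(**params).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    mlflow.log_params(params)
    mlflow.log_metric("test_auc", auc)
    mlflow.sklearn.log_model(model, "model")  # saved as a run artifact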
Environment:
Python, SQL, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS, TensorFlow, Keras, Hugging Face, Kubeflow, Elasticsearch, Kibana, Signal Hub.
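The PCA work above follows the standard scikit-learn pattern below; the 95% explained-variance threshold and the digits dataset are illustrative choices.

# Hypothetical sketch: PCA keeping enough components for 95% explained variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)

# Standardize first so no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.95)  # a float keeps components up to that variance share
X_reduced = pca.fit_transform(X_scaled)
print(X.shape[1], "->", X_reduced.shape[1], "features")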
Saint Francis Health System, Tulsa, OK Apr 2017 – Feb 2018
Senior Data Analyst
Gained hands-on experience with data persistence using Hibernate and the Spring Framework, including writing stored procedures and inner joins in Oracle RDBMS.
Utilized MongoDB for storing specification documents for fulfillment centers, ensuring efficient data management.
Wrote complex SQL queries and Stored Procedures to interact with the Oracle database for promo codes and offers.
Acted as part of the production support team, resolving incidents and documenting common issues pre- and post-go-live.
Collaborated in the design and development of web application pages (e.g., accident coverage and vehicle info) using Java, JSP, and JSTL.
Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, HiveUDF, and Spark for efficient data processing and querying.
Developed custom aggregate functions with Spark SQL and performed interactive querying, using Sqoop to store data into HBase and Hive (see the sketch at the end of this section).
Installed and maintained Hadoop clusters, including tasks like commissioning/decommissioning Data Nodes, capacity planning, and configuring Name Node high availability.
Troubleshot software issues through debugging and code analysis, preparing unit test cases to ensure quality control.
Developed back-end processes through SQL Server stored procedures for efficient data operations.
Integrated various sub-systems through XML and XSL, using WebSphere as the application server in both development and production environments.
Mentored team members in the complete Software Development Life Cycle (SDLC), covering design, coding, testing, and documentation phases.
Worked extensively with NoSQL databases like HBase, creating tables to load large sets of semi-structured data from various sources.
Involved in the design and development of business module applications using J2EE technologies like Servlets, Spring, and JDBC.
Implemented the Struts framework following the MVC architecture, improving modularization and scalability.
Utilized J2SE technologies like JNDI, JDBC, and RMI, applying multithreading in projects to enhance performance.
Environment:
Java/J2EE, HTML, JavaScript, CSS, UML, Spring Framework, Git, Visual Studio Code, MySQL, JDBC, Java 7/8, Servlets, XML, Web Services, WSDL, Selenium, Apache HTTP Client, Oracle, SQL, PL/SQL, JSTL, ANT, Maven.
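The Spark SQL custom-aggregate work above can be sketched in PySpark (Python is used for all sketches in this document, though the original work ran on the Java/Scala stack); the table, column names, and the trimmed-mean aggregate are invented for illustration.

# Hypothetical sketch: a custom aggregate (trimmed mean) as a pandas UDF in PySpark.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("promo-agg-sketch").getOrCreate()

# Invented promo-code redemption data standing in for the Oracle/Hive sources.
df = spark.createDataFrame(
    [("SAVE10", 12.0), ("SAVE10", 90.0), ("SAVE10", 14.0),
     ("FREESHIP", 7.0), ("FREESHIP", 9.0)],
    ["promo_code", "discount"],
)

@pandas_udf("double")
def trimmed_mean(v: pd.Series) -> float:
    # Drop values outside the 10th-90th percentile band, then average the rest.
    lo, hi = v.quantile(0.10), v.quantile(0.90)
    return float(v[(v >= lo) & (v <= hi)].mean())

df.groupBy("promo_code").agg(trimmed_mean("discount").alias("robust_avg")).show()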
Petco, San Diego, CA Dec 2015 – Mar 2017
Data Analyst
Extracted and manipulated data from various sources using SQL and Python to generate actionable insights for business stakeholders.
Conducted Exploratory Data Analysis (EDA) to identify trends, patterns, and anomalies within large datasets using descriptive statistics.
Developed interactive dashboards and reports using Tableau and Power BI, enabling stakeholders to visualize Key Performance Indicators (KPIs) and track business metrics.
Collaborated with cross-functional teams to understand business requirements and translate them into analytical solutions, ensuring alignment with organizational goals.
Applied advanced statistical techniques and machine learning algorithms (e.g., regression analysis, clustering, time series forecasting) to derive insights from data.
Automated repetitive data processing tasks using Python, reducing manual effort and increasing efficiency in data handling.
Conducted A/B testing and multivariate testing to assess the impact of marketing strategies and product features on user engagement and conversion rates (see the sketch at the end of this section).
Developed and maintained data models, ensuring data accuracy and consistency across various analytical processes.
Utilized ETL processes to extract, transform, and load data from diverse sources, enhancing data quality for analysis.
Created data quality checks and validation scripts to ensure the integrity of datasets used for reporting and analysis.
Provided training and mentorship to junior analysts, fostering a collaborative learning environment and enhancing team productivity.
Presented analytical findings and recommendations to senior management, facilitating data-driven decision-making.
Developed and maintained documentation for data processes, methodologies, and best practices to ensure transparency and reproducibility of analysis.
Environment:
Python, SQL, SQL Server, PL/SQL, T-SQL, Tableau, Power BI, Excel, R, Hadoop, Apache Spark, Apache Kafka, MySQL, MongoDB, Teradata, OLAP, Google Analytics, Microsoft Azure, Jupyter Notebooks, NLTK, JSON, XML.
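The A/B-testing work above comes down to a significance test like the one below; this sketch assumes statsmodels, and the conversion counts are invented.

# Hypothetical sketch: two-proportion z-test for an A/B conversion experiment.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 489]   # control, variant (invented counts)
visitors = [10000, 10000]

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# Recommend rollout only when the lift clears the 5% significance bar.
print("significant" if p_value < 0.05 else "not significant")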