
Remote Data Scientist & ML Engineer - Health & Impact Focus

Location:
Bonny, Rivers, 50, Nigeria
Salary:
$15/hr
Posted:
May 07, 2026


Resume:


STEPHEN JOHN KWAGGA

+234**********

Plot *** Danjuma Drive, Trans-Amadi Industrial Layout, Port Harcourt, Rivers State, Nigeria.

*********.***@*****.***

https://stephenkwagga.github.io

https://www.linkedin.com/in/stephen-j-kwagga-60385418b

CORE COMPETENCE

● Predictive Modeling & Machine Learning (supervised, unsupervised and deep learning)

● Version Control & Collaboration (Git & GitHub)

● Data Analysis & Exploratory Data Analysis (EDA)

● Jupyter Notebooks & Google Colab for Prototyping

● SQL for Data Querying (Joins, Subqueries, Window Functions)

● Data Cleaning & Wrangling (Python, Pandas, NumPy)

● Business Intelligence Reporting & Insight Generation

● Collaboration and Problem Solving

● Tableau, Power BI and Excel

● Streamlit, FastAPI, Docker and AWS

PROFILE

Machine Learning & Data Science professional skilled in building scalable ML pipelines and production-ready models across supervised, unsupervised, and deep learning. Specialized in time-series forecasting, fraud detection, churn prediction, patient segmentation, and marketing intelligence. Expert in Python, SQL, PyTorch, and data visualization, delivering actionable insights and end-to-end solutions for business impact.

SELECTED PROJECTS / HIGHLIGHTS

1. Energy Forecasting System (End-to-End)

High-precision household energy forecasting with real-time inference. Captures peak consumption to within 0.02 kWh. Integrated with a Streamlit dashboard for smart-meter data visualization.

· End-to-end ML pipeline with FastAPI backend, automated retraining, and live monitoring.

· XGBoost model achieving 0.228 RMSE on 1M+ records.

· Drift detection + Docker containerization for production deployment.

· Peak capture to within 0.02 kWh; at $0.15/kWh, this translates to $560,000 per year in savings per 500,000 households by preventing procurement and demand penalties.

Private repo – Request technical walkthrough…


2. Credit Card Fraud Detection

Real-time anomaly detection for financial transactions. Catches 7 out of 10 fraud attempts before completion. Flagged transactions trigger 2FA, SMS verification, or manual review — balancing security with user experience.

· Random Forest model optimized for recall (>70%) with sub-100 ms latency

· Risk-based authentication triggers based on fraud probability

· Streamlit interface for real-time transaction scoring

Private repo – Request technical walkthrough…
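The risk-based triggers described for this project can be sketched as follows. This is a minimal illustration, not the private project's code: the threshold values, the escalation order (SMS, then 2FA, then manual review), and the names `RiskPolicy` and `choose_action` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    # Hypothetical probability cut-offs; not taken from the project.
    sms_threshold: float = 0.30
    two_factor_threshold: float = 0.60
    review_threshold: float = 0.85


def choose_action(fraud_probability, policy=RiskPolicy()):
    """Map a model's fraud probability to an authentication step."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    # Check the strictest threshold first so the highest risk wins.
    if fraud_probability >= policy.review_threshold:
        return "manual_review"
    if fraud_probability >= policy.two_factor_threshold:
        return "2fa"
    if fraud_probability >= policy.sms_threshold:
        return "sms_verification"
    return "approve"


if __name__ == "__main__":
    for p in (0.05, 0.45, 0.70, 0.95):
        print(p, choose_action(p))
```

Keeping the cut-offs in one policy object means the trade-off between recall and user friction can be tuned without touching the scoring model.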

3. Credit Risk Prediction (Loan Default Classification)

Impact: Improved loan approval decisions and reduced default risk. Built a supervised machine learning model to predict loan default using applicant financial and demographic data. Conducted data cleaning, categorical encoding, and EDA to identify key risk drivers such as credit history, income, and loan amount. Trained a Logistic Regression model achieving 82% accuracy, enabling data-driven credit risk assessment.

https://github.com/StephenKwagga/DataScience_Projects/blob/main/Credit_Risk_Prediction/Credit_Risk_Prediction_SupervisedLearning.ipynb

4. Household Energy Consumption Forecasting (Time Series)

Impact: Enabled accurate energy demand planning and peak-load management. Developed an XGBoost time-series model to forecast hourly household energy consumption. Engineered lag features, rolling statistics, and cyclical time features to capture daily and seasonal patterns. Achieved RMSE of 0.228 kWh with accurate peak demand prediction and minimal bias. Model saved and production-ready.

https://github.com/StephenKwagga/Forcasting_Hub/blob/main/Household_Energy_Consumption_Series_Xgboost.ipynb
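The lag, rolling, and cyclical features mentioned for this project could look like the sketch below. It uses only the Python standard library and a synthetic series; the function names, the 24-hour lag, and the 3-hour window are assumptions for illustration and are not taken from the linked notebook.

```python
import math
from datetime import datetime, timedelta


def cyclical_hour(ts):
    """Encode hour of day on the unit circle so 23:00 sits next to 00:00."""
    angle = 2 * math.pi * ts.hour / 24
    return math.sin(angle), math.cos(angle)


def build_features(timestamps, values, lag=24, window=3):
    """Return one feature dict per reading that has enough history.

    Every feature uses only readings strictly before the target hour,
    so nothing leaks from the value being predicted.
    """
    rows = []
    start = max(lag, window)
    for i in range(start, len(values)):
        past = values[i - window:i]
        hour_sin, hour_cos = cyclical_hour(timestamps[i])
        rows.append({
            "lag": values[i - lag],              # same hour, `lag` hours earlier
            "rolling_mean": sum(past) / window,  # mean of the previous `window` hours
            "hour_sin": hour_sin,
            "hour_cos": hour_cos,
            "target": values[i],
        })
    return rows


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    timestamps = [t0 + timedelta(hours=h) for h in range(48)]
    values = [float(h % 24) for h in range(48)]  # synthetic daily pattern
    rows = build_features(timestamps, values)
    print(len(rows), rows[0])
```

Rows built this way can be passed to any tabular regressor; the sine and cosine pair lets a tree or linear model treat midnight and 23:00 as neighbours.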

5. Bank Marketing Prediction with Explainable AI (Random Forest + LIME)

Impact: Increased marketing conversion while reducing wasted outreach. Built customer subscription prediction models using Logistic Regression and Random Forest, achieving ROC AUC of 0.89. Applied LIME to explain individual predictions and uncover key drivers such as call duration, prior campaign success, and contact timing, enabling transparent and targeted marketing decisions.

https://github.com/StephenKwagga/Advance_DataScience_Projects/blob/main/Bank_Marketing_Model/Bank_Marketing_SupervisedLearning.ipynb

6. Customer Churn Prediction (Banking)

Impact: Enabled proactive customer retention strategies. Developed a churn prediction model using customer demographic and account data. Applied feature encoding and scaling, training a Logistic Regression classifier achieving 86.6% accuracy. Identified age, account balance, and number of products as the strongest churn indicators, supporting data-driven retention efforts.

https://github.com/StephenKwagga/DataScience_Projects/blob/main/Churn_Prediction/Customer_Churn_Prediction_SupervisedLearning.ipynb

7. Customer Segmentation using Unsupervised Learning (K-Means + PCA)

Impact: Improved customer targeting and marketing ROI. Performed customer segmentation using K-Means clustering on age, income, and spending behavior. Validated clusters using Elbow and Silhouette methods and visualized segments with PCA. Translated clusters into actionable strategies such as loyalty programs, discount targeting, and premium upselling.

https://github.com/StephenKwagga/Advance_DataScience_Projects/blob/main/Mall_Customers_Model/Mall_Customers_Modelling_UnsupervisedK-means.ipynb
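The Elbow check mentioned for this project compares within-cluster inertia across values of k. The sketch below is a plain standard-library version of Lloyd's K-Means on toy 2-D points, assumed for illustration only; the linked notebook may use a library implementation instead, and the Silhouette and PCA steps are not shown.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return centroids, labels, inertia


if __name__ == "__main__":
    # Toy data: two well-separated groups of four points each.
    points = [(1, 1), (1, 2), (2, 1), (2, 2),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    for k in (1, 2, 3):
        _, _, inertia = kmeans(points, k)
        print(k, round(inertia, 3))
```

The "elbow" is the value of k after which inertia stops dropping sharply; on the toy data that happens at k = 2.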

EXPERIENCE

1. DevelopersHub Corporation. June 2025 – July 2025

Data Science & AI/ML Intern.

Completed 8 real-world Data Science & AI/ML projects, fully documented on GitHub.

Worked with datasets across classification, regression, clustering, and time-series forecasting.

Applied Python, NumPy, pandas, scikit-learn, seaborn, Matplotlib, XGBoost, and SHAP for model building and analysis.

Delivered insights through data cleaning, feature engineering, visualization, and model evaluation.

Gained proficiency in supervised and unsupervised learning.

2. Islamic Coin. July 2023 – June 2024

Community Analyst and Content Ambassador.

Monitored and analyzed community engagement metrics across Telegram and Discord (200K+ users), providing actionable insights into user activity and sentiment.

Created data-driven content strategies to increase engagement and participation, using platforms such as Twitter, Medium, and Reddit.

Conducted trend analysis to optimize content delivery and improve community interaction.

Developed and executed community events (polls, quizzes) to increase interaction and track participation metrics.


Generated regular performance reports, detailing key metrics and content effectiveness, using Excel and Google Sheets.

Collaborated with core team members to leverage data insights and enhance community engagement, fostering a positive environment across platforms.

3. Deloitte Internship. May 2025 – June 2025

Data Analyst.

Utilized Tableau to create interactive dashboards showcasing financial and operational KPIs.

Performed data cleaning and transformation using Excel and Python for business performance analysis.

Interpreted client datasets to identify trends, cost-saving opportunities, and strategic insights.

Delivered actionable reports tailored for both technical and non-technical stakeholders.

Practiced stakeholder communication and consulting scenarios through simulated case studies.

Gained exposure to enterprise consulting frameworks, performance metrics, and strategic data planning.

4. KPMG Australia (Forage). Jan 2023 – April 2023

Power BI Data Analyst.

• Analyzed a dataset of over 500,000 customer transactions using Power BI to uncover trends and correlations between customer behavior and business performance.

• Built interactive Power BI dashboards to track and visualize customer retention, segmentation, and performance metrics, leading to a 20% increase in data accessibility for business decision-makers.

• Created automated reports that generated insights into customer behavior, reducing manual reporting time by 30% and allowing the team to focus on strategic planning.

• Utilized DAX and Power Query to create custom calculations, improving the accuracy and depth of the data analysis, which led to a 15% reduction in churn rates after implementing findings.

• Led presentations of findings using Power BI visuals, improving stakeholder engagement by 25% through clearer, more actionable insights.

• Worked with cross-functional teams to develop solutions for customer retention, resulting in a 10% increase in retention rates for mock scenarios.

5. Udacity. Jan 2023 – June 2023

Data Science Nanodegree (Bertelsmann Scholarship).

Developed solid foundations in SQL by writing complex queries involving JOIN, GROUP BY, nested subqueries, and aggregations to analyze structured datasets and generate business insights.

Applied SQL to real-world datasets, such as analyzing trends in global film data (e.g., TMDb Movies Dataset), including revenue drivers, genre popularity, and release timing.

Built end-to-end data analysis pipelines using Pandas for cleaning, reshaping, and merging datasets, and NumPy for numerical operations.

Performed exploratory data analysis (EDA) to identify outliers, missing values, and relationships between variables.

Created compelling data visualizations with Matplotlib and Seaborn, and presented findings with narrative context using Jupyter Notebook documentation.

Synthesized insights into clear, actionable conclusions to simulate decision-making support for business stakeholders.

6. Hamoye AI. May 2023 – Nov 2023

Data Science Internship.

Data wrangling and cleaning efforts using Python (Pandas, NumPy) to prepare raw datasets for analysis.

Conducted exploratory data analysis (EDA) with Python (Matplotlib, Seaborn) and R (ggplot2) to reveal key trends, correlations, and outliers in complex datasets.

Built and fine-tuned machine learning models in Python using scikit-learn, achieving performance metrics like 85% accuracy for customer churn prediction.

Applied cross-validation and hyperparameter optimization to improve model robustness and predictive power.

Produced clear, data-driven insights by visualizing trends and patterns in datasets, providing actionable recommendations to non-technical stakeholders.

Documented analysis and shared findings using Jupyter Notebooks, ensuring clarity in communication and smooth collaboration with team members.

EDUCATION

B.Tech. (Hons.) Geology. May 2015 - Jul 2021

Modibbo Adama University, Yola, Adamawa State.

CERTIFICATIONS AND DEVELOPMENT:

A/B Testing with Python. 365 Data Science. Dec 2025

Data Visualization with Python, R, Tableau, and Excel. 365 Data Science. Nov 2025

Introduction to Vector Database and Pinecone. 365 Data Science. Nov 2025

LLM Engineering with Streamlit and OpenAI. 365 Data Science. Nov 2025

Machine Learning A – Z. 365 Data Science. Nov 2025

Machine Learning in Python. 365 Data Science. Nov 2025

Linear Algebra and Feature Selection. 365 Data Science. Nov 2025

Math Foundation for ML. 365 Data Science. Nov 2025

Deep Learning with PyTorch. DataCamp. Nov 2025


Python Fundamental. Datacamp. Sep 2025

Machine Learning Specialization. DeepLearning.AI. August 2025

Data-Driven Decision Making. LinkedIn Learning. August 2025

DataScience and Analytics. DevelopersHub.co July 2025

AWS SageMaker AI. LinkedIn Learning. July 2025

Python for Data Science and Machine Learning, Essential 1 & 2. LinkedIn Learning. July 2025

SQL Intro, Intermediate and Join. Datacamp. June 2025

SQL Bootcamp. Udemy. Oct 2024

Data Analytics Certificate. KPMG. April 2023

Data Science with Python. Udacity. Mar 2023

Aspire Leadership. Aspire Institute, an online learning initiative of Harvard Business School. May 2023

Data Analytics Certificate. Coursera. Nov 2022

Project Management. International Capacity Building Management and Development. May 2022

Professional and Vocational Development in Basic Microsoft Office. American University of Nigeria, Yola. May 2015


