
Machine Learning Data Scientist

Location: Atlanta, GA

Salary: 100000

Posted: April 25, 2025


Resume:

JANET MWANGI

Jacksonville, Florida 32221

616-***-****

*******@****.****.***

Summary

Results-driven Data Scientist with 10 years of experience in analytics across healthcare, finance, research, and consumer domains. Skilled in transforming complex datasets into actionable insights using Python, SQL, and R. Proven ability to deliver end-to-end data solutions through predictive modeling, machine learning, statistical analysis, and scalable data pipeline development. Adept at working with large, diverse datasets, building interactive visualizations, and translating data into clear narratives that support strategic decision-making. Known for collaborating cross-functionally with technical and non-technical teams to drive innovation, improve outcomes, and solve complex business and research problems.

Skills

Programming and Scripting: Python, R, SQL, Bash, JavaScript

Data Analysis and Processing: Data cleaning, transformation, aggregation, feature engineering

ETL & Pipelines: Building and maintaining automated ETL workflows using Python and SQL

Machine Learning: Regression (linear/logistic), classification, clustering, time series forecasting, and survival analysis

Statistical Analysis: Descriptive statistics, hypothesis testing, model evaluation (ROC curves, AUC, confusion matrices)

Data Visualization: Seaborn, Matplotlib, ggplot2, Plotly, Tableau, and Power BI

Databases and Querying: PostgreSQL, Google BigQuery, SQLite

Tools and Platforms: Git, GitHub, Jupyter, RStudio, Linux environment, SSH

Natural Language Processing: Text cleaning, tokenization, topic analysis

Version Control & Workflow: Git version control, collaborative development, automated workflows

Experience

Data Science Research Assistant

Michigan State University - Enviroweather

East Lansing, MI

**/**** ** *******

Developed and maintained operational Python scripts for data processing and system functionality.

Collaborated with the backend team to improve data processing and validation workflows for weather data.

Wrote custom SQL within Python classes to transform data, optimizing data retrieval and storage in PostgreSQL.

Implemented automated Python scripts for data movement and quality control, ensuring data integrity and reliability.

Reviewed and refactored legacy data processing code to improve performance and maintainability.

Used PostgreSQL to extract weather data from remote servers, with Python scripts cleaning and validating the data before it was loaded into production databases.

Used Git for version control, collaborative development, and code management.

Worked in a Linux environment, using SSH tunneling for secure access to a remote server.

Documented the codebase thoroughly to improve collaboration and shareability among developers.

Data Science Research Intern

Michigan State University, College of Medicine

East Lansing, MI

**/**** ** **/****

Used SQL to query and compile health data from Google BigQuery for in-depth analysis.

Used NumPy and pandas to wrangle healthcare data from NIH's All of Us repository, including demographics and health measurements.

Prepared data in R and Python, focusing on aggregation and normalization for robust analysis.

Performed exploratory data analysis in Python and R, applying descriptive statistics to identify distribution patterns and outliers.

Used data visualization tools (ggplot2, Matplotlib, Seaborn) to make analytical results easier to interpret.

Interpreted p-values to assess statistical significance.

Applied regression analysis and machine learning algorithms to deepen analytical insights.

Created static and interactive visualizations to communicate research findings.

Front End Web Development Graduate Assistant

Grand Valley State University

Grand Rapids, MI

01/2022 to 05/2022

Used HTML, CSS, and JavaScript to develop a website featuring a slow-reveal tool for elementary school teachers.

Designed and implemented a user-friendly web application for creating engaging slow-reveal graphs for classroom use.

Proposed enhancements to an existing slow-reveal graph application to improve its functionality and user experience.

Developed Google Sites add-ons to integrate the slow-reveal tool, making it accessible to educators and students.

Data Analyst

Digitaleo Interactive

Remote

02/2019 to 07/2021

Processed, cleaned, and aggregated data for analysis using tools such as Excel pivot tables.

Led training for 10 field officers in standardized data collection and entry processes.

Verified data rigorously to ensure its integrity and reliability for analysis.

Analyzed trends and patterns in complex data sets to uncover insights.

Used Key Performance Indicators (KPIs) to identify critical variables for analysis.

Presented findings with data visualization tools, creating clear data representations in Excel and with ggplot2 in R.

Completed a major big data analysis project for Diageo in two months, producing actionable insights that optimized distribution channels and increased revenue.

Education and Training

Master of Science: Data Science & Analytics

Grand Valley State University

Allendale, MI

12/2023

Postgraduate Diploma: Financial Mathematics

University of Exeter

Exeter, UK

12/2018

Accomplishments

Chevening Scholarship Award, UK Foreign and Commonwealth Office (FCO), 2017

University of Exeter Global Excellence Scholarship 2017


