East Lansing, MI
East Lansing, MI
JANET MWANGI
Summary
Results-driven Data Scientist with 10 years of experience in analytics across healthcare, finance, research, and consumer domains. Skilled in transforming complex datasets into actionable insights using Python, SQL, and R. Proven ability to deliver end-to-end data solutions through predictive modeling, machine learning, statistical analysis, and scalable data pipeline development. Adept at working with large and diverse datasets, building interactive visualizations, and translating data into clear narratives that support strategic decision-making. Known for collaborating cross-functionally with technical and non-technical teams to drive innovation, improve outcomes, and solve complex business and research problems.
Skills
Programming and Scripting: Python, R, SQL, Bash, JavaScript
Data Analysis and Processing: Data cleaning, transformation, aggregation, feature engineering
ETL & Pipelines: Building and maintaining automated ETL workflows using Python and SQL
Machine Learning: Regression (linear/logistic), classification, clustering, time series forecasting, and survival analysis
Statistical Analysis: Descriptive statistics, hypothesis testing, model evaluation (ROC, AUC, confusion matrix)
Data Visualization: Seaborn, Matplotlib, ggplot2, Plotly, Tableau, and Power BI
Databases and Querying: PostgreSQL, Google BigQuery, SQLite
Tools and Platforms: Git, GitHub, Jupyter, RStudio, Linux environment, SSH
Natural Language Processing: Text cleaning, tokenization, topic analysis
Version Control & Workflow: Git version control, collaborative development, automated workflows
Experience
Data Science Research Assistant
Michigan State University - Enviroweather
Developed and maintained operational Python scripts to ensure seamless data processing and system functionality.
Worked collaboratively as part of the backend team to enhance data processing and validation workflows for weather data.
Wrote custom SQL within Python object classes to transform data, optimizing data retrieval and storage in PostgreSQL.
Implemented data movement and quality control measures through automated Python scripts, ensuring data integrity and reliability.
Reviewed and refactored legacy data processing code to improve performance and maintainability.
Used PostgreSQL to extract weather data from remote servers, then cleaned and validated it with Python scripts before loading it into production databases.
Employed Git for version control, facilitating collaborative development and code management.
Operated within a Linux environment, using SSH tunneling for secure access to a remote server.
Documented the codebase thoroughly to make it easier for other developers to collaborate on and share.
Data Science Research Intern
Michigan State University, College of Medicine
Utilized SQL to query and compile health data from Google BigQuery for in-depth analysis.
Jacksonville, Florida 32221
616-***-**** *******@****.****.***
01/2022 to 05/2022
Grand Rapids, MI
02/2019 to 07/2021
Remote
12/2023
Allendale, MI
12/2018
Exeter, UK
Employed NumPy and Pandas to wrangle healthcare data from NIH's All of Us repository, including demographics and health measurements.
Conducted data preparation in R and Python, focusing on aggregation and normalization for robust analysis.
Executed exploratory data analysis using Python and R, applying descriptive statistics to identify distribution patterns and outliers.
Leveraged data visualization tools (ggplot2, Matplotlib, Seaborn) to enhance the interpretability of analytical results.
Interpreted p-value results to assess statistical significance.
Applied regression analysis and machine learning algorithms to deepen analytical insights.
Created static and interactive visualizations to effectively communicate research findings.
Front End Web Development Graduate Assistant
Grand Valley State University
Utilized HTML, CSS, and JavaScript to develop a website featuring a slow-reveal tool for elementary school teachers.
Designed and implemented a user-friendly web application for creating engaging slow-reveal graphs for classroom use.
Suggested enhancements for an existing slow-reveal graph application, improving its functionality and user experience.
Developed Google Sites add-ons to integrate the slow-reveal tool, ensuring accessibility for educators and students.
Data Analyst
Digitaleo Interactive
Processed, cleaned, and aggregated data for analysis using tools like Excel pivot tables.
Led training for 10 field officers in standardized data collection and entry processes.
Ensured data integrity through rigorous verification, guaranteeing reliability for analysis.
Analyzed trends and patterns in complex data sets to uncover insights.
Utilized Key Performance Indicators (KPIs) to identify critical variables for analysis.
Employed data visualization tools to present findings and generate impactful insights.
Created compelling data representations with Excel and ggplot2 in R for enhanced clarity.
Completed a major big data analysis project for Diageo in two months, producing actionable insights that optimized distribution channels and increased revenue.
Education and Training
Master of Science: Data Science & Analytics
Grand Valley State University
Postgraduate Diploma: Financial Mathematics
University of Exeter
Accomplishments
Chevening Scholarship Award, UK FCO, 2017
University of Exeter Global Excellence Scholarship, 2017