Senior Data Scientist

Location:
Riverdale Park, MD, 20737
Posted:
February 25, 2024


LOUDEN BESINGI MOTINA

Contact No.: (***) *** - **64

E-Mail: adz0ig@r.postjobfree.com

Data Scientist | Data Management | Data Analysis

Dynamic, strategic professional with over 10 years of global, multicultural, and diversified experience in data science, AI, and machine learning.

PROFILE SUMMARY

•Analytical Data Scientist with over 10 years of experience; skilled in data mining, machine learning on large structured and unstructured datasets, and deep learning techniques for business problems across verticals

•Skilled in applying Machine learning techniques such as Linear and Logistic Regression, Decision Trees, and Neural Network Architectures

•Proficient in collecting, cleaning, and analyzing data, while uncovering hidden trends and patterns.

•Experienced in building data pipelines for data migration and integration.

•Skilled in performing ETL processes on data and well-versed in data mining techniques and machine learning

•Experienced in analysis methods such as forecasting, multivariate analysis, sampling methods, clustering, and predictive, statistical, sentiment, exploratory, Bayesian, and regression analysis, as well as linear models

•Competent in creating data visualizations and dashboards using Tableau and Power BI.

•Knowledgeable in cloud computing, including AWS, Azure, and Google Cloud.

•Familiar with various cloud-based SaaS and PaaS solutions used in data engineering.

•Accomplished in Python, SQL, NoSQL, and Excel.

•Capable of delivering data-driven insights to inform decision-making

•Applied ML techniques to live data streams from big data sources using PySpark and Spark packages for SQL, ML, and batch processing

•Work in close coordination with cross-functional teams during requirements-gathering sessions with business units; versatile in converting business requirements into technical specifications such as system and functional requirements

CORE COMPETENCIES

Dimensional Data Modeling

Business Intelligence Solutions

Tableau

Data Warehousing

Requirement Gathering

Linear Regression

Scikit-learn

Project Management & Governance

TensorFlow

Machine Learning

Statistics

Data Processing Algorithms

Logical Data Models

Python

PySpark

MySQL

Team Management

TECHNICAL SKILLS

Programming languages

MATLAB, Python, SQL, R, C++, PHP, JavaScript, Spark

Python Packages

NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, fastai, SciPy, Matplotlib, Seaborn, Numba

Software Tools

phpMyAdmin, MySQL, Excel, and Visual Basic for Applications

AI

Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Naïve Bayes.

Machine Learning Algorithms

kNN, SVM, Naïve Bayes, MLP, Random Forest, C4.5, XGBoost, linear regression, associative memories; data preprocessing algorithms (feature selection, feature extraction, NLP, and data cleaning); WEKA, scikit-learn, and Keras on TensorFlow

Tools and Skills

Jupyter, RStudio, Github, Git, APIs, C++, Eclipse, Java, Linux, C#, Docker, Node.js, React.js, Spring, XML, Kubernetes, Back-End, Databases, Bootstrap, Django, Flask, CSS, Express.js, Front-End, HTML, MS Azure, AWS, GCP, Azure Databricks, AWS Sagemaker

Deep Learning

Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning

Time Series Analysis

Forecasting (ARIMA, SSM, NAR, and LSTM)

Data Modelling

Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series analysis

Natural Language Processing

NLTK Toolkit and Stanza in Python, Text Analysis, Classification, Chatbots

Other

Cloud computing and service providers such as AWS, GCP, and Azure; good knowledge of cybersecurity techniques

ORGANIZATIONAL EXPERIENCE

Senior Data Scientist

Jul’21-Present Abik Health Care Services, Riverdale, Maryland

This data science project revolves around the collection and analysis of a company's diverse healthcare data stored in Amazon S3. The primary objective is to establish a robust data pipeline, leveraging Snowflake, to construct a comprehensive data warehouse. This data warehouse serves as the central repository for processing and querying daily electronic health data derived from various sources, including patient records, clinical information, insurance data, billing records, and hospital transactions.
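
A minimal sketch of the S3-to-Snowflake flow described above, assuming hypothetical bucket, key, and table names and the boto3 and snowflake-connector-python libraries; credentials, scheduling, and error handling are omitted. The responsibilities below detail each step.

    import io

    import boto3
    import pandas as pd
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    # Pull a raw daily extract from the S3 data lake (bucket and key are hypothetical).
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="health-data-lake", Key="daily/patient_records.csv")
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Basic cleaning: de-duplicate, normalize column names, coerce dates.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]
    df["visit_date"] = pd.to_datetime(df["visit_date"], errors="coerce")

    # Load the cleaned frame into the Snowflake warehouse (connection values are placeholders).
    conn = snowflake.connector.connect(
        account="...", user="...", password="...",
        warehouse="ANALYTICS_WH", database="HEALTH_DW", schema="STAGING",
    )
    write_pandas(conn, df, table_name="PATIENT_RECORDS", auto_create_table=True)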

•Gather electronic health data from multiple sources, centralizing it in Amazon S3 for subsequent processing.

•Apply data cleaning and transformation techniques to ensure data accuracy and consistency. Python, with libraries such as Pandas and NumPy, is utilized for this purpose.

•Build and maintain a Snowflake data warehouse to store and organize the cleaned data efficiently.

•Perform SQL queries on the data warehouse to extract valuable insights and generate actionable reports.

•Employ Natural Language Toolkit (NLTK) and scikit-learn libraries to analyze text data within the healthcare records (see the sketch after this list).

•Integrate the Snowflake data warehouse with Power BI for creating dynamic and informative business reports.

•Use Power Query to transform and enrich data, conduct multiple joins, and generate various visualizations to facilitate data-driven decision-making.

•Successfully centralized and standardized healthcare data from diverse sources.

•Established an efficient data pipeline using Snowflake, ensuring timely data processing

•Leveraged Python for data manipulation and analysis

•Applied advanced text analysis techniques to gain insights from textual healthcare data

•Empowered stakeholders with informative business reports via CloudWatch

•Acted as an internal enablement resource for client-facing and marketing teams

•Participated in the implementation of and use of leading-edge data warehouse / big data technologies including software, end-user tools, and other data services
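
A minimal sketch of the NLTK-plus-scikit-learn text analysis referenced in the list above, using a tiny, entirely synthetic set of notes; the column names and labels are hypothetical, not the production pipeline.

    import nltk
    import pandas as pd
    from nltk.corpus import stopwords
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    nltk.download("stopwords", quiet=True)

    # Hypothetical toy frame: free-text clinical notes plus a binary outcome label.
    df = pd.DataFrame({
        "notes": [
            "patient reports recurring chest pain after discharge",
            "routine follow-up, no complaints, vitals stable",
            "readmitted with shortness of breath and fatigue",
            "annual physical, patient in good health",
        ],
        "readmitted": [1, 0, 1, 0],
    })

    # TF-IDF features (with NLTK's stopword list) feeding a simple classifier.
    model = make_pipeline(
        TfidfVectorizer(stop_words=stopwords.words("english")),
        LogisticRegression(max_iter=1000),
    )
    model.fit(df["notes"], df["readmitted"])
    print(model.predict(["follow-up visit, patient doing well"]))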

Senior Data Scientist

Sep’19-Jul’21 Jefferies Financial Group, New York, NY

The Data Strategy project is a global initiative aimed at providing critical support to the Equity Research Department and its analysts. Our primary objective is to seamlessly integrate alternative data sources into their research reports. This integration combines the fundamental domain expertise of the research team with our technical skills to facilitate actionable analyses. We also maintain proactive collaboration with third-party data vendors to enhance their products. Our team's core focus revolves around the methodical, defensible, and actionable processing of extensive datasets, enabling us to construct narratives about real-world behavior and dynamics. Our work is founded on statistics, data engineering, and data science, with occasional forays into machine learning and artificial intelligence as tools to augment our analysts' capabilities.

•Spearheaded the integration of diverse data sources, including alternative datasets, into the Equity Research Department's analytical framework, ensuring data quality, consistency, and relevance using tools such as Apache NiFi and Talend, API integration, and data warehousing (Snowflake)

•Collaborated closely with equity analysts to harness their domain expertise and translate it into actionable insights through advanced analytics techniques such as regression analysis, clustering, and time series forecasting (a sketch follows this list).

•Proactively engaged with third-party data vendors to optimize data procurement processes, evaluate data quality, and identify opportunities for data enrichment using data quality assessment tools and data enrichment solutions.

•Applied robust data preprocessing techniques, including data cleaning, normalization, and feature engineering, to large and complex datasets using Pandas and NumPy, enhancing the accuracy and reliability of analytical outcomes

•Constructed compelling narratives and visualizations using data-driven insights to explain real-world market behaviors, trends, and dynamics, employing storytelling techniques in tools such as Tableau and Power BI

•Leveraged statistical methodologies, such as hypothesis testing, ANOVA, and regression modeling, to extract actionable information from datasets and support hypothesis-driven research.

•Implemented data engineering practices such as data transformation, data warehousing, and data pipeline automation to streamline data ingestion and processing workflows using Apache Spark and Apache Kafka

•Applied machine learning and artificial intelligence algorithms, including supervised and unsupervised learning, to augment research processes and empower analysts with predictive and prescriptive analytics, using ML frameworks such as Scikit-learn and TensorFlow

•Guided and performed knowledge transfer to junior team members, ensuring alignment with best practices in data science, data engineering, and statistical analysis.

•Stayed abreast of emerging trends and technologies in data science, including advancements in machine learning, deep learning, and natural language processing, to identify opportunities for innovation and improvement in the research process

•Performed reporting analytics using Power BI and built internal dashboards in Periscope for reporting purposes
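
A minimal sketch of the time-series forecasting referenced in the list above, assuming statsmodels and a synthetic daily signal standing in for an aggregated alternative dataset; the ARIMA order is illustrative only.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic daily signal standing in for an alternative-data series
    # (e.g., aggregated transaction counts).
    rng = np.random.default_rng(0)
    idx = pd.date_range("2020-01-01", periods=200, freq="D")
    signal = pd.Series(100 + np.cumsum(rng.normal(0, 2, 200)), index=idx)

    # Fit a simple ARIMA model and forecast the next two weeks.
    fit = ARIMA(signal, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=14))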

Data Scientist & ML Engineer

Jun’17-Sep’19 McKinsey & Company, New York, NY, USA

Worked on a project around an enterprise resource planning solution through which organizations can use data and analytics to better predict supply needs and anticipate disruptions. Worked with diverse and complex datasets and contributed to tangible business improvements by leveraging data science methods and advanced analytics.

•Utilized machine learning and statistical modeling techniques to develop and assess algorithms aimed at optimizing performance, data quality, and accuracy within the industry's supply chain management (a sketch follows this list)

•Managed version control through Git and implemented Jenkins for CI/CD pipelines to streamline code management and deployment

•Established a playbook for events and classification containers, improving data management and classification processes

•Took charge of data analysis, cleaning, and debugging efforts, ensuring the readiness of datasets and codebases for Quality Assurance.

•Spearheaded the development of a proof of concept for a chatbot, showcasing its potential utility

•Collaborated closely with product and program managers to define data analytics problems and formulate effective solutions

•Designed and developed an application using Django, successfully moving it to deployment

•Automated model training and deployment through ML operations, enhancing efficiency and scalability

•Contributed to the design and prototyping of medium to high-complexity machine learning systems, fostering innovation

•Employed statistical analysis and data visualization techniques to uncover valuable patterns, trends, and insights for informed decision-making

•Utilized predictive modeling to forecast outcomes and identify potential health risks for patients, facilitating personalized treatment plans and improved care

•Leveraged statistical analysis and machine learning algorithms to identify target patient populations, refine trial designs, and assess potential risks

•Developed dashboards and reports to monitor key performance indicators (KPIs), analyze trends, and create predictive models for outcome forecasting

•Ensured organizations' compliance with regulatory requirements, including HIPAA, GDPR, and FDA regulations, safeguarding data privacy and integrity

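A minimal sketch of the kind of supply-need prediction model described in the list above, trained on entirely synthetic features; the feature semantics (lead time, stock on hand, weekly demand) are assumptions, not the actual client model.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for supply-chain features: [lead_time, on_hand], standardized.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 2))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 500)  # weekly-demand proxy

    # Fit a gradient-boosted ensemble and report held-out error.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_train, y_train)
    print(mean_absolute_error(y_test, model.predict(X_test)))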

Data Scientist

Feb’15-Jun’17 CompSource Mutual Insurance Co., Oklahoma City, USA

Worked within the Enterprise Applications Team as a Data Scientist. Project responsibilities included multivariate analysis to test the impact of the WorkSafe Champions Safety Program on policies. Also worked with the SIU (Special Investigations Unit) to assess the feasibility of detecting insurance fraud with machine learning and to gain insight into possible fraudulent claims or activity.

•Operated within a Cloudera Hadoop environment, harnessing Python, SQL, and Tableau to unlock insights from extensive datasets

•Conducted research to evaluate fraud predictive analytics scenarios, identifying patterns and predicting outcomes for new claims. This involved the development of predictive models and comprehensive data analysis

•Employed a robust toolkit, including Python, Pandas, NumPy, and SciPy, for exploratory data analysis, data wrangling, and feature engineering

•Evaluated Anomaly Detection Models, including Expectation Maximization, Elliptical Envelopes, and Isolation Forests, to enhance fraud detection capabilities (see the sketch after this list)

•Extracted and harnessed data residing within the Hadoop Distributed File System (HDFS) on Cloudera for analysis

•Crafted a comprehensive Tableau Dashboard, providing a powerful tool for presenting the organization's Annual Report

•Explored the application of kernel density estimation in lower-dimensional spaces as a predictive feature in fraud detection

•Conducted in-depth multivariate analysis on safety programs spanning multiple years, aiming to uncover valuable insights

•Leveraged regression analysis to establish correlations between participation in safety programs and claims outcomes

•Executed Hypothesis testing and rigorous statistical analysis to pinpoint statistically significant changes in claims following participation in safety programs

•Utilized Tableau and TabPy to visualize complex analyses and present findings effectively

•Managed the integration of fraud data with claims data, effectively handling large datasets with multiple observations

•Collaborated with fellow data scientists on diverse use cases, including workplace accident prediction and sentiment analysis, engaging with various stakeholders to develop predictive models and drive data-driven decision-making
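
A minimal sketch of the anomaly-detection comparison referenced in the list above, running Isolation Forest and Elliptic Envelope on synthetic claim features; the contamination rate and feature semantics are assumptions, not the production fraud model.

    import numpy as np
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest

    # Synthetic claim features (e.g., claim amount, days open), with injected outliers.
    rng = np.random.default_rng(7)
    normal = rng.normal(0, 1, size=(300, 2))
    outliers = rng.uniform(4, 6, size=(10, 2))
    X = np.vstack([normal, outliers])

    # Both models flag anomalies as -1; contamination is a tuning assumption.
    for model in (IsolationForest(contamination=0.03, random_state=0),
                  EllipticEnvelope(contamination=0.03)):
        flags = model.fit_predict(X)
        print(type(model).__name__, "flagged:", int((flags == -1).sum()))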

Data Scientist

Sep’13-Feb’15 Fidelity Investments, Boston, Massachusetts

At Fidelity Investments, the Investment Portfolio project was at the forefront of innovation, leveraging Natural Language Processing (NLP) and Time Series Analysis to revolutionize investment portfolio management. This forward-looking initiative aimed to enhance predictive analytics, and the resulting analysis was used to rebalance stock portfolios. Explored various algorithmic trading theories and ideas.

•Leveraged PySpark Python modules within the Hadoop ecosystem on AWS for machine learning and predictive analytics

•Implemented advanced machine learning algorithms using Spark, MLLib, R, and other relevant tools and languages

•Scripted in R, Java, and Python to perform data analysis and manipulation, ensuring data accuracy and quality

•Developed data dictionaries to generate metadata reports, catering to both technical and business requirements

•Created reporting dashboards to visualize statistical models, enabling the tracking of key metrics and risk indicators

•Utilized ensemble models such as Random Forest to enhance model performance (a sketch follows this list)

•Extracted and transformed data from Amazon Redshift on AWS, preparing raw and complex data streams for analytical tools

•Explored various regression and ensemble models for forecasting and developed new financial models

•Improved model efficiency and accuracy through rigorous evaluation and refinement in R

•Defined source-to-target data mappings, established business rules, and refined data definitions

•Conducted end-to-end Informatica ETL testing, crafting complex SQL queries for source and target databases comparison
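
A minimal sketch of an MLlib Random Forest like the ensembles referenced in the list above, using a local SparkSession and a tiny inline DataFrame; in practice the inputs would come from the Redshift and HDFS extracts described above, and the column names here are hypothetical.

    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import RandomForestRegressor
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("portfolio-forecast").getOrCreate()

    # Tiny inline stand-in for features engineered from market data.
    df = spark.createDataFrame(
        [(0.2, 1.1, 5.0), (0.5, 0.9, 6.2), (0.1, 1.4, 4.8), (0.7, 0.8, 7.1)],
        ["momentum", "volatility", "target_return"],
    )

    # Assemble feature columns and fit a Random Forest ensemble.
    assembler = VectorAssembler(inputCols=["momentum", "volatility"],
                                outputCol="features")
    train = assembler.transform(df)
    model = RandomForestRegressor(featuresCol="features", labelCol="target_return",
                                  numTrees=20).fit(train)
    model.transform(train).select("momentum", "prediction").show()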

CERTIFICATIONS & PUBLICATIONS

Applied Data Science with Python Specialization

Certifications in Introduction to Python, Applied Plotting and Charting in Python, Applied Machine Learning, Applied Text Mining in Python, and Applied Social Network Analysis from the University of Michigan through Coursera

Splunk Certification

Training and certifications on Splunk

Project Management

Online certification in project management from Open2Study

Publications

Published two articles in peer-reviewed journals:

-Designing a natural rubber latex effluent treatment system using an acid coagulation rubber recovery tank in series with a constructed wetland for Cameroon’s largest agro-industry’s rubber processing unit

-Reducing overconsumption of acid during skim latex coagulation in Cameroon’s rubber processing factories

EDUCATION

•Master of Science in Big Data Analytics from Bay Atlantic University, Washington, DC

•Master of Engineering in Process Engineering from ENSAI-University of Ngaoundere

•Bachelor of Science in Chemistry from the University of Buea


