AI Scientist-ML Engineer

Location:

Redmond, WA

Posted:

April 05, 2023

Resume:

ANASS ISMAILI

Contact: 510-***-**** (M); Email: ************@*****.***

Attuned to the latest trends and advancements in this field, I consistently deliver impeccable results while handling multiple functions and activities in high-pressure environments with tight deadlines.

DATA SCIENTIST / MACHINE LEARNING ENGINEER

EXECUTIVE SNAPSHOT

Data Scientist and Machine Learning Engineer with 7+ years of experience applying Data Mining, Machine Learning, and Deep Learning techniques to large structured and unstructured datasets to solve business problems across verticals.

Proficient in managing end-to-end data science projects and actively involved in all the phases of the project’s life cycle.

Extensive experience in applying:

o Machine learning techniques such as Linear and Logistic Regression, Decision Trees, and Neural Network Architectures.

o Naïve Bayes, Regression, and Classification techniques as well as Neural Networks, Deep Neural Networks, Decision Trees, and Random Forest.

o Statistical models on large data sets using cloud computing services such as AWS, Azure, and GCP.

Adept at performing EDA to find patterns in business data and communicate findings to the business using visualization tools such as Matplotlib, Seaborn, and Plotly.

Hands-on experience in leading teams to:

o Productionize statistical or machine learning models and create APIs or data pipelines for the benefit of business leaders and product managers.

o Prepare clean data pipelines and design, build, validate, and refresh machine learning models.

Brilliant in applying predictive analytics for sales to provide forecasting and improve decision-making using techniques such as ARIMA, ETS, and Prophet.

Good knowledge of applying statistical analysis and machine learning techniques to live data streams from big data sources using PySpark and the Spark SQL and ML packages, as well as batch processing techniques.

An assertive team leader with strong aptitude in developing, leading, hiring, and training highly effective teams; strong analytical skills with proven ability to work well in a multi-disciplined team environment and adept at learning new tools and processes with ease.

PROFESSIONAL EXPERIENCE

Since Sep 2020 with Kaiser Permanente, Oakland, California (REMOTE)

As a Senior Data Scientist

(Kaiser Permanente is an American integrated managed care consortium based in Oakland, California, United States.)

Worked on the Kaiser Permanente Analytic Insights project to obtain a single source of truth, drill down into actionable insights and understand cost drivers, keep claims data lag at or below the market norm, respond to ad-hoc queries promptly, and create custom reports that satisfied clients.

Created a customizable suite of dashboards and reports, robust data analytics, and industry benchmarks.

Integrated Kaiser Permanente’s clinical and financial data warehouses including Medicaid/Medical

Offered actionable insights on Medicaid/Medical and other clinical data

Ensured that Medicaid coverage aligned with Kaiser Permanente’s integrated care model during the integration

Streamlined enrolling and claims processing for Medicaid/Medical by ensuring high quality of the single source of truth/data management.

Used multiple Python packages such as Pandas and NumPy for data manipulation and feature engineering, as well as Matplotlib and Seaborn for visualization and exploratory data analysis (EDA), which was often used for presentations

Scikit-Learn’s train_test_split was used to create train, validation, and test splits for model evaluation in the Medication Adherence and SDoH projects

Implemented an XGBClassifier algorithm with GridSearchCV and EarlyStopping to predict patients who are non-adherent for the Medication Adherence project

Gradient Boosting regression and Random Forest Regressor models were explored with grid search and cross-validation techniques to calculate an SHI score for the SDoH project

Investigated co-morbidities by calculating Association Rules using Apriori

SHAP was used for feature importance in the Medication Adherence project to guide further feature engineering

Used doc2vec embeddings, as well as categorical encodings such as one-hot encoding and LabelEncoder, to represent patient information and reduce dimensionality

Applied weights to patient demographics that were under-represented

Evaluated the performance of our models using a confusion matrix, accuracy, recall, precision, F1 score, ROC curves and AUC, R2, and RMSE

Performed literature research to gain domain knowledge on how to identify Limited English Proficiency (LEP) patients and the best interventions, as well as Power Analysis, Co-Morbidity, and more

Coordinated with other teams, such as the dashboard team, to provide model results and patient and medication information to populate the dashboard; also performed data integration tests

Implemented a CI/CD pipeline for deployment and MLOps in MS Azure

Automated the MLOps processes of data preprocessing, model building and validation, model selection for deployment, endpoint deployment, and model monitoring and maintenance using a CI/CD pipeline

Used Azure Databricks notebooks to code in Python

Used MLflow and monitoring tools to detect model drift and implemented automated model re-training and data refresh

Used unit and integration testing for pipeline diagnostics

Prepared and performed bi-weekly presentations to stakeholders, project owners, and other data science teams to show progress toward project goals

Packages used: NumPy, Scikit-Learn, Pandas, Matplotlib, Seaborn, GitHub, PostgreSQL, SHAP, Apriori, doc2vec, Keras, Azure Databricks, MLflow
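As an illustration of the classification metrics named in this role (confusion matrix, accuracy, precision, recall, F1 score), here is a minimal sketch in plain Python. The labels and the function name classification_metrics are hypothetical and are not taken from the project; in practice a library such as Scikit-Learn would typically compute these.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts and derived metrics for a binary task."""
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    tn = sum(n for (t, p), n in pairs.items() if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Hypothetical labels for illustration only: 1 = non-adherent, 0 = adherent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall is usually the metric to watch when the goal is to avoid missing non-adherent patients, while precision limits unnecessary outreach.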

Aug 2018 – Sep 2020 with CNN, Atlanta, GA

As AI Scientist-ML Engineer

(CNN is an online broadcasting platform that provides live coverage & analysis of breaking news, as well as a full range of international, political, business, entertainment, sports, health, science & weather coverage, and topical in-depth interviews)

Built a real-time news analyzer.

Applied various machine learning algorithms and statistical modeling techniques, such as Decision Trees, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, using Python and evaluated their performance.

Interrogated analytical results to assess algorithmic success, robustness, and validity.

Used a variety of NLP methods for text mining, information extraction, topic modeling, parsing, and relationship extraction.

Developed, deployed, and maintained production NLP models with scalability in mind.

Performed word embedding using BERT and ELMo.

Implemented Agile Methodology for building an internal application.

Used knowledge databases and language ontologies.

Wrote a Flask app to call CoreNLP for part-of-speech tagging and named entity recognition on natural English queries.

Optimized SQL queries to improve the performance of data collection.

Developed an estimate of uncertainties for the semantic predictions made by the deep convolutional model.

Derived high-quality information and significant patterns from a textual data source. Used document term frequency and the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to find information for topic modeling.

Analysed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.

Wrote a Scala console application that retrieves data using Hive or MapReduce.

Used PySpark to work with the Spark SQL and Spark ML modules.

Used Hive/MapReduce to process datasets scraped from an API.

Implemented in AWS using EC2 instances.

Designed, developed, and produced reports that connect quantitative data to insights that drive and change business
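As an illustration of the TF-IDF weighting mentioned in this role, here is a minimal sketch in plain Python. It assumes the common definition (term frequency times log of inverse document frequency) with whitespace tokenization; the sample headlines and the function name tf_idf are hypothetical and do not come from the original project.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical headlines for illustration only.
docs = [
    "markets rally as rates hold",
    "storm warning as rain continues",
    "markets slide as storm hits",
]
scores = tf_idf(docs)
for doc, doc_scores in zip(docs, scores):
    ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
    print(doc, "->", ranked[:2])
```

A term that appears in every document (here "as") scores zero, while terms unique to one document score highest, which is why TF-IDF is a useful first filter before topic modeling.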

Oct 2016 – Aug 2018 with DataKind, Houston, TX

As a Data Scientist

(DataKind utilizes data science in the service of humanity. From one-hour events to year-long engagements, DataKind scientists enable social changemakers to address tough humanitarian challenges.)

Developed a system to classify and analyze water as drinking or non-drinking, collecting data through a multisensor approach in which water descriptor parameters are converted into electrical signals by physico-chemical sensors.

Data was transmitted to a processing unit that handled acquisition and analysis.

Deployed a machine learning module under continuous supervision by a human expert.

Built the system around a PC and a latest-generation multi-channel acquisition card that acquires multiple input signals, and used a neural network to separate the data into two distinct classes (drinking or non-drinking water).

Created and populated PostgreSQL tables for the Feature Store and oversaw projects to create an ingestion service that updates these tables through a Flask app

Designed Architecture Diagram for the Feature Store to migrate to Microsoft Azure

Researched past Data Integrity projects and evaluated approaches with a lens to developing a Data Integrity product.

Documented and recommended a partner’s technical requirements for a current and/or future-state Data Integrity solution to be adopted.

Applied knowledge management (KM) practices to organize all Data Integrity-related codebases and technical documentation.

Proposed best practices where opportunities were seen to improve KM to enable the efficient transfer of knowledge between subsequent projects and across related projects.

Researched FHIR standards and implications for DataKind’s Data Integrity product roadmap.

Successfully delivered solutions per project requirements.

Apr 2015 – Oct 2016 with World Health Organization, Washington, DC

As Data Analysis Scientist

Applied scientific and business analytics skills, integrated and prepared large, varied datasets, and communicated results.

Worked with specialized database architecture and cloud computing environments.

Developed analytic approaches to strategic business and clinical decisions.

Performed analysis using predictive modeling, data/text mining, and different statistical tools.

Built predictive models using Machine Learning algorithms such as Random Forests, Naive Bayes, Neural Networks, SVM, NLP techniques, Ensemble Modeling, GB, etc.

Worked with Big Data infrastructure and tools such as Hive and Spark.

Applied statistics and organized large datasets of both structured and unstructured data.

Worked with applied statistics and applied mathematics tools for performance optimization.

ACADEMIC CREDENTIALS

Masters – Faculty of Science Dhar El Mahraz

Smart University

Bachelors – Electronics

Smart University

TECHNICAL SKILLS

Analytic Development - Python, R, Spark, SQL

Python Packages - NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, Fastai, SciPy, Matplotlib, Seaborn, Numba

Artificial Intelligence - Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Naïve Bayes

Natural Language Processing - Text analysis, classification, chatbots.

Deep Learning - Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning

Programming Tools and Skills - Jupyter, RStudio, GitHub, Git, APIs, C++, Eclipse, Java, Linux, C#, Docker, Node.js, React.js, Spring, XML, Kubernetes, Back-End, Databases, Bootstrap, Django, Flask, CSS, Express.js, Front-End, HTML, MS Azure, AWS, GCP, Azure Databricks, AWS SageMaker

Data Modeling - Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series analysis

Machine Learning - Natural Language Processing and Understanding, Machine Intelligence, Machine Learning algorithms

Analysis Methods - Forecasting, Multivariate Analysis, Sampling Methods, Clustering, Regression Analysis, Linear Models, and Predictive, Statistical, Sentiment, Exploratory, and Bayesian Analysis.

Applied Data Science - Natural Language Processing, Predictive Maintenance, Chatbots, Machine Learning, Social Analytics, Interactive Dashboards.
