Senior Data Scientist

Location: Manhattan, NY, 10019

Posted: August 11, 2023


Resume:

Natnael Mengitsu

Data Scientist/ML Engineer

Phone: 929-***-**** | Email: ***********@*****.***

Professional Summary

10+ years of overall experience across Software, Information Technology, Web Development, and Data Science.

8 consecutive years focused on Data Science.

Extensive work in Natural Language Processing and Predictive Analytics using Machine Learning Algorithms, Visualization Tools, and Web Deployment Technologies.

Used Neural Networks, Trees, Clustering Algorithms, and Statistical Models to power systems that perform Sentiment Analysis, Fraud Detection, Client Segmentation, Predictive Maintenance, and Demand Forecasting.

Experience in Natural Language Processing (NLP), Machine Learning & Artificial Intelligence.

Experience with AWS and Azure cloud computing and with Kubernetes.

Experience with Spark (especially on AWS EMR), Kibana, Node.js, and Tableau.

Expertise in Machine Learning, Deep Learning, Natural Language Processing, Data Analytics, ML-Ops, and model productionizing and monitoring.

Projects involving Sentiment Analysis, Fraud Detection, Predictive Analytics, and Artificial Intelligence.

Skilled Python programmer.

Experienced across the full project lifecycle: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Experienced in practical applications of data science to solve business problems and to produce actionable results.

Able to incorporate visual analytics dashboards.

Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction.

Knowledge of Apache Spark and of developing data processing and analysis algorithms in Python.

Programming strength in Python, C, C++, Java, SQL, R, MATLAB, Mathematica, JavaScript.

Experience with Machine Learning libraries and frameworks such as NumPy, SciPy, Pandas, Theano, Caffe, Scikit-learn, Matplotlib, Seaborn, TensorFlow, Keras, PyTorch, NLTK, Gensim, urllib, and Beautiful Soup.

Skilled in algorithms, data querying, and process automation.

Experienced in dataset evaluation and complex data modeling.

Technical Skills

ANALYTICS - Data Analysis, Data Mining, Data Visualization, Statistical Analysis, Multivariate Analysis, Stochastic Optimization, Linear Regression, ANOVA, Hypothesis Testing, Forecasting, ARIMA, Sentiment Analysis, Predictive Analysis, Pattern Recognition, Classification, Behavioral Modeling

DATA EXTRACTION AND MANIPULATION - Hadoop HDFS, Hortonworks Hadoop, MapReduce, Cloudera Hadoop, Cloudera Impala, Google Cloud Platform, MS Azure Cloud, SQL, NoSQL, Data Warehouse, Data Lake, HiveQL, AWS (Redshift, Kinesis, EMR, EC2, Lambda)

NATURAL LANGUAGE PROCESSING - Document Tokenization, Token Embedding, Word Models, Word2Vec, FastText, Bag of Words, TF-IDF, BERT, ELMo, LDA

MACHINE LEARNING - Supervised Machine Learning Algorithms (Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees and Random Forests, Naïve Bayes Classifiers, K-Nearest Neighbors), Unsupervised Machine Learning Algorithms (K-Means Clustering, Gaussian Mixtures, Hidden Markov Models, Autoencoders), Imbalanced Learning (SMOTE, ADASYN, NearMiss), Deep Learning (Artificial Neural Networks), Machine Perception

PROGRAMMING LANGUAGES - Python, R, SQL, Java, MATLAB, Mathematica, C, C++, JavaScript, PHP

LIBRARIES - NumPy, SciPy, Pandas, Theano, Caffe, Matplotlib, Seaborn, Plotly, TensorFlow, Keras, NLTK, PyTorch, Gensim, urllib, BeautifulSoup4, PySpark, PyMySQL, SQLAlchemy, MongoDB, SQLite3, Flask, Deeplearning4j, EJML, dplyr, ggplot2, reshape2, tidyr, purrr, readr, Apache Spark, MapReduce, WPF, Entity Framework Core, Node.js

DEVELOPMENT - Git, GitHub, GitLab, Bitbucket, SVN, Mercurial, Trello, PyCharm, IntelliJ, Visual Studio, Sublime, JIRA, TFS, Linux, Unix

APPLICATIONS - Machine Language Comprehension, Sentiment Analysis, Predictive Maintenance, Demand Forecasting, Fraud Detection, Client Segmentation, Marketing Analysis

LEADERSHIP - Push project goals, determine business use cases, and mentor/lead teams

QUALITY - Continuous improvement in project processes, workflows, automation, and ongoing learning and achievement

CLOUD - Analytics in cloud-based platforms (AWS, MS Azure, Google Cloud)

Professional Work Experience

Credit Suisse, New York City, NY

Senior Data Scientist

April 2021 – Present

As a Data Scientist at Credit Suisse, I worked on the Predictive Analytics team on modeling initiatives. I led the development of a model ensemble that forecast the occurrence and performance of "meme" stocks by combining Natural Language Processing (NLP) with Time Series Analysis.

My team tracked a curated set of stocks and built an NLP algorithm that assessed social media mentions of each tracked security and produced a buy, hold, or sell score for it. Time series analysis of historical data then generated a series of daily forecasts to support informed decision-making.

Conducted data cleaning, feature scaling, and feature engineering with Python's Pandas and NumPy packages, and built models using deep-learning frameworks, large language models, and ML-ops deployment tools.

Used Python for data mining and developed statistical models that generated tactical recommendations for business executives.

Developed a Python data-processing utility built on SciPy, NumPy, and Pandas.

Integrated R into MicroStrategy to expose metrics derived from detailed models that go beyond the tool's native capabilities.

Enhanced an existing semantic labeling model with Monte Carlo and Markov Chain techniques so that it produced semantic predictions with uncertainty estimates through Bayesian approximation; the enhanced model outperformed the baseline on the same dataset. Proposed a new metric for evaluating the quality of the estimated uncertainties.

Designed dynamic dashboards in Tableau, presenting complex reports with summaries, charts, and graphs to communicate findings to the team and stakeholders.

Collaborated with Data Engineers to design efficient databases for data science projects, ensuring seamless integration and optimal performance.

Used Git for version control to track file changes and coordinate collaborative work among team members.

Implemented a distributed Random Forest in Python for predictive modeling.

Applied predictive modeling with SAS, SPSS, R, and Python, using each tool where its strengths fit the scenario.

Levi Strauss & Co., San Francisco, California

ML-Ops Engineer

September 2018 – April 2021

Levi's, a manufacturing and merchandising company, serves a network of over 500 stores and multiple output channels. As the Lead ML-Ops Engineer, I led the design and implementation of a restocking solution built on AWS Batch and Docker containers. The solution populated an app running on Objective-C edge devices, supporting informed decisions and optimized product restocking based on predicted demand. The machine learning models were built with AWS SageMaker, and Apache Airflow scheduled and orchestrated their deployment. This approach kept the restocking solution integrated and synchronized, aligned it with business objectives, and improved operational efficiency at Levi's.

During my tenure at Levi's, I successfully:

Led the design and implementation of the restocking solution, incorporating AWS Batch and Docker Containers, to streamline the replenishment process across the organization's extensive store network and diverse output channels.

Used an Objective-C app on edge devices to deliver real-time insights and enable data-driven restocking decisions, enhancing productivity and customer satisfaction.

Used SageMaker to develop machine learning models that predicted product demand and informed restocking strategies.

Used Apache Airflow to schedule and automate the execution of the machine learning models, integrating them into the existing infrastructure and improving operational efficiency.

Developed a personalized in-session product recommendation engine, leveraging advanced algorithms and data-driven techniques to enhance customer engagement and drive sales.

Implemented and configured a Next-Best-Offer prediction solution, enabling Levi's to deliver tailored offers and recommendations to customers based on their preferences and behavior.

Architected, built, maintained, and enhanced new and existing suites of algorithms and their underlying systems, ensuring their reliability, scalability, and performance.

Implemented end-to-end solutions for batch and real-time algorithms, including tooling for monitoring, logging, automated testing, performance testing, and A/B testing to verify the accuracy and effectiveness of the deployed models.

Collaborated closely with data scientists and analysts to develop and deploy new product features across various platforms, including the ecommerce website, in-store portals, and the Levi's mobile app, facilitating seamless customer experiences.

Established scalable and automated processes for data analysis, model development, validation, and implementation, enabling efficient and streamlined workflows.

Implemented deployment solutions using TensorFlow, Keras, Docker, and Amazon Elastic Kubernetes Service (EKS) for efficient, scalable model deployment and management.

Implemented robust Model Drift Monitoring and Retraining Strategies, ensuring the continued accuracy and effectiveness of the deployed models over time.

Integrated the solution into AWS, GCP, and Azure environments, leveraging the capabilities of these cloud platforms to ensure seamless scalability, reliability, and performance.

Implemented and configured dedicated preprocessing, inference, and model validation scripts around a SageMaker model used for batch transformation, ensuring accurate and efficient data processing.

Sutter Health, Sacramento, California (Remote)

Data Scientist/NLP Engineer

July 2016 – June 2018

Worked with a natural language processing (NLP) and data science team that implemented a medical-response AI chatbot, which routes users with insurance, coverage, medical-records, and billing questions to the correct department or gives general answers to patients' non-health-related questions.

Cleaned text data using different techniques.

Performed EDA using techniques such as Bag of Words, K-means, and DBSCAN.

Compared embedders such as Google's Universal Sentence Encoder, Doc2Vec, TF-IDF, BERT, and ELMo to identify the best-performing one.

Used cosine similarity to match the user input to the most similar trained question, then mapped that question to its corresponding department.

Deployed the model using Flask.

Split the dataset into training, validation, and test data.

Visualized and rescaled images.

Built the model in Keras with convolutional, max-pooling, normalization, and dropout layers, experimenting with different activation functions.

Flattened the CNN output and fed it to the dense layers.
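The question-matching step in the chatbot work above can be sketched as follows. This is a minimal illustration, not the production system: it assumes plain bag-of-words counts in place of the embedders named above, and the trained questions, department names, the `route` function, and its `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical trained questions, each mapped to the department that handles it.
TRAINED_QUESTIONS = {
    "how do i check my insurance coverage": "Insurance",
    "how can i get a copy of my medical records": "Medical Records",
    "why was i charged twice on my bill": "Billing",
}


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens (a stand-in for a real embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(user_input, threshold=0.2):
    """Match input to the most similar trained question; return (question, department, score).

    The department is None when no trained question reaches the hypothetical threshold.
    """
    query = bag_of_words(user_input)
    best_question, best_score = None, 0.0
    for question in TRAINED_QUESTIONS:
        score = cosine_similarity(query, bag_of_words(question))
        if score > best_score:
            best_question, best_score = question, score
    if best_question is None or best_score < threshold:
        return best_question, None, best_score
    return best_question, TRAINED_QUESTIONS[best_question], best_score


# The department returned here is "Medical Records".
print(route("I need a copy of my medical records"))
```

A below-threshold score returns no department, which is where a fallback such as a general answer could be plugged in; swapping `bag_of_words` for a learned embedder leaves the matching logic unchanged.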

Hudson Valley Advisory Group Inc., Poughkeepsie, NY

Data Scientist

March 2014 – June 2016

As a Data Scientist, I worked in the customer experience domain for a large US-based online retailer, solving problems related to customer profiling, product recommendation, and customer churn for subscription services.

Implemented content-based filtering and collaborative filtering for the recommender engine

Analyzed data and performed customer segmentation using unsupervised clustering approaches such as Hierarchical Clustering and Gaussian Mixture Models

Used Matplotlib and R-based ggplot2 via rpy2 for data visualization and presentation

Worked with internally generated data stored in an Oracle database to extract useful information

Utilized Python and its main packages such as NumPy, Scipy, Pandas, and TensorFlow for this project

Used Spark tools to analyze massive data sets

Collected structured data from relational databases using PostgreSQL queries

Applied Statistical NLP/Machine Learning techniques, particularly Supervised Learning for document classification, information extraction, and named entity recognition in the context

Developed custom dashboards and visualizations, and utilized survival and churn analysis to detect and predict churn among members

Utilized Microsoft Azure ML for predicting churn using logistic regression and boosted decision trees

Logistic regression helped determine which predictors/variables impacted churn the most, and boosted decision trees were used to decrease the misclassification rate of high-risk churners

Performed cross-validation of model results and calculated metrics such as r-squared, accuracy, confusion matrix, precision, recall, and ROC-AUC

Marist College, Poughkeepsie, NY

IT Support

January 2012 – March 2014

Provided 1st and 2nd-level support for the campus-wide deployment, imaging, and maintenance of IT equipment.

Maintained accurate repair records and documented all maintenance performed using HDP.

Supported and maintained corporate Active Directory containing over 10K users.

Education

Master’s in Information Systems

Marist College

Relevant Coursework: Data Mining, Data Analysis, Emerging Technologies, IS Policy, Decision Systems, Linear Programming Models, Monte Carlo Simulation Models, Regression Models (Linear, Multiple, Logistic)

Bachelor of Science in Computer Science

Marist College

Relevant Coursework: Data Structures and Algorithms, Data Management, UNIX, System Design, Project Management
