
Leonildo de Melo

Data Scientist.

Phone: 314-***-**** | Email: ad178x@r.postjobfree.com

Summary: 8 Years in Data Science/ML, 10 Years in Information Technology.

Creative Data Scientist and Software Engineer focused on machine learning. Extensive background in project management, leadership, and financial reporting. Well versed in various machine learning techniques, such as Linear and Logistic Regression, Decision Trees, and Neural Network Architectures. Comfortable with deployment and integration on cloud technologies such as AWS and Azure. Applies math and logical thinking to tackle everyday obstacles, with a touch of good old physics. Thrives under pressure; a quick learner and self-starter always ready for a new challenge.

Professional Profile

•Experience applying Naïve Bayes, Regression, and Classification techniques, as well as Neural Networks, Deep Neural Networks, Decision Trees, and Random Forests.

•Familiarity with applying statistical models to large data sets using cloud computing services such as AWS, Azure, and GCP.

•Applying statistical and predictive modeling methods to build and design reliable systems for real-time analysis and decision-making.

•Expertise in developing creative solutions to business use cases through data analysis, statistical modeling, and innovative thinking.

•Performing EDA to find patterns in business data and communicating findings to the business using visualization tools such as Matplotlib, Seaborn, and Plotly.

•Leading teams to productionize statistical or machine learning models and create APIs or data pipelines for the benefit of business leaders and product managers.

•Experience using supervised and unsupervised techniques.

•Implementation of predictive analytics for sales to provide forecasting and improve decision-making using techniques such as ARIMA, ETS, and Prophet.

•Excellent communication and presentation skills with experience in explaining complex models and ideas to team members and non-technical stakeholders.

•Leading teams to prepare clean data pipelines and design, build, validate, and refresh machine learning models.

•Applying statistical analysis and machine learning techniques to live data streams from big data sources using PySpark and batch processing techniques.

Professional Skills

Analytic Development

Python, R, Spark, SQL

Python Packages

NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, fastai, SciPy, Matplotlib, Seaborn, Numba

Programming Tools

Jupyter, RStudio, GitHub, Git

Cloud Computing

Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP)

Machine Learning

Natural Language Processing & Understanding, Machine Intelligence, Machine Learning algorithms

Analysis Methods

Forecasting, Predictive, Statistical, Sentiment, Exploratory, and Bayesian Analysis; Regression Analysis, Linear Models, Multivariate Analysis, Sampling Methods, Clustering

Data Science

Natural Language Processing, Machine Learning, Social Analytics, Predictive Maintenance, Chatbots, Interactive Dashboards.

Artificial Intelligence

Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Naïve Bayes

Natural Language Processing

Text analysis, classification, chatbots.

Deep Learning

Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning

Data Modeling

Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling, Time-Series analysis

Soft Skills

Excellent communication and presentation skills. Ability to work well with stakeholders to discern needs. Leadership and mentoring.

Other Programming Languages & Skills

APIs, C++, Eclipse, Java, Linux, C#, Docker, Node.js, React.js, Spring, XML, Bootstrap, Django, Flask, CSS, Express.js, Front-End, HTML, Kubernetes, Back-End, Databases, Finance, GitHub

Professional Experience

AI/ML Architect

MGM Resorts June 2023 – Current

Las Vegas, Nevada

I led a project focused on crafting an advanced call center chatbot in Python, leveraging models like ChatGPT and LLaMA. Its aim was to adeptly manage customer queries, elevating the call center's response efficiency. To bolster accuracy and context comprehension, I integrated Neo4j as a knowledge graph for data management. Azure cloud services facilitated seamless deployment and scalability.

•Establishing Graph Infrastructure: Setting up the foundational structure of the graph system and initializing its data.

•API Development: Creating and implementing an API for efficient access to graphs.

•User-Friendly Interface Creation: Designing and developing an intuitive query interface powered by LLM for seamless interaction with the knowledge graph.

•Expansion and Maintenance: Developing an API to incorporate new data into the graph and ensuring the LLM model's accurate contributions.

•Technological Implementation: Employing Azure, Python, Neo4j, and Bloomfire to deploy, code, manage the database, and enhance functionalities, respectively.

•User Interface Development: Building the UI using Streamlit and adhering to an MVC architecture for smooth user experience.

•Tool & Framework Utilization: Leveraging LangChain, Hugging Face Transformers, and OpenAI to achieve specific functionalities within the project.

•Team Collaboration & Project Management: Collaborating with a multi-disciplinary team including a Product Owner, Solutions Architect, Software Engineer, Tech Artist, Manager, and AI/ML Architect using JIRA for effective project coordination.

•Addressing Challenges: Tackling integration issues, managing data ingestion, and mitigating instances of high hallucination in the LLM model.

•Achievement in Accuracy Enhancement: Focusing efforts on achieving high accuracy in the chatbot's responses, a notable accomplishment for the team.
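
For illustration, a minimal sketch of a graph-grounded chatbot turn along the lines described in this role, using the neo4j Python driver and the OpenAI chat client as stand-ins for the project's actual Azure/LangChain stack; the connection details, node labels, Cypher query, and model name are hypothetical placeholders.

```python
# Minimal sketch: answer a call-center question by grounding an LLM reply in
# facts retrieved from a Neo4j knowledge graph. Connection details, node
# labels, the Cypher query, and the model name are hypothetical placeholders.
from neo4j import GraphDatabase
from openai import OpenAI

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def fetch_facts(topic: str) -> list[str]:
    """Pull short fact strings related to a topic from the graph."""
    cypher = (
        "MATCH (t:Topic {name: $topic})-[:HAS_FACT]->(f:Fact) "
        "RETURN f.text AS text LIMIT 10"
    )
    with driver.session() as session:
        return [record["text"] for record in session.run(cypher, topic=topic)]

def answer(question: str, topic: str) -> str:
    """Compose a context-grounded prompt and ask the LLM for an answer."""
    context = "\n".join(fetch_facts(topic))
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer("What is the late check-out policy?", "check-out"))
```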

Senior Data Scientist

Amdocs Sep 2021 – May 2023

Chesterfield, Missouri

Document Data Mining using Computer Vision – Amdocs is a global provider of software and services to communications, media, and financial services providers and digital enterprises. As the Computer Vision and NLP expert, I led a project to develop an object recognition system that combined CNNs with NLP tools to process scanned documents into database entries. My responsibilities included:

•Developing, evaluating, and training a custom convolutional neural network (CNN) using frameworks such as TensorFlow and Keras in Python.

•Leveraging model checkpoints and early stopping, as well as optimizers such as Adam, to expedite model training.

•Resizing and interpolating images to a standard size and generating rotational and other invariances using the skimage library.

•Using the OpenCV (cv2) library to read and render videos.

•Collecting and preprocessing a dataset of over 10,000 images, including objects in natural and artificial environments, with varying lighting conditions and camera angles, as well as scanned documents with varying quality and resolution.

•Training and fine-tuning a CNN architecture using TensorFlow and Keras to recognize objects in the dataset with an initial accuracy of 85%.

•Developing an NLP module using spaCy to analyze the textual descriptions of the objects and their contexts, and to generate additional features for the CNN to incorporate.

•Integrating the NLP module with the CNN architecture using a multimodal fusion approach, which allowed the model to learn from both visual and textual information simultaneously.

•Applying the object recognition system to scanned documents and using Tesseract OCR and Amazon Textract to extract and classify text, tables, and other relevant information from the documents.

•Developing a post-processing module that used NLP techniques to further analyze and interpret the extracted text and generate structured data outputs.

•Evaluating the performance of the NLP-enhanced CNN model and scanned document processing system using a holdout set of images and documents and achieving a final accuracy of 93% for object recognition and 90% for document processing, with a false positive rate reduced by 40%.
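
For illustration, a minimal sketch of the Keras CNN training setup described in this role, with the Adam optimizer, early stopping, and model checkpoints; the input size, class count, and the (commented) dataset objects are hypothetical placeholders.

```python
# Minimal sketch of the CNN training setup described above: a small Keras
# image classifier trained with Adam, early stopping, and model checkpoints.
# Input size, class count, and the dataset objects are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

NUM_CLASSES = 12             # hypothetical number of document/object categories
INPUT_SHAPE = (224, 224, 3)  # hypothetical standardized image size

model = models.Sequential([
    layers.Input(shape=INPUT_SHAPE),
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Early stopping and checkpointing, as mentioned in the bullets above.
cbs = [
    callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    callbacks.ModelCheckpoint("best_cnn.h5", monitor="val_loss", save_best_only=True),
]

# train_ds / val_ds would be tf.data.Dataset objects built from the scanned images:
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=cbs)
```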

Lead Data Scientist/ ML Engineer

Nestle Purina Jan 2019 – Aug 2021

St Louis, Missouri

Marketing Machine Learning Engineer – As a Machine Learning Engineer, I applied Marketing Mix Modeling (MMM) to quantify the impact of marketing inputs on sales and market share. I built regression models considering expenditures, macroeconomic factors, seasonality, and competition. By leveraging techniques like multi-touch attribution (MTA), I accurately measured the effectiveness of marketing strategies and optimized resource allocation, enhancing the overall return on investment.

•Project led to increases in performance, accuracy, precision, and recall.

•Employed Marketing Mix Modeling (MMM) using Python to analyze the influence of various marketing components on sales and market share.

•Developed regression models on AWS Cloud, incorporating factors such as marketing expenditures, macroeconomic indicators, seasonality, and competition.

•Implemented multi-touch attribution (MTA) techniques using TensorFlow to accurately measure the intricacies of digital marketing.

•Used Python-based APIs to integrate models with existing systems and enhance the effectiveness of marketing strategies.

•Optimized resource allocation, enhancing return on investment, and maintained these models in a cloud environment for scalability.

•Continually updated and refined models to reflect changing market trends, employing continuous integration and deployment strategies on AWS.

•Cleaned and transformed data to prepare datasets for further analysis.
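
For illustration, a minimal marketing-mix regression sketch in the spirit of the MMM work described above, using statsmodels OLS; the CSV path, column names, and month-dummy seasonality are hypothetical simplifications.

```python
# Minimal marketing-mix regression sketch: weekly sales modeled as a function
# of channel spend, a macro indicator, and month-dummy seasonality via OLS.
# The CSV path and all column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("weekly_marketing.csv", parse_dates=["week"])
df["month"] = df["week"].dt.month

# One-hot encode month for simple seasonality; a production MMM would also
# apply adstock/saturation transforms to the media spend columns first.
X = pd.get_dummies(
    df[["tv_spend", "digital_spend", "promo_spend", "consumer_confidence", "month"]],
    columns=["month"], drop_first=True,
).astype(float)
X = sm.add_constant(X)
y = df["sales"]

fit = sm.OLS(y, X).fit()
print(fit.summary())  # each spend coefficient ~ incremental sales per unit spend
```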

Demand Planning Scientist – As a Demand Planning Statistician at Nestle Purina Pet Care, I developed and enhanced predictive models using complex data science techniques to predict product demand in North America. I utilized a variety of statistical techniques including regression, ARIMAX, ESM, and other time series methods to improve forecast accuracy, reduce forecast bias, increase customer fulfillment, and predict changes in customer demand.

•Cleaned and transformed data to prepare datasets for further analysis.

•Automated data acquisition, modeling, and visualization to streamline and simplify processes.

•Provided internal data science consulting services, helping business partners identify opportunities and problems that could be addressed through data science solutions. Supported small-scale projects from initial ideation through planning, execution, and delivery.

•Combined various data inputs (shipment, order, POS, and promotional data) from different external sources (Sales, Marketing, Operations Planning, Customer Facing, and more) as potential predictors of customer demand.

•Developed and enhanced forecast models for manufacturing plants and customer accounts using a variety of statistical techniques, including regression, ARIMAX, and ESM. Improved forecast accuracy by 10% and reduced forecast bias by 5%.

•Performed data preprocessing on sensor-generated and IoT data.

•Pre-processed data using PCA and feature elimination while the trained models still maintained a classification accuracy of more than 99%.

•Implemented SVMs for faster, less resource-intensive training.

•Implemented various neural networks, such as convolutional and recurrent architectures, to handle large numbers of features.

•Developed K-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering, along with mixture methods such as multivariate Gaussian mixture models, for this process.
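
For illustration, a minimal ARIMAX-style demand forecast sketch using statsmodels' SARIMAX with exogenous promotional and price drivers; the model orders, column names, and data source are hypothetical placeholders.

```python
# Minimal ARIMAX-style forecast sketch: SARIMAX with weekly seasonality and
# exogenous promotional/price drivers. Orders, column names, and the data
# source are hypothetical placeholders.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("weekly_demand.csv", parse_dates=["week"], index_col="week")

y = df["units_shipped"]
exog = df[["promo_flag", "price_index"]]

model = SARIMAX(y, exog=exog, order=(1, 1, 1), seasonal_order=(1, 1, 1, 52))
results = model.fit(disp=False)

# Forecast 8 weeks ahead, supplying planned values for the exogenous drivers
# (here the last 8 observed rows stand in for a real promo calendar).
future_exog = exog.iloc[-8:]
print(results.forecast(steps=8, exog=future_exog))
```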

Sr NLP Engineer, Data Scientist

Buoy Health Feb 2017 – Dec 2018

Boston, Massachusetts

Medical Document Search and Chatbot – Buoy Health required a symptom-checker chatbot that leverages AI to deliver personalized, more accurate diagnoses and medical document search. The company’s algorithm was trained on clinical data from 18,000 medical papers in an effort to mirror the literature referenced by physicians. Beginning with the symptoms provided by the user via natural language processing, the chatbot matches the symptoms to all possible conditions and then asks clarifying questions to narrow them down to the best selection. Symptom-checker chatbots are not clinical decision support (CDS) tools and do not claim to assist with medical decision making (MDM). The bot then puts together a “most likely” diagnosis and advises on seeing a provider based on the provided symptoms. Classification was achieved using a TensorFlow sequential model with Softmax as the final activation function (because of the sheer number of labels) and Stochastic Gradient Descent as the optimizer. Our initial perplexity measurements showed a computer understanding rate of over 90%, with a solution-matching accuracy of over 85%.

•Worked in an environment using Python, NoSQL, Docker, AWS, and Kubernetes.

•Worked with the Python packages NumPy, Pandas, SciPy, Matplotlib, Plotly, and FeatureTools for data analytics, cleaning, and feature engineering.

•Used NLTK and Gensim for NLP processes such as Tokenization and for creating custom Word Embeddings.

•Used Python’s TensorFlow package for building neural network models.

•Implemented BERT-based embeddings.

•Employed numerous different models, including Convolutional and Recurrent Neural Networks, LSTM, and Transformers.

•Models which were operationalized were deployed to a RESTful API using the Python Flask package and Docker containers.

•Used Agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.
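
For illustration, a minimal sketch of a Sequential TensorFlow text classifier with a Softmax output over condition labels and an SGD optimizer, as described above; the vocabulary size, number of labels, and training data are hypothetical placeholders.

```python
# Minimal sketch of a Sequential text classifier with a Softmax output over
# condition labels and an SGD optimizer. Vocabulary size, number of labels,
# and the (commented) training data are hypothetical placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000       # hypothetical tokenizer vocabulary size
NUM_CONDITIONS = 500     # hypothetical number of candidate conditions

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CONDITIONS, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# X_train would hold tokenized, padded symptom descriptions; y_train the
# integer condition ids:
# model.fit(X_train, y_train, validation_split=0.1, epochs=10, batch_size=64)
```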

Junior Data Scientist

Citizens Bank Dec 2015 – Jan 2017

Boston, MA

Forecasting and Analytics – The main project targeted price prediction for houses in the greater Boston area. Following a correlation analysis, few attributes in the original dataset correlated well with the price attribute. To train the model with more comprehensive data, Boston Police reports were added as attributes. First, all incidents per zip code were counted, since the original dataset included a zip code attribute. Then a function was created to loop through the original data and append a column containing the number of crimes reported in that row’s zip code. This showed a negative correlation between house prices and areas with high crime rates. The model was then trained on the attributes number of bathrooms, number of bedrooms, number of crimes, and square feet. To test the model, the same data was extracted from Zillow, a website used to buy and sell property. The model’s predictions were 85% accurate compared with the prices listed on the website. Additional forecasting for sales and overall trends was done in parallel with this work.

•Built and integrated logistic and linear regression models, balancing various internal requirements of covariance and variable criteria.

•Discovered patterns in data using algorithms and used an experimental, iterative approach to validate findings.

•Devised and proposed innovative ways to look at problems using business acumen, mathematical theory, data models, and statistical analysis.

•Applied advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems using ARIMA, ETS, and Prophet.

•Used decision trees and random forests to grade the validity of the variables used in the regression models. Implemented additional tools such as bagging and boosting (AdaBoost, XGBoost) in order to strengthen these models.

•Communicated results and recommendations to business stakeholders on a weekly basis. Implemented feedback and features based on the evolving needs of the business in a rapidly changing social landscape.

•Model evaluation was performed between competing groups, and the best model was selected for further refinement.

•Final model deployed in a Flask app on AWS to be called via a REST API.
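
For illustration, a minimal sketch of the crime-count feature engineering and regression described above; the file and column names are hypothetical placeholders, and scikit-learn's LinearRegression stands in for the full modeling workflow.

```python
# Minimal sketch: count police incidents per zip code, join the counts onto
# the housing data, and fit a linear regression on the four attributes
# mentioned above. File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

houses = pd.read_csv("boston_houses.csv")          # bedrooms, bathrooms, sqft, zip, price
crimes = pd.read_csv("boston_police_reports.csv")  # one row per reported incident

# Count incidents per zip code and append the count to each house row.
crime_counts = crimes.groupby("zip").size().rename("num_crimes").reset_index()
houses = houses.merge(crime_counts, on="zip", how="left").fillna({"num_crimes": 0})

features = ["bedrooms", "bathrooms", "num_crimes", "sqft"]
X_train, X_test, y_train, y_test = train_test_split(
    houses[features], houses["price"], test_size=0.2, random_state=42
)

reg = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", reg.score(X_test, y_test))
```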

Courses & Certificates

AWS Academy Graduate - AWS Academy Cloud Foundations

Parallel Programming on GPUs

Publication

Costa-Duarte, M. V., et al. "The S-PLUS: a star/galaxy classification based on a Machine Learning approach." arXiv preprint arXiv:1909.08626 (2019).

de Azevedo, Leonildo JM, et al. "Optimized service level agreement establishment in cloud computing." The Computer Journal 61.10 (2018): 1429-1442.

de Azevedo, Leonildo J. de M., et al. "An analysis of metaheuristic to SLA establishment in cloud computing." (2017).

Education

PhD in Computer Science and Computational Mathematics

University of Sao Paulo

Master of Science in Computational Mathematics

University of Sao Paulo

Bachelor’s in Computer Science

State University of the Midwest, Parana, Brazil

Languages

English, Portuguese, Spanish


