Profile Summary
*****’ hands-on working experience is in the Data Science/Machine Learning space.
Skilled in multiple areas of prescriptive analytics and prescriptive modeling, including Machine Learning, Natural Language Processing, Applied Statistics, Operations Research, and a variety of optimization techniques.
Stays on top of advances within the fields of Machine Learning and Artificial Intelligence.
Proficient in applying statistical predictive modeling, machine learning classification techniques, and econometric forecasting techniques.
Proficient in various types of optimization, Marketing Mix Modeling, segmentation, time series, price/promo models, customer retention models, elasticity models, and net lift models.
Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, Tableau, and Splunk.
Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction in Python.
Adept at discovering patterns in data using algorithms, visual representations, and intuition.
Hands-on experience applying machine learning techniques such as Naïve Bayes, linear and logistic regression, neural networks (including RNNs and CNNs), transfer learning, time-series analysis, decision trees, and random forests.
Experience designing visualizations in Splunk and publishing and presenting dashboards and storylines on web and desktop platforms.
Hands-on experience in business understanding, data understanding, and preparation of large databases.
Worked on Natural Language Processing with NLTK, spaCy, and other modules to develop automated customer response applications.
Programmed automation processes using Python, Microsoft Azure, and the AWS Lambda service.
Skilled at transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
Experience applying Neural Networks, Support Vector Machines (SVM), and Random Forest.
Technical Skills
Programming Languages: Python, SQL, R, SAS, C#, Command Line
Python packages: Matplotlib, Seaborn, NumPy, Pandas, Scikit-Learn, TensorFlow, SciPy, Bokeh, Numba, NLTK
Machine Learning: Natural Language Processing & Understanding, Machine Intelligence, Machine Learning algorithms, Statistical Modeling, Computer Vision, Time Series, Survival Analysis, Accelerated Failure Time, Anomaly Detection
Deep Learning: Machine perception, Machine Learning algorithms, Neural Networks, TensorFlow, Keras, Data Mining
Artificial Intelligence: text understanding, classification, pattern recognition, recommendation systems, targeting systems, ranking systems.
Analysis: Advanced Data Modeling, Forecasting, Regression, Predictive, Statistical, Sentiment, Exploratory, Stochastic.
Data Modeling: Bayesian Analysis, Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioral Modeling, Probabilistic Modeling
Communication: Reporting, Documentation, Presentation, Collaboration. Clear and effective with a wide variety of colleagues and audiences.
Infrastructure: Cloud environments including Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure.
Professional Work Experience
February 2023 – Present
CDS Global (Atlanta, Georgia) / Sr. Data Scientist
CDS Global is a business process solutions provider that uses the power of data and technology to support the entire consumer lifecycle. I am currently migrating 47 revenue optimization models from DataRobot to SageMaker. These models cover a variety of topics, such as client churn and price elasticity.
Deploy end-to-end client churn and price elasticity models, including model monitoring and evaluation.
Developed Marketing Mix Models to forecast sales, predict customer churn, and optimize marketing.
Developed advanced Marketing Mix Models incorporating macroeconomic factors, competitor activity, and seasonality to understand the factors influencing sales.
Generate high-quality training datasets with the most significant features, including data cleaning, feature selection, and sampling techniques.
Conduct thorough exploratory data analysis to identify patterns, correlations, and outliers in the data.
Apply advanced feature engineering techniques, such as normalization, imputation, and encoding methods, to improve model performance.
Use automated machine learning (AutoML) tools such as SageMaker to efficiently train and evaluate models.
Utilize SageMaker built-in algorithms to create accurate and efficient models.
Perform hyperparameter tuning to optimize model performance and fine-tune model parameters.
Deploy all models as endpoints and perform batch transformations to generate predictions at scale (see the sketch after this list).
Compare model performance against other tools such as DataRobot and evaluate using classification and regression metrics.
Validate model predictions using DataRobot predictions as reference values and measure the variance.
Generate a proof of concept (POC) for both model types and present them to executives for approval.
Document the entire model building and deployment process for future pipeline generation and reference.
Use Airflow to build a scalable and automated pipeline for both model types.
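A minimal sketch of the endpoint deployment and batch transform flow described above, using the SageMaker Python SDK. The S3 paths, IAM role ARN, and inference script are illustrative placeholders, not the production configuration:

```python
from sagemaker.sklearn.model import SKLearnModel

# Hypothetical IAM role and model artifact; the real values live in the
# production AWS account.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
model = SKLearnModel(
    model_data="s3://example-bucket/churn/model.tar.gz",  # illustrative path
    role=role,
    entry_point="inference.py",  # hypothetical inference handler
    framework_version="1.2-1",
)

# Real-time inference: host the model behind a managed endpoint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Batch inference: score a whole dataset in S3 with a transform job.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://example-bucket/churn/predictions/",  # illustrative path
)
transformer.transform("s3://example-bucket/churn/scoring.csv", content_type="text/csv")
transformer.wait()
```

Endpoints cover low-latency, per-record predictions, while batch transform jobs score full datasets at once; both paths can be wrapped in the Airflow pipeline mentioned above.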
February 2020 – February 2023
DXC Technology (Ashburn, VA) / Sr. Data Scientist/Machine Learning Engineer
DXC Technology is a Fortune 500 global IT services leader. I am part of the Data & Analytics team, where I am the technical leader of the Machine Learning practice for the healthcare, financial, manufacturing, and retail industries. I developed and deployed a Computer Vision model for a major healthcare provider that classifies X-ray images by diagnosis, using transfer learning techniques and Convolutional Neural Networks. I built marketing analytics models (MMM) for a major retail chain in the US, where the firm looked to increase its online KPIs (e.g., number of operations and Click-Through Rate (CTR)) and predict customer churn. I built a hybrid recommender engine to offer relevant suggestions to visitors; relevant KPIs increased after the engine was deployed, and the company realized expanded profits. I developed and deployed my models in an AWS cloud environment.
Engaged with the company’s sales department, data engineering team, and software development team.
Used different transfer learning architectures (VGG16, VGG19, AlexNet, ResNet50, EfficientNet) for computer vision models.
Used the YOLO algorithm for an object recognition use case.
Assessed model performance using Click Through Rate (CTR) and Mean Average Precision (MAP).
Applied a K-Nearest Neighbors (KNN) algorithm with cosine similarity for collaborative filtering and recommender systems (sketched after this list).
Used NumPy, SciPy, Scikit-Learn, PySpark, MLlib, Pandas, Matplotlib, Seaborn, and Flask.
Built recommender engines on Big Data using PySpark’s MLlib.
Used XGBoost to predict customer churn.
Used generative adversarial networks (GANs) to improve the accuracy of my models.
Performed queries and pulled data from Amazon S3 and MemSQL databases into Pandas DataFrames in Python using SQLAlchemy.
Implemented a Singular Value Decomposition (SVD) collaborative filtering algorithm to recommend items to users.
Created an OCR model with OpenCV and Google Tesseract to extract text from PDFs.
Produced a Flask API that returns a software-agnostic JSON file for software developers to implement on the site.
Used Scikit-Learn for creating and training collaborative filtering algorithms.
Used AWS S3 and Redshift Data Warehouse to access AWS Resources from Python.
Worked with AWS QuickSight, Lambda, SageMaker, Athena, Microsoft Azure, and others.
Defined different metrics and indicators for item similarity in the content-based approach.
Coordinated with the UI/UX team to plan the implementation of recommendations.
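A minimal sketch of item-based collaborative filtering with KNN and cosine similarity, as referenced in the bullets above; the interaction matrix is a toy stand-in for real purchase/click logs:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy user-item interaction matrix (rows: users, columns: items).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Fit KNN over the item vectors (columns) using cosine distance.
knn = NearestNeighbors(metric="cosine", algorithm="brute")
knn.fit(ratings.T)

# For item 0, retrieve the two most similar items
# (the nearest neighbor is the item itself, so it is skipped).
distances, indices = knn.kneighbors(ratings.T[[0]], n_neighbors=3)
print("Items most similar to item 0:", indices[0][1:])
print("Cosine similarities:", 1 - distances[0][1:])
```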
September 2018 – February 2020
Deloitte Consulting (Atlanta, GA) / Artificial Intelligence Developer
Deloitte has a regional Analytics team for the Southwest region. I worked on a team to create an alert automation system for internal messages and logs for an Atlanta-based hospital by leveraging cutting-edge NLP techniques. A hand-labeled internal dataset combined with tweets from Twitter’s API was used to train a model for importance, relevance, and priority, along with a sentiment analysis matrix. Results were then classified by priority and urgency. The final production model used a neural network based on medical BERT and allowed users to decide which types of messages to let through the filter via an adjustable threshold and re-training. User productivity was projected by the business to increase by 18.8%. I also created a chatbot to address customer requests for a major healthcare company using BERT and PyTorch.
Accessed the Twitter API using a Python wrapper to extract pseudo-labeled data based on hashtags.
Cleaned and prepared text data through normalization, tokenization, stemming, and lemmatization using BERT and NLTK.
Coded customized solutions using Python and the TensorFlow and NumPy libraries.
Tested a variety of embeddings, including bag-of-words, TF-IDF, Word2vec, and ELMo.
Utilized statistical classifiers, random forests, and logistic regressions to perform sentiment analysis (see the sketch after this list).
Used PyTorch and a BERT encoder to create a chatbot.
Used RNNs and LSTMs for different NLP problems.
Constructed an artificial neural network solution for natural language processing.
Implemented a model utilizing BERT for embedding and classification, fine-tuned to the specific data.
Became proficient in Natural Language Processing, SQL queries, and web scraping for collecting literature using BeautifulSoup.
Productionized the final model by hosting a web API and a user-friendly intranet app powered by Flask.
Used Google Cloud Platform (GCP), Colab, Vertex AI, BigQuery, AutoML, and others.
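A minimal sketch of the TF-IDF plus logistic regression sentiment/priority baseline mentioned above; the texts and labels are illustrative stand-ins for the hand-labeled internal dataset and Twitter data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative messages; 1 = high priority, 0 = routine.
texts = [
    "system outage reported, patients cannot check in",
    "great job by the support team today",
    "urgent: lab results delayed again",
    "thanks everyone, smooth shift overall",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["server is down, escalate immediately"]))
```

The production system replaced this kind of baseline with the medical-BERT neural network, with an adjustable decision threshold controlling which messages pass the filter.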
November 2015 – September 2018
Hewlett Packard Enterprise (Houston, Texas) / Data Scientist
Hewlett Packard Enterprise has several divisions for enterprise solutions. I served on a Data Science team that used an artificial neural network developed in PyTorch and Facebook’s Prophet model as the base for a sales forecasting project. I utilized Python for data cleaning on a large dataset that included multiple years’ worth of data across different regional departments in dozens of stores. I produced highly accurate forecasts for each regional store and department. I also created a model to predict server maintenance in collaboration with the R&D engineering team using Accelerated Failure Time models.
Prepared data for exploratory analysis.
Built a model using Facebook Prophet to produce highly accurate predictions of weekly sales (see the sketch after this list).
Deployed a model that created highly accurate forecasts up to 6 months in advance for every store and department.
Tested survival analysis techniques using various methods, including the Accelerated Failure Time (AFT) model and the Cox Proportional Hazards (CPH) model, to estimate default probability and default time, and chose the best-performing model.
Used different time series models: ARIMA, SARIMA, Prophet, LSTMs, etc.
Assessed model performance on large datasets.
Pulled data from Hadoop cluster (HDFS Cloudera).
Utilized Python, Pandas, SciPy, and NumPy for exploratory data analysis, data wrangling and feature engineering.
Applied Kernel Density estimation in lower dimensional space as a feature to predict fraud.
Tested Anomaly Detection Models such as Expectation Maximization, Isolation Forest, and Elliptical Envelope.
Completed hypothesis testing and statistical analysis to determine statistically significant changes in claims after participating in the safety program.
Utilized Tableau, Splunk and TabPy for visualization of analyses.
Consulted with various departments within the company, including SIU and Safety.
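A minimal sketch of a weekly sales forecast with Facebook Prophet, as referenced above; the series is synthetic, standing in for the multi-year store/department data:

```python
import pandas as pd
from prophet import Prophet

# Prophet expects a DataFrame with columns 'ds' (date) and 'y' (value);
# this two-year weekly series is illustrative only.
df = pd.DataFrame({
    "ds": pd.date_range("2016-01-03", periods=104, freq="W"),
    "y": range(104),
})

model = Prophet(weekly_seasonality=False, yearly_seasonality=True)
model.fit(df)

# Forecast roughly six months (26 weeks) ahead.
future = model.make_future_dataframe(periods=26, freq="W")
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```

The same fit-and-forecast pattern can be repeated per store and department to produce the six-month forecasts described above.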
January 2014 – November 2015
Entech Biomedical (Chandler, Arizona) / Remote / Jr. Data Expert
Entech Biomedical in Chandler focused on providing medical equipment service to private practice physician offices, surgery centers, freestanding clinics, and major medical centers throughout the Southwest. My role involved first determining which medical equipment was sold with the highest profit margin, and then producing a model to predict the quantity sold in a particular quarter. My engagement extended to working with a team to improve the site’s recommender system. We grouped the site’s users into two types of expected customers and determined which group was more influenced by our current recommender system and segmentation models.
Modeled the quantity sold per quarter of the part with the highest profit margin using Theano in Python.
Modeled long-run average cost (LRAC) of various electronic medical components to determine which products could be ordered in higher volume to maximize profit margins.
Applied Natural Language Processing (NLP) to classify reviews as being from end customers of medical equipment service.
Applied K-Means clustering to group types of customers from sales data.
Optimized the recommender system so online customers see more relevant medical equipment services based on their business unit and customer type.
Structured a time-series model to determine time-dependence and seasonality of medical equipment services using SARIMA in Python’s statsmodels library (see the sketch after this list).
Implemented Gaussian radial basis functions in the model to account for the seasonality of medical equipment services.
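A minimal sketch of the SARIMA seasonality model referenced above, using statsmodels; the quarterly series is synthetic, standing in for the equipment-service data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic quarterly service-volume series with annual (period-4) seasonality.
idx = pd.period_range("2010Q1", periods=20, freq="Q").to_timestamp()
rng = np.random.default_rng(0)
y = pd.Series(
    100 + 10 * np.sin(np.arange(20) * np.pi / 2) + rng.normal(0, 2, 20),
    index=idx,
)

# SARIMA(1,1,1)x(1,1,1,4): seasonal period of 4 for quarterly data.
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 4))
result = model.fit(disp=False)

# Forecast the next four quarters.
print(result.forecast(steps=4))
```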
Education
Universidad de Guadalajara (CUCEI) – Master's degree in Bioengineering and Smart Computing
Specialized in: artificial intelligence, machine learning, electrophysiology, electrophysiological signal processing
ITESM – BSc Degree in Biomedical Engineering
Specialized in Bioanalytics
Certifications:
Project Management Certificate (in analytical projects)
Research & Publications
Sensors Journal, special issue Advanced Sensing and Image Processing (Computer Vision) Techniques for Healthcare Applications. Work accepted and published in a leading international, open-access, peer-reviewed journal ranked JCR Q1/CiteScore Q1. Research article: Effect of Auditory Discrimination Therapy on Attentional Processes of Tinnitus Patients. https://www.mdpi.com/1424-8220/22/3/937
IEEE Signal Processing in Medicine and Biology Symposium in Pennsylvania, USA. Work selected, presented at an international symposium, published in the IEEE Xplore digital library, and invited for publication as a book chapter in an ebook produced by Springer. Research article: Monitoring of auditory discrimination therapy for tinnitus treatment based on event-related (de-)synchronization maps.
https://ieeexplore.ieee.org/abstract/document/9672290
XXXVIII National Congress of Biomedical Engineering in Mazatlán, MX. Work accepted at the most important national event in the area of Biomedical Engineering. Research article: Algorithm to identify potential cases of diabetic macular edema in fundus images using dynamic segmentation techniques and classifiers based on neural networks and Computer Vision.
https://www.researchgate.net/publication/283489959_Identificacion_de_casos_potenciales_de_edema_macular_diabetico_en_imagenes_de_fondo_de_ojo_utilizando_tecnicas_de_segmentacion_dinamica_y_clasificadores_basados_en_redes_neuronales
Languages
English
Spanish