Curriculum Vitae
Personal Information
Name: Andrés Soto Villaverde
Nationality: Cuban
Cel. Phone: (34-633-******
Skype: soto_andres
Email: ******.****.*@*****.***
Web page: https://sites.google.com/view/andres-soto-v, https://sites.google.com/view/my-portfolio-bag
LinkedIn: https://es.linkedin.com/pub/andres-soto-villaverde/5/98a/361
Github: https://github.com/andressotov/
Education
Doctorate in Artificial Intelligence, Universidad de Castilla La Mancha, Ciudad Real, Spain, 2006-2008. PhD Thesis Title: “Fuzzy Approach to Conceptual Meaning Processing in Natural Language Documents”, Supervisor: Dr. José Ángel Olivas Varela. Graded unanimously: Excellent Cum Laude
Graduate in Mathematics, Universidad de La Habana, La Habana, Cuba, 1968-1972
Research lines
Currently, my professional interests are related to the research and development of Artificial Intelligence applications using Machine Learning, Natural Language Processing, Sentiment Analysis, Recommendation Systems, etc.
Key words: Machine Learning, Natural Language Processing, Recommender Systems, Sentiment Analysis, Knowledge Representation, Information Retrieval, Soft Computing, Semantic Web.
Professional experience
Artificial Intelligence Software Engineer, Prisma Analytics, Oct. 2018 – Apr. 2019
Project: applications for Natural Language Processing
Technologies: Python, Conditional Random Fields (CRF), NLTK, JSON, sklearn_crfsuite, scikit-learn, NetworkX, MongoDB, Stanford CoreNLP Dependency Parser, CoNLL-U, Universal Dependencies, FrameNet, PropBank, VerbNet
Consultant for Machine Learning methods (especially “Natural Language Processing”), Neusta consulting GmbH, Germany, Sep. 2018 – Feb. 2019
Project: applications for Natural Language Processing
Technologies: Python, Conditional Random Fields (CRF), NLTK, JSON, sklearn_crfsuite, scikit-learn, Stanford CoreNLP Dependency Parser
Researcher + Project Leader, Fundamentia Business Consulting, Dec. 2017 – Oct. 2018
Project: applications for Natural Language Processing
Technologies: Python, scikit-learn (LinearSVC, SVM, SGDClassifier, MultinomialNB, Pipeline, GridSearchCV), LibSVM, Java, JSON, MongoDB, Linux.
Researcher + Developer, Sukan Sport Technology S.L., Mar. 2017 – Nov. 2017
Project: applications for sport training.
Role: Artificial Intelligence Leader, design, implementation, and results evaluation.
Technologies:
Python, Behavior tree, Finite State Machines
Data Scientist freelancer, Feb. 2017 – May 2017
Project: expert system for psychology.
Role: Artificial Intelligence Leader, design, implementation, and results evaluation.
Technologies:
Expert Systems, Fuzzy Logic, Topic Maps (Ontopia, Omnigator), Python (scikit-fuzzy, Pyke, Tkinter), SCI Prolog
Researcher + Developer, Universidad de Castilla La Mancha, Ciudad Real, Nov. 2016 – Feb. 2017
Project: Intelligent Tutoring Systems (ITS) for Robot Programming
Description:
• SCORM packaging of course
• Design, implementation and testing of a REST API for course management
• Technical report about Learning Object Metadata standards
• ITS functional description
• Technical architecture documentation
• Technical report about ITS and adaptive algorithm
• Technical report about Data Mining libraries in JavaScript
• Design and implementation of the adaptive algorithm
Role: team member; design, implementation, and results evaluation.
Technologies:
Intelligent Tutoring Systems, Reinforcement Learning, Multi-Armed Bandit (MAB) problem, Bayesian Knowledge Tracing (BKT)
Node.js, REST API, Postman, JavaScript, Mongoose, MongoDB, Trello, GitHub
Data Scientist freelancer, Jun. 2015 – Nov. 2016
Project: Headlines classifier for a localization system
Description:
Natural Language Processing of documents.
Training and testing of Machine Learning methods to determine which topics those documents relate to.
Classification of documents according to a set of predefined topics.
Role: team member; design, implementation, and results evaluation.
Technologies:
Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML
Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli)
Project: Sentiment Analysis for healthcare
Description:
Natural Language Processing of users' opinions.
Training and testing of Machine Learning methods to identify which topics the opinions refer to.
Sentiment Analysis processing to determine the users' attitude when writing their opinions.
Classification of documents by topic and sentiment.
Role: team member; design, implementation, and results evaluation.
Technologies:
Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML
Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli)
Project: Identify sections in job descriptions
Description:
• Filter XML files with job descriptions to identify sections using XML tags (Job Title, Job Description, Company, etc.)
• Training and testing of Machine Learning methods to identify sentences and phrases describing, for example, roles or responsibilities
• Classification of documents according to those topics: roles, skills, etc.
Role: team member; design, implementation, and results evaluation.
Technologies:
• Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML
• Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli)
Researcher + Developer, 4d-life, Barcelona, Spain, Nov. 2014 – Jun. 2015
Project: Clustering of company employees based on topics
Description:
Natural Language Processing of documents circulating in the company.
Training and testing of Machine Learning methods to determine which topics those documents relate to.
Characterize employees according to their use of documents.
Apply Machine Learning methods to cluster employees according to the documents and topics they are associated with.
Role: project leader (2 participants), prototype design, implementation, and results evaluation.
Technologies:
Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, PyLucene
MySQL, Apache Lucene/Solr
Clustering methods: k-means++, DBSCAN, hierarchical clustering
Dimensionality reduction: Latent Semantic Analysis (LSA), Singular Value Decomposition (SVD)
Gradient methods: gradient descent, steepest descent, conjugate gradient.
Researcher + Developer, BITYVIP Technology Ltd., Zaragoza, Spain, Aug. 2012 – Oct. 2014
Project: Content-based filtering recommender system.
Description:
• Processing of natural language documents from social networks and newspapers.
• Determine which topics those documents relate to
• Characterize persons according to their use of those documents
• Cluster persons according to the documents and topics they are associated with.
• Offer items to persons according to their characterization
Role: project leader (3 participants), prototype design, implementation, and results evaluation.
Technologies:
Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas
MySQL, MongoDB
Clustering methods: k-means++, DBSCAN, hierarchical clustering
Project: Sentiment analysis of customer review data
Description:
• collect lexical resources for opinion mining, such as lists of manually classified documents and dictionaries, in English and Spanish.
• train the classification methods using those resources
• test the methods with other documents not used for training
• evaluate performance
Role: project leader (3 participants), prototype design, implementation, and results evaluation.
Technologies:
Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas
MySQL, MongoDB
HTML, PHP, CSS
Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli), Logistic regression, Perceptron, Ridge regression, Passive Aggressive, Support Vector Machine, SGD (Stochastic Gradient Descent), Nearest Centroid
Project: Estimation of the influence of news on opinions in social networks
Description:
• Processing of historical behavior of opinions on specific topics.
• Estimation of correlation factors between opinions and topics
• Estimation of the influence of news on opinions in social networks
• Determination of models to predict the possible state of opinions
Role: project leader (3 participants), prototype design, implementation, and results evaluation.
Technologies:
Python: NLTK, NumPy, SciPy, pandas, scikit-learn, Matplotlib
MySQL, MongoDB
Regression methods: linear and nonlinear, one-dimensional and multidimensional, ordinary least squares (OLS)
Project: ETL Extract Transform Load
Description:
• Extract, Transform, Load CSV files to MongoDB and vice versa
• Extract, Transform, Load MySQL files to MongoDB and vice versa
Role: prototype design, implementation, and results evaluation.
Technologies:
Python: NLTK, NumPy, SciPy, pandas
MySQL, MongoDB
Project: Software application for airfare reservation
Description:
• recommend flights taking into account the user's tourist preferences, climatic aspects, location of the airports, places where the flight stops, etc.
• allows users to choose the travel dates, origin and destination cities and airports, preferred airlines and stopovers, how many people will travel and their age ranges.
• Sort offers by price, etc.
• allows clients to recommend tourist destinations, airports and airlines
• cluster clients according to their tourist preferences
Role: prototype design, implementation, and results evaluation.
Technologies:
Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas
MySQL, MongoDB
HTML, PHP, CSS
Clustering methods: Self Organizing Maps
Professor, Universidad Autónoma del Carmen, Ciudad del Carmen, Campeche, México, 2003 – 2012
Courses taught: Web Information Retrieval, Artificial Intelligence, Compiler Construction, Numerical Methods, Discrete Mathematics, Assembler Language, Operating Systems, Computer Simulation, and Object Oriented Programming.
Postgraduate courses taught: Introducción a las nuevas tendencias tecnológicas, Maestría en Administración de Tecnologías de Información (2013, online master course); Análisis de Datos para Simulación (2004); Formación y Desarrollo de Proyectos de Investigación (2009); Programación MATLAB (2010)
Head of the Computer Science Department, 2004 – 2005. Responsible for promoting research among professors and students. Responsible for promoting international collaboration.
Associate Professor, Universidad Católica Andrés Bello, Caracas, Venezuela, 1998-2003.
Courses taught: Computer Simulation, Performance Analysis, Computer Architecture, Operating Systems, Algorithms and Programming.
Postgraduate courses taught: Simulación utilizando ARENA (2000, Camagüey)
Head of the Computer Science group of professors. Responsible for student internships. Responsible for elective courses.
Associate Professor, Universidad Simón Bolívar, Caracas, Venezuela, 1997-2003.
Courses taught: Computer Programming
Technical Advisor, Artificial Intelligence Group, Departamento de Computación y Tecnología de la Información, Universidad Simón Bolívar, 1997-2003
Associate Professor, Center of Biomaterials, Universidad de La Habana, La Habana, Cuba, 1990-1996.
Courses taught: Computer Programming, Computer Simulation
Head of the Computer Science and Mathematical Models Department
Associate Professor, Computer Science Department, Universidad de La Habana, La Habana, Cuba, 1973-1989
Courses taught: Computer Simulation, Operating Systems, System Programming, Computer Programming, Numerical Methods, and Logic. Member of the Scientific Advisory Committee and the Pedagogical Advisory Committee. Member of the Committee for Curriculum Development 1982-1987.
Skills
Programming Languages:
Python, MATLAB, Octave, Prolog, Perl, C/C++, FORTRAN, Pascal, Basic, Arena, SIMAN, GPSS, Smalltalk, Mathematica
Node.js, REST API, Postman, JavaScript, Mongoose, HTML, CSS, PHP
Scientific Software:
SciPy, scikit-learn, NLTK, NumPy, matplotlib, NetworkX, Splus, MS MathCAD, EASY-FIT, MODFIT, IMSL
Databases:
Lucene, MongoDB, MySQL
Readable Formats:
XML, JSON, RDF
Ontologies, Thesauri, Dictionaries:
WordNet, MultiWordNet, SentiWordNet, EUROVOC, YAGO, ConceptNet, DBpedia
Version Control System:
Git, Trello, GitHub
Languages:
• Spanish: First language
• English: Very good (Speak, Read, Write)
• German: Basic (Speak, Read, Write)
• Russian: Basic (Speak, Read, Write)
Participated in research projects receiving grants: 10
Fellowships, grants and prizes received: 9
Invited professor: 9
Organizing and Program Committees: 15
Connections with national / international research teams: 6
Publications: 47 (4 ISI journal, 8 DBLP)
Conference Papers: 75