
Data Scientist Python

Location:
Elche, Alicante Province, Spain
Salary:
45000
Posted:
June 09, 2019


Resume:


Curriculum Vitae

Personal Information

Name: Andrés Soto Villaverde

Nationality: Cuban

Mobile phone: (34-633-******

Skype: soto_andres

Email: ******.****.*@*****.***

Web page: https://sites.google.com/view/andres-soto-v, https://sites.google.com/view/my-portfolio-bag

LinkedIn: https://es.linkedin.com/pub/andres-soto-villaverde/5/98a/361

GitHub: https://github.com/andressotov/

Education

Doctorate in Artificial Intelligence, Universidad de Castilla La Mancha, Ciudad Real, Spain, 2006-2008. PhD thesis title: “Fuzzy Approach to Conceptual Meaning Processing in Natural Language Documents”. Supervisor: Dr. José Ángel Olivas Varela. Graded unanimously Excellent Cum Laude.

Graduate in Mathematics, Universidad de La Habana, La Habana, Cuba, 1968-1972

Research lines

Currently, my professional interests relate to the research and development of Artificial Intelligence applications using Machine Learning, Natural Language Processing, Sentiment Analysis, Recommender Systems, etc. Keywords: Machine Learning, Natural Language Processing, Recommender Systems, Sentiment Analysis, Knowledge Representation, Information Retrieval, Soft Computing, Semantic Web.

Professional experience

Artificial Intelligence Software Engineer, Prisma Analytics, Oct. 2018 – Apr. 2019

Project: applications for Natural Language Processing

Technologies: Python, Conditional Random Fields (CRF), NLTK, JSON, sklearn_crfsuite, scikit-learn, NetworkX, MongoDB, Stanford CoreNLP Dependency Parser, CoNLL-U, Universal Dependencies, FrameNet, PropBank, VerbNet

Consultant for Machine Learning methods (especially “Natural Language Processing”), Neusta consulting GmbH, Germany, Sep. 2018 – Feb. 2019

Project: applications for Natural Language Processing

Technologies: Python, Conditional Random Fields (CRF), NLTK, JSON, sklearn_crfsuite, scikit-learn, Stanford CoreNLP Dependency Parser

Researcher + Project Leader, Fundamentia Business Consulting, Dec. 2017 – Oct. 2018

Project: applications for Natural Language Processing

Technologies: Python, scikit-learn (LinearSVC, SVM, SGDClassifier, MultinomialNB, Pipeline, GridSearchCV), LibSVM, Java, JSON, MongoDB, Linux.

Researcher + Developer, Sukan Sport Technology S.L., Mar. 2017 – Nov. 2017

Project: applications for sport training.

Role: Artificial Intelligence Leader, design, implementation, and results evaluation.

Technologies:

Python, Behavior tree, Finite State Machines

Data Scientist freelancer, Feb. 2017 – May 2017

Project: expert system for psychology.

Role: Artificial Intelligence Leader, design, implementation, and results evaluation.

Technologies:

Expert Systems, Fuzzy Logic, Topic Maps (Ontopia, Omnigator), Python (scikit-fuzzy, Pyke, Tkinter), SWI-Prolog

Researcher + Developer, Universidad de Castilla La Mancha, Ciudad Real, Nov. 2016 – Feb. 2017

Project: Intelligent Tutoring Systems (ITS) for Robot Programming

Description:

• SCORM packaging of course

• Design, implementation and testing of a REST API for course management

• Technical report about Learning Object Metadata standards

• ITS functional description

• Technical architecture documentation

• Technical report about ITS and adaptive algorithm

• Technical report about Data Mining libraries in JavaScript

• Design and implementation of the adaptive algorithm

Role: team member; design, implementation, and results evaluation.

Technologies:

Intelligent Tutoring Systems, Reinforcement Learning, Multi-Armed Bandit (MAB) problem, Bayesian Knowledge Tracing (BKT)

Node.js, REST API, Postman, JavaScript, Mongoose, MongoDB, Trello, GitHub
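Bayesian Knowledge Tracing (BKT), listed above, treats a student's mastery of a skill as a hidden yes/no state. That state is re-estimated after every answer. The code below is only a minimal sketch of the standard BKT update, not the project's code. The parameter values (prior, learn, slip and guess probabilities) are hypothetical placeholders.

```python
def bkt_update(p_known, correct, p_learn, p_slip, p_guess):
    """Return P(skill mastered) after one observed answer and one learning opportunity."""
    if correct:
        # Mastered and did not slip, versus not mastered but guessed right
        num = p_known * (1 - p_slip)
        den = num + (1 - p_known) * p_guess
    else:
        # Mastered but slipped, versus not mastered and did not guess
        num = p_known * p_slip
        den = num + (1 - p_known) * (1 - p_guess)
    posterior = num / den
    # Transition: an unmastered skill may be learned before the next exercise
    return posterior + (1 - posterior) * p_learn


def trace_mastery(answers, p_init=0.2, p_learn=0.15, p_slip=0.1, p_guess=0.25):
    # Hypothetical default parameters; in practice they are fitted from student logs
    p = p_init
    history = []
    for correct in answers:
        p = bkt_update(p, correct, p_learn, p_slip, p_guess)
        history.append(p)
    return history


if __name__ == "__main__":
    for step, p in enumerate(trace_mastery([True, True, False, True]), start=1):
        print(f"after answer {step}: P(mastered) = {p:.3f}")
```

Correct answers raise the mastery estimate and errors lower it. An adaptive algorithm could use this estimate to choose the next exercise.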


Data Scientist freelancer, Jun. 2015 – Nov. 2016

Project: Headlines classifier for a localization system

Description:

Natural Language Processing of documents.

Training and testing of Machine Learning methods to determine which topics those documents relate to.

Classification of documents according to a set of predefined topics.

Role: team member; design, implementation, and results evaluation.

Technologies:

Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML

Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli)
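The classifiers listed above were used through scikit-learn. The minimal plain-Python sketch below shows what multinomial Naïve Bayes with Laplace smoothing does with a bag of words. The tokenizer, the four training headlines and their topic labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Minimal lowercase whitespace tokenizer (hypothetical; the projects used NLTK)
    return text.lower().split()


class MultinomialNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)  # documents per topic, for the priors
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of every known word
            score = math.log(count / n_docs)
            for t in tokenize(doc):
                if t in self.vocab:  # words never seen in training are ignored
                    num = self.word_counts[c][t] + self.alpha
                    den = self.total_words[c] + self.alpha * v
                    score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    headlines = [
        "stocks fall as markets slide",
        "team wins the cup final",
        "central bank raises interest rates",
        "striker scores twice in final",
    ]
    topics = ["economy", "sports", "economy", "sports"]
    model = MultinomialNaiveBayes().fit(headlines, topics)
    print(model.predict("bank rates and markets"))  # economy
    print(model.predict("who won the final"))  # sports
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.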

Project: Sentiment Analysis for healthcare

Description:

Natural Language Processing of users' opinions.

Training and testing of Machine Learning methods to identify which topics the opinions refer to.

Sentiment Analysis to determine users' attitudes when writing their opinions.

Classification of documents by topic and sentiment.

Role: team member; design, implementation, and results evaluation.

Technologies:

Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML

Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli)

Project: Identify sections in job descriptions

Description:

• Filter XML files with job descriptions to identify sections using XML tags (Job Title, Job Description, Company, etc.)

• Training and testing of Machine Learning methods to identify sentences and phrases describing, for example, roles or responsibilities

• Classification of documents according to those topics: roles, skills, etc.

Role: team member; design, implementation, and results evaluation.

Technologies:

• Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML

• Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli)

Researcher + Developer, 4d-life, Barcelona, Spain, Nov. 2014 – Jun. 2015

Project: Clustering of company employees based on topics

Description:


Natural Language Processing of documents circulating in the company.

Training and testing of Machine Learning methods to determine which topics those documents relate to.

Characterization of employees according to their use of documents.

Application of Machine Learning methods to cluster employees according to the documents and topics they are associated with.

Role: project leader (2 participants), prototype design, implementation, and results evaluation.

Technologies:

Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, PyLucene

MySQL, Apache Lucene/Solr

Clustering methods: k-means++, DBSCAN, hierarchical clustering

Dimensionality reduction: Latent Semantic Analysis (LSA), Singular Value Decomposition (SVD)

Gradient methods: gradient descent, steepest descent, conjugate gradient.
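k-means++ differs from plain k-means only in how it picks the initial centers. Each new center is sampled with probability proportional to its squared distance from the nearest center already chosen. Below is a minimal plain-Python sketch of that seeding followed by standard Lloyd iterations. It is not the project's code, which used scikit-learn. The toy 2-D points are hypothetical stand-ins for LSA-reduced document vectors.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    # k-means++ seeding: first center chosen uniformly at random
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # Pick the next center with probability proportional to that distance
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers


def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old center of an empty cluster
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, k=2))
```

The distance-weighted seeding makes it unlikely that two initial centers fall inside the same dense group. That is the usual motivation for preferring k-means++ over uniform random initialization.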

Researcher + Developer, BITYVIP Technology Ltd., Zaragoza, Spain, Aug. 2012 – Oct. 2014

Project: Content-based filtering recommender system

Description:

• Processing of natural language documents from social networks and newspapers.

• Determination of which topics those documents relate to.

• Characterization of persons according to their use of those documents.

• Clustering of persons according to the documents and topics they are associated with.

• Offering of items to persons according to their characterization.

Role: project leader (3 participants), prototype design, implementation, and results evaluation.

Technologies:

Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas

MySQL, MongoDB

Clustering methods: k-means++, DBSCAN, hierarchical clustering

Project: Sentiment analysis of customer review data

Description:

• Collect lexical resources for opinion mining, such as lists of manually classified documents and dictionaries in English and Spanish.

• Train the classification methods using those resources.

• Test the methods with documents not used for training.

• Evaluate performance.

Role: project leader (3 participants), prototype design, implementation, and results evaluation.

Technologies:


Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas

MySQL, MongoDB

HTML, PHP, CSS

Classification methods: Naïve Bayes (Gaussian, Multinomial, Bernoulli), Logistic regression, Perceptron, Ridge regression, Passive Aggressive, Support Vector Machine, SGD (Stochastic Gradient Descent), Nearest Centroid
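The "evaluate performance" step above can be made concrete with precision, recall and F1 for the positive class on the held-out documents. This is a minimal sketch. The label names `pos`/`neg` and the sample predictions are hypothetical. scikit-learn's `precision_recall_fscore_support` offers the same computation.

```python
def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Precision, recall and F1 of the `positive` class on a held-out test set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["pos", "pos", "neg", "neg", "pos"]  # hypothetical test labels
    predicted = ["pos", "neg", "pos", "neg", "pos"]  # hypothetical classifier output
    print(precision_recall_f1(gold, predicted))
```

Running the same function over each classifier's predictions gives one comparable score per method.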

Project: Estimation of the influence of news on opinions in social networks

Description:

• Processing of historical behavior of opinions on specific topics.

• Estimation of correlation factors between opinions and topics

• Estimation of the influence of news on opinions in social networks

• Determination of models to predict possible future states of opinions

Role: project leader (3 participants), prototype design, implementation, and results evaluation.

Technologies:

Python: NLTK, NumPy, SciPy, pandas, scikit-learn, Matplotlib

MySQL, MongoDB

Regression methods: linear and nonlinear, one-dimensional and multidimensional, ordinary least squares (OLS)
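With a single predictor, ordinary least squares has a closed form. The slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x. The intercept makes the fitted line pass through the two means. A minimal sketch follows. The sample data are hypothetical, not results from the project.

```python
def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares (closed form for one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x; zero means the slope is undefined
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sum of cross-deviations of x and y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    # Hypothetical example: an opinion score (y) against daily news volume (x)
    print(ols_fit([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1]))
```

The multidimensional case replaces these sums with the normal equations, which NumPy's `linalg.lstsq` solves directly.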

Project: ETL (Extract, Transform, Load)

Description:

• Extract, transform and load CSV files into MongoDB and vice versa

• Extract, transform and load MySQL data into MongoDB and vice versa

Role: prototype design, implementation, and results evaluation.

Technologies:

Python: NLTK, NumPy, SciPy, pandas

MySQL, MongoDB

Project: Software application for airfare reservation

Description:

• Recommends flights taking into account the user's tourist preferences, climatic aspects, location of the airports, places where the flight stops, etc.

• Allows users to choose the travel dates, origin and destination cities and airports, preferred airlines and stopovers, how many people will travel, and their age ranges.

• Sorts offers by price, etc.

• Allows clients to recommend tourist destinations, airports, and airlines.

• Clusters clients according to their tourist preferences.

Role: prototype design, implementation, and results evaluation.

Technologies:

Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas

MySQL, MongoDB

HTML, PHP, CSS


Clustering methods: Self Organizing Maps

Professor, Universidad Autónoma del Carmen, Ciudad del Carmen, Campeche, México, 2003 – 2012

Courses taught: Web Information Retrieval, Artificial Intelligence, Compiler Construction, Numerical Methods, Discrete Mathematics, Assembler Language, Operating Systems, Computer Simulation, and Object Oriented Programming.

Postgraduate courses taught: Introduction to New Technological Trends, Master's in Information Technology Management (2013, online master course); Data Analysis for Simulation (2004); Formation and Development of Research Projects (2009); MATLAB Programming (2010)

Head of the Computer Science Department, 2004-2005. Responsible for promoting research among professors and students. Responsible for promoting international collaboration.

Associate Professor, Universidad Católica Andrés Bello, Caracas, Venezuela, 1998-2003.

Courses taught: Computer Simulation, Performance Analysis, Computer Architecture, Operating Systems, Algorithms and Programming.

Postgraduate courses taught: Simulation Using ARENA (2000, Camagüey)

Head of the Computer Science group of professors. Responsible for students’ internships. Responsible for elective courses.

Associate Professor, Universidad Simón Bolívar, Caracas, Venezuela, 1997-2003.

Courses taught: Computer Programming

Technical Advisor, Artificial Intelligence Group, Departamento de Computación y Tecnología de la Información, Universidad Simón Bolívar, 1997-2003

Associate Professor, Center of Biomaterials, Universidad de La Habana, La Habana, Cuba, 1990-1996.

Courses taught: Computer Programming, Computer Simulation

Head of the Computer Science and Mathematical Models Department

Associate Professor, Computer Science Department, Universidad de La Habana, La Habana, Cuba, 1973-1989

Courses taught: Computer Simulation, Operating Systems, System Programming, Computer Programming, Numerical Methods, and Logic. Member of the Scientific Advisory Committee and the Pedagogical Advisory Committee. Member of the Committee for Curriculum Development 1982-1987.


Skills

Programming Languages:

Python, MATLAB, Octave, Prolog, Perl, C/C++, FORTRAN, Pascal, Basic, Arena, SIMAN, GPSS, Smalltalk, Mathematica

Node.js, REST API, Postman, JavaScript, Mongoose, HTML, CSS, PHP

Scientific Software:

SciPy, scikit-learn, NLTK, NumPy, matplotlib, NetworkX, S-PLUS, MS MathCAD, EASY-FIT, MODFIT, IMSL

Databases:

Lucene, MongoDB, MySQL

Readable Formats:

XML, JSON, RDF

Ontologies, Thesauri, Dictionaries:

WordNet, MultiWordNet, SentiWordNet, EUROVOC, YAGO, ConceptNet, DBpedia

Version Control and Collaboration Tools:

Git, GitHub, Trello

Languages (excellent, very good, good, fair, etc.):

• Spanish: First language

• English: Very good (Speak, Read, Write)

• German: Basic (Speak, Read, Write)

• Russian: Basic (Speak, Read, Write)

Participated in research projects receiving grants: 10

Fellowships, grants and prizes received: 9

Invited professor: 9

Organizing and Program Committees: 15

Connections with national/international research teams: 6

Publications: 47 (4 ISI journal, 8 DBLP)

Conference Papers: 75


