Data Scientist Software Engineer

Location: Katy, TX
Posted: June 05, 2023

Linh T. Nguyen, Ph.D. - Computer Science

Houston Area - adxiyk@r.postjobfree.com - 408-***-**** - LinkedIn: linh-nguyen-229a6ba

SUMMARY:

• Experience leading teams of data scientists and software developers to build AI-assisted systems that solve complex business problems.

• Solid background in Machine Learning (ML), Algorithms, Numerical Optimization, and Distributed Systems. Many years of experience in Natural Language Processing (NLP), Information Retrieval (IR), Search Engines, Recommender Systems, Computer Vision (CV), and Agile software development.

• Proficient with a modern technology stack: Deep Learning (DL), Knowledge Graphs, Data Science, Big Data, Docker, Kubernetes, microservices, and cloud-based development.

• Skills & Tools: scikit-learn, gensim, spaCy, OpenNLP, scikit-image, OpenCV, pandas, NumPy, SciPy, R/Shiny, Ipopt, CPLEX, Solr, Lucene, Mahout, word2vec, BigQuery, CNNs, RNNs, Transformers, PyTorch, TensorFlow, Keras, Hugging Face, Git, SQL, Collaborative Filtering (CF), Matrix Factorization (MF), Latent Semantic Analysis (LSA), Jupyter, Nginx, Neo4j, Spark.

EXPERIENCE:

Sr. Data Scientist (SparkCognition, 2021 – 2023):

Led a team of software developers and data scientists to build a life-safety solution on the knowledge platform of Maana (acquired by SparkCognition).

• Safety risk management: designed and built a data-driven, knowledge-based, AI-assisted system that helps subject matter experts (SMEs) efficiently identify safety risks, consistently assess their likelihood and consequences, and recommend sufficient safeguards for each identified risk. The solution has two main components: (i) an ensemble text classification model (a logistic regression model and a random forest model, with features based on sentiment analysis and domain-specific word2vec embeddings) that predicts the likelihood and consequence of a risk from its description; and (ii) a recommender that suggests safeguards to reduce both likelihood and consequence. The solution was built as a Python microservice with MS-SQL as the database, containerized with Docker, and deployed on a Kubernetes cluster in Azure.

Sr. Data Scientist – Customer Solution (Maana, 2017 – 2021): Designed and built AI-assisted solutions for Maana’s Oil and Gas customers.

• Maritime shipping optimization: built a system that optimizes schedules for a fleet of vessels, subject to customer constraints on pickup and delivery dates and business constraints on vessels, products, port admission, routes, fuel costs, and operating costs. The problem was solved in two main steps: first, feasible schedules were generated heuristically; second, an integer programming problem was formulated and solved to select the most cost-efficient schedules. The solution was implemented as Python- and Scala-based microservices, containerized with Docker, and deployed on a Kubernetes cluster.

• Predictive maintenance: built an anomaly detection system that learns from large volumes of operational sensor (IoT) data and maintenance records (unstructured text) to predict possible failures early and schedule appropriate maintenance, reducing equipment downtime. Named entity extraction was used to process the unstructured text, and ensemble modeling was used to boost prediction accuracy. The solution was built and deployed using Python, Docker, and Kubernetes.

Manager, Pricing Strategy – Data Science (Sears Holdings, 2016 – 2017): Collaborated with business stakeholders to identify pricing opportunities and build pricing solutions for thousands of products. Designed and built a data analytics infrastructure for the team.

• Demand modeling: built regression models to predict product demand from features such as price, season, location, and competitors’ prices; built optimization models (integer programming) to recommend prices that maximize total revenue and margin.

• Dynamic pricing: implemented an explore-exploit (multi-armed bandit) algorithm in R/Shiny to price online products with little or no historical price data.

• Cloud infrastructure: built and maintained a BigQuery database, Linux virtual machines, RStudio Server, Shiny Server, a notebook server, and a Spark cluster on Google Cloud Platform (GCP).

Manager, Development – Search Team (Sears Holdings, 2013 – 2016): Applied natural language processing, image processing, and machine learning to different aspects of Sears’s online search engine to improve search relevance. Curated and maintained large datasets to enrich machine learning models.

• Semantic search engine: led a team to build a semantic product search engine that matches products to search phrases using not only keyword matching but also semantic matching. Various NLP techniques were combined in the final solution: part-of-speech (POS) tagging, named entity recognition and entity disambiguation, a domain knowledge graph built in Neo4j to model the USPTO dataset, spell correction, and Solr/Lucene indexing and search.

• Product classification: built a system to classify products into a predefined taxonomy based on text classification and word embedding techniques using internal and external data sources.

• Duplicate product image detection: built an image search engine to detect duplicate images among millions of product images, using features extracted from several deep learning models. Locality-sensitive hashing (LSH) and multi-dimensional indexing were used to reduce processing time.

• Spell correction: designed and implemented a context-aware spell correction system to correct users’ search phrases based on edit distance, n-gram indexing, and Solr search.

• Brand name extraction/recognition: led a team to build a brand name recognition/suggestion system to validate products’ brand names entered by vendors based on internal and external data.

Sr. Software Engineer – Machine Learning (Orbitz Worldwide, 2009 – 2013): Designed and built various data-driven projects to improve the user experience and increase revenue of the company’s hotel booking engine.

• Hotel ranking optimization: optimized a hotel ranking system to increase click-through and conversion rates. The system analyzes historical customer clickstreams to identify customer preferences and learns to rank hotels based on features such as location, amenities, and price. A learning-to-rank technique was implemented using a Support Vector Machine (SVM) tool.

• Hotel ranking and price optimization: formulated and solved a linear assignment problem to optimize the ranking of hotels based on conversion rates and revenues.

• Hotel recommendation system: implemented several recommendation techniques, including user-based and item-based collaborative filtering, for the hotel recommendation system.

• Hotel-customer co-clustering analysis: used matrix co-clustering to discover clusters of customers with similar preferences, clusters of hotels with similar features, and the correlations between the two types of clusters.

• Hotel review analysis: used text mining and sentiment analysis on hotel reviews to summarize customer opinions about different aspects of hotels.

Research Assistant (Information Retrieval Lab, Illinois Institute of Technology, 2004 – 2009):

• Conducted research in large scale Peer-to-Peer Information Retrieval, Distributed Indexing, Distributed Query Processing, and Peer-to-Peer data analysis.

• Invented an adaptive indexing technique for efficient searching in structured and unstructured file sharing peer-to-peer systems.

• Sample publication: L. Nguyen, W. G. Yee, O. Frieder, Adaptive distributed indexing for structured peer-to-peer networks, International Conference on Information and Knowledge Management (CIKM), 2008.

Director, Software Development (Bank for Foreign Trade of Vietnam, 1994 – 2004): Designed and developed various banking systems on multiple platforms (DOS, Windows, RS/6000, AS/400) using several languages (FoxPro, C, C++, RPG, FORTRAN).

Software Engineer (Institute of Information Technology of Vietnam, 1993 – 1994): Developed speech signal processing software in C using the Fast Fourier Transform (FFT) algorithm.

EDUCATION:

• Ph.D., Computer Science, Illinois Institute of Technology, 2009.

• M.Sc., Computer Science, Hanoi University of Science and Technology, 2002.

• B.Sc., Computer Science, Hanoi University of Science and Technology, 1993.
