Elaine Zhao
+1-415-******* *******@****.*****.*** LinkedIn GitHub Portfolio Mountain View, 94043
EDUCATION
MS in Data Science - University of San Francisco July 2018 - June 2019
• Relevant courses: Data Structures and Algorithms, Distributed Computing, Cloud Computing, Database Management, Data Acquisition, Object-Oriented Programming and Web Development, Machine Learning
BS in Engineering Management - Beijing Jiaotong University Sep 2014 - June 2018
TECHNICAL SKILLS
• Programming: Python, Java, C/C++, JavaScript, HTML/CSS, R
• Database and Distributed Computing: SQL (MySQL, PostgreSQL), NoSQL (MongoDB), Apache Spark
• Tools: AWS (ELK, EC2, S3, RDS, ECS, CDK, Lambda, EMR, Elastic Beanstalk, SageMaker, CloudFormation, CodeBuild), Google Cloud, Jenkins, Docker, Flask, Git, Tableau
• Frameworks/libraries: Spring MVC, Serverless, TensorFlow, PyTorch, scikit-learn, Pandas, NumPy, Sphinx
WORK EXPERIENCE
Vianai Systems Palo Alto, CA
Data Scientist Oct 2019 – Present
• Developed a search engine for COVID-19 research papers and articles with AWS Elasticsearch to return the results most relevant to each query. Set up a CI/CD pipeline with AWS Cloud Development Kit and GitHub Actions to automate application deployment on Elastic Container Service. Improved search results by generating embeddings of the query and corpus with an NLP model (BERT), deployed as an additional service.
• Designed a RESTful API to support customers' hands-on ML experiments and deployed it to Elastic Beanstalk. Modularized and encapsulated the experiment and model implementations for easy configuration and fast automation, covering data preprocessing, feature engineering, modeling, and metrics calculation.
• Built high-accuracy machine learning models (SARIMA, LSTM, MLP, LightGBM) to predict ATM cash withdrawals, significantly improving ATM cash availability and reducing replenishment effort. Designed an interactive demo app with Flask, BigQuery, and Tableau for stakeholders.
Orange Silicon Valley San Francisco, CA
Data Scientist Intern Nov 2018 - June 2019
• Applied a novel TF-IDF algorithm to anonymized location data to detect similar users for advertisement targeting. Retrieved mobile user data from the Spark cluster and optimized the TF-IDF algorithm with Spark SQL.
• Wrote Python scripts to process raw text data and train Named Entity Recognition models with spaCy, NLTK, and Gensim, enabling better recognition of relevant entities and topics in customer complaints and feedback.
PROJECTS
Dog Image Visual Search and Recommendation App [Link] Mar 2019 - May 2019
• Designed a web application that uses convolutional neural networks to search dog shelters and help find missing dogs. Users upload photos of their pets; image similarity scores are calculated, and matches are presented.
• Implemented the back end, including web scraping, setting up an EC2 server and S3 file storage, and managing users’ profile and search history data with PostgreSQL (AWS RDS). Built the front-end pages with HTML, CSS, and Bootstrap.
• Containerized the application and used CloudFormation to deploy a highly available virtual private cloud running multiple instances of the Docker image behind a load balancer.
Post Your Recipe App (Based on Java) [Link] Jan 2020 - Feb 2020
• Implemented the web service for a recipe blog with Spring Boot and Spring Data JPA.
• Implemented interactive user features using the data access object (DAO) pattern to support CRUD operations. Configured Spring MVC annotations to map URLs to controllers and used Hibernate for data persistence. Created automated JUnit tests and integration tests with mocked dependencies (Mockito), and used a Jenkins pipeline for build, test, and failure handling.
Distributed Systems on AWS for Air Quality Index Prediction [Link] Jan 2019
• Set up MongoDB on AWS across 10 servers, with replication and sharding, to store over 10 GB of unstructured data.
• Designed data ingestion and ETL programs and built ML regression models with Spark ML in a SageMaker notebook backed by Spark on Amazon Elastic MapReduce.