
Data Scientist

Location:
San Francisco, CA
Posted:
September 09, 2024


Resume:

Steven Liu

GitHub, Portfolio Website, LinkedIn, ************@*****.***

Software Engineer and Data Scientist experienced in developing advanced models for dynamic simulations and predictive analytics, with notable achievements in ship pathing, network packet counts, and COVID-19 forecasting. Proficient in Python, JavaScript, SQL, and data visualization tools. Successfully built and deployed solutions for real-time climate monitoring and NBA team seeding prediction. Strong in automating data processes, collaborating with stakeholders, and ensuring model robustness.

EDUCATION

University of California, San Diego (2018 - 2021) - B.S. in Data Science - Provost Honors

WORK EXPERIENCE

● AIMDyn — Software Engineer — Python, CSS, HTML, JavaScript, MATLAB — (2021 - Current)

Developed and implemented advanced models for dynamic simulations, predictive analytics, and optimization, including predictive models for ship pathing/refueling, network packet counts and COVID-19 forecasting, achieving top 5% accuracy rankings and surpassing competitors such as Johns Hopkins and Microsoft.

Designed and configured systems with various parameters, analyzed their performance, and identified opportunities for improvement to enhance decision-making processes.

Created and refined simulations to visualize and predict system behaviors, automating data processing and integration tasks to boost efficiency and accuracy.

Automated data wrangling processes for model training, identifying data gaps to enhance model performance.

Collaborated with stakeholders to gather requirements, track progress, and adapt solutions as needed, documenting/presenting results daily and maintaining regular communication to meet evolving needs.

Conducted extensive testing to evaluate models across diverse scenarios, ensuring robustness and reliability.

TECHNICAL SKILLS

● Programming Languages: Python, JavaScript, SQL, C++, Java

● Libraries & Frameworks: Pandas, NumPy, SciPy, Scikit-Learn, TensorFlow, PyTorch, Flask, BeautifulSoup, Unittest, Coverage, Tkinter, Tableau, ggplot, Matplotlib, Seaborn

● Data: Spark, Dask, Hadoop MapReduce, AWS-EC2, PostgreSQL, MySQL

● Tools & Platforms: Git, GitHub, Jira, Confluence, Docker

COURSEWORK

● Data Structures & Algorithms - Recursion, Higher-Order, OOP, Complexity, and Data Types

● Application of Data Science - Statistics, Machine Learning Algorithms, and Data Systems on real-world data

● Database Management - Relational Database, Schema Design, Query Language and Optimization

● Systems for Scalable Analytics - Big Data, Memory Hierarchy, Distributed Systems, Model Selection, and Deployment at Scale

● Modeling & Machine Learning - Models, Natural Language Processing, and Robotics

PROJECTS

● Real-Time Global Climate Monitoring System

Utilized an ESP32 microcontroller to collect environmental data continuously.

Collected and stored over 1 million records in a PostgreSQL database for reliable and scalable data management.

Developed an interactive map and API for real-time data insights and trend analysis of the climate.

● NBA Team Seed Classification

Utilized BeautifulSoup to scrape comprehensive datasets, including player statistics, team schedules, and rosters. Preprocessed the data for analysis by cleaning, normalizing, and structuring it to fit the needs of graph-based models.

Implemented and evaluated accuracy of various classification models including GraphSAGE (88%), GCN (75%), GCN-LPA (82%), and Logistic Regression (63%).

Deployed the trained models using Docker containers to maintain consistent development and testing environments.


