Dr. Yuping Yang
Email: ad7lup@r.postjobfree.com
Web link: https://www.linkedin.com/in/yuping-yang-59a1aa314/
Phone: 614-***-****
Summary: Primary skills in data modeling and data science; also an experienced software designer, coder, and project leader.
SKILLS
languages: python (9 years), java, c, c++, plsql, tsql, presto sql
data science: LLM, tensorflow, inferential and bayesian statistics, mcmc, jags, vision ml, time series
data modeling: nosql, relational, star schema, graph data modeling
others: b,c,k,t shells, node.js, awk, perl, c#, pyqt, tableau, d3.js, matplotlib, oracle, mssql, postgresql, hive, spark, databricks, wsl2, docker, github, cuda, linux, jupyter notebooks
EXPERIENCE
Time series forecasting projects:
Columbus, Ohio 04/15/2024 - present
ML-based forecasting of petroleum stock prices (in progress)
Time series forecasting software with an algorithm that searches for time-differenced correlations with dynamic parameter adjustments (in progress)
Some of the python libraries used in time series analysis: numpy, pandas, matplotlib, pickle, h5py, sklearn.cluster (KMeans, DBSCAN, AgglomerativeClustering), tslearn.clustering (TimeSeriesKMeans), sklearn.metrics (silhouette_samples, silhouette_score), dtaidistance (dtw), fastdtw, tslearn.metrics (cdist_dtw), tensorflow
Senior Data Scientist in Fractal Analytics, on contract to C3.ai
Redwood City, California 01/12/2024 - 04/15/2024
Familiarized myself with the process scheduling optimization (PSO) module, which is based on integer programming
Upgraded the C3 AI platform in different environments (local, jupyter, docker, Linux in WSL2, C3 cloud).
Used VMs, Linux in WSL2, docker, mac, github, python, node.js
Participated in client engagement for AI projects
ML Architect in ZeroSight
San Jose, California 02/2023 - 12/2023
A POC project to investigate the best ways to go “from video imaging to 3D modeling”. Some of the tech used:
Used ML packages OpenPose, MediaPipe, YOLOv8, and YOLO-NAS for object detection and pose estimation.
Worked with Dockerfiles, pre-configured images, cmake, python virtual environments, Nvidia GPU, cuda, anaconda, vlc, ffmpeg, streamlit
Used pretrained models (YOLOv8) and did some training. Used cvat to prepare labels for training.
Performed affine transformations for 3D modeling, using quaternions, 3D transformation matrices, etc.
Used Blender (python) for crafting 3D models and Unity scripting (C#) for controlling object movements
Did socket programming (UDP and TCP), passing messages from python to python (Blender) and from python to C# (Unity)
Implemented threading and async code (C#) to decouple message passing from object control
Investigated data stores, such as the vector database Milvus, for storing and searching large amounts of imaging data
Data Scientist in Meta Platforms
Menlo Park, California 12/2021 – 01/2023
Analyzed online ads parameters to enhance ads effectiveness. Originally focused on the impact of bid caps, the project evolved into a high-dimensional optimization problem, analyzing various parameters collectively and individually. Used Presto SQL, Hive, Python, statistics, and notebooks. Identified lift, reach, impression, CTR, and OR across channels, objectives, verticals, and partner conditions. Presented my findings to marketing managers and the DS group.
For marketing campaign result analysis, used Python, R, z-tests, t-tests, p-values, winsorization, and CUPED to statistically analyze campaign results, and communicated the findings to marketing managers in reports.
Applied machine learning with PyTorch, Hive, and notebooks to scope target audiences for marketing campaigns. Created labels, prepared data, designed and trained models, and identified responsive customers.
Senior Technical Staff in ComResource
Columbus, Ohio 06/2019 – 05/2021
Reported directly to the CTO. Completed numerous POCs (proofs of concept) for clients. All projects were in python.
Data mining (wrote code to analyze data): used the apriori algorithm to discover value patterns with 99% accuracy. The code can discover thousands of value patterns in hours, compared with the several months human staff would need.
Product tier classification (wrote code to analyze data) in a large database for client Cargill.
NLP (wrote code to analyze text): extracted patent citations with 97% accuracy from 50,000+ pdf documents from the US Patent Office. I proposed this approach to replace the original proposal of manual extraction by low-cost outsourced labor (which would have needed more than 10 people), and won this contract from client Chemical Abstract.
Independently implemented a data search tool (a GUI tool; 5k lines of python and pyqt), used to search a large client database.
Independently implemented a graph data visualization tool (a GUI tool; 3k lines of python and pyqt), used to search a neo4j graph database with interactive graphs and menu selections.
Built an Azure chatbot as a demo project for Microsoft's local marketing office in Columbus, Ohio.
Used and compared the gcp, aws, and azure cloud platforms and wrote a report on the usability and learning curves of the major features relevant to the client.
Data modeling: designed and built relational and graph database schemas, starting from a set of unstructured and uncleansed datasets that were large in both number of data items and data volume.
Founder in Code Track
Columbus, Ohio 03/2016 – 03/2019
I created this company to develop automatic data lineage search software that I designed. Because funding was small, I personally developed most of the code (15k lines of python).
Used Tensorflow/Keras (ML) to build a component of the lineage search engine that captures user habits in the software.
Designed and implemented an office file sharing program (similar to Slack; 3k lines of python code).
Met with investors, gave presentations, and wrote business plans, implementation plans, and budget estimates
Used python, data mining libraries, Tensorflow, web crawler, Django, regex, web sockets, postgresql, pyqt gui framework, windows and ubuntu linux
Python Developer in Signet Accel
Columbus, Ohio 04/15/2015 – 03/2016
This was a small medical software startup aiming to consolidate all medical data into a common schema.
Wrote a versatile data converter in python. Its input is data from different hospitals. The challenge was to correctly identify and categorize text values that express the same concepts under different medical codings.
Became familiar with the OMOP (Observational Medical Outcomes Partnership) schema and many medical coding sets such as SNOMED, CPT-4, and ICD9.
Used python, lucene, usagi, csvkit, sql20, postgresql, sql server, windows, linux
Developer in JP Morgan Chase Bank
Columbus, Ohio 03/2013 – 04/2015
In the financial compliance department, I modified compliance code in response to regulation changes.
Wrote a java program to archive data in a third-party storage format to comply with regulatory requirements. This was an 8-month project with 3 managers (source, general, and target) and 4 tech resources to collect and store data. I was the only conversion programmer, converting the source data format to the target data format; conversion was the only part of the project that required substantial coding effort.
Investigated legacy financial compliance systems, and separated and abstracted their main logic components to support refactoring. Studied and completed a visual programming POC.
Traced data flows from cobol/copybook files from ibm db2 (Bear Stearns, bought by JPMC in the 2008 financial crisis) to the target oracle system in an effort to link the two systems.
Traced all historical data linked to a high-profile Wall Street financial scandal.
Did website security analysis and maintenance, such as SSL setup for the Chase web site.
Used java, c++, junit, log4j, oracle, sql server, ibm db2, xml, ssl setup for https
Data Architect in Research Institute at Nationwide Children's Hospital
Columbus, Ohio 11/2011 – 03/2013
Redesigned and implemented the data backend of the etrac system. The data backend had become buggy after many years of continuous modifications. The etrac system tracks every research project from funding proposal all the way to fund use. It tracks research funding for 200 medical professors, along with 800 scientists and assistants, from the medical school of Ohio State University.
Used sql server, data modeling, etrac, erwin, olap cubes, star schema
Senior System Analyst in G2O
Columbus, Ohio 5/2006 – 11/2011
Worked on about 30 projects for various client companies and government agencies around Columbus. Most projects were database and data warehouse design, ETL (SQL or SSIS) implementation, and BI builds.
Completed many POC projects and wrote many technical proposals and assessments for client companies.
Used sql server, oracle, postgresql, pentaho, Microsoft BI suite, iWay (ETL), java, google map program
Data Architect in GE Aircraft Engines
Cincinnati, Ohio 01/2005 - 5/2006
Coordinated the integration of several data areas; each area had several dozen databases, and each database had dozens or hundreds of tables. This was done to support the merger of GE Aircraft Engines and GE Locomotives.
Founder in Ruiling Optics
Columbus, Ohio 2003 – 2004
Designed and built an ultra-fast (split-second) flatbed image scanning device intended to replace the flatbed scanner; applied for and obtained 3 US utility (invention) patents; conducted research on imaging software and hardware; and carried out many fundraising activities (business plans, exec summaries, emails, presentations, meetings with investors).
Principal Engineer in Profitlogic
Cambridge, Massachusetts 10/2001 – 10/2002
Profitlogic was a software startup for retail price setting. The JC Penney project ($15m, 6/2001-12/2002) was the biggest pricing project awarded to the company before it was sold to Oracle.
I was in charge of the design and implementation of the whole data backend of this pricing software and led a team of 7 developers. It was a large data lake (several TBs) with relational and star schemas. It held data on 3.5 million active SKUs across 1,020 stores.
Used UNIX Shell Scripting, Toad, Oracle 8i, PL/SQL, Pro*C, and Erwin in the project.
Senior Engineer in Electron Economy
Cupertino, California 5/2000 – 10/2001
Electron Economy was a software startup funded with $86m. I was hired with no fixed role assignment.
Proposed an online, rule-based intelligent business negotiation system 2 months after I joined the company.
A division of 15 engineers (5 with PhDs and 10 with masters) was formed based on this proposal. I helped organize the team and structured the project.
When Electron Economy was sold to Viewlocity, the rule-based transaction engine was one of the two main software assets Electron Economy had (the other was a web-based online sales transaction engine).
Designed and implemented java software for the project I proposed, along with other team members.
EDUCATION
The Ohio State University
BS Computational Mathematics (error control in large numerical computational processes)
MS Mathematics
MS Computer and Info (CIS)
PhD in Computer and Info (data mining and fast data retrieval indices)
PhD candidacy in Mathematics in Combinatorics (graph theory, counting structures)
(Courses are 3-5 weeks each)
ML course certifications (21) | Year | Contents | Institute
Getting Started with Tensorflow 2 | 2021 | CNN, RNN, LSTM, python | Imperial College London
Customizing your model with Tensorflow 2 | 2021 | CNN, RNN, LSTM, python | Imperial College London
Supervised Machine Learning Regression and Classification | 2023 | python, Tensorflow | DeepLearning.AI
Unsupervised Learning, Recommenders, Reinforcement Learning | 2023 | python, Tensorflow | DeepLearning.AI
Advanced Learning Algorithms | 2023 | Complex models, Tensorflow, python | DeepLearning.AI
Sequence Models | 2023 | RNN, LSTM, GRU, forecast, python | DeepLearning.AI
Deep Learning Neural Network with PyTorch | 2023 | PyTorch, python | IBM
Sequences, Time Series and Predictions | 2023 | RNN, LSTM, ARIMA, LLM, python | DeepLearning.AI
Transformer Models and BERT Model | 2023 | Transformer, BERT model, python | Google Cloud
Google Cloud Big Data and Machine Learning | 2023 | GCP | Google Cloud
How Google Does Machine Learning | 2023 | GCP | Google Cloud
Intro to Vertex AI Studio | 2023 | Vertex AI Studio | Google Cloud
TensorFlow on Google Cloud | 2023 | GCP, Tensorflow, python | Google Cloud
Feature Engineering | 2023 | GCP, DataPrep, DataFlow, BigQuery | Google Cloud
Launching into Machine Learning | 2023 | GCP, Vertex AI, python | Google Cloud
Natural Language Processing on Google Cloud | 2023 | GCP, LLM, python | Google Cloud
MLOps Started | 2023 | GCP, MLOps | Google Cloud
Machine Learning in the Enterprise | 2023 | GCP, MLOps | Google Cloud
Production Machine Learning Systems | 2023 | GCP, MLOps | Google Cloud
Distributed Computing with Spark SQL | 2023 | AWS, databricks, Spark SQL | UC Davis
Generative AI with Large Language Models | 2024 | Generative AI, Transformer, LLM | DeepLearning.AI
Statistics course certifications (8) | Year | Contents | Institute
Practical Time Series Analysis | 2021 | Statistics based, ARIMA, R | State Univ of New York
Bayesian Statistics: Techniques and Models | 2021 | Statistical Model, MCMC, jags, R | UC Santa Cruz
Inferential Statistics | 2023 | Inference, hypothesis tests, R | Duke University
Experimental Design Basics | 2024 | ARIMA, Latin square, block design, R | Arizona State Univ
R Programming | 2024 | R language | Johns Hopkins Univ
Getting Started with SAS Programming | 2024 | SAS language | SAS Institute
Probabilistic Graphical Models 1: Representation | 2024 | Bayesian and Markov Networks | Stanford Univ
Probabilistic Graphical Models 2: Inference | 2024 | Bayesian and Markov Networks | Stanford Univ
Data Visualization course certifications (4) | Year | Contents | Institute
Creating Dashboards and Storytelling with Tableau | 2023 | Tableau | UC Davis
Visual Analysis with Tableau | 2023 | Tableau | UC Davis
Fundamentals of Visualization Tableau | 2023 | Tableau | UC Davis
Essential Design Principles for Tableau | 2023 | Tableau | UC Davis
CDMP professional certification
Certified Data Management Professional at the Master Level (with Specialties in Data Management and Data Warehousing) 2013
From DAMA (Data Management Association International) and ICCP (Institute for Certification of Computing Professionals)
Short projects and short courses certifications (5)
Image Compression and Generation using Variational Autoencoders | 2021 | Coursera
Decision Tree and Random Forest Classification using Julia | 2023 | Coursera
ML with ChatGPT, Image Classification | 2023 | ChatGPT-assisted ML image classification | Coursera
Cypher Fundamentals | 2024 | Cypher language in Neo4j database | Neo4j GraphAcademy
Neo4j Fundamentals | 2024 | Neo4j database | Neo4j GraphAcademy
Graph Data Modeling Fundamentals | 2024 | data modeling in Neo4j database | Neo4j GraphAcademy