
Data Science Modeling

Location:
Dublin, OH
Salary:
120000
Posted:
July 29, 2024


Dr. Yuping Yang

Email: ad7lup@r.postjobfree.com

Web link: https://www.linkedin.com/in/yuping-yang-59a1aa314/

Phone: 614-***-****

Summary: Primary skills in data modeling and data science; also an experienced software designer, coder, and project leader.

SKILLS

Languages: Python (9 years), Java, C, C++, PL/SQL, T-SQL, Presto SQL

Data science: LLMs, TensorFlow, inferential and Bayesian statistics, MCMC, JAGS, vision ML, time series

Data modeling: NoSQL, relational, star schema, graph data modeling

Others: Bourne/C/Korn/tcsh shells, Node.js, awk, Perl, C#, PyQt, Tableau, D3.js, matplotlib, Oracle, MS SQL Server, PostgreSQL, Hive, Spark, Databricks, WSL2, Docker, GitHub, CUDA, Linux, Jupyter notebooks

EXPERIENCE

Time series forecasting projects:

Columbus, Ohio 04/15/2024 - present

An ML-based petroleum stock price forecasting model (in progress)

A time series forecasting tool whose algorithm searches for time-differenced correlations with dynamic parameter adjustment (in progress)

Some of the Python libraries used in the time series analysis: numpy, pandas, matplotlib, pickle, h5py, sklearn.cluster (KMeans, DBSCAN, AgglomerativeClustering), tslearn.clustering (TimeSeriesKMeans), sklearn.metrics (silhouette_samples, silhouette_score), dtaidistance (dtw), fastdtw, tslearn.metrics (cdist_dtw), tensorflow
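As an illustration of the dynamic time warping distance that libraries such as dtaidistance and fastdtw compute, here is a minimal NumPy sketch of the classic DTW recurrence (illustrative only, with made-up series; not the projects' actual code):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance
    between two 1-D series, with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-shifted copy of a series stays close under DTW,
# while a flat series does not.
s1 = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
s2 = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0])  # s1 delayed one step
s3 = np.zeros(7)
```

This is the distance that tslearn's cdist_dtw computes pairwise when TimeSeriesKMeans clusters with the DTW metric; the library versions add windowing constraints and C-level speedups.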

Senior Data Scientist in Fractal Analytics, contracted to C3.ai

Redwood City, California 01/12/2024 - 04/15/2024

Familiarized myself with a process scheduling optimization (PSO) module based on integer programming

Upgraded the C3 AI platform in different environments (local, Jupyter, Docker, Linux in WSL2, C3 cloud)

Used VMs, Linux in WSL2, Docker, macOS, GitHub, Python, and Node.js

Participated in client engagement for AI projects
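The PSO module itself is proprietary, but the integer-programming idea behind process scheduling can be sketched with a toy example (all names and data here are hypothetical): each 0/1 decision variable assigns a job to a machine, and we minimize the makespan. For a tiny instance, exhaustive search over the binary assignments stands in for a real IP solver:

```python
from itertools import product

def schedule(durations, n_machines):
    """Toy integer program: assign each job to exactly one machine,
    minimizing the makespan (load of the busiest machine).
    Brute force over all 0/1 assignments; real solvers branch and bound."""
    best_assign, best_makespan = None, float("inf")
    for assign in product(range(n_machines), repeat=len(durations)):
        loads = [0.0] * n_machines
        for dur, m in zip(durations, assign):
            loads[m] += dur
        if max(loads) < best_makespan:
            best_makespan, best_assign = max(loads), assign
    return best_assign, best_makespan

# Three jobs on two machines: the optimum isolates the long job.
assign, makespan = schedule([5, 3, 2], 2)
```

A production module would express the same constraints in a MILP solver (e.g. via a modeling library) rather than enumerating assignments.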

ML Architect in ZeroSight

San Jose, California 02/2023 - 12/2023

A POC project to investigate the best ways to go "from video imaging to 3D modeling". Some of the technologies used:

Used the ML packages OpenPose, MediaPipe, YOLOv8, and YOLO-NAS for object detection and pose estimation.

Worked with Dockerfiles, pre-configured images, CMake, Python virtual environments, NVIDIA GPUs, CUDA, Anaconda, VLC, ffmpeg, and Streamlit

Used pretrained models (YOLOv8) and did some training; used CVAT to prepare labels for training.

Performed affine transformations for 3D modeling, using quaternions, 3D transformation matrices, etc.

Used Blender (Python) for crafting 3D models and Unity scripting (C#) for controlling object movements

Did socket programming (UDP and TCP), passing messages between Python and Python (Blender) and between Python and C# (Unity)

Did threading and async coding (C#) to decouple message passing from object control

Investigated data storage for storing and searching large amounts of imaging data, such as the vector database Milvus
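The quaternion-based 3D transforms mentioned above can be sketched in NumPy. This is the standard textbook conversion from a unit quaternion to a rotation matrix, not ZeroSight code:

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix from a quaternion (w, x, y, z); normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# A 90-degree rotation about the z-axis maps the x-axis onto the y-axis.
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
R = quat_to_matrix(q)
v = R @ np.array([1.0, 0.0, 0.0])
```

Composing rotations is then quaternion multiplication (or matrix products), which avoids the gimbal-lock issues of Euler angles when animating object movements.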

Data Scientist in Meta Platforms

Menlo Park, California 12/2021 – 01/2023

Analyzed online ads parameters to enhance ads effectiveness, using Presto SQL, Hive, Python, statistics, and notebooks. Originally focused on the impact of bid caps, the project evolved into a high-dimensional optimization problem, analyzing various parameters collectively and individually. Identified lift, reach, impressions, CTR, and OR across channels, objectives, verticals, and partner conditions. Presented my findings to marketing managers and the DS group.

In marketing campaign result analysis, used Python, R, z-tests, t-tests, p-values, winsorization, and CUPED to statistically analyze campaign results and communicate them to marketing managers through reports.

Did machine learning with PyTorch, Hive, and notebooks to scope target audiences for marketing campaigns: created labels, prepared data, designed and trained models, and identified responsive customers.
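CUPED (Controlled-experiment Using Pre-Experiment Data), mentioned above, reduces metric variance by regressing out a pre-period covariate before testing. A minimal sketch on simulated data (not Meta's implementation; the data and the regression setup are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated experiment: the pre-period metric x predicts the outcome y.
x = rng.normal(10.0, 2.0, 5000)      # pre-experiment covariate
y = x + rng.normal(0.0, 1.0, 5000)   # in-experiment metric

# CUPED adjustment: subtract the part of y explained by x.
theta = np.cov(x, y)[0, 1] / np.var(x)   # OLS slope of y on x
y_cuped = y - theta * (x - x.mean())

# The adjusted metric keeps the same mean but has lower variance,
# so z- and t-tests on it gain power at the same sample size.
```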

Senior Technical Staff in ComResource

Columbus, Ohio 06/2019 – 05/2021

Reported directly to the CTO. I did numerous POCs (proofs of concept) for clients. All projects were in Python.

Did data mining (wrote code to analyze data): used the Apriori algorithm to discover value patterns with 99% accuracy. The code can discover thousands of value patterns in hours, compared with the several months human staff would need.

Did product tier classification (wrote code to analyze data) in a large database for client Cargill.

Did NLP (wrote code to analyze text): extracted patent citations with 97% accuracy from 50,000+ PDF documents from the US Patent Office. I proposed this approach to replace the original plan of manual extraction with outsourced low-cost labor (it would have needed more than 10 people) and won this contract from client Chemical Abstracts.

Independently implemented a data search tool (a GUI tool; 5k lines of Python and PyQt) used to search a large client database.

Independently implemented a graph data visualization tool (a GUI tool; 3k lines of Python and PyQt) used to search a Neo4j graph database with interactive graphs and menu selections.

Built an Azure chatbot as a demo project for the Microsoft local marketing office in Columbus, Ohio.

Used and compared the GCP, AWS, and Azure cloud platforms and wrote a report on usability and learning curves for the major features relevant to the client.

Did data modeling: designed and built relational and graph database schemas, starting from a set of large (in number of data items as well as data volume) unstructured and uncleansed datasets.
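The Apriori algorithm used in the data mining work above grows frequent itemsets level by level, pruning any candidate whose support falls below a threshold. A minimal sketch on toy market baskets (illustrative; not the client code or data):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: level-wise growth of frequent itemsets.
    Returns a dict mapping each frequent itemset to its support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in items if support(s) >= min_support}
    while level:
        frequent.update({s: support(s) for s in level})
        # Join step: merge same-size frequent sets into one-larger candidates,
        # then prune any candidate that misses the support threshold.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=0.5)
```

The real value-pattern work ran over a much larger database; the speedup over manual review comes from the pruning property that any superset of an infrequent itemset is itself infrequent.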

Founder in Code Track

Columbus, Ohio 03/2016 – 03/2019

I created this company to develop the automatic data lineage search software that I designed. Because the funding received was small, I personally developed most of the code (15k lines of Python).

Used TensorFlow/Keras (ML) to build a component of the lineage search engine that captures users' habits with the software.

Designed and implemented an office file sharing program (similar to Slack; 3k lines of Python code).

Met with investors, gave presentations, and wrote business plans, implementation plans, and budget estimates.

Used Python, data mining libraries, TensorFlow, a web crawler, Django, regex, WebSockets, PostgreSQL, the PyQt GUI framework, and Windows and Ubuntu Linux

Python Developer in Signet Accel

Columbus, Ohio 04/15/2015 – 03/2016

This was a small medical software startup aiming to consolidate all medical data into a common schema.

Wrote a versatile data converter in Python. Its inputs were data from different hospitals; the challenge was to correctly identify and categorize text values for the same concepts under different medical codings.

Familiarized myself with the OMOP (Observational Medical Outcomes Partnership) schema and many medical coding sets such as SNOMED, CPT-4, and ICD-9.

Used Python, Lucene, Usagi, csvkit, sql20, PostgreSQL, SQL Server, Windows, and Linux

Developer in JP Morgan Chase Bank

Columbus, Ohio 03/2013 – 04/2015

In the financial compliance department, I worked on modifying compliance code due to regulation changes.

Wrote a Java program to archive data in a third-party storage format to comply with regulatory requirements. This was an 8-month project with 3 managers (source, general, and target) and 4 technical resources to collect and store data. I was the only conversion programmer converting the source data format to the target data format, and the conversion was the only part of the project that required substantial coding effort.

Investigated legacy financial compliance systems, separating and abstracting their main logic components for refactoring of the system. Studied and did a (visual programming) POC.

Traced data flows from COBOL/copybook files in IBM DB2 (Bear Stearns, bought by JPMC in the 2008 financial crisis) to the target Oracle system in an effort to link the two systems.

Traced all historical data linked to a high-profile Wall Street financial scandal.

Did website security analysis and maintenance, such as SSL setup for the Chase website.

Used Java, C++, JUnit, log4j, Oracle, SQL Server, IBM DB2, XML, and SSL setup for HTTPS

Data Architect in the Research Institute at Nationwide Children's Hospital

Columbus, Ohio 11/2011– 03/2013

Redesigned and implemented the data backend of the eTrac system, which had become buggy after many years of continuous modifications. The eTrac system tracks each research project from funding proposal all the way through fund use, covering research funding for 200 medical professors, along with 800 scientists and assistants, from the medical school of Ohio State University.

Used SQL Server, data modeling, eTrac, erwin, OLAP cubes, and star schema

Senior System Analyst in G2O

Columbus, Ohio 5/2006 – 11/2011

Worked on about 30 projects at various client companies and government agencies around Columbus. Most projects involved database and data warehouse design, ETL (SQL or SSIS) implementation, and BI builds.

Did many POC projects and wrote many technical proposals and assessments for client companies.

Used SQL Server, Oracle, PostgreSQL, Pentaho, the Microsoft BI suite, Google Maps, iWay (ETL), and Java

Data Architect in GE Aircraft Engines

Cincinnati, Ohio 01/2005 - 5/2006

Coordinated the integration of several data areas; each area has several dozen databases, and each database has dozens or hundreds of tables. This was done to support the merger of GE Aircraft Engines and GE Locomotives.

Founder in Ruiling Optics

Columbus, Ohio 2003 – 2004

Designed and built an ultra-fast (split-second) flatbed image scanning device, intended to replace the flatbed scanner; applied for and obtained 3 US utility (invention) patents; conducted research on imaging software and hardware; and did many fundraising activities (business plans, executive summaries, emails, presentations, meetings with investors).

Principal Engineer in Profitlogic

Cambridge, Massachusetts 10/2001 – 10/2002

Profitlogic was a software startup doing retail price setting. The JCPenney project ($15M, 6/2001-12/2002) was the biggest pricing project awarded to the company before it was sold to Oracle.

I was in charge of the design and implementation of the entire data backend of this pricing software and led a team of 7 developers. It was a large data lake (several TB) with relational and star schemas, holding data on about 3.5 million active SKUs across 1,020 stores.

Used UNIX Shell Scripting, Toad, Oracle 8i, PL/SQL, Pro*C, and Erwin in the project.

Senior Engineer in Electron Economy

Cupertino, California 5/2000 – 10/2001

Electron Economy was a software startup funded with $86m. I was hired with no fixed role assignment.

Proposed an online, rule-based intelligent business negotiation system two months after I joined the company.

A division of 15 engineers (5 with PhDs and 10 with master's degrees) was formed based on this proposal. I helped organize the team and structure the project.

When Electron Economy was sold to Viewlocity, the rule-based transaction engine was one of the two main software assets Electron Economy had (the other was a web-based online sales transaction engine).

Designed and implemented Java software for the project I proposed, along with other team members.

EDUCATION

The Ohio State University

BS Computational Mathematics (error control in large numerical computational processes)

MS Mathematics

MS Computer and Info (CIS)

PhD in Computer and Info (data mining and fast data retrieval indices)

PhD candidacy in Mathematics in Combinatorics (graph theory, counting structures)

(The certification courses below are 3-5 weeks each)

ML course certifications (21)

Course | Year | Contents | Institute
Getting Started with TensorFlow 2 | 2021 | CNN, RNN, LSTM, Python | Imperial College London
Customizing Your Model with TensorFlow 2 | 2021 | CNN, RNN, LSTM, Python | Imperial College London
Supervised Machine Learning: Regression and Classification | 2023 | Python, TensorFlow | DeepLearning.AI
Unsupervised Learning, Recommenders, Reinforcement Learning | 2023 | Python, TensorFlow | DeepLearning.AI
Advanced Learning Algorithms | 2023 | Complex models, TensorFlow, Python | DeepLearning.AI
Sequence Models | 2023 | RNN, LSTM, GRU, forecasting, Python | DeepLearning.AI
Deep Learning Neural Networks with PyTorch | 2023 | PyTorch, Python | IBM
Sequences, Time Series and Prediction | 2023 | RNN, LSTM, ARIMA, LLM, Python | DeepLearning.AI
Transformer Models and BERT Model | 2023 | Transformer, BERT model, Python | Google Cloud
Google Cloud Big Data and Machine Learning | 2023 | GCP | Google Cloud
How Google Does Machine Learning | 2023 | GCP | Google Cloud
Intro to Vertex AI Studio | 2023 | Vertex AI Studio | Google Cloud
TensorFlow on Google Cloud | 2023 | GCP, TensorFlow, Python | Google Cloud
Feature Engineering | 2023 | GCP, DataPrep, DataFlow, BigQuery | Google Cloud
Launching into Machine Learning | 2023 | GCP, Vertex AI, Python | Google Cloud
Natural Language Processing on Google Cloud | 2023 | GCP, LLM, Python | Google Cloud
MLOps Getting Started | 2023 | GCP, MLOps | Google Cloud
Machine Learning in the Enterprise | 2023 | GCP, MLOps | Google Cloud
Production Machine Learning Systems | 2023 | GCP, MLOps | Google Cloud
Distributed Computing with Spark SQL | 2023 | AWS, Databricks, Spark SQL | UC Davis
Generative AI with Large Language Models | 2024 | Generative AI, Transformer, LLM | DeepLearning.AI

Statistics course certifications (8)

Course | Year | Contents | Institute
Practical Time Series Analysis | 2021 | Statistics-based, ARIMA, R | State Univ of New York
Bayesian Statistics: Techniques and Models | 2021 | Statistical models, MCMC, JAGS, R | UC Santa Cruz
Inferential Statistics | 2023 | Inference, hypothesis tests, R | Duke University
Experimental Design Basics | 2024 | ARIMA, Latin square, block design, R | Arizona State Univ
R Programming | 2024 | R language | Johns Hopkins Univ
Getting Started with SAS Programming | 2024 | SAS language | SAS Institute
Probabilistic Graphical Models 1: Representation | 2024 | Bayesian and Markov networks | Stanford Univ
Probabilistic Graphical Models 2: Inference | 2024 | Bayesian and Markov networks | Stanford Univ

Data Visualization course certifications (4)

Course | Year | Contents | Institute
Creating Dashboards and Storytelling with Tableau | 2023 | Tableau | UC Davis
Visual Analytics with Tableau | 2023 | Tableau | UC Davis
Fundamentals of Visualization with Tableau | 2023 | Tableau | UC Davis
Essential Design Principles for Tableau | 2023 | Tableau | UC Davis

CDMP professional certification

Certified Data Management Professional at the Master Level, with specialties in Data Management and Data Warehousing (2013)

From DAMA (Data Management Association International) and ICCP (Institute for Certification of Computing Professionals)

Short projects and short course certifications (6)

Course | Year | Contents | Institute
Image Compression and Generation using Variational Autoencoders | 2021 | - | Coursera
Decision Tree and Random Forest Classification using Julia | 2023 | - | Coursera
ML with ChatGPT, Image Classification | 2023 | ChatGPT-assisted ML image classification | Coursera
Cypher Fundamentals | 2024 | Cypher language in the Neo4j database | Neo4j GraphAcademy
Neo4j Fundamentals | 2024 | Neo4j database | Neo4j GraphAcademy
Graph Data Modeling Fundamentals | 2024 | Data modeling in the Neo4j database | Neo4j GraphAcademy


