
Data Scientist Analysis

Location:
Columbus, OH
Salary:
150000
Posted:
April 01, 2024


Resume:

Dr. Yuping Yang

Email: ad4pz9@r.postjobfree.com

Web link: https://www.linkedin.com/in/yuping-yang

Phone: 614-***-****

Summary: Over 20 years in software application development, data modeling, data analysis, and advanced analytics using machine learning, data mining, and statistics. Skilled at prototyping and POC projects; a fast learner and an enthusiastic explorer of new technologies.

EXPERIENCE

Fractal.ai

Senior Data Scientist (contractor at C3.ai)

Redwood City, California

01/12/2024-present

• System setup for working with the C3 AI platform

• Systems include VMs, Docker, Linux, Windows, macOS, GitHub, Python, Node.js, and C3 cloud AI platforms

• Upgrading C3 AI platforms

• Participate in client engagements for AI projects

ZeroSight

ML Architect

San Jose, California

02/2023 – 12/2023

• Because of the stealth nature of the project, I do not disclose its goal and scope. I function as a foundation engineer, probing the design and technical viability and constructing a POC of a complete AI-based software product. Some of the technical skills used in this project:

• Investigated various installations of OpenPose, MediaPipe, YOLOv8, and YOLO-NAS under a variety of environments: Linux, Windows, Docker images and containers, Dockerfiles, pre-configured images, CMake, Python virtual environments, NVIDIA GPU, CUDA, Anaconda, VLC, FFmpeg, Streamlit

• Used vision ML packages for object detection and pose estimation: YOLOv8, YOLO-NAS, OpenPose, MediaPipe, PyTorch.

• Used the vision package OpenCV

• Used pretrained models and performed some model training. Used the annotation tool CVAT to prepare labels for training.

• Performed affine transformations for 3D modeling: translation, rotation, quaternions, 3D transformation matrices, etc.
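
As an illustration of the rotate-then-translate transform named above (a minimal sketch, not the project's code; the function names are hypothetical):

```python
import math

def quat_rotate(q, p):
    """Rotate 3D point p by unit quaternion q = (w, x, y, z),
    using the expanded rotation matrix of q * p * q_conjugate."""
    w, x, y, z = q
    px, py, pz = p
    rx = (1 - 2*(y*y + z*z)) * px + 2*(x*y - w*z) * py + 2*(x*z + w*y) * pz
    ry = 2*(x*y + w*z) * px + (1 - 2*(x*x + z*z)) * py + 2*(y*z - w*x) * pz
    rz = 2*(x*z - w*y) * px + 2*(y*z + w*x) * py + (1 - 2*(x*x + y*y)) * pz
    return (rx, ry, rz)

def affine_transform(p, q, t):
    """Apply rotation (quaternion q) then translation t to point p."""
    rx, ry, rz = quat_rotate(q, p)
    return (rx + t[0], ry + t[1], rz + t[2])

# A rotation of angle theta about axis z is the quaternion
# (cos(theta/2), 0, 0, sin(theta/2)); e.g. 90 degrees about z:
half = math.radians(90) / 2
q90z = (math.cos(half), 0.0, 0.0, math.sin(half))
```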

• Used Blender to craft 3D objects and Blender scripting (Python) for object movement control.

• Used Unity and Unity scripting (C#)

• Did socket programming for messaging over both UDP and TCP sockets: messages between (1) Python and Python (in Blender) and (2) Python and C# (in Unity)
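
The Python side of such UDP messaging can be sketched as follows (illustrative only; the function names and message format are hypothetical, not the project's protocol):

```python
import socket

def send_pose(host, port, pose_msg):
    """Send a pose message (e.g. joint coordinates) as one UDP datagram.
    The C# (Unity) peer would read the same bytes with a UdpClient."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(pose_msg.encode("utf-8"), (host, port))

def recv_pose(sock, bufsize=4096):
    """Receive one UDP datagram and decode it back to a string."""
    data, _addr = sock.recvfrom(bufsize)
    return data.decode("utf-8")
```

UDP suits high-rate pose streams because a dropped frame is simply superseded by the next one; TCP would add ordering and retransmission latency.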

• Thread and async coding (C#) to decouple messaging from object control

• Investigated data storage for storing and searching large amounts of image data

Meta Platforms

Data Scientist

Menlo Park, California

12/2021 – 01/2023

• Implemented marketing research and campaigns to enhance advertisement effectiveness, resulting in improved engagement and conversion rates for clients.

• Conducted analysis to optimize online marketing parameters. Used Presto SQL, Hive, Python plotting libraries, and notebooks. Originally focused on the bid cap's impact, the project evolved into a high-dimensional optimization problem, analyzing various parameters collectively and individually. Presented findings through tabular representations and 3D graph plots.

• Conducted high-level data analysis using Presto SQL, Hive, statistics, and notebooks, leading to the optimization of online marketing parameters. Identified lift, reach, impressions, CTR, and OR across channels, objectives, verticals, and partner conditions. Used Python/R, z-tests, t-tests, p-values, winsorization, and CUPED to provide statistical significance and confidence levels for marketing managers.

• Did machine learning with PyTorch, Hive, and notebooks to define target audiences for marketing campaigns. Used campaign design data to create labels, prepared data, designed and trained models, and identified responsive customers.

ComResource

Senior Technical Staff

Columbus, Ohio

06/2019 – 05/2021

• Reported directly to the company's CTO. Most of my projects arose when the company's clients needed either a POC built or a technical solution found and designed.

• Did a data mining project: used the apriori algorithm to discover 1,000+ valuable patterns with 99% accuracy (Python).
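
As a sketch of the apriori idea referenced above (illustrative only, not the client project's code), a self-contained level-wise frequent-itemset miner:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return frequent itemsets (frozenset -> support) at or above
    min_support, using apriori's level-wise candidate generation:
    a (k+1)-itemset can be frequent only if all its k-subsets are."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = {i for t in sets for i in t}
    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items
               if sum(i in t for t in sets) / n >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = sum(c <= t for t in sets) / n
        # Join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate with an infrequent k-subset.
        union = {i for c in current for i in c}
        candidates = {frozenset(c) for c in combinations(sorted(union), k + 1)
                      if all(frozenset(s) in current for s in combinations(c, k))}
        current = {c for c in candidates
                   if sum(c <= t for t in sets) / n >= min_support}
        k += 1
    return frequent
```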

• Built automated product tier classification for the client Cargill (Python).

• Did NLP: automatically extracted patent citation info with 97% accuracy from 50,000+ PDF documents issued by the US Patent Office. I proposed this automatic extraction program to win the contract from Chemical Abstracts (the initial proposal was to extract the info manually, which would have required more than 10 people at a cost too high to be accepted).

• Independently built a custom data search tool (a simplified version of my data lineage tool with many client-specific additions; 5k lines of Python with a PyQt GUI).

• Independently built a custom graph data visualization tool (3k lines of Python with a PyQt GUI, plus d3.js). Users search a neo4j graph database via mouse clicks on visually displayed graphs and menu selections.

• Built an Azure chatbot for the Microsoft Local Marketing Office in Columbus, Ohio.

• Used and compared the GCP, AWS, and Azure clouds and wrote a report on usability and learning curves for the major features relevant to the client.

• Did data modeling: designed and built relational and graph database schemas, starting from a large amount of unstructured and uncleansed data.

• Legacy SQL program optimization: performed quality checks and code improvements on several large legacy SQL programs (several thousand lines of SQL each)

Code Track

Founder

Columbus, Ohio

03/2016 – 03/2019

• I established this company to develop the automatic data lineage search software that I designed. Because funding was limited, I personally developed most of the code (15k lines of Python). I also did much of the fundraising.

• Used TensorFlow/Keras (ML) to build a component of the lineage search engine.

• Designed and implemented an office file-sharing program (3k lines of Python).

• Meetings with investors, presentations, business plans, implementation plans, budget estimates, etc.

• Tech Environment: Python, Data Mining Libraries, Tensorflow, Web crawler, Django, regex, web sockets, PostgreSQL, MySQL, SQL Server, PyQt GUI, Windows and Ubuntu

Signet Accel

Python Developer

Columbus, Ohio

04/15/2015– 03/2016

• Wrote a versatile data converter in Python as part of a universal medical storage platform that stores data from different hospitals. The challenge was to correctly categorize and identify the same concepts appearing under different codes, names, and descriptions across medical data repositories based on different medical coding systems.

• Familiarized myself with the OMOP (Observational Medical Outcomes Partnership) schema, a standard medical data model, and many medical code sets such as SNOMED, CPT-4, and ICD-9.

• Tech Environment: Python, Lucene, Usagi, csvkit, SQL20, PostgreSQL, SQL Server, Windows, Linux

JP Morgan Chase Bank

Developer

Columbus, Ohio

03/2013 – 04/2015

• The work was done in the financial compliance department. I worked on modifying compliance code in response to regulation changes.

• Coded (Java) data archiving to a third party due to regulatory requirements (an 8-month project with 3 managers covering source, general, and target data, 3 tech resources at the source to collect data, and 1 tech resource at the target to store data). I was the only person converting the JPMC data format to the third party's format; data conversion was the only part of the project requiring substantial coding effort.

• Investigated a legacy financial compliance system and separated and abstracted its main logic components for refactoring; the refactored software architecture was used in the design of a new compliance system (intended to be built with Actimize). Studied Actimize and did an Actimize POC.

• Traced data flows from COBOL/copybook files in IBM DB2 (from Bear Stearns, bought by JPMC in the 2008 financial crisis) to an Oracle system, in preparation for a project to link these two systems from what were originally two different companies.

• Traced all historical data and transactions related to a high-profile Wall Street financial scandal.

• Performed website security analysis and maintenance, such as SSL setup for the Chase website.

• Tech Environment: Java, C++, JUnit, Log4j, Oracle, SQL Server, IBM DB2, XML, SSL setup for HTTPS

Research Institute at Nationwide Children's Hospital

Data Architect

Columbus, Ohio

11/2011– 03/2013

• Redesigned and implemented the data backend of the Etrac system, which had become buggy after many years of continuous modification. Etrac tracks everything from the funding proposal through every step of the approval process to fund use at every step of the funded project. The system tracks research funding for 200 medical professors, along with 800 scientists and assistants, from the medical school of Ohio State University.

• Tech Environment: SQL Server, Data Modeling, Etrac, Erwin, SQL, OLAP Cubes, Star DB Schema

g2o

Senior System Analyst

Columbus, Ohio

5/2006 – 11/2011

• Worked on about 30 projects at various client companies and government agencies around Columbus. Most projects involved database and data warehouse design, ETL (SQL or SSIS) implementation, and BI builds.

• Did many POCs and wrote many technical proposals and assessments for client companies.

• Tech Environment: SQL Server, Oracle, PostgreSQL, Pentaho, Microsoft BI suite, Google Maps, iWay (ETL), Java

GE Aircraft Engines

Data Architect

Cincinnati, Ohio

01/2005 - 5/2006

• Coordinated the integration of several data areas during the merge of GE Aircraft Engines and GE Locomotives; each area had several dozen databases, and each database had dozens or hundreds of tables

Ruiling Optics

Founder

Columbus, Ohio

2003 – 2004

• Designed an ultra-fast (split-second) flatbed image scanning device, applied for and obtained 3 US utility (invention) patents, conducted research on imaging software and hardware, and carried out many fundraising activities (business plans, executive summaries, emails, presentations, meetings with investors, etc.)

Profitlogic

Principal Engineer

Cambridge, Massachusetts

10/2001 – 10/2002

• Profitlogic was a software startup doing retail price forecasting. The JC Penney project ($15M, 6/2001 – 12/2002) was the biggest pricing project the company completed before it was sold to Oracle.

• I designed a large data lake (several TB) whose core contains both a relational schema and a star schema. Leading a team of 7 developers and collaborating with JCP IT, I built the database, which is the data portion of the pricing computation engine for this $15M project.

• Led a team of seven engineers in building a large data warehouse for JC Penney's pricing optimization system, handling 3.5 million active SKUs across 1,020 stores. The project was implemented on an Oracle database with an initial data load of around one TB. Utilized UNIX shell scripting, Toad, Oracle 8i, PL/SQL, Pro*C, and Erwin.

Electron Economy

Senior Engineer

Cupertino, California

5/2000 – 10/2001

• Electron Economy was a startup funded with $86M. I was hired with no role assignment; my role was fixed two months after I joined, when I proposed an automatic, modularized, online, rule-based intelligent business transaction (negotiation) system. A division of 15 engineers (5 with PhDs and 10 with master's degrees) was formed based on this proposal; I helped form the team and structure the project. When the company was sold to Viewlocity, the rule-based transaction engine was one of its two main software assets (the other was a web-based online sales transaction engine).

• Designed and implemented Java software for the project I proposed.

TECHNICAL SKILLS

• Languages: Python (GUI, TensorFlow, PyTorch), R, machine learning (vision, marketing, NLP), SQL (Hive, Presto, Oracle, MSSQL, pgSQL), Java (Spring, JUnit, JDBC, Ant, Maven), C, C++, JavaScript (Node.js, d3.js), awk, Perl

• Data Modeling: relational, star schema, data warehouse, data mart, neo4j, SQL graph

• Custom GUI development: X/Motif/C, PyQt

• Visualization: Tableau, matplotlib, d3.js, PyQt, seaborn

• Science: deep learning, descriptive and Bayesian statistics, graphical models, MCMC, JAGS, time series, data mining

• Platforms: Windows, Unix/Linux (Ubuntu), Oracle, Presto, Hive, SQL Server, PostgreSQL, MySQL, Jira, Colab, Bento, Jupyter, WSL2, Docker, Git, NVIDIA CUDA

• Misc.: SSL/web security, regex, Jira, Pro*C, OCR, Arctic (time series), XML, JSON, GitHub, Docker

EDUCATION

The Ohio State University

• MS Mathematics

• MS Computer and Information Science (CIS)

• PhD in CIS (data mining (knowledge discovery from database) and fast data retrieval indices)

• PhD candidacy in the mathematics PhD program in combinatorics (graph theory, counting structures, etc.)

COURSE CERTIFICATIONS

Getting Started with Tensorflow 2 2021

Customizing your model with Tensorflow 2 2021

Practical Time Series Analysis 2021

Bayesian Statistics: Techniques and Models 2021

Supervised Machine Learning: Regression and Classification 2023

Unsupervised Learning, Recommenders, Reinforcement Learning 2023

Advanced Learning Algorithms 2023

Sequence Models 2023

Deep Learning Neural Network with PyTorch 2023

Sequences, Time Series and Predictions 2023

Inferential Statistics 2023

Transformer Models and BERT Model 2023

Google Cloud Big Data and Machine Learning 2023

How Google Does Machine Learning 2023

Tensorflow on Google Cloud 2023

Feature Engineering 2023

Launching into Machine Learning 2023

Natural Language Processing on Google Cloud 2023

MLOps Started 2023

Machine Learning in the Enterprise 2023

Production Machine Learning Systems 2023

Creating Dashboards and Storytelling with Tableau 2023

Visual Analysis with Tableau 2023

Fundamentals of Visualization Tableau 2023

Essential Design Principles for Tableau 2023

Distributed Computing with Spark SQL 2023

Machine Learning with ChatGPT: Image Classification 2023

Generative AI with Large Language Models 2024

Probabilistic Graphical Models 1: Representation 2024 (Bayesian and Markov Networks)

Probabilistic Graphical Models 2: Inference (in progress)

PROFESSIONAL CERTIFICATION

CDMP (Certified Data Management Professional at the Master Level, with specialties in Data Management and Data Warehousing), 2013, from DAMA (Data Management Association International) and ICCP (Institute for Certification of Computing Professionals)


