Data Science Information Systems

Location:

La Habra, CA

Posted:

September 05, 2025


Resume:

Dony Ang, PhD

*** ****** *****,

La Habra, CA

Email : ********@*****.***

Mobile : 714-***-****

Career Highlights:

• Managed, architected, led and implemented 4 large, complex Big Data infrastructures, each in excess of 150 Terabytes, at top Internet companies, supporting 1,500 to 5,000 internal users.

• Led architects, data engineers and data scientists (I was Chief Data Architect & Lead Scientist at Google, EA, Yahoo, OpenX and, most recently, Alignment Healthcare).

• Deep industry experience in ad tech, gaming and, more recently, healthcare.

• Founding member of one of the most successful startups in So Cal - Jamdat Mobile Inc.

• Principal Investigator of multiple peer-reviewed publications in tier-1 journals (MDPI) - see publication section.

• Deep understanding of large-scale NoSQL / Big Data platforms, from Hadoop (Cloudera, MapR, HBase, Cassandra, Spark) all the way to MPP databases (Snowflake, Netezza, Teradata (Certified), Redshift, Vertica, etc.).

• Expert knowledge of popular machine learning tools such as Scikit-Learn, Weka, MLlib, NLTK, Vowpal Wabbit, H2O.ai, OpenNLP and Stanford NER, Deep Learning frameworks such as PyTorch, and optimization engines such as Gurobi 8.X, CVXOPT and Google OR-Tools.

• Inventor of USPTO patent US20180210883A1: System for converting natural language questions into sql-semantic queries based on a dimensional model, which has been cited more than 26 times by Amazon, Salesforce, Baidu, Tableau, IBM, Fujifilm and others.

• PhD in Computational Data Science from Chapman University (research interests in Generative AI, Drug Discovery / Computational Biology) and MS in Computer Science from the University of Southern California.

EXPERIENCE:

Accenture 3/2024 – PRESENT

Computational Science Manager, Center of Advanced Artificial Intelligence

Center of Advanced AI is the department within Accenture (a publicly traded company listed on NYSE: ACN) responsible for products and services that require a deep understanding of AI in general and Generative AI specifically.

● Own an AI Agent (also known as an AI accelerator) for LLM-based language translation.

● Work as GenAI engineer/manager for a Life Science client, implementing an automatic document-generation platform for compliance reports.

● AI Tech Lead for a client project at AbbVie Inc, spearheading the document-generation platform for their internal compliance reports (PSUR, NDA and CSR): built the platform from the ground up until it was productionized last year, and it now automates the generation of these compliance reports across their products. Responsibilities range from architecting their AI platform to fine-tuning their prompt engineering and AI agents.

Alignment Healthcare 4/2019 – 1/2024

Principal Data Scientist / Manager of Data Science

Alignment Healthcare is one of the growing healthcare startups in Orange County and went public in Mar 2021 (symbol: ALHC). The company provides healthcare insurance for seniors under the CMS Medicare Advantage Part A/D umbrella. My roles and responsibilities were:

● Designed and built an in-patient hospital admissions model (in production), using XGBoost, that predicts the likelihood of our members being admitted to hospital given their conditions, prescriptions (Rx) and other data points. The work entailed building a highly scalable training and prediction pipeline called Blueprint, using Pandas / Spark as the underlying dataframe technologies.

● Designed and built AI-HCC prediction models (in production) that alert our clinicians to members' potential conditions. The models were built with XGBoost as multi-class, multi-output models and are responsible for a 25-50% revenue lift compared with our internal heuristic rule-based engine.

● Built a ranking system that identifies members 'at risk' of being dissatisfied with our products and services, based on the CAHPS surveys. The solution combines 'weighted' Collaborative Filtering and Random Forest.

● Architected, designed and built an NLP engine that extracts relevant information from incoming faxes (known as facesheets, from hospitals, IPAs, etc.) using custom Named Entity Recognition (NER) based on both CRF and Deep Learning (Keras/spaCy), plus a document-classification engine that re-routes incoming faxes into the appropriate workflows using a PyTorch Convolutional Neural Network (CNN). Also designed and implemented a heuristic and similarity-metric model to perform complex entity resolution, matching information extracted from these faxes against our internal Data Warehouse records.

● Researched and built a Part B & D drug recommendation engine using item-to-item Collaborative Filtering as well as Non-Negative Matrix Factorization.

● Researched and built a solution that recommends relevant medical codes based on ICD-10s found in clinical documents, using a Large Language Model (Llama) with PEFT / QLoRA.

● Architected and built an enterprise claims fraud detection system from the ground up to detect fraudulent claims from different providers, using methods ranging from a Siamese Deep Learning model and Z-scoring (scikit-learn) to binary classifiers built with XGBoost and graph-based algorithms such as Graph KNN and Node Embedding. The solution has been used as our primary FWA (Fraud, Waste and Abuse) strategy and has helped save / recover more than $5-6M annually.

● Built a POC Deep Learning model that detects Covid-19 (SARS-CoV-2)-induced pneumonia by applying a hybrid of computer vision (CNN + semantic segmentation using U-Net) and a feed-forward neural network to patients' chest X-rays and clinical data, using PyTorch.

● Trained the Data Science team to use a causal inference approach for some of the key initiatives within the organization and became the subject-matter expert for Causal Inference.

Nativo Inc 1/2017 – 2/2019

Director of Big Data - Science

Nativo is the leading adtech company for native advertising based in El Segundo.

● Managed and built 2 data teams, Big Data engineering and data science, from the ground up; responsible for building data products used as key technology differentiators to give Nativo a competitive advantage.

● Invented and architected one of the key technologies, called SSP (Server-Side Platform), which is responsible for a 5x revenue lift.

● Led and architected the new near-real-time Big Data architecture to support exponential growth of incoming data streams, using Lambda architecture and leveraging Spark, Snowflake and S3.

● Architected and led multiple key initiatives, such as the user-interest targeting system called Nativo Persona Graph and Look-Alike Modelling, using multiple tech stacks such as Spark and H2O.ai / Vowpal Wabbit.

● Led and architected the High-Performance Yield Optimization Engine, responsible for maximizing and optimizing yields for both publishers and advertisers. The Yield Optimization Engine acts as 'the brain' of the company-wide ad-serving technology, optimizing hundreds of millions of inventory items across millions of permutations of budget and targeting data (geo targeting, device, audience, etc.).

● Led many other key projects such as Inventory Forecast, Dynamic Floor Price, Creative Optimization and Predictive Programmatic-Bidder Selection.

textPlus Inc, Marina Del Rey, CA 4/2014 – 4/2016

VP of Big Data & Data Science

One of the leading messaging apps, used by close to 100M users worldwide.

● Architected and implemented key technology innovations that improve the app's virality, leveraging technologies such as a graph database to power a People Recommendation Engine.

● Architected and implemented a scalable and effective spam-detection system that uses supervised algorithms to detect spammers / fraudsters, built with scikit-learn / Spark MLlib.

● Architected end-to-end Big Data infrastructure from the ground up, from data collection all the way to a near-real-time BI tool (Tableau) on the Cloudera platform and real-time streaming with Elasticsearch / Kibana, using Lambda architecture with Kafka as the messaging gateway and Spark Streaming for stream processing.

● Architected and built a Supply-Side Platform (SSP) to do yield optimization for our app.

● Managed a team of Data Architects, Data Scientists, Data Engineers, BI Engineers and Data QA.

● Responsible for maintaining the company-wide Data Dictionary and Reporting / Dashboarding.

OpenX Inc, Pasadena, CA 10/2012 – 4/2014

Distinguished Big Data Architect & Data Scientist

The leading and fastest growing ad exchange company ( ranked #5 nationwide based on Forbes ).

● Built, designed and architected end-to-end scalable Data Intelligence infrastructure on Hadoop (using Cloudera CDH4) and Vertica MPP. The data platform ranges in scale from 100 TB to 4-5 PB. Acted as chief data architect, building all these platforms from the ground up until they were up and running and used throughout the organization.

● Built and architected the Business Intelligence platform using Vertica and Microstrategy 9.4.1. Reports go to both external customers and internal teams, ranging from finance, sales and demand partners all the way to the operational team.

● Built and architected scalable Audience Segmentation technology using Machine Learning and Natural Language Processing. This technology provides DSPs with audience data such as user interests and demographic and geographic information for hyper-targeting their campaigns, which in turn helps increase their ROI by matching the right campaigns and brands with the right audiences.

Electronic Arts, Playa Vista, CA 8/2006 – 6/2008

Director of Big Data / Chief Big Data Architect

● Led Enterprise Data Warehouse, Business Intelligence and Data Mining projects and initiatives, ranging from the Enterprise Data Warehouse and Business Intelligence project to advanced analytic solutions such as a Recommendation Engine (Association and Collaborative Filtering), carrier-level Sales Forecasting (Microsoft Time-Series) and Customer Segmentation using a clustering algorithm.

● Provided vision, direction and solutions for the Database Administration, Business Intelligence and Data Mining teams.

● Built database architectures / models for both OLTP and EDW (Enterprise Data Warehouse), spanning multiple areas from schemas for the store-front, our internal MDM solution (a.k.a. JMSS) and networked/multiplayer games to the EDW schema (Kimball dimensional model), hosted on multiple RDBMS platforms from Oracle RAC, MySQL and SQL Server 2005 to Teradata V2.6.1.

Yahoo Inc, CA 7/2005 – 8/2006

Senior Big Data Architect / Manager

● Architected the well-known Panama Enterprise Data Warehouse (Yahoo! Search Marketing Data Warehouse) as our corporate Business Intelligence / Reporting platform, supporting more than 1,500 internal users ranging from Yahoo! executives, Sales Operations, Customer Support and internal Finance / Sales analysts to the Business Planning department. This architecture is as large as 150 Terabytes (one of the largest BI / DW architectures, using a hybrid of centralized DW and hub-and-spoke architecture).

● This platform is used as the foundation for corporate connected Data Marts, ranging from Conversion, Search Traffic (Revenue Tracking) and Listing Management to Global Content Quality and Advertiser Behavior, which are hosted on low-cost community-server solutions utilizing clustering technology (one of the largest Oracle clustering implementations) for scalability and cost-effectiveness.

● Established Corporate Yahoo! Conformed Dimensions as building blocks to enable cross-analysis across different marts / departments.

● Formulated all business and user requirements by translating them into a Fact Bus Architecture and implemented them in the corporate Data Warehouse schema using Dimensional Modelling with all its detailed treatments, such as Dimensions with SCD Type 1/2, Aggregates and Partitioning Strategy, and Retention Policy.

● Designed and Implemented Panama Dimension Manager UI using Oracle Application Express / Oracle HTMLDB. This UI is a central console for managing all Conformed Dimensions.

● Implemented Text Categorization using machine learning / text mining to provide extensive Term / Term-Category analysis.

Google Inc 1/2005 – 7/2005

Data Architect / Engineer

● Architected the 1st Google Data Warehouse to pull both AdWords and AdSense metadata and advertising log files spread across hundreds or even thousands of MySQL shared architectures (AdWords and AdSense are the two primary money-making engines for Google) and transformed them into Dimensional Models for analytical reporting purposes (Microstrategy is used as the BI tool).

● Built out the AdWords and AdSense BI dashboards / reporting for various departments' reporting needs.

● Established Corporate Google Conformed Dimensions as building blocks to enable cross-analysis across different marts / departments.

Jamdat Mobile Inc, Los Angeles, CA 3/2001 – 1/2005

Big Data Architect / Founding Member

JAMDAT is the most successful wireless entertainment company, which went public in 2004 (NASDAQ SYMBOL: JMDT) and was acquired by Electronic Arts at the end of 2005 for $680M. The success of the company can be largely attributed to the analytic platform (Big Data platform) that we built internally to provide timely and accurate information to our decision makers.

● Architect of the Big Data platform, built from the ground up using Star Schema dimensional modeling and following the software development life cycle (SDLC). This Enterprise Data Warehouse contains all the critical data used to analyze different aspects of our company and help executives make business decisions. These aspects include, but are not limited to, game sales from all the different wireless carriers, game traffic, marketing campaigns / promotions, financials, deployment and our game studio. This information is published to Jamdat executives / business users using OLAP tools (Microstrategy 7i / 8). This Enterprise DW system has been in production for 4 years now and has become a critical component for executive-level strategic decisions.

● Developed a mechanism to mine all the raw data from different wireless carriers to feed the Enterprise Data Warehouse system. Worked with different feed formats to make sure they are propagated properly into the data warehouse schema.

● Defined ETL (Extraction, Transformation and Loading) flows to capture the source data for the Enterprise DW and wrote all ETL scripts for these information flows, using a trickle-feed technique (by means of Oracle Advanced Queuing) for the real-time DW, Oracle ETL (external tables and PL/SQL), or Microsoft DTS for the monthly vendor XML feed via FTP call.

● Designed, developed and implemented all enterprise reporting based on the Enterprise Data Warehouse system, used by various departments in the company from sales, marketing and finance to deployment and the game studio. These reports are written using the OLAP tool Microstrategy. They include an Executive Dashboard showing overall Jamdat performance across Studio, Deployment, Application Submission, Sales, Marketing and Finance in a single interface, which is drillable for more detailed information. Another set of reports shows the traffic of multiplayer games, including the award-winning “Bejeweled Multiplayer”.

● Designed and Mapped all the Relational Dimensional Modeling into OLAP Architecture using Microstrategy Architect.

● Set up, configured and administered the OLAP solution using Microsoft Analysis Services (MSOLAP) and MicroStrategy 7i/8 Web/Desktop.

● Tuned all aspects of the databases (application, database, operating system) for optimal performance, especially those related to the Enterprise Data Warehouse, where quick response time is crucial when querying millions of rows (close to 250 GB), using Oracle Partitioning, Materialized Views, Parallel Query and Bitmap Indexes.

● Data modeling / data dictionary documentations.

EDUCATION:

● Graduated in 1995 from Sekolah Tinggi Teknik Surabaya – Indonesia with Bachelor of Science in Computer Science

● 2009 – 2011 University of Southern California MS in Computer Science ( specializing in Machine Learning and Natural Language Processing / NLP )

● 2020 – 2024 Chapman University PhD in Computational Data Science.

Research interests : Generative AI, AI-driven drug discovery, Computational Biology

PUBLICATIONS (PEER-REVIEWED)

● Ang, D.; Kendall, R.; Atamian, H.S. Virtual and In Vitro Screening of Natural Products Identifies Indole and Benzene Derivatives as Inhibitors of SARS-CoV-2 Main Protease (Mpro). Biology 2023, 12, 519. https://doi.org/10.3390/biology12040519

● Ang, D.; Rakovski, C.; Atamian, H.S. De Novo Drug Design Using Transformer-Based Machine Translation and Reinforcement Learning of an Adaptive Monte Carlo Tree Search. Pharmaceuticals 2024, 17, 161. https://doi.org/10.3390/ph17020161

REFERENCES:

Available upon request
