WEIMIN SUN
** ******** ******, **********, **, **039
Telephone: 973-***-****(Cell)
Email: *******@*******.***
EXECUTIVE SUMMARY
A unique mix of skills in data science, data engineering and data architecture
Working experience in big data technology (Hadoop ecosystem) and big data analytics
20 years of experience in the financial industry
Strong business analysis (BA) and project management (PM) skills
Advanced education in Statistics (Ph.D./M.Sc.), Finance (MBA) and Applied Mathematics (B.Sc.) with seven published papers and two books on blockchain technology
PROFESSIONAL EXPERIENCE
Director, BNP Paribas, New Jersey, USA, Q4 2016 – present
Head of Data Services
I lead a team that acquires data to build and maintain a data repository supporting regulatory reporting, compliance surveillance and analytics & machine learning. I am also actively engaged in data science work.
More specifically,
Lead the effort to acquire data from more than 50 sources to build a data warehouse based on OFSSA, as well as a POC of a data repository serving data science activities. I am hands-on in architectural design, data modeling and BA work.
Used financial news to predict stock price movements by applying NLP and machine learning algorithms such as logistic regression, decision trees and SVM (a sketch of this type of pipeline follows this list).
Used news coverage of blockchain technology to generate investment leads. For feature extraction, I coined the term ‘media coverage index’, a useful indicator of the future value of a blockchain startup. Built a POC web tool, driven by ML algorithms, for generating leads on blockchain startup investments.
Performed sentiment analysis to predict whether an article’s view of a blockchain startup is positive or negative, using Python’s text-processing and ML classification libraries.
Co-authored two blockchain books: Blockchain Quick Start Guide (published in December 2018) and Blockchain Security Token and Stablecoin Quick Start Guide (to be published in Q2 2019).
Hands-on architectural design and data modeling for a repository hosting employee and compliance data, built on the MapR big data platform.
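The sketch below is illustrative only (not the production BNP Paribas system): a minimal news-text classification pipeline of the kind referenced above, using TF-IDF features and logistic regression in scikit-learn. The input file, column names and labels are assumptions.

# Minimal sketch of a news-text classification pipeline (TF-IDF + logistic regression).
# The input file, column names and labels are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

news = pd.read_csv("labeled_news.csv")   # hypothetical file with 'headline' and 'label' columns
X_train, X_test, y_train, y_test = train_test_split(
    news["headline"], news["label"], test_size=0.2, random_state=42)

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),   # a decision tree or SVM could be swapped in here
])
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))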
Executive Director, Data Scientist, JP Morgan, New York City, NY, USA, March 2014 – Q4 2016
Head of Wholesale Data Insights Team within JP Morgan Intelligent Solutions
I led a team covering Wholesale data onboarding, reporting and data science activities. Tools included machine learning algorithms, Python, R, SQL, Greenplum, the Cloudera Hadoop platform, Impala/Hive, QlikView and Tableau. In addition, I closely followed the development of cutting-edge big data technology and analytics, regularly interacting with vendors, attending conferences and taking training.
More specifically,
For JPMC Global Wealth Management and Private Bank, I worked on POC efforts to generate new trading ideas and cross-sell/up-sell leads. Work included building a consolidated data repository, a data-driven recommendation engine and an analytics dashboard based on machine learning techniques such as K-S testing, discrete data analysis and clustering. The solution was built on big data technology (Hadoop HDFS, Impala, Spark, etc.).
Involved in early-stage data insights work supporting the Global Investment Management Analytics Group, including profiling retirement spending. SQL and R were the coding languages.
Led a POC project in which a topological data analysis tool (Ayasdi) was used to gain lift on auto loan default rate prediction. The work involved heavy Python and R coding as well as machine learning techniques (e.g. the K-S test, PCA and TDA for data reduction, and clustering analysis).
Worked on a POC project in which text mining (NLP) was used to reduce and summarize the contents of employee surveys, using Python and clustering analysis (a sketch of this approach follows this list).
Worked on a POC project in which JPMC client wallet data was combined with Nielsen TV viewer information to support precision marketing. I was involved in statistical analysis and coding in SQL and Python.
Led the data workstream for building a consolidated platform to support risk parameter analytics on JPMC’s entire Wholesale credit risk exposure book. Work included analyzing multiple data sources, modeling data, defining ETL logic and testing the results (e.g. SIT/UAT). The platform was based on big data technology (Hadoop, etc.).
Provided SME support to a project aimed at enhancing Net Present Value calculations for the Fund Services group, using machine learning algorithms.
Worked on data analysis supporting the development of a Multiple Factor Optimization engine, which provided optimized solutions for collateral management services.
Led the effort to develop a big data analytics dashboard used for multiple use cases such as program management deck generation. QlikView and Hive/Impala were used to build the dashboard.
Led the effort to acquire Wholesale data for building a data repository used for data science projects.
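The sketch below illustrates the kind of text-mining approach used for the employee-survey POC described above: cluster free-text responses with TF-IDF and K-Means, then summarize each cluster by its top terms. The sample responses and cluster count are assumptions.

# Minimal sketch of survey-text reduction via clustering (TF-IDF + K-Means).
# The sample responses and number of clusters are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "More training opportunities would help",
    "Better communication from management",
    "I would like more flexible working hours",
    # ... thousands of free-text survey answers in practice
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(responses)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Summarize each cluster by its highest-weight terms.
terms = vectorizer.get_feature_names_out()
for i, center in enumerate(kmeans.cluster_centers_):
    top_terms = [terms[j] for j in center.argsort()[::-1][:5]]
    print(f"Cluster {i}:", ", ".join(top_terms))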
Director, Bank of America Merrill Lynch, Jersey City, NJ, USA, March 2013 – March 2014
Data Architect, Global Risk Management Reporting Technology
I led the effort to define the architecture and data models for both the interim and strategic data solutions. The work supported Global Liquidity Risk Management (GLRM) reporting, in particular the Basel III Liquidity Coverage Ratio (LCR) reports. I supervised the end-to-end (E2E) delivery of these solutions. Tools included SAS, SQL and Oracle for data processing and analysis.
More specifically,
I designed a hybrid solution for sourcing data from both the legacy system (LRS 1.0) and the strategic solution (the Atlas Data Warehouse). The data was loaded to a strategic GLRM reporting platform, LIBRA, to satisfy EMEA’s Basel III LCR reporting needs. I worked closely with the GLRM business unit, the Data Management Group, the BU PM Group and technology groups, and led a virtual team to complete the E2E delivery. Data sourced from LRS and Atlas was at the enterprise level across all products, including OTC/ETD Derivatives, FX products, Collateral, PB/Trust, Firm Inventory (unencumbered assets), Traded Loans, Other Assets/Liabilities, Secured Funding (Repos/Reverse Repos, Security Borrowing/Lending), Deposits, Debts and Loans.
I was deeply involved in the design and implementation of the strategic solution for building a Corporate Treasury DW (Atlas). The DW supported Corporate Treasury applications such as GLRM Reporting, Balance Sheet Management (BSM) and Capital Management Reporting (including Basel III capital ratio reporting). Specifically, I supervised a virtual IT team and delivered a data solution for loading Loan Products (e.g. Commercial Loans, Leases, Commercial Commitments with/without Binding, Bank Credit Cards, Mortgages and Security-Based Lending) into Atlas by sourcing data from multiple applications.
Senior Vice President, Citigroup, Warren, NJ, USA, June 2010 – February 2013
Application Development Senior Group Manager
I led a large team working on transaction and reference data acquisition. The acquired data was used to build a large-scale Risk Data Warehouse supporting Basel II reporting and Global Risk Reporting. I also led the work to generate regulatory reports such as Loan Loss Reserve/Risk Capital, the Shared National Credit report, and limit monitoring reports on Available-for-Sale/Held-to-Maturity assets. I led the Data Quality Program for improving Basel II/III reporting accuracy, and managed a team that developed manual-adjustment GUI tools supporting multiple Risk applications.
I supervised the end-to-end delivery of several multi-million-dollar programs and worked on business analysis, data analysis and development. I was a subject matter expert in credit risk applications and various financial products, with strong analytical skills.
More specifically,
Supervised the end-to-end delivery of a multi-million-dollar program, leading a 20+ person team to migrate a 14-year-old Global Risk Reporting (GRR) application onto a new platform. GRR was the primary risk reporting platform covering the Wholesale Bank business within Citigroup, with over 5,000 registered users and 2,000 risk reports. The project consisted of several major components: the acquisition of risk transaction data as well as risk reference data such as Facilities, Obligors, Relationship and Support; the GRR consolidation data mart; the outbound component feeding data to around 30 downstream applications; and around 230 consolidated risk reports. On this project, I was hands-on in business and data analysis.
Supervised the end-to-end delivery of an over-one-million-dollar program to migrate onto a new platform a legacy application that generated Basel II reporting for Corporate Treasury, Private Bank management and the Citi Mexico business (Banamex). The project consisted of multiple components: the acquisition of transaction and reference data from the above business lines; data reconciliation with General Ledger applications; the outbound component interacting with Basel II calculators such as PD (Probability of Default), LGD (Loss Given Default), EAD (Exposure at Default) and RWA (Risk-Weighted Assets) (see the risk parameter sketch after this list); the report generation component covering 20+ reports; and the outbound component feeding data to business sector owners and downstream applications. For this project, I was hands-on in business and data analysis.
Supervised the end-to-end delivery of a multi-million-dollar Data Quality program to improve Basel II/III reporting accuracy by improving the quality of key inputs (e.g. PD, CCF and LGD) feeding the RWA calculation. The strategy was to improve the quality of key data elements, e.g. facility and customer identifiers and regulatory segment.
Led the Data Capture (DA) team of around 15 developers to capture transaction data (contracts and positions) from 80 financial product processors and 5 risk applications. The acquired data was used to build the Data Warehouse across multiple product lines at the enterprise level, such as Deposits, Loans, OTC Derivatives, Credit Cards, Rates, Commodities, Bonds, Mortgages, AFS (Available for Sale)/HTM (Held to Maturity), FX, Repos, unsettled cash trading positions and Secured Financial Transactions.
Led the Reference Data team of around 8 developers to capture reference data such as Obligors, Accounts, Instruments, Employees, Facilities, Risk Ratings, Relationship, Support, Collateral and Mitigants.
Led a team to generate the quarterly Shared National Credit (SNC) regulatory report required by the OCC.
Led a team to build outbound/inbound components feeding LLR (Loan Loss Reserve) and RC (Risk Capital) reports.
Led the effort to extract AFS/HTM flows from the warehouse and create AFS/HTM limit monitoring reports using the BI reporting tool Cognos.
Led a global Java team of 12 developers to build a web-based manual adjustment application used by hundreds of users around the world. Users from Risk and Finance operations used the tool to adjust balances and other vital attributes sourced from the warehouse, supporting Basel II/III reporting, gold-copy reference data maintenance and manual uploading of transaction data.
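For context on the Basel II risk parameters named above, the sketch below shows the standard expected-loss relationship EL = PD x LGD x EAD on hypothetical facility records; it is illustrative only and is not the Citigroup calculator, whose RWA logic is far more involved.

# Illustrative only: expected loss from the Basel II risk parameters named above,
# on hypothetical facility records. The regulatory RWA formula itself (the Basel IRB
# risk-weight functions) is more involved and is not reproduced here.
facilities = [
    {"facility_id": "F001", "pd": 0.02, "lgd": 0.45, "ead": 1_000_000},
    {"facility_id": "F002", "pd": 0.05, "lgd": 0.40, "ead": 250_000},
]

for f in facilities:
    expected_loss = f["pd"] * f["lgd"] * f["ead"]   # EL = PD x LGD x EAD, in the currency of EAD
    print(f'{f["facility_id"]}: expected loss = {expected_loss:,.2f}')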
Vice President, Deutsche Bank, New York, NY, USA, March 2007 – May 2010
Lead and Architect of a Compliance Reporting and Data Warehouse Team, Investment Banking IT
I led a global team to build a large equity transaction data warehouse (potentially tens of TB of incremental data per year). The warehouse was used to generate compliance reports, create data marts and produce other reports. In 2010, my budget was 5.7M EUR (2.5M EUR for labor and 3.2M EUR for hardware). Tools included Informatica, Oracle and C/C++ for data processing, analytics and reporting.
More specifically,
I recruited developers, built a global team, and trained and mentored its members.
As the architect of an equity data warehouse and reporting data marts, I was hands-on at every step of the platform’s development. The warehouse platform was designed to capture hundreds of millions of trading messages a day in real time.
I played the role of architect and business analyst for various compliance reports such as OATS reporting, the Best Execution report (Rule 606) and order tickets. I was hands-on in data checking and quality control throughout the life cycle of the platform’s development.
I was the author of an in-house trading message protocol, dbFIX, an extension of FIX 4.2 covering the majority of trading scenarios (a minimal FIX parsing sketch follows this list).
I modeled the entire life cycle of an order for various trading scenarios and built a relational database.
I was the architect and business analyst for the data mart design. The data mart supported internal surveillance by feeding processed transaction data to third-party surveillance report generation applications such as Actimize (US) and SMARTS Broker (EU).
As the architect, I drove the effort to create data marts and develop ELT components. The data marts provided data to analytics tools such as commission reporting, trading flow analyzers and risk tools.
I was the architect and data modeler for a reference data mart designed to become the golden copy of client, employee, trading-system static and client account data serving the analytics group. The data mart served equity trading platforms, the IPO management platform and analytics applications.
I was hands-on in enforcing the data model and data assurance by designing and implementing various data sniff tests and other monitoring tools.
I also drove the effort to build a process for feeding data to soft-dollar reporting.
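The sketch below illustrates parsing a FIX 4.2-style message (tag=value fields separated by the SOH character) of the kind the warehouse captured; the sample message is an assumption and the proprietary dbFIX extensions are not shown.

# Minimal sketch: parse a FIX 4.2-style message (tag=value fields separated by SOH, \x01).
# The sample message is an illustrative assumption; dbFIX-specific tags are not shown.
SOH = "\x01"

# Standard FIX 4.2 tags: 8=BeginString, 35=MsgType (D = New Order Single),
# 55=Symbol, 54=Side (1 = Buy), 38=OrderQty, 44=Price.
raw = SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100", "44=145.25"]) + SOH

def parse_fix(message: str) -> dict:
    """Split a FIX message into a {tag: value} dictionary."""
    fields = [f for f in message.split(SOH) if f]
    return dict(field.split("=", 1) for field in fields)

order = parse_fix(raw)
print(order["55"], order["38"], order["44"])   # symbol, quantity, price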
Vice President, Data Miner, Morgan Stanley, New York, NY, USA, November 1999 – March 2007
Business Intelligence Group, Equity Trading: Led the effort to develop analytical tools on equity transaction data used by senior managers, traders, investment bankers and research analysts. I worked on a delinquency prediction model for credit card holders and built web-based prediction models for commodity derivatives traders. Tools included SAS, Perl, SQL, JavaScript and Sybase for data modeling, data mining, machine learning, analytics and visualization.
More specifically:
I developed statistical models for predicting the delinquency probability of credit card holders, working on a large data set and utilizing sampling, data compression and variable selection techniques. The models were implemented in SAS and Perl (a sketch of this type of model follows this list).
I built a web-based forecast tool to predict HDD (Heating Degree Days)/CDD (Cooling Degree Days) for commodity traders by applying time series models. It was implemented in SAS.
I interacted with senior managers, traders, bankers and research analysts to determine business requirements and the direction of the product.
I engaged in data mining on client trade flows.
I led the effort to develop web-based analytical tools for CRM managers and traders, using statistical and machine learning algorithms to identify trading patterns and generate sales tips. The tools were built in Perl, SQL and JavaScript.
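The sketch below illustrates, in Python rather than the original SAS/Perl, the shape of the delinquency model described above: downsample the majority class, select variables, and fit a logistic regression. The input file and column names are assumptions.

# Illustrative sketch, in Python rather than the original SAS/Perl, of a delinquency model:
# downsample the majority class, select variables, fit a logistic regression.
# The input file and column names are assumptions; predictors are assumed numeric.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

data = pd.read_csv("cardholders.csv")   # hypothetical file with a 0/1 'delinquent' flag
current = data[data["delinquent"] == 0].sample(frac=0.1, random_state=1)  # downsample non-delinquent
delinquent = data[data["delinquent"] == 1]
sample = pd.concat([current, delinquent])

X = sample.drop(columns=["delinquent"])
y = sample["delinquent"]

model = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),    # simple variable selection (assumes >= 20 predictors)
    ("logit", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("Sample delinquency probabilities:", model.predict_proba(X)[:5, 1])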
Institutional Equity Data Group, Equity Trading: I led a small team to control equity trade data quality globally, clarify definitions of key FIX tags and design equity trade data models for various trading systems, and I played a leading role in business analysis.
More specifically:
I interacted extensively with developers, compliance, traders, reporting teams and company IT to capture equity trades into the centralized equity database, which was used to generate compliance reports.
I led the development of DQ engines and web-based monitoring/analytical tools for detecting and tracing data issues (a simple rule-check sketch follows this list).
I worked on data models for equity flows such as client self-trading, program trading, convertibles, derivatives, cash, Reg NMS, Reg SHO and internal limit monitoring.
I helped the compliance department with internal audits of equity flows.
I drove the effort to design a centralized repository used to store business rules on equity trades for data validation.
I led the effort to define and clarify key FIX tags and database fields.
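The sketch below illustrates the kind of rule-based data quality check the DQ engines performed; the trade fields and rules are assumptions, not the actual rule repository.

# Minimal sketch of a rule-based data quality check on equity trades.
# The trade fields and rules are illustrative assumptions, not the actual rule repository.
import pandas as pd

trades = pd.DataFrame([
    {"trade_id": "T1", "symbol": "IBM", "qty": 100, "price": 145.25, "side": "1"},
    {"trade_id": "T2", "symbol": None,  "qty": -50, "price": 10.00,  "side": "9"},
])

rules = {
    "missing_symbol":   trades["symbol"].isna(),
    "non_positive_qty": trades["qty"] <= 0,
    "invalid_side":     ~trades["side"].isin(["1", "2"]),   # FIX tag 54: 1 = Buy, 2 = Sell
}

for rule_name, violations in rules.items():
    flagged = trades.loc[violations, "trade_id"].tolist()
    print(f"{rule_name}: {flagged}")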
PROFESSIONAL SKILLS
Statistical and Analytics Programming: SAS, S-PLUS, R, Perl, Python (including scikit-learn, NumPy, pandas, PySpark), Minitab, SPSS, SQL, C/C++, etc.
Machine Learning Algorithms: Decision Trees/Random Forests, Classification and Clustering (e.g. K-Means, SVM), Prediction and Forecasting (Multivariate Regression, Time Series Modeling, Bayesian Spatial and Temporal Inference), NLP and Text Mining, Neural Networks, Data Reduction (PCA, TDA), etc.
Big Data Platform: Hadoop, HIVE/Impala, HBase, Spark etc.
Analytics tools: Ayasdi, Platfora etc.
BI and Visualization Tools: Cognos, BO, Tableau, QlikView.
Database: Greenplum, Oracle, Sybase, Sybase IQ, DB2, Neo4j, etc.
Web Publishing: HTML, JavaScript, Perl/CGI, XML and Java.
Computer Platform: Unix, Windows, and Mainframe.
EDUCATION BACKGROUND
MBA in Finance
Ph.D. in Statistics
M.S. in Applied Statistics
B.S. in Applied Mathematics
OTHER ACHIEVEMENTS
Seven sole-authored or co-authored journal and conference papers.
Two books on blockchain technology.