Colleen Wei Chen Data Engineer/Data Scientist
Burke, Virginia 22015 *******.*.****@*****.*** 703-***-****
www.linkedin.com/in/colleen-wei-chen-cqf-9134772/ --- US citizen, https://github.com/colleen-chen/dep_thinkful
Summary
A quantitative software developer turned data scientist, with machine learning and deep learning experience. Knowledge of and experience in data mining, statistical analysis, numerical modeling and Monte Carlo simulation, stochastic calculus, risk analytics, and machine/deep learning. Expertise in integrating data models into production systems. Highly efficient at troubleshooting and debugging data models in production. Works effectively both independently with minimal supervision and collaboratively in a team.
Technical Proficiencies
Data Science:
A/B testing, null hypothesis testing & t-tests, Naïve Bayes, KNN, decision trees, random forests, linear & logistic regression, k-means clustering, natural language processing (NLP), convolutional neural networks, MATLAB, SAS, statistics, stochastic processes, simulation & stochastic calculus, Black-Scholes model
Languages:
C++ (7+ years), C (2+ years), C# (3 years), Core Java (2 years), VBA (3+ years)
Python (NumPy, SciPy, pandas, scikit-learn, matplotlib & seaborn) (1.5 years), TensorFlow & Keras (3 months), Hadoop ecosystem (Spark, Pig, Hive), XML (3 years).
Scripts:
awk, sed, UNIX shell (2+ years)
Database:
MS SQL Server, PostgreSQL, Oracle, SQL queries & stored procedures (4+ years), Access (2 years)
PROFESSIONAL EXPERIENCE
Data Engineer/Scientist (11/2018 – current) Thinkful, Washington, DC
Developed an 8-layer deep convolutional neural network using TensorFlow and Keras on Google Colab. With data augmentation and regularization techniques, the model achieved 68% accuracy.
https://github.com/colleen-chen/dep_thinkful/blob/master/DScamp_FinalCapstone_Unit07/Capstone_00.ipynb
Wrote a 2-layer fully connected neural network in NumPy, implementing forward and backward propagation with a softmax loss function and L2 regularization.
Wrote utilities, functions, and classes in Python 3, including code using lambda functions. Designed tables and developed SQL queries in PostgreSQL. Performed extensive data visualization using matplotlib and seaborn.
Built various data models using naïve Bayes, multivariate linear regression, logistic regression, nearest-neighbor classification and regression, decision trees, random forests, NLP, and neural networks (TensorFlow and Keras). Performed complex model verification and consistency validation.
Software Engineer (08/2018 – 10/2018) Advanced Geolocation Solutions LLC, Sterling, VA
Developed a multithreaded, socket-to-socket C++ program to test the VITA 49 protocol for a custom radio.
Wrote Python scripts and used GNU Radio for data processing. Used Wireshark to capture, analyze, and diagnose UDP network packets.
Developed an embedded C application against the Mini-Circuits RF switch API. Worked with the Xilinx ZedBoard; built U-Boot and PetaLinux with an updated DMA driver.
Software Engineer (03/2018 – 07/2018) Chemring Sensors and Electronics, Sterling, VA
Developed C++ code for a messaging simulator with a Qt GUI front end. Programmed socket-level multithreaded communication. Built the system with make, and ran and debugged it using gdb and valgrind on Linux. Configured Unix machines for static and dynamic IP addressing. Set up and configured test machines to perform integrated system tests. Worked on and tested a C++/Java command-and-control system comprising a C++ back-end server, the Geoscope raw radar FPGA/DSP system, a Java algorithm system, and the xTablet Qt GUI front end. Worked on porting MATLAB code to Java. Prepared for field data collection and data verification.
Senior Software Engineer (08/2017 – 11/2017) Raytheon, Sterling, VA
Using object-oriented analysis and development, improved, developed, and tested OO components of the DoD Navy NIOP common control system, mainly an XML messaging system written in C++/Java running on Red Hat Linux. Implemented new data structures in Java to match the DDS message schema. The system uses low-level socket-to-socket UDP communication. Performed full installations in the test lab. Set up configurations and ran gdb across server and clients to troubleshoot. Wrote use cases for system integration tests. Used Jupyter Notebook to develop and debug Python code (pandas, scikit-learn) to enhance machine learning capabilities for unmanned aerial vehicles. Used matplotlib for visualization and presentation.
Key Achievements: Quickly learned new systems and software. Excellent at diagnosing defects and troubleshooting.
Senior C++ Developer (10/2016 – 5/2017)
Freddie Mac, McLean, VA
With no documentation and the original developers long gone, upgraded and migrated 10+ year-old legacy C++ code from AIX to Linux; the code interfaces with the Xerces library to parse data into DOM objects and performs socket-level communication. Developed a Makefile to build a shared library. Performed extensive debugging and memory profiling using gdb and valgrind. Developed Korn shell, awk, and sed scripts. Used git for continuous builds. Parsed data with Python and stored procedures, and analyzed data with NumPy and pandas. Organized a data processor in Hadoop and Apache Spark.
Key Achievements:
Reverse-engineered the logic by reading and debugging code in order to upgrade and migrate a 10+ year-old legacy C++ system.
Senior Analyst/Programmer (3/2015 – 12/2015)
Pension Benefit Guaranty Corporation, Washington, DC
The software system, PIMS, is a 17-year-old numerical simulation system written in C++ that interfaces with a SQL database. Refactored the code to run in parallel. Created and released a random number generator and several unique features in C++ that enhanced the PIMS system. Extracted large amounts of data from a SQL database to feed the report generator.
Key Achievements: Reduced run time by a factor of tens by refactoring the code.
Financial Engineer (5/2014 – 11/2014)
Fannie Mae, Washington, DC
In the C++/Unix-based, XSLT-interfaced ValuationNet analytics system, implemented a reverse mortgage pricing model and developed C++ code for the loan-to-value and prepayment modules. Gathered user requirements and translated them into technical specifications. Performed full reconciliation of results between the MATLAB code and the C++ system. Used Unix gcc and gdb extensively.
Key Achievements: Delivered production-release code through the full SDLC process.
Quant Developer (7/2012 – 11/2013)
Barclays Capital, New York, NY
Developed a 3-tier daily trade report with an Excel front-end UI, a C#/.NET middle tier, and a SQL database back end. Developed a market data plugin in C++ to load gas shaped curves. Developed arbitrage pricing solutions and library components, including for commodity dispersion and APO trades, using the quant library toolkit and pricing/risk components. Also developed the daily P&L explain. Implemented a skew-dependent correlation pricing model to value arbitrage trades of options on baskets vs. baskets of options.
Key Achievements:
Authored a three-tier daily trading report (Excel front end, C#/.NET middle tier, SQL database back end as the data driver), increasing reporting efficiency and output by approximately a factor of 10.
Analyzed daily P&L explain results and organized them into graphical reports. Delivered daily reports to senior leadership.
Senior Derivative Analyst/Programmer (2/2012 – 7/2012)
AEGON USA, Baltimore, MD
Modified the global equity-indexed universal life hedging system, an Excel/VBA application with an embedded C++ stochastic simulation engine. Built a C++ interest rate swap pricer module and a C# yield curve builder. Tested the cash flow pricing model with the Vasicek, Cox-Ingersoll-Ross, Ho-Lee, and Hull-White interest rate models. Worked toward the Certificate in Quantitative Finance.
Quant Developer (7/2011 – 10/2011)
Bank of America, Charlotte, NC
Worked on a bond portfolio hedge engine that uses interest rate swaps to hedge interest rate risk. Developed new cash flow model functionality in Java to enhance the MBS hedge engine, using interest rate swaps, swaptions, and CMOs. The hedge system consists of a core hedge engine with a C# GUI front end, a Java/C++ middle layer, and a SQL Server database back end. Built Excel spreadsheets to value bond PV and MBS PV, scenarios, and attributions for bond/MBS portfolios to verify analysts' models independently. Built a MATLAB prepayment model on the current market rate (treating the mortgage as a non-callable bond and a call option) to verify analysts' models.
Quant Developer (9/2010 – 5/2011)
Merrill Lynch Commodities, Houston, TX
Integrated several commodity pricing models with existing trading systems, the bank's internal general derivative analytics library (GDA), and risk reporting and management systems. The models include the seasonal Gabillon gas swaption model, the crude oil (WTI) calendar spread option, the crack spread option, and the average price option (APO). Performed model validations after integration. Trading systems and risk reporting systems are integrated through Endur's C++ epm interfaces. Performed extensive unit and system tests. Troubleshot and tested to ensure that valuations and Greeks matched to the penny between the C++ models and the Excel models. Excellent at debugging numerical calculation problems and at number crunching.
Quant Developer (3/2008 – 6/2010)
Hartford Investment Management Company, Hartford, CT
Developed an OO component library for both the data engine and the analytic engine. The business modules include various curves (yield curve, dividend yield curve, default probability curve), surfaces (volatility surfaces for equity, FX, and swaptions; credit index correlation surface), various rates data, and cross-currency basis swap data (C#). Developed the analyticEngine system to verify yield curve bootstrapping, single-name default probability curve bootstrapping, single-name CDS NPV pricing from the default probability curve, CDX pricing, and CDO tranche pricing using weighted index matching and a correlation matrix (C++). Developed a Windows n-tier "Data Management System" using secure middle-tier ADO.NET data access components and SQL Server 2005. Developed C# code to manage trades and portfolio positions.
Quant Developer (6/2007 – 2/2008)
Standard & Poor's, New York, NY
Developed a new stochastic credit default swap model in MATLAB using a Gaussian copula model, and implemented the MATLAB model in the C++ engine. Supported and enhanced the CDO Evaluator system, which consists of a C++ back-end engine, an Excel front end, and an XML messaging link. Enhanced the automated data processing system. Responsible for programming in C# (front end), the Oracle database, and the VBA middle layer.
Software / Quant Developer (8/2004 – 10/2006)
Quantlab Financial, Houston, TX
Worked on validation of tick data for equities, index futures, and currency futures. Wrote C++ (2,000 lines) and C# (1,500 lines) APIs to retrieve real-time data. Developed a real-time 3-dimensional data filtering system in C# (3,000+ lines). Developed an auto-trading system in C++ that acts on the model's trading signals. Developed C++ code to improve the distributed multi-feed real-time trading system.
Analyst / Programmer (6/2000 – 6/2004)
Chicago, IL
Created ARIMA time-series models of equity prices in SAS and Minitab. Developed a compound option pricing model for a business spin-off in Excel/VBA. Wrote C++ code to process OTC stock quotes. Developed C code for a bond arbitrage model. Developed tree and Feynman path-integral calculations for option pricing models. Extracted the underlying probability distribution from the volatility skew.
Postdoctoral Researcher (12/1997 – 9/1999) Fermi National Accelerator Laboratory, Batavia, IL
Created Monte Carlo models of real data and background in FORTRAN. Developed a sophisticated algorithm to identify photon signals against an overwhelming background using both H-matrix and neural network techniques. Wrote algorithms to spot patterns and trends in large datasets.
EDUCATION & CREDENTIALS
PhD in Physics, 1997
State University of New York at Stony Brook, Stony Brook, New York
Master of Science in Software Engineering, 2003
DePaul University, Chicago, Illinois
Bachelor of Science in Physics, 1985
Beijing University, Beijing, China
Certifications
• Data Science Immersion, Thinkful (2019) • Neural Networks and Deep Learning by Andrew Ng
• Certificate of Python Professional (Dec. 2017) • Certificate in Big Data Hadoop & Spark Developers (July 2017)
• SOA Exam P (Jan. 2015) • Certificate of SAS Programmer (Dec. 2017) • C++ Certified Associate Programmer (Jan. 2013) • Certificate in Quantitative Finance (May 2012) • Certificate in .NET Master's Program (March 2007)
Security Clearance
NACI security clearance for non-sensitive positions (2015)