Sophie FENG **********@*****.*** Tel: 832-***-****
Summary of Qualifications
** + years of hands-on experience in big data analytics across energy, information technology, navigation, and finance, in both academia and industry
Hold a bachelor's degree in computer science, with strong professional programming skills for solving practical problems. Hold two master's degrees in science and engineering, both earned with full scholarships, providing a solid quantitative background and strong data-analysis skills in statistics and control theory
9+ years of industry experience in big data processing and mining at a leading oil/gas services company. Lead and manage massive-scale data and analytics projects using advanced statistical and machine learning models running on HPC cluster networks. Manage long-term relationships with clients such as Exxon, BP, Chevron, Shell, and Total S.A.
Frame problems with a focus on resolving conflicts, improving morale, and establishing clear goals by managing timelines effectively and sharing resources across internal departments. Deliver project needs on time and within the agreed acceptance criteria in a hybrid methodology environment as projects transitioned to an Agile methodology
Propose appropriate solutions and strategies for projects through interactive customer support. Provide leadership through challenges and capture client requirements accurately to maintain high client retention and loyalty
Three years of cloud computing experience with Google Cloud, Microsoft Azure, and AWS
Certifications
AWS Certified Cloud Practitioner, valid through 06/2022
AWS Certified Machine Learning - Specialty, valid through 07/2023
Skills
Programming: Python, SQL, C++, C shell, Java, Perl, R, MATLAB, HTML, Unix/Linux, LaTeX, Vim/vi, Excel macros
ML packages & tools: pandas, scikit-learn, TensorFlow, Keras, PyTorch, MXNet, PySpark, Hadoop, Tableau
Featured algorithms: Naive Bayes (NB), SVM, PCA, XGBoost, LightGBM, ANN, CNN, RNN (LSTM), k-NN
Professional Experience & Projects
Staff Data Scientist 2015 – 2020
Coordinate big data projects (~100 TB per project) with parallel processing on Hadoop-style cluster networks. Perform quality control on crucial steps and deliverables to improve accuracy and output quality. As the main client contact, document workflows, demo intermediate outputs through the cloud, and publish QC reports
Identify target problems and own the design of key processing steps. Lead the selection and initialization of statistical models based on business needs. Optimize parameters for 3D geo-tomography model updates. Monitor production and track performance throughout the development life cycle
Lead ML/DL projects: a) log-file processing to identify problematic records through semantic analysis with WordPiece tokenization; b) a U-Net deep learning approach for salt tectonics and fault recognition and interpretation in seismic images, based on 1 TB of migration images from the in-house data library; c) pre-stack image inversion from synthetically augmented data, with a trained CNN acting as a nonlinear function that maps observed surface records to the earth model
Automate processing pipelines and the model optimization flow behind an API. Manage and deploy computational resources across projects, with responsibilities ranging from resource scheduling and allocation to automation of imaging-related computations in an HPC environment
Apply ML algorithms to other applications: a) switching super-shot clustering from K-means to DBSCAN; b) an unsupervised signal/noise classification API based on image pattern recognition; c) drilling instrument demand forecasting with LSTM time-series models
Data Scientist/Data Processing at CGG Services (US) Inc. 2011 - 2014
Work in a project-focused team using data processing modules on an internal high-performance parallel GPU/CPU ETL network. Create and evaluate executive dashboards and scorecards that show data trends using Excel and VBA macros
Data visualization and signal/image processing applying practical mathematics and physics: noise and multiple attenuation with convolutional imaging filters for specific pattern detection, outlier removal, amplitude scaling, data interpolation, data transforms (tau-p/Fourier), and regularization; scripting to automatically detect and fix corrupted data headers, using supervised learning for multiple imputation in a 4D geospatial context
Daily work involves QC testing and presenting results to management and clients. Access data with SQL. Write Bash scripts under UNIX for data I/O
Data Science Fellow at Data Application Lab 2017 Mar - 2017 Dec
Fintech System: Peer-to-Peer Lending Marketplace Loan Data Analysis (on Hadoop)
Design a tree-boosting intelligent investment system that selects an optimal combination of loans to invest in on the lending marketplace. Capture training data with 100+ features through a Web API. Apply dimensionality reduction to speed up modeling by 40%. Compare performance between XGBoost and LightGBM (LightGBM 20% faster). Perform exploratory analysis, hyperparameter tuning, and feature importance studies
Steam Game Recommendation System
Develop a video game recommendation system as a Flask-based web application. Implement Python web crawlers that collect raw training data through the Steam Web API, such as users' game-play statistics and Steam's game inventory information. Load Steam game inventory files into a MySQL database with ETL operations. Apply the Alternating Least Squares (ALS) model in the Spark ML package to train the model and predict products
Education and Research
Master of Engineering, Research Assistant: Geomatics Engineering (2007 Sep ~ 2010 Feb), University of Calgary, Canada, GPA 3.94/4.0
Apply linear quadratic estimation to series of joint GPS mobile measurements over time to separate statistical noise and inaccuracies for better position guidance. Optimize automobile motion planning in a dynamic GNSS system with Kalman filtering and least squares. Recursively estimate position in real time to compensate for missing observations. Provide noise covariance estimation based on sensor data (e.g., speed). Tune the model to match the real system optimally and assess performance
Master of Science, Research Assistant: Space Physics (2004 Sep ~ 2007 Jun), Beijing (Peking) University (a top-2 university in China), Beijing, China, GPA 3.9/4.0
Construct a historical solar event database with MS SQL. Statistically model the evolution of high- and low-latitude substorms. Quantitatively analyze activity recorded at 20 space weather stations in different provinces of China using association rule learning. Perform time-series analysis of solar event phases and forecast the corresponding spatial expansion
Bachelor of Computer Science (2000 Sep ~ 2004 Jun), Tianjin University, China, GPA 3.75/4.0
Course highlights: Linear Algebra (4.0), Probability & Mathematical Statistics (4.0), Data Structures and Algorithms (3.9), Object-Oriented Programming Design and C++ (3.9), Principles of Databases/SQL (3.7), Advanced Estimation Methods (4.0)
Internship, IBM Student Laboratory (2004 Jun ~ 2004 Jul), Tianjin University
University internet security control for the internal network and student database
Selected Publications
[1] Skone, S., Feng, R. Tiwari, and A. Coster (2009), Proceedings of the 22nd International Technical Meeting of the Satellite Division of the Institute of Navigation (ION GNSS 2009), Savannah, GA, pp. 2551-2558
[2] Feng, M., D.H. Zhang, X. Zuo, et al. (2007), Science in China Series E: Technological Sciences, Vol. 50, pp. 422-429