Jie Feng, Ph.D.
15+ years of software development experience in statistical modeling, data analytics, artificial intelligence, and big data across multiple domains, including healthcare, image processing, finance, insurance, energy, materials, and biological systems.
Extensive experience in Machine Learning/Deep Learning with TensorFlow/Keras/PyTorch/Scikit-Learn, Natural Language Processing (NLP), image processing with Convolutional Neural Networks (CNN)/Capsule Networks, time-series prediction with RNN/LSTM, Google Cloud Platform (GCP), parallel computing, and GPU computing. Also experienced with other data science libraries such as Pandas, NumPy, SciPy, Gensim, Sumy, Matplotlib, OpenCV, and Pydicom.
Solid knowledge and skills in data processing and analysis, including missing-data handling, data normalization, and ensemble methods (stacking, boosting), across a variety of statistical models such as linear/nonlinear regression, linear/logistic classification, KNN, k-means, random forest, SVM, and neural networks.
Extensive experience in developing solutions for text summarization, text similarity identification, and other text mining tasks using NLP and ML/DL.
Extensive experience in developing deep learning applications for medical image analysis (CT, MRI, mammography, etc.). Familiar with a variety of medical image formats such as DICOM and LJPEG.
Strong expertise in developing and deploying Machine Learning/Deep Learning products on GCP using Google Cloud services such as Cloud Storage, BigQuery, Cloud Dataflow (Apache Beam), Compute Engine, ML Engine, and Cloud Datalab.
Hands-on experience in multiple programming languages (Python, C/C++, R, Fortran, etc.).
Expertise in designing architectures for data analytics, ML/DL, and data migration solutions on Linux/Windows platforms and in GCP environments.
Solid skills in using Linux and writing shell scripts.
Machine Learning/Deep Learning/AI
Natural Language Processing (NLP)
Big Data and Analytics
Python, R, C++/C, Fortran
SQL Server, Oracle
Tools and skills
Completed the ML Immersive program at Google's Advanced Solutions Lab and PSO training at Google, NYC.
Completed Deep Learning/Machine Learning courses certified by Coursera.
Two winning proposals for high-performance computing as PI/Co-PI.
26 publications and 14 invited talks/conference reports related to applications of high-performance computing and statistical modeling.
Ph.D. Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, 2007
Data Scientist/Architect/Google Partner Manager
Virtusa May/2018 - Present
Leading data science teams to implement and deliver AI/ML solutions to clients
Implementing deep learning/reinforcement learning for credit fraud and health insurance fraud detection systems
Developing an automation system for document classification and information extraction on Google Cloud Platform
Proposing and developing a novel NLP system to extract, integrate, and summarize tables and texts
Developing new image processing algorithms based on CapsuleNet
Implementing convolutional neural network systems to analyze mammogram images for breast cancer
Proposing and developing ML/DL models to detect fraudulent transactions
Implementing AI systems on cloud services such as Google Cloud
Preparing and proposing proofs of concept (POCs) and points of view (POVs) for internal and external clients
Chief Data Scientist
Super Machine Assisted Research and Technology Jan/2017 – April/2018
Applied Machine Learning algorithms to diagnose lung/liver cancers from medical images and tests (X-ray, CT, MRI, microscopy, ultrasound, blood tests, etc.)
Designed a Machine Learning based life-saving monitoring system that performs intelligent diagnosis and alerts on the most common cardiac abnormalities
Proposed a novel Machine Learning algorithm combining supervised/unsupervised learning, transfer learning, and reinforcement learning for a medical diagnosis system
Mentored organizations on large-scale data and analytics using advanced statistical and machine learning models.
Architected and implemented analytic and visualization components for a device data analysis platform.
Proved that the algorithm achieves the lowest theoretically attainable complexity.
Set up Sqoop to transfer data between RDBMS and HDFS.
Converted existing SQL queries into HiveQL queries.
ORISE Research Associate (Federal), 2012 – 2016
Oak Ridge Institute for Science and Education/National Energy Technology Laboratory, Pittsburgh, PA
Developed Machine Learning based system for High Throughput Materials Discovery and Optimization
Implemented machine learning algorithms to evaluate material properties.
Developed Python and SQL code to extract data from various databases, and contributed innovative ideas to Data Science and Advanced Analytics practices. Communicated and presented models to business customers and executives using a variety of formats and visualization methods.
Developed pipelines to analyze large simulation datasets, combining custom Python and shell scripts with established molecular modeling tools.
Interpreted complex simulation data using statistical methods.
Implemented cheminformatics algorithms for data analysis.
Developed audience extension models relying on decision trees, random forest, logistic regression, XGBoost, and other models for categorical data
Combined machine learning methods with atomistic computational models to discover high-performance materials
Imported and exported data into HDFS and Hive using Sqoop
Managed data coming from different sources
Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
Postdoctoral Associate 2007 - 2012
Department of Polymer Engineering, The University of Akron
Developed large-scale simulation algorithms for composite materials
Assisted in managing and maintaining group computing clusters and accounts at the Ohio Supercomputer Center
Research Assistant 2004 - 2007
Visiting Scholar 2001 - 2004
Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York
Developed molecular and mesoscopic models and algorithms
Administered, configured, upgraded, and maintained group workstations, printers, scanners, and other hardware
Managed accounts at the UB Center for Computational Research