
Data Machine

Buffalo Grove, Illinois, United States
August 15, 2019



Jie Feng, Ph.D.


Professional Summary

15+ years of software development experience in statistical modeling, data analytics, artificial intelligence, and big data across multiple domains, including healthcare, image processing, finance, insurance, energy, materials, and biological systems.

Extensive experience in Machine Learning/Deep Learning with TensorFlow/Keras/PyTorch/scikit-learn, Natural Language Processing (NLP), image processing with Convolutional Neural Networks (CNN)/Capsule Net, time series prediction with RNN/LSTM, Google Cloud Platform (GCP), parallel computing, and GPU computing. Also experienced with other data science libraries such as Pandas, NumPy, SciPy, Gensim, Sumy, Matplotlib, OpenCV, Pydicom, and more.

Solid knowledge and skills in data processing/analysis, such as missing-data handling, data normalization, and ensemble methods (stacking, boosting), across a variety of statistical models such as linear/nonlinear regression, linear/logistic classification, KNN, k-means, random forest, SVM, and neural networks.

Extensive experience in developing solutions for text summarization, text similarity identification, and other text-mining tasks using NLP and ML/DL.

Extensive experience in developing deep learning applications for medical image analysis (CT, MRI, mammogram images, etc.). Familiar with a variety of medical image formats such as DICOM and LJPEG.

Strong expertise in developing and deploying Machine Learning/Deep Learning products on GCP using Google Cloud services and related tools such as Google Cloud Storage, BigQuery, Cloud Dataflow, Apache Beam, Compute Engine, ML Engine, and Cloud Datalab.

Hands-on experience in multiple programming languages (Python, C/C++, R, Fortran, etc.).

Expertise in designing architectures for data & analytics, ML/DL, and data migration solutions on Linux/Windows platforms and in GCP environments.

Solid skills in Linux and shell scripting.

Core Competencies

Statistical modeling

Machine Learning/Deep Learning/AI

Cloud computing

Natural Language Processing (NLP)

Image Processing

Big Data and Analytics

Tools and Skills

Jupyter Notebook

Apache Beam

Python, R, C++/C, Fortran

SQL Server, Oracle

Additional Information

Successfully completed the ML Immersive program at the Advanced Solutions Lab and PSO training at Google, NYC.

Successfully completed Deep Learning/Machine Learning courses, certified by Coursera.

Two winning proposals for high-performance computing as PI/Co-PI.

26 publications and 14 invited talks/conference reports related to applications of high-performance computing and statistical modeling.

Education

Ph.D. Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, 2007

Professional Experience

Data Scientist/Architect/Google Partner Manager

Virtusa, May 2018 – Present

Leading data science teams to implement and deliver AI/ML solutions to clients

Implementing deep learning/reinforcement learning for credit fraud and health insurance fraud detection systems

Developing an automated system for document classification and information extraction on Google Cloud Platform

Proposing and developing a novel NLP system to extract, integrate, and summarize tables and text

Developing new image processing algorithms based on CapsuleNet

Implementing convolutional neural network systems to analyze mammogram images for breast cancer

Proposing and developing ML/DL models to detect fraudulent transactions

Implementing AI systems on cloud services such as Google Cloud

Preparing and proposing proofs of concept (POC) and points of view (POV) for internal and external clients

Chief Data Scientist

Super Machine Assisted Research and Technology, Jan 2017 – Apr 2018

Applied Machine Learning algorithms to diagnose lung/liver cancers from medical images and tests (X-ray, CT, MRI, microscopy, ultrasound, blood tests, etc.)

Designed a Machine Learning-based life-saving monitoring system that performs intelligent diagnosis and alerts on the most common cardiac abnormalities

Proposed a novel Machine Learning algorithm combining supervised/unsupervised learning, transfer learning, and reinforcement learning for a medical diagnosis system

Advised sophisticated organizations on large-scale data and analytics using advanced statistical and machine learning models

Architected and implemented analytics and visualization components for a device data analysis platform

Proved that the algorithm achieves the lowest theoretically attainable complexity

Set up Sqoop to transfer data between RDBMS and HDFS

Converted existing SQL queries into HiveQL queries

ORISE Research Associate (Federal), 2012 – 2016

Oak Ridge Institute for Science and Education/National Energy Technology Laboratory, Pittsburgh, PA

Developed a Machine Learning-based system for high-throughput materials discovery and optimization

Implemented machine learning algorithms to evaluate material properties

Developed Python and SQL code to extract data from various databases, and developed innovative ideas for data science and advanced analytics practices. Communicated and presented models to business customers and executives using a variety of formats and visualization methods.

Developed pipelines to analyze large simulation datasets, combining custom Python and shell scripts with established molecular modeling tools

Interpreted complex simulation data using statistical methods

Implemented cheminformatics algorithms for data analysis

Developed audience extension models using decision trees, random forests, logistic regression, XGBoost, and other models for categorical data

Combined machine learning methods with atomistic computational models to discover high-performance materials

Imported and exported data into HDFS and Hive using Sqoop

Managed data coming from different sources

Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs

Postdoctoral Associate, 2007 – 2012

Department of Polymer Engineering, The University of Akron

Developed large-scale simulation algorithms for composite materials

Assisted in managing and maintaining group computing clusters and accounts at the Ohio Supercomputer Center

Research Assistant, 2004 – 2007

Visiting Scholar, 2001 – 2004

Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York

Developed molecular and mesoscopic modeling methods and algorithms

Administered, configured, upgraded, and maintained group workstations, printers, scanners, and other hardware

Managed accounts at the UB Center for Computational Research
