Data Engineer

Location:

La Honda, CA

Posted:

July 15, 2019

Resume:

Petr Ponomarenko

Email: ****@*************.*** Phone: +1-617-***-****

Los Altos Hills, near Stanford, CA, USA. Green Card (Permanent Resident). LinkedIn: https://www.linkedin.com/in/petr-ponomarenko/ Biostars: https://www.biostars.org/u/14638/

Google Scholar: http://bit.ly/2WzlFoZ (Citations 422, h-index 12, i10-index 16)

Summary

For me, it is very important to make life easier, happier, and more productive for people. As a Biostatistician, Data Analyst, Data Scientist, and Data Engineer, I have worked on teams responsible for the science part of several products related to genetics and wellbeing data, from idea to implementation. My roles have focused on finding useful insights in gigabytes and terabytes of data from biological, medical, and genetics experiments. Over 10+ years of bioinformatics experience, I have used a range of bioinformatics tools, built and maintained pipelines, and applied Linear Programming, Genetic Algorithms, empirical methods, regression analysis, Markov Chain Monte Carlo simulations, Bayesian Networks, functional data analysis, clustering, visual analytics, time series analysis, neural networks, and more. In the past 2 years, my consulting clients' needs have expanded to include business, marketing, logistics, stocks, risks, and cryptocurrency data. You can use my experience to get valuable answers to your big data questions.

Education

Sep 2010 - Dec 2013, ABD Ph.D. Computational Biology and Bioinformatics, Institute of Cytology and Genetics SB RAS, Russia (bioinformatics is data science applied to biological data).

Sep 2008 - Jul 2010, M.S. Chemical and Biological Physics, Novosibirsk State University, Russia.

Sep 2004 - Jul 2008, B.S. Physics of Atoms and Molecules, Novosibirsk State University, Russia.

Hard Skills

R, Python, Tableau, SQL, AWS, SageMaker, TensorFlow, MXNet, scikit-learn, Functional Data Analysis, linear programming, bioinformatics software for microarray and NGS data (Bismark, GATK, BINA, STAR aligner, VEP, Molsoft ICM), C/C++ (7+ years ago), and MATLAB (4+ years ago). I am learning Julia now.

Select Data Science Consulting Projects

Oct 2017 – present, Online system to test ideas about time series data, Consulted as Data Engineer, Data Analyst, Data Scientist, Machine Learning Engineer, Product Manager.

(Building an API for ETL & Machine Learning from Tableau Server on AWS): The client needed a new ML-oriented system to test trading ideas in order to improve trading, broaden the portfolio, and grow the number of customers and money under management. For this, the client wanted a website with dashboards to track prices of different stocks and cryptocurrencies, apply different statistics to them, fit different models, run forecasts with the fitted models, store all of the analysis, and use machine learning to find short-term trading strategies. I delivered a system of EC2 Linux+Python instances that automatically run ETL, fit models with R Server and Python servers on request from Tableau Server dashboards, train ML models, and query AWS SageMaker endpoints for forecasts using different ML techniques, such as KNN clustering, kernel regression, logistic regression, Q-learning, neural networks, and other methods. The client uses this system to test investment strategies, broaden the trading portfolio, and increase the number of customers and money under management.

Jan – Aug 2017, Software products to make predictions using users’ genetics data, Consulted as Data Engineer, Data Scientist, Bioinformatics Specialist.

(Markov Chain Monte Carlo Simulation + PCA + clustering): The client needed a system to analyze genetics data, and one of the requirements was to improve ethnicity prediction for eastern European clients. My team assembled a data set that contained a lot of low-quality public data. We did not know which data were reliable or to what extent, so we simulated reliability to derive the most stable ethnicity forecasting model. The client uses this software for ethnicity prediction in their products, together with two other tools provided by our team.

Dec 2018 – Jan 2019, Finding differences in running phases using videos, Consulted as Data Scientist.

(Pose analysis on videos using Deep Learning and Functional Data Analysis): The client wanted to find the phases of leg movement that differ between normal, injured, and treated legs, given a set of videos of thousands of different legs in which each leg appeared in only one setting (normal, injured, or treated). Videos were translated into time series of joint positions. Functional data analysis in R was used to automate finding and visualizing the differences. The client uses this system as part of a larger physical therapy product.

Nov 2016 – Jun 2018, Genetics data compression and analysis systems, Consulted as Data Engineer, Data Scientist, Bioinformatics Specialist.

(Responsible for the science part of Product Development & Bioinformatics): Three products were developed: sequencing data compression using parameter optimization with neural networks; 23andMe microarray results analysis for wellness; and an online sequencing data analysis system that interprets data and matches it against databases and other sequencing data in the system.

Work Experience

Sep 2017 – present, Data Science Consultant:

Several clients needed answers about their data, or systems to obtain, manage, and analyze data as part of their products or tools. This required building ML systems that used Linear Programming, Functional Data Analysis, Monte Carlo simulations, regression analysis, ETL automation, pose recognition on video, cloud computing in AWS (EC2 instances, MySQL, Tableau Servers, R Servers, SageMaker endpoints), machine learning, visual analytics, forecasting with trained models, and dashboards for visualization.

Sep 2017 – Jun 2018, University of La Verne, Adj. Professor, Research Associate: The employer needed a set of courses to be taught, including statistics, research methods, biology, and physics, as part of their regular courses. The employer also needed to finish an analysis of oil palm genetic data (MBD-seq) to complete a grant report. The latter was done using SQL, R, bioinformatics tools, and Python.

Aug 2015 – Aug 2017, University of Southern California, Research Associate: The employer had a set of projects with experimental genetics data that needed to be analyzed to write grant reports (genome assembly, genome annotation, SNP annotation, ChIP-seq, MBD-seq, BS-seq, RNA-seq). These projects were completed using SQL, R, Python, Molsoft ICM, and bioinformatics tools on HPC.

Mar 2014 – Aug 2017, Children’s Hospital Los Angeles, Research Assistant: The employer was looking for genetics test validation, ethnicity prediction, SNP annotation, and time series analysis to help genetics labs and doctors. Tools and pipelines were created using bioinformatics software, R, Python, and SQL on HPC.

Sep 2007 – Dec 2013, Institute of Cytology and Genetics, Research Assistant: The employer had sets of experimental data on the kinetics of molecular interactions (protein-DNA and protein-RNA) that needed to be explained as a whole. Novel models were designed and fitted to the data to derive insights and a fundamental understanding of such interactions, as well as to predict the outcomes of genetic mutations. Data was collected and analyzed, and models were designed, trained, and compiled into software tools using C/C++, Molsoft ICM, Ruby, and Excel.

Dec 2009 – Feb 2010, University of California San Diego, Analyst class 1: The employer wanted to understand how co-transcriptional splicing can be described theoretically. I created a co-transcriptional splicing model, studied ranges of its parameters, and compared the model's predictions with experimental observations. The model was created and solved symbolically and numerically using systems of differential equations in MATLAB.

2007 & 2008 (summers), The Scripps Research Institute, Paid Intern: The employer wanted to study different methods of Small Molecule Docking and their parameters. A benchmarking tool for High Throughput Virtual Ligand Screening was created using Molsoft ICM and bash on HPC; it ran docking and analyzed and visualized the results.

Volunteering

Apr 2009 – Dec 2013, Science for Youth, not-for-profit, co-founder: Together with friends, we created an organization to promote education and learning in schools. My responsibilities included writing grant reports, working with advisors, and designing and implementing physics learning courses for students of different ages.

Mar 2008 – Jun 2010, AIESEC, Alumni & External Relations (sales) Director: Joined AIESEC to learn more about business and the world outside academic research. I was part of the management team of our city chapter. My responsibilities were coordinating work with AIESEC alumni and with a commercial organization. Activities included marketing, recruitment, planning, education, sales, and management.

Awards

2019, earthDECKS. 2009 & 2008, V. Potanin foundation. 2008, Schlumberger Fund.
