SKILLS SUMMARY
A versatile researcher and data analyst with a Ph.D. in Physics & Mathematics and over 20 years of practical experience.
Extensive work on statistical estimation of protein phases resulted in 261 structures solved and 45 scientific publications
(detailed citations are available upon request).
EXPERIENCE
COLUMBIA UNIVERSITY, New York, NY 2002-present
Data Scientist
• Strong background in mathematical and statistical methodology.
• Experience with the main Hadoop ecosystem projects: Pig, Hive, HBase.
• Developed Pig Latin scripts and used the Hive query language for data analysis.
Imported data from different sources and transformed it
with Hive and MapReduce.
• Working experience using Sqoop to import data from RDBMS into HDFS.
Exported the analyzed data to databases with Sqoop for visualization.
• Experience in Hadoop administration: installation and configuration of clusters
using Apache and Cloudera.
• Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
• Developed MapReduce jobs using Hive and Pig.
• Optimized MapReduce jobs on HDFS, improving performance by applying different
compression mechanisms.
• Good experience streaming data into Apache HBase using Apache Flume.
• Experience working with NoSQL databases.
• Good experience with the R software environment for statistical computing and graphics
on UNIX and Windows platforms.
• Experience using SAS, R, and MATLAB in a professional capacity.
• Good experience with SQL Server 2012 Management Studio Express.
SOUTHERN RESEARCH INSTITUTE, Birmingham, AL 1999-2002
Research Scientist II
• Developed an application for data processing:
scaling of X-ray intensities for heavy-atom derivatives based on non-linear regression analysis.
• Designed an application for protein phase determination
based on a maximum-likelihood algorithm.
Maximum-likelihood refinement of protein structures.
THE UNIVERSITY OF CONNECTICUT, Storrs, CT 1991-1999
Postdoctoral Fellow, Molecular & Cell Biology Department
• Worked as a senior developer; designed an application based on the OS/390 architecture.
• Unix/Linux administrator working with the Protein Data Bank (PDB).
Structure validation and quality assessment of PDB entries: MolProbity, PROCHECK,
ProSA-web, WHAT_CHECK, WHAT IF.
BAYLOR COLLEGE OF MEDICINE, Houston, TX 1990-1991
Visiting Assistant Professor
• Worked as a senior developer in Fortran; designed an application for binary tape conversion.
Converted and formatted data from IBM mainframe tapes for use on PC or UNIX computers.
EDUCATION
SHEMYAKIN INSTITUTE OF BIOORGANIC CHEMISTRY OF THE U.S.S.R. ACADEMY OF SCIENCES, Moscow,
Russia (Ph.D. in Physics and Mathematics).
INSTITUTE OF PROTEIN OF THE U.S.S.R. ACADEMY OF SCIENCES, Pushchino-na-Oke,
Russia (Pre-graduate Student Trainee).
LENINGRAD STATE UNIVERSITY, Department of Physics, Leningrad,
Russia (Diploma – Master’s equivalent – in Physics).
TECHNICAL SKILLS
Programming Languages/Software/Technology Platforms: Hadoop – Big Data analysis; MapReduce; Data Science;
Java, Python, Perl, Fortran; C-Shell, bash; Windows.
Operating Systems: Linux/Ubuntu/SuSE/Red Hat, UNIX/IRIX, OS/390, Windows 7, RDOS, DOS.
REFERENCES
Available upon request.