
Machine Learning Data Science

Location:
Salt Lake City, UT
Posted:
December 11, 2023


Resume:

KALYAN SRIVASTAVA

Email: ad1vzv@r.postjobfree.com Mobile: 805-***-****

github.com/kalsriv

CAREER SUMMARY

Expertise in data engineering, data science, machine learning, statistics and computational biology. Worked in leading-edge industry, contract research organizations and university institutes. Passionate about developing AI/ML models, statistical analysis pipelines and visualization modules that provide decisive answers. Hands-on experience and certification in key programming languages including R, Python, SQL and SAS. Demonstrated leadership, supervisory and collaborative competencies within diverse groups. Strong proponent of a positive work ethic, regulatory compliance, talent promotion and continuous learning.

Work Authorization: United States Citizen

EXPERIENCE LIST

Employer: Beckman Coulter – Senior Application Scientist (Cytobank Machine Learning Service) Duration: 06/2022 onward

• Supporting users who work on biological data sets with basic analytics and machine learning solutions, answering questions and troubleshooting through the ticket process

• Supporting the development team in scripting, testing and implementing new machine learning and statistical analysis features. Providing recommendations on new features based on user requests, market research and suggestions from various research boards

• Delivering analytical solutions with both supervised and unsupervised machine learning algorithms such as computer vision based autogating, CUDA-backed t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP) and Self-Organizing Maps (SOM)

• Providing R-based API support to users who wish to create analytical pipelines that call functions inside Cytobank

• Guiding customers in developing analytical and AI/ML pipelines for flow cytometry data analysis using RStudio, Jupyter notebooks and markdown files

• Agile project management using Zendesk, JIRA and ServiceMax for new feature development and integration. Measuring team KPIs and enforcing remedial steps for improvement

• Interacting with the cloud services team to provision resources for users and transfer their data using an ETL process

• Creating user education articles on machine learning algorithms and publishing them through Zendesk

• Testing newly released R and Python library-based machine learning algorithms for flow cytometry and RNA-Seq

Employer: Atos Inc. – Sponsor: Daiichi Sankyo Industries – Senior Data Engineer/Project Manager Duration: 05/2021 to 05/2022

• Data engineering using an Extract, Transform, Load (ETL) approach for raw biomarker big data hosted on an S3 bucket. Developing and testing data pipelines to engineer cancer therapy datasets obtained as clinical data and gene expression data.

• Generating and updating R scripts for engineering unstructured ADaM and SDTM data in accordance with CDISC data submission guidelines. Testing the consistency of raw biomarker and clinical data against the master file to identify data issues before sponsors submit it to the FDA. Generating configuration files for consistency checks based on requirement gathering, and performing QC using Python.

• Assisting sponsors with basic analytics such as statistical analysis (t-test, ANOVA, proportion/z-test, A/B test, regression analysis) using base R statistical functions and the Python numpy and math libraries. Assisting with additional machine learning approaches such as SVM, logistic regression, random forest (RF) and XGBoost. Performing data preparation before ML steps, such as feature engineering (addition, deletion, combination, mutation and selection of columns), feature transformation, scaling, one-hot encoding, data splitting and model selection.

• Using AWS cloud technology, viz. SageMaker, to prototype and assess in-house AI/ML systems (developed with the sklearn, PyTorch and caret libraries and DataRobot) intended to give product offerings new advanced capabilities for predictive analytics and model development from clinical datasets.

• Using SAS-SQL to view and extract data from the electronic data capture system for downstream engineering. Analyzing genomic and sequencing datasets after engineering their corresponding datasets. Data visualization using Power BI, Tableau, ggplot2, R Shiny and matplotlib. Generating TLF reports and interpreting the outcomes.

• Querying and analyzing large datasets hosted on publicly available domains and cancer genome repositories such as cBioPortal, TCGA, Sage Bionetworks, RefSeq, dbSNP, ClinVar and GEO. Adapting the code provided by these domains, e.g. cBioPortal, in Docker containers to implement and distribute these services locally.

• Agile project management using tools such as Atlassian JIRA orchestration layer, ZenHub and Asana Pages. Assigning ingested data to team members, updating actively assigned data, and communicating with reviewers and stakeholders about delivery status. Training and onboarding new members for data engineering tasks. Guiding and mentoring the team on immunology, R and Python scripts.

Employer: ARUP Laboratories - Scientist III - Duration: 01/2018 to 05/2021 – Salt Lake City UT

• Development of analytical pipelines for scientific data in R, using dimensionality reduction and unsupervised machine learning methods such as tSNE, UMAP and X-shift analysis. Data visualization using R libraries such as ggplot2, plotly and R Shiny, as well as Tableau.

• Predictive data analytics using R libraries such as caret and tidymodels, Python libraries such as sklearn, and enterprise-level cloud-based services such as SageMaker and DataRobot. Using the models to predict disease trajectory.

• Using base R code and specialized libraries such as Rcmdr for biostatistical data analysis such as ANOVA, z-test, proportion test, t-test and correlation analysis. Visualization and preparation of publication-quality images using ggplot2, and web applications built with Shiny.

• Development of analytical pipelines in RStudio using the API of Cytobank, an AWS-dependent cloud-based service, for dimensionality reduction analysis. Testing and improving Python-based flow data analysis for viSNE and UMAP as a reproducible and scalable approach to CyTOF data analysis.

• Execution of data acquisition, randomization, normalization and filtering of data from donor samples. Creating reference intervals for rare and common immunological cells using traditional gating in combination with the EP Evaluator program and R statistical programming.

• Performing data analysis and target validation support for internal RNA-Seq and CyTOF data using open-source resources such as kallisto, the Galaxy web tool, DESeq2, edgeR, cytofkit, flowCore and CytoCompare, along with Biopython-based tools developed by the computational biology unit.

• Project management using tools such as Atlassian JIRA orchestration layer. Performing additional administrator functions such as budgeting, forecasting & strategic planning, business intelligence and exploration of potential revenue generation.

Employer: Health Research Inc. – Sponsors: RPCI, Buffalo - HRI Scientist (Data Science and Acquisition) - Duration: 09/2016 - 01/2018 – Buffalo, NY

• Developing data analysis pipelines using R/Bioconductor, base R and cloud-based programs for solution mode CyTOF data analysis, rapid data clustering and data cleaning (debarcoding, normalization and randomization). Exploring alternative data visualization approaches such as R/ggplot2, Tableau and Microsoft BI.

• Beta testing Hyperion IMC equipment for multidimensional image data. Testing three different digital image analysis software packages for IMC data analysis and visualization. Developing an R/Bioconductor-based approach for phenotypical analysis of regions of interest in data obtained from tissue microarrays.

• Working with cloud-based analytical tools such as Cytobank for cluster analysis using SPADE, CITRUS and tSNE for deep phenotyping of data obtained from high-definition flow cytometry, CyTOF and immunogenetics (RNA-Seq) experiments.

• Analysis of lab-generated RNA-Seq data using DESeq2 to identify target proteins and subsequently develop antibody panels for CyTOF and multicolor flow cytometry based proteomic data analysis.

Employer: New York Blood Center Inc – Clinical Trial Expert - Duration: 07/2014 - 06/2016 – Upper Manhattan, NY

• Managing Phase III clinical trials (sponsored by Cerus Biotechnology) to study proprietary platelet storage for hematological cancer patients. Also managing Phase IV clinical trials (sponsored by MSKCC, New York) to investigate the inflammatory response to the drug Plerixafor (Mozobil) using multicolor flow cytometry on patients' whole blood. Analyzing flow data using open-source R-based software and FlowJo to produce basic biostatistical answers and predictive models as required by sponsors.

• Data analytics for clinical data from electronic data capture sheets using tools such as SAS, R and Excel-VBA. Analysis of graphical, time series and survival data using R-based programs developed by the team and shared by other investigators.

• Immunogenetics (RNA-Seq) based identification of miRNAs that regulate the cellular function involved in producing platelets for transfusion in patients. Established open-source R/Bioconductor and proprietary data analytic pipelines for analysis of RNA-Seq data obtained from the Illumina platform.

Employers: The Johns Hopkins Medical Institute, Baltimore MD; Aab CVRI, URMC, Rochester NY - Duration: 11/2004 - 07/2014

• Worked as a scientist and faculty member on projects related to biomedical research, preclinical studies and clinical trials, focusing predominantly on data mining, computational biology and sequence alignment.

• Projects involved wet biology lab work, data analysis using proprietary software, data analytics using Perl and R, data mining, bioinformatics, molecular modeling and biostatistics.

Employers: International Center for Genetic Engineering and Biotechnology, Trieste Italy; Weizmann Institute of Science, Israel - Duration: 08/2001 - 05/2004

• Worked as an intern and associate on projects related to vaccine development using data mining and sequence alignment.

EDUCATION

Bachelor of Science: Mathematics and Chemistry - Institute of Science, BHU – India

Master of Science and PhD: Biochemistry - Institute of Science and Institute of Medical Sciences, BHU – India 2002

CERTIFICATES

o Professional Certificate on Classical Machine Learning for Financial Engineering: Using sklearn/Python for machine learning. May 2021

o AWS Cloud Practitioner Essentials (Certificate) - Amazon Web Services (AWS) June 2022

o Agile and Scrum Fundamentals (Certificate) - IBM - July 2022

o Power BI for Financial Data Analysis (A Project Based Certification) - Coursera - June 2022

o Introduction to R for Data Science (Certificate) – Microsoft 2017

o Visualizing Data with Python (Certificate) – IBM August 2019

o Biostatistics using R (Certificate) - DoaneX July 2020

o SAS Programming: Essentials (Certificate) – SAS Academy for Data Science Dec 2021

o Introduction to Genomic Technologies (Certificate) - Johns Hopkins Medicine April 26, 2020


