KALYAN SRIVASTAVA, PhD
*******@*****.*** 805-***-**** linkedin.com/in/kalyansrivastava/ github.com/kalsriv Work Authorization – US Citizen
PROFESSIONAL SUMMARY
Proficient in data science and engineering, machine learning, and statistics, with a proven track record across industry, contract research organizations, and university institutes. Dedicated to advancing data science applications, AI/ML, data models, statistical pipelines, and visualization tools to deliver effective business and scientific solutions. Practical experience in essential programming languages including R, Python, SQL, and SAS. Demonstrated leadership and collaborative abilities within diverse teams. Strong advocate for positive work ethics, growth mentality, regulatory compliance, talent development, and continued learning.
WORK EXPERIENCE
NTT Data – Senior Data Engineer (Statistical Data Analysis) Location: Remote Duration: 11/2023 onward
• Provided support to client Genospace for daily SDTM activity for 60 domains using R programming for 3 different studies
• Working with domains such as the Oncology Therapeutic Area domains (TU, TR, RS, using RECIST criteria), AE, LB, DM, and RELREC for 3 different projects
• Creating internal R libraries to access and manipulate the raw data before transformation.
• Providing assistance with code conversion from SAS to R programming, and regression testing the outcome.
• Developing code for visualization of ADaM and SDTM data using RShiny
• Leveraging Large Language Models to support CDISC data engineering tasks and data quality assurance, enabling efficient data standardization and CDISC compliance
Beckman Coulter Inc. – Data Science Field Engineer (Cytobank Machine Learning SaaS) Duration: 06/2022 onward
• Provided support to about 350 users of a machine learning solution for biological datasets by answering questions and troubleshooting both supervised and unsupervised machine learning algorithms, such as CUDA-backed t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Self-Organizing Map, as well as anomaly detection-based QC and neural network-based autogating.
• Supported the development team on 2 major algorithms and the addition of 8 new features by scripting, testing, and implementing. Additionally, provided recommendations on 120 new features based on user requests, market research, and various board suggestions.
• Provided R and Python program-based support to over 20 users who wished to create analytical pipelines that call API functions from inside Cytobank.
• Agile project management using Zendesk, JIRA, ServiceMax for new feature development and integration. Increased the team KPI by 10% by monitoring and ensuring compliance.
• Interacting with the cloud services team for smooth provisioning of resources for users and transfer of their data using data warehouse and ETL processes
• Created 20 articles for user education on machine learning algorithms and published them using Zendesk
Atos Inc. – Senior Data Engineer and Project Manager Client: Daiichi Sankyo Industries Duration: 05/2021 to 05/2022
• Data engineering using R program and SAS to convert the SDTM dataset to ADaM and ADaM dataset to TLF for FDA submission – successful submission of over 30 datasets. The engineering steps were supported by an Extract, Transform, Load (ETL) process for data that was stored on cloud platforms like AWS and GCP.
• Proactively tested the consistency of SDTM data using R, before collaborating with the data management team to discuss follow-up measures related to the data engineering above.
• Assisting the biostatisticians with basic statistical analysis (t-test, ANOVA, prop/z test, A/B test, regression analysis) using base R. Assisting with additional machine learning models such as SVM, logistic regression, RF, and XGBoost.
• Performed data wrangling before ML, steps such as feature engineering (addition, deletion, combination, mutation, selection of columns), transformation, null imputations, scaling, one hot encoding, data splitting, model selection etc.
• Tested and provisioned cloud-based AI platforms such as SageMaker and DataRobot for simultaneous development of 20 classical and neural network-based models, for predictive analytics of both business and pharma datasets.
• Developed a data analysis pipeline using Python, to engineer the cancer therapy datasets obtained as clinical data and gene expression data.
• Data visualization of TLF reports using RShiny, PowerBI, and Tableau. Agile project management for a team of 10 engineers using tools such as JIRA, Zenhub, and Asana Pages.
ARUP Laboratories Inc – Data Scientist - Duration: 01/2018 to 05/2021 – Salt Lake City UT
• Development of an unsupervised machine learning pipeline in R using algorithms such as KNN, tSNE, UMAP, and X-shift analysis for flow data, leading to 10 biomarkers.
• Using base R and packages such as dplyr, ggplot2, forecast, survival, caret, and admiral to perform biostatistical data analysis such as ANOVA, Z-test, prop-test, t-test, and correlation analysis. Visualization and preparation of publication-quality images using RShiny and ggplot2.
• Development of an analytical pipeline using the API version of Cytobank, an AWS-dependent SaaS, with RStudio for dimensionality reduction analysis. Testing and improving the Python-based flow data analysis for UMAP as a reproducible and scalable approach for CyTOF data analysis.
• Creating reference intervals for 500 rare and common immunological cells using traditional gating in combination with the EP Evaluator program and R statistical programming.
• Performing data analysis and target validation support for internal RNA-Seq and flow data, using open-source resources such as kallisto, the Galaxy web tool, DESeq2 and edgeR, the cytofkit, flowCore, and CytoCompare libraries, and Biopython-based tools developed by the computational biology unit
• Querying and analyzing large datasets hosted in publicly available and cancer genome repositories such as cBioPortal, TCGA, RefSeq, and dbSNP to identify 2 novel biomarkers.
Health Research Inc. – Sponsors: RPCI, Buffalo – Data Scientist (Data Science and Acquisition) - Duration: 09/2016 - 01/2018 – Buffalo NY
• Developing a data analysis pipeline using R/Bioconductor, base R, and cloud-based programs for solution mode CyTOF data analysis, rapid data clustering, and data cleaning (debarcoding, normalization, and randomization). Exploration of alternative data visualization approaches such as R/ggplot2, Tableau, and Microsoft Power BI
• Beta testing Hyperion IMC equipment for high image data. Testing 3 different digital image analysis software packages for IMC data analysis and visualization
• Developing an R/Bioconductor-based approach for phenotypical analysis of regions of interest in data obtained from tissue microarrays
• Working with cloud-based analytical tools such as Cytobank for cluster analysis using SPADE, CITRUS, and tSNE on deep phenotyping of data obtained from high-definition flow cytometry, CyTOF, and Immunogenetics (RNA-Seq) experiments
• Analysis of lab-generated RNA-Seq data using DESeq2 for identification of target proteins and subsequent development of antibody panels for CyTOF and multi-color flow cytometry-based proteomic data analysis
PREVIOUS WORK EXPERIENCE
• Weizmann Institute of Science, Israel: Worked as an intern on the successful identification of novel genes and expression patterns using computational approaches (2001 – 2002)
• International Center for Genetic Engineering and Biotechnology, Trieste: Worked as an intern and associate in projects related to vaccine development using data mining and sequence alignment (2002 – 2005)
• The Johns Hopkins Medical Institute, Baltimore MD and Aab CVRI, URMC: Worked as a fellow and faculty member in projects related to biomedical research, preclinical studies, and clinical trials, focusing predominantly on data mining, computational biology, and sequence alignment. 15 peer-reviewed research articles on the subject matter, with a combined 1000 citations (2005 – 2014)
• New York Blood Center Inc – Clinical Trial Manager, Location: New York City/Remote, Duration: 07/2014 - 06/2016: Managing Phase IV clinical trials for the Oncology TA involving patients’ whole blood, and TLF activity for clinical data using tools such as SAS and R statistical programming on SDTM datasets.
EDUCATION
Bachelor of Science: Mathematics and Chemistry - Institute of Science, BHU – India 1996
Master of Science and PhD: Biochemistry - Institute of Science and Institute of Medical Sciences, BHU – India 2002
SKILLS
R/RStudio, Python, SAS, SQL, AWS, UNIX/Bash, SLURM, Docker, Finance, Marketing/Business Tools, Statistics, Genetics and Sequencing, GDPR and HIPAA
CERTIFICATES
o Professional Certificates on Classical Machine Learning for Financial Engineering: Using sklearn/Python for machine learning - NYU Certificates
o AWS Cloud Practitioner Essentials (Certificate) - Amazon Web Services (AWS)
o Prompt Engineering and Advanced ChatGPT - edX
o Introduction to R for Data Science (Certificate) – Microsoft
o Visualizing Data with Python (Certificate) – IBM
o SAS Programming: Essentials (Certificate) – SAS Academy for Data Science
o Agile and Scrum Fundamentals (Certificate) - IBM
o Power BI for Financial Data Analysis (A Project Based Certification) - Coursera
o Data Science for Healthcare Claims Data (Certificate) - Udemy Business