Computational Biologist Data Analysis

Location: Jersey City, NJ

Posted: June 03, 2025


Resume:

NIKITA TOMAR

Senior Computational Biologist · scRNA-seq · Database Development · Molecular Dynamics

******@**.*** · 848-***-**** · NYC

SUMMARY

Computational biologist with 3+ years of experience in bioinformatics and genomics, specializing in database engineering, single-cell RNA sequencing (scRNA-seq), somatic mutation analysis, and molecular modeling. Adept at building automated pipelines, integrating clinical and multi-omics data, and extracting biological insights from large-scale datasets.

SKILLS & ABILITIES

• Programming & Scripting: Python, R, Bash, SQL, Django

• Data Analysis & Visualization: Pandas, Seaborn, ggplot2, dplyr, Seurat, Tidyverse, Limma, RShiny

• Tools & Pipelines: Bioconductor, STAR, BWA, IGV, Clustal, MUSCLE, BLAST, Bowtie, SAMtools, bedtools, T-Coffee, DAVID, GSEA, IPA, PLINK

• Databases & Repositories: GenBank, gnomAD, KEGG, MSigDB, UK Biobank, UCSC Genome Browser, ENCODE, TCGA, GEO, SwissProt, UniProt, PDB, SwissADME, ExPASy, NCBI

• Statistical & ML Methods: PCA, K-means, K-NN, Regression models, Hierarchical clustering, Decision trees

• Software: GenomeStudio, IGV (Genomics & Visualization), WEKA, DBeaver (ML and SQL), AutoDock, SwissDock, MOE, BIOVIA Discovery Studio Visualizer, PyMOL (Molecular Docking and Drug Discovery), CellOrganizer (Cell Modeling)

• Cloud & Workflow: Jupyter Notebook, WDL, Nextflow, Docker, AWS, SLURM

EXPERIENCE

Staff Bioinformatician, CPMG, Columbia University Irving Medical Center, NYC, Sep 2022 – Sep 2024

Main Projects:

CNVDB: Custom CNV Database for Chronic Kidney Disease (CKD) Cohorts

Django · Python · SQL · D3.js · Docker · Nginx · SNP Arrays · Clinical Genomics

• Built a full-stack Django platform to manage CNV data from 28,000 cases and 22,000 controls genotyped via SNP arrays for CKD studies.

• Developed a pipeline to ingest and standardize Excel-based CNV annotations, storing structured data in SQL for real-time query and gene/region-level filtering.

• Created an interactive frontend with D3.js to visualize CNV overlap, frequency, and pathogenicity, enabling gene-specific and syndrome-specific insights.

• Added user-uploaded CNV search for assessing control cohort frequency and overlap with 119 known genomic disorder regions (curated from DECIPHER, ISCA, literature).

• Enabled discovery of 728 pathogenic CNVs across 601 individuals and compared CNV burden in cases vs. controls, streamlining variant curation of WES/WGS-based CNV findings.

scRNA-seq: Computational Analysis of B Cell Activation and Clonal Expansion in IgA Nephropathy (IgAN)

Cellranger · Scrublet · DESeq2 · SeuratV3 · LIANA · Monocle · edgeR · Shazam

• Investigated B cell activation and clonal expansion in IgA nephropathy (IgAN), utilizing LIANA for cell-cell interaction analysis, Shazam for gene expression normalization, and Scrublet for doublet detection.

• Optimized and automated computational pipelines integrating Cellranger for raw data processing, Seurat for clustering, differential gene expression (DEG) analysis, and dimensionality reduction, and Cellranger vdj for analyzing B cell receptor sequences to identify clonally expanded B cell populations.

• Leveraged Monocle to perform trajectory analysis, tracking the developmental pathways of B cell subsets and their activation states to explore their role in the immune dysregulation associated with IgAN.

• Identified activated B cells producing defective IgA1 with O-glycosylation defects (Gd-IgA1), contributing to immune complex deposition in IgAN, and provided computational insights into immune dysregulation underlying the disease.

• Applied advanced bioinformatics tools to uncover transcriptional signatures of immune cell populations, enabling better understanding of IgAN pathogenesis and identification of potential therapeutic targets based on scRNA-seq data.

Somatic Mutation Analysis for Clinical Correlation of CHIP Variants in CKD Patients

Mutect2 · gnomAD · R · Regression Analysis

• Contributed to a large-scale analysis of 2,187 whole-exome sequencing (WES) samples from a CKD cohort to investigate the prevalence and clinical impact of clonal hematopoiesis of indeterminate potential (CHIP), a somatic condition linked to aging and inflammation.

• Utilized the Mutect2 pipeline by treating CKD samples as case/tumor and using gnomAD as a population control to identify somatic variants associated with CHIP-related genes.

• Designed downstream correlation analyses integrating clinical metadata (e.g., medication history, immunosuppressant use, comorbidities) to assess the influence of therapeutics on clonal expansion and their potential contribution to CKD progression.

*Publications at CUIMC

Research Assistant, Translational Research Laboratory, DPU, Pune, Jan - Dec 2019

AutoDock Vina · MOE-Dock · BIOVIA Discovery Studio Visualizer · ADMET · PyMOL

• Conceptualized a natural 3D Bio-printer ink with red sandalwood for the formation of an artificial vascular scaffold.

• Executed protein modeling, active site prediction, molecular docking, model visualization and ADMET property analysis for various cancer receptors.

• Performed molecular docking of red sandalwood’s active compounds with 5 different cancer pathway proteins and HPBCD using MOE-Dock and AutoDock Vina to assess their binding affinity for drug design.

• Analyzed datasets to determine best binding affinities and interaction residues between red sandalwood and cancer proteins.

• Presented thesis research at the RAMBB 2019 conference.

EDUCATION

Master of Science in Bioinformatics, Boston University, Boston, MA, USA, 2022. GPA: 3.90

Received merit-based scholarship of $15,000 at Boston University.

B. Tech Medical Biotechnology, Dr. D. Y. Patil University (DPU), Pune, India, 2019. Graduated with 9.09/10 CGPA

Honored with monetary scholarship for performance in degree program by DPU. Member of Translational Research Laboratory, DPU, India.

Boston University, Boston, MA

ACADEMIC PROJECTS

Biological Database: Mutational Accumulation Data Aggregation, Data Engineer, Mar - May 2022

• Spearheaded the development of an open-source web interface that relocates and pools raw data files, cross-references features, and performs statistical aggregation to easily visualize queries against experimental data and automate workflows.

• Built a database focused on identifying C to A mutational frequency in yeast strains containing msh2 mutations, using Python and SQL, with the DNA Replication Fidelity Group, NIEHS.

Analysis of Gene Expression Profile & Prediction of Phenotypes for Brain Aging & Insulin Resistance Dataset, Mar - May 2022

• Analyzed datasets collected from different patients to understand the phenotypic enrichment in each, implementing PCA followed by K-means clustering in Python and checking which phenotypes cluster well with the data.

• Correlated gene expression with phenotypes and used the expression of highly correlated genes to build a decision tree model in Python that predicted phenotypes of the testing data with 90% accuracy.

Transcriptional Profile of Mammalian Cardiac Regeneration with mRNA-Seq, Analyst, Feb - Mar 2022

• Quantified gene expression with Cufflinks and identified differentially expressed genes associated with myocyte differentiation.

• Clustered gene sets into subgroups using DAVID functional annotation tool to replicate the findings of O’Meara et al.

Microarray Based Tumor Classification, Programmer, Jan - Feb 2022

• Implemented the RMA algorithm to normalize the microarray data to replicate the findings of Marisa et al.

• Computed standard quality control metrics on the normalized data and visualized the distribution of samples applying Principal Component Analysis (PCA).

Analysis of Intra-tumoral LUAD Cell Groups scRNA-seq Transcriptome Data to Identify DEGs, Sep - Dec 2021

• Reproduced the findings of Kim et al. to understand the biological complexity of the gene expression profiles of each intra-tumoral LUAD cell group collected from different patients, by performing preprocessing, filtering, PCA, and K-means clustering of PCA data using Seurat.

• Extended the findings with further preprocessing and downstream analysis: PCA for clustering, the Wilcoxon rank-sum test (WRST) for DEGs, and gene set enrichment analysis against KEGG, GO, and cancer hallmark gene sets using the GSEABase Bioconductor package.

PUBLICATIONS

• Ma, B.M., Elefant, N., Tedesco, M., Bogyo, K., Vena, N., Murthy, S.K., Bheda, S.A., Yang, S., Tomar, N., et al. (2024). Developing a genetic panel for post-transplant kidney morbidity. Kidney International, 106(1), 115–125. https://doi.org/10.1016/j.kint.2024.02.021

Co-author. Contributed to bioinformatic analysis supporting panel design for post-transplant morbidity risk stratification.

• Krishna Murthy, S.B., Yang, S., Bheda, S., Tomar, N., et al. (2024). Assisting the analysis of insertions and deletions using regional allele frequencies. Functional & Integrative Genomics, 24, 104. https://doi.org/10.1007/s10142-024-01358-3

Co-author. Contributed to genomic data analysis and development of tools for variant interpretation using regional allele frequencies.

• Milo Rasouly, H., Krishna Murthy, S.B., Vena, N., Povysil, G., Tomar, N., et al. (2024). Exome-wide analysis of congenital kidney anomalies reveals new genes and shared architecture with developmental disorders. medRxiv. https://doi.org/10.1101/2024.11.05.24316672

Co-author. Performed exome sequencing analysis and supported gene discovery and genotype-phenotype correlation in congenital kidney anomaly cohorts.
