Data Scientist

Location:
New York City, NY
Posted:
April 01, 2025

Resume:

Komal Sharma

New York, USA +1-551-***-**** ******@***.*** https://www.linkedin.com/in/komal-1902/ https://github.com/komal-1902

Work Experience

Assistant Research Scientist, NYU (Dept of Environmental Studies) Jun 2024 – Present

•Constructed a publicly available database of 100K+ research papers from 10 high-impact climate journals using data mining for large-scale climate change research.

•Developed an NLP & LLM-powered pipeline to extract geographical entities and climate themes (impacts, mitigations, etc.) from research papers with 92% accuracy and 89% precision.

•Identified a 4x research disparity using geospatial analysis and statistical significance testing, revealing systematic inequity in climate research between Annex-1 and non-Annex-1 countries.

Data Scientist / Engineer, Citigroup Inc. Jul 2018 – Jul 2024

•Strengthened financial risk controls by 20% by detecting market abuse (e.g., spoofing, money laundering) using regression models and PySpark.

•Decreased false positives in insider trading detection by 40% using unsupervised learning and RFM-based customer segmentation.

•Reduced ~60 hrs/week manual effort by automating client feedback analysis with Named Entity Recognition (NER) & Sentiment Analysis.

•Strategized the migration of Citi's Client Coverage data to a Big Data environment, optimizing ETL pipelines with HiveQL and Spark to improve performance by 45%.

•Mentored a cohort of 7-10 female technologists annually in transitioning into data science careers through workshops and one-on-one sessions.

Skills

Tools and Languages:

Python, Java, R, SQL, SQL Server, PySpark, NoSQL, Hadoop, Hive, MongoDB, Elasticsearch, Tableau, scikit-learn, TensorFlow, PyTorch, Keras, LangChain, OpenAI, Llama, Git

Technical Expertise:

Quantitative Analysis, Statistics, Machine Learning, Natural Language Processing, Causal Inference, Big Data Analytics, ETL, Data Visualization, Deep Learning, Generative AI, Predictive Analytics, A/B Testing, Multi-AI Agents

Education

New York University, Master's in Data Science Sep 2022 – May 2024

•Relevant Coursework: Machine Learning, Natural Language Processing, Big Data, Probability & Statistics, Data Visualization

•Paper Presentation: Contextual Bandit Algorithm for News Article Recommendation and Optimization

•Leadership: Vice President, NYU Graduate Student Community Building Group

University of Pune, Bachelor's in Computer Science Jun 2014 – May 2018

•Relevant Coursework: Databases, Natural Language Processing, Algorithms, Data Structures, High Performance Computing

Academic Projects

Finance News Q&A with RAG, LLMs, Information Retrieval Feb 2025

•Designed a financial market research tool leveraging RAG, LLMs, and vector search for fast information retrieval.

•Used Gemini API embeddings and ChromaDB to build a hybrid lexical-vector search pipeline for financial news articles.

•Implemented BM25 + dense retrieval to boost relevant document retrieval, reaching 78% accuracy on a 100-question test dataset.

Structuring Emerging Taxonomies in CMS Tags, Unsupervised Learning, NLP Mar 2024

•Classified 700+ grassroots solutions for the UN Development Program (UNDP) CMS using a multi-level agglomerative clustering pipeline.

•Leveraged Sentence-BERT, Topic Modeling and TF-IDF to improve semantic search and enhance query relevance for grassroots solutions.

•Improved searchability and query result relevance, achieving an 80% similarity score with the enhanced pipeline.

Prediction of Default in Banking, Machine Learning, Financial Risk Modelling Dec 2023

•Analyzed 1.2M loan borrower financial statements to identify key features influencing credit risk.

•Performed data cleaning, correlation analysis, data transformation and feature engineering to improve model performance.

•Predicted probability of default with an AUC score of 85% using a Random Forest model and time-series cross-validation.

HushUp, Supervised Learning, Natural Language Processing (Patent Filed) Apr 2018

•Developed an emotion-based offensive audio classification system using Naive Bayes and Linear Discriminant Analysis.

•Classified audio offensiveness with an F1-score of 92%, decreasing false positives by 34%.

•Curated an automatically updating dictionary using the TF-IDF algorithm to detect the evolving universe of slur words.

Certifications

Generative AI (Coursera, DeepLearning.AI - Andrew Ng)

Multi AI Agent Systems with crewAI (DeepLearning.AI - CrewAI)

Financial Markets Valuation Analyst (Corporate Finance Institute)

Capital Markets and Securities Analyst (Corporate Finance Institute)


