
Data Science Analytics

Location:
Manhattan, NY, 10176
Salary:
70000
Posted:
July 27, 2025


Resume:

SHRUTI BALAJI

https://github.com/ShrutiBalaji https://www.linkedin.com/in/shrutibalaji17/ **************@*****.***

508-***-**** Boston, MA

EDUCATION

Master of Science, Data Science

University of Massachusetts, Dartmouth, USA (2023 - 2025)

Bachelor of Technology, Bioinformatics

SASTRA Deemed University, Thanjavur, India (2019 - 2023)

PROFESSIONAL EXPERIENCE

DATA SCIENCE LAB RESEARCHER (October 2023 – May 2025)

Multi-scale Medical Robotics Lab at UMassD

Designed a computer vision model based on the Faster R-CNN architecture to detect tumors and segment breast cancer regions in DICOM mammography (MG modality) images. Extracted regions of interest from each image using bounding boxes taken from annotation files. Extended the project to train a multimodal fusion algorithm across multiple imaging modalities.
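A minimal sketch of the region-of-interest step, assuming each annotation gives a pixel box as (x_min, y_min, x_max, y_max) with exclusive maxima and that the image is a plain 2-D grid of intensities. The `crop_roi` helper and the toy image are hypothetical illustrations, not the lab's code, and no DICOM parsing is shown.

```python
from typing import List, Tuple

Image = List[List[int]]          # assumed: 2-D grayscale pixel grid (rows of columns)
Box = Tuple[int, int, int, int]  # assumed: (x_min, y_min, x_max, y_max), max exclusive


def crop_roi(image: Image, box: Box) -> Image:
    """Return the sub-image covered by the bounding box, clamped to the image."""
    x_min, y_min, x_max, y_max = box
    height, width = len(image), len(image[0])
    # Clamp the box so an annotation that spills past the border cannot fail.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# Toy 5x6 "image" whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(5)]
roi = crop_roi(image, (1, 2, 4, 4))
```

Here `roi` holds rows 2 and 3, columns 1 through 3 of the toy image.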

DATA ANALYTICS INTERN (June 2024 – August 2024)

Oceantic Network, Boston, MA

Cleaned, manipulated, analyzed, and visualized historical data to give business insights on key performance indicators, such as the percentage increase in job numbers over the past few years, the probability of an increase in upcoming years, market trends, and the percentage increase in skilled and unskilled laborers.

Conducted time series forecasting and predictive analysis on employment trends in the offshore wind industry using ARIMA models, predicting a 7% increase in job opportunities by 2025 based on seasonal patterns and government policy influence.

SOFTWARE DEVELOPER TRAINEE/INTERN (February 2022 – March 2022)

Psiog Digital, Chennai, India

Completed a 5-month software development and data analytics training program at Psiog, designed to simulate real-world data team environments and project workflows. Worked on mock project implementations focused on web development and business intelligence, contributing to the creation of interactive dashboards using Power BI.

Gained practical experience in C# through systems programming and multi-threading exercises, aligned with production-level coding standards. Developed front-end components using React and performed database operations in MySQL, while exploring cloud-based development using Microsoft Azure.

DATA ANALYTICS INTERN (February 2022 – March 2022)

Rela Multi-specialty Hospital, Chennai, India

Designed and developed an interactive dashboard from the hospital's call center data to provide actionable business insights, including key performance indicators (KPIs) such as call volume trends, average handling time, the most in-demand department, availability of doctors in each department, and average time taken to find appointment slots. The dashboard enabled real-time monitoring, data-driven decision-making, and operational efficiency improvements across customer support teams.

Loaded the tabular data into Tableau, created calculated fields, chose visualization types (line chart, bar chart, heat map, etc.), and organized the metrics to surface data insights. Also used R for simpler statistical modeling and business requirements.

MACHINE LEARNING INTERN (February 2021 – February 2023)

Computational Lab at SASTRA Deemed University, Remote

1. Breast Cancer Stage Classification and Segmentation

Developed and trained deep learning models including a 10-layer CNN, ResNet50, and Faster R-CNN for multi-class cancer stage classification and tumor detection on DICOM mammograms.

Achieved 94.6% accuracy with CNN and optimized model performance via parallelized training on 64-core Carnie (UMassD supercomputer), reducing training time by 15%. Engineered region-based segmentation using bounding boxes derived from annotation files for precise tumor localization. Extended the pipeline by integrating multimodal fusion techniques to enhance diagnostic accuracy across multiple imaging modalities.

2. Screening of compounds for inhibitors

Developed a Random Forest binary classification model to screen for inhibitors and non-inhibitors.

Performed data balancing with SMOTE and feature selection, and compared dimensionality reduction with PCA and LDA, achieving 87% accuracy with PCA. The model predicted inhibitors for in-silico analysis from a database containing hundreds of thousands of compounds, saving significant time by screening the compounds 90 times faster.

PERSONAL PROJECTS ON GEN AI AND LLMs

Conversational Chatbot for Course Recommendations (March 2025 – Present)

Developing a real-time conversational chatbot for the UMass Dartmouth website that accepts student resumes, analyzes their skills and strengths, identifies skill gaps, and recommends relevant university courses to address those gaps.

The chatbot uses the LangChain framework for integration, OpenAI embeddings, a FAISS vector database, and GPT-4 as the LLM, forming a Retrieval-Augmented Generation (RAG) stack for an intelligent, personalized course recommendation system.
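A minimal sketch of the retrieval step in such a RAG stack, using only the standard library. Word-count vectors and a linear cosine-similarity scan stand in for the OpenAI embeddings and the FAISS index, so this illustrates the idea rather than the project's code. The course titles and the `embed`, `cosine`, and `retrieve` helpers are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the query (linear scan, no index)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical course catalogue entries, made up for illustration.
courses = [
    "introduction to databases and sql",
    "deep learning for computer vision",
    "statistics for data science",
]
top = retrieve("student lacks sql and databases experience", courses, k=1)
```

In a full pipeline the retrieved entries would be placed in the LLM prompt as context for the recommendation.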

Financial data streaming and processing using stock prices (Sept 2024 – Dec 2024)

Designed a data ingestion and processing pipeline using historical stock data and simulated a real-time streaming environment using Kafka, mimicking live data ingestion and processing pipelines for financial market data. Created Python scripts to establish a producer-consumer architecture for data ingestion and processing through Kafka brokers.
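A minimal sketch of the producer-consumer pattern described above, with an in-process `queue.Queue` standing in for a Kafka topic so it runs without a broker. The tick records, the `DEMO` symbol, and the `producer` and `consumer` functions are hypothetical, not the project's scripts.

```python
import json
import queue
import threading

SENTINEL = None  # marks the end of the stream in this toy setup


def producer(ticks: list, topic: queue.Queue) -> None:
    """Replay historical records one by one, as if they arrived live."""
    for tick in ticks:
        topic.put(json.dumps(tick))  # serialize, as a real producer would
    topic.put(SENTINEL)


def consumer(topic: queue.Queue, results: list) -> None:
    """Read messages until the sentinel arrives and collect the decoded records."""
    while True:
        message = topic.get()
        if message is SENTINEL:
            break
        results.append(json.loads(message))


# Made-up historical prices for illustration only.
historical_ticks = [
    {"symbol": "DEMO", "close": 101.5},
    {"symbol": "DEMO", "close": 102.25},
    {"symbol": "DEMO", "close": 100.75},
]

topic = queue.Queue()
results = []
threads = [
    threading.Thread(target=producer, args=(historical_ticks, topic)),
    threading.Thread(target=consumer, args=(topic, results)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With a single producer and a single consumer, the queue preserves message order, so `results` ends up matching the replayed records.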

Integrated IAM roles for security management and stored data in S3 buckets for downstream analytics. Built ETL jobs in AWS Glue that extract the JSON data, transform it into a structured format, and load it back into S3. Used Athena to analyze the data and derive insights with SQL queries.

Building a Nano LLM for text generation – GenAI and NLP (January 2024 - May 2024)

Engineered an n-gram Nano Language Model using transformers with character tokenization, which learns from user-provided text and generates subsequent words similar to the provided data. The trained model is pickled so it can be reloaded later for text generation.
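A minimal sketch of a character-level n-gram generator with pickled state, assuming a plain count-based model. It deliberately omits the transformer component mentioned above and is not the project's code. The `train` and `generate` functions and the toy corpus are hypothetical.

```python
import pickle
import random
from collections import Counter, defaultdict


def train(text: str, n: int = 3) -> dict:
    """Count which character follows each (n-1)-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, next_char = text[i:i + n - 1], text[i + n - 1]
        counts[context][next_char] += 1
    return {context: dict(followers) for context, followers in counts.items()}


def generate(model: dict, seed: str, length: int, rng: random.Random) -> str:
    """Extend the seed one character at a time, sampling by observed frequency."""
    context_len = len(next(iter(model)))
    out = seed
    for _ in range(length):
        followers = model.get(out[-context_len:])
        if not followers:  # unseen context: stop instead of guessing
            break
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


corpus = "the cat sat on the mat. the cat ate."
model = train(corpus, n=3)

# Round-trip through pickle, mirroring "train once, generate later".
restored = pickle.loads(pickle.dumps(model))
sample = generate(restored, "th", 20, random.Random(0))
```

Every three-character window of `sample` occurs somewhere in the corpus, which is the defining property of a trigram generator.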

CERTIFICATION

MLOps Specialization offered by Duke University, Data Science course offered by IIT Madras, Technical Support Fundamentals offered by Google, The Bits and Bytes of Computer Networking offered by Google, Operating System offered by Google, Web Programming by edX, Python Boot Camp offered by Udemy.

SKILLS

Languages: Python (NumPy, Pandas, Streamlit), R (Shiny), C, C++, C#, Java

AI/ML concepts: Machine Learning models (Scikit-learn), Deep learning frameworks (TensorFlow, PyTorch, TorchVision, Hugging Face Transformers), Predictive Modelling (Time Series Forecasting, Regressions), Fine-tuning Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) stack, Transformers, Topic Modelling (LDA, BERT), Natural Language Processing (NLP), Computer Vision (YOLO, CNN, R-CNN, Faster R-CNN), Audio Processing (TTS and voice cloning using LLasa 3B)

Cloud Computing and Deployment: AWS (Kafka, Glue, Athena, EMR serverless, QuickSight, SageMaker), Docker

Other Tools: MS Office (Excel, PowerPoint), Jira

Databases / DBMS: MySQL, PostgreSQL, MongoDB

Data Visualization: Tableau, Matplotlib, Seaborn, Plotpy, ggplot2

CUDA, AutoGPT, Linear Algebra, Data Structures & Algorithms, Full-stack Development, High-Performance Computing, Statistics, Git

Web Development: HTML, CSS, JavaScript, ReactJS, PHP


