SHALIN KAJI
Dallas, TX | **********@*****.***
linkedin.com/in/shalinkaji | +1-857-***-****
EDUCATION
THE UNIVERSITY OF TEXAS AT DALLAS, Richardson, TX Jan 2023 – May 2024
Master of Science, Major: Computer Science (Data Science Concentration); GPA: 3.65/4.0
(Coursework: Big Data Management & Analytics, Artificial Intelligence, Natural Language Processing)
GUJARAT TECHNOLOGICAL UNIVERSITY, Ahmedabad, GJ Jun 2018 – Jun 2022
Bachelor of Engineering, Major: Computer Engineering; GPA: 3.96/4.0
(Coursework: Discrete Mathematics, Machine Learning and Predictive Modelling, Software Engineering, Cybersecurity)
EXPERIENCE
DATA ENGINEER INTERN – DREAMLINE AI Dallas, TX Jun 2024 – Present
• Collaborated with Data Science and ML teams to build end-to-end ML pipelines for production models, optimizing data ingestion and feature engineering on an energy-resources dataset with PySpark and Scala on distributed platforms (AWS SageMaker, Databricks), reducing model training time by 40%.
• Developed and deployed scalable LLM-based models for NLP tasks, integrating fine-tuned GPT models for document classification and summarization, leading to a 20% increase in automation efficiency across key business processes.
GRADUATE TEACHING ASSISTANT – NSM UT DALLAS Richardson, TX Jan 2024 – May 2024
• Led instruction of PHY 2126 Electricity & Magnetism labs at NSM UTD for 60+ sophomores, facilitating 14 comprehensive experiments over 100 lab hours.
CS OUTREACH INSTRUCTOR – UT DALLAS Dallas, TX Jan 2023 – Jun 2023
• Taught Python, SQL, and R to high school students, improving their coding efficiency by 67% on HackerRank.
DATA SCIENCE INTERN – WISSENAIRE IIT-BHUBANESHWAR Ahmedabad, GJ Jan 2022 – Jun 2022
• Designed and implemented predictive models using XGBoost, Random Forest, and Gradient Boosting for housing price predictions, achieving an 87% model accuracy.
• Developed a time series forecasting model using ARIMA for housing market trends, incorporating LSTM networks to improve predictive accuracy by 10%.
• Containerized and deployed ML models using Docker, reducing operational latency by 30%, and automated model retraining workflows using AWS Lambda.
TECHNICAL SKILLS
Programming Languages: Java, C++, Python, Scala, PySpark, SQL
Machine Learning: TensorFlow, PyTorch, Scikit-learn, XGBoost, LLM Fine-tuning (GPT, BERT), Random Forest, Neural Networks, Time Series Forecasting
MLOps & Automation: AWS SageMaker, Azure ML, Kubeflow, Docker, Kubernetes, CI/CD (Jenkins, GitLab)
Data Engineering & Pipelines: Apache Spark, Kafka, Hadoop, Databricks
Cloud Platforms: AWS (EC2, S3, Lambda), Azure, GCP
Software Development: SDLC, Object-Oriented Design, Agile Methodologies
PROJECTS
• ConfineDoc (LLM-Based Document Processing System): Built with Python, PyTorch, Hugging Face, LangChain, and AWS SageMaker; leverages a fine-tuned GPT-3 model for entity extraction, summarization, and document classification. Used AWS SageMaker for model training and LangChain to manage large datasets efficiently. Reduced processing time by 40% and error rates by 25%.
• DirtyDollar-Sniffer: Engineered a fraud detection algorithm for a fintech client, analyzing 500,000 daily transaction records to detect anomalies in real time. Used Random Forest and Gradient Boosting classifiers to identify fraudulent behavior with 95% accuracy while reducing false positives by 20%. Deployed on GCP with Docker containerization; the solution issued instant alerts and reduced fraud losses by 35% within the first quarter.
• Customer Sentiment Analysis for E-commerce: Fine-tuned BERT to extract product-specific insights, providing actionable data that improved customer satisfaction scores by 18% and targeted marketing campaign performance by 25%. Hosted on AWS Comprehend with data processing on Databricks.
ACHIEVEMENTS
State-level powerlifter with an 840 lb total at 165 lb bodyweight. Classic Physique bodybuilder. State-level FIDE chess player.