Machine Learning Data Scientist

Location:
Los Angeles, CA
Posted:
December 18, 2023

Resume:

JAYANTRAJ COIMBATORE SELVAKUMAR

***-***-2151 jcoimbat@usc.edu www.linkedin.com/in/JayantRajCS github.com/jayantraj

Versatile Computer Science Graduate with a strong foundation in math, machine learning, & coding.

EDUCATION

University of Southern California (USC) Dec 2022

Master of Science in Computer Science GPA: 4.0/4.0

Honors: Phi Kappa Phi (Awarded to Top 1%)

PSG College of Technology, India May 2019

Bachelor of Engineering in Computer Science GPA: 9.13/10.00

Indian Institute of Science, India Dec 2020

Key Courses: Data Structures, Algorithms, AI, NLP, Machine Learning, Probability, Stats, Linear Algebra, Database Systems (also the TA for 3 semesters), Mathematics for Machine Learning, Data Mining, Deep Learning, Calculus

EXPERIENCE

JP Morgan Chase & Co Jun 2022–Aug 2022

Data Scientist Intern Python (PyTorch, Pandas), LLM, SQL, Power BI, Tableau New York, NY

• Leveraged ML techniques (Transformers, BERT, and GPT-3) and applied quantitative analysis to enable personalized fund recommendations for clients, generating a profit greater than $2M in one month.

• Created an EDGAR parser using BeautifulSoup and Scrapy that significantly improved efficiency from 60% to 99%.

• Developed AI-powered decision-making tools for investment strategy formulation, implemented A/B testing to determine the most effective strategies, and visualized the results using Power BI and Tableau.

• Employed advanced SQL and performed hypothesis testing on the Chase Bank and FactSet datasets to identify spam/payment fraud and other anomalies.

USC Oct 2021–Present

Research Assistant Python (TensorFlow, PySpark), AWS, GCP, Airflow, SQL, SAS, LLM Los Angeles, CA

• Implemented highly scalable text/sentiment classification methods on a vast dataset (Size >100 TB) of digitized newspapers spanning over 400 years, extracting valuable insights to predict current stock market trends using Python, SAS, and SQL.

• Used advanced statistical models to perform OCR correction and increase the legibility of poorly digitized texts.

• Optimized Big Data processing by creating ETL pipelines for ML model training on large-scale infrastructure using PySpark, AWS, and Apache Airflow, drastically reducing execution time from 125 days to 4 days and cost from $240K to $3K.

USC Apr 2021–Oct 2021

Research Assistant Python (PyTorch, OpenCV), C++, MongoDB Los Angeles, CA

• Analyzed the flight patterns of Drosophila flies possessing Green Fluorescent Protein (GFP) by tracking their 3D movements. Predicting flight patterns enabled us to identify their lifespan with 87% accuracy.

• Implemented time-series LSTM and CNN models using PyTorch, integrating with MongoDB to handle large-scale data.

IBM Jan 2019–Jul 2019

Machine Learning Intern Python (TensorFlow, Scikit-learn), Apache Kafka Bengaluru, India

• Developed an automated system that utilizes Data Mining and Clustering techniques to extract failure records from multiple firmware defects. The system incorporates decision trees and regression-based algorithms to predict failure rates.

• Achieved an exceptional 80% reduction in defect screening and fixing time, significantly improving efficiency and productivity.

PROJECTS

Impact of Social Media Engagement on Athlete Performance

Leveraged NLP and inferential statistical techniques, including DistilBERT, Transformers, and Large Language Models (LLM), for polarity detection in athletes' social media discourse, elucidating the correlation between pre-match mood and performance outcomes.

Disentangling Brain Activity from EEG Data

Led a team of five peers in utilizing ML models, including RNN, to predict an individual’s understanding capability by monitoring brain activity. Used TensorFlow to accurately identify whether test subjects had a strong grasp of the recommended concepts.

Cancer Image Detection Using Convolutional Neural Networks

Constructed a CNN-based classification model to accurately detect cancer cells in patients’ tissue images. Achieved an accuracy over 80%, demonstrating the model’s effectiveness in aiding cancer detection and diagnosis.

SKILLS

• Programming Languages: Python, C, C++, Java, R, SQL, Shell Scripts

• AI & ML Tools: TensorFlow, Keras, PyTorch, MXNet, HuggingFace

• Big Data & Analytics Tools: Hadoop, Docker, Apache Airflow, PySpark, Statsmodels, Pandas, NumPy, NLTK, SciPy, A/B testing

• Visualization & Others: Tableau, Matplotlib, R Shiny, ggplot, Plotly, AWS, S3, SageMaker, Git, JIRA, Linux, MATLAB, Scikit-learn, Gensim, spaCy, BeautifulSoup, Scrapy


