Machine Learning Computer Science

Location:

Tempe, AZ

Posted:

May 06, 2024

Resume:

Shivanshu Verma

+1-571-***-**** • ************@*****.*** • linkedin.com/in/Shivanshu156 • github.com/Shivanshu156

EDUCATION

M.S. Computer Science Expected May 2024

Arizona State University, Tempe, AZ 4.17 /4.0 GPA

Courses: Machine Learning, NLP, Artificial Intelligence, Data Mining, Cloud Computing, Data Visualization

B. Tech and M. Tech (Dual Degree) Engineering, Minor Degree in Computer Science May 2020

Indian Institute of Technology Delhi 7.7 /10.0 GPA

Courses: Machine Learning, Database Management System, Operating Systems, Software Engineering, DSA

TECHNICAL SKILLS

Programming Languages: Python, Java, JavaScript, C/C++

Cloud and Databases: Hadoop, HBase, PostgreSQL, MySQL, SQL, Amazon Web Services, Docker

Libraries: PyTorch, TensorFlow, Stable Baselines, Scikit, Hugging Face, RL4LMs, Pandas, NumPy, D3.js

Certifications: Deep Learning Specialization – Coursera

PROFESSIONAL EXPERIENCE

AAIR Lab, Arizona State University, Tempe, AZ: Graduate Services Assistant 08/2023 – Present

Working as a research student under Prof. Siddharth Srivastava, Associate Professor, Arizona State University, Tempe

● Developed a novel Conditional Action and State Abstraction Tree approach that constructs dynamic abstractions over continuous states and actions, making reinforcement learning generalizable, scalable, sample efficient, and interpretable.

● Achieved an interpretable RL solution and 20% better sample efficiency than SOTA approaches including DDPG, PPO, and SAC.

Standard Chartered Global Business Services, Bangalore, KA: Software Engineer 8/2020 - 7/2022

• Developed an end-to-end trade processing application using Java, Spring Boot, Hadoop, HBase, SQL, JavaScript, and Python.

• Optimized data storage by combining an HBase database for dynamic data with Oracle DB for static data.

• Scaled the application to achieve a 10-fold increase in the number of trades processed per day, with around 2 million trades.

Schlumberger, Pune, MH: Data Science Intern 5/2019 - 7/2019

● Developed a semi-supervised text classification model to automate the extraction of useful information from large contracts.

● Applied various topic modelling and clustering algorithms, including LDA, Lda2vec, K-Means, Hierarchical Clustering, and GMMs, to a dataset of 3.5 million documents.

● Mapped topics to contract clauses using supervised algorithms (Neural Networks), achieving 0.93 validation accuracy and reducing human resources workload by 50%.

PUBLICATIONS

Triple Preferences Optimization (TPO) (in submission) ACL'24

● Developed a new preference learning method, Triple Preferences Optimization, that aligns LLMs with three preferences without requiring a separate supervised fine-tuning step, by jointly optimizing preferences and gold standard responses.

● Beat SOTA methods such as DPO by 4% and SFT by 4.7% on average on the MT-Bench and Open LLM Leaderboard benchmarks.

Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks (in submission) ACL'24

● Investigated the performance of DPO and its variants CPO, KTO, and IPO on 13 benchmarks, including MT-Bench, BigBench, and Open LLM Leaderboard, across three scenarios using SFT, Pretrained, and Instruction Tuned models as base models.

● Analysed the impact of supervised fine-tuning, instruction tuning, and dataset size on the efficacy of alignment methods.

Learning Generalizable Symbolic Options for Transfer in Reinforcement Learning GenPlan, NeurIPS'23

Rashmeet Kaur Nayyar, Shivanshu Verma, Siddharth Srivastava

● Developed a novel top-down, domain-independent approach, COPlanLearn, to tackle the sample inefficiency of state-of-the-art methods in Transfer Reinforcement Learning, achieving 3 times better sample efficiency than the Option-Critic approach.

● Surpassed SOTA methods when transferring learning from smaller to larger domains and from less cluttered to more cluttered domains.

RESEARCH AND PROJECTS

Enhancing Logical Reasoning in Large Language Models with Reinforcement Learning 8/2023 – 12/2023

● Used Chain of Thought prompting with RL to improve the logical reasoning ability of Large Language Models using RL4LMs.

● Implemented custom reward functions that reward the model for correct chain of thought reasoning, and achieved better validation accuracy with Flan-T5 base, Llama, and Mistral models on a context and reasoning problems dataset.

Symbolic Differentiation using Deep Learning 8/2022 – 10/2022

● Solved differentiation of mathematical expressions with respect to a given variable using deep learning techniques.

● Used sequential models, including RNNs and LSTMs, to predict the derivative of mathematical expression strings with respect to the given variable; trained on a dataset of 1M samples and achieved 99% validation accuracy with LSTMs.
