Machine Learning Big Data

Location:
Boston, MA
Posted:
September 18, 2023

Resume:

YASH YADAV

+1-913-***-**** adzsl5@r.postjobfree.com LinkedIn: yash01 GitHub: yash85763

EDUCATION

Northeastern University Massachusetts, US

Master of Professional Studies in Applied Machine Intelligence 04/2022 - 12/2023 (Expected)

ML Algorithms, Data Mining, Data Management and Big Data, Applications of AI, GAN, Statistics, Healthcare Information Processing, AI System Technologies, Deep Learning, MapReduce, GNN, NLP, Elasticsearch, DynamoDB, RDS

Kurukshetra University Haryana, India

Master of Computer Application 05/2018 - 05/2021

Spark, Hadoop, Big Data, UML/C++, Artificial Intelligence, Algorithms, Advanced Database Systems, MySQL, C/C++, Linux

Kurukshetra University Haryana, India

Bachelor of Science 05/2015 - 05/2018

Probability, Statistics, C language, PL/SQL, C++, Networking Architecture, CI/CD process, Physics, Applied Mathematics

EXPERIENCE

BOUVÉ COLLEGE OF HEALTH SCIENCES Boston, MA

R Lab Instructor 09/2023 - Present

• Collaborating with faculty from the Bouvé College of Health Sciences to ensure course material is aligned with academic standards, industry trends and genomic therapeutic research in Public Health.

• Conducting hands-on sessions, enabling students to apply statistical methods, data visualization techniques, and hypothesis testing using R.

ITERATIVE HEALTH Cambridge, MA

Data Science Researcher 04/2023 - 07/2023

• Revamped IBD patient classification using MLOps in collaboration with the Scientific Team; reduced errors and saved time at IH by processing biomedical endoscopy images and biomedical text data using a web crawler and ML algorithms.

• Optimized NLP algorithms, achieving a 70% error reduction and a 25% efficiency boost using BERT in PyTorch and TensorFlow.

• Presented results to non-technical stakeholders through Plotly visualizations, leading to an $800,000 R&D budget increase.

INTERGLOBE TECHNOLOGY QUOTIENT Haryana, India

Machine Learning Intern 01/2022 - 04/2022

• Led a high-performing team to create real-time flight tracking software; deployed an ML pipeline with Python, a REST API, AWS EKS, and SageMaker for ITQ's edge application (built with Dash), using automation tools like Terraform and Jenkins.

• Applied Agile Scrum, boosted pipeline performance (~78%), and improved system stability with Git and CI/CD, saving $1.3M+.

TECHNICAL KNOWLEDGE

• Languages: Python, R, Julia, SQL, Cypher Query Language, C/C++, Scala, PySpark

• Frameworks: AWS EKS, Azure, Kubernetes, Keras, Jenkins, Terraform, PyTorch, Docker, PostgreSQL, RDKit, SageMaker

• Libraries: NumPy, Pandas, Keras, TensorFlow, Seaborn, Plotly, PyTorch, MedSpaCy, RDKit, Tidyverse, scikit-learn, Dask

RELEVANT PROJECTS

BRAIN TUMOR DETECTION USING DEEP LEARNING Boston, MA

Northeastern University; Computer Vision, Deep Learning, Machine Learning, Python 01/2023 - 04/2023

• Led the inception of an open-source platform incorporating 30,000 biomedical MRI images (~100 GB of real-world data) to build and train a model for brain tumor detection and classification, integrating high-performance computing on a P100 GPU.

• Tuned hyperparameters to refine a deep CNN for computer vision, reaching 98% precision, with the potential to save over $500 million in healthcare expenditure.

INFLUENCE OF POLLUTANTS IN CIRCULATORY AND RESPIRATORY DISEASES Boston, MA

Northeastern University; Machine Learning, R, Python, SQL, Data Science & Analysis, Statistics 10/2022 - 04/2023

• Analyzed CDC mortality data and US EPA pollutant data to assess the impact of CO and PM10 on respiratory and circulatory diseases, with the aim of mitigating the effects of emissions; organized a local SQL database for efficient querying.

• Performed extensive statistical analysis using data transformations and trained non-parametric models, achieving 98% accuracy in forecasting respiratory diseases and 99% for circulatory diseases; delivered results through Power BI and R.

ETL PIPELINE OF UNITED STATES PRESCRIBERS Boston, MA

Northeastern University; Big Data, Apache Spark, Hadoop, Python, SQL 05/2022 - 09/2022

• Designed a robust ETL pipeline covering all doctors in the US (~20 GB of real-world data), featuring production-grade Python code with unit tests developed in PyCharm; used data extraction and integration tools such as PDI/Kettle or Apache.

• Used big data technologies to configure data with LSF by connecting Postgres JVM with Hive; used Docker containerization to build a fault-tolerant pipeline, orchestrated with Kubernetes on AWS EKS and Luigi.

• Connected to a remote SLURM instance on GCP via SSH, navigated the file system with Bash commands, used the UNIX permission system, and persisted the final report in Azure Storage and AWS S3 buckets.


