Data Analyst Machine Learning

Location:

Wixom, MI

Salary:

73000

Posted:

November 03, 2024


Resume:

Rajkumar Conjeevaram Mohan

***** ******* **, **** *****, Wixom, MI - 48393 +1-202-***-**** **********.****@*****.***

SUMMARY

Data Scientist with 3+ years of experience developing scalable machine learning solutions and optimizing production-level models to deliver measurable business impact. Expert in Python, R, and advanced ML algorithms such as XGBoost and BERT for predictive modeling and Natural Language Processing. Proficient in handling large datasets, streamlining workflows in cloud environments (AWS, GCP), and driving cross-functional collaboration to solve complex business problems. Proven ability to reduce processing time by up to 40% and improve sales forecasting accuracy by 15%. Data Science graduate (GPA: 3.91) dedicated to leveraging cutting-edge AI/ML technologies to deliver reliable, efficient, and innovative solutions.

EDUCATION

The George Washington University, Washington, DC Dec 2023

Master of Science, Data Science (GPA: 3.91)

Relevant courses: Machine Learning, Time Series, Natural Language Processing, Deep Learning

Imperial College London, Greater London, UK Nov 2017

Master of Science in Computing (Specialism in Artificial Intelligence)

Relevant courses: Advanced Statistical Machine Learning, Intelligent Data Analysis

University of Liverpool, Liverpool, UK July 2013

Bachelor of Science with Honors in Computer Information Systems (GPA: 4.0)

Degree Classification: First Class Honors

TECHNICAL SKILLS

Programming Languages - Python, R, Scala, Java, and JavaScript

Python libraries - Seaborn, TensorFlow, PyTorch, Scikit-Learn, Nibabel, PySpark, SpaCy, NLTK, HuggingFace, Pandas, Plotly, NumPy, Matplotlib

R libraries - ggplot, ggraph, visNetwork, caret, dplyr

Big data - Apache Spark (Scala), Apache Hadoop, PySpark, Databricks

Machine Learning - Linear Regression, Logistic Regression, Polynomial Regression, Decision Tree, Ensemble Model, Artificial Neural Network, Generalized Linear Models, Principal Component Analysis, Linear Discriminant Analysis, Gaussian Mixture Model, KNN, K-Means, Hidden Markov Model, Support Vector Machine, AdaBoost, Gradient Boosting, XGBoost

Statistical Testing - Hypothesis testing using t-test, z-test, ANOVA, AOV, Chi-Squared Test (Goodness of Fit, Independence Test), A/B testing

Time Series Analysis and Forecasting - Autocorrelation, Partial Autocorrelation, Generalized Partial Autocorrelation, ARIMA, SARIMA, SARIMAX

Deep Learning - CNN, RNN, BiLSTM, LSTM, GRU, Residual Network, AutoEncoders, Large Language Model (LLM), T5 Transformer, GPT-3, BERT, MLP-Mixer, RoBERTa, Vision Transformer

Natural Language Processing - Text Normalization, Stemming, Lemmatization, Tokenization, TF-IDF, Latent Semantic Analysis, Sentiment Analysis, Part of Speech Tagging, Named Entity Recognition, Transformer Models, Probabilistic Models

Data Mining - BeautifulSoup, Selenium

Database - MySQL, MongoDB, Neo4j, Apache Hive, MS-Excel

Cloud Computing - AWS (EC2, Load Balancer, AutoScaling, S3, VPC, Elastic IP), Google Cloud Platform (Compute, Google Sheets, BigQuery, LookerStudio, DataProc)

Operating System - Windows, Macintosh, Linux (Ubuntu)

Others - Docker, Git, Analytics, Algorithms, Statistics, Calculus, Linear Algebra, data pipelines, model deployment, cloud-based ML, DevOps for ML

WORK EXPERIENCE

Software Engineer May 2024 - Oct 2024

Inten IT Solutions

● Trained a BERT model on large volumes of warranty-claim text data to identify the components that fail most often for the company.

● Reduced data retrieval time by almost 40% by migrating from an on-premises relational database to Google BigQuery and tuning performance with clustering, partitioning, and caching of frequently used queries.

● Trained deep learning models with PyTorch and used Python extensively throughout the role.

● Turned the client's user manual PDF documents into text embeddings and created a search tool using GCP AlloyDB, enabling the client (a manufacturer) to instantly look up relevant information.

Short-Term Consultant Aug 2023 – Dec 2023

World Bank Group

● Developed and implemented a semantic segmentation model using PyTorch to detect vehicles in satellite images with 90% precision, improving estimates of the logistics vehicle count at the border.

● Accurate estimation of logistics assets at the border increased trust among the trading parties of the countries involved.
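For readers unfamiliar with the precision figure quoted above, the following is a minimal illustrative sketch of pixel-wise precision for a binary segmentation mask. It is not the project's model or evaluation code; the function name `pixel_precision` and the toy masks are assumptions made for this example.

```python
def pixel_precision(predicted, truth):
    """Pixel-wise precision, TP / (TP + FP), for two same-shaped 2D binary masks."""
    true_positives = 0
    false_positives = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p == 1 and t == 1:
                true_positives += 1   # predicted vehicle, really a vehicle
            elif p == 1 and t == 0:
                false_positives += 1  # predicted vehicle, actually background
    predicted_positives = true_positives + false_positives
    # With no predicted positives precision is undefined; 0.0 is returned here
    # as a convention chosen for this sketch.
    return true_positives / predicted_positives if predicted_positives else 0.0


# Toy masks (hypothetical data): 1 = vehicle pixel, 0 = background.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
truth_mask = [[1, 0, 0],
              [0, 1, 1]]

print(pixel_precision(predicted_mask, truth_mask))
```

In the toy masks, two of the three pixels predicted as vehicle are correct, so the precision is 2/3.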

Technical Support Specialist III June 2022 – Jan 2023

The George Washington University – Department of Political Science

● Handled graph data with complex connections and created different types of interactive visualizations that helped the researchers visualize the network of militant organizations across the globe.

● Created a timeline-based hierarchical plot showing how a group evolved over time, i.e., when it splintered or merged, which allowed researchers to better understand group behavior by associating it with private information.

● Used centrality metrics from graph theory to highlight organizations with significant influence on the overall network, allowing the researchers to spot extremists in the large graph with ease.

Data Scientist Nov 2017 – Jan 2020

Briggs & Stratton

● Identified patterns in customer purchasing habits and product preferences, and turned data into actionable recommendations that increased sales by 15%.

● Employed advanced analytics, such as a Bayesian causal inference model, to determine the cause of premature piston ring failure in the company's engines and helped the business fix the problem.

● Analyzed historical sales data to identify trends and patterns, built a quarterly forecast that accurately predicted demand, and recommended optimized inventory levels based on the results.

● Analyzed data from various sources (e.g., sensors, logs) to detect anomalies or trends that could indicate quality issues in production processes or vehicle performance.

● Built predictive models using various machine learning tools to predict the possibility of equipment failure.

TECHNICAL PROJECTS

Neural Machine Translation (English – Czech) Deep Learning Project 2024

● Translated text from English to Czech, achieving a METEOR score of 0.856.

● Used SentencePiece, a language-agnostic tokenizer, to efficiently tokenize the corpus and address problems with rare and out-of-vocabulary words.

● Implemented GPT-3, a large language model (LLM), from scratch using PyTorch.

Brain Tumor 3D Segmentation Deep Learning Project 2023

● Identified tumorous cells in the brain, achieving a Jaccard (IoU) score of 80%.

● Used PyTorch to train a U-Net, a semantic segmentation model, with 3D convolutions for precise localization of tumorous cells in the brain.
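To make the Jaccard (IoU) metric quoted above concrete, here is a minimal illustrative sketch that scores a predicted 3D binary volume against a ground-truth volume. It is not the project's code; the function name `jaccard_iou` and the toy 2x2x2 volumes are assumptions made for this example.

```python
def jaccard_iou(predicted, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| for two same-shaped 3D binary volumes."""
    intersection = 0
    union = 0
    for pred_slice, true_slice in zip(predicted, truth):
        for pred_row, true_row in zip(pred_slice, true_slice):
            for p, t in zip(pred_row, true_row):
                if p and t:
                    intersection += 1  # voxel marked as tumor in both volumes
                if p or t:
                    union += 1         # voxel marked as tumor in at least one
    # Two empty masks agree perfectly; 1.0 is a convention chosen for this sketch.
    return intersection / union if union else 1.0


# Toy volumes (hypothetical data): 1 = tumor voxel, 0 = healthy tissue.
predicted_volume = [[[1, 1], [0, 0]],
                    [[1, 0], [0, 0]]]
truth_volume = [[[1, 0], [0, 0]],
                [[1, 1], [0, 0]]]

print(jaccard_iou(predicted_volume, truth_volume))
```

Here two voxels are marked in both volumes and four are marked in at least one, so the score is 0.5.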

US Air Pollution Prediction and Forecast Time Series Forecasting 2022

● Forecasted the CO Air Quality Index, achieving an R² score of 68.11%.

● Converted the non-stationary signal into a stationary one by applying log and difference transformations.

● Used Generalized Partial Autocorrelation to uncover the order of the Autoregressive and Moving Average processes that generated the data.

● Used these orders with the Levenberg–Marquardt algorithm to estimate the coefficients of the process for forecasting.

Credit Card Default Data Analysis 2022

● Classified bank clients who are likely to default on their next credit card bill, achieving an F1-score of 81.3%.

● Used R with the `caret` package to train a decision tree whose rules are interpretable and enable easy decision-making.

Skills/Traits/Topics

Virtual Private Cloud, Data & ML Pipeline, Server Administration - Ubuntu, Project Management, Natural Language Processing (NLP), Generative AI, Large Language Models (LLM), Machine Learning, Data Science, Data Analytics, Probabilistic Graphical Models, Big Data & Analytics, Distributed Training, Data Wrangling, NoSQL, Data Management, Data Structures, Data Visualization, Artificial Neural Networks (ANN), Deep Learning, Transfer Learning, Reinforcement Learning, Evolutionary Algorithms, Time Series Analysis, Decision Trees, Ensemble Models, Boosting algorithms, Geographic Information Survey, Rasterization, DNA transcription, Gene Clustering, Statistical Analysis

Platforms/Tools/IDEs

Docker, Databricks, Google Cloud Platform (GCP), Amazon Web Services (AWS), GCP BigQuery, Vertex AI, GCP Compute Engine, AWS EC2, AWS S3 bucket, VsCode, PyCharm Professional, Jupyter Notebook, GitHub & GitLab, TensorFlow, Keras, PyTorch, Tableau, R Studio, Eclipse IDE for Java Development, QGIS, MongoDB, Neo4j, Apache Hive

Langs/Libs - SDKs

Python (Advanced), R Programming Language (Advanced), PySpark (Advanced), JavaScript (Medium), JSON (Medium), PHP (Some), Java (Some), .Net (Some), HTML (Medium), CSS (Medium), Spark (Medium), Hadoop, Ubuntu, Scikit-Learn, Pandas, Numpy, Matplotlib, Plotly, Seaborn, NLTK, Spacy, HuggingFace, visNetwork, ggplot, caret, dplyr, ggraph, Nibabel, BeautifulSoup, Selenium
