Rajkumar Conjeevaram Mohan
***** ******* **, **** *****, Wixom, MI - 48393 +1-202-***-**** **********.****@*****.***
SUMMARY
Data Scientist with 3+ years of experience developing scalable machine learning solutions and optimizing production-level models to deliver measurable business impact. Expert in Python, R, and advanced ML algorithms such as XGBoost and BERT for predictive modeling and Natural Language Processing. Proficient in handling large datasets, streamlining workflows in cloud environments (AWS, GCP), and driving cross-functional collaboration to solve complex business problems. Proven ability to reduce processing time by up to 40% and improve sales forecasting accuracy by 15%. Data Science graduate (GPA: 3.91) dedicated to leveraging cutting-edge AI/ML technologies to deliver reliable, efficient, and innovative solutions.
EDUCATION
The George Washington University, Washington, DC Dec 2023
Master of Science, Data Science (GPA: 3.91)
Relevant courses: Machine Learning, Time Series, Natural Language Processing, Deep Learning
Imperial College London, Greater London, UK Nov 2017
Master of Science in Computing (Specialism in Artificial Intelligence)
Relevant courses: Advanced Statistical Machine Learning, Intelligent Data Analysis
University of Liverpool, Liverpool, UK July 2013
Bachelor of Science with Honors in Computer Information Systems (GPA: 4.0)
Degree Classification: First Class Honors
TECHNICAL SKILLS
Programming Languages - Python, R, Scala, Java, and JavaScript
Python libraries - TensorFlow, PyTorch, Scikit-Learn, PySpark, spaCy, NLTK, HuggingFace, Pandas, NumPy, Matplotlib, Seaborn, Plotly, Nibabel
R libraries - ggplot, ggraph, visNetwork, caret, dplyr
Big data - Apache Spark (Scala, PySpark), Apache Hadoop, Databricks
Machine Learning - Linear Regression, Logistic Regression, Polynomial Regression, Decision Tree, Ensemble Model, Artificial Neural Network, Generalized Linear Models, Principal Component Analysis, Linear Discriminant Analysis, Gaussian Mixture Model, KNN, K-Means, Hidden Markov Model, Support Vector Machine, AdaBoost, Gradient Boosting, XGBoost
Statistical Testing - Hypothesis testing using t-test, z-test, ANOVA, Chi-Squared Test (Goodness of Fit, Independence), A/B testing
Time Series Analysis and Forecasting - Autocorrelation, Partial Autocorrelation, Generalized Partial Autocorrelation, ARIMA, SARIMA, SARIMAX
Deep Learning - CNN, RNN, BiLSTM, LSTM, GRU, Residual Network, AutoEncoders, Large Language Model (LLM), T5 Transformer, GPT-3, BERT, MLP-Mixer, RoBERTa, Vision Transformer
Natural Language Processing - Text Normalization, Stemming, Lemmatization, Tokenization, TF-IDF, Latent Semantic Analysis, Sentiment Analysis, Part of Speech Tagging, Named Entity Recognition, Transformer Models, Probabilistic Models
Data Mining - BeautifulSoup, Selenium
Database - MySQL, MongoDB, Neo4j, Apache Hive, MS-Excel
Cloud Computing - AWS (EC2, Load Balancer, AutoScaling, S3, VPC, Elastic IP), Google Cloud Platform (Compute, Google Sheets, BigQuery, Looker Studio, Dataproc)
Operating System - Windows, macOS, Linux (Ubuntu)
Others - Docker, Git, Analytics, Algorithms, Statistics, Calculus, Linear Algebra, data pipelines, model deployment, cloud-based ML, DevOps for ML
WORK EXPERIENCE
Software Engineer May 2024 - Oct 2024
Inten IT Solutions
● Trained a BERT model on large volumes of warranty-claim text data to identify the components that fail most often.
● Reduced data retrieval time by almost 40% by migrating from an on-premises relational database to Google BigQuery and enabling clustering, partitioning, and caching for frequently used queries.
● Trained deep learning models with PyTorch and used Python extensively at the firm.
● Converted the client's user-manual PDF documents into text embeddings and built a search tool on GCP AlloyDB, enabling the client (a manufacturer) to look up relevant information instantly.
Short-Term Consultant Aug 2023 – Dec 2023
World Bank Group
● Developed and implemented a semantic segmentation model using PyTorch to detect vehicles in satellite images with 90% precision, improving the estimate of the logistics vehicle count at the border.
● Accurate estimation of logistics assets at the border increased trust among the trading parties in the countries involved.
Technical Support Specialist III June 2022 – Jan 2023
The George Washington University – Department of Political Science
● Handled graph data with complex connections and created different types of interactive visualizations that helped the researchers visualize the network of militant organizations across the globe.
● Created a timeline-based hierarchical plot showing how a group evolved over time, i.e., patterns of splintering and merging at particular points, which allowed researchers to better understand group behavior by associating these patterns with private information.
● Used centrality metrics from graph theory to highlight organizations with significant influence on the overall network, allowing the researchers to spot extremists in the large graph with ease.
Data Scientist Nov 2017 – Jan 2020
Briggs & Stratton
● Identified patterns in customer purchasing habits and product preferences, and turned data into actionable recommendations that increased sales by 15%.
● Employed advanced analytics, such as a Bayesian causal inference model, to determine the cause of premature piston ring failure in the company's engines and helped the business fix the problem.
● Analyzed historical sales data to identify trends and patterns, built a quarterly forecast that accurately predicted demand, and recommended optimized inventory levels based on the results.
● Analyzed data from various sources (e.g., sensors, logs) to detect anomalies or trends that could indicate quality issues in production processes or vehicle performance.
● Built predictive models using various machine learning tools to predict the likelihood of equipment failure.
TECHNICAL PROJECTS
Neural Machine Translation (English – Czech) Deep Learning Project 2024
● Translated text from English to Czech, achieving a METEOR score of 0.856.
● Used SentencePiece, a language-agnostic tokenizer, to tokenize the corpus efficiently and address problems with rare and out-of-vocabulary words.
● Wrote the code implementing GPT-3, a large language model (LLM), from scratch using PyTorch.
Brain Tumor 3D Segmentation Deep Learning Project 2023
● Identified tumorous cells in the brain with a Jaccard (IoU) score of 80%.
● Used PyTorch to train a U-Net, a semantic segmentation model, with 3D convolutions for precise localization of tumorous cells in the brain.
US Air Pollution Prediction and Forecast Time Series Forecasting 2022
● Forecasted the CO Air Quality Index with an R² score of 68.11%.
● Converted the non-stationary signal into a stationary one by applying log and difference transformations.
● Used Generalized Partial Autocorrelation to uncover the orders of the Autoregressive and Moving Average processes that generated the data.
● Used these orders with the Levenberg–Marquardt algorithm to estimate the coefficients of the process for forecasting.
Credit Card Default Data Analysis 2022
● Classified bank clients who are likely to default on their next credit card bill with an F1-score of 81.3%.
● Used R with the `caret` package to train a decision tree whose rules are interpretable and enable easy decision-making.
Skills/Traits/Topics - Virtual Private Cloud, Data & ML Pipeline, Server Administration (Ubuntu), Project Management, Natural Language Processing (NLP), Generative AI, Large Language Models (LLM), Machine Learning, Data Science, Data Analytics, Probabilistic Graphical Models, Big Data & Analytics, Distributed Training, Data Wrangling, NoSQL, Data Management, Data Structures, Data Visualization, Artificial Neural Networks (ANN), Deep Learning, Transfer Learning, Reinforcement Learning, Evolutionary Algorithms, Time Series Analysis, Decision Trees, Ensemble Models, Boosting algorithms, Geographic Information Survey, Rasterization, DNA transcription, Gene Clustering, Statistical Analysis
Platforms/Tools/IDEs - Docker, Databricks, Google Cloud Platform (GCP), Amazon Web Services (AWS), GCP BigQuery, Vertex AI, GCP Compute Engine, AWS EC2, AWS S3 bucket, VS Code, PyCharm Professional, Jupyter Notebook, GitHub & GitLab, TensorFlow, Keras, PyTorch, Tableau, RStudio, Eclipse IDE for Java Development, QGIS, MongoDB, Neo4j, Apache Hive
Langs/Libs/SDKs - Python (Advanced), R Programming Language (Advanced), PySpark (Advanced), JavaScript (Medium), JSON (Medium), PHP (Some), Java (Some), .Net (Some), HTML (Medium), CSS (Medium), Spark (Medium), Hadoop, Ubuntu, Scikit-Learn, Pandas, NumPy, Matplotlib, Plotly, Seaborn, NLTK, spaCy, HuggingFace, visNetwork, ggplot, caret, dplyr, ggraph, Nibabel, BeautifulSoup, Selenium