Rajkumar Conjeevaram Mohan
**** ******* **** **, ****** Elm, TX - 75068 | +1-202-***-**** | **********.****@*****.***

SUMMARY
Data Scientist with strong experience in developing and implementing machine learning models, specializing in time-series forecasting and statistical modeling. Adept at researching, experimenting, and deploying AI/ML solutions across digital platforms to drive business insights and automation. Skilled in the full AI/ML development life-cycle, including model evaluation, optimization, and visualization for high-performance outcomes. Passionate about leveraging cutting-edge AI research to solve complex business challenges. Strong collaborator, working closely with IT and Data Science teams to integrate AI-driven solutions effectively.

EDUCATION
The George Washington University, Washington, DC  Dec 2023
Master of Science, Data Science (GPA: 3.91)
Relevant courses: Machine Learning, Time Series, Natural Language Processing, Deep Learning

Imperial College London, Greater London, UK  Nov 2017
Master of Science in Computing (Specialism in Artificial Intelligence)
Relevant courses: Advanced Statistical Machine Learning, Intelligent Data Analysis

University of Liverpool, Liverpool, UK  July 2013
Bachelor of Science with Honors in Computer Information Systems (GPA: 4.0)
Degree Classification: First Class Honors
TECHNICAL SKILLS
Coding (Python, R, PySpark, Apache Spark with Scala), Data Analysis, Machine Learning Modeling, Software Development (Python, Java), Mathematical Modeling, Web Development (HTML, CSS, JavaScript), Database Development (RDBMS, NoSQL, PL/SQL), Database Modeling (MySQL, Oracle), Project Management, Innovation, Interpersonal Skills, Critical Thinking, Data Entry, Coordination, Business Requirements, Business Processes, Budgeting, and Attention to Detail.

WORK EXPERIENCE
Software Engineer May 2024 - Jan 2025
Inten IT Solutions
• Loaded large volumes of warranty-claim text data using Python and pre-processed them on PySpark, a distributed computing platform, for accelerated processing.
• Used the SentencePiece tokenizer to reduce the vocabulary size and make the model memory-efficient.
• Trained the embedding vector space with a BiLSTM-based RNN, which captures contextual meaning better than shallow/static embedding models.
• Trained a BERT model on large volumes of warranty-claim text to identify components that fail frequently.
• Froze the trained embedding model while fine-tuning BERT, surfacing both the frequently failing components and, most importantly, the failure reasons cited in claim forms, which helped the quality-control team revise manufacturing policies.
• Reduced data retrieval time by almost 40% by migrating from an on-premises relational database to Google BigQuery, and tuned performance by enabling clustering, partitioning, and caching on frequently used queries.
• Trained deep learning models with PyTorch and used Python extensively across the firm's workloads.
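The vocabulary-reduction idea behind the subword tokenization above can be sketched in plain Python. This is illustrative only: the actual work used the SentencePiece library, which learns its subword inventory from data, whereas the inventory and sample words below are made up.

```python
# Illustrative only: greedy longest-match subword tokenization over a toy,
# hand-picked inventory. Real SentencePiece learns these pieces from data.
SUBWORDS = {"engine", "failure", "gasket", "fracture", "s"}  # hypothetical inventory

def subword_tokenize(word: str) -> list[str]:
    """Split a word into the longest matching subwords, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):        # try the longest piece first
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])               # unknown char falls back to itself
            i += 1
    return pieces

words = ["engine", "engines", "failure", "failures",
         "gasket", "gaskets", "fracture", "fractures"]
word_vocab = set(words)
subword_vocab = {p for w in words for p in subword_tokenize(w)}
print(len(word_vocab), len(subword_vocab))       # 8 word types vs 5 reusable subwords
```

Because inflected forms reuse the same pieces (e.g. "engines" becomes "engine" + "s"), the subword vocabulary grows much more slowly than the word vocabulary, which is what keeps the embedding table memory-efficient.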
• Converted the client's user-manual PDF documents into text embeddings and built a search tool on GCP AlloyDB, enabling the client (a manufacturer) to look up relevant information instantly.
Tools used: PyCharm for deep learning and machine learning workloads, MySQL and BigQuery for storing documents, and Google Cloud Platform infrastructure throughout.

Data Science Consultant  Aug 2023 – Jan 2024
World Bank Group
• Downloaded remote-sensing data from PlanetScope and used QGIS to generate training images, rasterizing vector annotations into bounding boxes covering vehicles on highway lanes.
• Generated training images were analyzed for patterns representing stationary and non-stationary vehicles using Python and Matplotlib to determine relevant augmentations.
• Applied a road-segmentation deep learning model to isolate highways in aerial images, then extracted patches from the masked images to train an object-detection model for vehicles on highway lanes.
• Combined the predicted bounding boxes with additional metadata to differentiate between logistics and personal vehicles.
• Developed and implemented a reliable and scalable framework to estimate trade volume exchanged at the South African border to enhance operational transparency and improve trust among trading parties.
• Applied deep learning to satellite images to count logistics assets, then combined the counts with external data to approximate trade volume.
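The segment-then-patch step above can be sketched with numpy. This is a minimal illustration, not the project code: the array shapes, patch size, and 25% road-coverage threshold are all assumptions.

```python
# Sketch: after a segmentation model masks out everything but the highway,
# cut the scene into fixed-size patches and keep only patches that actually
# contain enough road pixels to be useful for the object detector.
import numpy as np

def road_patches(image, road_mask, size=64, min_road_frac=0.25):
    """Yield (row, col, patch) for patches whose road coverage meets the threshold."""
    h, w = road_mask.shape
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            window = road_mask[r:r + size, c:c + size]
            if window.mean() >= min_road_frac:          # fraction of road pixels
                yield r, c, image[r:r + size, c:c + size]

# Toy 128x128 scene whose "highway" fills only the top-left quadrant.
img = np.random.rand(128, 128, 3)
mask = np.zeros((128, 128))
mask[:64, :64] = 1.0
patches = list(road_patches(img, mask))
print(len(patches))                                     # 1 of 4 patches contains road
```

Filtering out empty patches keeps the detector's training set focused on regions where vehicles can actually appear.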
Research Assistant June 2022 – Jan 2023
The George Washington University – Department of Political Science
• Used R programming language to load large data with complex relationships between vertices and pre-processed them accordingly for different visualizations.
• Removed duplicate edges carrying conflicting descriptions, years, or relationship types to build a spatial graph displaying connections between militant organizations across the globe.
• Leveraged graph-theory metrics such as betweenness and degree centrality to size nodes by influence and colored them by traffic, letting researchers spot extremist hubs in the large network with ease.
• Created an interactive web-based application using R with Shiny framework to display a dashboard of different visualizations to help researchers understand how militant organizations function and evolve.
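The centrality-based node sizing above can be sketched in a few lines. This is plain Python rather than the original R/visNetwork code, and the edge list is hypothetical.

```python
# Sketch: size nodes by degree centrality so influential organizations stand out.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]  # hypothetical

# Degree centrality: fraction of other nodes each node is connected to.
nodes = {n for e in edges for n in e}
degree = {n: 0 for n in nodes}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
centrality = {n: d / (len(nodes) - 1) for n, d in degree.items()}

# Map centrality onto a drawing size, as done for the network dashboard.
node_size = {n: 10 + 40 * c for n, c in centrality.items()}
print(max(centrality, key=centrality.get))   # "A": the best-connected hub
```

Betweenness centrality (used alongside degree in the dashboard) follows the same pattern but counts shortest paths passing through each node.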
• Created a timeline-based hierarchical plot showing how a group evolved over time, i.e., patterns of splintering and merging at particular points, letting researchers better understand group behavior by associating it with private/undisclosed information.

Data Scientist  Feb 2022 – June 2022
Briggs & Stratton
• Identified patterns in customer purchasing habits and product preferences, and turned data into actionable recommendations that increased sales by 15%.
• Employed advanced analytics, including a Bayesian causal-inference model, to determine the cause of premature piston-ring failures in the company's engines and helped the business fix the problem.
• Analyzed historical sales data to identify trends and patterns, built a quarterly forecast that accurately predicted demand, and recommended inventory-level optimizations based on the results.
• Analyzed data from various sources (e.g., sensors, logs) to detect anomalies or trends that could indicate quality issues in production processes or vehicle performance.
• Built predictive models using various machine learning tools to predict the possibility of equipment failure.
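The equipment-failure modeling above can be sketched with scikit-learn. The actual features, data, and model choice at Briggs & Stratton are not public, so this trains a decision tree on synthetic sensor readings where high temperature and vibration indicate imminent failure; the feature names and scales are assumptions.

```python
# Hedged sketch: predict equipment failure from synthetic sensor data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
temperature = rng.normal(70, 10, n)          # assumed scale, degrees C
vibration = rng.normal(2.0, 0.5, n)          # assumed scale, mm/s RMS
# Synthetic labeling rule: units running hot AND vibrating hard tend to fail.
failure = ((temperature > 75) & (vibration > 2.2)).astype(int)

X = np.column_stack([temperature, vibration])
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, failure)
print(round(model.score(X, failure), 2))     # training accuracy on the synthetic rule
```

An axis-aligned rule like this is exactly what a shallow tree can represent, which is why the sketch fits it essentially perfectly; real sensor data would need held-out evaluation.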
Data Engineer / Scientist  Jan 2018 – Oct 2021
SF Technology Solutions
• Developed code to handle large data and performed visualization in R using the `ggplot` library.
• Identified and rectified data errors and inconsistent features, and imputed missing values using clustering and Gaussian methods.
• Performed relevant preprocessing, ensuring consistent units across features, before fitting statistical machine learning models and visualizing their results.
• Used PySpark to leverage the power of distributed computing for preprocessing large volumes of data.
• Communicated technical findings and insights clearly to non-technical stakeholders.
• Developed and presented analytical insights on medical and other data.
• Used AWS cloud services for computing resources and deploying the application.
• Created several types of data visualizations using Python and Tableau.
• Collected data needs and requirements by interacting with other departments.
• Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
• Implemented various visualizations using Matplotlib to explore the data and communicate results clearly.
• Developed classification, regression, and deep learning models in Python and optimized them to deliver the best performance the available data allowed.
• Used GitHub to maintain versions and collaborate with the team.
• Created Docker containers to ensure compatibility across different environments.
• Developed PySpark code to process the data on Amazon EMR to perform the necessary transformations based on the STMs developed.
• Worked on different data formats such as JSON and XML and performed machine learning algorithms in Python.
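The Gaussian imputation mentioned in the bullets above can be sketched with numpy: fit a normal distribution to a column's observed values and fill the gaps with draws from it, preserving the column's mean and spread. The data here is synthetic, and the clustering-based variant is not shown.

```python
# Sketch: Gaussian imputation of missing values in one numeric column.
import numpy as np

rng = np.random.default_rng(42)
col = rng.normal(100.0, 15.0, size=500)                  # synthetic column
col[rng.choice(500, size=50, replace=False)] = np.nan    # knock out 10% of it

observed = col[~np.isnan(col)]
mu, sigma = observed.mean(), observed.std()              # fit the Gaussian
filled = col.copy()
missing = np.isnan(filled)
filled[missing] = rng.normal(mu, sigma, size=missing.sum())

print(np.isnan(filled).any())                            # False: no gaps remain
```

Unlike constant mean-filling, sampling from the fitted distribution avoids collapsing the column's variance, which matters when the downstream model is sensitive to feature spread.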
Database Developer  Aug 2013 – Nov 2017
AXS Technologies
• Responsible for designing databases that meet the application requirements and creating Entity Relationship diagrams.
• Optimized databases using appropriate techniques such as table normalization, indexing, and setting appropriate cache size.
• Wrote an additional PL/SQL security layer on top of the application's back-end logic to prevent SQL injection by verifying that input data complies with the expected format and standards.
• Team player with strong experience in solving complex problems.
• Created MySQL PL/SQL routines triggered by events such as attempting to insert duplicate records, deletion of records, and memory-related issues.
• Involved in writing optimal SQL queries for efficient retrieval of data.
• Wrote shell scripts to automate tasks such as triggering Java code that leverages Selenium to scrape information off the web.
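The injection-prevention idea above can be illustrated with a Python/sqlite3 analogue (the original layer was PL/SQL): parameterized queries keep user input out of the SQL text, so a malicious value cannot change the statement's structure. The table and payload below are made up.

```python
# Sketch: parameter binding treats hostile input as a literal value, not SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'staff')")

malicious = "alice' OR '1'='1"                 # classic injection payload
# The ? placeholder binds the payload as a string literal.
rows = conn.execute("SELECT role FROM users WHERE name = ?", (malicious,)).fetchall()
print(rows)                                    # [] : no user has that literal name
```

Input-format validation, as in the PL/SQL layer described above, complements binding by rejecting malformed values before they reach the database at all.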
• Developed a webpage that enables managers to visualize sales performance.

TECHNICAL PROJECTS
Neural Machine Translation (English – Czech) Deep Learning Project 2024
• Translated text from English to Czech, achieving a METEOR score of 0.856.
• SentencePiece, a language-agnostic tokenizer, was employed to minimize vocabulary size and efficiently train the LLM.
• Implemented GPT-3, a large language model (LLM), from scratch using PyTorch.

Brain Tumor 3D Segmentation  Deep Learning Project  2023
• U-Net, a semantic segmentation model was trained on BraTS 2017-2020 challenge dataset using GPU.
• Employed skip connections to ensure continued gradient flow during the backpropagation to train a deep network.
• The model precisely localized tumorous tissue in brain scans with a Jaccard (IoU) score of 80%, aiding doctors with earlier diagnoses of cancer.
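The reported Jaccard (IoU) score is intersection over union of the predicted and ground-truth tumor voxels. The sketch below uses toy 2-D masks for brevity; the project used 3-D BraTS volumes, but the formula is identical.

```python
# Sketch: Jaccard (IoU) score between a predicted mask and the ground truth.
import numpy as np

def jaccard(pred: np.ndarray, truth: np.ndarray) -> float:
    """IoU between two boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union if union else 1.0    # two empty masks agree perfectly

truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True                               # 16-pixel toy "tumor"
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True                                # prediction shifted by one pixel
print(jaccard(pred, truth))                          # 9 / 23 ≈ 0.391
```

IoU penalizes both missed tumor voxels and false positives, which is why it is a stricter segmentation metric than plain pixel accuracy.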
US Air Pollution Prediction and Forecast Time Series Forecasting 2022
• Forecasted the CO Air Quality Index with an R²-score of 0.6811.
• Converted the non-stationary signal into a stationary one by performing relevant transformations such as logarithmic scaling and differencing.
• Used Generalized Partial Autocorrelation to uncover the order of the Autoregressive and Moving Average processes that generated the data.
• Fed the identified orders to the Levenberg–Marquardt algorithm to estimate the process coefficients for forecasting.
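The stationarizing transforms above can be sketched with numpy: a log transform turns multiplicative growth into a linear trend, and first differencing removes that trend, leaving a roughly constant series. The exponential-growth series here is synthetic.

```python
# Sketch: log transform + first differencing to stationarize a series.
import numpy as np

t = np.arange(1, 101)
series = 100 * np.exp(0.05 * t)            # non-stationary: exponential trend

logged = np.log(series)                    # a straight line: log(100) + 0.05 * t
differenced = np.diff(logged)              # constant: the growth rate 0.05

print(round(differenced.std(), 10))        # 0.0 (up to float error): trend removed
```

On real AQI data the result is only approximately stationary, which is why order identification (e.g. via GPAC) follows the transforms rather than preceding them.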
Credit Card Default Data Analysis 2022
• Classified bank clients likely to default on their next credit card bill with an F1-score of 81.3%.
• Trained a decision tree in R with the `caret` package; its interpretable rules supported straightforward decision making.
• Mitigated the risk of financial insolvency by carefully balancing the model's sensitivity between recalling high-risk and low-risk clients, so the institution does not lose potential clients while flagging high-risk ones.
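The F1-score trade-off described above can be made concrete in plain Python (the project itself used R with `caret`). The confusion-matrix counts below are hypothetical, chosen to contrast an aggressive threshold with a conservative one.

```python
# Sketch: F1 balances precision (not losing good clients to false alarms)
# against recall (catching clients who will actually default).
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # sensitivity to high-risk clients
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for two classification thresholds.
aggressive = f1_score(tp=90, fp=40, fn=10)   # catches defaults, loses good clients
conservative = f1_score(tp=60, fp=5, fn=40)  # keeps clients, misses defaults
print(round(aggressive, 3), round(conservative, 3))   # 0.783 0.727
```

Because F1 is the harmonic mean of precision and recall, it drops sharply whenever either error type dominates, which makes it a natural single number for tuning this balance.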
Skills/Topics: Virtual Private Cloud, Data & ML Pipelines, Server Administration (Ubuntu), Project Management, Natural Language Processing (NLP), Computer Vision, Generative AI, Large Language Models (LLM), Machine Learning, Data Science, Data Analytics, Probabilistic Graphical Models, Big Data & Analytics, Distributed Training, Data Wrangling, NoSQL, Data Management, Data Structures, Data Visualization, Artificial Neural Networks (ANN), Deep Learning, Transfer Learning, Reinforcement Learning, Evolutionary Algorithms, Time Series Analysis, Decision Trees, Ensemble Models, Boosting Algorithms, Geographic Information Systems, Rasterization, DNA Transcription, Gene Clustering, Statistical Analysis

Platforms/Tools/IDEs: Docker, Databricks, Google Cloud Platform (GCP), Amazon Web Services (AWS), GCP BigQuery, Vertex AI, GCP Compute Engine, AWS EC2, AWS S3, VS Code, PyCharm Professional, Jupyter Notebook, GitHub & GitLab, TensorFlow, Keras, PyTorch, Tableau, RStudio, Eclipse IDE for Java Development, QGIS, MongoDB, Neo4j, Apache Hive, MLOps, CI/CD Pipelines, Google Cloud Storage, Google Bigtable

Languages/Libraries/SDKs: Python (Advanced), R (Advanced), PySpark (Advanced), JavaScript (Intermediate), JSON (Intermediate), PHP (Basic), Java (Basic), .NET (Basic), HTML (Intermediate), CSS (Intermediate), Spark (Intermediate), Hadoop, Ubuntu, Scikit-Learn, Pandas, NumPy, Matplotlib, Plotly, Seaborn, NLTK, spaCy, HuggingFace, visNetwork, ggplot, caret, dplyr, ggraph, NiBabel, BeautifulSoup, Selenium