Ujjawal Dwivedi
Washington DC, ***** 202-***-**** Email LinkedIn GitHub
Education
The George Washington University, Washington, DC
Master of Science, Data Science December 2025
Relevant Coursework: Natural Language Processing, Time Series Analysis and Modeling, Deep Learning, Machine Learning
University of Petroleum and Energy Studies, Dehradun, India
Bachelor of Technology, Electronics (IoT) June 2020
Publications: Image Classifier with Convolutional Neural Network
Relevant Coursework: Statistics, Linear Algebra, Cloud Computing, Data Structures and Algorithms with Python
Leadership
o UPES Football, Captain
o UPES IEEE Student Chapter, Vice President
o UPES Rubik’s Cube Club, President/Founder
Technical Skills & Certifications
Programming: Python, SQL, R, Java, Kotlin
Frameworks & Packages: scikit-learn, PyTorch, Keras, TensorFlow, XGBoost, NLTK, spaCy, BeautifulSoup, Selenium, Apache Hadoop, Apache Spark, pandas, NumPy, Seaborn, ggplot2, tidyverse, dplyr
Software & Tools: Jupyter Notebook, VS Code, PyCharm, RStudio, MySQL, MongoDB, Neo4j, Google BigQuery, Snowflake, Git/GitHub, Jira, Kubernetes, MLflow, AWS EC2, AWS SageMaker, AWS Lambda, AWS S3, AWS RDS, TensorFlow on GCP
Certifications: Google GROW Data Analytics, Data Science Professional by IBM, Machine Learning by DeepLearning.AI, Applied Data Science by University of Michigan, Linear Algebra and Calculus by Imperial College London
Work Experience
George Washington University Washington, DC
Graduate Research Assistant Aug 2024 – Jan 2025
Deployed LSTM and Random Forest models for sentiment classification, achieving 90% accuracy, and utilized NLP pipelines (tokenisation, lemmatisation, NER) for text preprocessing to ensure high-quality model inputs.
Used pretrained language models (e.g., BERT, RoBERTa) for text representation and feature extraction, and fine-tuned transformer-based models (e.g., BERT, DistilBERT) for sentiment classification, achieving state-of-the-art performance on benchmark datasets. GitHub
Infosys Limited Gurugram, India
Machine Learning Engineer Apr 2022 – Jan 2024
Engineered automated ML pipelines (XGBoost, TensorFlow) for anomaly detection in terabyte-scale test data, deploying via AWS Lambda with 98% precision; reduced manual analysis by 40% and accelerated defect resolution by 3x through real-time alerts.
Architected CI/CD pipelines (GitHub Actions, Docker, Kubernetes) for automated ML deployment, integrating with QA systems to reduce deployment cycles by 25% and ensure zero downtime during updates.
Preprocessed terabyte-scale datasets (scikit-learn, Apache Spark) with Airflow-orchestrated workflows, ensuring SOC2 compliance via AWS KMS encryption and role-based access; enabled 98% data integrity for downstream model training.
Colt Technologies Gurugram, India
Data Analyst (ML & Predictive Analytics) Oct 2020 – Nov 2021
Engineered scalable ETL pipelines (Apache Spark, HDFS, SQL) to automate ingestion and preprocessing of 10M+ records, cutting latency by 35%; surfaced insights through real-time SQL dashboards that supported targeted retention campaigns and a 30% churn reduction in 6 months.
Developed and deployed a logistic regression model (scikit-learn) with feature engineering and 5-fold cross-validation, achieving 95% accuracy in predicting churn; insights drove personalized retention campaigns, boosting customer retention by 26% and saving $1.2M annually in recovered revenue.
Bharti Airtel Gurugram, India
Data Analytics Intern May 2019 – Jul 2019
Designed telecom network dashboards visualizing CDR, latency, and packet loss metrics, optimizing capacity and reducing service downtime by 20%.
Used Wireshark to analyze telecom traffic, identifying SIP and DDoS vulnerabilities and working with security teams to mitigate them, improving compliance and reliability.
Technical and Research Project Experience
Brain Tumor Segmentation (Graduate Research Project) GitHub
Developed and optimized deep learning models (3D U-Net and Residual U-Net) for segmenting multi-compartment gliomas from post-treatment MRI scans, achieving high Dice scores in critical regions like surrounding FLAIR hyperintensities (0.797) and resection cavities (0.61).
Implemented advanced preprocessing and augmentation techniques, including Z-score normalisation, multi-modal MRI stacking, and transformations like zooming and rotation, enhancing model generalisation and segmentation accuracy in complex medical imaging tasks.
Recommendation System Using Spotify Data with Machine Learning and Neo4j (Graduate Research Project) GitHub
Used Neo4j to create a graph database and mapped the relationships between users, songs, genres, and other musical attributes.
Applied KNN to both content-based and collaborative filtering. Implemented evaluation metrics to assess the performance of the recommendation system and tuned the KNN models for accuracy and efficiency.
Unbiased Classification of Spatial Strategies in Barnes Maze (Graduate Research Project)
Developed and implemented machine learning models, including KNN, SVM, Decision Trees, CNNs, and reinforcement learning, to classify spatial strategies in the Barnes maze and uncover complex behavioural patterns.
Urban Mobility and Traffic Flow Forecasting Using Advanced Time Series Models (Graduate Research Project)
Designed ARIMA, SARIMA, and Holt-Winters models and engineered PCA/SVD pipelines to forecast traffic flow and pollution trends from geospatial datasets (1M+ GPS points), achieving 92% accuracy in short-term predictions and a 30% model efficiency gain; the techniques are scalable to battery sensor data for anomaly detection.