MOHIT
*****@******.***
SUMMARY
Software development and data analysis professional with five years of industry experience. Demonstrated expertise in object-oriented programming, Python scripting, ingesting and transforming data from various sources, deriving insights from data, and building dashboards for presentation. Holds master’s degrees in data science and computer science and a bachelor’s degree in computer science, with a solid foundation in programming languages, cloud platforms, statistics, and machine learning techniques. Driven by the desire to do meaningful work through technology and to make a positive impact on the business.
EDUCATION
Master’s in Applied Data Science, Syracuse University
Master’s in Computer Science, Blekinge Institute of Technology
Bachelor’s in Computer Science, Jawaharlal Nehru Technological University, India
CERTIFICATIONS
Certificate of Advanced Study in Enterprise Technology Leadership
Microsoft Certified: Azure Fundamentals
TECHNICAL SKILLS
Programming Languages: Python, R, C++, Shell Scripting, Java, C, Erlang
Data Storage and Processing: Azure Data Studio, MongoDB, Advanced SQL, Microsoft SQL Server, MySQL
Frameworks/Platforms/Software: Anaconda, Apache Spark, Databricks, Docker, Flask, Git, Google Analytics, Jupyter, RStudio; Microsoft Office - Microsoft Access, Microsoft Excel (VLOOKUP, XLOOKUP, Optimization, Pivot, Forecast & Prediction using Scenario Manager, StatTools); BI/Visualization Tools - Power BI, Tableau; Big Data – Airflow, Hadoop, Spark
Cloud Platforms: Azure (Data Lake Storage, Machine Learning Studio, Compute, Data Factory, Synapse Analytics, Fabric)
Machine Learning: Rule Based Learning, Unsupervised Learning – Anomaly Detection, Clustering, Dimensionality reduction; Supervised Learning – Trees (Classification, Regression), Forests, Nearest Neighbors, Support Vector Machines, Deep Learning
Math/Statistics: Algebra, ANOVA, Correlation Analysis, Distributions, Hypothesis Testing, Probability, Quantitative Finance, Regression – linear, logistic, multiple; Statistical Experimentation, Inference and Significance; Time Series Analysis and Decomposition
PROFESSIONAL EXPERIENCE
Data Analyst
March 2023 - Present
iConsult Collaborative, Syracuse
Led a team of four data analysts as senior analyst, reporting to the project manager
Saved 16 human hours per week by publishing a Power BI dashboard that generated insights about employees' Microsoft 365 app usage
Reduced time spent on dashboards by 10% by improving the interactivity of existing dashboards, helping C-level executives make decisions about 20% faster on average
Utilized Power Query Editor to import data from sources and to model and transform datasets before visualization
Research Assistant
February 2023 – February 2024
Center for Emerging Network Technologies (CENT), Syracuse
Worked as a one-person team to create a project plan and develop a prototype using Flask and Python for Prof. Caicedo Bastidas
Implemented the Spectrum Consumption Model Builder and Analysis Tool (SCMBAT) under the IEEE 1900.5.2 standard
SCMBAT models spectrum consumption and analyzes compatibility between transmitters and receivers sharing common spectrum
Data Scientist
July 2023 – August 2023
BloomOptix LLC and Ramboll US Consulting, Syracuse
Deployed image classification ML models on Azure and developed efficient data pre-processing Python scripts using REST API
Presented comprehensive model validation results to subject matter experts, demonstrating an ML model accuracy of 94%
Reduced harmful algae identification time from days (current industry practice) to an average of 10 seconds for 7 samples by using ML
Scrum Master
November 2018 - August 2022
Ericsson R&D, Sweden
Led a cross-functional team of 8 as Scrum Master, handling project management, product backlog refinement, and sprint planning in Jira
Communicated on behalf of the team with external stakeholders such as product owners, project managers, and customers
Guided new hires through daily meetings, supervised their progress, and introduced them to ways of working such as design breakdowns and code reviews
Software Developer – 5G Systems
November 2017 - August 2022
Ericsson R&D, Sweden
Improved 5G SA throughput by 20% over 5G NSA and by more than 10 times over 4G, collaborating with 50 colleagues across the organization
Achieved a 50x latency improvement by developing C++ components deployed in the 5G portfolio as part of an 8-member team
Halved turnaround time of external queries and requests by initiating a transformation of the sprint backlog and management system
Secretary – IT and Web Services
July 2012 – June 2015
Rotaract Club of JNTUK, India
Managed and maintained the club’s website and social media, ensuring up-to-date and engaging content. Led and coordinated a team of web developers, overseeing their work to ensure the website's design, functionality, and user experience met the club's objectives and standards.
Successfully maintained a comprehensive database of member records, streamlining data organization and accessibility.
Helped maintain a database for a local blood bank with details of over 500 blood donors, and helped match donors to patients based on blood type and urgency of need to support timely transfusions.
Demonstrated organizational skills through effective coordination of website updates, database maintenance, and financial records management.
Played a crucial role in the club winning the best club award.
PROJECTS
Analyzing ACS To Identify Zip Codes Prone To Poverty
Implemented K-Means clustering, achieving a silhouette score of 0.7, to group zip codes and identify common factors leading to poverty
Performed correlation analysis and exploratory data analysis to identify relationships between socio-economic factors and poverty
Collected actionable insights backed by statistical data analysis and ML for authorities and social bodies to reduce poverty rates
Providing Actionable Insights To A Health Management Organization
Achieved a sensitivity of 97% and an accuracy of 90% for the model built to predict expensive customers for the organization
Suggested actionable business insights to the HMO to maximize profits and customer retention based on the dataset provided
Created a dashboard in Shiny to visualize the dataset provided to understand cost driving factors for customers
Visual Analysis of Syracuse City Open Data Portal in Power BI
Derived insights from data on non-emergency service requests reported to the city and on motor vehicle crashes within city limits
Awarded best individually made dashboard and 3rd place overall after presenting the dashboard to the Mayor’s office on Open Data Day.
Analysis of Soccer Players in EPL 2018-2019 Season
Implemented Python scripts to extract, transform, and load (ETL) structured and unstructured datasets from different sources using APIs
Used the http and json libraries to extract the data, and NumPy and Pandas to transform and load it
Preprocessed, analyzed, and summarized the datasets to answer analysis questions and derive insights
Fault Detection in District Heating Systems using Unsupervised Algorithms
Explored usage of machine learning for detecting anomalies and faults in district heating systems using clustering techniques
Used the models to identify stations in heating networks where efficiency could be improved by saving on operating costs and fuel
Implemented models in a Jupyter notebook using the Python libraries Matplotlib, NumPy, Pandas, and Scikit-learn
Performance Evaluation of Cassandra in Virtualized and Non-Virtualized Environments
Compared the performance of Cassandra, a NoSQL database, in virtualized (KVM hypervisor on Linux) and non-virtualized environments as part of master’s thesis research
Evaluated various metrics while running the database in both environments to determine the effects of virtualization on Cassandra
Machine Learning Model for Crime Classification
Built a classifier in Python to predict crime categories in San Francisco using data provided on the Kaggle website; the dataset consists of around one million instances spanning 12 years
Chose the KNN algorithm over SVM and Random Forests because it had better prediction accuracy
Used the NumPy, Pandas, and Scikit-learn libraries to perform exploratory data analysis in the Spyder IDE
Developed LLM and RAG Applications
Developed several chatbots and prompt-based applications using LLMs and recent NLP advancements
Worked on user applications that combined open-source and private LLMs with retrieval-augmented generation (RAG) to build domain-specific chatbots
Utilized serverless cloud platforms to run models and deploy them for users
Implemented abstractive and extractive summarization algorithms using NLP techniques