SKILLS
• Data Science and Analytics – Machine Learning, Deep Learning, Statistical Modelling, Hypothesis Testing, A/B Testing, Classification, Regression, RNN & LSTM, Anomaly Detection, Sklearn, SparkML, PyTorch, TensorFlow, Spark Streaming
• Computer Vision & NLP – Object Detection, Image Classification, Image Transformation, Convolutional Neural Network, Generative Adversarial Network, OpenCV, PIL, Topic Modeling, Sentiment Analysis, Entity Recognition, NLTK, spaCy, Spark NLP
• Programming Languages and Cloud – Python, R, Java, PySpark, AWS, GCP, Databricks
• ETL, Data Engineering, Data Visualization – Pandas, NumPy, Seaborn, Tableau, PowerBI, EDA, MySQL, Excel, Docker
RELEVANT EXPERIENCE & PROJECTS
J. Mack Robinson College of Business, Institute for Insight Atlanta, GA
Graduate Research Assistant August 2022 – Present
• Mercedes-Benz USA Sprint: Developed an anomaly detection system for MBUSA's parts database using Isolation Forest and Local Outlier Factor. Detected anomalous weight and dimension relationships via feature engineering and deployed the models with a dashboard on a Google Cloud pipeline. Improved error detection by 70%, reducing shipping costs and enhancing labor efficiency.
• BBB Sprint: Led a team of four graduate assistants in analyzing >300k consumer complaint cases. Engineered features and conducted regression analysis to optimize mediation and improve resolution outcomes. Recommendations from the findings reduced resolution time by 20%.
CNN Architecture Comparison for Object Bounding Box Regression
• Developed VGG16-based CNN models to detect pet faces in images. Augmented the dataset with horizontally and vertically flipped images. Concluded that modifying the architecture with padding and dropout layers improved localization and flexibility, respectively. Implemented the model in TensorFlow, achieving an average IoU of 0.7 on the validation set.
Normalized Scoring and Ordering of Highlight Business Features on Yelp Review Dataset
• Performed text analysis via topic modelling and sentiment analysis in GCP using PySpark to determine the most liked and disliked features of a business based on review text. Developed a normalized popularity score metric to compare the same feature across multiple businesses.
Stock Prediction and Web Scraping from Estimize
• Scraped 3 years of Tesla’s stock data from estimize.com and predicted the stock values over the next quarter. Compared the results of linear regression, polynomial regression, a Recurrent Neural Network, and an RNN with LSTM. Achieved a best R² of 0.91.
Startup Closure Prediction
• Built a binary classification model to predict whether a startup is likely to close, based on features derived from a heavily imbalanced dataset. Compared under-sampling and SMOTE to address the class imbalance. Used Logistic Regression, Random Forest, XGBoost, SVM, and KNN; performed cross-validation and achieved a best accuracy of 0.88 for the minority class.
Gene Identification & Disease Prediction
• Led a team of five colleagues to identify informative genes for different tumors and predict the presence of a tumor in a patient from 1M+ gene expression values. Used Pearson’s correlation and t-tests to classify subjects. Used triggers, procedures, and loops in MySQL. Achieved an accuracy of 99% and improved querying speed by 70% using indexes and subqueries.
A/B Testing – Paywall and Player Retention
• Established the effect of moving the first paywall in the Android game ‘Cookie Cats’ from level 30 to level 40 on player retention at different timeframes. Used bootstrapping in pandas to confirm deductions.
Turtle Survival Alliance (Non-profit, Wildlife Conservation) Lucknow, India
Conservation Data Analyst May 2021 – July 2022
• Built an end-to-end pipeline to scrape data on illegal wildlife trade in India, detect affected species and regions, and visualize results on a Folium map in Python. Results were used in project proposals with state and central governments to inhibit illegal turtle trade in India.
• Led a team of four graduate interns to design social media campaigns for ‘World Turtle Day’ and identified significant hashtags and keywords with a chi-square test. Increased online engagement by up to 1000% during the campaigns.
Uno Minda New Delhi, India
Senior Marketing Analyst July 2018 – August 2020
• Developed data-driven plans for marketing, RGM, new products, and seasonal schemes, achieving 14% annual growth with >$600K/month revenue. Reduced SKUs by 35%, revamped packaging, and saved >$4K/month via Pareto analysis on packaging data.
• Trained sales staff and conducted market research, surveys, and groundwork to increase product market share and profitability.
EDUCATION
Georgia State University, J. Mack Robinson College of Business Atlanta, GA
Master of Science in Data Science and Analytics – GPA 4.3/4.3 December 2023 (Expected)
• Coursework: Deep Learning, Text Analytics, Statistical Methods, Machine Learning, Scalable Analytics, Predictive Analytics.
University of Petroleum and Energy Studies Dehradun, India
Bachelor of Technology in Automotive Engineering – GPA 3.2 May 2018
github.com/infiniti21/
linkedin.com/in/devbanerjee21/
Dev Banerjee ********.*****@*****.***
+1-470-***-**** Atlanta, 30324