YESHWANTH KUMAR THALAPANENI
667-***-**** · ************************@*****.***
SKILLS
Programming Skills: SQL, Python, C, C++
Practical Knowledge: PySpark, MLlib, Plotly, NumPy, Pandas, Scikit-learn, KNN, Naive Bayes, Tableau, Spreadsheets, Power BI, Logistic Regression, Seaborn, Matplotlib, Random Forest, ETL, Riskfolio-Lib
EDUCATION
UMBC
Major: Data Science · GPA: 3.78 · Graduation Date: May 2025
PROJECTS
Sales prediction and inventory management UMBC
TEAM LEAD May 2024
• Built and optimized machine learning models, including Random Forest, Gradient Boosting, and Linear Regression, to predict sales revenue, achieving an R² score of over 99% with Random Forest.
• Designed and implemented ETL processes to clean and transform 500,000+ retail transaction records, handling outliers, missing data, and feature engineering to enhance data quality and model performance.
• Conducted exploratory data analysis (EDA) using Python (Pandas, Seaborn, Plotly) to identify high-revenue products, sales trends, and customer purchasing patterns, driving actionable business insights.
• Developed unsupervised learning models using K-Means clustering to segment customers and products, revealing key consumer behavior and sales seasonality trends for targeted marketing strategies.
• Automated data visualization dashboards to present key insights on sales trends, monthly performance, and country-specific analysis using Plotly, improving decision-making for stakeholders.
Anti-money laundering detection system UMBC
TEAM LEAD May 2024
• Developed an Anti-Money Laundering Detection System using PySpark and Spark MLlib, analyzing over 31 million financial transactions to detect potential laundering activities with high accuracy.
• Used PySpark SQL and Spark DataFrames for data preprocessing, including handling null values, currency conversion analysis, and creating a clean dataset for machine learning model training.
• Engineered a data pipeline using Spark's StringIndexer, OneHotEncoder, VectorAssembler, and StandardScaler for feature preparation, which improved model training efficiency by 35%.
• Built and evaluated multiple machine learning models, including Logistic Regression and Random Forest, achieving an accuracy of 99.9% and an Area Under ROC of 0.82 for identifying suspicious transactions.
• Conducted comprehensive Exploratory Data Analysis (EDA) using PySpark and Python visualization libraries (Matplotlib, Seaborn) to uncover patterns in transaction types, payment formats, and currency use, driving deeper insights into laundering risks.
EDA on Yelp data UMBC
Solo project December 2023
• Cleaned and preprocessed large Yelp datasets (business, review, user) using Pandas, NumPy, and missingno; filled missing values with statistical measures, removed duplicates, and dropped irrelevant columns to ensure data quality.
• Conducted exploratory data analysis (EDA) on ratings, reviews, and business attributes, utilizing Seaborn, Matplotlib, and Plotly for advanced visualizations, including correlation analysis, boxplots, and bar charts to derive business insights.
• Built and executed data analyses using Python, identifying top-rated food establishments, evaluating the impact of Happy Hour promotions, and analyzing rating distribution with exclamation usage; provided insights for award nominations and customer sentiment.
COURSEWORK
Data Science 2023-Present
Static Visualization with Python, Data Management, Big Data Processing, Financial Data Science, Storytelling with Data Science, Ethical and Legal Issues in Data Science, Intro to Data Science and Machine Learning