SOURAV AGRAWAL
LinkedIn 248-***-**** *****************@*****.*** GitHub
EDUCATION
MS in Data Science, State University of New York at Buffalo (GPA: 3.7/4) Dec 2024
Bachelor of Computer Applications, St Xavier’s College, India Apr 2021
SKILLS
Programming Languages and Databases: Python, R, SQL, MySQL, Oracle
Machine Learning: Supervised/Unsupervised Learning, NLP, Time-Series Analysis, Neural Networks, CNN, SVM, LightGBM
Cloud & Tools: Jupyter Notebooks, RStudio, GitHub, Overleaf, Google Colab, MS Office, Power BI, AWS
EXPERIENCE
Data Scientist at BNMC, Buffalo, NY Feb 2025 – Current
• Built predictive models to reduce machine downtime by 20% and operational costs by 5%. Implemented real-time sensor data retrieval and a processing pipeline using Socket.io and Node.js.
• Designed and fine-tuned Artificial Neural Networks and 1D CNNs, achieving over 90% accuracy.
• Worked in a highly cross-functional and agile environment, collaborating with vibration experts & engineers for incremental delivery. Led daily scrum calls to track progress & address blockers.
Research & Data Analyst at S.I.T.T.Y, Sacramento, CA May 2025 – July 2025
• Led a citywide transportation equity analysis using Python, GeoPandas, and GIS tools across multiple census tracts.
• Built predictive models for EV adoption and transit usage, achieving 90% accuracy against survey ground truth.
• Built a semi-automated Python pipeline to fetch and integrate data via API calls (U.S. Census, NREL EV charging, GTFS transit) and standardize all spatial data to a common CRS.
• Developed composite scoring models and performed cost-benefit analysis to identify high-priority neighborhoods for clean mobility investments, optimizing resource allocation for maximum impact.
• Created interactive dashboards enabling policymakers to target underserved communities effectively.
• Collaborated with cross-functional teams to design and prepare for community survey validation and hypothesis testing protocols.
Data Analyst at Interactive Manpower Solutions, India Mar 2022 – Aug 2023
• Analyzed complex healthcare data for external clients and integrated data from multiple sources to generate financial reports, helping them track revenue, outstanding payments, and cash flow trends, resulting in a 20% reduction in accounts receivable aging.
• Developed financial dashboards using Power BI, improving the visibility of key financial metrics.
• Used complex SQL queries to extract, manipulate, and analyze large datasets, and optimized them for performance, reducing data retrieval times by 25%.
• Wrote complex CTEs, window functions, and stored procedures to support advanced financial and operational analytics.
• Used pivot tables, VLOOKUP, Power Query, and other advanced Excel features to automate financial reporting, saving 8+ hours per week.
• Partnered with cross-functional teams to align data reporting with business objectives.
PROJECTS
AI-Powered Resume Ranking and Screening System EasyRecruit
A full-stack web application designed to automate initial recruitment stages by intelligently scoring and ranking candidate resumes against job descriptions, significantly reducing manual screening time.
Tech Stack: Python, Flask, React.js, Google Gemini API, NLP, SQLAlchemy, PostgreSQL, Render, Netlify, Git
• Engineered an advanced NLP pipeline using Google's Gemini Pro API for context-aware entity extraction, improving keyword identification by an estimated 40% over traditional matching and increasing the accuracy of candidate-to-job description scoring.
• Developed a robust Flask backend and REST API to process and score resumes, implementing a weighted algorithm that provided data-driven candidate rankings to reduce manual screening time by an estimated 75%.
• Architected and deployed the end-to-end application on a cloud platform, utilizing Render for the containerized backend and Netlify for the frontend, establishing a CI/CD pipeline that ensured high availability for concurrent users.
Home Energy Consumption Forecasting & Optimization
Developed a predictive model to forecast and optimize energy consumption in smart homes using time-series analysis and advanced machine learning models to identify potential cost savings.
Tech Stack: Python, NumPy, Pandas, Scikit-learn, LightGBM, Matplotlib, Seaborn
• Performed in-depth Exploratory Data Analysis (EDA) on time-series data to identify appliance-specific energy patterns and the impact of weather, engineering key features to improve model performance.
• Trained and evaluated multiple forecasting models, achieving superior accuracy with LightGBM (97.6% for high-load appliances), demonstrating the potential to reduce household energy costs by up to 15% through optimized usage schedules.
Breast Cancer Tumor Classification
Developed and compared multiple ML models to classify breast cancer tumors, with a focus on implementing a Support Vector Machine (SVM) from scratch to demonstrate a foundational understanding of optimization algorithms.
Tech Stack: Python, NumPy, Pandas, Scikit-learn, CVXPY
• Implemented and benchmarked Decision Tree, Naïve Bayes, and SVM to classify tumors, systematically evaluating performance to identify the most effective model for the dataset.
• Implemented a Support Vector Machine (SVM) from scratch using Python and the CVXPY optimization library, achieving a final classification accuracy of 95.61%.