Yash Kattimani
****.***********@*****.*** 703-***-**** linkedin.com/in/yash-kattimani github.com/yashkattimani https://yashkattimani.github.io/
PROFESSIONAL SUMMARY
Data Scientist with 2+ years of experience transforming complex datasets into actionable insights. Recognized as Employee of the Quarter for contributions to product development and AI/ML initiatives, delivering impactful solutions that reduced shipping durations by 33% and accelerated client decision-making by 70%. Proficient in Python, SQL, and cloud platforms, with a proven track record of enhancing operational efficiency and customer engagement through data-driven methodologies.
EDUCATION
The George Washington University Washington, DC
Master of Science, Data Science expected May 2025
National Institute of Technology Karnataka Karnataka, India
Bachelor of Technology, Electronics and Communication Engineering Aug 2019 - Aug 2023
International Institute of Information Technology, Bangalore Karnataka, India
Executive Program in Data Science Oct 2021 - Nov 2022
PROFESSIONAL EXPERIENCE
Pull Logic Inc. Atlanta, GA
AI/ML Data Science Intern Nov 2024 – Present
• Developed end-to-end demand forecasting by building a data preparation pipeline with PySpark and Delta Lake for historical data processing (200K+ SKUs) and implementing bootstrap-based time series forecasting for confidence interval estimation, reducing forecast error by 18%.
• Engineered real-time inventory and replenishment workflows, integrating key data from Azure SQL to reduce stockouts by 12% and overstock by 30%.
• Built a custom chatbot using OpenAI API, integrating competitor crossover analysis, product availability checks, and an enhanced product finder, resulting in 70% faster client decision-making.
• Collaborated with a cross-functional team to develop a digital twin platform with features like Network View, Supply Variability, Product Availability, and Demand Fulfillment, enhancing operational visibility by 15% and reducing shipping durations by 33%.
Data Science Intern Aug 2024 – Nov 2024
• Optimized monthly data processing workflows on Databricks using PySpark and Delta Lake, implementing incremental loading patterns for Azure SQL and Blob Storage sources, reducing pipeline execution time by 25%, enabling daily data refreshes.
• Implemented a swapping analysis system to redistribute unproductive inventory to high-performing dealers, leveraging shipment cost and distance optimization algorithms, reducing holding costs by 20% and boosting dealer productivity by 15%.
• Built an interactive KPI performance dashboard featuring maps, bar plots, and pie charts, enabling comprehensive analysis of product availability, sales, market share, and inventory metrics, leading to a 15% increase in operational efficiency.
• Analyzed regional market data using PySpark on Databricks, processing sales and competitor data across 9 counties to identify dealership expansion opportunities, improving market visibility by 25%.
Robokalam Hyderabad, India
Data Science Intern Mar 2021 – Aug 2022
• Developed an automatic certificate generator, reducing manual processing time by 70% through efficient bulk printing.
• Enhanced emotion detection in online classroom engagement analysis, increasing accuracy by 50% by applying Support Vector Machine (SVM) algorithms.
• Conceptualized an interactive dashboard using Tableau to monitor student performance metrics, enabling timely interventions and improving course completion rates by 15%.
TECHNICAL SKILLS
Programming & Analytics: Python, SQL, Bash, R, PySpark, Apache Spark, Spark SQL, Scala, Supervised and Unsupervised Learning, Time Series Forecasting, Natural Language Processing, Statistical Analysis, Graph Analytics, Deep Learning, Data Engineering Pipelines, Data Warehousing, Data Lakehouse, Generative AI (LLM)
Tech Stack: Databricks, Hadoop, Snowflake, BigQuery, Redshift, ETL Orchestration (Apache Airflow, Prefect, dbt), MongoDB, Neo4j, Kafka, Cloud Platforms (AWS, GCP, Azure), Docker, Kubernetes, Jenkins, CI/CD Pipelines, Git, Flask, Streamlit, RESTful APIs, MLOps Principles, Tableau, Power BI, Google Analytics
Certifications: AWS Certified Cloud Practitioner
TECHNICAL PROJECT EXPERIENCE
Sentiment Analysis on Streaming Twitter Data
Built a real-time data pipeline with Apache Kafka and Spark Structured Streaming on Databricks, reducing data processing latency by 80% and providing immediate insights into public opinion trends through sentiment analysis of live Twitter data.
Graph-Based Financial Analysis of SEC Corporate Filings
Utilized Neo4j and GraphRAG to analyze SEC filings for 16 companies, identifying 5 key financial anomalies and enabling natural language querying, assisting auditors and investors in detecting potential fraudulent activity and prioritizing high-risk companies.
Loan Performance Analysis of Fannie Mae Data
Transformed and optimized a 141-feature dataset into Parquet format and analyzed it with PySpark to identify 21 key features across 100,000+ loans, creating a data-driven approach to loan analysis.