NEHA K. NAYAK
Chicago, IL (open to relocation) 872-***-**** *******@****.************.*** in/neha-kiran-nayak github.com/NEHAKIRANNAYAK
SUMMARY
Data Science graduate student at the Illinois Institute of Technology with 2+ years of hands-on experience in machine learning, big data engineering, and AI-driven analytics. Built and optimized real-time data pipelines processing 100K+ records using Apache Kafka, AWS, and Spark. Developed deep learning and Explainable AI models improving prediction accuracy by up to 15% across healthcare and finance projects. Published author (3 Springer papers) with proven expertise in Python, R, SQL, TensorFlow, and large-scale data systems.
SKILLS
• Programming Languages: Python, R, SQL, C
• Databases & Storage: PostgreSQL, AWS S3, DynamoDB, NoSQL Databases
• Big Data & Streaming: Apache Spark, Apache Kafka (v3.5.1), Hadoop, Spark Streaming, AWS Glue, Athena
• Containerization & Orchestration: Docker, Kubernetes
• Machine Learning: Supervised Learning (Regression, Classification – SVM, Random Forest, Gradient Boosting), Unsupervised Learning (Clustering Techniques), Model Evaluation & Validation, Scikit-learn, TensorFlow, Keras, XAI
• Deep Learning: Neural Networks (CNNs, RNNs, LSTMs), Image Processing, PyTorch, TensorFlow, Computer Vision
• Natural Language Processing (NLP): Text Mining, Sentiment Analysis, NLTK, spaCy
• Emerging AI Paradigms: Agentic AI, Retrieval-Augmented Generation (RAG), Neurosymbolic AI
• Data Visualization: Matplotlib, Seaborn, Tableau, Power BI, IBM SPSS
• Statistical Analysis: Descriptive & Inferential Statistics, Hypothesis Testing, A/B Testing, ANOVA
• Cloud Computing & Infrastructure: AWS, GCP, Azure, Server Setup, SSH Connections
• Data Engineering Practices: ETL Design, Streaming Ingestion, Schema Design, Data Modeling, Pipeline Optimization
• Mathematics for Data Science: Linear Algebra, Calculus, Differential Equations, Graph Theory, Probability, Sampling
• Software Development & Tools: Git, Agile Project Management, GitHub Portfolio, Jupyter Notebooks
WORK EXPERIENCE
Data Science Fellow Build Fellowship, Chicago, IL Feb 2025 – Apr 2025
• Analyzed 100,000+ hospital encounters using Python and R, improving readmission prediction accuracy by 14%.
• Built logistic regression and ANOVA models, identifying risk factors in 95% of high-priority patient cases.
• Created interactive Tableau dashboards, visualizing trends for 200+ hospital staff to optimize patient management strategies.
• Optimized data pipelines, reducing preprocessing time by 35% for faster, reproducible analytics workflows.
Data Analyst Kasturba Medical College, Manipal, India Apr 2024 – May 2024
• Processed 1,500+ patient records using Python Pandas, maintaining 100% compliance with data privacy regulations.
• Trained deep learning models with TensorFlow and Keras, improving tumor classification accuracy from 78% to 90%.
• Applied SHAP-based Explainable AI, reducing misclassification by 15% and increasing clinician confidence in predictions.
• Visualized patient trends with Power BI and Seaborn, supporting decisions by 50+ healthcare professionals.
Research Analyst IIIT Allahabad, India Oct 2022 – Jan 2023
• Developed real-time water quality pipeline with Apache Kafka and Spark, reducing data latency by 25%.
• Applied ML models in Scikit-learn, improving water quality classification accuracy from 81% to 93%.
• Managed ETL processes for 500,000+ records, improving data reliability and integration efficiency by 30%.
• Collaborated with the UP Government, providing actionable insights impacting 10+ regional environmental monitoring initiatives.
EDUCATION
Illinois Institute of Technology, Chicago, IL Aug 2024 – May 2026 Master of Data Science, GPA 3.50
Visvesvaraya Technological University, Bengaluru, KA Aug 2020 – Jun 2024 Bachelor's in AI & Data Science, GPA 3.9
PROJECTS
Real-Time Stock Market Data Pipeline Apr 2025 – Jul 2025
• Built a real-time data pipeline using Apache Kafka (v3.5.1), efficiently processing 1M+ live stock records daily.
• Deployed EC2-hosted brokers with Python producers/consumers, achieving 99.8% reliable message delivery across nodes.
• Integrated DynamoDB for low-latency data queries, reducing retrieval time by 40% and enhancing user responsiveness.
• Automated AWS orchestration and monitoring pipelines, improving system scalability, uptime, and fault tolerance by 30%.
Navigation Assistant for Visually Impaired Jan 2025 – Apr 2025
• Developed a real-time pipeline integrating video streams with CNN inference, achieving 95.68% detection accuracy.
• Built an end-to-end computer vision workflow using Python, OpenCV, and TensorFlow, ensuring sub-5-second latency.
• Applied statistical validation and visualization to assess detection trends, improving model reliability across test environments.
• Optimized inference architecture and GPU utilization, enhancing processing efficiency and overall system performance by 25%.
Driver Drowsiness Detection System Oct 2024 – Dec 2024
• Designed a real-time driver state analytics system using TensorFlow, OpenCV, and dlib, analyzing 100+ frames/second.
• Implemented blink-rate detection on streaming video, achieving 92% accurate driver state monitoring in real-time conditions.
• Built a low-latency inference pipeline with parallel frame processing, reducing alert-generation time by 2 seconds (33% faster).
• Optimized feature engineering and performance, achieving 93.04% accuracy and enabling deployment for 1,000+ drivers.
HONORS AND ACHIEVEMENTS
• University Gold Medalist in Artificial Intelligence and Data Science Jun 2024
• High-Speed Visual Navigation for the Visually Impaired with Real-Time Mapping and Voice Interaction, ERCICA Apr 2024
• Driver Drowsiness Alarm System: A Simulation Using CNNs, Springer Feb 2024
• Analysis and Prediction of PCOD using ML Pipelines and Ensemble Techniques, Springer Apr 2022