Professional Experience
Programming Languages / Data Wrangling: Python, R, Java.
Artificial Intelligence : Cortical Learning Algorithm.
Machine Learning : Random Forests, Clustering (K-Means/Hierarchical), Regression Analysis, Text Mining & Unstructured-Data Analytics, Spark MLlib.
Cloud and Big Data : Apache-Spark, Spark-Streaming, Spark-SQL, Cassandra, PySpark, Scala-Spark.
Data Visualization : Tableau, Adobe Illustrator, RAW.
Data Scientist
Conceptualized and Designed a Real-Life Equivalent Business Use-Case (Healthcare Data-Analytics): based on Real-Time sensor (vitals) readings (human heart rate, activity, temperature, orientation, acceleration, etc.), the person's current physical activity is predicted with the Apache Spark Machine Learning Library (Spark MLlib, using Decision Tree / Random Forest Models), and the corresponding Anomalous behavior of the heart rate is detected through the Numenta Cortical Learning Algorithm. Spark Code was developed in both Scala and PySpark.
Major Contributor to the Design, Development and Implementation of the Backend Architecture of a Scalable Real-Time Streaming Big-Data / Analytics Platform. Data is aggregated from IoT Devices as Event Streams and Ingested via Kafka message brokers (Data Bus Layer). The Data is then processed in the Compute Layer, where Spark-Streaming Jobs calculate Anomaly Detection metrics and Activity Prediction Features. The results are stored in Cassandra, from which the Service Layer reads them, serializes them to JSON, and serves them through a REST Endpoint consumed by the UI Code. In this framework, Cluster Management is handled by Mesos and the base environment is Python. The Entire Framework is hosted on a combination of AWS and Heroku.
Extended and incorporated open-source code from the Artificial Intelligence Framework Numenta (http://numenta.com/) to build the Big Data Solution. Fine-Tuned and Reverse-Engineered Numenta's Hierarchical Temporal Memory based Cortical Learning Algorithm to detect Anomalies in the Data. Augmented Sequential Pattern Learning Algorithms to decipher Trends in the Data. Re-Engineered the inherent Particle Swarm Optimization Algorithm.
Product Engineering : iHealth - Connected Real-Time Streaming Data Analytics Health-Care Platform.
Demo : https://www.youtube.com/watch?v=jzG2xqIMygI
Architecture : https://ihealthapplication.github.io/Architecture.html
Algorithms : https://ihealthapplication.github.io/Algorithms.html
Award for Outstanding Contribution
URL : https://www.slideshare.net/slideshow/embed_code/key/KNdyQLOmV11khq
ObjectFrontier Software Inc., Alpharetta, GA
Technical Skills
Achievements
Mobile : 434-***-****
Email : *******.***@*****.***
KAUSTAV SAHA
Competencies Demonstrated : Creativity, Resiliency, Analytical, Leadership
Executive Summary
Data Science Professional with 5+ years of Industry Experience (3.5 years in a Silicon Valley based MNC and 1.5 years in Current Job). Data Science Capability and Domain Expertise in Machine Learning, Big Data, Research & Development, End-to-End Data-Architecture, and Predictive Analytics.
10103 Creek View Court, Alpharetta, GA - 30004
LinkedIn : https://www.linkedin.com/in/kaustav-saha-learning
Cisco Systems, Bangalore, India
Software Engineer (Big Data Engineering)
Mentored, Managed and Led a Group of 8 in-house engineers (San Jose, CA / Bangalore, India) to develop a Dashboard based on K-Means and Hierarchical Clustering that tracks and identifies potential Network Profile Key Performance Indicators and User Profile Key Performance Indicators.
Designed and Programmed a Machine Learning Platform based on Random Forests using R that predicted brand impact and helped accelerate sponsor investment by quantifying and communicating the value and effectiveness of sponsor spending.
Coded a Deep-Learning component (Neural-Nets) using Python and libraries like Theano to correlate Image Data and identify potential advertising opportunities.
School of Informatik, University of Rostock, Rostock, Germany
Data Science Engineer, Modeling and Simulation Group
2011 - 2014
Implemented Major Changes in the Back-End Architecture (released to market) that enhanced Query performance 60-fold.
Awarded “Cisco Star” in 2013 for extraordinary contribution in resolving a critical customer escalation (Giant European Bank).
Developed and Optimized a Java-based framework for Data-Mining on Simulation Performance Data. Solved the problem of Automatically selecting the most Appropriate Simulation Algorithm for specific applications and infrastructures. Programmed the components of the framework, integrated external tools, and re-formulated the Algorithm Selection problem from a Data Mining perspective.
UNIVERSITY OF VIRGINIA, Data Science Institute, Charlottesville, VA
Master of Science in Data-Science, May 2016
GPA : 4.0/4.0
Course-Work : Cloud and Big Data, Machine Learning, Data Visualization, Statistical Computing, Data-Mining, Linear Models, Capstone Project.
INDIAN INSTITUTE OF TECHNOLOGY (IIT), Kharagpur, India
Master of Technology in Computer Science and Engineering, 2011
Bachelor of Technology (Honors) in Computer Science and Engineering, 2011
Course-Work : Data Structures, Information Retrieval, Intelligent Systems, Software Engineering, Advanced Algorithms, Distributed Systems, Artificial Intelligence, Master Thesis.
Masters Data Science Capstone Project
Achievements
Mathematically derived the functional form of a novel Geometric Brownian Motion equation that incorporates the Volatility and Drift of a Time-Series and measures the Market Price Impact of Order Flow Imbalance.
Contributed to the body of research on Market Impact: the method reduces error in observed price impact by accounting for volatility, and it identified significant additional explanatory variables (price, drift of the stock, etc.) that quantify market impact across market states (volume, time of day, spread), enhancing the predictive power of the generated model over standard models.
Data was extracted from the High Performance Database kdb+ and subsequent computations were performed in R. Random Forest and Stepwise Regression Algorithms were implemented for Feature Selection.
IEEE Publication - IEEE Systems and Information Engineering Design - Effect of Order Flow Imbalance on Market Impact Across Market States at SIEDS, Charlottesville, VA, USA, May 2016.
IEEE URL : http://ieeexplore.ieee.org/document/7489318/?reload=true
Full Text URL : https://www.slideshare.net/slideshow/embed_code/key/jTLH51ZpbNr5di
Community Work with the local Baptist Church, Atlanta: Lead a group of volunteers to organize events, manage teams, and orchestrate workflows in the Student and Life-Change Ministries.
Winner (Silver) in the Cisco Inter-Department (Engineering) Soccer Championship in 2012.
Cleared the Chartered Financial Analyst (CFA) Level 1 Examination in December, 2013.
Recipient of DAAD (Germany) Scholarship for the Summer Technology Exchange Program in 2008.
Student Work Experience
Achievements
2008
Research Publication - ACM Digital Library - Data Mining for Simulation Algorithm Selection at SimuTools, Rome, Italy, March 2009.
ACM URL : http://dl.acm.org/citation.cfm?id=1537633
Full Text URL : https://www.slideshare.net/slideshow/embed_code/key/oS0yy5KfggW8fZ
Achievements
Leadership Activities / Teamwork / Academic Distinctions
Education
References will be provided on request.
Foreign Human Languages : English, Spanish
Native Human Languages : Bengali, Hindi
2016
Linguistic Proficiency