Resume


Data Scientist

Location:
New York, New York, United States
Posted:
April 30, 2018


Resume:

Professional

Experience

**** - ****

Programming Languages / Data Wrangling: Python, R, Java.

Artificial Intelligence : Cortical Learning Algorithm.

Machine Learning : Random-Forests, Clustering (K-Means/Hierarchical), Regression Analysis, Text mining & Unstructured-Data Analytics, Spark MLLib.

Cloud and Big Data : Apache-Spark, Spark-Streaming, Spark-SQL, Cassandra, PySpark, Scala-Spark.

Data Visualization : Tableau, Adobe Illustrator, RAW.

Data Scientist

Conceptualized and Designed a Real-Life Business Use-Case (Healthcare Data-Analytics): based on Real-Time sensor (vitals) readings (heart rate, activity, temperature, orientation, acceleration, etc.), the person's current physical activity is predicted with the Apache Spark Machine Learning Library (Spark-MLLib, using Decision Tree / Random Forest Models), and the corresponding Anomalous behavior of the heart rate is detected with the Numenta Cortical Learning Algorithm. Spark Code was developed in both Scala and PySpark.

Major Contributor in the Design, Development and Implementation of the Backend Architecture of a Scalable Real-Time Streaming Big-Data / Analytics Platform. Data is aggregated from IoT Devices as Event Streams and ingested via Kafka message brokers (Data Bus Layer). The Data is then processed in the Compute Layer, where Spark Jobs (Spark-Streaming) calculate Anomaly Detection metrics and Activity Prediction Features. The results are stored in Cassandra DB, from where the Service Layer reads them, converts them to JSON, and directs the REST Response to a REST Endpoint from which the UI Code consumes the service. In this framework, Cluster Management is handled by Mesos and the base environment is Python. The Entire Framework is hosted on a combination of AWS and Heroku.
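The layered flow described above (data bus, compute, storage, service) can be sketched in plain Python. This is an illustrative stand-in only, not the platform's code: an in-memory list and dict take the place of Kafka and Cassandra, a rolling z-score takes the place of the actual anomaly metrics, and every name (`compute_layer`, `service_layer`, `WINDOW`, the event fields) is hypothetical.

```python
import json
import statistics
from collections import deque

WINDOW = 5  # hypothetical number of recent readings used as the baseline


def compute_layer(events, store):
    """Consume heart-rate events and write a z-score per reading to the store."""
    recent = {}  # device_id -> deque of recent heart-rate values
    for event in events:
        history = recent.setdefault(event["device_id"], deque(maxlen=WINDOW))
        if len(history) >= 2:
            mean = statistics.mean(history)
            spread = statistics.stdev(history)
            score = abs(event["heart_rate"] - mean) / spread if spread > 0 else 0.0
        else:
            score = 0.0  # not enough history yet
        store.setdefault(event["device_id"], []).append(
            {"ts": event["ts"], "heart_rate": event["heart_rate"], "score": round(score, 2)}
        )
        history.append(event["heart_rate"])


def service_layer(store, device_id):
    """Read stored results for one device and serialize them as a JSON response body."""
    return json.dumps({"device_id": device_id, "readings": store.get(device_id, [])})


# Stand-in for the event stream arriving from the message broker (made-up values).
events = [
    {"device_id": "dev-1", "ts": t, "heart_rate": hr}
    for t, hr in enumerate([70, 72, 71, 69, 70, 140])
]
store = {}  # stand-in for the results table
compute_layer(events, store)
response = service_layer(store, "dev-1")
```

In this sketch the final reading (140) scores far above the earlier ones, which is the kind of value a UI would flag after consuming the JSON response.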

Extended and incorporated open-source code from the Artificial Intelligence Framework - Numenta (http://numenta.com/) to build the Big Data Solution. Fine-Tuned and Reverse-Engineered Numenta's Hierarchical Temporal Memory based Cortical Learning Algorithm to detect Anomalies in the Data. Augmented Sequential Pattern Learning Algorithms to decipher Trends in the Data. Re-Engineered the inherent Particle Swarm Optimization Algorithm.

Product Engineering : iHealth - Connected Real-Time Streaming Data Analytics Health-Care Platform.

Demo : https://www.youtube.com/watch?v=jzG2xqIMygI

Architecture : https://ihealthapplication.github.io/Architecture.html

Algorithms : https://ihealthapplication.github.io/Algorithms.html

Award for Outstanding Contribution

URL : https://www.slideshare.net/slideshow/embed_code/key/KNdyQLOmV11khq

ObjectFrontier Software Inc. Alpharetta, GA

Technical

Skills

Achievements

Mobile : 434-***-****

Email : ac5aqe@r.postjobfree.com

Creativity

KAUSTAV SAHA

Resiliency

Analytical

Leadership

Competencies

Demonstrated

Data Science Professional with 5+ years of Industry Experience (3.5 years in a Silicon Valley based MNC and 1.5 years in Current Job). Data Science Capability and Domain Expertise in Machine Learning, Big Data, Research & Development, End-to-End Data-Architecture, and Predictive Analytics.

Executive

Summary

10103 Creek View Court, Alpharetta, GA - 30004

Linkedin : https://www.linkedin.com/in/kaustav-saha-learning

Cisco Systems Bangalore, India

Software Engineer (Big Data Engineering)

Mentored, Managed and Led a Group of 8 in-house engineers (San Jose, CA / Bangalore, India) to develop a Dashboard based on K-Means and Hierarchical Clustering that tracks and identifies potential Network Profile Key Performance Indicators and User Profile Key Performance Indicators.

Designed and Programmed a Machine Learning Platform based on Random Forests using R that predicted brand impact and helped accelerate sponsor investment by quantifying and communicating the value and effectiveness of sponsor spending.

Coded a Deep-Learning component (Neural-Nets) using Python and libraries like Theano to correlate Image Data to decipher potential advertisement opportunities.

School of Informatik, University of Rostock, Rostock, Germany

Data Science Engineer, Modeling and Simulation Group

2011 - 2014

Implemented Major Changes in the Back-End Architecture (released to market) that enhanced Query performance 60-fold.

Awarded “Cisco Star” for extraordinary contribution in solving a critical customer escalation (Giant European Bank) in 2013.

Developed and Optimized a Java-based framework for Data-Mining on Simulation Performance Data. Solved the problem of Automatically selecting the most Appropriate Simulation Algorithm for specific applications and infrastructures. Programmed the components of the framework, integrated external tools, and re-formulated the Algorithm Selection problem from a Data Mining perspective.

UNIVERSITY OF VIRGINIA, Data Science Institute, Charlottesville, VA

Master of Science in Data-Science, May 2016

GPA : 4.0/4.0

Course-Work : Cloud and Big Data, Machine Learning, Data Visualization, Statistical Computing, Data-Mining, Linear Models, Capstone Project.

INDIAN INSTITUTE OF TECHNOLOGY (IIT), Kharagpur, India

Master of Technology in Computer Science and Engineering, 2011

Bachelor of Technology (Honors) in Computer Science and Engineering, 2011

Course-Work : Data Structures, Information Retrieval, Intelligent Systems, Software Engineering, Advanced Algorithms, Distributed Systems, Artificial Intelligence, Master Thesis.

Masters

Data Science

Capstone

Project

Achievements

Mathematically derived the functional form of a novel Geometric Brownian Motion equation that incorporates the Volatility and Drift of a Time-Series and measures the Market Price Impact of Order Flow Imbalance. Contributed to the body of research on Market Impact: the method reduces error in observed price impact by accounting for volatility, and it identified significant additional explanatory variables (price, drift of the stock, etc.) that quantify market impact across market states (volume, time of day, spread), enhancing the predictive power of the generated model over standard models.

Data was extracted from the High Performance Database kdb+ and subsequent computations were performed in R. Random Forest and Stepwise Regression Algorithms were implemented for Feature Selection.

IEEE Publication - IEEE Systems and Information Engineering Design - Effect of Order Flow Imbalance on Market Impact Across Market States at SIEDS, Charlottesville, VA, USA, May 2016.

IEEE URL : http://ieeexplore.ieee.org/document/7489318/?reload=true

Full Text URL : https://www.slideshare.net/slideshow/embed_code/key/jTLH51ZpbNr5di
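For readers unfamiliar with the drift and volatility terms in the capstone description, a minimal sketch follows. It assumes the textbook Geometric Brownian Motion model, dS = mu * S * dt + sigma * S * dW, and not the novel equation from the project, which is not reproduced here. The function name `gbm_drift_volatility` and the price values are hypothetical.

```python
import math
import statistics


def gbm_drift_volatility(prices, dt=1.0):
    """Estimate per-unit-time drift (mu) and volatility (sigma) under textbook GBM."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    sigma = statistics.stdev(log_returns) / math.sqrt(dt)
    # Under GBM the mean log return is (mu - sigma^2 / 2) * dt.
    mu = statistics.mean(log_returns) / dt + 0.5 * sigma ** 2
    return mu, sigma


prices = [100.0, 101.0, 100.5, 102.0, 101.2, 103.1]  # made-up prices, not source data
mu, sigma = gbm_drift_volatility(prices)
```

The estimates depend on the sampling interval `dt`, so results from series sampled at different frequencies are comparable only after conversion to a common time unit.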

Community Work with the local Baptist Church, Atlanta: Leading a group of volunteers to organize events, manage teams, and orchestrate workflows in the Student and Life-Change Ministries.

Winner (Silver) in the Cisco Inter-Department (Engineering) Soccer Championship in 2012.

Cleared the Chartered Financial Analyst (CFA) Level 1 Examination in December, 2013.

Recipient of DAAD (Germany) Scholarship for the Summer Technology Exchange Program in 2008.

Student

Work

Experience

Achievements

2008

Research Publication - ACM Digital Library - Data Mining for Simulation Algorithm Selection at SimuTools, Rome, Italy, March 2009.

ACM URL : http://dl.acm.org/citation.cfm?id=1537633

Full Text URL : https://www.slideshare.net/slideshow/embed_code/key/oS0yy5KfggW8fZ

Achievements

Leadership

Activities /

Teamwork /

Academic

Distinctions

Education

References will be provided on request.

Foreign Human

Languages

English

Spanish

Native Human

Languages

Bengali

Hindi

2016

Linguistic Proficiency


