Sahil Goel
******@***.*** ***, ** St, Apt *, Brooklyn, NY 11220 929-***-****
https://www.linkedin.com/in/sahil-goel-37699699/ https://github.com/sahil5591
EDUCATION:
New York University, New York, USA Expected: May 2020
Master of Science, Computer Science
Relevant Courses: Neural Networks, Design and Analysis of Algorithms, Big Data Analytics, Machine Learning, Deep Learning
GGSIPU, New Delhi, India May 2016
Bachelor of Technology, Electronics and Communication
SKILLS:
Programming/Scripting Languages: Python, R, SQL, C
Big Data Ecosystems: Hive, Apache Spark (SparkML, SparkSQL), Docker, Hadoop
Machine Learning/Deep Learning Frameworks: Scikit-Learn, Data Visualization (Matplotlib, Seaborn, Plotly, ggplot), Keras, TensorFlow, PyTorch, Decision Trees, Boosting, NLP (Sentiment Analysis & Text Analysis), Statistical Analysis (Classification, Clustering, Hypothesis Testing, Regression), Convolutional and Recurrent Neural Networks (LSTM), Predictive Modelling
Miscellaneous: Talend and ETL Techniques, MySQL, Data Mining, Data Science Toolbox, Microsoft Excel, Git
WORK EXPERIENCE:
TransOrg Analytics, Gurugram, India
Data Scientist (June 2016 – June 2018)
Client: A Japanese Multinational Automotive Manufacturer
• Consolidated large datasets from multiple sources into a 360-degree customer view and built loyalty scores; deduplication reduced the customer base by 22%. Also conducted exploratory data analysis to generate leads for sales and services
Client: India’s Largest Low-Cost Carrier
• Performed sentiment analysis on customer tweets and feedback in Python to understand customers' opinions
• Built a K-Means model to cluster customers into segments that can be targeted for specific campaigns
TransOrg’s Look-A-Like Product - Clonizo (http://transorg.com/clonizo/)
• Developed an algorithm combining feature reduction techniques, ensemble methods, clustering, and distance measures
• Tested the algorithm on datasets from different industry verticals such as security, telecom, and aviation
ETL Layer for a Dashboarding tool - Mobisights (http://transorg.com/mobisights/)
• Developed ETL jobs to automate data transfer between multiple databases using the Talend Open Data Integration tool
Client: An American Multinational Financial Services Corporation
• Identified non-compliant spend at merchants in both English-speaking and non-English-speaking countries using text analytics and cutoff measures based on merchant descriptions obtained through Google scraping
Client: A leading Telecom Operator in India
• Improved and tuned churn prediction models for two circles using feature engineering and random forests, achieving a recall of 78%
Data Science Intern (April 2016 – May 2016)
Client: A Leading Securities Firm in India
• Tagged multiple accounts owned by the same client using text analytics and de-duplication based on fuzzy string matching algorithms and regular expressions
PROJECTS:
Galaxy Merger Detection (Python, Keras, Convolutional Neural Networks, Computer Vision)
• Classified morphologies of different galaxies using Convolutional Neural Networks, image augmentation and preprocessing techniques
Profile Suitability Estimator (Apache Spark, Spark MLlib, NLP, Word2vec, Matplotlib, and SparkSQL)
• Leveraged Spark’s machine learning library to build a pipeline that computes word embeddings of resume and job description texts based on Google’s Word2vec model. Cosine similarity was used to obtain a matching score between resumes and job descriptions.
Analyzing Conferences in Twitter (Python (NumPy, Pandas, Plotly, Matplotlib, and Seaborn), Twitter Streaming API, and Tweepy)
• Analyzed tweets related to conferences to measure their impact using Social Aviary. Graphs and statistics such as ‘Vertex Growth over time’, ‘Graph Density’, and ‘Clustering Coefficient’ were computed
Human Activity Recognition (R, XGBoost Algorithm)
• Developed a predictive model to classify human activity data obtained from sensors into walking, walking upstairs, walking downstairs, sitting, standing, and laying. Several algorithms were compared, of which XGBoost gave the best results with 82% accuracy
PUBLICATIONS:
Analytics in Banking & Airlines
Sentiment on Fintech Post Demonetization