
Data Scientist Python

Location:
Chicago, IL
Posted:
July 24, 2018


Resume:

Raunak Sharan

Contact No.: 312-***-**** | Email: ac6eez@r.postjobfree.com | Location: Chicago, IL 60612, United States

Personal website: https://raunakbits4.wixsite.com/rsportfolio

SUMMARY

• Accomplished analytics professional with 3+ years of work experience in Data Science, Analytics, and Business Intelligence.

• Expertise in identifying, solving, and communicating solutions to challenging analytics problems.

• Ability to work cross-functionally with different stakeholders to help implement solutions, explain complex ideas, and develop data and analytics strategies.

• Experience in Machine Learning, Data Mining with large data sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, and developing different statistical Machine Learning models.

• Experience with foundational Machine Learning models and concepts such as Regression, Random Forest, Boosting, and GBM; expertise in transforming business requirements into analytical models and designing algorithms.

• Experience using statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression, and Time Series Analysis to analyze data for model building; hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.

• Experience implementing A/B tests and online learning algorithms such as Thompson Sampling and UCB (a minimal sketch follows this summary).

• Applied deep learning and statistical inference concepts such as CNNs, LSTMs, Gibbs Sampling, and EM to various projects.

• Strong in the statistical programming languages Python and R.

• Adept in Big Data technologies such as Hadoop and Hive, and the cluster computing framework Spark.

• Skilled in using dplyr in R and pandas in Python for exploratory data analysis.

• Experience using AWS cloud computing services to deploy algorithms.

• Proficient in creating insightful visualizations in Tableau to tell data stories.

• Proficient in applying data science concepts with analytical tools and data-oriented platforms to build sophisticated models, insightful visualizations, and actionable strategies, and to tell interpretable, data-inspired stories.

• Worked with and extracted data from database sources such as Oracle and SQL Server; regularly used JIRA and other internal issue trackers for project development.
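
The bullet on A/B tests and online learning above refers to algorithms such as Thompson Sampling; the following is a minimal, self-contained sketch of that idea on a toy two-variant Bernoulli bandit. The conversion rates, variant count, and trial count are assumptions for illustration only, not figures from any project listed here.

# Toy Thompson Sampling over two variants (Beta-Bernoulli bandit); all numbers are assumed.
import random

true_rates = [0.04, 0.05]      # "true" conversion rates, unknown to the algorithm
successes = [1, 1]             # Beta(1, 1) prior for each variant
failures = [1, 1]

for _ in range(10000):
    # Sample a conversion-rate estimate for each variant from its Beta posterior
    samples = [random.betavariate(successes[i], failures[i]) for i in range(2)]
    arm = samples.index(max(samples))                 # show the variant with the highest sample
    reward = 1 if random.random() < true_rates[arm] else 0
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior means:", [successes[i] / (successes[i] + failures[i]) for i in range(2)])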

EDUCATION

• MS in Business Analytics (3.81/4.0) at University of Illinois, IL, United States Jun 2016 - May 2018.

• B.E. (Hons.) in Computer Science at BITS Pilani (a top-5 ranked university in India), Aug 2009 - May 2013.

EXPERIENCE

Data Analyst Intern at Ensono, Greater Chicago Area, United States Jun 2017 – Aug 2017

Responsibilities:

• Using Amazon's speech recognition engine (Alexa), created an AWS-deployed skill in Python that used NLP and predictive modeling to accurately estimate software development times, reducing process time by 50% (earned the Innovation Award).

• Built models using statistical techniques and Machine Learning classification models such as XGBoost, SVM, and Random Forest.

• Validated the machine learning classifiers using ROC curves and lift charts (see the sketch after this role).

• Leveraged multiple data sources, resulting in an unsupervised Apriori model for high-incidence areas and the development of a system-wide Tableau dashboard.

• Coalesced incidence data from integrated, CA-based firm IT systems using ETL tools and SQL to develop a centralized data warehouse, allowing for analysis and internal function-based efficiency and resolution reporting.

Environment: Python 3.5.2, SQL, ETL, AWS Lambda, Alexa, Tableau 10.5, Agile methodology, Apriori, SVM, XGBoost
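
As a hedged illustration of the ROC-based validation mentioned in this role, the sketch below trains a generic Random Forest on synthetic data and scores it with an ROC curve; the data and model stand in for the actual Ensono classifiers and incident data, which are not reproduced here.

# Validate a classifier with an ROC curve on synthetic data (illustrative stand-in).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]      # predicted probability of the positive class

fpr, tpr, _ = roc_curve(y_test, scores)         # points of the ROC curve
print("AUC:", roc_auc_score(y_test, scores))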

Data Scientist Project Intern at ResearchDone, Greater Chicago Area, United States Jan 2018 – May 2018

Responsibilities:

• Parsed financial filings from the SEC website using Python, R, and PostgreSQL with 98% accuracy to build a cognitive map of companies.

• Performed data acquisition in R, designing and running a database pull from EDGAR to fetch all 10-K filings.

• Stored the filing mappings in a PostgreSQL database, parsed out binary characters (files), and removed HTML tags using the html2text Python library.

• Used regular expressions in Python to parse and extract certain sections of the filings.

• For data classification, used term frequency, paragraph breaks, and other indicators to classify paragraphs as related to the identity statement.

• Used Gensim with Google's pre-trained word2vec model, with a reduced vocabulary size, to position the companies.

• Built a hierarchical clustering model to group the companies and created dendrograms (cophenetic distances) with complete linkage; the model was able to group companies with the same SIC code together (see the sketch after this role).

• Visualized the clusters using networkx graphs.

• Determined the competitors of a company, i.e., its k-nearest neighbors within the chosen cluster that share the same SIC code.

Environment: Python 3.5.2, R, PostgreSQL, and UNIX
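
A minimal sketch of the complete-linkage clustering and dendrogram step described in this role; random vectors stand in for the word2vec company embeddings, and the cluster count is an arbitrary assumption.

# Complete-linkage hierarchical clustering with a dendrogram and cophenetic distances.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, cophenet, fcluster
from scipy.spatial.distance import pdist

embeddings = np.random.rand(30, 50)               # placeholder for word2vec company vectors

dists = pdist(embeddings)                         # pairwise distances between companies
Z = linkage(dists, method="complete")             # complete-linkage agglomerative clustering

coph_corr, _ = cophenet(Z, dists)                 # how faithfully the tree preserves distances
labels = fcluster(Z, t=5, criterion="maxclust")   # cut the tree into 5 groups (assumed)

dendrogram(Z)
plt.show()
print("cophenetic correlation:", round(coph_corr, 3))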

Senior Data Analyst Consultant at Oracle Financial Services Software, Mumbai, India Aug 2013 – Sept 2015

Collaborated with many teams that cater to multiple departments of a Fortune 50 client:

Client: HDFC Bank (India's largest private sector bank)

Responsibilities:

• Analyzed online payment transactions in the CASA module of Flexcube (Oracle's core banking solution) and modified the software to address messaging lapses, improving the user experience for 32.7 million customers.

• Made changes to define a new bank-level alert and to identify transactions for sending SMS and email alerts; provisioned alerts to be sent only on the successful debit reversal of accounts.

• CPSMS integration: Implemented a web service as part of product integration to capture MICR codes, enabling the Central Plan Scheme Monitoring System to track and monitor the fund disbursement and utilization process; the integration catered to funds worth $50 billion.

• Expanded the utility of a high-performance Oracle database by writing stored procedures, optimized queries, and other database objects in SQL, equipping systems to identify key account information and generate reports.

• Developed rules for predicting account risk using a decision tree model (see the sketch after this role).

• Consulted with the client and business analysts to determine the approach after the requirement-gathering stage.

• Produced impact documents and managed APEX, an in-house project management portal, for the waterfall methodology.

Environment: SQL, Python, Oracle, Decision Trees, Java, Oracle BI, APEX
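
A hedged sketch of the decision-tree rule extraction mentioned in this role; the synthetic features below are placeholders for account attributes, not actual banking data.

# Fit a shallow decision tree and print its splits as human-readable prediction rules.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=1)
feature_names = [f"feature_{i}" for i in range(5)]        # hypothetical account attributes

tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
print(export_text(tree, feature_names=feature_names))     # rules readable as if/else thresholds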

Software Engineer at Playgames 24x7, Mumbai, India Oct 2015 – April 2016

Responsibilities:

• Implemented modern REST web service API methods to facilitate improved web presence management practices.

• To further improve web presence, hosted projects on Bitbucket and automated the build and deployment using Go.

• Added bonus reward point functionality to the game in Java that improved player retention by 22%.

Environment: Java, MySQL, RabbitMQ, Git, Go, AWS S3, UNIX, BitBucket, Maven

Research Intern at ITER (the world's largest fusion experiment), Cadarache, France Jun 2012 – Jul 2012

Assisted the engineering team by studying and coding on the scientific web-based platform:

• Implemented an internal framework to align internal technology needs with organizational goals and compiled a report of identified potential improvements.

PROJECTS

• Image Caption Generation (CNN + RNN): Built a model that auto-generates captions for images using a merge architecture of CNN and RNN; the best BLEU score of 0.52 was obtained at a dropout of 0.2. Also created a REST API for this model (a minimal sketch follows the project list).

• Portfolio Optimization (Pandas): Used Pandas extensively to optimize a financial portfolio and graph the efficient frontier (a minimal sketch follows the project list).

• Portfolio Building using Value Investing (Big Data): Used PySpark to build a portfolio with an F-score strategy on 10 years of stock data from financial databases, yielding a portfolio with stable returns.

• Transfer Learning (Deep Neural Networks): Implemented an FFN with maxout non-linearity from scratch in Python and with Keras on a TensorFlow backend; performed image classification with a pre-trained VGG convolutional neural network (a minimal sketch follows the project list).

• Credit Scoring Analysis (Kaggle): Improved on the state-of-the-art credit scoring algorithm using a stack of tree-based models (Random Forest in H2O), predicting defaulters with an ROC (AUC) value of 0.88.

• Custom News Feed: Created a custom news feed using NLP and a Support Vector Machine, combining saved stories from the Pocket app with a stream of RSS feeds piped into Google Sheets (a minimal sketch follows the project list).

• Target Marketing: Boosted the cost-effectiveness of a veterans' organization's direct marketing campaign by USD 16,000 by using Gradient Boosted Trees on a stratified data sample.

• Jet Fuel Futures Hedging (Derivatives Finance): Used econometric models to hedge jet fuel exposure with gasoline and heating oil futures (basis hedging), to save $5M for United Airlines (a minimal sketch follows the project list).
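
For the Image Caption Generation project, a minimal Keras sketch of a merge-architecture caption model: a CNN feature vector and an LSTM-encoded partial caption are combined to predict the next word. The layer sizes, vocabulary size, and caption length are assumptions; only the 0.2 dropout matches a figure quoted above.

# Merge-architecture captioning model (sizes are assumed for illustration).
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_len = 5000, 34                   # hypothetical vocabulary size and caption length

img_in = Input(shape=(4096,))                    # pre-extracted CNN (e.g. VGG16) features
img_vec = Dense(256, activation="relu")(Dropout(0.2)(img_in))

cap_in = Input(shape=(max_len,))                 # partial caption as word indices
cap_vec = LSTM(256)(Dropout(0.2)(Embedding(vocab_size, 256, mask_zero=True)(cap_in)))

merged = add([img_vec, cap_vec])                 # the "merge" step
out = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))

model = Model(inputs=[img_in, cap_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()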
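
For the Portfolio Optimization project, a sketch of tracing an efficient frontier with pandas by sampling random portfolio weights; the tickers and daily returns are simulated placeholders, not real market data.

# Random-weight efficient frontier over simulated annualized returns.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0005, 0.01, size=(252, 4)),
                       columns=["AAPL", "MSFT", "GOOG", "AMZN"])     # placeholder tickers

mean, cov = returns.mean() * 252, returns.cov() * 252                # annualized estimates
results = []
for _ in range(5000):
    w = rng.random(4)
    w /= w.sum()                                                     # weights sum to 1
    ret = float(w @ mean.values)
    vol = float(np.sqrt(w @ cov.values @ w))
    results.append((vol, ret, ret / vol))

frontier = pd.DataFrame(results, columns=["volatility", "return", "sharpe"])
print(frontier.sort_values("sharpe", ascending=False).head())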
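
For the Transfer Learning project, a hedged sketch of using a pre-trained VGG16 as a frozen feature extractor for image classification; the 10 output classes and dense-layer size are assumptions, and the from-scratch maxout FFN is not reproduced here.

# Freeze VGG16 convolutional layers and train only a small classification head.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                            # keep the pre-trained filters fixed

x = Dense(256, activation="relu")(Flatten()(base.output))
model = Model(base.input, Dense(10, activation="softmax")(x))   # 10 assumed classes
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()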
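
For the Custom News Feed project, a toy sketch of the TF-IDF plus Support Vector Machine classification step; the example articles and labels below are made up and only stand in for the Pocket/RSS data.

# Score whether a new article matches saved-story interests with TF-IDF + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["deep learning beats benchmark", "stock market rallies",
         "new neural network paper", "oil prices fall sharply"]
labels = [1, 0, 1, 0]                             # 1 = interesting, 0 = skip (toy labels)

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["transformer model sets new record"]))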
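
For the Jet Fuel Futures Hedging project, a minimal sketch of a minimum-variance hedge ratio estimated from simulated price changes; the econometric models actually used are not reproduced, and all numbers below are assumed.

# Minimum-variance hedge ratio: h = corr * (sigma_spot / sigma_futures).
import numpy as np

rng = np.random.default_rng(1)
jet_fuel = rng.normal(0, 0.02, 500)                         # assumed spot price changes
heating_oil = 0.8 * jet_fuel + rng.normal(0, 0.01, 500)     # correlated futures price changes

h = np.corrcoef(jet_fuel, heating_oil)[0, 1] * jet_fuel.std() / heating_oil.std()
print("hedge ratio:", round(h, 3))                          # futures units per unit of exposure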

SKILLS

• Languages and Tools – Proficient: SQL, Python, R, Java; Intermediate: Hadoop, Hive, Excel, Tableau, PySpark, AWS

• Packages – Python (tensorflow, scikit-learn, keras, nltk, boto, seaborn), R (dplyr, ggplot2).

• Concepts - Data mining, Machine learning, Predictive Modeling, Statistics, Text Mining & NLP, Topic Modeling, Big data, Deep learning.

• Domain knowledge – Options Pricing (Black-Scholes), Portfolio Analytics, Commodity Hedging, SEO Marketing, Operations OEM Enablement

ACHIEVEMENTS

• 2017 CEO Challenge Summer Internship Data Science Innovation Award (Ensono).

• Merit Certificate Holder from the United Nations School Org. (All India Rank 6); ranked in the IIT-JEE 2009 exam.

• Leadership: Vice-President International Youth Forum; Sports: Gold medalist in Table Tennis


