Data Scientist

Location:

Posted:

December 11, 2017

Resume:

PANKAJ NEGI

*** ***** * *******, * Boulevard, Harrison, NJ 07029 *********@*****.*** https://www.linkedin.com/in/pankaj-negi-34a35145 862-***-****

Education

Rutgers Business School Newark, NJ, US

Master’s in Quantitative Finance Aug 2016-Jan 2018

Courses: Econometrics, Machine Learning, Deep Learning, Financial Time Series, Data Mining

FORE School of Management New Delhi, India

Master of Business Administration, Finance Jun 2013-Mar 2015

Courses: Portfolio Management, Investment Banking, Financial Statement Analysis, Micro and Macro Economics, Optimization

Guru Gobind Singh, IP University New Delhi, India

Bachelor of Technology, Electrical & Electronics Aug 2007-Mar 2011

Courses: Linear Algebra, Calculus, Statistics

Work Experience

Hero Fincorp Ltd. New Delhi, India

Lead Associate, Data Science and Product Design Apr 2015-Jul 2016

Auto Loan/PL: financial product covering credit lending for the purchase of Hero motorbikes and personal loans

• NLP+CNN: Developed an embedding using word2vec and implemented a CNN model for sentiment analysis of newly launched motorbikes, using 26,758 reviews collected across 5 zones (train accuracy 100%, test accuracy 87.5%).

• Artificial Neural Network: Implemented an ANN model to predict customer churn. Evaluated the model using the Keras wrapper and k-fold cross-validation, achieving accuracy in the range of 81%-86%.

• Improved the model through parameter tuning with Grid Search, achieving an accuracy of 85%.

• Logistic Regression (Default Customer): Implemented a logit model based on customer history, such as DPD string, zero bounce, and salary, to predict whether a customer will default on a loan.

• Hybrid Model - ANN & Self-Organizing Map (Default Customer): Implemented an ANN model to predict the probability of a customer defaulting on a loan. Used an SOM to identify defaults by assessing the mean inter-neuron distance (MID).

Jindal ITF New Delhi, India

Machine Learning Engineer, Procurement Mar 2012-Jun 2013

• Classification/NLP: Used a range of classification techniques (LDA, bag-of-words, Logistic Regression, Decision Tree, Ensemble Methods, SVM) to classify proposals as accepted or not accepted. Implemented a K-Nearest Neighbor model to classify crime-prone regions in East India.

• Regression: Implemented various regression models (Lasso, Ridge, Elastic Net, Least Angle, Theil-Sen, etc.) for pricing, to gain leverage in vendor negotiations.

Research Projects

• Recurrent Neural Network & LSTM: Implemented an RNN model with LSTM layers for stock price trend prediction using Google stock price data. Evaluated performance using RMSE (0.4%). Used dropout regularization to prevent overfitting.

• Text Generation using RNN & LSTM: Created a char-to-integer mapping and input/output patterns from the raw dataset (Alice in Wonderland). Built a stacked LSTM network to model the dataset, with checkpoints to save the best model seen. Used the model to generate text from a given seed.

• Configured an EC2 server instance for faster deep learning on GPU; ran the model on AWS for faster computation.

• Recommender Systems: Implemented a Restricted Boltzmann Machine that predicted binary ratings, “Like” or “Not Like”. Evaluated the model using RMSE (0.42) and Average Distance (0.24).

• Implemented a stacked autoencoder (sparse autoencoder) with PyTorch to predict ratings from 1 to 5 (test loss 0.95).

• Used collaborative filtering to build a recommender system for customers using Spark and PySpark. Evaluated predictions using root mean square error (RMSE = 0.52%).

• Apriori: Used an association rule learning algorithm to find the best combinations of products to place close together in a grocery store so that they are bought together, based on the lift and confidence of each combination (e.g., pasta & escalope: confidence 37%, lift 4.7).

• Clustering (Soccer Scout, Data Incubator semi-finalist): Explored multiple clustering algorithms, including k-means, mini-batch k-means, affinity propagation, mean shift, and hierarchical clustering. Ran PCA to reduce the 33 variables to the 7 components that explained the most variance. Used the Elbow method to optimize the number of clusters.

• Used the Adjusted Rand Index (ARI) to evaluate performance (mini-batch k-means: 0.47, k-means: 0.49).

Technical Skills and Certifications

• National Stock Exchange: Derivatives Analyst Pro, Investment and portfolio management

• Technical Skills: Python (Keras, Theano, TensorFlow, scikit-learn, NLTK, Word2Vec), PyTorch, R, SQL, Tableau, Spark, PySpark, AWS
