Data Machine

Bradenton, Florida, United States
August 18, 2018

***** **** *****, *********, ******* 508-***-****

PROFESSIONAL SUMMARY

• Data Scientist with 3+ years of professional experience in predictive modeling, data visualization and statistical analysis using R

• Worked on supervised deep learning programs in Python and passionate about learning unsupervised deep learning

• Familiar with the AWS environment (EC2, Redshift, S3, IAM, etc.)

• Diverse technical background spanning many areas of information technology

• Experienced in developing statistical machine learning, forecasting, text analysis and data mining solutions to a range of business problems using Python, Tableau and SQL

• Proficient in Machine Learning and Artificial Intelligence techniques (Linear Regression, Logistic Regression, Multivariate Regression, Random Forest, K-NN, SVM, Natural Language Processing, ANN, CNN, RNN)

• Familiar with the Hadoop ecosystem and Big Data tools such as HDFS, Spark 2.0, HiveQL and Pig

• Expertise in Python programming using packages including NumPy, pandas, scikit-learn, TensorFlow, PyTorch and OpenCV (cv2)

EXPERIENCE

Integra Technologies June 2018 - Present

Data Scientist

• Extensively using open source tools: RStudio (R) and SQL for analysis, and Spyder (Python) for statistical analysis and building machine learning models

• Creating statistical models, both distributed and standalone, to build diagnostic, predictive and prescriptive solutions

Riddhi Siddhi Enterprise December 2016 – December 2017

Data Scientist Intern

• Built classification and regression models (ANN, SVM, Linear Regression) using scikit-learn, Keras, TensorFlow and Matplotlib to predict upcoming sales and rollover customers with 85% accuracy. Used techniques such as estimators and the drop method to improve performance by 2%

• Scraped and cleaned data, then applied NLP techniques using the NLTK framework to understand customer sentiment

• Developed statistical reports using ggplot (R) and Seaborn, Plotly and Matplotlib (Python) based on customer requirements

Plumbob Design April 2015 – August 2016

Data Scientist

• Built a SQL Server database repository and used ETL to manage extraction, transformation and loading of data files into SQL Server. Identified and resolved data issues to ensure optimal performance

• Applied machine learning techniques such as Isolation Forest, Decision Trees and Random Forest to monitor future cost and usage of raw materials. Refined the models by tuning their hyperparameters to achieve 89.3% accuracy

• Used regression analysis with scikit-learn and Matplotlib to predict future clients and profit margins, and improved the model

Webdrills August 2012 – July 2013

Data Scientist Intern

• Designed a neural network using Keras, fitted layers with rectifier and sigmoid activation functions, and compiled the ANN into a classifier to predict customer churn rate

• Trained for 100 epochs, divided into 10 batches, using stochastic gradient descent, reaching 83.6% accuracy on the training set

• Improved and tuned ANN performance by adopting k-fold cross-validation and Keras dropout, resulting in 90% accuracy


Core Competencies: ETL, Predictive Analytics, Data Visualization, Data Modelling, Machine Learning, Forecasting

Languages: Python, R, SQL, SAS

Frameworks: NumPy, pandas, Matplotlib, Plotly, Seaborn, dplyr, ggplot, scikit-learn, NLTK, TensorFlow, Keras, PyTorch, MLlib

Tools: Visual Studio, SSIS, Tableau, Talend Integration, Power BI, Excel, MS Project, Jupyter Notebook, Ambari, QlikView, SSRS, AWS (IAM, S3, EC2), MapReduce

Database: SQL Server, MySQL, PostgreSQL, Oracle 11g, MongoDB, HiveQL, Hadoop, HBase

Machine Learning Skills: SVM, Random Forest, Naïve Bayes, K-NN, Logistic Regression, Linear Regression, Natural Language Processing, ANN, RNN, CNN, Stacked Autoencoders, Restricted Boltzmann Machine, LSTM

ACADEMIC PROJECTS

Classification of Twitter using NLP June 2018 – July 2018

• Created a corpus of 981 tweets using Twitter API wrappers to train a Naïve Bayes model

• Classified sentiment (positive, negative, neutral) toward different wine manufacturers using NLTK libraries and NLP techniques, with accuracy (precision, recall, F1 score) of 93.47% on the validation set

Google Stock Prediction Using RNN March 2018 – July 2018

• Designed high-dimensionality, stacked LSTM layers to build a robust model with 1.2% data loss; avoided overfitting using dropout regularization on 20% of neurons

• Optimized the regressor with the Adam optimizer and fitted the RNN to the training set with 100 epochs and a batch size of 32 to predict continuous output

• Predicted stock prices were 80% similar to actual prices; visualized the predicted curve against the real price curve to compare their behavior

Credit Card Fraud Detection March 2018 – April 2018

• Examined a data set of 280,000 rows of credit card transactions to predict fraudulent transactions

• Imputed missing values and visualized the data using heatmaps and scatter plots in Seaborn and Matplotlib to estimate correlation

• Implemented Isolation Forest and Local Outlier Factor and compared their predictions

• Analyzed predictions on the validation set against the training set using a classification report covering accuracy, precision, recall and F1 score

Happiness Detector (Computer Vision) September 2017 – October 2017

• Implemented OpenCV and smile cascades (XML files with filters added) to detect smiles accurately

• Improved the application's filter performance by increasing the scaling factor to 1.7 and the neighbor count to 22, so that rectangle detectors trigger only when facial movement pertains to a smile

EDUCATION

Northeastern University, Boston, MA M.Sc. Engineering Management May 2018 GPA: 3.4

Related Coursework: Database Management and Design, Data Warehousing & Business Intelligence, Collecting Storing and Retrieving Data (using R), Data Mining, Probability & Statistics

Uttar Pradesh Technical University, India B.Sc. Electronics Engineering May 2015
