Data Sales

Location:

Columbus, OH

Posted:

February 07, 2018

Resume:

YASIR MODAK

Data Scientist/R Developer at Eli Lilly and Company, IN.

Contact: 817-***-**** Email: **********.**@*****.***

SUMMARY

Data Scientist with over 3 years of professional experience across the pharmaceutical, automotive, and e-commerce industries. Experienced in every stage of the product life cycle, with a background spanning Machine Learning (supervised and unsupervised), Data Acquisition, Exploratory Data Analysis, Time Series Analysis, Spatial Data Analysis, Data Mining, Text Analysis, Data Modeling, Predictive Modeling, and Technical Documentation. Strong domain expertise in the healthcare sector and in data management technologies including Hadoop HDFS, Apache Spark, Hive, Oracle, and SQL. Adept in statistical programming languages such as R (caret, dplyr, data.table, sparklyr, ggplot2, MASS, Shiny) and Python (NumPy, pandas, Matplotlib, scikit-learn, Bokeh), with hands-on implementation of machine learning algorithms (k-NN, ensemble methods such as boosting and bagging, Support Vector Machines (SVM), Random Forests) in production environments.

Willing to relocate: Anywhere in the Continental US

Authorized to work in the US

EDUCATION

The University of Texas at Arlington - Master of Science in Mechanical Engineering (GPA-3.3)

(May 2016)

PROFESSIONAL WORK EXPERIENCE

Data Scientist/R Developer – Eli Lilly and Company

Indianapolis, IN – June 2016 to Present

Eli Lilly and Company is among the top 10 pharmaceutical companies in the world and a pioneer in the discovery of new vaccines. I am involved in multiple statistical projects, one of which is predicting the class of a genetic mutation based on clinical evidence (text). I also deployed to production a machine learning model, built on R and Shiny, that forecasts the next 3 years of revenue from anti-diabetic vaccine sales. I made extensive use of cluster computing to handle large-scale data aggregation and computation.

Responsibilities:

• Built data pipelines using MS SQL Server by performing the necessary ETL tasks.

• Performed Exploratory Data Analysis using R and Apache Spark.

• Performed data cleaning, feature scaling, and feature engineering.

• Performed text analysis, including TF-IDF analysis.

• Performed PCA (via SVD) to extract the dominant features and reduce dimensionality, mitigating the curse of dimensionality.

• Built a multi-class classification model using the boosting algorithm XGBoost to predict gene mutation classes.

• Performed spatial data analysis, identifying patterns and locating clusters using spatstat and dbscan.

• Developed a Shiny app to highlight Bayesian analysis and created visualizations with ggplot2.

• Released R packages for internal use via GitHub.

• Built a forecasting model to predict future sales of anti-diabetes vaccines in the global market.

• Built multiple time-series models, including ARIMA, ARIMAX (dynamic regression), TBATS, and ETS.

• Performed unit testing, debugging, and code optimization in an Agile/Scrum environment.

Environment: Excel, Unix, Oracle, Hadoop (HDFS), Apache Spark, RStudio, Python, Java, Tableau, SQL.

Data Scientist Intern – Daimler

Portland, OR – January 2015 to July 2015

Daimler is a leading German multinational automobile manufacturer; I worked at its Portland, OR location. At Daimler, I contributed to multiple data science projects. One project involved building a predictive model that uses text as features to predict repair hours for trucks. We also developed a recommender system that improved the truck configuration process, and built a forecasting model to predict future sales.

Responsibilities:

• Examined the existing MS SQL Server database and collected statistics to learn about user behavior.

• Merged user data from multiple data sources by writing SQL queries.

• Performed Exploratory Data Analysis using Python and HIVE.

• Performed spatial data analysis, identifying patterns and locating clusters using spatstat and dbscan.

• Implemented real-time association rules that use prior probabilities.

• Performed data mining in Python (NLTK package).

• Applied dimensionality reduction via SVD, reducing 1,500 data codes to 24 unique features.

• Developed performance metrics to evaluate algorithm performance.

• Performed time series analysis with stationarity tests and change point analysis (PELT, Binary Segmentation).

• Built a forecasting model to predict future sales in R and Shiny.

• Performed Sentiment Analysis in R to capture insights from social media comments.

• Performed data visualization on the front end using Tableau.

Environment: Excel, Unix, Oracle, Hadoop (HDFS), RStudio, Python, SAP HANA, Java, Hive, Tableau.

Data Scientist – Infinite Solutions

India – June 2013 to July 2014

The client is a leading e-commerce company with millions of customers who shop online across thousands of products. These transactions produce enormous amounts of data, which can provide insights into products in demand, consumer habits, consumer demands, marginal profits from sales, and so on. Our aim was to use this data to develop a recommender engine that learns from past customer transactions and recommends relevant new products to customers. This enhances the customer experience and increases sales. By implementing this project, we achieved a 5% increase in overall sales revenue.

Responsibilities:

• Examined the existing MS SQL Server database and performed data acquisition tasks.

• Merged user data from multiple data sources by writing SQL queries.

• Used collaborative filtering with a latent factor model to build a recommender engine.

• Performed extensive implicit and explicit data collection.

• Performed Exploratory Data Analysis using R and Hadoop (HDFS).

• Prototyped machine learning algorithms for a proof of concept (POC).

• Performed data cleaning, handled missing data and outliers, and carried out feature scaling and feature engineering.

• Developed performance metrics to evaluate algorithm performance.

• Evaluated the recommender's performance using RMSE, F-score, precision, and recall, along with A/B testing.

• Addressed overfitting by adding a regularization term (lasso/ridge) to the algorithm.

• Fine-tuned the bias-variance trade-off.

• Performed data visualization on the front end using R Shiny.

Environment: Excel, Unix, Oracle, Hadoop (HDFS), SQL, RStudio, Python, Mahout, Java, Hive.
