Data Scientist

Location:
Royersford, PA
Posted:
December 08, 2020

Resume:

Surya P. Sunkavalli, PhD

adig58@r.postjobfree.com 801-***-**** www.linkedin.com/in/dr-surya-s-b9aa96156 Spring City, PA, 19475

Data Scientist

Data Science experience with high proficiency in Predictive Modeling, Dashboard reporting, Requirement gathering, Data Warehousing, and ETL Environment.

Worked in various industries, including Financial Services, Retail, Environmental, Healthcare, and Manufacturing.

Knowledge of statistical and machine learning methods (AI/ML), including Survival Analysis, Time Series Analysis, Decision Trees, Empirical Bayes, K-NN, Naive Bayes, XGBoost, SVM, Clustering, Random Forests, Linear Regression, Logistic Regression, NLP, and Neural Networks (RNN, CNN, LSTM, etc.).

Applied machine learning to Customer Churn, Revenue Forecasting, Anti-Money Laundering, Fraud Detection, Survival Analysis, Predictive Maintenance, Anomaly Detection, Image Identification, Image Classification, and Environmental Prediction.

Extensive experience in structured and unstructured data analysis, migration, cleansing, transformation, manipulation, chart creation, visual representation, import, and export using various ETL tools and feature engineering techniques on very large datasets.

Deep knowledge of model validation using Confusion Matrix, F1 Score, AUC-ROC, RMSE, and Cross-Validation.

Experience with Agile software development methods and DevOps environments.

Expert in all Tableau tools including Tableau Desktop, Tableau Server, and Tableau Prep.

Experienced in requirement gathering/analysis, design, development, testing, and production rollout of reporting and analysis projects.

In-depth knowledge of SDLC, Agile methodology, HP ALM, and QC.

Experienced in Unit Testing, Regression Testing, and User Acceptance Testing.

Excellent communication skills along with the ability to tell a story through data.

Education:

Ph.D. in Chemical Engineering, University of Utah, 2011

Master’s in Environmental Engineering, NITK, India, 2006

Bachelor’s in Chemical Engineering, Bangalore University, India, 2001

Technical Skills:

ML Tools: Pandas, NumPy, Scikit-learn, Git, Jupyter, Matplotlib, Seaborn, RStudio, Shiny, OpenCV, TensorFlow, Keras, PySpark, SageMaker

BI Tools: Tableau, Power BI, SSRS

Programming: SQL, NoSQL, VBA, Macros, Power Query, VB, R, Python, Scala, Scripting, MATLAB

ETL Tools: Tableau Prep, SSIS, Informatica

Defect Management Tools: SharePoint, HP ALM, JIRA, QC

Databases: MongoDB, Teradata, Amazon Redshift, Oracle, SQL Server, BigQuery, PostgreSQL

Platforms: Windows, macOS, DOS, UNIX, Linux, Ubuntu, Docker, AWS

ERP/CRM: SAP, Salesforce

Professional Experience:

Sr. Data Scientist Jul 2020 - Present

Kohl's, Spring City, PA

Responsibilities:

Worked on the Demand Forecasting/Optimization framework.

Collected sales, holiday, product, and COVID data and loaded it into a big data platform (GCP).

Preprocessed the collected data: handled missing values and outliers, applied dimensionality reduction, and performed feature engineering.

Conducted exploratory data analysis on sales, holiday, and product data.

Developed time series forecasting models at the Category-Color, SKU, Dept-ZIP3, and Dept-Subclass levels for both e-commerce and stores.

Used ARIMA, Prophet, Random Forest, and LSTM models for time series forecasting.

Validated models using nested cross-validation and performance metrics such as SMAPE.

Tuned hyperparameters using grid search, random search, and Bayesian optimization.

Incorporated the best model into the demand forecasting engine and deployed it to the production system.

Hands-on experience in the Google Cloud Computing environment.

Data Scientist Mar 2018 - Apr 2020

Smiths Detection, Spring City, PA

Responsibilities:

Developed an advanced predictive analytics suite called CORAL, based on Predictive Maintenance (PdM).

Installed various IoT sensors on each scanner, then gathered the sensor data and loaded it into a big data store (MongoDB).

Preprocessed the collected data: handled missing values and outliers, applied dimensionality reduction, and performed feature engineering.

Conducted exploratory data analysis on operational data, equipment data, and service data.

Developed survivability models for the Encoder, X-ray, and Heat Exchanger components of each scanner using diagnostic info, system events, X-ray on-time, bag info, and service data.

Developed a bag utilization model that predicts daily and hourly bag counts for each scanner using time series forecasting.

Developed PdM metrics such as Operational Availability and Mean Time Between Failures (MTBF).

Analyzed scanner images using image classification.

Developed regression models to predict remaining useful life (RUL), i.e., how many days or cycles are left before a component fails.

Developed classification models to predict failure within a given time window, i.e., whether a machine will fail in the next N days or cycles.

Developed a survival analysis model to predict failure probability over time.

Integrated all of these models into a recommendation engine with a ticketing system and deployed it to the production system.

Hands-on experience in the AWS Cloud Computing environment.

Used Tableau to create the CORAL front end, including Heat Maps, Geo Maps, Symbol Maps, Pie Charts, Bar Charts, Line Charts, Area Charts, Scatter Plots, Bullet Graphs, and Histograms.

Data Scientist Apr 2016 - Feb 2018

Discover Financial Services, Riverwoods, IL

Responsibilities:

Worked closely with Business Analysts, ETL developers, and the Project Manager on the project team, and reported status in the daily scrums.

SAM (AML Detection): Flagged AML transactions using various classification and regression techniques. The number of investigations dropped by 20% without reducing the number of cases referred for further scrutiny.

CRS (Fraud Detection): Assessed the likelihood of each transaction being fraudulent by comparing it against account history using machine learning algorithms. Each transaction is given a fraud score, which represents the probability that the transaction is fraudulent. The model is first customized to the client's data and then updated periodically to cover new fraud patterns.

Developed investigation tools (SAM and CRS) for AML and fraud detection using Tableau as the front end.

Developed time series models for revenue forecasting and customer churn models to identify clients at risk of termination.

Environmental Data Scientist Apr 2012 - Mar 2016

Geomega, Boulder, CO

Responsibilities:

Employed a mixture of tools including the full R data science stack and its extensive libraries.

Water sample classification: Performed data cleansing and used unsupervised machine learning algorithms such as K-Means to cluster 800,000 water samples into 60 groups. Analyzed the differences between clusters and the importance of variables.

Mine water quality prediction: Fifteen parameters, such as rainfall, air temperature, depth to the water table, and discharge pH, were used as training inputs, while sulfate was used as the training output. The system was tested using historical data with over 99% training accuracy.

Water body image classification: Applied machine learning classification algorithms to unclassified 16-bit aerial imagery. Overall, the algorithms correctly classified roughly 72-75% of all sampled pixels.

Drill core image classification: Sulfide percentage concentrations classified from images were more accurate than manual estimates of sulfide concentration when compared against XRD analyses. These observations highlight the challenges human operators face in accurately estimating mineral percentages, especially at very low concentrations.

Seepage flow rates: Weather conditions (precipitation and temperature) were used to train the model. To validate this approach, the seepage flow rate predicted by the trained model was compared with 30 years of real site monitoring data, showing high agreement. All full-scale peak flow scenarios in the field were also captured and predicted by the developed model.
