SHRUTI SALIAN
***** ********* *****, *****, ** **613 : 813-***-**** : ac33xy@r.postjobfree.com LinkedIn
PROFESSIONAL SUMMARY
Seeking a full-time opportunity in Data Science and Predictive Modeling. Experienced in the healthcare domain as a Business Intelligence Intern working on visualization, analytics and reporting. 3 years of experience as a Software Engineer in the telecom domain. Expertise in data preprocessing, SQL, Excel, machine learning, statistics, Python, R and Tableau.
RELEVANT PROJECTS
HR Analytics - Employee Attrition and Performance (R, Anaconda, R Studio, Caret, ggplot2, dplyr, Shiny):
Performed data wrangling using dplyr, explored data distributions using ggplot2 and built a dashboard using Shiny
Applied various predictive models from the Caret package in RStudio to analyze factors that contribute to employee churn
Shiny - HR Analytics Dashboard
Predictive Modeling for Website Classification (Python, Scikit-Learn, Matplotlib, Pandas, Numpy):
Loaded and explored the data, then applied various predictive models to classify websites as legitimate, suspicious or phishing
Performed exploratory data analysis using Numpy and Pandas, used Matplotlib for visualizations in Anaconda - Jupyter, Spyder
Used Python libraries to build Regression and Classification models and assessed model quality by error and other metrics
Risk Classification for Prudential Life Insurance (Kaggle, R Studio, H2o)
Performed missing value imputation and handled outliers, variable importance analysis, feature selection from over a hundred attributes. Performed predictive modeling using Naïve Bayes, Recursive Partitioning, Random Forest and GBM
Model comparison showed that Random Forest classified with higher accuracy than the other models
WORK EXPERIENCE
Data Science Intern, TrueMedicines – CA September 2017 to present
Created visualization dashboards using Tableau Public for publishing research on the company’s website and for marketing purposes
Built statistical plots and graphs using Python packages – Matplotlib, Scipy and Plotly in Anaconda - Jupyter Notebook
Performed web scraping using Beautiful Soup and Chrome Web Scraper to collect reviews for statistical analysis and model building
Used Python NLTK for Natural Language Processing of scraped product reviews to research the benefits of plant-based medicines
Performed data wrangling and exploration on clinical data using R packages such as dplyr and tidyr, and ran ANOVA
Used NumPy and Pandas in Python to compute distances between strains in clinical data and identify the most similar strains
Business Intelligence Intern, MDVIP – Boca Raton, FL June 2017 to August 2017
Analyzed BIRST dashboards, visualization charts, KPIs and metrics to identify areas for improvement and to monitor and target outliers
Performed QA of dashboard metrics to validate data accuracy against Salesforce reports and SQL Server records
Analyzed time-series data in Excel using PivotTables and VLOOKUPs, and interpreted trends on dashboards and SSRS reports
Performed data cleaning and automated a data merge process in R, reducing data processing time by 40%
Clustered patient data by demographics using Truven Health Analytics for detailed analysis
Experienced in data modeling with fact and dimension tables, star and snowflake schemas, and data warehousing
Systems Engineer, TCS – Mumbai, India December 2015 to July 2016
Wrote JUnit unit tests for development-side Java code in Eclipse
Automated an end-to-end workflow in Python for SON systems, reducing manual effort by 60%
Used IBM Cognos Framework to build data models and publish packages. Built dynamic list reports, drill-down reports and charts in Report Studio and Query Studio, based on business requirements, to address network-related problems and track network node usage
Documented workflows and workarounds; mentored lateral joiners and conducted training sessions for them
Software Engineer, Tech Mahindra – Mumbai, India September 2013 to December 2015
Created customer segmentation dashboards in Tableau for analysis of Customer Churn and Product Purchase Patterns
Executed the parallel-run functionality for data migration from AMDOCS Enabler billing system 7.5 to Enabler 9.1
Carried out database validation using MySQL queries and back-end validation on Unix/Linux.
Worked with structured and unstructured data in various file formats such as XML and JSON
Carried out the end-to-end invoicing flow to validate price plans, bundles and U-Verse product usage across different billing cycles
POC: Predictive Analytics – created a prototype for forecasting the likelihood of customer churn by building a supervised machine learning model (CART) in open-source R using RStudio.
Unsupervised Clustering – clustered customer profiles for customer segmentation and to find patterns and trends in the data.
EDUCATION
Master of Science: Business Analytics and Information Systems, Graduated - Dec 2017 CGPA: 3.83/4
University of South Florida – Tampa, Florida
Data Mining, Statistical Data Mining, Big Data, Data Visualization, DBMS, Data Warehousing, R, Python, SAS
Bachelor of Engineering: Electronics, May 2013
University of Mumbai – Mumbai, India
SKILLS
Languages: SQL, R (readxl, Caret, dplyr, ANOVA), Python (NumPy, Pandas, Scikit-learn, Matplotlib), SAS, Java, Unix shell
Visualization/Reporting: Tableau, BIRST, Cognos, Shiny, Power BI, Salesforce, ggplot2, Matplotlib, Wordcloud
Tools: SAS E-Miner, Weka, RapidMiner, RStudio, Dataiku, QlikView, SAS Studio, Spyder, Excel, Jupyter, Anaconda, SQL dev
Data Analytics: Data Mining, Preprocessing, Supervised & Unsupervised Learning, A/B testing, PCA, NLP, Text Mining, NLTK
Database: SQL, MySQL, Oracle, DB2, MS SQL Server, MS Access
Certifications: R Programming, Inferential and Descriptive Statistics, Data Scientist’s Toolbox (Coursera), Tableau 9 (Udemy), Machine Learning, Python for Data Science (DataCamp), SAS Base Programming 1 Essentials
Methodologies: Scrum, Agile, Waterfall, STLC, SDLC