Data Software Engineer

Tampa, Florida, United States
January 19, 2018

Seeking a full-time opportunity in Data Science and Predictive Modeling. Experienced in the healthcare domain as a Business Intelligence Intern in visualization, analytics and reporting. 3 years of experience as a Software Engineer in the telecom domain. Expertise in data preprocessing, SQL, Excel, machine learning, statistics, Python, R and Tableau.


HR Analytics - Employee Attrition and Performance (R, Anaconda, RStudio, caret, ggplot2, dplyr, Shiny):

Performed data munging/wrangling using dplyr, explored data distributions using ggplot2 and built a dashboard using Shiny

Used the caret package to apply various predictive models in RStudio and analyze the factors that contribute to employee churn

ShinyR - HR Analytics Dashboard (click on the link to view Dashboard)

Predictive Modeling for Website Classification (Python, Scikit-Learn, Matplotlib, Pandas, Numpy):

Loaded and explored the data, then applied various predictive models to classify websites as legitimate, suspicious or phishy

Performed exploratory data analysis using NumPy and Pandas; used Matplotlib for visualizations in Anaconda (Jupyter, Spyder)

Used Python libraries to build Regression and Classification models and assessed model quality by error and other metrics

Risk Classification for Prudential Life Insurance (Kaggle, RStudio, H2O):

Performed missing-value imputation, outlier handling, variable importance analysis and feature selection from over a hundred attributes. Performed predictive modeling using Naïve Bayes, Recursive Partitioning, Random Forest and GBM

Model comparison suggested that Random Forest classified with better accuracy than the other models


Data Science Intern, TrueMedicines – CA September 2017 to present

Created visualization dashboards using Tableau Public to publish research on the company’s website and for marketing purposes

Built statistical plots and graphs using the Python packages Matplotlib, SciPy and Plotly in Jupyter Notebook (Anaconda)

Performed web scraping using Beautiful Soup and Chrome Web Scraper to collect reviews for statistical analysis and model building

Used Python NLTK for natural language processing of scraped product reviews to research the benefits of plant-based medicines

Performed data wrangling and exploration on clinical data using R packages such as dplyr and tidyr, and ran ANOVA

Used NumPy and Pandas in Python on clinical data to compute distances between strains and identify the most similar strains

Business Intelligence Intern, MDVIP – Boca Raton, FL June 2017 to August 2017

Analyzed BIRST dashboards, visualization charts, KPIs and metrics to identify areas for improvement and to monitor and target outliers

Performed QA of dashboard metrics, validating data accuracy against Salesforce reports and SQL Server records

Analyzed time-series data in Excel using pivot tables and VLOOKUPs; interpreted trends on dashboards and SSRS reports for data analysis

Performed data cleaning and automated a data merge process in R, reducing data processing time by 40%

Clustered patient data by patient demographics using Truven Health Analytics for detailed analysis

Experienced in data modeling with fact and dimension tables, star and snowflake schemas, and data warehousing

Systems Engineer, TCS – Mumbai, India December 2015 to July 2016

Carried out unit testing of development-side Java code by writing JUnit tests in Eclipse

Charged with automating an end-to-end workflow in Python for SON systems, reducing manual effort by 60%

Utilized IBM Cognos Framework to build data models and publish packages. Built dynamic list and drill-down reports and charts using Report Studio and Query Studio, based on business requirements, to address network-related problems and network node usage

Documented workflows and workarounds; mentored lateral joiners and conducted training sessions for them

Software Engineer, Tech Mahindra – Mumbai, India September 2013 to December 2015

Created customer segmentation dashboards in Tableau for analysis of Customer Churn and Product Purchase Patterns

Responsible for execution of parallel run functionality for data migration from AMDOCS Enabler billing system 7.5 to Enabler 9.1

Carried out database validation using MySQL queries and backend validation on the Unix/Linux platform

Worked with structured and unstructured data in various file formats such as XML and JSON

Carried out the end-to-end invoicing flow to validate price plans, bundles and U-Verse product usage for different billing cycles

POC: Predictive Analytics – created a prototype for forecasting the likelihood of customer churn by building a supervised machine learning model (CART) in open-source R on RStudio

Unsupervised Clustering – clustered customer profiles for customer segmentation and to find patterns/trends in the data


Master of Science: Business Analytics and Information Systems, Graduated Dec 2017, CGPA: 3.83/4

University of South Florida – Tampa, Florida

Data Mining, Statistical Data Mining, Big Data, Data Visualization, DBMS, Data Warehousing, R, Python, SAS

Bachelor of Engineering: Electronics, May 2013

University of Mumbai – Mumbai, India


Languages: SQL, R (readxl, ANOVA, caret, dplyr), Python (NumPy, Pandas, Scikit-learn, Matplotlib), SAS, Java, Unix

Visualization/Reporting: Tableau, BIRST, Cognos, Shiny, Power BI, Salesforce, ggplot2, Matplotlib, Wordcloud

Tools: SAS E-Miner, Weka, RapidMiner, RStudio, Dataiku, QlikView, SAS Studio, Spyder, Excel, Jupyter, Anaconda, SQL dev

Data Analytics: Data Mining, Preprocessing, Supervised & Unsupervised Learning, A/B testing, PCA, NLP, Text Mining, NLTK

Database: SQL, MySQL, Oracle, DB2, MS SQL Server, MS Access

Certifications: R Programming, Inferential and Descriptive Statistics, Data Scientist’s Toolbox (Coursera), Tableau 9 (Udemy), Machine Learning, Python for Data Science (DataCamp), SAS Base Programming 1 Essentials

Methodologies: Scrum, Agile, Waterfall, STLC, SDLC

