Software Engineer Data

Boston, MA
June 27, 2018



** ****** ****** ***#* Boston MA USA 02215 857-***-****

LinkedIn GitHub Website E-mail GitLab


Demonstrated leader with expert communication skills and experience managing a team of 15

• Mark-up Languages/Stylesheets: XSLT, XML, PHP, HTML, CSS

• Programming Languages: Java, SQL, MySQL, Python, Scala, R

• Development Tools: SSAS, SSRS, SPSS, Tableau, Power BI, OpenRefine, NLP, MATLAB, Octave

• Frameworks/IDEs: Eclipse, NetBeans, IntelliJ IDEA, RStudio, Stylus Studio, Jupyter Notebook, Zeppelin, InformaticaDT, Julia

• Algorithms Implemented: Linear Regression, Multiple Linear Regression, K-Means Clustering, Hierarchical Clustering, Logistic Regression, SVM, KNN, Anomaly Detection, Random Forest, Decision Trees, Bayesian Neural Networks, Newton-Raphson, MCMC, Naïve Bayes, MLE

• Libraries Used: Pandas, NumPy, PyMC3, SciPy, Scikit-Learn, Keras, TensorFlow, ggplot2, dplyr, NLTK [corpus], BeautifulSoup, Sampyl, spaCy

• Automation Tools: SeeTest, Selenium, UFT, Mobile Labs, MonkeyTalk, Ranorex, Cucumber, Perfecto Cloud

GRADUATE TEACHING ASSISTANT – Data Science using Python May 2018 - August 2018

• Help students with homework and practical lab sessions

• Grade students' homework, mid-terms and final projects under the professor's guidance

BUSINESS EXPERIENCE:

SymSoft Solutions LLC, Sacramento, CA, US (Role: Intern – Data Analytics and Web Solutions) May 2017 – Jan 2018

• Used classification, clustering, and prediction algorithms to identify customers unable to use health insurance services for a California healthcare client

• Performed data cleansing and data visualization using Mapthat; almost 70% of the effort went into data cleaning and pre-processing to produce appropriate results

• Worked on new web design projects for several clients: DGS, Rabobank, DWR and SMUD

• Learned a web content management system (CMS) and performed testing for CMS and website users

• Performed accessibility and TCP/IP testing for the Rabobank and DWR projects

• Gathered requirements for CMS and website user functionality for the new DGS website design

• Performed content authoring and content migration for the SMUD, Rabobank and DWR projects

• Prepared test strategy and test plans [test cases], created user stories in JIRA, and logged and re-tested defects for web solutions

Appreciation: Received client appreciation from Rabobank Marketing Team – Starbucks gift card and handwritten note

Infosys Ltd, India (Roles: Technical lead, Technology analyst, Software Engineer) July 2008 - June 2016

Selected responsibilities and outcomes:

• Promoted from Software engineer to Technology Lead within 5 years

• Performed data wrangling to find missing tags, special characters and symbols

• Performed data cleaning and pre-processing to handle missing tags, special characters and symbols using Perl pattern matching

• Transformed data into client-specific publication content using XSLT, Java, Perl, UNIX, InformaticaDT, Stylus Studio

• Delivered project ahead of schedule by fostering continuous communication and identifying opportunities for efficiency

• Built a device matrix to define the testing strategy, which reduced testing across multiple mobile devices, tablets and web browsers by 30% and cut manual effort (innovation, presentation skills)

• Developed test strategy, test design, test plan, defect tracker, status reports and automation framework

• Expert in agile and waterfall methodologies; tracked project progress and assignments to meet deadlines


1. Enron Data

• Applied Zipf’s Law on Enron emails using libraries like matplotlib, BeautifulSoup and NLTK [Python]

• Performed analysis to count the number of emails per user and find who emailed whom the most

2. Lending Club Dataset

• Found interest rate trends and relationships between variables

• Performed exploratory analysis, handled missing data, and visualized data in Python as well as Power BI

• Created a pipeline for both the accepted and rejected loan datasets using Luigi and Docker

• Implemented K-Means clustering to find clusters

• Built models to predict interest rate: Linear Regression [Variance score ~0.45], Logistic Regression [Accuracy ~0.92], SVM [Accuracy ~0.89] and RFNN [Accuracy ~0.97] [Python]

3. Amazon Fine Food Reviews

• Performed sentiment classification for Amazon fine food reviews

• Created pipeline: Data extraction and conversion - Cleaning and pre-processing - Exploratory data analysis (using Python and Power BI) - Feature extraction - Sentiment classification and analysis - Luigi pipeline - Dockerization - Creation of REST API using Microsoft Azure ML Studio - UI deployment

• Classified sentiments to find the top best and worst reviews [Python]

4. Freddie Mac Loan Dataset

• Found a problem with the sample data: it had only 22 columns instead of 23

• Communicated this to the Freddie Mac Loan team, who appreciated the report and sent the data after a month

• Performed basic analysis on it [Python]

5. Movie Reviews

• Applied Zipf’s Law on movie review summaries

• Built a linear regression model to examine the growth of reviews by year and identify trends; found that from 2000 onwards the number of reviews increases approximately linearly [80% accuracy]

• Performed chi-square analysis of the distribution of critics’ picks: across years, across months, across critics, across MPAA ratings


6. NYT API from Books API

• Performed analysis to list categories and the bestsellers for each category [Python]

• Performed analysis on reviews written on Google to find out whether it is a financial or technology company

7. Santander Product Recommendation

• Performed exploratory analysis using Zeppelin

• Created a data cleaning pipeline using Apache Spark

• Built a prediction model using decision trees with accuracy ~60% and precision ~45%

• Recommended top 10 products to new customers and top 10 products to existing customers as per consumption [Programming Language Used: Scala]

8. Edgar Dataset

• Performed exploratory analysis using Python and created visualization dashboard with Tableau

PERSONAL PROJECTS:

Refer to my GitHub repository:

CERTIFICATIONS:

• Machine Learning by Stanford University – Coursera – Andrew Ng [MATLAB & Octave] May 2018 – Aug 2018

VOLUNTEERING:

• Assistant Organizer for Machine Learning Society Boston [Currently hosting and participating in a Kaggle competition]

• Write articles and vlog on LinkedIn to share the knowledge I have gained and, in turn, learn more

• Student Ambassador - ACM Sacramento Chapter - Learning new upcoming platforms & encouraging them

EDUCATION:

• Northeastern University, Boston, MA, USA

Master of Science in Computer Systems (Focus work: Data Science/Data Analytics) August 2018

• Government College of Engineering, Karad, MH, India

Bachelor of Mechanical Engineering, First class with distinction June 2008
