Data Engineer

Round Rock, Texas, United States
August 14, 2018



Mobile Phone: 503-***-**** E-mail:


6+ years’ experience as a System Validation Engineer at Intel Corporation

Programming experience in R and Python (NumPy, pandas, SciPy, scikit-learn), including analysis of large, gigabyte-scale datasets

Data visualization using Python libraries such as pandas and NumPy, as well as Tableau and R

Knowledge of descriptive and inferential statistical concepts such as the binomial distribution, normal distribution, Bayes' rule, sampling distributions, and hypothesis testing (A/B)

Experience in exploratory data analysis using R to explore relationships among one or more variables

Experience in data wrangling: gathering data from multiple sources and assessing its tidiness and quality

Data storytelling using Tableau to visualize datasets and highlight trends and patterns

Knowledge of supervised learning algorithms (decision trees, Naïve Bayes, SVM, ensemble methods such as bagging and boosting) and unsupervised learning algorithms (clustering)

Hard-working, dedicated and responsible team player with strong technical, analytical and debugging skills

Self-motivated, with a time- and target-oriented approach and good interpersonal and communication skills



C++, Java, C#, VBScript, JavaScript, JSON, REST API, Python (pandas, NumPy, Matplotlib, SciPy, scikit-learn, Tweepy), R, MySQL, SQL Server


Anaconda, Jupyter Notebook, Tableau, Android Studio, Eclipse, JSON Validator, Firebase


Master’s in Computer Science, Portland State University

Bachelor of Engineering in Computer Science, Madurai Kamaraj University


Data Analyst / Machine Learning (in progress) Nanodegree from Udacity


Data Visualization of Flight dataset with Tableau [Project Link]

Objective: visualize flight delays, cancellations, and diversions using 1 million records from RITA covering various carriers at airports across the continental United States

Created worksheets, dashboards, and a story to analyze average arrival and departure delays for major airlines and their trends over 12 months, considering delay causes such as weather, late flights, and security and their impact on overall delays

Computed granular delay data for each airport and supported airline, and used maps to visualize the data by airport and state

Obtained average delays per airline and airport, which helps travelers identify airlines to avoid at specific airports

Calculated the highest cancellation and diversion rates for each airline and visualized them as 12-month trends

A/B hypothesis testing to determine whether users are willing to pay for a new e-commerce website [Project Link]

An e-commerce company developed a new web page to try to increase the number of users who "convert," meaning users who decide to pay for the company’s new website

The main goal of this project is to help the company understand whether it should implement the new page or keep the old one

Formulated null and alternative hypotheses and tested them in two ways: confidence intervals (sampling distribution and bootstrapping) and simulation under the null hypothesis (computing a p-value)

The analysis found that users preferred the old page to the new pay page

Confirmed these results using a regression approach

Data analysis of medical records from Kaggle to determine whether patients show up for scheduled appointments [Project Link]

The dataset contains information on 100k medical appointments in Brazil

Analysis is focused on predicting whether patients will show up for their scheduled appointments

The analysis found that patient show-up is correlated with wait time: the longer the wait, the more no-shows

Exploratory Data Analysis with R to determine quality of red wine [Project Link]

Used R to explore more than 10 chemical properties of red wine and their relationship to its quality through univariate, bivariate, and multivariate analysis

In univariate analysis, used histograms to explore individual variables. Most were roughly normally distributed, except three that were long-tailed. Applied a log transformation to make those distributions look more normal and to deal with outliers

In bivariate analysis, used boxplots and frequency polygons to view the relationship between wine quality and its chemical properties. Used the correlation coefficient (r) to measure how variables correlate with one another

In multivariate analysis, explored more than two variables at a time by grouping on the quality variable and relating it to other variables, and analyzed how multiple variables together affect the quality of the wine

Data Wrangling, Analysis and Visualization of Twitter dataset from WeRateDogs [Project Link]

Used a Twitter dataset of more than 5,000 tweets from the user WeRateDogs to wrangle, analyze, and visualize data

Extracted “retweets”, “favorites”, and “retweets count” data by querying the Twitter API by tweet ID, using the Python package Tweepy

Assessed the JSON data from the API and the WeRateDogs Twitter archive visually and programmatically for quality and tidiness

Cleaned the assessed data and stored it in high-quality, tidy pandas DataFrames

Analyzed and visualized the wrangled data obtained from the different sources

Data Exploration of Bicycle Sharing System using Python 3

Explored ~1 GB of bike share data for three major U.S. cities: Chicago, New York City, and Washington. Computed various bike share metrics using NumPy and pandas in Jupyter Notebook

Features explored: popular travel times, popular stations and trips, trip duration, and user information

Wrote the initial implementation in plain Python 3, without packages such as NumPy or pandas

Improved execution time for each metric computation by an order of magnitude using NumPy and pandas


Intel Corporation, Hillsboro, OR 2011-2016

System Validation Engineer

Hands-on experience in post-silicon validation of multiple platforms, including desktops, mobile, and SoCs (System on Chip), for multimedia and communication domains, including audio, graphics, media, camera, content protection, Wi-Fi, BT, NFC, 4G, and WiDi, as well as protected content such as Netflix and Blu-ray

Experience in pre-silicon validation using emulated FPGA systems for multimedia and communication domains, including audio, media, and graphics

Experience using lab tools such as Graphics Performance Analyzers and GPU Caps Viewer (OGL/OCL), and benchmark tools such as Unigine Heaven and Valley, Stone Giant (NVIDIA), PassMark, and Futuremark

ColumbiaSoft Corporation, Portland, OR 2010-2011

QA Developer

Implemented automated tests that simulate a variety of business functions, including standard and non-standard usage scenarios

Performed various levels of testing including acceptance, integration, functional, regression and stress testing

Implemented black-box, white-box, and performance/load testing on the DocumentLocator® application

Performed detailed application testing before each product launch

References: Available upon request

Visa Sponsorship: Not required. Can work for any employer
