Data Engineer

Richardson, TX
January 31, 2019

The University of Texas at Dallas August 2018 - Present

M.S., Information Technology and Management

Dean’s Excellence Scholarship

GITAM University, Hyderabad June 2011 - July 2015

B. Tech., Mechanical Engineering


Analytical Tools: Base SAS, SAS Enterprise Miner, Data Warehousing, Informatica, SSIS

Visualization Tools: Tableau, QlikView, Power BI, Gephi, D3

Languages: SQL, Unix, C, Java, Python (NumPy, Pandas, Scikit-learn, Beautiful Soup)

Frameworks: Hadoop (Hive, Pig), Spark (SparkSQL)


Data Analyst, Infosys, Bangalore, India June 2015 – June 2018

• Performed data warehouse testing using Informatica and Infosys’s Data Warehouse Testing to prevent data anomalies in the production environment, reducing them by 2%.

• Designed an extract-transform-load (ETL) data pipeline using Python for data collection and cleaning and Apache Spark for processing, and built Tableau dashboards to monitor the pipelines, reducing data quality issues by 80%.

• Kept the cluster up and running by maintaining high availability between NameNodes, reducing job failures by 50%.

• Led project management efforts to provide data security, data lineage, and other data governance features using Hadoop technologies.

• Managed SQL database tables and optimized queries using indexes, joins, and subqueries, improving overall product query performance by 30% in the production environment.

• Modeled transaction data into the warehouse by designing SSIS packages, identified up to $1M in profit, visualized trends in key metrics, and recommended KPI improvement strategies for the business problem.

Data Engineer Intern, Bharat Dynamics Limited May 2014 - June 2014

• Predicted changes in the final price of the Outer Gimbal used in the Missile Guidance System, using neural networks and linear regression with independent variables such as metal type and weight ratio to find the lowest target price.

ACADEMIC PROJECTS

Marketing Predictive Analysis in SAS August 2018 - December 2018

• Collaborated in SAS Enterprise Miner to predict optimal housing prices for residential homes in Ames, Iowa. Conducted cluster analysis to segment customers by behavior, and implemented predictive models such as logistic regression, linear regression, and neural networks to identify the impact of demographics on house prices.

• Compared predictive models using the SAS Model Comparison node and Fit Statistics results, determining that linear regression was the better model for price estimation.

Data Modeling using Python and Visualization using Tableau

• Cleansed the Kaggle Titanic dataset, performed data pre-processing and EDA in Python using packages such as seaborn, NumPy, and Pandas, and developed predictive models including decision trees, gradient boosting, Naïve Bayes, and KNN; identified random classifier as a good predictor with a score of 95.8.

• Performed clustering on the dataset using TabPy and visualized the results in Tableau by “Survival by passenger class”, “Age Category”, and “Embarked Point”.

Data Visualization Using D3, Gephi, and QlikView

• Visualized the University’s food sales data in D3 using hierarchical maps: used SVG containers to hold the tree map and linked it via JavaScript, and created transitions for navigating to inner child elements of the hierarchy to identify the highest-selling product for each chain at the University.

• Analyzed large networks using Gephi, focusing on the links between Apple and other websites and within Apple itself, and identified the website with the strongest connection to Apple. Ran both the Fruchterman-Reingold and Yifan Hu layouts to cluster by industry, and clustered by internal or external linkage using the Yifan Hu Proportional algorithm with Average Weighted Degree.

• Built multi-view (trellis) charts in QlikView to track global youth employment, income, and expenditure trends over the years, identifying new markets for business and dips in youth employment and expenditure.

LEADERSHIP & ORGANIZATIONS

Envision, Data Visualization Club – Member Sep 2018 - Present

ADDITIONAL INFORMATION

Languages: English, Hindi, Bengali, Assamese, Telugu, Kannada

Eligibility: Eligible to work in the U.S. for internships and for full-time employment for up to 36 months (STEM) without sponsorship.

Visa expires in July 2023.

