Data Engineer

Location:
Mountain View, CA
Posted:
October 15, 2017

Resume:

HARSHA RACHUMALLU

******.*.*.*@*****.*** +1-419-***-**** https://www.linkedin.com/in/sriharsharachumallu/

PROFESSIONAL SUMMARY

• 5+ years of experience in data science and application development, addressing business problems using machine learning and web applications.

• Extensive exposure to the CRISP-DM (Cross-Industry Standard Process for Data Mining) analytics project life cycle and to web application development using Scrum.

• Worked across the CRISP-DM phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

• Experienced in understanding the data points, systems, and their relationships involved in addressing a business problem.

• Involved in Data Collection, Data Engineering, Data Cleansing and Solution Design stages of multiple projects.

• Knowledge of developing end-to-end web applications and incorporating visual analytics dashboards.

• Experience supporting the project team with budget planning in adherence to the SOW and with project planning using MPP.

EDUCATIONAL QUALIFICATION

• Master of Science in Analytics; Bowling Green State University; Ohio; Jun 2016.

• Bachelor of Technology in Information Technology; Bapatla Engineering College; India; May 2011.

TECHNICAL SKILLS

• Machine Learning Algorithms: Logistic Regression, Linear Regression, Decision Tree, Random Forest, Gradient Boosting, SMOTE, TOMEK, SMOTE ENN, Lasso and Ridge Regression, Nearest Neighbor Classifier, Weight of Evidence & Information Value (WOE & IV), K-means clustering, RFM Analysis, DBSCAN, Affinity Propagation, Principal Component Analysis, Support Vector Machines, Naïve Bayes, Auto Regression & Moving Averages.

• Programming Skills: Python, R, Spark, Pig, Hive, Java, C#, jQuery, SQL, SAS.

• Frameworks: Hadoop, JSF, Spring, SOAP/REST Web Services, Hibernate, LINQ, MVC.

• Database: Oracle 9i, Oracle 10g, PostgreSQL, SQL Server.

• Visualization Technologies: Power BI, Tableau, Chart.js, D3.js, Matplotlib, Seaborn, ggplot2.

• Methodologies: Agile, Scrum, Kanban, Waterfall.

• Others: Maven, Ant, Jenkins, Nexus, Pentaho data modeler, JIRA, SVN, TFS, SSIS, Alteryx, Azure Cloud Services.

WORK EXPERIENCE

Employer: Ernst & Young LLP, USA; September 2016 – Present

Data Scientist

Email Marketing Modelling

• Designed a model to predict whether a customer will respond to a marketing campaign, based on customer information.

• Handled class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE) and Tomek links. Handled missing data using KNN imputation.

• Developed random forest and logistic regression models for this classification task. Tuned the models to prioritize recall over accuracy, balancing the trade-off between false positives and false negatives.

• Evaluated models using recall, F1 score, KS statistic, cross-validation, and ROC.

Digital Grid Systems - Smart Meters as a Service

• Developed a ridge regression model to predict customers' energy consumption. Evaluated the model using MAPE.

• Developed a model to estimate the hourly consumption of various meter types in a substation. Designed visual analytics for the same.

• Performed streaming analytics using Microsoft PaaS components on 15-minute interval data collected by Azure IoT Hub.

• Engineered visual analytics for Meter Deployment, Meter Operations, Meter Security, and Customer use cases.

Advisory Americas Quality

• Developed a multiple linear regression model to identify the impact of various quality parameters on Service Quality across EY’s engagements.

• Designed a classification model to identify risk type from client survey metrics for all of EY’s engagements.

• Re-engineered user-role-based and time-based visualizations for all EY Americas engagements using embedded Power BI and Chart.js.

Employer: Tata Consultancy Services, India; November 2011 – July 2015

Data Scientist

Global Reporting Integrated Portal (GRIP)

• Developed a K-means clustering model to identify different customer classes in OTC FX market trading.

• Involved in developing visual analytics on intraday trades reported every 15 minutes.

• Worked with different data science teams and provided the data they required on an ad hoc basis.

• Assisted both application engineering and data science teams with mutual agreements on data provision, deployment of production models, etc.

Systems Engineer

SIVEX Report Distribution Engine

• Involved in analysis, requirements gathering, solution design, and development to replace the derivatives market report distribution engine for ABN AMRO.

• Devised Single Sign-On (SSO) using the firm’s Active Directory (AD), adhering to the organization’s information security standards.

• Collaborated with stakeholders to address critical issues and implemented process improvements.

• Worked with different teams to maintain application environments and deployments.

• Presented project progress reports and issues to senior management.

Global Reporting Integrated Portal (GRIP)

• Involved in developing and enhancing the global reporting and intraday reporting engines for ABN AMRO.

• Developed and enhanced features according to the client’s requirements.

• Assisted the client with multi-application deployment on the respective servers/systems.

Assistant Systems Engineer

EMIR Implementation

• Involved in implementing European Market Infrastructure Regulation (EMIR) regulatory reporting for ABN AMRO to the Regis-TR trade repository.

• Involved in enabling Holland Clearing House to report OTC transactions to Regis-TR through the ABN AMRO Generic Report Archival System.

• Contributed to implementing a generic archive system for all reports of the ABN AMRO brokerage department.

• Contributed to developing distribution and redistribution of archived reports to registered email/SFTP destinations using registered encryption algorithms.

PROJECTS

Retail Analytics: Rossmann Store Sales Prediction

• Analyzed sales data from Rossmann stores in Germany to design a functional prototype that supports improved decision making. Used XGBoost, ggplot2, and RMSPE.

Big Data: Analysis of Climatic and Temperature data from NCDC

• Performed analysis on 40 years of climate data from https://www1.ncdc.noaa.gov/pub/data/noaa/

• Hadoop was used for data storage. Power BI, Spark, Hive, and Pig were used to analyze this data.

Regression on identifying accident leading factors

• Applied data science to ten years of crash data from The Transportation Safety Board of Nebraska (TSBN) to determine factors leading to accidents, using SAS Enterprise Miner.

CERTIFICATIONS

• IBM Hadoop Foundations

• IBM Big Data Foundations

• MTA: Database Fundamentals

• Oracle Certified Java Programming


