Data Engineer

Location:

Evans City, PA, 16033

Posted:

February 13, 2017

Resume:

Ashok Matta

SUMMARY

Big Data Scientist and Engineer with 10 years of experience in building decision support systems based on multi-disciplinary hybrid modeling.

Extensive experience in Data Collection, Data Processing, Data Engineering, Data Analysis, Data Modeling and Data Visualization

Experience in building descriptive, predictive and prescriptive models containing Structured, Unstructured, Relational, Temporal and Geospatial Data silos

Experience in creating data pipelines, data wrangling, transformation, lambda architecture, and in using model evaluation metrics to select the best model for deployment

Experience in coding ETL scripts on multiple platforms such as Apache Hadoop, Teradata Aster, Hortonworks and Trifacta Wrangler

Experience in extracting relevant features from a large dataset that may contain bad records, partial records, errors, or other forms of “noise” in a wide range of possible formats, including JSON, XML, raw text logs, industry-specific encodings, and graph link data

Advanced statistics modeler with experience in Machine Learning, Bayesian Belief Networks, Support Vector Machines and several regression methods in MATLAB, Python, R, Amazon ML, Teradata Aster Analytics engine, BayesiaLab and Spark MLlib.

Developed Artificial Neural Network models in MATLAB, Neurosolutions™ and Neural Designer, as well as in the automated modeling software Eureqa™, and deployed them to field environments

Conducted experiments and built models with the AWS Machine Learning service

Experience in Machine Learning algorithms for Regression (Elastic Net, Ridge, Lasso, Linear, Logistic, Ordinal, etc.), Two-Class Classification, Multi-Class Classification, Clustering and Anomaly Detection tasks

Experienced in design and deployment of rich graphic visualizations in Tableau, Bokeh and Orange

Experienced in Cloudera HUE Environment (HUE UI, Server and DB), Talend, Hortonworks, Teradata

Trained in Apache Spark, Apache Hadoop, Mahout, Pig, Hive, HBase, Tableau, Flume, Oozie

In-depth knowledge of Tableau Desktop and Tableau Server for Data Visualization

Creative problem solver with knowledge in Artificial Intelligence, Big Data Ecosystem, Internet of Things platforms

10 years of experience in geodatabase management, working with GIS vector, raster and grid formats and using GIS tools optimized for water management modeling needs, including area-weighting, buffering, clipping, Thiessen-polygon generation, auto-connectivity, spatial auditing and many more.

EMPLOYMENT HISTORY

Big Data Scientist and Engineer, VIRPIE Tech May 2016 – Present

Project: Analysis of Medicare Plans

Coordinated with Subject Matter Experts to create a data engineering and analysis pipeline to evaluate the Medicare plans in the United States, in order to design a recommendation engine which will eventually be offered as an app to customers

Used Apache Sqoop to transfer existing databases into Apache Hadoop HDFS from Oracle and Teradata database formats.

Used Pig Latin scripts to convert JSON format data into CSV format and perform other essential operations like changing column data types, filtering based on various criteria, splitting columns on a delimiter, joining and aggregating multiple data sources, and reordering columns.

Used Trifacta Data Wrangler to explore, describe and audit the quality of the data fused from multiple sources

Created a data lake that ingests files from different sources and in different formats, with scripts to extract, load and transform all data related to Medicare plans for analytics.

Created Hive external tables and used HiveQL queries to partition and analyze data for the business case

Analyzed Medicare plans to compare plan offerings by various criteria and select a suitable plan for members

Analyzed Medicare plans to compare plan offerings and design suitable benefit plans for different regions.
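
The JSON-to-CSV conversion steps listed for this project (type changes, filtering, splitting a column on a delimiter, reordering columns) can be illustrated with a minimal Python sketch. This is not the Pig Latin code used on the project; the field names (plan_id, premium, region), the "|" delimiter and the min_premium filter are hypothetical and chosen only for illustration.

```python
import csv
import io
import json


def json_records_to_csv(json_lines, min_premium=0.0):
    """Convert JSON-lines records to CSV text (hypothetical field names)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row in the desired (reordered) column order.
    writer.writerow(["state", "county", "plan_id", "premium"])
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
            plan_id = rec["plan_id"]
            premium = float(rec["premium"])              # change column data type
            state, county = rec["region"].split("|", 1)  # split on a delimiter
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # skip bad or partial records
        if premium < min_premium:                        # filter on a criterion
            continue
        writer.writerow([state, county, plan_id, premium])
    return out.getvalue()
```

Malformed lines are dropped rather than raising, which matches the "bad records, partial records" handling mentioned in the summary; a production pipeline would more likely log or quarantine them.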

Project: Analysis of 100-year Rainfall Data for several rain gauges and Water Quality Analysis

Conducted rainfall data analysis for several rain gauges over a 100-year period using Apache Spark on top of HDFS

Real time data usage feasibility study (Quality Check in Real Time Streaming, fault tolerant sensor data management strategies etc.)

Rainfall data ETL in Spark and data analysis using SparkR

Developed Tableau workbooks from multiple data sources like water quality data, water level data and rainfall data in different formats using Data Blending. Cleaned and blended multiple data sources to allow for different views on application data in a single dashboard.

Developed Tableau visualizations and dashboards to track pollutants using filters, parameters and calculated sets. Involved in building and publishing customized dashboards of water quality parameters and reports to Tableau Server.

Water Resources Data Engineer, Wade Trim Inc. August 2009 – May 2016

Locations: Pittsburgh, PA; Cleveland, OH

Clients: ALCOSAN, NEORSD, West Westmoreland Municipal Authority, Oakland County, MI

Spatial data analysis using Average Nearest Neighbor, High/Low Clustering, Incremental Spatial Autocorrelation, Spatial Autocorrelation and Multi-Distance Spatial Cluster Analysis (Ripley's k-function) methods

Devised the procedure for automated watershed delineation

Used high-resolution DEMs to quickly discretize and parameterize models

Created detailed catchment networks with hydraulic routing incorporated.

Generated risk maps, iso-lines, velocity vectors, and plan and profile animations, recorded videos, and visualized results in 3D with Google Earth integration.

Statistical analysis of precipitation data to identify rainfall events which are of interest for the design of hydraulic structures

Prepared reports in the form of graphs, maps and 3D views using data visualization software

Coordinated characterization of streams using extensive research involving data collection, spatial data analysis and coordination with stakeholders in the region including non-profit organizations

Real Time streaming, analysis and modeling of rainfall data based on NEXRAD Level-II and Level-III Data

Predictive and prescriptive modeling of time series based data for forecasting and producing actionable measures to avoid water pollution

Centralized the collection of the client’s public infrastructure assets in a database and developed a geospatial asset management platform for decision support, built on an RDBMS (such as SQL Server, Oracle, or PostgreSQL) and supporting all types of GIS data.

Prepared Web-based spatial decision support system to represent spatial precipitation estimates, flood inundation, flood vulnerable assets for the clients to plan for emergency situations like flood and other water resources based emergencies

Time series analysis of precipitation data for 50-100 years using advanced statistical concepts such as entropy and support vector machines

Prepared Water Quality Management Plans to guide the monitoring efforts in a cost-effective way

Conducted Green Infrastructure Feasibility Studies for several municipal authorities

Performed Green Infrastructure Opportunity Identification, Technical Evaluation and Costing using EPA SUSTAIN, ArcGIS Watershed Delineation tools

Designed grit chambers as CSO Reduction Solution for two sites: conceptual design, technical analysis, development and costing of alternatives, site analysis and final alternative selection design
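
The identification of rainfall events from precipitation records mentioned above can be sketched as follows. The resume does not state which method was used; this example assumes a common minimum inter-event dry period criterion, and the function name, the hourly time step and the default thresholds are all hypothetical.

```python
def split_rain_events(hourly_depths, min_dry_hours=6, min_event_depth=0.1):
    """Group hourly rainfall depths into events.

    A new event starts after at least `min_dry_hours` consecutive dry hours.
    Events with a total depth below `min_event_depth` are discarded.
    Returns a list of (start_index, end_index, total_depth) tuples.
    """
    events = []
    start = None      # index of the first wet hour of the current event
    last_wet = None   # index of the most recent wet hour
    total = 0.0       # accumulated depth of the current event
    for i, depth in enumerate(hourly_depths):
        if depth > 0:
            if start is None:
                start, total = i, 0.0
            total += depth
            last_wet = i
        elif start is not None and i - last_wet >= min_dry_hours:
            # Dry spell is long enough to close the current event.
            if total >= min_event_depth:
                events.append((start, last_wet, total))
            start = None
    # Close an event that is still open at the end of the record.
    if start is not None and total >= min_event_depth:
        events.append((start, last_wet, total))
    return events
```

The resulting event totals and durations could then be ranked or fitted to a frequency distribution to select design events; that step is outside this sketch.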

Water Resources Data Engineer II, CDM Smith February 2007 – August 2009

Locations: Indianapolis, IN; Columbus, OH

Clients: City of Indianapolis, IN; City of Columbus, OH; City of Louisville, KY

Strategic planning for Indianapolis water capital improvement projects using prioritization matrix

Conducted a drinking water utility study for the City of Louisville, KY

Contributed to the multiple regression analysis module using SSOAP for the EPA's CRADA Project

Supported modeling of Columbus, OH sewer network using PCSWMM, RTK Analysis and rainfall analysis

Instrumental in preparing Discharge Monitoring Report (DMR) for the City of Indianapolis

Coordinated with EPA Director of Asset Management to set up an asset management workshop in Indiana

Field survey assignment preparation, flow monitoring location and identification for Pleasant Run

Research Assistant, University of Cincinnati September 2004 – December 2006

Dynamic optimization modeling of processes

In-depth teaching of EPANET and HEC-RAS software with modeling examples

Guided 50 undergraduate engineering students in the Fluid Mechanics and Hydraulics Laboratory

Designed, demonstrated and graded experiments for 50 students each quarter

Conducted experiments involving dynamic optimization of electrochemical reactors

Taught MATLAB to undergrad students as a part of the Computing Methods in Civil Engineering course

TECHNICAL SKILLS

Programming Languages

Scala, SQL, Java, C++, MATLAB, Mathematica, Maple, R, Python

Data Modeling

Excel, MATLAB, Python, Scala, R, Orange, LibSVM, Theano, Neurosolutions™

Data Analysis

Excel, MATLAB™ based Toolboxes, Mathematica, Maple, R, Python, Amazon AWS ML, Amazon Aurora, Neurosolutions™, Neural Designer and Eureqa™

Database Management

ArcGIS, MS SQL Server 2005/2008, MySQL, NoSQL

Data Visualization

Tableau, ArcGIS, MATLAB, R, Python

Optimization Tools

Excel, GPOPS, DYNOPT, DOTCVP, MATLAB toolboxes, R, Python

Data Wrangling

Trifacta Wrangler, Pig Latin, Hive, HBase

EDUCATION

IBM Big Data and Hadoop Certification

UID: IBM_BDH_5/16_15108

November 2016

Apache Spark Certification

License Number: 15108

August 2016

University of Cincinnati

Environmental engineering program (course completion 3.4/4.0 GPA)

2004 - 2006

Indian Institute of Technology; Madras, India

Bachelor of Science, Civil Engineering

2004
