Ashok Matta
SUMMARY
Big Data Scientist and Engineer with 10 years of experience in building decision support systems based on multi-disciplinary hybrid modeling.
Extensive experience in Data Collection, Data Processing, Data Engineering, Data Analysis, Data Modeling and Data Visualization
Experience in building descriptive, predictive and prescriptive models containing Structured, Unstructured, Relational, Temporal and Geospatial Data silos
Experience in creating data pipelines, data wrangling, transformation and lambda architecture, and in using model evaluation metrics to select the best model for deployment
Experience in coding ETL scripts on multiple platforms such as Apache Hadoop, Teradata Aster, Hortonworks and Trifacta Wrangler
Experience in extracting relevant features from a large dataset that may contain bad records, partial records, errors, or other forms of “noise” in a wide range of possible formats, including JSON, XML, raw text logs, industry-specific encodings, and graph link data
Advanced statistics modeler with experience in Machine Learning, Bayesian Belief Networks, Support Vector Machines and several regression methods in MATLAB, Python, R, Amazon ML, Teradata Aster Analytics engine, BayesiaLab and Spark MLlib.
Developed Artificial Neural Network models in MATLAB, Neurosolutions™ and Neural Designer, as well as in the automated modeling software Eureqa™, and deployed them to field environments
Conducted experiments and built models with the AWS Machine Learning service
Experience in Machine Learning algorithms for Regression (Elastic Net, Ridge, Lasso, Linear, Logistic, Ordinal, etc.), Two-Class Classification, Multi-Class Classification, Clustering and Anomaly Detection tasks
Experienced in design and deployment of rich graphic visualizations in Tableau, Bokeh and Orange
Experienced in Cloudera HUE Environment (HUE UI, Server and DB), Talend, Hortonworks, Teradata
Trained in Apache Spark, Apache Hadoop, Mahout, Pig, Hive, HBase, Tableau, Flume, Oozie
In-depth knowledge of Tableau Desktop and Tableau Server for Data Visualization
Creative problem solver with knowledge in Artificial Intelligence, Big Data Ecosystem, Internet of Things platforms
10 years of experience in Geodatabase management, working with GIS vector, raster and grid formats and using GIS tools optimized for water management modeling needs, including area-weighting, buffering, clipping, Thiessen-polygon generation, auto-connectivity and spatial auditing.
EMPLOYMENT HISTORY
Big Data Scientist and Engineer, VIRPIE Tech May 2016 – Present
Project: Analysis of Medicare Plans
Coordinated with Subject Matter Experts to create a data engineering and analysis pipeline that evaluates Medicare plans in the United States, in order to design a recommendation engine which will eventually be offered to customers as an app
Used Apache Sqoop to transfer existing databases into Apache Hadoop HDFS from Oracle and Teradata database formats.
Used Pig Latin scripts to convert JSON format data into CSV format and perform other essential operations like changing column data types, filtering based on various criteria, splitting columns on a delimiter, joining and aggregating multiple data sources, and reordering columns.
Used Trifacta Data Wrangler to explore, describe and audit the quality of the data fused from multiple sources
Created a data lake that ingests files from different sources and in different formats, and implemented scripts to extract, load and transform all data related to Medicare plans for analytics.
Created Hive external tables and used HiveQL queries to partition and analyze data for the business case
Analyzed Medicare plans, comparing plan offerings by various criteria to select a suitable plan for members
Analyzed Medicare plans, comparing plan offerings to design a suitable benefit plan for different regions.
Project: Analysis of 100-year Rainfall Data for several rain gauges and Water Quality Analysis
Conducted Rainfall Data Analysis of several rain gauges over a 100-year period using Apache Spark on top of HDFS
Real time data usage feasibility study (Quality Check in Real Time Streaming, fault tolerant sensor data management strategies etc.)
Rainfall Data ETL in Spark and data analysis using SparkR
Developed Tableau workbooks from multiple data sources like water quality data, water level data and rainfall data in different formats using Data Blending. Cleaned and blended multiple data sources to allow for different views on application data in a single dashboard.
Developed Tableau visualizations and dashboards to track pollutants using filters, parameters and calculated sets. Built and published customized dashboards and reports of water quality parameters to Tableau Server.
Water Resources Data Engineer, Wade Trim Inc. August, 2009 - May, 2016
Locations: Pittsburgh, PA; Cleveland, OH
Clients: ALCOSAN, NEORSD, West Westmoreland Municipal Authority, Oakland County, MI
Spatial data analysis using Average Nearest Neighbor, High/Low Clustering, Incremental Spatial Autocorrelation, Spatial Autocorrelation and Multi-Distance Spatial Cluster Analysis (Ripley's K-function) methods
Devised the procedure for Automated watershed delineation
Used high-resolution DEMs to quickly discretize and parameterize models
Created detailed catchment networks with hydraulic routing incorporated.
Generated risk maps, iso-lines, velocity vectors, and plan and profile animations; recorded videos and visualized results in 3D with Google Earth integration.
Statistical Analysis of precipitation data to identify rainfall events which are of interest for the design of hydraulic structures
Prepared reports in the form of graphs, maps and 3D formats using data visualization software
Coordinated characterization of streams using extensive research involving data collection, spatial data analysis and coordination with stakeholders in the region including non-profit organizations
Real Time streaming, analysis and modeling of rainfall data based on NEXRAD Level-II and Level-III Data
Predictive and prescriptive modeling of time series based data for forecasting and producing actionable measures to avoid water pollution
Centralized database collection of the client’s public infrastructure assets and developed a geospatial asset management platform for client decision support, built on an RDBMS (such as SQL Server, Oracle, or PostgreSQL) and supporting all types of GIS data.
Prepared a Web-based spatial decision support system representing spatial precipitation estimates, flood inundation and flood-vulnerable assets, so that clients can plan for floods and other water resources emergencies
Time series analysis of precipitation data for 50-100 years using advanced statistical concepts such as entropy and support vector machines
Prepared Water Quality Management Plans to guide the monitoring efforts in a cost-effective way
Conducted Green Infrastructure Feasibility Studies for several municipal authorities
Performed Green Infrastructure Opportunity Identification, Technical Evaluation and Costing using EPA SUSTAIN, ArcGIS Watershed Delineation tools
Designed grit chambers as CSO Reduction Solution for two sites: conceptual design, technical analysis, development and costing of alternatives, site analysis and final alternative selection design
Water Resources Data Engineer II, CDM Smith February 2007 - August 2009
Locations: Indianapolis, IN; Columbus, OH
Clients: City of Indianapolis, IN; City of Columbus, OH; City of Louisville, KY
Strategic planning for Indianapolis water capital improvement projects using a prioritization matrix
Conducted drinking water utility study for the City of Louisville, KY
Contributed to multiple regression analysis module using SSOAP for the EPA's CRADA Project
Supported modeling of Columbus, OH sewer network using PCSWMM, RTK Analysis and rainfall analysis
Instrumental in preparing the Discharge Monitoring Report (DMR) for the City of Indianapolis
Coordinated with EPA Director of Asset Management to set up an asset management workshop in Indiana
Field survey assignment preparation, flow monitoring location and identification for Pleasant Run
Research Assistant, University of Cincinnati September 2004 - December 2006
Dynamic optimization modeling of processes
In-depth teaching of EPANET and HEC-RAS software with modeling examples
Guided 50 undergraduate engineering students in the Fluid Mechanics and Hydraulics Laboratory
Designed, demonstrated and graded experiments for 50 students each quarter
Conducted experiments involving dynamic optimization of electrochemical reactors
Taught MATLAB to undergrad students as a part of the Computing Methods in Civil Engineering course
TECHNICAL SKILLS
Programming Languages
Scala, SQL, Java, C++, MATLAB, Mathematica, Maple, R, Python
Data Modeling
Excel, MATLAB, Python, Scala, R, Orange, LibSVM, Theano, Neurosolutions™
Data Analysis
Excel, MATLAB™-based Toolboxes, Mathematica, Maple, R, Python, Amazon AWS ML, Amazon Aurora, Neurosolutions™, Neural Designer and Eureqa™
Database Management
ArcGIS, MS SQL Server 2005/2008, MySQL, NoSQL
Data Visualization
Tableau, ArcGIS, MATLAB, R, Python
Optimization Tools
Excel, GPOPS, DYNOPT, DOTCVP, MATLAB toolboxes, R, Python
Data Wrangling
Trifacta Wrangler, Pig Latin, Hive, HBase
EDUCATION
IBM Big Data and Hadoop Certification
UID: IBM_BDH_5/16_15108
November 2016
Apache Spark Certification
License Number: 15108
August 2016
University of Cincinnati
Environmental engineering program (course completion 3.4/4.0 GPA)
2004 - 2006
Indian Institute of Technology, Madras, India
Bachelor of Science, Civil Engineering
2004