I have been working with Business Intelligence systems and DWH environments since 2005. I am presently working as an ETL Architect on a Hadoop/Pentaho/Oracle platform at NetApp, Bangalore.
Objective
I am looking for a challenging role where I can apply my creative and technical expertise to the management and hands-on development of top-of-the-line applications and reporting systems in support of senior business goals.
Summary
More than nine years of experience in ETL and Business Intelligence using Oracle, PL/SQL, OBIEE and Pentaho.
Excellent experience building metadata-driven ETL frameworks using PL/SQL and shell scripting.
Hands-on experience in performance tuning, advanced SQL and query optimization.
Experienced in data modeling (dimensional modeling) and fact table design.
Implemented ETL using Pentaho with Hadoop as the source and Oracle 10g as the target.
Good experience in ETL life-cycle performance tuning.
Good experience in onsite/offshore coordination and in managing large ETL teams.
In-depth experience of data warehousing concepts such as SCDs (Type 1, Type 2 and Type 3), star schemas and snowflake schemas.
Strong programming and debugging skills in PL/SQL, SQL and Unix shell scripting
Well versed with the Hadoop ecosystem, including HBase, Hive, Sqoop and the Hadoop shell.
Prepared technical documentation including BRDs, functional and technical specifications, deployment and user guides, project plans, RFP/proposal material and status reports for clients, senior management and team members to ensure project efficiency and effectiveness.
Self-motivated, detail-oriented, creative, flexible and able to work effectively in a fast paced environment.
Strong hands-on experience in ETL development, data modeling and ETL framework design for a 200+ TB DWH using Oracle/Pentaho/Hadoop.
Object-level database design covering packages, procedures, functions, triggers, collections, indexes, ref cursors, tables, nested tables, partitioned tables, pipelined functions and materialized views, along with knowledge of regular expressions.
Experience with Hadoop architecture and Pentaho ETL integration with Oracle.
Implemented Pentaho-based ETL on a 200+ TB DWH, with HDFS (JSON/Avro) as the source and Oracle 10g as the target.
Exceptional problem-solving and sound decision-making capabilities, coupled with excellent communication and interpersonal skills.
Experience educating internal customers on business systems and procedures, and working with other analysts and QA teams to set priorities and schedules.
Experience architecting data models and data-flow designs for data warehouse and master data structures.
Experience in performance tuning of slow-performing data loads/ETLs.
Academic Background & Technical Credentials
Master of Computer Applications (M.C.A.), MRSC, Indore.
Bachelor of Science (Math), GDC, Begumganj.
OCA (Oracle Certified Associate).
BAI (Business Analytics and Intelligence) IIM Bangalore.
Big data reporting training on Datameer.
Cloudera Administrator Training for Apache Hadoop.
Cloudera Developer Training for Apache Hadoop.
Technical Skills
Database : Hive/HBase, Oracle 9i, 10g, 11g
ETL Tool : Pentaho 4.3
Programming : SQL, PL/SQL, shell script
Platforms : Linux/UNIX and Hadoop
Source Control : Perforce, Harvest
Bug/Defect Tracking : Burt
Data Modeler : Erwin 7.3/8/9.0 and Toad Data Modeler
Reporting Tool for Big Data : Datameer, Pentaho
Work Experience
Organization : NetApp, Bangalore.
Designation : ETL Architect/Business Analyst (PL/SQL Developer).
Duration : Aug 2009 – Present

Organization : Genisys (Client: Oracle GSD), Bangalore.
Designation : Data Mart Engineer (PL/SQL Developer).
Duration : Sep 2008 – Aug 2009

Organization : Ascent Consulting (Client: Yahoo), Bangalore.
Designation : Data Mart Engineer (PL/SQL Developer).
Duration : Feb 2008 – Sep 2008

Organization : Lanware Consulting, Mumbai.
Designation : PL/SQL Developer.
Duration : Dec 2006 – Feb 2008

Organization : Reliance Communications, Mumbai.
Designation : Software Engineer.
Duration : Nov 2005 – Dec 2006
PROJECT DETAILS:
Project 1:
Project: ASUP .NEXT ETL
Organization: NetApp, Bangalore.
Designation: ETL Architect (Pentaho, Hadoop, Oracle 10g, PL/SQL).
NetApp (Network Appliance) makes technology products that primarily simplify critical storage networking infrastructure. NetApp products are designed to support large data centers and provide an efficient, scalable and flexible data management environment in an enterprise's IT infrastructure. The range of NetApp products, spanning enterprise systems, software and services, covers Enterprise Storage, Virtualized Storage, Near-line Storage, HPC Storage, FC SAN, IP SAN, NAS, storage operating systems, replication software, network protocols and manageability software. These products aim to bring efficiency to the management of networking and storage resources in an enterprise, as well as to reduce the total cost of ownership (TCO) of the IT infrastructure. AutoSupport data generated by NetApp filers is processed by the “backend” tools and presented in an easy-to-navigate format in web browsers. The purpose of ASUPDW is to allow quick and easy access to AutoSupport data and to provide powerful querying and reporting capabilities: it allows query operations to be applied to the vast set of ASUPs stored in the ASUP data warehouse database.
AutoSupport data is sent to NetApp from filers and caches, both internal and in the field, with AutoSupport turned on and NetApp specified as a recipient (the default configuration). It contains output from various user commands, as well as system and EMS logs. It may also contain other information, such as wack logs, if a wack was performed.
AutoSupport is sent to NetApp by SMTP, HTTP POST, or HTTPS POST. A system will, if it is up and able, generate an AutoSupport (ASUP) weekly, or after particular events occur (the system detects low batteries, it's rebooted, etc.). System logs are cumulative throughout the week, so that the weekly log ASUP contains all the logs generated since the last one was sent.
60% of NetApp systems in the field send this data back to the NetApp data warehouse. NetApp receives 1.2 million AutoSupports every month, and this number is growing.
NetApp ASUPDW is one of the largest databases in the industry at 125+ terabytes. The ASUPDW database stores AutoSupports for 2 years; flat files are stored forever.
Tool : Pentaho 4.3, Toad
Database : Oracle 10g.
Platform : Linux, Hadoop
Programming : PL/SQL, shell script
Source Control : Perforce
Data modeler : Erwin Data modeler
Bug/Defect Tracking Tool : Burt
Role and Responsibilities:
1.Designed a metadata-driven ETL framework to read data from HDFS and load it into the DSS as fact/dimension/bridge tables.
2.The framework consists of Pentaho jobs, PL/SQL packages and shell scripts, and provides the basic features of:
Parallel processing and performance
Restartability
Incremental data load and recoverability
Scalability and efficiency
Customizability
Job dependency and load dependency management
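The metadata-driven approach above can be sketched in PL/SQL: a control table describes each job, and a generic driver runs whatever the metadata says is ready. All table, column and procedure names here are illustrative, not the actual production schema.

```sql
-- Illustrative control table: one row per ETL job (names are hypothetical).
CREATE TABLE etl_job_meta (
  job_id        NUMBER PRIMARY KEY,
  job_name      VARCHAR2(100),
  src_object    VARCHAR2(100),   -- HDFS extract / staging table
  tgt_table     VARCHAR2(100),   -- fact/dimension/bridge table in the DSS
  depends_on    NUMBER,          -- job_id that must finish first
  last_run_hwm  DATE,            -- high-water mark for incremental loads
  status        VARCHAR2(20)     -- READY / RUNNING / DONE / FAILED
);

-- Driver sketch: pick up READY jobs whose dependency (if any) is DONE.
BEGIN
  FOR j IN (SELECT m.*
              FROM etl_job_meta m
             WHERE m.status = 'READY'
               AND (m.depends_on IS NULL
                    OR EXISTS (SELECT 1 FROM etl_job_meta p
                                WHERE p.job_id = m.depends_on
                                  AND p.status = 'DONE'))) LOOP
    UPDATE etl_job_meta SET status = 'RUNNING' WHERE job_id = j.job_id;
    -- run_load would be a generic loader driven purely by the metadata row,
    -- e.g. building a MERGE from src_object into tgt_table above last_run_hwm.
    -- run_load(j.job_id);
    UPDATE etl_job_meta
       SET status = 'DONE', last_run_hwm = SYSDATE
     WHERE job_id = j.job_id;
  END LOOP;
  COMMIT;
END;
/
```

Restartability and recoverability fall out of the status column and high-water mark: a failed job stays at FAILED with its old high-water mark, so rerunning it repeats only the missed increment.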
3.Pentaho setup and performance testing.
4.Templates for dimension and fact ETL using Pentaho.
5.Pentaho best-practices implementation.
6.Carte configuration to submit jobs remotely to various Carte servers.
7.JVM settings refinement for the DI server and Carte servers.
8.Transition to L2/L3 on ETL monitoring and troubleshooting.
9.Managing deliverables for the team, work-plan management and defect management.
10.Maintaining and enhancing ETLs, resolving data load issues on an ongoing basis, as well as carrying out proactive testing and quality-control activity to ensure data warehouse data integrity and quality are maintained.
11.Working with individuals across the organization to build requirements and architect solutions to support business requirements.
12.Designing, developing and maintaining existing PL/SQL code to implement business logic for multiple clients and carrying out one-time operational activities.
13.Developing and maintaining custom ETL, replication, data maintenance and reporting routines for an Oracle-based enterprise-wide data warehouse.
14.Code migration from one environment to another.
15.Performance tuning of dimension and fact loads.
16.Scheduling and monitoring of the pushed production code.
17.Monitoring of the feed-to-tables ETL flow and data.
18.Code migration from one database to another.
19.Pushing development code to production, and coordinating and scheduling releases for ETL enhancements and bug fixes.
Project 2:
Project: DATAMART
Designation: ETL Architect (Pentaho, Hadoop, Oracle 10g, PL/SQL).
We have a DSS holding the last 3 years of data, more than 200 TB in Oracle 10g. The data is organized into facts, dimensions, bridge tables and some master tables (static/reference data). There are around 200 tables (130 dimensions, 70 facts), ranging from a few million to billions of rows each. DATAMART is a subset of the DSS with around 100 objects (70 dimensions, 30 facts) that keeps only the last 6 months of data.
Role and Responsibilities:
1.Designed a metadata-driven ETL framework in PL/SQL and shell script to load data incrementally.
2.The DATAMART refresh happens daily.
3.Created templates to load fact/dimension/bridge tables in the datamart.
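A daily incremental refresh of this kind can be sketched as a high-water-mark-driven MERGE plus a purge of data older than the 6-month retention window. All object and column names below are hypothetical, for illustration only.

```sql
-- Pull only rows changed since the last refresh of this datamart fact.
MERGE INTO dm_sales_fact t
USING (SELECT s.sale_id, s.amount, s.load_ts
         FROM dss_sales_fact s
        WHERE s.load_ts > (SELECT c.last_refresh_ts
                             FROM dm_refresh_ctl c
                            WHERE c.table_name = 'DM_SALES_FACT')) src
ON (t.sale_id = src.sale_id)
WHEN MATCHED THEN
  UPDATE SET t.amount = src.amount, t.load_ts = src.load_ts
WHEN NOT MATCHED THEN
  INSERT (sale_id, amount, load_ts)
  VALUES (src.sale_id, src.amount, src.load_ts);

-- Enforce the 6-month retention window of the datamart.
DELETE FROM dm_sales_fact
 WHERE load_ts < ADD_MONTHS(SYSDATE, -6);

-- Advance the high-water mark so tomorrow's run picks up only new changes.
UPDATE dm_refresh_ctl
   SET last_refresh_ts = SYSDATE
 WHERE table_name = 'DM_SALES_FACT';
COMMIT;
```

Driving the MERGE from a control table rather than hard-coded timestamps is what lets the same template serve every fact/dimension/bridge table.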
Project 3:
Project: ASUPDW3 Ecosystem.
Organization: NetApp, Bangalore.
Designation: ETL Expert (Oracle 10g, PL/SQL).
Environment : Toad, SQL*Plus.
Database : Oracle 10g.
Platform : Linux
Programming : PL/SQL, shell script
Source Control : Perforce
Data modeler : Erwin Data modeler
Bug/Defect Tracking Tool : Burt
Role and Responsibilities: Leading a team of 3 members; responsibilities include:
1.Design and implementation of the ETL framework (database side).
2.Design and implementation of SCD-type dimensions and facts.
3.Managing deliverables for the team, work-plan management and defect management.
4.Maintaining and enhancing ETLs, resolving data load issues on an ongoing basis, as well as carrying out proactive testing and quality-control activity to ensure data warehouse data integrity and quality are maintained.
5.Working with individuals across the organization to build requirements and architect solutions to support business requirements.
6.Designing, developing and maintaining existing PL/SQL code to implement business logic for multiple clients and carrying out one-time operational activities.
7.Developing and maintaining custom ETL, replication, data maintenance and reporting routines for an Oracle-based enterprise-wide data warehouse.
8.Code migration from one environment to another.
9.Performance tuning of dimension and fact loads.
10.Scheduling and monitoring of the pushed production code.
11.Monitoring of the feed-to-tables ETL flow and data.
12.Code migration from one database to another.
13.Pushing development code to production, and coordinating and scheduling releases for ETL enhancements and bug fixes.
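The SCD Type 2 pattern behind responsibility 2 can be sketched in two statements: expire the current row when a tracked attribute changes, then insert the new version. The dimension, sequence and column names are illustrative, not the actual schema.

```sql
-- Step 1: expire the current version when a tracked attribute changed.
UPDATE customer_dim
   SET eff_end_dt   = TRUNC(SYSDATE) - 1,
       current_flag = 'N'
 WHERE customer_nk  = :nk
   AND current_flag = 'Y'
   AND (city <> :new_city OR segment <> :new_segment);

-- Step 2: insert the new version when there is no open row for this key
-- (covers both brand-new keys and keys just expired in step 1).
INSERT INTO customer_dim (customer_sk, customer_nk, city, segment,
                          eff_start_dt, eff_end_dt, current_flag)
SELECT customer_dim_seq.NEXTVAL, :nk, :new_city, :new_segment,
       TRUNC(SYSDATE), DATE '9999-12-31', 'Y'
  FROM dual
 WHERE NOT EXISTS (SELECT 1 FROM customer_dim
                    WHERE customer_nk = :nk AND current_flag = 'Y');
```

Type 1 is the degenerate case (overwrite in place, no history rows); Type 3 keeps the prior value in an extra column instead of an extra row.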
Project 4:
Project: STOR (Storage Optimization Report).
Organization: NetApp, Bangalore.
Designation: Business Analyst.
The heart of an IT infrastructure is data storage. The Storage Optimization Review (STOR) report helps NetApp customers manage their storage infrastructure by providing an analysis of the storage utilization, efficiency and availability of the storage infrastructure. Storage utilization trending and analysis, a storage efficiency review, and system configuration and headroom data are provided to assist with storage capacity planning and storage management. A review of availability information can help identify problem systems or configurations and assist in the systematic resolution of availability issues. A regular review of storage and availability best practices and recommendations, and of operating system and hardware firmware revisions, gives the storage administrator the ability to proactively manage systems in order to avoid problems and ensure the best possible usage of the storage infrastructure investment.
STOR provides reports for the following:
–Platforms
–FAS series & older Filers
–NearStore
–V-Series
–N-Series – if they send ASUP
–OS
–Data ONTAP 6.0 through 7.x
–7-mode Data ONTAP 8.0 (GA)
Salient features of STOR are as below:
Help NetApp customers optimize their investment in NetApp storage and software products by providing proactive, predictive and best practice recommendations
Bring in data and discussions around optimizing and leveraging other NetApp software products that have ASUP data: MetroCluster, SyncMirror, SnapMirror, MultiStore, V-Series, VTL
Leverage ASUP data to help with optimization of NetApp solutions for:
SAP, Oracle, Exchange, etc.
Dynamic Data Center management
Expand coverage for and leverage data from other NetApp products when they implement call home functionality (e.g. SMAI)
The primary focus of STOR reporting is to provide value-added data that supports the Consulting Storage Optimization Review services. Once an AutoSupport is processed by ETL, it becomes structured, well-defined data in terms of dimensions and facts. Unless a NetApp system has AutoSupport disabled, it sends weekly logs via an AutoSupport message containing all the information related to that system, such as the number of disks available and their usage, the number of shelves available and their configuration, and storage efficiency features like SnapVault, Snapshot, dual parity and thin provisioning.
STOR stitches data across dimensions and facts and provides complete details using the weekly logs.
There are 3 sections:
–Executive Summary: rolls up the entire site into tables and graphs of data
–Technical Summary: roll-up of individual system information into tables/graphs
–System Details: provides in-depth data for individual systems
For the above categories there are 4 types of reports:
–NetApp System Inventory: provides an overview of the NetApp installed base at the site.
–Availability: provides insight into the operations and use of NetApp hardware in the storage infrastructure, with both summary and in-depth reporting on data and system availability at greater granularity, i.e. availability by protocol and platform.
–Performance: disk activity dashboard, average protocol and CPU utilization, and performance best practices and recommendations.
–Storage Efficiency and Capacity Planning: quantifies the total storage and money saved by the customer using NetApp’s storage efficiency features. The storage efficiency features highlighted are:
Deduplication
Thin Provisioning (FlexVol)
Double Parity RAID (RAID-DP)
Snapshot Copies
Thin Replication
Virtual Clones (FlexClone)
Storage Utilization Trend – 1-year history and a prediction for the next year
Role and Responsibilities: Leading a team of 2 members; responsibilities include:
1.Working with various teams across the organization to build requirements and architect solutions to support business requirements.
2.Designing, developing and maintaining existing PL/SQL code to implement business logic for multiple clients and carrying out one-time operational activities.
3.Designing and implementing business needs in the Oracle database in terms of dimensions and facts.
4.Interacting with product owners to understand business needs.
5.Coordinating and scheduling releases for STOR enhancements and bug fixes, and pushing this code to production.
6.Performance tuning of STOR reports.
Project 5:
Project: CDW Shutdown.
Organization: Oracle GSD, Bangalore.
Designation: Pl/Sql Developer (Data Mart Engineer).
Procter & Gamble Co. (P&G) is a Fortune 500 American global corporation based in Cincinnati, Ohio that manufactures a wide range of consumer goods. As of 2008, P&G is the 23rd-largest US company by revenue and the 14th-largest by profit. The main objective of this project is to migrate the existing warehouse, the Common Data Warehouse (CDW), to a new data warehouse, the Atomic Data Warehouse (ADW), to reduce cost, maintenance and servers. Currently CDW has 21 data warehouse applications operating in the NALA, ASIA and EMEA regions.
Each application may have one or more instances based on the number of regions in which it operates. This project is intended to migrate all applications that are operating in different instances for different regions into one global instance, the ADW. Out of the 21 applications, 19 are forklifted and two are redeveloped.
Major Responsibilities:
1.Object-level database design including partitioned tables, nested tables, materialized views and indexes.
2.Packages, procedures, functions, pipelined functions and other database objects.
3.Generating various reports using analytical functions.
4.Scheduling and monitoring of the pushed production code.
5.Monitoring of the feed-to-tables ETL flow and data.
6.Code migration from one database to another.
7.Data validation, data cleaning and data fixing in the production environment.
8.Developing and maintaining custom ETL, replication, data maintenance and reporting routines for an Oracle-based enterprise-wide data warehouse.
9.Object-level database design including packages, procedures, functions, triggers, collections, indexes, ref cursors, tables, nested tables, partitioned tables, pipelined functions and materialized views, along with knowledge of regular expressions.
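A pipelined function of the kind mentioned in these responsibilities streams rows to the calling SQL statement instead of materializing a collection first. This is a minimal sketch; the type and function names are illustrative.

```sql
-- Row and collection types the pipelined function will emit.
CREATE TYPE num_row AS OBJECT (n NUMBER);
/
CREATE TYPE num_tab AS TABLE OF num_row;
/
-- Pipelined function: rows become visible to the caller as they are piped,
-- so large result sets never need to be staged in memory or in a table.
CREATE OR REPLACE FUNCTION first_n (p_n IN NUMBER)
  RETURN num_tab PIPELINED IS
BEGIN
  FOR i IN 1 .. p_n LOOP
    PIPE ROW (num_row(i));
  END LOOP;
  RETURN;
END;
/
-- Usable directly in SQL via the TABLE operator:
SELECT n FROM TABLE(first_n(5));
```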
Project 6:
Project: Atlantic Data Mart.
Organization: Yahoo R&D, Bangalore.
Designation: Pl/Sql Developer (Data Mart Engineer).
This is basically a data warehouse project that stores clickstream data and provides reports to business users to support business decisions. This data mart produces data on user behavior and user engagement on the www.yahoo.com front page for the UK and US. It gives statistics about customer behavior in terms of link views and link clicks; this information is used to calculate the ad rates on any page. The star schema of this data mart was designed from scratch, and the fact and dimension tables were then created. An ETL framework in Oracle PL/SQL and Perl is used to load data from flat files into the data warehouse. The code in this ETL framework was created and pushed to production, and is scheduled to run daily, weekly and monthly to load these tables appropriately.
Major Responsibilities:
1.Object-level database design including partitioned tables, nested tables, materialized views and indexes.
2.Packages, procedures, functions, pipelined functions and other database objects.
3.Generating various reports using analytical functions.
4.Creation of fresh fact and dimension tables using the star schema.
5.Creating ETL jobs using PL/SQL and Perl code.
6.Scheduling and monitoring of the pushed production code.
7.Monitoring of the feed-to-tables ETL flow and data.
8.Benchmarking of Oracle PL/SQL code.
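Clickstream reports built on analytical functions, as in responsibility 3, typically follow this shape; the fact table and columns below are hypothetical, for illustration only.

```sql
-- Hypothetical report: daily clicks per page, with a 7-day moving average
-- per page and each page's rank by clicks within its day.
SELECT page_id,
       click_date,
       clicks,
       AVG(clicks) OVER (PARTITION BY page_id
                         ORDER BY click_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS clicks_7d_avg,
       RANK() OVER (PARTITION BY click_date
                    ORDER BY clicks DESC)                          AS day_rank
  FROM page_click_fact;
```

Because the windowing happens in a single pass over the fact table, such queries avoid the self-joins that would otherwise be needed for moving averages and rankings.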
Environment : Toad, SQL*Plus.
Database : Oracle 9i.
Platform : Linux
Programming : Perl, PL/SQL, Shell Script.
Project 7:
Project: Enterprise Technology Management System.
Organization: Lanware Consulting, Mumbai.
Designation: PL/SQL Developer
Enterprise Technology Management System is an employee information management system that also schedules and monitors the day-to-day activities of each employee. It keeps track of all the activities of each employee and helps schedule assignments to employees on the basis of their skill sets, workload on other assignments and leave status. The system automatically grades user performance on the basis of completed assignments. It also provides a facility to raise a trouble ticket or request on any issue; trouble tickets are automatically escalated to a higher level if not resolved within the specified timeframe.
Major Responsibilities:
1.Analysis and Database Design.
2.Creation of PL/SQL Stored Procedures, Functions.
3.Creation of Triggers and Packages.
4.Tuned the SQL Queries and PL/SQL Code.
Environment : Toad, SQL*Plus.
Database : Oracle 9i.
Platform : Windows
Programming : PL/SQL
Project 8:
Project: Product and user management system (SELECTICA).
Organization: Reliance Communications, Mumbai.
Designation: PL/SQL Developer.
Order entry and product and user management system (SELECTICA): an application through which Reliance users provide telecom products and services to customers. Reliance users enter customer and product details so that products and services are provisioned for customers and billing is done accordingly. The application maintains the hierarchy of Reliance users who provide telecom products and services to customers; users are mapped with different rights according to location. The system generates various reports on product services and user performance, and its main feature is that it helps the sales and marketing teams focus on their respective areas for business growth.
Major Responsibilities:
1.Analysis and Database Design.
2.Creation of PL/SQL Stored Procedures, Functions.
3.Creation of Triggers and Packages.
4.Tuned the SQL Queries and PL/SQL Code.
Environment : Toad, SQL*Plus.
Database : Oracle 9i.
Platform : Windows
Programming : PL/SQL
Project 9:
Project: Tariff configuration System.
Organization: Reliance Communications, Mumbai.
Designation: PL/SQL Developer.
This application is used to configure tariffs for phone calls and all other services, such as data download and upload, SMS charges, multimedia service charges and so on.
This is basically a billing application that determines whether a call is local or STD and how it is charged. The application also determines call rates when Reliance launches its telecom services at a new location, how calls are charged at different times of day, and which numbers are toll-free. It helps in creating free units, promos and CUGs (closed user groups). The application handles every call made by the user.
Major Responsibilities:
1.Creation of PL/SQL Stored Procedures, Functions.
2.Creation of Triggers.
3.Creation of Packages.
4.Tuned the SQL Queries and PL/SQL Code.
5.Creation of MIS reports using analytical functions.
Personal Profile:
Date of Birth : 4th Sep 1980
Languages Known : English, Hindi
Passport Number : G5779950
Hobbies : Sudoku, chess, carrom and puzzle solving
Date
(RISHI KUMAR AWASTHI)