I have been working with Business Intelligence systems and DWH environments since 2005. I am presently working as an ETL Architect on a Hadoop/Pentaho/Oracle platform at NetApp, Bangalore.
Objective
I am looking for a challenging role where I can apply my creative and technical expertise to the management and hands-on development of top-of-the-line applications and reporting systems in support of senior business goals.
Summary
More than nine years of experience in ETL and Business Intelligence using Oracle, PL/SQL, OBIEE and Pentaho.
Excellent experience building metadata-driven ETL frameworks using PL/SQL and shell scripting.
Hands-on experience in performance tuning, advanced SQL and query optimization.
Experienced in data modeling (dimensional modeling) and fact table design.
Implemented ETL using Pentaho with Hadoop as the source and Oracle 10g as the target.
Good experience in ETL life-cycle performance tuning.
Good experience in onsite/offshore coordination and in managing large ETL teams.
In-depth experience of data warehousing concepts such as SCDs (Type 1, Type 2 and Type 3), star schemas and snowflake schemas.
Strong programming and debugging skills in PL/SQL, SQL and Unix shell scripting
Well versed with the Hadoop ecosystem, including HBase, Hive, Sqoop and the Hadoop shell.
Prepared technical documentation including BRDs, functional and technical specifications, deployment and user guides, project plans, RFP/proposal material and status reports for clients, senior management and team members to ensure project efficiency and effectiveness.
Self-motivated, detail-oriented, creative, flexible and able to work effectively in a fast paced environment.
Strong hands-on experience in ETL development, data modeling and ETL framework design for a 200+ TB DWH using Oracle/Pentaho/Hadoop.
Object-level database design covering packages, procedures, functions, triggers, collections, indexes, ref cursors, tables, nested tables, partitioned tables, pipelined functions and materialized views, along with knowledge of regular expressions.
Experience with Hadoop architecture and Pentaho ETL integration with Oracle.
Implemented Pentaho-based ETL on a 200+ TB DWH, with HDFS (JSON/Avro) as the source and Oracle 10g as the target.
Exceptional problem-solving and sound decision-making capabilities, coupled with excellent communication and interpersonal skills.
Experience educating internal customers on business systems and procedures, and working with other analysts and QA teams to set priorities and schedules.
Experience architecting data models and data-flow designs for data warehouse and master data structures.
Experience in performance tuning of slow-performing data loads/ETLs.
Academic Background & Technical Credentials
Master of Computer Applications (M.C.A.), MRSC, Indore.
Bachelor of Science (Math), GDC, Begumganj.
OCA (Oracle Certified Associate).
BAI (Business Analytics and Intelligence) IIM Bangalore.
Big data reporting training on Datameer.
Cloudera Administrator Training for Apache Hadoop.
Cloudera Developer Training for Apache Hadoop.
Technical Skills
Database : Hive/HBase, Oracle 9i, 10g, 11g
ETL Tool : Pentaho 4.3
Programming : SQL, PL/SQL, shell script
Platforms : Linux/UNIX and Hadoop
Source Control : Perforce, Harvest
Bug/Defect Tracking : Burt
Data Modeler : Erwin 7.3/8/9.0 and Toad Data Modeler
Reporting Tool for Big Data : Datameer, Pentaho
Work Experience
Organization : NetApp, Bangalore.
Designation : ETL Architect/Business Analyst (PL/SQL Developer).
Duration : Aug 2009 – Present

Organization : Genisys (Client: Oracle GSD), Bangalore.
Designation : Data Mart Engineer (PL/SQL Developer).
Duration : Sep 2008 – Aug 2009

Organization : Ascent Consulting (Client: Yahoo), Bangalore.
Designation : Data Mart Engineer (PL/SQL Developer).
Duration : Feb 2008 – Sep 2008

Organization : Lanware Consulting, Mumbai.
Designation : PL/SQL Developer.
Duration : Dec 2006 – Feb 2008

Organization : Reliance Communications, Mumbai.
Designation : Software Engineer.
Duration : Nov 2005 – Dec 2006
PROJECT DETAILS:
Project 1:
Project: ASUP .NEXT ETL
Organization: NetApp, Bangalore.
Designation: ETL Architect (Pentaho, Hadoop, Oracle 10g, PL/SQL).
NetApp (Network Appliance) makes technology products that primarily simplify critical storage networking infrastructure. NetApp products are designed to support large data centers and provide an efficient, scalable and flexible data management environment in an enterprise's IT infrastructure. The range of NetApp products, spanning enterprise systems, software and services, covers Enterprise Storage, Virtualized Storage, Near-line Storage, HPC Storage, FC SAN, IP SAN, NAS, storage operating systems, replication software, network protocols and manageability software. These products aim to bring efficiency to the management of networking and storage resources in an enterprise, as well as to reduce the total cost of ownership (TCO) of the IT infrastructure. AutoSupport data generated by NetApp filers is processed by the “backend” tools and presented in an easy-to-navigate format in web browsers. The purpose of ASUPDW is to allow quick and easy access to AutoSupport data and to provide powerful querying and reporting capabilities: it allows query operations to be applied to the vast set of ASUPs stored in the ASUP data warehouse database.
AutoSupport data is sent to NetApp from filers and caches, both internal and in the field, with AutoSupport turned on and NetApp specified as a recipient (the default configuration). It contains output from various user commands, as well as system and EMS logs. It may also contain other information, such as wack logs, if a wack was performed.
AutoSupport is sent to NetApp by SMTP, HTTP POST, or HTTPS POST. A system will, if it is up and able, generate an AutoSupport (ASUP) weekly, or after particular events occur (the system detects low batteries, it's rebooted, etc.). System logs are cumulative throughout the week, so that the weekly log ASUP contains all the logs generated since the last one was sent.
60% of NetApp systems in the field send this data back to the NetApp data warehouse. NetApp receives 1.2 million AutoSupports every month, and this number is growing.
NetApp ASUPDW is one of the largest databases in the industry at 125+ terabytes. The ASUPDW database stores AutoSupports for 2 years; flat files are stored forever.
Tool : Pentaho 4.3, Toad
Database : Oracle 10g.
Platform : Linux, Hadoop
Programming : PL/SQL, shell script
Source Control : Perforce
Data modeler : Erwin Data modeler
Bug/Defect Tracking Tool : Burt
Role and Responsibilities:
1.Designed a metadata-driven ETL framework to read data from HDFS and load it into the DSS as fact/dimension/bridge tables.
2.The framework consists of Pentaho jobs, PL/SQL packages and shell scripts, and provides the basic features of:
Parallel processing and performance
Restartability
Incremental data load and recoverability
Scalability and efficiency
Customizability
Job dependency and load dependency management
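The metadata-driven approach above can be sketched in PL/SQL: a control table describes each job, and a generic driver runs whatever the metadata says is ready. All table, column and procedure names here are illustrative, not the actual production schema.

```sql
-- Illustrative control table: one row per ETL job (names are hypothetical).
CREATE TABLE etl_job_meta (
  job_id        NUMBER PRIMARY KEY,
  job_name      VARCHAR2(100),
  src_object    VARCHAR2(100),   -- HDFS extract / staging table
  tgt_table     VARCHAR2(100),   -- fact/dimension/bridge table in the DSS
  depends_on    NUMBER,          -- job_id that must finish first
  last_run_hwm  DATE,            -- high-water mark for incremental loads
  status        VARCHAR2(20)     -- READY / RUNNING / DONE / FAILED
);

-- Driver sketch: pick up READY jobs whose dependency (if any) is DONE.
BEGIN
  FOR j IN (SELECT m.*
              FROM etl_job_meta m
             WHERE m.status = 'READY'
               AND (m.depends_on IS NULL
                    OR EXISTS (SELECT 1 FROM etl_job_meta p
                                WHERE p.job_id = m.depends_on
                                  AND p.status = 'DONE'))) LOOP
    UPDATE etl_job_meta SET status = 'RUNNING' WHERE job_id = j.job_id;
    -- run_load would be a generic loader driven purely by the metadata row,
    -- e.g. building a MERGE from src_object into tgt_table above last_run_hwm.
    -- run_load(j.job_id);
    UPDATE etl_job_meta
       SET status = 'DONE', last_run_hwm = SYSDATE
     WHERE job_id = j.job_id;
  END LOOP;
  COMMIT;
END;
/
```

Restartability and recoverability fall out of the status column and high-water mark: a failed job stays at FAILED with its old high-water mark, so rerunning it repeats only the missed increment.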
3.Pentaho setup and performance testing.
4.Templates for dimension and fact ETL using Pentaho.
5.Pentaho best-practices implementation.
6.Carte configuration to submit jobs remotely to various Carte servers.
7.JVM settings refinement for the DI server and Carte servers.
8.Transition to L2/L3 on ETL monitoring and troubleshooting.
9.Managing deliverables for the team, work-plan management and defect management.
10.Maintaining and enhancing ETLs, resolving data load issues on an ongoing basis, as well as carrying out proactive testing and quality-control activity to ensure data warehouse data integrity and quality are maintained.
11.Working with individuals across the organization to build requirements and architect solutions to support business requirements.
12.Designing, developing and maintaining existing PL/SQL code to implement business logic for multiple clients and carrying out one-time operational activities.
13.Developing and maintaining custom ETL, replication, data maintenance and reporting routines for an Oracle-based enterprise-wide data warehouse.
14.Code migration from one environment to another.
15.Performance tuning of dimension and fact loads.
16.Scheduling and monitoring of the pushed production code.
17.Monitoring of the feed-to-tables ETL flow and data.
18.Code migration from one database to another.
19.Pushing development code to production, and coordinating and scheduling releases for ETL enhancements and bug fixes.
Project 2:
Project: DATAMART
Designation: ETL Architect (Pentaho, Hadoop, Oracle 10g, PL/SQL).
We have a DSS holding the last 3 years of data, more than 200 TB in Oracle 10g. The data is organized into facts, dimensions, bridge tables and some master tables (static/reference data). There are around 200 tables (130 dimensions, 70 facts), ranging from a few million to billions of rows each. DATAMART is a subset of the DSS with around 100 objects (70 dimensions, 30 facts) that keeps only the last 6 months of data.
Role and Responsibilities:
1.Designed a metadata-driven ETL framework in PL/SQL and shell script to load data incrementally.
2.The DATAMART refresh happens daily.
3.Created templates to load fact/dimension/bridge tables in the datamart.
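A daily incremental refresh of this kind can be sketched as a high-water-mark-driven MERGE plus a purge of data older than the 6-month retention window. All object and column names below are hypothetical, for illustration only.

```sql
-- Pull only rows changed since the last refresh of this datamart fact.
MERGE INTO dm_sales_fact t
USING (SELECT s.sale_id, s.amount, s.load_ts
         FROM dss_sales_fact s
        WHERE s.load_ts > (SELECT c.last_refresh_ts
                             FROM dm_refresh_ctl c
                            WHERE c.table_name = 'DM_SALES_FACT')) src
ON (t.sale_id = src.sale_id)
WHEN MATCHED THEN
  UPDATE SET t.amount = src.amount, t.load_ts = src.load_ts
WHEN NOT MATCHED THEN
  INSERT (sale_id, amount, load_ts)
  VALUES (src.sale_id, src.amount, src.load_ts);

-- Enforce the 6-month retention window of the datamart.
DELETE FROM dm_sales_fact
 WHERE load_ts < ADD_MONTHS(SYSDATE, -6);

-- Advance the high-water mark so tomorrow's run picks up only new changes.
UPDATE dm_refresh_ctl
   SET last_refresh_ts = SYSDATE
 WHERE table_name = 'DM_SALES_FACT';
COMMIT;
```

Driving the MERGE from a control table rather than hard-coded timestamps is what lets the same template serve every fact/dimension/bridge table.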
Project 3:
Project: ASUPDW3 Ecosystem.
Organization: NetApp, Bangalore.
Designation: ETL Expert (Oracle 10g, PL/SQL).
Environment : Toad, SQL*Plus.
Database : Oracle 10g.
Platform : Linux
Programming : PL/SQL, shell script
Source Control : Perforce
Data modeler : Erwin Data modeler
Bug/Defect Tracking Tool : Burt
Role and Responsibilities: Leading a team of 3 members; responsibilities include:
1.Design and implementation of the ETL framework (database side).
2.Design and implementation of SCD-type dimensions and facts.
3.Managing deliverables for the team, work-plan management and defect management.
4.Maintaining and enhancing ETLs, resolving data load issues on an ongoing basis, as well as carrying out proactive testing and quality-control activity to ensure data warehouse data integrity and quality are maintained.
5.Working with individuals across the organization to build requirements and architect solutions to support business requirements.
6.Designing, developing and maintaining existing PL/SQL code to implement business logic for multiple clients and carrying out one-time operational activities.
7.Developing and maintaining custom ETL, replication, data maintenance and reporting routines for an Oracle-based enterprise-wide data warehouse.
8.Code migration from one environment to another.
9.Performance tuning of dimension and fact loads.
10.Scheduling and monitoring of the pushed production code.
11.Monitoring of the feed-to-tables ETL flow and data.
12.Code migration from one database to another.
13.Pushing development code to production, and coordinating and scheduling releases for ETL enhancements and bug fixes.
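The SCD Type 2 pattern behind responsibility 2 can be sketched in two statements: expire the current row when a tracked attribute changes, then insert the new version. The dimension, sequence and column names are illustrative, not the actual schema.

```sql
-- Step 1: expire the current version when a tracked attribute changed.
UPDATE customer_dim
   SET eff_end_dt   = TRUNC(SYSDATE) - 1,
       current_flag = 'N'
 WHERE customer_nk  = :nk
   AND current_flag = 'Y'
   AND (city <> :new_city OR segment <> :new_segment);

-- Step 2: insert the new version when there is no open row for this key
-- (covers both brand-new keys and keys just expired in step 1).
INSERT INTO customer_dim (customer_sk, customer_nk, city, segment,
                          eff_start_dt, eff_end_dt, current_flag)
SELECT customer_dim_seq.NEXTVAL, :nk, :new_city, :new_segment,
       TRUNC(SYSDATE), DATE '9999-12-31', 'Y'
  FROM dual
 WHERE NOT EXISTS (SELECT 1 FROM customer_dim
                    WHERE customer_nk = :nk AND current_flag = 'Y');
```

Type 1 is the degenerate case (overwrite in place, no history rows); Type 3 keeps the prior value in an extra column instead of an extra row.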
Project 4:
Project: STOR (Storage Optimization Report).
Organization: NetApp, Bangalore.
Designation: Business Analyst.
The heart of an IT infrastructure is data storage. The Storage Optimization Review (STOR) report helps NetApp customers manage their storage infrastructure by providing an analysis of the storage utilization, efficiency and availability of the storage infrastructure. Storage utilization trending and analysis, a storage efficiency review, and system configuration and headroom data are provided to assist with storage capacity planning and storage management. A review of availability information can help identify problem systems or configurations and assist in the systematic resolution of availability issues. A regular review of storage and availability best practices and recommendations, and of operating system and hardware firmware revisions, gives the storage administrator the ability to proactively manage systems in order to avoid problems and ensure the best possible usage of the storage infrastructure investment.
STOR provides reports for the following:
–Platforms
–FAS series & older Filers
–NearStore
–V-Series
–N-Series – if they send ASUP
–OS
–Data ONTAP 6.0 through 7.x
–7-mode Data ONTAP 8.0 (GA)
Salient features of STOR are as below:
Help NetApp customers optimize their investment in NetApp storage and software products by providing proactive, predictive and best practice recommendations
Bring in data and discussions around optimizing and leveraging other NetApp software products that have ASUP data: MetroCluster, SyncMirror, SnapMirror, MultiStore, V-Series, VTL
Leverage ASUP data to help with optimization of NetApp solutions for:
SAP, Oracle, Exchange, etc.
Dynamic Data Center management
Expand coverage for and leverage data from other NetApp products when they implement call home functionality (e.g. SMAI)
The primary focus of STOR reporting is to provide value-added data that supports the Consulting Storage Optimization Review services. Once an AutoSupport is processed by ETL, it becomes structured, well-defined data in terms of dimensions and facts. Unless a NetApp system has AutoSupport disabled, it sends weekly logs via an AutoSupport message containing all the information related to that system, such as the number of disks available and their usage, the number of shelves available and their configuration, and storage efficiency features like SnapVault, Snapshot, dual parity and thin provisioning.
STOR stitches data across dimensions and facts and provides complete details using the weekly logs.
There are 3 sections:
–Executive Summary: rolls up the entire site into tables and graphs of data
–Technical Summary: roll-up of individual system information into tables/graphs
–System Details: provides in-depth data for individual systems
For the above categories there are 4 types of reports:
–NetApp System Inventory: provides an overview of the NetApp installed base at the site.
–Availability: provides insight into the operations and use of NetApp hardware in the storage infrastructure, with both summary and in-depth reporting on data and system availability at greater granularity, i.e. availability by protocol and platform.
–Performance: disk activity dashboard, average protocol and CPU utilization, and performance best practices and recommendations.
–Storage Efficiency and Capacity Planning: quantifies the total storage and money saved by the customer using NetApp’s storage efficiency features. The storage efficiency features highlighted are:
Deduplication
Thin Provisioning (FlexVol)
Double Parity RAID (RAID-DP)
Snapshot Copies
Thin Replication
Virtual Clones (FlexClone)
Storage Utilization Trend – 1-year history and a prediction for the next year
Role and Responsibilities: Leading a team of 2 members; responsibilities include:
1.Working with various teams across the organization to build requirements and architect solutions to support business requirements.
2.Designing, developing and maintaining existing PL/SQL code to implement business logic for multiple clients and carrying out one-time operational activities.
3.Designing and implementing business needs in the Oracle database in terms of dimensions and facts.
4.Interacting with product owners to understand business needs.
5.Coordinating and scheduling releases for STOR enhancements and bug fixes, and pushing this code to production.
6.Performance tuning of STOR reports.
Project 5:
Project: CDW Shutdown.
Organization: Oracle GSD, Bangalore.
Designation: Pl/Sql Developer (Data Mart Engineer).
Procter & Gamble Co. (P&G) is a Fortune 500 American global corporation based in Cincinnati, Ohio that manufactures a wide range of consumer goods. As of 2008, P&G is the 23rd-largest US company by revenue and the 14th-largest by profit. The main objective of this project is to migrate the existing warehouse, the Common Data Warehouse (CDW), to a new data warehouse, the Atomic Data Warehouse (ADW), to reduce cost, maintenance and servers. Currently CDW has 21 data warehouse applications operating in the NALA, ASIA and EMEA regions.
Each application may have one or more instances based on the number of regions in which it operates. This project is intended to migrate all applications that are operating in different instances for different regions into one global instance, the ADW. Out of the 21 applications, 19 are forklifted and two are redeveloped.
Major Responsibilities:
1.Object-level database design including partitioned tables, nested tables, materialized views and indexes.
2.Packages, procedures, functions, pipelined functions and other database objects.
3.Generating various reports using analytical functions.
4.Scheduling and monitoring of the pushed production code.
5.Monitoring of the feed-to-tables ETL flow and data.
6.Code migration from one database to another.
7.Data validation, data cleaning and data fixing in the production environment.
8.Developing and maintaining custom ETL, replication, data maintenance and reporting routines for an Oracle-based enterprise-wide data warehouse.
9.Object-level database design including packages, procedures, functions, triggers, collections, indexes, ref cursors, tables, nested tables, partitioned tables, pipelined functions and materialized views, along with knowledge of regular expressions.
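A pipelined function of the kind mentioned in these responsibilities streams rows to the calling SQL statement instead of materializing a collection first. This is a minimal sketch; the type and function names are illustrative.

```sql
-- Row and collection types the pipelined function will emit.
CREATE TYPE num_row AS OBJECT (n NUMBER);
/
CREATE TYPE num_tab AS TABLE OF num_row;
/
-- Pipelined function: rows become visible to the caller as they are piped,
-- so large result sets never need to be staged in memory or in a table.
CREATE OR REPLACE FUNCTION first_n (p_n IN NUMBER)
  RETURN num_tab PIPELINED IS
BEGIN
  FOR i IN 1 .. p_n LOOP
    PIPE ROW (num_row(i));
  END LOOP;
  RETURN;
END;
/
-- Usable directly in SQL via the TABLE operator:
SELECT n FROM TABLE(first_n(5));
```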
Project 6:
Project: Atlantic Data Mart.
Organization: Yahoo R&D, Bangalore.
Designation: Pl/Sql Developer (Data Mart Engineer).
This is basically a data warehouse project that stores clickstream data and provides reports to business users to support business decisions. This data mart produces data on user behavior and user engagement on the www.yahoo.com front page for the UK and US. It gives statistics about customer behavior in terms of link views and link clicks; this information is used to calculate the ad rates on any page. The star schema of this data mart was designed from scratch, and the fact and dimension tables were then created. An ETL framework in Oracle PL/SQL and Perl is used to load data from flat files into the data warehouse. The code in this ETL framework was created and pushed to production, and is scheduled to run daily, weekly and monthly to load these tables appropriately.
Major Responsibilities:
1.Object-level database design including partitioned tables, nested tables, materialized views and indexes.
2.Packages, procedures, functions, pipelined functions and other database objects.
3.Generating various reports using analytical functions.
4.Creation of fresh fact and dimension tables using the star schema.
5.Creating ETL jobs using PL/SQL and Perl code.
6.Scheduling and monitoring of the pushed production code.
7.Monitoring of the feed-to-tables ETL flow and data.
8.Benchmarking of Oracle PL/SQL code.
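Clickstream reports built on analytical functions, as in responsibility 3, typically follow this shape; the fact table and columns below are hypothetical, for illustration only.

```sql
-- Hypothetical report: daily clicks per page, with a 7-day moving average
-- per page and each page's rank by clicks within its day.
SELECT page_id,
       click_date,
       clicks,
       AVG(clicks) OVER (PARTITION BY page_id
                         ORDER BY click_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS clicks_7d_avg,
       RANK() OVER (PARTITION BY click_date
                    ORDER BY clicks DESC)                          AS day_rank
  FROM page_click_fact;
```

Because the windowing happens in a single pass over the fact table, such queries avoid the self-joins that would otherwise be needed for moving averages and rankings.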
Environment : Toad, SQL*Plus.
Database : Oracle 9i.
Platform : Linux
Programming : Perl, PL/SQL, Shell Script.
Project 7:
Project: Enterprise Technology Management System.
Organization: Lanware Consulting, Mumbai.
Designation: PL/SQL Developer
Enterprise Technology Management System is an employee information management system that also schedules and monitors the day-to-day activities of each employee. It keeps track of all the activities of each employee and helps schedule assignments to employees on the basis of their skill sets, workload on other assignments and leave status. The system automatically grades user performance on the basis of completed assignments. It also provides a facility to raise a trouble ticket or request on any issue; trouble tickets are automatically escalated to a higher level if not resolved within the specified timeframe.
Major Responsibilities:
1.Analysis and Database Design.
2.Creation of PL/SQL Stored Procedures, Functions.
3.Creation of Triggers and Packages.
4.Tuned the SQL Queries and PL/SQL Code.
Environment : Toad, SQL*Plus.
Database : Oracle 9i.
Platform : Windows
Programming : PL/SQL
Project 8:
Project: Product and user management system (SELECTICA).
Organization: Reliance Communications, Mumbai.
Designation: PL/SQL Developer.
Order entry and product and user management system (SELECTICA): an application through which Reliance users provide telecom products and services to customers. Reliance users enter customer and product details so that products and services are provisioned for customers and billing is done accordingly. The application maintains the hierarchy of Reliance users who provide telecom products and services to customers; users are mapped with different rights according to location. The system generates various reports on product services and user performance, and its main feature is that it helps the sales and marketing teams focus on their respective areas for business growth.
Major Responsibilities:
1.Analysis and Database Design.
2.Creation of PL/SQL Stored Procedures, Functions.
3.Creation of Triggers and Packages.
4.Tuned the SQL Queries and PL/SQL Code.
Environment : Toad, SQL*Plus.
Database : Oracle 9i.
Platform : Windows
Programming : PL/SQL
Project 9:
Project: Tariff configuration System.
Organization: Reliance Communications, Mumbai.
Designation: PL/SQL Developer.
This application is used to configure tariffs for phone calls and all other services, such as data download and upload, SMS charges, multimedia service charges and so on.
This is basically a billing application that determines whether a call is local or STD and how it is charged. The application also determines call rates when Reliance launches its telecom services at a new location, how calls are charged at different times of day, and which numbers are toll-free. It helps in creating free units, promos and CUGs (closed user groups). The application handles every call made by the user.
Major Responsibilities:
1.Creation of PL/SQL Stored Procedures, Functions.
2.Creation of Triggers.
3.Creation of Packages.
4.Tuned the SQL Queries and PL/SQL Code.
5.Creation of MIS reports using analytical functions.
Personal Profile:
Date of Birth : 4th Sep 1980
Languages Known : English, Hindi
Passport Number : G5779950
Hobbies : Sudoku, chess, carrom and puzzle solving
Date
(RISHI KUMAR AWASTHI)