Data Manager

Location:
New York, NY
Posted:
January 14, 2015

Nitin Kumar

abhnm8@r.postjobfree.com

408-***-****

Summary:

13+ years of IT experience, starting with Mainframes and progressing through Oracle DBA, ETL, Reporting and Data Warehousing.

Extensive experience architecting and developing data warehousing projects with proof of ROI outcomes. I specialize in Master Data Management with a highly evolved, metadata-driven approach to BI integration.

Achievements in Data Warehousing

• Requirement Gathering in terms of Desired outcomes/performance objectives

• Bridging the gap between business owners/drivers and technical teams

• Identifying bottlenecks in processes, tools and technologies

• Setting up BI Systems [sizing, installation and configuration]

• Defining metadata-driven BI integration:

• Requirements - test cases - sample data/outcomes - source data model

• Staging data model - target data model - ETL dependencies

• Reporting dependencies - ETL optimization - BI optimization

• BI feedback and reduction - ETL feedback and reduction - redefining staging and target data models - redefining requirements and providing proof of business outcomes

• Optimization of resource utilization w.r.t operating system, ETL, databases, reporting & BI, and batch window availability.

• Metadata analysis in addition to Master Data Management.

• Data model optimization

• Integration of multiple ETL tools with a variety of reporting tools.

• Optimizing ER & Kimball Models for specific domain requirements.

• Specialization in open source and enterprise ETL tools w.r.t dynamic processing and scheduling.

• Performance Testing, Grid Optimization and Performance Benchmarking.

• Dynamic SQL scripting and data generation, Data partitioning and archival.

• Limited exposure to Big Data using PIG, Hive & Sqoop

Key Insights:

• Most enterprise data warehouses have grown out of proportion, and snowflaking has plagued BI users. The BI teams supporting them are not agile enough to predict their role and involvement in business outcomes.

• A simple way of dealing with this is to incorporate the requirement-gathering process into Master Data Management and have ETL and reports generated from those models rather than developed by ETL/reporting developers (a simplified sketch of this idea follows below).
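
The second insight above can be illustrated with a small, hypothetical Python sketch. Everything in it (the table names, column expressions and the generate_insert_select helper) is invented for illustration, not taken from any specific project; in the engagements below the same idea appears as dynamic ETL generated from product or CRM metadata rather than raw SQL. The point is that the source-target mapping, kept as master metadata, is what generates the load logic, so a requirement change regenerates the ETL instead of triggering new development.

```python
# Minimal sketch of metadata-driven ETL generation (illustrative names only).
# The mapping dictionary stands in for master metadata; in practice it would
# be read from an MDM repository or a modeling tool's metadata tables.

MAPPING = {
    "target_table": "DW_CUSTOMER_DIM",   # hypothetical target dimension
    "source_table": "STG_CUSTOMER",      # hypothetical staging table
    "columns": {                         # target column -> source expression
        "CUSTOMER_KEY": "CUST_ID",
        "CUSTOMER_NAME": "UPPER(CUST_NAME)",
        "REGION_CODE": "NVL(REGION, 'UNKNOWN')",
    },
}


def generate_insert_select(mapping):
    """Build an INSERT ... SELECT statement from a source-target mapping."""
    targets = ", ".join(mapping["columns"].keys())
    sources = ", ".join(mapping["columns"].values())
    return "INSERT INTO {t} ({cols})\nSELECT {exprs}\nFROM {s}".format(
        t=mapping["target_table"], cols=targets,
        exprs=sources, s=mapping["source_table"],
    )


if __name__ == "__main__":
    # A change to the mapping regenerates the SQL; no hand-written ETL needed.
    print(generate_insert_select(MAPPING))
```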

Technical Skills:

• Oracle Database: I worked as an Oracle DBA during the initial phase of my IT career, which gave me a strong understanding of data, databases and data models.

• Microsoft Excel: Simplicity and power in one go. I have used Excel to solve Sudoku and magic squares and to build mockups for applications. Macros, VB scripts, pivot tables & charts, formulas and references.

• Qlikview: In-memory reporting tool, great for data warehouse mockups since it combines data modeling, ETL, reporting and analysis in the fastest possible way.

• Tableau: In-memory reporting similar to Qlikview but closer to enterprise reporting tools such as Business Objects and Cognos.

• Pentaho BI Suite: Data integration and analysis. Being open source, its metadata is readily available and can be harnessed for dynamic development. Used it extensively to create product data warehouses that respond dynamically to changing business requirements.

• Shell scripting: Basic-level user. Normally used to troubleshoot performance issues, schedule scripts, catch up on aborted jobs, etc.

• PL/SQL: Expert in writing queries and processing data. Most ETL components are extensions of PL/SQL, and reporting tools are extensions of spool output.

• Big Data: Limited exposure to big data in 10-40 node environments. Used Hive and Pig to process data and Sqoop to transfer data between Hadoop and BI platforms (see the sketch after this list). Created POCs using virtual machines for presales presentations.

• Cloud: Amazon Redshift and EMR for batch processing.
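
As a rough illustration of the Hadoop-to-BI transfer step mentioned in the Big Data item above, a Sqoop import can be wrapped in a small script so it is scheduled and logged like any other ETL job. This is a hedged sketch: the JDBC URL, credentials, table name, HDFS path and the wrapper function itself are placeholders, not values from any specific project.

```python
# Hypothetical wrapper around a Sqoop import so the Hadoop transfer step can be
# scheduled and logged like any other ETL job. The JDBC URL, credentials,
# table name and HDFS path are placeholders, not values from a real project.
import subprocess


def sqoop_import(jdbc_url, username, table, target_dir, mappers=4):
    """Pull one relational table into HDFS using Sqoop's command-line client."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,           # e.g. jdbc:oracle:thin:@//dbhost:1521/ORCL
        "--username", username,          # password would normally come via --password-file
        "--table", table,
        "--target-dir", target_dir,      # HDFS directory for the imported data
        "--num-mappers", str(mappers),   # parallelism of the import
    ]
    subprocess.run(cmd, check=True)      # raise if Sqoop exits non-zero


if __name__ == "__main__":
    sqoop_import(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL",  # placeholder JDBC URL
        "etl_user",                              # placeholder user
        "STG_WEB_EVENTS",                        # placeholder source table
        "/data/staging/web_events",              # placeholder HDFS path
    )
```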

ETL Tools: Ab Initio, IBM Datastage, IBM InfoSphere Streams, Informatica, Kettle, Oracle Data Integrator, Talend, Microsoft SSIS

Reporting/BI Tools: Business Objects, Google Analytics, IBM Cognos, Jaspersoft, Pentaho BI Suite, Qlikview, Tableau, Microsoft SSRS, Crystal Reports

Domains: Advertising, Banking, Insurance, Manufacturing, Telecom

Programming: Cobol, JCL, CICS, Shell scripting, PL/SQL, Perl, Python, Focus, VB, Pig, Hive

Tools: Erwin, DB Wrench, Toad, Eclipse, Excel, Oracle Enterprise Manager, Nmon, Oracle AWR, SVN, IBM ClearCase

Databases: Oracle, DB2/IDMS, SQL Server, Redshift

Operating Systems: UNIX, IBM Mainframes, Windows Server, Cloud [Ubuntu], Hadoop

Responsibilities:

• Requirement Gathering: Initial meetings with business stakeholders to define points of contact, outcomes, sample data, templates and sign-off criteria.

• Source System Analysis: The source systems are analyzed in terms of data structure, granularity, cardinality, quality, latency and method of access.

• Target Data Warehouse Design: The data warehouse is designed primarily as a star schema with minor snowflaking and global dimensions. Data partitioning and archival strategies are defined.

• Designing ETL Architecture: Installing the ETL framework, implementing the source-target mapping, implementing dependencies, SCD, partitioning, scheduling optimization, data lineage, etc.

• Reporting Design: Designing reports for business users and evaluating their end usage. Analyzing the database and OS footprint and continuously optimizing.

• Isolating scheduling to control ETL steps and reporting refresh, with monitoring and feedback to support teams.

• Performance benchmarking and forecasting: Analyzing OS logs, database stats and reporting logs to forecast issues before they actually hit the online reporting system (a simplified sketch follows this list).

• Analyzing total investment in BI resources, hardware, software, duration, etc.
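
As a simplified, hypothetical example of the forecasting idea in the benchmarking bullet above, nightly batch durations collected from scheduler logs can be trended with a plain least-squares fit to flag when the load is likely to spill into the online reporting window. The durations, window size and log source below are illustrative assumptions, not figures from any project.

```python
# Simplified sketch of batch-window forecasting: trend nightly load durations
# and report when the trend will exceed the available window. The durations,
# window size and log source are illustrative assumptions, not real values.


def linear_trend(values):
    """Least-squares slope and intercept over equally spaced observations."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def forecast_breach(durations_min, window_min, horizon_days=30):
    """Return the first future day the fitted trend exceeds the batch window."""
    slope, intercept = linear_trend(durations_min)
    for day in range(len(durations_min), len(durations_min) + horizon_days):
        if slope * day + intercept > window_min:
            return day
    return None


if __name__ == "__main__":
    nightly_minutes = [185, 190, 188, 196, 201, 205, 211, 214]  # from scheduler logs
    day = forecast_breach(nightly_minutes, window_min=240)
    print("Projected batch-window breach on day:", day)
```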

PROFESSIONAL EXPERIENCE:

Client: New York Times, New York, NY                    Feb 14 – Current

Data Warehouse Architect/ETL Engineer

New York Times web portal logs are imported into advertising and reporting datamarts, along with data from various in-house and third-party systems. These facilitate reporting and BI for end users to optimize usage of online advertising space. Data is imported from Hadoop, Sugar CRM, Amazon Redshift, vendor files, etc. using Pentaho Data Integrator, and cubes are generated on these datamarts for reporting. Simultaneously, Google Analytics is implemented on a subset of the advertising portals.

Responsibilities:

• Requirement Gathering from the business users in Advertising and Reporting.

• Data warehouse modeling using DB Wrench to create the source data model and the star schema for the data warehouse.

• Defining Source-target mappings for ETL, dependency matrix for ETLs and Reporting.

• Creating partitions/indexes for optimized reporting access. Defining strategies for scheduling Pentaho Mondrian cubes.

• Defining Data lineage and reprocessing logic.

• Creating master metadata for ETL and reporting.

• Creating master ETL scripts and jobs/transformations in Pentaho and Informatica.

• Defining and creating logging and alerts.

• Optimizing scheduling dependencies by identifying the critical path and dynamically triggering parallel threads based on system resource utilization.

• Incorporating reconciliation using Google Analytics and source target data profiling.

• Dynamic ETL using Sugar CRM metadata to keep pace with the numerous changes in CRM Analytics, ensuring that ETL development is no longer a bottleneck for business users.

• Creating dashboards in Pentaho BI for business health check.

• Incorporating data from Redshift and web logs in Hadoop.

• Scripting in Pig to create rollups of weblogs for feeding data into the data warehouse (a simplified sketch follows this list).

• Production support.
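
The Pig rollups referenced above follow the usual group-and-aggregate pattern; the Python sketch below shows equivalent logic on a tiny sample. The tab-separated field layout and names are assumptions made for illustration, and the production rollups ran as Pig jobs over weblogs in HDFS rather than as Python.

```python
# Illustrative Python equivalent of the Pig rollups: group weblog lines by day
# and ad slot and count impressions. The tab-separated field layout is an
# assumption for the sketch; the production rollups ran as Pig jobs over HDFS.
from collections import Counter


def rollup(log_lines):
    """Count impressions per (date, ad_slot) from tab-separated weblog lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        date, ad_slot = fields[0][:10], fields[2]  # assumed layout: timestamp, url, slot
        counts[(date, ad_slot)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2014-03-01T00:01:02\t/section/front\tTOP_BANNER",
        "2014-03-01T00:01:09\t/section/front\tTOP_BANNER",
        "2014-03-01T00:02:11\t/section/sports\tSIDEBAR",
    ]
    for (date, slot), impressions in sorted(rollup(sample).items()):
        print(date, slot, impressions)
```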

Environment: Pentaho BI Suite, Informatica, DB Wrench, UNIX, Oracle, Hadoop, Redshift, EMR, Pig, Hive

Client: Infosys (Finacle), Brussels, Belgium                    Oct 12 – Jan 14

Product Technical Architect

Infosys has a banking application product, FINACLE, which supports various subject areas of large banks across the globe. Finacle DW is a product data warehouse deployed over Finacle, with the capability of incorporating non-Finacle sources. Scale: ~4,000 source tables and 200 files; 200 target dimensions and facts; 2,500+ ETL jobs; 50 reports.

Responsibilities:

• Source System Analysis: The source systems were mainly Oracle databases along with flat-file or XML feeds from non-Finacle systems. The systems were analyzed to bring all data sources into a de-normalized, generic format that could be loaded into the target dimensions and facts via pre-staging and staging databases. Source-target mappings were defined for 200 targets.

• Target Data Warehouse Design: The data warehouse was designed as a Ralph Kimball star schema with minor snowflaking and global dimensions. The data model was created in Erwin, and tables were optimized for downstream reporting.

• Designing ETL Architecture: Implementing the source-target mapping using transformations created in Datastage and Pentaho. Implementing dependencies, SCD and partitioning using ETL and Oracle, scheduling optimization, generating hash keys for data lineage, etc.

• Reporting Design: Designing 50 off-the-shelf reports as part of the product. These reports were created in Qlikview and Tableau formats for easier deployment.

• Shell Scripting to control ETL Steps and Reporting Refresh, monitoring and feedback to support teams.

• Performance benchmarking: Using bulk data generation, all sources were populated and the product was tested for performance in IBM labs, analyzing Oracle AWR and Nmon reports.

• A key feature of this product was dynamic ETL: product metadata was used to create ETL and all objects dynamically through semantic layers.

• Creating POCs for incorporating big data.

• Presenting Solutions to Clients and incorporating new features.

Environment: IBM InfoSphere Information Server, Pentaho BI Suite, Informatica, Erwin, UNIX, Oracle, Qlikview, Jaspersoft, Rational ClearCase

Client: IBM, Kuala Lumpur, Malaysia                    Jan 12 – Oct 12

Data Warehouse Architect

Tivoli Netcool Performance Manager is a telecom-domain network performance monitoring and management product. Scale: ~2,000 source tables and 2,000 files; 500 target dimensions and facts; 20,000+ ETL jobs; 100 reports.

Responsibilities:

• Source System Analysis: The source systems consisted of many relational databases, equipment-generated log files, and XML files. Typically, any equipment connected to the network creates 2-5 types of data with different metadata. The systems were analyzed to bring all data sources into a pre-staging database.

• Target Data Warehouse Design: The data warehouse was designed as a Ralph Kimball star schema with major snowflaking and global dimensions. The tables were optimized for downstream reporting.

• Designing ETL Architecture: Implementing the source-target mapping using transformations created in Datastage, Pentaho, Perl and Python. Implementing dependencies, SCD and partitioning using ETL and Oracle.

• Reporting Design: Designing 100 off-the-shelf reports as a product in Cognos. These reports were later created in Qlikview and Tableau formats for easier deployment.

• Shell Scripting to control ETL Steps and Reporting Refresh, monitoring and feedback to support teams.

• KPI incorporation: The telecom domain has thousands of key performance indicators, and all of them cannot be shipped as part of the product. An application interface was created to provide a drag-and-drop framework that enabled KPIs to be created directly as ETL components. This enabled the in-memory reporting tools Qlikview and Tableau to connect directly to the data warehouse, bypassing the Cognos framework.

• A key feature of this product was that Excel source-target mapping sheets were used directly to create the ETL source-target mappings through semantic layers.

• Big Data processing of Network logs using PIG & HIVE

• Presenting Product Demo to Clients and incorporating new features.

Environment: IBM InfoSphere Information Server, IBM Cognos, Oracle, Erwin, UNIX, Tableau, Rational ClearCase, Cloudera, Pig, Hive, Sqoop

Client: GCI Alaska, Anchorage, Alaska                    Aug 11 – Dec 11

Data Warehouse Architect

Cycle 30 is the technical team for the telecom provider GCI Alaska. The project involved creating ETLs to capture application data and push it into multiple data marts. This was implemented using SSIS on SQL Server. Most ETLs had SQL embedded in DB connectors.

Responsibilities:

• DataMart Design: A couple of datamarts were designed as star schemas and optimized for ETL and reporting.

• Dynamic SQL generation: the generation process used metadata taken directly from the ETL requirements, which considerably reduced development effort.

Environment: Microsoft SSIS/SSRS, UNIX, SQL Server, Erwin

Client: Whaleshark Media, Austin, TX                    Jan 11 – Jul 11

Data Warehouse Architect

Whaleshark Media is an affiliate marketing company specializing in online deals and coupons. It owns sites like Retailmenot.com, Hotels.com, and Deals.com. The project involved analyzing the cloud databases and extracts from Google Analytics and Commission Junction, and creating a data warehouse and reporting layer.

Responsibilities:

• Requirement Analysis: Analyzing site analytics and performance data, defining site performance metrics, and comparing multiple ETL and reporting tools.

• Data Warehouse Design: Creating the data warehouse star schema model and OLAP cubes for reporting.

• ETL Design: Creating POC ETL components in Pentaho [Kettle].

• OLAP Cubes Design: Creating OLAP cubes and a refresh strategy.

• Reports and Dashboard Design: The in-memory reporting tools Qlikview and Tableau were used to design 20+ reports and 5 dashboards, including report bursting and email delivery.

Environment: Pentaho BI Suite, Qlikview, Tableau, Ubuntu, Oracle, Erwin

Client: Reliant Energy, Houston, TX Jun 07 – Dec 10

ETL Architect

Reliant Energy was a leading electricity retailer in Texas, originating from CenterPoint Energy. The main objective of this project was to reduce revenue loss due to reconciliation errors.

Responsibilities:

• Requirement Analysis and Database Design: Reconciliation requirements were translated into database designs using data from flat files and ERCOT Oracle databases. Most of the reporting requirements were based on reconciling retail usage data against ERCOT data (a simplified sketch follows this list).

• Two datamarts were created with 40-50 tables and around 50 PL/SQL packages.

• Packages were re-tuned iteratively as data volumes increased to more than 10 million records/day.

• Datastage ETL was used to connect to flat files and secondary Data-sources
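
The reconciliation logic at the heart of this project can be illustrated with a hypothetical Python sketch: usage reported by the retail systems is compared against ERCOT settlement data per account, and material differences are flagged for follow-up. The account IDs, readings and the 0.5% tolerance below are invented for illustration; the actual work was implemented in PL/SQL packages as described above.

```python
# Hypothetical reconciliation sketch: compare retail usage against ERCOT
# settlement data per account and flag material differences. Account IDs,
# readings and the 0.5% tolerance are invented for illustration.


def reconcile(retail_kwh, ercot_kwh, tolerance=0.005):
    """Yield (account, retail, ercot) for accounts that differ beyond the
    tolerance or that are present on only one side."""
    for account in sorted(set(retail_kwh) | set(ercot_kwh)):
        r, e = retail_kwh.get(account), ercot_kwh.get(account)
        if r is None or e is None:
            yield account, r, e                    # missing on one side
        elif abs(r - e) > tolerance * max(r, e):
            yield account, r, e                    # material difference


if __name__ == "__main__":
    retail = {"ESI001": 1200.0, "ESI002": 980.5, "ESI003": 450.0}
    ercot = {"ESI001": 1200.0, "ESI002": 1010.0, "ESI004": 310.0}
    for account, r, e in reconcile(retail, ercot):
        print(account, "retail:", r, "ercot:", e)
```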

Environment: Erwin, IBM Information Server, Oracle, UNIX

Client: DuPont, Mumbai, India                    Jun 05 – Jun 07

Oracle DBA

DuPont Creative Constructs is the Indian support centre for DuPont, maintaining multiple applications. Manugistics is the logistics application that provides manufacturing logistics. It integrates multiple shop floors across the globe, and its data is stored in multiple Oracle databases.

Responsibilities:

• Database Installation and Configuration for Manugistics

• Database Support: Post-installation support, monitoring and single-point reconciliation of all databases; troubleshooting issues using Enterprise Manager, Statspack and AWR reports.

• Optimizing database processes: indexing, partitioning, defragmentation, SQL tuning and DB tuning.

• Critical Production Support

Environment: Oracle, PL/SQL, Enterprise manager, UNIX

Client: DuPont, Mumbai, India                    Oct 03 – May 05

ETL Developer

AMTrix is a messaging broker with inbuilt ETL capabilities that integrates multiple platforms. DuPont interacts with a vast number of trading partners, and data flows in numerous formats.

Responsibilities:

• Requirement Analysis: Data sources in flat-file, SAP IDoc, EDIFACT, SWIFT, XML, etc. The message broker service included configuring protocols such as FTP, SFTP and MQ Series. This data was loaded into a Sybase DB and finally delivered via other broker services.

• ETL Development: Runtime maps were created using the AMTrix EAI tool.

Environment: EAI tool AMTrix, Oracle, UNIX

Client: General Electric, Mumbai, India                    Jun 01 – Aug 03

Mainframes Developer

GE has multiple businesses with applications using VANs on mainframe technologies like COBOL, JCL, CICS, IDMS, DB2, ADSO, etc.

Responsibilities:

• Programming/Testing: Developing and maintaining COBOL programs and JCLs

• GUI Development: Developing and maintaining IDMS-ADSO and DB2-CICS applications

Environment: IBM Mainframes, CICS, ADSO, JCL, IDMS, DB2


