Nitin Kumar
abhnm8@r.postjobfree.com
Summary:
13+ years of IT experience spanning Mainframes, Oracle database administration, ETL, Reporting and Data
Warehousing. Extensive experience architecting and developing data warehousing projects with
demonstrable ROI. I specialize in Master Data Management with a metadata-driven approach to BI
integration.
Achievements in Data Warehousing
• Requirement Gathering in terms of Desired outcomes/performance objectives
• Bridging gap between Business owners/drivers and technical teams
• Identifying bottlenecks in processes, tools and technologies
• Setting up BI Systems [sizing, installation and configuration]
• Defining metadata-driven BI integration as an end-to-end chain: requirements -> test cases -> sample
data/outcomes -> source data model -> staging data model -> target data model -> ETL dependencies ->
reporting dependencies -> ETL optimization -> BI optimization -> BI/ETL feedback and reduction ->
redefined staging and target data models -> redefined requirements with proof of business outcomes
• Optimizing resource utilization across the operating system, ETL, databases, reporting & BI, and the
available batch window.
• Metadata analysis in addition to Master Data Management.
• Data model optimization
• Integration of multiple ETL tools with a variety of reporting tools.
• Optimizing ER & Kimball Models for specific domain requirements.
• Specialization in open source and enterprise ETL tools w.r.t dynamic processing and scheduling.
• Performance Testing, Grid Optimization and Performance Benchmarking.
• Dynamic SQL scripting and data generation, data partitioning and archival (a SQL sketch follows this list).
• Limited exposure to Big Data using Pig, Hive & Sqoop.
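To illustrate the dynamic SQL scripting and data generation item above, a minimal Oracle SQL sketch of a
CONNECT BY row generator for bulk test data; the table and columns (stg_usage_test, usage_id, read_date,
kwh) are hypothetical:

    -- generate one million synthetic usage rows for benchmarking
    INSERT /*+ APPEND */ INTO stg_usage_test (usage_id, read_date, kwh)
    SELECT level,
           TRUNC(SYSDATE) - MOD(level, 365),    -- spread dates over a year
           ROUND(DBMS_RANDOM.VALUE(0, 500), 2)  -- random usage values
      FROM dual
    CONNECT BY level <= 1000000;
    COMMIT;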
Key Insights:
• Most enterprise data warehouses have grown beyond their original design, and excessive snow-flaking
plagues BI users. The BI teams supporting them are rarely agile enough to predict their role and involvement
in business outcomes.
• A simple way of dealing with this is to fold the requirement-gathering process into Master Data
Management and have ETL jobs and reports generated from those models rather than hand-built by
ETL/reporting developers.
Technical Skills:
• Oracle Database: I worked as an Oracle DBA during the initial phase of my IT career, which gave me a
strong understanding of data, databases and data models.
• Microsoft Excel: Simplicity and power in one package. I have used Excel to solve Sudoku and magic
squares and to build application mockups: macros, VB scripts, pivot tables & charts, formulas and references.
• Qlikview: In-memory reporting tool, great for data warehouse mockups since it combines data modeling,
ETL, reporting and analysis in a single, fast environment.
• Tableau: In-memory reporting similar to Qlikview but closer to enterprise reporting tools like
BO and Cognos.
• Pentaho BI Suite: Data integration and analysis. Being open source, its metadata is readily available
and can be harnessed for dynamic development. I used it extensively to create product data warehouses
that respond dynamically to changing business requirements.
• Shell scripting: Basic-level user. I normally use it to troubleshoot performance issues, schedule
scripts, catch up aborted jobs, etc.
• PL/SQL: Expert in writing queries and processing data. Most ETL components are extensions of
PL/SQL, and most reporting tools are extensions of Spool (a minimal sketch follows this list).
• Big Data: Limited exposure to big data in 10-40 node environments. Used Hive and Pig to process
data and Sqoop to transfer data between Hadoop and BI platforms. Created POCs using virtual machines for
presales presentations.
• Cloud: Amazon Redshift and EMR for batch processing
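As a minimal sketch of the PL/SQL point above, this is the MERGE-style type-1 dimension refresh that many
hand-written ETL components reduce to; all object names (stg_customer, dw_customer_dim) are hypothetical:

    -- type-1 dimension load: update changed attributes, insert new keys
    MERGE INTO dw_customer_dim d
    USING stg_customer s
       ON (d.customer_id = s.customer_id)
     WHEN MATCHED THEN
       UPDATE SET d.customer_name = s.customer_name,
                  d.segment       = s.segment
     WHEN NOT MATCHED THEN
       INSERT (customer_id, customer_name, segment)
       VALUES (s.customer_id, s.customer_name, s.segment);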
ETL Tools:
• Ab-Initio
• IBM Datastage
• IBM InfoSphere Streams
• Informatica
• Kettle
• Oracle Data Integrator
• Talend
• Microsoft SSIS
Reporting/BI Tools:
• Business Objects
• Google Analytics
• IBM Cognos
• Jasper
• Pentaho BI Suite
• Qlikview
• Tableau
• Microsoft SSRS
• Crystal Reports
Domain:
• Advertising
• Banking
• Insurance
• Manufacturing
• Telecom
Programming:
• Cobol
• JCL
• CICS
• Shell scripting
• PL/SQL
• Perl
• Python
• Focus
• VB
• Pig, Hive
Tools:
• Erwin
• DB Wrench
• Toad
• Eclipse
• Excel
• Oracle Enterprise Manager
• Nmon
• Oracle AWR
• SVN, IBM ClearCase
Databases:
• Oracle
• DB2/IDMS
• SQL Server
• Redshift
Operating Systems:
• UNIX
• IBM Mainframes
• Windows Server
• Cloud [Ubuntu]
• Hadoop
Responsibilities:
• Requirement Gathering: Initial meetings with business stakeholders to define points of contact,
outcomes, sample data, templates and sign-off criteria.
• Source System Analysis: Source systems are analyzed in terms of data structure, granularity,
cardinality, quality, latency and method of access.
• Target Data Warehouse Design: The data warehouse is designed primarily as a star schema with minor
snow-flaking and global dimensions; a data partitioning and archival strategy is defined.
• Designing ETL Architecture: Installing the ETL framework and implementing source-target mappings,
dependencies, SCDs, partitioning, scheduling optimization, data lineage, etc.
• Reporting Design: Designing reports for business users and evaluating their end usage; analyzing
the database and OS footprint and optimizing continuously.
• Isolating scheduling to control ETL steps and reporting refresh; monitoring and feedback to
support teams.
• Performance Benchmarking and Forecasting: Analyzing OS logs, database stats and reporting logs to
forecast issues before they hit the online reporting system (see the query sketch after this list).
• Analyzing total investment in BI resources, hardware, software, duration, etc.
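As one hedged example of the benchmarking and forecasting step, a query over Oracle's AWR history views
(Diagnostics Pack required) that trends host CPU utilization across snapshots; this metric is just one of
several that could be tracked:

    SELECT s.snap_id,
           TO_CHAR(s.begin_interval_time, 'DD-MON HH24:MI') AS snap_start,
           ROUND(m.average, 2) AS avg_host_cpu_pct
      FROM dba_hist_snapshot s
      JOIN dba_hist_sysmetric_summary m
        ON m.snap_id = s.snap_id
       AND m.dbid = s.dbid
       AND m.instance_number = s.instance_number
     WHERE m.metric_name = 'Host CPU Utilization (%)'
     ORDER BY s.snap_id;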
PROFESSIONAL EXPERIENCE:
Client: New York Times, New York, NY (Feb 14 – Current)
Data Warehouse Architect/ETL Engineer
New York Times web portal logs are imported into advertising and reporting datamarts along with data from
various in-house and third-party systems. These facilitate reporting and BI that help end users optimize the
usage of online advertising space. Data is imported from Hadoop, Sugar CRM, Amazon Redshift, vendor files,
etc. using Pentaho Data Integrator, and cubes are generated on these datamarts for reporting. Google
Analytics is also implemented on a subset of the advertising portals.
Responsibilities:
• Requirement Gathering from the business users in Advertising and Reporting.
• Data warehouse modeling using DB Wrench to create the source data model and the star schema for the
data warehouse.
• Defining source-target mappings for ETL and the dependency matrix for ETL and reporting.
• Creating partitions/indexes for optimized reporting access. Defining strategies for scheduling Pentaho
Mondrian cubes.
• Defining Data lineage and reprocessing logic.
• Creating master metadata for ETL and reporting.
• Creating master ETL jobs/transformations in Pentaho and Informatica.
• Defining and creating logging and alerts.
• Optimizing scheduling dependencies by identifying the critical path and dynamically triggering parallel
threads based on system resource utilization (see the sketch after this list).
• Incorporating reconciliation using Google Analytics and source-target data profiling.
• Dynamic ETL using Sugar CRM metadata to keep pace with the numerous changes in CRM analytics,
ensuring that ETL development is no longer a bottleneck for business users.
• Creating dashboards in Pentaho BI for business health check.
• Incorporating data from Redshift and web logs in Hadoop.
• Scripting in Pig to create rollups of weblogs that feed the data warehouse.
• Production support.
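A minimal sketch of the scheduling-optimization bullet above, using Oracle's DBMS_SCHEDULER to launch
ready ETL branches in parallel; the dependency table etl_ready_jobs and the concurrency cap of 8 are
hypothetical, not the project's actual design:

    DECLARE
      v_running PLS_INTEGER;
    BEGIN
      -- launch every job whose upstream dependencies have completed
      FOR j IN (SELECT job_name, proc_name
                  FROM etl_ready_jobs          -- hypothetical dependency table
                 WHERE all_parents_done = 'Y') LOOP
        SELECT COUNT(*) INTO v_running FROM user_scheduler_running_jobs;
        EXIT WHEN v_running >= 8;              -- crude resource-based throttle
        DBMS_SCHEDULER.CREATE_JOB(
          job_name   => j.job_name,
          job_type   => 'STORED_PROCEDURE',
          job_action => j.proc_name,
          enabled    => TRUE,                  -- starts immediately, runs once
          auto_drop  => TRUE);
      END LOOP;
    END;
    /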
Environment: Pentaho BI Suite, Informatica, DB Wrench, UNIX, Oracle, Hadoop, Redshift, EMR, Pig,
Hive
Client: Infosys Finacle, Brussels, Belgium (Oct 12 – Jan 14)
Product Technical Architect
Infosys's banking product FINACLE enables various subject areas for large banks across the globe.
Finacle DW is a product data warehouse deployed on top of Finacle, with the capability to incorporate
non-Finacle sources. Scale: ~4,000 source tables and 200 files; 200 target dimensions & facts; 2,500+
ETL jobs; 50 reports.
Responsibilities:
• Source System Analysis: The source systems were mainly Oracle databases along with flat-file or XML
feeds from non-Finacle systems. The systems were analyzed to bring all data sources into a de-normalized,
generic format that could be loaded to the target dimensions and facts via pre-staging and staging databases.
Source-target mappings were produced for 200 targets.
• Target Data Warehouse Design: The data warehouse was designed as a Ralph Kimball star schema
with minor snow-flaking and global dimensions. The data model was created in Erwin and tables were
optimized for downstream reporting.
• Designing ETL Architecture: Implementing the source-target mappings using transformations created in
Datastage and Pentaho; implementing dependencies, SCDs, partitioning using ETL and Oracle, scheduling
optimization, hash-key generation for data lineage, etc.
• Reporting Design: Designing 50 off-the-shelf reports shipped with the product. These reports were also
created in Qlikview and Tableau formats for easier deployment.
• Shell Scripting to control ETL Steps and Reporting Refresh, monitoring and feedback to support teams.
• Performance Benchmarking: All sources were populated via bulk data generation, and the product was
performance-tested in IBM labs by analyzing Oracle AWR and Nmon reports.
• A key feature of this product was dynamic ETL: product metadata was used to create ETL and all related
objects dynamically through semantic layers (sketched below).
• Creating POCs for incorporating big data.
• Presenting Solutions to Clients and incorporating new features.
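A minimal sketch of the dynamic-ETL idea referenced above: read the column list from the Oracle dictionary
and assemble the load statement at runtime instead of hand-coding one mapping per target. The STG_/DW_
object names are hypothetical, not Finacle DW's actual conventions:

    DECLARE
      v_cols VARCHAR2(4000);
      v_sql  VARCHAR2(32767);
    BEGIN
      -- derive the shared column list from dictionary metadata
      SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
        INTO v_cols
        FROM user_tab_columns
       WHERE table_name = 'STG_CUSTOMER';  -- hypothetical staging table

      -- assemble and run the generated load
      v_sql := 'INSERT INTO dw_customer (' || v_cols || ') ' ||
               'SELECT ' || v_cols || ' FROM stg_customer';
      EXECUTE IMMEDIATE v_sql;
    END;
    /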
Environment: IBM InfoSphere Information Server, Pentaho BI Suite, Informatica, Erwin, UNIX, Oracle,
Qlikview, Jaspersoft, Rational ClearCase
Client: IBM, Kuala Lumpur, Malaysia (Jan 12 – Oct 12)
Data Warehouse Architect
Tivoli Netcool Performance Manager is a telecom-domain network performance monitoring and
management product. Scale: ~2,000 source tables and 2,000 files; 500 target dimensions & facts; 20,000+
ETL jobs; 100 reports.
Responsibilities:
• Source System Analysis: The source systems consisted of many relational databases, equipment-generated
log files, and XML files. Typically, any equipment connected to the network generates 2-5 types of data,
each with different metadata. The systems were analyzed to bring all data sources into a pre-staging
database.
• Target Data Warehouse Design: The data warehouse was designed as a Ralph Kimball star schema
with major snow-flaking and global dimensions. Tables were optimized for downstream reporting.
• Designing ETL Architecture: Implementing the source-target mappings using transformations created
in Datastage, Pentaho, Perl and Python; implementing dependencies, SCDs and partitioning using ETL and
Oracle.
• Reporting Design: Designing 100 off-the-shelf reports in Cognos as part of the product. These reports were
later recreated in Qlikview and Tableau formats for easier deployment.
• Shell Scripting to control ETL Steps and Reporting Refresh, monitoring and feedback to support teams.
• KPI Incorporation: The telecom domain has thousands of Key Performance Indicators, and not all of them
can be shipped as part of the product. An application interface provided a drag-and-drop framework that
let KPIs be created directly as ETL components. This enabled the in-memory reporting tools Qlikview and
Tableau to connect directly to the data warehouse, bypassing the Cognos framework.
• A key feature of this product was that Excel source-target mapping sheets were used directly to generate
ETL mappings through semantic layers.
• Big Data processing of network logs using Pig & Hive.
• Presenting Product Demo to Clients and incorporating new features.
Environment: IBM InfoSphere Information Server, IBM Cognos, Oracle, Erwin, UNIX, Tableau, Rational
ClearCase, Cloudera, Pig, Hive, Sqoop
Client: GCI Alaska, Anchorage, Alaska (Aug 11 – Dec 11)
Data Warehouse Architect
Cycle 30 is the technical team for the telecom provider GCI Alaska. The project involved creating ETLs to
capture application data and push it into multiple datamarts, implemented using SSIS on a SQL Server
database. Most ETLs had SQL embedded in DB connectors.
Responsibilities:
• DataMart Design: A couple of datamarts were designed as star schemas and optimized for ETL and
reporting.
• Dynamic SQL Generation: The generation process consumed metadata directly from the ETL requirements,
which considerably reduced development effort.
Environment: Microsoft SSIS/SSRS, UNIX, SQL Server, Erwin
Client: Whaleshark Media, Austin, TX (Jan 11 – Jul 11)
Data Warehouse Architect
Whaleshark Media is an affiliate marketing company specializing in online deals & coupons, owning sites
such as Retailmenot.com, Hotels.com and Deals.com. The project involved analyzing the cloud databases and
extracts from Google Analytics and Commission Junction, and creating a data warehouse and reporting layer.
Responsibilities:
• Requirement Analysis: Analyzing site analytics and performance data, defining site performance metrics,
and comparing multiple ETL and reporting tools.
• Data Warehouse Design: Creating the data warehouse star schema model and OLAP cubes for reporting.
• ETL Design: Creating POC ETL components in Pentaho [Kettle].
• OLAP Cubes Design: Creating OLAP cubes and their refresh strategy.
• Reports and Dashboard Design: The in-memory reporting tools Qlikview and Tableau were used to design
20+ reports and 5 dashboards, including report bursting and email delivery.
Environment: Pentaho BI Suite, Qlikview, Tableau, Ubuntu, Oracle, Erwin
Client: Reliant Energy, Houston, TX (Jun 07 – Dec 10)
ETL Architect
Reliant Energy was a leading electricity retailer in Texas, originating from CenterPoint Energy. The main
objective of this project was to reduce revenue loss due to reconciliation errors.
Responsibilities:
• Requirement Analysis and Database Design: Reconciliation requirements were translated into a database
design using data from flat files and ERCOT Oracle databases. Most of the reporting requirements centered
on reconciling retail usage data against ERCOT.
• Two datamarts were created with 40-50 tables and around 50 PL/SQL packages.
• Packages were tuned iteratively as data volumes grew past 10 million records/day (a partitioning sketch
follows this list).
• Datastage ETL was used to connect to flat files and secondary data sources.
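A hedged sketch of the kind of partitioning applied as volumes grew, assuming Oracle 11g-style interval
partitioning; the table usage_recon and its columns are hypothetical:

    -- daily range partitions are created automatically as data arrives
    CREATE TABLE usage_recon (
      meter_id  NUMBER,
      read_date DATE,
      kwh       NUMBER
    )
    PARTITION BY RANGE (read_date)
    INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
    (PARTITION p0 VALUES LESS THAN (DATE '2008-01-01'));

    -- a local index keeps partition pruning effective for date-bounded scans
    CREATE INDEX ix_usage_recon ON usage_recon (meter_id, read_date) LOCAL;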
Environment: Erwin, IBM Information Server, Oracle, UNIX
Client: DuPont, Mumbai, India (Jun 05 – Jun 07)
Oracle DBA
DuPont Creative Constructs is the Indian support centre for DuPont, maintaining multiple applications.
Manugistics is the application that provides manufacturing logistics; it integrates multiple shop floors
across the globe, and its data is stored in multiple Oracle databases.
Responsibilities:
• Database Installation and Configuration for Manugistics.
• Database Support: Post-installation monitoring and single-point reconciliation of all databases;
troubleshooting issues using Enterprise Manager, Statspack and AWR reports.
• Optimizing Database Processes: indexing, partitioning, defragmentation, SQL tuning and DB tuning.
• Critical production support.
Environment: Oracle, PL/SQL, Enterprise Manager, UNIX
Client: DuPont, Mumbai, India (Oct 03 – May 05)
ETL Developer
AMTrix is a messaging broker with inbuilt ETL capabilities that integrates multiple platforms. DuPont
interacts with a vast number of trading partners, and data flows in numerous formats.
Responsibilities:
• Requirement Analysis: Data sources arrived as flat files, SAP IDocs, EDIFACT, SWIFT, XML, etc. The
message broker service included configuring protocols such as FTP, SFTP and MQ Series. This data was
loaded into a Sybase DB and finally delivered via other broker services.
• ETL Development: Runtime maps were created using the AMTrix EAI tool.
Environment: EAI tool AMTrix, Oracle, UNIX
Client: General Electric, Mumbai, India (Jun 01 – Aug 03)
Mainframes Developer
GE has multiple businesses whose applications use VANs on mainframe technologies such as COBOL, JCL,
CICS, IDMS, DB2 and ADSO.
Responsibilities:
• Programming/Testing: Developing and maintaining COBOL programs and JCLs.
• GUI Development: Developing and maintaining IDMS-ADSO and DB2-CICS applications.
Environment: IBM Mainframes, CICS, ADSO, JCL, IDMS, DB2