Srikanth Guthi
Professional Summary
*+ years of experience in the development, implementation, and testing of Business Intelligence and Data Warehousing solutions.
Around 4 years of experience in Big Data analytics using Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, YARN, Spark, Zookeeper and Flume.
Experience in installing, configuring and administering Hadoop clusters of major Hadoop distributions.
Experience in installing, configuring and using Hadoop, HDFS, Hive, Pig, HBase, Sqoop and Flume.
Experience in developing custom UDFs for Pig and Hive.
Excellent knowledge of Hive and Pig analytical functions.
Experience with Elasticsearch and Kibana.
Experience in development, implementation and testing of Database projects.
Experience in Data Warehousing and ETL using Informatica Power Center.
Strong experience in Architecture, Analysis, Design, Development and Implementation of Business Intelligence solutions using Data Warehouse/Data Mart Design, ETL, OLAP.
Extensive experience in Data Warehousing and Data Modeling using Star Schema and Snowflake Schema, and in Physical and Logical Data Modeling.
Worked extensively with complex mappings using Expressions, Joiners, Routers, Lookups, Update strategy, Source Qualifiers, Aggregators to develop and load data into different target types.
Strong experience in Relational Database concepts and ER diagrams.
Worked with popular Relational Database Management Systems such as IBM DB2, Oracle and MS SQL Server.
Extensive experience in creating the Workflows, Worklets, Mappings, Mapplets, Reusable transformations and scheduling the Workflows and sessions using Informatica PowerCenter
Extensive experience using Microsoft software products including Microsoft Office Suite (Word, Excel, Access, Outlook, PowerPoint and Publisher) for Windows 7/NT/XP and Vista.
Strong conceptual, analytical, and design skills and excellent communication skills with leadership qualities.
Excellent teamwork spirit and capable of learning new technologies and concepts quickly.
Worked under stringent deadlines, both with teams and independently.
Technical Skills
Big Data Technologies
Hadoop (HDFS & MapReduce), HBase, Pig, Hive, Sqoop, Flume, Zookeeper and Oozie
Languages /Scripting
Java, C++, C, Perl, Python, PHP, Shell Scripting, HTML, XML
ETL Tools
Informatica PowerCenter (Source Analyzer, Mapping Designer, Mapplet, Transformations, Workflow Monitor, Workflow Manager)
Databases and Tools
Oracle, Teradata, Aster, SQL, PL/SQL, Toad, SQL Developer, Tableau
Platforms
Windows, Unix (Solaris), Linux (Ubuntu), VMware
IDEs
Eclipse, NetBeans
Concepts
Data Structures
Education
Bachelor of Technology degree from JNTU, Hyderabad, India
Experience
Macy’s Systems and Technology, Johns Creek, GA Nov 2014 - Present
Hadoop / Information Architect
Project: PDW, MAINCat
Portfolio Data Warehouse is an existing Macy’s application that has been migrated from Mainframes to Big Data. It involves importing data from Mainframes into Hadoop using Syncsort, creating tables in BigSQL and Hive, and developing dashboards using Business Objects.
MAINCat is an application that provides the complete catalog of tables in the MAIN system. It involves collecting the catalog information of all MAIN sources in an Oracle database and building Elasticsearch and Kibana dashboards.
Involved in installing and configuring BigInsights Hadoop platform including ecosystem environment on the server.
Involved in installing and configuring Syncsort.
Involved in importing data from Mainframes to Hadoop using Syncsort.
Involved in creating tables, configuring permissions in BigSQL and Hive.
Involved in configuring connections between Mainframes, Hadoop and Visualization tools like Tableau and Business Objects
Created POC using BigSQL, BigR, BigSheets, Text analytics, Apache Storm and Kafka.
Created a POC server using YARN.
Created SSIS jobs to import MAIN metadata from different sources like Oracle, DB2, SQL server into Oracle MAINODS database for MAINCat
Installed and configured Elasticsearch and Kibana
Created jobs to import data from Oracle MAINODS to Elasticsearch.
Created Kibana dashboards to search and visualize the MAINCat data.
Involved in POCs for CREDIT using Hortonworks and Cloudera.
Installed and configured Hortonworks and Cloudera distributions on single node clusters for POCs
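As an illustration, a minimal Python sketch of the kind of job that pushes Oracle MAINODS catalog rows into Elasticsearch through the `_bulk` API; the index name and document fields here are hypothetical stand-ins, not the actual MAINCat schema:

```python
import json

def build_bulk_payload(rows, index="maincat-tables"):
    """Build an Elasticsearch _bulk NDJSON payload from catalog rows.

    Each row becomes an index action line followed by its document line,
    so the payload can be POSTed to the /_bulk endpoint in one request.
    """
    lines = []
    for row in rows:
        # Action line: tells Elasticsearch which index/id the document goes to.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["table_name"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The _bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Hypothetical catalog rows standing in for the Oracle MAINODS extract.
rows = [
    {"table_name": "ORDERS", "source": "DB2", "column_count": 42},
    {"table_name": "ITEMS", "source": "Oracle", "column_count": 17},
]
payload = build_bulk_payload(rows)
```

In a real import job the payload would be sent to the cluster's `/_bulk` endpoint in batches; building the NDJSON separately keeps the job testable without a running cluster.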
Environment: Hadoop, BigInsights, Hive, BigSQL, Sqoop, Syncsort, Oracle 11g, DB2, SSIS, Elasticsearch, Kibana 4, Tableau, PL/SQL, SQL Server, SQL Developer
NCR Corporation, Duluth, GA Oct 2013 – Oct 2014
Big Data Architect/Hadoop Consultant
Projects: Remote Service Management (RSM)
The RSM project involves importing terabytes of raw data related to ATMs and point-of-sale systems of NCR customers from different relational databases into Hadoop. The raw data is processed using the Hadoop ecosystem, and the transformed data is used to determine the number of hours saved by the NCR customer service team by fixing work order incidents remotely.
Responsibilities:
Involved in installing and configuring Hadoop, Hive and Pig environment on the server
Involved in importing data from relational databases like Teradata, Oracle, MySQL using Sqoop
Created MapReduce jobs to process the raw data
Created Hive and Pig scripts to process the raw data
Created jobs to load data from relational databases into Hadoop using Oozie scheduler.
Involved in importing the raw log files from different servers into HDFS using Flume
Involved in creating MapReduce jobs on log files and creating tables in HBase
Involved in creating HiveQL on HBase tables and importing efficient work order data into Hive tables
Involved in configuring the connection between Hive tables and reporting tools like Tableau, Excel and Business Objects
Involved in exporting the processed data into Aster database for analysis
Involved in creating SQL-MR analytic functions using Aster Analytics
Analyzed the data using Hive and Pig
Involved in reviewing and managing the Hadoop log files
Involved in POC in processing of data using Spark and Storm.
Monitored the Hive, Pig and Sqoop jobs on the Oozie scheduler
Involved in designing reports and dashboards using Tableau.
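The MapReduce-style processing of raw log records described above can be sketched with a streaming-style mapper and reducer; the pipe-delimited record layout and field names are assumptions for illustration, not the actual RSM schema:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (resolution_type, 1) for each raw work-order record.

    Assumes a hypothetical pipe-delimited layout: incident_id|device|resolution.
    """
    fields = line.strip().split("|")
    if len(fields) == 3:
        yield fields[2], 1

def reducer(pairs):
    """Sum counts per key, as a streaming reducer would after the shuffle."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)

# Sample raw lines standing in for the log files landed in HDFS.
raw = [
    "1001|ATM|remote",
    "1002|POS|onsite",
    "1003|ATM|remote",
]
pairs = [kv for line in raw for kv in mapper(line)]
counts = dict(reducer(pairs))
# counts == {"onsite": 1, "remote": 2}
```

With Hadoop Streaming, the same mapper and reducer logic would read from stdin and write tab-separated key/value pairs to stdout; the in-process version above keeps the shuffle-and-reduce idea visible without a cluster.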
Environment: Hadoop, Pig, Hive, Java, Sqoop, HBase, Teradata, Aster, Tableau, NoSQL, Oracle 10g, PL/SQL, SQL Server, SQL Developer, Toad, SuSE Linux
Kaiser Permanente, Pleasanton, CA Oct 2011 – Oct 2013
Big Data/Hadoop Consultant
Projects: Kaiser OPPR POS Analysis & Production Support
The Kaiser OPPR POS Analysis project has been implemented using Hadoop and Big Data technologies. The OPPR POS project deals with data movement from legacy databases and distributed file systems.
Responsibilities:
Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server
Configured a MySQL database to store Hive metadata
Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS)
Created POC to store Server Log data in MongoDB to identify System Alert Metrics
Created Reports and Dashboards of Server Alert Data
Created MapReduce jobs using Pig Latin and Hive queries
Built Big Data Edition and Hadoop based architecture remodeling for one reporting stream.
Collected data from Teradata and pushed it into Hadoop using Sqoop
Used Sqoop tool to load data from RDBMS into HDFS
Provided cluster coordination services through ZooKeeper
Automated all the jobs for pulling data from FTP server to load data into Hive tables, using Oozie workflows
Created Reports and Dashboards using structured and unstructured data
Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
Extensively worked with SQL scripts to validate the pre and post data load.
Created unit test plans, test cases and reports on various test cases for testing the data loads
Worked on integration testing to verify load order and time windows.
Performed unit testing to validate that the data was processed correctly, providing a qualitative check that the overall data flow deposited records correctly into the targets.
Responsible for post-production support and served as SME for the project.
Involved in the System and User Acceptance Testing.
Involved in POC working with R for data analysis.
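The pre/post data-load validation scripts mentioned above can be sketched as a simple row-count reconciliation; SQLite and the table names here are stand-ins for the real staging and warehouse databases:

```python
import sqlite3

def validate_load(conn, staging_table, target_table):
    """Compare row counts between staging and target after a load.

    A minimal pre/post-load check: the target should contain exactly the
    rows staged for this batch. Table names are hypothetical examples.
    """
    cur = conn.cursor()
    staged = cur.execute(f"SELECT COUNT(*) FROM {staging_table}").fetchone()[0]
    loaded = cur.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return {"staged": staged, "loaded": loaded, "match": staged == loaded}

# In-memory demo standing in for the real staging and warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_claims (id INTEGER);
    CREATE TABLE dw_claims  (id INTEGER);
    INSERT INTO stg_claims VALUES (1), (2), (3);
    INSERT INTO dw_claims  VALUES (1), (2), (3);
""")
result = validate_load(conn, "stg_claims", "dw_claims")
# result["match"] is True
```

Real validation scripts typically also reconcile key sums or checksums, but a count mismatch is the first and cheapest signal that a load dropped or duplicated rows.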
Environment: Hadoop, Cloudera, Pig, Hive, Java, Sqoop, HBase, NoSQL, Informatica Power Center 8.6, Oracle 10g, PL/SQL, SQL Server, SQL Developer, Toad, Windows NT, Stored Procedures.
Lincoln Financial Group, IL May 2010 – Sep 2011
ETL Consultant
Project: Unifier
Lincoln Financial Group offers annuities, life, group life and disability insurance, 401(k) and 403(b) plans, savings plans, mutual funds, managed accounts, institutional investments, and comprehensive financial planning and advisory services. Vendor data and the distribution data will be supplied to Lincoln in SPARK data format. Lincoln’s internal systems will also supply data for unification. The distribution data will also be extracted by PowerCenter and loaded into the Unifier database.
Responsibilities
Extensively involved in business and functionality requirement analysis, understanding source systems thoroughly by creating the design process flow used for standard ETL implementations.
Worked as an Informatica developer and involved in creation of initial documentation for the project and setting the goals for Data Integration team from ETL perspective.
Played a key role in designing the application that migrates the existing data into the Annuity warehouse effectively using Informatica Power Center.
Parsed high-level design spec to simple ETL coding and mapping standards.
Created FTP connections and database connections for the sources and targets.
Involved in creating test files and performed testing to check the errors.
Loaded Data to the Interface tables from multiple data sources such as MS Access, SQL Server, Flat files and Excel Spreadsheets using SQL Loader, Informatica and ODBC connection.
Created different transformations for loading the data into targets, such as Source Qualifier, Joiner, Update Strategy, Lookup (connected and unconnected), Rank, Expression, Aggregator, and Sequence Generator transformations.
Simplified the data flow by using a Router transformation to check multiple conditions at the same time.
Created reusable transformations and mapplets to encapsulate common aspects of the data flow and avoid complexity in the mappings.
Created sessions, sequential and concurrent batches for proper execution of mappings using workflow manager.
Used shortcuts to reuse objects without creating multiple objects in the repository and inherit changes made to the source automatically.
Extensively worked with SQL scripts to validate the pre and post data load.
Used Session parameters, Mapping variable/parameters and created Parameter files for imparting flexible runs of workflows based on changing variable values.
Responsible for monitoring scheduled, running, completed and failed sessions. Involved in debugging the failed mappings and developing error handling methods.
Generated weekly and monthly report Status for the number of incidents handled by the support team.
Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
Designed a mapplet to update a slowly changing dimension table to keep full history which was used across the board.
Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
Responsible for post-production support and served as SME for the project.
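The full-history slowly changing dimension logic behind the mapplet above can be sketched in Python (Type 2 style); the row structure and column names are illustrative assumptions, not the actual warehouse schema:

```python
from datetime import date

def apply_scd2(dimension, incoming, key, today):
    """Apply a Type 2 slowly-changing-dimension update, keeping full history.

    Rows are dicts; the current row for a key has end_date=None. When an
    attribute changes, the old row is closed out and a new row is opened,
    mirroring update-strategy logic in an SCD mapplet.
    """
    current = next((r for r in dimension
                    if r[key] == incoming[key] and r["end_date"] is None), None)
    if current is None:
        # Brand-new key: insert as the current version.
        dimension.append({**incoming, "start_date": today, "end_date": None})
    elif any(current[k] != v for k, v in incoming.items()):
        # Changed attributes: expire the old version, open a new one.
        current["end_date"] = today
        dimension.append({**incoming, "start_date": today, "end_date": None})
    return dimension

# Hypothetical customer dimension: a state change creates a second version.
dim = []
apply_scd2(dim, {"cust_id": 7, "state": "IL"}, "cust_id", date(2010, 6, 1))
apply_scd2(dim, {"cust_id": 7, "state": "NC"}, "cust_id", date(2011, 1, 15))
# dim now holds the expired IL version and the current NC version.
```

Keeping the expired row with its end date is what lets reports reconstruct the dimension as of any past load date.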
Environment: Informatica Power Center, Oracle 10g, PL/SQL, SQL Server, SQL Developer, Toad, Windows NT, Stored Procedures, Business Intelligence Development Studio, Microsoft Visio 2003, Business Objects.
Bank of America, Charlotte, NC Aug 2009 – Apr 2010
ETL Developer
Project: Brokerage and Clearing (B&C)
The project involves analysis, design, and development for the BOW Enterprise Solutions Group (ESG). The Brokerage and Clearing (B&C) system is used for verification and reconciliation of equity and fixed income brokerage and clearing fees and commissions. The system collects trading data from exchanges, brokerages, clearing houses and agent banks. This data is extracted and, after calculations, loaded into the database using batch process jobs.
Responsibilities:
Developed logical data models, reverse engineering, and physical data models of the CRM system using ERwin and InfoSphere.
Involved in the design and development of data migration from the legacy system using Oracle Loader and import/export tools for the OLTP system.
Worked closely with the Data Business Analyst to ensure the process stays on track, develop consensus on data requirements, and document data element/data model requirements via the approved process and templates.
Involved in writing batch programs to run validation packages.
Extensively worked on Informatica Power Center-Source analyzer, Data warehousing designer, Mapping Designer, Mapplet and Transformations to import source and target definitions into the repository and to build mappings.
Extensive use of stored procedures, functions, packages, and user-defined functions
Proper use of indexes to enhance the performance of individual queries and of the stored procedures for the OLTP system
Dropped and recreated indexes on tables for performance improvements in the OLTP application
Tuned SQL queries using Show Plans and Execution Plans for better performance
Followed full life-cycle software development processes, especially as they pertain to data movement and data integration
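The effect of the index tuning described above can be demonstrated with query plans; SQLite stands in for the production database here, and the table and index names are hypothetical:

```python
import sqlite3

# Small demonstration of how adding an index changes the query plan,
# in the spirit of execution-plan tuning for an OLTP workload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, broker TEXT, fee REAL)")

def plan(sql):
    """Return the textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r) for r in rows)

query = "SELECT fee FROM trades WHERE broker = 'ACME'"
before = plan(query)   # full table scan before the index exists
conn.execute("CREATE INDEX idx_trades_broker ON trades (broker)")
after = plan(query)    # index lookup once the index exists
# "SCAN" appears in the plan before indexing; "USING INDEX" after.
```

The same drop-and-recreate cycle is useful around bulk loads: dropping indexes speeds the load, and recreating them afterwards restores query performance.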
Environment: Informatica PowerCenter, Oracle 10g, SQL, PL/SQL, DB2, Stored Procedures, UNIX.
Aspen Technologies, Hyderabad, India Jan 2008 – July 2009
ETL/BI Developer
Responsibilities:
Participated in user meetings, gathered business requirements and specifications for the data warehouse design, and translated the user inputs into ETL design documents.
Responsible for building dimension tables and fact tables for the target eSupplies Data-Warehouse.
Involved in the creation of Informatica mappings to extract data from Oracle and flat files and load it into the Data Warehouse.
Extensively used the Source Qualifier, Expression, Filter, Aggregator, Lookup, Update Strategy, Joiner, Sorter and Router transformations to create mappings.
Resolved design issues in transformations and mappings.
Developed PL/SQL procedures for processing business logic in the database.
Developed and deployed ETL job workflows with reliable error/exception handling and a rollback framework.
Scheduled Load Scripts that fetch Data from Source Database to the eSupplies Warehouse.
Improved data load speeds by identifying and eliminating bottlenecks at various phases.
Documented the existing mappings as per the design standards followed in the project.
Provided input to the technical team in the development of the technical design and reviewed the technical design.
Environment: Informatica Power Center, Oracle 9i, SQL Server, DB2, Windows 2003 Server, MS Office 2003, Flat files, Stored Procedures