Data Project

Location:

Posted:

October 18, 2019

Resume:

Professional Summary

** ***** ** ***** ********** including 4.5 years of experience on major components of the Hadoop ecosystem such as HDFS, Hive, Sqoop, Pig, Oozie, Flume, Hue, and Spark.

Excellent understanding of Hadoop architecture and its components, such as HDFS, YARN, High Availability, and the MapReduce programming paradigm.

Working knowledge of Apache Spark, Spark SQL, and PySpark.

Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop 2.x, HDFS, Hive, Oozie, ZooKeeper, Sqoop, Flume, and NiFi.

Experience in writing workflows and scheduling jobs using Oozie.

Wrote Hive queries for data analysis and to prepare data for visualization.

Experienced in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing strategies, and writing and optimizing HiveQL queries.

Extended Pig and Hive core functionality by writing custom UDFs.

Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive and RDBMS.

Capable of processing large sets of structured, semi-structured, and unstructured data and of supporting systems application architecture.

Good experience with Apache Zeppelin, a multi-purpose web-based notebook that enables interactive data analytics.

Strong knowledge of version control tools: Git, SVN.

Experience with the FileNet core products; involved in developing and deploying FileNet applications based on process and content management solutions.

Working knowledge of Apache Solr.

Knowledge of BI tools such as Tableau Public.

Good coding experience using Java, Servlets, JSP, Struts.

Extensive experience in deploying and configuring application servers such as BEA WebLogic, IBM WebSphere, and the open-source Apache Tomcat server.

Experience with IDEs such as IBM RAD 7.0, Eclipse 3.1, and NetBeans.

Experience working within an Agile development process.

Domain expertise in Banking, Insurance, Telecom and Health Care.

Experience in deploying applications to PROD and UAT in Linux environments.

Good interpersonal and communication skills. Team player with strong problem-solving skills.

Excellent technical skills; consistently delivered ahead of schedule.

Work successfully in fast-paced environments, both independently and in collaborative teams.

Key Domain and Technical Knowledge

Domain: Banking, Auto Insurance, Telecom, and Health Insurance.

Technical: Java, Hadoop, HDFS, MapReduce, Pig, Sqoop, Flume, Hive, PySpark, Apache NiFi, HBase, Oozie, XML, Struts, Liferay, Apache Solr, IBM FileNet, PostgreSQL, Oracle, Windows, Linux, Zeke, Machine Learning using H2O, Git, JIRA.

Academic Qualification

• Bachelor of Technology in Electronics and Communications Engineering from J.N.T University, Hyderabad.

Certifications / Professional Awards:

Infosys Certified Big Data - Hadoop Developer.

Infosys Certified Big Data - Hive Developer.

Total Work Experience: 12 Years

Company: Infosys Limited; Period: July 2012 to date

Company: Datum CyberTech India Limited; Period: Oct 2011 to July 2012

Company: Navayuga InfoTech Pvt Limited; Period: Apr 2011 to Oct 2011

Company: Monarch InfoTech Services (MITS) Pvt Ltd; Period: Sep 2009 to Mar 2011

Company: Satyam Computer Services Ltd; Period: June 2007 to Sep 2009

Company: I source Technologies Pvt Ltd; Period: April 2006 to April 2007

Projects:

Project Title: Payment Integrity

Client: Aetna Inc., New York

Duration: From Sep 2015 to date

Description: Aetna is a US-based health care company that sells traditional and consumer-directed health care insurance plans and related services, such as medical, pharmaceutical, dental, behavioral health, long-term care, and disability plans. On average, Aetna receives 1 million claims each day. The sheer number of providers, members, and plan types makes the pricing of these claims incredibly complex. Through misinterpretation of provider contracts and human error, a small number of claims are paid improperly. As part of the Data Science team's work, all the data from critical domains such as Aetna Medicare, Traditional Group Membership, Member, Plan, Claim, and Provider is migrated to the Hadoop environment. All the demographic information is moved from MySQL to Hadoop, where the analysis is done. Claims data is also moved from MQ to Hadoop; after the claims are processed, the response is sent back to MQ and a history is built to track all changes to the processed claims.

Role and Responsibilities

Design and implement data pipelines that consume data from heterogeneous sources and build an integrated view of health insurance claims data. Use the Hortonworks Data Platform, which consists of Red Hat Linux edge nodes with access to the Hadoop Distributed File System (HDFS). Data processing and storage are done across a 1000-node cluster.

Involved in writing scripts to load and process data in Hive and Pig.

Regularly tune the performance of Hive and Pig queries to improve data processing and retrieval.

Working with data scientists to provide the required data for building the model features.

Created multiple Hive tables, implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access.

Involved in scheduling the jobs using Zeke Framework.

Involved in providing the support for the jobs that are already running in Production.

Customized Oozie scheduling to run the complete end-to-end flow of a self-contained model that integrates Hive and Pig jobs.

Extensively used Git as a code repository for managing the day-to-day Agile project development process and to keep track of issues and blockers.

Sharing the Production Outcome Analysis report from different jobs to the end users on a weekly basis.

Migrated Pig scripts to PySpark to enable faster, in-memory computing. Performed ad hoc analytics on large, diverse data sets using PySpark.

Created UDFs in PySpark for custom functionality.

Environment: Hadoop 2.x7, HDFS, Pig 0.14.0, Hive 0.13.0, Sqoop, Flume, Apache NiFi, Oozie, Git, JIRA, PySpark, H2O, Netezza, MySQL, Agility.

Project Title: iTunes OPS Reporting

Client: Apple

Duration: From Feb 2015 to Sep 2015

Description: “iTunes OPS Reporting” is a near-real-time data warehouse and reporting solution for the iTunes Online Store that serves external and operational reporting needs. It also publishes data to downstream systems such as Piano (for Label Reporting) and ICA (for Business Objects Reporting, Campaign List Pull, and Analytics). De-normalized data is used to publish various reports to users. In addition, this project caters to the needs of the ITS (iTunes Store) business user groups. The work requires complex analytical expertise, including extensive domain knowledge, a detailed understanding of iTunes features and data flow, and the ability to measure the accuracy of the system in place.

In-depth knowledge and complex analysis of the iTunes data enable business users to make timely and informed decisions. The analysis is based on a huge volume of data and thus requires profound knowledge of various trends since the inception of the iTunes Store.

Sound knowledge of Big Data (Hadoop), Oracle, and Unix is required to build the various reports that help business users better understand the raw data.

Role and Responsibilities

Involved in analyzing the Teradata procedures.

Involved in developing Graffle design documents for implementing the Teradata procedures in Hive.

Involved in developing the design document for implementing HQL.

Participated in design review.

Environment: Hadoop 2.x7, HDFS, Hive 0.13.0, Oozie, Git, JIRA, Java, Agility, Teradata

Project Title: GBI Fraud Analytics.

Client: Apple

Duration: From Nov 2014 to Jan 2015.

Description: GBI Fraud Analytics is an existing semantic application in EDW and consists of two main parts, which relate to account creation information and iTunes order summarization. Order Summary is a flattened view of iTunes transactions such as purchase, account creation, billing information change, gift redemption, and rejections, along with key elements such as device information, IP address, and most commonly used features. iTunes is a transactional database for content purchases such as music, movies, TV, apps, and books. Athena is a centralized platform for analytic decisioning and case management for the detection and mitigation of fraud, waste, and abuse.

This project migrates the semantic application from the Teradata platform to the Hadoop platform and provides a de-normalized view of all the iTunes purchase and account creation data required for further fraud analysis. The POC for this project involved performance tests of map-side joins and SMB joins in Hive to determine the feasibility of applying them to the big data joins required for the project. Along with performance tuning, the POC also included performing the joins with Apple custom-format Hive tables and validating the results.

A very sound knowledge of Big Data (Hadoop), Java, HQL and Unix is required to develop this POC.

Role and Responsibilities

Created HQL scripts to create tables and populate data.

Developed Hive UDFs to process semi-structured data.

Involved in performance testing of map-side joins.

Created Oozie scripts (job.properties, workflow.xml).

Tested the HQL scripts.

Environment: Hadoop 2.x7, HDFS, Hive 0.13.0, Oozie, Git, JIRA, Java, Agility, Teradata

Project Title: Procure Edge 3.0

Client: P&G

Duration: From July 2012 to May 2014

Description: ProcureEdge3.0 is a cloud-based Source-to-Pay (S2P) platform that will revolutionize the way business-to-business procurement happens. This platform is being co-created with P&G. ProcureEdge3.0 will provide an intuitive, user-friendly UI.

The integrated One Search lets end users (buyers) search across goods and services, returns relevant results, and allows them to create a shopping cart and submit it to various clients' supplier relationship management systems, i.e. SAP SRM and ECC modules. The platform will also enable third-party integration with payment systems such as a virtual procurement card solution, and with vendor management systems used to manage procurement categories such as contingent labor, SOW, and project services.

This is an end-to-end platform development project involving requirements elaboration, design, build, testing, implementation and support.

Role and Responsibilities

Involved in Detailed Design and Technical Discussions.

Involved in Requirements Phase.

Involved in developing a Search UI using Apache SOLR.

Involved in deploying applications in different environments such as UAT and Production Testing.

Involved in support tasks and resolving tickets.

Involved in loading catalog data into the database using ETL jobs.

Environment: Java 1.5, Solr, Liferay, PostgreSQL

Project Title: eFile Reporting

Client: Zurich Financial Services

Duration: From July 2011 to June 2012

Description: The reporting utility is a standalone function separate from the eFile and Task Manager applications. The reports are generated by querying data from the databases that store the content (documents) maintained by eFile and Task Manager. The reporting tool contains a designer client for building the queries to extract the required data, as well as a report designer tool. To meet the requirements of this change request, both the data extraction queries and the report designs have to be modified. Document counts are for the entire object store. This report is generated weekly at end of day on Friday so that it provides the full previous week's data. The report includes customer counts, document counts, and document size, inclusive of MIME types, indexing, and ToDo cases/tasks.

Role and Responsibilities

Involved in report design.

Involved in generating the reports in PDF.

Project Title: RAPID ECM Services Framework

Client: Zurich Financial Services

Duration: From April 2011 to June 2011

Description: RAPID ECM Services Framework is built on products from various ECM vendors and provides enhanced, sophisticated document management and process management features that serve business domains such as Insurance, Finance, and Banking. The framework offers a cost-effective, high-performance solution with a shorter ROI period.

This framework can be plugged into various ECM products such as IBM FileNet, EMC Documentum, and IBM OnDemand.

The main features of this framework include:

1. Document management services (archival, versioning, and security)

2. Support for various document types such as TIFF, PDF, and MS Office documents

3. Centralized storage with a cost-effective storage solution

4. Document capture from various sources such as shared folders, email, and fax

5. Business process management services (process delegation, work assignment, email notification, milestones, and deadlines)

6. Sophisticated user interface with all information presented on a single screen

7. Robust search features

Role and Responsibilities

Involved in developing UI for the search Framework.

Involved in Requirements Phase.

Involved in O-R mapping using Hibernate.

Environment: FileNet, Content Engine, Process Engine, Business Process Framework, JavaScript.

Project Title: Correspondence Management System

Client: Global Business Machines, Dubai

Duration: From Jan 2011 to Feb 2011

Description: The proposed system provides the user interface for users such as administrators and application users. It is mainly used to upload and search for documents. It is also used to grant particular users permissions on menus and menu items.

• The system is also used to generate reports using Jasper.

Role and Responsibilities

Involved in Design of the UI for Incoming, Outgoing, Memo and Search Panels.

Involved in Customization of the UI using GXT API.

Developed admin screens for modules such as Geographic Locations, Departments, Users, and Workflow Responses.

Environment: FileNet, Content Engine, Process Engine, GWT, Java 1.5.

Project Title: CDMS (Client Document Management System)

Client: Standard Chartered Bank, Chennai

Duration: From June 2010 to Dec 2010

Description: The proposed system generates scheduled monthly reports without any user interaction, and also generates reports on an ad hoc basis. This helps the business move away from the current Lotus Notes application for report generation and avoids dependencies on certain database views for a few regular and ad hoc reports.

Reports enable the business to track the ISDA and non-ISDA negotiations, including statistical reports.

• Users can export results to an Excel sheet only.

The newly generated reports benefit users with operational efficiency improvements such as:

• On-line access to information

• Access to the Report at any time

• Turn Around Time (TAT) improvement

• Easy retrieval

• Easy to follow up and keep track of records.

Role and Responsibilities

Involved in generating reports using Apache POI.

Involved in customization of reports.

Created queries using iBATIS to retrieve data from the database.

Provided SIT and UAT support at the client location.

Environment: FileNet, Content Engine, Process Engine, Business Process Framework, Excel API, iBATIS.

Project Title: eBCP (Electronic Business Continuity Planning)

Client: Standard Chartered Bank, Chennai

Duration: From Dec 2009 to May 2010

Description: The proposed system processes, stores, and retrieves BCP-related documents governed by a workflow process. This will help the business move away from the current process of relying on emails, which carries inherent disadvantages such as difficulty tracking work and the inability to implement operational controls. The eBCP portal will encompass capture of document transactional details, status tracking, a document-centric workflow solution, alerts and notifications to users, and reports, including statistical reports, that enable the business to track users' work.

Operational efficiency improvements:

• On-line access to information

• Turn Around Time (TAT) improvement

• All approvals will be done electronically using Workflow management

• No manual intervention to re-send the document (via email) if there is a need from BCM / Regional BCM

Role and Responsibilities

Involved in Requirements gathering and design.

User Interface development for Custom Launch processor for BCM Documents Approval Process.

Involved in customization of Launch Processor.

Environment: FileNet, Content Engine, Process Engine, Business Process Framework, Excel API, iBATIS.

Project Title: Incentive Tracking Tool.

Client: State Farm Insurance, USA

Duration: From Apr 2008 to June 2009

Description: Incentive Tracking Tool is designed to manage employee incentives for a large insurance company. It increases the ability to align the efforts of all employees towards the achievement of company goals. Using this tool, leaders recognize employees' performance in achieving goals and expectations. This "Pay for Performance" philosophy is reflected in the annual performance review, which serves as the basis for determining merit increases. Employees can view their incentives and awards in this tool. Incentives and awards are based on the employee scorecard results each year.

Role and Responsibilities

Involved in Design of Class, Sequence, and Data Flow Diagrams for the specific modules.

Involved in bug fixing.

Involved in unit testing.

As a developer, ensured that the code met client-specific standards.

Environment: Java, Servlets, JSP, Struts, Hibernate, Castor Framework.

Project Title: Document Packaging System (DPS) & Composite Returns and Funds Tracking (CRAFT).

Client: Goldman Sachs

Duration: From Sep 2007 to Mar 2008

Description: DPS: Document Packaging System (DPS) is an automated process that gathers transmitted documents, sorts them into the appropriate packages, generates required documents (for example, cover letter and state summary), and submits them for printing and distribution. A System Run is created (for example, Stone and Bridge, White Hall, Whitehall state, PIA), which then goes through all the required stages: configuring the package, uploading the investor information, check-in process, packaging, and distribution.

CRAFT: This project deals with filing returns at the partnership or fund level on behalf of all eligible electing partners. It alleviates the partners' burden of having to file a separate state tax return as a result of fund activity in a particular state. The system enables investors to declare their resident states and other states where they have investments via the web. This application is used for uploading investor information, creating investor packages, printing the packages, submitting the elections, and processing the forms.

Role and Responsibilities

Implemented Action classes and Action forms with the Struts framework.

Verifying the Scanned Documents.

Generating the Yearly Gross proceeds and Commission Reports and Verifying the Month End reports and weekly Reports.

Database Connectivity using Stored Procedures and Tables.

Environment: Java, Servlets, JSP, Struts, Hibernate, Glui, Mithra.

Project Title: Retail Management System (RMS).

Client: Tata Teleservices Limited

Duration: From Dec 2006 to Apr 2007

Description: This project deals with the creation and maintenance of stores, as well as the creation of employees belonging to different stores. It uses circles to maintain material goods, and these stores are referred to as warehouses. Goods can be ordered from these warehouses, and GRNs are generated based on the goods received. Invoices are also maintained, and vouchers are created for them on a day-to-day basis.

Role and Responsibilities

Involved in developing web components and Struts components.

Involved in functional Testing.

JDBC connectivity and Database operations using Data Sources.

Performed Peer Testing.

Environment: Java, Servlets, JSP, Struts

Project Title: Health Care

Client: Centura Health, USA

Duration: From Jul 2006 to Nov 2006

Description: The Health Plus application automates a corporate hospital's administration system from scratch. It maintains employee details, doctor details, the stock position of medicines, ward availability details, and the billing system. The Front Office Executive registers each patient with a unique identification, and patient billing is done based on that identification.

The system contains the following modules: Login module, Registration module, Stock details module, Test Reports module, billing system module.

Role and Responsibilities

Involved in developing JSPs.

Developed the presentation tier with the Struts framework consisting of Action Classes and Action forms and other related configuration settings.

Involved in Unit Testing.

Environment: Java, Servlets, JSP, Struts.

Sarath Bhushan Endluri

Contact No: +1-860-***-**** Current Location: New York

Email id: *******.******@*****.***
