Anwesh Babu
Hadoop Developer
Email: **********.******@*****.***
Mobile: 314-989-9000 ext. 730
Professional Summary:
. 7+ years of professional experience in the IT industry, including 3 years
  of experience in Hadoop ecosystem implementation, maintenance, ETL and Big
  Data analysis operations.
. Excellent understanding of Hadoop architecture and underlying framework
including storage management.
. Experience in using various Hadoop ecosystem components such as MapReduce,
  Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume and Solr for data
  storage and analysis.
. Experience in developing custom UDFs for Pig and Hive to incorporate
  Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
. Experience with the Oozie scheduler in setting up workflows with
  MapReduce and Pig jobs.
. Knowledge of the architecture and functionality of NoSQL databases such as
  HBase, Cassandra and MongoDB.
. Experience in managing Hadoop clusters and services using Cloudera
Manager.
. Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and
MapReduce.
. Experience in importing and exporting data between HDFS and Relational
Database Management systems using Sqoop.
. Collected log data from various sources and integrated it into HDFS using
  Flume.
. Assisted Deployment team in setting up Hadoop cluster and services.
. Hands-on experience in setting up Apache Hadoop and Cloudera CDH
clusters on Ubuntu, Fedora and Windows (Cygwin) environments.
. In-depth knowledge of the modifications required to static IP (interfaces),
  hosts and bashrc files, password-less SSH setup and Hadoop configuration
  for cluster setup and maintenance (a minimal setup sketch follows this
  summary).
. Excellent understanding of virtualization, with experience setting up
  a POC multi-node virtual cluster by leveraging underlying bridged
  networking and NAT technologies.
. Experience in loading data into HDFS from UNIX (Ubuntu, Fedora, CentOS)
  file systems.
. Knowledge of project life cycle (design, development, testing and
implementation) of Client Server and Web applications.
. Experience in writing batch scripts in Ubuntu/UNIX to automate sequential
  script execution.
. Knowledge of hardware, software, networking and external tools including,
  but not limited to, Excel and Access, with experience in applying their
  functionality as needed to enhance productivity and ensure accuracy.
. Determined, committed and hardworking individual with strong
communication, interpersonal and organizational skills.
. Technology enthusiast, highly motivated and an avid blog reader, keeping
  track of the latest advancements in hardware and software.
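The following is a minimal sketch of the password-less SSH preparation referenced above, assuming a master node pushing its key to hypothetical worker hosts (node01, node02); it is an illustration, not a transcript of any specific cluster setup.

    #!/usr/bin/env bash
    # Minimal sketch: prepare password-less SSH from a master node to worker
    # nodes before installing Hadoop. Host names below are hypothetical.
    set -euo pipefail

    WORKERS="node01 node02"

    # Generate an RSA key pair on the master if one does not already exist.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

    # Copy the public key to each worker so SSH stops prompting for a password.
    for host in $WORKERS; do
        ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$host"
    done

    # Verify: each call should print the remote hostname without a password prompt.
    for host in $WORKERS; do
        ssh -o BatchMode=yes "$host" hostname
    done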
Technical Skills:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Avro
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
NoSQL Databases: HBase, MongoDB, Cassandra
Databases: Oracle 11g/10g, DB2, MS SQL Server, MySQL, MS Access
Programming Languages: C, C++, Java, SQL, PL/SQL, Python, Linux shell scripting
Tools Used: Eclipse, PuTTY, Cygwin, MS Office, Crystal Reports
Professional Experience:
Hadoop Developer
Wells Fargo - New York, NY                                July 2013 - Present
Wells Fargo & Company is an American multinational diversified financial
services company. The CORE project deals with improving the end-to-end
approach to real estate-secured lending and the overall customer experience,
and with achieving the vision of satisfying all of the customers' financial
needs. The purpose of the project is to build a big data platform used to
load, manage and process terabytes of transactional data, machine log data,
performance metrics and other ad-hoc data sets, and to extract meaningful
information from them. The solution is based on Cloudera Hadoop.
Responsibilities:
. Worked on implementation and maintenance of Cloudera Hadoop cluster.
. Assisted in upgrading, configuring and maintaining various Hadoop
  ecosystem components such as Pig, Hive and HBase.
. Developed and executed custom MapReduce programs, Pig Latin scripts and
  HQL queries.
. Used Hadoop FS scripts for HDFS (Hadoop Distributed File System) data
  loading and manipulation.
. Performed Hive test queries on local sample files and HDFS files.
. Developed and optimized Pig and Hive UDFs (User-Defined Functions) to
  implement the functionality of external languages as needed.
. Extensively used Pig for data cleaning and optimization.
. Developed Hive queries to analyze data and generate results.
. Exported data from HDFS to RDBMS via Sqoop for Business Intelligence,
visualization and user report generation.
. Managed, reviewed and interpreted Hadoop log files.
. Worked on SOLR for indexing and search optimization.
. Analyzed business requirements and cross-verified them against the
  functionality and features of NoSQL databases such as HBase and Cassandra
  to determine the optimal database.
. Analyzed user request patterns and implemented various performance
  optimization measures, including implementing partitions and buckets in
  HiveQL (a minimal sketch follows this list).
. Created and maintained technical documentation for launching Hadoop
  clusters and for executing Hive queries and Pig scripts.
. Monitored workload, job performance and node health using Cloudera
Manager.
. Used Flume to collect and aggregate weblog data from different sources
and pushed to HDFS.
. Integrated Oozie with MapReduce, Pig, Hive and Sqoop.
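A minimal sketch of the Hive partitioning and bucketing approach mentioned above; the table and column names (txn_raw, txn_part, txn_date, customer_id) are hypothetical placeholders rather than names from the actual project.

    #!/usr/bin/env bash
    # Hedged example: create a partitioned, bucketed Hive table and populate it
    # from a hypothetical raw table using dynamic partitioning.
    hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS txn_part (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS;

    -- The dynamic partition column (txn_date) must come last in the SELECT list.
    INSERT OVERWRITE TABLE txn_part PARTITION (txn_date)
    SELECT txn_id, customer_id, amount, txn_date
    FROM txn_raw;
    "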
Environment: Hadoop 1.x, HDFS, MapReduce, Pig 0.11, Hive 0.10, Crystal
Reports, Sqoop, HBase, Shell Scripting, UNIX.
Hadoop Developer
PG&E - San Francisco, CA                                May 2012 - June 2013
The Pacific Gas and Electric Company, commonly known as PG&E, is an
investor-owned utility that provides natural gas and electricity to most of
the northern two-thirds of California, from Bakersfield to the Oregon
border. The purpose of this project was to build and maintain a bill
forecasting product that would help reduce electricity consumption by
leveraging the features and functionality of Cloudera Hadoop. A second
cluster was implemented for historic data warehousing, increasing the
sample size for power and gas usage pattern analysis and providing readily
available data storage by leveraging the functionality of HBase.
Responsibilities:
. Involved in the design and development of a 3-node Apache Hadoop cluster
  for POC and sample data analysis.
. Successfully implemented Cloudera on a 30-node cluster for PG&E
  consumption forecasting.
. Worked with systems engineering team to plan and deploy new Hadoop
environments and expand existing Hadoop clusters.
. Involved in planning and implementation of an additional 10-node
  Hadoop cluster for data warehousing, historical data storage in HBase
  and sampling reports.
. Used Sqoop extensively to import data from RDBMS sources into HDFS (a
  minimal sketch follows this list).
. Performed transformations, cleaning and filtering on imported data using
  Hive and MapReduce, and loaded the final data into HDFS.
. Developed Pig UDFs to pre-process data for analysis.
. Worked with business teams and created Hive queries for ad hoc access.
. Responsible for creating Hive tables and partitions, loading data and
  writing Hive queries.
. Created Pig Latin scripts to sort, group, join and filter the
  enterprise-wide data.
. Worked on Oozie to automate job flows.
. Maintained cluster co-ordination services through ZooKeeper.
. Generated summary reports using Hive and Pig and exported the results via
  Sqoop for business reporting and intelligence analysis to determine
  whether the implemented power-saving programs were effective.
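A minimal sketch of the kind of Sqoop import described above; the JDBC URL, credentials, table name and target directory are hypothetical placeholders.

    #!/usr/bin/env bash
    # Hedged example: import a relational table into HDFS with Sqoop.
    # -P prompts for the database password rather than placing it on the command line.
    sqoop import \
        --connect "jdbc:sqlserver://dbhost:1433;databaseName=metering" \
        --username etl_user \
        -P \
        --table meter_readings \
        --target-dir /data/raw/meter_readings \
        --num-mappers 4 \
        --fields-terminated-by '\t'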
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java
Eclipse, SQL Server, Shell Scripting.
Hadoop Developer
RelayHealth - Atlanta, GA                            October 2011 - April 2012
RelayHealth, a subsidiary of McKesson, processes healthcare provider-to-
payer interactions between 200,000 physicians, 2,000 hospitals, and 1,900
payers (health plans). We processed millions of claims per day on Cloudera
Enterprise, analyzing more than 1 million (150GB) log files per day and
integrating with multiple Oracle systems. As a result, we were able to help
our healthcare providers get paid faster, improving their cost models and
productivity.
Responsibilities:
. Involved in loading, transforming and analyzing healthcare data from
  various providers into Hadoop using Flume on an ongoing basis.
. Filtered, transformed and combined data from multiple providers based on
payer filter criteria using custom Pig UDFs.
. Analyzed transformed data using HiveQL and Hive UDFs to generate per-payer
  reports for transmission to payers for payment summaries.
. Exported analyzed data to downstream RDBMS systems using Sqoop for
  generating end-user reports, business analysis reports and payment
  reports.
. Responsible for creating Hive tables based on business requirements.
. Analyzed large data sets by running Hive queries and Pig scripts.
. Implemented partitioning, dynamic partitions and buckets in Hive for
  efficient data access.
. Experienced in running Hadoop streaming jobs to process terabytes of
  XML-format data (a minimal sketch follows this list).
. Analyzed large amounts of data sets from hospitals and providers to
determine optimal way to aggregate and generate summary reports.
. Worked with the Data Science team to gather requirements for various data
  mining projects.
. Loaded and transformed large sets of structured, semi-structured and
  unstructured data.
. Developed Pig Latin scripts to extract data from web server output files
  and load it into HDFS.
. Extensively used Pig for data cleansing.
. Implemented test scripts to support test driven development and
continuous integration.
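A minimal sketch of a Hadoop 1.x-era streaming job of the sort referenced above; the jar location, input/output paths and mapper/reducer script names are hypothetical.

    #!/usr/bin/env bash
    # Hedged example: run a streaming job that parses XML claim logs with
    # external mapper/reducer scripts shipped alongside the job.
    hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.reduce.tasks=10 \
        -input /data/claims/xml/2012-03-15 \
        -output /data/claims/parsed/2012-03-15 \
        -mapper parse_claim_xml.py \
        -reducer summarize_claims.py \
        -file parse_claim_xml.py \
        -file summarize_claims.py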
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java
Eclipse, SQL Server, Shell Scripting.
Java/J2EE Interface Developer
Avon Products - New York, NY                      October 2010 - September 2011
Avon Products, Inc. is an American international manufacturer and direct
selling company in the beauty, household and personal care categories. The
objective of this project was to support existing applications and develop
an M-Commerce application for the Avon mobile purchase portal.
Responsibilities:
. Created use case and sequence diagrams, functional specifications and user
  interface diagrams.
. Involved in complete requirement analysis, design, coding and testing
phases of the project.
. Participated in JAD meetings to gather requirements and understand the end
  users' system.
. Migrated global internet applications from standard MVC to Spring MVC
  and Hibernate.
. Integrated content management configurations for each page with the web
  application's JSPs.
. Assisted in the design and development of the Avon M-Commerce application
  from scratch using HTTP, XML, Java, Oracle objects, Toad and Eclipse.
. Created Stored Procedures & Functions.
. Used JDBC to process database calls for DB2 and SQL Server databases.
. Developed user interfaces using JSP, HTML, XML and JavaScript.
. Actively involved in code review and bug fixing for improving the
performance.
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML,
Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL
Server 2008.
Java Developer
D&B Corporation - Parsippany, NJ                November 2009 - September 2010
D&B is the world's leading provider of business information, helping reduce
credit risk and manage business between customers and vendors efficiently.
D&B stores and maintains information on over 77 million companies worldwide.
Responsibilities:
. Utilized Agile Methodologies to manage full life-cycle development of
the project.
. Implemented MVC design pattern using Struts Framework.
. Used Form classes of the Struts framework to write the routing logic and
  to call different services.
. Created tile definitions, Struts-config files, validation files and
resource bundles for all modules using Struts framework.
. Developed the web application using JSP custom tag libraries, Struts
  Action classes and Action Forms.
. Designed Java Servlets and Objects using J2EE standards.
. Used JSP for the presentation layer and developed a high-performance
  object/relational persistence and query service for the entire application
  using Hibernate.
. Developed the XML Schema and Web services for the data maintenance and
structures.
. Used WebSphere Application Server to develop and deploy the
  application.
. Worked with various style sheets, including Cascading Style Sheets (CSS).
. Involved in coding for JUnit Test cases.
Environment: Java/J2EE, Oracle 11g, SQL, JSP, Struts 1.2, Hibernate 3,
WebLogic 10.0, HTML, AJAX, JavaScript, JDBC, XML, JMS, UML, JUnit,
log4j, WebSphere, MyEclipse.
Java/J2EE Developer
Wilshire Software Technologies - Hyderabad, India    April 2007 - October 2009
Wilshire Technologies is committed to providing high-quality service with a
high level of client satisfaction. Wilshire has the right mix of technical
skills and experience to provide real-time client solutions, supported by
high-end infrastructure for design and development.
Responsibilities:
. Designed and developed dynamic, browser-compatible user interfaces under
  the J2EE architecture using JSP, custom tags, HTML, CSS and JavaScript.
. Deployed and maintained JSP and Servlet components on WebLogic 8.0.
. Developed the application server persistence layer using JDBC and SQL.
. Used JDBC to connect the web applications to Databases.
. Implemented a test-first unit testing approach using JUnit.
. Developed and utilized J2EE services and JMS components for messaging
  in WebLogic.
. Configured the development environment using the WebLogic application
  server for developers' integration testing.
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, AJAX, JavaScript,
WebLogic 8.0, HTML, JDBC.
REFERENCES WILL BE PROVIDED ON REQUEST