Anwesh Babu
Hadoop Developer
Email: **********.******@*****.***
Mobile: 314-989-9000 ext. 730
Professional Summary:
. 7+ years of professional experience in the IT industry, including 3 years
  of experience in Hadoop ecosystem implementation, maintenance, ETL and Big
  Data analysis operations.
. Excellent understanding of Hadoop architecture and underlying framework
including storage management.
. Experience in using various Hadoop ecosystem components such as MapReduce,
  Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume and Solr for data
  storage and analysis.
. Experience in developing custom UDFs for Pig and Hive to incorporate
  Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
. Experience with the Oozie scheduler in setting up workflows with
  MapReduce and Pig jobs.
. Knowledge of the architecture and functionality of NoSQL databases such as
  HBase, Cassandra and MongoDB.
. Experience in managing Hadoop clusters and services using Cloudera
Manager.
. Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and
MapReduce.
. Experience in importing and exporting data between HDFS and Relational
Database Management systems using Sqoop.
. Collected log data from various sources and integrated it into HDFS using
  Flume.
. Assisted Deployment team in setting up Hadoop cluster and services.
. Hands-on experience in setting up Apache Hadoop and Cloudera CDH
clusters on Ubuntu, Fedora and Windows (Cygwin) environments.
. In-depth knowledge of the modifications required to static IP (interfaces),
  hosts and bashrc files, password-less SSH setup and Hadoop configuration
  for cluster setup and maintenance (a minimal setup sketch follows this
  summary).
. Excellent understanding of virtualization, with experience setting up
  a POC multi-node virtual cluster by leveraging underlying bridged
  networking and NAT technologies.
. Experience in loading data into HDFS from UNIX (Ubuntu, Fedora, CentOS)
  file systems.
. Knowledge of project life cycle (design, development, testing and
implementation) of Client Server and Web applications.
. Experience in writing batch scripts in Ubuntu/UNIX to automate sequential
  script execution.
. Knowledge of hardware, software, networking and external tools including,
  but not limited to, Excel and Access, with experience in applying their
  functionality as needed to enhance productivity and ensure accuracy.
. Determined, committed and hardworking individual with strong
communication, interpersonal and organizational skills.
. Technology enthusiast, highly motivated and an avid blog reader, keeping
  track of the latest advancements in hardware and software.
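The following is a minimal sketch of the password-less SSH preparation referenced above, assuming a master node pushing its key to hypothetical worker hosts (node01, node02); it is an illustration, not a transcript of any specific cluster setup.

    #!/usr/bin/env bash
    # Minimal sketch: prepare password-less SSH from a master node to worker
    # nodes before installing Hadoop. Host names below are hypothetical.
    set -euo pipefail

    WORKERS="node01 node02"

    # Generate an RSA key pair on the master if one does not already exist.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

    # Copy the public key to each worker so SSH stops prompting for a password.
    for host in $WORKERS; do
        ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$host"
    done

    # Verify: each call should print the remote hostname without a password prompt.
    for host in $WORKERS; do
        ssh -o BatchMode=yes "$host" hostname
    done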
Technical Skills:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Avro
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
NoSQL Databases: HBase, MongoDB, Cassandra
Databases: Oracle 11g/10g, DB2, MS SQL Server, MySQL, MS Access
Programming Languages: C, C++, Java, SQL, PL/SQL, Python, Linux shell scripting
Tools Used: Eclipse, PuTTY, Cygwin, MS Office, Crystal Reports
Professional Experience:
Hadoop Developer
Wells Fargo - New York, NY                                July 2013 - Present
Wells Fargo & Company is an American multinational diversified financial
services company. The CORE project deals with improving the end-to-end
approach to real estate-secured lending and the overall customer experience,
and with achieving the vision of satisfying all of the customers' financial
needs. The purpose of the project is to build a big data platform used to
load, manage and process terabytes of transactional data, machine log data,
performance metrics and other ad-hoc data sets, and to extract meaningful
information from them. The solution is based on Cloudera Hadoop.
Responsibilities:
. Worked on implementation and maintenance of Cloudera Hadoop cluster.
. Assisted in upgrading, configuring and maintaining various Hadoop
  ecosystem components such as Pig, Hive and HBase.
. Developed and executed custom MapReduce programs, Pig Latin scripts and
  HQL queries.
. Used Hadoop FS scripts for HDFS (Hadoop Distributed File System) data
  loading and manipulation.
. Performed Hive test queries on local sample files and HDFS files.
. Developed and optimized Pig and Hive UDFs (User-Defined Functions) to
  implement the functionality of external languages as needed.
. Extensively used Pig for data cleaning and optimization.
. Developed Hive queries to analyze data and generate results.
. Exported data from HDFS to RDBMS via Sqoop for Business Intelligence,
visualization and user report generation.
. Managed, reviewed and interpreted Hadoop log files.
. Worked on SOLR for indexing and search optimization.
. Analyzed business requirements and cross-verified them against the
  functionality and features of NoSQL databases such as HBase and Cassandra
  to determine the optimal database.
. Analyzed user request patterns and implemented various performance
  optimization measures, including implementing partitions and buckets in
  HiveQL (a minimal sketch follows this list).
. Created and maintained technical documentation for launching Hadoop
  clusters and for executing Hive queries and Pig scripts.
. Monitored workload, job performance and node health using Cloudera
Manager.
. Used Flume to collect and aggregate weblog data from different sources
and pushed to HDFS.
. Integrated Oozie with MapReduce, Pig, Hive and Sqoop.
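A minimal sketch of the Hive partitioning and bucketing approach mentioned above; the table and column names (txn_raw, txn_part, txn_date, customer_id) are hypothetical placeholders rather than names from the actual project.

    #!/usr/bin/env bash
    # Hedged example: create a partitioned, bucketed Hive table and populate it
    # from a hypothetical raw table using dynamic partitioning.
    hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS txn_part (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS;

    -- The dynamic partition column (txn_date) must come last in the SELECT list.
    INSERT OVERWRITE TABLE txn_part PARTITION (txn_date)
    SELECT txn_id, customer_id, amount, txn_date
    FROM txn_raw;
    "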
Environment: Hadoop 1.x, HDFS, MapReduce, Pig 0.11, Hive 0.10, Crystal
Reports, Sqoop, HBase, Shell Scripting, UNIX.
Hadoop Developer
PG&E - San Francisco, CA                                May 2012 - June 2013
The Pacific Gas and Electric Company, commonly known as PG&E, is an
investor-owned utility that provides natural gas and electricity to most of
the northern two-thirds of California, from Bakersfield to the Oregon
border. The purpose of this project was to build and maintain a bill
forecasting product that would help reduce electricity consumption by
leveraging the features and functionality of Cloudera Hadoop. A second
cluster was implemented for historic data warehousing, increasing the
sample size for power and gas usage pattern analysis and providing readily
available data storage by leveraging the functionality of HBase.
Responsibilities:
. Involved in the design and development of a 3-node Apache Hadoop cluster
  for POC and sample data analysis.
. Successfully implemented Cloudera on a 30-node cluster for PG&E
  consumption forecasting.
. Worked with systems engineering team to plan and deploy new Hadoop
environments and expand existing Hadoop clusters.
. Involved in planning and implementation of an additional 10-node
  Hadoop cluster for data warehousing, historical data storage in HBase
  and sampling reports.
. Used Sqoop extensively to import data from RDBMS sources into HDFS (a
  minimal sketch follows this list).
. Performed transformations, cleaning and filtering on imported data using
  Hive and MapReduce, and loaded the final data into HDFS.
. Developed Pig UDFs to pre-process data for analysis.
. Worked with business teams and created Hive queries for ad hoc access.
. Responsible for creating Hive tables and partitions, loading data and
  writing Hive queries.
. Created Pig Latin scripts to sort, group, join and filter the
  enterprise-wide data.
. Worked on Oozie to automate job flows.
. Maintained cluster co-ordination services through ZooKeeper.
. Generated summary reports using Hive and Pig and exported the results via
  Sqoop for business reporting and intelligence analysis to determine
  whether the implemented power-saving programs were effective.
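A minimal sketch of the kind of Sqoop import described above; the JDBC URL, credentials, table name and target directory are hypothetical placeholders.

    #!/usr/bin/env bash
    # Hedged example: import a relational table into HDFS with Sqoop.
    # -P prompts for the database password rather than placing it on the command line.
    sqoop import \
        --connect "jdbc:sqlserver://dbhost:1433;databaseName=metering" \
        --username etl_user \
        -P \
        --table meter_readings \
        --target-dir /data/raw/meter_readings \
        --num-mappers 4 \
        --fields-terminated-by '\t'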
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java
Eclipse, SQL Server, Shell Scripting.
Hadoop Developer
RelayHealth - Atlanta, GA                            October 2011 - April 2012
RelayHealth, a subsidiary of McKesson, processes healthcare provider-to-
payer interactions between 200,000 physicians, 2,000 hospitals, and 1,900
payers (health plans). We processed millions of claims per day on Cloudera
Enterprise, analyzing more than 1 million (150GB) log files per day and
integrating with multiple Oracle systems. As a result, we were able to help
our healthcare providers get paid faster, improving their cost models and
productivity.
Responsibilities:
. Involved in loading, transforming and analyzing healthcare data from
  various providers into Hadoop using Flume on an ongoing basis.
. Filtered, transformed and combined data from multiple providers based on
payer filter criteria using custom Pig UDFs.
. Analyzed transformed data using HiveQL and Hive UDFs to generate per-payer
  reports for transmission to payers for payment summaries.
. Exported analyzed data to downstream RDBMS systems using Sqoop for
  generating end-user reports, business analysis reports and payment
  reports.
. Responsible for creating Hive tables based on business requirements.
. Analyzed large data sets by running Hive queries and Pig scripts.
. Implemented partitioning, dynamic partitions and buckets in Hive for
  efficient data access.
. Experienced in running Hadoop streaming jobs to process terabytes of
  XML-format data (a minimal sketch follows this list).
. Analyzed large amounts of data sets from hospitals and providers to
determine optimal way to aggregate and generate summary reports.
. Worked with the Data Science team to gather requirements for various data
  mining projects.
. Loaded and transformed large sets of structured, semi-structured and
  unstructured data.
. Developed Pig Latin scripts to extract data from web server output files
  and load it into HDFS.
. Extensively used Pig for data cleansing.
. Implemented test scripts to support test driven development and
continuous integration.
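A minimal sketch of a Hadoop 1.x-era streaming job of the sort referenced above; the jar location, input/output paths and mapper/reducer script names are hypothetical.

    #!/usr/bin/env bash
    # Hedged example: run a streaming job that parses XML claim logs with
    # external mapper/reducer scripts shipped alongside the job.
    hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.reduce.tasks=10 \
        -input /data/claims/xml/2012-03-15 \
        -output /data/claims/parsed/2012-03-15 \
        -mapper parse_claim_xml.py \
        -reducer summarize_claims.py \
        -file parse_claim_xml.py \
        -file summarize_claims.py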
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java
Eclipse, SQL Server, Shell Scripting.
Java/J2EE Interface Developer
Avon Products - New York, NY                      October 2010 - September 2011
Avon Products, Inc. is an American international manufacturer and direct
selling company in the beauty, household and personal care categories. The
objective of this project was to support existing applications and develop
an M-Commerce application for the Avon mobile purchase portal.
Responsibilities:
. Created use case and sequence diagrams, functional specifications and user
  interface diagrams.
. Involved in complete requirement analysis, design, coding and testing
phases of the project.
. Participated in JAD meetings to gather requirements and understand the end
  users' system.
. Migrated global internet applications from standard MVC to Spring MVC
  and Hibernate.
. Integrated content management configurations for each page with the web
  application's JSPs.
. Assisted in the design and development of the Avon M-Commerce application
  from scratch using HTTP, XML, Java, Oracle objects, Toad and Eclipse.
. Created Stored Procedures & Functions.
. Used JDBC to process database calls for DB2 and SQL Server databases.
. Developed user interfaces using JSP, HTML, XML and JavaScript.
. Actively involved in code review and bug fixing for improving the
performance.
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML,
Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL
Server 2008.
Java Developer
D&B Corporation - Parsippany, NJ                November 2009 - September 2010
D&B is the world's leading provider of business information, helping reduce
credit risk and manage business between customers and vendors efficiently.
D&B stores and maintains information on over 77 million companies worldwide.
Responsibilities:
. Utilized Agile Methodologies to manage full life-cycle development of
the project.
. Implemented MVC design pattern using Struts Framework.
. Used Form classes of the Struts framework to write the routing logic and
  to call different services.
. Created tile definitions, Struts-config files, validation files and
resource bundles for all modules using Struts framework.
. Developed the web application using JSP custom tag libraries, Struts
  Action classes and Action Forms.
. Designed Java Servlets and Objects using J2EE standards.
. Used JSP for the presentation layer and developed a high-performance
  object/relational persistence and query service for the entire application
  using Hibernate.
. Developed the XML Schema and Web services for the data maintenance and
structures.
. Used WebSphere Application Server to develop and deploy the
  application.
. Worked with various style sheets, including Cascading Style Sheets (CSS).
. Involved in coding for JUnit Test cases.
Environment: Java/J2EE, Oracle 11g, SQL, JSP, Struts 1.2, Hibernate 3,
WebLogic 10.0, HTML, AJAX, JavaScript, JDBC, XML, JMS, UML, JUnit,
log4j, WebSphere, MyEclipse.
Java/J2EE Developer
Wilshire Software Technologies - Hyderabad, India    April 2007 - October 2009
Wilshire Technologies is committed to providing high-quality service with a
high level of client satisfaction. Wilshire has the right mix of technical
skills and experience to provide real-time client solutions, supported by
high-end infrastructure for design and development.
Responsibilities:
. Designed and developed dynamic, browser-compatible user interfaces under
  the J2EE architecture using JSP, custom tags, HTML, CSS and JavaScript.
. Deployed and maintained JSP and Servlet components on WebLogic 8.0.
. Developed the application server persistence layer using JDBC and SQL.
. Used JDBC to connect the web applications to Databases.
. Implemented a test-first unit testing approach using JUnit.
. Developed and utilized J2EE services and JMS components for messaging
  in WebLogic.
. Configured the development environment using the WebLogic application
  server for developers' integration testing.
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, AJAX, JavaScript,
WebLogic 8.0, HTML, JDBC.
REFERENCES WILL BE PROVIDED ON REQUEST