Hadoop Developer

Location:

Atlanta, GA

Posted:

April 25, 2016

Resume:

ANUDEEP

*********@*****.***

(***) ***- ****

Six years of overall experience, including over 4 years of Hadoop and Scala (Spark) development and administration and 2+ years of Java/J2EE enterprise application design, development and maintenance.

Extensive experience implementing Big Data solutions using various distributions of Hadoop and its ecosystem tools.

Hands-on experience in installing, configuring and monitoring HDFS clusters (on premise & cloud AWS).

In-depth understanding of MapReduce programs to scrub, sort, filter, join and query data.

Planning, deployment, and tuning of SQL (SQL Server, MySQL) and NoSQL (Elasticsearch, Redis, Memcached) databases.

Implemented innovative solutions using various Hadoop ecosystem tools like Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, ElasticSearch, Zookeeper, Couchbase, Storm, Solr, Cassandra and Spark.

Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, and extending their default functionality by writing User Defined Functions (UDFs) for data-specific processing.

Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop & Flume.

Hands-on experience developing workflows that execute MapReduce, Sqoop, Flume, Hive and Pig scripts using Oozie.

Hands-on experience with the R language using the Shiny web application framework.

Well versed in database development using SQL data types, joins, views, transactions, large objects and performance tuning.

Good knowledge of data warehousing concepts, ETL and Teradata.

Experience writing Shell scripts in Linux OS and integrating them with other solutions.

Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, MVC.

Fluent with core Java concepts such as I/O, multi-threading, exceptions, RegEx, collections, data structures and serialization.

Excellent problem-solving, analytical, communication, presentation and interpersonal skills that help me be a core member of any team.

Experience mentoring and working with offshore and distributed teams.

Areas of Expertise

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Chukwa, Pentaho Kettle and Talend

Programming Languages: Java, C/C++, eVB, Assembly Language (8085/8086)

Scripting Languages: JSP & Servlets, PHP, JavaScript, XML, HTML, Python and Bash

Databases: NoSQL, Oracle, SQL, Talend

UNIX Tools: Apache, Yum, RPM

Tools: Eclipse, JDeveloper, JProbe, CVS, Ant, MS Visual Studio

Platforms: Windows (2000/XP), Unix, Linux, Solaris

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

Testing Tools: NetBeans, Eclipse, WSAD, RAD

Methodologies: Agile, UML, Design Patterns

Work Experience

Hadoop Developer

Dow Jones & Company, Atlanta, GA February 2015 to April 2016

The company was best known for the publication of the Dow Jones Industrial Average and related market statistics, Dow Jones Newswire and a number of financial publications. In 2010 the Dow Jones Indexes subsidiary was sold to the CME Group and the company focused on financial news publications, including its flagship publication The Wall Street Journal and providing financial news and information tools to financial companies.

Responsibilities

Responsible for developing efficient MapReduce programs on the AWS cloud to process more than 20 years' worth of claim data and to detect and separate fraudulent claims.

Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.

Played a key role in setting up a 40-node Hadoop cluster utilizing Apache Spark by working closely with the Hadoop Administration team.

Worked with the advanced analytics team to design fraud detection algorithms and then developed MapReduce programs to efficiently run the algorithms on the huge datasets.

Developed Scala programs to perform data scrubbing for unstructured data.

Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.

Helped troubleshoot Scala problems while working with MicroStrategy to produce illustrative reports and dashboards along with ad-hoc analysis.

Used Flume to collect log data, including error messages, from across the cluster.

Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.

Played a key role in installation and configuration of the various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.

Used TIBCO Jaspersoft Studio for iReport analysis on the AWS cloud.

Applied Teradata and general DBMS concepts for early instance creation.

Gave upper management daily updates on project progress, including the classification levels achieved on the data.

Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra

Hadoop Developer/Hadoop Admin

AMERICAN EXPRESS, Phoenix, AZ Aug 2014 to Jan 2015

Description: American Express Company is a multinational financial services corporation best known for its credit card, charge card, and travelers cheque businesses. Because many corporations are enrolled in these card services, the transaction data sets are huge. The log files generated by the system for customers' credit transactions are maintained in a Hadoop cluster for further analysis and operations.

Responsibilities:

Responsible for architecting Hadoop clusters with CDH3.

Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.

Developer on the Big Data team; worked with Hadoop and its ecosystem on the AWS cloud.

Installed and configured Hadoop, MapReduce, and HDFS.

Used HiveQL to analyze the data and identify different correlations.

Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Installed and configured Pig and wrote Pig Latin scripts.

Wrote a MapReduce job in Scala.

Strong understanding of the REST architectural style and its application to well-performing websites for global use.

Developed and maintained HiveQL and Pig Latin scripts and Scala and MapReduce programs.

Worked on the RDBMS system using PL/SQL to create packages, procedures, functions and triggers as per the business requirements.

Involved in ETL, Data Integration and Migration.

Worked on Talend to run ETL jobs on the data in HDFS.

Used Sqoop to import data from Oracle into HDFS on a regular basis.

Developed scripts and batch jobs to schedule various Hadoop programs.

Wrote Hive queries for data analysis to meet the business requirements.

Created Hive tables and worked on them using HiveQL.

Imported data from Oracle Database into HDFS, and exported it back, using Sqoop.

Experienced in defining job flows.

Experience with NoSQL database HBase.

Wrote and modified stored procedures to load and modify data according to business rule changes.

Involved in creating Hive tables, loading the data and writing Hive queries that run internally as MapReduce jobs.

Developed a custom file system plugin for Hadoop to access files on the data platform.

The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to access files directly.

Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.

Organized and benchmarked Hadoop/HBase Clusters for internal use.

Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Flume, ETL, REST, Java, Python, PL/SQL, Oracle 11g, Unix/Linux, CDH3, CDH4.

Hadoop Developer/Hadoop Admin

TCS, Chennai January 2013 to July 2014

TCS, with a view to digital re-imagination through analytics, set up a group to build competency and knowledge in specific areas such as Analytics, Big Data & Information Management (ABIM) and, in turn, to develop customer solutions with the latest technologies available in these areas. As part of this program, resources were made available to handpicked associates across TCS to build "Proofs of Concept" on trending technologies to solve customer-centric problems in the areas of ABIM.

Responsibilities

Define scope and architecture requirements

Identification of Data Sources, Data Transformation languages & tools and Data Reporting & Visualization tools

Design preliminary approach: functional and technology infrastructure

Review and finalize approach

Installation of required Apache and related packages

High level code design documentation

Code development and implementation

Generate reports using open source tools like Pentaho

Documentation - approach to make POC scalable for enterprise project

Environment: Hive, Pig, MapReduce, Java, Shell scripting, SQL, Python, HDFS, CDH4.3, Sqoop, Flume, Oozie, Web Crawler, CentOS, Fedora.

Java/J2EE Developer

Reliance Communications, Bangalore March 2011 to November 2012

Reliance Communications Ltd. is an Indian Internet access and Telecommunications Company headquartered in Navi Mumbai, India. It provides CDMA, GSM mobile services, fixed line broadband and voice services, DTH depending upon the areas of operation.

Responsibilities:

Played an effective role in the team by interacting with welfare business analysts and program specialists and transforming business requirements into system requirements.

Developed analysis level documentation such as Use Case, Business Domain Model, Activity, Sequence and Class Diagrams.

Handled design reviews and technical reviews with other project stakeholders.

Implemented services using Core Java.

Developed and deployed the UI-layer logic of sites using JSP.

Used Spring MVC to implement business model logic.

Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mapping and message resource bundles, and used JNDI to look up J2EE components.

Developed dynamic JSP pages with Struts.

Employed built-in and custom Struts interceptors and validators.

Developed the XML data object to generate PDF documents and reports.

Employed Hibernate, DAO, and JDBC for data retrieval and modifications from the database.

Used SOAP for web service messaging and interaction.

Developed JUnit test cases for unit testing, as well as system and user test scenarios.

Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, WebLogic, Java, JDBC, JavaScript, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.

Junior Java Developer

HDFC Bank, Mangalore May 2010 to March 2011

The Housing Development Finance Corporation Limited (HDFC) is one of the oldest and largest financial services firms in India. The aim of this project was to automate the Initial Public Offering process by designing a GUI-based tool, developed in Java Swing, to capture the key set of events as per the laid-down business specifications. This consisted of generating IPO Deals and Client Letters, with corresponding business functionality to upload and download such documents within the system.

Responsibilities:

Developed JavaScript behavior code for user interaction.

Developed the UI using HTML, JavaScript, and JSP.

Used JDBC to manage connectivity for inserting, querying and managing data, including stored procedures and triggers.

Involved in the design and coding of the data capture templates, presentation and component templates.

Developed an API to write XML documents from the database.

Used JavaScript to design the user interface and check validations.

Part of the team responsible for metadata maintenance and synchronization of data from the database.

Environment: JavaScript, JSP, JDBC, HTML, XML.

Education and Professional Development

Jawaharlal Nehru Technological University, Computer Science and Engineering

Member, Hadoop Users Group of Atlanta

References

Available upon Request