ANUDEEP
*********@*****.***
Six years of comprehensive experience, including over 4 years of Hadoop and Scala (Spark) development and administration and 2+ years of Java/J2EE enterprise application design, development and maintenance.
Extensive experience implementing Big Data solutions using various distributions of Hadoop and its ecosystem tools.
Hands-on experience installing, configuring and monitoring HDFS clusters (on-premises and on AWS cloud).
In-depth understanding of writing MapReduce programs to scrub, sort, filter, join and query data.
Planning, deployment and tuning of SQL (SQL Server, MySQL) and NoSQL (Elasticsearch, Redis, Memcached) databases.
Implemented innovative solutions using Hadoop ecosystem tools such as Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Elasticsearch, ZooKeeper, Couchbase, Storm, Solr, Cassandra and Spark.
Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, and extending their default functionality by writing User Defined Functions (UDFs) for data-specific processing.
Experience migrating data between RDBMS and HDFS, and ingesting unstructured sources into HDFS, using Sqoop and Flume.
Hands-on experience developing Oozie workflows that execute MapReduce, Sqoop, Flume, Hive and Pig jobs.
Hands-on experience with the R language and the Shiny web application framework.
Well-versed in database development using SQL data types, joins, views, transactions, large objects and performance tuning.
Good knowledge of data warehousing concepts, ETL and Teradata.
Experience writing shell scripts on Linux and integrating them with other solutions.
Extensive work experience developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF and MVC.
Fluent with core Java concepts such as I/O, multithreading, exceptions, RegEx, collections, data structures and serialization.
Excellent problem-solving, analytical, communication, presentation and interpersonal skills that make me a core member of any team.
Experience mentoring and working with offshore and distributed teams.
Areas of Expertise
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Chukwa, Pentaho Kettle and Talend
Programming Languages: Java, C/C++, eVB, Assembly Language (8085/8086)
Scripting Languages: JSP & Servlets, PHP, JavaScript, XML, HTML, Python and Bash
Databases: NoSQL, Oracle, SQL
UNIX Tools: Apache, Yum, RPM
Tools: Eclipse, JDeveloper, JProbe, CVS, Ant, MS Visual Studio
Platforms: Windows (2000/XP), Unix, Linux, Solaris
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Testing Tools: NetBeans, Eclipse, WSAD, RAD
Methodologies: Agile, UML, Design Patterns
Work Experience
Hadoop Developer
Dow Jones & Company, Atlanta, GA February 2015 to April 2016
The company was best known for publishing the Dow Jones Industrial Average and related market statistics, Dow Jones Newswire and a number of financial publications. In 2010 the Dow Jones Indexes subsidiary was sold to the CME Group, and the company focused on financial news publications, including its flagship publication, The Wall Street Journal, and on providing financial news and information tools to financial companies.
Responsibilities:
Responsible for developing efficient MapReduce programs on AWS cloud to process more than 20 years' worth of claims data and detect and separate fraudulent claims.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
Played a key role in setting up a 40-node Hadoop cluster running Apache Spark, working closely with the Hadoop administration team.
Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run the algorithms efficiently on the huge datasets.
Developed Scala programs to perform data scrubbing for unstructured data.
Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
Helped troubleshoot Scala issues while working with MicroStrategy to produce illustrative reports, dashboards and ad hoc analyses.
Used Flume to collect log data containing error messages from across the cluster.
Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
Played a key role in installation and configuration of the various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.
Used TIBCO Jaspersoft Studio for iReport analysis on AWS cloud.
Applied Teradata and DBMS concepts to early instance creation.
Provided upper management with daily updates on project progress, including the classification levels achieved on the data.
Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra
Hadoop Developer/Hadoop Admin
AMERICAN EXPRESS, Phoenix, AZ Aug 2014 to Jan 2015
Description: American Express Company is a multinational financial services corporation best known for its credit card, charge card and travelers cheque businesses. Because many corporations are enrolled in these card services, the transaction data sets are very large. The log files generated by the system for customers' credit transactions are maintained in a Hadoop cluster for further analysis and operations.
Responsibilities:
Responsible for architecting Hadoop clusters with CDH3
Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
Worked as a developer on the Big Data team with Hadoop on AWS cloud and its ecosystem.
Installed and configured Hadoop, MapReduce and HDFS.
Used HiveQL to analyze the data and identify correlations.
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Installed and configured Pig and wrote Pig Latin scripts.
Wrote MapReduce jobs using Scala.
Strong understanding of the REST architectural style and its application to high-performing websites for global usage.
Developed and maintained HiveQL and Pig Latin scripts as well as Scala and MapReduce programs.
Worked on the RDBMS using PL/SQL to create packages, procedures, functions and triggers according to business requirements.
Involved in ETL, data integration and migration.
Worked on Talend to run ETL jobs on the data in HDFS.
Used Sqoop to import data from Oracle into HDFS on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet business requirements.
Created Hive tables and worked on them using HiveQL.
Imported and exported data between Oracle Database and HDFS using Sqoop.
Experienced in defining job flows.
Experience with NoSQL database HBase.
Wrote and modified stored procedures to load and modify data according to business rule changes.
Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
Developed a custom file system plugin for Hadoop to access files on the data platform.
The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to access files directly.
Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
Organized and benchmarked Hadoop/HBase Clusters for internal use.
Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Flume, ETL, REST, Java, Python, PL/SQL, Oracle 11g, Unix/Linux, CDH3, CDH4.
Hadoop Developer/Hadoop Admin
TCS, Chennai January 2013 to July 2014
With a view to digital reimagination through analytics, TCS set up a group to build competency and knowledge in specific areas, namely Analytics, Big Data & Information Management (ABIM), and in turn develop solutions for customers with the latest technologies available in these areas. As part of this program, handpicked associates across TCS were given resources to build proofs of concept on trending technologies to solve customer-centric problems in ABIM areas.
Responsibilities:
Defined scope and architecture requirements.
Identified data sources, data transformation languages and tools, and data reporting and visualization tools.
Designed the preliminary functional and technology infrastructure approach.
Reviewed and finalized the approach.
Installed the required Apache and related packages.
Wrote high-level code design documentation.
Developed and implemented code.
Generated reports using open-source tools such as Pentaho.
Documented the approach for scaling the POC to an enterprise project.
Environment: Hive, Pig, MapReduce, Java, Shell scripting, SQL, Python, HDFS, CDH4.3, Sqoop, Flume, Oozie, Web Crawler, CentOS, Fedora.
Java/J2EE Developer
Reliance Communications, Bangalore March 2011 to November 2012
Reliance Communications Ltd. is an Indian Internet access and telecommunications company headquartered in Navi Mumbai, India. It provides CDMA and GSM mobile services, fixed-line broadband and voice services, and DTH, depending on the area of operation.
Responsibilities:
Played an effective role in the team by interacting with welfare business analysts/program specialists and transforming business requirements into system requirements.
Developed analysis-level documentation such as Use Case, Business Domain Model, Activity, Sequence and Class diagrams.
Handled design reviews and technical reviews with other project stakeholders.
Implemented services using Core Java.
Developed and deployed the UI-layer logic of sites using JSP.
Used Spring MVC to implement business model logic.
Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mapping and message resource bundles, and used JNDI to look up J2EE components.
Developed dynamic JSP pages with Struts.
Employed built-in and custom Struts interceptors and validators.
Developed XML data objects to generate PDF documents and reports.
Employed Hibernate, DAO and JDBC to retrieve and modify data in the database.
Implemented web service messaging and interaction using SOAP.
Developed JUnit test cases for unit testing as well as for system and user test scenarios.
Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, WebLogic, Java, JDBC, JavaScript, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.
Junior Java Developer
HDFC Bank, Mangalore May 2010 to March 2011
The Housing Development Finance Corporation Limited (HDFC) is one of the oldest and largest financial services firms in India. The aim of this project was to automate the Initial Public Offering process by designing a GUI-based tool, developed in Java Swing, to capture the key set of events according to the laid-down business specifications. This involved generating IPO deals and client letters, along with the corresponding business functionality to upload and download such documents within the system.
Responsibilities:
Developed JavaScript behavior code for user interaction.
Used HTML, JavaScript and JSP to develop the UI.
Used JDBC to manage connectivity for inserting, querying and data management, including stored procedures and triggers.
Involved in the design and coding of the data capture templates, presentation and component templates.
Developed an API to write XML documents from the database.
Used JavaScript to design the user interface and check validations.
Part of the team responsible for metadata maintenance and synchronization of data from the database.
Environment: JavaScript, JSP, JDBC, HTML, XML.
Education and Professional Development
Jawaharlal Nehru Technological University, Computer Science and Engineering
Member, Hadoop Users Group of Atlanta
References
Available upon Request