Spark/Hadoop Developer

Piscataway Township, New Jersey, 08854, United States
October 25, 2016


*+ years of overall IT experience in a variety of industries, including 3+ years of hands-on experience in Big Data technologies and in designing and implementing MapReduce.

•Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie and ZooKeeper.

•Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.

•Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.

•Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa according to client's requirement.

•Experience in data analysis using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.

•Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.

•Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.

•Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.

•Extensive experience in importing and exporting data using Flume and Kafka.

•Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.

•Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.

•Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP and AutoSys.

•Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.

•Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.

•Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.

•Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.

•Worked in large and small teams for systems requirement, design & development.

•Key participant in all phases of software development life cycle with Analysis, Design, Development, Integration, Implementation, Debugging, and Testing of Software Applications in client server environment, Object Oriented Technology and Web based applications.

•Experience using IDEs (Eclipse, IntelliJ) and version control repositories (SVN, CVS).

•Experience using the build tools Ant and Maven.

•Prepared standard coding guidelines and analysis and testing documentation.
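The Oozie item above describes a workflow as a directed acyclic graph of actions. As a rough illustration only, the sketch below orders a hypothetical set of actions so that each one runs after the actions it depends on. The action names are invented for the example, and real Oozie workflows are defined in XML with explicit transitions rather than in Python.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "sqoop_import": set(),
    "hive_load": {"sqoop_import"},
    "pig_cleanse": {"sqoop_import"},
    "spark_aggregate": {"hive_load", "pig_cleanse"},
    "sqoop_export": {"spark_aggregate"},
}


def execution_order(dag):
    """Return one valid run order; graphlib.CycleError is raised if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Any order returned here starts with the import and ends with the export; the two middle actions may appear in either order because neither depends on the other.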


Big Data/Hadoop Technologies

HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie.

NoSQL Databases

HBase, Cassandra, MongoDB


Languages

C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting.

Java & J2EE Technologies

Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB.

Application Servers

WebLogic, WebSphere, JBoss, Tomcat.

Cloud Computing Tools

Amazon AWS.


Databases

Microsoft SQL Server, MySQL, Oracle, DB2

Operating Systems

UNIX, Windows, LINUX.

Build Tools

Jenkins, Maven, ANT.

Business Intelligence Tools

Tableau, Splunk

Development Tools

Microsoft SQL Studio, Toad, Eclipse, NetBeans.

Development Methodologies

Agile/Scrum, Waterfall.


Citi Bank-NYC, NY Nov 14 to Present

Sr. Hadoop / Spark Developer

The primary objective of this project is to integrate Hadoop (Big Data) with the Relationship Care Application to leverage the raw and processed data that the big data platform owns. It will provide an enriched customer experience by delivering customer insights, profile information and the customer journey. This will allow us to prioritize conversations to drive value generation and provide a 360-degree view of an account member. Responsible for configuring and implementing the entire Big Data stack of the project. The technical stack included Spark, Kafka, Sqoop and Oozie.


•Responsible for building scalable distributed data solutions using Hadoop.

•Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.

•Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.

•Configured, deployed and maintained multi-node Dev and Test Kafka clusters.

•Developed Spark scripts by using Scala shell commands as per the requirement.

•Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

•Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.

•Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.

•Loaded data into Spark RDDs and performed in-memory computation to generate the output response.

•Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

•Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.

•Performed advanced procedures such as text analytics and processing in Scala, using the in-memory computing capabilities of Spark.

•Experienced in handling large datasets during the ingestion process itself, using partitions, Spark in-memory capabilities, broadcasts in Spark, efficient joins and other transformations.

•Worked on migrating Map Reduce programs into Spark transformations using Spark and Scala.

•Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores for data access and analysis.

•Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications, with a view to implementing the former in the project.

•Worked on a cluster of 130 nodes.

•Worked extensively with Sqoop for importing metadata from Oracle.

•Good Experience working with Amazon AWS for setting up Hadoop cluster.

•Involved in creating Hive tables, and in loading and analyzing data using Hive queries.

•Developed Hive queries to process the data and generate the data cubes for visualization.

•Implemented schema extraction for Parquet and Avro file formats in Hive.

•Good experience with Talend Open Studio for designing ETL jobs for data processing.

•Implemented Partitioning, Dynamic Partitions and Buckets in Hive.

Environment: Hadoop YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Talend, Cloudera, Oracle 10g, Linux.
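The Hive partitioning and bucketing listed above can be pictured with a small sketch: a partition is a directory named after column values, and a bucket is chosen by hashing a key modulo the bucket count. The table path, column names and the CRC32 hash below are assumptions made for illustration; Hive uses its own hash function, so the bucket numbers here will not match what Hive would assign.

```python
import zlib


def partition_dir(table_root, partition_cols, row):
    """Build a Hive-style partition path such as root/dt=.../region=..."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_root] + parts)


def bucket_for(key, num_buckets):
    """Pick a bucket by hashing the key; CRC32 is a stand-in, not Hive's hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


# Hypothetical row and table layout.
row = {"txn_id": 1001, "customer_id": "C42", "dt": "2016-10-25", "region": "NJ"}
path = partition_dir("/warehouse/txns", ["dt", "region"], row)
bucket = bucket_for(row["customer_id"], 8)
print(path, bucket)
```

Partitioning lets a query skip whole directories when it filters on the partition columns, while bucketing keeps rows with the same key in the same file, which helps with joins and sampling.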

Target-Minneapolis, MN Sept 13 to Oct 14

Hadoop Developer

Current CDI processes, including address cleansing, trade-area capture, householding and transaction matching, run on various cycles. Some processes run as frequently as daily and others as infrequently as monthly. As a result, we cannot consistently recognize customers and match transactions to customers on a more frequent and flexible schedule. This project would coordinate these processes into consistent and timely cycles and improve our ability to match transactions to customers. The goal is to match 75% of sales to a known customer within 24 hours of the transaction.
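The 24-hour matching goal above can be made concrete with a small worked example. This is only a sketch: it assumes the 75% figure is measured by transaction count rather than by sales value, and the field names `sold_at` and `matched_at` and the sample records are hypothetical.

```python
from datetime import datetime, timedelta

GOAL = 0.75  # target share of sales matched within the window


def match_rate_within(transactions, window=timedelta(hours=24)):
    """Share of transactions matched to a customer within `window` of the sale."""
    if not transactions:
        return 0.0
    matched = sum(
        1
        for t in transactions
        if t["matched_at"] is not None and t["matched_at"] - t["sold_at"] <= window
    )
    return matched / len(transactions)


def meets_goal(rate, goal=GOAL):
    return rate >= goal


# Hypothetical sample: two matched in time, one matched late, one never matched.
sales = [
    {"sold_at": datetime(2014, 1, 1, 9, 0), "matched_at": datetime(2014, 1, 1, 15, 0)},
    {"sold_at": datetime(2014, 1, 1, 10, 0), "matched_at": datetime(2014, 1, 2, 9, 30)},
    {"sold_at": datetime(2014, 1, 1, 11, 0), "matched_at": datetime(2014, 1, 3, 11, 0)},
    {"sold_at": datetime(2014, 1, 1, 12, 0), "matched_at": None},
]
rate = match_rate_within(sales)
print(rate, meets_goal(rate))
```

In this sample only two of the four sales are matched within 24 hours, so the rate is 0.5 and the goal is not met.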


•Analyzed the Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.

•Took on the additional responsibility of managing a fully distributed Hadoop cluster. Was trained to take over the responsibilities of a Hadoop Administrator, which include managing the cluster, upgrades and installation of tools that use the Hadoop ecosystem.

•Installed and configured ZooKeeper to coordinate and monitor the cluster resources.

•Implemented test scripts to support test-driven development and continuous integration.

•Consumed data from Kafka queues using Spark.

•Configured different topologies for the Spark cluster and deployed them on a regular basis.

•Loaded and transformed large sets of structured, semi-structured and unstructured data.

•Loaded data from the Linux file system into HDFS.

•Imported and exported data into HDFS and Hive using Sqoop.

•Implemented Partitioning, Dynamic Partitions and Buckets in Hive.

•Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).

•Used reporting tools such as Tableau, connected to Hive through the ODBC connector, to generate daily reports of data.

•Ran Hadoop streaming jobs to process terabytes of XML-format data.

•Scheduled Oozie workflows to run multiple Hive and Pig jobs.

•Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.

•Loaded data files from various external sources such as Oracle and MySQL into a staging area in MySQL databases.

•Actively involved in code review and bug fixing to improve performance.

•Involved in development, building, testing and deployment to the Hadoop cluster in distributed mode.

•Created Linux scripts to automate the daily ingestion of IVR data.

•Implemented spouts and bolts for Apache Storm processing.

•Processed the raw data using Hive jobs and scheduled them with crontab.

•Helped the Analytics team with Aster queries using HCatalog.

•Good experience with Apache Storm on a Hortonworks cluster.

•Created HBase tables to store various data formats of incoming data from different portfolios.

•Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.

•Automated the History and Purge Process.

•Developed the verification and control process for the daily load.

•Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.

Environment: Hadoop, HDFS, Pig, Hive, Sqoop, Kafka, Apache Spark, Storm, Solr, Shell Scripting, HBase, Kerberos, ZooKeeper, Ambari, Hortonworks, MySQL.

Prudential-Newark, NJ Sep 12 to Aug 13

Big Data/Hadoop Developer

Prudential provides a wide range of insurance policies and handles payroll-deduction insurance coverage for policy holders, using Apache Hadoop to do so.


•Worked on the proof of concept for initiating the Apache Hadoop 1.20.2 framework.

•Installed and configured Hadoop clusters and the ecosystem.

•Developed automated scripts to install Hadoop clusters.

•Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing and deployment of the Hadoop cluster in fully distributed mode.

•Mapped the DB2 V9.7 and V10.x data types to Hive data types and validated them.

•Loaded and retrieved unstructured data (CLOB, BLOB, etc.).

•Developed Hive jobs to transfer 8 years of bulk data from DB2 to the HDFS layer.

•Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.

•Built a job automation framework to support and operationalize data loads.

•Automated the DDL creation process in Hive by mapping the DB2 data types.

•Monitored Hadoop cluster job performance and performed capacity planning.

•Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.

•Gained experience in the Hadoop framework, HDFS and MapReduce processing implementation.

•Tuned Hadoop performance with high availability and was involved in the recovery of Hadoop clusters.

•Responsible for coding Java batch jobs, RESTful services, MapReduce programs and Hive queries, and for testing, debugging, peer code review, troubleshooting and maintaining status reports.

•Used Avro and Parquet file formats for serialization of data.

•Good experience with Informatica PowerCenter.

•Developed several test cases using MRUnit for testing MapReduce applications.

•Responsible for troubleshooting and resolving the performance issues of the Hadoop cluster.

•Used Bzip2 compression to compress files before loading them into Hive.

•Supported and troubleshot Hive programs running on the cluster and fixed issues arising out of duration testing.

•Prepared daily and weekly project status reports and shared them with the client.

Environment: Hadoop, Map Reduce, Flume, Sqoop, Hive, Pig, Restful Service, Linux, Core Java, HBase, Informatica, Avro, Cloudera, MR Unit, MS-SQL Server, DB2
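The automated DDL creation described above (mapping DB2 data types to Hive data types) might look roughly like the sketch below. The type mapping, the table and column names, and the choice of `STORED AS PARQUET` are all assumptions made for illustration, not the mapping used on the project; which Hive types are available also depends on the Hive version.

```python
import re

# Assumed DB2 -> Hive type mapping, for illustration only.
DB2_TO_HIVE = {
    "SMALLINT": "SMALLINT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "REAL": "FLOAT",
    "DOUBLE": "DOUBLE",
    "CHAR": "STRING",
    "VARCHAR": "STRING",
    "CLOB": "STRING",
    "BLOB": "BINARY",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def hive_type(db2_type):
    """Translate one DB2 column type, e.g. 'VARCHAR(40)' or 'DECIMAL(12, 2)'."""
    m = re.fullmatch(r"\s*([A-Za-z]+)\s*(\(.*\))?\s*", db2_type)
    if m is None:
        raise ValueError(f"cannot parse DB2 type: {db2_type!r}")
    base, args = m.group(1).upper(), m.group(2) or ""
    if base == "DECIMAL":
        # Keep precision and scale as given.
        return "DECIMAL" + args.replace(" ", "")
    if base not in DB2_TO_HIVE:
        raise ValueError(f"no Hive mapping for DB2 type: {db2_type!r}")
    return DB2_TO_HIVE[base]


def hive_ddl(table, columns):
    """Render a CREATE TABLE statement from (name, db2_type) pairs."""
    cols = ",\n".join(f"  `{name}` {hive_type(t)}" for name, t in columns)
    return f"CREATE TABLE `{table}` (\n{cols}\n) STORED AS PARQUET;"


# Hypothetical source table definition.
ddl = hive_ddl(
    "policy",
    [("policy_id", "INTEGER"), ("premium", "DECIMAL(12, 2)"), ("notes", "CLOB(1M)")],
)
print(ddl)
```

Raising an error on an unmapped type, rather than defaulting to `STRING`, makes gaps in the mapping visible before any data is loaded.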

Intergraph-Hyd, India Mar 10 to Aug 12

Sr. Java/J2EE Developer

The Risk Development workflow is a business process used to measure and generate reports for all kinds of risk involved in the risk derivative process. In the Risk Development workflow, I worked on the module that generates out-of-the-box reports for both the web-based portal and the report center, using tracked data in Excel format from Process Center and Performance Data Warehouse.


•Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.

•Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.

•Applied OOAD principles for the analysis and design of the system.

•Implemented XML Schema as part of XQuery query language

•Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.

•Used RAD for the Development, Testing and Debugging of the application.

•Used WebSphere Application Server to deploy the build.

•Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.

•Used J2EE for the development of business layer services.

•Developed Struts Action Forms, Action classes and performed action mapping using Struts.

•Performed data validation in Struts Form beans and Action Classes.

•Developed a POJO-based programming model using the Spring framework.

•Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.

•Used Web Services to connect to mainframe for the validation of the data.

•SOAP has been used as a protocol to send request and response in the form of XML messages.

•JDBC framework has been used to connect the application with the Database.

•Used Eclipse for the Development, Testing and Debugging of the application.

•Log4j framework has been used for logging debug, info & error data.

•Used the Hibernate framework for Object-Relational Mapping.

•Used Oracle 10g database for data persistence and SQL Developer was used as a database client.

•Extensively worked on Windows and UNIX operating systems.

•Used SecureCRT to transfer file from local system to UNIX system.

•Performed Test Driven Development (TDD) using JUnit.

•Used Ant script for build automation.

•PVCS version control system has been used to check-in and checkout the developed artifacts. The version control system has been integrated with Eclipse IDE.

•Used Rational ClearQuest for defect logging and issue tracking.

Environment: Windows XP, Unix, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, Servlet, JSP, HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Oracle 10g, WinSCP, Log4J, JUnit.

Avon Technologies - Hyd, India Sep 07 to Feb 10

Java/J2EE Developer

E-Check Payment System involves the development of ‘software modules’ for the generation and verification of ‘Electronic Check.’ The Payee System is used to verify payer's signatures. The Payee Bank Server processes and verifies the ‘deposited endorsed E-Checks’ and ‘Clears and Settles’ the transaction.


•Designed and developed the application using agile methodology.

•Implemented new modules and change requirements and fixed code. Fixed defects identified in pre-production and production environments.

•Wrote the technical design document with class, sequence and activity diagrams for each use case.

•Created wiki pages for documentation using Confluence.

•Developed various reusable helper and utility classes which were used across all modules of the application.

•Involved in developing XML compilers using XQuery.

•Developed the Application using Spring MVC Framework by implementing Controller, Service classes.

•Involved in writing the Spring configuration XML file that declares objects and their dependencies.

•Used Hibernate as the persistence framework for ORM and created DAOs.

•Wrote Java classes to test the UI and web services through JUnit.

•Performed functional and integration testing and was extensively involved in critical release and deployment activities.

•Responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS and HTML.

•Used SVN for version control. Log4J was used to log both User Interface and Domain Level Messages.

•Used SoapUI for testing the web services.

•Used Maven for dependency management and project structure.

•Created the deployment document for various environments such as Test, QC and UAT.

•Involved in system wide enhancements supporting the entire system and fixing reported bugs.

•Explored Spring MVC, Spring IOC, Spring AOP, and Hibernate in creating the POC.

•Performed data manipulation on the front end using JavaScript and JSON.

Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema


Bachelor of Technology in Computer Science, Anna University, India (2007)