Senior Hadoop Developer

Malvern, Pennsylvania, 19355, United States
March 01, 2018


Around * years of professional experience in various IT sectors such as healthcare, finance, insurance, and retail, including 4 years of experience with Big Data and the Hadoop ecosystem.

Extensive development experience with Hadoop ecosystem components such as Spark, Hive, Kafka, Impala, HBase, MapReduce, Pig, Sqoop, YARN and Oozie.

Strong programming experience using Java, Scala, Python and SQL.

Strong fundamental understanding of Distributed Systems Architecture and parallel processing frameworks.

Strong experience designing and implementing end-to-end data pipelines running on terabytes of data.

Expertise in developing production-ready Spark applications using the Spark Core, DataFrames, Spark SQL, Spark ML and Spark Streaming APIs.

Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.

Experience using DStreams in Spark Streaming, accumulators, broadcast variables, various levels of caching and other optimization techniques in Spark.

Strong experience working with data ingestion tools Sqoop and Kafka.

Good knowledge of and development experience with the MapReduce framework.

Hands-on experience writing ad hoc queries for moving data from HDFS to Hive and analyzing data using HiveQL.

Proficient in creating Hive DDLs and writing custom Hive UDFs.

Knowledge of job workflow management and monitoring tools such as Oozie and Rundeck.

Experience in designing, implementing and managing secure authentication mechanisms for Hadoop clusters with Kerberos.

Experience working with NoSQL databases such as HBase, Cassandra and MongoDB.

Experience in ETL processes consisting of data sourcing, mapping, transformation, conversion and loading.

Good knowledge of creating ETL jobs with Talend to load huge volumes of data into the Hadoop ecosystem and relational databases.

Experience working with Cloudera, Hortonworks and Amazon AWS EMR distributions.

Good experience in developing applications using Java, J2EE, JSP, MVC, EJB, JMS, JSF, Hibernate, AJAX and web-based development tools.

Strong experience in RDBMS technologies like MySQL, Oracle and Teradata.

Strong expertise in creating Shell-Scripts, Regular Expressions and Cron Job Automation.

Good knowledge of Web Services, SOAP programming, WSDL, XML parsers such as SAX and DOM, AngularJS, and responsive design with Bootstrap.

Experience with various version control systems such as CVS, TFS, SVN.

Worked with geographically distributed and culturally diverse teams, including roles that involved interaction with clients and team members.


Big Data Eco System

Hadoop, HDFS, MapReduce, Hive, Pig, Impala, HBase, Sqoop, NoSQL (HBase), Spark, Spark Streaming, Zookeeper, Oozie, Kafka, Flume, Hue, Cloudera Manager, Amazon AWS, Hortonworks

Java/J2EE & Web Technologies



C, C++, Core Java, Shell Scripting, PL/SQL, Python, Pig Latin, Scala

Scripting Languages

JavaScript and UNIX Shell Scripting, Python

Operating Systems

Windows, MacOS, Linux and Unix


UML, Rational Rose, Microsoft Visio, E-R Modelling


Oracle 11g/10g/9i, Microsoft SQL Server 2012/2008, MySQL, DB2 and NoSQL, Teradata SQL, RDBMS, MongoDB, Cassandra, HBase

IDE and Build Tools

Eclipse, NetBeans, Microsoft Visual Studio, Ant, Maven, JIRA, Confluence

Version Control




Web Services


Web Servers

WebLogic, WebSphere, Apache Tomcat, Jetty

Professional Experience:

Client : Vanguard June 2016 - Present

Location : Malvern, PA

Role : Sr. Hadoop Developer

Project Description: One of the world's largest investment management companies, headquartered in Valley Forge, PA; caters to individual investors & institutions; areas of activity: mutual funds, ETFs, annuity products, brokerage, retirement investing, and advice. The project delivers insights from clickstream data combined with enterprise data for a comprehensive view of customers and their subscription and browsing behaviors.


Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.

Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.

Developed Scala-based Spark applications for data cleansing, event enrichment, data aggregation, de-normalization and data preparation for consumption by the machine learning and reporting teams.

Troubleshot Spark applications to make them more error tolerant.

Fine-tuned Spark applications to improve the overall processing time of the pipelines.

Wrote Kafka producers to stream data from external REST APIs to Kafka topics.

Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.

Handled large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other features.

Worked extensively with Sqoop for importing data from Oracle.

Worked with EMR clusters in the AWS cloud and with S3.

Involved in creating Hive tables, and in loading and analyzing data using Hive scripts.

Implemented partitioning, dynamic partitions and buckets in Hive.

Good experience with continuous integration of applications using Bamboo.

Used reporting tools such as Tableau, connected to Impala, to generate daily reports.

Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Documented operational problems in JIRA, following standards and procedures.

Environment: Hadoop 2.x, Spark, Scala, Hive, Sqoop, Oozie, Kafka, Amazon EMR, ZooKeeper, Impala, YARN, JIRA, Kerberos, Amazon AWS, Shell Scripting, SBT, GitHub, Maven.

Client : Thomson Reuters Apr 2014 - May 2016

Location : Dallas, TX

Role : Hadoop Developer

Project Description: NY-based global company offering intelligent, information-based solutions, software tools, & applications for professionals in finance, pharma, marketing, engineering sectors etc.; formed by the 2008 merger of Thomson Corporation & Reuters Group PLC. This project rehouses the existing data warehousing setup on a Hadoop-based platform.


Involved in requirement analysis, design, coding and implementation phases of the project.

Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.

Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and Spark SQL APIs.

Wrote new Spark jobs in Scala to analyze customer data and sales history.

Used Kafka to get data from many streaming sources into HDFS.

Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

Used Hive partitioning, bucketing and collections, and performed different types of joins on Hive tables.

Created Hive external tables to perform ETL on data that is generated on a daily basis.

Wrote HBase bulk load jobs to load processed data into HBase tables by converting it to HFiles.

Performed validation on the data ingested to filter and cleanse the data in Hive.

Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.

Loaded data into Hive tables from Spark using the Parquet columnar format.

Developed Oozie workflows to automate and productionize the data pipelines.

Developed Sqoop import scripts for importing reference data from Netezza.

Environment: HDFS, Hadoop, Pig, Hive, HBase, Sqoop, Kafka, Teradata, MapReduce, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, Amazon Web Services, Maven, Agile Methodology, JIRA, Linux.

Client : Vera Bradley Aug 2012 - March 2014

Location : Fort Wayne, IN

Role : Hadoop Developer

Project Description: Company selling handbags, totes, stationery, baby bags, travel items, laptop backpacks, neck ties, cuff links, & eyewear through Vera Bradley stores, 3,000 independent retailers across the world, and online; based in Fort Wayne, Indiana. The project goal is to create a centralized data warehouse in Hadoop, integrate it with various data sources, and build a framework to extract, transform and load data into the warehouse. The framework automatically propagates data through several stages such as Preparation, Staging and Journaling. This centralized data is used for Enterprise Business Intelligence.


Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Developed custom MapReduce programs and custom Hive User Defined Functions (UDFs) to transform large volumes of data according to business requirements.

Wrote MapReduce jobs using the Java API and Pig Latin.

Extracted data from flat files and RDBMS databases into a staging area and ingested it into Hadoop.

Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.

Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.

Developed numerous Pig batch programs for both implementation and optimization needs.

Used HBase together with Hive/Pig as required.

Created Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow.

Loaded data into HDFS from data sources such as Oracle and DB2 using Sqoop, then loaded it into Hive tables.

Integrated the Hive warehouse with HBase for information sharing among teams.

Developed complex Hive UDFs to work with sequence files.

Designed and developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of MapReduce outputs.

Created dashboards in Tableau to create meaningful metrics for decision making.

Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.

Monitored system health and logs and responded to any warning or failure conditions.

Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.

Worked with the Avro data serialization system to handle JSON data formats.

Implemented counters for diagnosing problems in queries, for quality control and for application-level statistics.

Performed end-to-end performance tuning of Hadoop clusters and MapReduce routines against very large data sets.

Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.

Involved in defining Oozie job flows to schedule and manage Apache Hadoop jobs as directed acyclic graphs (DAGs) of actions with control flows.

Participated in Agile practices, including daily Scrum meetings and sprint planning.

Environment: HDFS, HBase, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Oozie, Avro, HDP Distribution, Eclipse, Log4j, JUnit, Linux.

Client : Blue Fountain Media Oct 2010 - Jul 2012

Location : New York, NY

Role : Java Developer

Project Description: New York City-based web design firm located on Madison Avenue; services: content management, web development, online marketing, SEO & marketing, logo design, flash demos & animation, print design, copywriting etc; has offices in the USA & Europe. This project's goal was to make a centralized site for all users - one that offers access to a variety of products and contacts in the easiest, most functional way possible.


Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).

Used Rational Rose for developing Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.

Used Spring for cross-cutting concerns and IoC for dependency injection.

Implemented application-level persistence using Hibernate and Spring.

Consumed and exposed various web services using JAX-RS for systems such as NPI validation and address validation.

Implemented the core Java logic for inventory cost.

Developed complex web services and tailored the JAX-RS API to suit the requirements.

Developed UI models using HTML, JSP, JavaScript, AJAX, Web Link and CSS.

Wrote custom JavaScript and CSS to maintain a user-friendly look and feel.

Wrote jQuery functions while implementing various UI screens across the whole web application.

Wrote application-level code to perform client-side validation using jQuery and JavaScript.

Focused primarily on Spring components such as Spring MVC, Dispatcher Servlets, Controllers, Model and View Objects, and View Resolver.

Wrote complex named SQL queries using Hibernate.

Generated POJO classes with JPA Annotations using Reverse Engineering.

Developed the application using IntelliJ IDE.

Used Log4j and JUnit for debugging, testing and maintaining the system state.

Used SoapUI for testing the web services.

Used SVN to maintain source and version management.

Used JIRA to manage issues and project workflow.

Implemented SOLID design principles throughout the development of the project.

Unit tested all classes using JUnit at the class and method levels.

Environment: Java/Java EE 5, JSP 2.1, Spring 2.5, Spring MVC, Hibernate 3.0, Web services, JAX-RS, Rational Rose, WADL, SoapUI, HTML, CSS, JavaScript, AJAX, JSON, jQuery, Maven, JMS, log4j, Jenkins, JPA, Oracle, MySQL, SQL Developer, JIRA, SVN, PL/SQL, WebLogic 10.3, IntelliJ, UNIX.

Client : Sanofi India June 2009 - Sep 2010

Location : Mumbai, Maharashtra

Role : Java Developer

Project Description: Global pharma company that operates through two entities: Aventis Pharma Ltd and Sanofi-Synthelabo (India) Ltd; Sanofi-Aventis owns 50.1% of Aventis Pharma Ltd through its 100% subsidiary Hoechst GmbH.


Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.

Studied open-source frameworks and debugged code using Eclipse.

Used the Spring Framework to build an application architecture based on the MVC (J2EE design patterns) paradigm.

Implemented RESTful API Web Services.

Implemented asynchronous client-server interactions using AJAX and jQuery.

Configured the Hibernate files for the project's libraries.

Used Bootstrap to build the responsive design of the web pages.

Created wireframes to design the structure of the project.

Involved in system design and development in core Java using Collections, multithreading and exception handling.

Designed the user interface using HTML, CSS, Servlets and JSP.

Implemented templates for different rules for accessing different applications.

Performed client-side validation using JavaScript.

Developed Web Pages using HTML, DHTML and CSS.

Actively involved in the integration of different use cases, code reviews and refactoring.

Used Log4J to maintain user-defined logs on the system.

Created unit test cases using JUnit for end-to-end testing.

Actively worked with the client to collect requirements for the project.

Involved in implementing the Software Development Life Cycle (SDLC), including Development, Testing, Implementation and Maintenance Support.

Environment: Spring, Core Java, HTML, DHTML, Log4J, UNIX OS, CSS, JavaScript, AJAX, jQuery, Eclipse IDE, RESTful Web Service, Maven, UML, Java Mail API, Hibernate, MVC, JSP, JUnit, wireframes.
