Hadoop Developer

Secaucus, New Jersey, United States
September 12, 2018

Hadoop and Spark Developer

Phone Number: 973-***-****


My 7+ years of IT industry experience includes 4 years of hands-on work with Big-Data Technologies.

As a Hadoop and Spark Developer, I have worked in the financial, retail and health-care sectors, which involve very large data volumes, and have dealt extensively with data ingestion, storage, querying, processing and analysis on large data sets.

• Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop Cluster.

• Experience with the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Kafka, Sqoop, ZooKeeper, YARN, Spark (PySpark & Spark-shell), Cassandra and NiFi.

• Experience in Python, Scala, Java, SQL and Shell programming.

• Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture. Good knowledge of Spark and Kafka.

• Hands-on experience with Spark Core, Spark SQL and Spark Streaming using Scala and Python.

• Excellent understanding of Hadoop architecture and its components such as HDFS, NameNode and DataNode, and of the MapReduce programming paradigm.

• Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins, partitioning and bucketing. Good knowledge of NoSQL databases such as MongoDB and Cassandra.

• Worked with multiple file formats such as Avro, Parquet, CSV, JSON, SequenceFile and ORC.

• Experience utilizing Java tools in Business, Web, and Client-Server environments including Java, JDBC, Servlets, JSP, Struts Framework, Jasper Reports and SQL.

• Fluent in multiple programming and markup languages, including C, C++, JavaScript, HTML, and XML.

• Experienced in using version control tools such as Subversion and Git.


DATA INGESTION: Sqoop, Kafka, Flume, HDFS Commands

DATA PROCESSING: Spark, YARN, Hive, Pig, MapReduce



LANGUAGES: C, C++, Python, Scala, Java, Shell, SQL


ETL: Talend, DataStage, ODI

MONITORING: Ambari, Cloudera Manager

DISTRIBUTIONS: Cloudera, Hortonworks


BUILD TOOLS: ANT, Maven, Gradle, SBT

SDLC - Methodologies: Agile Methodology, Waterfall Model

CLOUD: AWS (EMR, EC2, S3, DynamoDB)


Bachelor of Engineering - CBIT, Hyderabad.


Role: Senior Hadoop Developer Feb 2017 to Present

Client: BCBS, Cranbury, NJ

BCBS uses data from various sectors to manage and improve its efficiency. The data must be processed constantly to keep analytics up to date.


• Ingested incremental batch data from MySQL and Teradata databases into HDFS using Sqoop at scheduled intervals.

• Involved in ingesting real-time data into HDFS using Kafka and implemented an Oozie job for daily imports.

• Worked on Amazon Web Services (AWS), using Elastic MapReduce (EMR) for data processing with S3 for storage.

• Used different Spark modules and APIs, including Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.

• Involved in converting files in HDFS from multiple data formats into RDDs and performing data cleansing using RDD operations.

• Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.

• Worked with various HDFS file formats such as Avro, ORC and SequenceFile, and with compression formats such as Snappy and gzip.

• Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs (such as Java programs and shell scripts).

Environment: HDFS, Spark, Hive, Sqoop, Kafka, AWS EMR, AWS S3, Oozie, Spark Core, Spark SQL, Maven, Scala, SQL, Linux, YARN, IntelliJ, Agile Methodology

Role: Spark & Hadoop Developer Aug 2015 to Jan 2017

Client: Optum, New York, NY

Migrated mainframe IMS (Information Management System) and traditional ETL data warehousing workloads to a Hadoop data lake and processing systems.


• Responsible for building scalable distributed data solutions using Hadoop and Spark.

• Developed data ingestion pipelines using Sqoop and Kafka to ingest database tables and streaming data into HDFS for analysis.

• Developed a Spark Streaming application to receive data streams from Kafka, process the continuous streams and trigger actions based on fixed events.

• Teamed up with architects to design a Spark Streaming model for the existing MapReduce model and migrated the MapReduce models to Spark using Scala.

• Used Hive to analyze the partitioned and bucketed data and compute various metrics for creating dashboards in Tableau.

Environment: Hadoop - HDFS, Spark - SQL & Streaming, Kafka, Sqoop, Hive, Core Java, Scala, Unix Shell Scripting, Oozie workflows, Ambari - Hortonworks, Informatica PowerCenter.

Role: Big Data Engineer Aug 2014 to Jul 2015

Client: Geodis, Grapevine, TX

The goal of this project was to build a Hadoop data lake by migrating data from IMS mainframe systems. Data pipelines were developed and tested on the Cloudera distribution.


• Implemented Kafka consumers for HDFS and Spark Streaming.

• Utilized Sqoop, Kafka, Flume and Hadoop File System APIs to implement data ingestion pipelines from heterogeneous data sources.

• Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topics into AWS S3 storage.

• Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.

• Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.

• Created a data pipeline covering ingestion, aggregation and loading of consumer response data from an AWS S3 bucket into Hive external tables, and generated views to serve as a feed for Tableau dashboards.

• Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet and XML.

• Used Apache NiFi to automate data movement between different Hadoop components and to convert raw XML data into JSON and Avro.

Environment: Hadoop, HDFS, AWS, Scala, Kafka, MapReduce, YARN, Spark, Pig, Hive, Java, NiFi, HBase, IMS Mainframe, Maven.

Role: Java/J2EE Developer Oct 2012 to July 2014

Client: Infosys, Mysore

This project focused on building a server to host templates to be consumed by a client’s link application.


• Developed the application using the Spring Framework, leveraging Model View Controller (MVC) architecture, Spring Security and the Java API.

• Implemented design patterns such as Singleton, Factory and MVC.

• Deployed the applications on IBM WebSphere Application Server.

• Worked on JavaScript, CSS, RichFaces and jQuery.

• Worked one-on-one with the client to develop the layout and color scheme for their website and implemented it as a final interface design with HTML5/CSS3 and JavaScript using Dreamweaver.

• Used advanced HTML5, JavaScript, jQuery, CSS3 and pure CSS layouts (tableless layout).

• Wrote SQL queries to extract data from Oracle and MySQL databases.

• Involved in JUnit testing for all test case scenarios.

• Used CVS for version control across common source code used by developers.

Environment: Java, Oracle 11g Express, CVS, Struts, Spring 3.0, HTML, CSS, JavaScript, Apache Tomcat, Eclipse IDE, REST, Maven, JUnit

Role: Junior Java Developer May 2011 to Aug 2012

Client: Active Health Management, Hyderabad

Active Health Management provides population health management solutions designed to manage health care costs. The company uses leading-edge analytics and evidence-based clinical standards to offer an integrated population health program.


• Responsible for designing rich user interface applications using JavaScript, CSS, HTML and AJAX.

• Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller and MVC.

• Developed EJB components to implement business logic using session and message-driven beans.

• Excellent working experience with Oracle 10g, including storing and retrieving data using Hibernate.

• Built and deployed the application on WebLogic Application Server.

• Developed and executed unit test cases using the JUnit framework in support of TDD.

• Provided extensive pre-delivery support through bug fixing and code reviews.

Environment: J2EE, JDK 1.5, Spring 2.5, Struts 1.2, JSP, Servlets, EJB 3.0, Hibernate 3.0, Oracle 10g, PL/SQL, CSS, Ajax, HTML, JavaScript, Log4j, JUnit, SOAP, Web Services.

