
Hadoop Developer

Sunnyvale, California, United States
January 22, 2018






Overall 5+ years of IT experience, including 2+ years using Apache Hadoop, plus experience using Spark to analyze big data according to requirements.

In-depth knowledge of the Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, MapReduce programs, and the YARN paradigm.

Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.

Good exposure to Apache Hadoop MapReduce programming, Hive, Pig scripting, and HDFS.

Hands-on experience installing, configuring, monitoring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, Flume, Kafka, Oozie, Elasticsearch, Apache Spark, Impala, R, and QlikView.

Hands-on experience in the testing and implementation phases of big data technologies.

Strong experience writing MapReduce programs for data analysis. Hands-on experience writing custom partitioners for MapReduce.

Experience in working with Cloudera Hadoop distribution.

Experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL map-reduce systems, data modeling, database performance, and multi-terabyte data warehouses.

Experience in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).

Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.

Developed ETL processes to load data from multiple sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.

Hands-on experience in application development and database management using Java, RDBMS, Linux/Unix shell scripting, and Linux internals.

Good knowledge of Kafka as a real-time data feed platform.

Experience with Apache NiFi in Hortonworks DataFlow.

Analyzed and developed transformation logic for handling large sets of structured, semi-structured, and unstructured data using Hive.

Experience with Kerberos.

Good understanding of HDFS design, daemons, and HDFS high availability (HA).

Experience using Pig, Hive, and HBase, including Hive built-in functions.

Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.

Big data experience spanning storage, querying, processing, and analysis of data.

Good experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Flume, Kafka, and Apache Spark.

Good knowledge of creating custom SerDes in Hive.

Experience developing custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HiveQL (HQL); also used UDFs from the Piggybank UDF repository.

Experience understanding and managing log files, and managing Hadoop infrastructure with Cloudera Manager.

Experience with Spark Streaming.

Good experience with Hive partitioning and bucketing, and with performing different types of joins on Hive tables.

Capable of building Hive (HQL), Pig, and MapReduce scripts, and of adapting to and learning new tools, techniques, and approaches.

Experience in Cassandra and Spark (YARN).

Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.

Familiar with the Akka and Play frameworks.

Evaluated ETL and OLAP tools and recommended the most suitable solutions based on business needs.

Knowledge of machine learning.

Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships.

Familiar with Core Java, with a strong understanding and working knowledge of object-oriented concepts and related topics such as Collections, Multithreading, Data Structures, Algorithms, Exception Handling, and Polymorphism, as well as data mining with tools including Eclipse, Weka, R, and NetBeans.

Good knowledge and experience in Core Java, JSP, Servlets, Multi-Threading, JDBC, HTML.

Experience working in 24x7 support; used to meeting deadlines and adaptable to ever-changing priorities.



Programming: C, C++, Java.

Data Base: SQL, MySQL, HBase, MongoDB, Cassandra.

Operating Systems: Windows, various distributions of Linux/Unix (including Ubuntu).

Script: JavaScript, Shell Scripting.

Web Technology: HTML, CSS, JSP, Web Services, XML, JavaScript.


Eclipse, NetBeans, MS Office, Microsoft Visual Studio


Worked in most of the phases of Agile and Waterfall methodologies.

Web/Application servers

Apache Tomcat, WebLogic.

Domain Experience

Banking and financial services, Manufacturing and Retail.

Cluster Monitoring Tools

Ambari, Cloudera Manager.

Big Data

Hive, MapReduce, HDFS, Sqoop, R, Flume, Spark 2.1.1, Apache Kafka, HBase, Pig, Elasticsearch, AWS, Oozie, ZooKeeper, Foglight, Kerberos, Lambda Architecture, Apache Hue, Apache Tez, YARN, Talend, Storm, Impala, Tableau, and QlikView.


CLIENT: Apple - Sunnyvale, CA June 2017 - Present

Role: Spark Developer/Hadoop Developer

Environment: Spark 2.1.1, Hadoop, MapReduce, YARN, HDFS, Hive, Pig, HBase, SQL, Cloudera Manager, Sqoop, ZooKeeper, Kerberos, Oozie, Java, Eclipse, Weka, R, Flume, Foglight, Apache Kafka, Talend, Linux, Unix, Radar.


Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.

Developed Spark scripts in Java as per requirements.

Developed Spark programs using Java APIs to compare the performance of Spark with Hive and SQL.

Used the Spark API on Cloudera YARN to perform analytics on data in Hive.

Developed Java programs using RDDs and DataFrames/SQL/Datasets in Spark 1.6 and Spark 2.1 for data aggregation, queries, and writing data.

Extensively used ZooKeeper as a job scheduler for Spark jobs.

Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and RDDs.

Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.

Integrated Kerberos with the Hadoop cluster to strengthen it and secure it against unauthorized access.


Experienced in handling large datasets using partitions, Spark in-memory capabilities, Spark broadcasts, efficient joins, and transformations during the ingestion process itself.

Created Hive tables, and loaded and analyzed data using Hive queries.

Developed Hive queries to process the data and generate data cubes for visualization.

Converted SQL code to Spark code using Java and Spark SQL/Streaming for faster testing and processing of data.

Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.

Involved in developing the overall Spark Streaming application that replaced Flume.

Worked extensively with Hive, SQL, Java, Spark, and shell scripting.

Worked with the Protobuf format in Spark and Kafka.

Involved in the migration from Flume to Spark for faster processing.

Worked with the Foglight consumer, Flume consumer, and Athena consumer.

Learned company security practices related to software development, data, and access permissions, and how to incorporate them into the development life cycle.

Experience implementing the physical data model in a Netezza database.

Developed a data flow to pull data from a REST API using Apache NiFi.

Thorough understanding of delivering Microsoft Azure products using Agile methodology.

Worked on a 100-node multi-cluster setup on the Cloudera platform.

Migrated various Hive UDFs and queries to Spark SQL for faster requests.

Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Java.

Imported real-time data into Hadoop using Kafka and implemented the Oozie job for daily imports.

Developed Kafka producer and consumer components for real time data processing.

Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.

Experience configuring, designing, implementing, and monitoring Kafka clusters and connectors, and tuning Oracle SQL using explain plans.

Completed assigned Radars on time.

Stored code in Git repositories and worked with Phabricator.

Implemented Flume to import streaming log data and aggregate it into HDFS.

Analyzed data coming from various sources and created meta-files and control files to ingest the data into the data lake.

CLIENT: Regions Corporation Bank Jan 2016 – Aug 2017

Role: Hadoop Developer

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, SQL, Cloudera Manager, Sqoop, Oozie, Java, Eclipse, Weka, R, Flume, Tableau, Apache Kafka, Hortonworks, Talend, Spark, PuTTY.


Created reports for BI team using Sqoop to export data into HDFS and Hive.

Implemented partitioning, dynamic partitions, and bucketing in Hive.

Used Elasticsearch while exposing HDFS as a repository for long-term archival.

Migrated various Hive UDFs and queries to Spark SQL for faster requests.

Prepared custom shell scripts for connecting to Teradata and pulling the data from Teradata tables to HDFS.

Integrated Apache Kafka for data ingestion.

Used Tableau to build customized interactive reports, worksheets, and dashboards.

Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Java.

Imported real-time data into Hadoop using Kafka and implemented the Oozie job for daily imports.

Developed the same data flow for the SOAP web service to retrieve data from the API using Apache NiFi in HDF.

Monitored jobs and ran Hive queries using Hue.

Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.

Used data formats such as Avro and Parquet.

Worked on integration of big data and cloud platforms using Talend.

Developed and executed maintainable automation tests for acceptance, functional, and regression test cases.

Developed simple to complex MapReduce streaming jobs in Python, as well as jobs implemented using Hive and Pig.

Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.

Delivered the solution using Agile Methodology.

Implemented Nagios and integrated it with Puppet for automatic monitoring of servers known to Puppet.

Used JUnit for unit testing and Continuum for integration testing.

Used Spark for fast processing of data in Hive and HDFS.

Used Spark SQL for structured data processing with the DataFrames API and Datasets API.

Created a high-level design approach for building a data lake that incorporates the existing historical data and meets the need to process transactional data.

Implemented unit testing using JUnit during the projects.

Developed Hive queries to process the data and generate the results in a tabular format.

Utilized Agile Scrum methodology.

Wrote Hive queries for data analysis to meet business requirements.

Created Hive tables and worked on them using HiveQL.

Loaded and transformed large sets of structured, semi-structured, and unstructured data.

CLIENT: COVANCE - Indianapolis, IN Feb 2015 – Dec 2015

Role: Java Developer/Hadoop Developer

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java/J2EE, SQL, Cloudera Manager, Sqoop, Eclipse, Weka, R.


Created Hive tables and wrote Hive queries for data analysis to meet business requirements.

Used Sqoop to import and export data to and from MySQL.

Involved in processing unstructured health care records using Pig.

Integrated health care entities, including nursing and hospitals.

Involved in analyzing the medical billing scenarios patterned after the Client’s electronic logic library.

Experience in the installation, configuration, management, and deployment of big data solutions and the underlying Hadoop cluster infrastructure.

Experience importing and exporting terabytes of data using Sqoop from HDFS to relational database systems and vice versa.

Created HBase tables to store variable data formats of data coming from different portfolios.

Created workflow and coordinator using Oozie for regular jobs and to automate the tasks of loading the data into HDFS.

Developed business components using core Java concepts and classes such as Inheritance, Polymorphism, Collections, Serialization, and Multithreading.

Implemented JavaScript for client-side validations.

Designed and developed user interface static and dynamic web pages using JSP, HTML and CSS.

Involved in generating screens and reports in JSP, Servlets, HTML, and JavaScript for the business users.

Provided support and maintenance after deploying the web application.

Wrote Hive queries to join multiple tables based on business requirements.

Used complex data types like bags, tuples and maps in Pig for handling data.

Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

Developed simple to complex MapReduce jobs using Hive and Pig.

Experience implementing data transformation and processing solutions (ETL) using Hive.

Experience creating Oozie workflow jobs for MapReduce, Hive, and Sqoop actions.

Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.

Loaded files into HDFS and wrote Hive queries to process the required data.

Loaded data into Hive tables and wrote queries to process it.

Involved in loading data from the Linux file system into HDFS.

Experience developing Java MapReduce jobs.

Good knowledge of NoSQL databases (HBase).

Proficient in adapting to new work environments and technologies.

Experience in managing and reviewing Hadoop log files.

CLIENT: Wells Fargo – Hyderabad, India June 2012 - Nov 2014

Role: Java Developer

Environment: Windows, Linux, Java, HTML, CSS, Eclipse, Java Beans, SQL, XML, Multi-Threading.


Used Exception handling and Multi-threading for the optimum performance of the application.

Used the Core Java concepts to implement the Business Logic.

Key responsibilities included requirements gathering, designing and developing the applications.

Developed and maintained the necessary Java components, Enterprise Java Beans, Java Beans and Servlets.

Developed business components using core Java concepts and classes such as Inheritance, Polymorphism, Collections, Serialization, and Multithreading.

Implemented JavaScript for client-side validations.

Used MyEclipse as the IDE for all development and debugging purposes.

Developed Proof of Concepts and provided work/time estimates for design and development efforts.

Coordinated with the QA lead on development of the test plan, test cases, test code, and actual testing; was responsible for defect allocation and ensuring that defects were resolved.

Coordinated with the offshore team to provide requirements, resolve issues, and review deliverables.

Developed the application under J2EE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.

Involved in the analysis, design, development, and testing phases of the Software Development Lifecycle (SDLC) using Agile development methodology.

Designed and implemented the UI using HTML, JSP, JavaScript and Java.

Used JDBC to connect the web applications to databases.

Designed and developed user interface using JSP, HTML and JavaScript.

Developed the UI using JavaScript, JSP, HTML, and CSS for interactive cross browser functionality and complex user interface.

Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.

Developed UI using HTML and CSS.

Involved in system, unit, and integration testing.

Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions.

Resolved higher-priority defects as per the schedule.
