Hadoop Developer

Location:

Altamonte Springs, FL, 32701

Posted:

December 01, 2017

Resume:

Riya Ratnani

Sr. Hadoop Developer

****.********@*****.***

678-***-****

Professional Summary:

Over 8 years of professional experience in project development, implementation, deployment and maintenance using Java/J2EE, Hadoop and Spark related technologies on Cloudera and Hortonworks distributions.

Hadoop Developer with 5 years of working experience in designing and implementing complete end-to-end Hadoop based data analytical solutions using HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase etc.

Experience in installation, configuration, management and deployment of Hadoop Cluster, HDFS, Map Reduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase and Zookeeper.

In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Map Reduce, Spark.

Experience using Hadoop ecosystem components as data storage and retrieval systems.

Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.

Profound experience in creating real time data streaming solutions using Apache Spark/Spark Streaming, Kafka.

Good knowledge on Spark Ecosystem and Spark Architecture.

Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing.

Good Hands-on full life cycle implementation using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.

Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.

Profound understanding of Partitions and Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.

Experience in handling messaging services using Apache Kafka.

Experience with migrating data to and from RDBMS into HDFS using Sqoop.

Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.

Worked on NoSQL databases including HBase, Cassandra and MongoDB.

Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.

Experience in working with the Java HBase API for ingesting processed data into HBase tables.

Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce, Hive and Pig jobs.

Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark-SQL, Data Frame, Pair RDDs and Spark on YARN.

Extensive knowledge in programming with Resilient Distributed Datasets (RDDs)

Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.

Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.

Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.

Extensive experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading in Talend.

Used Talend for ETL processing based on business needs and extensively used Oozie workflow engine to run multiple Hive and Pig jobs.

Experience with Talend and Informatica/Data Exchange.

Solid experience in developing workflow using Oozie for running Map Reduce jobs and Hive Queries.

Experience in managing and reviewing Hadoop log files.

Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.

Experience with Apache Solr in replication, distribution, rebalancing and fault tolerance out of the box.

Experience in architectural patterns like Apache Lucene search development, full-text search development, cross-platform, High Performance Indexing and ranked searching.

Good experience in working with cloud environment like Amazon Web Services (AWS) EC2 and S3.

Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.

Profound knowledge of core Java concepts like Exceptions, Collections, Data-structures, I/O, Multi-threading, Serialization and Deserialization.

Experience writing Shell scripts in Linux OS and integrating them with other solutions.

Expert at creating UML diagrams, including Use Case diagrams, Activity diagrams, Class diagrams and Sequence diagrams, using Microsoft Visio and IBM Rational Rose.

Good experience in development of software applications using Core Java, JDBC, Servlets, JSPs, Spring and RESTful Web Services.

Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.

Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.

Experience in Requirements Gathering/Analysis, Design, Development, Versioning, Integration, Documentation, Testing, Build and Deployment.

Efficient in packaging & deploying J2EE applications using ANT, Maven & Cruise Control on WebLogic, WebSphere & JBoss.

Experience in using Jenkins and Maven to compile the package and deploy to the Application Servers.

Deployment, distribution and implementation of enterprise applications in J2EE environment.

Good understanding of Bootstrap, Spring REST and integration.

Strong Knowledge of Version Control Systems like SVN, GIT & CVS.

Familiar with multiple software systems; a self-motivated, focused and adaptive team player who learns new technologies quickly and adapts to new environments, with excellent interpersonal, technical and communication skills.

Technical Skills:

Big Data Technologies

Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios, Splunk, Elasticsearch, Kibana

Hadoop Distributions

Cloudera, Hortonworks, AWS

Operating Systems

Windows, Macintosh, Linux, Ubuntu, Unix, CentOS, Red Hat.

Programming Languages

C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting

Java Technologies

JSP, Servlets, Spring, Hibernate, Maven

Databases

MS-SQL, Oracle, MS-Access, NoSQL, MySQL

Reporting Tools/ETL Tools

Tableau, Informatica, DataStage, Talend, Pentaho, Power View

Methodologies

Agile/Scrum, Waterfall, DevOps

Development Tools

Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

Professional Experience:

Client: Frontline Insurance, Longwood, FL November 2015 to Present

Role: Spark/Scala Developer

Description: Frontline is a Florida-based, multi-state Property/Casualty insurer serving residential and commercial property owners throughout the Southeast United States. Frontline specializes in smarter insurance products for primary, secondary, seasonal, and commercial property owners. Frontline has Client Data Management, Rating Management, and Reinsurance Management, policy administration, claims and billing. DataHub and InfoCenter will support Frontline’s enterprise-wide data and analytics needs.

This project captures the details of customers' commercial properties and processes that data. The data, generated from different sources, is posted to the local file system by upstream systems. From the local file system, the data is moved into HDFS, where the Hadoop ecosystem is used to process it at scale. Once the information is gathered, strategies are run on the collected data and the data is analyzed using Spark. The main goal of the project is to identify the customers whose claims are greater than 10,000 dollars and report them to the government.

Responsibilities:

Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.

Worked on the Spark SQL for analyzing the data.

Used Scala to write code for all Spark use cases.

Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Pair RDDs and YARN.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Worked in Spark SQL on different data formats like JSON and Parquet.

Developed Spark scripts by using Scala shell commands as per the requirement.

Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.

Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.

Familiar with the Hadoop open source stack including YARN, Kafka and Hive.

Used Kafka to ingest data into the Spark engine.

Worked on streaming pipeline that uses Spark to read data from Kafka, transform it and write it to HDFS.

Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven and Zookeeper.

Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.

Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.

Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.

Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.

Involved in preparing JILs for AutoSys jobs.

Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.

Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.

Used Oozie for automating the end to end data pipelines and Oozie coordinators for scheduling the work flows.

Implemented daily workflow for extraction, processing and analysis of data with Oozie.

Installed and configured Hadoop on multiple nodes on AWS EC2.

Implemented Hadoop on AWS EC2, using a few instances to gather and analyze data log files.

Implemented a Continuous Delivery pipeline with Docker, Jenkins and GitHub, Nexus, Maven and AWS.

Environment: Spark 1.5.2, Spark SQL, Java 1.8, Hive, HDFS, HQL, YARN, HBase, MapReduce, Sqoop, Flume, Oozie, Kafka, Scala, AWS, Oracle 12c.

Client: Chase Bank, Chicago, IL May 2014 to November 2015

Role: Hadoop Developer

Description: Chase Bank is an American international banking and financial services holding company. Chase Bank delineates three different business segments when reporting results: Community Banking, Wholesale Banking, and Wealth, Brokerage and Retirement.

Responsibilities:

Worked on a live 60-node Hadoop cluster running CDH5.4.4, CDH5.2.0 and CDH5.2.1.

Worked on the Hadoop cluster using different Big Data analytic tools including Kafka, Pig, Hive and MapReduce.

Developed simple to complex MapReduce streaming jobs using Python language that are implemented using Hive and Pig.

Implemented data access jobs through Pig, Hive, HBase (0.98.0), Storm (0.91)

Involved in loading data from LINUX file system to HDFS

Importing and exporting data into HDFS and Hive using Sqoop.

Altered existing Scala programs to enhance performance and obtain partitioned results using Spark.

Used the Spark SQL interfaces for Scala and Python, which automatically convert RDDs of case classes to SchemaRDDs.

Used Spark SQL to read and write tables that are stored in Hive.

Involved in importing real-time data into Hadoop using Kafka and implemented a daily Oozie job.

Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.

Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers and Kafka brokers

Experience in data migration from RDBMS to Cassandra.

Created data-models for customer data using the Cassandra Query Language.

Experienced in developing Spark scripts for data analysis in both Python and Scala.

Worked on processing unstructured data using Pig and Hive.

Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.

Used Impala to read, write and query the Hadoop data in HDFS or HBase.

Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.

Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.

Responsible for backup and restoration of the Tableau repository.

Converted ETL operations to Hadoop system using Pig Latin operations, transformations and functions.

Experience in Talend migration projects from one version to another.

Worked on the majority of Talend components and designed simple ETL jobs to handle complex business logic.

Knowledge of error handling and performance tuning in Talend and SQL.

Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.

Exported the result set from Hive to MySQL using Shell Scripts.

Actively involved in code review and bug fixing for improving the performance.

Experience with Cassandra (DataStax distribution).

Collaborated with development teams on architecture and deployment of NoSQL database systems like Cassandra.

Environment: Hadoop, HDFS, Pig, Hive, Map Reduce, Sqoop, Storm, Kafka, LINUX, Hortonworks distribution, Bigdata, Java APIs, Java collection, SQL, NoSQL, MongoDB, Cassandra.

Client: AT&T, FL Sep 2013 to Apr 2014

Role: Hadoop Administrator/Developer

Responsibilities:

Responsible for installation, configuration, maintenance, monitoring, performance tuning and troubleshooting Hadoop Clusters in different environments such as Development Cluster, Test Cluster and Production.

Used Job Tracker to assign MapReduce tasks to Task Tracker in cluster of nodes.

Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.

Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.

Implemented Kerberos security in all environments.

Defined file system layout and data set permissions.

Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.

Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.

Involved in loading data from Linux and Unix file system to HDFS.

Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS GET or Copy to Local.

Involved in cluster planning and setting up the multi-node cluster.

Used Ganglia to monitor the cluster and Nagios to send alerts about it around the clock.

Commissioned and Decommissioned nodes from time to time.

Involved in HDFS maintenance and administering it through HDFS-Java API.

Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.

Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, LINUX, UNIX Shell Scripting.

Client: SumTotal Systems, India Oct 2012 to Jul 2013

Role: Hadoop Administrator/Developer

Description: SumTotal Systems, Inc. is a software company that provides human resource management software and services to private and public-sector organizations. The company delivers solutions through multiple cloud-based channels, including Software as a Service (SaaS), Hosted Subscription and premises-based licensure.

Responsibilities:

Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Imported and exported data into HDFS from Oracle database and vice versa using Sqoop.

Installed and configured Hadoop Cluster for major Hadoop distributions.

Used Hive and Pig as an ETL tool for event joins, filters, transformations and pre-aggregations.

Created partitions and buckets by state in Hive to handle structured data.

Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.

Involved in moving all log files generated from various sources to HDFS for further processing through Kafka and Flume, and processed the files using Piggybank.

Extensively used PIG to communicate with Hive using HCatalog and HBase using Handlers.

Used Sqoop for various file transfers through HBase tables, processing data into several NoSQL databases (Cassandra, MongoDB).

Created tables, secondary indices, join indices and views in the Teradata development environment for testing.

Captured data logs from web servers into HDFS using Flume for analysis.

Managed and reviewed Hadoop log files.

Environment: Hive, Pig, MapReduce, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Talend, Python Scripting.

Client: Dell, India Aug 2009 to Sep 2012

Role: Java Developer

Description: ZenQ is the leading provider of pure-play software testing services to clients across the globe. The company offers highest quality and efficient solutions to help the clients build quality products and solutions.

Responsibilities:

Involved in development of JavaScript code for client-side validations.

Developed the HTML based web pages for displaying the reports.

Developed front-end screens using JSP, HTML, jQuery, JavaScript and CSS.

Performed data validation in Struts form beans and Action classes.

Developed dynamic content of presentation layer using JSP.

Accessed stored procedures and functions using JDBC Callable statements.

Involved in designing use-case diagrams, class diagrams and interaction using UML model with Rational Rose.

Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.

Developed code in SQL and PL/SQL, including queries, joins, views, procedures/functions, triggers and packages.

Developed Web Applications with rich internet applications using Java applets, Silverlight, Java.

Used JDBC for database access.

Played a key role in the high-level design for the implementation of the application.

Designed and established the process and mapped the functional requirements to the workflow process.

Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, STRUTS, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.

Education:

Bachelor's in Electronics and Communication Engineering from JNTUK, India
