Hadoop/Big Data Developer

Location:

Piscataway Township, NJ, 08854

Posted:

October 27, 2016


Resume:

*****.******@*****.***

609-***-****

PROFESSIONAL SUMMARY:

Overall 8 years of professional IT experience, including experience with database and Big Data ecosystem technologies.

Around 3 years of experience in the Hadoop ecosystem (Pig, Hive, Flume, Zookeeper, Sqoop, Oozie), along with Python and Spark.

Around 3 years of experience with MySQL, Oracle and shell scripting in the Health Care domain.

Around 2 years of experience with Java, HTML, XML, CSS and JavaScript in the Shipping and Logistics domain.

Strong development skills in Hadoop: MapReduce, HiveQL and Pig Latin.

Strong knowledge of the Hadoop ecosystem and its associated components – Pig, Hive, Flume, Zookeeper, Sqoop, Oozie.

Experience in working with RDBMS technologies like MySQL and Oracle.

Experience in analyzing data by developing custom UDFs in Hive and Pig.

Experience in handling Text, Sequence and Avro files.

Experience in working with Apache Sqoop to transfer data efficiently between HDFS and relational databases.

Experience using shell scripting to automate tasks and schedule jobs.

Experience in automating different kinds of jobs using the Oozie workflow engine.

Experience in designing and implementing security for Hadoop clusters using Kerberos authentication.

Good experience using Apache Spark and Kafka.

Hands-on experience with Spark programming in Scala, with good knowledge of Spark architecture and its in-memory processing.

Experience with NoSQL databases such as Cassandra and MongoDB.

Hands-on experience in design and development, unit and integration testing (UT/IT), debugging and troubleshooting.

Experienced in developing, maintaining, and supporting automated scripts.

Skilled at reviewing and understanding code and at preparing technical documentation.

Hands-on experience with version control tools such as Git, Perforce, SVN, VSS and CVS for configuration management.

Communicates effectively with people at all levels of the organization; a team player with strong programming and problem-solving skills.

TECHNICAL SKILLS

Operating Systems : Windows 95/98/NT/2000/XP, Unix variants

Ticketing Tools : ServiceNow, TeamTrack and JIRA

Version Control : Perforce, SVN, Git, CVS and VSS

Programming Languages : C, C++, Java (object-oriented programming) and R

Scripting Languages : Python and Shell scripting

Frameworks : Spring (Java); jQuery, Bootstrap and AngularJS (JavaScript)

Databases : MySQL, Oracle, Teradata, MongoDB and Cassandra

Web Technologies : HTML, XML, CSS, JSP and JavaScript

Hadoop Distributions and Platforms : Apache Hadoop (HDFS, MapReduce and YARN), Cloudera Distribution of Hadoop (CDH3, CDH4) and Apache Spark

Hadoop Ecosystem and Related Tools : Hive, Impala, Pig, Flume, Oozie, Sqoop, ZooKeeper, Kafka, Ambari, Scala and Tableau

PROFESSIONAL EXPERIENCE

Hadoop Developer

Verizon Wireless - New Jersey June 2015 - Present

Description: Verizon is one of the largest communications technology companies in the world and operates America's largest 4G LTE wireless network. The goal of this project was to collect and analyze raw logs from set-top boxes (STBs) to identify customers' areas of interest and to create reports and dashboards for the marketing team.

Responsibilities:

Imported data from sources such as HDFS and HBase into Spark RDDs.

Developed a data pipeline using Kafka and Storm to store data into HDFS.

Performed real-time analysis of incoming data.

Performed text analytics and processing using Spark's in-memory computing capabilities with Scala.

Streamed data in real time using Spark with Kafka.

Developed Spark code using Scala, Spark SQL and Spark Streaming for faster testing and processing of data.

Automated the extraction of data from warehouses and weblogs by developing Oozie workflows and coordinator jobs.

Worked on Big Data integration and analytics based on Hadoop, Spark, Kafka and webMethods technologies.

Worked on migrating MapReduce Java programs to Spark transformations written in Scala.

Built a Kafka REST API to collect events from the front end.

Built a real-time pipeline for streaming data using Kafka and Spark Streaming.

Tuned Spark Streaming performance, e.g. setting the right batch interval, the correct level of parallelism, an appropriate serialization format and memory settings.

Responsible for implementing a proof of concept (POC) to migrate MapReduce jobs to Spark RDD transformations using Java.

Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.

Optimized performance on large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, efficient joins and transformations, and by doing other heavy lifting during the ingestion process itself.

Used the Spark API on Hadoop YARN to perform analytics on data in Hive.

Performed transformations such as event joins, bot-traffic filtering and pre-aggregations using Pig.

Developed business-specific custom UDFs in Hive and Pig.

Configured Oozie workflows to run multiple Hive and Pig jobs, triggered independently by time and data availability.

Optimized MapReduce Java code and Pig scripts through performance tuning and analysis.

Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.

Environment: Hadoop, Cloudera, MapReduce, Java, Spark, Scala, Spark Streaming, Spark SQL, YARN, Sqoop, Flume, Hive, Oozie, Pig, Kafka and Tableau.

Hadoop Developer

State Auto Insurance - Ohio Sep 2013 - June 2015

Description: State Automobile Mutual Insurance Company was founded to address inequities in the insurance industry. From the beginning, State Auto has advocated the independent agency system as the best way to serve its policyholders' needs.

Responsibilities:

Involved in loading data from the Linux file system into HDFS.

Experienced in importing data into and exporting data out of HDFS, and assisted in exporting analyzed data to relational databases using Sqoop.

Worked on loading log data directly into HDFS using Flume.

Responsible for managing data from multiple sources.

Experienced in running Hadoop Streaming jobs to process terabytes of XML data.

Implemented JMS for asynchronous auditing purposes.

Experience in defining, designing and developing Java applications, especially with Hadoop MapReduce, leveraging frameworks such as Cascading and Hive.

Loaded files from Cassandra into Hive and HDFS.

Developed MapReduce jobs using Java API.

Wrote MapReduce jobs using Pig Latin scripts.

Designed and developed a Java API (Commerce API) that connects to Cassandra through Java services.

Installed and configured Hive and wrote Hive UDFs.

Developed Oozie workflows for running MapReduce jobs and Hive queries.

Worked on cluster coordination services using ZooKeeper.

Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.

Managed the CVS repository and its migration to Subversion.

Experience in managing development time, bug tracking, project releases, development velocity, release forecasting and scheduling.

Environment: Hadoop, Cloudera, MapReduce, Linux, Cassandra, Sqoop, Flume, Hive, Oozie, Pig, Zookeeper, CVS and Subversion.

Sr. Software Engineer

Value Labs Technologies - Hyderabad June 2010 - Aug 2013

Description: Value Labs is an India-based global IT services and consulting company that provides custom information technology and business consulting services.

Responsibilities:

Involved in design reviews, requirements analysis and preparation of technical designs.

Developed the Technical Specifications Document.

Developed Oracle PL/SQL packages to load data received from the client as flat files and to move data from one system to others (ETL process).

Made extensive use of performance-driven, PL/SQL-based ETL processes to load eligibility data.

Developed end-of-day processes.

Involved in writing Spring Validator classes for validating input data.

Extensive experience with modern front-end JavaScript frameworks and templates, including Bootstrap, AngularJS and jQuery.

Developed scripts to automate routine DBA tasks (e.g., database refreshes, backups and monitoring) using Linux/UNIX shell scripts and Python.

Extensive experience in AWS development.

Introduced a metadata-oriented approach to make code reusable and reduce the lines of code (LOC) to be written.

Handled various enhancements and resolved production problems.

Involved in bug fixing and resolving issues.

Coordinated with the onsite team and maintained effective communication through conference calls and email.

Environment: SQL Server, Oracle, jQuery, Bootstrap, Spring, AngularJS, Python, AWS and Unix shell scripting.

Software Engineer

Process Weaver - Hyderabad June 2008 - May 2010

Description: Process Weaver offers inbound and outbound transportation management systems (TMS) for global logistics, supporting all modes (parcel, LTL/TL, rail and ocean).

Responsibilities:

Participated in planning and developing UML diagrams (use case, object and class diagrams) for the detailed design phase.

Designed and developed static and dynamic user-interface web pages using JSP, HTML and CSS.

Identified and fixed transactional issues caused by incorrect exception handling, and concurrency issues caused by unsynchronized blocks of code.

Performed unit testing, system testing and user acceptance testing.

Used JavaScript to develop client-side validation scripts.

Developed SQL scripts for batch processing of data.

Created tables, indexes, views and other objects using SQL.

Environment: HTML, JSP, CSS, JavaScript, MySQL and Visio.
