SONAL SAXENA
acugsb@r.postjobfree.com
PROFESSIONAL SUMMARY
Over 7 years of extensive experience in the analysis, design, development, testing and maintenance of Hadoop / Java applications
Experience in developing big data applications using the Hadoop framework and its ecosystem components such as MapReduce, HDFS, Hive, Pig, HBase, Oozie, Sqoop and Flume
Experience in processing large sets of structured, semi-structured and unstructured data.
Experience in developing MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
Experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets. Good experience in writing MapReduce jobs in native Java, Pig and Hive for various business use cases
Worked on streaming the data into HDFS from web servers using Flume
Worked extensively with Sqoop for importing/exporting data between relational sources like SQL Server/MySQL and HDFS/Hive
Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading and storing of data
Experience in fine-tuning performance of Hive queries and Pig scripts
Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency
Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data
Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs
Implemented the HBase MapReduce paradigm in Java to load raw data into HBase, and bulk-imported data into HBase via Sqoop
Capable of building and deploying batch jobs using shell scripts
Worked with different sources like Flat files, XML files, JSON, AVRO, Parquet and relational tables
Developed and reviewed Java code to identify and fix errors
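The Python streaming jobs mentioned above follow the standard Hadoop Streaming mapper/reducer pattern. The sketch below is illustrative only, not project code: the tab-separated input layout and the choice of keying on the first column are assumptions.

```python
# Minimal sketch of a Hadoop Streaming job in Python (mapper + reducer).
# The tab-separated record layout keyed on the first column is an
# illustrative assumption, not taken from any actual project.
from itertools import groupby

def mapper(lines):
    """Emit (key, 1) for each well-formed tab-separated record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            yield fields[0], 1

def reducer(pairs):
    """Sum counts per key; Hadoop Streaming delivers mapper output sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(n for _, n in group)

# Local smoke test; in production the two functions would live in separate
# mapper.py / reducer.py scripts passed to hadoop-streaming.jar.
sample = ["web\t/index", "api\t/v1/users", "web\t/about"]
counts = dict(reducer(sorted(mapper(sample))))
print(counts)   # {'api': 1, 'web': 2}
```

On a real cluster the sort between the two phases is performed by Hadoop's shuffle, which is why the reducer can rely on key-grouped input.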
Education and Certification
Bachelor of Technology in Computer Engineering
Microsoft SQL Server 2008 Database Development, certified in 2013
Oracle Certified Professional Java SE 6 Programmer in 2012
TECHNICAL SKILLS
Big Data Ecosystem
Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Flume
Tools
Eclipse, ANT, SharePoint, Microsoft SQL Server Management Studio 2008, Business Intelligence Development Studio 2008, RAD, Subversion, BMC Remedy, PuTTY, WinSCP, FileZilla, ServiceNow
Languages
Java, Python, Unix Shell Scripting, SQL, JavaScript, XML, HTML, AWK
Environment
UNIX, LINUX, Windows
Databases
MS SQL Server, MS Access, IBM DB2
EXPERIENCE SUMMARY
Laureate Education, Baltimore, MD Jan 2015 – Present
Hadoop Data Engineer
Responsibilities:
• Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
• Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
• Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
• Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
• Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
• Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
• Used Impala to read, write and query Hadoop data in HDFS, HBase or Cassandra
• Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources
• Continuously monitored and managed the Hadoop cluster using Cloudera Manager
• Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required
• Installed Oozie workflow engine to run multiple Hive and Pig jobs
• Used Mahout to apply machine learning algorithms for efficient data processing
• Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
• Loaded and transformed large sets of structured, semi-structured and unstructured data.
• Analyzed large data sets to determine the optimal way to aggregate and report on them.
• Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats
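The extract-transform-aggregate pattern behind the XML/JSON/CSV MapReduce jobs above can be sketched as follows. This is written in Python for brevity (the production jobs were in Java), and the record fields `region` and `amount` are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: normalize JSON-lines or CSV input, then
# group-and-sum, mirroring the map (parse) and reduce (aggregate) steps
# of the jobs described above. Field names are assumptions.
import csv
import io
import json
from collections import defaultdict

def parse_records(raw, fmt):
    """Normalize JSON-lines or CSV text into a list of dicts."""
    if fmt == "json":
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    return list(csv.DictReader(io.StringIO(raw)))

def aggregate(records, key="region", value="amount"):
    """Sum `value` per `key` — the reduce step of the pattern."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += float(rec[value])
    return dict(totals)

json_raw = '{"region": "east", "amount": 10}\n{"region": "east", "amount": 5}'
csv_raw = "region,amount\nwest,7\neast,3"
print(aggregate(parse_records(json_raw, "json")))  # {'east': 15.0}
print(aggregate(parse_records(csv_raw, "csv")))    # {'west': 7.0, 'east': 3.0}
```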
Environment:
Hadoop – CDH 5.0.2, Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster with Linux (Ubuntu).
Java, Shell scripting, Python
Duke University Health System, Durham, NC Feb 2013 to Jan 2015
Hadoop Developer
Responsibilities:
• Migrated data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
• Proposed an automated system using shell scripts to Sqoop the data.
• Worked in Agile development approach.
• Created the estimates and defined the sprint stages.
• Developed a strategy for Full load and incremental load using Sqoop.
• Mainly worked on Hive queries to categorize data of different claims.
• Integrated the Hive warehouse with HBase
• Wrote customized Hive UDFs in Java where the functionality was too complex.
• Implemented partitioning, dynamic partitions and buckets in Hive.
• Helped generate Tableau reports, testing connectivity to the corresponding Hive tables using the Hive ODBC connector.
• Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
• Monitored system health and logs and responded to any warning or failure conditions.
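The full-load vs. incremental-load Sqoop strategy described above typically hinges on a check column and a stored watermark. The sketch below builds the corresponding `sqoop import` command line in Python; the JDBC connection string, table name and check column are hypothetical examples, and a real run would hand the command to the shell on the cluster.

```python
# Sketch of the full vs. incremental Sqoop load strategy described above.
# The connection string, table and check column are hypothetical; real runs
# would execute the returned command via subprocess on the cluster.
def build_sqoop_import(table, last_value=None,
                       connect="jdbc:mysql://db-host/claims",
                       check_column="updated_at"):
    """Return a sqoop import command: full load when last_value is None,
    otherwise an incremental 'lastmodified' load from that watermark."""
    cmd = ["sqoop", "import",
           "--connect", connect,
           "--table", table,
           "--target-dir", f"/data/raw/{table}"]
    if last_value is not None:
        cmd += ["--incremental", "lastmodified",
                "--check-column", check_column,
                "--last-value", last_value]
    return cmd

full = build_sqoop_import("claims")
incr = build_sqoop_import("claims", last_value="2014-06-01 00:00:00")
print(" ".join(full))
print(" ".join(incr))
```

After each incremental run the new high-water mark would be persisted so the next invocation picks up where the last one ended.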
Environment:
CDH4, HDFS, Hive, Java, Sqoop, Cloudera Manager, MySQL, Tableau
Investor Online Network, Englewood Cliffs, NJ Jun 2012 to Jan 2013
Java & Hadoop Developer
Responsibilities:
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Supported code/design analysis, strategy development and project planning.
Created reports for the BI team, using Sqoop to export data from HDFS and Hive into relational databases.
Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Administered Pig, Hive and HBase, installing updates.
Wrote Hive and Pig UDFs in Python for evaluation of data.
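An evaluation UDF of the kind used with Pig and Hive above can be sketched as a plain Python function. With Pig this would be registered via `REGISTER 'udfs.py' USING jython` and carry an `@outputSchema` decorator; both are omitted here so the sketch runs standalone, and the ticker-normalization rule is a hypothetical example.

```python
# Sketch of a Python evaluation UDF for Pig/Hive. Standalone for clarity:
# the Pig registration boilerplate and @outputSchema decorator are omitted,
# and the ticker-cleanup logic is a hypothetical example.
def normalize_ticker(symbol):
    """Clean up a portfolio ticker symbol (assumed field): trim whitespace,
    upper-case, and replace class-share dots with dashes."""
    if symbol is None:
        return None   # Pig/Hive pass NULL fields through as None
    return symbol.strip().upper().replace(".", "-")

print(normalize_ticker("  brk.b "))   # BRK-B
```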
Environment:
Java, Apache Hadoop, Hive, PIG, SQOOP, HBase
Eclipse, Microsoft SQL Server Management Studio 2008, ServiceNow, LINUX, SharePoint
Legal and General, Redhill, UK Mar 2009 to May 2012
Java Developer
Responsibilities:
Worked in an Agile Test driven Development environment
Implemented server-side tasks using servlets and XML.
Developed page templates using Struts Tiles framework.
Implemented Struts Action classes using Struts controller component
Created and deployed web pages using HTML, JSP, JavaScript and CSS.
Led the migration of monthly statements from UNIX platform to MVC Web-based Windows application using Java, JSP, and Struts technology.
Developed SQL statements to improve back-end communications.
Used ANT automated build scripts to compile and package the application
Incorporated a custom logging mechanism for tracing errors, resolving all issues and bugs before deploying the application to the WebSphere server.
Prepared use cases, designed and developed object models and class diagrams.
Developed and reviewed Java code to identify and fix errors.
Administered Quality Center and Subversion.
Set up application environments and reported status.
Worked with different sources like flat files, XML files, DB2, MS SQL Server.
Extracted data from the Sales department to flat files and loaded the data into the target database.
Wrote stored procedures, triggers and cursors.
Environment:
Java, J2EE, JavaScript, Microsoft SQL Server Management Studio 2008
Eclipse, XML, HTML, SharePoint, FileZilla, PuTTY, UNIX, SQL, ANT
Fidelity Information Services, Tampa, FL Aug 2006 to Jan 2009
SQL Developer
Responsibilities:
Maintained records of employees enrolled in different benefits for various clients.
Imported election information for different vendors during open enrollment with precision.
Wrote SQL scripts to generate various reports per client requirements, e.g. medical election and plan change reports.
Implemented triggers and stored procedures in T-SQL to maintain data integrity among different systems.
Automated manual tasks to enhance system performance.
Created functions to provide custom functionality as required.
Stayed aware of potential blocking and deadlocking, and wrote code to avoid those situations.
Translated business requirements into software applications and models.
Participated in discussions on application creation to understand requirements and provide back-end functionality for the applications.
Gained experience in project quality assurance.
Instantiated applications in different databases for development, testing, education and production deployment
Environment:
Windows XP, SQL Server Management Studio, BMC Remedy, PuTTY, FileZilla, Eclipse