Data Developer

Tampa, Florida, United States
December 01, 2017

Resume: Contact: 908-***-****

Hadoop Developer

Over 8 years of experience as a Hadoop Developer, with a strong background in file distribution systems in the big-data arena. Understands the complex processing needs of big data and has experience developing code and modules to address those needs.

Executive Summary:

Over 7 years of experience across Hadoop, Java and ETL, including extensive experience in Big Data technologies and in the development of standalone and web applications in multi-tiered environments using Java, Hadoop, Hive, HBase, Pig, Sqoop, J2EE technologies (Spring, Hibernate), Oracle, HTML and JavaScript.

Extensive experience in Big Data analytics, with hands-on experience writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.

Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.

Experience with distributed systems, large scale non-relational data stores, Map-Reduce systems, data modeling and big data systems.

Involved in developing solutions to analyze large data sets efficiently.

Excellent hands-on experience importing data from relational database systems such as MySQL and Oracle into HDFS and Hive, and exporting it back, using Sqoop.

Hands-on experience writing Pig Latin scripts, working with the Grunt shell and scheduling jobs with Oozie.

Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.

Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.

Experience with databases like DB2, Oracle 9i, Oracle 10g, MySQL, SQL Server and MS Access.

Experience in creating complex SQL queries, SQL tuning, and writing PL/SQL objects such as stored procedures, functions, cursors, indexes, triggers and packages.

Very good understanding of NoSQL databases like MongoDB and HBase.

Good knowledge of ETL and hands-on experience with Informatica ETL.

Worked on various database technologies like Oracle, DB2-UDB and Teradata.

Extensive experience in creating Class Diagrams, Activity Diagrams and Sequence Diagrams using Unified Modeling Language (UML).

Extracted files from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed them in HDFS on the Hortonworks distribution and processed them.

Experienced in SDLC, Agile (SCRUM) Methodology, Iterative Waterfall.

Experience with various version control systems such as ClearCase, CVS, and SVN.

Expertise in extending Hive and Pig core functionality by writing custom UDFs.

Development experience with all aspects of software engineering and the development life cycle. Strong desire to work in a fast-paced, flexible environment.

Proactive problem-solving mentality that thrives in an agile work environment; good experience with the SDLC (Software Development Life Cycle).

Exceptional ability to learn new technologies and to deliver results under tight deadlines.

Worked with developers, DBAs, and systems support personnel in elevating and automating successful code to production.

Strong written, oral, interpersonal and presentation skills.

Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.

Technical Skills:

Programming Languages: Java, Python, Scala, Shell Scripting, SQL, PL/SQL.

J2EE Technologies: Core Java, Spring, Servlets, SOAP/REST services, JSP, JDBC, SML.


Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Impala, Cassandra, Oozie, ZooKeeper, Flume, Ambari, Storm, Spark, Kafka and Hortonworks.

Databases: NoSQL, Oracle 10g/11g/12C, SQL Server 2008/2008, MySQL 2003-2006.

Database Tools: Oracle SQL Developer, MongoDB, TOAD and PLSQL Developer, Teradata

Modeling Tools: UML on Rational Rose 4.0/7.5/7.6/8.1

Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, Ajax, CSS3

Web Services: Web Logic, Web Sphere, Tomcat

IDEs: Eclipse, NetBeans, WinSCP.

Operating Systems: Windows, UNIX, Linux (Ubuntu, CentOS), Solaris, Windows Server 2003/2006/2008/2009/2012/2013/2016.


Citi Group Technologies, Tampa, FL 08/2016 – Present

Hadoop Developer

Roles and Responsibilities

Installed and configured a multi-node, fully distributed Hadoop cluster.

Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.

Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.

Involved in end-to-end data processing, including ingestion, processing, quality checks and splitting.

Imported data into the big data lake on HDFS from various SQL databases and files using Sqoop, and from streaming systems using Storm.

Involved in scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.

Worked with NoSQL databases like HBase to create tables and store data. Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.

Developed custom aggregate functions using Spark SQL and performed interactive querying.

Wrote Pig scripts to store the data into HBase.

Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.

Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team. Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.

Created a complete processing engine based on the Hortonworks distribution and tuned it for performance.

Extracted files from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed them in HDFS on the Hortonworks distribution and processed them.

Extracted files from RDBMS sources through Sqoop, placed them in HDFS and processed them.

Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).

Involved in installing Hadoop ecosystem components.

Responsible for managing data coming from different sources.

Set up Hadoop cluster environment administration, including adding and removing cluster nodes, cluster capacity planning and performance tuning.

Experienced in installing, configuring and using Hadoop ecosystem components.

Experienced in importing data into HDFS and Hive, and exporting it, using Sqoop.

Knowledge of performance troubleshooting and tuning of Hadoop clusters.

Participated in the development and implementation of the Cloudera Hadoop environment.

Gained good experience with NoSQL databases such as HBase.

Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.

Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.

Developed and delivered quality services on-time and on-budget. Solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig and other web technologies.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently, triggered by time and data availability.

Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.

Monitored and managed the Hadoop cluster using Apache Ambari.

Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.

Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java.

Tupperware - Orlando, FL 03/2015 – 07/2016

Hadoop Developer

Tupperware is the name of a home products line that includes preparation, storage, containment, and serving products for the kitchen and home. Tupperware develops, manufactures, and internationally distributes its products as a wholly owned subsidiary of its parent company Tupperware Brands. It is marketed by means of approximately 1.9 million direct salespeople on contract.

Roles and Responsibilities

Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.

Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.

Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.

Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.

Managed and reviewed Hadoop log files. Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive and Pig. Built wrapper shell scripts for the Oozie workflow.

Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

Provided ad-hoc queries and data metrics to business users using Hive and Pig.

Developed Pig Latin scripts to extract data from web server output files and load it into HDFS. Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing the data in HDFS.

Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop.

Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.

Experienced in working with various kinds of data sources such as Teradata and Oracle. Successfully loaded files from Teradata into HDFS, and loaded data from HDFS into Hive and Impala.

Worked on MapReduce joins to query multiple semi-structured data sets as per analytic needs. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Created Hive tables and was involved in data loading and writing Hive UDFs.

Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS filers.

Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports with the help of the ZooKeeper implementation in the cluster.

Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.

Familiarity with NoSQL databases including HBase.

Wrote shell scripts to automate rolling day-to-day processes.

Environment: Hadoop, MapReduce, YARN, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, Oracle, NoSQL, Unix/Linux, Kafka, Amazon Web Services.

Cerner – Kansas City, MO 09/2014 – 02/2015

Big Data/Hadoop Developer

Designed how data in Hadoop would be processed to make BI analysis on the data easier; wrote a set of SQL-like (Hive) jobs implementing parts of the design and developed code that ingests gigabytes of data into Progressive's Hadoop cluster.

Roles and Responsibilities:

Involved in creating Hive tables, and in loading and analyzing data using Hive queries.

Developed and executed custom MapReduce programs, Pig Latin scripts and HQL queries.

Worked on importing the data from different databases into Hive Partitions directly using Sqoop.

Performed data analytics in Hive and then exported the metrics to RDBMS using Sqoop.

Involved in running Hadoop jobs for processing millions of records of text data.

Extensively used Pig for data cleaning and optimization.

Involved in HDFS maintenance and in administering it through the Hadoop Java API.

Configured the Fair Scheduler to provide service level agreements for multiple users of a cluster.

Loaded data into the cluster from dynamically generated files using Flume and from RDBMS using Sqoop.

Wrote complex MapReduce programs.

Installed and configured Hive and wrote Hive UDFs.

Involved in writing Flume and Hive scripts to extract, transform and load data into the database.

Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Involved in writing Java APIs for interacting with HBase.

Implemented complex MapReduce programs to perform map-side joins using the distributed cache.

Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.

Extracted tables using Sqoop, placed them in HDFS and processed the records.

Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Unix.

Sikorsky, Stratford, CT 06/2012 – 07/2014

Big Data/Hadoop Consultant

Roles and Responsibilities

Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.

Developed various MapReduce programs to cleanse the data and make it consumable by Hadoop.

Migrated the existing data to Hadoop from RDBMS (Oracle) using Sqoop for processing the data.

Worked with Sqoop export to export the data back to the RDBMS.

Used various compression codecs to effectively compress the data in HDFS.

Wrote Pig Latin scripts for running advanced analytics on the data collected.

Created Hive internal and external tables with appropriate static and dynamic partitions for efficiency.

Used Avro SerDes for serialization and deserialization and also implemented custom Hive UDFs involving date functions.

Worked on a POC to benchmark the efficiency of Avro vs Parquet.

Implemented the end-to-end workflow for extraction, processing and analysis of data using Oozie.

Used various optimization techniques to optimize Hive, Pig and Sqoop.

Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Unix.

GSK, Research Triangle Park, NC 09/2009 – 05/2012

SQL Developer

Roles and Responsibilities

Created the Database, User, Environment, Activity, and Class diagrams for the project (UML).

Implemented the database using the Oracle database engine.

Designed and developed a fully functional, generic n-tiered J2EE application platform; the environment was Oracle technology driven. The entire infrastructure application was developed using Oracle Java Developer in conjunction with Oracle ADF-BC and Oracle ADF-RichFaces.

Created an entity object (business rules and policy, validation logic, default value logic, security).

Created View Objects, View Links, Association Objects and Application Modules with data validation rules (exposing linked views in an Application Module), LOVs, dropdowns, value defaulting and transaction management features.

Developed web applications using J2EE: JSP, Servlets, JDBC, Java Beans, Struts, Ajax, JSF, JSTL, Custom Tags, EJB, JNDI, Hibernate, ANT, JUnit, Apache Log4J, Web Services and Message Queue (MQ).

Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.

Used Cascading Style Sheets (CSS) to attain uniformity across all pages. Created reusable components (ADF Library and ADF Task Flow).

Experience using version control systems such as CVS, PVCS, and Rational ClearCase.

Created modules using bounded and unbounded task flows.

Generated WSDL (web services) and created workflows using BPEL.

Handled the AJAX functions (partial trigger, partial submit, auto submit).

Created the Skin for the layout.

Environment: Core Java, Servlet, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.
