Hadoop / Spark Developer

Location:
Charlotte, NC
Posted:
October 17, 2017

PROFESSIONAL SUMMARY:

Around * years of professional IT experience, including 4+ years of Big Data ecosystem experience in ingestion, querying, processing, and analysis of big data.

Experience in using Hadoop ecosystem components like MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Spark, and Cloudera.

Knowledge and experience in Spark using Python and Scala.

Knowledge of the big-data database HBase and the NoSQL databases MongoDB and Cassandra.

Experience working with Hadoop clusters using the Cloudera and Hortonworks distributions.

Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.

Experience building Spark applications in Scala to ease the transition from Hadoop MapReduce.

Extending Hive and Pig core functionality by writing custom UDFs.

Solid knowledge of Hadoop architecture and core components: NameNode, DataNodes, JobTracker, TaskTrackers, Oozie, Scribe, Hue, Flume, HBase, etc.

Extensively worked on development and optimization of MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.

Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.

Worked in provisioning and managing multi-tenant Hadoop clusters on public cloud environment - Amazon Web Services (AWS) and on private cloud infrastructure - Open stack cloud platform.

Loaded some of the data into Cassandra for fast retrieval of data.
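
As an illustration of this RDBMS-to-Cassandra flow, the minimal Spark/Scala sketch below reads over JDBC, transforms, and writes to Cassandra. It assumes the DataStax spark-cassandra-connector is on the classpath; the JDBC URL, keyspace, table, and column names are hypothetical, not taken from any specific project.

    import org.apache.spark.sql.SparkSession

    // Hypothetical names throughout; assumes the DataStax spark-cassandra-connector.
    val spark = SparkSession.builder()
      .appName("rdbms-to-cassandra")
      .config("spark.cassandra.connection.host", "cassandra-host")
      .getOrCreate()

    // Ingest from an RDBMS over JDBC.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
      .option("dbtable", "ORDERS")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Transform, then export to Cassandra for fast key-based retrieval.
    orders.filter("STATUS = 'COMPLETE'")
      .write.format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales", "table" -> "orders_by_id"))
      .mode("append")
      .save()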

Worked with both Scala and Java and created frameworks for processing data pipelines through Spark.

Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.

Strong experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
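
A minimal sketch of the kind of table design this refers to, with hypothetical database, column, and path names: a partitioned external Hive table created through Spark SQL, plus a bucketed copy written with Spark's DataFrameWriter (Spark-side bucketing, which differs slightly from Hive-native bucketing).

    // Hypothetical names/paths; assumes a Hive-enabled SparkSession named spark.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS analytics.txn_events (
        txn_id STRING,
        amount DOUBLE,
        account_id STRING
      )
      PARTITIONED BY (txn_date STRING)
      STORED AS PARQUET
      LOCATION '/data/analytics/txn_events'
    """)

    // Bucketing by account_id so joins and aggregations on that key avoid full shuffles.
    spark.table("analytics.txn_events")
      .write
      .bucketBy(32, "account_id")
      .sortBy("account_id")
      .format("parquet")
      .saveAsTable("analytics.txn_events_bucketed")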

Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.

Experience in database design, data analysis, and SQL programming.

Experience in extending Hive and Pig core functionality by using custom user-defined functions (UDFs).

Experience in writing custom classes, functions, procedures, problem management, library controls and reusable components.

Working knowledge of Oozie, a workflow scheduler system used to manage jobs that run Pig, Hive, and Sqoop actions.

Experienced in integrating Java-based web applications in a UNIX environment.

Experience working with Red Hat Enterprise Linux.

TECHNICAL SKILLS:

Hadoop/Big Data

HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Spark (Python and Scala)

NoSQL Databases

HBase, Cassandra, MongoDB

Languages

C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R

Java/J2EE Technologies

Applets, Swing, JDBC, JSON, JSTL, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks

MVC, Struts, Spring, Hibernate.

ETL

IBM WebSphere / Oracle

Operating Systems

Sun Solaris, UNIX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies

HTML, DHTML, XML, WSDL, SOAP

Web/Application servers

Apache Tomcat, WebLogic, JBoss

Databases

Oracle, SQL Server, MySQL

Tools and IDE

Eclipse, NetBeans, JDeveloper, DB Visualizer.

Network Protocols

TCP/IP, UDP, HTTP, DNS

PROFESSIONAL EXPERIENCE:

Client: Bank of America Jan’16 – Present

Location: Charlotte, NC

Role: Hadoop / Spark Developer

Description: Bank of America is one of the leading financial and commercial banks in the USA and needs to maintain and process huge amounts of data as part of its day-to-day operations. As a Hadoop developer, I worked for the Risk Management team at the bank, maintaining the large data sets and designing and developing predictive data models for business users according to requirements.

Responsibilities:

Worked with Hadoop Ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.

Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality, and wrote Pig scripts for sorting, joining, filtering, and grouping data.
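
The project UDFs were written in Java; the following is only a rough Scala sketch of the same Hive UDF pattern (package, class, and column semantics are hypothetical).

    package com.example.udfs

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Simple Hive UDF sketch: mask an account number down to its last four digits.
    class MaskAccount extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text("****" + input.toString.takeRight(4))
      }
    }

    // After packaging into a jar, it would be registered in Hive roughly as:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_account AS 'com.example.udfs.MaskAccount';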

Developed Spark programs for the application to process data faster than standard MapReduce programs.

Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
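
A minimal sketch of the shape such a Spark/Scala job might take (the Oozie workflow that schedules it is XML and not shown; table and column names here are hypothetical).

    import org.apache.spark.sql.SparkSession

    // Hypothetical table/column names; aggregates positions and writes back to Hive.
    object RiskExposureJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("risk-exposure-daily")
          .enableHiveSupport()
          .getOrCreate()

        val exposure = spark.sql("""
          SELECT counterparty_id, SUM(exposure_amt) AS total_exposure
          FROM risk.positions
          WHERE as_of_date = current_date()
          GROUP BY counterparty_id
        """)

        exposure.write.mode("overwrite").saveAsTable("risk.daily_exposure")
        spark.stop()
      }
    }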

Developed the Oozie workflows with Sqoop actions to migrate the data from relational databases like Oracle, Teradata to HDFS.

Used Hadoop FS actions to move data from upstream locations to local data locations.

Wrote extensive Hive queries to transform the data used by downstream models.

Developed MapReduce programs as part of predictive analytical model development.

Developed Hive queries to analyze the data and generate the end reports used by business users.

Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm, and ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.

Gained hands-on experience with NoSQL databases such as MongoDB.

Extensively used SVN as a code repository and VersionOne for managing the day-to-day agile project development process and keeping track of issues and blockers.

Wrote Spark Python code for the model integration layer.

Implemented Spark applications using Scala and Java, utilizing DataFrames and the Spark SQL API for faster data processing.

Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.

Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.

Developed a data pipeline using Kafka, HBase, Mesos, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
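
A hedged sketch of the ingest leg of such a pipeline, using Spark Structured Streaming (the original may well have used DStream-based streaming); broker, topic, and paths are hypothetical, and the HBase lookups/writes are omitted.

    // Assumes a SparkSession named spark and the spark-sql-kafka source on the classpath.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()
      .selectExpr("CAST(key AS STRING) AS customer_id",
                  "CAST(value AS STRING) AS payload",
                  "timestamp")

    // Land micro-batches in a Hive-readable Parquet location for downstream analysis.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/customer_events")
      .option("checkpointLocation", "/checkpoints/customer_events")
      .outputMode("append")
      .start()

    query.awaitTermination()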

Environment: Hadoop, Hive, Impala, Oracle, Spark, Scala, Python, Pig, Sqoop, Oozie, MongoDB, MapReduce, SVN.

Client: T-Mobile Aug’14 – Nov’15

Location: Washington, DC

Role: Hadoop / Spark Developer

Description: T-Mobile, Inc. is a leading telecommunications corporation that provides communications and digital entertainment services in the United States and worldwide. The project involves examining customer information to improve the customer experience and to provide better, feasible alternatives in line with the ongoing marketing strategy. As part of the team, I work with data related to customers' reviews, suggestions, and other inputs, collected and grouped together at regular intervals.

Responsibilities:

Monitored and managed daily jobs, processing around 200k files per day, and tracked them through RabbitMQ and the Apache Dashboard application.

Monitored workload, job performance and capacity planning using InsightIQ storage performance monitoring and storage analytics, experienced in defining job flows.

Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.

Designed and implemented a Cassandra-based database and related web services for storing unstructured data.

Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.

Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream web log data from servers/sensors.

Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
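
An illustrative cleansing mapper in Scala (the originals may have been plain Java); the delimiter and expected field count are hypothetical.

    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Drop malformed records and normalize the delimiter before loading into Hive.
    class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split("\\|", -1).map(_.trim)
        if (fields.length == 6 && fields.forall(_.nonEmpty)) {
          context.write(NullWritable.get(), new Text(fields.mkString("\t")))
        }
      }
    }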

Developed Spark programs for the application to process data faster than standard MapReduce programs.

Created reports for the BI team using Sqoop to export data into HDFS and Hive.

Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.

Worked on MapReduce joins for querying multiple semi-structured data sets as per analytic needs.

Involved in loading data in different formats (Avro, Parquet) from the Unix file system into HDFS, creating indexes, and tuning SQL queries in Hive; also involved in database connectivity using Sqoop.

Worked on setting up High Availability for GPHD 2.2 with ZooKeeper and quorum journal nodes.

Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.

Worked in an AWS environment on the development and deployment of custom Hadoop applications.

Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.

Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment, Amazon Web Services (AWS), and on private cloud infrastructure, the OpenStack cloud platform.

Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.

Scheduled and managed cron jobs and wrote shell scripts to generate alerts.

Environment: Hadoop, AWS, MapReduce, HDFS, Hive, Pig, Spark, Python, Java 1.6 & 1.7, Linux, Eclipse, Cassandra, ZooKeeper

Client: Freddie Mac Nov’12 – July’14

Location: McLean, VA

Role: Hadoop Developer

Description: Built a Hadoop cluster ensuring high availability for the NameNode, mixed-workload management, performance optimization, health monitoring, and backup and recovery across one or more nodes.

Customer sentiment rating is an important measure that helps an organization understand the customer's actual response to the product or service being provided.

Responsibilities:

Designed the Hadoop jobs to create product recommendations using collaborative filtering.
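
The recommendation jobs here were Hadoop based; purely as an illustration of the collaborative-filtering idea, the sketch below uses Spark MLlib's ALS instead, with hypothetical table and column names.

    import org.apache.spark.ml.recommendation.ALS

    // Illustrative only; assumes a Hive-enabled SparkSession named spark and a
    // ratings table with integer userId/productId columns and a numeric rating.
    val ratings = spark.table("reco.user_product_ratings")

    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("productId")
      .setRatingCol("rating")
      .setRank(10)
      .setMaxIter(10)
      .setRegParam(0.1)

    val model = als.fit(ratings)

    // Top-5 product recommendations per user, saved for downstream consumption.
    model.recommendForAllUsers(5)
      .write.mode("overwrite")
      .saveAsTable("reco.user_recommendations")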

Designed the COSA pretest utility framework using MVC, JSF validation, tag libraries, and JSF backing beans.

Integrated the Order Capture system with Sterling OMS using JSON web services.

Configured and Implemented Jenkins, Maven and Nexus for continuous integration.

Mentored the team on and implemented test-driven development (TDD) strategies.

Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.

Developed the Data transformation script using Hive and MapReduce.

Developed Pig scripts using Pig Latin.

Exported data using Sqoop from HDFS to Teradata on a regular basis.

Developed scripts and batch jobs to schedule various Hadoop programs.

Wrote Hive queries for data analysis to meet business requirements, and designed and developed user-defined functions (UDFs) for Hive.

Created Hive tables and worked on them using HiveQL.

Experienced in defining job flows.

Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Designed and implemented a MapReduce-based large-scale parallel relation-learning system.

Wrote the MapReduce code for the flow from Hadoop Flume to ES Head.

Environment: Hadoop, MapReduce, Hortonworks, HDFS, Hive, Java, Jenkins, Maven, MVC, Cloudera, Pig, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, SQL connector.

Client: UHG August’11 – Oct’12

Location: India

Role: Java/J2EE Developer

Description: IIMS (Integrated Insurance Management System) helps users enroll in insurance online. IIMS provides users with quotes based on their specifications and allows them to file claims online.

Responsibilities:

Worked with Java, J2EE, Struts, web services, and Hibernate in a fast-paced development environment.

Followed agile methodology, interacted directly with the client on features, implemented optimal solutions, and tailored the application to customer needs.

Used Apache POI for reading Excel files.

Developed the user interface using JSP and JavaScript to view all online trading transactions.

Designed and developed Data Access Objects (DAO) to access the database.

Coded Java Server Pages for the dynamic front-end content that uses Servlets and EJBs.

Coded HTML pages using CSS for static content generation with JavaScript for validations.

Used JDBC API to connect to the database and carry out database operations.
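
The project code here was Java; the following is only a compact Scala sketch of the same JDBC pattern (driver URL, credentials, and query are hypothetical).

    import java.sql.DriverManager

    // Hypothetical URL, credentials, and query; shows the prepare/execute/close pattern.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/IIMS",
      "app_user",
      sys.env.getOrElse("DB_PASSWORD", ""))
    try {
      val stmt = conn.prepareStatement(
        "SELECT policy_id, premium FROM policies WHERE customer_id = ?")
      stmt.setLong(1, 12345L)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getString("policy_id")} -> ${rs.getBigDecimal("premium")}")
      }
      rs.close(); stmt.close()
    } finally {
      conn.close()
    }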

Used JSP and JSTL Tag Libraries for developing User Interface components.

Performed code reviews.

Performed unit testing, system testing and integration testing.

Involved in building and deployment of application in Linux environment.

Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.

Client: Intergraph June’09 – July’11

Location: Hyderabad, India

Role: Java/J2EE Developer

Description: This project enables dealers to provide service warranties to end customers. An end customer can buy a warranty from one dealer and use the warranty service at any other dealer. The IT system for this acts as the centralized system: it helps the dealers' IT systems generate invoices to end customers for service repairs and also makes payments to the dealers for those repairs. The dealer warranty system works in conjunction with the Billing and Invoice system.

Responsibilities:

Responsible for understanding the scope of the project and requirement gathering.

Developed the web tier using JSP, Struts MVC to show account details and summary.

Created and maintained the configuration of the Spring Application Framework.

Implemented various design patterns - Singleton, Business Delegate, Value Object and Spring DAO.

Used Spring JDBC to write some DAO classes which interact with the database to access account information.

Mapped business objects to database using Hibernate.

Involved in writing Spring configuration XML files containing bean declarations and other dependent object declarations.

Used the Tomcat web server for development purposes.

Involved in creation of Test Cases for Unit Testing.

Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.

Used CVS and Perforce as configuration management tools for code versioning and release.

Developed the application using Eclipse and used Maven as the build and deployment tool.

Used Log4j to print logging, debugging, warning, and info messages on the server console.

Environment: Java, J2EE, JSON, Linux, XML, XSL, CSS, JavaScript, Eclipse


