Hadoop/Spark Developer

Location:

United States

Posted:

September 08, 2017

Resume:

Suraj V

Contact: 937-***-****

Email: ********@*****.***

“Transforming data into information, and information into insight”

PROFILE: HADOOP DEVELOPER

Purveyor of competitive intelligence and holistic, timely analyses of Big Data, made possible by the successful installation, configuration, and development of Hadoop ecosystem components that deliver productive results for the organization.

Experience Summary

Over 6 years of working experience in system requirements, analysis, design, testing, implementation, and development with tools including HDFS, MapReduce, Oracle, JDK, J2EE, Apache Kafka, JavaScript, Java, JDBC, JSP, Servlets, XML, Maven, SQL Server, and MongoDB.

Worked with most JDK versions over 6 years, across many modules.

More than two years of experience in installing, configuring, and testing Big Data and Hadoop ecosystem components. Good understanding of Cassandra and Scala programming.

Good working knowledge of Spark Core and Spark SQL with Scala and Java.

Worked with lambda expressions, informally known as closures or anonymous methods, in Java 8.

Capable of processing large sets of structured, semi-structured, unstructured data and supporting systems application architecture.

Developed several programs using UNIX shell scripts.

Extensive working knowledge of YARN and Kafka.

Worked extensively on cloud platforms such as Amazon Web Services (AWS).

Wrote several functional interfaces in Java 8.

Extensive hands-on work with Hive UDFs, UDAFs, and UDTFs.

Extensively involved in all phases of the software development life cycle, including analysis, design, development, implementation, testing, and support.

Good knowledge of PL/SQL stored procedures, packages, and functions, and of NoSQL databases such as MongoDB and Cassandra.

Regular use of authentication protocols such as Kerberos and LDAP.

Loaded large data sets into Spark RDDs and performed in-memory computation to generate the output response.

Good working experience with Amazon EC2 clusters, across several instances.

Good hands-on experience with Python; comfortable working on Mac, Windows, and Linux.

Exposure to Waterfall and Agile Software methodologies.

Strong problem-solving, analytical, troubleshooting, and debugging skills.

Good domain knowledge in supply chain, finance, and insurance.

Worked with the Apache Flume distributed service.

Used various compression techniques such as LZO, Gzip, and Snappy.

Experience with the automation tool Automic.

Developed MapReduce programs and libraries using Java 8, and extracted several abstract methods.

Strong experience in the design, development, and implementation of web-based applications.

Highly motivated team player with good communication and presentation skills.

Education:

M.S. (Computer Science), Sacred Heart University (Connecticut)

Bachelor of Technology in Computer Science (J.N.T. University, Telangana)

Areas of Expertise:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, MongoDB, HBase, Hive, Pig, ZooKeeper, Oozie, Flume, Sqoop, Cassandra, Java 8.

Programming Languages: C/C++, Java (Ant, Spring, Maven).

Scripting Languages: JavaScript, HTML, Python, XML, JSP & Servlets, PHP, and Bash.

Databases: Oracle, NoSQL

UNIX Tools: Yum, RPM, Apache, Red Hat Linux

Tools: Eclipse, Cloudera, Hortonworks, JDeveloper, JProbe, NetBeans, CVS, Ant.

DataStage: IBM

Platforms: Windows (2000/XP), Linux, Solaris, AIX, AWS Platform.

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

Testing Tools: WSAD, RAD

Methodologies: Agile, UML, Design Patterns

Professional Experience:

Client: Portland General Electric, Portland, Oregon

Duration: October 2015 – Present

Role: Hadoop Developer

Description: Portland General Electric (PGE) is a utility company that provides electricity to the greater Portland area and other parts of Oregon. It tracks electrical efficiency, customer complaints, emergency issues, bills, customer details, addresses, and other information. Most PGE data is semi-structured or unstructured, which led the company to adopt the Hadoop file system.

• Worked with several big data ecosystem components, including Oozie, Sqoop, Spark, Kafka, Flume, Pig, HBase, and Hive, on CDH5.

• Built a POC for migrating MapReduce programs to Spark transformations using Spark SQL and Scala.

• Worked on a Hadoop cluster that ranged from 20 to 30 nodes in pre-production and sometimes grew larger in production.

• Worked on Spark Streaming and Spark SQL to run sophisticated applications on Hadoop.

• Worked extensively with NoSQL databases such as Cassandra.

• Installed several UNIX shell scripts as part of compliant system installations.

• Worked with the quorum concept in Kafka and ZooKeeper.

• Good hands-on experience with Python; comfortable working on Mac, Windows, and Linux.

• Worked on server-side processing using JDK 8 and Java APIs with HBase.

• Developed programs using Scala transformations such as map and filter.

• Worked with Apache Spark SQL and Spark Streaming using Scala.

• Worked on multi-node installations with ZooKeeper ensembles.

• Applied Elastic IP addresses and automated scaling while working with Amazon EC2.

• Implemented Java 8 features such as method references and lambda expressions.

• Developed UDFs for DataFrames/SQL and RDD/MapReduce, and Scala scripts in Spark 1.6, for data aggregation, queries, and writing data back into the OLTP system.

• Configured RPC, listen, and seed node settings in Apache Cassandra.

• Worked on Cassandra clusters in the Amazon Web Services cloud and migrated data between environments.

• Good knowledge of IBM DataStage for supporting and managing extended metadata.

• Worked with encryption zone directories in HDFS on the Cloudera distribution to generate data encryption keys.

• Integrated Kafka with ZooKeeper through various classes.

• Wrote Python scripts in Oozie workflows to automate various jobs.

• Developed Spark scripts using Scala shell commands as required.

• Worked on data serialization and Hive serialization formats, converting complex objects into byte sequences using the CSV, Parquet, JSON, and Avro formats.

• Created Hive aggregator to update the Hive table after running the data profiling job.

• Wrote Pig scripts for joining, grouping, sorting, and filtering data.

• Worked on a POC for accessing big data on a distributed file system as part of IBM DataStage, to provide JSON support and a new JDBC connector.

• Implemented Bucketing, Dynamic and regular Partitioning in Hive.

• Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.

• Worked on data analysis in Cassandra using SQL, and supported analysis with complex queries.

• Good understanding of ubiquitous enterprise connectivity as part of IBM DataStage.

• Exported the analyzed data to relational databases using Sqoop to generate reports and visualizations for the BI team.

• Worked with attributes, resources, the fetcher cache, and multiple disks in Mesos.

• Managed rapidly growing Data sets using several components in Big data.

• Used the Teradata utilities FastLoad, MultiLoad, and TPump to load data.

• Granted file permissions for access to encrypted files and metadata managed by HDFS.

• Documented, developed, and captured architectural best practices for building systems on AWS.

• Developed an ETL workflow that pushes web server logs to an Amazon S3 bucket.

• Reviewed and managed Hadoop log files.

• Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Hadoop, HDFS, HBase, Sqoop, Pig, IBM DataStage, Mesos, Oozie, MapReduce, Python, ZooKeeper, Hive, Oracle, Cassandra, Flume, Teradata, UNIX Shell Scripting, MySQL, Java.

Client: Vanderbilt University, Nashville, Tennessee

Duration: December 2013 – August 2015

Role: Hadoop Developer

Description: Vanderbilt is a private research university, commonly known as Vandy. The university spans medical care, education, and many other departments, and maintains a large body of data that must be updated at regular intervals.

• Analyzed business requirements for big data and translated them into Hadoop-centric technologies.

• Worked on an 8-node Hadoop cluster.

• Worked on indexes, scalability, and query language support in Cassandra.

• Worked on Data import and export from Teradata and Oracle into HDFS and Hive using Sqoop.

• Implemented custom UDFs for Hive to achieve comprehensive data analysis, and created several Java UDFs. Good understanding of Apache Kafka.

• Set up HBase with in-memory operation and Bloom filters on a per-column basis.

• Worked with FastExport, FastLoad, and MultiLoad, and exported data from Teradata and Oracle into HDFS and Hive using Sqoop.

• Launched AWS instances on EC2-Classic and EC2-VPC with CloudFormation templates.

• Developed custom Pig UDFs for performing various levels of optimization in custom input formats.

• Worked on streaming log data into HDFS from web servers using Flume.

• Used Python to embed several applications. Worked with various Python standard libraries.

• Implemented custom Flume interceptors to filter data as required.

• Used Hive and Pig to analyze data in HDFS and identify issues and behavioral patterns.

• Extensively used Pig to communicate with Hive through HCatalog and with HBase through handlers.

• Defined static and dynamic partitions and created internal and external Hive tables for optimized performance.

• Created Hive tables to store the processed results in a tabular format.

• Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, as well as DML and DDL.

• Developed Pig Latin scripts to run advanced analytics on the collected data.

• Configured a daily workflow for data extraction, processing, and analysis using the Oozie scheduler.

• Designed and implemented a MapReduce-based, large-scale, parallel relation-learning system.

• Developed Java Spring and helper classes in the business layer and tested them with JUnit.

• Worked on shell scripts, analyzed application performance, and fixed problems or suggested solutions.

• Used the Eclipse Workbench, with its editors, perspectives, views, and wizards, as a rich client platform.

• Extensively used JDBC for database transactions.

• Involved in Interface development for applications using JSP, JavaScript and Servlets.

• Created Utility classes and Java validation classes.

• Actively involved in stress testing of existing business components using WebLogic Application Server.

• Created sequence and class diagrams using Violet integrated with Eclipse.

• Used Rational ClearCase as the version repository.

• Made extensive use of RESTful web services to communicate with all external systems across modules.

• Involved in creating various utility, helper, and reusable classes used across all modules of the application.

Environment: Hadoop, MapReduce, MongoDB, HDFS, Hive, Pig, Java, SQL, Sqoop, Oozie, NoSQL, JDK 8, J2EE, JDBC, Java 1.4, Java Spring, Servlets, JSP, Web services, Flume, MVC, HTML, JavaScript 1.2.

Client: Citibank, Hyderabad, Telangana

Duration: March 2011 – June 2013

Role: Java/Big Data Developer

Description: Citibank is a global bank with a presence in most Indian states and Mumbai as its headquarters. It has a staff of 7,500 people working in various streams. It is highly focused on developing various kinds of applications, striving for customer flexibility.

• Worked on business and technical issues.

• Involved in coding, designing, debugging, documenting, and maintaining applications.

• Good understanding of Big Data querying tools, and experience with NoSQL databases.

• Developed web service APIs using Java and Java Spring.

• Developed the front-end GUI with JavaServer Faces.

• Good understanding of distributed computing principles.

• Implemented the controller layer using servlets and JSPs, and the view layer using JSPs, EL, JSTL, and custom JSP tags.

• Implemented HTML, CSS, and JavaScript for UIs.

• Executed test cases manually to verify expected results.

• Implemented various aspects at the service layer using Spring AOP.

• Used Test Director; added test categories and test details.

• Used Hibernate, an object/relational mapping (ORM) solution, for mapping data. Involved in the gap analysis of the use cases and requirements.

• Worked on user and business requirements and developed system test plans.

Environment: Servlets, Java (JDK 1.6), JSP, HTML, JavaBeans, JavaScript, CSS, jQuery, JDBC, SQL, Windows 98, Oracle 9i/10g, J2EE, Java 1.4, C, C++, PHP, multithreading.

Client: Sutherland Global Services, Hyderabad, India

Duration: May 2010 – February 2011

Role: Java Developer

Description: Sutherland is a global service provider that offers analytics-driven IT services and transformation services. As a company operating 24/7, it constantly requires new specifications and staff to keep its applications running.

• Created/modified shell scripts for scheduling and automating tasks.

• Used JDBC to establish connection between the database and the application.

• Executed test cases manually to verify expected results.

• Implemented various aspects at Service layer using Spring AOP.

• Used Test Director, added test categories and test details.

• Implemented controller and view modules using Servlets and JSPs respectively.

• Wrote unit test cases using JUnit framework.

• Involved in designing, coding, debugging, documenting and maintaining a number of applications.

• Created the user interface using HTML, CSS and JavaScript.

Environment: JavaScript, Java Spring, JUnit, JSP, JavaBeans, HTML, CSS, Oracle 9i, Java, Servlets.
