
Data Java Developer

Location:
Overland Park, KS
Posted:
March 02, 2017


Resume:

Name: Vinay M | Email: acy3am@r.postjobfree.com

Role: Hadoop Developer | Phone: +1-323-***-****

Professional Summary:

* years of experience in developing, implementing and configuring the Hadoop ecosystem and in developing various web applications using Java and J2EE.

4+ years of experience in Big Data analytics using Apache Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, YARN, Spark, Scala, Oozie, Kafka and Flume.

Work experience with multiple Hadoop distributions: Hortonworks and Cloudera.

Excellent understanding of the Hadoop Distributed File System (HDFS) and experienced in developing efficient MapReduce jobs to process large datasets.

Good working knowledge of using Sqoop and Flume for data ingestion into HDFS.

Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.

Implemented Talend jobs to load data from different sources and integrated them with Kafka.

Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.

Very good at loading data into Spark SchemaRDDs and querying them using Spark SQL.

Good at writing custom RDDs in Scala and at applying design patterns to improve performance.

Experienced in using Apache Hue and Ambari to manage and monitor Hadoop clusters.

Experience in analyzing large amounts of data using Pig and Hive scripts.

Mastery of writing custom UDFs in Java to extend Pig and Hive functionality; a sketch follows this summary.

Sound knowledge of using Apache Solr to search against structured and unstructured data.

Worked with the Azkaban and Oozie workflow schedulers to run recurring Hadoop jobs.

Experience in implementing the Kerberos authentication protocol in Hadoop for data security.

Experience in creating dashboards and generating reports using Tableau.

Experience in using the SequenceFile, ORC, Parquet and Avro file formats and compression techniques such as LZO.

Worked on NoSQL databases such as HBase and Cassandra to store processed data.

Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3) and Microsoft Azure.

Hands-on experience with the UNIX environment and shell scripting.

Experienced in using the Git version control system, the Maven build tool and the Jenkins continuous integration tool.

Expertise in developing web applications using J2EE technologies such as Servlets, JSP, Web Services, Spring, Hibernate, HTML, jQuery and Ajax.

Implemented design patterns to improve quality and performance of the applications.

Worked with JUnit to test the functionality of Java methods and used Python for automation.

Good experience using the relational databases Oracle, SQL Server and PostgreSQL.

Worked with Waterfall and Agile (Scrum, sprint-based) software development frameworks for managing product development.
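
As an illustration of the custom UDFs mentioned above, here is a minimal Java sketch of a Hive UDF, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name and the normalization logic are hypothetical illustrations, not code from the projects below.

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example UDF: trims and lower-cases a string column.
@Description(name = "normalize_str",
             value = "_FUNC_(str) - trims and lower-cases a string")
public class NormalizeStr extends UDF {
    private final Text result = new Text();

    // Hive resolves evaluate() by reflection and calls it once per input row.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;        // preserve SQL NULL semantics
        }
        result.set(input.toString().trim().toLowerCase());
        return result;
    }
}

After packaging the class into a JAR, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.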

Areas of Expertise:

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Storm, Talend, Spark, NiFi, Solr, Avro and Crunch.
Programming Languages: Java, Python, Scala.
NoSQL Databases: HBase, Cassandra, MongoDB.
Relational Databases: Oracle, SQL Server, PostgreSQL.
Web Technologies: HTML, jQuery, Ajax, CSS, JavaScript, JSON, XML.
Business Intelligence Tools: Tableau, Jasper Reports.
Testing: Hadoop testing, Hive testing, MRUnit.
Operating Systems: Linux (Red Hat, Ubuntu, CentOS), Windows 10/8.1/7/XP.
Hadoop Distributions: Cloudera Enterprise, Hortonworks.
Technologies and Tools: Servlets, JSP, Spring (Boot, MVC, Batch, Security), Web Services, Hibernate, Maven, GitHub.
Application Servers: Tomcat, JBoss.
IDEs: Eclipse, NetBeans, IntelliJ.

Work Experience:

Sprint, Overland Park, KS (Jan 2015 – Present)

Role: Hadoop Developer

Description: The objective is to deliver large-scale programs that integrate with technology so clients achieve high performance: designing, implementing and deploying custom applications on a Hadoop cluster and providing issue-based solutions using big data analytics. The technical stack includes Spark, Kafka, Scala, Pig, Hive and Oozie.

Responsibilities:

Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.

Worked with Kafka and REST APIs to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.

Used Spark Streaming APIs to perform the necessary transformations and actions on data arriving from Kafka and persisted it into a Cassandra database; see the sketch at the end of this section.

Started using Apache NiFi to copy data from the local file system to the HDP cluster.

Developed Spark scripts, writing custom RDDs in Scala for data transformations and performing actions on them.

Performed advanced procedures such as text analytics and processing in Scala, using Spark's in-memory computing capabilities.

Worked with the Avro and ORC file formats and compression techniques such as LZO.

Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions, dynamic partitions and buckets on Hive tables.

Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.

Worked on migrating MapReduce programs into Spark transformations using Scala.

Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.

Used the Apache Oozie job scheduler to execute workflows.

Used Ambari to monitor node health and job status and to run analytics jobs in the Hadoop clusters.

Worked on Tableau to build customized interactive reports, worksheets and dashboards.

Implemented Kerberos for strong authentication to provide data security.

Worked on Apache Solr for indexing and load-balanced querying to search for specific data in large datasets.

Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment.
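
The Kafka-to-Spark-Streaming-to-Cassandra flow described above could look roughly like the following minimal Java sketch, assuming Spark 1.x with the spark-streaming-kafka (Kafka 0.8 direct) integration and the DataStax spark-cassandra-connector; the broker address, topic, keyspace, table and Event bean are hypothetical placeholders, not taken from the project.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

public class KafkaToCassandra {

    /** Hypothetical event bean; getter names must match the Cassandra columns. */
    public static class Event implements java.io.Serializable {
        private final String id;
        private final String payload;
        public Event(String id, String payload) { this.id = id; this.payload = payload; }
        public String getId() { return id; }
        public String getPayload() { return payload; }
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("KafkaToCassandra")
                .set("spark.cassandra.connection.host", "cassandra-host");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "kafka-broker:9092");
        Set<String> topics = Collections.singleton("events");

        // Direct (receiver-less) stream of raw key/value messages from Kafka.
        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class, kafkaParams, topics);

        // Transform each "id,payload" message into an Event bean.
        JavaDStream<Event> events = messages.map(tuple -> {
            String[] parts = tuple._2().split(",", 2);
            return new Event(parts[0], parts.length > 1 ? parts[1] : "");
        });

        // Persist each micro-batch into the Cassandra table ks.events.
        events.foreachRDD(rdd ->
                javaFunctions(rdd).writerBuilder("ks", "events", mapToRow(Event.class))
                        .saveToCassandra());

        jssc.start();
        jssc.awaitTermination();
    }
}

A production job would additionally configure offset management or checkpointing and error handling, which this sketch omits.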

Environment: Hadoop, HDP, Spark, Scala, Kafka, Hive, Sqoop, Ambari, Solr, Oozie, Cassandra, Tableau, Jenkins, Bitbucket, Hortonworks and Red Hat Linux.

Cerner, Kansas City, MO (June 2013 – Dec 2014)

Role: Hadoop Developer

Description: Cerner is an innovative group focused on developing data-driven business and mathematical-model-based solutions for the health industry, positioning Cerner as one of the world's largest health informatics properties, mediating petabyte-scale health data.

Responsibilities:

Designed and developed analytic systems to extract meaningful information from large-scale structured and unstructured health data.

Created Sqoop jobs to populate Hive tables with data from relational databases.

Developed UDFs in Java to enhance the functionality of Pig and Hive scripts.

Solved performance issues in Pig and Hive scripts through a deep understanding of joins, groups and aggregations, and of how these operations translate into MapReduce jobs.

Involved in creating Hive external tables, loading data and writing Hive queries; see the sketch at the end of this section.

Stored the processed data in HBase for faster querying and random access.

Defined job flows using the Azkaban scheduler to automate Hadoop jobs and installed ZooKeeper for automatic node failover.

Managed and reviewed Hadoop log files to find the sources of job failures and debugged scripts for code optimization.

Developed complex MapReduce programs to analyze data residing on the cluster.

Developed processes to load data from server logs into HDFS using Flume, as well as loads from the UNIX file system into HDFS.

Built a platform to query and display the analysis results in dashboards using Tableau.

Used the Apache Hue web interface to monitor the Hadoop cluster and run jobs.

Implemented Apache Sentry for role-based authorization of data access.

Developed shell scripts to automate routine DBA tasks (e.g., data refreshes, backups).

Involved in performance tuning of Pig scripts and Hive queries.
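
In the spirit of the Hive work described above, here is a minimal Java sketch of creating and querying a partitioned Hive external table over HDFS data through the Hive JDBC driver; the connection URL, table, columns and paths are hypothetical placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveExternalTableDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // External table: Hive manages only metadata; the files stay in HDFS.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS patient_events ("
                    + " patient_id STRING, event_type STRING, payload STRING)"
                    + " PARTITIONED BY (event_date STRING)"
                    + " STORED AS ORC LOCATION '/data/health/events'");
            // Register one day's partition, then aggregate over it.
            stmt.execute("ALTER TABLE patient_events ADD IF NOT EXISTS"
                    + " PARTITION (event_date='2014-06-01')");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT event_type, COUNT(*) FROM patient_events"
                    + " WHERE event_date='2014-06-01' GROUP BY event_type")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}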

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Azkaban, Tableau, Java, Maven, Git, Cloudera, Eclipse and Shell Scripting.

Kohl's, Sunnyvale, CA (May 2012 – May 2013)

Role: Hadoop Developer

Description: The project migrated the storage and analysis of consumer information from an RDBMS to the big data world. The challenges were moving consumer data from the RDBMS to HDFS, performing analysis on the data, and forwarding the analyzed results to BI teams to support decision-making.

Responsibilities:

Worked on a Cloudera Hadoop cluster of 50 data nodes running Red Hat Enterprise Linux.

Involved in loading data from the UNIX file system to HDFS using shell scripting.

Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.

Developed ETL processes to load data from multiple data sources into HDFS using Sqoop, analyzing the data with MapReduce, Hive and Pig Latin.

Developed custom UDFs for Pig scripts to clean unstructured data, and used appropriate joins and groups where required to optimize the Pig scripts.

Created Hive external tables on top of the processed data to easily manage and query it using HiveQL.

Involved in performance tuning of Hive queries by implementing dynamic partitions and buckets in Hive.

Integrated MapReduce with HBase to import bulk data using MR programs.

Used Flume to collect and aggregate web log data from sources such as web servers and push it to HDFS.

Wrote MapReduce jobs in Java to parse the web logs stored in HDFS and used MRUnit to test and debug the MapReduce programs; see the sketch at the end of this section.

Implemented the workflows using Apache Oozie framework to automate tasks.

Coordinated with the team to resolve issues both technically and functionally.
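
As an illustration of the MRUnit testing mentioned above, here is a minimal Java sketch of unit-testing a log-parsing mapper with MRUnit's MapDriver; the LogMapper, its "ip date status" input format and the expected output are hypothetical placeholders, not the project's actual code.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class LogMapperTest {

    /** Emits (status code, 1) for each "ip date status" web-log line. */
    public static class LogMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(" ");
            if (fields.length >= 3) {
                context.write(new Text(fields[2]), ONE);
            }
        }
    }

    @Test
    public void emitsStatusCodeCount() throws Exception {
        // MRUnit runs the mapper in-memory and checks the emitted pairs.
        MapDriver.newMapDriver(new LogMapper())
                .withInput(new LongWritable(0), new Text("10.0.0.1 2013-05-01 404"))
                .withOutput(new Text("404"), new IntWritable(1))
                .runTest();
    }
}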

Environment: Cloudera, HDFS, MapReduce, Pig, Hive, Flume, Sqoop, HBase, Oozie, Maven, Git, Java, Python and Linux.

Codon Soft Pvt. Ltd, Hyderabad, India (Apr 2010 – Mar 2012)

Role: Java Developer

Description: The project was to develop a web application for a seafood supplier located in Australia that takes orders online, maintains the warehouse, alerts management when stock is running low, and gives customers tracking information.

Responsibilities:

Involved in the design and development of the project using Java and J2EE technologies, following an MVC architecture in which JSPs are the views and servlets the controllers.

Designed network and use-case diagrams in StarUML to model the workflow.

Wrote server-side programs using RESTful web services to handle requests coming from different types of devices, such as iOS.

Implemented design patterns like Cache Manager and Factory classes to improve the performance of the application.

Used the Hibernate ORM tool to store and retrieve data from a PostgreSQL database; see the sketch at the end of this section.

Involved in writing test cases for the application using JUnit.

Followed the Agile software development process on this project, enabling fast development.
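
A minimal Java sketch of the Hibernate usage described above, assuming the classic Configuration/SessionFactory API with JPA annotations and a hibernate.cfg.xml configured for PostgreSQL; the Order entity, its table and the mapping details are hypothetical placeholders.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

@Entity
@Table(name = "orders")
class Order {
    @Id @GeneratedValue
    private Long id;
    private String status;
    Order() { }                          // no-arg constructor required by Hibernate
    Order(String status) { this.status = status; }
    Long getId() { return id; }
}

public class OrderDao {
    // Built once from hibernate.cfg.xml (dialect, connection URL, mapped classes).
    private static final SessionFactory FACTORY =
            new Configuration().configure().buildSessionFactory();

    /** Persists a new order and returns its generated identifier. */
    public Long save(Order order) {
        Session session = FACTORY.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Long id = (Long) session.save(order);
            tx.commit();
            return id;
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    /** Loads an order by primary key, or returns null if absent. */
    public Order find(Long id) {
        Session session = FACTORY.openSession();
        try {
            return (Order) session.get(Order.class, id);
        } finally {
            session.close();
        }
    }
}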

Environment: JSP, Spring MVC, Spring Security, Servlets, Ajax, RESTful web services, Hibernate, design patterns, StarUML, Eclipse and PostgreSQL.

Codon Soft Pvt. Ltd, Hyderabad, India (Dec 2008 – Mar 2010)

Role: Java Developer

Description: The project developed a web-based application to eliminate paperwork in laboratories by reading data from different instruments, storing it in a relational database, and generating business intelligence reports for management.

Responsibilities:

Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.

Developed custom JSP tags for the application.

Wrote queries for fetching and manipulating data using the iBATIS ORM framework.

Used Quartz schedulers to run jobs sequentially at given times; see the sketch at the end of this section.

Implemented design patterns like Filter, Cache Manager and Singleton to improve the performance of the application.

Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.

Deployed the application at the client's location on a Tomcat server.
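
A minimal Java sketch of scheduling a recurring job with Quartz, as mentioned above, assuming the Quartz 2.x fluent API (newer than the Quartz versions of that era); the job class, identity names and cron expression are hypothetical placeholders.

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ReportScheduler {

    /** The unit of work Quartz invokes on each firing. */
    public static class NightlyReportJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("Generating nightly lab reports...");
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = new StdSchedulerFactory().getScheduler();
        JobDetail job = JobBuilder.newJob(NightlyReportJob.class)
                .withIdentity("nightlyReport", "reports")
                .build();
        // Fire every day at 02:00 (Quartz cron fields: sec min hour day month weekday).
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("nightlyTrigger", "reports")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}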

Environment: HTML, JavaScript, Ajax, Java, Servlets, JSP, iBATIS, Tomcat Server, SQL Server, Jasper Reports.

Education:

B.E in Information Technology, Acharya Nagarjuna University, India


