Anil K.
aczlnn@r.postjobfree.com
PROFESSIONAL SUMMARY
Over 8 years of extensive IT experience, including over 4 years of experience with the Hadoop ecosystem and Java
technologies such as HDFS, MapReduce, Apache Pig, Hive, HBase, Sqoop, Flume, YARN, and ZooKeeper.
Highly proficient in, and with an in-depth understanding of, Hadoop architecture and its components such as HDFS, Job
Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts; experienced in writing
MapReduce programs with Apache Hadoop to analyze large data sets efficiently.
Experience in importing and exporting terabytes of data between HDFS and relational database systems using
Sqoop.
Expert in ingesting data into the Big Data ecosystem.
Proficient in Spark with Scala for loading data from local file systems, HDFS, Amazon S3, and relational and
NoSQL databases using Spark SQL, importing data into RDDs, and ingesting data from a range of sources using
Spark Streaming.
Hands-on experience working with Hadoop ecosystem tools including Hive, Pig, HBase, Oozie, Impala, Spark, Drill,
and Hue.
Working knowledge of ETL and BI tools such as Informatica and Teradata.
Extensive knowledge of the development, analysis, and design of ETL methodologies across all phases of the data
warehousing life cycle.
Built application platforms in the cloud by leveraging Amazon Web Services, open-source
technologies, and CI/CD engineering best practices.
Expertise in relational databases such as Oracle, MySQL, SQL Server, and IBM DB2 for managing tables, views,
indexes, sequences, stored procedures, functions, triggers, and packages, and in NoSQL databases such as
MongoDB and Cassandra for managing document-oriented data, cluster administration, and CRUD operations on data.
Thorough knowledge of core Java concepts such as OOP, Java Swing, JDBC, JMS, multithreading, and JUnit, and of
advanced Java and web technologies such as JSP, Servlets, Struts, HTML, XML, CSS, Hibernate, AJAX, SVN,
JavaBeans, and Spring.
Proficient in developing web-based applications and client-server distributed applications in Java/J2EE
technologies using object-oriented methodology.
Worked with cloud services such as Amazon Web Services (AWS) and Google Cloud.
Excellent knowledge of data warehousing concepts.
Well versed in software development methodologies such as Waterfall, Agile (Scrum), Test-Driven Development,
and Service-Oriented Architecture.
Experience in using code repository tools such as TortoiseSVN, GitHub, and Visual SourceSafe.
Strong communication and analytical skills and a demonstrated ability to handle multiple tasks as well as work
independently or in a team.
TECHNICAL SKILLS
Hadoop/Big Data Technologies Hive, HBase, Sqoop, Pig, MapReduce, YARN, Flume, Oozie, ZooKeeper
J2SE/J2EE Technologies Java, J2EE, JDBC, JSP, Servlets, Spring, Java Beans
Web Services SOAP, RESTful
IDE Tools Eclipse, NetBeans, RSA, RAD, Oracle WebLogic Workshop
Cloud Technologies AWS EC2, S3
Databases Oracle, SQL Server, MySQL, MS SQL, IBM DB2, MongoDB, Cassandra
Web/Application Servers Apache Tomcat, IBM WebSphere, WebLogic Application Server, JBoss
Programming/Scripting Languages C, Java, Unix Shell/Bash Scripting, Python
Platforms Windows, Linux and Unix
Version Control Tortoise SVN, GIT and Visual Source Safe
Methodologies Agile/ Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Client: Johnson Controls, Milwaukee, WI    May 15 - Till Date
Role: Hadoop Developer
Description:
Johnson Controls is one of the top Fortune 500 companies. When I joined Johnson Controls, the team was developing a
new ingestion framework to replace the existing framework with advanced features. In this project, I worked on
various stages of the framework development using Hadoop tools.
Responsibilities:
Involved in HBase setup and in storing data into HBase, which is used for analysis.
Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
Involved in loading data from the Linux file system into HDFS.
Developed Spark scripts by using Scala Shell commands as per the requirement.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
Worked hands-on with the ETL process; responsible for running Hadoop streaming jobs to process terabytes of XML data.
Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
Implemented Spark Streaming applications in Java.
Combined visualizations into Interactive Tableau Dashboards and published them to the web portal.
Performed data analysis using Spark with Scala.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
Analyzed the SQL scripts and designed the solution to implement using PySpark.
Worked on reading multiple data formats on HDFS using Scala.
Extracted and updated data in MongoDB using the MongoDB import and export command-line utilities.
Implemented real time system with Kafka, Storm and Zookeeper.
Developed Hive scripts equivalent to Teradata queries and performed performance tuning using Hive.
Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala, and
in databases such as HBase and MongoDB.
Communicated with the client and participated in requirements gathering with business users.
Coordinated with offshore and onsite teams to understand the requirements and prepared high-level and low-level
design documents from the requirements specification.
Environment: Hive, Pig, HBase, MapReduce, Flume, Spark, Spark SQL, Spark Streaming, PySpark, Scala, Python,
Kafka, Storm, ZooKeeper, Shell/Bash Scripting, NoSQL database MongoDB, Oozie, YARN, JIRA, JDBC
Client: United Health Group, Minnetonka, MN    Jan 14 - April 15
Role: Hadoop Developer
Description:
United Health Group is well known for providing insurance and health services in the United States. In this project, we
developed a framework called DataFabric, which aims to auto-ingest data from different data sources into the
company's own data lake for analysis of all data. The framework is built to manage and maintain incremental data
that arrives along with historical data, and it uses environments such as Hive, HBase, Talend, and Hadoop. The
DataFabric framework is still under development and must overcome many challenges to become a more efficient framework.
Responsibilities:
Responsible for managing data coming from different sources.
Handled storage and processing through Hue, covering all Hadoop ecosystem components.
Involved in creating Hive tables, and loading and analyzing data using Hive queries.
Developed simple to complex MapReduce jobs using Hive and Pig.
Involved in system design and development in Core Java using Collections, Multithreading.
Gained experience in CI/CD with Jenkins.
Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
Developed custom UDFs and implemented Pig scripts.
Implemented UDFs in Java for Hive to process data in ways that cannot be handled by Hive's built-in functions.
Developed simple to complex Unix shell/Bash scripts during the framework development process.
Worked on implementing Flume to import streaming log data and aggregate the data into HDFS.
Involved in writing Flume and Hive scripts to extract, transform, and load the data into the database.
Used Oozie to orchestrate the MapReduce jobs and worked with HCatalog to open up access to Hive's Metastore.
Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with
components on HDFS and Hive.
Created contexts to pass values throughout the process between parent and child jobs.
Experienced in Talend Data Integration, Talend Platform Setup on Windows and Unix systems.
Worked on POCs to integrate Spark with other tools.
Created Oozie Jobs for workflow of Spark, Sqoop and Shell scripts.
Created a Spark application to load data into a dynamic-partition-enabled Hive table.
Worked on stateful transformations in the Spark application.
Used Mahout MapReduce to parallelize a single iteration.
Installed, managed, and monitored the Hadoop cluster using Cloudera Manager.
Wrote automation scripts to monitor HDFS and HBase through cron jobs.
Translated business processes into data mappings for building the data warehouse.
Created Parquet Hive tables with Complex Data Types corresponding to the Avro Schema.
Planned releases with the team using JIRA and Confluence.
Processed data with Hive and Teradata, and developed web applications using Java and Oracle SQL.
Environment: HDFS, Sqoop, HiveQL, HBase, Pig, Flume, YARN, Oozie, Kafka, ZooKeeper, Apache Storm, Core Java,
Jenkins, Teradata, VPC, Apache Parquet, ETL, Git, UNIX/Linux Shell Scripting, NoSQL, JIRA
Client: Yale University, New Haven, CT    June 13 - Dec 13
Role: Hadoop Developer
Description:
Yale University is an American private Ivy League research university in New Haven, Connecticut. Founded in 1701 in
Saybrook Colony as the Collegiate School, the university is the third-oldest institution of higher education in the United
States.
Responsibilities:
Analyzed the requirements to set up a cluster.
Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple
MapReduce jobs in Java.
Worked with the infrastructure and admin teams in designing, modeling, sizing, and configuring a 15-node Hadoop
cluster.
Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
Developed Unix/Linux Shell Scripts and PL/SQL procedures.
Extracted the data from MySQL into HDFS using Sqoop.
Created Hive queries to compare the raw data with Enterprise Data Warehouse (EDW) reference tables and to
perform aggregations.
Imported and exported data into HDFS and Hive using Sqoop.
Wrote Pig scripts to process the data.
Developed and designed Hadoop, Spark and Java components.
Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
Involved in HBase setup and in storing data into HBase, which is used for further analysis.
Built an Enterprise Data Warehouse (EDW) on an Amazon Redshift database via Hadoop EMR (Pig).
Established release management processes using tools such as Jenkins, Ant, Maven, and Chef.
Configured and managed Amazon CloudSearch through the Amazon Web Services (AWS) Management Console.
Installed Kafka on the Hadoop cluster and developed producer and consumer code in Java to establish a connection
from the Twitter source to HDFS, filtering on popular hashtags.
Implemented JavaScript execution in Core Java.
Analyzed the possibility of reusing existing BTEQ code and recommended compute/storage loads of Teradata to be
offloaded to Hadoop/Hive for business benefits.
Developed clustering, classification, and recommender systems around Elasticsearch and Mahout, and compared and
contrasted them with Spark MLlib.
Migrated corporate Linux servers from physical servers to Amazon Web services (AWS) virtual servers.
Evaluated, recommended, maintained, and administered the issue-tracking tool Bugzilla and managed issues with JIRA.
Loaded the data using the Hive Parquet SerDe for the Avro data type.
Used SQL queries to retrieve data from Enterprise Data Warehouse (EDW).
Involved in dimensional modeling and tuning on AWS Redshift.
Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and
Sqoop.
Unit tested a sample of raw data, improved performance, and turned the work over to production.
Environment: HDFS, MapReduce, Core Java, UNIX/Linux Shell Scripting, PL/SQL, Pig, Hive, HBase, Sqoop, Flume,
Oozie, ZooKeeper, HiveQL, Kafka, NoSQL, Spark, Amazon EMR, Amazon Redshift, Mahout, Amazon Web Services
(AWS), Apache Parquet, ETL, Teradata, JIRA, Jenkins, Git, Java/J2EE
Century National Insurance, India    Dec 09 - Jan 12
Java Developer
Description:
The New York Motor Vehicle Commission (MVC) is the government agency responsible for titling, registering, and
providing plates for vehicles and for licensing drivers in the U.S. state of New York. It also provides online support for
renewing titles, registrations, licenses, etc.
Responsibilities:
Involved in the full Software Development Life Cycle (SDLC) of the tracking system, including requirements
gathering, conceptual design, analysis, detailed design, development, system testing, and user acceptance.
Worked in Agile Scrum methodology
Involved in writing exception and validation classes using core java
Designed and implemented the user interface using JSP, XSL, DHTML, Servlets, JavaScript, HTML, CSS and AJAX
Developed framework using Java, MySQL and web server technologies
Validated the XML documents with XSD validation and transformed to XHTML using XSLT
Implemented cross-cutting concerns as aspects at the service layer using Spring AOP and for DAO objects using
Spring ORM.
Used Spring beans to control the flow between the UI and Hibernate.
Implemented SOA architecture with web services using SOAP, WSDL, UDDI, and XML with the CXF framework and Apache Commons.
Worked on the database interaction layer for insertion, update, and retrieval operations using queries and stored
procedures.
Wrote stored procedures and complicated queries for IBM DB2.
Used Eclipse IDE for development and JBoss Application Server for deploying the web application
Used Apache Camel to create routes for web services.
Used JReport for the generation of reports of the application
Used WebLogic as the application server and Log4j for application logging and debugging.
Used CVS as the version control tool and ANT as the project build tool.
Environment: Java, HTML, CSS, JSTL, JavaScript, Servlets, JSP, Hibernate, Struts, Web Services, Eclipse, JBoss,
JMS, JReport, Scrum, MySQL, IBM DB2, SOAP, WSDL, UDDI, AJAX, XML, XSD, XSLT, Oracle, Linux,
Log4J, JUnit, ANT, CVS
InfoTech, India    Aug 08 - Nov 09
Java Developer
Description:
The objective of Item Management is to set up, maintain, and share item information in a flexible system that easily
supports Unilever's growth, increases speed to market, and improves data accuracy while reducing user workload. Guided
Setup is one module of the project, which functions as a wizard to complete the configuration and creation of various types
of items, such as single items, multiple items, and assortment items.
Responsibilities:
Analysis, design and development of application based on J2EE and design patterns
Involved in all phases of SDLC (Software Development Life Cycle)
Developed user interface using JSP, HTML, CSS and JavaScript
Involved in developing functional model, object model and dynamic model using UML
Developed the Java classes used in JSPs and Servlets
Implemented asynchronous functionalities like e-mail notification using JMS
Implemented Multithreading to achieve consistent concurrency in the application
Used the Struts framework for managing the navigation and page flow
Created SQL queries and used PL/SQL stored procedures
Used JDBC for database transactions
Developed stored procedures in Oracle
Involved in developing the helper classes for better data exchange between the MVC layers
Used Test Driven Development approach and wrote many unit and integration test cases
Used Eclipse as IDE tool to develop the application and JIRA for bug and issue tracking
Worked on integration testing using JUnit and XML for building the data structures required for the web
service
Used ANT tool for building and packaging the application
Code repository management using SVN
Environment: Core Java, Struts, Servlets, HTML, CSS, JSP, XML, JavaScript, Waterfall, Eclipse IDE, Oracle, SQL,
JDBC, JBoss, JUnit, ANT, SVN, Apache Tomcat Server