Data Manager

Aurangabad, Maharashtra, India
October 14, 2016


Venkata E-Mail

Direct Ph: 201-***-****



8+ years of IT industry experience encompassing a wide range of skills.

4+ years of experience working with Big Data technologies on highly distributed systems comprising several applications and massive amounts of data, using the Cloudera, MapR and IBM BigInsights Hadoop distributions.

Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper etc.

Excellent knowledge of Hadoop architecture, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.

Good understanding of data replication, HDFS concepts, High Availability, reading and writing data on HDFS, and data flow in HDFS.

Good knowledge of setting up Hadoop clusters in different distributions.

Experience in administering and monitoring Hadoop clusters, including commissioning and decommissioning of nodes, file system checks, cluster maintenance, upgrades etc.

Experience in designing multi-node Hadoop clusters with master and slave nodes.

Experience with the Cloudera, MapR and IBM distributions.

Good understanding of YARN, the Hadoop cluster resource management system.

Good experience importing and exporting data between HDFS/Hive and relational database systems such as MySQL using Sqoop.

Knowledge of job workflow scheduling with Oozie and cluster coordination with Zookeeper, and experience setting up Zookeeper on a Hadoop cluster.

Experience running Oozie jobs daily, weekly or bi-monthly as the business requires; the workflows execute as MapReduce jobs.

Experience with the ETL and data visualization tool Pentaho Data Integration; created jobs and transformations that make analysis and some operations easier.

Good knowledge of NoSQL databases including HBase, MongoDB and MapR-DB.

Installation, configuration and administration experience with Big Data platform management tools: Cloudera Manager for Cloudera and MCS for MapR.

Installed the open source tools Nagios and Ganglia in different environments.

Involved in efficiently maintaining and analyzing large, petabyte-scale data sets.

Successfully ran Spark on YARN in cluster mode to improve performance.

Installation and configuration of Pentaho Data Integration in different environments.

Experience deploying Apache Tez on top of YARN.

Executed complex HiveQL queries to extract required data from Hive tables created from HBase.

Monitoring MapReduce jobs and YARN applications.

Good knowledge of Apache Solr, which is used as the search engine in different distributions.

Extensive experience in Object Oriented Analysis and Design, Java/J2EE technologies and Web services.

Extensive experience in working with Oracle, MS SQL Server, DB2, MySQL and writing complex queries, Views, Triggers etc for different data models.

Experienced in SDLC, Agile Methodology.

Ability to meet deadlines without compromising on delivering the right output.

Possess strong communication and analytical skills.


Big Data Technologies: HDFS, Hadoop, Hive, Pig, Oozie, Zookeeper, Impala, Sqoop, MapReduce, Tez, Spark, Flume, HBase, MongoDB, Kafka, YARN

Distributions: Cloudera, MapR, IBM BigInsights, Hortonworks

Languages: Java, SQL, Pig Latin, HiveQL, Shell Scripting

Databases: NoSQL (HBase, MapR-DB, MongoDB), Oracle, MySQL, DB2, MS SQL Server, MS Access

BI Tools: Tableau, Pentaho, Talend

Software, Platforms & Tools: Eclipse, PuTTY, Cygwin, Pentaho DI, Hue, JIRA


1. Actelion Pharmaceuticals US, Inc., San Francisco, CA August 2015 – Present

Big Data Engineer

Description: In this project I am working on the Hadoop ecosystem, including MapR-FS, Hive, HBase, MapR-DB, Oozie, Pig, Drill, Zookeeper, MCS (MapR Control System), Pentaho, Talend etc., with the MapR distribution. Actelion Pharmaceuticals uses a Hadoop-based platform to combine data from different sources for a comprehensive understanding of patients. The platform supports analytics on attributes such as patient gender, age, any pre-existing conditions before using a drug, and the impact of the drug on the patient after use.


Involved in development of the Hadoop system and in improving multi-node Hadoop cluster performance.

Involved in Hadoop cluster administration and successfully maintained large volumes of storage.

Involved in running Oozie jobs daily, weekly and bi-monthly, as required, to report on MapR-FS storage and support capacity planning.

Developed external tables in Hive that can be queried to obtain the data required for analysis.

Wrote HiveQL queries to structure the data in a tabular format.

Created tables in Hive and wrote data into them using Talend Hive components.

Experience in administering the cluster: commissioning and decommissioning of data nodes, backup and recovery, file system management, cluster performance and keeping the cluster healthy in the MapR distribution, which uses MCS for cluster monitoring.

Used Storm for clickstream analysis, which is valuable for understanding online customer experience, and started using Talend in this project for the same purpose.

Experience in managing and reviewing Hadoop log files.

Experience working with Sqoop to transfer data between MapR-FS and relational databases such as MySQL in both directions, and used Talend for Sqoop.

Involved in installing Nagios and Ganglia, tools for monitoring the Hadoop cluster and viewing its health.

Used Apache Spark on YARN for fast, large-scale data processing and improved performance.

Created jobs and transformations in Pentaho Data Integration, an ETL tool, to support customer behavior analysis.

Involved in writing MapReduce jobs.

Experience with Drill, which can deliver secure, interactive SQL analytics at petabyte scale.

Installed Apache Tez, a programming framework built on YARN, to increase performance.

Implemented Zookeeper to coordinate concurrent access across the cluster.

Experience in writing MapReduce jobs and streaming jobs.

Experience in troubleshooting issues and failed jobs in the Hadoop cluster.

Able to tackle problems and accomplish the tasks planned for each sprint.

Environment: MapR-FS, M4&5, MCS, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Storm, Spark, YARN, Zookeeper, Oozie, HBase, MapR-DB, Pentaho DI, Maven, Linux, Talend.

2. Flightstar Aircraft Services, Jacksonville, FL June 2014 – July 2015

Hadoop Developer


Description: The opportunity that a connected aircraft presents could be one of the most significant step-function changes in aviation history. The vast quantity of data across myriad parameters that a state-of-the-art aircraft generates could provide improvements in flight operations, reliability, maintenance and safety. Hadoop is used to bring the information from different data sources onto a common platform; one example is continuous weather data from aircraft. The ADC/INS/GPS combination can report temperature and winds aloft at virtually every point along the flight in real time. Such data could be fed to weather services to give them accurate, up-to-the-second information (though every 6 minutes would be more than enough) about a large part of the earth's atmosphere. In turn, we would get better weather information and forecasting. Other sensors could be added to aircraft to measure moisture. As for FDR functions, data need not be transmitted in bulk all of the time, but in small critical samples every few minutes, or in bursts of richer data when a problem is detected or on command from the pilots or the airline ops center. This platform ingests multiple different files using the Cloudera distribution. The data can be tuned and transformed as required for analytics.


Worked on analyzing the Hadoop cluster using different big data analytic tools, including Hive and MapReduce.

Involved in improving system performance by adding real-time components such as Flume and Spark to the platform.

Installed and configured Spark, Flume, Zookeeper, Ganglia and Nagios on the Hadoop cluster.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs.

Developed MapReduce programs for data analysis and data cleaning.

Worked with the Apache Crunch library to write, test and run MapReduce pipeline jobs.

Developed Pig Latin scripts for the analysis of semi-structured data.

Continuous monitoring and provisioning of Hadoop cluster through Cloudera Manager.

Created Hive tables, loaded data into them and wrote Hive UDFs.

Worked with Impala to obtain fast query results without transforming the data.

Worked with Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.

Used Tableau for visualizing and analyzing the data.

Experience using the Solr search engine for indexing and searching data.

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.0, Sqoop, Kafka, Storm, Oozie, HBase, Cloudera Manager, Crunch, Tableau, Linux.

3. IBM, Dallas, TX August 2013 – April 2014

Big Data Software Developer

Description: I worked with Hadoop, Hive, HBase, HDFS, Java MapReduce, Oozie, Flume, Pig, Solr, Hadoop cluster administration, Lucene, Big SQL, Spark etc. with the IBM BigInsights distribution. IBM handles large volumes of incoming data, and data analytics plays an important role, so it became a Hadoop vendor and developed a distribution with the key features its clients need. In this project we pulled data from different database systems, web and social media sites using IBM BigInsights and performed text analytics as required for the client’s business needs.


Involved in designing the Hadoop architecture.

Responsible for administering the Hadoop system, including commissioning and decommissioning data nodes, cluster performance, maintaining cluster health and monitoring the system in the web console, in the IBM BigInsights distribution.

Worked on importing and exporting data between relational database systems such as DB2 and HDFS/Hive using Sqoop.

Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Developed automated job flows and ran them through Oozie daily and on demand; the flows run MapReduce jobs internally.

Wrote HiveQL queries on Hive external tables created from HBase and generated reports from the data for analysis.

Developed Pig scripts, used as an ETL tool, to transform and aggregate data before loading it into HDFS.

Experience with Storm and Kafka for ingesting streams of data.

Worked on Apache Solr, which is used as an indexing and search engine.

Developed unit test cases using MRUnit on MapReduce code.

Experience with Big SQL, an interactive, low-latency SQL engine that is useful for business queries.

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, IBM BigInsights V2.0, Sqoop, Kafka, Storm, Lucene, Oozie, HBase, Big SQL, Java and Red Hat Enterprise Linux.

4. Cisco-WebEx, Milpitas, CA December 2012 – July 2013

Hadoop Developer

Description: Cisco-WebEx uses a Hadoop environment to store its customer information, reviews and customer log requests, and for fraud prevention. I worked on HDFS, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, HBase and so on. I was responsible for setting up the Hadoop cluster, installing various packages on the distribution and moving application data from staging databases to HDFS.


Introduced and developed the architecture for a data platform service based on the open source Apache Hadoop ecosystem, with HDFS, Flume, Solr, Impala and Hive, to ingest, store, index and analyze big data.

Evaluated NoSQL data store solutions and delivered recommendations.

Migrated data from a traditional database to the NoSQL database MongoDB to analyze the influx of data using Hadoop ecosystem tools and optimize business processes.

Used Sqoop to import data into HDFS and Hive from other data systems.

Continuous monitoring and management of the Hadoop cluster through Cloudera Manager.

Migrated ETL processes from RDBMS to Hive to test ease of data manipulation.

Experience writing and running MapReduce jobs on MongoDB data and returning the results to MongoDB.

Good understanding of choosing NoSQL databases for Hadoop.

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH3.0, Zookeeper, Sqoop, Oozie, MongoDB, Cloudera Manager, Linux.

5. Cisco Systems, Bangalore, India June 2010 – October 2012

Java Developer

Description: This is Cisco Systems’ enterprise-wide Service Desk web application for tracking all tickets created and their progress through the workflow steps, with features such as creation, updating, searching, history and deletion of tickets for each ticket type. It tracks tickets by department and maintains authentication levels for Service Desk users, business users and administrators. It includes a ticket maintenance module and a setup module that configures the attributes of tickets for different ticket types, either globally or per service department. Whenever a ticket's status or workflow step changes, it notifies all corresponding Service Desk or business users by automated email using JavaMail.


Developed web components using JSP, Servlets and JDBC.

Analyzed use cases to understand the business requirements and to assess the technical implementation of the functionality.

Used the JavaMail API extensively to send automated emails whenever a ticket's status or workflow step changed.

Designed, implemented, tested and deployed Enterprise Java Beans, both Session and Entity, using WebLogic as the application server.

Used tools like TOAD for SQL operations on Oracle Database.

Developed database interaction code against the JDBC API, making extensive use of SQL query statements and advanced prepared statements.

Used connection pooling through the JDBC interface for optimization.

Used EJB entity and session beans to implement business logic, session handling and transactions. Developed the user interface using JSP, Servlets and JavaScript.

Wrote complex SQL queries and stored procedures.

Used JavaScript for Client side validation.

Environment: JSPs, Servlets, Java Beans, UML, JDK 1.5, Oracle, TOAD, JavaScript, HTML and CSS.

6. Jayam Solutions, Hyderabad, India October 2008 – May 2010

Client: YES Bank(YBL)

Java Developer

Description: YBL is a product of Jayam Solutions. Under YES LEAP, the flagship program of ISB, comprehensive financial services are provided to SHGs through Self Help Promoting Institutions (SHPI) appointed as Business Correspondents (BC) of YES Bank. To monitor all field-level activities, JAYAM provided a tablet-based technology solution, which was implemented successfully in Java.


Developed user interface templates using Spring MVC and JSP.

Involved in development of form validations using simple form controller.

Responsible for implementing controllers such as simple form controller.

Implemented the DAO, Singleton, Business Delegate and Strategy design patterns.

Used the Spring 2.0 framework to implement the Spring MVC design pattern.

Designed, developed and deployed the J2EE components on Tomcat.

Used Hibernate for O/R mapping on the Oracle database.

Involved in Transaction management and AOP using Spring.

Environment: JAVA/J2EE, JSP, Spring 2.0 framework, Oracle, Hibernate.
