
Data Developer

Hyderabad, Telangana, India
February 21, 2018


Lakshmi Gowtham Gudivada

Hadoop Developer

Email: Phone: 570-***-****

Professional Summary:

8+ years of extensive professional IT experience, including 5+ years of Hadoop experience, capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.

Well experienced with Hadoop ecosystem components like Hadoop, MapReduce, Cloudera, Hortonworks, Mahout, HBase, Oozie, Hive, Sqoop, Pig, and Flume.

Experience in using Automation tools like Chef for installing, configuring and maintaining Hadoop clusters.

Lead innovation by exploring, investigating, recommending, benchmarking and implementing data centric technologies for the platform.

Technical leadership role responsible for developing and maintaining the data warehouse and Big Data roadmap, ensuring Data Architecture aligns to the business-centric roadmap and analytics capabilities.

Experienced in Hadoop Architect and Technical Lead roles, providing design solutions and Hadoop architectural direction.

Strong knowledge of Hadoop cluster connectivity and security.

Demonstrates an ability to clearly understand users' data needs and identify how best to meet those needs with the application, database, and reporting resources available.

Strong understanding of data modeling in data warehouse environments such as star schema and snowflake schema.

Extending Hive and Pig core functionality by writing custom UDFs

Strong understanding of relational database structure, which enables writing complex SQL statements combining multiple joins and inline views.

Proficient in writing Structured Query Language under Microsoft SQL Server and Oracle environment.

Create, manipulate and interpret reports with specific program data.

Determine client and practice area needs and customize reporting systems to meet those needs.

Hands-on experience with HDFS, MapReduce, Pig, Hive, AWS, Zookeeper, Oozie, Hue, Sqoop, Spark, Impala and Accumulo.

Good experience in general data analytics on distributed computing clusters like Hadoop using Apache Spark, Impala, and Scala.

Worked on different RDBMS databases like Oracle, SQL Server, and MySQL.

Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS, and transferred large data sets between Hadoop and RDBMS by implementing Sqoop.

Good experience with NoSQL databases like MongoDB, Cassandra, and HBase.

Hands on experience developing applications on HBase and expertise with SQL, PL/SQL database concepts.

Excellent understanding and knowledge of ETL tools like Informatica.

Experience in using Apache Avro to provide both a serialization format for persistent data and a wire format for communication between Hadoop nodes.

Extensive experience in Unix Shell Scripting.

Expertise in Hadoop workflow scheduling and monitoring using Oozie and Zookeeper.

Good Knowledge in developing MapReduce programs using Apache Crunch.

Strong experience as a Java Developer in Web/intranet and Client/Server technologies using Java and JEE technologies, which include the Struts framework, MVC design patterns, JSP, Servlets, EJB, JDBC, JSTL, XML/XSLT, JavaScript, AJAX, JMS, JNDI, RDBMS, SOAP, Hibernate and custom tag libraries.

Experience in implementing enterprise applications using IBM WebSphere, WebLogic and Tomcat.

Experienced in using the JavaScript API including JavaScript Objects, Browser Objects, and HTML DOM Objects.

Supported technical team members for automation, installation and configuration tasks.

An excellent team player and self-starter with good communication and interpersonal skills and proven abilities to finish tasks before target deadlines.


Technical Skills:

Big Data

Hadoop, HDFS, Pig, Hive, HBase, Sqoop, Cloudera, Oozie, Zookeeper, Cassandra, MongoDB


Languages

Java, SQL, PL/SQL, HTML, XML, JavaScript, C#


Java/J2EE Technologies

Java, J2EE, Servlets, Portlets (JSR 168, JSR 286), JSP, JSF, Java Beans, JDBC, EJB

Open source framework and web development

Struts, Spring, Hibernate, JavaScript, AJAX, Dojo, jQuery, Ehcache, Log4j, Ant, JBoss, Web services, SOA, SOAP, REST, WSDL and UDDI

Portals/Application servers

Weblogic, WebSphere Application server, WebSphere Portal server, JBOSS

Operating system

Windows, AIX, UNIX, Linux

Configuration Mgmt

CMVC, Clearcase, Clearquest, PVCS, CVS

Development Tools

Eclipse, Visual Studio, Net Beans, Rational Application Developer, WSAD, JUnit


Databases

NoSQL (HBase, Cassandra, MongoDB), Oracle, SQL Server, DB2, MySQL, Toad, SQL

Software Engineering

UML 2.0, Rational Rose, Design Patterns (MVC, DAO etc)

Professional Experience:

Guardian Life Insurance – Bethlehem, PA June 2016 – Present

Hadoop Developer

Roles & Responsibilities:

Guardian Life Insurance is a mutual insurance company owned by its policyholders, who share in Guardian's actual financial results through annual dividends; these payments are determined by the company's profits, and Guardian has paid a dividend every year since 1868. Involved in writing high-performance, reliable and maintainable code. Using Cloudera, moved log data from Hive to HDFS and also worked on MapReduce and other components of the Hadoop environment.

Worked on a Hadoop cluster that ranged from 4-8 nodes during the pre-production stage and was sometimes extended up to 24 nodes during production.

Built APIs that will allow customer service representatives to access the data and answer queries.

Designed changes to transform current Hadoop jobs to HBase.

Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.

Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.

Extended the functionality of Hive and Pig with custom UDFs and UDAFs.

Developed Spark applications using Scala.

The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports and established self-service reporting model in Cognos for business users.

Implemented Bucketing and Partitioning using Hive to assist the users with data analysis.

Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.

Implemented partitioning, dynamic partitions and buckets in Hive.
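
A minimal sketch of the Hive partitioning and bucketing described above, run through the HiveServer2 JDBC driver from Java purely for illustration; the table, columns, host and credentials are hypothetical, and the same DDL could equally be run from Beeline or the Hive CLI.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Creates a daily-partitioned, bucketed Hive table and loads it with dynamic
// partitioning. All object names and connection details are hypothetical.
public class CreatePartitionedClaimsTable {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {

      // Partition by day, bucket by claim_id to speed up joins and sampling.
      stmt.execute("CREATE TABLE IF NOT EXISTS claims_by_day ("
          + " claim_id STRING, policy_id STRING, amount DOUBLE)"
          + " PARTITIONED BY (claim_date STRING)"
          + " CLUSTERED BY (claim_id) INTO 16 BUCKETS"
          + " STORED AS ORC");

      // Dynamic partitioning: Hive derives the claim_date partitions from the data.
      stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
      stmt.execute("INSERT INTO TABLE claims_by_day PARTITION (claim_date)"
          + " SELECT claim_id, policy_id, amount, claim_date FROM raw_claims");
    }
  }
}
```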

Extracted large volumes of data feed on different data sources, performed transformations and loaded the data into various Targets.

Develop database management systems for easy access, storage, and retrieval of data.

Perform DB activities such as indexing, performance tuning, and backup and restore.

Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components

Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.

Performed various performance optimizations like using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

Expert in creating Pig and Hive UDFs using Java to analyze the data efficiently.
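
A minimal sketch of a Hive UDF of the kind described here, written against the classic org.apache.hadoop.hive.ql.exec.UDF base class; the package, class and masking logic are hypothetical examples, not the project's actual functions.

```java
package com.example.hive.udf; // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Masks all but the last four characters of a value (e.g. a policy number).
public final class MaskPolicyNumber extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null; // pass NULLs through unchanged
    }
    String value = input.toString();
    if (value.length() <= 4) {
      return new Text(value);
    }
    StringBuilder masked = new StringBuilder();
    for (int i = 0; i < value.length() - 4; i++) {
      masked.append('*');
    }
    masked.append(value.substring(value.length() - 4));
    return new Text(masked.toString());
  }
}
```

Packaged into a JAR, such a function would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION mask_policy AS 'com.example.hive.udf.MaskPolicyNumber' before use in HiveQL.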

Responsible for loading the data from the BDW Oracle database and Teradata into HDFS using Sqoop.

Implemented AJAX, JSON, and JavaScript to create interactive web screens.

Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
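
A minimal sketch of such an RDBMS-to-MongoDB ingestion step, assuming a JDBC source and the MongoDB Java sync driver; the connection strings, the customers table and its columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Reads rows from a relational source over JDBC and writes them to MongoDB
// as documents. All names and connection details are hypothetical.
public class OracleToMongoIngest {
  public static void main(String[] args) throws Exception {
    try (Connection oracle = DriverManager.getConnection(
             "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB", "etl_user", "secret");
         MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
         Statement stmt = oracle.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT customer_id, name, balance FROM customers")) {

      MongoCollection<Document> target =
          mongo.getDatabase("staging").getCollection("customers");
      while (rs.next()) {
        // A production job would batch these inserts; one at a time keeps the sketch short.
        target.insertOne(new Document("customerId", rs.getString("customer_id"))
            .append("name", rs.getString("name"))
            .append("balance", rs.getDouble("balance")));
      }
    }
  }
}
```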

Involved in creating Hive tables and applying HiveQL on those tables, which automatically invokes and runs MapReduce jobs.

Supported applications running on Linux machines.

Developed data-formatted web applications and deployed the scripts using HTML5, XHTML, CSS, and client-side scripting with JavaScript.

Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.

Participated in requirement gathering from the Experts and Business Partners and converting the requirements into technical specifications.

Used Zookeeper to manage coordination among the clusters.

Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which best suits the current requirements.

Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

JCPenney Company, Inc. – Fargo, ND July 2014 – May 2016

Hadoop Developer

Roles & Responsibilities

JCPenney Company, Inc. is an American retail chain with 1,014 locations in 49 U.S. states and Puerto Rico. In addition to selling conventional merchandise, JCPenney stores often house several leased departments. Involved in writing Pig scripts and Hive queries, running workflows and scheduling Hadoop jobs using Oozie. Performed various data cleaning operations using tools in the Hadoop ecosystem.

Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.

Experienced in using Pig for transformations, event joins, filtering bot traffic and pre-aggregations before storing the data in HDFS.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Good experience in developing Hive DDLs to create, alter and drop Hive tables.

Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.

Implemented Kafka-Storm topologies capable of handling and channelizing high streams of data, and integrated the Storm topologies to filter and process that data across multiple clusters for complex event processing.

Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.

Developed Sqoop scripts for importing and exporting data into HDFS and Hive.

Responsible for processing ingested raw data using Kafka and Hive.

Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.

Imported data from different sources like HDFS/HBase into Spark RDDs.

Load balancing of ETL processes, database performance tuning and Capacity monitoring using Talend.

Involved in pivoting the HDFS data from rows to columns and columns to rows.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
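
The project itself used Scala RDDs; purely as an illustration, and in Java for consistency with the other sketches here, the following shows a Hive aggregate query re-expressed as Spark transformations through the Dataset API. The database, table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Equivalent of: SELECT customer_id, SUM(order_amount) FROM retail.orders
//                WHERE order_status = 'COMPLETE' GROUP BY customer_id
public class HiveQueryAsSpark {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hive-query-as-spark")
        .enableHiveSupport()   // read existing Hive tables through the metastore
        .getOrCreate();

    Dataset<Row> orders = spark.table("retail.orders");
    Dataset<Row> totals = orders
        .filter("order_status = 'COMPLETE'")
        .groupBy("customer_id")
        .sum("order_amount");

    totals.show(20);
    spark.stop();
  }
}
```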

Used Sqoop to import customer information data from SQL server database into HDFS for data processing.

Loaded and transformed large sets of structured and semi-structured data using Pig scripts.

Involved in loading data from UNIX file system to HDFS.

Handled importing of data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded data into HDFS.

Worked on the core and Spark SQL modules of Spark extensively.

Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
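
A minimal sketch of the copy-to-local step using the HDFS Java FileSystem API, the programmatic equivalent of hdfs dfs -get / -copyToLocal; the paths shown are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Copies a processed result file out of HDFS to the local file system.
public class HdfsExport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml/hdfs-site.xml from the classpath
    try (FileSystem fs = FileSystem.get(conf)) {
      fs.copyToLocalFile(new Path("/warehouse/exports/daily_summary.csv"), // hypothetical HDFS path
                         new Path("/tmp/daily_summary.csv"));              // local destination
    }
  }
}
```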

Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.

Discussed the implementation level of concurrent programming in Spark using Python with message passing.

Environment: Hadoop, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Spark, Hortonworks


Richardson, TX January 2012 – June 2014

Hadoop Developer

Roles & Responsibilities

Involved in data ingestion into HDFS using Sqoop from a variety of sources like Teradata, using connectors like JDBC and import parameters.

Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
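
A minimal sketch of such a parsing MapReduce job; the pipe-delimited field layout, key choice and class names are hypothetical, not the project's actual code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Parses pipe-delimited raw records and aggregates an amount per key, the kind
// of refined output that would then be loaded into partitioned staging tables.
public class RawRecordParseJob {

  public static class ParseMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split("\\|");
      if (fields.length < 3) {
        return; // skip malformed lines
      }
      try {
        ctx.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
      } catch (NumberFormatException e) {
        // skip records with an unparsable amount
      }
    }
  }

  public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context ctx)
        throws IOException, InterruptedException {
      double total = 0.0;
      for (DoubleWritable v : values) {
        total += v.get();
      }
      ctx.write(key, new DoubleWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "raw-record-parse");
    job.setJarByClass(RawRecordParseJob.class);
    job.setMapperClass(ParseMapper.class);
    job.setCombinerClass(SumReducer.class); // sum is associative, so the reducer doubles as a combiner
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```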

Transformed raw data from several data sources into baseline data by developing Pig scripts and loaded the data into HBase tables.

Experience in Hive-HBase integration by defining external tables in Hive and pointing to HBase as the data store for better performance and lower I/O.

Helped market analysts by creating Hive queries to spot the emerging trends by comparing fresh data with HDFS reference tables and historical metrics.

Developed Hive UDFs (User Defined Functions) where the functionality is too complex, using Java and Python languages.

Worked on creating Kafka topics and partitions and writing custom partitioner classes.

Used Kafka to store events from various systems and processed them using Spark Streaming to perform near real-time analytics.
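
A minimal sketch of the Kafka consumption side only (the Spark Streaming processing is omitted), using the standard Kafka consumer API; the broker address, consumer group and topic name are hypothetical.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Polls an event topic and hands each record to downstream processing
// (Spark Streaming in the actual pipeline; a print statement stands in here).
public class EventTopicConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "near-real-time-analytics");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("system-events"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          System.out.printf("partition=%d offset=%d value=%s%n",
              record.partition(), record.offset(), record.value());
        }
      }
    }
  }
}
```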

Defined the job flows in Oozie to automate the process of data loading into the HDFS and Pig.

Worked on POCs and involved in fixing performance & storage issues using different Compression Techniques, Custom Combiners and Custom Partitioners using Java.
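
A minimal sketch of a custom partitioner of the kind mentioned here; the "region-id" key format is hypothetical. It would be wired into a job with job.setPartitionerClass(RegionPartitioner.class).

```java
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes keys of the form "<region>-<id>" so that all records for one region
// reach the same reducer, falling back to hashing the whole key otherwise.
public class RegionPartitioner extends Partitioner<Text, DoubleWritable> {
  @Override
  public int getPartition(Text key, DoubleWritable value, int numPartitions) {
    String k = key.toString();
    int dash = k.indexOf('-');
    String region = dash > 0 ? k.substring(0, dash) : k;
    return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
```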

Used MRUnit for testing raw data and executed performance scripts.

Involved in automation of Hadoop jobs using Oozie workflows and coordinators.

Involved in creating POCs to ingest and process streaming data using Spark streaming and Kafka.

Performed various performance optimizations like using the distributed cache for small datasets, partitioning and bucketing in Hive, and reduce-side joins in MapReduce.

Worked on Talend ETL to load data from various sources into Oracle DB. Used tMap, tReplicate, tFilterRow, tSortRow and various other components in Talend.

Involved in verifying cleaned data using the Talend tool with other departments.

Implemented Hortonworks HDP 2.4 distribution in Dev, SIT, Prod Environments.

Environment: HDFS, MapReduce, Pig, Hive, HBase, Spark, Scala, Python, Sqoop, Oozie, Kafka, Talend, Hortonworks HDP, XML, Linux, Java, Maven, Git, Jenkins

Tech Mahindra, India October 2010 – November 2011

PL/SQL Developer


Designing and developing applications using PL/SQL language for packages, procedures, functions, triggers and sequences in SDLC methodology.

Writing PL/SQL scripts to create schemas, users, roles, tables, views, indexes, partitions, sequences, etc.

Importing data into and exporting data from database for BI Datamart.

Tuning SQL statements for BI Datamart to improve the performance to meet front-end user requirements.

Using Job Scheduler to schedule database jobs.

Preparing data models and data Extract/Transform/Load (ETL) processes.

Extracting/Transforming/Loading application data from DB2 on AS/400 to an Oracle database for migration.

Analyzing production issues, identifying root causes and explaining IT solutions to business clients in an easily understood way.

Participating in daily scrum meeting to report and share project progress and new business requirements.

Created the VGI database schema on Oracle 12c with 88 tables and related views and indexes for custom code.

Designed and developed applications using packages, procedures, functions and sequences to support 15 mid-tier/Java batch developers on the AWD 10 project.
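
A minimal sketch of how such a packaged procedure might be called from the mid-tier Java batch side, using plain JDBC; the package, procedure, parameters and connection details are hypothetical, not the project's actual API.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

// Invokes a packaged PL/SQL procedure with one IN and one OUT parameter.
public class AssignWorkItemCall {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@//dbhost:1521/VGIPDB", "batch_user", "secret");
         CallableStatement cs = conn.prepareCall("{ call awd_pkg.assign_work_item(?, ?) }")) {
      cs.setString(1, "WI-000123");              // IN: work item id
      cs.registerOutParameter(2, Types.VARCHAR); // OUT: assignment status
      cs.execute();
      System.out.println("Assignment status: " + cs.getString(2));
    }
  }
}
```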

Built up the System Acceptance Test, Client Acceptance Test and Production region environments for the VGI database.

Deployed all the code components from development region to the mentioned database regions.

Carried out the first POC design for the team to develop Java code to mimic the legacy COBOL code for the AWD migration from AS/400 to mid-tier/Java AWD 10.

Migrated AWD (Automated Work-Object Distribution) from AS/400 to an Oracle database for the AWD 10 project.

Installed and configured the Oracle 12c database and SQL Developer.

Infosys Technologies Ltd, India August 2008 – September 2010

Java Developer


Involved in the development of front end screen design using JSP.

Involved in the implementation of client-side and server-side programming using HTML and JavaScript.

Assisted in designing, building, and maintaining a database to analyze the life cycle of checking and debit transactions.

Analyzed functional problems and requirements to determine the techniques most feasible for processing the data.

Database design and connectivity were done using the JDBC bridge.

Developed Servlets for forwarding requests to different servers and Servlets.

Designed and developed Servlets to communicate between the presentation and business layer.

Used JavaScript for client-side validation and the JUnit framework for unit testing.

Created Session Beans and controller Servlets for handling HTTP requests from JSP pages.
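
A minimal sketch of a controller servlet in that style: it reads a request parameter, performs a stand-in business-layer lookup (a session bean call in the real application), and forwards to a JSP. The parameter, attribute and JSP names are hypothetical.

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AccountStatusServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    String accountId = req.getParameter("accountId");
    req.setAttribute("status", lookupStatus(accountId));
    // Hand the result to the presentation layer.
    req.getRequestDispatcher("/accountStatus.jsp").forward(req, resp);
  }

  // Stand-in for the session bean / business-layer call.
  private String lookupStatus(String accountId) {
    return (accountId == null || accountId.isEmpty()) ? "UNKNOWN" : "ACTIVE";
  }
}
```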

Developed front-end interfaces and involved in error handling using JSP.

Was assigned the task of deploying the application to different working environments.

Environment: JDBC, HTML, JavaScript, Log4j, JavaBeans, SQL Server, WebLogic


Available upon request
