
AMMI REDDY TETALA

Phone: +1-913-***-**** | Email: ac1to0@r.postjobfree.com

SUMMARY

8+ years of experience in IT, including 4+ years of expertise in the design and development of scalable distributed systems using Spark, Scala, and Hadoop ecosystem tools such as Pig, Hive, MapReduce (MRv1 and YARN), HBase, Sqoop, Flume, Kafka, Oozie, ZooKeeper, and Impala, along with experience in Core Java and J2EE.

Knowledge of Hadoop distributions including Cloudera, Hortonworks, MapR, and IBM BigInsights.

Experienced in working with Hadoop/Big Data storage and analytical frameworks on the Amazon AWS cloud using tools such as SSH, PuTTY, and MindTerm.

Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager.

Experienced in YARN environments with Storm, Spark, Kafka, and Avro.

Experienced with Scala and Spark, improving performance and optimizing existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.

Good knowledge of writing MapReduce programs using Apache Crunch.

Experienced in writing Hadoop jobs for analyzing data using Hive Query Language (HQL), Pig Latin (a data flow language), and custom MapReduce programs in Java.

Good understanding of NoSQL databases like MongoDB, Cassandra, and HBase.

Experienced in building Storm topologies to perform cleansing operations before moving data into HBase.

Experienced in loading data from Oracle and MySQL databases into HDFS using Sqoop.

Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.

Experienced in scheduling and monitoring Hadoop workflows using Oozie and ZooKeeper.

Experienced in developing data pipelines using Kafka to store data in HDFS.

Experienced in using Apache Avro to provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes.

Experienced in creating tables on top of Parquet format in Impala.

Experienced in using Apache Drill for interactive analysis of large-scale datasets in data-intensive distributed applications.

Experienced in using analytics packages like R and the algorithms provided by Mahout.

Good Experience with version control tools like CVS, SVN and GIT.

Experienced in search technologies such as Solr and Lucene.

Planned, deployed, monitored, and maintained multi-node Amazon AWS cloud infrastructure.

Experienced in writing Spark scripts using the Python shell as per requirements.

Wrote Python scripts to parse XML documents and load the data into databases.

Experienced in Linux and Unix shell scripting.

Hands on experience with Agile and Scrum methodologies.

Good Knowledge on Amazon AWS cloud services (EC2, EBS, and S3).

Good domain knowledge of Retail, Healthcare, and Insurance.

Experienced in using the build tools Ant and Maven.

Experienced in working with Avro and ORC file formats.

Good team player and dependable resource with the ability to learn new tools and software quickly as required.

Good at coding with SQL, SQL*Plus, T-SQL, PL/SQL, and stored procedures/functions.

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Hue, Impala, YARN, Oozie, ZooKeeper, MapR Converged Data Platform, CDH, HDP, EMR, Apache Spark, Apache Kafka, Apache Storm, Apache Crunch, Avro, Parquet, Apache NiFi.

Databases: Netezza, SQL Server, MySQL, Oracle, DB2.

Development Methodologies: Waterfall, Agile (Scrum).

Frameworks: MVC, Struts, Hibernate, Spring.

IDE/Development Tools: Eclipse, NetBeans, Visual Studio.

Java Technologies: Java, J2EE, JDBC, JUnit, Log4j.

NoSQL Databases: HBase, MongoDB, Cassandra.

Operating Systems: Windows, Linux, Unix.

Programming Languages: C, C++, Java, Python, Unix Shell Scripting.

Software Management Technologies: SVN, Git, Jira, Maven.

Web Technologies: HTML, CSS, JavaScript, PHP, XML, jQuery, JSP, Servlets, Ajax.

Web Servers: WebLogic, WebSphere, Apache Tomcat, JBoss.

WORK EXPERIENCE

CLIENT: Rackspace - San Antonio, TX January 2016-Present

ROLE: Sr. Hadoop/Spark Developer

DESCRIPTION:

Rackspace delivers world-class services, tools, and expertise at scale with a choice of the leading public, private, and hybrid cloud technologies. For nearly two decades, Rackspace has helped customers architect, deploy, and manage their IT workloads in its data centers; in this project, the Hadoop ecosystem was used extensively to manage logs and process data center data.

RESPONSIBILITIES:

Worked with a small team to develop an initial prototype of a NiFi big data pipeline. The pipeline demonstrated an end-to-end scenario of data ingestion and processing.

Designed and implemented custom NiFi processors that reacted to, processed, and provided detailed custom provenance reporting for all stages of the pipeline.

Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.

Used the Apache Hue web interface to monitor the Hadoop cluster and run jobs.

Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
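As an illustration of this kind of Spark SQL work, the following is a minimal Scala sketch, assuming a Spark 2.x SparkSession with Hive support on the cluster; the table and column names (datacenter_logs, event_ts, level) are hypothetical placeholders rather than the actual project schema.

    import org.apache.spark.sql.SparkSession

    object LogAggregation {
      def main(args: Array[String]): Unit = {
        // Hive-enabled session so Spark SQL can read and write Hive tables on the cluster.
        val spark = SparkSession.builder()
          .appName("LogAggregation")
          .enableHiveSupport()
          .getOrCreate()

        // Aggregate daily error counts from a hypothetical Hive log table.
        val errorsPerDay = spark.sql(
          """SELECT to_date(event_ts) AS event_date, COUNT(*) AS error_count
            |FROM datacenter_logs
            |WHERE level = 'ERROR'
            |GROUP BY to_date(event_ts)""".stripMargin)

        // Persist the summary back to Hive for downstream reporting.
        errorsPerDay.write.mode("overwrite").saveAsTable("datacenter_log_summary")
        spark.stop()
      }
    }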

Migrated complex MapReduce programs and Hive scripts into Spark RDD transformations and actions.

Developed Scala scripts and UDFs using both SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS systems through Sqoop.
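A minimal sketch of registering a Scala UDF for a Spark SQL aggregation, as it might appear in the spark-shell, assuming a Hive-enabled session as in the sketch above; the normalization rule and the staging_orders table are illustrative assumptions, not the project's actual logic.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().appName("UdfSketch").enableHiveSupport().getOrCreate()

    // Hypothetical cleanup rule applied before aggregation.
    val normalizeRegion = udf((raw: String) =>
      Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val orders  = spark.table("staging_orders")
    val cleaned = orders.withColumn("region", normalizeRegion(orders("region")))

    // Aggregate on the cleaned column; the result could then be exported to an RDBMS via Sqoop.
    cleaned.groupBy("region").count().write.mode("overwrite").saveAsTable("orders_by_region")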

Responsible for loading data into Spark RDDs and performing in-memory computation to generate the output response.
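A minimal pair-RDD sketch of this style of in-memory computation, again assuming a spark-shell session; the HDFS path and tab-delimited record layout are assumptions made for illustration.

    // Count events per host from tab-delimited log lines (hypothetical layout).
    val lines = spark.sparkContext.textFile("hdfs:///data/raw/events/*.log")

    val countsByHost = lines
      .map(_.split("\t"))
      .filter(_.nonEmpty)
      .map(fields => (fields(0), 1L))  // key by the first field (host)
      .reduceByKey(_ + _)
      .cache()                         // keep the result in memory for repeated queries

    countsByHost.take(20).foreach(println)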

Converted all the vap processing from Netezza and reimplemented it using Spark DataFrames and RDDs.

Worked on writing Spark SQL scripts to optimize query performance.

Wrote UDFs and MapReduce jobs depending on the specific requirements.

Worked on loading source data into HDFS by writing Java code.

Merged all small files and loaded them into HDFS using Java code; the tracking history for merged files is maintained in HBase.

Implemented custom Hive UDFs to integrate weather and geographical data with business data for comprehensive data analysis.

Created algorithms for the complex map and reduce functionality of all MapReduce programs.

Wrote Sqoop scripts to import and export data to and from various RDBMS systems.

Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.

Wrote Pig scripts to process unstructured data and make it available for processing in Hive.

Created Hive schemas using performance techniques like partitioning and bucketing.
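A minimal sketch of a date-partitioned Hive table created and loaded through Spark SQL (bucketing omitted for brevity); the sales_by_day and staging_sales tables and their columns are hypothetical.

    // Create a date-partitioned Hive table (hypothetical schema).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_by_day (
        |  order_id BIGINT,
        |  amount   DOUBLE
        |) PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Enable dynamic partitioning, then load the table from a staging table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE sales_by_day PARTITION (order_date)
        |SELECT order_id, amount, order_date FROM staging_sales""".stripMargin)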

Used SFTP to transfer and receive the files from various upstream and downstream systems.

Analyzed HBase data in Hive by creating external partitioned and bucketed tables.

Developed Oozie workflow jobs to execute Hive, Pig, Sqoop, and MapReduce actions.

Extensively worked in code reviews and code remediation to meet the coding standards.

Used the RegEx, JSON, and Avro serialization/deserialization libraries packaged with Hive to parse the contents of streamed log data.

Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.

Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.

Developed UDFs in Java to enhance the functionality of Pig and Hive scripts.

Collected metrics for Hadoop clusters using Ambari.

Used Mahout and MapReduce to parallelize single iterations. Responsible for implementing the application system with Core Java and the Spring Framework.

Created plugins in Apache Drill for low-latency queries and used it with Tableau via JDBC.

ENVIRONMENT: CDH, HDFS, Spark, Pig, Hive, Beeline, Sqoop, MapReduce, Oozie, PuTTY, HaaS (Hadoop as a Service), Java, Netezza, Subversion, Toad, Teradata, Oracle 10g, YARN, UNIX Shell Scripting, Agile Methodology.

CLIENT: Cerner - Kansas City, MO December 2014 – January 2016

ROLE: Hadoop Developer

DESCRIPTION:

Cerner is an innovative group focused on developing data-driven business and mathematical model-based solutions for the health industry, positioning Cerner as one of the world's largest health informatics properties, mediating petabyte-scale health data.

RESPONSIBILITIES:

Designed and developed analytic systems to extract meaningful data from large-scale structured and unstructured health data.

Used the Apache Hue web interface to monitor the Hadoop cluster and run jobs.

Worked on data flow between HDFS and Hive.

Involved in creating Hive tables and applying HiveQL to those tables, which automatically invokes and runs MapReduce jobs.

Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.

Used Hive extensively to process ETL loads on structured data.

Created Hive partitions to store data for different trends under separate partitions.

Connected the Hive tables to data analysis tools like Tableau for graphical representation of the trends.

Completed time-sensitive tasks to process and analyze data using Hive.

Involved in creating Hive external tables, loading data, and writing Hive queries.

Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.

Created Sqoop jobs to populate Hive tables with data from relational databases.

Developed UDFs in Java to enhance the functionality of Pig and Hive scripts.

Solved performance issues in Pig and Hive scripts with a deep understanding of joins, groups, and aggregations and how these jobs translate into MapReduce jobs.

Extracted data from MySQL and DB2 through Sqoop, placed it in HDFS, and processed it.

Defined job flows using the Azkaban scheduler to automate Hadoop jobs and installed ZooKeeper for automatic node failover.

Developed cluster coordination services through ZooKeeper.

Involved in Managing and reviewing Hadoop log files to find the source for job failures and debugging the scripts for code optimization.

Developed complex MapReduce Programs to analyze data that exists on the cluster.

Developed processes to load data from server logs into HDFS using Flume and to load data from the UNIX file system into HDFS.

Built a platform to query and display the analysis results in a dashboard using Tableau.

Implemented Apache Sentry for role-based authorization of data access.

Developed shell scripts to automate routine DBA tasks (e.g., data refreshes and backups).

Involved in the performance tuning for Pig Scripts and Hive Queries.

Developed Pig scripts to pull data from HDFS.

Used Flume to collect, aggregate, and store web log data from different sources, such as web servers, and pushed it to HDFS.

Responsible for cluster maintenance, commissioning and decommissioning of Data Nodes, cluster monitoring, troubleshooting, managing of data backups and disaster recovery systems, analyzing Hadoop log files.

Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.

ENVIRONMENT: HDFS, Map Reduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Git, Cloudera, Eclipse and Shell Scripting.

CLIENT: Williams-Sonoma, Inc., San Francisco, CA October 2013 – December 2014

ROLE: Hadoop Developer

DESCRIPTION:

Williams-Sonoma, Inc. is an American publicly traded consumer retail company and a multi-channel specialty retailer of high-quality products for the home. In this project, I managed large volumes of customer and product data using the Hadoop ecosystem and Big Data components.

RESPONSIBILITIES:

Involved in managing nodes on the Hadoop cluster and monitoring Hadoop cluster job performance using Cloudera Manager.

Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.

Involved in loading data from edge node to HDFS using shell scripting.

Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.

Integrated Elasticsearch and implemented dynamic faceted search.

Played a key role in installation and configuration of the various Hadoop ecosystem tools such as Solr, Pig, HBase and Cassandra.

Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

Designed and Developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of Map reduce outputs.

Developed an end-to-end search solution using the Apache Nutch web crawler and the Apache Solr search platform.

Developed ETL jobs in Talend to load data from ASCII and flat files.

Used the Pig loader for loading tables from Hadoop to various clusters.

Designed Talend jobs for data ingestion, enrichment and provisioning.

Designed and developed custom Java components for Talend.

Worked on migrating HiveQL into Impala to minimize query response time.

Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HQL.

ENVIRONMENT: Hadoop, Scala, MapReduce, HDFS, Spark, Kafka, AWS, Apache Solr, Hive, Cassandra, Maven, Jenkins, Pig, UNIX, Python, MRUnit, Git.

CLIENT: HSBC, Hyderabad, India September 2010- October 2013

ROLE: Java/J2EE Developer

DESCRIPTION:

This project helps the employees of HSBC GLT refer candidates for open positions in the organization. Through this application, the admin can post new referral schemes for open positions, and employees can accordingly refer candidates and receive the cash rewards or gifts available under a referral scheme. The project also provides various reports that help the admin perform detailed analysis based on chosen criteria.

RESPONSIBILITIES:

Involved in Designing, coding, testing and supporting the project, and developing middleware hub using Java/J2EE.

Involved in client meetings to gather the System requirements.

Generated use case diagrams, class diagrams, and sequence diagrams using Rational Rose.

Used UML to create class, action and sequence diagrams.

Wrote JavaScript, HTML, CSS, Servlets, and JSP to design the GUI of the application.

Implemented the Struts MVC design pattern as part of this project, along with ExtJS for validations and controllers.

Developed Action class components for business process execution and configured the Struts-specific XML configuration file.

Used JUnit for unit testing and Log4j as the logging framework.

Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve the data, and handled other database configuration using Hibernate.

Involved in writing the ANT scripts to build and deploy the application.

Worked closely with the testing team in creating new test cases and modeling to the use cases as unit test cases for the module before the testing phase.

Resolved scalability and performance issues both in Applications as well as in WebLogic Application Server.

Coordinated work with DB team, QA team, Business Analysts and Client Reps to complete the client requirements efficiently.

ENVIRONMENT: Core Java, JDBC, Servlets, Struts, JSP, Hibernate, AJAX, HTML5, CSS, ANT, Log4j, JUnit, Oracle, WebLogic.

CLIENT: Nokia-RAC - Hyderabad, India December 2008 - September 2010

ROLE: Java Developer

DESCRIPTION:

RAC (Radio Access Configurator) configures the network elements. The main purpose of RAC is to synchronize the NetAct database with the network elements.

RESPONSIBILITIES:

Analyzed requirements and prepared Implementation Specification document.

Designed class and sequence diagrams based on UML concepts using Rational Rose.

Implemented JMS for sending XML file notification.

Implemented MDB for consuming and processing XML files.

Developed code for XML parsing using SAX and DOM.

Created Test Scripts and Test Scenarios for NW3NTF sub system in Stefa.

Used JTest for JUnit testing and Code coverage.

Used Clear Case for version control.

Interacted with end applications and performed Business Analysis and Detailed Design of the system from Business Requirement documents.

Used Altova XMLSpy for testing XML documents.

Knowledge of insurance markets covering Property and Casualty insurance.

Implemented SoapUI testing procedures for the Policy Center ECIF interface.

Troubleshot the Policy Center to fix cosmetic defects.

Implemented test plans for end-to-end testing.

Followed Test-Driven Development (TDD).

ENVIRONMENT: Swing, Servlets, Java, DOM, JDBC, ClearCase, WebSphere Application Server, Oracle.

Willing to relocate: Anywhere


