
Big Data Hadoop Developer

Location:
Allen, TX
Posted:
January 25, 2018

Contact this candidate

Resume:

SANGEETHA MALLI RAMESH BAPU

972-***-****, Allen, TX, USA

*******@*****.***

Summary

* ***** ** ** ********** in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies

3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), Tableau, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie, MongoDB, Zookeeper, Kafka, Maven, Spark, Scala, HBase, Cassandra (CQL)

Experience in installation, configuration and deployment of Big Data solutions

Excellent knowledge of Hadoop ecosystem architecture and components such as the Hadoop Distributed File System (HDFS), MRv1, MRv2, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager, and MapReduce programming

Experience in analyzing data using Hive UDFs, Hive UDTFs, and custom MapReduce programs in Java

Strong command of Hive and Pig core functionality; extended it by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources

Hands-on experience with NoSQL databases like HBase and Cassandra and relational databases like Oracle and MySQL

Worked on Agile/Scrum software development.

Responsible for pushing scripts to the GitHub version control repository hosting service and deploying the code using Jenkins.

Used the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrated applications to the AWS cloud.

Good experience in AWS services, Networking, Storage, and Cloud Technology.

Proficient in configuring the active audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file, and zero-byte checks. Enabled the passive audit check after ingesting data into external Hive tables by matching the source file record count against the Hive table count.

Primarily responsible for designing, implementing, testing, and maintaining database solutions on Azure.

Primarily involved in the data migration process on Azure by integrating with the GitHub repository and Jenkins.

Hands-on experience with real-time streaming into HDFS using Kafka and Spark Streaming

Implemented pre-defined operators in Spark such as map, reduce, sample, filter, count, cogroup, groupBy, sort, reduceByKey, take, groupByKey, union, leftOuterJoin, and rightOuterJoin.
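
A minimal Scala sketch of these operators, assuming a local SparkContext and small in-memory sample data; the dataset names and values are illustrative only.

import org.apache.spark.{SparkConf, SparkContext}

object RddOperatorsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-operators").setMaster("local[*]"))

    // Hypothetical (state, amount) and (state, name) pair RDDs.
    val sales   = sc.parallelize(Seq(("tx", 100.0), ("ca", 250.0), ("tx", 75.0)))
    val regions = sc.parallelize(Seq(("tx", "Texas"), ("ny", "New York")))

    val filtered = sales.filter { case (_, amount) => amount > 50.0 }   // filter
    val totals   = filtered.reduceByKey(_ + _)                          // reduceByKey
    val grouped  = filtered.groupByKey()                                // groupByKey
    val joined   = totals.leftOuterJoin(regions)                        // leftOuterJoin

    joined.sortByKey().take(10).foreach(println)                        // sortByKey + take
    println(s"groups = ${grouped.count()}, records = ${sales.count()}") // count

    sc.stop()
  }
}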

Developed analytical components using Spark SQL and Spark Streaming.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala
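
A hedged sketch of what such a conversion looks like in Scala: the same aggregation expressed once as HiveQL through spark.sql and once as DataFrame transformations. The table and column names (claims, member_id, paid_amount) are hypothetical, and a Hive-enabled SparkSession is assumed.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL, run unchanged through the SQL entry point.
    val viaSql = spark.sql(
      """SELECT member_id, SUM(paid_amount) AS total_paid
        |FROM claims
        |GROUP BY member_id""".stripMargin)
    viaSql.show(5)

    // The same logic rewritten as DataFrame transformations.
    val viaDf = spark.table("claims")
      .groupBy("member_id")
      .agg(sum("paid_amount").alias("total_paid"))
    viaDf.show(5)

    spark.stop()
  }
}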

Deeply involved in writing complex Spark-Scala scripts: wrote UDFs, worked with the Spark context and the Cassandra SQL context, used multiple APIs and methods that support DataFrames, RDDs, DataFrame joins, and Cassandra table joins, and finally wrote/saved the DataFrames/RDDs to the Cassandra database.
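
A minimal sketch of the final write step, assuming the DataStax spark-cassandra-connector is on the classpath; the connection host, keyspace, table, and columns are placeholders.

import org.apache.spark.sql.SparkSession

object CassandraWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-write")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical joined result to be persisted.
    val members = Seq((1, "A123", 250.0), (2, "B456", 90.0))
      .toDF("id", "member_id", "paid_amount")

    // Save the DataFrame to a Cassandra table via the connector's data source.
    members.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "member_payments"))
      .mode("append")
      .save()

    spark.stop()
  }
}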

Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC

Technical Skills:

Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena/Zeke scheduling, Zookeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure Data Lake

NoSQL Databases: HBase, Cassandra, MongoDB

Build Management Tools: Maven, Apache Ant

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting

Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MR-Unit

Version Control / CI: GitHub, Jenkins

Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x

Education:

Bachelor of Technology in Electronics & Instrumentation Engineering from Jawaharlal Nehru Technological University, Hyderabad, Andhra Pradesh in 2006

Certifications - From Big Data University

•Hadoop Fundamentals

•Accessing Hadoop Data Using Hive

•Introduction to Pig

Professional Experience:

HCSC-BCBSTX / BcForward, Dallas, TX Feb 2017 to Dec 2017

Big Data / Hadoop Developer

Responsibilities

Created a Zeke event in the FTP process that triggers at the end of the mainframe JCL job for the Stoploss project; the Zeke event in turn kicks off the data lake Zena ingestion process.

Involved in creating JavaScript to set the process variable that triggers consumption, and enabled the date-timestamp partition.

Responsible for configuring the active audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file, and zero-byte checks. The passive audit check is enabled after ingesting data into external Hive tables by matching the source file record count against the Hive table count.
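
As an illustration of the active (pre-ingestion) checks described above, here is a hedged Scala sketch using the Hadoop FileSystem API; the paths, naming convention, and the way the expected record count is supplied are assumptions, not the actual framework.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

import scala.io.Source

object ActiveAuditSketch {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val dataFile = new Path("/data/landing/claims/claims_20170601.dat") // hypothetical landing path

    // Missing-file and zero-byte checks.
    require(fs.exists(dataFile), s"missing file: $dataFile")
    require(fs.getFileStatus(dataFile).getLen > 0, s"zero-byte file: $dataFile")

    // Filename check (naming convention is hypothetical).
    require(dataFile.getName.matches("""claims_\d{8}\.dat"""), s"bad filename: ${dataFile.getName}")

    // Record-count check against an expected count passed as the first argument.
    val expectedCount = args.headOption.map(_.toLong).getOrElse(0L)
    val in = fs.open(dataFile)
    val actualCount = try Source.fromInputStream(in).getLines().size.toLong finally in.close()
    require(actualCount == expectedCount, s"record count mismatch: $actualCount vs $expectedCount")

    println(s"active audit passed for $dataFile ($actualCount records)")
  }
}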

Developed custom aggregate functions using Spark SQL and performed interactive querying.
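
A minimal sketch of a custom aggregate written against the Spark 2.x UserDefinedAggregateFunction API current at the time (Spark 3 later replaced it with Aggregator); the geometric-mean function and the payments table are hypothetical examples.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Geometric mean of a numeric column.
class GeometricMean extends UserDefinedAggregateFunction {
  override def inputSchema: StructType  = new StructType().add("value", DoubleType)
  override def bufferSchema: StructType = new StructType().add("count", LongType).add("product", DoubleType)
  override def dataType: DataType       = DoubleType
  override def deterministic: Boolean   = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0L; buffer(1) = 1.0 }

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + 1L
      buffer(1) = buffer.getDouble(1) * input.getDouble(0)
    }

  override def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
    b1(0) = b1.getLong(0) + b2.getLong(0)
    b1(1) = b1.getDouble(1) * b2.getDouble(1)
  }

  override def evaluate(buffer: Row): Double =
    math.pow(buffer.getDouble(1), 1.0 / buffer.getLong(0))
}

object UdafSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udaf-sketch").getOrCreate()
    import spark.implicits._

    spark.udf.register("geo_mean", new GeometricMean)
    Seq(("tx", 2.0), ("tx", 8.0), ("ca", 5.0)).toDF("state", "amount").createOrReplaceTempView("payments")
    spark.sql("SELECT state, geo_mean(amount) AS gm FROM payments GROUP BY state").show()

    spark.stop()
  }
}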

Ingested the contract, commission, and CVS claims historical files as a one-time load into the incoming raw layer of the HDFS file system, and scheduled the incremental loads in the Zena scheduler by date-timestamp partition.

Involved in adding data to new partitions of the external Hive staging table so reads could target a single partition, and loaded the external Hive ORC tables with Snappy compression using Pig HCatalog scripts.
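
The actual load was done with Pig HCatalog scripts; as a stand-in illustration, the same partition-add and ORC/Snappy load can be expressed as HiveQL run through a Hive-enabled SparkSession. Database, table, and partition names are hypothetical, and the target table is assumed to have been created STORED AS ORC with TBLPROPERTIES ('orc.compress'='SNAPPY').

import org.apache.spark.sql.SparkSession

object HivePartitionLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Register the newly landed directory as a partition of the external staging table.
    spark.sql(
      """ALTER TABLE staging.claims_raw
        |ADD IF NOT EXISTS PARTITION (load_ts='2017-06-01')
        |LOCATION '/data/incoming/claims/2017-06-01'""".stripMargin)

    // Copy that partition into the curated ORC (Snappy-compressed) table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE curated.claims_orc PARTITION (load_ts)
        |SELECT * FROM staging.claims_raw WHERE load_ts = '2017-06-01'""".stripMargin)

    spark.stop()
  }
}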

Applied several business rules in the data transformations as per the requirements and made the data available to the downstream consumption teams.

Worked on the Walgreens Member Search project under tight timelines; configured the ingestion process, applied the business requirements in the data transformations (eliminating header data from control files), and exported the processed data from the HDFS outgoing layer to ADW.

Real-time data processing with Kafka, Spark Streaming, and Spark Structured Streaming: worked with Spark SQL, Structured Streaming, MLlib, and the core Spark API to explore Spark features and build data pipelines in Scala; implemented Spark streaming applications and fine-tuned them to reduce shuffling.
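
A hedged Scala sketch of one such pipeline using Structured Streaming, assuming the spark-sql-kafka-0-10 package is available; the broker address, topic, and HDFS paths are placeholders.

import org.apache.spark.sql.SparkSession

object KafkaStructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-structured-streaming").getOrCreate()

    // Read a Kafka topic as a streaming DataFrame.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "claims-events")
      .option("startingOffsets", "latest")
      .load()

    // Kafka delivers binary key/value columns; cast the payload to text before landing it.
    val events = raw.selectExpr("CAST(value AS STRING) AS event")

    // Append each micro-batch to HDFS as Parquet, with a checkpoint for reliable sinks.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/claims_events")
      .option("checkpointLocation", "hdfs:///checkpoints/claims_events")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}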

Worked with Jira in Scrum software development for issue tracking and release management. Responsible for moving the ingestion scripts into the GitHub version control repository and deploying the scripts using Jenkins.

Primarily involved in the data migration process on Azure by integrating with the GitHub repository and Jenkins.

Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, HBase, Zena Scheduler, Jira, Github, Jenkins, Azure, Kafka, Spark Streaming, Spark SQL

VectorSoft, Dallas, TX Jan 2016 to July 2016

Big Data / Hadoop Developer

The project involved collecting and analyzing data from various sources and creating a data lake through data cleansing and enrichment. The transformed data lake serves the customer analytics data mart, which powers interactive analysis and reporting.

Responsibilities

Configured Flume and Kafka to capture data from various sources such as clickstream data and Twitter feeds

Involved in data ingestion from relational databases into HDFS using Sqoop

Performed data cleansing and data enrichment using Pig Latin and HiveQL

Built exception files for all non-compliant data using Pig

Responsible for managing data from various sources

Created Hive external tables for semantic data, loaded the data into the tables, and queried the data using HQL

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting

Worked with different data sources like Avro data files, XML files, JSON files, SQL Server, and Oracle to load data into Hive tables

Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files

Built business metrics as part of the target platform using HiveQL

Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.

Used the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrated applications to the AWS cloud.

Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle 10g, PL/SQL, SQL Server, Windows NT, Tableau, AWS

POCs

Spark Streaming:

Created a port for live streaming; the data is consumed by the streaming context

Used Maven as the build/deployment tool for spark-submit and generated a JAR file; the job uses a sliding window interval of 5 seconds

Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

The generated output is stored and used to create Spark DataFrames for further analysis

Environment: Spark Streaming, Scala, Maven
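
A minimal Scala sketch of this kind of socket-based streaming job with a 5-second interval; the host, port, window length, and output path are assumptions for illustration.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketStreamingPoc {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("socket-streaming-poc")
    val ssc  = new StreamingContext(conf, Seconds(5))                        // 5-second batch interval

    // Lines arriving on the test port are consumed by the streaming context.
    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(5)) // 5-second slide

    // Store the windowed output; the saved files can later be loaded as DataFrames.
    counts.saveAsTextFiles("hdfs:///data/poc/socket_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}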

Web log Streaming using Spark:

Used a Flume agent to simulate NASA log files as the source, with a Spark sink as the sink

Generated live streaming with a sliding window interval of 10 secs

Custom Scala functions are added to the source program for multiple operations

The generated output is transformed into Spark DataFrames/RDDs and written to the Cassandra database

Environment: Flume, Spark, Scala, Maven, Cassandra
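
A hedged sketch of the streaming leg of this POC, assuming the spark-streaming-flume and spark-cassandra-connector dependencies; the Flume host/port, keyspace, table, and the simple line-per-event parsing are illustrative assumptions.

import com.datastax.spark.connector._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WeblogStreamingPoc {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("weblog-streaming-poc")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(10))        // matches the 10-second interval above

    // Pull events from the Flume Spark sink fed by the NASA-log source agent.
    val flumeStream = FlumeUtils.createPollingStream(ssc, "flume-host", 7777)
    val logLines    = flumeStream.map(e => new String(e.event.getBody.array()))

    // Apply a custom transformation, then write each batch to Cassandra.
    val requests = logLines.map(line => (line.hashCode.toLong, line))
    requests.foreachRDD { rdd =>
      rdd.saveToCassandra("weblogs", "raw_requests", SomeColumns("request_id", "request_line"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}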

Idea Info Solutions, Bangalore Sep 2013 to May 2015

Hadoop developer

Responsibilities

The prime objective was to obtain customer insights from sources across the globe and heterogeneous applications in order to perform analytics on structured and unstructured data in an efficient and reliable manner.

•Worked on analyzing the Hadoop stack and different big data tools, including Pig, Hive, the HBase database, and Sqoop

•Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop

•Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS

•Designed and developed user-defined functions to provide custom Hive and Pig capabilities across the application teams

•Created Hive external tables, loaded the data into the tables, and queried the data using HQL

•Collected log data from web servers and integrated it into HDFS using Flume

•Worked on Impala to expose data for further analysis and to transform files from different analytical formats to text files

•Implemented test scripts to support test driven development and continuous integration

•Worked on tuning the performance of Hive and Pig queries

•Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop

•Worked on Agile/Scrum software development.

Environment: HDFS, Java, MapReduce, Pig, Hive, Impala, Hbase, Oozie, Sqoop, Flume, Linux.

PCS technology, Bangalore Mar 2009 to Oct 2011

Java Developer

Responsibilities

Worked on both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for data services programming. Worked on creating EJBs that implemented business logic

Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript

Involved in designing and development of the ecommerce site using JSP, Servlet, EJBs, JavaScript and JDBC

Used Eclipse 6.0 as IDE for application development

Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer

Configured Struts framework to implement MVC design patterns

Designed and developed GUI using JSP, HTML, DHTML and CSS

Worked with JMS for messaging interface

Environment: Java, J2EE, HTML, DHTML, JSP, Servlets, XML, EJB, Struts, GIT, WebLogic 8.1, SQL Server 2008R2, CentOS, UNIX, Linux, Windows 7/Vista/XP


