Sri Harsha
630-***-**** Email: *********.***********@*****.***
Summary:
•Around 3 years of IT experience with multinational clients, focused on architecting and developing Big Data / Hadoop applications.
•Hands-on experience with the Hadoop stack (MapReduce, Spark, HDFS, Sqoop, Pig, Hive, HBase, Flume, Oozie, and Zookeeper).
•Well versed in configuring and administering Hadoop clusters using major distributions such as Apache Hadoop, MapR, and Cloudera.
•Proven expertise in performing analytics on Big Data using MapReduce, Spark, Hive, and Pig.
•Experienced in performing real-time analytics on NoSQL databases like HBase.
•Worked with the Oozie workflow engine and the Tidal Enterprise Scheduler (TES) to schedule time-based jobs that perform multiple actions.
•Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop/Spark (a minimal sketch follows this summary).
•Analyzed large data sets by writing Pig scripts and Hive queries.
•Experienced in writing MapReduce programs and UDFs for Hive and Spark in Java/Scala.
•Experience configuring Hadoop ecosystem components: Hive, Spark, Drill, Impala, HBase, Pig, Sqoop, Mahout, Zookeeper, and Flume.
•Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Scala for data processing.
•Experience testing MapReduce programs using MRUnit, JUnit, and EasyMock.
•Experienced in writing functions, stored procedures, and triggers using PL/SQL.
•Experienced in working with RDBMS, OLAP, and OLTP concepts.
•Experienced with the build tools Ant and Maven and continuous integration tools like Jenkins.
•Experienced in all facets of the Software Development Life Cycle (Analysis, Design, Development, Testing, and Maintenance) using Waterfall and Agile methodologies.
•Motivated team player with excellent communication, interpersonal, analytical, and problem-solving skills.
•Adept at quickly and thoroughly mastering new technologies, with a keen awareness of industry developments and next-generation programming solutions.
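As context for the Sqoop/Spark import work noted above, the following is a minimal, illustrative Scala sketch of the Spark path: reading a relational table over JDBC and persisting it to Hive. The JDBC URL, schema, table, and credential names are hypothetical placeholders, not details from any client engagement.

import org.apache.spark.sql.SparkSession

object JdbcToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JdbcToHive")
      .enableHiveSupport() // required so saveAsTable writes through the Hive metastore
      .getOrCreate()

    // Read a relational table over JDBC into a DataFrame.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder URL
      .option("dbtable", "SALES.ORDERS")                     // placeholder table
      .option("user", "etl_user")                            // placeholder user
      .option("password", sys.env("DB_PASSWORD"))            // avoid hard-coding secrets
      .load()

    // Persist the snapshot into a Hive staging table.
    orders.write.mode("overwrite").saveAsTable("staging.orders")

    spark.stop()
  }
}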
Technical Summary:
Methodologies: Agile Scrum, Waterfall, Design patterns
Big Data Technologies: Hive, Spark, Drill, Impala, HBase, Sqoop, Pig, Hadoop, HDFS, MapReduce 2 (YARN), Mesos, MapR, Cloudera Manager, Kafka, Amazon AWS, EC2
Languages: Scala, Spark, Shell Scripting, REST, Pig Latin, NoSQL, Java, XML, XSL, SQL, PL/SQL, HTML, JavaScript, C
J2EE Technologies: JSP, Servlet, Spring, Hibernate, Web services
Web Technologies: HTML, CSS, JavaScript
Databases: HDFS, DB2, Oracle, SQL Server & Unica
Business Intelligence Tools: Tableau, Platfora, Qlik
Operating Systems: Windows 7/8/10, UNIX, and DOS
Education:
Master of Science in Computer Science – GPA: 3.85 May 2015
University of Central Missouri, Warrensburg, MO.
Bachelor of Technology in Computer Science May 2013
Jawaharlal Nehru Technological University, Hyderabad, India.
Professional Experience:
Cisco Systems, San Jose, CA Nov 2015 – Present
IT Engineer (Hadoop/Spark Developer)
Responsibilities:
•Followed Agile methodology and used Rally to maintain user stories.
•Installed, configured, and maintained the MapR 5.2.1 Hadoop distribution.
•Managed data coming from different sources and was involved in HDFS maintenance and loading of structured data.
•Wrote extensive Spark/Scala code using DataFrames, Datasets, and RDDs to transform transactional database data and load it into Hive/HBase tables (see the sketch after this section).
•Imported and exported data between RDBMS and HDFS/Hive using Sqoop.
•Handled BLOB/CLOB data types in Hive/Spark.
•Implemented Kafka for broadcasting logs generated by Spark Streaming.
•Worked with Apache Drill 1.6 and Spark SQL to improve query performance.
•Implemented RESTful web services returning JSON so data could be queried from a web browser.
•Loaded data into HBase from Hive tables to compare performance.
•Wrote Hive jobs to parse logs and structure them in tabular format for effective querying of the log data.
•Created Hive tables and loaded data in text, Parquet, and ORC formats for use in Hive queries.
•Wrote a custom MapReduce program to merge data from incremental Sqoop imports.
•Responsible for loading data from the UNIX file system into HDFS.
•Created UNIX shell scripts (Bash/Korn shell) to parameterize Sqoop and Hive jobs.
•Worked with the Tidal Enterprise Scheduler (TES) for job scheduling.
•Worked on production deployment and post-production support within the team.
•Used Maven as the build tool and Git for code management.
•Experienced in working with offshore teams.
•Worked closely with the BRT and QA teams to fix issues.
Environment: Hadoop, HDFS, MapReduce, Spark, Scala, Hive, Sqoop, Drill, Kafka, Oracle, ETL, UNIX, MapR, REST, Tableau, Platfora, Eclipse, Git.
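The Spark/Scala bullet above points here: a minimal, illustrative sketch of a DataFrame transform-and-load into a partitioned Hive table, assuming the source data was already landed by Sqoop. The table and column names (staging.transactions, txn_ts_raw, warehouse.transactions_clean) are hypothetical.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TxnToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TxnToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Source table previously landed by Sqoop (hypothetical name).
    val txns = spark.table("staging.transactions")

    // Typical cleanup: parse the raw timestamp, derive a partition column,
    // and drop obviously invalid rows.
    val cleaned = txns
      .withColumn("txn_ts", to_timestamp(col("txn_ts_raw"), "yyyy-MM-dd HH:mm:ss"))
      .withColumn("txn_date", to_date(col("txn_ts")))
      .filter(col("amount") > 0)

    // Append into a date-partitioned Hive table stored as ORC.
    cleaned.write
      .mode("append")
      .format("orc")
      .partitionBy("txn_date")
      .saveAsTable("warehouse.transactions_clean")

    spark.stop()
  }
}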
Bank of America, Charlotte, NC Jan 2015 – Nov 2015
Spark Developer
Responsibilities:
•Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
•Managed data coming from different sources and was involved in HDFS maintenance and loading of structured data.
•Wrote interactive Spark queries for streaming analytics.
•Integrated Spark into xPatterns, the team's big data analytics platform, as a replacement for Hadoop MapReduce.
•Used Sqoop to import data from an Oracle database into HDFS and vice versa.
•Created partitions and buckets based on state for further processing using bucket-based Hive joins.
•Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
•Used Spark to perform interactive exploration of large datasets.
•Wrote Impala queries for managing the data.
•Moved Oracle database data into dynamically partitioned Hive tables with Sqoop, using staging tables (see the sketch at the end of this section).
•Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
•Used Spark Core for log transaction aggregation and analytics.
•Integrated the scheduler with Oozie workflows to pull data from multiple sources in parallel using forks.
•Created a data pipeline of MapReduce programs using chained mappers.
•Implemented optimized joins across different data sets to get top claims by state using MapReduce.
•Implemented complex MapReduce programs in Java that perform map-side joins using the distributed cache, all running in a distributed environment.
•Developed several advanced MapReduce programs to process incoming data files.
•Created Hive generic UDFs to implement business logic that varies by policy.
•Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate several job types, such as Java MapReduce, Hive, and Sqoop.
•Developed unit test cases using the JUnit and MRUnit testing frameworks.
•Experienced in monitoring the cluster using Cloudera Manager.
•Used the Kerberos authentication protocol to allow nodes to communicate over a secured network.
•Used PowerBroker for advanced root-privilege delegation and keystroke logging.
Environment: Hadoop, HDFS, MapReduce, Spark, Java, Hive, Sqoop, Oozie, Impala, Oracle, ETL, UNIX, Cloudera Manager.
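The dynamic-partition bullet above refers to this sketch: a minimal, illustrative Spark/Scala example of loading a dynamically partitioned Hive table from a staging table, with the partition column last in the SELECT list as Hive requires. The database, table, and column names (claims_db, claims_staging, claim_id, state) are hypothetical placeholders.

import org.apache.spark.sql.SparkSession

object StagingToPartitioned {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StagingToPartitioned")
      .enableHiveSupport()
      .getOrCreate()

    // Enable Hive-style dynamic partitioning for the INSERT below.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Each distinct value of `state` lands in its own partition.
    spark.sql("""
      INSERT INTO TABLE claims_db.claims PARTITION (state)
      SELECT claim_id, policy_id, amount, state
      FROM claims_db.claims_staging
    """)

    spark.stop()
  }
}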