Hadoop developer

Location: United States

Posted: January 18, 2021

Resume:

Name: Sharath

Professional Summary:

*+ years of overall IT experience, including 5+ years in Big Data technologies (designing and implementing MapReduce and Spark architectures), 2+ years in AWS Cloud, 2+ years in Java technologies and 1+ years in Elasticsearch (Elastic Cloud Enterprise).

Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie and ZooKeeper, with knowledge of the MapReduce/HDFS framework.

Good understanding of the Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, NodeManager and the MapReduce programming paradigm.

Experience in AWS services like EMR, EC2, S3, CloudFormation stacks, Aurora RDS, CloudWatch, SNS, Lambda and Step Functions.

Experience in creating IAM Roles, Security Groups, Firewalls and Load Balancers.

Good exposure and experience in Spark, Scala, Big Data and AWS Stack.

Experience in managing and reviewing Hadoop log files.

Experience in shell scripting, used extensively with Spark for data processing.

Developed various cross-platform products while working with different Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.

Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like Cassandra and HBase.

Analyzed data through HiveQL, Pig Latin and MapReduce programs in Java.

Extended Hive and Pig core functionality by implementing custom UDFs.

Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing and joins in Hive for faster data access.

Installed and used Elasticsearch and Kibana; used curl commands to query data, perform complex aggregations, and insert, update and delete records in Elasticsearch indices.

Wrote data to Elasticsearch indices through Spark on AWS EMR.

Imported and exported data into HDFS and Hive using Sqoop.

Good hands-on experience in creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.

Hands-on experience with Hortonworks, Cloudera and MapR Hadoop environments.

Experience in writing Maven and SBT scripts to build and deploy Java and Scala Applications.

Good understanding of Hadoop administration with Hortonworks, Cloudera & MapR.

Used different Spark modules like Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.

Used Spark DataFrame operations to perform required validations on the data and to run analytics on Hive data.

Used Spark SQL on DataFrames to load Hive tables into Spark for faster data processing.

Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing.

Responsible for developing multiple Kafka producers and consumers from scratch as per the software requirement specifications.

Extracted real-time feeds using Kafka and Spark Streaming, converted them to DataFrames for processing, and saved the data in Parquet format in HDFS.

Experience in creating Impala views on Hive tables for fast access to data.

Experienced in running queries using Impala and using BI tools to run ad-hoc queries directly on Hadoop.

Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables and HBase.

Experience in performance tuning and monitoring of Hadoop clusters by gathering and analyzing data on the existing infrastructure using Cloudera Manager.

Involved in production monitoring using workflow monitor and experience in development and support environments.

Experienced in Waterfall, Agile and Scrum software development methodologies.

Strong knowledge of version control systems like Bitbucket and GitHub.

Good level of experience in Core Java, JEE technologies, JDBC, Servlets and JSP.

Good knowledge in Oracle PL/SQL and shell scripting.

Experience in design and development of web forms using Spring MVC, JavaScript, JSON and JQ plotter.

Developed and maintained web applications on the Tomcat web server.

Experience in using IDEs like Eclipse and IntelliJ.

Able and willing to learn new technologies as needed.

Excellent oral and written communication skills.

Good proficiency in Microsoft Excel, Word, PowerPoint and Visual Studio.

Active team player with excellent interpersonal skills, keen learner with self-commitment & innovation.

Ability to meet deadlines and handle pressure in coordinating multiple tasks in the work environment.

Technical Skills:

Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Spark, Flume, Kafka, Oozie, Sqoop, Impala, ZooKeeper, HBase and Cassandra

Hadoop Distributions: EMR, Cloudera, Hortonworks and MapR

Languages: Java, Scala, Python, Linux Shell Scripting, SQL

AWS Services: EC2, EMR, S3, Lambda, Step Functions, CloudFormation Stacks, Aurora RDS, IAM Roles, CloudWatch, SNS, VPC, Security Groups and Load Balancers

Elastic Cloud Enterprise: Elasticsearch, Kibana and Logstash

Scripts: JavaScript, Shell Scripting

Database: Oracle 10g, MySQL, MSSQL

NoSQL Database: HBase, Cassandra, MongoDB

Web Servers: Apache Tomcat

Operating Systems: Windows, Linux (CentOS)

CI/CD Pipeline: Jenkins and Bamboo

Professional Experience:

Client: Anthem, Atlanta, GA Sep 2019 – Present

Role: Hadoop Developer

Roles & Responsibilities:

This project is part of PBM (Pharmacy Benefit Manager) and aims to provide analytics on the claims data processed by the firm over time. It helps customers identify the total amount of claims they have made or processed, along with other insights into that data.

Design and develop a Spark framework for ETL using Spark modules like Spark Core, Spark SQL, Datasets and DataFrames, coded in Scala.

Hands on coding with Scala for leveraging Apache Spark through the Scala APIs.

Developed various main and service classes in Scala using Spark SQL for requirement-specific tasks.

Used Hive joins and HQL to query databases, and developed complex Hive UDFs.

Run ETL jobs using Apache Spark on AWS EMR to extract data from multiple cloud services like AWS S3 and EMR, and write the data to Elasticsearch indices after performing the required transformations.

Integrate different AWS services and configure, install and run Elasticsearch on AWS EC2, creating the necessary security groups, VPCs, load balancers and firewalls for successful integration.

Spin up EC2 instances, create IAM roles and EMR clusters, and write Lambda functions, Step Functions and CloudFormation templates that are necessary for running daily jobs.

Automate and deploy CloudFormation stacks to model and build the entire infrastructure and application resources like Lambda, Step Functions, security groups, CloudWatch rules and SNS.

Write Python scripts to run transient EMR clusters and Spark and Hive jobs using the boto3 library, and to perform DML operations on Aurora RDS tables using the pymysql library.

Write Python scripts to automate daily job runs: check cluster status and send triggers, trigger Step Functions or other Lambda functions, check job status and send email alerts, run Hive scripts, terminate clusters and more.

Deploy Lambda functions written in Python to leverage the serverless compute service.

Create Step Functions based on the requirements to orchestrate complex flows using Lambda functions.

Develop tasks and set up the required environment for running Hadoop in the cloud on various instances.

Use Snowflake for data storage, processing and analytics as a faster, easier and more flexible tool that is not built on a big data platform such as Hadoop but instead uses a new SQL query engine with an innovative architecture designed natively for the cloud.

Create stages on top of AWS S3, write functions, procedures, tasks and monitors to perform daily activities, and finally merge the data into the target table.

Thorough knowledge of installing standalone Elasticsearch on AWS EC2 and of using Elastic Cloud Enterprise (ECE) to deploy Elasticsearch (ES) and Kibana using Docker images.

Write data to Elasticsearch indices using Spark on AWS EMR.

Work with Elasticsearch and Kibana as an integrated logging and dashboard system, which is also used to monitor and manage application health.

Create ES indices and use curl to query data, perform complex aggregations, update and delete records, and manage index aliases.

Use Logstash to build pipelines that move logs from Elasticsearch to AWS S3.

Work in an Agile model, using JIRA for planning, organizing and collaborating with the team on all solutions.

Review mapping documents and work closely with Business Analysts on enhancements and modifications.

Perform code builds using Maven and push the code to the repository using Bitbucket as the version control tool, which maintains all the code and data repositories.

Use Bamboo to trigger builds and check the progress of each build.

Environment: Spark 2.4.3, Scala 2.11.12, Java 1.8.0_242, EMR 5.28.0, YARN, ECE 2.4.0, Elasticsearch 7.6.1, Kibana, AWS EC2, S3, Aurora RDS, Hive, Eclipse and Bitbucket.

Client: Aetna, Hartford, CT July 2018 – Aug 2019

Role: Hadoop Developer

Roles & Responsibilities:

Worked on the MapR distributed platform to store and analyze huge volumes of data (typically Big Data).

Worked on Big Data/Hadoop infrastructure for batch processing as well as real-time data processing. Responsible for building scalable distributed data solutions using Hadoop with Spark.

Wrote Kafka consumer jobs to consume data in real time, apply the required transformations and aggregations, and store the results in Hive and HBase (NoSQL).

Consumed terabytes of IoT device data in JSON format in real time through Kafka consumers and stored it in Hive and HBase tables.

Stored data in HBase tables with different row key combinations to keep the latest data per member per day as well as historical data with aggregations.

Populated HBase tables through automation. Used Spark Shell for querying the database.

Developed various provisioning jobs using Spark with Scala to provision data to downstream partners.

Worked on various optimization techniques in Spark with Scala to improve job efficiency.

Worked with various file formats like JSON, CSV, text and Parquet. Used Spark with Scala to read the different file formats and load the data into the database.

Worked on provisioning data to downstream systems in FHIR (Fast Healthcare Interoperability Resources) format. Good understanding of the different resource types available in FHIR and good knowledge of developing code for the FHIR format.

Worked on setting up the infrastructure for Docker Containers & OpenShift.

Working experience with container orchestration tools like OpenShift and Kubernetes.

Converted the developed code JAR files to Docker images and deployed the images to the OpenShift environment to run them as Docker containers.

Implemented continuous integration and continuous deployment (CI/CD) of code to production using Jenkins pipelines.

Build Tools used were Maven and SBT.

Implemented Spark batch jobs.

Used Fortify Scan and SonarQube to improve the code coverage.

Worked with TWS workflow scheduler to build workflows with dependencies & to run jobs.

Environment: MapR Hadoop Distribution, Hive, Kafka, HBase, IntelliJ, Maven builds, Spark, Spark SQL, Oozie, Linux/Unix, Shell Scripting, GIT, OpenShift, Kubernetes, Jenkins Pipeline.

Client: E*TRADE, Arlington, VA Apr 2017 – Jun 2018

Role: Hadoop Developer

Roles & Responsibilities:

Worked in a team managing a 30-node cluster and expanded the cluster by adding nodes; the additional data nodes were configured through the Hadoop commissioning process.

Monitored cluster status and health daily using the Hue UI.

Created data import pipelines from MySQL and Oracle into HDFS using Sqoop.

Stored data in AWS S3 in the same way as in HDFS, and performed EMR operations on the data stored in S3.

Scripted AWS environments using a secured VPC, different data pipelines and a Redshift cluster.

Wrote Hadoop MapReduce jobs in Java for processing data on HDFS.

Wrote Spark applications in Scala using Spark Core, DataFrames and Spark SQL.

Imported data from HDFS/HBase into Spark RDDs.

Created Hive tables, implemented partitioning, dynamic partitions and buckets, and created external tables to optimize performance.

Extensively involved in loading data from the UNIX file system to HDFS.

Involved in evaluating business requirements and preparing detailed specifications, following project guidelines, for the programs to be developed.

Performed CRUD operations in HBase.

Developed Hive queries to process the data.

Used Oozie to automate events for data ingestion and processing.

Generated aggregations, groupings and visualizations using Tableau.

Environment: HDFS, Map Reduce, Sqoop, Hive, Pig, Oozie, HBase, Yarn, Spark, Tableau, and Cloudera Manager.

Client: Bank of the West, SFO, California Aug 2016 – Mar 2017

Role: Hadoop Developer

Roles & Responsibilities:

Involved in all phases of installation and upgrade of the Hadoop big data platform. Implemented security for the Hadoop big data platform.

Designed the sequence diagrams to depict the data flow into Hadoop.

Involved in importing and exporting data between HDFS and Relational Systems like Oracle, MySQL and DB2 using Sqoop.

Prepared SOPs for product installations, upgrades and other new processes. Analyzed encryption methodologies and implemented them in the environment.

Set up best practices for monitoring. Analyzed hardware and software requirements for the projects.

Helped the Application and Operations teams troubleshoot performance issues.

Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.

Created final tables in Parquet format. Used Impala to create and manage Parquet tables.

Implemented data ingestion and cluster handling for real-time processing using Apache Kafka.

Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.

Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

Environment: Hadoop, HDFS, Pig, Hive, Spark, MapReduce, Java.

Client: AIG, NJ Sep 2013 – Jul 2015

Responsibilities:

Implemented Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object.

Developed data pipeline using Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.

Worked on Hortonworks environment.

Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.

Developed several new Map Reduce programs to analyze and transform the data to uncover insights into the customer usage patterns.

Developed Hive UDFs to validate data against business rules before it is moved to Hive tables.

Developed MapReduce jobs in both Pig and Hive for data cleaning and pre-processing.

Developed Sqoop scripts for loading data into HDFS from DB2 and preprocessed it with Pig.

Created Hive external tables, loaded data into the tables and queried the data using SQL.

Involved in writing Flume and Hive scripts to extract, transform and load data into the database.

Performed data analysis in Hive by creating tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.

Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, HBase, NoSQL databases and Sqoop.

Developed shell scripts to pull data from third-party systems into the Hadoop file system.

Supported setting up the QA environment and updating configurations for implementing Pig scripts.

Involved in Database design and developing SQL Queries, stored procedures on MySQL.

Involved in Database design with Oracle as backend.

Environment: Hadoop, MapReduce, HDFS, Sqoop, Pig, HBase, Hive, Hortonworks, Cassandra, ZooKeeper, Cloudera, Oozie, DynamoDB, MongoDB, NoSQL, SQL, Oracle, UNIX/Linux, BigQuery, GCP, Amazon EC2, Amazon S3.

Client: Pramati, India June 2012 – Aug 2013

Role: Java Developer

Responsibilities:

Involved in preparing design, development and analysis documents and sharing them with clients.

Developed web pages using the Struts framework, JSP, XML, JavaScript, Hibernate, Spring, HTML/DHTML and CSS; configured the Struts application and used tag libraries.

Developed the application using Spring, Hibernate, Spring Batch and web services, both SOAP and RESTful.

Used the Spring Framework at the business tier and Spring's BeanFactory for initializing services.

Used AJAX and JavaScript to create an interactive user interface.

Implemented client-side validations using JavaScript, as well as server-side validations.

Developed a single-page application using AngularJS and Backbone.js.

Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.

Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.

Performed database modeling, administration and development using SQL and PL/SQL in Oracle 11g.

Coded different deployment descriptors using XML. Deployed the generated JAR files on Apache Tomcat Server.

Involved in the development of presentation layer and GUI framework in JSP. Client-Side validations were done using JavaScript.

Involved in configuring and deploying the application using WebSphere.

Involved in code reviews and mentored the team in resolving issues.

Undertook the Integration and testing of the various parts of the application.

Developed automated Build files using ANT.

Used Subversion for version control and log4j for logging errors.

Conducted code walkthroughs and prepared test cases and test plans.

Environment: HTML5, JSP, Servlets, JDBC, JavaScript, JSON, Spring, SQL, Oracle 11g, Eclipse IDE, XML, XSL, ANT, Tomcat 5.
