Hadoop Developer

Location:
United States Military Academy, NY
Posted:
January 07, 2021


Name: Ravi B

Big Data Engineer

E-Mail ID: adi816@r.postjobfree.com

Contact No.: 703-***-****

Professional Summary:

• 7+ years of professional IT experience, including 4+ years in the design, development, and implementation of Big Data applications using Hadoop ecosystem frameworks and tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, HBase, Kafka, Flume, NiFi, Impala, Oozie, Zookeeper, and Airflow.

• Expertise in developing Scala and Java applications, with good working knowledge of Python.

• Good expertise in ingesting, processing, exporting, and analyzing terabytes of structured and unstructured data on Hadoop clusters in the Healthcare, Insurance, and Technology domains.

• Experience in working with various SDLC methodologies like Waterfall, Agile Scrum, and TDD for developing and delivering applications.

• Experience in gathering requirements, analyzing requirements, providing estimates, implementation, and peer code reviews.

• In-depth knowledge of Hadoop Architecture and working with Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.

• Demonstrated experience in delivering data and analytics solutions leveraging AWS, Azure, or similar cloud data lakes.

• Streamed data from cloud (AWS, Azure) and on-premises sources using Spark.

• Hands-on experience with AWS (Amazon Web Services): Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.

• Expertise in developing and tuning Spark applications using optimization techniques such as executor tuning, memory management, garbage collection, and serialization, assuring optimal application performance by following industry best practices (a configuration sketch follows this summary list).

• Worked with various file formats such as CSV, JSON, XML, ORC, Avro, and Parquet.

• Worked with various compression techniques like BZIP, GZIP, Snappy, and LZO.

• Expertise in writing DDL and DML scripts in SQL and HQL for analytics applications in RDBMS and Hive.

• Expertise in Hive optimization techniques such as partitioning, bucketing, vectorization, map-side joins, bucket-map joins, skew joins, and indexing.

• Developed, deployed, and supported several MapReduce applications in Java to handle different types of data.

• Expertise in developing streaming applications in Scala using Kafka and Spark Structured Streaming.

• Experience in importing and exporting data between HDFS and RDBMS systems like Teradata (Sales Data Warehouse) and SQL Server, and non-relational systems like HBase, using Sqoop with efficient column mappings while maintaining data uniformity.

• Experience in working with Flume and NiFi for loading log files into Hadoop.

• Experience in working with NoSQL databases like HBase and Cassandra.

• Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.

• Good Experience in implementing and orchestrating data pipelines using Oozie and Airflow.

• Good working knowledge of AWS cloud services like EMR, S3, Redshift, Lambda, Glue, Data Pipeline, and Athena for big data development.

• Experience in working with various build and automation tools like Maven, SBT, Git, SVN, and Jenkins.

• Experience in understanding specifications for data warehouse ETL processes and interacting with designers and end users on informational requirements.

• Worked with Cloudera and Hortonworks distributions.

• Experienced in performing code reviews and closely involved in smoke testing and retrospective sessions.

• Experienced in Microsoft Business Intelligence tools, developing SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services), building Key Performance Indicators and OLAP cubes.

• Good exposure to star and snowflake schemas and data modelling, with work on several data warehouse projects.

• Good exposure to Python programming.

• Hands-on experience with the reporting tool Tableau, creating attractive dashboards and worksheets.

• Strong experience in the design and development of relational databases with multiple RDBMS platforms including Oracle 10g, MySQL, MS SQL Server, and PL/SQL.

• Involved in all phases of the Software Development Life Cycle (SDLC) in large scale enterprise software using Object-Oriented Analysis and Design.
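
Below is a minimal, illustrative PySpark sketch of the executor, memory, and serialization tuning referenced in the summary above. The application name and every numeric value are placeholder assumptions, not settings taken from any of the projects described in this resume.

# Minimal tuning sketch: the app name and all numeric values are illustrative
# placeholders, not settings from the projects described in this resume.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuning-sketch")                              # hypothetical application name
    .config("spark.executor.instances", "10")              # executor count sized to the cluster
    .config("spark.executor.memory", "8g")                 # heap per executor
    .config("spark.executor.cores", "4")                   # cores per executor
    .config("spark.memory.fraction", "0.6")                # share of heap for execution/storage
    .config("spark.serializer",
            "org.apache.spark.serializer.KryoSerializer")  # faster than default Java serialization
    .getOrCreate()
)

In practice, values like these are adjusted iteratively while comparing stage times and shuffle/GC behavior in the Spark UI.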

Tools and Technologies:

Hadoop/Big Data Technologies: HDFS, Apache NiFi, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Ambari, Storm, Spark, and Kafka

NoSQL Databases: HBase, Cassandra, MongoDB

Monitoring and Reporting: Tableau, Custom Shell Scripts

Hadoop Distributions: Hortonworks, Cloudera, MapR

Build and Deployment Tools: Maven, SBT, Git, SVN, Jenkins

Programming and Scripting: Java, SQL, JavaScript, Shell Scripting, Python, Pig Latin, HiveQL

Java Technologies: J2EE, JavaMail API, JDBC

Databases: Oracle, MySQL, MS SQL Server, Vertica, Teradata

Analytics Tools: Tableau, Microsoft SSIS, SSAS, and SSRS

Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript

IDE and Dev. Tools: Eclipse 3.5, NetBeans, MyEclipse, Oracle JDeveloper 10.1.3, Ant, Maven, RAD

Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003

AWS Services: EC2, EMR, S3, Redshift, Lambda, Glue, Data Pipeline, Athena

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Professional Experience:

Client: AT&T, Warrenville, IL October 2019 – Present

Role: Hadoop Spark Developer

Responsibilities:

• Developed Spark applications to implement various aggregation and transformation functions using Spark RDDs and Spark SQL.

• Worked on DB2 SQL connections from Spark Scala code to select, insert, and update data in the database.

• Used broadcast joins in Spark to join smaller datasets to large datasets without shuffling data across nodes (a broadcast-join sketch appears after this responsibilities list).

• Involved in complete end to end application design, development, and deployment which includes analysis, data ingestion from multiple sources, processing, and persisting.

• Developed Sqoop jobs to ingest data from RDBMS Systems like Teradata and Oracle database into HDFS data lakes and S3 buckets.

• Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.

• Worked on NiFi data Pipeline to process a large set of data and configured lookups for Data Validation.

• Developed DDL and DML scripts to create external tables and analyze intermediate data for analytics applications in Hive.

• Created Spark Streaming jobs using Python to read messages from Kafka and download JSON files from AWS S3 buckets (a streaming-read sketch also appears after this list).

• Used Spark Streaming with Python to receive real-time data from Kafka and store the stream data to HDFS and NoSQL databases such as HBase and Cassandra.

• Implemented Spark scripts using SparkSession, Python, and Spark SQL to access Hive tables in Spark for faster data processing.

• Developed data processing applications in Scala using Spark RDDs as well as DataFrames with the Spark SQL API.

• Worked with the SparkSession object, Spark SQL, and DataFrames for faster execution of Hive queries.

• Imported data from sources like SQL Server into Spark RDDs and developed a data pipeline using Kafka and Spark to store data in HDFS.

• Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data with Spark SQL.

• Created Hive, Phoenix, HBase tables and HBase integrated Hive tables as per the design using ORC file format and Snappy compression.

• Involved in tuning and optimizing the long-running Hive queries using Hive Joins, vectorizations, Partitioning, Bucketing, and Indexing.

• Involved in tuning PySpark applications using various memory and resource allocation parameters, setting the right batch interval time, and varying the number of executors to meet increasing load over time.

• Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.

• Developed Spark scripts to import large files from Amazon S3 buckets into the HDFS cluster.

• Involved in designing and developing tables in HBase and storing aggregated data from Hive Tables.

• Integrated Hive with Tableau Desktop reports and published them to Tableau Server.

• Developed shell scripts to run Hive scripts.

• Deployed Big data applications on the EMR cluster on AWS.

• Developed ETL applications and executed them using AWS Glue.
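
As referenced in the broadcast-join bullet above, here is a minimal PySpark sketch of the technique: the small dimension table is shipped to every executor so the large table can be joined without a shuffle. The table names and join key are hypothetical stand-ins, not objects from this project.

# Hedged broadcast-join sketch; table names and the join key are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-sketch").enableHiveSupport().getOrCreate()

large_df = spark.table("calls_fact")      # large dataset, stays partitioned in place
small_df = spark.table("account_dim")     # small lookup table, broadcast to every executor

# broadcast() hints Spark to replicate the small side, avoiding a shuffle of large_df.
joined = large_df.join(broadcast(small_df), on="account_id", how="left")
joined.write.mode("overwrite").saveAsTable("calls_enriched")

The broadcast hint is only appropriate when the small side comfortably fits in executor memory; otherwise Spark's default shuffle join is the safer choice.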
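
The streaming bullets above mention reading Kafka messages from Python and landing the stream on HDFS. The sketch below shows the same idea using the Structured Streaming API rather than the DStream-based Spark Streaming named in the resume; the broker address, topic, and output paths are placeholder assumptions.

# Hedged Kafka-to-HDFS sketch using Structured Streaming; broker, topic, and
# paths are placeholders, and the resume's original jobs used Spark Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker list
    .option("subscribe", "events")                       # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string before writing.
messages = raw.select(col("value").cast("string").alias("payload"))

query = (
    messages.writeStream
    .format("parquet")                                   # columnar files on HDFS
    .option("path", "hdfs:///data/events/")              # placeholder output directory
    .option("checkpointLocation", "hdfs:///checkpoints/events/")
    .start()
)
query.awaitTermination()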

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, NiFi, Airflow, EC2, EMR, S3, Redshift, Lambda, Glue, Data Pipeline, Athena, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera

Client: Aetna, Hartford, CT Jan 2018 – Aug 2019

Role: Big Data Developer

Responsibilities:

• Built NiFi flows for data ingestion; ingested data from Kafka, microservices, and CSV files on edge nodes using NiFi flows.

• Developed Spark programs using Scala to compare the performance of Spark with Hive and SparkSQL.

• Developed a Spark Streaming application to consume JSON messages from Kafka and perform transformations.

• Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.

• Implemented Spark using Scala and SparkSql for faster testing and processing of data.

• Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.

• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs with Scala.

• Worked on creating Keyspace in Cassandra for saving the Spark Batch output.

• Worked on a Spark application to compact the small files in the Hive warehouse so that file sizes approach the HDFS block size (a compaction sketch follows this list).

• Managed the migration of on-prem servers to AWS by creating golden images for upload and deployment.

• Managed multiple AWS accounts with multiple VPCs for both production and non-production, where the primary objectives were automation, build-out, integration, and cost control.

• Implemented real-time streaming ingestion using Kafka and Spark Streaming.

• Loaded data using Spark Streaming with Python.

• Involved in the requirement and design phase to implement Streaming Lambda Architecture to use real-time streaming using Spark and Kafka.

• Experience in loading the data into Spark RDD and performing in-memory data computation to process valid and invalid data.

• Developed Sqoop jobs to migrate data from SQL server to HDFS and vice versa.

• Worked on Performance Enhancement in Pig, Hive, and HBase on multiple nodes.

• Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

• Created DDL and DML scripts to create Hive tables, load the processed data, and perform various analytical operations using HiveQL.

• Developed MapReduce applications using Hadoop MapReduce and Yarn.

• Orchestrated jobs and data pipelines using Oozie Scheduler.

• Developed ETL jobs following organization- and project-defined standards and processes.
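
The compaction bullet above describes merging small Hive files up toward the HDFS block size. The resume work was done around Scala and Spark; the following is a hedged PySpark sketch of the same idea, where the table names, output format, and fixed partition count are assumptions. In practice the partition count would be derived from total input size divided by the block size.

# Hedged small-file compaction sketch; table names, format, and the fixed
# partition count are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compaction-sketch").enableHiveSupport().getOrCreate()

df = spark.table("staging.events_small_files")   # Hive table fragmented into many small files

# A real job would compute ceil(total_input_bytes / 128 MB) instead of hard-coding 32.
compacted = df.coalesce(32)

(compacted.write
    .mode("overwrite")
    .format("orc")                               # ORC keeps the compacted copy splittable
    .saveAsTable("staging.events_compacted"))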

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, NiFi, EC2, EMR, S3, Redshift, Lambda, Glue, Data Pipeline, Athena, Eclipse, UNIX Shell Scripting, Cloudera.

Client: CAMELOT INTEGRATED, Houston, TX November 2016 to Dec 2017

Role: Hadoop/Spark Developer

Responsibilities:

• Involved in architecture design, development, and implementation of Hadoop deployment, backup, and recovery systems.

• Developed MapReduce programs in Python on Hadoop to parse the raw data, populate staging tables, and store the refined data in partitioned Hive tables (a mapper/reducer sketch follows this list).

• Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.

• Converted MapReduce applications to PySpark, which performed the business logic.

• Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

• Imported Teradata datasets onto the Hive platform using Teradata JDBC connectors.

• Wrote FastLoad and MultiLoad scripts to load the tables.

• Involved in the Complete Software development life cycle (SDLC) to develop the application.

• Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and Map Reduce.

• Worked with the Data Science team to gather requirements for various data mining projects.

• Worked with different source data file formats like JSON, CSV, and TSV, etc.

• Experience in importing data from various sources like MySQL and Netezza using Sqoop and SFTP, performing transformations using Hive and Pig, and loading the data back into HDFS.

• Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.

• Imported and exported data between environments like MySQL and HDFS and deployed to production.

• Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.

• Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.

• Worked on tuning long-running Hive queries using various optimization techniques.

• Wrote multiple unit test cases for the MapReduce applications using JUnit.

• Involved in developing Impala scripts for ad-hoc queries.

• Involved in importing and exporting data from HBase using Spark.
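
The Python MapReduce bullet above refers to parsing raw records into staging data. The following is a minimal Hadoop Streaming style mapper/reducer sketch of that pattern; the tab-delimited layout, the position of the counted field, and the map/reduce command-line switch are assumptions for illustration only.

#!/usr/bin/env python
# Hedged Hadoop Streaming sketch: the tab-delimited layout, the counted field's
# position, and the command-line switch are illustrative assumptions.
import sys

def mapper():
    """Emit (status, 1) for every well-formed record read from stdin."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3:                      # skip malformed rows
            print("%s\t1" % fields[3])           # fields[3] assumed to hold a status code

def reducer():
    """Sum the counts per key; Hadoop delivers reducer input sorted by key."""
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    # Invoked by Hadoop Streaming as two separate commands, e.g. "script.py map"
    # for the mapper and "script.py reduce" for the reducer.
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        mapper()
    else:
        reducer()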

Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza, and UNIX Shell Scripting.

Client: Astellas Pharma, India Aug 2014 – Sep 2016

Role: Java Developer

Responsibilities:

• Used Microsoft Visio and Rational Rose to design the Use Case diagrams, Class models, Sequence diagrams, and Activity diagrams for the SDLC process of the application.

• Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.

• Configured the project on WebSphere 6.1 application servers.

• Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, and WSDL.

• Used the Log4J logging framework to write Log messages with various levels.

• Implemented Struts for the controller logic.

• Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.

• Wrote, configured, and maintained Hibernate configuration files and wrote and updated Hibernate mapping files for each Java object to be persisted.

• Wrote Hibernate Query Language (HQL) queries and tuned them for better performance.

• Updated the Struts configuration, validation, and Tiles XML files.

• Implemented client-side validation using JavaScript.

• Implemented the application based on MVC architecture.

Environment: Java 1.6, JSP, Struts 1.x, Spring 3.2, Hibernate 4.6, Eclipse, and Oracle 10g

Client: Micronet Technicks, India Jun 2013 to Jul 2014

Role: Java Developer

Responsibilities:

• Designed and developed user interfaces using HTML, JSP, and Struts tags.

• Experienced in developing applications using all Java/J2EE technologies like Servlets, JSP, EJB, JDBC, etc.

• Validated the views using the validator plug-in in the Struts framework.

• Wrote test cases using JUnit, following test-first development.

• Wrote build files using ANT and used Maven in conjunction with ANT to manage build files.

• Used Hibernate for data persistence and interaction with the database.

• Involved in developing the Struts Action Classes.

• Developed test cases for unit testing and sanity testing.

Environment: Oracle, JDK, Struts, Hibernate, Tomcat, Windows 2000


