
Data Engineer Hadoop Developer

Location:
Herndon, VA
Posted:
March 24, 2023


Resume:

Name: Hari Priya

Data Engineer

Email: **************@*****.***

PROFESSIONAL SUMMARY

●Over 7 years of experience in software development, including 5 years with a strong emphasis on Data Engineering and Data Analytics using large-scale datasets.

●Strong experience in end-to-end data engineering including data ingestion, data cleansing, data transformations, data validations/auditing and feature engineering.

●Strong experience in programming languages like Java, Scala, and Python.

●Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.

●Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.

●Good understanding of Distributed Systems architecture and design principles behind Parallel Computing.

●Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs.

●Strong experience troubleshooting failures in Spark applications and fine-tuning Spark applications and Hive queries for better performance.

●Worked extensively on building real-time data pipelines using Kafka for streaming data ingestion and Spark Streaming for real-time consumption and processing.

●Worked extensively on Hive for building complex data analytical applications.

●Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.

●Experience in managing the Hadoop infrastructure with Cloudera Manager.

●Good exposure to NoSQL databases, including the column-oriented HBase and Cassandra and the document-based MongoDB.

●Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.

●Good experience working with Apache NiFi for building data flows from multiple sources such as FTP, REST APIs, etc.

●Good experience working with AWS cloud services such as S3, EMR, Lambda, Redshift, Athena, Glue, etc.

●Solid experience working with CSV, text, Avro, Parquet, ORC, and JSON data formats.

●Working experience with core Java, including efficient use of the Collections framework, multithreading, I/O, JDBC, and localization, and the ability to develop new APIs for different projects.

●Experience in building, deploying, and integrating applications on application servers with Ant, Maven, and Gradle.

●Experience in using IDE tools such as Visual Studio, NetBeans, and Eclipse, and application servers such as WebSphere, WebLogic, and Tomcat.

●Expertise in all phases of the System Development Life Cycle (SDLC), Agile software development, Scrum methodology, and Test-Driven Development.

●Used the Tomcat server for application development and utilized JIRA for task scheduling.

●Experience in using Version Control tools like Git, SVN.

●Experience in web application design using open-source MVC frameworks, Spring, and Spring Boot.

●Adequate knowledge and working experience in Agile and Waterfall Methodologies.

●Defined user stories and drove the agile board in JIRA during project execution; participated in sprint demos and retrospectives.

●Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adopt new technologies with ease, and a good team player.

SKILL SET:

Hadoop Components: HDFS, Hue, MapReduce, Hive, Sqoop, Impala, Zookeeper, Flume

Spark Components: Spark RDD, DataFrames, Spark SQL, PySpark, Spark Streaming, Spark ML

Databases: Oracle, Teradata, Microsoft SQL Server, MySQL

Programming Languages: Java, Python, Scala

Web Servers: Windows Server 2005/2008/2012 and Apache Tomcat

Cloud Services: AWS S3, EMR, Redshift, Glue, Lambda, Step Functions, Athena; GCP Dataproc, BigQuery

IDEs: Eclipse, IntelliJ IDEA, PyCharm

NoSQL Databases: HBase, Cassandra, MongoDB

Release Management Tools: Jenkins, Maven, SBT, GitHub, Jira

Development Methodologies: Agile/Scrum

PROFESSIONAL EXPERIENCE:

Client: Kaiser Permanente, San Francisco, CA    Sep 2021 – Present

Data Engineer

Responsibilities:

●Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation using PySpark.

●Wrote Scala programs using Spark/Spark SQL to perform aggregations (a hedged sketch of this pattern appears at the end of this section).

●Developed MapReduce programs running on YARN using Java to perform various ETL, cleaning, and scrubbing tasks.

●Worked on different data formats such as Parquet, Avro, SequenceFile, MapFile, and XML.

●Ingested data from various data sources into Hadoop HDFS/Hive tables and managed data pipelines providing DaaS (Data as a Service) to business/data scientists for performing analytics.

●Worked with technology and business user groups for Hadoop migration strategy.

●Installed & configured multi-node Hadoop Cluster and performed troubleshooting and monitoring of Hadoop Cluster.

●Worked on real-time data processing using Spark/Storm and Kafka using Scala.

●Wrote Spark Jobs using Scala for analyzing data.

●Developed web services in the Play framework using Scala as part of building a streaming data platform.

●Used Cassandra to store billions of records to enable faster & efficient querying, aggregates & reporting.

●Wrote CQL queries to retrieve data from Cassandra.

●Used Datameer for integration with Hadoop and other sources such as RDBMS (Oracle), SAS, Teradata, and flat files.

●Used Sqoop to move data from DB2 and Oracle into Hadoop, increasing the data retention period from 1 year to 5 years.

●Wrote Hive and Pig Scripts to analyze customer satisfaction index, sales patterns etc.

●Extended Hive and Pig core functionality by writing custom UDFs using Java.

●Orchestrated Sqoop scripts, Pig scripts, Hive queries using Oozie workflows.

●Worked on Data Lake architecture to build a reliable, scalable, analytics platform to meet batch, interactive and on-line analytics requirements.

●Integrated Tableau with Hadoop data source for building dashboard to provide various insights on sales of the organization.

●Loaded all datasets from source CSV files into Hive and Cassandra using Spark/PySpark.

●Worked on AWS EC2, S3 & EMR for bursting requirements.

●Worked on setting up Apache NiFi and performed a POC using NiFi for orchestrating data flows.

●Performed various POCs in data ingestion, data analysis, and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, and Elasticsearch.

●Worked on Spark in building BI reports using Tableau. Tableau was integrated with Spark using Spark-SQL.

●Worked on writing cluster automation using Chef & Docker.

Environment: Scala, Hadoop, MapReduce, Spark, YARN, Hive, Pig, NiFi, PySpark, Kafka, Hortonworks, Cloudera, Sqoop, Flume, Elasticsearch, Cloudera Manager, Java, J2EE, Web services, Hibernate, Struts, JSP, JDBC, XML, WebLogic Workshop, Jenkins, Maven.
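
Below is a minimal, hedged sketch of the Spark/Spark SQL aggregation pattern referenced above; the table, column, and path names (analytics.claims, member_id, claim_id, paid_amount, /data/curated/...) are hypothetical placeholders, not the project's actual objects.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, countDistinct, sum}

    object ClaimsAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("claims-aggregation")
          .enableHiveSupport()          // read/write Hive tables managed on the cluster
          .getOrCreate()

        // Read a Hive table, aggregate per member, and write the result back as Parquet
        val claims = spark.table("analytics.claims")
        val perMember = claims
          .groupBy(col("member_id"))
          .agg(
            sum(col("paid_amount")).as("total_paid"),
            countDistinct(col("claim_id")).as("claim_count")
          )

        perMember.write.mode("overwrite").parquet("/data/curated/claims_per_member")
        spark.stop()
      }
    }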

Client: AMEX, Phoenix, AZ Nov 2020 – Aug 2021

Senior Big Data Developer

Responsibilities:

●Ingested gigabytes of click-stream data daily from external sources such as FTP servers and S3 buckets using customized, home-grown input adapters.

●Created Sqoop scripts to import/export data from RDBMS to S3 data store.

●Developed various Spark applications using Scala to perform cleansing, transformation, and enrichment of the click-stream data.

●Involved in data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting.

●Troubleshot Spark applications for improved error tolerance and reliability.

●Fine-tuned Spark applications/jobs to improve efficiency and overall processing time for the pipelines.

●Created a Kafka producer API to send live-stream JSON data into various Kafka topics.

●Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into HBase (a hedged sketch of the consumer side appears at the end of this section).

●Utilized Spark's in-memory capabilities to handle large datasets.

●Used broadcast variables in Spark, effective and efficient joins, transformations, and other capabilities for data processing.

●Experienced in working with EMR clusters and S3 in the AWS cloud.

●Created Hive tables and loaded and analyzed data using Hive scripts; implemented partitioning, dynamic partitions, and bucketing in Hive.

●Involved in continuous integration of the application using Jenkins.

●Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.

●Followed Agile Methodologies while working on the project.

Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, Java.
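
A minimal sketch of the Kafka-to-Spark streaming consumption described above, written here with Structured Streaming; the broker address, topic name, and JSON schema are assumptions, the spark-sql-kafka connector dependency is assumed on the classpath, and the sketch writes to the console instead of the HBase sink used on the project.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    object ClickStreamConsumer {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("clickstream-consumer").getOrCreate()

        // Assumed shape of the JSON events published by the producer
        val eventSchema = new StructType()
          .add("user_id", StringType)
          .add("page", StringType)
          .add("event_time", TimestampType)

        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   // assumed broker
          .option("subscribe", "clickstream-events")            // assumed topic
          .load()

        // Kafka delivers bytes; cast the value to string and parse the JSON payload
        val events = raw
          .select(from_json(col("value").cast("string"), eventSchema).as("event"))
          .select("event.*")

        events.writeStream
          .format("console")
          .option("truncate", "false")
          .start()
          .awaitTermination()
      }
    }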

Nordstrom, Seattle, WA Jan 2019 – Oct 2020

Big Data/Hadoop Developer

Responsibilities:

●Involved in creating data ingestion pipelines for collecting healthcare and provider data from various external sources such as FTP servers and S3 buckets.

●Involved in migrating the existing Teradata data warehouse to AWS S3-based data lakes.

●Involved in migrating existing traditional ETL jobs to Spark and Hive jobs on the new cloud data lake.

●Wrote complex Spark applications for performing various de-normalizations of the datasets and creating a unified data analytics layer for downstream teams.

●Primarily responsible for fine-tuning long-running Spark applications, writing custom Spark UDFs, troubleshooting failures, etc.

●Involved in building a real-time pipeline using Kafka and Spark Streaming for delivering event messages from an external REST-based application to the downstream application team.

●Involved in creating Hive scripts for performing ad-hoc data analysis required by the business teams.

●Worked extensively on migrating on-prem workloads to AWS Cloud.

●Worked on utilizing AWS cloud services such as S3, EMR, Redshift, Athena, and the Glue metastore.

●Used broadcast variables in Spark, effective and efficient joins, caching, and other capabilities for data processing (a hedged broadcast-join sketch appears at the end of this section).

●Involved in continuous integration of the application using Jenkins.

Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce.
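
A hedged sketch of the broadcast-join and caching pattern referenced above; the dataset names, join column, and S3 paths are illustrative only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object OrderEnrichment {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("order-enrichment").getOrCreate()

        val orders = spark.read.parquet("s3://datalake/curated/orders")     // large fact data
        val stores = spark.read.parquet("s3://datalake/reference/stores")   // small dimension

        // Broadcasting the small dimension avoids shuffling the large side
        val enriched = orders.join(broadcast(stores), Seq("store_id"), "left")

        // Cache when the enriched dataset feeds several downstream aggregations
        enriched.cache()

        enriched.groupBy("region").count().show()
        enriched.write.mode("overwrite").parquet("s3://datalake/analytics/orders_enriched")

        spark.stop()
      }
    }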

Client: Qualcomm, India    Feb 2017 – Oct 2018

Role: Java/J2EE Developer

Responsibilities:

●Developed JMS API using J2EE package.

●Made use of JavaScript for client-side validation.

●Used Struts Framework for implementing the MVC Architecture.

●Wrote various Struts action classes to implement the business logic.

●Involved in the design of the project using UML Use Case Diagrams, Sequence Diagrams, Object diagrams, and Class Diagrams.

●Understood concepts related to, and wrote code for, advanced topics such as Java I/O, serialization, and multithreading.

●Used Display Tags in the presentation layer for a better look and feel of the web pages.

●Developed Packages to validate data from Flat Files and insert into various tables in Oracle Database.

●Provided UNIX scripting to drive automatic generation of static web pages with dynamic news content.

●Participated in requirements analysis to figure out various inputs correlated with their scenarios in Asset Liability Management (ALM).

●Assisted design and development teams in identifying DB objects and their associated fields in creating forms for ALM modules.

●Also involved in developing PL/SQL Procedures, Functions, Triggers and Packages to provide backend security and data consistency.

●Responsible for performing Code Reviewing and Debugging.

Environment: Java, J2EE, UML, Struts, HTML, XML, CSS, Java Script, Oracle, SQL*Plus, PL/SQL, MS Access, UNIX Shell Scripting.

Alchemy Solutions – India Jan 2016 – Feb 2017

Java Developer

Responsibilities:

●Actively participated in requirements gathering, analysis, design, and testing phases.

●Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.

●Developed the entire application implementing MVC Architecture integrating JSF with Hibernate and Spring frameworks.

●Created and implemented stored procedures, functions, triggers, using SQL.

●Setting up client-side validations using JavaScript.

●Developed the Enterprise Java Beans (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.

●Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.

●Developed Web Services for data transfer from client to server and vice versa using Apache Axis and SOAP.

●Implemented various J2EE Design patterns like Singleton, Service Locator, DAO, and SOA.

●Worked on AJAX to develop an interactive Web Application and JavaScript for Data Validations.

Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, Web services, SOAP, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, WebSphere, Eclipse.


