
Data Software Engineer

Location:
Cherry Hill, NJ
Posted:
December 21, 2017

SARANYA BALAMURUGAN

PROFESSIONAL SUMMARY:

*+ years of IT experience in the development and implementation of Java, Big Data Hadoop, and Spark applications.

* ***** ** ********* ********** in big data implementation on the Cloudera distribution.

Experience in the ingestion, storage, querying, processing, and analysis of big data.

Excellent hands-on experience with Apache Hadoop technologies such as Hive, Sqoop, Spark with Scala, Impala, and HBase.

Good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, MapReduce, and YARN.

Expertise in importing and exporting data with Sqoop between HDFS and relational database systems.

Experience in developing Hadoop integrations for data ingestion, data mapping, and data processing capabilities.

Expertise in writing shell scripts, HiveQL queries, and MapReduce programs.

Experience in ETL processing using Spark (Scala).

Hands-on experience with Spark SQL and Spark Streaming.

Hands-on experience with Flume.

Worked in an Agile environment.

Knowledge of Apache Kafka.

Worked with semi-structured data such as JSON and XML files.

Extensive experience in the development of web applications using Core Java, JDBC, SQL, Servlets, Hibernate, Tomcat, Eclipse IDE, and Oracle RDBMS.

Very good experience in the complete project life cycle (design, development, testing, and implementation) of web applications.

Hands-on experience with VPN and PuTTY.

Exceptional ability to learn new concepts.

EDUCATION QUALIFICATION:

Bachelor of Technology (Information Technology) from Anna University, India, 2009.

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Hive, Sqoop, Flume, HBase, Impala, Kafka

NoSQL: HBase

Databases: MySQL, Oracle 10g

Languages: Scala, Java, C/C++, SQL, HiveQL

Operating Systems: Windows XP/10, Linux

IDEs & Utilities: Eclipse, Hue

PROFESSIONAL EXPERIENCE:

Client: Deutsche Bank, NYC, NY Nov 2016 – Present

Project: Data Migration and Integration.

Role: Hadoop/Spark Developer

Description: Deutsche Bank is a German global banking and financial services company, with its headquarters in the Deutsche Bank Twin Towers in Frankfurt. It has more than 100,000 employees in over 70 countries, and has a large presence in Europe, the Americas, Asia-Pacific and the emerging markets. As of June 2017, Deutsche Bank is the 16th largest bank in the world by total assets.

Trading data from source systems such as MySQL and Oracle, arriving in formats including text, JSON, and XML files, is migrated to the Hadoop environment using tools such as Sqoop, Flume, Spark, and Hive.
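As a sketch of the Spark leg of such a migration, a minimal Scala example follows; the JDBC URL, table names, and credentials are hypothetical, and the project itself ran on Spark 1.6, where HiveContext filled the role that SparkSession plays here.

  import org.apache.spark.sql.SparkSession

  object MySqlToHive {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("TradeDataMigration")
        .enableHiveSupport()
        .getOrCreate()

      // Pull a source table over JDBC (URL, table, and credentials are placeholders).
      val trades = spark.read.format("jdbc")
        .option("url", "jdbc:mysql://dbhost:3306/trading")
        .option("dbtable", "trades")
        .option("user", "etl_user")
        .option("password", sys.env("DB_PASSWORD"))
        .load()

      // Land the data in a Hive staging table for the downstream validation layers.
      trades.write.mode("overwrite").saveAsTable("staging.trades_raw")
    }
  }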

In the Hive data warehouse, various layers are maintained to support data validation and data analytics and to store the history of the data.

Responsibilities:

Involved in requirements gathering, design, development, and testing.

Involved in running MapReduce and Spark jobs over YARN.

Experience with Cloudera distributions (CDH5).

Developed Sqoop scripts for data transfer between MySQL, Oracle, and Hadoop.

Configured and used Flume for data ingestion from various data sources containing different file formats such as JSON and XML.

Involved in creating UNIX shell scripts to automate Sqoop imports from relational databases.

Involved in writing HDFS CLI commands.

Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
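A minimal sketch of this pattern, assuming an existing SparkContext sc and an illustrative comma-delimited input path:

  // Cache the parsed records so repeated aggregations run from memory.
  val events = sc.textFile("hdfs:///data/trades/part-*")
    .map(_.split(","))
    .cache()

  // Example in-memory computation: record counts per symbol (field 0).
  val countsBySymbol = events
    .map(fields => (fields(0), 1L))
    .reduceByKey(_ + _)

  countsBySymbol.take(10).foreach(println)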

Developed Spark code using Spark SQL for faster data processing.

Used Spark SQL DataFrames to convert JSON files into CSV format.
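In outline (paths are illustrative; on Spark 1.6 the CSV writer came from the separate spark-csv package rather than being built in):

  // Spark infers the schema from the JSON input.
  val quotes = spark.read.json("hdfs:///landing/quotes.json")

  // Write the same data back out as CSV with a header row.
  quotes.write.option("header", "true").csv("hdfs:///published/quotes_csv")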

Created external Hive tables to store Spark output.

Knowledge of the different layers of the data flow.

Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.

Experienced in handling large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
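For example, a broadcast join ships a small reference table to every executor so the large fact table never shuffles; trades and refData here are hypothetical DataFrames:

  import org.apache.spark.sql.functions.broadcast

  // Broadcasting the small dimension table avoids shuffling the large fact table.
  val enriched = trades.join(broadcast(refData), Seq("symbol"))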

Developed a predictive analytics product using Apache Spark and SQL/HiveQL.

Worked with file formats such as Avro, RCFile, ORC, and Parquet, and with compression techniques in Hive storage.
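For instance, Spark can persist a DataFrame straight into a compressed columnar layout for Hive; the table name and codec below are illustrative:

  // Columnar formats with compression cut storage and speed up Hive scans.
  df.write
    .format("orc")
    .option("compression", "snappy")
    .saveAsTable("curated.trades_orc")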

Hands on experience in creating and maintaining various layers in Hive data warehouse.

Created Hive managed tables, external tables, and views to store the processed data and integrated them with the reporting tool Tableau.
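A sketch of such an external table, issued through Spark's Hive support; the schema and location are hypothetical, and Tableau would read the result through its Hive connector:

  spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS reporting.daily_positions (
      symbol STRING,
      qty    BIGINT,
      as_of  DATE
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///warehouse/reporting/daily_positions'
  """)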

Worked on data validation and data cleansing using Hive queries.

Involved in writing Hive join queries to extract data per business requirements.

Involved in performance tuning for skewed data in Hive.
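One common remedy for skew is key salting. The tuning here was done in Hive (where SKEWED BY and hive.optimize.skewjoin serve a similar purpose); the following is only a Scala sketch of the idea for a pair RDD of (key, count):

  import scala.util.Random

  val saltBuckets = 16

  // Stage 1: spread each hot key across saltBuckets sub-keys so no single
  // partition receives the whole key.
  val salted = events.map { case (key, value) =>
    ((key, Random.nextInt(saltBuckets)), value)
  }
  val partials = salted.reduceByKey(_ + _)

  // Stage 2: drop the salt and combine the per-bucket partial sums.
  val totals = partials
    .map { case ((key, _), value) => (key, value) }
    .reduceByKey(_ + _)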

Worked on ad hoc requests, providing data based on user requirements.

Environment: Hadoop Cloudera distribution (CDH 5.1.0), Apache Hadoop 2.6.0, Hive 1.1.0, Sqoop 1.4.6, Flume 1.6.0, Spark 1.6.0, Hue 3.5.0, Linux, Core Java, Oozie, MySQL, Oracle, Windows

Client: Mallinckrodt Pharmaceuticals, India Jan 2014 – Feb 2016

Project: Veeva CRM Data Integration

Role: Hadoop Developer

Description: Source data from UBC and Veeva CRM are processed using Pig and stored in Hive tables. These data are then integrated into a reporting system for data analysis.

Responsibilities:

Wrote shell scripts to automate Sqoop commands that move data from relational databases into HDFS.

Configured and used Flume for data ingestion from various data sources containing different file formats such as JSON and XML.

Experience in loading and transforming large sets of structured and semi-structured data.

Wrote Apache Pig scripts to process XML data and parse it into a structured format.
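The project used Pig with Piggybank's XML loader; purely as an analogue, here is a Scala/Spark sketch of the same flattening, assuming one XML record per line and illustrative element names:

  import scala.xml.XML

  // Parse each XML record and project it into (id, name) pairs.
  val records = sc.textFile("hdfs:///landing/crm/*.xml")
    .map { line =>
      val doc = XML.loadString(line)
      ((doc \ "id").text, (doc \ "name").text)
    }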

Implemented best offer logic using Pig scripts.

Used Piggybank functions to process XML files.

Used Pig diagnostic operators to evaluate the step-by-step execution of Pig statements.

Used eval functions in Pig to perform arithmetic calculations on data based on business needs.

Wrote Hive join queries per requirements.

Worked with file formats such as Avro, RCFile, ORC, and Parquet, and with compression techniques in Hive storage.

Created managed tables and views to store processed data.

Experience in running MapReduce and Spark jobs over YARN.

Experience with Cloudera distributions (CDH5).

Involved in Code reviews.

Environment: Hadoop Cloudera distribution (CDH 5.1.0), Apache Hadoop 2.6.0, Hive 1.1.0, Sqoop 1.4.6, Flume 1.6.0, Pig 0.2.0, Hue 3.5.0, Linux, Core Java, Oozie, Windows

Client: Cisco Systems, India June 2012 – Dec 2013

Project: Cisco Performance Analytics

Role: Software Engineer (J2EE)

Description: Used SQL queries to maintain a database of Cisco customers. A Java application provided a user-friendly environment for analyzing Cisco network performance.

Responsibilities:

Designed the system with object-oriented methodology.

Participated in the whole SDLC, from the re-architecture stage to the maintenance stage, for products.

Gathered, analyzed and coded Business Requirements.

Extensively worked on SQL Queries, Stored procedures and Triggers.

Used Struts validation framework for validations.

Created database tables with indexes and views in Oracle.

Responsible for Analysis, Coding and Unit Testing and Support.

Environment: Java, J2EE, Struts, SQL, JDBC, Eclipse, Windows.

Client: NMHG, India March 2010 – May 2012

Project: NMHG UI Support and Maintenance.

Role: Software Engineer (J2EE)

Description: NMHG (NACCO Materials Handling Group) is one of the world's largest lift truck manufacturers.

Java application development and user interface maintenance were the main objectives of this project. Provided an effective user interface application so that customers could easily place purchase orders on the NMHG website.

Responsibilities:

Involved in Analysis, Design, Coding and Development of custom Interfaces.

Involved in the feasibility study of the project.

Gathered requirements from the client for designing the Web Pages.

Participated in designing the user interface for the application.

Involved in writing complex sub-queries and used Oracle for generating on-screen reports.

Worked on database interaction layer for insertions, updating and retrieval operations on data.

Involved in deploying the application in test environment using Tomcat.

Environment: Java, J2EE, JSP, Servlets, EJB, JavaBeans, JDBC, Oracle, Eclipse, Windows


