
Kumar

Hadoop Developer

acvjar@r.postjobfree.com

+1-361-***-****

Professional Summary:

Two and a half years of end-to-end experience in Big Data implementation, with strong experience in major components of the Hadoop ecosystem including Hadoop MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Kafka, Sqoop, and Spark.

Experience in installing, configuring, supporting, and managing Hadoop clusters using Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).

In-depth understanding of Hadoop architecture and its components, including HDFS, MapReduce, Hadoop Gen2 Federation, High Availability, and YARN, along with a good understanding of workload management, schedulers, scalability, and distributed platform architectures.

Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.

Knowledge of the Java and Scala programming languages and of Linux.

Good understanding of NoSQL databases, with hands-on experience writing applications on NoSQL databases such as Cassandra and HBase.

Strong experience and knowledge of real-time data analytics using Flume and Spark.

Experience working with Spark features such as RDD transformations, Spark MLlib, and Spark SQL (a brief illustrative sketch follows this summary).

Good working experience installing and maintaining Linux servers.

Experienced in working with Agile, Scrum, and Waterfall methodologies.
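
Illustrative sketch (Scala): a minimal example of the kind of RDD transformation and Spark SQL work mentioned above; the input path, field layout, and view name are hypothetical and not taken from any specific project.

    import org.apache.spark.sql.SparkSession

    object LogAnalysisSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("LogAnalysisSketch").getOrCreate()
        import spark.implicits._

        // RDD transformations: drop malformed lines, split into typed fields
        val events = spark.sparkContext
          .textFile("hdfs:///data/raw/events")        // assumed input path
          .filter(_.split(",").length == 3)           // keep only well-formed rows
          .map { line =>
            val Array(user, page, ts) = line.split(",")
            (user, page, ts.toLong)
          }

        // Spark SQL over the same data through a temporary view
        events.toDF("user", "page", "ts").createOrReplaceTempView("events")
        spark.sql("SELECT page, COUNT(*) AS hits FROM events GROUP BY page").show()

        spark.stop()
      }
    }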

Educational Qualifications:

Master’s in Computer Science, Aug 2015

Texas A&M University–Kingsville, GPA: 3.0/4.0

B.Tech in Electronics & Communications Engineering, May 2013

Jawaharlal Nehru Technological University, India

GPA: 3.2/4.0

Technical Skills:

Programming Languages: Core Java, Scala, Shell-Scripting.

Operating Systems: Windows, Red Hat Linux, CentOS, Ubuntu, Mac OS X.

Application Software: Microsoft Office, Adobe Photoshop, Microsoft Visio.

Web Technologies: HTML, CSS, JavaScript, jQuery, Express (Node.js framework).

Databases: SQLite, MySQL, Oracle DB, Teradata.

NoSQL Databases: Cassandra.

Hadoop Ecosystem: Spark, Hive, Pig, Sqoop, MapReduce, NiFi, Oozie.

IDE: Eclipse, Microsoft Visual Studio, NetBeans, IntelliJ.

Work Experience:

Hadoop Developer

Payless Shoe Source – Topeka, Kansas.

September 2015 to May 2016

Responsibilities:

Collaborated with business analysts on requirements gathering, business analysis, and design of the enterprise data warehouse.

Pre-processed ingested data using Apache Pig to eliminate bad records per business requirements, using filter functions and User Defined Functions (UDFs).

Involved in ETL design and development for extracting data from heterogeneous source systems such as DB2, UDB, MS SQL Server, Oracle, flat files, and XML files, and loading it into Hadoop.

Documented ETL test plans, test cases, test scripts, test procedures, assumptions, and validations based on design specifications, covering unit testing, system testing, expected results, test data preparation and loading, error handling, and analysis.

Performed reporting and analysis of data using Apache Zeppelin.

Involved in unit testing and user acceptance testing to verify that data extracted from the various source systems loaded into the target according to user requirements.

Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log data.

Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.

Used different compression techniques such as LZO and Snappy to save storage and optimize data transfer over the network.

Managed nodes on the Hadoop cluster and monitored cluster job performance using Cloudera Manager.

Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.

Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.

Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, DataFrames, and GraphX.

Developed predictive analytics using the Apache Spark Scala API (an illustrative sketch follows the environment list below).

Wrote, executed, and performance-tuned SQL queries for data analysis and profiling.

Involved in Agile methodologies, daily Scrum meetings, and sprint planning.

Environment: Hadoop, Teradata, Cassandra, Spark Streaming, Spark MLlib, Spark GraphX, Spark SQL, Spark DataFrames, Pig, Hive, Sqoop, Scala
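
Illustrative sketch (Scala): a minimal Spark MLlib pipeline of the kind referred to under predictive analytics above; the training file, column names, and choice of logistic regression are assumptions made only for illustration.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object ChurnModelSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ChurnModelSketch").getOrCreate()

        // Assumed training data: numeric feature columns plus a 0/1 label column
        val training = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/churn/training.csv")     // hypothetical path

        // Assemble the raw columns into the single feature vector MLlib expects
        val assembler = new VectorAssembler()
          .setInputCols(Array("recency", "frequency", "monetary"))  // assumed columns
          .setOutputCol("features")

        val lr = new LogisticRegression()
          .setLabelCol("label")
          .setFeaturesCol("features")

        // Fit the two-stage pipeline and score the training data as a smoke test
        val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
        model.transform(training).select("label", "prediction").show()

        spark.stop()
      }
    }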

Hadoop Developer

Souls Tech - India

August 2013 to November 2014

The Consumer Database (CDB) is a mainframe system that stores information about daily consumer registrations and policy updates. The scope of this project was to migrate the CDB application to the Big Data ecosystem.

Responsibilities:

Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.

Wrote MapReduce jobs in core Java to parse web logs stored in HDFS.

Wrote JUnit test cases to test and debug MapReduce programs on the local machine.

Involved in loading data from the UNIX file system to HDFS using shell scripting.

Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.

Transferred data from RDBMS to HDFS using Sqoop.

Worked on developing ETL processes to load data from multiple sources into HDFS using Sqoop, perform structural modifications with MapReduce, analyze the data with Hive, and visualize it in dashboards.

Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.

Experienced in converting ETL operations to Hadoop using Pig Latin operations, transformations, and functions.

Ran Hadoop Streaming jobs to process terabytes of formatted data using Python scripts.

Used Flume to collect and aggregate web log data from different sources such as web servers, and pushed it to HDFS.

Implemented partitioning, dynamic partitions, and bucketing in Hive (an illustrative sketch follows the environment list below).

Integrated Hadoop Security with Active Directory by implementing Kerberos for authentication and Sentry for authorization.

Successfully loaded files to Hive and HDFS from HBase.

Implemented the workflows using Apache Oozie framework to automate tasks.

Performance tuning of Hive Queries.

Coordinated the team and resolved its issues both technically and functionally.

Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, CDH, Kafka, Spark, Storm, Apache Crunch, Python, Maven, Linux.
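
Illustrative sketch (Scala): a minimal example of the Hive dynamic-partition loading mentioned above, run here through Spark's Hive support; the table names, columns, and staging source are hypothetical. Bucketing would add a CLUSTERED BY ... INTO n BUCKETS clause to the table DDL, typically issued directly in Hive.

    import org.apache.spark.sql.SparkSession

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitionSketch")
          .enableHiveSupport()                        // talk to the Hive metastore
          .getOrCreate()

        // Partitioned target table (hypothetical schema)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales (store_id INT, amount DOUBLE)
            |PARTITIONED BY (sale_date STRING) STORED AS ORC""".stripMargin)

        // Allow fully dynamic partition values to come from the data itself
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Dynamic-partition insert from an assumed staging table
        spark.sql(
          """INSERT OVERWRITE TABLE sales PARTITION (sale_date)
            |SELECT store_id, amount, sale_date FROM staging_sales""".stripMargin)

        spark.stop()
      }
    }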

Academic Projects:

Graduate Projects:

Title: Hadoop on Cloud Computing.

Tools: CentOS, Cloudera, Hortonworks, AWS.

Description: With the growing popularity of cloud computing, enterprises are seriously looking at moving workloads to the cloud, though issues remain around multi-tenancy, data security, software licensing, and data integration. In recent years, Hadoop has gained a lot of interest as a big data technology that can help enterprises cost-effectively store and analyze massive amounts of data. This project evaluated running Hadoop in the cloud using Cloudera, Hortonworks, and AWS.

Title: Database Encryption.

Description: Database encryption is the process of converting plain-text data within a database into meaningless ciphertext by means of a suitable algorithm, thereby protecting the stored data. It is used to encrypt sensitive data such as credit card numbers, medical records, and personal information, and it can also limit a database administrator's ability to copy or view that information.

Undergraduate Projects:

Title: Application of MATLAB in a moving-object detection algorithm.

Description: Moving-object detection is a current research hotspot, widely used in fields such as computer vision and video processing. This study used MATLAB's image processing toolkit to perform moving-object detection on video images. First, video pre-processing steps such as frame separation, binarization, gray-level enhancement, and filtering were performed. Then moving objects were detected and extracted from the images using a frame-difference algorithm with dynamic background refreshing. Finally, the detected moving object was framed in the output video images.


