Anand Kumar Pandey
Professional Summary:
[…] years of professional experience in IT, including Big Data/Hadoop development, ETL, SQL, and Java application development on data-driven products.
Strong understanding of the Hadoop ecosystem, including Hadoop clusters, HDFS, MapReduce, the YARN framework, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase, Zookeeper, and Hadoop Streaming.
Working knowledge of cleansing and analyzing data using Hive on the Hadoop platform, as well as on relational databases such as Oracle, SQL Server, and Teradata.
Experience with Hive external and managed tables, UDFs, aggregations, joins, and file formats such as Parquet and ORC.
Extensively worked on Spark using Scala, covering SparkContext, Spark SQL, RDD transformations and actions, Datasets, and DataFrames (a minimal sketch follows this list).
Worked on Kafka to stream data into HDFS.
Experience Sqooping RDBMS data into HDFS, building Oozie jobs through Hue, and writing Pig scripts using the various jars available in Piggybank.
In-depth knowledge of databases such as SQL Server and MySQL, with extensive experience writing SQL queries, stored procedures, and triggers.
Strong record of data migration from heterogeneous sources such as Oracle and flat files into SQL Server.
Proficient in extracting, transforming, and loading (ETL) data from sources such as Excel, Oracle, and flat files using SQL Server Integration Services (SSIS).
Good knowledge of core Java, object-oriented programming, the Collections framework, and data structures in Java.
Good experience processing unstructured, semi-structured, and structured data in file formats such as JSON, XML, and CSV.
Currently learning AWS, EMR, PostgreSQL, REST APIs, and Redshift.
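A minimal, hypothetical Scala sketch of the Spark work described above: reading a CSV into a DataFrame, querying it through Spark SQL, and expressing the same aggregation as an RDD transformation. The HDFS path, column names (store_id, amount), and application name are illustrative assumptions, not details taken from any engagement listed below.

    import org.apache.spark.sql.SparkSession

    object SalesSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SalesSummary")
          .getOrCreate()

        // DataFrame read; path and schema-inference options are assumptions
        val sales = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/sales.csv")

        // Spark SQL over a temporary view
        sales.createOrReplaceTempView("sales")
        val byStore = spark.sql(
          "SELECT store_id, SUM(amount) AS total FROM sales GROUP BY store_id")

        // Equivalent RDD-style transformation and action
        // (column types follow the assumed schema: string id, numeric amount)
        val totals = sales.rdd
          .map(row => (row.getAs[String]("store_id"), row.getAs[Double]("amount")))
          .reduceByKey(_ + _)

        byStore.show()
        totals.take(10).foreach(println)

        spark.stop()
      }
    }

Packaged and submitted with spark-submit, this DataFrame/RDD pattern mirrors the Spark conversions referenced in the roles below.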
Technical Skills:
Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Tez
Hadoop Distributions: Cloudera, Hortonworks
Operating Systems: Windows, Linux, Ubuntu, Unix
Programming Languages: SQL, PL/SQL, Java, Pig Latin, HiveQL, Scala, Unix Shell Scripting
Database Tools: Enterprise Manager, Query Analyzer, SQL Profiler, Upgrade Wizard, Replication, Database Engine Tuning Advisor, Business Intelligence Development Studio (BIDS)
Databases: MS SQL Server […] 2012, MS Access, MySQL, Oracle, NoSQL
Reporting/ETL Tools: Tableau, SQL Server Integration Services (SSIS)
Methodologies: Agile/Scrum, Waterfall
Protocols: HTTP, TCP/IP, FTP
Web Technologies: Web Services, XML, HTML
Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office (Word, Excel, PowerPoint, Access)
Professional Experience:
Client: IMS Health, PA July 2017 – Present
Role: Hadoop Developer
Description: IMS Health and Quintiles merged to form IQVIA, which provides solutions that enable healthcare companies to innovate with confidence, maximize opportunities, and drive healthcare forward.
Responsibilities:
Responsible for analyzing and cleansing raw data by running Hive queries and Pig scripts against it.
Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs in Cloudera Hadoop (CDH 5.9.0 - 5.10.0).
Developed Hive external and managed tables and built pipelines for smooth ETL processing.
Gained familiarity with the Hue UI as well as with accessing HDFS files and data directly.
Involved in developing Hive DDL to create, alter, and drop Hive tables, and worked with Storm and Kafka.
Developed a data pipeline using Kafka and Storm to store data into HDFS.
Developed a Hive UDF to parse staged raw data and pull item details for a specific store (see the sketch at the end of this entry).
Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs from Hive queries.
Designed workflows that scheduled Hive processing of log file data streamed into HDFS via Flume.
Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and various compressed formats.
Developed lookup tables in Spark to verify the integrity of data imported into HDFS, checking every field and table at each phase of the data movement from the original source system to the final target.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and gained experience with the Spark shell and Spark Streaming.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Participated in regular stand-up meetings, status calls, and business-owner meetings with stakeholders and risk management teams in an Agile environment.
Supported code/design analysis, strategy development and project planning.
Followed a Scrum implementation of the scaled Agile methodology for the entire project.
Environment: Cloudera Hadoop cluster, UNIX servers, Shell scripting, Java MapReduce, Hive, Storm, Sqoop, Flume, Oozie, Kafka, Git, Eclipse, Tableau.
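A hypothetical sketch of the Hive UDF work referenced in this entry. Hive UDFs of this kind are usually written in Java; a Scala class compiled into the same jar behaves the same way on the JVM, since Hive resolves the evaluate method by reflection. The pipe-delimited record layout (store_id|item_id|item_name|price) and the class name are assumptions.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Assumed layout: store_id|item_id|item_name|price
    class ExtractItemForStore extends UDF {
      // Hive looks up `evaluate` by reflection at query time
      def evaluate(rawRecord: Text, storeId: Text): Text = {
        if (rawRecord == null || storeId == null) return null
        val fields = rawRecord.toString.split("\\|", -1)
        // fields(0) = store_id, fields(2) = item_name in this assumed layout
        if (fields.length >= 3 && fields(0) == storeId.toString)
          new Text(fields(2))
        else
          null
      }
    }

Once packaged, a class like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a query.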
Client: Speedway, Enon, Ohio May 2016 – Jun 2017
Role: Hadoop Developer
Description: Speedway runs a customer loyalty program in which nearly every purchase (candy bars, drinks, and more) earns points toward free fuel, food, merchandise, and gift cards.
Responsibilities:
Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0, and CDH 5.2.1.
Worked on the Hadoop cluster using various big data analytics tools, including Kafka, Pig, Hive, and MapReduce.
Developed simple to complex MapReduce streaming jobs in Python, complementing processing implemented in Hive and Pig.
Implemented data access jobs through Pig, Hive, and HBase (0.98.0).
Imported and exported data into HDFS and Hive using Sqoop.
Modified existing Scala programs to improve performance and obtain partitioned results in Spark (see the sketch at the end of this entry).
Worked on processing unstructured data using Pig and Hive.
Collected and aggregated large volumes of log data using Apache Flume and staged the data in HDFS for further analysis.
Used Impala to read, write and query the Hadoop data in HDFS or HBase.
Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.
Responsible for backing up and restoring the Tableau repository.
Converted ETL operations to the Hadoop platform using Pig Latin operations, transformations, and functions.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
Exported the result set from Hive to MySQL using Shell Scripts.
Actively involved in code review and bug fixing for improving the performance.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Storm, Kafka, Linux, Hortonworks distribution, Big Data, Java APIs, Java Collections, SQL, NoSQL, MongoDB.
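A hypothetical Scala sketch of the Spark partitioning change described in this entry: pre-partitioning a pair RDD by key so the downstream aggregation shuffles once and the results come back partitioned. The log path, field positions, and partition count are assumptions.

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PartitionedCounts {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PartitionedCounts"))

        // Parse web-server-style log lines into (url, 1) pairs
        val hits = sc.textFile("hdfs:///data/logs/access.log")
          .map(_.split(" "))
          .filter(_.length > 6)
          .map(fields => (fields(6), 1L))

        // Partition by key once, then aggregate within partitions
        val counts = hits
          .partitionBy(new HashPartitioner(48))
          .reduceByKey(_ + _)

        counts.saveAsTextFile("hdfs:///data/output/url_counts")
        sc.stop()
      }
    }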
Client: Remonter Solutions, Hyderabad, India June 2014 – July 2015
Role: SQL Developer
Description: The main objective of this project was application development and user interface maintenance, providing an effective user interface so that customers can easily place purchase orders on the NMHG website.
Responsibilities:
Interacted with team leaders, business users, and various teams during issue handling and to gather both functional and technical requirements.
Actively participated in the requirements gathering, analysis, design, and testing phases.
Created and updated databases, tables, views, stored procedures, and functions.
Worked extensively on SQL queries to meet requirements.
Responsible for use case diagrams, class diagrams and sequence diagrams using Rational Rose in the Design phase.
Involved in analysis, design, and coding.
Implemented connectivity with MySQL and Oracle databases.
Involved in writing the database integration code.
Used JDBC for data retrieval from the database for various inquiries (see the sketch at the end of this entry).
Involved in writing database connection classes for interacting with Oracle database.
Produced quality working code and estimated the design, schedule, and cost of implementing use cases.
Created various SQL stored procedures, views, functions, and temporary tables to supply data to Crystal Reports.
Environment: SQL Server 2008–2012, MySQL 5.1, Oracle 10g, Apache Tomcat 6.0
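A minimal sketch of the JDBC retrieval pattern mentioned in this entry, written in Scala for consistency with the other sketches in this document (the original work was in Java/SQL). The connection URL, credentials, and table/column names are placeholders.

    import java.sql.DriverManager

    object OrderLookup {
      def main(args: Array[String]): Unit = {
        val conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/orders_db", "app_user", "app_password")
        try {
          // Parameterized query avoids SQL injection and reuses the query plan
          val stmt = conn.prepareStatement(
            "SELECT order_id, status FROM purchase_orders WHERE customer_id = ?")
          stmt.setInt(1, 42)
          val rs = stmt.executeQuery()
          while (rs.next()) {
            println(s"${rs.getInt("order_id")} -> ${rs.getString("status")}")
          }
          rs.close()
          stmt.close()
        } finally {
          conn.close()
        }
      }
    }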
Client: Candor Works, Pune, India July 2012 – May 2014
Role: Application Developer (SQL/Java)
Description: Candor Works is a software company that provides software development, testing, and maintenance services to clients across the globe. The company offers high-quality, efficient services that help clients build quality products and solutions.
Responsibilities:
Streamlined records management within an enrollment management program serving 9 campuses, automated the student enrollment process, and reduced the time required to create 4,000 spreadsheet records.
Designed and built a SQL Server database supporting 1,000,000+ active and alumni student records, and combined the database with Excel to automate spreadsheet development.
Reduced spreadsheet creation time to 2 hours with zero errors, cutting man-hours, improving system scalability, and enabling production of additional reports.
Architected an Access database, connected the system to the server, and integrated the network with the system's GUI to facilitate teamwork among agents.
Worked on university websites and trained staff on updating the websites and their personal profiles.
Responsible for managing scope, planning, tracking and change control aspects of the project.
Involved in database design in client server by analyzing business requirements.
Supported the implementation team in understanding business requirements, worked on change requests, and thoroughly tested changed functionality with impact analysis.
Documented Use Cases, Functional and Technical Design Documents.
Created and maintained database objects, including complex stored procedures, triggers, tables, views, SQL joins, and other statements for various applications.
Wrote new stored procedures and modified and tuned existing ones for good performance (see the sketch at the end of this entry).
Used temporary tables effectively within stored procedures, taking into account performance issues with the front-end application.
Environment: MySQL, Core Java, OOP, JDK, Eclipse, IntelliJ, Visual Studio, SQL Server, T-SQL.
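A hypothetical sketch of invoking one of the stored procedures described above through JDBC, again in Scala for consistency across the sketches in this document; the procedure name, parameters, and connection details are placeholders rather than artifacts of this project.

    import java.sql.{DriverManager, Types}

    object EnrollmentReport {
      def main(args: Array[String]): Unit = {
        val conn = DriverManager.getConnection(
          "jdbc:sqlserver://localhost:1433;databaseName=enrollment",
          "report_user", "report_password")
        try {
          // {call ...} is the standard JDBC escape syntax for stored procedures
          val call = conn.prepareCall("{call usp_count_records(?, ?)}")
          call.setInt(1, 2014)                        // input: enrollment year
          call.registerOutParameter(2, Types.INTEGER) // output: record count
          call.execute()
          println(s"records created: ${call.getInt(2)}")
          call.close()
        } finally {
          conn.close()
        }
      }
    }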
Education:
Master’s in Computer Information Science from the University of South Alabama.
Bachelor’s in Computer Science from JNTU Hyderabad.