Big Data Lead

Location:

Raleigh, NC

Posted:

February 13, 2018


Resume:

Vikrant Sikarwar

Ph. No: +1-814-***-**** Email: *****************@*****.***

Raleigh NC

Big Data Lead - Technology

A technically competent and industry-savvy IT professional with around 12 years of experience in the software development industry, including 5 years in the Big Data Hadoop ecosystem. I have worked on end-to-end Big Data projects, widening my scope from core development to a techno-functional role. Professional experience across the Software Development Life Cycle (SDLC), including design, implementation and testing of software applications using MapReduce, Big Data technologies, Java and J2EE technologies, along with familiarity with configuration management and project execution. An effective team player who continuously seeks opportunities to master new domains and technologies.

Core Competencies

Big Data Analytics

System design and implementation

Client Relationship Management

Team management and leadership

Release/Deployment Management

Web Application Development

Proficient in using open source tools and technologies

Technical leadership

Tools and Technologies

Hadoop Technologies: Spark, Scala, MapReduce, Pig, Hive, Shell Script, Cascading, HBase, HDFS, YARN, Oozie, Flume, Sqoop, MRUnit, Kafka, Cloudera/Hortonworks Hadoop, Pepperdata, Hue, Knox, Ranger.

Other Tools and Technologies: Java (1.5, 1.6, 1.7, 1.8), JSP, Servlets, Struts Frameworks, Spring Frameworks, Hibernate, Jenkins Gradle, Liquibase for promoting database queries, Oracle 9i/10g, Toad/SQL Developer, RDBMS Concepts, Jakarta Tomcat 7.0, JBoss 5.1.0, WebSphere 6.1, Rally, JIRA, BugTrack, Web Services (RESTful/SOAP), Eclipse, IntelliJ 14.2, Putty/WinSCP, ANT/Maven scripts for deployment, CVS, SVN, GitHub, TortoiseHg, VisualVM

Summary

Development of multiple MapReduce jobs in Java and through the Cascading API for data cleansing and preprocessing.

Defining processes for loading data from RDBMS into HDFS using Sqoop.

Defining processes for loading data from various sources into HDFS and HBase using Flume.

Development of Pig scripts for data cleansing and data ingestion.

Defining and automating job deployment to the Hadoop cluster, with Oozie workflows scheduled to run the jobs.

Using Ranger for security, and building a web application API for the Operations team to define policies and access for the DataLake.

Development on Spark SQL and Spark Streaming using Scala.

Wrote shell scripts that run Hive queries to perform validation checks on data.

Worked with Gradle for deployment of Hadoop jobs through Knox to the DataLake.

Worked with Jenkins Gradle to create Jenkins jobs for deploying the different workflow/coordinator Hadoop jobs to the DataLake in different environments (Dev/Tst/AT/Prod) using Knox.

Code versioning using Git.

Design the application, participate in design discussions, and review design artifacts.

Handle client communication regarding requirements, design, etc.

Review the developed code and make sure it adheres to the design, standards and guidelines of the clients and Virtusa.

Providing fixes and support for priority bugs in the TEST, SIT, UAT and Pre-Production environments.

Following Agile practices such as daily stand-up meetings with our Scrum Master, status calls with clients and defect triage meetings.

Communicated with onsite coordinator for requirements understanding and clarifications.

Reviewing performance and code quality of the application.

Designed and developed base classes, framework classes and common re-usable Components.

Participate in meetings related to project management (with the client) and related to technical deliveries.

Deployment support for minor/major releases.

Manage onsite incidents.

Educational Qualifications

M.C.A. (Master of Computer Applications) from IGNOU, India

Hadoop Certification from Simplilearn

Spark and Scala Certification from Big Data University

DOEACC ‘A’ Level certification in Computer programming, India

PROFESSIONAL EXPERIENCE

IBM WATSON HEALTH ANALYTICS

Hadoop Technical Lead

August 2016 – Present

DataLake - NCHA

The NCHA is a non-profit organization that provides multiple types of services to North Carolina hospitals and healthcare organizations. It fosters collaboration between healthcare providers, organizations and agencies through various programs, services and initiatives, and promotes improvements in the quality of affordable healthcare in NC through its informational and educational programs.

Responsibilities

Act as overall technical authority for the project.

Manage all managed services teams and provide technical leadership.

Develop MapReduce jobs in Cascading for data cleansing and processing, with the results extracted to the data mart.

Create Hive tables and write Hive queries for data processing and analysis.

Worked on Spark SQL and Spark Streaming using Scala.

Write Pig scripts for data cleansing.

Documented the system's processes and procedures for future reference.

Responsible for moving the clinical streaming data from source to HDFS through Flume.

Validating the Hadoop log files in case of job failures and data dropout.

Wrote workflow.xml for scheduling Oozie workflow.

Wrote shell scripts that run Hive queries to perform validation checks on data.

Wrote Sqoop jobs for moving data from HDFS to the Oracle DataMart.

Worked with Gradle for deployment of Hadoop jobs through Knox to the DataLake.

Worked with the Jenkins.gradle file to create Jenkins jobs for deploying the different workflow/coordinator Hadoop jobs to the DataLake in different environments (Dev/Tst/AT/Prod).

Code versioning using Git.

Worked on versioning the code in Artifactory and promoting it through the Jenkins jobs.

Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.

Worked with the Hue GUI for job scheduling, file browsing, job browsing and metastore management.

Gather requirements and identify requirement gaps.

Design the application, participate in design discussions, and review design artifacts.

Handle client communication regarding requirements, design, etc.

Review the developed code and make sure it adheres to the design, standards and guidelines of the clients and VirtusaPolaris.

Support the onsite team on technical issues.

Reviewing performance and code quality of the application.

Following Agile practices such as daily stand-up meetings with our Scrum Master, status calls with clients and defect triage meetings.

Environment: Core Java 1.8, Cascading Framework, Sqoop Framework, Flume, Spark, Scala, Hive, Pig, MapReduce, HBase, Shell Script, Hortonworks Hadoop, Ranger, Oozie Framework, Knox, HDFS, IntelliJ 14.2, TortoiseHg and Jenkins

VANGUARD

Senior Hadoop Developer

January 2015 - July 2016

Fund Data Analysis

The Vanguard Group is an American investment management company that manages approximately $3.0 trillion in assets. It is the largest provider of mutual funds and the second-largest provider of exchange-traded funds (ETFs) in the world.

Performed analysis on huge data sets and helped the organization gain a competitive advantage by preparing data for different applications covering portfolio analysis, fund comparison trends and log analysis. The project used data on users across the US to analyze their impact on different fund holdings. The data arrived as Excel, CSV and text files; MapReduce programs and Pig were used to extract the specific data required and move it to HDFS for Hive to analyze and purge with the holdings data. In addition, insurance fund holdings data was imported from an Oracle database to HDFS through Sqoop. Hive was used to analyze both inputs and produce a CSV output file, which was consumed by R for further computation and for displaying the results as graphs, charts, etc.

Responsibilities

Involved in loading data from RDBMS into HDFS using Sqoop.

Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

Developed multiple MapReduce jobs in Java for data cleansing and preprocessing.

Involved in writing Pig scripts and HiveQL.

Completed a POC configuring Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS, using Scala.

Involved in creating Hive tables, loading them with data and writing Hive queries for data processing and analysis.

Responsible for moving the data from source (Oracle) to HDFS.

Gained experience in managing and reviewing Hadoop log files.

Involved in scheduling Oozie workflow jobs.

Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store it in HDFS.

Involved in code promotion using SVN.

Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.

Worked with the Hue GUI for job scheduling, file browsing, job browsing and metastore management.

Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.

Extensively used Agile Practices for Iteration Planning, Time Estimation, Development and Delivery.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, R-Language, JDK 1.6, Oracle, Sqoop, Hue, MRUnit, Log4j, Eclipse IDE, Apache POI.

GE TRANSPORTATION

Team Lead

April 2013 – December 2014

GET Billing

The project is to build a web-based billing application that lets the GE Transportation team bill their customers using the existing database. The application provides a centralized system holding all information related to customer contracts, locos, overhauls, escalation, mileage, daily rates, etc. It will be used by various teams in GE Transportation, such as the Finance Modelling team, Commercial Risk team, Transactional Risk team, CMR team, Operation team, DW team and IT team.

Responsibilities

Developed multiple MapReduce jobs in Java for data cleansing and preprocessing.

Involved in loading data from RDBMS into HDFS using Sqoop.

Involved in writing Pig scripts and HiveQL.

Involved in loading data from various sources into HDFS using Flume.

Responsible for the offshore development activities of the project, designing the technical flow and reviewing the team's progress.

Responsible for all activities from requirement analysis through quality control, and for coordination between the functional and development teams.

Worked on Spring Framework classes and the design of system classes.

Customer interaction related to the project, and auditing of bugs and fixes.

Milestone tracking, defect management, resource utilization and tracking of team progress.

Environment: Hadoop, HDFS, MapReduce, Hive, Flume, JDK 1.5, JSF, Servlets, A4J, JSTL, Spring, SOAP, Web Services, HTML, CSS, RESTful Web Services, JavaScript, JBoss Server, Oracle (Database), MyEclipse, JUnit, Log4j.

Previous Projects

Project: SMART OTR (Real Time Cockpit), GE Oil & Gas

Role: Sr. Software Developer

Duration: Nov 2010 - March 2013

Domain: Manufacturing

Technology: XML, XSLT, JUnit, Spring, Oracle 9i, JBoss Portal Server, Tomcat, Web Services.

Project Description

Smart OTR is a web-based system.

Sub Module: Quality Metrics

Used for uploading Excel files to the system with the computations.

Responsibilities

Responsible for the development activities of the modules assigned, with the team aligned and as per the requirement specifications.

Responsible for all activities from requirement analysis through quality control.

Customer interaction related to the project, and auditing of bugs and fixes.

Milestone tracking, defect management, resource utilization and tracking of team progress.

Project: Service Outsourcing, Service Power

Role: Sr. Software Developer

Duration: Dec 2008 - Oct 2010

Domain: Service Industry

Technology: JDK 1.5, JSP, Servlets, Struts, Hibernate, HTML, CSS, JavaScript, Tomcat 5.5, WebSphere, Oracle (Database), MyEclipse.

Project Description

This portal manages customers' orders, along with client management and the handling of the ASPs at various locations who perform the jobs, with orders tracked with the help of BPO staff. When a user purchases a product, they also purchase the accompanying services; those services are handled by Service Power through the different ASPs enrolled with it in different service areas, each handling different service catalogs. Regulatory defines the rules by which the whole process is governed.

Responsibilities

Responsible for the development activities of the modules assigned, with the team aligned and as per the requirement specifications.

Responsible for the integration with PayPal, the implementation of HTTPS and many other application-level concerns.

Responsible for all activities starting from requirement analysis and design.

Responsible for estimations, done on a component-based model.

Worked on UI specification sheets and BRD documents.

Customer interaction related to the project.

Project: OP Plan, Genpact

Role: Software Developer

Duration: Sep 2008 - Nov 2008

Technology: JDK 1.5, JSP, Servlets, Struts, Hibernate, HTML, CSS, JavaScript, Tomcat 5.5, WebSphere, Oracle (Database), MyEclipse.

Project Description

The portal is used by Genpact to view its P&L statements for the different verticals and horizontals, and combinations of both.

Responsibilities

Responsible for the development activities of the modules assigned.

Involved in system/integration testing

Ensuring that final deliverables conform to requirements

Preparing Technical specifications

Project: GEFanuc Portal, GE

Role: Software Developer

Duration: April 2008 - Aug 2008

Technology: JDK 1.4, JSP, Servlets, Struts, Ajax, Hibernate, HTML, CSS, JavaScript, Tomcat 5.5, WebLogic, Oracle (Database), MyEclipse, SiteBuilder, Interwoven, XML

Project Description

The GE Fanuc Intelligent Platforms portal is used by GE to support its customers around the globe, helping them stay competitive by continually adding electronic intelligence to their products and processes.

GE Fanuc Intelligent Platforms' goal is to supply the computer brainpower and enable its customers to gain and maintain a competitive advantage.

The portal mainly covers embedded systems, automation and product management, and helps users discover cutting-edge products for CNC applications.

Responsibilities

Responsible for the development activities of the modules assigned.

Preparing technical specifications

Unit testing of the developed modules

Participating in release management

Participating in Integration testing

Project: ACBS Portal, GE

Role: Software Developer

Duration: Nov 2007 - March 2008

Domain: Manufacturing

Technology: JDK 1.4, JSP, Servlets, HTML, CSS, JavaScript, Tomcat 5.5, WebLogic, Oracle (Database), NetBeans 5.0, Interwoven, Ajax, XML, Lucene Search Engine, OpenDeploy.

Project Description

This is a knowledge portal used by GE to store SOPs (Standard Operating Procedures) and to schedule trainings, which are assigned to different users.

Responsibilities

Responsible for the development activities of the modules assigned.

Participating in release management

Involved in DD preparation and Coding Phases.

Unit testing of the developed modules

Project: Expert Tracker Portal, Genpact

Role: Software Developer

Duration: May 2007 - Oct 2007

Domain: Service Industry

Technology: JDK 1.4, JSP, Servlets, HTML, CSS, JavaScript, Tomcat 5.5, WebSphere 6.0, Oracle (Database), NetBeans 5.0.

Responsibilities

Responsible for the development activities of the modules assigned.

Participating in release management

Involved in DD preparation and Coding Phases.

Analyzing functional specifications

Project: Knowledge Portal, Genpact

Role: Software Developer

Duration: Aug 2006 - April 2007

Domain: Service Industry

Technology: JDK 1.4, JSP, Servlets, HTML, CSS, JavaScript, Tomcat 5.5, WebSphere 6.0, Oracle (Database), NetBeans 5.0, Lucene Search Engine

Responsibilities

Responsible for the development activities of the modules assigned.

Participating in release management

Involved in DD preparation and Coding Phases.

Analyzing functional specifications

Work Experience

Virtusa Corporation

Designation: Sr. Consultant

Period: 26th July 2016 - present

UST Global LLC

Designation: Sr. System Analyst

Period: 16th Dec 2014 - 25th July 2016

Genpact US Software

Designation: Consultant

Period: 29th Jan 2014 - 15th Dec 2014

Genpact India

Designation: Consultant

Period: 31st Aug 2006 - 28th Jan 2014
