
KASHINATH WALIKHINDI

CONTACT DETAILS: +91-985******* / 917-***-****

adb8ix@r.postjobfree.com

BIG DATA EXPERT

Career Objective : To keep learning new technologies and to apply my skills to help the organization grow to new heights.
Total IT Experience : 14 years (3+ years in Big Data)
Profile : Big Data and Hadoop Expert

Big Data: Spark, Scala, Hadoop, MapReduce, Pig, Hive, Sqoop, Kafka, Java, Python, and MongoDB.

Linux, HP-UX, Shell Scripting, Python, C, C++, Core Java, Oracle, SQL, PL/SQL.

Areas of Expertise:

Big Data Ecosystems : Hadoop, Spark, Spark SQL, Spark Streaming, MapReduce, HDFS, Pig, Hive, Sqoop, Oozie.

Programming Languages : Scala, Python, PySpark, Java, C, C++
Scripting Languages : Unix Shell scripting, SED, AWK
Databases : Oracle (SQL, PL/SQL), MySQL, Postgres, MongoDB
Operating Systems : Linux, UNIX, Solaris, and Windows
AWS Cloud : S3, SNS, Lambda, EC2, Elastic Beanstalk, and EMR

PROFESSIONAL EXPERIENCE:

Ideas to Impacts (September 2019 to Present) – Pune, MH (India) – Big Data Architect
Ideas to Impacts Pvt. Ltd. (I2I) is a product-based software company in Pune.

Mphasis (2007 to September 2019) – Pune, MH (India) – Technical Lead
Mphasis is an IT service management company; it was acquired by EDS in 2006 and operated as an independent HP/EDS unit, and HPE recently sold its major stake to Blackstone Group LP.

Synetairos Technologies (2004 to 2007) – Pune, MH (India) – Sr. Analyst
Synetairos Technologies is a software services firm specializing in Banking, Finance, and Insurance. A subsidiary of Kale Consultants, it provides software consulting services and has branches in Pune, Mumbai, Bangalore, and Chennai.

EXPERTISE

• Application Development

• Big Data, Spark and Scala

• Hadoop, HDFS, Pig, Hive, Sqoop, Flume, Kafka

• Python, Java, C, C++

• UNIX, Linux, Solaris, Unix Shell scripting, AWK, SED

• Oracle, SQL, PL/SQL

• Data Modelling

• Software Development Lifecycle (SDLC)

• Team Building & Leadership

CERTIFICATIONS

Big Data Hadoop and Spark Developer certification
https://certificates.simplicdn.net/share/335463.pdf

Spark and Scala certification
https://certificates.simplicdn.net/share/341722.pdf

Apache Kafka certification
https://certificates.simplicdn.net/share/473155.pdf

https://udemy-certificate.s3.amazonaws.com/pdf/UC-VB36KVX6.pdf

Apache Kafka - Cluster Setup & Administration
https://udemy-certificate.s3.amazonaws.com/pdf/UC-841A69CZ.pdf

Kashinath_nw@outlook.com | +91 9850812577

Responsibilities:

• Client communications and meetings.

• Strong experience in designing and developing applications in Spark using Java and Scala.

• Experience in developing applications with Spark SQL, Spark Streaming, and Spark MLlib.

• Strong experience in the Software Development Life Cycle (SDLC) phases which include Analysis, Design, Implementation, Testing, and Maintenance.

• Extracted data from RDBMS (Oracle and MySQL) into HDFS and transferred processed data back to Oracle using Sqoop.

• Created Sqoop jobs with incremental load to populate Hive External tables.

• Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.

• Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this list).

• Developed UDFs in Java as and when necessary to use in PIG and HIVE queries.

• Experience in using ORC, Avro, and Parquet file formats.

• Strong working experience on Linux, Solaris, and HP-UNIX.

• Expertise in UNIX Shell scripting, AWK, SED scripting.

• Automated routine tasks by developing UNIX shell scripts.

• Conducted team trainings on "UNIX and Shell scripting" and mentored/helped team members in various troubleshooting activities.

• Developed a UNIX shell script tool that generates standard UNIX shell script templates, making shell scripting easier and faster; it reduces shell scripting work by 30% to 40%.

• Strong working experience in Java, C, C++.

• Experience in Python scripting.

• Experience in MySQL, Oracle, SQL, PL/SQL.

• Automated GES data load activity by writing Python scripts, PL/SQL, and UNIX shell scripts.

• Responsible for creating and modifying PL/SQL procedures and functions according to business requirements.
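
To make the Hive partitioning and external-table points above concrete, here is a minimal illustrative sketch in Scala using Spark SQL with Hive support. It is not code from any of the projects below; the table name, columns, and HDFS location are hypothetical.

```scala
// Illustrative sketch only: a partitioned Hive external table created and queried
// through Spark SQL. The table name, columns, and HDFS location are hypothetical.
import org.apache.spark.sql.SparkSession

object PartitionedHiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-table-sketch")
      .enableHiveSupport()            // requires a Hive metastore on the cluster
      .getOrCreate()

    // External table: the data stays at the given HDFS location, partitioned by load_date.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        |  order_id BIGINT,
        |  amount   DOUBLE
        |) PARTITIONED BY (load_date STRING)
        |STORED AS ORC
        |LOCATION 'hdfs:///data/sales_ext'""".stripMargin)

    // Register partition directories written by an upstream job (e.g. a Sqoop incremental load).
    spark.sql("MSCK REPAIR TABLE sales_ext")

    // Partition pruning: only the load_date=2020-03-01 directory is scanned.
    spark.sql(
      """SELECT load_date, COUNT(*) AS orders, SUM(amount) AS total_amount
        |FROM sales_ext
        |WHERE load_date = '2020-03-01'
        |GROUP BY load_date""".stripMargin).show()

    spark.stop()
  }
}
```

Because the table is external, dropping it only removes the metastore entry; the ORC files under the HDFS location are left in place.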

EDUCATION

B. E. Computer Science, UBDT Engineering, Davangere, Karnataka, India.


PROJECT DETAILS

Project Name : DataBuck (FirstEigen, Chicago US)

Technology :

• Apache Spark, Spark SQL, Hive, Java, Hadoop, HDFS, Kafka, Oracle DB, PostgreSQL, WinSCP, PuTTY.

• AWS cloud: S3, SNS, Lambda, EC2, Elastic Beanstalk, EMR.

• Unix Shell scripting.

• Linux, HDFS cluster.

Description :

FirstEigen is a US-based company located in Chicago. DataBuck is one of FirstEigen's main products. DataBuck is an autonomous, self-learning Big Data Quality validation and Data Matching tool. Machine Learning is used to simplify elaborate and complex validations and reconciliations. DataBuck learns the Data Quality behavior and models it using advanced Machine Learning techniques. It develops Data Quality Fingerprints at multiple hierarchical levels, accounting for multiple data cyclicality patterns, and monitors incoming data for reasonableness. Machine-learning capabilities then autonomously set validation checks without manual intervention. It is built ground-up on a Big Data platform (Spark) and is more than 10x faster than other tools or custom scripts. Autonomous Machine Learning enables the tool to be set up and working on multiple data sets in just 3 clicks, with no coding needed. DataBuck can accept data from all major data sources, including Hadoop, Cloudera, Hortonworks, MapR, HBase, Hive, MongoDB, Cassandra, Datastax, HP Vertica, Teradata, Oracle, MySQL, MS SQL, SAP, Amazon AWS, MS Azure, and more.

Roles/Responsibilities :

• Customer communication and meetings.

• Requirement analysis.

• Provided design recommendations and thought leadership to sponsors/stakeholders.

• Developing Apache Spark applications using Java for data processing (a simplified data quality check is sketched after this list).

• Developing Java and Unix shell scripts for running jobs as per given rules.

• Running and Testing the applications on AWS EC2 cluster.

• Creating AWS EC2 clusters and installing the DataBuck application on EC2.
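
The following is a minimal, hypothetical sketch of the kind of column-level quality check such a tool performs, written with Spark DataFrames in Scala. It is not DataBuck code; the S3 path and CSV layout are invented for illustration.

```scala
// Illustrative sketch only: a simplified column-level data quality check in Spark.
// This is not DataBuck code; the S3 path and CSV layout are invented.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, when}

object NullRateProfileSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("null-rate-profile-sketch")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/incoming/batch.csv")   // hypothetical input

    val total = df.count()

    // One pass over the data: count null or empty values for every column.
    val nullCounts = df.select(df.columns.map { c =>
      count(when(col(c).isNull || col(c) === "", c)).alias(c)
    }: _*).first()

    df.columns.zipWithIndex.foreach { case (c, i) =>
      println(f"$c%-25s null/empty rate = ${nullCounts.getLong(i).toDouble / total}%.4f")
    }

    spark.stop()
  }
}
```

A real product would derive thresholds and validation rules automatically; this sketch only reports null/empty rates per column.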

Project Name : Advisory Cloud Data Delivery (ACDD, Charles Schwab Bank, US)
Technology :
• Apache Spark, Spark SQL, Hive, Java, Unix shell scripting, Hadoop, HDFS, Spring Boot, Spring Cloud, Maven, Bitbucket, Bamboo, WinSCP, PuTTY, JIRA, Confluence.

Description :

The Charles Schwab Corporation is a bank and stock brokerage firm based in San Francisco, California. It was founded in 1971 by Charles R. Schwab. It is ranked 13th on the list of largest banks in the United States and is also one of the largest brokerage firms in the United States. The company offers an electronic trading platform to trade financial assets including common stocks, preferred stocks, futures contracts, exchange-traded funds, options, mutual funds, and fixed income investments. It also provides margin lending and cash management services, as well as services through registered investment advisers. We developed an application to process Capital Market data for the Advisory service of Charles Schwab Bank. The data is processed using Hadoop tools such as Apache Spark, Spark SQL, and Hive on a YARN cluster. We process customer and advisor data daily. The output data is used by downstream applications for further processing.

Roles/Responsibilities :

• Customer communication and meetings.

• Requirement analysis.

• Provided design recommendations and thought leadership to sponsors/stakeholders.

• Developing Apache Spark and Spark SQL applications using Java for data extraction and processing.

• Developing Spark SQL queries and Hive queries (see the sketch after this list).

• Scheduling and running applications.

• Testing the applications on Hadoop YARN cluster.
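
As an illustration of the daily customer/advisor processing described above, here is a minimal Spark SQL sketch, written in Scala for brevity rather than the Java used on the project. The table names, columns, and output path are hypothetical.

```scala
// Illustrative sketch only: the daily customer/advisor processing, shown in Scala
// rather than the Java used on the project. Table names, columns, and the output
// path are hypothetical.
import org.apache.spark.sql.SparkSession

object DailyAdvisoryExtractSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-advisory-extract-sketch")
      .enableHiveSupport()            // customer and advisor tables assumed to live in Hive
      .getOrCreate()

    val extract = spark.sql(
      """SELECT c.customer_id, c.segment, a.advisor_id, a.region
        |FROM customers c
        |JOIN advisors  a ON c.advisor_id = a.advisor_id
        |WHERE c.load_date = current_date()""".stripMargin)

    // Parquet output picked up by downstream applications.
    extract.write.mode("overwrite").parquet("hdfs:///out/advisory/daily_extract")

    spark.stop()
  }
}
```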

Project Name : Bank Data Analysis
Technology :
• Apache Spark, Scala, and Sqoop
• Linux, Hadoop, HDFS cluster
Description :

A project analyzing the bank's direct marketing campaign data to increase deposits and identify potential customers. It uses various parameters to assess marketing success/failure rates, customer qualities and behaviors, and deposit scheme subscription trends based on age, marital status, and other interests.

Roles/Responsibilities :

• Developed Apache Spark applications for data cleaning, preprocessing, and processing (see the sketch after this list).

• Provided design recommendations and thought leadership to sponsors/stakeholders that resolved technical problems.

• Managed and reviewed Hadoop log files.

• Tested raw data and executed performance scripts.
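
A minimal Scala sketch of the kind of subscription-trend aggregation described above, assuming a hypothetical CSV with age, marital, and deposit columns; the real campaign data layout may differ.

```scala
// Illustrative sketch only: deposit-subscription rates by age band and marital status.
// The CSV path and the age / marital / deposit column names are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, floor, when}

object CampaignAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("campaign-analysis-sketch")
      .getOrCreate()

    val bank = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/bank_marketing.csv")           // hypothetical input

    bank
      .withColumn("age_band", floor(col("age") / 10) * 10)                    // 20s, 30s, 40s, ...
      .withColumn("subscribed", when(col("deposit") === "yes", 1).otherwise(0))
      .groupBy("age_band", "marital")
      .agg(avg("subscribed").alias("subscription_rate"))
      .orderBy("age_band", "marital")
      .show(50, truncate = false)

    spark.stop()
  }
}
```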

Project Name : Warren, OH - DELPHI Engineering, US
Technology :
• Languages: Java, Python, Scala, Unix Shell Scripting, JDBC, and WebLogic.
• Operating System: HP-UX.
• Database: Oracle 9i, SQL, PL/SQL.
• Hadoop, Spark, MapReduce, HDFS, Pig, Hive, Sqoop, Flume.
Description :

The company provides combustion systems, electrification products and software and controls, and operates in the passenger car and commercial vehicle markets, and in vehicle repair through a global aftermarket network. The company has 20,000 employees including 5,000 engineers. It is headquartered in London, U.K. and operates technical centers, manufacturing sites and customer support services.


Delphi application - DGSS (Delphi Global Sourcing System) is a global system that allows Delphi to procure materials from suppliers. The software is a Supply Chain Management tool providing a single repository to manage part/supplier/contract/customer information. The system manages available suppliers and the sourcing process, and creates requests for quotes from suppliers. The DGSS system receives large amounts of Part, Supplier, Customer, and Contract data daily and weekly, so this huge data volume is processed using Hadoop MapReduce and Spark, and the processed data is loaded into the database using Sqoop so that users can generate reports to analyze it. Log files are also processed using Flume and Hadoop MapReduce (a Spark rendering of this log processing is sketched after the responsibilities list below). Delphi uses these data to –

• Analyze the heartbeat of the market.

• Analyze the demand for parts in the market, overall and territory-wise.

• Analyze customer feedback, suggestions, and requirements.

• Analyze the Supplier data.

Roles/Responsibilities :

• Customer meetings and communication.

• Extracted the data from RDBMS (Oracle) into HDFS using Sqoop.

• Created and ran Sqoop jobs with incremental load to populate Hive External tables.

• Provided design recommendations and thought leadership to sponsors/stakeholders that resolved technical problems.

• Developed MapReduce jobs for data cleaning, preprocessing, and processing data.

• Managing and reviewing Hadoop log files.

• Tested raw data and executed performance scripts.

• Production Support and maintenance.
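
The log processing mentioned in the description was done with Flume and Java MapReduce; as a compact illustration of the same idea, here is a hypothetical Spark RDD version in Scala. The log layout (level token first) and the HDFS path are assumptions.

```scala
// Illustrative sketch only: the log aggregation described above was done with Flume
// and Java MapReduce; this is the same idea rendered as a Spark RDD job in Scala.
// The log layout (level token first) and HDFS path are assumptions.
import org.apache.spark.sql.SparkSession

object LogLevelCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("log-level-count-sketch")
      .getOrCreate()

    // Log files landed on HDFS by a collector such as Flume.
    val lines = spark.sparkContext.textFile("hdfs:///landing/dgss/logs/*")

    // Assume each line starts with a level token, e.g. "ERROR 2020-03-10 ...".
    val counts = lines
      .map(_.split("\\s+", 2)(0))
      .map(level => (level, 1L))
      .reduceByKey(_ + _)

    counts.collect().sortBy(-_._2).foreach { case (level, n) => println(s"$level\t$n") }

    spark.stop()
  }
}
```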

Project Name : Aadhaar dataset analysis - Enrollments processed in detail
Technology :

• Apache Spark, and Scala

• Linux, Hadoop, HDFS cluster

Description :

We need to analyze a dataset from Aadhaar - the unique identity issued to all residents of India. The issuing authority, UIDAI, provides a catalog of downloadable datasets collected at the national level. Our dataset of interest is "Enrollments processed in detail". We have to analyze the data across various categories such as state, district, agency, and gender, and find the highest and lowest enrollments using Spark Core and Spark SQL (see the sketch after the responsibilities list).

Roles/Responsibilities :

• Developed Apache Spark applications for cleaning, preprocessing, and processing the Aadhaar dataset.

• Provided application design details to sponsors/stakeholders that resolved technical problems.

• Tested raw data and executed performance improvements.

• Managing and reviewing Hadoop log files.
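
A minimal Scala sketch of the state- and gender-wise enrollment aggregation described above; the column names loosely follow the public UIDAI CSV and, like the HDFS path, are assumptions that may not match the actual dataset.

```scala
// Illustrative sketch only: state- and gender-wise Aadhaar enrollment totals with the
// highest and lowest states. Column names and the HDFS path are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object AadhaarEnrolmentSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aadhaar-enrolment-sketch")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/aadhaar_enrolments.csv")

    val byState = df.groupBy("state")
      .agg(sum("aadhaar_generated").alias("total"))
      .cache()

    byState.orderBy(col("total").desc).show(5)   // states with the highest enrollments
    byState.orderBy(col("total").asc).show(5)    // states with the lowest enrollments

    df.groupBy("state", "gender")
      .agg(sum("aadhaar_generated").alias("total"))
      .orderBy("state", "gender")
      .show(50, truncate = false)

    spark.stop()
  }
}
```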


Project Name : Movie Review Analysis
Technology :
• Apache Spark, and Scala
• Linux, Hadoop, HDFS cluster
Description :

IMDb is an online database of movie-related information. IMDb users rate the movies and provide reviews. They rate the movies on a scale of 1 to 5, 1 being the worst and 5 being the best. The dataset also has additional information, such as the release year of the movie and its runtime. We have to analyze the data and find the following information.

The total number of movies

The maximum rating of movies

The number of movies that have maximum rating

The movies with ratings 1 and 2

The list of years and the number of movies released each year
The number of movies that have a runtime of two hours
(A sketch answering these questions follows the responsibilities list.)

Roles/Responsibilities :

• Developed Apache Spark applications for cleaning, preprocessing, and processing the movie data.

• Provided design recommendations and thought leadership to sponsors/stakeholders that resolved technical problems.

• Tested raw data and executed performance scripts.
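
A hypothetical Scala/Spark sketch answering the questions listed in the description; the input path and column names (rating, year, runtime_minutes) are assumptions, not the project's actual schema.

```scala
// Illustrative sketch only: answers to the questions listed in the description,
// assuming a hypothetical CSV with rating, year, and runtime_minutes columns.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max}

object MovieRatingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("movie-rating-sketch")
      .getOrCreate()

    val movies = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/movies.csv")

    println(s"Total movies: ${movies.count()}")

    val maxRating = movies.agg(max("rating")).first().get(0)
    println(s"Maximum rating: $maxRating")
    println(s"Movies with the maximum rating: ${movies.filter(col("rating") === maxRating).count()}")
    println(s"Movies rated 1 or 2: ${movies.filter(col("rating") <= 2).count()}")

    // Number of movies released each year.
    movies.groupBy("year").count().orderBy("year").show(100)

    println(s"Movies with a two-hour runtime: ${movies.filter(col("runtime_minutes") === 120).count()}")

    spark.stop()
  }
}
```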

Project Name : GES Data Migration - DELPHI Engineering, US
Duration : 2 years
Technology :
Languages: Python, Shell Programming, and Perl scripting.
Operating System: HP-UX.
Database: Oracle 9i, SQL, PL/SQL.

Description :

Working on various data management activities with a customized TcE application called GES. The activities include data loading, extraction, migration, cleansing, repair, and rework to and from the Oracle database.

Role/Responsibilities :

• Wrote UNIX shell scripts to automate data loading from files into the database.

• Led data migration activities for the PLM tool (TcE) from offshore.

• Involved in requirement analysis, estimation, coding using SQL, PL/SQL, Python, Perl, and UNIX shell scripts, database performance tuning, troubleshooting business/technical issues, and deploying high-quality deliverables.

• The data migration projects include:

- Bearing Divestiture – Data Extraction and Reporting
- Diesel Data Migration (AS400 to TcE PLM) – Data Migration
- Logical Separation of Suspension Data from TcE PLM – Data Extraction and Loading
- Europe Wave 2 / Wave 3 Data Migration – Data Loading
- ECAD MIDAS Sherpa Shutdown – Data Loading
- Global Exhaust Divestiture – Data Extraction and Reporting

• Involved in data migration, conversion, extraction, cleansing, and repair of customized TcE PLM data.

• Mentored team members in performance tuning and query optimization.

• Created and modified Python, Perl, and shell scripts for various projects.

• Handled database creation/configuration, capacity planning, SQL query tuning, and backups.

• Conducted trainings, code reviews, and knowledge sharing on new techniques with team members, and prepared documentation.

Project Name : UNIX Shell Script Tool: Generate Shell Script Template (Unix)
Duration : 1 year
Technology : Unix, Linux, Unix Shell scripting.

Description :

Wrote a UNIX shell script tool used to generate standard shell script templates.
Purpose: This script generates a standard, more structured shell script template, which makes shell scripting easier and faster. It reduces shell scripting work by 30% to 40%, is easy to debug, and makes maintenance work easier. The shell script template has been designed around the following modules:

• Start: Checks and makes sure all required information is provided.

• Initialize: Does all initialization, such as creating log directories, log files, etc.

• Process: If the script uses a database, this provides database connection information.

• Terminate: Reports how the job terminated - whether it processed successfully or was terminated due to an external signal, a permission issue, etc.

Features:

• Connection to database.

• Handling log information.

• Command Line Processing.

• Signal handling

The modules of the generated template in more detail:

Start:

This section helps the user understand how to run the script. It has two parts:

Usage: The script checks the command syntax; if the user is not running the script correctly, it displays the script's usage.

Help: Using the help option, the user can see brief information about what the script does and how to use it.

Initialize:

Here the programmer can initialize global variables, create temporary files, directories, etc.

Process:


Here programmers do the actual coding of the script and can add their own functions. If they want to use command line options and arguments in their code, the tool will ask for the relevant information and generate code for it.

• Handling logs - a function writes log information with time stamp into a given log file.

• Connecting to Database – two functions are provided to execute SQL commands; one executes a given SQL command and the other executes a given SQL file.

Termination:

The program can terminate successfully or on receiving a termination signal. If the program terminates on a signal, cleanup is required before exit.

Project Name : C2P – Intrepid SPRINT Account (Finance & Telecom domain, US)
Duration : 2 years
Technology :

• Languages: C, C++, Python, and UNIX Shell Programming

• Operating System: Sun Solaris 8

• Data Base: ORACLE 8i

Description :

Intrepid, known as C2P, is an incentive compensation system that drives the performance and profitability of the Sprint sales force to increase the company's revenue and market share. As incentives for meeting Sprint's business objectives, sales and sales support staff receive monthly, quarterly, and annual incentive compensation for various kinds of revenue generated and products sold. Presently Sprint Business, WSG, GSD, Inside Sales, Sprint Local BWM, Sprint International, and PCS are being compensated. At present, more than 6,000 representatives are compensated, and the application currently supports over 350 compensation plans. This application is used by 40 Compensation Analysts (Compensation Administration), 10 Financial and Systems Analysts (Plan Implementation), 5 Business Systems Analysts (Production Support), and more than 6,000 Sales Representatives. Approximately 18,000 statements and reports are produced monthly for Sales Compensation Analysts and Sales Representatives. The EDS India Solution Centre C2P project team provides production support for the C2P system and develops changes to the existing system based on service requests / change requests from the client.

Roles/Responsibilities :

• Customer communication.

• Requirement analysis, estimation

• Designing, coding, code review, and testing

• Production Support and maintenance.

Project Name : Real Time Asset Management (RAM)
Duration : 2 Years

Technology :

• Languages: C, C++, STL, Python, and Shell Programming, IPCs- Pipes and Signals, and Java.

• Operating System: Linux


Description :

End users use equipment from an Equipment Manufacturer (EM) for production. Typically, such equipment is costly and needs regular maintenance as well as breakdown support from the EM's Service Engineers.

The RAM system is responsible for providing maintenance intelligence and performance information for the equipment at the End User site. The equipment at the End User site can break down for various reasons, such as:

Lack of preventive Maintenance

Lack of monitoring of important operational parameters from the maintenance perspective.

Specific operational problems such as unavailability of input material.

In case of an equipment failure (incidence), the Service Engineer would typically visit the End User site to solve the problem. Such visits to remote sites are costly. Moreover, the traveling time adds to the grievance of both the EM and the End User.

Purpose of the system

The primary purpose of the system is to help the equipment user and the manufacturer in several ways, with a major focus on incidence management and incidence avoidance. This is achieved by:

Monitoring Equipment status and therefore health remotely.

Remote diagnosis of Equipment faults.

Remote analysis of past (historical) data leading to early warnings and prognostic alarms.

Making high-level performance comparisons to understand deviations from design performance, and using the information to derive early indications useful for preventive (predictive) maintenance.

Important features:

1. The current status of the equipment is available remotely. This includes current parameter values and alarms.

2. The historical data associated with the equipment is stored and made available centrally.

3. Historical data is available for analysis. Such analysis can be carried out using:

Reports generated out of the system.

Raw data available in the system.

Qualitative trend analysis.

Role/Responsibilities :

• Looked into the complete functionality of the software.

• Software planning and design.

• Coding, code review, and testing.

Project Name : Automated Call Center for Medical Transcription
Duration : 1.5 years

Technology :

• Languages : C, C++, and Shell Scripting. Linux Internals, Linux IPCs

• Operating System : LINUX

• Data Base: Postgres


Description :

Here the main goal was to have fully automated data transfer from the doctor to the transcriptionist and back. The doctor calls into the system and records his voice. This voice file is then sent from the US site to the site in India, which serves as the gateway for transcription in India. The file is then transferred to the transcriptionist, where the transcription takes place. The file that has been transcribed and quality assured reaches the India gateway site. It is then sent back to the US, from which it is dispatched to the customer via e-mail, fax, etc. The whole process is fully automated; no human intervention is required, and all information about the file is stored in the database at each hop it takes.

Role/Responsibilities :

• Looked into the complete functionality of the software.

• Involved in the whole life cycle of the software, viz.:

• Software planning and development.

• Coding, code review, and testing.

Project Name : Telephone Billing System
Duration : 1 Year
Technology :
Languages: 'C' Language, RS-232_C, Plugs and Sockets
Description :

The data is captured through a computer, which is interfaced using the RS-232_C Serial Interface to the Telephone exchange and stored in a file. Then using this file, the calls are separated and stored in their respective files. From these files the data is exported to the database for report processing, including the cost of the calls.

Using this software, daily reports and the cost of the calls made can be generated.

Role/Responsibilities :

• Looked into the complete functionality of the software.

• Software planning and design.

• Coding, code review, and testing.



