Shivank – Data Engineer
EXPERIENCE SUMMARY
Technical Stack:
Work Highlights:
WORK EXPERIENCE (9 YEARS)
American Express (Engineer I) – Aug 2019 – Present
Ernst & Young (Supervising Associate) – Feb 2019 – Aug 2019
SanDisk India Device Design Centre (WD) (Big Data Lead Engineer) – Mar 2017 – Jan 2019
Accenture (Sr. Software Engineer) – Nov 2014 – Mar 2017
HP Autonomy (Software Engineer) – Nov 2011 – Oct 2014
EDUCATIONAL QUALIFICATION
B.Tech (CS), Mangalyaan University (Aligarh), UP
Languages
• Java, SQL, HTML, Scala, Python
Hadoop Technologies
• Spark, Impala, MapReduce, Hive, Pig, Sqoop
Hadoop Flavors
• Cloudera (CDH 5.12)
Frameworks
• REST, SOAP
Databases
• MySQL, Oracle DB, Presto DB
IDE Tools
• Eclipse, SQL Developer, SoapUI, Scala IDE
A technically adept and confident software engineer with 8.9 years of experience in the design, development, analysis, and support of Big Data applications across multiple platforms for multiple global clients.
Exposure to the Hadoop ecosystem and its technologies: Spark, MapReduce, Hive, Sqoop, Pig, Cassandra, Talend, etc.
Worked on data pipeline automation using Airflow.
Domain knowledge in the healthcare, resources & utilities, manufacturing, and digital sectors.
Well versed in the Cloudera and Hortonworks Hadoop distributions.
2.6+ years of experience with AWS and Azure cloud services.
Worked with banking (Citibank and BofA), healthcare (Boston Scientific), and manufacturing (Western Digital) clients.
PROJECT EXPERIENCE
American Express:
Offer and Merchant Personalization:
Implemented an end-to-end Spark pipeline for extracting offer and merchant data from multiple source systems.
Wrote Hive UDFs for encrypting and decrypting data stored in Hive.
Scored offer information using the CatBoost machine-learning algorithm.
Automated the batch job process for weekly and monthly jobs.
Built a real-time processing API for ingesting data into HBase and Elasticsearch.
EY LENS
Lens is a consumer-grade mobile app that lets all EY users look into their Outlook data (email, calendar, and contacts) in new and insightful ways. Lens is designed to aid users' personal productivity with actionable and contextual insights, available when most needed.
Key Contributions:
Built the end-to-end real-time streaming data ingestion platform using Apache Kafka, Spark Streaming, and Apache NiFi.
Involved in the coding and design of Spark Streaming using Java.
Analyzed data from Neo4j and generated actionable insights (Cards) based on business requirements.
Implemented multiple flows in Apache NiFi using processors such as HandleHttp, ConsumeKafka, PublishKafka, and PutMongo.
Managed the Dev environment (infrastructure planning and implementation): created the HDP cluster and VMs, and installed the Neo4j cluster and MongoDB.
Involved in performance testing of NiFi; improved performance by reducing lineage time.
Worked with Spark SQL and DataFrames to generate insights by reading data from MongoDB/Neo4j and storing it in Parquet.
Involved in the architecture and design of the project.
Tuned Spark parameters for streaming and batch jobs.
Implemented a Spark Streaming job to save records in Neo4j.
Handled deployment and documentation for the Dev, QA, UAT, and Production environments.
Implemented Azure Load Balancers and Azure Key Vault in the Dev and QA environments.
Created version 2 of the scoring algorithm; improved performance with an effective caching mechanism.
Created Oozie workflows and coordinators in the Dev, QA, UAT, and Prod environments.
Implemented an AD authentication mechanism using NiFi.
Performed various levels of unit testing and system testing.
Estimated, planned, and created stories for sprints (Agile); presented weekly demos to the client.
Fixed and resolved bugs.
Company: Western Digital
Project: Defectnet
Description: Implemented the data pipeline for drives during the manufacturing process and performed data analysis on drive failures.
Project: VORT and ORT
Responsibilities:
Built various solutions using Hive and Spark to score drives during the various test phases.
Built a streaming layer using Kinesis to predict live scores for drives in the VORT and ORT test phases.
Monitored various cron-based pipelines running on EC2 servers.
Project: GrabnGo
Responsibilities: Built an API using the Flask module in Python.
Company: Accenture
Project: Real-Time Graph Analytics
Client: Product Development
Description: The project loads structured data (CSV files) from SFTP/S3 into a graph database (Neo4j/Titan) in real time.
Responsibilities:
Implemented Python producer code to read data from SFTP and put it into Kinesis.
Developed Spark code to consume data from Kinesis in real time.
Designed Scala logic to transform the real-time data into graph insert queries.
Developed connector code for Titan/Neo4j.
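The transformation from a structured record to a graph insert query can be sketched as below. This is a minimal illustrative example (the original logic was written in Scala for Neo4j/Titan; the field names "id", "name", and "knows" are hypothetical):

```python
# Illustrative sketch: turning one structured record into a Cypher insert
# statement, as in the SFTP/S3 -> graph database pipeline described above.
# Field names are hypothetical, not the actual project schema.

def row_to_cypher(row):
    """Build a Cypher MERGE statement for a node and its relationships."""
    props = ", ".join(f"{k}: '{v}'" for k, v in row.items() if k != "knows")
    stmt = f"MERGE (p:Person {{{props}}})"
    # One MERGE pair per related node; unique variable names per clause.
    for i, friend_id in enumerate(row.get("knows", "").split(";")):
        if friend_id:
            stmt += (f"\nMERGE (f{i}:Person {{id: '{friend_id}'}})"
                     f"\nMERGE (p)-[:KNOWS]->(f{i})")
    return stmt

query = row_to_cypher({"id": "42", "name": "Ada", "knows": "7;9"})
```

In the real pipeline, a statement like this would be executed per consumed Kinesis record against the graph database driver.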
Project: Real-Time Analytics Framework
Description: Analyzing water sensor data to predict leakage. (Cassandra, Apache Spark, Java)
Responsibilities:
Designed and implemented the Kinesis Stream, Kafka Stream, and Flume connectors.
Developed the parser block, which extracts the structure of each data entry tossed from the connector and relays it to the writer.
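The connector -> parser -> writer flow can be sketched as follows. This is a hedged illustration (the original framework was in Java, and the record layout sensor_id/timestamp/flow_rate is an assumed example, not the actual sensor schema):

```python
# Minimal sketch of the parser block described above: extract the structure
# of one raw entry from the connector and relay it to the writer.

def parse_reading(raw_line):
    """Parse one delimited sensor entry into a structured record."""
    sensor_id, ts, flow = raw_line.strip().split(",")
    return {"sensor_id": sensor_id, "timestamp": int(ts), "flow_rate": float(flow)}

def writer(record, sink):
    """Stand-in for the writer stage (e.g. a Cassandra insert)."""
    sink.append(record)

sink = []
for line in ["s1,1700000000,3.2", "s2,1700000001,0.0"]:
    writer(parse_reading(line), sink)
```

Keeping parsing separate from writing, as here, is what lets the same parser sit behind interchangeable Kinesis, Kafka, or Flume connectors.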
Company: Accenture
Project: EDW Offload
Client: Product Development
Description: The project offloads data from Teradata DB to HDFS using Hive, in order to handle large volumes of data for analytical purposes.
Responsibilities:
o Big Data/Hadoop implementation: responsible for designing and implementing Hive queries, indexes, and views in HDFS.
o Designed and developed Hadoop equivalents of Teradata facilities such as BTEQ, MLoad, and FLoad.
o Responsible for developing custom user-defined functions (UDFs).
o Sqooped data from Teradata DB.
o Created a generic utility in Java to convert Teradata DDL scripts and statistics into automated Hive DDL scripts.
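The core of such a DDL conversion utility is a type-by-type rewrite of each column. A minimal sketch is below; the original tool was written in Java, and this Python version with its small type map is illustrative only, not the actual utility:

```python
# Hedged sketch of a Teradata-to-Hive DDL converter like the one described
# above. The type map covers only a few common cases for illustration.

TYPE_MAP = {"VARCHAR": "STRING", "BYTEINT": "TINYINT",
            "INTEGER": "INT", "DECIMAL": "DECIMAL", "DATE": "DATE"}

def convert_column(col_def):
    """Map one 'name TYPE(args)' Teradata column to Hive syntax."""
    name, td_type = col_def.strip().split(None, 1)
    base = td_type.split("(")[0].upper()
    hive_type = TYPE_MAP.get(base, "STRING")
    if "(" in td_type and base == "DECIMAL":
        hive_type += "(" + td_type.split("(", 1)[1]  # keep precision/scale
    return f"{name} {hive_type}"

def convert_ddl(table, columns):
    cols = ",\n  ".join(convert_column(c) for c in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS ORC;"

ddl = convert_ddl("sales", ["id INTEGER", "amount DECIMAL(10,2)", "note VARCHAR(100)"])
```

A real converter would also parse the full Teradata CREATE TABLE grammar and carry over collected statistics; this sketch shows only the column-type mapping step.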
Company: Accenture
Project: Intelligent Patient System
Client: Healthcare clients in the EU and US regions
Description: Aimed at reducing readmission of heart patients.
Responsibilities:
Used Mirth for data ingestion of healthcare data (HL7 v2) and to convert it from v2 to v3 for further analytics.
Used MGRID to convert HL7 v3 data to a database format stored in Postgres.
Worked on data migration from Postgres to Cassandra using Talend.
Company: HP Autonomy
Project: Supervision (S6)
Client: Citibank, BofA, and Deutsche Bank
Role: Hadoop Developer
Project Description:
Digital Supervisor enables corporate compliance officers to review, categorize, and, if necessary, act upon the content of digital communications, such as email messages, email attachments, and instant messages, according to the requirements of various regulatory agencies. Digital Supervisor provides a web interface through which compliance officers can review copies of messages sent or received by individuals according to assigned employee supervisor group(s).
Roles and Responsibilities:
Designed and developed solutions using Hadoop to tackle big data, information retrieval, and analytics problems.
Created and updated Hive schemas to support a highly available data warehouse for BI teams.
Understood customer analytic needs and translated them into pluggable Hadoop/Hive solutions.
Automated several jobs to pull data from an FTP server and load the data into Hive tables.