Shivank – Data Engineer
EXPERIENCE SUMMARY
Technical Stack:
Work Highlights:
WORK EXPERIENCE (9 YEARS)
American Express (Engineer I) – Aug 2019 – Present
Ernst & Young (Supervising Associate) – Feb 2019 – Aug 2019
SanDisk India Device Design Centre (WD) (Big Data Lead Engineer) – Mar 2017 – Jan 2019
Accenture (Sr. Software Engineer) – Nov 2014 – Mar 2017
HP Autonomy (Software Engineer) – Nov 2011 – Oct 2014
EDUCATIONAL QUALIFICATION
B.Tech (CS), Mangalyaan University (Aligarh), UP
Languages
• Java, SQL, HTML, Scala, Python
Hadoop Technologies
• Spark, Impala, MapReduce, Hive, Pig, Sqoop
Hadoop Flavors
• Cloudera (CDH 5.12)
Frameworks
• REST, SOAP
Databases
• MySQL, Oracle DB, Presto DB
IDE Tools
• Eclipse, SQL Developer, SoapUI, Scala IDE
A technically adept and confident software engineer with 8.9 years of experience in the design, development, analysis, and support of Big Data applications across multiple platforms for multiple global clients.
Exposure to the Hadoop ecosystem and its technologies: Spark, MapReduce, Hive, Sqoop, Pig, Cassandra, Talend, etc.
Worked on data pipeline automation using Airflow.
Domain knowledge in the healthcare, resources & utilities, manufacturing, and digital sectors.
Well versed in the Cloudera and Hortonworks Hadoop distributions.
2.6+ years of experience with AWS and Azure cloud services.
Worked with banking (Citibank and BofA), healthcare (Boston Scientific), and manufacturing (Western Digital) clients.
PROJECT EXPERIENCE
American Express:
Offer and Merchant Personalization:
Implemented an end-to-end Spark pipeline for extracting offer and merchant data from multiple source systems.
Wrote Hive UDFs for encrypting and decrypting data stored in Hive.
Scored offer information using the CatBoost machine-learning algorithm.
Automated the batch job process for weekly and monthly jobs.
Built a real-time processing API for ingesting data into HBase and Elasticsearch.
EY LENS
Lens is a consumer-grade mobile app that lets all EY users look into their Outlook data (email, calendar, and contacts) in new and insightful ways. Lens is designed to aid users' personal productivity with actionable and contextual insights, available when most needed.
Key Contributions:
Built the end-to-end real-time streaming data ingestion platform using Apache Kafka, Spark Streaming, and Apache NiFi.
Involved in the coding and design of Spark Streaming using Java.
Analyzed data from Neo4j and generated actionable insights (Cards) based on business requirements.
Implemented multiple flows in Apache NiFi using processors such as HandleHttp, ConsumeKafka, PublishKafka, and PutMongo.
Managed the Dev environment (infrastructure planning and implementation): created the HDP cluster and VMs, and installed the Neo4j cluster and MongoDB.
Involved in performance testing of NiFi; improved performance by reducing lineage time.
Worked with Spark SQL and DataFrames to generate insights by reading data from MongoDB/Neo4j and storing it in Parquet.
Involved in the architecture and design of the project.
Tuned Spark parameters for streaming and batch jobs.
Implemented a Spark Streaming job to save records in Neo4j.
Handled deployment and documentation for the Dev, QA, UAT, and Production environments.
Implemented Azure Load Balancers and Azure Key Vault in the Dev and QA environments.
Created version 2 of the scoring algorithm; improved performance with an effective caching mechanism.
Created Oozie workflows and coordinators in the Dev, QA, UAT, and Prod environments.
Implemented an AD authentication mechanism using NiFi.
Performed various levels of unit testing and system testing.
Estimated, planned, and created stories for sprints (Agile); presented weekly demos to the client.
Fixed and resolved bugs.
Company: Western Digital
Project: Defectnet
Description: Implemented the data pipeline for drives during the manufacturing process and performed data analysis on drive failures.
Project: VORT and ORT
Responsibilities:
Built various solutions using Hive and Spark to score drives during the various test phases.
Built a streaming layer using Kinesis to predict live scores for drives in the VORT and ORT test phases.
Monitored various cron-based pipelines running on EC2 servers.
Project: GrabnGo
Responsibilities: Built an API using the Flask module in Python.
Company: Accenture
Project: Real-Time Graph Analytics
Client: Product Development
Description: The project loads structured data (CSV files) from SFTP/S3 into a graph database (Neo4j/Titan) in real time.
Responsibilities:
Implemented Python producer code to read data from SFTP and put it into Kinesis.
Developed Spark code to consume data from Kinesis in real time.
Designed Scala logic to transform the real-time data into graph insert queries.
Developed connector code for Titan/Neo4j.
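The transformation from a structured record to a graph insert query can be sketched as below. This is a minimal illustrative example (the original logic was written in Scala for Neo4j/Titan; the field names "id", "name", and "knows" are hypothetical):

```python
# Illustrative sketch: turning one structured record into a Cypher insert
# statement, as in the SFTP/S3 -> graph database pipeline described above.
# Field names are hypothetical, not the actual project schema.

def row_to_cypher(row):
    """Build a Cypher MERGE statement for a node and its relationships."""
    props = ", ".join(f"{k}: '{v}'" for k, v in row.items() if k != "knows")
    stmt = f"MERGE (p:Person {{{props}}})"
    # One MERGE pair per related node; unique variable names per clause.
    for i, friend_id in enumerate(row.get("knows", "").split(";")):
        if friend_id:
            stmt += (f"\nMERGE (f{i}:Person {{id: '{friend_id}'}})"
                     f"\nMERGE (p)-[:KNOWS]->(f{i})")
    return stmt

query = row_to_cypher({"id": "42", "name": "Ada", "knows": "7;9"})
```

In the real pipeline, a statement like this would be executed per consumed Kinesis record against the graph database driver.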
Project: Real-Time Analytics Framework
Description: Analyzing water sensor data to predict leakage. (Cassandra, Apache Spark, Java)
Responsibilities:
Designed and implemented the Kinesis Stream, Kafka Stream, and Flume connectors.
Developed the parser block, which extracts the structure of each data entry tossed from the connector and relays it to the writer.
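The connector -> parser -> writer flow can be sketched as follows. This is a hedged illustration (the original framework was in Java, and the record layout sensor_id/timestamp/flow_rate is an assumed example, not the actual sensor schema):

```python
# Minimal sketch of the parser block described above: extract the structure
# of one raw entry from the connector and relay it to the writer.

def parse_reading(raw_line):
    """Parse one delimited sensor entry into a structured record."""
    sensor_id, ts, flow = raw_line.strip().split(",")
    return {"sensor_id": sensor_id, "timestamp": int(ts), "flow_rate": float(flow)}

def writer(record, sink):
    """Stand-in for the writer stage (e.g. a Cassandra insert)."""
    sink.append(record)

sink = []
for line in ["s1,1700000000,3.2", "s2,1700000001,0.0"]:
    writer(parse_reading(line), sink)
```

Keeping parsing separate from writing, as here, is what lets the same parser sit behind interchangeable Kinesis, Kafka, or Flume connectors.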
Company: Accenture
Project: EDW Offload
Client: Product Development
Description: The project offloads data from Teradata DB to HDFS using Hive, in order to handle large volumes of data for analytical purposes.
Responsibilities:
o Big Data/Hadoop implementation: responsible for designing and implementing Hive queries, indexes, and views in HDFS.
o Designed and developed Hadoop equivalents of Teradata facilities such as BTEQ, MLoad, and FLoad.
o Responsible for developing custom user-defined functions (UDFs).
o Sqooped data from Teradata DB.
o Created a generic utility in Java to convert Teradata DDL scripts and statistics into automated Hive DDL scripts.
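The core of such a DDL conversion utility is a type-by-type rewrite of each column. A minimal sketch is below; the original tool was written in Java, and this Python version with its small type map is illustrative only, not the actual utility:

```python
# Hedged sketch of a Teradata-to-Hive DDL converter like the one described
# above. The type map covers only a few common cases for illustration.

TYPE_MAP = {"VARCHAR": "STRING", "BYTEINT": "TINYINT",
            "INTEGER": "INT", "DECIMAL": "DECIMAL", "DATE": "DATE"}

def convert_column(col_def):
    """Map one 'name TYPE(args)' Teradata column to Hive syntax."""
    name, td_type = col_def.strip().split(None, 1)
    base = td_type.split("(")[0].upper()
    hive_type = TYPE_MAP.get(base, "STRING")
    if "(" in td_type and base == "DECIMAL":
        hive_type += "(" + td_type.split("(", 1)[1]  # keep precision/scale
    return f"{name} {hive_type}"

def convert_ddl(table, columns):
    cols = ",\n  ".join(convert_column(c) for c in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS ORC;"

ddl = convert_ddl("sales", ["id INTEGER", "amount DECIMAL(10,2)", "note VARCHAR(100)"])
```

A real converter would also parse the full Teradata CREATE TABLE grammar and carry over collected statistics; this sketch shows only the column-type mapping step.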
Company: Accenture
Project: Intelligent Patient System
Client: Healthcare clients in the EU and US regions
Description: Aimed at reducing readmission of heart patients.
Responsibilities:
Used Mirth for data ingestion of healthcare data (HL7 v2) and to convert it from v2 to v3 for further analytics.
Used MGRID to convert HL7 v3 data to a database format stored in Postgres.
Worked on data migration from Postgres to Cassandra using Talend.
Company: HP Autonomy
Project: Supervision (S6)
Client: Citibank, BofA, and Deutsche Bank
Role: Hadoop Developer
Project Description:
Digital Supervisor enables corporate compliance officers to review, categorize, and, if necessary, act upon the content of digital communications, such as email messages, email attachments, and instant messages, according to the requirements of various regulatory agencies. Digital Supervisor provides a web interface through which compliance officers can review copies of messages sent or received by individuals according to assigned employee supervisor group(s).
Roles and Responsibilities:
Designed and developed solutions using Hadoop to tackle big data, information retrieval, and analytics problems.
Created and updated Hive schemas to support a highly available data warehouse for BI teams.
Understood customer analytic needs and translated them into pluggable Hadoop/Hive solutions.
Automated several jobs to pull data from an FTP server and load the data into Hive tables.