Data Engineer

Location:

Seattle, WA

Posted:

January 10, 2024

Resume:

Prashant Shishodia

DATA ENGINEER

Personal Details

Seattle, 98104

United States

503-***-****

**************@*****.***

Social Links

Github

Technical Skills

Amazon Web Services

Data Pipeline

Data Partitioning

Data Streaming

Data Warehousing

Big Data

Extract Transform Load (ETL)

Cloud Infrastructure

Data Quality

Data Processing

Microsoft Azure

Google Cloud Platform

Data Discovery

Data Security

Data Transformation

Serverless Processing

Data Movement

Profile

Prolific Data Engineer with 7 years of experience in design, development and deployment of data architectures in an efficient and timely manner. Well-versed in application of software frameworks and state-of-the-art technologies. Key strengths include being self-driven, achievement-oriented and a bar raiser.

Work Experience

Data Engineer, Amazon.com, Seattle

SEPTEMBER 2021 • PRESENT

Project 1: 50% reduction in data processing across ETL data pipeline to finally improve refresh time of Business Intelligence (BI) dashboards

Situation: High batch processing time was observed while ingesting 12 datasets from Salesforce API; the data pipeline consisted of 5 extract jobs to be run daily in sequence for 5- hours, causing superfluous usage of compute resources. Moreover, job failures caused data delays and posed a risk of breaching the Service-Level Agreement (SLA) agreed with the BI team.

Task: Re-architect the data pipeline using modern ETL techniques to fetch data efficiently and deliver results within the committed SLA.

Action: Developed Python scripts utilizing AWS AppFlow APIs to extract Salesforce datasets and unload them to the BI team's S3 bucket; established a data pipeline featuring ETL jobs run in parallel, including user authentication, data quality checks and error handling; followed software development best practices using Code Review and the Moduled Chain Management process.

Result: Time taken in data processing and data delivery was reduced by approximately 50%, which in turn lessened refresh time of BI dashboards.

Project 2: Built an infrastructure stack using AWS Cloud Development Kit to process event notification messages and stream output to data warehouse in real-time

Situation: Career Growth Action (CGA) data of employees needed to be sourced from an API service and made available to data scientists and BI engineers.

Task: Create an ETL data pipeline to ingest data from the CGA API service, process it and store it in the team's data warehouse in the required format.

Action: Developed an infrastructure stack in Python to receive notification messages from the API in queues and process JSON objects via Lambda function; flattened columnar data was transformed via joins with various product-end tables; output was streamed and loaded into Redshift tables in near real-time.

Result: Data scientists and BI engineers were able to access CGA data in the desired format for the purpose of creating ML models and business reporting.

Project 3: Implemented data quality checks based on customers' feedback and expedited product launch by 1 month

Situation: Prior to product launch, a customer survey reported data quality issues at 3 granularities - site, job-title and job-level headcount metrics.

Task: Perform root-cause analysis of business logic in code and introduce appropriate quality checks across all 3 grains in the data pipeline.

Action: Data quality checks were put in place on row-count and site-headcount threshold, non-zero, non-null criteria, valid cost-center and headcount values.

Result: Customer feedback confirmed accuracy in data metrics, which resulted in product go-live 1 month earlier than proposed.

Data Storage

Data Visualization

Operational & risk auditing

Orchestration Tools

Infrastructure-as-a-code

Development Tools

SQL

Python

IntelliJ

PyCharm

DataGrip

AWS CodeArtifact

CI/CD Stack

AWS CodeCommit

AWS CodeBuild

Apache Spark

AWS CodePipeline

Apache Kafka

AWS CodeDeploy

Apache Flink

Draw.io

Version Control

DevOps

Apache Airflow

Amazon Redshift

AWS Glue

Amazon Kinesis

Cloud Engineer, Amazon Web Services, Portland

JULY 2019 • AUGUST 2021

Project 4: Cost-savings of approx. USD 150,000 achieved quarterly by architecting a data lake for efficient storage, data cataloging, retrieval, and user access control

Situation: Inefficient data storage and sub-standard data preparation for ETL workloads were resulting in high query cost and execution time.

Task: Build a data lake with well-defined folder structure, data format, size range and user access based on product-level granularity.

Action: Prepared end-to-end design and implemented data lake in 3 logical data zones named landing, staging and curated; crawlers were set up for automated data cataloging; developed AWS Glue ETL jobs to transform data and collect S3 objects in staging/curated zone in respective product-level path structure; fine-grained user access control was put in place to securely retrieve data in S3.

Result: Bucketing, data partitioning and effective storage of data immensely reduced time taken in ETL process via Redshift Spectrum and Amazon Athena.

Project 5: Merged cross-team data warehouses within tight project deadlines

Situation: BI team's data warehouse cluster was mismanaged, causing huge query run-time and unplanned cluster maintenance. Leadership decided to use the Data Engineering team's expertise to serve BI needs.

Task: Merge BI data warehouse resources with the DE team's cluster.

Action: Identified and performed all actions needed for migration in co-ordination with stakeholders, i.e., migrating data marts, cluster capacity, Workload Management configuration, etc.; scaled up cluster nodes post-migration as per load-test results; BI team's cluster was deprecated.

Result: Migration was performed successfully within timelines, earning trust of our customers and enhanced inter-team business opportunities.

Business Intelligence Engineer, L&T, Electrical and Automation, Mumbai

JULY 2014 • MAY 2017

Project 6: Increased Overall Equipment Efficiency (OEE) and reduced Mean Time to Repair (MTR) of 50 CNC machines across 3 manufacturing plants

Situation: Spikes noticed in downtime of manufacturing machines and tool breakdown across 3 production plants.

Task: Devise an automated solution for calculation and monitoring of OEE and MTR of production machines and further ensure preventive maintenance.

Action: Designed analytical dashboards to calculate the OEE of 12 CNC machines most active on the production floor; managed end-to-end implementation of Plant Maintenance mobile application.

Result: Weekly reports of OEE calculation sent to all 3 Plant Heads & engineers; on average, OEE increased by 15% and MTR reduced by 40% in 2 years.

Education

Master of Science - Management in Information Systems

JUNE 2017 • MAY 2019

Mays Business School, Texas A&M University, College Station

Bachelor of Technology - Mechanical Engineering

JULY 2010 • MAY 2014

Motilal Nehru National Institute of Technology, Allahabad
