PRASHANT GAIGAVALE
• San Ramon, CA • 510-***-**** • ********.*********@*****.***
• LinkedIn • GitHub • Visa: Green Card
Sr. Data Engineer
AWS Cloud Practitioner Certified • AWS ML Specialty Certified • Databricks ML Associate Certified
Highly skilled Senior Data Engineer with over 7 years of data engineering experience. Demonstrated success in developing and architecting data pipelines.
Technical Strengths:
Python, SQL, PySpark, PL/SQL, Spark
Databricks, Oracle, PostgreSQL
AWS, Azure
Git, Jenkins, Agile
Data Modelling, dbt
Airflow
Glue, Redshift, DynamoDB
TensorFlow, SageMaker
Azure Data Factory
Soft Skills:
Positive Communication, Detail-oriented, Analytical Thinking
Work Experience
Gap - Consultant
Sr. Data Engineer - Consultant 03/2022 – 09/30/2024
Created a massively parallel processing pipeline using Python, PySpark, and SQL on Databricks Spark running on Azure cloud to migrate 138 TB of Adobe Clickstream data
During the clickstream migration, increased data scalability by 800% and improved migration performance by 70% through Spark optimization and shuffle size tuning
Implemented a complex ETL pipeline to generate weekly demand metrics using a 4-stage process
Improved data quality of facts and dimensions by 40% when migrating from the legacy source system to Databricks Delta Lake as the source, using SQL and in-depth analysis of data requirements
Achieved 70% storage cost savings across the database by vacuuming most of the historical data
Completed a POC building ETL pipelines with Azure Data Factory and Azure Synapse
Followed CI/CD processes using Agile, Databricks, Jenkins, and GitHub
Collaborated with the Data Science and Machine Learning Engineering teams on data requirements
Self-Entrepreneurship 06/2020 – 02/2022
Used this time to learn ML and cloud technologies and worked on a stealth project to create an ML-based technology product
Williams-Sonoma
Database Consultant 09/2019 – 06/2020
Improved Oracle SQL query performance by 90% by reducing audit table sizes and creating partitions
Boosted overall Oracle database performance by 70% by scheduling regular statistics-gathering jobs, tuning SQL queries, dropping unused indexes, and creating new efficient indexes
Upgraded Oracle production databases from 8i and 9i to 12c within one month, allowing the company to retain Oracle support
Bank of the West
Sr. Data Engineer 09/2015 – 07/2018
Created a data pipeline using Python, Perl, SQL, and PL/SQL to automate a highly complex manual refresh of test databases from production data, saving DBA manual work hours by 4000%
Performed data cleaning using SQL for fraud detection models, improving model performance by 20%
Created data models and ETL pipelines to load data for the Comprehensive Capital Analysis and Review (CCAR) project, scaling the Oracle data warehouse from 30 TB to 200 TB within 18 months
Improved CCAR data quality by 30%, helping produce highly accurate and reliable reports for the Federal Reserve and saving the bank from millions of dollars in penalties
Completed the following AWS Cloud POCs:
- Migrated an on-premises Oracle database to Oracle on AWS RDS using AWS DMS.
- Created an ETL process using Glue and Lambda to process files landing in S3 and load them into AWS Redshift.
- Processed files on an EMR cluster and loaded them into a Redshift database.
American Express
Data Engineer 01/2014 – 07/2015
Worked closely with the development and DBA teams to identify data warehouse requirements and to design and build data models
Created an ETL pipeline to extract, transform, and load transaction data into the data warehouse; improved its performance by 95% using proper indexes, partitions, and query tuning
Set up AWS Redshift to build a cloud data warehouse from the on-premises Oracle data warehouse, meeting the data requirements of the data analyst team
VISA
Software Developer 07/2013 – 01/2014
Achieved 1500% savings in manual effort by automating data transfer from a 20 TB production database to the newly migrated, to-be-production databases. This build was a product in itself and played a key role in meeting the go-live date of the migration
Prior Experience: NetApp, Wells Fargo, AT&T, Blue Dart, MV Software Consulting
Prior Roles:
• Oracle DBA (8i-19c) • Developer (C++, PL/SQL, SQL) • Team Lead
Machine Learning Projects:
Landmark Detection and Robot Position Tracking (SLAM, Python) May’21
Achieved landmark detection RMSE of 0.2456 and robot position RMSE of 0.6867.
Lane Detection (Computer Vision, Python, OpenCV) Mar’21 – Apr’21
Developed a lane detection pipeline using edge detection and image transformation techniques to accurately identify lanes for a self-driving car.
Image Captioning (CNN, LSTM, Python, PyTorch) Jan’21 – Feb’21
Achieved a perplexity of 6.829 using a pretrained ResNet-50 as the CNN encoder and an LSTM as the decoder. Training and testing were done on the COCO 2014 dataset.
Education:
Mumbai University: Bachelor of Engineering
Certifications:
Stanford University: Machine Learning, CS229i, Foundation of Data Science, Natural Language Processing
Cornell University: Machine Learning
Udacity: Computer Vision Nanodegree
Coursera: Deep Learning Specialization, TensorFlow Specialization