Data Engineer

Location:
Hoboken, NJ
Posted:
April 13, 2025

Resume:

SHIVANI SETHI

***************@*****.*** +1-201-***-**** LinkedIn Github Kaggle New York City

EDUCATION

STEVENS INSTITUTE OF TECHNOLOGY 2024 - Present

Master of Science in Information Systems

JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY 2017 - 2021

Bachelor of Technology with Honors, Electronics and Communication Engineering, GPA 8.0/10

PROFESSIONAL EXPERIENCE

Testbook.com (one of India’s leading digital platforms for competitive exam preparation) May 2022 - Apr 2023

Data Engineer, Tech Team

● Led the development of Google Pub/Sub workers, reducing latency in system banner link generation by 30% and streamlining PDF file creation. Scaled the workflow by moving it from its initial Cloud Run deployment to Google Pub/Sub.

● Designed and implemented a data lake with gold and silver storage tiers in a GCP bucket, storing data as Parquet files. Completed an Airbyte POC and analyzed whether it could be used to build the data lake for the team's use case.

● Scheduled YouTube livestream events across multiple channels, integrating YouTube APIs into the Class Booking System (CBS). Developed daily CBS event bookings with integration for each YouTube channel, and built an RTMP key generation system.

● Enhanced system security by implementing daily OTP generation for cross-platform mobile teams, supporting over 10 engineers and improving access control.

● Increased user engagement by 37% through algorithms that compute user performance metrics across chapters, providing users with detailed monthly analysis reports based on their activity on the system (using an LSTM model).

● Developed automated ETL pipelines using Apache Airflow for user communication via emails, IVR, WhatsApp, and SMS, reducing manual effort by 50% and significantly boosting customer engagement.

Cognizant Aug 2021 - Mar 2022

Programmer Analyst Trainee, Manual and Automation Testing

● Enhanced website functionality through data retrieval and analysis, ensuring timely project delivery and improved user experience. Automated testing workflows using Selenium with Java, optimizing testing efficiency and accuracy.

Cognizant Feb - Aug 2021

Intern, Automation Testing, Remote

● Conducted comprehensive evaluations of six websites, achieving high-quality assessment scores as part of an eighth-semester university project.

● Gained expertise in Selenium with Java via professional training.

KEY TECHNICAL SKILLS

● Programming Languages: Python, C++, Java, Arduino, C

● Data Structures and Algorithms: OOP, Arrays, Strings, Pointers, STL, Recursion, Dynamic Programming, Graphs

● Data Science and ML Algorithms: LSTM, Markov Chains, Word Embeddings, Reinforcement Learning, PySpark, Linear Regression, Natural Language Processing, Naive Bayes, K-Means, Decision Trees, Neural Networks, CNN, Transfer Learning

● Databases & Cloud Services: MySQL, PostgreSQL, MongoDB, Google Cloud Platform, Cloud Run, Cloud Functions

● Data Pipelines & Warehouse: ETL Pipelines, Apache Airflow, Airbyte

● Web Development, Tools and Automation: Flask, Django, REST APIs, Postman, Jenkins, Git, Selenium

ACADEMIC PROJECTS

AI Image Caption Bot Jan - Feb 2020

● Developed an AI model to generate accurate captions for images using the Flickr 8K dataset, achieving 88% precision.

● Technology/Tools: NLP/NLTK, GloVe Embeddings, Transfer Learning, LSTM, Keras Functional API

Music Generation Project Nov - Dec 2019

● Built a Recurrent Neural Network to generate original music compositions from MIDI data, improving note prediction accuracy by 20%.

● Technology/Tools: RNN, Markov Chains, Music21

Movie Recommendation Engine Aug - Dec 2020

● Built an engine that recommends movies based on users' previously selected preferences, using three filtering techniques: demographic, content-based, and collaborative filtering. Results are displayed on an HTML web page (web scraping). Implemented embedding layers for personalized movie recommendations.

● Datasets: Kaggle TMDB and MovieLens.

● Technology/Tools: Python, Flask

CERTIFICATIONS

● Machine Learning and Artificial Intelligence Specialization (IIT-Delhi) Dec 2019: advanced coursework in supervised learning, unsupervised learning, and deep learning algorithms.

● Mastering Data Structures and Algorithms (IIT-Delhi) Jan 2021