
Data Engineer, ETL, Python, SQL, AWS, Spark, Databricks

Location:
Newark, NJ
Posted:
March 02, 2025


Resume:

VINAY RAM GAZULA

908-***-****

**************@*****.***

vinaygazula.dev

github.com/vinay-ram1999

linkedin.com/in/vinayramg

orcid.org/0009-0007-3924-5258

EDUCATION

New Jersey Institute of Technology Newark, NJ Sept 2023 – May 2025

Master of Science in Data Science, GPA: 3.89/4

Coursework: Introduction to Big Data, Advanced Database Systems Design, Machine Learning, Data Visualization

SRM University AP Amaravati, India Aug 2017 – May 2021

Bachelor of Technology, GPA: 8.51/10

SKILLS

Languages: Python (PySpark, Polars, Pandas, NumPy, scikit-learn, PyTorch, TensorFlow), SQL (ANSI SQL, PL/SQL), Scala (Apache Spark), Go, Rust, Bash, C

Databases: PostgreSQL, MySQL, Oracle, MongoDB, MS SQL

Cloud: AWS (Athena, EC2, Glue, Lambda, RDS, Redshift), Azure (Data Factory, Synapse Analytics), GCP (BigQuery)

Big Data: Apache Spark (Spark SQL, Dataset API, DataFrame API), Databricks, Apache (Hive, Iceberg, Flink, Kafka, Airflow), dbt, Snowflake

Analytics: Tableau, Apache Superset, Power BI, Excel

CI/CD: Git, GitHub, GitHub Actions, Docker, Kubernetes, Terraform, Jenkins, GitLab

EXPERIENCE

New Jersey Institute of Technology Newark, NJ

Research Assistant NJIT Engineering Education Research July 2024 — Present

• Built regression models (Multiple Linear Regression, Random Forest, XGBoost, LightGBM) in R on institutional student data to predict GPA and identify the factors that drive academic performance

Teaching Assistant CS-634 Data Mining Course May 2024 – Present

• Evaluated weekly homework, quizzes, and projects with prompt, equitable feedback, while mentoring students in Python coding to enhance their project outcomes

Research Assistant Data and Knowledge Engineering Lab Sept 2023 — May 2024

• Integrated xAI tools (LIME, SHAP, Anchors, PDP, ALE) into “SolarFlareNet”, a deep learning framework for space weather research, retrained the model while maintaining 90.7% accuracy, and presented the findings at FLAIRS

Impetus Bengaluru, India

Data Engineer July 2020 — Aug 2023

• Optimized data lakes in Amazon S3 by applying lifecycle policies, converting partition files to Parquet, and sorting data to improve run-length encoding, reducing storage and I/O costs by 25%

• Collaborated with analytics teams to design data models for KPI generation and efficiently handled 20+ ad hoc requests, ensuring alignment with business needs

• Developed robust ETL pipelines using Apache Spark on Databricks to ingest upstream master data and populate Apache Iceberg tables, cutting processing times by 40% for faster business insights

• Utilized Apache Airflow to orchestrate complex data workflows, enhancing pipeline reliability and operational efficiency in a dynamic cloud environment

• Leveraged AWS Glue Data Catalog and AWS Lake Formation to standardize metadata management and enforce data governance policies, reducing integration complexities and accelerating analytics workflows

PROJECTS

EDA on Global Soil Respiration Data: MySQL, Tableau (GitHub)

• Analyzed the Global Soil Respiration Database (SRDBv5) by cleaning and querying data in MySQL and creating interactive Tableau dashboards for visual insights

WebScraping Using R: R, rvest, ggplot (GitHub)

• Developed an R-based web scraping tool using rvest to collect and update articles from “Parasites & Vectors”, optimizing resource usage and performing data cleaning and exploratory analysis with regex and ggplot

eComputer Store Database System: MySQL, Python, Streamlit, Pandas (GitHub)

• Developed a comprehensive MySQL database system for an e-commerce store by designing ER diagrams, translating them into a relational schema, and building an interactive web interface with Python’s Streamlit library

AlgoTrade API: Python, yfinance, Pandas, TensorFlow, ks-api-client (GitHub)

• Developed a fully automated NSE stock trading bot in Python by integrating real-time and historical data with yfinance, training ML models (including LSTM) for stock price prediction, and executing live trades via the Kotak Securities API

PUBLICATIONS

1. Interpretable Deep Learning for Solar Flare Prediction, IEEE ICTAI 2024

2. An Interpretable Transformer Model for Operational Flare Forecasting, FLAIRS 2024

CERTIFICATIONS

• Data Engineering Bootcamp — DataExpert.io 2025

• Google Data Analytics Professional Certificate — Coursera 2023


