Data Engineer Analyst

Location:

Jersey City, NJ

Salary:

75000

Posted:

July 15, 2025


Resume:

RAHIL SHAH

781-***-**** ************@*****.*** LinkedIn GitHub

SUMMARY

Technical Data Analyst & AI Data Engineer with 2.5+ years of experience deploying serverless ChatGPT bots on AWS, building scalable ETL/ELT pipelines, and implementing CI/CD frameworks that streamline delivery and boost engineering efficiency. Cut manual metadata tasks by 99% and modernized legacy COBOL systems into automated pipelines. Solves complex data challenges end-to-end, from MLOps to Kubernetes, with proven leadership and measurable business impact. Hands-on experience with SAS Enterprise for data cleaning and statistical analysis, well suited to hybrid analytical and visualization roles. Proficient in SQL, Python, SSRS, SSAS, and Confluent Kafka.

Technical Skills

Languages: Python, SQL, R, Scala, SAS.

Frameworks & Libraries: Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch, OpenCV, NLTK, LangChain.

Database: MySQL, PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift, BigQuery, Data Lakes (AWS S3), Data Warehouses.

Machine Learning & AI: A/B Testing, Generative AI (ChatGPT), NLP, XGBoost.

Data Engineering & ETL Tools: Apache Airflow, PySpark, Databricks, Docker, CI/CD Pipelines, Kubernetes, Redis.

Visualization: Tableau, Power BI, Streamlit.

Cloud Platforms: AWS (S3, Lambda, Glue, EMR, SageMaker), GCP (BigQuery), Azure (AZ-900 certified).

Others: MS Excel (VLOOKUP, INDEX, Pivot Tables), Alteryx, Git/GitHub, Data Governance, Data Mining, Data Wrangling, Data Storytelling.

Experience

JerseySTEM Dec 2024 – Present

AI Data Engineer Jersey City, NJ

• Integrated ChatGPT with Webex as a serverless bot using AWS Lambda, API Gateway, and Redis for company-wide use, leveraging event-driven workflows and optimized caching to enhance scalability and response efficiency.

• Modernized legacy data warehousing by migrating COBOL-based processes to a Python system on Linux, using Kubernetes, Helm charts, and Airflow for orchestration to improve scalability.

• Assisted in estimating the performance of a caching solution using PySpark on AWS EMR clusters, achieving a hit rate of 72%.

• Developed an ETL pipeline using PySpark to process hotel data on AWS EMR clusters, automating execution with AWS Lambda and EventBridge for daily scheduled runs.

Accura Engineering & Consulting Services, Inc. May 2024 – Oct 2024

Data Analyst, Engineering & Operations Intern Atlanta, GA

• Led a team of 4 that integrated Databricks with the Collibra Catalog via the JDBC Spark driver and automated metadata ingestion for 260+ schemas using Python scripts and Tidal jobs, reducing manual effort by 99%.

• Engineered an ETL pipeline moving data from an AWS S3 data lake to a Redshift data warehouse, transforming the data with AWS Glue crawlers and jobs and orchestrating the pipeline with Apache Airflow.

• Developed SQL queries to extract data from MySQL, Oracle, and PostgreSQL, optimized them for efficiency, and exposed them as APIs.

• Utilized SAS Enterprise for data cleaning, statistical analysis, and formatted output delivery for internal reporting needs.

Mahaveer Construction May 2021 – Jun 2023

Data Engineer & Analyst Mumbai, India

• Led end-to-end A/B testing initiatives across product features and marketing channels, collaborating with product and engineering teams to define metrics, validate statistical significance, and drive data-informed decisions that improved user engagement by 18%.

• Developed scalable data pipelines with CI/CD workflows to move data efficiently between data lakes and data warehouses, improving data accessibility and processing speed.

• Built and maintained interactive dashboards using Power BI and Tableau, translating complex datasets into actionable business insights for cross-functional stakeholders and improving reporting efficiency and strategic visibility.

• Developed reporting solutions using SSRS and SSAS for financial and operational metrics. Implemented event-driven pipelines on Confluent Kafka for real-time data streaming, integrated with AWS Lambda and S3.

Academic Projects

Generative AI Applications GitHub Fall 2024

• Developed AI agents and chatbots with a Streamlit-based interface where users can ask questions in natural language and gain insights about uploaded PDF files or general topics.

• Leveraged OpenAI’s embeddings to index the files in a FAISS vector database, created a RAG pipeline in LangChain with ChatGPT-3.5, and used prompting techniques to reduce token usage.

Customer Churn Prediction GitHub Spring 2024

• Performed survival analysis to identify the likelihood of churn and built a tree-based classification model to predict churn.

• Implemented SMOTEENN and hyperparameter tuning to address class imbalance, improved the AUC score by 15% to 98.9%, and deployed the XGBoost model as a user-friendly web application using Streamlit.

Education

Master of Science, Information Technology Aug 2023 – May 2025

Clark University, Worcester, MA
