Data Engineer

Location:
Sunnyvale, CA
Posted:
November 20, 2023

Rajvee Shah

315-***-**** ad1bed@r.postjobfree.com www.linkedin.com/in/rajveeshah

EXPERIENCE

Data Engineer, Slalom Jul 2021 – Sep 2023

• Designed, tested, and deployed 20+ data integrations using AWS Aurora and S3 for seamless data transfer across multiple vendor systems, with a strong emphasis on detailed documentation

• Maintained and optimized 4 ETL pipelines on a Hadoop platform to support reporting while addressing data quality issues using Hive for enhanced data reliability

• Streamlined a client’s operational processes by 90% by using an API key to ingest public comments into their AWS ecosystem & pre-processing them to enable language processing

• Used LLMs & traditional NLP to develop sentiment classification, common-sense reasoning & clustering with 98% accuracy for policy around a gun safety initiative at the federal & state levels, using AWS Sagemaker

Data Engineer, Slalom (Client: Apple) Sep 2022 – Mar 2023

• Built, tested & deployed an API using Datadog to source data for Tableau to enable real-time tracking of test results

• Enhanced the incremental data loading process by extending four existing datamart pipelines with Python jobs deployed through Apache Airflow

• Proactively mitigated production failures by developing 2 platform enhancements using Splunk as near-real-time notification systems, & produced best-practice documentation specifying the corresponding processes

Data Analyst Intern, Syracuse University Jul 2020 – Jun 2021

• Extracted 6M+ entries from the GoFundMe website in SQL Server, cleaned the data using indexing and joins to reduce complexity, altered tables, and loaded the result into CSV format

• Deployed Tableau dashboards to create reports for goal amount by state control & analyzed its effect on the amount

• Analyzed the data using Python to extract important features by chi-squared testing and other statistical techniques

TECHNICAL SKILLS

Programming: Python (Pandas, NumPy, Scikit-learn, Matplotlib), SQL, R (ggplot, dplyr), SQLAlchemy

Databases: PostgreSQL, AWS Aurora, Snowflake, Microsoft SQL Server, DBeaver

Business Intelligence: Tableau, Power BI, Advanced Excel (Power Pivot, Vlookup, KPIs), Data Modeling, A/B Testing

Tools: Hadoop, Spark, Hive, Hue, Jupyter, Datadog, Sagemaker, S3, EMR, EC2, Airflow, Git, dbt, BigQuery, Matillion, Alteryx

Algorithms: Association Rule Mining, Support Vector Machine, K-Nearest Neighbors, Ensemble Learning, HuggingFace

EDUCATION

Syracuse University, School of Information Studies, Syracuse, NY May 2021

Master of Science in Information Management, Certificate of Advanced Study in Data Science GPA: 3.88/4.00

Courses: Data Warehouse, Big Data Analytics, Business Analytics, Data Analysis & Decision Making, Project Management

Sardar Patel Institute of Technology, University of Mumbai, India May 2019

Bachelor of Engineering in Electronics Engineering GPA: 3.6/4.00

RELEVANT PROJECTS

Business Intelligence Solutions for Fudgemart Inc. (SQL, Visual Studio, Power BI) Oct 2020 – Nov 2020

• Performed ETL to integrate & load 9M+ records from data sources such as flat files and OLTP systems into an enterprise data warehouse for Fudgemart Inc., projected to boost order fulfillment efficiency by 23%

• Created data pipelines to counter latency, automated data flow and constructed a MOLAP cube by leveraging SSAS

• Built interactive Power BI dashboards to track KPIs & interpret delay patterns to expedite the order fulfillment process

Diabetic Patient Readmission Predictive Analysis (Python, Jupyter Notebooks, Streamlit) Sep 2020 – Dec 2020

• Built & evaluated 8 machine learning models using one-hot encoding, factorization & feature engineering, & determined the strongest contributors to hospital readmission, to improve provided healthcare by 17%

• Performed chi-squared test & Principal Component Analysis to enable feature selection & data visualizations

• Developed a web app on Streamlit to tune hyperparameters & display an overview of the best performing model


