Data Aws

Location:
San Jose, CA
Posted:
November 29, 2020

Resume:

RAHIL NALIN SHAH

San Francisco Bay Area adh7z3@r.postjobfree.com 617-***-**** LinkedIn: rahilshah10 GitHub: rahilshah10

OBJECTIVE

Data engineering professional with experience building data pipelines, analytics, and visualizations, seeking opportunities in a fast-paced, highly analytical organization.

TECHNICAL SKILLS

Languages: Python, SQL, NoSQL, Java, HTML, CSS

Technology: AWS Lambda, Linux, SSH, Django, AWS EC2, Apache Spark, TensorFlow, Hive, Toad, Visual Studio

Databases: MySQL, PostgreSQL, MongoDB, Athena, Oracle SQL, SQL Server

Warehousing and BI: Talend, SSAS, SSIS, SSRS, Alteryx, AWS Glue, Tableau, Qlik Sense, Power BI, AWS QuickSight

Data Science Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn

PROFESSIONAL EXPERIENCE

Cloud Services Data Engineer, LibDib Feb 2020 – Present

● Designed, built, and deployed a highly scalable API pipeline to synchronize customer attributes with HubSpot using Python, AWS Lambda, Docker, and AWS CloudWatch, reducing manual intervention by 60%

● Improved search performance 20x using Django QuerySets on a PostgreSQL database of over 10 million products

● Achieved 4x faster delivery times by forecasting demand from inventory transactions and weekly sales data using Python and Django

● Enabled data-driven business insights on market trends and logistics by maintaining weekly sales, operations, and inventory dashboards in AWS QuickSight

● Created and enhanced features for LibDib’s support dashboard, contributing to better workflows and smarter suggestions

Data Science Research Analyst, Northeastern University (AWS, Python) Sep 2019 – Dec 2019

● Devised a pipeline to scrape, preprocess, and append apartment rental data from Craigslist using Python on AWS Lambda

● Leveraged AWS Glue to transform data from JSON to Parquet format to achieve 99.7% cost savings

● Utilized Amazon Athena for querying underlying data in a serverless setting for exploratory analysis

● Designed a dashboard in QuickSight, updated periodically, to see trends and gain insights

Data Analyst Intern, Granite Telecommunications (SQL Server, Oracle DB, Hadoop, Hive, SSIS) Aug 2018 – Dec 2018

● Constructed an ETL pipeline to extract ~130 million rows from SQL Server and Oracle and load them into a Cloudera Hadoop cluster, increasing speed by 62%

● Built Power BI dashboards on customers, tickets, and inventory to identify churn and aid retention

● Implemented protocols to log DML changes and improve security and resilience to SQL injection by creating ~3,000 stored procedures and the necessary triggers

● Improved transaction speeds by performing query tuning and migrating stored procedures from SQL Server to Oracle

● Set up better database administration protocols to reduce privilege abuse and weak audits

EDUCATION

Northeastern University Sep 2017 – Dec 2019

Master of Science in Information Systems

ACADEMIC PROJECTS

Data Warehouse (Talend, SSIS, SSAS, Power BI, Tableau, Qlik Sense) May 2018 – Aug 2018

● Led a team of 3 to develop a data warehouse, pipelining data from diverse sources using Talend Data Integration and SSIS

● Created and optimized processes in data warehouse to import, retrieve and analyze data in the Retail System

● Implemented error handling, load statistics, slowly changing dimensions, currency conversion, and performance tuning

● Built custom dashboards to analyze sales and customer segmentation using Tableau, Qlik Sense, and Power BI

Ride Sharing Optimization (Python, Apache Spark, TensorFlow, Facebook Prophet, AWS EC2) Jan 2018 – Apr 2018

● Streamlined the choice between Uber and Lyft for both drivers and passengers by building an efficient data model that shows real-time and future price estimates

● Wrote a Python script to pull the latest data from the Uber, Lyft, Yelp, and Open Weather APIs and automated it on Amazon EC2, collecting about 86,400 rows of data over 2 months

● Collaborated with 3 peers to forecast ride-sharing price estimates using an RNN in TensorFlow, achieving a mean squared error of 0.7119

● Built an additive time-series model with the Facebook Prophet package to forecast ride-sharing price estimates one day ahead and show the overall, weekly, daily, and time-of-day trends


