SIDDHANT SAPTE
857-***-**** Boston, MA *****.*@************.*** LinkedIn: https://www.linkedin.com/in/siddhant-sapte/
EDUCATION
Master of Science in Analytics GPA: 3.7 Northeastern University Boston, MA, USA Apr 2020
Bachelor of Engineering in Electronics and Telecommunication University of Mumbai Mumbai, India Jul 2015
SKILLS
Programming Languages : Python (pandas, numpy, seaborn, scikit-learn, scipy, tensorflow), R (shiny, dplyr, tidyr), SQL, NoSQL
Data Tools and Databases : Tableau, Power BI, QuickSight, Excel, Hadoop, Spark, Hive, ELK, MySQL, Oracle 11g, Redshift
Technologies : AWS, GCP, Databricks, Docker, Cloudera, Google Analytics, GitHub
Techniques/OS : Web Scraping, ETL, Agile (Scrum), Windows, Linux, Ubuntu
WORK EXPERIENCE
Research Assistant AI Skunkworks Northeastern University Boston, MA Apr 2020 – Present
• Built a Dash (Python web application framework) dashboard and website to visualize COVID-19 spread, and researched the doubling rate of the virus (https://covid19-visual-app.herokuapp.com/)
• Integrated different data sources with SSIS to dump COVID data into SSMS and created analytical reports in Tableau
Graduate Teaching Assistant (ML in R and Python) Northeastern University Boston, MA Jan 2020 – Apr 2020
• Mentored students on ML algorithms and code debugging, conducted lab sessions, and assisted the professor with assignments
System Analyst Larsen and Toubro Infotech Mumbai, India Mar 2017 – Jul 2018
• Conducted log analysis and event correlation using ELK (Elasticsearch, Logstash, Kibana) to reduce server errors by 30%
• Improved server efficiency by 10% by developing a Python script to scrape errors from server log and HTTP access log files
• Created Tableau dashboards showing server error reports to ensure security and performance of various applications
• Developed SQL queries in Oracle DB to bulk upload transaction data for SOA applications, reducing latency by 25%
Data Engineer Larsen and Toubro Infotech Mumbai, India Sep 2015 – Mar 2017
• Helped architect IoT solutions to analyze live streaming data and act on live events
• Transformed unstructured sensor data into a structured dataframe by performing ETL operations in Python
• Analyzed the transformed sensor data for trends to detect irregularities in sensor values
• Automated business intelligence IoT dashboards with Tableau depicting performance of sensors in applications
FREELANCE EXPERIENCE : Northeastern University
Data Analyst RedCrow Boston, MA Jan 2020 – Mar 2020
• Architected a pipeline to Dockerize a Flask application containing a machine learning model on an AWS EC2 instance
• Predicted startup success with a classification model chosen by comparing the F-beta score, accuracy, and AUC of Logistic Regression, XGBoost, and SVM
Data Analyst Pluralpoint Inc Boston, MA Sep 2019 – Dec 2019
• Designed a pipeline to extract information from images into a CSV file using AWS Textract with 90% accuracy
• Classified images based on appearance by building Convolutional Neural Networks (CNN) using Keras
• Preprocessed images to enhance image features important for text extraction using OpenCV, increasing clarity by 70%
Data Analyst VIACOM Inc Boston, MA Apr 2019 – Jun 2019
• Imputed CPM values for ad campaigns on different pages using a Random Forest algorithm with 95% accuracy
• Identified the set of Facebook pages and posts giving the maximum conversion rate for given demographics and ad placement using K-Means clustering
• Pitched a pricing model to increase conversion rate depending on demographics and ad placement by a factor of delta
PROJECTS : Northeastern University GitHub: https://github.com/siddsapte12
OList E-commerce Data Analysis (AWS, PySpark, Python)
• Built an AWS Redshift data warehouse by dumping data from S3 data lakes and AWS RDS
• Created AWS data pipelines to sync incremental data by writing AWS Glue jobs for ETL and batch processing
• Used Lambda functions to trigger and automate the ETL and data-syncing process
• Used AWS Athena for ad-hoc analysis and AWS QuickSight to make dashboards and KPI reports
Airbnb Analysis (PySpark, Databricks, AWS, S3)
• Leveraged Databricks to analyze data stored in AWS S3 and used Spark SQL to derive insights from the data
Serverless File Upload Web Application (Python, AWS, MySQL)
• Deployed a serverless web application to upload files through an optimized pipeline to an S3 bucket, reducing cost by 90%
• Integrated a Lambda function with API Gateway to upload images to S3 and store user metadata in a MySQL database