Data Analyst

Location:
Tampa, FL
Salary:
80000
Posted:
April 04, 2021

Resume:

www.linkedin.com/in/sudeeplunawat | adle43@r.postjobfree.com | 813-***-**** | https://github.com/formyownsake | 14616 Grenadine Drive, Apt 1, Tampa, FL 33613

EDUCATION

University of South Florida, Tampa, FL

MS in Business Analytics and Information Systems (GPA: 3.7), Aug 2019 – May 2021

Relevant coursework: Big Data for Business, Data Science Programming, Advanced Database and Systems Design

University of Mumbai, India Aug 2012 – May 2016

Bachelor of Technology in Electronics and Telecommunication (GPA: 3.8)

PROFESSIONAL EXPERIENCE

Data Analyst, University of South Florida, Florida, USA, Dec 2019 – Present

Implemented an RNN-based Siamese NLP model to classify valid YouTube video submissions based on captions, reducing manual review effort by about 2,000 submissions per semester.

Developed Tableau dashboards for student submission data to gather insights and improve the student experience, increasing student participation by 30%.

Implemented a YouTube scraper to extract transcripts of video submissions and preprocess them for NLP.

Created database and schema objects including tables and indexes to store student submission data. Developed scripts to automate duplicate removal.

Collaborate with Tableau Corporation to deliver Tableau workshops at the Muma College of Business.

Data Engineer, Accenture, Mumbai, India, Nov 2016 – Dec 2018

Worked on a data integration project for a leading US-based insurance firm, designing ETL solutions with AWS, Hadoop, and Spark under an agile methodology for a business insurance application impacting 23 industry segments.

Developed ETL data pipelines using Spark, S3 and AWS EMR as per business requirements.

Ingested data from multiple sources and formats: S3, Teradata, CSV, Parquet, and Avro.

Employed techniques such as executor tuning, caching, repartitioning, and broadcast joins to tune Spark jobs, reducing execution time by 20%.

Used bucketing and partitioned indexes to optimize SQL queries, reducing execution time for SQL views by 20 minutes.

Took part in data quality sanity checks impacting the operations of 23 departments and 52 applications.

Implemented a script to identify jobs impacted by attribute changes and notify stakeholders. Leadership recognized this effort because it reduced manual project errors by 50% and minimized communication effort.

PROJECTS

Analyze International Space Station Real-Time data (ETL: Kafka, Spark Streaming, Tableau)

Built an application that consumes ISS data and provides a real-time map visualization of its position. Built using the ISS API, Kafka, and Spark Streaming.

Genomic Data Analysis (ETL, ML): Built a local web app and regression model to predict drug responses for various genes from gene data, patient data, and gene ontology (1+ TB). Built using Flume, Spark, Flask.

Invalid Video Classification (for CDS Tableau): Implemented an NLP-based machine learning model to classify students' YouTube video submissions as real or fake based on their context, achieving 93% accuracy.

Skills Used: Tokenization, Sequencing, Word embeddings, Sentiment Analysis, Similarity, LSTM, One-shot learning, Autoencoders, Siamese Networks, Keras, Pandas, RNN

Analyze Crime Data (ETL, Tableau, PostgreSQL): Connected PostgreSQL to Spark using JDBC and analyzed New York crime data.

SKILLS

Certifications: CCA-175 Hadoop and Spark Developer (Lic: 100-023-743), Tableau Desktop Specialist (CID: 1037620), MTA 98-381: Introduction to Programming Using Python

ETL: Spark, Flume, Kafka, Spark Streaming, Hive, Teradata, Excel, Pig, Ab Initio, Data Modelling

Cloud: AWS EMR, S3, Redshift

Storage & DB (SQL, NoSQL): S3, HDFS, Hive Metastore, Teradata DB, SQL Server, MongoDB

Machine Learning: Keras, NLP, Gensim, Tokenization, Siamese Networks, Pandas, LSTM, Tableau, NumPy

Languages: Python, Scala, SQL, HiveQL, PySpark, Unix Shell Scripting, Teradata SQL

Scheduling & Migration: AutoSys, Airflow, Git, Bitbucket

Sudeep Lunawat


