ABHINAV REDDY KAITHA
********@**.*** | 812-***-**** | LinkedIn: linkedin.com/in/abhinavkaitha/ | Github: github.com/Abhinavkaitha
EDUCATION
Indiana University Bloomington Bloomington, IN
Master of Science in Data Science May 2020
Relevant Coursework: Applied Machine Learning, Advanced Database Concepts, Elements of Artificial Intelligence, Machine Learning in Computational Linguistics, Statistical Modeling, Applied Algorithms, Deep Learning Systems, Data Visualization
Jawaharlal Nehru Technological University Hyderabad, India
Bachelor of Technology in Engineering September 2013 - May 2017
SKILLS
Languages: Python, R, SQL, Google Script, PySpark, HTML, C
Technologies/Tools: GCP, AWS, Git, Tableau, Databricks, TensorFlow, Alteryx, Power BI, Plotly, Dash
Machine Learning: Classification, Clustering, Regression, Time Series Analysis, Feature Engineering, Neural Networks
Big Data Frameworks/Databases: Apache Spark, Airflow, PostgreSQL, Cassandra, Apache Parquet, Amazon Redshift
EXPERIENCE
Insight Data Science Seattle, WA
Data Engineering Fellow January 2020 - Present
Data Preprocessing: Built a tool to surface insights from 200 GB of tweets by converting them to Parquet format on an EC2 instance and hosting the data on S3
Machine Learning: Built SVM and regularized logistic regression models on the text data using the PySpark ML package on EMR clusters; staged the final predictions in an RDS Postgres database (illustrative sketch below)
Data Visualization: Built interactive choropleths from scratch using Dash and orchestrated all the scripts with Apache Airflow
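A minimal PySpark sketch of the preprocessing and modeling steps above; the S3 paths and the text/label column names are hypothetical placeholders, not the exact production pipeline:

# Hypothetical sketch: stage raw tweet JSON as Parquet on S3, then fit a
# regularized logistic regression with PySpark ML.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("tweet-insights").getOrCreate()

# Convert raw JSON to columnar Parquet for faster downstream scans
raw = spark.read.json("s3a://example-bucket/raw-tweets/")
raw.write.mode("overwrite").parquet("s3a://example-bucket/tweets-parquet/")

# Text features fed into an elastic-net regularized logistic regression
tweets = spark.read.parquet("s3a://example-bucket/tweets-parquet/")
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="tokens"),
    HashingTF(inputCol="tokens", outputCol="features"),
    LogisticRegression(labelCol="label", regParam=0.1, elasticNetParam=0.5),
])
model = pipeline.fit(tweets)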
The BeeCorp Bloomington, IN
Data Science Intern October 2019 - December 2019
Feature Engineering: Extracted features from the existing temperature arrays on S3 using percentiles, clustering, and transformations; fetched weather data via the Dark Sky API and engineered features from variables such as wind speed, wind direction, and precipitation
Machine Learning: Built XGBoost and random forest regressors on the combined new and existing features and fine-tuned them using grid search; improved accuracy by 10% over the company's previous frame-strength model (illustrative sketch below)
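A minimal sketch of the grid-search tuning step with scikit-learn and XGBoost; the feature file and target column are hypothetical:

# Hypothetical sketch: grid-search an XGBoost regressor over engineered features.
import pandas as pd
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

features = pd.read_csv("features.csv")  # assumed engineered-feature file
X = features.drop(columns=["frame_strength"])
y = features["frame_strength"]

search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid={"max_depth": [3, 5, 7], "n_estimators": [100, 300]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)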
Kelley School of Business Bloomington, IN
Marketing Analyst May 2019, August 2019 - December 2019
Data Engineering: Automated extraction of the required fields from Google Analytics using Google Script, then built a Python data pipeline to move the data from Google Analytics into a Postgres database (illustrative sketch below)
Attribution Modeling: Identified important data fields captured in tools such as Salesforce CRM and Salesforce Marketing Cloud to build an attribution model studying the impact of different ad channels on the number of applications to the full-time MBA program
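A minimal Python sketch of the Google Analytics-to-Postgres load; the exported report file, connection string, and table name are assumptions:

# Hypothetical sketch: load an exported Google Analytics report into Postgres.
import pandas as pd
from sqlalchemy import create_engine

report = pd.read_csv("ga_export.csv")  # assumed GA export with the required fields
engine = create_engine("postgresql://user:password@localhost:5432/marketing")

# Append the extracted fields to a staging table for downstream modeling
report.to_sql("ga_sessions", engine, if_exists="append", index=False)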
AARP Washington, D.C.
Internal Audit Summer Intern June 2019 - August 2019
Exploratory Data Analysis: Analyzed 1.6 million transactions (categorical and numerical fields) and presented the results with Tableau
Re-sampling: Balanced the dataset using re-sampling techniques such as SMOTE and BalanceCascade
Machine Learning: Identified risky transactions using logistic regression, tree-based algorithms, and neural networks with custom loss functions; achieved precision and recall of 0.95 by adjusting the probability cutoff using the ROC curve (illustrative sketch below)
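A minimal sketch of the re-sampling and cutoff-tuning steps using imbalanced-learn and scikit-learn; the input file and label column are hypothetical:

# Hypothetical sketch: balance classes with SMOTE, fit a classifier, and pick a
# probability cutoff from the ROC curve.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

df = pd.read_csv("transactions.csv")  # assumed transaction data
X, y = df.drop(columns=["is_risky"]), df["is_risky"]

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# Choose the cutoff closest to the top-left corner of the ROC curve
probs = clf.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y, probs)
best = ((1 - tpr) ** 2 + fpr ** 2).argmin()
preds = (probs >= thresholds[best]).astype(int)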
Accenture Hyderabad, India
Associate Software Engineer May 2017 - March 2018
SQL: Designed and implemented SQL queries to export data from and load data into a Postgres database (illustrative sketch below)
Regression Analysis: Analyzed raw data using statistical techniques and provided insights to the client
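A minimal psycopg2 sketch of the export/load pattern; the connection string, tables, and file names are hypothetical:

# Hypothetical sketch: export a Postgres table to CSV and load a CSV back in
# using COPY via psycopg2.
import psycopg2

conn = psycopg2.connect("dbname=clientdb user=analyst")
with conn, conn.cursor() as cur:
    # Export query results to a local CSV
    with open("orders_export.csv", "w") as f:
        cur.copy_expert("COPY (SELECT * FROM orders) TO STDOUT WITH CSV HEADER", f)
    # Load staged data from a local CSV
    with open("orders_staged.csv") as f:
        cur.copy_expert("COPY orders_staging FROM STDIN WITH CSV HEADER", f)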
PROJECTS
Data Modeling with Postgres and Cassandra October 2019
Star Schema: Created a Postgres database with tables designed as a star schema to optimize queries for a particular analytic focus
Denormalizing and Modeling: Processed the event data to create a denormalized dataset and modeled the Cassandra tables for the required queries
ETL Pipeline: Built an ETL pipeline using Python and SQL to populate and test the databases (illustrative sketch below)
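A minimal psycopg2 sketch of a star-schema DDL and load; the table and column names are simplified assumptions rather than the project's exact schema:

# Hypothetical sketch: create a fact table referencing dimension tables
# (star schema) and load one record through each.
import psycopg2

conn = psycopg2.connect("dbname=sparkifydb user=student")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS users (user_id int PRIMARY KEY, name text);
        CREATE TABLE IF NOT EXISTS songs (song_id text PRIMARY KEY, title text);
        CREATE TABLE IF NOT EXISTS songplays (
            songplay_id serial PRIMARY KEY,
            user_id int REFERENCES users (user_id),
            song_id text REFERENCES songs (song_id),
            start_time timestamp
        );
    """)
    # Populate dimensions first so the fact row satisfies its foreign keys
    cur.execute("INSERT INTO users VALUES (%s, %s)", (1, "Alice"))
    cur.execute("INSERT INTO songs VALUES (%s, %s)", ("SOABC12", "Example Song"))
    cur.execute(
        "INSERT INTO songplays (user_id, song_id, start_time) VALUES (%s, %s, %s)",
        (1, "SOABC12", "2019-10-01 12:00:00"),
    )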
Data Lake with Spark November 2019
Star Schema: Using the song and log datasets, created a star schema optimized for queries on song play analysis
Preprocessing with Spark: Built an ETL pipeline that extracts the data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables for further analysis (illustrative sketch below)
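A minimal PySpark sketch of the S3-to-S3 ETL for one dimensional table; the bucket paths and column names are hypothetical:

# Hypothetical sketch: read raw song JSON from S3, derive a dimension table,
# and write it back to S3 as partitioned Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-etl").getOrCreate()

songs = spark.read.json("s3a://example-bucket/song_data/*/*/*/*.json")

# Songs dimension: one row per song, partitioned for efficient reads
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)
songs_table.write.mode("overwrite") \
           .partitionBy("year", "artist_id") \
           .parquet("s3a://example-bucket/analytics/songs/")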