
Data Engineering

Location:
Seattle, WA
Posted:
July 09, 2020


Resume:

ABHINAV REDDY KAITHA

********@**.*** | 812-***-**** | LinkedIn: linkedin.com/in/abhinavkaitha/ | GitHub: github.com/Abhinavkaitha

EDUCATION

Indiana University Bloomington Bloomington, IN

Master of Science in Data Science May 2020

Relevant Coursework: Applied Machine Learning, Advanced Database Concepts, Elements of Artificial Intelligence, Machine Learning in Computational Linguistics, Statistical Modeling, Applied Algorithms, Deep Learning Systems, Data Visualization

Jawaharlal Nehru Technological University Hyderabad, India

Bachelor of Technology in Engineering September 2013 - May 2017

SKILLS

Languages: Python, R, SQL, Google Script, PySpark, HTML, C

Technologies/Tools: GCP, AWS, Git, Tableau, Databricks, TensorFlow, Alteryx, Power BI, Plotly, Dash

Machine Learning: Classification, Clustering, Regression, Time Series Analysis, Feature Engineering, Neural Networks

Big Data Frameworks/Databases: Apache Spark, Airflow, PostgreSQL, Cassandra, Apache Parquet, Amazon Redshift

EXPERIENCE

Insight Data Science Seattle, WA

Data Engineering Fellow January 2020 - Present

Data Preprocessing: Built a tool to surface insights from 200 GB of tweets by converting the raw data to Parquet format on an EC2 instance and hosting the data on S3
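
(Illustrative sketch only, not the original project code: a minimal version of this conversion step, assuming newline-delimited JSON tweets; bucket names and fields are placeholders.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tweets-to-parquet").getOrCreate()

    # Read raw newline-delimited JSON tweets staged on S3.
    tweets = spark.read.json("s3a://raw-tweets-bucket/2020/*.json")

    # Keep only the fields needed downstream and write columnar Parquet back to S3.
    (tweets.select("id", "created_at", "text")
           .write.mode("overwrite")
           .parquet("s3a://processed-tweets-bucket/tweets.parquet"))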

Machine Learning: Built SVM and regularized logistic regression models on the text data using the PySpark ML package on EMR clusters; staged the final predictions in an RDS Postgres database
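
(Illustrative sketch of the PySpark ML modeling step; the tiny in-memory DataFrame stands in for the real tweet data.)

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF
    from pyspark.ml.classification import LogisticRegression, LinearSVC

    spark = SparkSession.builder.appName("tweet-models").getOrCreate()
    df = spark.createDataFrame([(0, "great launch today", 1.0),
                                (1, "terrible outage again", 0.0)],
                               ["id", "text", "label"])

    # Shared text-featurization stages.
    stages = [Tokenizer(inputCol="text", outputCol="words"),
              HashingTF(inputCol="words", outputCol="tf"),
              IDF(inputCol="tf", outputCol="features")]

    # Regularized logistic regression (regParam = regularization strength) and a linear SVM.
    lr = Pipeline(stages=stages + [LogisticRegression(regParam=0.1)]).fit(df)
    svm = Pipeline(stages=stages + [LinearSVC(regParam=0.1)]).fit(df)

    # The resulting predictions would then be staged to RDS Postgres, e.g. via a JDBC write.
    lr.transform(df).select("id", "prediction").show()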

Data Visualization: Built interactive choropleth maps from scratch using Dash and orchestrated all the scripts with Apache Airflow
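
(Illustrative sketch of a Dash app serving a choropleth; the per-state values here are made up, where the project derived them from the tweet data.)

    import pandas as pd
    import plotly.express as px
    from dash import Dash, dcc, html

    df = pd.DataFrame({"state": ["WA", "IN", "TX"], "value": [10, 20, 30]})

    # US state-level choropleth colored by the computed metric.
    fig = px.choropleth(df, locations="state", locationmode="USA-states",
                        color="value", scope="usa")

    app = Dash(__name__)
    app.layout = html.Div([dcc.Graph(figure=fig)])

    if __name__ == "__main__":
        app.run(debug=True)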

The BeeCorp Bloomington, IN

Data Science Intern October 2019 - December 2019

Feature Engineering: Extracted features from temperature arrays stored in S3 buckets using percentiles, clustering, and transformations. Fetched weather data from the Dark Sky API and engineered features from variables such as wind speed, wind direction, and precipitation
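
(Illustrative sketch of percentile-based feature extraction from a temperature array; the names and values are synthetic, not The BeeCorp's actual schema.)

    import numpy as np
    import pandas as pd

    def temperature_features(temps: np.ndarray) -> dict:
        """Summarize one raw temperature array into model-ready features."""
        return {"t_p10": np.percentile(temps, 10),
                "t_p50": np.percentile(temps, 50),
                "t_p90": np.percentile(temps, 90),
                "t_range": temps.max() - temps.min(),
                "t_std": temps.std()}

    rng = np.random.default_rng(0)
    arrays = {"hive_1": rng.normal(34, 2, 500), "hive_2": rng.normal(30, 4, 500)}
    print(pd.DataFrame({k: temperature_features(v) for k, v in arrays.items()}).T)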

Machine Learning: Built XGBoost and Random Forest regressors on the combined new and existing features and fine-tuned them using grid search. Improved accuracy by 10% over the model the company previously used to identify frame strength
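
(Illustrative sketch of the grid-search tuning step with scikit-learn and XGBoost; the data here is synthetic.)

    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBRegressor

    X, y = make_regression(n_samples=200, n_features=10, random_state=0)

    # Small grid over the usual XGBoost knobs; cv=5 gives 5-fold validation.
    search = GridSearchCV(XGBRegressor(objective="reg:squarederror"),
                          {"n_estimators": [100, 300],
                           "max_depth": [3, 5],
                           "learning_rate": [0.05, 0.1]},
                          cv=5, scoring="neg_mean_squared_error")
    search.fit(X, y)
    print(search.best_params_)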

Kelley School of Business Bloomington, IN

Marketing Analyst May 2019, August 2019 - December 2019

Data Engineering: Automated extraction of the required fields from Google Analytics using Google Script, then built a Python data pipeline to move the data from Google Analytics into a Postgres database
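
(Illustrative sketch of the Postgres load step with psycopg2; fetch_ga_rows is a hypothetical stand-in for the Google Analytics extraction, and the connection details are placeholders.)

    import psycopg2

    def fetch_ga_rows():
        # Hypothetical placeholder for rows pulled from Google Analytics.
        return [("2019-09-01", "organic", 120), ("2019-09-01", "paid", 45)]

    conn = psycopg2.connect(host="localhost", dbname="marketing",
                            user="etl", password="...")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS ga_sessions "
                    "(day date, channel text, sessions int)")
        cur.executemany("INSERT INTO ga_sessions VALUES (%s, %s, %s)",
                        fetch_ga_rows())
    conn.close()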

Attribution Modeling: Identified key data fields captured in tools such as Salesforce CRM and Salesforce Marketing Cloud to build an attribution model studying the impact of different ad channels on the number of applications to the full-time MBA program

AARP Washington, D.C.

Internal Audit Summer Intern June 2019 - August 2019

Exploratory Data Analysis: Analyzed 1.6 million transactions (categorical and numerical fields) and presented the results in Tableau

Re-sampling: Balanced the dataset using re-sampling techniques such as SMOTE and BalanceCascade
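
(Illustrative sketch of SMOTE oversampling with imbalanced-learn on a synthetic imbalanced dataset.)

    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01],
                               random_state=0)
    print("before:", Counter(y))

    # SMOTE synthesizes new minority-class points between existing neighbors.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("after:", Counter(y_res))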

Machine Learning: Identified risky transactions using logistic regression, tree-based algorithms, and neural networks with custom loss functions. Achieved precision and recall of 0.95 by adjusting the probability cutoff using the ROC curve
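
(Illustrative sketch of choosing a probability cutoff from the ROC curve; the model and data are synthetic, not the AARP models.)

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, precision_score, recall_score

    X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
    probs = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

    # Pick the cutoff closest to the ROC curve's top-left corner (fpr=0, tpr=1).
    fpr, tpr, thresholds = roc_curve(y, probs)
    cutoff = thresholds[np.argmin(fpr ** 2 + (1 - tpr) ** 2)]

    preds = (probs >= cutoff).astype(int)
    print(cutoff, precision_score(y, preds), recall_score(y, preds))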

Accenture Hyderabad, India

Associate Software Engineer May 2017 - March 2018

SQL: Designed and implemented SQL queries to export data from and load data into a Postgres database
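
(Illustrative sketch of exporting and loading Postgres data with COPY via psycopg2; table, file, and connection names are placeholders.)

    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="clientdb",
                            user="etl", password="...")
    with conn, conn.cursor() as cur:
        # Export a table to a local CSV file.
        with open("orders.csv", "w") as f:
            cur.copy_expert("COPY orders TO STDOUT WITH CSV HEADER", f)
        # Load the CSV into a staging table with the same columns.
        with open("orders.csv") as f:
            cur.copy_expert("COPY orders_staging FROM STDIN WITH CSV HEADER", f)
    conn.close()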

Regression Analysis: Analyzed raw data using statistical techniques and provided insights to the client

PROJECTS

Data Modeling with Postgres and Cassandra October 2019

Star Schema: Created a Postgres database with tables designed using a star schema to optimize queries for a particular analytic focus
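
(Illustrative sketch of star-schema DDL, modeled on the song-play analysis described below: one fact table referencing four dimensions; column lists are abridged and all names are placeholders.)

    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS users   (user_id int  PRIMARY KEY, name  text);
    CREATE TABLE IF NOT EXISTS songs   (song_id text PRIMARY KEY, title text);
    CREATE TABLE IF NOT EXISTS artists (artist_id text PRIMARY KEY, name text);
    CREATE TABLE IF NOT EXISTS time    (start_time timestamp PRIMARY KEY, hour int);
    CREATE TABLE IF NOT EXISTS songplays (
        songplay_id serial PRIMARY KEY,
        start_time  timestamp REFERENCES time,
        user_id     int       REFERENCES users,
        song_id     text      REFERENCES songs,
        artist_id   text      REFERENCES artists
    );
    """

    conn = psycopg2.connect(host="localhost", dbname="musicdb",
                            user="etl", password="...")
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
    conn.close()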

Denormalizing and Modeling: Processed the event data to create a denormalized dataset and modeled the Cassandra tables around the required queries
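
(Illustrative sketch of query-first Cassandra modeling with the DataStax driver: the partition and clustering keys mirror the query the table serves; the keyspace, table, and values are placeholders.)

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS events WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.set_keyspace("events")

    # Table designed for one query: what played at a given (session, item).
    session.execute("""
        CREATE TABLE IF NOT EXISTS song_by_session (
            session_id int, item_in_session int, artist text, song text,
            PRIMARY KEY (session_id, item_in_session))""")
    session.execute("INSERT INTO song_by_session "
                    "(session_id, item_in_session, artist, song) "
                    "VALUES (%s, %s, %s, %s)", (338, 4, "Some Artist", "Some Song"))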

ETL Pipeline: Built an ETL pipeline using Python and SQL to populate and test the databases

Data Lake with Spark November 2019

Star Schema: Using the song and log datasets, created a star schema optimized for queries on song play analysis

Preprocessing using Spark: Built an ETL pipeline that extracts data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables for further analysis
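
(Illustrative sketch of the S3-to-S3 Spark ETL; bucket paths and columns are placeholders.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("data-lake-etl").getOrCreate()

    # Extract: raw song data from S3.
    songs = spark.read.json("s3a://raw-bucket/song_data/*/*/*/*.json")

    # Transform: one dimensional table, de-duplicated on its key.
    songs_table = (songs.select("song_id", "title", "artist_id", "year", "duration")
                        .dropDuplicates(["song_id"]))

    # Load: write back to S3 as partitioned Parquet for analysis.
    (songs_table.write.mode("overwrite")
                .partitionBy("year", "artist_id")
                .parquet("s3a://processed-bucket/songs/"))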


