Intern Data Analyst

Posted: April 29, 2021

Resume:

Chingyi (Ethan) Ie

Phone: 310-***-**** Email: *********@*****.*** LinkedIn: linkedin.com/in/chingyi-ie GitHub: github.com/ieching

Education

University of California, Los Angeles ’21 (B.Sc. Computational Biology, GPA: 3.66, 5x Dean’s Honor List)

Skills: Python, R, SQL, TensorFlow, Tableau, AWS, Distributed Systems, NLP, Airflow

Professional Experience

Proofpoint, Sunnyvale, CA

01/21 – Present Data Engineering Intern (Spring ’21)

• Updated production-level Grafana dashboards to allow more granular visualization of spam detection KPIs by refactoring the database backend on AWS EC2 and Airflow.

• Identified and resolved a duplicate-counting issue in hourly spam KPI aggregation by updating MySQL query logic and internal libraries, eliminating duplicate counts.

• Improved metric calculation efficiency by 70% by migrating hourly backfill processes from Airflow to the codebase and integrating metric calculations from Python into MySQL queries.

06/20 – 08/20 Software Engineering Intern, Machine Learning (Summer ’20)

• Productionized an end-to-end similarity search model that identifies similar spam emails in order to find similar attack campaigns using Landmark MDS and Approximate KNN.

• Improved query speed by 75% by daemonizing the process, which involved refactoring the existing codebase and creating a RESTful web service using FastAPI.

• Decreased model training time by over 98% by scaling the model horizontally and parallelizing across multiple servers using Spark on AWS EMR to handle big data.

W3LL, Berkeley, CA

10/20 – 12/20 Software Engineering Intern, Data (Fall ’20)

• Developed a content recommendation engine in Python based on collaborative filtering for implicit data that recommends articles and initialized a database system using SQLite.

• Implemented a text analyzer that color-codes Medium articles based on various sentiments and integrated it with the front-end through a RESTful web service using Flask.

UCLA Center for Health Policy Research, Los Angeles, CA

06/19 – 06/20 Data Analyst

• Preprocessed and analyzed big data using R’s data.table and produced visualizations using ggplot2, which were adopted in the final report to the Department of Health Care Services.

Research Experience

01/20 – 03/20 Undergraduate Researcher (Supervisor: Thomas Vallim)

• Implemented various multiple regression models including LASSO and Elastic-Net regression using Python to identify key proteins that are correlated with different lipid levels.

• Performed permutation and random testing within each cross-validation fold using UCLA’s computing cluster to verify model significance.

• Reduced the number of proteins from 2919 to around 6-10 and achieved an R-squared of around 0.6-0.8 for each model when comparing predicted to actual values.

Leadership Experience

01/20 – 06/20 Machine Learning Lead (Affinity, UCLA)

• Led a team of 3 in using NLP techniques in Python such as POS-tagging and word vectors to build a news political bias predictor as part of a Chrome extension.

• Oversaw the development of an ensemble of classifiers using Naïve Bayes, SVM, and Logistic Regression, which achieved an AUROC of 0.80 (One-vs-One) for a 3-class classification model.

• Implemented an LSTM classifier using TensorFlow which achieved an AUROC of 0.79.

04/20 – 06/20 Project Lead (Bruins Without Borders, UCLA)

• Performed a descriptive analysis of the homeless population in LA, which was shared with the California Policy Lab to better understand and predict homelessness patterns.

• Oversaw feature selection using regularized linear models and Random Forests to identify key predictors of homelessness and implemented K-means clustering to group data points.
