Chingyi (Ethan) Ie
Phone: 310-***-**** Email: adl1vv@r.postjobfree.com LinkedIn: linkedin.com/in/chingyi-ie GitHub: github.com/ieching
Education
University of California, Los Angeles ’21 (B.Sc. Computational Biology, GPA: 3.66, 5x Dean’s Honor List)
Skills: Python, R, SQL, TensorFlow, Tableau, AWS, Distributed Systems, NLP, Airflow
Professional Experience
Proofpoint, Sunnyvale, CA
01/21 – Present Data Engineering Intern (Spring ’21)
• Updated production-level Grafana dashboards to allow more granular visualization of spam detection KPIs by refactoring the database backend on AWS EC2 and Airflow.
• Identified and resolved a duplicate-counting issue in hourly spam KPI aggregation by updating MySQL query logic and internal libraries, eliminating duplicate counts.
• Improved metric calculation efficiency by 70% by migrating hourly backfill processes from Airflow to the codebase and integrating metric calculations from Python into MySQL queries.
06/20 – 08/20 Software Engineering Intern, Machine Learning (Summer ’20)
• Productionized an end-to-end similarity search model, built on Landmark MDS and approximate KNN, that identifies similar spam emails to surface related attack campaigns.
• Improved query speed by 75% by daemonizing the process, which involved refactoring the existing codebase and creating a RESTful web service using FastAPI.
• Decreased model training time by over 98% by scaling the model horizontally and parallelizing it across multiple servers using Spark on AWS EMR to handle big data.
W3LL, Berkeley, CA
10/20 – 12/20 Software Engineering Intern, Data (Fall ’20)
• Developed a content recommendation engine in Python that recommends articles using collaborative filtering for implicit data, and initialized a database system using SQLite.
• Implemented a text analyzer that color-codes Medium articles based on various sentiments and integrated it with the front end through a RESTful web service using Flask.
UCLA Center for Health Policy Research, Los Angeles, CA
06/19 – 06/20 Data Analyst
• Preprocessed and analyzed big data using R’s data.table and produced visualizations using ggplot2, which were adopted in the final report to the Department of Health Care Services.
Research Experience
01/20 – 03/20 Undergraduate Researcher (Supervisor: Thomas Vallim)
• Implemented various multiple regression models including LASSO and Elastic-Net regression using Python to identify key proteins that are correlated with different lipid levels.
• Performed permutation and random testing within each cross-validation fold using UCLA’s computing cluster to verify model significance.
• Reduced the number of proteins from 2919 to around 6-10 and achieved an R-squared of around 0.6-0.8 for each model when comparing predicted to actual values.
Leadership Experience
01/20 – 06/20 Machine Learning Lead (Affinity, UCLA)
• Led a team of 3 in using NLP techniques in Python such as POS-tagging and word vectors to build a news political bias predictor as part of a Chrome extension.
• Oversaw the development of an ensemble of classifiers using Naïve Bayes, SVM, and Logistic Regression, which achieved an AUROC of 0.80 (One-vs-One) for a 3-class classification model.
• Implemented an LSTM classifier using TensorFlow, which achieved an AUROC of 0.79.
04/20 – 06/20 Project Lead (Bruins Without Borders, UCLA)
• Performed a descriptive analysis of the homeless population in LA, which was shared with the California Policy Lab to better understand and predict homelessness patterns.
• Oversaw feature selection using regularized linear models and Random Forests to identify key predictors of homelessness and implemented K-means clustering to group data points.