BENJAMIN NGUYEN
**** **** ****** • Berkeley, CA, 94704 • 714-***-**** • adk7h2@r.postjobfree.com
Education
UNIVERSITY OF CALIFORNIA, BERKELEY May 2020
B.A. Data Science
Work Experience
CLINICAL DATA ANALYST — PRINCIPIA, A SANOFI COMPANY June 2020 — Present
• Acting chief engineer of data pipelines, algorithms, and visualizations for Data Management and Clinical Research Development at Sanofi’s South San Francisco branch.
• Managed clinical trials by developing a resource allocation model that determines the number of full-time employees to assign to concurrent and new studies and predicts the durations of potential future studies.
• Led development of code that automatically generates Patient Profiles, generalizing the code infrastructure to run efficiently across any study.
DATA SCIENTIST — CORNERSTONE AI March 2021 — Present
• Act as a consultant to identify, triage, and solve data science problems regarding ETL and app deployment, using custom JavaScript and complex SQL queries.
• Develop cross-functional, multi-page apps with interactive data graphs via Dash and Plotly, using Python, SQL, HTML, CSS, and JavaScript.
BIO-STATISTICS INTERN — PRINCIPIA BIOPHARMA June 2019 — August 2019
• Created a pipeline to automate and solve data-management problems regarding new patient data coming in every week for the company’s three largest studies.
• Pioneered a standardized, automated system using Python and R to generate individual Patient Profiles for Pemphigus patients treated with Rilzabrutinib.
• Successfully pitched the data-visualization platform Tableau to Principia by performing a country-level analysis of two major studies, finding significant correlations in both and getting Tableau into the budget at a larger scale.
Project Work
ANALYSIS OF PRESIDENT TRUMP’S TWITTER TWEETS
• Manipulated Twitter API data on President Trump to draw self-directed conclusions from an NLP sentiment analysis of controversial tweets.
YELP RATING PREDICTIONS with DEEP NEURAL NETWORKS
• Implemented multiple deep neural network models (RNN, LSTM, BERT) to predict Yelp ratings, attaining over 83% efficacy.
NEW YORK TAXI-RIDE REGRESSION MODEL AND EDA
• Built a regression model using a processing pipeline with Haversine distances and other features to predict the duration of taxi rides in New York with a mean absolute error of under 300 seconds.
• Used SQL for data querying and cleaning, Seaborn for complex data visualization, and scikit-learn to complete the regression model with feature engineering, cross-validation, and Tikhonov regularization.
Relevant Skills
Languages: Python, SQL, R, Java, JavaScript, Spark, Scheme
Libraries: TensorFlow, PyTorch, Pandas, SciPy, NumPy, MATLAB, scikit-learn, Seaborn, ggplot2
Skills: Tableau, Advanced Jupyter Notebooks, Interactive Data Visualization, Data Management, Machine Learning, Neural Networks, NLP, Algorithms, Databases, Sampling, Statistical Analyses