Data Science Visualization

Location: Hoboken, NJ

Posted: May 24, 2024


Resume:

Weijian Qin

ad5xz4@r.postjobfree.com 608-***-**** https://www.linkedin.com/in/weijian-qin/

EDUCATION

Weill Cornell Medicine, Cornell University Expected Aug 2024

M.S. in Health Informatics GPA: 3.7/4.0

● Courses: Research Methods and Data Visualization (Tableau), Introduction to Biostatistics (STATA), Natural Language Processing (Python), Artificial Intelligence (Python), Data Management (SQL)

University of Wisconsin-Madison Dec 2022

B.S. in Consumer Behavior and Marketplace Studies GPA: 3.8/4.0

TECHNICAL SKILLS

● Programming: Python (Pandas, Scikit-Learn, PyTorch), SQL, R

● Data Tools: Tableau, Excel, Databricks, PySpark, Power BI, Git, LaTeX, SAS, GCP, AWS, Snowflake

● Data Science Methods: A/B Testing, Data Wrangling, Database Management, Data Visualization, Machine Learning

PROFESSIONAL EXPERIENCE

Data Science Specialist Dec 2023 - Present

Hospital Alma Mater/Weill Cornell Medicine New York, NY

● Gathered and cleaned clinical note data in SQL, merged 39.4K patient records from various specialties, and implemented de-identification processes to protect patient privacy, saving hospital staff 5,500+ hours of manual file handling

● Conducted EDA to define project scope; deployed large language models and Transformer-based models in Python to simplify clinical notes for patients with limited medical knowledge, improving communication efficiency

● Developed a validation prototype to assess the accuracy and informativeness of medical notes, rigorously reviewed and approved by Cornell faculty and medical professionals

● Collaborated closely with a cross-functional product and engineering team to conduct and analyze A/B tests evaluating operational efficiency improvements

Research Assistant Dec 2023 - Present

Weill Cornell Medicine New York, NY

● Collaborated closely with researchers from the Department of Population Health Sciences to conduct comprehensive literature reviews on mental health studies

● Used rule-based methods and an LLM (FLAN-T5) to extract demographic information from 1.17 GB of adolescent Reddit posts; identified mental health metrics such as social connectedness and social isolation using self-developed lexicons

● Conducted exploratory data analysis and created a word cloud of the most frequent words in the text data using Python visualization tools such as Matplotlib and Seaborn

Data Analyst Intern Apr - Jul 2023

Sina Weibo Beijing, CN

● Segmented Weibo text using the Python library Jieba, performed word frequency analysis, and engineered features such as sentiment, brand, celebrities, and product categories as model variables

● Employed an ensemble of FastText, BERT, and LSTM models in Python to generate features, improving model accuracy by 15%. Applied K-means clustering to categorize the top 50 quarterly hot trends from the ensemble results

● Designed and implemented ETL solutions for over 100 GB of data. Maintained data infrastructure, including building more than 10 data tables, and restructured the database, improving query efficiency by about 50%

● Built and tested a data pipeline and fully automated, interactive Tableau dashboards to identify trends in customer preferences and tailor ad hoc media posting strategies for top luxury clients such as Dior and Louis Vuitton

Business Analyst Intern Oct 2022 - Mar 2023

Beijing Zhongshang Huimin E-commerce Co. Ltd. Beijing, CN

● Wrote SQL queries to build data collection, storage, and processing infrastructure. Analyzed business problems and converted report data into actionable items

● Benchmarked machine learning models such as Random Forest, CNN, and SVM in Python to select the best model for two sales prediction problems, improving sales forecasts and reducing overstock by 23.7%

PROJECTS

Customer Churn Prediction Nov 2023 - Feb 2024

● Completed a comprehensive analysis for predicting user churn, including feature engineering and model optimization

● Processed data and visualized behavioral patterns between churned and active customers

● Implemented Random Forest, Boosting, and SVM models, tuned hyperparameters using random search and grid search, and optimized the best-performing model to achieve 95% accuracy and 84.2% recall
