Data Science Analytics

Location:
Philadelphia, PA
Posted:
May 10, 2024

Resume:

Jia Xu

+1-267-***-**** ad5mlx@r.postjobfree.com GitHub: github.com/ChiaPatricia Website: luckyjiaxu.com

EDUCATION

University of Pennsylvania - Philadelphia, PA

M.S. in Social Policy and Data Analytics, GPA: 3.86/4.0 (2021-2022 Scholarship), Aug. 2021 – May 2023

Nanjing University of Finance and Economics - Nanjing, China

B.S. in Mathematics and Applied Mathematics, GPA: 3.76/4.0, Valedictorian, Summa Cum Laude, Sep. 2017 – Jun. 2021

TECHNICAL SKILLS

Programming Languages: Python (Pandas, PyTorch, TensorFlow), R, SQL, MATLAB, LaTeX

Data Science Tools: Git, Spark, SQLite, AutoML, Linux, ITK-SNAP, ArcGIS, Power BI, Tableau

Cloud & Platforms: OpenAI API, Google Cloud Platform (GCP), AWS EC2, Kafka, ESRI

Certifications: IBM Professional Certificate in Data Science

Groups: Queer in AI, Society of Women Engineers, Girls Who Code

PROFESSIONAL EXPERIENCE

Research Scientist, University of Pennsylvania Health System, DBEI - Philadelphia, PA, Dec. 2021 – May 2024

• Led 5 interdisciplinary research projects to develop AI solutions for disease detection, healthcare LLMs, and medical database analysis, resulting in 5 first-author publications in top conferences and journals (NeurIPS, EMNLP, AMIA & AAIC).

• Developed 8 machine learning and deep learning pipelines to optimize model performance, streamlining large-scale data processing, feature selection, hyperparameter tuning, and model evaluation.

• Collaborated with 20+ healthcare stakeholders and experts to build an NLP-based web portal for drug term extraction.

Research Assistant, Political Science Department, University of Pennsylvania - Philadelphia, PA, Feb. 2023 – Jul. 2023

• Assessed media censorship trends in Venezuela from 2006 to 2009 using text mining and TF-IDF-based similarity analyses between and within newscast and newspaper corpora.

• Deployed a GCP environment to connect to database buckets, resolving issues with operating system inconsistencies and batch file processing using the Cloud SDK.

• Configured and tuned the Speech-to-Text API and the Whisper model on GCP to transcribe large volumes of audio data, increasing transcription efficiency by 15%.

Technical Consultant, Wharton AI & Analytics for Business - Philadelphia, PA, Sep. 2022 – Apr. 2023

• Applied 3 regression and time series forecasting models with Python and MySQL to predict show success and future ratings for FOX Entertainment programs.

• Pioneered the SKU Reverse Recommendation Engine for Master Kong Company by processing enterprise-level data using Spark, implementing 6 models for sales prediction, and assessing the price elasticity of original product combinations.

• Created 10+ detailed reports and presentation materials, including code deliverables and technical documentation.

Data Analyst Intern, KPMG - Shanghai, China, Apr. 2021 – Jun. 2021

• Streamlined 4 data pipelines in Python for audit data analysis, reducing human errors by 30% and saving 16 hours/week.

• Automated weekly dashboards with SQL Server, Python, and Power BI to analyze and track 20+ key financial KPIs.

SELECTED RESEARCH PROJECTS

MentalGPT: Harnessing AI for Compassionate Mental Health Support (AAIC accepted, AMIA & EMNLP under review)

• Fine-tuned LLaMA2, Mistral, and other SOTA 7B models with the QLoRA method on GPT-generated data and real-life interview data to provide mental health advice, outperforming all base models and GPT-4 by 30%+ on 7 evaluation metrics.

• Developed a benchmark dataset tailored for mental health applications, featuring simulations and scenarios from professional counseling examinations, to facilitate performance evaluation of models like FastChat, Gemini, and GPT-4.

MoodShaker: Blend your mood – Custom Cocktails at a Click! (2nd Place, Wharton Hack-AI-Thon, Technical Track)

• Developed MoodShaker, an AI platform transforming cocktail discovery with personalized recipes tailored to users' moods and tastes, using OpenAI's GPT-4, Gradio, and Soundraw, hosted on Hugging Face Spaces for a novel experience.

• Delivered 600+ customized cocktail recommendations for users within the first week of launch.

Trustworthy AI in Healthcare (NeurIPS 2023 Main Track)

• Developed a framework to reduce correlation disparities linked to protected attributes in canonical correlation analysis (CCA), enhancing fairness in healthcare decision-making and mitigating discriminatory outcomes.

• Maintained global projection matrices in CCA, aligning correlation levels with group-specific matrices, effectively mitigating unfairness without sacrificing accuracy as validated on synthetic and real datasets (NHANES, ADNI, MHAAPS).

NHANES Database Analysis

• Screened 168 valid exposomes and phenotypes from 1,192 variables and handled missing data with the MICE algorithm.

• Identified key factors affecting phenotypic age using 10 machine learning regression models with cross-validation.

• Conducted mediation analysis to uncover causal links between environmental exposures and the gap between phenotypic and chronological age, incorporating biological variables and visualizing results using D3 libraries.

• Applied Fair CCA to analyze the interplay of phenotypic, environmental, and socio-economic factors on health outcomes.

PUBLICATIONS

5 first author (NeurIPS, AAIC, AMIA & Science); 2 co-author; 3 conference pre; 3 first author under review (Nature, EMNLP, AMIA)

SELECTED AWARDS

2nd Place, Wharton Hack-AI-Thon (Technical Track, Leader, MoodShaker – Cocktail AI Platform), Apr. 2024

1st Place, Massive Data Institute Green Space Data Challenge (Leader, Community Health & Student Prize), Mar. 2023

Finalist, 2022 Wharton People Analytics Global Student Case Competition (Great Resignation), May 2022

1st Place Prize, Wharton Customer Analytics and Fulton Bank Datathon 2022 (Customer Default Prediction), Apr. 2022

National Second Prize, China Mathematical Contest in Modeling (Team Leader, top 3% of 41,826 teams), Nov. 2020

Finalist, Interdisciplinary Contest in Modeling (Team Leader, top 1% of 20,954 teams), Apr. 2020


