Data Analyst Machine Learning

Location:

San Diego, CA

Salary:

80000

Posted:

November 14, 2023


Resume:

Huanyu (Doris) Jiang

Email: ***************@*****.*** Phone: 858-***-**** Location: California (open to relocate)

EDUCATION

Columbia University - M.S. Data Science - GPA: 3.9, Sep 2021 – Dec 2022

UC San Diego - B.S. Applied Mathematics - Major GPA: 3.8, Sep 2017 – Mar 2021

SKILLS

Languages: Python [pandas, scikit-learn, NumPy, PyTorch] • R • SQL • Java/JS

Tools: AWS [SageMaker, S3, RDS, Athena, Glue, Lambda] • Azure • Tableau/Power BI • Excel

Knowledge Areas: ETL • Machine Learning • Database • Data Visualization • A/B Testing

WORK EXPERIENCE

Antra, Inc. San Diego, CA

Data Analyst, Jan 2023 – Present (6+ months)

Tech Stack: Azure, Power BI, Angular, JavaScript

● Create databases in Azure SQL Database; design schemas and maintain tables that support 20+ application components

● Implement a single-page application in Angular for a wide range of clients, with 100% on-time delivery while staying 5% under budget

● Build and deploy data models to production, raising the user retention rate by 60%

● Visualize data analysis results in Power BI and explain business logic to stakeholders

● Use Git for version control and Jira for team collaboration in an Agile environment

Amazon.com Services LLC Seattle, WA

Data Scientist Intern, May 2022 – Aug 2022 (3 months)

Tech Stack: Python, AWS, Tableau, NLP

● Performed topic modeling with LDA and BERTopic (PyTorch, scikit-learn) on 110,000+ employee survey responses, achieving a DBCV score of 0.4, and presented the document-clustering results to stakeholders

● Designed and deployed to production an end-to-end, multi-model training pipeline for a multi-label classification task, using AWS Athena, S3, and SageMaker

● Reduced manual labeling workload by 95% by optimizing the training pipeline, fine-tuning a sentence-transformer model, and tuning two clustering models (HDBSCAN and k-means), achieving over 90% precision on key categories

● Visualized results in Tableau and collaborated with business stakeholders, providing topic descriptions of employees' major pain points during onboarding; maintained a data dashboard to track ongoing model improvements

Siemens Ltd. China Beijing, China

Data Analyst, Mar 2021 – Aug 2021 (6 months)

Tech Stack: Python, Hadoop, MySQL, Spark, Tableau

● Designed an ETL pipeline for large-scale data with Hadoop and engineered machine learning features in Python

● Wrote complex SQL queries to explore data in MySQL and explained data logic to stakeholders

● Built a fraud-detection training pipeline for an XGBoost model with the Spark MLlib library, reaching 85% recall and a 70% reduction in misclassification rate, which improved customer trust and overall system performance

● Created interactive Tableau visualizations to present classification results and communicate loss-prevention strategies to stakeholders

HK Applied Science & Technology Research Institute Hong Kong

AWS Data Engineer Intern, Jun 2019 – Aug 2019 (3 months)

Tech Stack: Python, AWS

● Collaborated with cross-functional teams to understand product requirements and translate them into data engineering specifications

● Designed and implemented an end-to-end data pipeline using AWS Glue and Lambda

● Leveraged AWS S3 to optimize data storage/retrieval, ensuring efficient access for predictive analysis algorithms
