
Data Engineer Scientist

Location:
Hona, South Kivu, Democratic Republic of the Congo
Posted:
September 15, 2025


Resume:

Xiangyue Wang in Highland Park, NJ, United States • ********.******@*******.*** • 520-***-****

Technical Skills

Business, Salesforce

Education

Johns Hopkins University Aug 2023 - Jul 2024

MASTER'S in Business Analytics • GPA: 3.40

University of Arizona Aug 2019 - May 2023

BACHELOR'S in Economics • GPA: 3.90

Work Experience

Zemantics Princeton, NJ Nov 2024 - Present

AI Data Engineer/Data Scientist

• Designed and deployed a Retrieval-Augmented Generation AI assistant by integrating LangChain with internal HR documents, enabling accurate answers to policy queries and providing efficient user support and training

• Built and automated ELT pipelines using Databricks, standardizing weekly updates from employee and sales datasets to ensure clean, structured inputs for business analytics and improving system efficiency

• Developed and deployed FastAPI endpoints to serve AI responses, enabling seamless integration with internal platforms and supporting real-time data access for business stakeholders

• Implemented intelligent query routing logic that classifies incoming questions, improving answer precision through analytical problem solving and enhancing system performance

• Collaborated with DevOps and front-end teams to deploy solutions, monitored system integrations, and troubleshot issues while managing multiple priorities in a dynamic environment

WorkMagic Remote Sep 2024 - Oct 2024

AI Model Fine-Tuning Intern

• Leveraged OpenAI GPT API with Python on GCP to conduct large-scale data labeling, applying analytical problem solving to compare datasets and inform model adjustments for improved efficiency

• Fine-tuned an in-house LLM by systematically experimenting with prompt engineering, demonstrating ability to manage multiple priorities in a dynamic environment while optimizing system performance

• Designed and executed iterative training workflows, logging performance metrics to evaluate and provide insights for business requirements and system improvements

• Collaborated closely with stakeholders and engineering team to improve system capabilities, utilizing strong communication skills to troubleshoot issues and ensure successful deployment

Microsoft Remote Dec 2022 - Feb 2023

Part-time Assistant

• Gathered and cleaned sales data from a newly acquired company in SQL, conducted EDA to define project scope, applied analytical problem solving, created data visualizations, and built and tested XGBoost and K-means models in Python to identify popular products, with over 80% AUC-ROC

• Collaborated with stakeholders to gather business requirements, conducted exploratory data analysis to understand patterns and trends in product popularity, main sales regions and valuable user information over time using Python data visualization tools

• Performed SQL queries to establish data collection, storage, and processing infrastructure, improving system efficiency by analyzing popular products, main sales regions, and high-quality user information, and transforming report data into actionable insights

• Collaborated closely with cross-functional product and engineering teams to derive valuable business insights, provided training and communication support to ensure adoption, and developed focused marketing strategies through sales data analysis

Institute of Computing Technology Remote Jul 2022 - Sep 2022

Data Analyst Intern

• Led two interns and managed the team to conduct analytical problem solving through statistical analysis, demonstrating excellent communication skills and ability to provide training to team members while efficiently managing multiple priorities

• Created 10+ data visualizations using R to effectively communicate business insights, collaborating with stakeholders to gather requirements and provide recommendations for improved efficiency in data representation

• Developed and tested a data pipeline, creating fully automated and interactive Tableau dashboards to monitor system performance, troubleshoot issues, and enable effective business KPI metrics tracking for stakeholders

Projects

Customer Churn Prediction

• Conducted end-to-end churn prediction analysis, collaborating with business stakeholders to gather requirements and provide analytical problem-solving solutions

• Processed data and visualized behavior patterns between churned and active customers, providing insights and recommendations to improve business efficiency

• Employed Random Forest, Boosting, and SVM models in Salesforce, fine-tuning parameters through random and grid search, achieving a final model with 95% accuracy and an 84.2% recall rate

• Communicated complex analytical findings to multiple stakeholders and provided training on implementing data-driven solutions

Electricity Usage Forecasting Project

• Led a team to forecast electricity usage by splitting the dataset to reserve the last year for validation, demonstrating strong analytical problem-solving skills and effective communication with business stakeholders

• Utilized Python to apply differencing and Box-Cox transformation, stabilizing the mean and achieving homoscedasticity, resulting in improved efficiency and system performance

• Analyzed ACF and PACF plots to determine the most suitable forecasting model, providing insights and recommendations that improved business decision-making processes

• Compared multiple models using AIC, BIC, and Lack of Fit Test, selecting the optimal model and achieving a MAPE of 1.21% when comparing the forecast to actual data for the last year, showcasing ability to manage multiple priorities in a dynamic environment


