XIAOYANG FEI
+1-540-***-**** ************@*****.*** Boston, MA, USA linkedin.com/in/shawnfei/
PROFESSIONAL EXPERIENCE
Doggo Onboard Adventures LLC Boston, MA, USA
Data Engineer September 2023 - Present
• Led the design and implementation of a modular, event-driven data architecture with Airflow, AWS Lambda, and S3, automating travel data processing and powering 3 analytics products used in executive planning.
• Architected and governed a scalable data modeling layer using DBT and Redshift, turning fragmented data into reliable tables that powered self-serve dashboards across teams and sped up decision-making.
• Established a data quality assurance strategy using Great Expectations, integrated into GitHub Actions CI/CD pipelines and monitored through AWS CloudWatch, ensuring 100% reliability across production datasets and reducing time-to-detection of pipeline issues.
• Acted as a bridge between engineering and business, partnering with analytics and revenue teams to define key data domains, enabling segmentation strategies, demand forecasting, and optimization of underperforming cruise routes.
• Spearheaded the internal push for data observability and incident readiness, reducing pipeline downtime by 45% and enhancing stakeholder trust through transparency and accountability metrics.
Doggo Onboard Adventures LLC Boston, MA, USA
Data Engineer Intern July 2023 - September 2023
• Wrote and optimized MySQL queries to clean and transform raw cruise and passenger data for reporting and analysis in Amazon Redshift.
• Consolidated three disparate data sources into a unified data lake using Python scripting, streamlining data access for the Business Intelligence team and reducing data silos across the organization.
• Validated the accuracy of DBT-generated analytics tables by developing a suite of 20+ automated SQL-based tests, reducing data discrepancies in customer segmentation by 15%.
• Configured 15+ Airflow DAGs and AWS Lambda functions for daily and weekly data processing, ensuring 99.99% data availability and enabling real-time decision-making for stakeholders across three departments.
Yiming Technology Beijing, China
Machine Learning Engineer Intern May 2023 - July 2023
• Developed a predictive model for the company’s coffee roaster, optimizing roasting parameters using real-time IoT sensor data to control temperature, wind speed, and other factors for optimal flavor, consistency, and quality.
• Increased coffee roasting consistency by 34% and cupping scores (evaluated by professional coffee tasters) by 26.7% compared to human operation by deploying Temporal Fusion Transformer models for real-time quality control.
• Deployed models on AWS (S3, EC2, RDS, SageMaker, Lambda) with CI/CD pipelines, enabling seamless retraining, efficient resource management, and scalable deployment for continuous optimization and performance improvement.
• Forged collaborative relationships with engineers and product managers, crafting API designs and comprehensive documentation that supported 10+ successful integrations within six months.
Virginia Tech Transportation Institute Blacksburg, VA, USA
Data Scientist January 2023 - May 2023
• Constructed an AI-driven driver emotion recognition system, decreasing false positives by 22% through real-time analysis correlating crash reports with driver affect and surroundings.
• Integrated CNN-LSTM architecture within a real-time emotion detection model, processing 30 frames per second with a latency of less than 100ms, enabling immediate feedback and intervention strategies for distracted drivers.
• Delivered a data-backed presentation to Tesla using PowerBI, showcasing a 15% improvement in driver reaction time due to the model and leading to exploration of model integration in driver safety protocols.
EDUCATION
Northeastern University September 2023 - May 2025
Master's, Data Science
Virginia Tech August 2019 - May 2023
Bachelor's, Data Science
CERTIFICATIONS
AWS Certified Data Engineer - Associate
Google Cloud Professional Data Engineer
SnowPro® Advanced: Data Engineer
IBM Data Engineering Professional Certificate
Databricks Certified Data Engineer Associate
Terraform Associate
MySQL 8.0 Database Admin Professional
MySQL 8.0 Database Developer Oracle Certified Professional
SKILLS
Languages & Frameworks: Python (Pandas, NumPy, Scikit-learn, TensorFlow), SQL (MySQL, PostgreSQL, Redshift), R, Node.js, Bash
Data Engineering & Infrastructure: Airflow (ETL orchestration), DBT (data modeling & testing), Great Expectations, Redis, Pinecone, Data Lakes, Data Warehousing (Redshift, Snowflake), MongoDB, PostgreSQL
Machine Learning & Modeling: Supervised & Unsupervised Learning, Clustering, Regression, Classification, A/B Testing, Time Series Forecasting, Real-time Inference, Sensor Fusion, Deep Learning (CNNs, LSTMs, Transformers)
Cloud Platforms & MLOps: AWS (S3, EC2, RDS, SageMaker, Lambda, CloudWatch), Docker, API Deployment, GCP, Azure
Visualization & Tools: PowerBI, Jupyter, Git, Postman, JIRA