XIAOYANG FEI
+1-540-***-**** ************@*****.*** Boston, MA, USA linkedin.com/in/shawnfei/
PROFESSIONAL EXPERIENCE
Doggo Onboard Adventures LLC Boston, MA, USA
Data Engineer July 2023 - Present
• Architected an event-driven platform with Airflow, Lambda, and S3, ingesting 100K+ records/day from 5 cruise partners with <5 min latency.
• Spearheaded real-time DAGs that achieved 99.9% SLA compliance; introduced CloudWatch alerts so anomalies are detected and resolved as they occur.
• Modeled analytical warehouse schemas in dbt + Redshift, improving dashboard query performance by 40%.
• Unified 8+ pipelines into a single lakehouse, saving 25+ hours/month of manual ETL work across teams.
• Championed CI/CD using GitHub Actions and Great Expectations, boosting test coverage on 60+ tables and reducing rollback incidents.
• Created 10+ Power BI dashboards with KPIs defined from cross-functional input, reducing stakeholder decision cycle time by 20%.
• Standardized external partner APIs and onboarding playbooks, cutting support tickets by 25% and improving external developer experience.
• Facilitated bi-weekly sprint planning and retrospectives with operations teams; mentored 2 interns and led onboarding sessions.
Doggo Onboard Adventures LLC Boston, MA, USA
Data Engineer Intern May 2023 - July 2023
• Developed robust Python/SQL ETL scripts to process 50K+ daily cruise/passenger records into Redshift, reviewed by senior engineers.
• Collaborated with product managers and finance stakeholders to integrate 4+ systems into a unified reporting layer.
• Implemented 20+ custom dbt tests to validate transformations, resolving anomalies before they affected downstream dashboards.
• Automated ingestion workflows with Airflow and Lambda, eliminating 90% of the data team's routine manual tasks.
• Partnered with DevOps to containerize data services using Docker, improving reproducibility and reducing deployment friction by 40%.
Virginia Tech Transportation Institute Blacksburg, VA, USA
Data Analyst January 2023 - May 2023
• Designed multimodal data pipeline (video, biometrics, GPS) using Python, OpenCV, and GCP Storage with <150ms latency.
• Devised an NLP pipeline using GCP Speech-to-Text and BERT to structure 300+ hours of dashcam footage, cutting annotation effort by 40%.
• Maintained 800GB+ of data in GCP, enforcing IAM controls and lifecycle rules; executed 50+ unit tests with Pytest and Great Expectations.
• Presented findings to Tesla's ADAS team, contributing to a 10% improvement in simulator response time.
EDUCATION
Northeastern University September 2023 - May 2025
Master's, Data Science
Virginia Tech August 2019 - May 2023
Bachelor's, Data Science
CERTIFICATIONS
IBM Data Engineering Professional Certificate
AWS Certified Data Engineer - Associate
MySQL 8.0 Database Admin Professional
MySQL 8.0 Database Developer, Oracle Certified Professional
SKILLS
Programming & Scripting: Python, SQL (MySQL, PostgreSQL, Redshift), R, Bash, Node.js, Scala
Data Engineering & Infrastructure: Airflow, dbt, Redshift, Apache Spark, Apache Kafka, Apache Hive, Great Expectations, Snowflake, MongoDB, Data Lake, PySpark, Hadoop, Databricks, Docker, Redis
Cloud Platforms & Data Warehousing: Amazon Web Services (S3, EC2, RDS, Lambda, CloudWatch, Redshift, SageMaker), Google Cloud Platform (Speech-to-Text, Cloud Storage, BigQuery, IAM, Monitoring, Pub/Sub), Azure (Data Factory, Synapse Analytics, Blob Storage, Monitor, Key Vault)
ETL & Pipelines: CI/CD for ETL, Real-time & Batch Data Processing, Data Quality Testing
Visualization & BI: Power BI, Tableau, Jupyter Notebooks, Jira, Postman