Xiandong Peng Email: ********.*@*****.***
LinkedIn: linkedin.com/in/xiandong-xp/ Mobile: 484-***-****
Experience
• KMK Consulting Inc. Morristown, NJ
Sr. Data Engineer July 2021 – current
Architected and deployed a HIPAA-compliant data lake on AWS S3, integrating 15+ data sources such as Claims, Veeva, and IQVIA using Python and AWS Glue ETL jobs, reducing raw data ingestion latency by 91%, from 8 hr to 45 min
Built a DBT-driven transformation layer on Snowflake to model patient outcomes and adverse events from 10+ sources such as Veeva and EHRs, reducing report latency by 90%
Designed Airflow DAGs to automate end-to-end workflows, including AWS Lambda-triggered ingestion, DBT transformations, and Snowflake materialized view refreshes
Leveraged AWS Lambda to automate data masking and zero-copy cloning in Snowflake, enabling secure, real-time executive dashboards with 99.9% SLA compliance for 50+ C-suite stakeholders
Integrated AWS Lambda with SQL databases, AWS Redshift, and S3 to automate real-time data ingestion, transformation, and visualization for a client’s novel mental disorder drug
Led legacy system modernization for pharma clients, replacing SAS and Excel with cloud-native ETL pipelines built with Python and AWS Glue, reducing report generation time by 80% and manual errors by 98%
Replaced static Excel reports with dynamic Tableau dashboards featuring drill-down capabilities, real-time updates, and stakeholder-specific KPIs, cutting ad-hoc data requests by 75%
Designed and implemented CI/CD pipelines using AWS CDK and GitHub Actions to automate testing, validation, and deployment of ETL workflows for pharmaceutical datasets (IQVIA, Veeva CRM, clinical trial data), reducing deployment cycles by 43% and ensuring error-free releases for mission-critical analytics
Data Engineer June 2018 – July 2021
Delivered 98% accurate sales forecasts for pharmaceutical clients using exponential smoothing time series models and multivariate linear regression in R, enabling data-driven inventory planning and reducing supply chain costs by 15%
Built enterprise solutions emphasizing business metrics, data integrity, and availability, performed analytics, handled ad-hoc reporting requests, and engaged directly with leadership to recommend big data solutions using Python and SAS
Worked with multiple external and internal stakeholders and implementation teams to develop, enhance, and integrate analytics solutions and establish consultative and operational excellence for healthcare clients
Performed forecasting analysis for clients using Python (NumPy and Pandas), implementing exponential smoothing time series and regression models that improved overall accuracy by 33.3%
• Bosch Rexroth Bethlehem, PA
Database Intern Jul 2017 – May 2018
Developed an application using C#, SQL, and ASP.NET providing enterprise solutions to manufacturing users
Contributed to a multi-product line project, employing VBA and SQL to cut processing time by 40% for supply chain, inventory, and production data
Enforced optimized procedures for retrieving information from the data warehouse, shortening the time required to configure each product from 10 min to 30 sec
Projects
• Machine Learning Project: Santander Bank Product Recommendation
Imputed missing data and transformed raw data for training models
Employed Logistic Regression and Random Forest using Python (Scikit-learn) and Apache Spark to recommend products to users based on a massive high-dimensional dataset
Achieved an 83% accuracy score on the test dataset with the logistic regression model, ranking in the top 25% of 1.7k teams
Skills Summary
• Database Snowflake, MySQL, SQL Server, AWS Redshift, MongoDB, PostgreSQL, Hadoop
• Programming Java, Python (Spark, Pandas), Scala, Bash, SAS, SQL, C#, C++, R
• Cloud Services AWS (Redshift, Glue, Lambda, S3), GCP (BigQuery), Azure (Data Factory, Synapse Analytics)
• DevOps Docker, Linux, Airflow, Git, Kafka, DBT, Heroku
• BI Tools Tableau, PowerBI, AWS QuickSight, Excel
• Others Data Modeling, Scrum Master, Business Intelligence
Education
• Georgia Institute of Technology Atlanta, GA
Master of Science in Computer Science, in-major GPA: 3.6/4.0 Jan 2020 – May 2022
Courses: Software Architecture, Database, Software Development Process, Operating Systems, Distributed Systems, Artificial Intelligence
• Lehigh University Bethlehem, PA
Master of Eng. in Healthcare Systems Engineering, GPA: 3.55/4.0 Sep 2016 – May 2018
Courses: Data Mining, Machine Learning, Optimization, Stochastic, Information Technology