
Data Engineer

Location:
Trenton, NJ
Posted:
March 05, 2025


Resume:

Xiandong Peng Email: ********.*@*****.***

LinkedIn: linkedin.com/in/xiandong-xp/ Mobile: 484-***-****

Experience

KMK Consulting Inc. Morristown, NJ

Sr. Data Engineer July 2021 – Present

Architected and deployed a HIPAA-compliant data lake on AWS S3, integrating 15+ data sources such as Claims, Veeva, and IQVIA using Python and AWS Glue ETL jobs, reducing raw data ingestion latency by 91%, from 8 hours to 45 minutes

Built a DBT-driven transformation layer on Snowflake to model patient outcomes and adverse events from 10+ sources such as Veeva and EHRs, reducing report latency by 90%

Designed Airflow DAGs to automate end-to-end workflows, including AWS Lambda-triggered ingestion, DBT transformations, and Snowflake materialized view refreshes.

Leveraged AWS Lambda to automate data masking and zero-copy cloning in Snowflake, enabling secure, real-time executive dashboards with 99.9% SLA compliance for 50+ C-suite stakeholders.

Integrated AWS Lambda with SQL databases, AWS Redshift, and S3 to automate real-time data ingestion, transformation, and visualization for a client’s novel mental disorder drug

Led legacy system modernization for pharma clients, replacing SAS and Excel with cloud-native ETL pipelines built with Python and AWS Glue, reducing report generation time by 80% and manual errors by 98%

Replaced static Excel reports with dynamic Tableau dashboards featuring drill-down capabilities, real-time updates, and stakeholder-specific KPIs, cutting ad-hoc data requests by 75%

Designed and implemented CI/CD pipelines using AWS CDK and GitHub Actions to automate testing, validation, and deployment of ETL workflows for pharmaceutical datasets (IQVIA, Veeva CRM, clinical trial data), reducing deployment cycles by 43% and ensuring error-free releases for mission-critical analytics

Data Engineer June 2018 – July 2021

Delivered 98% accurate sales forecasts for pharmaceutical clients using exponential smoothing time series models and multivariate linear regression in R, enabling data-driven inventory planning and reducing supply chain costs by 15%

Built enterprise solutions emphasizing business metrics, data integrity, and availability, performed analytics, handled ad-hoc reporting requests, and engaged directly with leadership to recommend big data solutions using Python and SAS

Worked with multiple external and internal stakeholders and implementation teams to develop, enhance, and integrate analytics solutions and establish consultative and operational excellence for healthcare clients

Performed forecasting analysis for clients using Python (NumPy and Pandas) to implement exponential smoothing time series and regression models, improving overall accuracy by 33.3%

Bosch Rexroth Bethlehem, PA

Database Intern Jul 2017 – May 2018

Developed an application using C#, SQL, and ASP.NET providing enterprise solutions to manufacturing users

Contributed to a multi-product-line project, employing VBA and SQL to cut process time by 40% for supply chain, inventory, and production data

Optimized procedures for retrieving information from the data warehouse, shortening the time required to configure each product from 10 minutes to 30 seconds

Projects

• Machine Learning Project: Santander Bank Product Recommendation

Imputed missing data and transformed raw data for training models

Employed Logistic Regression and Random Forest using Python (Scikit-learn) and Apache Spark to recommend products to users based on a massive high-dimensional dataset

Achieved an 83% accuracy score on the test dataset with the logistic regression model, ranking among the top 25% of 1.7k teams

Skills Summary

• Database: Snowflake, MySQL, SQL Server, AWS Redshift, MongoDB, PostgreSQL, Hadoop

• Programming: Java, Python (Spark, Pandas), Scala, Bash, SAS, SQL, C#, C++, R

• Cloud Services: AWS (Redshift, Glue, Lambda, S3), GCP (BigQuery), Azure (Data Factory, Synapse Analytics)

• DevOps: Docker, Linux, Airflow, Git, Kafka, DBT, Heroku

• BI Tools: Tableau, PowerBI, AWS QuickSight, Excel

• Others: Data Modeling, Scrum Master, Business Intelligence

Education

Georgia Institute of Technology Atlanta, GA

Master of Science in Computer Science, in-major GPA: 3.6/4.0 Jan 2020 – May 2022

Courses: Software Architecture, Database, Software Development Process, Operating Systems, Distributed Systems, Artificial Intelligence

Lehigh University Bethlehem, PA

Master of Eng. in Healthcare Systems Engineering, GPA: 3.55/4.0 Sep 2016 – May 2018

Courses: Data Mining, Machine Learning, Optimization, Stochastic, Information Technology


