Data Engineering Leader - ML-Pipeline Architect - 10+ Years Experience

Location:

Naperville, IL

Salary:

160000

Posted:

November 15, 2025


Resume:

Kunal Singh

P: 812-***-**** LinkedIn Email: ***********@*****.***

Summary:

• Data Engineering leader with 10+ years delivering transformational solutions. Deep expertise in Python, SQL, DBT, BigQuery, Palantir Foundry, and Spark, with proficiency in Power BI for business intelligence. Specialized in ML pipeline architecture and translating complex requirements into scalable technical solutions that drive measurable business impact.

Education:

Indiana University Bloomington, Bloomington, IN May 2024

Master of Science in Computer Science Specialization: Machine Learning GPA: 3.5

Work Experience:

Lead Data Engineer Tyson Foods, Springdale, AR Sep 2024 – Present

• Engineered ELT pipelines using DBT to create fact and dimension tables, enabling Power BI dashboards and analytics that delivered $22M in annual cost savings.

• Architected enterprise supply chain planning application in Palantir Foundry, delivering $100M cost savings through automated demand forecasting, inventory optimization, and procurement automation using Spark SQL, TypeScript Workshop applications, and dimensional data models.

• Built OCR-powered logistics system processing 120K+ vendor invoices annually with 99% accuracy using AIP Logic and Agentic AI, generating $1M annual savings through automated validation.

• Designed dimensional data models and star schemas enabling real-time and batch data integration with supply chain systems, logistics platforms, and enterprise applications using AWS Glue and Redshift, generating $1M quarterly operational savings.

• Implemented comprehensive data governance frameworks and quality validation pipelines using Python and AWS services (CloudWatch, IAM, Lake Formation), ensuring data lineage, policy compliance, and accessibility for investigative analytics and ML model training workflows.

Solution Architect & Data Manager Novaturient Tours and Travel Ltd, London, UK Jun 2020 – Jul 2022

• Architected end-to-end data lake infrastructure using PySpark and cloud-native technologies, processing terabytes of structured/unstructured data and driving $22M operational savings through advanced analytics.

• Designed dimensional data models and built Spark-based ETL pipelines enabling real-time sync/async data flows with supply chain systems, improving data processing efficiency and reducing latency by 35%.

• Built cross-functional relationships with Product Managers, Data Scientists, and analysts to gather requirements and translate business priorities into scalable technical solutions using Python and distributed computing frameworks.

Co-founder & Data Engineer Planet Nutrifit - Diet & Fitness Studio, India Apr 2010 – May 2020

• Architected scalable ML pipeline infrastructure using Python and Spark with model training/tuning capabilities, personalizing fitness plans and enhancing user engagement by 50% through real-time data processing.

• Designed web application architecture increasing traffic by 50% and search revenue by 40% through data-driven optimization, A/B testing, and Python-based analytics.

Software Engineer Infosys Technologies Ltd, India Dec 2006 – Mar 2010

• Architected and optimized large-scale ETL processes for financial systems using distributed processing with Spark, achieving 30% acceleration in data availability.

• Led cross-functional teams implementing real-time analytics infrastructure using Python and SQL, reducing processing time by 30% for high-volume financial data analysis.

Key Skills:

Technical Skills:

• Programming Languages: Python, PySpark, SQL, R, TypeScript.

• Big Data & Processing: Apache Spark, PySpark, AWS EMR, distributed computing, parallel processing.

• Data Pipeline Architecture: Airflow, Apache Kafka, AWS Glue, AWS Lambda, stream/batch processing.

• Data Infrastructure: AWS (S3, Redshift, Glue, Lake Formation), BigQuery, Data Lakes, Data Warehouses, DBT.

• Cloud Platforms: AWS (S3, EC2, Lambda, Redshift, EMR, Glue, CloudWatch, IAM, Lake Formation), Google Cloud (Dataproc, BigQuery).

• Data Modelling: Dimensional modelling (star/snowflake schemas), data vault, schema design for real-time/batch integration.

• ML & Analytics: ML pipeline development, feature engineering, model deployment, scikit-learn, pandas, NumPy.
