
Senior Data Engineer - 12+ Years in Data Platforms and ML Infra

Location:
Lawrenceville, GA
Posted:
March 12, 2026


Resume:

JEAN BAPTISTE MBAYA

Phone: +1-770-***-**** - Email: *********@*****.*** - LinkedIn: https://www.linkedin.com/in/jean-baptiste-mbaya-2833a963

Summary

Experienced Senior Data Engineer with over 12 years of expertise in designing, building, and managing scalable, enterprise-grade data platforms. Proven ability to lead the end-to-end delivery of robust data pipelines, data warehousing solutions, and machine learning infrastructure. Combines deep technical proficiency in modern data stacks with a strong focus on performance optimization, cost management, and driving business value through reliable data systems.

Project

Led an end-to-end machine learning project to predict and prevent drone failures. Responsibilities included data pipeline creation, feature engineering, model training, and developing a production-ready anomaly detection system.

Project Links

GitHub: https://github.com/MrAshTag/Drone_Faults_Anomaly_Dectection

Presentation: https://www.vimeo.com/602099549

Technical Proficiencies

Data Engineering & Languages: Python (PySpark, Pandas), SQL (Advanced), dbt, R, Scala, DAX, C#, JavaScript, Java

Data Platforms & Warehousing: Snowflake, Databricks, AWS (Redshift, S3, Athena), Azure Cloud, Apache Spark

Data Pipeline & ETL/ELT: Snowpipe, SnowSQL, Databricks Workflows, SSIS, Alteryx, Talend, Custom Python ETL modules

Data Modeling & Architecture: Data Vault 2.0, Dimensional Modeling, ERwin, ER/Studio

BI & Visualization: Power BI, Tableau (Server & Desktop)

AI/ML Engineering: OpenAI GPT frameworks, LLM Fine-tuning, NLP, scikit-learn, MLflow, ChatGPT, Llama

Methodologies & Tools: Agile (Scrum), Data Vault 2.0, Git, JIRA, ServiceNow, SharePoint

WORK EXPERIENCE

eCatalog Pricing Data Engineer, Thermo Fisher Scientific, Jul 2024 – Present

Architected and engineered scalable ETL pipelines using Python and SQL to consolidate and validate pricing data from disparate sources into the enterprise e-Catalog, improving data accuracy by ~25%.

Automated manual data validation and quality checks by implementing Python (Pandas, NumPy) scripts and machine learning models, reducing average processing time by 40%.

Designed and deployed predictive models (scikit-learn) within the data pipeline to forecast price elasticity and recommend optimal discounting strategies.

Optimized underlying SQL database structures and queries, significantly enhancing performance and reducing report generation latency.

Established automated monitoring and alerting systems for data quality and pricing outliers using Python.

Designed and deployed interactive Tableau and Power BI dashboards.

Tools: Python, SQL, Snowflake, Databricks, Power BI, JIRA, ServiceNow, Talend, dbt, Teams, Azure Cloud, Data Factory, Excel, Access

Senior Data Engineer, AT&T, Feb 2021 – Jun 2024

Architected and engineered a scalable data pipeline using Databricks, PySpark, and Python to automate the ingestion and transformation of multi-source data from ServiceNow and Jira, improving data availability for analytics by 40%.

Led the design and implementation of a Snowflake data warehouse, consolidating disparate datasets and implementing automated data ingestion via SnowSQL and Snowpipe, which improved data accessibility and reduced storage costs by 15%.

Fine-tuned and deployed large language models (LLMs) using OpenAI's frameworks to enhance NLP capabilities for customer interaction platforms and internal knowledge management systems.

Pioneered cost and performance optimization initiatives by identifying underutilized resources and legacy systems; architected a server decommissioning plan resulting in total savings of over $2 million.

Developed reusable Python modules in Jupyter notebooks to automate recurring data processes, slashing manual reporting time by 90%.

Engineered automated financial reports in Tableau, including waterfall charts to visualize profit and loss drivers and 12-month rolling reports to track revenue trends and forecast performance against budget.

Collaborated with data scientists to operationalize machine learning models, including a drone fault detection system, by building robust data pipelines for model training and inference.

Designed and deployed interactive Tableau and Power BI dashboards for diverse stakeholders (executives, operations, and marketing).

Tools: Python, PySpark, SQL, Snowflake, Databricks, AWS, Azure, Power BI, JIRA, ServiceNow, OpenAI API, Talend, dbt, AWS S3, AWS Glue, SharePoint, Teams, Data Factory, Excel

Data Engineer, Millennial R&D, Feb 2019 – Dec 2019

Designed and implemented logical and physical data models using ERwin to support new application features and reporting requirements.

Engineered a data integration solution by connecting a web application to a third-party API using Node.js, enabling real-time data consumption from Google Cloud data structures.

Developed and deployed interactive Power BI and Tableau dashboards to track key performance indicators (KPIs), providing actionable insights to stakeholders.

Queried and extracted data using SQL.

Performed data mining to identify patterns and trends.

Imported and merged data from multiple sources and formats into Tableau.

Analyzed large datasets from internal and external users in Tableau to provide actionable insights.

Performed data visualization and built interactive dashboards in Tableau.

Tools: ERwin, SQL, DAX, Power BI, Tableau, Alteryx, Node.js, AWS S3, Google Cloud Platform, Trello, Box, Excel, Pivot Tables, VLOOKUP

Data Analyst / Engineer, The Coca-Cola Company (Freestyle), Aug 2018 – Jan 2019

Designed and deployed interactive Tableau dashboards that informed strategic decision-making.

Built and optimized data pipelines using SQL and Python to merge disparate data sources from AWS Athena and Redshift, creating unified datasets for analysis.

Engineered Tableau dashboards to monitor machine performance and part failure rates, enabling predictive maintenance strategies for Freestyle fountain machines.

Analyzed A/B test results and customer segmentation data to provide data-driven recommendations that increased marketing campaign ROI by 15%.

Used Alteryx to merge web data from multiple sources, strip out tags, punctuation (colons, semicolons, dashes), and stray words, and consolidate the cleaned data into a single central table.

Used DAX functions in Power BI to build calculated columns.

Tools: PostgreSQL, SQL, Python, AWS (Athena, Redshift, S3), Tableau, Alteryx, Power BI, Excel, DBeaver

Data Analyst / Engineer, Millennial R&D, Jan 2015 – Nov 2017

Developed and maintained ETL processes using SSIS and C# scripts to cleanse and load data from structured and unstructured sources into SQL Server databases.

Utilized Alteryx for advanced data blending and preparation, creating centralized data tables for enterprise reporting.

Authored Python scripts with PySpark for large-scale data processing and analysis, supporting data science initiatives.

Modeled and documented database architectures using ERwin, and implemented security and refresh schedules on Tableau Server.

Tools: MS SQL Server, SSIS, T-SQL, C#, Python, Alteryx, Tableau, Power BI, ERwin, AWS S3

Key Competencies

o Complex and advanced SQL queries

o Data mining

o Model training

o Association rule mining

o Supervised and unsupervised machine learning

o Ranked 2nd at the International Collegiate Programming Contest, Kennesaw State University, Nov 11, 2017

o Tableau, SAS (big data visualization, interactive dashboards, and stories)

o Power Pivot (PivotTables, PivotCharts, DAX)

o Apache Spark, JavaScript, Android app development, Java, Python, C#, SQL, IBM Watson Analytics

o MS Visual Studio, Android SDK, BlueJ, DrJava, MySQL Workbench, MS Access, MS Excel

o Regression, forecasting methods, variance analysis, and significance testing in MS Excel and Tableau

o Excel for operations management applications in finance, marketing, capital budgeting, and production

o Demonstrated skills in financial processes and strong time management

o Handling telephone orders and queries and resolving customer complaints

o Strong organizational and time management skills; able to perform multiple tasks within limited time frames

o Excellent analytical and problem-solving skills

o Recognizes core business issues and their impact on reaching strategic objectives

o OpenAI

o ChatGPT

o LLMs

Education & Credentials

Master's in Data Science, Illinois Institute of Technology, Fall 2022

Bachelor's in Management Information Systems, minor in IT, Georgia Gwinnett College

Certificate in Data Science, Woz U (a certification program founded by Apple co-founder Steve Wozniak)

Other Training

o ETL and SSIS /Lynda.com

o Networking Foundations: Servers

o Advanced SQL for Data Scientists /Lynda.com

o Apache Cordova /Lynda.com

o Tableau 10 Essential Training (big data visualization, dashboards, and stories)

o Ethical Hacking: System Hacking /Lynda.com

o CSS Essential Training /Lynda.com

o Power Pivot 2013 & SharePoint 2013

o Apache Spark /Lynda.com

o Android App Development /Lynda.com

o Microsoft SQL Server 2016 /Lynda.com


