JEAN BAPTISTE MBAYA
Phone: +1-770-***-**** - Email: *********@*****.*** - LinkedIn: https://www.linkedin.com/in/jean-baptiste-mbaya-2833a963
Summary
Experienced Senior Data Engineer with over 12 years of expertise in designing, building, and managing scalable, enterprise-grade data platforms. Proven ability to lead the end-to-end delivery of robust data pipelines, data warehousing solutions, and machine learning infrastructure. Combines deep technical proficiency in modern data stacks with a strong focus on performance optimization, cost management, and driving business value through reliable data systems.
Project
Led an end-to-end machine learning project to predict and prevent drone failures. Responsibilities included data pipeline creation, feature engineering, model training, and developing a production-ready anomaly detection system.
Project Links
GitHub: https://github.com/MrAshTag/Drone_Faults_Anomaly_Dectection
Presentation: https://www.vimeo.com/602099549
Technical Proficiencies
Data Engineering & Languages: Python (PySpark, Pandas), SQL (Advanced), dbt, R, Scala, DAX, C#, JavaScript, Java
Data Platforms & Warehousing: Snowflake, Databricks, AWS (Redshift, S3, Athena), Azure Cloud, Apache Spark
Data Pipeline & ETL/ELT: Snowpipe, SnowSQL, Databricks Workflows, SSIS, Alteryx, Talend, Custom Python ETL modules
Data Modeling & Architecture: Data Vault 2.0, Dimensional Modeling, ERwin, ER/Studio
BI & Visualization: Power BI, Tableau (Server & Desktop)
AI/ML Engineering: OpenAI GPT frameworks, LLM Fine-tuning, NLP, Scikit-learn, MLflow, ChatGPT, Llama
Methodologies & Tools: Agile (Scrum), Data Vault 2.0, Git, JIRA, ServiceNow, SharePoint
WORK EXPERIENCE
e-Catalog Pricing Data Engineer Thermo Fisher Scientific Jul 2024 – Present
Architected and engineered scalable ETL pipelines using Python and SQL to consolidate and validate pricing data from disparate sources into the enterprise e-Catalog, improving data accuracy by ~25%.
Automated manual data validation and quality checks by implementing Python (Pandas, NumPy) scripts and machine learning models, reducing average processing time by 40%.
Designed and deployed predictive models (scikit-learn) within the data pipeline to forecast price elasticity and recommend optimal discounting strategies.
Optimized underlying SQL database structures and queries, significantly enhancing performance and reducing report generation latency.
Established automated monitoring and alerting systems for data quality and pricing outliers using Python.
Designed and deployed interactive Tableau and Power BI dashboards.
Tools: Python, SQL, Snowflake, Databricks, Power BI, JIRA, ServiceNow, Talend, dbt, Teams, Azure Cloud, Data Factory, Excel, Access
Senior Data Engineer AT&T Feb 2021 – Jun 2024
Architected and engineered a scalable data pipeline using Databricks, PySpark, and Python to automate the ingestion and transformation of multi-source data from ServiceNow and Jira, improving data availability for analytics by 40%.
Led the design and implementation of a Snowflake data warehouse, consolidating disparate datasets and implementing automated data ingestion via SnowSQL and Snowpipe, which improved data accessibility and reduced storage costs by 15%.
Fine-tuned and deployed large language models (LLMs) using OpenAI's frameworks to enhance NLP capabilities for customer interaction platforms and internal knowledge management systems.
Pioneered cost and performance optimization initiatives by identifying underutilized resources and legacy systems; architected a server decommissioning plan resulting in total savings of over $2 million.
Developed reusable Python modules in Jupyter notebooks to automate recurring data processes, slashing manual reporting time by 90%.
Engineered automated financial reports in Tableau, including Waterfall charts to visualize profit and loss drivers and 12-month rolling reports to track revenue trends and forecast performance against budget.
Collaborated with data scientists to operationalize machine learning models, including a drone fault detection system, by building robust data pipelines for model training and inference.
Designed and deployed interactive Tableau and Power BI dashboards for diverse stakeholders (executives, operations, and marketing).
Tools: Python, PySpark, SQL, Snowflake, Databricks, AWS (S3, Glue), Azure Cloud, Data Factory, Power BI, JIRA, ServiceNow, OpenAI API, Talend, dbt, SharePoint, Teams, Excel
Data Engineer Millennial R&D Feb 2019 – Dec 2019
Designed and implemented logical and physical data models using ERwin to support new application features and reporting requirements.
Engineered a data integration solution by connecting a web application to a third-party API using Node.js, enabling real-time data consumption from Google Cloud data structures.
Developed and deployed interactive Power BI and Tableau dashboards to track key performance indicators (KPIs), providing actionable insights to stakeholders.
Queried and collected data using SQL.
Performed data mining to identify patterns and trends.
Imported and merged data from different sources and different formats into Tableau.
Analyzed big data from internal and external users with Tableau to provide actionable insight.
Performed data visualization and built interactive dashboards in Tableau.
Tools: ERwin, SQL, DAX, Power BI, Tableau, Alteryx, Node.js, AWS S3, Google Cloud Platform, Trello, Box, Excel, Pivot Tables, VLOOKUP
Data Analyst / Engineer The Coca-Cola Company - Freestyle Aug 2018 – Jan 2019
Designed and deployed interactive Tableau dashboards that informed strategic decision-making.
Built and optimized data pipelines using SQL and Python to merge disparate data sources from AWS Athena and Redshift, creating unified datasets for analysis.
Engineered Tableau dashboards to monitor machine performance and part failure rates, enabling predictive maintenance strategies for Freestyle fountain machines.
Analyzed A/B test results and customer segmentation data to provide data-driven recommendations that increased marketing campaign ROI by 15%.
Used Alteryx to merge web data from different sources, cleaned out tags, colons, semicolons, dashes, and stray words, and organized the result into one central table.
Used DAX functions in Power BI to compute data in calculated columns.
Tools: Postgres, SQL, Python, AWS (Athena, Redshift, S3), Tableau, Alteryx, Power BI, Excel, DBeaver
Data Analyst / Engineer Millennial R&D Jan 2015 – Nov 2017
Developed and maintained ETL processes using SSIS and C# scripts to cleanse and load data from structured and unstructured sources into SQL Server databases.
Utilized Alteryx for advanced data blending and preparation, creating centralized data tables for enterprise reporting.
Authored Python scripts with PySpark for large-scale data processing and analysis, supporting data science initiatives.
Modeled and documented database architectures using ERwin and implemented security and refresh schedules on Tableau Server.
Tools: MS SQL Server, SSIS, T-SQL, C#, Python, Alteryx, Tableau, Power BI, ERwin, AWS S3
Key competencies
o Complex and advanced SQL queries
o Data Mining
o Model Training
o Association Rule
o Supervised & unsupervised Machine Learning
o Ranked 2nd at the International Collegiate Programming Contest, Kennesaw University, Nov 11, 2017.
o Tableau, SAS (Big data visualization, interactive dashboards, and story)
o Power Pivot (Pivot tables, Pivot charts, DAX)
o Apache Spark, JavaScript, Android App Dev, Java, Python, C#, SQL, IBM Watson Analytics.
o MS Visual Studio, Android SDK, BlueJ, DrJava, MySQL Workbench, MS Access, MS Excel.
o Regressions, forecasting methods, variances, and level of significance in MS Excel and Tableau.
o Excel for Operations Management application in Finance, Marketing, Capital Budgeting and Production
o Demonstrable skills in financial processes and strong time management
o Taking telephone orders and queries, handling customer complaints
o Superb organizational and time management skills to perform multiple tasks within limited time frames.
o Excellent analytical and problem-solving skills
o Recognizes core business issues and their impact on reaching strategic objectives.
o OpenAI
o ChatGPT
o LLM
Education & credentials
Master's in Data Science, Illinois Institute of Technology, Fall 2022
Bachelor's in Management Information Systems, minor in IT, Georgia Gwinnett College
Certificate in Data Science, Woz U (a certification program founded by Steve Wozniak of Apple)
Other Trainings
o ETL and SSIS /Lynda.com
o Networking Foundations: Servers
o Advanced SQL for Data Scientists /Lynda.com
o Apache Cordova /Lynda.com
o Tableau 10 Essential Training (Big data visualization, Dashboard and Story) /Lynda.com
o Ethical Hacking: System Hacking
o CSS Essential Training /Lynda.com
o Power Pivot 2013 & SharePoint 2013
o Apache Spark /Lynda.com
o Android App Development /Lynda.com
o Microsoft SQL Server 2016 /Lynda.com