This is a concise version of my resume. To obtain my references or a full version, please email me at *********@*****.*** or call/text 770-***-****.
JEAN BAPTISTE MBAYA
Phone: +1-770-***-**** - Email: *********@*****.*** - LinkedIn: https://www.linkedin.com/in/jean-baptiste-mbaya-2833a963
Summary
Experienced Senior Data Engineer with over 11 years of expertise in designing, building, and managing scalable, enterprise-grade data platforms. Proven ability to lead the end-to-end delivery of robust data pipelines, data warehousing solutions, and machine learning infrastructure. Combines deep technical proficiency in modern data stacks with a strong focus on performance optimization, cost management, and driving business value through reliable data systems.
Project
Led an end-to-end machine learning project to predict and prevent drone failures. Responsibilities included data pipeline creation, feature engineering, model training, and developing a production-ready anomaly detection system.
Project Links
GitHub: https://github.com/MrAshTag/Drone_Faults_Anomaly_Dectection
Presentation: https://www.vimeo.com/602099549
Technical Proficiencies
• Data Engineering & Languages: Python (PySpark, Pandas), SQL (Advanced), dbt, R, Scala, DAX, C#, JavaScript
• Data Platforms & Warehousing: Snowflake, Databricks, AWS (Redshift, S3, Athena), Azure Cloud, Apache Spark
• Data Pipeline & ETL/ELT: Snowpipe, SnowSQL, Databricks Workflows, SSIS, Alteryx, Talend, Custom Python ETL modules
• Data Modeling & Architecture: Data Vault 2.0, Dimensional Modeling, ERwin, ER/Studio
• BI & Visualization: Power BI, Tableau (Server & Desktop)
• AI/ML Engineering: OpenAI GPT frameworks, LLM Fine-tuning, NLP, Scikit-learn, MLflow, ChatGPT, Llama
• Methodologies & Tools: Agile (Scrum), Data Vault 2.0, Git, JIRA, ServiceNow, SharePoint
WORK EXPERIENCE
eCatalog Pricing Data Engineer, Thermo Fisher Scientific, Jul 2024 – Present
• Architected and engineered scalable ETL pipelines using Python and SQL to consolidate and validate pricing data from disparate sources into the enterprise e-Catalog, improving data accuracy by ~25%.
• Automated manual data validation and quality checks by implementing Python (Pandas, NumPy) scripts and machine learning models, reducing average processing time by 40%.
• Designed and deployed predictive models (scikit-learn) within the data pipeline to forecast price elasticity and recommend optimal discounting strategies.
• Optimized underlying SQL database structures and queries, significantly enhancing performance and reducing report generation latency.
• Established automated monitoring and alerting systems for data quality and pricing outliers using Python.
Tools: Python, PySpark, SQL, Snowflake, Databricks, AWS (S3, Glue), Azure (Data Factory), Power BI, JIRA, ServiceNow, OpenAI API, Talend, dbt, SharePoint, Teams, Excel
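As an illustrative sketch of the automated data-quality checks described in this role (field names and the outlier threshold are hypothetical, not the production rules):

```python
# Hypothetical sketch of an automated pricing data-quality check.
# Field names ("price") and the 10x-mean outlier rule are made-up
# illustrations, not the actual production logic.

def validate_pricing_records(records):
    """Flag records with missing, non-positive, or outlier prices."""
    issues = []
    prices = [r["price"] for r in records
              if isinstance(r.get("price"), (int, float))]
    mean = sum(prices) / len(prices) if prices else 0
    for i, r in enumerate(records):
        price = r.get("price")
        if not isinstance(price, (int, float)):
            issues.append((i, "missing or non-numeric price"))
        elif price <= 0:
            issues.append((i, "non-positive price"))
        elif mean and price > 10 * mean:  # crude outlier rule for illustration
            issues.append((i, "price outlier"))
    return issues
```

A check like this can run on each batch before load, with any returned issues routed to an alerting channel.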
Senior Data Engineer, AT&T, Feb 2021 – Jun 2024
• Architected and engineered a scalable data pipeline using Databricks, PySpark, and Python to automate the ingestion and transformation of multi-source data from ServiceNow and Jira, improving data availability for analytics by 40%.
• Led the design and implementation of a Snowflake data warehouse, consolidating disparate datasets and implementing automated data ingestion via SnowSQL and Snowpipe, which improved data accessibility and reduced storage costs by 15%.
• Fine-tuned and deployed large language models (LLMs) using OpenAI's frameworks to enhance NLP capabilities for customer interaction platforms and internal knowledge management systems.
• Pioneered cost and performance optimization initiatives by identifying underutilized resources and legacy systems; architected a server decommissioning plan resulting in total savings of over $2 million.
• Developed reusable Python modules in Jupyter notebooks to automate recurring data processes, slashing manual reporting time by 90%.
• Collaborated with data scientists to operationalize machine learning models, including a drone fault detection system, by building robust data pipelines for model training and inference.
Tools: Python, PySpark, SQL, Snowflake, Databricks, AWS (S3, Glue), Azure (Data Factory), Power BI, JIRA, ServiceNow, OpenAI API, Talend, dbt, SharePoint, Teams, Excel
Data Engineer, Millennial R&D, Feb 2019 – Dec 2019
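A minimal sketch in the spirit of the fault-detection work mentioned in this role — a z-score anomaly detector over a telemetry series. The threshold and the pure-Python form are illustrative assumptions, not the deployed system:

```python
# Minimal z-score anomaly detector sketch. The 3-sigma threshold is a
# common rule of thumb chosen here for illustration only.
import statistics

def detect_anomalies(readings, threshold=3.0):
    """Return indices of readings more than `threshold` standard
    deviations from the mean of the series."""
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(readings)
            if abs(x - mean) / stdev > threshold]
```

In a pipeline, a detector like this would sit downstream of the ingestion step, scoring each sensor channel per batch.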
• Designed and implemented logical and physical data models using ERwin to support new application features and reporting requirements.
• Engineered a data integration solution by connecting a web application to a third-party API using Node.js, enabling real-time data consumption from Google Cloud data structures.
• Developed and deployed interactive Power BI and Tableau dashboards to track key performance indicators (KPIs), providing actionable insights to stakeholders.
• Collected and curated data with SQL queries against operational databases.
• Performed data mining to surface patterns and trends in operational data.
• Imported and merged data from multiple sources and formats into Tableau.
• Analyzed large datasets from internal and external users in Tableau to provide actionable insights.
• Performed data visualization and built interactive dashboards in Tableau.
Tools: ERwin, SQL, DAX, Power BI, Tableau, Alteryx, Node.js, AWS S3, Google Cloud Platform, Trello, Box, Excel (Pivot Tables, VLOOKUP)
Data Analyst / Engineer, The Coca-Cola Company - Freestyle, Aug 2018 – Jan 2019
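The merge-and-prepare steps described in this role can be sketched as a simple key-based join before loading into a BI tool. The function, field names, and inner-join behavior are hypothetical illustrations:

```python
# Hypothetical sketch of merging records from two sources on a shared
# key, the kind of preparation done before loading into Tableau.

def merge_on_key(left, right, key):
    """Inner-join two lists of dicts on `key`; right-side fields
    overwrite left-side fields on collision."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]
```

In practice this role used Tableau's own data-source joins and SQL; the sketch just shows the shape of the operation.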
• Built and optimized data pipelines using SQL and Python to merge disparate data sources from AWS Athena and Redshift, creating unified datasets for analysis.
• Engineered Tableau dashboards to monitor machine performance and part failure rates, enabling predictive maintenance strategies for Freestyle fountain machines.
• Analyzed A/B test results and customer segmentation data to provide data-driven recommendations that increased marketing campaign ROI by 15%.
• Used Alteryx to merge web data from multiple sources, cleaning out markup tags, stray punctuation, and extraneous text, and organizing the results into a single central table.
• Used DAX functions in Power BI to compute data in calculated columns.
Tools: Postgres, SQL, Python, AWS (Athena, Redshift, S3), Tableau, Alteryx, Power BI, Excel, DBeaver
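The A/B-test analysis in this role amounts to comparing conversion rates between a control and a variant. A sketch of the core lift calculation, with made-up example numbers (the real analysis also involved significance testing and segmentation):

```python
# Illustrative sketch of the core A/B-test comparison: relative lift
# of the variant's conversion rate over the control's. All counts in
# the docstring example are invented for illustration.

def conversion_lift(control_conv, control_n, variant_conv, variant_n):
    """Relative lift of the variant's conversion rate over control.

    E.g. control 50/1000 vs. variant 60/1000 gives a 20% lift.
    """
    control_rate = control_conv / control_n
    variant_rate = variant_conv / variant_n
    return (variant_rate - control_rate) / control_rate
```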
Data Analyst / Engineer, Millennial R&D, Jan 2015 – Nov 2017
• Developed and maintained ETL processes using SSIS and C# scripts to cleanse and load data from structured and unstructured sources into SQL Server databases.
• Utilized Alteryx for advanced data blending and preparation, creating centralized data tables for enterprise reporting.
• Authored Python scripts with PySpark for large-scale data processing and analysis, supporting data science initiatives.
• Modeled and documented database architectures using ERwin and implemented security and refresh schedules on Tableau Server.
Tools: MS SQL Server, SSIS, T-SQL, C#, Python, Alteryx, Tableau, Power BI, ERwin, AWS S3
Key competencies
o Complex and advanced SQL queries
o Data Mining
o Model Training
o Association Rules
o Supervised & unsupervised Machine Learning
o Ranked 2nd at the International Collegiate Programming Contest, Kennesaw University, Nov 11, 2017
o Tableau, SAS (big data visualization, interactive dashboards, and stories)
o Power Pivot (Pivot tables, Pivot charts, DAX)
o Apache Spark, JavaScript, Android app development, Java, Python, C#, SQL, IBM Watson Analytics
o MS Visual Studio, Android SDK, BlueJ, DrJava, MySQL Workbench, MS Access, MS Excel
o Regressions, forecasting methods, variances, and levels of significance in MS Excel and Tableau
o Excel for operations management applications in finance, marketing, capital budgeting, and production
o Demonstrable skills in financial processes and strong time management
o Taking telephone orders and queries, handling customer complaints
o Superb organizational and time management skills to perform multiple tasks within limited time frames
o Excellent analytical and problem-solving skills
o Recognizes core business issues and their impact on reaching strategic objectives
o OpenAI
o ChatGPT
o LLMs
Education & credentials
Master's in Data Science, Illinois Institute of Technology, Fall 2022
Bachelor's in Management Information Systems with a minor in IT, Georgia Gwinnett College
Certificate in Data Science, WOZ U, a certification program founded by Steve Wozniak of Apple
Other Trainings
o ETL and SSIS /Lynda.com
o Networking Foundations: Servers
o Advanced SQL for Data Scientists /Lynda.com
o Apache Cordova /Lynda.com
o Tableau 10 Essential Training (Big data visualization, Dashboard and Story) /Lynda.com
o Ethical Hacking: System Hacking /Lynda.com
o CSS Essential Training /Lynda.com
o Power Pivot 2013 & SharePoint 2013
o Apache Spark/Lynda.com
o Android App Development /Lynda.com
o Microsoft SQL Server 2016 /Lynda.com