CHARLES TALATOTI
+1-618-***-**** *******************@*****.***
SUMMARY:
AI/ML and Data Engineer with 5 years of experience developing and deploying machine learning models, building scalable data pipelines, and delivering cloud-based solutions.
Skilled in implementing end-to-end MLOps pipelines to automate the machine learning lifecycle, ensuring faster deployment, monitoring, and reproducibility.
Proficient in Python, SQL, and ML frameworks such as TensorFlow, PyTorch, and Scikit-learn for model development and evaluation. Hands-on experience with data engineering tools such as Apache Airflow, Apache Kafka, and Databricks, and with cloud services including AWS, GCP (Dataflow), and Azure.
Strong understanding of data preprocessing, feature engineering, model training, and real-time inference for use cases such as predictive analytics and customer segmentation.
Experienced in CI/CD practices for machine learning and data workflows, enabling continuous integration and delivery of models in production.
Passionate about solving real-world problems using AI, continuously learning and exploring new technologies in machine learning, cloud computing, and big data.
Integrated MLOps practices into Agile workflows, ensuring model versioning, automated testing, and CI/CD pipelines align with sprint cycles.
SKILLS:
Programming Languages: Python, SQL, NoSQL, JavaScript, HTML, CSS, Bash
Machine Learning & AI: AI Models, Machine Learning, Deep Learning, PyTorch, TensorFlow, Scikit-learn, LLMs
Natural Language Processing: NLP, Text-to-Speech, AI Chatbots, Transformers, Signal Processing
Data Engineering: PySpark, Data Preprocessing, Data Handling, Data Analysis, Redis, PostgreSQL, Pandas, NumPy
Cloud Platforms: AWS (S3, Lambda, SageMaker), GCP (BigQuery, Dataflow, Vertex AI), Azure (ADF, ML Studio)
MLOps & DevOps: CI/CD, Docker, Kubernetes, Model Deployment, MLflow, Jenkins, Terraform
Data Visualization & Analysis: Microsoft Office, Data Analysis Tools, Matplotlib, Seaborn, Quality Control
Database & Storage: SQL, PostgreSQL, NoSQL, Redis, BigQuery, Data Lakes
Security & Compliance: Encryption, Data Masking, RBAC, HIPAA/GDPR Compliance
Project & Process Management: Jira, Agile/Scrum, Project Management, Training & Development
Web Technologies: HTML, CSS, Web Design
Automation & Testing: Selenium, PyTest, Unit Testing
EXPERIENCE:
Project Title: AI-Powered Medical Diagnosis at BJC, St. Louis
Duration: January 2024 – Present
Role: Senior ML/GenAI Engineer
Tech Stack: Python, FastAPI, Llama 3, HuggingFace, WhisperX, Pyannote, BART-large, Spring Boot, PostgreSQL, Pinecone, FAISS, AWS (EC2, Lambda, S3, DynamoDB, CloudWatch), Kubernetes, Jenkins, GitLab CI/CD, Power BI
Developed a PDF Q&A chatbot with asynchronous processing, dynamic prompts, and FAISS-based similarity search.
Built a medical diagnostic assistant using LLaMA-3.1-8B with Pinecone for context-aware responses.
Improved diagnostic model reliability through rigorous dataset preparation and evaluation on dermatology datasets.
Created an identity card fraud detection system using custom image cropping and enhanced preprocessing logic.
Achieved 94% accuracy in ID fraud detection by optimizing detection pipelines and analyzing performance metrics.
Deployed a WhisperX- and Pyannote-based transcription pipeline with summarization using BART-large.
Designed a FastAPI- and RabbitMQ-based real-time processing backend for scalable meeting transcription.
Developed and tested secure RESTful APIs for healthcare data exchange using Spring Boot.
Automated deployments with Jenkins and GitLab CI/CD and implemented API tests using Postman and JUnit.
Used Kubernetes for scalable deployment and resource management of AI services.
Implemented monitoring dashboards using AWS CloudWatch and Power BI for diagnostic and chatbot performance.
Reduced patient ID verification time by 40% and improved chatbot response accuracy by 30%.
Project Title: Enterprise Data Engineering for Cloud Data Warehousing & Analytics at ARTIS Inc
Duration: August 2020 – December 2022
Role: Data Engineer
Tech Stack: Snowflake, Snowpipe, Kafka, dbt, Tableau, Power BI, Prometheus, Python, SQL
Designed and deployed a scalable Snowflake architecture, optimizing storage and compute resources for large-scale data operations using Snowflake virtual warehouses.
Developed and optimized ETL workflows using Snowflake SQL to streamline data extraction, transformation, and loading processes.
Created logical and physical data models within Snowflake, using dbt (Data Build Tool) to support complex data analysis requirements.
Implemented advanced query optimization techniques in Snowflake using Caching and Virtual Warehouses, reducing execution times by 30%.
Set up real-time data ingestion pipelines using Snowpipe and Kafka to ensure immediate data availability for analytics and decision-making.
Utilized Snowflake’s clustering and automatic re-clustering features to enhance data retrieval efficiency and reduce storage costs.
Established proactive system health checks and performance monitoring using Snowflake’s Account Usage views and Prometheus for external monitoring.
Leveraged Snowflake features such as materialized views, data shares, and clones to facilitate efficient data management and sharing.
Automated repetitive data management tasks using Snowflake Tasks and Streams to enhance operational efficiency.
Monitored and optimized Snowflake credit usage and storage costs through careful management of virtual warehouses using Snowflake’s Resource Monitors.
Developed complex SQL scripts for data manipulation and aggregation using Snowflake SQL and Python to support business intelligence activities.
Ensured compliance with regulatory requirements by setting up secure data environments and conducting regular audits using Snowflake's Compliance features.
Integrated Snowflake with BI tools such as Tableau and Power BI for advanced data visualization and analytics.
Project Title: Scalable Cloud Data Pipelines for Marketing at Technomics Inc, Pune, India
Duration: September 2018 – August 2020
Role: Data Engineer
Tech Stack: AWS (S3, Glue, Spark, EC2, Lambda, Kinesis, Redshift, Athena, CloudWatch), Kafka, Hadoop, MongoDB, Cassandra, MySQL, Jenkins, Tableau, Airflow
Built complex data engineering solutions on a cloud-based platform using AWS services such as S3 and Glue alongside Spark, Kafka, and Hadoop.
Designed and built AWS infrastructure components including S3, EC2, ECS, Kinesis, DynamoDB, SNS, SQS, Lambda, Redshift, Athena, and CloudWatch.
Collaborated closely with the Business team to implement data strategies, develop data flows, and create conceptual data models.
Optimized data ingestion processes, reducing preprocessing time by 30% and improving data quality.
Developed and maintained data lakes on AWS S3, integrating data from primary and secondary sources and enhancing flexibility and scalability of data evaluation processes.
Engineered data pipelines using Apache Spark, AWS EMR, and AWS Glue for efficient ETL of large-scale marketing data.
Leveraged AWS Lambda for serverless data processing pipelines and performed transformations in the cloud.
Implemented solutions for processing and analyzing streaming data using Kafka and Spark Streaming.
Orchestrated complex data workflows using Apache Airflow DAGs to automate data pipeline executions and monitoring.
Collaborated with data architects to optimize data warehouses, integrating structured and unstructured data sources like CSV, JSON, and Parquet through ELT processes.
Utilized relational databases (MySQL, Oracle, Microsoft SQL Server) and NoSQL databases (MongoDB, Cassandra, HBase) for data storage and processing.
Integrated AWS S3 with Snowflake for data warehousing, supporting business analytics, reporting, and dashboarding.
Monitored system performance, troubleshot issues, and optimized data pipelines to prevent bottlenecks.
Implemented version control using Git integrated with AWS Cloud for managing codebase changes.
Created interactive dashboards using Tableau and AWS Athena to visualize marketing analytics and derive actionable insights.