CHARLES TALATOTI
+1-618-***-**** *******************@*****.***
SUMMARY:
AI/ML and Data Engineer with 5 years of experience developing and deploying machine learning models, building scalable data pipelines, and delivering cloud-based solutions.
Skilled in implementing end-to-end MLOps pipelines to automate the machine learning lifecycle, ensuring faster deployment, monitoring, and reproducibility.
Proficient in Python, SQL, and ML frameworks such as TensorFlow, PyTorch, and Scikit-learn for model development and evaluation. Hands-on experience with data engineering tools such as Apache Airflow, Apache Kafka, and Databricks, and with cloud services including AWS, GCP (Dataflow), and Azure.
Strong understanding of data preprocessing, feature engineering, model training, and real-time inference for use cases such as predictive analytics and customer segmentation.
Experienced in CI/CD practices for machine learning and data workflows, enabling continuous integration and delivery of models in production.
Passionate about solving real-world problems using AI, continuously learning and exploring new technologies in machine learning, cloud computing, and big data.
Integrated MLOps practices into Agile workflows, ensuring model versioning, automated testing, and CI/CD pipelines align with sprint cycles.
SKILLS:
Programming Languages: Python, SQL, NoSQL, JavaScript, HTML, CSS, Bash
Machine Learning & AI: AI Models, Machine Learning, Deep Learning, PyTorch, TensorFlow, Scikit-learn, LLMs
Natural Language Processing: NLP, Text-to-Speech, AI Chatbots, Transformers, Signal Processing
Data Engineering: PySpark, Data Preprocessing, Data Handling, Data Analysis, Redis, PostgreSQL, Pandas, NumPy
Cloud Platforms: AWS (S3, Lambda, SageMaker), GCP (BigQuery, Dataflow, Vertex AI), Azure (ADF, ML Studio)
MLOps & DevOps: CI/CD, Docker, Kubernetes, Model Deployment, MLflow, Jenkins, Terraform
Data Visualization & Analysis: Microsoft Office, Data Analysis Tools, Matplotlib, Seaborn, Quality Control
Database & Storage: SQL, PostgreSQL, NoSQL, Redis, BigQuery, Data Lakes
Security & Compliance: Encryption, Data Masking, RBAC, HIPAA/GDPR Compliance
Project & Process Management: Jira, Agile/Scrum, Project Management, Training & Development
Web Technologies: HTML, CSS, Web Design
Automation & Testing: Selenium, PyTest, Unit Testing
EXPERIENCE:
Project Title: AI-Powered Medical Diagnosis at BJC, St. Louis
Duration: January 2024 – Present
Role: Senior ML/GenAI Engineer
Tech Stack: Python, FastAPI, Llama 3, HuggingFace, WhisperX, Pyannote, BART-large, Spring Boot, PostgreSQL, Pinecone, FAISS, AWS (EC2, Lambda, S3, DynamoDB, CloudWatch), Kubernetes, Jenkins, GitLab CI/CD, Power BI
Developed a PDF Q&A chatbot with asynchronous processing, dynamic prompts, and FAISS-based similarity search.
Built a medical diagnostic assistant using LLaMA-3.1-8B with Pinecone for context-aware responses.
Improved diagnostic model reliability through rigorous dataset preparation and evaluation on dermatology datasets.
Created an identity card fraud detection system using custom image cropping and enhanced preprocessing logic.
Achieved 94% accuracy in ID fraud detection by optimizing detection pipelines and analyzing performance metrics.
Deployed a WhisperX- and Pyannote-based transcription pipeline with summarization using BART-large.
Designed a FastAPI- and RabbitMQ-based real-time processing backend for scalable meeting transcription.
Developed and tested secure RESTful APIs for healthcare data exchange using Spring Boot.
Automated deployments with Jenkins and GitLab CI/CD and implemented API tests using Postman and JUnit.
Used Kubernetes for scalable deployment and resource management of AI services.
Implemented monitoring dashboards using AWS CloudWatch and Power BI for diagnostic and chatbot performance.
Reduced patient ID verification time by 40% and improved chatbot response accuracy by 30%.
Project Title: Enterprise Data Engineering for Cloud Data Warehousing & Analytics at ARTIS Inc
Duration: August 2020 – December 2022
Role: Data Engineer
Tech Stack: Snowflake, Snowpipe, Kafka, dbt, Tableau, Power BI, Prometheus, Python, SQL
Designed and deployed a scalable Snowflake architecture, optimizing storage and compute resources for large-scale data operations using Snowflake virtual warehouses.
Developed and optimized ETL workflows using Snowflake SQL to streamline data extraction, transformation, and loading processes.
Created logical and physical data models within Snowflake, using dbt (Data Build Tool) to support complex data analysis requirements.
Implemented advanced query optimization techniques in Snowflake using Caching and Virtual Warehouses, reducing execution times by 30%.
Set up real-time data ingestion pipelines using Snowpipe and Kafka to ensure immediate data availability for analytics and decision-making.
Utilized Snowflake’s clustering and automatic re-clustering features to enhance data retrieval efficiency and reduce storage costs.
Established proactive system health checks and performance monitoring using Snowflake’s Account Usage views and Prometheus for external monitoring.
Leveraged Snowflake features such as materialized views, data shares, and clones to facilitate efficient data management and sharing.
Automated repetitive data management tasks using Snowflake Tasks and Streams to enhance operational efficiency.
Monitored and optimized Snowflake credit usage and storage costs through careful management of virtual warehouses using Snowflake’s Resource Monitors.
Developed complex SQL scripts for data manipulation and aggregation using Snowflake SQL and Python to support business intelligence activities.
Ensured compliance with regulatory requirements by setting up secure data environments and conducting regular audits using Snowflake's Compliance features.
Integrated Snowflake with BI tools such as Tableau and Power BI for advanced data visualization and analytics.
Project Title: Scalable Cloud Data Pipelines for Marketing at Technomics Inc, Pune, India
Duration: September 2018 – August 2020
Role: Data Engineer
Tech Stack: AWS (S3, Glue, Spark, EC2, Lambda, Kinesis, Redshift, Athena, CloudWatch), Kafka, Hadoop, MongoDB, Cassandra, MySQL, Jenkins, Tableau, Airflow
Built complex data engineering solutions on a cloud-based platform using AWS services such as S3 and Glue alongside Spark, Kafka, and Hadoop.
Designed and built AWS infrastructure components including S3, EC2, ECS, Kinesis, DynamoDB, SNS, SQS, Lambda, Redshift, Athena, and CloudWatch.
Collaborated closely with the Business team to implement data strategies, develop data flows, and create conceptual data models.
Optimized data ingestion processes, reducing preprocessing time by 30% and improving data quality.
Developed and maintained data lakes on AWS S3, integrating data from primary and secondary sources and enhancing flexibility and scalability of data evaluation processes.
Engineered data pipelines using Apache Spark, AWS EMR, and AWS Glue for efficient ETL of large-scale marketing data.
Leveraged AWS Lambda for serverless data processing pipelines and performed transformations in the cloud.
Implemented solutions for processing and analyzing streaming data using Kafka and Spark Streaming.
Orchestrated complex data workflows using Apache Airflow DAGs to automate data pipeline executions and monitoring.
Collaborated with data architects to optimize data warehouses, integrating structured and unstructured data sources like CSV, JSON, and Parquet through ELT processes.
Utilized relational databases (MySQL, Oracle, Microsoft SQL Server) and NoSQL databases (MongoDB, Cassandra, HBase) for data storage and processing.
Integrated AWS S3 with Snowflake for data warehousing, supporting business analytics, reporting, and dashboarding.
Monitored system performance, troubleshot issues, and optimized data pipelines to prevent bottlenecks.
Implemented version control using Git integrated with AWS Cloud for managing codebase changes.
Created interactive dashboards using Tableau and AWS Athena to visualize marketing analytics and derive actionable insights.