Sompoth Supathanasinkasaem
Phone: +1-323-***-**** Email: **************@*****.***
LinkedIn: https://www.linkedin.com/in/sompoth-supathanasinkasaem-767ab2328
Location: 9707 Oak Street, Bellflower, CA 90706
Professional Summary
Senior Machine Learning Engineer and Data Scientist with 12 years of experience driving innovative AI solutions in computer vision, natural language processing (NLP), Large Language Models (LLMs), Generative AI (GenAI), time-series forecasting, and recommendation systems. Proven track record of building and deploying scalable machine learning pipelines across diverse industries, optimizing model performance, and leading cross-functional teams. Expertise in distributed systems and cloud-based architectures (AWS, GCP, Azure). Adept at leveraging advanced ML frameworks and tools to deliver impactful, data-driven results, while effectively communicating insights to stakeholders. Skilled in MLOps, model evaluation, and end-to-end automation, with a focus on enhancing operational efficiency and business outcomes.
Skills
Programming & Software Engineering
Languages: Python, R, Java, Scala, C++, JavaScript, TypeScript
Version Control: Git, GitHub
API Development: Flask, FastAPI, REST, GraphQL
System Design: Designing scalable systems, optimizing performance
Machine Learning & AI
Algorithms: Linear Regression, Decision Trees, SVMs, Neural Networks, Clustering (K-Means, DBSCAN), PCA
Deep Learning: CNNs, RNNs, OCR, Image Segmentation, Object Detection, NLTK, NLP, LLMs, OpenAI, Transformers (BERT, GPT, and other variants), GenAI, RAG, LangChain, Hugging Face, Transfer Learning
Frameworks: PyTorch, TensorFlow, Keras, Caffe, Scikit-learn, OpenCV
Notebooks & IDEs: Jupyter, PyCharm, Colab
Data Engineering
Data Wrangling: Pandas, NumPy, Dask
Big Data Tools: Hadoop, Spark, Databricks
Data Storage: MySQL, PostgreSQL, MongoDB, Data Lakes, Neo4j, Vector Databases (Pinecone, FAISS)
MLOps & Cloud Platforms
Deployment: TensorFlow Serving, Docker, Kubernetes
Cloud Platforms: AWS (S3, SageMaker, Lambda, EC2, …), Google Cloud (Vertex AI, Gemini), Azure
CI/CD & Orchestration: Jenkins, Airflow, GitHub, Terraform
Feature Engineering & Tuning
Feature Selection: Feature importance, correlation analysis
Dimensionality Reduction: PCA, t-SNE
Model Tuning: Grid search, Random search, Bayesian optimization
Visualization & Communication
Data Visualization: Matplotlib, Seaborn, Tableau, D3.js, Plotly
Frontend Technologies: React.js, Vue.js, HTML, CSS, JavaScript
Collaboration: Stakeholder Engagement, Team Mentorship, Agile Development, Cross-functional teams, A/B testing insights, JIRA, Confluence
Additional Skills
Can-do attitude, problem-solving, out-of-the-box thinking, product-oriented mindset, documentation, active listening, team player, best practices
Work Experience
Senior ML Engineer/Data Scientist
PricewaterhouseCoopers
Apr 2024 – Present | New York, NY
Directed the development and implementation of cutting-edge denoising models, achieving an 11% improvement in model performance.
Automated MLOps pipelines using Jenkins for CI/CD and Airflow for workflow orchestration, reducing deployment time by 35% and accelerating model iteration cycles.
Implemented monitoring solutions through CloudFormation to track ML application performance, resulting in a 25% reduction in downtime and faster issue resolution.
Applied OpenCV image-processing functions and built deep learning applications with PyTorch, improving image-processing performance by 20%.
Developed and enhanced OCR systems using CNN-based text detection models, reducing inference time by 10%.
Leveraged AWS Ground Truth for data labeling and SageMaker for scalable training, significantly reducing model deployment time.
Transitioned from VGG-16 to MobileNetV3 to optimize processing time while maintaining performance in image enhancement tasks.
Applied Graph Convolutional Networks (GCN) and BERT for Key Information Extraction (KIE), enhancing document extraction accuracy by 15%.
Developed engaging visualizations with Pandas, Seaborn, Matplotlib, and Tableau, enhancing insight communication to stakeholders, leading to a 20% increase in data-driven decision-making.
Leveraged PySpark's distributed computing framework on AWS EKS for handling large datasets, reducing data processing times by 20%.
Senior ML Engineer/Data Scientist
Amazon
May 2020 – Mar 2024 | Orlando, FL
Managed a team of 8 engineers to develop AI solutions, improving operational efficiency.
Designed and implemented scalable infrastructure using AWS CloudFormation for ML environments, reducing deployment time by 15% and enhancing system reliability.
Developed modular CloudFormation templates for provisioning resources such as EC2, S3, and RDS to support machine learning workloads.
Collaborated with cross-functional teams to integrate Generative AI solutions into existing applications, resulting in a 20% increase in feature adoption among users.
Led the development of an NLU and NLG system for a customer support chatbot, implementing intent recognition and dialogue generation models that enhanced response accuracy and user satisfaction by 15%.
Built large-scale NLP models using LLM techniques for automated report generation, cutting down content creation time by 20%.
Engineered and deployed a document parsing and question-answering system using state-of-the-art NLP techniques and OpenAI models, reducing document processing time by 15%.
Leveraged state-of-the-art language models such as Llama 2 for content generation (GenAI) from specific prompts and questions using the LangChain library.
Implemented TensorFlow and Hugging Face frameworks, contributing to a 25% performance increase in RAG-based AI applications.
Demonstrated expertise in Convolutional Neural Networks (CNNs), deep neural networks, and attention mechanisms, using the Caffe2 library for cutting-edge machine learning techniques.
Guided the development of a fraud detection model using SageMaker, reducing fraudulent transactions by 15% and improving overall system security.
Optimized image recognition tasks using Transformer-based models, achieving 95%+ accuracy.
Optimized AI workloads by migrating them to AWS, boosting model training speed by 20%.
Refactored code into Java to enable low-latency orchestration, enhancing real-time trade execution performance by 50%.
Utilized PySpark's distributed computing framework on AWS EKS for efficient handling of massive datasets, achieving a 50% increase in processing speed and scalability for large-scale data operations.
Integrated continuous deployment (CD) processes using Jenkins, GitHub and Docker, reducing production errors by 15%.
Implemented data pipelines using Kubeflow and Airflow, optimizing model retraining and deployment processes, which reduced deployment time by 35% and improved overall model accuracy by 20%.
Utilized advanced transfer learning techniques with SageMaker to train models using pre-trained architectures, leveraging GPU acceleration and CUDA, resulting in a 25% improvement in model training time and a 15% increase in performance metrics.
Modernized a low-latency orchestrator by refactoring and migrating code to Java, ensuring rapid trade execution and achieving a 20% reduction in latency for real-time data consistency.
Provided guidance to junior developers via Jira and Confluence, fostering a collaborative and productive team environment.
Senior ML Engineer/Data Scientist
Zillion Technologies
Oct 2013 – Apr 2020 | Columbia, MD
Deployed Generative AI models into production environments using MLOps practices, ensuring scalability and reliability while monitoring performance and user feedback.
Led the design and implementation of predictive models for financial risk assessment, increasing prediction accuracy by 15%, which improved client decision-making.
Supervised the development of an NLP-powered chatbot, enhancing customer support efficiency by 15%.
Developed and fine-tuned recommendation systems using advanced NLP models, driving customer engagement by 20%.
Developed text generation and image synthesis algorithms for AI-based financial products, contributing to a 20% increase in product adoption.
Utilized Retrieval-Augmented Generation (RAG) techniques to improve model outputs, achieving a 20% increase in accuracy for specific NLP tasks.
Led image segmentation and classification projects using CNNs, improving model accuracy for medical applications by 15%.
Spearheaded the implementation of computer vision models on AWS, leveraging Ground Truth for data labeling and SageMaker for scalable training and deployment.
Conducted benchmarking of logo detection models (YOLO, SSD, ResNet, Fast R-CNN) within the TensorFlow framework, improving model accuracy by 10%.
Accelerated analysis of large datasets by leveraging Spark and Hadoop, reducing analysis time by 15% and enabling faster decision-making.
Managed and integrated data pipelines, improving workflow efficiency by automating processes with Jenkins and Airflow.
Applied advanced MLOps practices to streamline model deployment and continuous integration across the organization.
Directed a cloud-based data warehouse implementation with Databricks and Terraform, resulting in 15% cost savings on data storage and processing.
Ensured seamless data integration across platforms using relational databases (MySQL, PostgreSQL, Oracle Database) and non-relational databases (MongoDB, Redis, Apache Cassandra).
Utilized Java and Scala to develop stream processing logic and data transformation tasks, resulting in a 15% increase in data processing speed and a 25% reduction in data errors, facilitating real-time analytics and decision-making.
Managed version control of CloudFormation templates with Git for improved collaboration.
Designed and executed data pipelines using Jenkins, Airflow, Databricks, and PySpark, reducing execution time by 20%.
Provided a collaborative environment for data scientists using JupyterHub for complex machine learning projects, improving team productivity by 15% and facilitating smoother project workflows.
Incorporated Docker containerization, Kubernetes, Power BI, and Databricks into the development workflow to streamline application deployment and ensure consistent environments across development, testing, and production.
Optimized an automatic data processing system by conducting analysis, testing, and debugging of a complex codebase, significantly improving system reliability and performance.
Worked closely with cross-functional teams to communicate technical outcomes and ensure alignment with business objectives.
Education
University of California, Berkeley | Berkeley, CA | 2009 – 2013
Bachelor’s Degree in Computer Science
Certifications
AWS Certified Machine Learning – Specialty
Microsoft Certified: Azure Data Scientist Associate
Google Professional Machine Learning Engineer
Certified TensorFlow Developer
OpenAI API Professional Certification
Data Science Professional Certificate (Coursera)