Won Jin
Data Scientist/Machine Learning Engineer
Email: *********@*****.*** Phone: 770-***-****
Professional Summary
•9+ years of experience working on Data Science and Machine Learning projects.
•Expertise in a multitude of Artificial Intelligence / Machine Learning (AI/ML) Algorithms: Logistic Regression, Decision Trees, Random Forest, XGBoost, KNN, SVM, Linear/Polynomial Regression, Deep Learning / Neural Network Architectures (ANN, CNN, RNN, GAN), K-Means Clustering, Hierarchical Clustering, DBSCAN, Spectral Clustering, Collaborative Filtering, PCA.
•Proficient in harnessing generative AI to create compelling, data-driven content, enhancing communication and engagement in various professional contexts.
•Worked on statistical analysis, exploratory data analysis, data visualization, data preprocessing, feature engineering, dimensionality reduction, predictive modeling, and recommendation system design; built Computer Vision applications; applied Natural Language Processing (NLP) techniques for sentiment analysis and text summarization; and handled model deployment and reporting.
•Adept in programming languages such as R and Python, as well as Big Data technologies including Spark, Hadoop ecosystem tools (Hive, HDFS), Spark SQL, PySpark, and Spark ML.
•Skilled in using dplyr, ggplot2, Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, TensorFlow, PyTorch, and Keras.
•Established scalable, efficient, automated processes for large-scale data analyses, model development, model validation, and model implementation.
•Experienced with MLOps tools on cloud platforms such as AWS, GCP, and Azure, as well as CI/CD pipelines, Kubernetes, and containers.
•Detail-oriented, creative thinker, problem solver, versatile, results-driven, fast learner.
•Demonstrated ability to distill high-performance solutions from data to drive business strategy.
Skills
Programming Languages, Frameworks, Solutions: Python, Java, R, R-Shiny, JavaScript, SQL, MATLAB, SPSS, Minitab, Hive, Spark, Scala
Deep Learning: TensorFlow, Keras, PyTorch; machine perception, data mining, machine learning algorithms, neural networks
Data Science Specialties: Machine Learning, Natural Language Processing, Internet of Things (IoT) analytics, Social Analytics, Predictive Maintenance
Modeling and Methods: Multivariate Analysis, Stochastic Gradient Descent, Bayesian Analysis, Inference Models, Regression Analysis, Linear Models, Sampling Methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics
Databases: Azure, Google Cloud, Amazon Redshift; HDFS; relational (RDBMS/SQL) and NoSQL databases; data warehouses and data lakes
Big Data Tools: HiveQL, Spark, Spark SQL, Storm, Scala, Impala, MapReduce, Kinesis, EMR
IDEs: Jupyter, Spyder, IntelliJ, Eclipse
Data Frameworks: R, Python
Modeling Techniques and Tools: Random Forest, Gradient Boosting Machine (GBM), TensorFlow, Classification and Regression Trees (CART), CNN, RNN, PCA, Regression, Naïve Bayes
Visualization: Tableau, R, R Shiny, ggplot2, Power BI, Seaborn, Matplotlib
Version Control: GitHub, Git, SVN
Soft Skills: Deliver presentations and highly technical reports; collaborate with stakeholders and cross-functional teams, advise how to leverage analytical insights; develop clear analytical reports that directly address strategic goals.
Professional Experience
Sr Data Scientist - Accenture (Client: SDG&E (San Diego Gas & Electric) / Sempra), Sugar Hill, GA (Remote)
January 2023 to Current
Led the development of an interactive application for SDG&E stakeholders that lets users search for any asset currently in use by the company and instantly retrieve comprehensive information about it. The project marked a significant step in leveraging data science to enhance operational efficiency and decision-making within SDG&E.
•Worked on speech-to-text, speech-to-intent, and text mining.
•Developed and implemented predictive maintenance systems using computer vision to detect issues in poles and wires, contributing to improved operational efficiency and reduced downtime.
•Utilized deep learning-based object detection algorithms, such as EfficientDet and YOLO, to identify and localize poles and wires in images, enhancing the accuracy of maintenance predictions.
•Trained and evaluated convolutional neural networks (CNNs) to classify detected objects into maintenance categories, enabling proactive scheduling of maintenance activities.
•Implemented Generative AI, NLP, and Large Language Models (LLMs) to automate the extraction of information from plain text PDFs, resulting in efficient data structuring and storage.
•Implemented natural language processing and statistical modeling-based approaches for process optimization.
•Established standards for code development in close collaboration with team members, ensuring a consistent and efficient approach to project execution.
•Leveraged the Python Lifelines library to conduct survival analysis, contributing to the application's development and enhancing its analytical capabilities.
•Performed data pre-processing and cleaning to prepare data sets for further statistical analysis.
•Solved analytical problems and effectively communicated methodologies and results.
•Refactored legacy code to minimize redundancy and improve readability, laying the foundation for future use and maintainability.
•Developed a reusable function library for the team, streamlining development processes and fostering code reuse.
•Cleaned and verified data from multiple sources, ensuring data integrity and reliability for subsequent analysis.
•Maintained open and constant communication with stakeholders, providing meaningful data and insights to support informed decision-making.
•Conducted rigorous unit testing and actively participated in code reviews to uphold code quality and best practices.
•Performed extensive data manipulation, including aggregation, filtering, and mapping, using the Pandas library to prepare data for analysis.
•Created and managed AWS Glue workflows, enabling the scheduling and simultaneous execution of multiple Glue jobs.
•Designed and implemented Glue jobs and orchestrated the storage of outputs in S3 buckets, ensuring efficient data processing.
•Employed Glue crawlers to catalog data stored in S3 so it could be queried with Athena, enhancing data accessibility.
•Created comprehensive documentation for stakeholders, explaining the logic behind the Glue jobs and fostering a deeper understanding of the technical aspects.
•Crafted technical documentation to facilitate knowledge transfer within the team, ensuring continuity through transitions or expansions.
•Programming experience with machine learning/deep learning frameworks such as TensorFlow, Keras, and PyTorch; machine perception, data mining, machine learning algorithms, and neural networks.
Sr Data Scientist (AI-NLP) - Cisco Systems, San Jose, CA (Remote)
August 2021 to January 2023
Helped build and maintain the Contact Center AI Rapid Response Virtual Agent, Cisco's virtual assistant with voice and chat capabilities.
•Worked on speech-to-text, speech-to-intent, and text mining.
•Implemented natural language processing and statistical modeling-based approaches for process optimization.
•Performed data pre-processing and cleaning to prepare data sets for further statistical analysis.
•Worked on NLP techniques of data cleaning, stemming, lemmatization, and embedding.
•Solved analytical problems, and effectively communicated methodologies and results.
•Constructed an NLP-based Deep Learning RNN model utilizing BERT embeddings and LSTM layers in TensorFlow, Keras, and Python, deployed on the Google Cloud Platform (GCP).
•Implemented MLOps by building a CI/CD pipeline using Kubeflow to automate preprocessing, training, hyperparameter tuning, and deployment.
•Improved response time and response quality, driving rapid adoption of the virtual agent: businesses could respond to incoming inquiries on time, give their customers 24/7 access, and significantly reduce wait times for simple requests while relieving much of the load on human customer care agents.
•Programming experience with machine learning/deep learning frameworks such as TensorFlow, Keras, and PyTorch; machine perception, data mining, machine learning algorithms, and neural networks.
Machine Learning Engineer - Swift Transportation, Norfolk, VA (Remote)
September 2019 to July 2021
Implemented Big Data analytics to optimize routing, streamline factory functions, and bring transparency to the entire supply chain. The data came from a variety of sources, including enterprise systems, traffic analysis, weather forecasts, location information, mobile internet, and GPS-enabled smartphones, and was analyzed to solve many logistics optimization problems. The main goal was to save money and time by avoiding late deliveries and properly utilizing resources, including route planning and staff scheduling. Data on destination locations, shipping areas, parking, and delivery times was used to prevent backlogs at the unloading point and ensure deliveries were made at a time when they could be received.
•Performed data extraction, data cleaning and integration, feature engineering, modeling, model validation, visualization, reporting, and deployment.
•Performed Ad-hoc reporting/customer profiling, and segmentation using R/Python.
•Created statistical models from the collected data, performing exploratory analysis and pre-processing to provide conclusions with decision guidance.
•Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
•Performed univariate and multivariate analysis of the data to identify any underlying pattern in the data and associations between the variables.
•Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
•Implemented Regression and Classification using supervised algorithms: Linear Regression, Logistic Regression, Decision trees, KNN, Naive Bayes, Random Forest, and XGBoost.
•Validated machine learning classifiers using various model performance metrics.
•Built CI/CD pipelines using AWS SageMaker to automate preprocessing, modeling, validation, deployment, and monitoring.
Data Scientist - Axiom, Atlanta, GA
April 2018 to September 2019
Axiom provides enterprise advisory solutions and risk management. The firm relies on big data analytics to provide strategic insights and reporting to outside clients. I worked on analytics projects for various clients, cleaning data and implementing time-series models to forecast product demand.
•Implemented time-series analysis on product demand data, tested stationarity, and analyzed trends and seasonality.
•Used SARIMAX, ARIMA, Deep learning-based RNN, etc. for time-series-based demand forecasting.
•Business Data Analytics - Involved in ETL/BI requirement gathering and conversion into useful functional requirements.
•Leveraged big data tools and supporting technologies for extracting meaningful insights from large data sets using components like Hadoop, Spark ML, and SparkSQL.
•Developed Spark modules for predictive analytics & machine learning in Hadoop.
•Involved in test data preparation and performed data profiling to learn about user behavior and merged data from multiple data sources.
•Utilized machine learning algorithms such as linear regression, multivariate regression, Naïve Bayes, Random Forests, K-means, and KNN for data analysis.
•Prepared source-to-target data mapping documents.
•Developed report wireframes along with SQL schema definitions.
•Worked with Data Warehouse architecture and wrote SQL queries.
•Applied dimension modeling to identify dimension & fact tables and associated data elements.
•Gained familiarity with wealth and asset management concepts as well as trading life cycles.
•Conducted in-depth data analysis on the reports/dashboards to identify gaps.
•Designed, developed, and maintained daily and monthly summary, trending, and benchmark reports repository.
•Created views, table-valued functions, common table expressions (CTEs), and complex joins and subqueries to provide reporting solutions.
Data Scientist - Discover Financial Services, Riverwoods, IL
January 2017 to April 2018
Discover Financial Services' lending products and payment services go far beyond consumer credit, enabling millions of merchants to run their businesses, exploring new markets, powering fintech companies, and, most importantly, helping people build a brighter financial future. My role involved working with a Data Science team to implement machine learning algorithms to detect anomalies and classify fraudulent transactions. Logistic Regression and Convolutional Autoencoders were used to detect and identify anomalies, and a REST API-based deployment was integrated with Fraud Prevention systems.
•Consulted with regulatory and subject matter experts to gain a clear understanding of information and variables within data streams.
•Produced solutions using HBase, Scala, Spark, Spark Streaming, Python, Hadoop, Kafka, Cassandra, MongoDB, and MLlib, along with a variety of machine learning methods, including classification, regression, and dimensionality reduction.
•Developed Spark/Scala and Python for regular expression (regex) projects in the Hadoop/Hive environment.
•Produced machine learning models for predicting signs of fraud.
•Implemented models in SAS interfaced with MSSQL databases and scheduled updates on a timely basis.
•Addressed unbalanced data issues using the Synthetic Minority Oversampling Technique (SMOTE) and Tomek links algorithms.
•Worked with huge datasets in Big Data environments using Hadoop, Spark, HDFS, and MapReduce.
•Created a portfolio risk dashboard covering all aspects of the credit lifecycle for retail unsecured loans.
•Contrived information from structured and semi-structured data elements collected from both internal and external sources.
•Participated in the peer review process to ensure the correctness, accuracy, and quality of work produced by the Data Science and Engineering teams.
•Performed unit testing of peer code as part of the peer review process.
Data Scientist - The Hershey Company, Derry Township, PA
May 2014 to January 2017
The Hershey Company is a multinational company and one of the largest chocolate manufacturers in the world. It also manufactures baked products, such as cookies and cakes, and sells beverages like milkshakes, among many other products produced globally. I served on a Data Science team that applied various Natural Language Processing (NLP) techniques to data mined from a large number of external sources to perform sentiment analysis covering multiple products produced by the company.
•Constructed cutting-edge Deep Learning models such as Bidirectional Long Short-Term Memory (BiLSTM) architectures and Recurrent Neural Networks using Python and TensorFlow for accurate named entity recognition.
•Worked with TensorFlow and Python to build proofs of concept for new models, new features for existing models, and new internal products, and then went to live production and deployed these as finished models, features, or products.
•Worked with Apache Spark and PySpark for data streaming to ensure models were continuously updated with new data about the evolution of markets.
•Implemented Lambda Architecture with Apache Spark on an AWS EMR cluster for data streaming, as well as longer-term data storage and mining.
•Worked with Python to build internal packages to automate or streamline workflows in continuous model validation, reporting, data cleaning, data quality testing, and dependency checking within a given project.
•Performed continuous validation, testing, and implementation of models, and integration of new features.
•Performed testing on new features from new data sources to improve model accuracy in all areas.
Education
•Bachelor of Science in Statistics, minor in Computer Science, University of Illinois Urbana-Champaign