Senior Data Scientist with over 10 years of experience in the Data Science and Machine Learning domain, providing analytics and custom development for specific business use cases. Expert skills in mathematics and analytics, with hands-on development of machine learning algorithms, deep learning, and data models to derive innovative solutions that enhance performance, productivity, and quality of deliverables in any industry.
Experience in the application of Naïve Bayes, Analysis, Neural Networks/Deep Neural Networks, and Random Forest machine learning techniques.
Creative thinker with a strong ability to devise and propose innovative ways of looking at problems using business acumen, mathematical theory, data models, and statistical analysis.
Discover patterns in data using algorithms and use experimental and iterative approaches to validate findings.
Apply advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision systems.
Work with product managers to productize algorithms and solutions.
In-depth knowledge of statistical procedures that are applied in both Supervised and Unsupervised Machine Learning problems.
Apply machine learning techniques to marketing and merchandising ideas.
Work with advanced analytics teams to design, build, validate, and refresh data models.
Excellent verbal and written communication skills for working with clients, stakeholders, and team members.
Ability to quickly gain an understanding of niche subject-matter domains and to design and implement effective, novel solutions for use by other subject-matter experts.
Experience implementing industry-standard analytics methods within specific domains and applying data science techniques to expand these methods, for example, using Natural Language Processing methods to aid in normalizing vendor names, implementing clustering algorithms, and deriving novel metrics.
Experience with AWS, GCP, and Azure Databricks platforms.
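The vendor-name normalization mentioned above can be illustrated with a minimal sketch. It assumes a simple rule-based cleanup (lowercasing, stripping punctuation, and dropping a hypothetical list of legal suffixes) followed by greedy fuzzy matching with Python's standard-library `difflib.SequenceMatcher`. The suffix list, the `0.85` threshold, and the function names are illustrative assumptions, not the method used in the original work.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list; a production list would be built for the specific domain.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "co", "corp", "corporation", "company"}


def normalize_vendor(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_vendors(names, threshold=0.85):
    """Greedily map each raw name to the first canonical form it closely matches."""
    canonical = []  # normalized names accepted as canonical so far
    mapping = {}
    for raw in names:
        norm = normalize_vendor(raw)
        match = next(
            (c for c in canonical if SequenceMatcher(None, norm, c).ratio() >= threshold),
            None,
        )
        if match is None:  # no close match: this name becomes a new canonical form
            canonical.append(norm)
            match = norm
        mapping[raw] = match
    return mapping


print(group_vendors(["Acme Corp.", "ACME Corporation", "Acmee Inc", "Globex LLC"]))
# {'Acme Corp.': 'acme', 'ACME Corporation': 'acme', 'Acmee Inc': 'acme', 'Globex LLC': 'globex'}
```

The greedy pass is quadratic in the number of distinct vendors. A real pipeline would more likely block candidates first, for example by first token, before fuzzy comparison.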
Technical Skills Summary
Python, R, MATLAB, SQL
NumPy, Pandas, SciPy, TensorFlow, PyTorch, Matplotlib, Seaborn
Google Colab, Jupyter, Spyder, RStudio, Code::Blocks
Natural Language Processing & Understanding, Machine Intelligence, Machine Learning algorithms
Machine Perception, Data Mining, Neural Networks, TensorFlow, Keras
Text understanding, classification, pattern recognition, recommendation systems, targeting systems, ranking systems
Applied Data Science
Natural Language Processing, Machine Learning, Predictive Maintenance
Advanced Data Modelling; Forecasting, Predictive, Statistical, Sentiment, Exploratory, Stochastic, and Bayesian Analysis; Inference; Regression Analysis; Linear Models; Multivariate Analysis; Sampling Methods; Segmentation; Clustering; Predictive Analytics; Decision Analytics; Big Data Queries and Interpretation; Design and Analysis of Experiments
Classification and Regression Trees (CART), Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Regression, Naïve Bayes, BERT, LSTM, CNN
Bayesian Analysis, Statistical Inference, Predictive Modelling, Stochastic Modelling, Linear Modelling, Behavioural Modelling, Probabilistic Modelling, Time-Series analysis
Excellent communication and presentation skills; ability to work well with stakeholders to discern needs accurately; leadership, mentoring, and coaching
Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure
Professional Experience
Senior Data Scientist
DuPont, Wilmington, DE (Remote), July 2020 – Present
DuPont is a chemical company operating on a global scale across many product areas. As a Senior Data Scientist, I led the Spark Digital team at DuPont, supporting multiple projects across several technologies and businesses. These projects focused primarily on NLP, time series, manufacturing process optimization, and model deployment and automation. For each project, a dialogue was established with stakeholders to identify the primary problem and the solution that best fit the needs of the business. Model development followed, with continuous feedback from the business, and each project ended with either an analysis or a production model being deployed and delivered. The work was carried out on Google Cloud Platform (GCP).
In-depth analysis and diagnosis of business problems to resolve the root issue and maximize value added.
Continuous discussions with stakeholders to develop solutions effective for the business.
Experience working with a large international team across multiple projects.
Used Google Colab, BigQuery, Google Cloud Dataflow, and Vertex AI.
Presentation of results, challenges, and roadmaps to executives.
Automated extraction of targeted information from unstructured text using LDA, hLDA, BERT, ELMo, RNNs, LSTMs, and other NLP techniques.
Built models with deep learning algorithms in Python using frameworks such as TensorFlow.
Implementation of data-driven visualizations displayed on a front-facing dashboard.
Used batch and Bash scripting to automate back-end tasks.
Developed a solution with high visibility within the business on top of a complex technology stack.
Management of technical and business interests.
Custom model innovation based on the unique conditions of each project.
Rapid prototyping and PoC generation to help kickstart new projects.
Data flow and aggregation management across multiple systems through efficient Python scripting.
Analysis of metrics to validate model performance and ensure that model value is evaluated consistently and made apparent.
Design of creative deployment solutions working within the framework of technology presently used by the business.
Created a CI/CD pipeline to automate an end-to-end, machine-learning-model-driven product.
Implemented solutions in a Hadoop environment using Python and Spark tools such as PySpark, Spark SQL, and Spark ML.
Explored cloud platform services in AWS, GCP, and Azure.
Led knowledge-transfer sessions that trained personnel, ensuring business ownership of the developed solution.
Senior Data Scientist
Enhance IT, Atlanta, GA, October 2019 – July 2020
Enhance IT is a contracting company that works across a variety of technologies and operates primarily out of Atlanta, Georgia. As the Covid-19 pandemic spread across the US, hospitals found their resources stretched thin, particularly staff and diagnostic tools such as the CT scans used to diagnose patients. This created demand from clients for technology that speeds up diagnosis and increases the number of patients who can be treated with the same personnel and equipment. To help meet this demand, a model was built using a convolutional neural network to analyze CT scans and distinguish Covid-19 cases from other respiratory ailments.
Worked as part of a team of experienced data scientists in a collaborative effort.
Classified Covid-19 and non-Covid-19 cases using models built in Python with TensorFlow and Keras.
Cleaned and preprocessed data by removing images containing labels and markings and resizing images to a standard size.
Used a generator to rotate and mirror images to synthetically increase image pool size and introduce rotational invariance to the model.
Constructed several autoencoder and classification models, both supervised and unsupervised, as well as various combinations of the two.
Constructed several convolutional neural networks based on architectures such as VGG-16 and Inception-V1.
Optimized model architecture for the data and resources available. The final product achieved an 83% overall test accuracy with 78% sensitivity, compared to market-available products with 67% sensitivity.
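The rotate-and-mirror augmentation described above can be sketched as a plain Python generator over NumPy arrays. This is a minimal illustration, not the original pipeline. It assumes square images stored in one array, restricts rotations to multiples of 90 degrees, and uses hypothetical names (`augment_batches`, `batch_size`, `seed`). In Keras, the same idea is commonly expressed with `ImageDataGenerator(rotation_range=..., horizontal_flip=True)`, which also supports arbitrary rotation angles.

```python
import numpy as np


def augment_batches(images, labels, batch_size=32, seed=0):
    """Yield endless batches of randomly rotated and mirrored copies of the inputs.

    Assumes `images` has shape (n, height, width) or (n, height, width, channels)
    with height == width, so 90-degree rotations preserve the image shape.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:  # Keras-style generators loop indefinitely
        idx = rng.choice(n, size=batch_size, replace=True)
        batch = []
        for i in idx:
            # Rotate by 0, 90, 180, or 270 degrees in the height/width plane.
            img = np.rot90(images[i], k=rng.integers(0, 4), axes=(0, 1))
            if rng.random() < 0.5:
                img = np.fliplr(img)  # horizontal mirror with probability 0.5
            batch.append(img)
        yield np.stack(batch), labels[idx]


images = np.random.rand(10, 64, 64)
labels = np.arange(10) % 2
x_batch, y_batch = next(augment_batches(images, labels, batch_size=4))
print(x_batch.shape, y_batch.shape)  # (4, 64, 64) (4,)
```

Because every batch is freshly transformed, the model rarely sees the exact same pixels twice. This is what lets a limited CT image pool behave like a larger one.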
Data Scientist and Machine Learning Engineer
Associated Press, Austin, TX, April 2017 – August 2019
Associated Press is a news agency that handles a large amount of incoming information from various other sources, including social media feeds. However, much of the information gathered is not suitable for articles, either because it is not news or because it is incorrect. To solve this issue and reduce the time employees needed to spend manually sorting information, an NLP-based filter was built and trained on tweets. This filter sorted the feeds into Spam, Reviews, and News categories, and also into good, bad, and neutral sentiment categories. Finally, once news was identified, it was further classified as real or fake, with the final output being only real news and the overall sentiment of that news.
Classified 16,000 articles and 1,500 tweets to build a complete dataset for the model.
Cleaned text to standardize input to the model and ensure consistent results.
Built functions to automatically remove symbols, hyperlinks, and emojis, and to spell-check incoming text.
Built exception handling to treat potential edge cases of incorrect or unusable data being fed to the model in production.
Performed stemming and lemmatization of text to remove superfluous components and make the resulting corpus as small as possible while containing all important information.
Compiled a bag of words of around 30,000 unique words from scratch, using both the NLTK and TensorFlow packages for text processing and tokenization.
Also tested concept-space embeddings via ELMo, which gave results similar to the bag of words at a significantly higher computational cost.
Constructed an NLP-based filter utilizing embedding and LSTM layers in TensorFlow and Keras.
Sorting was then performed by training and testing an artificial neural network.
Classified whether a given text was news or fit into other categories of potential interest, such as spam.
Ran sentiment analysis of text and determined whether the text was overall positive, negative, or neutral.
Average classification accuracy of 93.4% was achieved, well above the target accuracy for the project of 85%.
Deployed the solution as a Flask app on a cloud-based service (AWS), to which future user applications connected via an API.
Further tested and compared this solution against Amazon Comprehend on AWS, which achieved a slightly higher accuracy of 94.7% than the custom model.
The finalized model was then handed over to Android, iOS, and web developers to create a user front end.
The client ultimately elected to implement the custom model built from scratch because of the costs associated with the AWS service.
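The text-cleaning and bag-of-words steps described above can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions. A regex tokenizer stands in for the NLTK and TensorFlow tokenizers, and spell-checking, stemming, and lemmatization are omitted. Non-ASCII letters are dropped along with symbols and emojis. The function names are hypothetical, and the default vocabulary cap of 30,000 mirrors the vocabulary size reported above.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep only ASCII letters, digits, and whitespace; this drops symbols, emojis,
# and (as a simplifying assumption) any non-ASCII letters.
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text: str) -> list[str]:
    """Lowercase, strip hyperlinks, symbols, and emojis, and split into tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALNUM_RE.sub(" ", text)
    return text.split()


def build_vocabulary(docs, max_size=30_000):
    """Map the most frequent tokens across all documents to integer indices."""
    counts = Counter(tok for doc in docs for tok in clean_text(doc))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_size))}


def bag_of_words(text, vocab):
    """Return a count vector over the vocabulary; unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in clean_text(text):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


docs = [
    "Breaking: Storm hits coast! https://t.co/abc \U0001F327",
    "Storm update: coast roads closed",
]
vocab = build_vocabulary(docs)
print(vocab)  # {'storm': 0, 'coast': 1, 'breaking': 2, 'hits': 3, 'update': 4, 'roads': 5, 'closed': 6}
print(bag_of_words("Storm, storm, and more storm near the coast", vocab))  # [3, 1, 0, 0, 0, 0, 0]
```

Capping the vocabulary at the most frequent tokens bounds the vector length, which keeps the input layer of a downstream neural network a fixed, manageable size.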
Micron Semiconductors, Austin, TX, February 2014 – April 2017
Micron Semiconductors is a manufacturer of various semiconductor devices, from processors to sensors. To further product development, new and interesting materials need to be tried and tested; however, growing and experimenting with these materials becomes very expensive when attempting to test every possible combination. Therefore, there is interest in using cheaper, faster computational tools to identify materials with the desired properties for a device before actual fabrication. Using a first-principles quantum mechanics code, large-scale calculations were run for a variety of materials on a Linux-based computer cluster.
Calculated properties of crystal structures via quantum mechanical first principles using the Vienna ab initio Simulation Package (VASP).
Utilized Linux high-performance computing cluster to compile and run calculations.
Automation of VASP jobs using Unix terminal commands and Bash scripting.
Worked with IT personnel managing and maintaining the cluster to compile executables and optimize computational tasks.
Performed post-processing data analysis of completed self-consistent calculations with Python and MATLAB.
Plots of material properties such as band structure and density of states generated using Matplotlib and MATLAB.
Creation of instructive visualizations using molecular modeling software such as VESTA.
Electronic and optical properties for a variety of materials were determined and delivered to the client for further experimentation and fabrication.
Analyzed overall stoichiometric trends using machine learning tools such as KNN, logistic regression, and decision tree classifiers.
Comprehensive reports and documentation written using LaTeX and subsequently presented to stakeholders.
BNYM, Somerset, NJ (Remote), June 2012 – January 2014
Bank of New York Mellon is a large financial institution with several branches located across the eastern seaboard of the US. The overall goal of this project was to minimize the revenue lost to the time and resources spent locating and retrieving older handwritten documents, by digitizing these documents into an easily accessible, searchable database. Because raw images and scans take up a large amount of memory, several models using Bayesian and decision-tree-based classifiers were constructed and compared to digitize characters and values from scanned images. The best performance was obtained with a Random Forest model, which was then deployed on an AWS EC2 instance where the client could query it through an API.
Worked as part of a team with each member working on dedicated tasks contributing to an overall product.
Used R and Python to retrieve and clean data before implementation and model training.
Trained using multiple character datasets to introduce as much handwriting invariance into the model as possible.
Several models were constructed to compare performance and computational cost and select the one which best met the needs of the client.
Tested multiple types of Bayesian classifiers, including those using Bernoulli and Gaussian distributions.
Obtained digit-recognition performance on par with human accuracy using tree-based ensembles, including gradient-boosted trees and Random Forest.
Utilized Tesseract v2 through the command line and OCRFeeder for image segmentation and classification.
Analyzed the performance of various models based on speed and accuracy in a simulated production environment.
Assisted in the deployment of the overall model onto an AWS EC2 instance and with setting up APIs for future access and use.
Education
Master of Science in Physics - Texas State University, San Marcos, TX
Bachelor of Science in Physics - Texas State University, San Marcos, TX