
Abel Georges

DATA SCIENTIST

Phone: 872-***-**** Email: adi9j1@r.postjobfree.com

TECHNICAL SKILLS

COMMUNICATION SKILLS: verbal, written, presentations

LEADERSHIP: supports project goals and business use cases; mentors team members

QUALITY: continuous improvement in project processes, workflows, and automation; ongoing learning and achievement

CLOUD: analytics on cloud-based platforms (AWS, MS Azure, Google Cloud)

ANALYTICS: Data Analysis, Data Mining, Statistical Analysis, Multivariate Analysis, Stochastic Optimization, Linear Regression, ANOVA, Hypothesis Testing, Forecasting, ARIMA, Sentiment Analysis, Predictive Analysis, Pattern Recognition, Classification, Behavioral Modeling

PROGRAMMING LANGUAGES: Python, R, SQL, Scala, Java, MATLAB, C, SAS, F#

LIBRARIES: NumPy, SciPy, Pandas, Theano, Caffe, Scikit-learn, Matplotlib, Seaborn, TensorFlow, Keras, NLTK, PyTorch, Gensim, Urllib, BeautifulSoup4, MXNet, Deeplearning4j, EJML, dplyr, ggplot2, reshape2, tidyr, purrr, readr

DEVELOPMENT: Git, GitHub, Bitbucket, SVN, Mercurial, PyCharm, Sublime, JIRA, TFS, Trello, Linux, Unix

DATA EXTRACTION AND MANIPULATION: Hadoop HDFS, Hortonworks Hadoop, MapR, Cloudera Hadoop, Cloudera Impala, Google Cloud Platform, MS Azure Cloud, both SQL and NoSQL, Data Warehouse, Data Lake, HiveQL, AWS (Redshift, Kinesis, EMR, EC2)

MACHINE LEARNING: Supervised Machine Learning Algorithms (Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees and Random Forests, Naïve Bayes Classifiers, K-Nearest Neighbors), Unsupervised Machine Learning Algorithms (K-Means Clustering, Gaussian Mixtures, Hidden Markov Models), Imbalanced Learning (SMOTE, ADASYN, NearMiss), Deep Learning, Artificial Neural Networks, Machine Perception

APPLICATIONS: Recommender Systems, Predictive Maintenance, Forecasting, Fraud Prevention and Detection, Targeting Systems, Ranking Systems, Deep Learning, Strategic Planning, Digital Intelligence

WORK EXPERIENCE

ABBVIE April 2022 – Current

Senior Data Scientist - Consultant Chicago, IL

Worked within Business Technology Solutions – Enterprise & Integration Services at AbbVie as a Sr. Data Science Consultant to implement and advise on Data Strategies around the organization. Worked with various stakeholders within the organization to develop proofs-of-concept and training for modern data reporting and visualization tools.

Technologies Used: Python, SQL, Excel, SPSS Modeler, Power BI, Dataiku, AWS SageMaker, Alteryx, Snowflake, Alation.

•Delivered a broad range of Data Science, Machine Learning, and Artificial Intelligence engagements in a variety of areas such as supervised and unsupervised machine learning, natural language processing, and optimization

•Brought technical and practical insight into how clients might use data science and AI more effectively to manage risk across their business, both as part of specific projects and as added value to existing services

•Collaborated with Procurement & Supply Management to explore and advise on the latest reporting and visualization tools for enterprise-level savings.

•Advised Sales Planning and Analytics, Marketing Analytics, Business Insights, Patient Services, Customer Operations, Tax, and Human Resources departments on transitioning from Alteryx or legacy reporting tools to Dataiku platforms.

•Developed requirements to identify potential workflow blockers and key platform changes to establish proof of value and ensure legacy tool retirement.

•Trained and advised key analytics personnel around the organization to implement modern analytics solutions.

•Reviewed and advised on various machine learning frameworks, libraries, data structures, data modeling, and software architecture.

•Reviewed complex Excel macros and Alteryx workflows and reimplemented them as Dataiku workflows.

•Stayed up to date with new machine learning techniques, technologies, architectures, and languages where necessary

•Owned and ran the agile backlog for data science analytical activity

•Led a Data Science team of Data Engineers, Data Scientists, and ML Engineers to the successful delivery of products and services

•Conducted data analysis using advanced skills in statistics and database management

•Promoted new techniques and approaches; continuously reviewed and adapted external developments

•Acted as an internal enablement resource for client-facing sales, presales, and marketing teams

•Participated in the implementation and use of leading-edge data warehouse / big data technologies, including software, end-user tools, and other data services

THRASIO February 2021 – April 2022

Machine Learning Engineer - MLOps Walpole, MA

Thrasio is an Amazon-aggregator start-up that acquires third-party merchants to manage and grow brands in the Amazon marketplace and through various direct-to-consumer strategies. I worked within the Data Analytics team as a Machine Learning Engineer to develop data pipelines, MLOps pipelines, OCR models, reports, and data validations.

•Built a pipeline for the Supply Chain team to produce weekly Inventory Reconciliation Reports, which included Inventory Mismatch Reporting, Missing Shipments, and Transfer Order completeness.

•Worked on computer vision-based OCR models

•Implemented CNN-based Tesseract with BERT for named entity recognition

•Utilized Python, pytesseract, OpenCV, and TensorFlow for this computer vision and NLP-based OCR problem (see the OCR sketch after this list)

•Worked with Finance to troubleshoot and enhance Settlement Payment Reports for cash reconciliation.

•Built pipeline to map SKUs to corresponding UPC/EAN.

•Validated data for the MWS-to-SP-API transition.

•Collaborated with cross-functional teams of data scientists, user researchers, product managers, designers, and engineers passionate about our consumer experience across platforms and partners.

•Performed analyses on large sets of data to extract impactful insights on user behavior that helped drive product and design decisions.

•Worked with the Python package Pandas and FeatureTools for data analytics, cleaning, and model feature engineering.

•Updated Python scripts to match training data with our database stored in AWS CloudSearch so that we could assign each document a response label for further classification.

•Performed supply chain reporting analytics and delivered the results in Power BI.

•Built dashboards in Periscope for internal usage and reporting.

•Extracted source data from Amazon Redshift on the AWS cloud platform.

•Built, trained, and deployed machine learning models using Amazon SageMaker.

•Designed and implemented a CI/CD pipeline to automate model development and model deployment

•Set up model training pipelines to be triggered when model drift was detected (a hedged drift-check sketch follows this list)

•Utilized AWS SageMaker MLOps tools

•Utilized Snowflake cloud data warehousing platform.

•Utilized Data Build Tool (DBT) to create complex models, use variables and macros (aka functions), run tests, and generate documentation.

•Used Databricks to support running collaborative analytics efforts.

•Applied Predictive Modeling, Data Mining Methods, Factor Analysis, Hypothesis Testing, and Normal Distribution.

•Transformed business requirements into analytical models, designed algorithms, built models and developed data mining and reporting solutions that scaled across the company’s massive volume of structured and unstructured data.
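
Below is a minimal, illustrative sketch of the OCR stage described in the bullets above; the input file name and Tesseract flags are assumptions for illustration, not Thrasio's actual configuration.

# Minimal OCR sketch: preprocess a scanned image with OpenCV, then extract
# text with pytesseract (Tesseract's Python wrapper).
import cv2
import pytesseract

# Load the scanned document (hypothetical file name) and convert to grayscale
image = cv2.imread("shipment_label.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Binarize with Otsu's threshold so characters stand out for Tesseract
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Run Tesseract's neural (LSTM) engine in block-of-text mode
text = pytesseract.image_to_string(binary, config="--oem 1 --psm 6")
print(text)

In the pipeline described above, the extracted text would then feed the BERT-based named-entity stage, which is omitted here.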
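
And a hypothetical sketch of a drift check that could trigger retraining; the population stability index (PSI), the 0.2 threshold, and the function names are generic assumptions, standing in for the SageMaker Model Monitor reports and pipeline start calls used in production.

# Hypothetical drift check: compare live feature values to the training
# baseline with a population stability index (PSI); retrain past a threshold.
import numpy as np

def population_stability_index(baseline, live, bins=10):
    """PSI between two 1-D samples; above ~0.2 is a common retraining trigger."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(live, bins=edges)
    expected = np.clip(expected / expected.sum(), 1e-6, None)  # avoid log(0)
    actual = np.clip(actual / actual.sum(), 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

def maybe_retrain(baseline, live, start_training_pipeline, threshold=0.2):
    """start_training_pipeline is a stand-in for kicking off a training job."""
    psi = population_stability_index(baseline, live)
    if psi > threshold:
        start_training_pipeline()
    return psi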

BANK OF NY MELLON November 2019 – February 2021

Data Scientist & ML Engineer Florham Park, NJ

Worked with the Identity Management Team within the Information Security Division to develop self-service tools for internal employees. Worked to establish cloud controls for identity governance and assess risks associated with cloud service providers.

•Used machine learning and statistical modeling techniques to develop and evaluate algorithms that improve performance, quality, data management, and accuracy

•Managed version control setup for the Phantom platform using Git.

•Implemented MLOps for automation of model training and deployment

•Used Jenkins for CI/CD pipelines.

•Set up a playbook for events and classification containers.

•Developed an app called Risk Hub and moved it to production using Django.

•Contributed to the design and prototyping of medium to high-complexity machine learning systems

•Worked with product managers to formulate data analytics problems

•Cleaned and debugged datasets and the codebase before applications reached QA.

•Used Rasa to build a proof-of-concept web app to demonstrate chatbot usage

•Provided recommendations on controls for implementing IAM in the cloud.

•Performed analysis of user profiles and current application entitlements based on user profiles, organization, departments, and groups.

•Built a recommender system to auto-provision application and platform access for new employees and contractors so they are productive as soon as they onboard (see the sketch after this list)

•Performed entitlement analytics

•Performed data analysis and reporting

•Developed a Recommendation Engine for Entitlements and Applications

•Performed evaluation of on-prem and cloud controls.

•Benchmarked the on-prem identity management system against cloud identity management systems

•Responsible for presenting findings to stakeholders.

•Selected and built dashboards for internal usage

•Performed machine learning modeling using Python and frameworks such as TensorFlow

•Implemented and cleaned datasets for network access based on user profiles
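
A generic illustration of the entitlement recommender idea above: recommend entitlements that profile-similar peers already hold. This is a sketch of one common approach (user-user collaborative filtering), not the bank's actual system; the toy matrix is invented.

# Sketch: recommend entitlements held by similar users (cosine similarity).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = entitlements (1 = granted); toy stand-in data
entitlements = np.array([
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
])

def recommend(user_idx, matrix, top_k=2):
    """Score unheld entitlements by similarity-weighted peer usage."""
    sims = cosine_similarity(matrix[user_idx : user_idx + 1], matrix)[0]
    sims[user_idx] = 0.0                     # ignore self-similarity
    scores = sims @ matrix                   # weighted votes per entitlement
    scores[matrix[user_idx] == 1] = -np.inf  # skip already-granted items
    ranked = np.argsort(scores)[::-1]
    return [i for i in ranked if np.isfinite(scores[i])][:top_k]

print(recommend(0, entitlements))  # -> [2]: suggests the one unheld entitlement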

MONTANA STATE FUND May 2018 – November 2019

Data Scientist Helena, MT

Worked within the Enterprise Applications team as a Data Scientist. Project responsibilities included multivariate analysis to test the impact of the WorkSafe Champions Safety Program on policies. I also worked with the SIU (Special Investigations Unit) to assess the feasibility of detecting insurance fraud with machine learning and to help gain insight into possible fraudulent claims or activity.

•Worked in a Cloudera Hadoop environment using Python, SQL, and Tableau

•HDFS (Cloudera): Pulled data from the Hadoop cluster.


•Used Python, Pandas, NumPy, and SciPy for exploratory data analysis, data wrangling, and feature engineering.

•Used Tableau and TabPy for visualization of analyses.

•Worked along with Business Analysts, Data Analysts, and Data Engineers.

•Consulted with various departments within the company, including SIU and Safety.

•Managed and matched claim numbers to fraud cases.

•Cleaned fraud data to be joined with the claims data (~73k observations)

•Researched and assessed the fraud predictive analytics scenario in terms of predicting outcomes for new claims

•Created a Tableau dashboard to help SIU present its annual report

•Tried kernel density estimation in a lower-dimensional space as a feature to predict fraud.

•Tested anomaly detection models such as Expectation Maximization, Elliptic Envelopes, and Isolation Forests (a hedged sketch follows this list).

•Performed multivariate analysis of safety programs from the last 10 years.

•Used regression to determine the correlation of participation in the safety program with the outcome of claims.

•Performed hypothesis testing and statistical analysis to determine statistically significant changes in claims after participation in the safety program.

•Presented findings of impact testing.

•Worked on Workers' Compensation fraud detection

•Prepared data for exploratory analysis

•Engineered actuarial formulas

•Collaborated with other Data Scientists on use cases that included workplace accident prediction and sentiment analysis.
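
An illustrative scikit-learn sketch of the anomaly-detection comparison mentioned in the bullets above (Isolation Forest vs. Elliptic Envelope); the synthetic data and the 1% contamination rate are assumptions, not the actual claims data or settings.

# Sketch: flag candidate fraud claims with two anomaly detectors and compare.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
claims = rng.normal(size=(1000, 3))  # synthetic stand-ins, e.g. cost, duration, visits
claims[:10] += 6                     # inject a few obvious outliers

for detector in (IsolationForest(contamination=0.01, random_state=0),
                 EllipticEnvelope(contamination=0.01)):
    labels = detector.fit_predict(claims)  # -1 = anomaly, 1 = inlier
    print(type(detector).__name__, int((labels == -1).sum()), "claims flagged")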

HILTON HOTELS May 2017 – April 2018

Data Scientist/NLP Engineer McLean, VA

Worked with NLP to classify text with data drawn from a big data system. Text categorization involved labeling natural language texts with relevant categories from a predefined set. One goal was to target users by automated classification; in this way, we could create cohorts to improve marketing. The NLP text analysis monitored, tracked, and classified user discussions about products and/or services in online discussions. The machine learning classifier was trained to identify whether a cohort was a promoter or a detractor (a generic sketch follows the bullet list below). Overall, the project improved marketing ROI and customer satisfaction.

•Oversaw the entire production cycle to extract and display metadata from various assets, developing a report display that is easy to grasp and draw insights from.

•Collaborated with both the Research and Engineering teams to productionize the application.

•Assisted various teams in bringing prototyped assets into production.

•Applied data mining and optimization techniques in B2B and B2C industries, drawing on proficiency in Machine Learning, Data/Text Mining, Statistical Analysis, and Predictive Modeling.

•Utilized MapReduce/PySpark Python modules for machine learning & predictive analytics on AWS.

•Implemented assets and scripts for various projects using R, Java, and Python

•Built sustainable rapport with senior leaders.

•Developed and maintained a Data Dictionary to create metadata reports for technical and business purposes.

•Built and maintained a dashboard and reporting based on the statistical models to identify and track key metrics and risk indicators.

•Kept up to date with the latest NLP methodologies by reading 10 to 15 articles and whitepapers per week.

•Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.

•Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.

•Involved in defining source-to-target data mappings, business rules, and data definitions.

•The project environment was AWS and Linux.
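
A generic sketch of the promoter/detractor classifier described in the project summary, using a standard TF-IDF plus logistic-regression pipeline; the training examples are invented, and the production system's actual features and model are not shown here.

# Sketch: promoter/detractor text classification with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented examples standing in for labeled user discussions
texts = ["loved the stay, great service", "room was dirty, never again",
         "amazing pool and friendly staff", "terrible checkout experience"]
labels = ["promoter", "detractor", "promoter", "detractor"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["staff were wonderful"]))  # likely ['promoter'] on this toy set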

TD AMERITRADE November 2016 – April 2017

Data Scientist Houston, TX

This investment portfolio project centered on Natural Language Processing and time series analysis for investment portfolio management. The project involved creating a data model to analyze news for predictive analytics related to stock trends over 6-month periods; this analysis was used to re-balance stock portfolios (a hedged forecasting sketch follows the bullet list). Explored various algorithmic trading theories and ideas.

•Applied data mining and optimization techniques in B2B and B2C industries, along with Machine Learning, Data/Text Mining, Statistical Analysis, and Predictive Modeling.

•Utilized PySpark Python modules for machine learning & predictive analytics in Hadoop on AWS.

•Performed predictive modeling using state-of-the-art methods.

•Implemented advanced machine learning algorithms utilizing Caffe, TensorFlow, Scala, Spark, MLlib, R, and other tools and languages as needed.

•Programmed and scripted in R, Java, and Python.

•Developed Data Dictionaries to create metadata reports for technical and business purposes.

•Built reporting dashboard on the statistical models to identify and track key metrics and risk indicators.

•Implemented Gradient Boosting, Random Forest, and other ensemble models for improved model performance.

•Extracted source data from Amazon Redshift on the AWS cloud platform.

•Parsed and manipulated raw, complex data streams to prepare for loading into an analytical tool.

•Explored different regression and ensemble models in machine learning to perform forecasting

•Developed new financial models and forecasts.

•Improved efficiency and accuracy by evaluating models in R.

•Involved in defining source-to-target data mappings, business rules, and data definitions.

•Performed end-to-end Informatica ETL testing for custom tables by writing complex SQL queries on the source database and comparing the results against the target database.
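
A hedged sketch of the time-series side of this project using an ARIMA model from statsmodels (ARIMA appears in the skills list above); the synthetic price series, the (1, 1, 1) order, and the forecast horizon are illustrative assumptions, not the project's actual configuration.

# Sketch: fit ARIMA to a synthetic price series and forecast ahead.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0.1, 1.0, size=250))  # random-walk stand-in

fitted = ARIMA(prices, order=(1, 1, 1)).fit()  # (p, d, q) chosen for illustration
forecast = fitted.forecast(steps=126)          # roughly 6 months of trading days
print(forecast[:5])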

OWENS CORNING June 2015 – October 2016

Data Scientist Medina, MN

Owens Corning is a global company that develops and produces materials such as fiberglass, composites, roofing, and insulation. The project focused on capturing data from quality testing for aggregation with product development ranges for new product specifications. Utilized data from thousands of quality tests on manufacturing runs of various products, stored in Amazon Redshift and a Hadoop Data Lake on AWS. Used this data to develop and train machine learning algorithms for NLG report generation, supporting R&D and product development and feeding back into the algorithms that calibrate the machinery.

•Applied Machine Learning, Data/Text Mining, Statistical Analysis, and Predictive Modeling.

•Implemented an Event Task to execute an application automatically.

•Involved in defining source-to-target data mappings, business rules, and data definitions.

•Assisted in continual monitoring, analysis, and improvement of the AWS Hadoop Data Lake environment

•Built and maintained a monitoring and reporting dashboard for the statistical models to identify and track key metrics and performance indicators.

•Involved in fixing bugs and minor enhancements for the front-end modules.

•Performed data mining and developed statistical models using Python to provide tactical recommendations to business executives.

•Integrated R into MicroStrategy to expose metrics from more sophisticated and detailed models than those natively available in the tool.

•Implemented feature engineering including feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.

•Worked on outlier identification with Gaussian Mixture Models using Pandas, NumPy, and Matplotlib (a generic sketch follows this list).

•Adopted feature engineering techniques with 200+ predictors to find the most important features for the models. Tested the models with classification methods such as Random Forest, Logistic Regression, and Gradient Boosting Machine, and performed hyperparameter tuning to optimize the models.
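
A generic sketch of the Gaussian Mixture Model outlier identification mentioned above: fit a mixture, score samples by log-likelihood, and flag the lowest percentile. The synthetic data, component count, and 1% threshold are assumptions.

# Sketch: GMM-based outlier identification via per-sample log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
measurements = rng.normal(size=(500, 4))  # stand-in for quality-test features
measurements[:5] += 8                     # inject outliers

gmm = GaussianMixture(n_components=3, random_state=0).fit(measurements)
log_likelihood = gmm.score_samples(measurements)
threshold = np.percentile(log_likelihood, 1)  # bottom 1% treated as outliers
print(np.where(log_likelihood < threshold)[0])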

INFUSIONSOFT January 2014 – May 2015

Data Scientist Chandler, AZ

Project responsibilities included advanced analytics, data mining, and statistical techniques on structured and unstructured data. Employed a diverse set of tools to bring insights out of complex data. The project involved data stored in disparate systems.

•Worked with internal architects and assisted in the development of current and target state data architectures.

•Performed data quality checks in Talend Open Studio.

•Involved in defining source-to-target data mappings, business rules, and data definitions.

•Worked with stakeholders and analysts to identify systems needs and requirements, and helped Big Data Engineers translate those requirements into specifications.

•Created reporting visualizations to automate weekly and monthly reports.

•Responsible for defining the key identifiers for each mapping/interface.

•Responsible for defining the functional requirement documents for each source-to-target interface.

•Defined the list codes and code conversions between the source systems and the data mart.

•Implemented a metadata repository; maintained data quality; developed data cleanup procedures, transformations, data standards, a data governance program, scripts, stored procedures, and triggers; and executed test plans.

•Updated the Enterprise Metadata Library with any changes or updates.

•Coordinated with business users to design new reporting appropriately, effectively, and efficiently, based on user needs and existing functionality.

•Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.

HARLEY DAVIDSON January 2011 – December 2013

Data Scientist Milwaukee, WI

Performed quality control, using ISO quality standards to write custom algorithms that test production quality against metrics. Worked with automated testing, process control, industrial automation, and data acquisition and instrumentation, including vision systems and motion control, as well as data science, big data, and simulation and modeling.

•Implemented Agile Methodology for custom application development using R and Python.

•Performed K-Means clustering, multivariate analysis, and Support Vector Machines in Python (a minimal clustering sketch follows this list).

•Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.

•Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed gap analysis.

•Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to apply various machine learning algorithms.

•Worked with different data formats such as JSON and XML, and ran machine learning algorithms in Python.

•Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.

•Extracted data from HDFS and prepared data for exploratory analysis using data munging with R.

•Worked with Data Architects and IT Architects to understand the movement of data and its storage.

•Rapidly created models in Python using Pandas, NumPy, scikit-learn, and Plotly for data visualization.
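
A minimal sketch of the K-Means clustering step named in the first bullets of this list; the synthetic data and cluster count are assumptions standing in for the actual production-quality metrics.

# Sketch: scale features, then cluster with K-Means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
quality_metrics = rng.normal(size=(300, 5))  # hypothetical test measurements

scaled = StandardScaler().fit_transform(quality_metrics)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
print(np.bincount(kmeans.labels_))  # cluster sizes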

EDUCATION

Certificate, Full-Stack Development

Georgia Institute of Technology, Atlanta, GA

Bachelor of Science in Biology

Georgia State University, Atlanta, GA


