Junior Data Scientist/Machine Learning Engineer

Location:
Houston, TX, 77021
Salary:
65000
Posted:
July 25, 2022

Resume:

832-***-**** adrvpn@r.postjobfree.com

Pablo Simon Nguema Obiang

Junior Data Scientist/Machine Learning Engineer

● 1+ years of applied professional work experience in the data science/machine learning space.

● Demonstrated skill applying statistical modeling, statistical hypothesis testing, and optimization.

● Strong track record executing applied machine learning projects.

● Skilled applying practical machine learning (e.g., regularization, cross-validation, feature engineering and selection, dimensionality reduction, generalization, and model selection).

● In-depth understanding of neural networks, including convolutional and recurrent architectures, as well as unsupervised approaches such as autoencoders.

● Expertise in common supervised machine learning methodologies – Naïve Bayes Classifiers, Linear Regression, Logistic Regression, Random Forests, Regression Forests, and Poisson Survival Modeling.

● Strong proficiency with TensorFlow for building, testing, validating, selecting, and deploying reliable machine learning models in Python.

● Skilled working with NumPy stack (NumPy, SciPy, Pandas, and Matplotlib).

● Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction in Python.

● Adept at discovering patterns in data using algorithms, visual representation, and intuition.

● Hands-on experience applying machine learning techniques such as Naïve Bayes, Linear and Logistic Regression Analysis, Neural Networks, RNN, CNN, Transfer Learning, Time-Series Analysis, Trees, and Random Forests.

● Perform exploratory analysis on diverse data types and datasets to gain full knowledge of the subject matter, a nuanced understanding of the variables in question, and technically sound insight into the required modeling approach.

● Transform business requirements into analytical and statistical data models in Python and TensorFlow.

● Utilize Docker to handle deployment on heterogeneous platforms such as Linux, Windows, OSX, and AWS.

● Experience working with relational databases, with advanced SQL skills.

● Strong proficiency with non-relational (NoSQL) databases such as MongoDB and DynamoDB.

● Experience with third-party cloud resources – AWS, Google Cloud Platform, and MS Azure.

● Hands-on skill with tidyverse, dplyr, readr, zoo, and glmnet in R, and Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn in Python, for exploratory data analysis (EDA) and data preprocessing.

● Experience with data analysis methods such as data reporting, ad-hoc data reporting, graphs, scales, pivot tables, OLAP reporting with Microsoft Excel, R Markdown, R Shiny, Python Markdown.

TECHNICAL SKILLS

Data Science Specialties: Natural Language Processing, Machine Learning, Internet of Things (IoT) Analytics, Social Analytics, Predictive Maintenance, Stochastic Analytics

Analytic Skills: Bayesian Analysis, Inference, Models, Regression Analysis, Linear Models, Multivariate Analysis, Stochastic Gradient Descent, Sampling Methods, Forecasting, Segmentation, Clustering, Naïve Bayes Classification, Sentiment Analysis, Predictive Analytics

Analytic Tools: Classification and Regression Trees (CART), H2O, Docker, Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Linear and Non-Linear Regression

Analytic Languages and Scripts: R, Python, HiveQL, Spark, Spark MLlib, Spark SQL, Hadoop, Scala, Impala, MapReduce

Languages: Java, Python, R, Command Line, C++/C, JavaScript, SQL, SAS

Python Packages: NumPy, Pandas, Scikit-learn, TensorFlow, SciPy, Matplotlib, Seaborn, Plotly, NLTK, Scrapy, Gensim

Version Control: GitHub, Git, SVN

IDE: Jupyter Notebook, VS Code, IntelliJ IDEA, Spyder, Eclipse

Data Query: Azure, Google BigQuery, Amazon Redshift, Kinesis, EMR; HDFS, RDBMS, SQL, MongoDB, HBase, Cassandra, data warehouses, data lakes, and various other SQL and NoSQL databases

Deep Learning: Machine Perception, Data Mining, Machine Learning Algorithms, Neural Networks, TensorFlow, Keras

Soft Skills: Experienced in delivering presentations and technical reports, collaborating with senior data scientists and cross-functional teams, and advising on how to leverage analytical insights. Developed analytical reports that directly address strategic goals.

PROFESSIONAL WORK EXPERIENCE

Enhance IT

Atlanta, GA

Jan 2022 – Present

Junior Machine Learning Engineer

Assigned to a dev team that built a pipeline for automatic data entry of scanned government forms, using AWS for database storage. The pipeline accepts two images (the image to extract information from and the reference form) and outputs information such as names, IDs, and other tax information. The information extraction steps were implemented using Google Tesseract, AWS Textract, and Convolutional Neural Networks. We performed OCR on each image to extract the required information and stored it in our AWS database, reducing data entry time by a power of 3. The project used machine learning techniques and Python libraries to derive relevant analyses and metrics, and included building proofs of concept to assess the value of implementing them in future projects.

As part of a team focused on understanding the customer experience across the organization, I worked with business partners to understand business objectives, explore data sources, and build NLP/ML solutions. I found new ways to analyze unstructured data by leveraging our Open-Source Data Science Platform (OSDS) and other analytical tools to solve complex business problems.

● Applied business analytics skills; integrated and prepared large, varied datasets; and communicated results.

● Worked with specialized database architecture and computing environments.

● Developed analytic approaches to strategic business decisions.

● Performed analysis using predictive modeling, data/text mining, and statistical tools.

● Built predictive models using machine learning algorithms such as Random Forests, Naive Bayes, Neural Networks, MaxEnt, SVM, Topic Modeling/LDA, Ensemble Modeling, and Gradient Boosting (GB).

● Used common NLP techniques, such as pre-processing (tokenization, part-of-speech tagging, parsing, stemming).

● Performed semantic analysis (named entity recognition, sentiment analysis), modeling, and word representations (RNN/ConvNets, TF-IDF, LDA, Word2vec, Doc2vec).

● Worked with Big Data infrastructure and tools such as Hive and Spark.

● Synthesized analytic results with business input to drive measurable change.

● Performed data visualization and developed presentation material using Tableau.

● Defined key business problems to be solved while developing and maintaining relationships with stakeholders, SMEs, and cross-functional teams.

● Provided knowledge and understanding of current best practices and emerging trends within the analytics industry.

● Applied statistical NLP/machine learning, especially supervised learning: document classification, information extraction, and in-context named entity recognition.

● Participated in product redesigns and enhancements to determine how the changes would be tracked and to suggest product direction based on data patterns.

● Applied statistics and organized large datasets of structured and unstructured data.

● Worked with applied statistics and applied mathematics tools for performance optimization.

● Facilitated data collection to analyze and document data processes, scenarios, and information flows.

● Determined data structures and their relations in supporting business objectives and provided useful data in reports.

● Assisted in continual improvement of AWS data lake environment.

● Used Agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.

● Promoted enterprise-wide business intelligence by enabling report access in SAS BI Portal and Tableau Server.

Mobile Apps

Atlanta, GA

May 2021 – Feb 2022

Junior Data Scientist/Computer Vision Researcher

I worked as a Data Scientist/Computer Vision Researcher to find state-of-the-art solutions for classifying crushable and uncrushable materials in the company's production pipeline. The dev team built a full pipeline to make inferences in real time. Before material was sent for further processing, our model had to decide whether it was crushable. We tested various computer vision architectures such as VGG16, ResNet, Inception, and EfficientNet. ResNet gave the best performance, with a recall of 0.85, a precision of 0.7, and a fast inference time (~3 s). We aimed to maximize recall because it was more cost-effective to filter out as many uncrushable materials as possible than to keep as many crushable ones.

This project built "digital twins": computer models replicating mining equipment behavior. Sensor readings were used as inputs to address specific field issues such as ore truck loading and conveyor system optimization.

● Worked in Git development environment.

● Applied expert-level Python skills.

● Used TensorFlow, Keras, and Python with deep neural network and analytical techniques.

● Transformed business requirements into analytical models, designed algorithms, built models, and developed data mining and reporting solutions that scaled across massive volumes of structured and unstructured data.

● Worked on proofs of concept (POCs) and gap analyses, and gathered the necessary data for analysis from different sources.

● Prepared data for exploration through data wrangling.

● Designed physical data architecture of new system engines.

● Implemented neural networks.

● Developed logical and physical data models and organized data per the business requirements using Sybase PowerDesigner and ER/Studio in both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) applications.

● Designed star and snowflake schemas for Data Warehouse and Operational Data Store (ODS) architecture.

● Developed Logical Data Architecture with adherence to Enterprise Architecture.

● Used OpenCV for image feature engineering, such as image segmentation.

● Worked with the data modeling tools PowerDesigner and ER/Studio.

● Applied Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.

● Implemented Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio, SSIS, SSAS, and SSRS.

● Responsible for Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables, and Online Analytical Processing (OLAP) reporting.

● Worked with data in Hadoop for basic analysis and data extraction, providing data summaries.

EDUCATION

Bachelor of Science - Electrical and Computer Engineering - Texas Southern University

CERTIFICATIONS

TensorFlow Developer Certificate in 2021: Zero to Mastery - Aug 2021

Automate the Boring Stuff with Python Programming - May 2021

LANGUAGES

Proficient in English and Spanish; French (intermediate level)
