Data Analyst

Location:

Bakersfield, CA

Posted:

September 23, 2020

Resume:

Henry Ssembatya

+1-805-***-**** • *********@*****.*** • linkedin.com/in/henry-ssembatya • https://sembahen.github.io/#

TECHNICAL SKILLS

Technologies: Python, SQL, HTML, Tableau, JavaScript, CSS, Java, MATLAB, SAS, TensorFlow, Keras, Hadoop, Hive, Spark, PyTorch, AWS.

Core Competencies: Data Science and Analytics, Big Data, Machine Learning, CNN, RNN, Software Development, NLP, Time Series

EXPERIENCE

Applus RTD USA, Data Analyst Apr 2019-Present

• Perform analysis, querying, processing and quality control of large datasets for energy assets using SQL, Excel, etc. to assess equipment integrity and adherence to EPA standards. Generate comprehensive reports and make recommendations to field engineers.

• Design new methods to reduce processing time and detect errors. Maintain a kickback (error) rate of less than 0.01%.

• Operate a robust database management system; responsible for querying, identifying and correcting anomalies, applying advanced analytics and visualizations, making technical recommendations, deploying pertinent scientific calculations, etc.

Thinkful, Data Science Apprentice Dec 2019-May 2020

• Learned and implemented industry best practices and modern data science standards over a rigorous six-month curriculum, working alongside a senior data scientist mentor (6+ years of experience in data science) to build comprehensive data science models using Python and SQL.

• Developed and presented projects involving exploratory data analysis, machine learning, deep learning, time series, Big Data, NLP, statistical modeling, study design, statistics and probability.

University of Southern California, Research Assistant (Data Science) Aug 2017-Mar 2019

• Published a paper with SPE on optimization of an energy technique (onepetro.org/conference-paper/SPE-195301-MS).

• Analyzed large datasets using Python and SQL: cleaned and queried data, built ETL data pipelines, and performed statistical analysis to develop optimized solutions.

• Performed statistical modeling and data analytics on various datasets, including well logs, production and steam injection datasets, to extract important insights and apply AI for predictive and prescriptive modeling.

SELECTED PROJECTS & RESEARCH

Project: Predicting strategies to mitigate churn rate for customers of a Telecommunications company Jan 2020-Mar 2020

• Deployed classification algorithms using Python to predict the likelihood of customers leaving the company based on various parameters. Used machine learning to determine which policy changes would effectively improve customer retention.

• Recommended, and demonstrated statistically, the features that would mitigate churn rates when implemented (by as much as 15%).

Project: Image Classification using Deep Learning (CNN) May 2020-Aug 2020

• Designed a deep learning algorithm to classify 60,000 different images with high accuracy using TensorFlow and Keras.

• Modified the deep learning algorithm to improve the classification accuracy using a combination of Convolutional Neural Networks and Data Augmentation.

Team lead: Property Prices in New York: A Time Series. Jan 2020-May 2020

• Designed a program (based on studying over 300,000 property sales in New York City) to improve decision making on the right time to invest in or purchase New York City properties. Used a combination of time series analysis, clustering and predictive modeling. Applied autocorrelation to investigate the consistency and significance of patterns.

• Implemented different strategies to improve the models and deal with skewed data, for example, using paramgrid for tuning the hyperparameters.

Project Lead, Text Parsing and Analysis on ATCE Conference 2013 Dataset (NLP). Jun 2017-Dec 2017

• Parsed a large text dataset consisting of 336 conference papers from the 2013 ATCE conference.

• Developed an algorithm that would identify the category of each paper presented at the conference based on the text in the entire paper using machine learning. This would be useful for readers interested in specific topics to easily identify the papers under that topic.

EDUCATION

Thinkful Remote

Data Science Certificate May 2020

University of Southern California Los Angeles, California

Master of Science in Petroleum Engineering GPA 3.54 Aug 2018

China University of Geosciences, Wuhan Wuhan, China

Bachelor of Science in Petroleum Engineering GPA 3.56 July 2016

ADDITIONAL ONLINE COURSES (Coursera, LinkedIn, Udemy)

1. Introduction to Big Data.

2. Programming Foundations with JavaScript, HTML, CSS.

3. Java Programming: Solving Problems with Software.

4. Introduction to Data Science in Python.

5. Learning SQL Programming.

6. Applied Statistical Modeling and Data Analytics.

7. Introduction to TensorFlow for AI.

8. Ultimate AWS Certified Cloud Practitioner (In progress).