
Machine Learning Data Scientist

Location:
Oklahoma City, OK
Posted:
December 13, 2024


Resume:

Stevenson Manik

Sr. Data Scientist / Machine Learning

****************@*****.***

572-***-****

Professional Summary

Data Scientist with 9+ years of experience executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing.

Extensive experience building machine learning solutions to address business problems and generating data visualizations using Python.

Worked with Python libraries such as Pandas, NumPy, Matplotlib, and Scikit-Learn to build and evaluate machine learning models with concise code.

Hands-on experience with Naïve Bayes, random forests, decision trees, linear and logistic regression, Principal Component Analysis, SVM, clustering, neural networks, and computer vision on related systems.

Passionate about implementing deep learning techniques with frameworks like Keras and Theano.

Forecasted sales and loan demand using time series modeling techniques such as autoregressive, moving average, and Holt-Winters models.
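
For illustration, a minimal sketch of the Holt-Winters technique named above, using statsmodels; the monthly series here is synthetic stand-in data, not the actual sales or loan-demand data.

```python
# Minimal sketch: Holt-Winters exponential smoothing for demand forecasting.
# The series below is synthetic; real inputs would come from the warehouse.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

idx = pd.date_range("2020-01-01", periods=36, freq="MS")
sales = pd.Series(
    100 + 2 * np.arange(36) + 10 * np.sin(np.arange(36) * 2 * np.pi / 12),
    index=idx,
)

model = ExponentialSmoothing(
    sales, trend="add", seasonal="add", seasonal_periods=12
).fit()
print(model.forecast(6))  # six-month-ahead forecast
```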

Extensive experience with business intelligence (BI) technologies and tools such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.

Managed GitHub repositories and permissions, including branching and tagging.

Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, and R 3.0.

Used Python packages such as Seaborn, Matplotlib, ggplot, and Pygal to develop visualizations and plot results.

Extracted and worked with data from different databases, including Oracle, SQL Server, DB2, MongoDB, PostgreSQL, Teradata, and Cassandra (both SQL and NoSQL).

Excellent understanding of Systems Development Life Cycle (SDLC) methodologies such as Agile and Waterfall.

Experience extracting data from various sources, such as Oracle databases, flat files, and CSV files, and loading it into target warehouses.

Knowledge of writing SQL queries and resolving key performance issues.

Experienced with machine learning and deep learning algorithms such as logistic regression, random forest, AdaBoost, XGBoost, KNN, SVM, ANN, deep learning for computer vision, RNN, LSTM, linear regression, lasso regression, ridge regression, and k-means.

Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Clustering, and Neural Networks.

Developed NLP models for topic extraction and sentiment analysis/search.

Worked with the NLTK library for NLP data processing and pattern discovery.

Knowledge of Proof of Concepts (POCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging and Teradata.

Excellent knowledge and experience in OLTP/OLAP system study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool.

Well-experienced in Normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.

Using Docker and Ansible, containerized the virtual infrastructure's configuration management tasks, which detect configuration drift and revert systems to their original configurations.

Strong business sense and ability to communicate data insights to both technical and nontechnical clients.

Designed and evaluated the results of controlled experiments to optimize elements of the offering; advised others on their experimental designs.

Expertise in predictive modeling using both supervised and unsupervised learning techniques.

Expertise in containerizing applications using Docker Compose.

Hands-on experience in different frameworks like Keras, and TensorFlow.

Highly skilled in using Tableau and Power BI visualization tools for creating dashboards.

Work closely with customers, cross-functional teams, software developers, and business teams in an Agile work environment to drive data model implementations and algorithms into practice.

Motivated and mentored a team of data scientists to grow their skills and careers, engaging with them in writing code, finding datasets, and pulling knowledge together as a resource to bring the team up to speed.

Technical Skill:

Languages

Python, JavaScript, MATLAB, SAS, Spark, Docker, SQL, VBA, C++, C

Python Packages

NumPy, Pandas, scikit-learn, TensorFlow, SciPy, Matplotlib, Seaborn, Numba, spaCy, NLTK, LightGBM, XGBoost, CatBoost, Dask, Gensim, PySpark

Machine Learning

Time Series Prediction, Natural Language Processing and understanding, Support Vector Machines (SVM), Machine Intelligence, Generalized Linear Models, Machine Learning algorithms, ETL Tools, Apache NiFi, Informatica, Real-time Data Pipelines, Log Management, Cyber Security Data Analysis

IDE

Jupyter, Spyder, MATLAB, Visual Studio

Versioning Tools

SVN, GitHub

Data Query

Azure, AWS SageMaker, Azure DB, Google Cloud, Oracle, Amazon Redshift, Kinesis, EMR, RDBMS, Snowflake, SQL data warehouses, data lakes, and various SQL and NoSQL databases

Cloud Platform

Amazon AWS, GCP, Heroku, Microsoft Azure, Snowflake

Visualization

Power BI, Cognos 11 Analytics, Tableau, Jupyter Notebook 4.X, and IBM Data Stage

Deep Learning

Machine Perception, Neural Networks, TensorFlow, Keras, PyTorch

Professional Experience:

Sr. Data Scientist / Machine Learning CVS, RI July 2022 to Present

Responsibilities:

Used SQL and Python to collect, analyze, and preprocess the data.

Utilized ggplot and other visualization libraries to perform initial data exploration and visualization.

Used Python to explore the data using statistical methods along with visualization with the use of Python packages.

Performed data preprocessing and cleaning using Pandas and NumPy, and applied feature engineering and data imputation techniques for missing values in the dataset using Python.

Performed stemming and lemmatization of text to remove superfluous components and make the resulting corpus as small as possible while containing all important information.
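
A minimal sketch of the stemming and lemmatization step using NLTK; the token list is illustrative, not the production corpus.

```python
# Minimal sketch: stemming vs. lemmatization with NLTK.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # lookup data the lemmatizer needs

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

tokens = ["studies", "running", "corpora", "better"]
print([stemmer.stem(t) for t in tokens])          # crude suffix stripping
print([lemmatizer.lemmatize(t) for t in tokens])  # dictionary-based base forms
```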

Developed, deployed, and maintained production NLP models with scalability in mind.

Helped to develop and grow the new service area of Business Analytics.

Developed ML models for fraud detection and anomaly identification.

Collaborated with cybersecurity teams to enhance data protection and threat detection.

Applied data encryption and best practices for secure data handling in ML workflows.

Used OLAP cubes in preparation for data mining, behavioral and attitudinal segmentation, predictive modeling, insight extraction, and data visualization.

Designed, developed, and implemented novel computer vision algorithms for unique use cases using deep learning frameworks such as TensorFlow, Keras, PyTorch, and Caffe.

Worked on Support Vector Machines (SVM), clustering models, and Principal Component Analysis (PCA) with different structured and unstructured datasets for dimensionality reduction, and analyzed the accuracy of the models.

Used GitHub as a Git hosting service, providing a convenient place to store multiple versions of files.

Experience with machine learning platforms such as TensorFlow, Jupyter Notebook, scikit-learn, Apache Spark, Hive, and Kafka.

Experience building real-time data pipelines for processing and analysis.

Familiar with ETL tools such as Apache NiFi and Informatica for data routing and transformation.

Worked on projects involving log data management and enrichment.

Implemented machine learning algorithms and concepts such as K-means clustering (and its variants), Gaussian distributions, and decision trees.
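
A minimal sketch of K-means clustering with scikit-learn; the blob data and cluster count are assumptions for illustration.

```python
# Minimal sketch: K-means on synthetic Gaussian blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(km.cluster_centers_)  # learned centroids
print(km.inertia_)          # within-cluster sum of squared distances
```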

Performing R&D in computer vision and machine learning.

Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.

Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.

Experience with Keras and TensorFlow in developing predictive algorithms.

Solved analytical problems, and effectively communicated methodologies and results.

Concept-space embedding via ELMo was also tested and found to give results similar to bag-of-words, with a significant increase in computational time.

Presented my findings to stakeholders and decision-makers to better inform future decisions.

Performed feature engineering to clean and process the data to feed to my model.

Used outside resources to supplement the data we had already gathered.

Created a theoretical model to explain patterns in the data based on the perceived primacy of leverage.

Prepared reports and presentations using MS Office and Matplotlib that accurately conveyed data trends and associated analysis at Board meetings.

Participated in all phases of data mining, data collection, data-cleaning, developing models, validation, and visualization and performed statistical analysis/machine learning.

Environment: AWS Redshift, EC2, EMR, Hadoop Framework, S3, HDFS, GitHub, Spark (PySpark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/NLTK/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Jupyter, Random Forest, XGBoost, LightGBM, Collaborative filtering, Ensemble), NLP, Teradata, Git 2.x, Agile/SCRUM

Data Scientist / Machine Learning Engineer Apple, CA March 2020 to June 2022

Responsibilities:

Developed sentiment analysis using the BERT LLM, trained on historical data provided by the organization to understand end-user sentiment, using Python on Google Vertex AI.
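
As an illustrative stand-in only (the Vertex AI training and serving setup is not shown here), a BERT-family sentiment classifier can be exercised with the Hugging Face transformers pipeline.

```python
# Illustrative stand-in: BERT-family sentiment scoring via transformers.
# The actual work used a model trained on organizational data on Vertex AI.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # downloads a default fine-tuned model
print(sentiment(["The claims process was painless",
                 "Support never responded to my ticket"]))
```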

Used Guardrails AI with the LLM to enforce a structured response to queries.

Performed data collection, data cleaning, data visualization, and text feature extraction, and derived key statistical findings to develop business strategies using Python.

Employed NLP to classify text within the dataset. Categorization involved labeling natural language texts with relevant categories from a predefined set using Python.

Managed GitHub repositories and permissions, including branching and tagging.

Created a model to validate Health Insurance claims using ML and DL techniques.

Led and managed research and development efforts for the computer vision team.

Patented a computer vision and deep learning framework in C++.

Used strong technical and business/domain knowledge and big data analytics to develop and deliver multiple business analytics visualizations using TIBCO Spotfire, SAP BusinessObjects, and Crystal Reports.

Created ML model for Health Insurance prediction to facilitate customers with better health plans as per their needs and help organizations avoid unforeseen losses and liabilities.

Conducted hyperparameter tuning with SageMaker to optimize model performance.

Leveraged SageMaker's built-in algorithms for rapid model development.

Used AWS SageMaker to train, deploy, and manage ML models in production.
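
A hedged sketch of the SageMaker training-and-tuning flow using the sagemaker Python SDK; the role ARN, image URI, S3 paths, metric regex, and hyperparameter range are placeholders, not the actual job configuration.

```python
# Hedged sketch: SageMaker hyperparameter tuning with the sagemaker SDK.
# All ARNs, URIs, paths, and ranges below are placeholder assumptions.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

estimator = Estimator(
    image_uri="<training-image-uri>",                     # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",                  # placeholder
    sagemaker_session=sagemaker.Session(),
)

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
    metric_definitions=[{"Name": "validation:auc",
                         "Regex": "auc: ([0-9\\.]+)"}],   # placeholder regex
    max_jobs=10,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train"})  # launches the tuning job
```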

Using Docker, containerized all the configuration management tasks and uploaded the Docker images to the local Docker Artifactory.

Built one-class Support Vector Machine (SVM) and Principal Component Analysis (PCA) algorithms for anomaly detection of fraud and other errors that signal dishonest behavior.
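
A minimal sketch of the PCA-plus-one-class-SVM approach in scikit-learn; the synthetic features and the nu/gamma settings are assumptions.

```python
# Minimal sketch: PCA for dimensionality reduction, then a one-class SVM
# to flag outlying (potentially fraudulent) records.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))      # stand-in for transaction features

X_reduced = PCA(n_components=5).fit_transform(X)
oc_svm = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(X_reduced)

labels = oc_svm.predict(X_reduced)  # +1 = inlier, -1 = flagged anomaly
print((labels == -1).sum(), "records flagged")
```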

Experience using various packages in Python and R, such as ggplot2, caret, dplyr, RWeka, gmodels, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn, along with Jupyter Notebooks, VS Code, and Beautiful Soup.

Developed integration with Jupyter Notebook.

Developed a model to recognize fraudulent card transactions. The model detects both fraudulent and legitimate credit card transactions and increased accuracy by 30%.

Built models using Python to predict the probability of attendance for various campaigns and events.

Developed a Machine Learning CI/CD pipeline on Google Cloud.

Used text to understand user sentiments over time. Data was facilitated from various sources such as the company's official website, Twitter, Facebook, Quora, etc.

Designed and implemented a C++ computer vision library that is being used in all product lines.

Initiated various text pre-processing phases, such as tokenizing, stemming, lemmatization, stop-word removal, vocabulary phrase matching, and POS tagging, using the NLTK and spaCy libraries in Python, converting raw text to structured data.

Developed time-series ARIMA models to understand sales and the factors influencing them.
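
A minimal ARIMA sketch with statsmodels; the (1, 1, 1) order and the synthetic series are illustrative, not the tuned production model.

```python
# Minimal sketch: fitting an ARIMA model to a sales series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2021-01-01", periods=48, freq="MS")
sales = pd.Series(200 + np.random.default_rng(1).normal(0, 5, 48).cumsum(),
                  index=idx)

fit = ARIMA(sales, order=(1, 1, 1)).fit()  # order is an assumed example
print(fit.forecast(steps=3))               # three-month-ahead forecast
```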

Created a sparse dataset using CountVectorizer, a document-term matrix, and TF-IDF vectorization, assigning IDs to each word and checking word frequencies in the corpus using Python.
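
A minimal sketch of that vectorization step with scikit-learn; the three-document corpus is illustrative.

```python
# Minimal sketch: sparse document-term matrix and TF-IDF weights.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the claim was approved",
          "the claim was denied",
          "appeal the denied claim"]

counts = CountVectorizer()
dtm = counts.fit_transform(corpus)  # sparse document-term matrix
print(counts.vocabulary_)           # word -> column ID mapping

tfidf = TfidfVectorizer().fit_transform(corpus)
print(tfidf.shape)                  # documents x vocabulary terms
```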

Environment: Machine learning, AWS, MS Azure, Cassandra, SAS, GitHub, Spark, HDFS, Jupyter Notebooks, Hive, Pig, Linux, Python, MySQL, Eclipse, PL/SQL, SQL connector, Spark-ML.

Data Scientist Charles Schwab, TX Sep 2018 to Feb 2020

Responsibilities:

Proven success managing complex projects and reaching milestones on time, within budget, and above expectations.

Designed the chatbot for customer queries using NLP.

Developed and tested Bayesian estimations of the probability of internet users' actions (such as clicks and conversions) using real data.

Reproduced the working environment using Docker to train and run the machine learning model anywhere.

Developed a highly optimized sentiment analysis model on complaints data in a Spark environment.

Wrote samples and guides using Jupyter Notebook.

Keenly interested in spearheading cross-functional and cross-cultural teams of developers, testers, and data managers for end-to-end delivery lifecycle management.

Developed and coded a clear product-recommendation process visualization in Python to explain the problem to the business.

Performed research and developed computer vision solutions for law enforcement and biometric problems, such as visual information retrieval of scars, marks, and tattoos (SMTs), iris segmentation, and iris detection from a distance.

Good experience working with various Python Integrated Development Environments, such as PyCharm, Spyder, Jupyter Notebook, and Anaconda.

Helped in migration and conversion of data from the Sybase database into Oracle database, preparing mapping documents and developing partial SQL scripts as required.

Instructed the business on how to develop and test recommended advertisements for the website store to meet their specific requirements.

Developed a prediction model to identify high-risk customers and developed a framework to identify anomalies in commercial transactions.

Integrated SageMaker with AWS services like S3, Lambda, and Redshift for end-to-end ML solutions.

Implemented monitoring and performance tracking using SageMaker Model Monitor.

Built algorithms focused on identifying security threats and suspicious activities.

Participated in projects involving secure user authentication and data validation.

Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.

Collected data from different cloud platforms like Salesforce Apex, AWS, and Azure.

Made the model available to end users over the internet using SaaS.

Worked with all levels of the organization and managed small teams in the project.

Developed microservices and batches using FastAPI.
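
A minimal FastAPI sketch of such a microservice; the request schema and the scoring rule are placeholders standing in for a trained model.

```python
# Minimal sketch: a FastAPI microservice exposing a prediction endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):      # placeholder request schema
    amount: float
    tenure_months: int

@app.post("/predict")
def predict(features: Features):
    # placeholder rule standing in for a trained model's scoring call
    score = 0.9 if features.amount > 10_000 else 0.1
    return {"risk_score": score}

# Run with: uvicorn main:app --reload  (assuming this file is main.py)
```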

Created image filtering and binary morphology routines for computer vision tasks.

Environment: Hadoop Framework, Salesforce Apex, Oracle, AWS, Azure, S3, HDFS, Spark (PySpark, MLlib, Spark SQL), Python, Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, Collaborative filtering, Ensemble), NLP, Teradata.

Data Scientist Reliable Software, India May 2016 – June 2018

Responsibilities:

Developed a demand forecasting model based on different time-series techniques to assist demand planners in effectively allocating resources.

Developed personalized product recommendations with machine-learning algorithms that used Collaborative filtering to better meet the needs of existing customers and acquire new customers.
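
A minimal sketch of item-based collaborative filtering via cosine similarity; the toy rating matrix is an assumption, and real data would be far larger and sparser.

```python
# Minimal sketch: item-based collaborative filtering with cosine similarity.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([   # rows = users, columns = items (toy data)
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
])

item_sim = cosine_similarity(ratings.T)  # item-item similarity matrix
scores = ratings @ item_sim              # predicted affinity per user/item
scores[ratings > 0] = -np.inf            # mask items already rated
print(scores.argmax(axis=1))             # top unseen item for each user
```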

Created machine-learning algorithms employing logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.

Developed optimization algorithms for use with data-driven models, including supervised, unsupervised, and reinforcement machine learning.

Utilized machine-learning models to implement a high-performing demand forecasting framework from scratch.

Satisfied critical requests from executive leadership: previous models could not adjust to the market demands.

The model obtained consistent, high-quality results using hierarchical modeling (MLlib/GBT).

Advised about how best to modify existing predictive out-of-stock models to accurately forecast for a longer time-horizon.

Used ML algorithms such as logistic regression, support vector machines, k-nearest neighbors, Naïve Bayes, CART, bagging, boosting, and ensemble learning to analyze the data based on the selected features for data-driven decisions.

Worked with PySpark and Python on Azure Databricks.

Used NLP techniques to sort and classify documents.

Used hierarchical time-series analysis to forecast usage across various products.

Used the AWS Redshift data warehouse and Boto3 to access AWS resources from Python.
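
A minimal Boto3 sketch of pulling a Redshift unload (a CSV landed in S3) into Python; the bucket and key names are placeholders.

```python
# Minimal sketch: reading an S3 object (e.g., a Redshift UNLOAD) with Boto3.
from io import BytesIO

import boto3
import pandas as pd

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-warehouse-exports",   # placeholder bucket
                    Key="unload/sales.csv")          # placeholder key
df = pd.read_csv(BytesIO(obj["Body"].read()))
print(df.head())
```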

Worked with the product development team for optimal product design, pricing, and marketing strategy.

Researched statistical machine-learning methods that included forecasting, supervised learning, classification, and Bayesian methods.

Advanced the technical sophistication of solutions using machine learning and other advanced technologies.

Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.

Environment: PySpark, Azure Databricks, Machine learning, AWS, MS Azure, MLlib/GBT, Cassandra, SAS, Spark, Linux, Python, MySQL, MongoDB, Data Visualization, PL/SQL, SQL connector, Spark-ML, Big Data.

Data Analyst HSBC, India May 2015 to April 2016

Responsibilities:

Analyzed complex healthcare data to identify patterns, trends, and insights that contributed to the enhancement of patient care and operational efficiency.

Employed SQL queries and Excel functions to gather, manipulate, and interpret data, ensuring accurate and meaningful insights.

Designed and developed interactive dashboards and reports using Tableau to convey complex information in a digestible format to stakeholders.

Used data mining algorithms and approaches.

Designed and developed all the tables and views for the system in Oracle.

Evaluated the model using scikit-learn's metrics library and estimated the confusion matrix, classification report, accuracy, and ROC AUC score.
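
A minimal sketch of that evaluation flow in scikit-learn, on a toy train/test split.

```python
# Minimal sketch: confusion matrix, classification report, accuracy, ROC AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(confusion_matrix(y_te, pred))
print(classification_report(y_te, pred))
print("accuracy:", accuracy_score(y_te, pred))
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```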

Reviewed the conceptual EDW (Enterprise Data Warehouse) data model with business users, App Dev, and Information architects to make sure all the requirements were fully covered.

Analyzed the source system to understand the source data and table structure along with a deeper understanding of business rules and data integration checks.

Identified various facts and dimensions from the source system and business requirements to be used for the data warehouse.

Implemented slowly changing dimensions for most of the dimensions.

Implemented the standard naming conventions for the fact and dimension entities and attributes of logical and physical models.

Reviewed the logical model with application developers, ETL Team, DBAs, and testing team to provide information about the data model and business requirements.

Worked with DBA to create the physical model and tables.

Analyzed existing logical data model (LDM) and made appropriate changes to make it compatible with business requirements.

Expanded Physical Data Model (PDM) for the OLTP application using Erwin.

Environment: Python, Jupyter, Oracle 8.0, MATLAB, SSRS, SSIS, SSAS, MongoDB, HDFS, Hive, Pig, SAS, Power Query, Power Pivot, Power Map, Power View, SQL Server, MS Access.


