Attuned to the latest trends and advancements in this field, consistently delivering impeccable results through dedication to handling multiple functions and activities in a high-pressure environment with tight deadlines
Over a decade of IT experience spanning Artificial Intelligence, Data Mining, Deep Learning, Predictive Analysis, and Machine Learning with big datasets of structured and unstructured data; proficient in managing the entire data science project life cycle and actively involved in all of its phases
Specialized knowledge for dealing with exponentially growing data; understands the underlying science of data and applies it to diverse problem statements across a variety of fields
Automated recurring reports using SQL and Python and visualized them in Power BI and Tableau
Experience implementing stored procedures, triggers, and functions using T-SQL
Skilled in building fully automated, highly elastic cloud orchestration frameworks on AWS
Predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, normal distribution, and other advanced statistical and econometric techniques
Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights using large data sets. Created visually powerful and actionable interactive reports and dashboards.
Experience in developing and analyzing data models. Involved in writing simple and complex SQL queries to extract data from a database for data analysis and testing
Well-versed in statistical programming languages R, Python, and SAS, and in cloud platforms such as Azure ML, AWS ML, and GCP tools like BigQuery and Vertex AI
Experience using CUDA/GPU API for real-time image processing.
Strong SQL programming skills working with functions, packages, and triggers.
Developed predictive models using Decision Trees, Random Forests, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
Strong knowledge in all phases of the SDLC (Software Development Life Cycle), Agile & Scrum Methodologies
Knowledgeable in machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, Natural Language Processing (NLP), etc.
Sr. Data Science Consultant - Gamma Team January 2022 - Present
Boston Consulting Group, Chicago, Illinois
As a Senior Data Science Consultant under the Gamma Team at Boston Consulting Group, I have assisted in providing case analysis for internal projects and for a user portal data exchange that provides data assets, models, and algorithms. I am active in integrating several new data sources into the existing portal and provide support and consulting for close to 50 existing pipelines with their associated predictive models.
Liaison between Lighthouse and data vendors integrating new pipelines and providing model preparation consultations
Responsible for weekly meetings with vendors and the Gamma Team covering JIRA-assigned tickets
Exposure to a wide variety of domains of differing granularity and frequencies including satellite imagery, footfall traffic, credit card purchases, device usage on an international scale, social listening, financial, demographic, telecom, GIS, geospatial data, ESG ratings, global temperatures, vegetation, and air quality, to name a few.
Distinguished the legal permissibility of data use among raw data, derived data, and data with model-inferred values
Staged vendor data and machine learning APIs using vendor-specific platforms including GCP/AWS/Azure
Provided feature updates to the repos to streamline pipelines destined for Snowflake and debugged Airflow event breakdowns
Used APIs/SFTP/S3/GCS to manage vendor-specific source data staging, and dynamic SQL to create queries spanning hundreds of Snowflake databases, schemas, tables, and views
Developed source code in Python within 3 large Git repositories to handle data ingestion, transformation, validation, final staging, and markup for client portals and internal case projects
Used GCP BigQuery to assist vendors and case teams with big data and model-inferred data using Vertex AI
Implemented Python virtual (abstract) classes and documented details of data lineage and data pipelines in Egnyte/Confluence/SharePoint
Accountable for pipeline schedules and log reviews of DAGs coded with Python ECS operator structures
Converted ingestion protocols from REST APIs to GraphQL for geospatial maritime vendor data
Cloned and tested repos in locally run Docker containers, pushing only after successful results
Used regex to pattern-match S3 directories with parameterized queries built dynamically from current dates or other events and messages received
Developed enumeration modules to detect changes in vendor source data staging structures
Weekly JIRA meetings with teams discussing current projects and vendor data proposals for new integrations
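The date-driven S3 pattern matching described above can be sketched roughly as follows; the vendor name and key layout here are hypothetical stand-ins, not the actual staging structure:

```python
import re
from datetime import date

def build_key_pattern(vendor, run_date):
    """Compile a pattern matching one vendor's S3 staging keys for one date.

    The `vendor/staging/YYYY/MM/DD/` layout is an illustrative assumption;
    each real vendor had its own structure.
    """
    stamp = run_date.strftime("%Y/%m/%d")
    return re.compile(rf"^{re.escape(vendor)}/staging/{stamp}/.+\.(csv|parquet)$")

def match_keys(keys, vendor, run_date):
    """Return only the keys belonging to this vendor and date."""
    pattern = build_key_pattern(vendor, run_date)
    return [k for k in keys if pattern.match(k)]
```

In practice each vendor needed its own layout, so the pattern builder would be parameterized per vendor rather than hard-coded as here.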
Sr. Data Scientist Oct 2021 – Jan 2022
Credit Suisse, New York
Serving on the RAD team, I helped implement Isolation Forest machine learning models used to detect anomalous trading signatures among currencies, options, futures, stocks/bonds, IPOs, and collateral assets. I streamlined the existing workflow by automating several steps and implementing the team's first data pipeline. With another team, I developed code to extract data from several online-submitted tax forms using Tesseract OCR technology; LSTMs and CNNs were used to convert handwritten images into text with 99% accuracy, and the data was stored in a NoSQL database.
Contributed to two independent projects: RAD (Regulatory Anomaly Detection) and Tax Extraction.
Daily scrum meetings using Odyssey/Atlassian JIRA to create tasks/stories and set priorities
Collaborated weekly with European clients and coordinated off-hours with the IT India data center to construct complex SQL queries over error trading data processed with Impala engines
Orchestrated several pipelines with cron jobs to ingest and process error trading data into Hive-connected Tableau, creating aggregated visualizations by regulator, jurisdiction, and data source
Developed anomaly detection models using isolation forests over features from stock and derivative error trading data
Implemented OCR using Tesseract, CNNs, etc.
Processed extracted text with NLP: word embeddings were generated with BERT, and LSTMs were used for topic modeling
Used deep learning frameworks PyTorch and Keras on TensorFlow
Used Python tools including NumPy, SciPy, and Pandas
Lead developer of OO Python classes for document handling of PDF tax forms stored in MongoDB, achieving 99% accuracy in data extraction using Tesseract
Used synthetic training data with TensorFlow to model variations of handwritten dates on tax forms
Created JSON schemas with coordinate-specific extractions using CV2 for W9, EXP, ECI, W8BENE, WBEN, and IMY tax forms
Applied homographic coordinate alignment and morphological transformations to images using CV2
Applied regular expressions in post-processing of OCR-extracted text to filter noise
Applied Python multi-threading/multiprocessing to multiple pages of a single tax form at once to reduce processing and extraction time
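The regex post-processing of OCR output can be illustrated with a small sketch; the character confusion map and date format below are illustrative assumptions, not the production rules:

```python
import re

# Illustrative subset of characters Tesseract commonly misreads as letters
# when they are actually digits; applied to date fields only, since a global
# translation would corrupt ordinary words.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

# Accept MM/DD/YYYY or MM-DD-YYYY anywhere in the field.
DATE_RE = re.compile(r"\b(\d{2})[/\-](\d{2})[/\-](\d{4})\b")

def clean_ocr_date(raw):
    """Normalize a noisy OCR date field to MM/DD/YYYY; return None if unparseable."""
    candidate = raw.translate(OCR_DIGIT_FIXES)
    m = DATE_RE.search(candidate)
    if not m:
        return None
    mm, dd, yyyy = m.groups()
    return f"{mm}/{dd}/{yyyy}"
```

For example, `clean_ocr_date("Date signed: l2/O5/2O2l")` recovers `"12/05/2021"` from the misread characters.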
Sr. Data Scientist Nov 2019 – Oct 2021
Freeport McMoRan, Phoenix, Arizona / Remote
As a Senior Data Scientist at Freeport McMoRan, I assisted in developing Monte Carlo methods optimized through particle swarm techniques. Recommendations were created by simulating mine conditions, aiding managers in choosing scenarios that optimize ore-production KPIs. During testing, when experienced mine dispatchers accepted these recommendations, millions of dollars of additional ore were produced during a typical 12-hour shift.
Data ingestion of IoT data using HVR/SAP/PIBA/API sources.
Data pipelines using Kafka/Event Hubs/Snowpipe
Data persistence using Blob Storage (Cosmos)/Teradata/Snowflake
Data analytics using Azure Streaming Data/Python/Azure Databricks/SQL/GIT/AZURE
Use of Power BI on top of Cosmos with data in JSON format
Use of wide-ranging IoT data types from GPS/Temperature/Pressure/Altitude/Speed/Bumps
Databricks cluster analysis and run-time tests to determine latency issues
Developed REST API/WebSocket functionality used to change mine operations remotely with the aid of Postman
Used the brainstorming tool FreeMind to organize complex data relationships
Daily standup meetings of items assigned in Azure DevOps using the Agile method
Ran the model sequentially or in parallel using Pandas or Spark DataFrames
Remotely tested model recommendations with the client in real time at the mine in Peru, South America
Analysis to determine the duration of various down events in the mine
Feature engineering of IoT data to measure or trigger events in the mine
Research and wiki-documentation of past and current items in DevOps
Led discussions with the client in adopting model use in the mine
Developed a method to predict when shovel material blocks would become depleted
Analysis of precision of estimates made of mine KPIs from the Monte Carlo Simulations
Analyzed model-producing code, experimenting with Python classes and extracting data structures into classes to better understand the core code and suggest changes to it
JDBC connection to data sources using Databricks
Worked with a core team of 4-6 Data Engineers, Data Analysts, and Data Scientists
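The Monte Carlo simulation idea above can be sketched in miniature. All parameters (truck counts, cycle times, tonnages) are illustrative stand-ins rather than actual mine values, and the production models were far richer and tuned via particle swarm optimization:

```python
import random
import statistics

def simulate_shift(n_trucks, mean_cycle_min, sd_cycle_min, tons_per_load,
                   shift_min=720, rng=random):
    """Simulate one 12-hour shift: each truck repeats haul cycles with noisy
    durations; return total tons moved. All numbers are hypothetical."""
    total = 0.0
    for _ in range(n_trucks):
        t = 0.0
        while True:
            cycle = max(1.0, rng.gauss(mean_cycle_min, sd_cycle_min))
            if t + cycle > shift_min:
                break  # no time left for another full cycle this shift
            t += cycle
            total += tons_per_load
    return total

def monte_carlo_kpi(n_sims=500, seed=7):
    """Estimate the distribution of shift tonnage over many simulated shifts."""
    rng = random.Random(seed)
    results = [simulate_shift(20, 30.0, 6.0, 240.0, rng=rng)
               for _ in range(n_sims)]
    return statistics.mean(results), statistics.stdev(results)
```

Comparing these simulated KPI distributions across candidate scenarios is what lets a dispatcher pick the scenario with the best expected tonnage.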
Data Scientist Jul 2018 – Nov 2019
KeyCorp, Cleveland, OH
KeyCorp's primary subsidiary is KeyBank, whose services span retail, small business, corporate, and investment clients. I developed a churn model using logistic regression that identified customers on the verge of leaving. Through incentives for customers the model flagged as likely to cancel their accounts, cost savings were created by retention, and the average customer lifetime was increased significantly.
Worked with the data science team and provided respective data as required on an ad-hoc request basis
Delivered portfolio risk dashboard as a package covering all aspects of the credit life cycle for retail unsecured loans
Created time series forecasts using Prophet for default rates of bank financial instruments
Worked with huge data sets using Hadoop, HDFS, MapReduce, and Spark
Information used included structured and semi-structured data elements collected from both internal and external sources
Handled imbalanced data using Synthetic Minority Oversampling (SMOTE); missing data were handled using KNN imputation
Assisted both application engineering and data scientist teams in mutual agreements/provisions of data, deployment of production models, etc.
Incorporated Python, MLlib, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction
Use of Supervised, Unsupervised, and Semi-Supervised classification and clustering of documents
Strong communication and problem-solving skills incorporated in a team environment
Contributed to security projects involving real-time object tracking and classification using OpenCV libraries
Used Google Cloud Video Intelligence to make videos searchable and discoverable
Interrogated analytical results to assess algorithmic success, robustness, and validity
Assisted in developing Spark/Scala, and Python for regular expression (regex) projects in a Hadoop/Hive environment
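The churn model above can be illustrated with a from-scratch logistic regression sketch. The production model used standard libraries and real account features; this version uses synthetic, balanced data and plain gradient descent purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.
    X: (n, d) feature matrix, y: (n,) labels in {0, 1}."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of log-loss
    return w

def predict_churn(w, X, threshold=0.5):
    """Label a customer as at-risk when predicted probability crosses threshold."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ w) >= threshold).astype(int)
```

In the real project, class imbalance (handled with SMOTE, per the bullets above) was the harder problem; here the classes are balanced for brevity.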
Data Scientist Feb 2015 – Jul 2018
Aetna Insurance, Hartford, CT
Aetna is a privately held health and life insurance company. Data from current and past patrons was warehoused, allowing me to create models and predict the best-suited medical plans for each potential customer via a recommender system. Insurance agents were aided by this recommender system, which incorporated Principal Component Analysis and Singular Value Decomposition, in making collaborative suggestions about prospective customers. Early A/B testing showed that insurance agents who used the recommender system closed deals 13% more often than those who had not adopted it.
Developed personalized product recommendations with machine learning algorithms that used Collaborative filtering to better meet the needs of existing customers and acquire new customers.
Created machine learning models employing logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
Developed optimization algorithms for use with data-driven models, such as supervised, unsupervised, and reinforcement machine learning.
Able to research statistical machine learning methods which may include forecasting, supervised learning, classification, and Bayesian methods.
Able to advance the technical sophistication of solutions through the use of machine learning and other advanced technologies.
Performed exploratory data analysis and data visualizations using R, and Tableau.
Collaborated with data engineers to implement the ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
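The SVD-based collaborative filtering behind the recommender can be sketched as follows; the affinity matrix, rank, and naive missing-value fill are illustrative choices, not the production pipeline:

```python
import numpy as np

def svd_recommend(ratings, k=2, top_n=2):
    """Recommend unseen items via low-rank SVD reconstruction.

    ratings: (customers x plans) affinity matrix, 0 meaning "not seen".
    Reconstructs the matrix at rank k and recommends each customer's
    unseen plans with the highest reconstructed scores."""
    mean = ratings[ratings > 0].mean()
    filled = np.where(ratings > 0, ratings, mean)  # naive fill for missing cells
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]       # rank-k reconstruction
    recs = []
    for i in range(ratings.shape[0]):
        unseen = np.where(ratings[i] == 0)[0]
        order = unseen[np.argsort(-approx[i, unseen])]
        recs.append(order[:top_n].tolist())
    return recs
```

The rank-k truncation is where the PCA/SVD dimensionality reduction mentioned above enters: similar customers end up sharing the same low-dimensional factors.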
Data Scientist Oct 2013 – Feb 2015
BlackRock, New York, NY
Created alternative methods to scout companies in anticipation of LBO negotiations. Training data sets were created, and documents were classified into relevant categories to search for companies with low market visibility. NLP techniques and associated models were used to filter data from several sources on the web. Relevant documents led to the discovery and acquisition of several companies structured as LBOs, yielding estimated average IRRs of 25% with expected exit times of 2-4 years.
Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction using NLTK, word2vec, and TF-IDF
Cleaned, parsed, and tokenized 1.6 million sentences using NLTK and scikit-learn.
Developed, deployed, and maintained production NLP models with scalability in mind.
OLAP cubes used in preparation for data mining, behavioral and attitudinal segmentation, predictive modeling, insight extraction, and data visualization
Worked with applications such as R and SAS to develop neural networks and implement cluster analysis.
Implemented machine learning algorithms and concepts such as k-means clustering (and its variants), Gaussian distributions, decision trees, etc.
Analyzed data using data visualization tools and reported key features using statistic tools and supervised machine learning techniques to achieve project objectives.
Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.
Used key indicators in Python for machine learning concepts like regression, bootstrap aggregation, and random forest.
Use of inferential statistics and machine learning data pipelines.
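The TF-IDF relevance scoring used in the document search can be shown with a from-scratch sketch; production work used NLTK and scikit-learn, and the documents below are toy examples:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights per document from scratch.

    TF is term frequency within a document; IDF is log(N / document
    frequency), so terms in every document get zero weight."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def score(vector, query_terms):
    """Sum the TF-IDF weight of each query term present in the document."""
    return sum(vector.get(t, 0.0) for t in query_terms)
```

Ranking documents by such scores against LBO-related query terms is the basic mechanism for surfacing low-visibility candidates from a large crawl.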
Data Scientist Jan 2010 – Oct 2013
Johnson and Johnson, New Brunswick, New Jersey
Johnson and Johnson develops several medical products that monitor health conditions. ANN autoencoders were used to model ECG readings and to detect anomalies from baseline. Data from arrhythmic events were combined with normal ECG data to simulate early warnings of symptoms of heart disease based on error reconstruction metrics. Successful models were packaged and forwarded for deployment.
Thrived as a contributor, scientist, and developer in an Agile development process.
Use of Python/R or similar scripting languages to manipulate, analyze and visualize large data sets.
Rapid model creation in Python using pandas, NumPy, Sklearn, and Matplotlib for data visualization.
Models implemented in SAS and interfaced with MSSQL databases and scheduled to update on a timely basis.
Developed anomaly and outlier detection algorithms using CNNs.
Machine learning classification of documents using neural network and deep learning language techniques, k-nearest neighbors, k-means, random forest, and logistic regression.
Trained data with different classification models such as decision trees, SVM, and random forest
Managed BI group and associated cross-departmental collaborators in the design, development, and implementation of near-real-time cloud and traditional data systems to capture, cleanse, store, and process data
Programmatic usage of SQL databases and search engines.
Segmentation of medical-related images using OpenCV libraries.
Developed and deployed machine learning as a service on Microsoft Azure cloud service.
Conducted and interpreted analyses using noisy data sets.
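The reconstruction-error idea behind the ECG anomaly detector can be sketched with a linear stand-in: rank-k PCA acts as a "linear autoencoder" trained on normal data, and samples whose reconstruction error exceeds a baseline threshold are flagged. The real models were ANN autoencoders; the data here is synthetic:

```python
import numpy as np

def fit_reconstructor(X, k=1):
    """Learn the top-k principal directions of normal training data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruction_error(model, X):
    """Per-sample squared error after projecting onto the learned subspace."""
    mu, components = model
    Z = (X - mu) @ components.T       # "encode" into k dimensions
    Xhat = Z @ components + mu        # "decode" back to the original space
    return ((X - Xhat) ** 2).sum(axis=1)

def flag_anomalies(model, X, threshold):
    """Flag readings that reconstruct poorly relative to the normal baseline."""
    return reconstruction_error(model, X) > threshold
```

The same logic applies to the nonlinear case: an autoencoder trained only on normal ECG traces reconstructs arrhythmic events poorly, and the error gap is the early-warning signal.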
Senior Mathematics Lecturer Aug 2007 – Jan 2010
University College of the Cayman Islands, Grand Cayman, Cayman Islands
UCCI is a private institution that serves the academic needs of its local community and prepares students for the thriving business sector in the Cayman Islands. I taught a wide range of courses including College Algebra, Calculus, Linear Algebra, and Discrete Mathematics. Courses covered several fundamental computer science concepts including classical logic, combinatorics, and graph theory.
Installed online test data banks and provided technical assistance for colleagues.
Executed IT projects using Raspberry Pi, Arduino, and network simulators.
Taught Discrete Mathematics for computer science majors.
Development of an ASPX database-driven website for posting student grades.
Invigilated break-out sessions for yearly science conferences.
Led break-out sessions for yearly STEM conferences.
Conducted yearly chess tournaments for students with personally donated funds.
Integrated video/audio distance learning for nearby islands within normal classrooms.
Wrote, reviewed, and was lead invigilator for several collaboratively designed Midterms and Finals.
Bachelor of Arts in Mathematics, Michigan State University
Master of Science in Mathematics, Michigan State University
Master of Science in Statistics, Michigan State University
Data Science Specialties: Natural Language Processing, Machine Learning, Internet of Things (IoT), Predictive Analytics, Predictive Maintenance
Analytical Skills: Bayesian Analysis, Inference, Time-Series Analysis, Regression Analysis, Linear models, Multivariate analysis, Gradient Descent, Sampling methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics. Experience in stochastic optimization and regression with machine learning algorithms, formulating and solving discrete and continuous optimization problems.
Analytics Tools: Classification and Regression Trees (CART), Support Vector Machine (SVM), Random Forest, Gradient Boosting Machine (GBM), Principal Component Analysis (PCA), Regression, Naïve Bayes
Analytic Languages and Scripts: Python, Spark, Spark SQL, Scala, Java, C++, Ruby, LaTeX, Markup
Data Integration: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS)
Languages: C, C++, R, Java, Perl, SQL, Python
Version Control: GitHub, Git, SVN
Libraries: ggplot, nltk, numpy, opencv, pandas, scikit-learn, scipy, spacy, multiprocessing, dask, flask
Computing Environments: Jupyter, Spyder IDE, Atom, VS Code, Eclipse, Android Studio
Command Shell: IPython, macOS Terminal, PowerShell, Bash
Data Query: various SQL and NoSQL databases and data warehouses on Azure and Google Cloud. Experience with AWS, GCP, and Azure cloud computing, Spark, and Tableau. Capable of writing efficient code and working with large data sets.
Deep Learning: Machine Perception, Data Mining, Machine Learning Algorithms, Neural Networks, TensorFlow, Keras
Soft Skills: Able to deliver presentations and highly technical reports; collaboration with stakeholders and cross-functional teams, advisement on how to leverage analytical insights. Development of clear analytical reports which directly address strategic goals.
Operating Systems: Windows 8 and 10, Linux Ubuntu