
Salih A. Osman, Ph.D., Principal Data Scientist, MCSE, Chief Economist

Plano, Texas, USA | 469-***-**** | ad4dsd@r.postjobfree.com | https://www.linkedin.com/in/salih-osman-459998253/

LEAD DATA SCIENTIST

Accomplished, innovative, and analytical Lead Data Scientist with extensive data science experience in both the public and private sectors, two published papers, and expertise in data science and AI at major US Fortune 500 corporations, multinational corporations, and government bodies in the US and the GCC region. Extensive and diverse teaching and mentoring experience in data science, analytics, statistics, and economics in the US and abroad. Exemplary communication, critical-thinking, and technical skills.

AREAS OF EXPERTISE

ML & AI Solutions Life Cycle & Deployment • CRISP-DM Methodology • Model Development & Selection • Cross-Functional Collaboration • Project Management • Mentoring • Process Improvement • Compliance • Quality Assurance • Statistical Inference • NLP • Collaboration • Supply Chain Portfolio Cost Optimization and Demand Forecasting • Radio Network Predictive Maintenance

IoT Predictive Analytics • Python Scripting • PySpark for Handling Big Data Volumes • Building ETL Pipelines Using Tools Such as Databricks (Data Lakehouse), Dataiku, SQL, and Apache Spark

Data Science, ML and AI Portfolio

• 3+ years of experience in Gen AI: vector databases (Milvus, FAISS), AutoGen, Copilot, ChatGPT-4, and ChatGPT-3.5 Turbo.

• 15+ years of professional experience as a Data Scientist, applying machine learning and quantitative analytics to optimize business decisions using technologies such as scikit-learn and deep learning.

• 10+ years using major ML and deep learning frameworks (e.g., scikit-learn, PyTorch, TensorFlow, and Keras) and algorithms (e.g., CNN, GAN, LSTM, RNN, XGBoost).

• 7+ years of data engineering experience with modern big data analytics architectures (Hadoop, SQL, Hive, Spark, etc.) on major cloud platforms (e.g., AWS, Azure, Google Cloud).

• 3+ years of experience in fine-tuning and prompt engineering of Large Language Model (LLM) transformers such as BERT, ChatGPT, and Llama-2, including Retrieval-Augmented Generation (RAG) models.

• 7+ years of extensive experience with tech stacks including Python, R for statistics, and Azure Databricks.

• 4 years of experience using PyTorch as an ML development platform to build ML models.

• 7+ years of extensive experience with Exploratory Data Analysis (EDA) and visualization tools such as Matplotlib, Seaborn, ggplot2, Tableau, and Power BI.

• 10+ years of demonstrated expertise in statistical data modeling areas such as Linear Mixed Effects Models (LMM), Analysis of Variance (ANOVA and MANOVA), experimental design, and Generalized Linear Models (GLM).

• 5+ years of experience with Agile Scrum delivery using sprint planning; experience with the Consumer Packaged Goods (CPG) and retail industries.

• 10+ years of strong skills in data science solution delivery, data science domain expertise, and data science advocacy.

• 7+ years of working knowledge of modern cloud-based data storage and compute environments (e.g., AWS SageMaker, Azure Databricks, GCP Vertex AI, BigQuery, AWS S3, SQL).

• 5+ years of experience deploying AI/ML models into discovery/production environments to drive insights for business decision-making in many use cases, such as anomaly detection, demand forecasting, predictive maintenance, and NLP analytics that surface insights for the executive team from company-wide executive summaries and enterprise-wide employee surveys.

• Excellent communication and presentation skills, with the ability to articulate new ideas and concepts to technical and non-technical partners.


WORK EXPERIENCE

Principal Data Scientist – GEN AI.

AT&T/AMDOCS, 03/2023 – Present.

Developing Gen-AI solutions to support business-critical operations at a major telco, with emphasis on ingesting huge volumes of text on radio network design, development, and maintenance. This includes applying text chunking techniques (sentence-based, recursive character, content-aware, etc.) and text embedding using LLM embedding models from the Hugging Face Hub embedding model library. The text vector embeddings are stored in an indexed vector database such as Milvus, FAISS, or Chroma, and the retrieved vectors are then fed to an LLM prompt (GPT-4, GPT-4-32k, GPT-3.5 Turbo, and GPT-3.5-16K). The Gen-AI use-case solutions above were built on an Azure cluster platform with fully fledged Python APIs such as Flask for developers and data scientists, as well as RESTful APIs serving end users (Q&A, text summarization, text and image search, etc.).
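
As an illustration of the chunk, embed, index, and retrieve flow described above, the following minimal sketch uses sentence-transformers and FAISS; the embedding model name, chunk size, and sample text are placeholder assumptions rather than the exact components used on the project, and the final prompt would be sent to whichever GPT endpoint the deployment exposes.

```python
# Illustrative sketch of the chunk -> embed -> index -> retrieve flow described above.
# Assumes sentence-transformers and faiss-cpu are installed; model name, chunk size,
# and sample text are placeholder choices, not the project's exact components.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap (one of several strategies)."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

docs = ["...radio network design and maintenance notes..."]          # raw source text
chunks = [c for d in docs for c in chunk_text(d)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")                   # placeholder embedding model
vectors = embedder.encode(chunks, convert_to_numpy=True).astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])                          # simple exact-search index
index.add(vectors)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Embed the query and return the k most similar chunks for the LLM prompt."""
    q = embedder.encode([query], convert_to_numpy=True).astype("float32")
    _, idx = index.search(q, k)
    return [chunks[i] for i in idx[0]]

context = "\n".join(retrieve("What is the eNodeB upgrade procedure?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
# `prompt` would then be sent to the chosen LLM endpoint (e.g., GPT-4 via a REST API).
```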

Following Agile sprint planning and working within a cross-functional team of technologists from data science, data engineering, and software development.

Principal Data Scientist / Senior Data Engineer

Ericsson, Plano, TX, 06/2019 – 03/2023

Enterprise-wide Lead Mentor for Citizen Data Scientist Program

• Led enterprise-wide strategic NLP AI/ML solution development for a business-strategic use case supporting HR and the CEO office: creating an Executive Team Feedback Framework (ETFF) from enterprise-wide employee opinion surveys on the state of Ericsson's business operations and the company's strategic direction, by fine-tuning advanced large language models such as BERT, XLM, and ChatGPT models. The models were fine-tuned on roughly 150,000 employee and executive-team feedback comments from bi-annual surveys across the organization.

• Furthermore, improved the ETFF through fine-tuning and prompt engineering of Large Language Model (LLM) transformers, such as BERT and ChatGPT, with RAG techniques.

• Trained, validated, and deployed classification and clustering predictive and prescriptive ML models to solve business problems and help the business make smart decisions. Use cases included bill-of-materials anomaly detection to avoid project lead-time (LT) extensions and radio network site delivery interruptions, predictive maintenance of radio eNodeB upgrades, customer churn prediction, and customer marketing-campaign clustering analytics.

• Used the PyTorch-based Flair library with Hugging Face models as the ML development platform to fine-tune BERT for employee survey sentiment analysis and Named Entity Recognition (NER), correlating entities with sentiment scores across business functions. This NLP solution provided the Ericsson executive team with a data-driven, enterprise-wide employee feedback analytics framework, presented as a sentiment-score dashboard by topic and business function.

• Designed, developed, and deployed an anomaly detection engine, using a deep learning autoencoder model, for quality checks of site bills of material for three major global telco operators; also used for detecting anomalous radio manufacturing defects and for financial data risk analytics. The solution can be replicated for other operators and other departments within the company. Business impact: reduced radio site delivery lead time by 50% and improved operator site acceptance rates. An illustrative sketch of the autoencoder approach follows.
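
A minimal sketch of the reconstruction-error idea behind such an autoencoder quality check, assuming Keras/TensorFlow; the layer sizes, the random stand-in data, and the 95th-percentile threshold are illustrative assumptions, not the production configuration.

```python
# Minimal sketch of an autoencoder-based anomaly check for bill-of-material records.
# Assumes features are already numerically encoded and scaled; sizes and threshold are illustrative.
import numpy as np
from tensorflow.keras import layers, models

n_features = 30                                   # assumed width of the encoded BOM feature matrix
autoencoder = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),           # bottleneck
    layers.Dense(16, activation="relu"),
    layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")

X_train = np.random.rand(1000, n_features).astype("float32")   # stand-in for normal BOM rows
autoencoder.fit(X_train, X_train, epochs=20, batch_size=64, verbose=0)

# Score new rows by reconstruction error; large errors flag anomalous BOM entries.
X_new = np.random.rand(10, n_features).astype("float32")
train_err = np.mean((autoencoder.predict(X_train) - X_train) ** 2, axis=1)
new_err = np.mean((autoencoder.predict(X_new) - X_new) ** 2, axis=1)
threshold = np.percentile(train_err, 95)          # illustrative cut-off
flags = new_err > threshold
```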

• PySpark was used to handle the big data volumes of radio site build bills of material across five Market Areas globally. One version of this model was deployed on Google Cloud Platform (GCP) Vertex AI; in another Ericsson Market Area, the model was deployed on an AWS SageMaker VPC cloud instance.

• Used PyTorch as the ML development platform to build the models.

• Supply Chain Stock Inventory Portfolio Planning (SIPP), material demand predictive planning: worked on supply chain logistics predictive analytics for demand forecasting of the radio site build bill of material (BOM), ensuring stock replenishment of site build, maintenance, and replacement parts. Used Tableau and Power BI dashboards as the front-end UI to visualize the model's daily and weekly strategic BOM stock forecasts for radio site support engineers.

• Radio site build project predictive planning: utilized Bayesian methods to compute optimal resource costs for building a site, i.e., to optimize bill-of-material resource costs, project labor billing cost, and time to build so that site project net revenue is maximized given material and time cost constraints.

• Used a MILP optimization model to optimize radio site build and regional population coverage (a set-coverage problem), including formulation of the MIP model: decision sets/indices, tower build cost and regional population coverage parameters, decision variables for build and coverage, build cost and coverage constraints, and an objective function for coverage. A hedged sketch of one such formulation follows.
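
A hedged sketch of one common set-coverage formulation in PuLP (minimize build cost subject to every region being covered); the tower costs and coverage sets are toy values, not project data.

```python
# Toy set-coverage MILP: choose which candidate towers to build so every region is covered
# at minimum total build cost. Data below is illustrative only.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

towers = {"T1": 100, "T2": 80, "T3": 120}                      # build cost per candidate site
covers = {"T1": {"R1", "R2"}, "T2": {"R2", "R3"}, "T3": {"R1", "R3", "R4"}}
regions = {"R1", "R2", "R3", "R4"}

prob = LpProblem("radio_site_set_cover", LpMinimize)
build = {t: LpVariable(f"build_{t}", cat=LpBinary) for t in towers}

# Objective: minimize total build cost.
prob += lpSum(towers[t] * build[t] for t in towers)

# Constraint: every region must be covered by at least one built tower.
for r in regions:
    prob += lpSum(build[t] for t in towers if r in covers[t]) >= 1

prob.solve(PULP_CBC_CMD(msg=False))
chosen = [t for t in towers if build[t].value() == 1]           # towers selected to build
```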

• CO2 emissions: predictive forecasting of CO2 emissions for Ericsson inbound and outbound shipments for the annual climate-change sustainability report, which tracks Ericsson's 2030 zero-CO2-emission goal and the underlying SDG goals. Used a mix of data sources, including geospatial data, aerial drone images, and CO2 emission data streams for land, sea, and air freight, to produce predictive CO2 emission forecasts for Ericsson's global radio system manufacturing footprint.

• Radio eNodeB fault prediction: built an ensemble XGBoost ML model that uses near-real-time data streamed daily from radio devices to predict potential failure of radio nodes within the next 24 hours. Saved the company 80% of engineers' site-visit ("truck load") cost per site upgrade.

• Created a document-term matrix (DTM) from text, using intensive text mining and NLP techniques to build features from the history of resolution and observation troubleshooting notes recorded by radio network support engineers. This sparse matrix of binary text features was added to the model's structured features to improve its explanatory power and accuracy; a sketch of combining such features with the classifier follows.
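
An illustrative sketch of combining such a sparse TF-IDF document-term matrix with structured telemetry features before training an XGBoost classifier of the kind described above; the note texts, feature columns, and hyperparameters are placeholder assumptions.

```python
# Combine a sparse TF-IDF DTM built from troubleshooting notes with structured features,
# then train a failure classifier. All data and column meanings below are placeholders.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

notes = ["power supply alarm cleared after reset",
         "VSWR fault on antenna port 2"]                     # engineer troubleshooting notes
X_struct = np.array([[0.7, 120.0], [0.9, 85.0]])             # e.g., load factor, uptime hours
y = np.array([0, 1])                                         # 1 = node failed within 24 h

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)   # unigrams and bigrams
X_text = vectorizer.fit_transform(notes)                     # sparse document-term matrix

X = hstack([csr_matrix(X_struct), X_text])                   # structured + text features
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X, y)
```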

Senior Data Scientist
Consulting for the following clients: AT&T, Citi-Bank Group (TX), and NCR (Atlanta, GA), 08/2017 – 05/2019

Built analytics solutions for AT&T: customer product cohort migration (journey) using predictive analytics models and revenue forecasting. Deployed models within a Customer 360 framework, including service-subscriber clustering for marketing campaign rollout. Used advanced nested SQL queries on a Teradata warehouse platform, plus Python and R, as the analytics tools for building these models, including definition of the global data integration layer as well as product life cycle support analytics from the pre-sales phase to customer churn classification at the production phase. Used CI/CD pipelines with microservices platforms such as Docker containers, Kubernetes, Kubeflow, and TensorFlow Enterprise, in combination with cloud platforms such as AWS SageMaker, Azure Databricks MLflow, and Google Cloud Platform (GCP) Vertex AI.

• Used Python as the main programming language.

• Reverse logistics optimization, predictive and prescriptive analytics: T-Mobile reverse logistics (device remorse cost and budget optimization program analytics project), predicting customer/subscriber device returns, then using a Mixed Integer Linear Programming (MILP) model to mitigate the associated financial loss and optimize the carrier's overall device remorse budget program.

• Utilized Bayesian methods to minimize the financial cost of phone device returns as the objective function, with transportation logistics and device insurance costs as constraints.

• Used R as the main programming language for building and training the model.

• Banking and financial services experience: developed an anomaly detection model for financial risk analytics at UBS bank, New York, for Anti-Money Laundering (AML) prediction.

• Citi-Bank Group, Irving, TX: conducted NLP on mortgage contract documents (PDF, DOCX, etc.). Ingested huge amounts of text from documents containing tables, text boxes, and similar elements to create a text corpus, then performed NLP analytics on top of it using Python NLTK, spaCy, Word2Vec, and Doc2Vec embedding techniques for words and document contents. Applied stemming and lemmatization, then fed the text into spaCy or transformer language models such as BERT to extract named entities and insights and to find correlations between entities within a context, uncovering deep insights inside contract documents. In other use cases, used TF-IDF to construct a document-term matrix (DTM) to predict outcomes as binary or multi-class classification problems. A minimal sketch of the entity-extraction step follows.
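
A minimal sketch of the entity-extraction step with spaCy, assuming the small general-purpose English pipeline; the sample sentence and the co-occurrence pairing are illustrative, not the production contract-analytics pipeline.

```python
# Extract named entities from contract-style text and pair entities that co-occur
# in the same sentence, as a first pass at correlating parties, amounts, and dates.
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English pipeline is installed
text = "The borrower, Jane Doe, agrees to repay Acme Bank $250,000 by June 2030."
doc = nlp(text)

# Named entities (people, organizations, money, dates) with their labels.
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Simple sentence-level co-occurrence pairs between entities.
pairs = [(a.text, b.text)
         for sent in doc.sents
         for i, a in enumerate(sent.ents)
         for b in list(sent.ents)[i + 1:]]
```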

• Created a document-term matrix (DTM) using text-mining vectorization techniques such as TF-IDF, with the help of the Python NLTK library, to search for similarity of specific business tokens within a large corpus of electronic mortgage loan contract documents.

• As part of my role in the NCR enterprise customer marketing campaign analytics program, used Alteryx to design the model flow and to run model iterations for testing and sharing different scenarios of marketing campaign outcomes.

Senior Data Scientist: Computer Vision, Transportation Logistics, FedEx, Memphis, TN, 01/2017 – 07/2017

• Collected IoT data from cameras and RFID devices to train the following AI neural network models.

• Artificial Intelligence (AI) project for missing-shipment image processing and text extraction from shipment label images, feeding an NLP model for further data mining. Computer vision techniques used included image edge detection, feature extraction, localization for image and object identification, and image matching, utilizing deep learning neural network algorithms such as Convolutional Neural Networks (CNN), RNNs, and Deep Belief Networks (DBN). The solution helped FedEx automate missing-shipment tracking and location, saving manual work (FTEs) and improving customer experience in such situations.

• Additionally, speech recognition: converted voice to text and performed text mining on top of it, with a CNN algorithm for sentence classification, to enrich the customer self-service experience via chatbots that answer questions and interact with customers in real time with knowledgeable answers on shipment tracking and location issues. Utilized tools such as Azure ML and IBM Watson.

• Major strategic initiative for a world-leading logistics services provider: a big data analytics POC for shipping logistics optimization, predicting shipment caging in advance and recommending prescriptive solutions to customers and agents to avoid it, minimizing shipment caging cost with a MIP optimization model. This helped maximize customer (telco carrier) satisfaction and improve customer experience.

• Used Python as the main programming language.

Senior Data Scientist / Statistician, HCL America: Aerospace and Airline (Boeing) and Telco (T-Mobile), Seattle, WA, 01/2012 – 12/2016

• Boeing (Seattle, WA): used aircraft Quick Access Recorder (QAR) system sensor data for time-series modeling with Cox regression (proportional hazards model) to predict aircraft device time to failure, enabling predictive maintenance prior to flight and minimizing "preventable flight interruption" events, a major aircraft manufacturer KPI for airline operators. A hedged sketch of such a survival model follows.
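
A hedged sketch of fitting a Cox proportional-hazards model with the lifelines library; the duration, event, and covariate columns are made-up QAR-style stand-ins, not Boeing data.

```python
# Cox proportional-hazards fit for device time-to-failure; data below is illustrative only.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "hours_in_service": [1200, 340, 2100, 800],    # observed duration
    "failed": [1, 0, 1, 0],                        # 1 = failure event, 0 = censored
    "avg_vibration": [0.8, 0.3, 1.1, 0.4],         # example sensor covariates
    "max_temp": [92, 70, 98, 75],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="hours_in_service", event_col="failed")
cph.print_summary()                                # hazard ratios per covariate

# Rank in-service devices by predicted relative risk to prioritize pre-flight maintenance.
risk = cph.predict_partial_hazard(df)
```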

• To build the above model, used aircraft IoT data together with structured, semi-structured, and aircraft technician maintenance text log data, applying NLP techniques such as a DTM of uni- and bigrams to build a one-hot-encoded matrix and predict the likelihood of aircraft device failure or maintenance needs, in order to fix issues and avoid flight interruptions. Another accomplishment was building an ontology of aircraft maintenance to support a knowledge graph that helps design and support engineers troubleshoot aircraft device failures and long-term maintenance.

• Created a document-term matrix (DTM) using text-mining vectorization techniques such as TF-IDF, with the help of the Python NLTK library.

• PySpark was used to build the ML model and to handle the big data volumes of aircraft Quick Access Recorder (QAR) traffic data hosted on a Hadoop cluster; streaming data was handled by Kafka topics configured on the data lake.

• Conducted Extract, Transform, and Load (ETL) using techniques such as Kafka KSQL (running interactive SQL queries to process QAR data streams on the Apache Kafka platform). A sketch of a comparable streaming ETL step follows.
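
A sketch of a comparable streaming ETL step with Spark Structured Streaming, reading a Kafka topic and landing Parquet on the data lake; the broker address, topic name, and paths are placeholders, and the Kafka source connector package must be available on the cluster.

```python
# Read streaming telemetry from a Kafka topic and write it to Parquet on the data lake.
# Broker, topic, and paths are placeholders; requires the spark-sql-kafka connector.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("qar_stream_etl").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "qar-telemetry")
       .load())

# Kafka delivers key/value as binary; cast the value payload to string for parsing.
parsed = raw.select(col("value").cast("string").alias("payload"), col("timestamp"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "/datalake/qar/parquet")
         .option("checkpointLocation", "/datalake/qar/_checkpoints")
         .start())
```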

• Used R as the main programming language for building and training the model.

• Airbus (USA): used big data sets, stored in Avro format and streamed from aircraft in batch mode, to build sensor-data forecasting models for monitoring specific aircraft components.

• Retail analytics (Home Depot and Dollar General): built hierarchical store demand forecasting models to streamline departmental and regional store inventory demand forecasting within the retailer supply chain. Also participated in building predictive ML models for cybersecurity attacks (such as Denial of Service, DoS) to proactively secure network vulnerabilities in the aftermath of the major Home Depot cyberattack in 2014.

• Marketing Mix Modeling (MMM): developed a multi-channel advertising campaign ML MMM for T-Mobile customer journey analytics using multivariate regression analysis and other regression techniques.

• Used the MMM together with Bayesian optimization techniques to evaluate and optimize the budget across the different marketing channels (TV, mail, and online); an illustrative allocation sketch follows.
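
An illustrative budget-allocation step on top of an MMM, maximizing total modeled response across TV, mail, and online subject to a fixed total budget with scipy; the diminishing-returns coefficients are assumed stand-ins for regression-estimated values, not actual model output.

```python
# Allocate a fixed marketing budget across channels to maximize modeled response.
# Coefficients and budget are illustrative placeholders, not estimated MMM output.
import numpy as np
from scipy.optimize import minimize

channels = ["tv", "mail", "online"]
beta = np.array([3.0, 1.5, 2.2])          # assumed diminishing-returns coefficients per channel
total_budget = 100.0

def neg_response(spend: np.ndarray) -> float:
    """Negative of total response; log1p gives simple diminishing returns per channel."""
    return -float(np.sum(beta * np.log1p(spend)))

constraints = [{"type": "eq", "fun": lambda s: np.sum(s) - total_budget}]
bounds = [(0.0, total_budget)] * len(channels)
x0 = np.full(len(channels), total_budget / len(channels))      # start from an even split

result = minimize(neg_response, x0, bounds=bounds, constraints=constraints)
allocation = dict(zip(channels, np.round(result.x, 1)))        # optimized spend per channel
```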

• Used Python as the main programming language.

Statistical Lead Expert: Evidence-Based Socio-Economic Policy Analytics; Studies and Research Section Head, General Secretariat, Executive Council, Abu Dhabi, UAE, 06/2007 – 12/2011

Led cross-functional teams from across the Abu Dhabi government, the main business-community players (MUBADALA, ADNOC, MASDAR, Chamber of Commerce, etc.), and the academic community, within a Triple Helix Model framework, to conduct major advanced analytics and statistical analysis projects on the innovation and intellectual-capital capabilities of Abu Dhabi's information- and knowledge-based economy. The work identified gaps and recommended macro- and microeconomic policy options for transforming Abu Dhabi's economy from natural-resource-based to knowledge-based, contributing to the Abu Dhabi 2030 Economic Vision. The vision document includes, among many other statistics, a Social Accounting Matrix (SAM) report analyzing the Emirate's national income accounting system disaggregated by main economic sectors (flows of all economic transactions). The project report was designed, implemented, and presented at two UN conferences in Geneva in 2009 and 2010 at the Science, Technology & Innovation (STI) annual conference. In addition to building the research capabilities of the General Secretariat (GSEC), led the government-wide Knowledge Management (KM) initiative and built the foundation of the Abu Dhabi government Communities of Practice (CoPs). Led the research team that developed and reviewed the Abu Dhabi Policy Agenda for the 2030 vision.

Sultan Qaboos University, Muscat, Sultanate of Oman
Assistant Professor of Economics, 08/2005 – 05/2007

Howard University, Collin Community College, Tarrant County College
Adjunct Faculty / Assistant Professor

Abu-Dhabi Executive Council, UAE
Head of Statistics and Strategic Research

Congressional Research Service, US Congress, Washington, D.C., 01/2001 – 07/2005
Data Scientist / Research Analyst / Statistician / Public Policy Research Analyst

Conducted research and statistical data modeling, using SAS and EViews for time-series forecasting, to support evidence-based policy at the Congressional Research Service, supporting Congressional staff in both chambers with economic and statistical analysis and brief economic and social research submissions on a diverse range of issues, from economics, social welfare, defense, and trade to technology, science, and industry.

EDUCATION

Ph.D., Economics & Applied Statistics, Howard University, Washington, D.C., 2000
Master of Arts (MA), Economics, Howard University, Washington, D.C., 1994
Microsoft Certified Systems Engineer (MCSE), 2001

TECHNICAL SKILLS

R, Python, scikit-learn, Pandas, NumPy, PySpark, IBM SPSS, SAS, TensorFlow, Keras, SAP HANA, big data technology (Hadoop), Apache Spark, Microsoft 365 / MS Office, Computer Vision, NLP, RPA and AI solutions, AWS SageMaker, Azure MLflow, and Google Cloud Platform Vertex AI for CI/CD deployment of AI/ML solutions in on-premises, public cloud, or hybrid multi-vendor environments; exploratory data analysis with Dataiku; Tableau and Power BI for building reports and dashboards.

Data Science / Experience Summary

1. Python programming

From my resume: 10 years of experience with Python as the programming language for all my ML model building, evaluation, and deployment (skilled).

• Highly skilled in using Python in the following use cases: anomaly detection for radio site bill-of-material quality checks, stock inventory planning portfolio prediction, store replenishment demand forecasting at Dollar General, and aircraft predictive maintenance at Boeing.

• Highly knowledgeable in Python programming concepts and practices such as data structures (lists, tuples, dictionaries, sets, etc.), data types (float, integer, binary, and string), indexing, slicing, and building user-defined functions (UDFs), methods, and classes.

• Highly skilled in building CI/CD (Continuous Integration and Continuous Deployment) pipelines to deploy AI/ML models on various cloud platforms, such as AWS SageMaker and Azure Databricks MLflow with Kubernetes.

• Highly skilled in writing Python code in major IDEs such as Jupyter Notebook, VS Code, PyCharm, etc.

• Highly skilled in building nested Python functions and methods to automate data ingestion, wrangling, feature transformation and engineering, model training and hyperparameter tuning, validation, and deployment using Docker containers, AWS/Azure/GCP container registries and Kubernetes on VPC instance endpoints, or conventional APIs such as Flask.

• Highly skilled in big data coding with PySpark to build the ML model pipelines above, along with Kafka and ZooKeeper where model data streaming is needed.

2. Cloud Platforms

From my resume, I have experience with the major cloud platforms, with AWS SageMaker being the most recent one, used at Ericsson:

• Hands-on experience using CI/CD microservices platforms such as Docker containers, Kubernetes, Kubeflow, and TensorFlow Enterprise, in combination with cloud platforms such as AWS SageMaker, Azure Databricks MLflow, and Google Cloud Platform (GCP) Vertex AI.

• My technical skills are highly concentrated on cloud technologies, as reflected throughout my resume:

• Played a lead role in AWS SageMaker deployment, leading the team building a CI/CD industrialization platform for deploying AI/ML solutions and supporting AI/ML democratization for citizen data scientists.

3. Data engineering – Big data:

• All of the above experience involves dealing with big data both on premises and in the cloud.

• Hands-on with Apache Spark, PySpark, Hadoop data lakes, Hive, HDInsight, Parquet files, AWS S3 buckets, Azure Blob Storage, Azure Data Warehouse, Data Lake, and Delta Lake.

Modelling: BIM/IFC (Industry Foundation Classes) as a platform for end-to-end service delivery AI/ML solutions.
Languages: English and Arabic, expert proficiency, spoken and written; Arabic is my native language.
Continuous Applied Learning: Robotic Process Automation (RPA), Reinforcement Learning models.
Hobbies: reading data science and AI articles; walking, hiking, camping, and outdoor activities; cooking and grilling.

Main Achievements:

• Boeing Pride Award for the Data Science Program, Commercial Business Unit, Boeing, 2016.

• Best Applied Data Science Boot Camp Mentor Award at Ericsson for two years in a row, 2021–2022.

• Microsoft Certified Systems Engineer (MCSE) certified.

• Abu-Dhabi Executive Council Excellence Award Program recipient, 2010.

• While at graduate school at Howard University: Kawasaki Fellowship winner, UN University Fellow, and Dean's List of the Graduate School of Arts & Sciences; graduated with a 3.85 GPA (Ph.D.).

• Graduated with a First-Class Honours bachelor's degree in Economics, University of Khartoum, 1988.

International Journal Publication List:

Two papers published in the Int'l Journal of Knowledge Management and Cultural Change:

• “The Effect of Knowledge and Technology Transfer on Long-Term Growth: Country Cross-Section Comparison between the Gulf Cooperation Council Countries and Selected Asian Countries, 1985–2005”

• “The Impact of Knowledge Creation and Sharing on Long-Term Growth: Country Cross-Section Comparison between OECD, Non-OECD and Arab Countries, 1990–2005”

Languages: English, full proficiency, spoken and written; Arabic, full proficiency, spoken and written.


