Data Scientist Ai Ml

Location:
Dallas, TX
Posted:
March 14, 2024

Resume:

KAR SATA, PhD

Principal Data Scientist

(Human Health, Clinical, Marketing Domains)

Ph: 512-***-**** Email: ad4cbg@r.postjobfree.com

(U.S. permanent resident, legally authorized to work)

An accomplished Data Scientist with several years of experience in technology applications across more than 8 domains, including Human Health, Clinical and Marketing, with a down-to-earth attitude, an unquenchable thirst for knowledge and an eagerness to help and lift others in the team. Recognized for the ability to recruit, retain, train and motivate diverse team members.

Summary

Data Scientist with 14+ years of experience in Data Science, Ai/ML, MLOps, Human Health & Marketing.

Managed Ai/ML Dev, ML-Engr/MLOps teams for end-to-end use case production automation.

Implemented Analytics projects in healthcare involving claims, drug utilization and e-prescriptions.

Developed targeted program solutions for real-time Rx claims adjudication & increased accuracy.

Experienced in Ai/ML Enablement in 8 domains, i.e. cross-functional application of techniques.

Executed enterprise-wide Ai/ML projects, starting from R&D, POC, Prototyping/System Design, System Test, UAT to Production automation based on Industry Standards.

Executed projects involving tools such as Python, R, Py-Spark, H2O.ai, DataRobot, Domino, AWS Sagemaker & S3, Google Cloud, Vertex Ai, Azure, IBM DSX, Spark-SQL, SPSS & SAS Enterprise, JMPPro, KNIME, Gurobi, ChatGPT, Generative Ai, Large Language Models, etc.

Knowledgeable in establishing scope for healthcare/clinical use cases utilizing Image/Vision Analytics (NLP, Deep Learning, OCR), augmented LLMs such as GPT-3/4/4V, BERT, LaMDA and PaLM, and Generative Ai models such as GANs for accurate classification.

Experienced in prompting LLMs, parsing, preprocessing & embedding text, and orchestrating healthcare solutions.

Experienced in Regression, Random Forests, XGBoost, Ensembles, Binary/Multiclass/Multilabel models, Association Rules, NLP (LDA), Time-Series, Neural Nets, Deep Learning methods.

Demonstrated proficiency in Data engineering by creating structured data in databases (from flat files & logs) including creation, management & maintenance of analytically rigorous data for Ai/ML.

Experienced in Supervised/Unsupervised methods, Data/Text Mining, Topic Modeling, Clustering, Decision Science, Statistical Analysis, SVM, PCA, Time-Series, Vision Analytics & Optimization.

Skilled in Time Series Analysis, Statistical Testing, Correlation, Response Surface/Gaussian Model Multivariate Analysis, Forecasting, Business Intelligence tools and application of Statistical Concepts.

Experienced in SQL Server Management Studio, SQL Server Analysis/Reporting Services, ETL (SQL & Informatica), SAS, SPSS, Business Objects, Teradata, Oracle 9i/10g, PL/SQL, etc.

Knowledgeable in Big Data, Hadoop, Kafka, Yarn, Pig, Hive, Map-Reduce, Cassandra, Spark, etc.

Professional Experience

Heritage Auctions, TX 7/2023 – Till Date

Principal Data Scientist (Lead Ai/ML Enablement Consultant)

Developed a statistical analysis practice & statistically robust hypothesis testing (cross-functionally similar to the healthcare domain), including A/B Testing, interval estimation and confidence interval tests for operations.

Led the Ai/ML team to implement Next-Gen Vision Analytics use cases (cross-functionally similar to clinical image analytics) using augmented deep learning models, OCR, GPT-4V & an in-house GPU training approach for the image prediction component, using the image prediction and OCR components embedded in OpenAI GPT chat.

Utilized LLM approaches such as GPT-4 (enterprise), DistilBERT, LangChain, LaMDA and PaLM with data augmentation, prompt engineering and fine-tuning for the most accurate classification tasks & user input testing.

Established an LLM-agnostic functional approach using NLP combined with fuzzy matching for classification tasks with quick response for high-value item and high billing prediction.

Established scope for Generative Ai models such as DALL-E/GANs for augmentation & classification in case of imbalanced data occurrences.

Established practice to maintain code repositories in GitHub, ensuring proper versioning of code.

Managed AI/ML Dev, ML-Engr & MLOps teams (hands-on programming + personnel management) that utilize advanced analytics techniques such as:

Linear/Logistic/Multinomial Regression, Multi-class and Multi-label Models, Decision Trees, Random Forests, SVM, Neural Nets/Deep Learning (using TensorFlow, Keras, PyTorch), Unsupervised/Supervised Machine Learning, PCA, Text Analytics, LDA/Guided LDA, NER, Topic Modeling, Classifier Chains, Ensemble Models, Time-Series & Anomaly/Outlier detection.

Created derived data elements using feature engineering (e.g. Chat topics and sentiment using LDA/BERT and LLMs, regular expressions, click streams, etc.) that could affect classification predictions & customer actions.

Verizon, TX 7/2016 – 7/2023

Principal Data Scientist (Ai/ML-Engr)

Managed three global teams to implement Ai/ML use cases from Proof of Concept to Production to support 143M+ retail wireless, 28M+ business wireless and 8M+ wireline consumers.

Executed a 14-month $3M+ cross-functional Ai/ML program for VBG and an 18-month $7M+ program for VCG.

Developed & deployed 273+ ML models at scale (Python/PySpark) using Ai/ML Dev + MLOps for Feature Engineering, Data pipelines (batch & real-time), Scheduled jobs, Dockerization & Model APIs, including Model & Feature Explainability & Model + Feature Observability (Drift, Bias, Skewness, etc.).

Managed AI/ML Dev, ML-Engr & MLOps teams (hands-on programming + personnel management) that utilize advanced analytics techniques such as:

ESPx, BTEQ/SQL, Python, Spark, Shell scripting, MLOps, Cluster Management, HDFS, Hive, Teradata, Flat-files, GCP, AWS, etc. and ML methods such as Supervised & Unsupervised Learning, Decision Trees, Random Forests, SVM, Deep Learning, Neural Nets, LDA topic classification, etc.

Developed and deployed data pipelines for datasets that exist in various sources (e.g. Oracle, Teradata, Flat files, Hive/Hadoop etc.), built sophisticated ML models and deployed them using ML-Eng/MLOps practice for batch + real-time prediction infrastructure for each business use case utilizing tools/platforms such as: Python (VS Code, Jupyter Notebooks, Spyder & Anaconda), H2O.ai, DataRobot, R, R Studio, R Shiny, ESPx, BTEQ/SQL, Spark, Shell scripting, MLOps, Cluster Management, HDFS, Hive, Teradata, Flat-files, GCP, AWS, Apache Spark, Spark-SQL, Py-Spark, Kafka (real-time), Oracle/Teradata/MS SQL Server, AWS Sagemaker & Vertex Ai, Google Cloud, IBM Data Science, Domino, Databricks, KNIME, Gurobi, Vespa, Azure, etc.

Led teams of Data Scientists, ML Engineers and Analysts to implement advanced ETL Processes, Data Pipelines, API Integration/Utilities for use-case production, automation and scale-up.

Created derived data elements using feature engineering (e.g. Chat topics/sentiment, regular expressions, click streams, etc.) that could affect predictions for customer actions.

Established scope for LLMs such as GPT-3, BERT, LaMDA, PaLM for Natural Language Processing performance improvement and Generative Ai models such as GANs for augmentation & classification.

Implemented effective & customer centric interventions. Conducted A/B Tests and Hypothesis Testing.

Automated metrics reports for Ranked Variable Importance, Odds ratios, Gini Index, Precision, Recall, PRC Curves, Lift, Lift-Drift, Gain, Deciles, Support/Confidence, F1/Macro-F1 scores etc.

Translated the Python & R Code to Py-Spark/Scala code for distributed model training/production.

Led the Hadoop, Kafka, Hive/Spark team to implement the model using libraries (e.g. MLlib) in Production.

Trained teams to evaluate predictive model output from the Hadoop environment & validate the results.

TekWissen, MI 6/2013 – 7/2016

Sr. Data Scientist (Enterprise Analytics Strategy Lead)

Client: CNSI Inc., MI (6/2015 – 7/2016)

Contributed as Data Scientist (Analytics Strategy Lead) for Phase I & Phase II. Developed and implemented use-cases from concept to production. Managed two analytics teams across the globe.

Developed targeted program solutions for Real-time pharmacy claims adjudication & increased accuracy.

Implemented Strategy for Medicaid Managed Care and Medicaid Claims data to develop relevant use cases involving AI/ML (predictive/statistical modeling scenarios, e.g. “Opioid” dependency prediction).

Established enterprise analytics strategy by liaising with Data teams, business users and stakeholders.

Interpreted healthcare policy changes and determined their relevance to Healthcare Data Analytics.

Served as an integrator between business operations needs of clients and analytics + data engineering team.

Evaluated options for understanding the “Clinical data elements” to bring about a 360 degree view of the patient’s health information leading to better insights in the overall population health management.

Brainstormed new ideas, out-of-the-box solutions & executed them within a team environment.

Presented technical analysis to internal and external clients & stakeholders including C-level executives.

Client: DrillingInfo, TX (11/2014 – 6/2015)

Implemented data pipelines for data and introduced factors that drive efficiency for highest ROI.

Developed user-friendly, modular statistical models using R & performed model comparisons.

Client: USG (United States Gypsum) (7/2014 – 11/2014)

Led the Advanced Analytics projects and implemented use-cases using proven analytics methods.

Client: Xerox - Texas Medicaid & Healthcare Partners (TMHP), TX (11/2013 – 7/2014)

Implemented Analytics projects in healthcare involving claims, drug utilization and e-prescriptions.

Supported client teams in Medicaid claims processing/payment & detecting fraud using analytics.

Client: Advanced Micro Devices Inc., TX (6/2013 – 11/2013)

Executed technology enablement & implementation for sales (e.g. JMP-Pro & R-analytics).

Supported sales forecasting & planning team by improving time series & principal component analysis.

Validated Macro-Economic data (e.g. Euromonitor, Moody’s etc.) using key market indicators.

Center of Advanced Analytics & BI (United Supermarket Project), TX 1/2013 – 6/2013

Sr. Data Scientist (Consultant)

Implemented software applications such as SAS Enterprise Miner/Guide and R/Excel/JMP-Pro.

Deployed novel analytics methods and identified variables that govern supermarket product sales.

Credent Technologies, TX 6/2012 – 1/2013

Sr. Data/Business Intelligence Consultant

Managed the first phase of the ETL (Extraction, Transformation, Loading) process using the Informatica software tool for complex materials-industry data, performed dimensional modeling and validated sales prediction.

The Institute of Env. & Human Health, TX 1/2007 – 6/2012

Project Manager (Sr. Scientist - Data Science/Data Analysis/Toxicology & Statistics)

Procured Phase I & II projects for clients by providing Data/Statistical Analysis Expertise.

Conducted Robust Statistical Analysis using SPSS, JMP-Pro, R and STATISTICA packages.

Secured $90K, $11K and $6 million total (collaborative effort) contract for new clients to expand.

Managed technology transfer and marketing operations for new technologies (e.g. toxicology and remedy).

Delivered technical product presentations and communicated client needs and our capabilities.

Publications & Technical Articles

Author of 3 Peer-Reviewed Journal Papers (1 in progress), 2 Book Chapters, 7 Conference Proceedings, 5 Technical Articles, 15 Technical Presentations and 7 Poster Presentations at learned societies.

Education

PhD degree with a focus on Data Analytics, Human Health & Statistical Projects (Texas Tech Uni, USA)

Master of Science degree in Engineering (Texas A&M Uni-Kingsville, USA)

Bachelor of Science degree in Engineering (Amravati University, India)

Management Information Systems courses with focus on Data Sci (Texas Tech Uni, USA) - (1 Semester)

Bachelor of Science in Computer Information Systems - Educational Equivalency (Pace University, USA)

Honors/Awards

Spotlight Award (Verizon – 2023): For demonstrating Verizon’s Core Values and going far above and beyond.

Judge: Nominated and served as a Judge for an AI/ML and Data Science Hackathon in 2019 at Verizon

Recognition: Featured on TTU News (2013) as an alumnus for several successful collaborative industry projects.

Award Nomination: YPGL’s 20 under 40 award 2010: Selected among 40 honorees (under age 40) in Lubbock, TX, for leading my employing organizations to success and community involvement.

Professional Certificates

Udacity NanoDegree: Ai/ML Data Analytics (Corporate Program)

Time Series Modeling Certification (DataRobot)

Data Science Essentials Certification (DataRobot)

EIT (Engineer in Training) – Texas Board of Professional Engineers

Advanced Hadoop Based Machine Learning - Austin ACM SIGKDD


