Subhasis Datta
Chief Data Scientist | Machine Learning/AI Expert | Technical Program Manager | Big Data Analytics at Scale | Thought Leader
301-***-**** *******@*****.*** North Potomac, MD 20878 linkedin.com/in/subhasisdatta twitter.com/DataPredictive
Work Status: US Citizen; Held DoD, DHS, Multiple Federal Full background Clearances
Experience highlights and key accomplishments
20+ Years’ Individual Contributor and Team Lead experience as Data Scientist, Engineer, PM, Machine Learning/AI Expert in successful design, modeling, validation, deployment, performance analytics of over five dozen ML/AI, Big Data Analytics programs
Principal Innovation Scientist and R&D Lead in seven AI/ML R&D Innovation programs leading to development of new analytical algorithms, testing ideas and proof-of-concept, and product development leading to new services, products, program offerings
Created, led, mentored a Data Science, AI/ML Center of Excellence leading over 20 Data Scientists and Engineers and a $5-10M line budget for successful delivery on $200M Big Data ML/AI programs and infrastructure, collaborating with cross-functional teams
Transformed, as Lead, one of the world’s largest ‘Big Data’ fraud detection systems while increasing revenue protection by 2,500% from $1B Year 1 to $25B+ by Year 4. Collaborated with executive leadership, cross-functional teams in building a new ML/AI fraud prevention program, infrastructure to handle billions of transactions, scoring 200M returns real-time at over 90% accuracy
Successfully conducted 1,000X Performance & Scalability Modeling & Analytics of Cloud based Infrastructure and Applications; Managed hundreds of millions in Cloud Infrastructure Budget, Operations, Schedule & Performance Metrics, Fraud Analytics
Saved $10’sM in infrastructure costs, improved operations by 10% through Performance Analytics for largest Federal IT Infrastructure Outsourcing contract worth $1.5B. Led teams for Data Center, Network, Server and Desktop Operations, Cybersecurity surveillance operations, Finance and Budgeting, Enterprise Incident, Infrastructure Performance Analytics
Designed analytical and pattern analysis algorithms for one of the largest Insider Threat Unauthorized Access Detection programs, mining billions of audit trail transactions for abusive patterns as mandated by Congressional law. Built engineering team and scaled algorithms and infrastructure as Lead for delivery within 9 months. Program detected $100’sM in abuse, misdemeanor cases.
Designed, Built, Deployed, as AI/ML Lead, $70M Air Force (AF) Predictive Readiness System employing AI, Fuzzy Logic to predict real time AF Readiness of Forces analyzing billions of metrics and geospatial data utilizing Modeling & Simulation, Visualization.
Transformed, Improved infrastructure processing performance of 25M taxpayers by 25% for IRS $2B ‘System of Systems’ IT Operations. As Data Scientist, Chief Architect, designed Enterprise Architecture, built Simulation Model of IRS Tax Filing Systems processing $1.72T/year for 200M taxpayers. Program improved Critical Success Factors, Performance Metrics by 25%.
Won $750M Funded contracts, $25B+ IDIQ contracts as Primary Technical Author; As PM briefed C-Level Executives for 20 years
Machine/Deep Learning/AI | Project/Program Lifecycle Management | Predictive & Data Modeling/Analytics
C-Level Relationship Management | Cross-Functional Team Leadership | Product Management including COTS
IT Infrastructure Performance Assessment | Innovation Management | Enterprise Architecture & Design
Open Source Computing & Platforms | Cloud Computing & Analytics | Big Data, Hadoop/Spark ML/AI/Analytics
Education
Professional Education Certification & Micro Masters, MIT, Probability & Data Science, Big Data Analytics, Machine, Deep Learning
MS Systems Engineering, Johns Hopkins University (Expected 2020)
PhD (ABD Status) Management Science and Information Systems, University of Maryland
MS Management Science and Information Systems, University of Maryland
MBA Operations Research and MIS, Southern Illinois University
BS Chemical Engineering, Jadavpur University
Certification Courses: Over 2 dozen Data Science, Open Source, Big Data, Machine/Deep Learning, AI, Cloud AI/ML on-going
Career Summary
Simple Technology Solutions LLC, Chief Data Scientist, Principal, Machine Learning and Artificial Intelligence (ML/AI) 2019 – Present
As Chief Data Scientist and Principal for ML/AI, responsible for leading the STS enterprise-wide practice in Data Science, Machine Learning, and AI across multiple Agencies including DHS, leading a team of Data Engineers, Data Scientists, Cloud Architects and Engineers in designing, modelling, evaluating, testing and deploying scalable ML/AI, Natural Language Processing (NLP) models
Design and build enterprise-wide Data Modeling for Data Lake, Data Science, ML and AI, NLP, Text Analytics models using Big Data Analytics platforms and tools such as Spark Ecosystem using PySpark, Python Libraries, IBM SPSS Modeler, Netezza Enterprise Data Mart/Lake/Warehouse for risk-based scoring for threat analytics, target identification, performance engineering for TSA
Support Innovation Lab Predictive Modeling for USCIS by envisioning, designing and building innovative solutions for research and technology applications within DHS using latest ML/AI tools and techniques in Text, Speech, Vision Analytics, Cloud Engineering
Analytica LLC, Chief Data Scientist, Director of Machine Learning and Artificial Intelligence 2018 – 2019
Chief Data Scientist, Director of ML/AI responsible for identifying financial fraud and malfeasance in working with a Federal Financial Oversight Enforcement agency. Built end-to-end prototype for DHS Enforcement and Benefits fraud, Person-to-Person matching. Advised HHS/CMS on Program Safeguard Fraud and Abuse; SME to IRS in Cloud based Advanced Analytics, ML/AI applications.
Designed End-to-End Architecture and Prototype for DHS Person-to-Person Matching, Enforcement and Benefits fraud; employing Cloud based Enterprise data ingest, SAS based Analytics, Big Data Infrastructure, Tableau and Visual Analytics to identify abuse.
Building ML/AI models for Outlier detection and Performance Analysis with no labelled data for financial fraud/abuse detection
SES Corporation, Technical Program Manager and Chief Data Scientist 2017 – 2018
Technical Program Manager and Chief Data Scientist for the largest Federal Infrastructure development, Integration and Analytics Program for the 2020 Decennial Census, US Census Bureau. Responsible for hundreds of Infrastructure Engineers (Cloud, On-Prem, Networking), DevOps and Security Engineers building thousands of Servers for capturing, managing and analysing the largest data capture, management and dissemination of 2020 Census. Managed Program Schedule, Performance, Budget and Analytics, Metrics.
Advised, designed Cloud based Enterprise Fraud Analytics with PostgreSQL, SAS, Python/R Analytics on Enterprise Data Lake
Led Infrastructure, DevOps, Security Engineering team with deploying, integrating thousands of cloud servers, program schedule, performance management, metrics analytics on hundreds of millions in Infrastructure operations, hundreds of component ATOs
ICF International, Technical Program Manager, Chief Data Scientist, Technical Director, Innovation R&D Lead 2013 – 2017
ICF PM/Lead responsible for program planning, management, control, delivery and success of multiple federal agency Analytics and Infrastructure programs including HUD, USPS, DoS, others using ML/AI and Advanced Analysis with Big Data where applicable.
Transformed, as PM/Lead, one of the largest federal IT Infrastructure outsourcing programs, valued at over $1.5B. Managed cross-functional teams with Prime contractors and Agency responsible for Data Center, Mainframes, mid-tier Servers, Network Operations, Desktop and Mobile computing, assessing their Performance Metrics as guided by their SLAs. Monitored performance using real time metrics assessment of program, infrastructure, cost data captured throughout the infrastructure
Saved over 10% of program costs valued at $10’sM employing full lifecycle end to end program management metrics analysis
Saved over $100M through streamlining operations, architectural reviews identifying redundancies, gaining efficiencies in Data Center Mainframe, Server operations transitioning from on premises to Cloud infrastructure, WAN network upgrades, mobile and desktop device consolidation, migration of applications to the Cloud. Received 10 out of 10 Satisfaction ratings all 4 years.
Conducted automated performance analytics of 100+ metrics for 12 SLAs and Survey Analytics leading to 10% improvement
Automated and improved Incident Management of millions of incident records analytics for root cause analysis, infrastructure downtimes, bottlenecks using data and text mining to identify patterns to improve operations by 15%
PI for Innovation Award on Open Data initiative for Federal agency financial analysis using Cloud based BI and Dashboard
Received several Innovation Excellence Awards, Program Performance and Business Development, Proposal Writing Awards.
Data Science and Social Good Volunteer Experience (2013 – Present):
Member, NIST Public Working Group on Defining Big Data, Analytics and IoT Standards
TPM and Social Good Data Scientist deploying Machine Learning, AI, Text Analytics/Mining, UN Standby Task Force
Senior Data Scientist, Program Manager for Data Analytics, ML/AI in social good programs (DataKind, Data Without Borders)
Senior Social Good Data Scientist, TPM for humanitarian, underprivileged mission support, Statistics Without Borders
The MITRE Corporation, Chief Data Scientist/Architect, R&D Lead & Systems Engineer, Technical Program Manager 2006 – 2013
Chief Data Scientist/Architect, Lead for a Portfolio of dozens of Analytical, Machine Learning (ML) and AI, Infrastructure and Systems Engineering, Enterprise Architecture and Design, BPR programs. In this TPM, Chief Data Scientist and Machine Learning Scientist role, transformed dozens of Fraud Detection, Compliance Improvement, Modeling and Simulation and Analytics programs along with their associated ML/AI Infrastructure, working in collaboration with diverse cross-functional teams and C-level Executives for 7 years.
Transformed, as PM, one of the world’s largest ‘Big Data’ fraud detection systems while increasing revenue protection by 2,500% from $1B Year 1 to $25B+ by Year 4. Collaborated with executive leadership, cross-functional teams in building a new ML/AI fraud prevention program, infrastructure to handle billions of transactions, scoring 200M returns real-time at over 90% accuracy
Transformed, Improved infrastructure processing performance of 25M taxpayers by 25% for IRS $2B ‘System of Systems’ IT Operations. As Chief Scientist/Architect, designed Enterprise Architecture, built Simulation Model of IRS Filing Season Systems that processes $1.72T per year for 200M taxpayers. Program improved Critical Success Factors, Performance Metrics by 25%.
Improved $2B IRS IT Infrastructure Operations by reducing downtime by 10% through deep understanding of the Top 10 issues among the 10% of Incidents causing 90% of downtime, outages. As TPM and Data Scientist, performed Data Analytics and Text Mining on millions of Incident Management records to analyze Help Desk Incidents, Patterns, Root Causes, Solutions to reduce risk.
Reduced risk in 50 mission critical systems (multi-billion budget) by 20% by designing a comprehensive risk and readiness assessment framework with multi-dimensional factors and criteria to evaluate operational readiness and address key risks
Improved operational performance of multi-billion-dollar systems by 25% by developing end-to-end IRS 'System of Systems' architectural diagrams and a dynamic Simulation Model of what-if scenarios for critical C-level Executive decision making
As PI and R&D Lead won six Innovation proposals worth $1.5M over 10 years to build new capabilities, services and products
Received MITRE’s highest recognition ‘Directors Awards’ twice and MITRE Corporate Knowledge Management Award for excellence in program and IT management.
Alion Science & Technology, Asst. VP, Director for Center for Data Science and Systems Engineering, Sr. Data Scientist 1996 – 2006
Director of Data Science and Technical Program Manager, ML/AI Center of Excellence leading over 20 Data Scientists and Engineers and a $5-10M line budget, leading to successful delivery of $200M ML/AI programs and infrastructure. Collaborated with cross-functional teams to successfully impact and deliver on dozens of Analytical, Fraud Detection, Systems Engineering, ML/AI programs worth millions of dollars. As Director, TPM, acquired four product companies and startups to integrate their products into a portfolio of multi-million-dollar programs and products sold. Created Fraud/Compliance Revenue Maximization ML/Analytics algorithms.
Designed analytical and pattern analysis algorithms for one of the largest Insider Threat Unauthorized Access Detection programs, mining billions of audit trail transactions for abusive patterns as mandated by Congressional law. Built engineering team and scaled algorithms and infrastructure as TPM for delivery within 9 months. Program detected $100’sM in abuse, misdemeanor cases.
Designed, Built, Deployed as Lead, $70M Air Force (AF) Predictive Readiness System employing AI, Fuzzy Logic to predict real time AF Readiness of Forces analyzing billions of metrics and geospatial data utilizing Modeling & Simulation, Visualization
As Chief Architect, Lead, designed DHS Incident Command and Director National Intelligence Information sharing Architecture
Transformed Federal Crop Insurance Program by developing a new Fraud Detection program that detected and protected against $100’sM in new fraud, and trained analysts and investigators in Data Mining and Analytical techniques. As Individual contributor and Lead of cross-functional teams, designed over 25 analytical, ML, pattern recognition fraud detection algorithms to identify fraud.
Chief Data Scientist, PM and Lead for Analyzing and Testing Medicare / Medicaid data for fraud, waste and abuse for the Health Care Financing Administration (HCFA) Statistical Analysis Center (SAC) at IT Infrastructure and Testing Lab
Won $1M+ R&D funds as PI/Data Scientist to Innovate on Analytical/ML algorithms for proof-of-concept tax revenue maximization product in collaboration with Oracle, Teradata to build Data Warehouse, 20 non-compliance detection algorithms.
Received ‘Presidents Award’ from IRS Commissioner and more than 12 excellence awards from Alion Science and Technology for excellence in program, mission, and project performance.
Additional positions: University of Maryland, Chief Data Scientist, Director of Research, Institutional Studies, Information Systems; Adjunct Faculty, Robert H. Smith School of Business
TECHNICAL COMPETENCIES, TRAINING, TECHNOLOGIES, CERTIFICATIONS
Certifications & Data Science, AI/ML Training: Deep and Machine Learning, AI (on-going), ITIL V3, ITIL Foundation for IT Infrastructure and Service Management, Certified Knowledge Management Professional, KMPRO, Data Scientist, Data Science Central, Big Data, Data Science Certification, Open Source Languages such as Python and R for Data Science and ML/AI, Big Data and Analytics, Graph Database and Analytics; Multiple University Certification Courses in Big Data Ecosystem, Cloud analytics, Machine Learning, AI, IOT.
Data Science Skills and Competencies: Advanced Data Mining including Neural Nets, Decision Trees, SVM, PCA, Clustering, Machine Learning, AI, Statistics, Big Data on Hadoop and Spark, Operations Research, Modelling & Simulation, Optimization, Fuzzy Logic; Data Science including Machine Learning, Artificial Intelligence, Agent Based Modelling, Data and Query Optimization; Data Management and Modelling, Extraction, Transformation, Loading of large scale warehouse and Big Data Mining systems, Business Intelligence, Map Reduce, Relational, Object, Columnar, In-memory and Mainframe databases, Advanced Analytics algorithms, Performance Modeling & Analysis, Analytics in the Cloud; IoT Sensors with Big Data, Cloud Computing.
Software, Tools & Languages: R, Python (Pandas, NumPy, PySpark, Scikit-Learn, Matplotlib, TensorFlow), SQL, Java, Hadoop Ecosystem with HDFS, Spark, MapReduce, MATLAB Programming, Netezza, PostgreSQL, Oracle DBMS, Database Administration, ETL (SQL Loader), Warehouse Builder, PL/SQL Stored Procedures, Query Optimization, Oracle Spatial/GIS, Oracle Application Express and MapViewer, Oracle Express Server, Oracle Discoverer, Hyperion Business Intelligence, Business Objects Reporting, Sybase, SQL Server DBMS and SQL, SAS Statistical Procedures, SAS Data Mining, IBM SPSS Modeler, SPSS Statistics, Java Programming and J2EE Architecture, MS Office, MS Project, SEI CMM/CMMI Assessments, IDEF0 Workflow Modeler, IDEF 1X Data Modeler, PVCS Tracker Test Tools, Rational System Architect, Proforma ProVision, NCR Teradata, Informix, MS Access, DataCom DB, DB2, ERWIN, Meta Design/IDEF, Oracle Designer, IEF, Telelogic (QSS) DOORS, Rational Requisite Pro, CORE, Rational Test Manager, ClearQuest, Mercury WinRunner, LoadRunner and TestDirector, ModSim, Arena, iGrafx, AnyLogic, and Octave, BPR Process, System Engineering tools.
Data Science Coursework, Certifications coursework completed or in progress:
Massachusetts Institute of Technology - Professional Education & Certification Series, Artificial Intelligence, Data Science, Big Data, Deep Learning, IoT, Analytics courses (2015-2020 on-going): Micro Masters on Statistics and Data Science, Professional Education and Certification courses: Deep Learning Series (Intro, CNN, LSTM, RNN, GAN) Internet of Things: Roadmap to a Connected World; Big Data and Social Analytics; Tackling the Challenges of Big Data and Analytics; Computational Thinking & Data Science, Social Physics
Stanford University – Deep Learning Certification (on-going), Big Data and Hands on Machine Learning Course: Machine Learning
Cloud Computing, Machine Learning, Data Engineering and Analytics in Cloud Certification and learning series of courses (on-going):
Cloud Computing, Google Cloud Platform, AWS, Azure ML, Cloud Architecting (Google, Amazon, Microsoft, Cloud Academy)
University of California, San Diego - Big Data Specialization - Unlock Value in Massive Datasets 6 courses (on-going)
Introduction to Big Data; Hadoop Application Framework; Big Data Analytics; Machine Learning and Graph Analytics with Big Data
The Johns Hopkins University - Data Science Specialization & Executive Data Science courses (work in progress on 15-course series):
The Data Scientist's Toolbox; R Programming; Getting and Cleaning Data; Exploratory Data Analysis; A Crash Course in Data Science; Building a Data Science Team; Managing Data Analysis and rest of the series for each area of specialization
Additional Data Science Courses, AI/ML courses: Additional several dozen Certification and Professional Education courses
Publications: A dozen published journal and conference papers; Professional organization papers such as with ACT-IAC
Memberships: IEEE, ACT-IAC, ACM, SIGKDD, INCOSE, PMI, MIT Professional, NIST Working group, UNHCR, several Data Science orgs