
Data Scientist Machine Learning

Location:
Chantilly, VA
Posted:
June 26, 2025


Resume:

Hung Pham

***** ******** ******, ********* ** ***52

703-***-****

*********@*****.***

OBJECTIVE:

• A Data Scientist role in the software development space

PROFESSIONAL SUMMARY:

• Over 10 years of Data Scientist/Machine Learning and Python experience

• 3.5 years as Solutions Architect and member of the Cloudera Government Solutions Professional Services team with a focus on Data Analytics

CLEARANCE: TS/SCI

EDUCATION:

• Master of Business Administration - Georgetown University

• Bachelor of Science in Mechanical Engineering - Virginia Tech

ONLINE COURSES:

• Udacity Self-Driving Car Engineer Nanodegree

• Udacity Deep Learning Nanodegree

• Machine Learning by Andrew Ng – Coursera

• The Data Scientist’s Toolbox by Jeff Leek, Roger Peng and Brian Caffo – Johns Hopkins University

• Mining Massive Datasets by Jure Leskovec – Coursera

TECHNICAL SKILLS:

GENERATIVE AI

Generative Artificial Intelligence using LangChain to leverage Large Language Models (LLMs) in maximizing operational efficiency and productivity. Built Sequential and Router chains using LangChain, Similarity Search using FAISS, Retrieval-Augmented Generation (RAG) using Ollama and multimodal (image, text and audio) search using ImageBind and Cosine Similarity. Built local AI agent workflows using n8n, Supabase / Qdrant, Docker, Open WebUI, Streamlit and Bolt. Built teams of AI agents using Swarm/Autogen. Built a teachable agent using Autogen. Built a hierarchical team of agents using LangGraph. Built a multi-agent software team that writes Python code using LangGraph.
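The cosine-similarity search mentioned above can be sketched in a few lines. This is an illustrative toy, not the actual ImageBind/FAISS pipeline; the embedding values are hypothetical stand-ins.

```python
import numpy as np

def cosine_top_k(query, corpus, k=2):
    """Return indices of the k corpus vectors most similar to query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per corpus row
    return np.argsort(scores)[::-1][:k] # highest-scoring rows first

# Toy 4-dimensional "embeddings" (hypothetical values)
corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(cosine_top_k(query, corpus))  # nearest rows first
```

In practice a library like FAISS replaces the brute-force matrix product with an approximate index, but the scoring is the same idea.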

NATURAL LANGUAGE PROCESSING

Natural Language Processing/Text processing - Topic Modelling using scikit-learn’s NMF, LDA and TF-IDF. Context/semantic similarity via embeddings using Gensim Word2Vec and Doc2Vec. Researched content similarity using sequence-to-sequence Recurrent Neural Networks (RNN) and BERT transformers.

IMAGE DETECTION AND CLASSIFICATION

Image classification and object detection with Faster-RCNN, Mask-RCNN and YOLO-V3, using Pytorch, Torchvision, JupyterLab on cloud platform (AWS SageMaker) and pre-trained models (ResNet, Densenet, Inception, etc).
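A core primitive behind evaluating detectors like Faster-RCNN and YOLO is intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch (box coordinates hypothetical):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# Identical boxes score 1.0; disjoint boxes score 0.0.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Detection frameworks apply this per box pair to match predictions against labels (e.g. the usual 0.5 IoU threshold for a true positive).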

DEEP LEARNING / REINFORCEMENT LEARNING

Created a schema for hyperparameters using Hydra and integrated it with a Kubeflow pipeline consisting of preprocessing, training and testing of data on Torchvision models in Docker images registered in AWS ECR. Created visualizations of embeddings using t-SNE, UMAP, PCA and GradCam in Jupyter notebooks. Enhanced a reinforcement learning framework by adding a semantic-triple knowledge base/graph as storage for new/known information on the relationship between the Model (of environment behavior) and long-term Value (for Policy planning).
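The embedding-visualization step (projecting high-dimensional embeddings to 2-D before plotting) can be sketched with a plain-NumPy PCA; t-SNE and UMAP are analogous but nonlinear. The random matrix below is a hypothetical stand-in for learned embeddings.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                             # (n_samples, 2) coordinates

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))   # stand-in for 16-dim learned embeddings
coords = pca_2d(emb)
print(coords.shape)
```

The resulting 2-D coordinates are what would be handed to a scatter plot in a notebook; component 0 carries the most variance by construction.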

DATA ANALYTICS

Quantitative (numeric) and qualitative (categorical, label/name) data analytics using Spark MLlib algorithms, Scikit-learn, Impala, Hive, R, Oracle, MySQL and SQL Server 2008 R2. Analysis normally follows acquisition (Flume, Sqoop, Kafka, Oracle Direct Load and export/import, mysqldump, SQL, PLSQL, etc.), manipulation/transformation/enrichment (using Python, R, CSVkit, Perl, shell script, C#, PLSQL, T-SQL, SparkSQL, Hive, Impala, etc.), analytics (supervised/unsupervised machine learning, NLP, topic modeling using Spark MLlib and Scikit-learn), storage (HDFS, HBase, Oracle, MySQL, SQL Server 2008 R2, R, etc.), and retrieval (HDFS CLI, Sqoop, Hive, Impala, SQL, PLSQL, T-SQL, Oracle export, mysqldump, etc.).

HADOOP

CDH3, CDH4 and CDH5 ecosystems (Hive, Pig, Hue, Oozie, Sqoop, Impala, Flume, HBase, Accumulo, Zookeeper), Spark and Cloudera Manager. Configured Kerberos authentication using SPNEGO. Configured MapReduce ACLs on JobTracker Web Consoles. Installed CDH3, CDH4 and Cloudera Manager using Puppet. ETL into HDFS using Spark Streaming, Flume, Sqoop and Oozie.

DATABASE ARCHITECT

3-Tier Web-based application architecture, Data modeling/Schema designing for OLTP and DSS applications, Designing and Implementing high-availability and/or disaster recovery database systems using Oracle RAC and/or DataGuard solutions.

DATABASE DEVELOPMENT

Data Modeling/Designing (ERWIN) according to business specs, Database Design/Creation, Upgrades, Database Connectivity (Net8, Oracle Names Server), PL/SQL Stored procedures/packages, Data Migration and/or Import (Export/Import, SQL Loader, PL/SQL procs, Java programs), Configuration Management software release automation tools such as Ant and CVS, Performance Analysis/Tuning using Method R.

DATAMART/DATAWAREHOUSE

Logical design using star schema and/or 3NF. Direct path data loading using CTAS and external table. Partitioning to improve data loading/management and query performance. ETL tools using PL/SQL procedures/packages/triggers and/or Java.
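The CTAS (create-table-as-select) pattern mentioned above can be illustrated with SQLite via Python's standard library. This is only a sketch of the pattern: Oracle's direct-path loading, external tables and partitioning have no SQLite equivalent, and the table and data below are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle here; the CTAS statement is similar in spirit.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 100.0), ('west', 250.0), ('east', 50.0);

    -- CTAS: materialize an aggregate as a new summary table in one statement
    CREATE TABLE sales_summary AS
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region;
""")
rows = con.execute("SELECT region, total FROM sales_summary ORDER BY region").fetchall()
print(rows)
```

In a warehouse, the same statement shape builds summary/fact tables directly from staged data instead of row-by-row inserts.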

DATABASE APPLICATIONS DEVELOPMENT

Oracle Forms 3.0, MS Access, Oracle Applications Server, Weblogic, WebDB, Oracle 9i Applications, PL/SQL Stored Procedures, Enterprise Java Beans (EJB), J2EE, JSP

PROGRAMMING

C, C#, Java, Scala, Korn Shell Scripting, Perl, Python, PL/SQL, T-SQL, XML, XQuery, HTML, CVS, JavaScript, Google Apps Script

WEB-BASED APPS ADMINISTRATION

Siteminder Installation, BEA Weblogic and Webmethods Installation & Configuration, LDAP Installation & Administration

DATABASE ADMINISTRATION

Oracle 10g (Red Hat Linux, Sun Solaris/OpenVMS), Installation/Patch Application, SAN Management (HP XP10000), Data Migration and/or Import (Export/Import, SQL Loader, PL/SQL procs, Java programs), Oracle Parallel Server / RAC 10gR2 (ASM, OCFS2, session failover; Weblogic/OCI), Context Server/Intermedia Configuration (text search engine), Materialized views, High Availability/Disaster Recovery (Oracle RAC 10gR2 with ASM and OCFS2, Hot Standby Database using Data Guard), Disk Space Planning and Management, Backup/Recovery (Korn shell scripts, RMAN), DB instance Monitoring (Korn shell scripts, Oracle Enterprise Manager Grid Control, DBA Studio), Performance Analysis/Tuning using Method R

UNIX SYSTEM ADMINISTRATION

Sun Solaris 2.8, Red Hat Linux 9.0, Solaris 2.8 Installation, Veritas Volume Manager Installation and Configuration, Creating/Mounting file systems using Disk Suite and Veritas Volume Manager, Configuring IP address and connectivity, Users Administration, Configuring firewall (Netscreen) and load-balancing (BigIP, Resonate)

PROFESSIONAL EXPERIENCE:

Booz Allen 12/24 – Present

Data Scientist

• Developing a data pre-processing pipeline that performs de-duplication and classifies data quality using NeMo Curator.

NTT Data Federal 12/21 – 6/24

Data Scientist

• Modeled statistical hypotheses using Bayes' Theorem to determine the probability of funding allocation for a given country.

• Performed regression analyses to provide insights into the correlation between award amount and the levels of needs indices.

• Developed scripts to automate data flows using Google Apps Script.

• Created visualizations using Tableau & SysTrack.

• Led team in proposing a new infrastructure architecture and detailing requirements for a new technology stack.

• First NTT member hired; involved in the hiring process of almost all members of the Data Products Group team (except one, a GDIT employee).
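The Bayes' Theorem modelling described in the NTT Data role can be illustrated with a toy posterior calculation (all numbers hypothetical):

```python
# P(allocated | signal) = P(signal | allocated) * P(allocated) / P(signal)
p_allocated = 0.30              # prior: country receives funding
p_signal_given_alloc = 0.80     # likelihood of a positive needs indicator
p_signal_given_not = 0.20       # false-positive rate of the indicator

# Total probability of observing the indicator
p_signal = (p_signal_given_alloc * p_allocated
            + p_signal_given_not * (1 - p_allocated))

# Posterior probability of allocation given the observed indicator
posterior = p_signal_given_alloc * p_allocated / p_signal
print(round(posterior, 4))
```

Observing the indicator roughly doubles the prior in this toy setting, which is the kind of update such a model produces.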

Next Tier Concepts 4/19 – 9/21

Data Scientist / Data Engineer

• Created schema for hyperparameters using Hydra and integrated it with Kubeflow pipeline which consists of preprocessing, training and testing of data on Torchvision models in Docker images registered in AWS ECR.

• Created visualizations of embeddings using t-SNE, UMAP, PCA and GradCam in Jupyter notebooks.

• Researched and implemented adversarial machine learning and CycleGAN image translation using PyTorch Lightning, Torchvision, Jupyter notebooks on cloud (AWS) with pre-trained models (ResNet, Densenet, Alexnet, etc.) and the Adversarial Robustness Toolbox in Docker containers orchestrated by Kubeflow (Kubernetes).

• Developed neural network architectures for classification and object detection of geospatial data using Faster-RCNN/Mask-RCNN, PyTorch, Torchvision, Python (scripts and Jupyter notebooks) on cloud (AWS, GCP) platforms, and pre-trained models (ResNet, Densenet, Inception, etc.) in Docker containers.

TPGS Federal 8/16 – 12/18

Data Scientist/Machine Learning Analyst

• Developed RESTful web services for Solr searches/queries using Spring Boot and SolrJ

• Developed Python script to implement Gensim Doc2Vec and embedded it in web services using Flask and/or Django

• Evaluated Gensim Doc2Vec and various Deep Learning algorithms – CNN, RNN, GAN, DCGAN – for text classification and image duplicate detection using Tensorflow, Keras, Python and Jupyter Notebook.

• Topic Modelling – used scikit-learn’s NMF and LDA to extract 50 topics from a corpus of nearly 1 million documents. Produced a list of top words per topic and top topics per document.

• Supervised learning classifiers – Apache Spark MLlib’s Decision Tree, Random Forest, SVM and Naïve Bayes

• Built Hadoop clusters in AWS using Cloudera and/or Databricks

• Built REST API with Spring MVC and/or Jersey and Swagger

• ICD-503 accreditation process (SSP, BOEs, POAMs)

• Automated data merge and analysis from numerous Excel spreadsheets using Python, xlrd and Windows bat scripts and displayed results using Tableau.

CLOUDERA 5/13 – 8/16

Solutions Architect

• Automated installation and configuration of Cloudera Key Management Server using Docker and Swarm Overlay Network.

• Used Kafka and Flume to ingest data into HDFS

• Used Scala to develop a Flume and Spark Streaming application that ingested streaming data, parsed and extracted specific fields, enriched them with reference data and stored the final result in HBase.

• Installed and configured Cloudera Director (Launchpad) on Amazon Web Services (AWS).

• ICD-503 accreditation process (SSP, BOEs, POAMs)

• Machine Learning using Spark MLlib, R and Octave

• Installed and configured CDH[3-5] using Cloudera Manager and/or packages.

• Automated HBase snapshot process using JRuby script.

• Installed and configured Hadoop authentication/authorization with RedHat Identity Manager, MS Active Directory and MIT Kerberos using either Centrify or SSSD.

• Installed Cloudera Manager using Puppet.

• Installed CDH3 using Puppet.

• Configured Kerberos authentication using HTTP SPNEGO

• Installed Accumulo and ran benchmark tests

• Installed JMXTRANS to extract Accumulo metrics for Ganglia

• Imported data from Oracle RAC data warehouse using Sqoop

• Ingested streaming data into HDFS using Flume

• Constructed and scheduled Oozie workflows with Java actions that extract, ingest and process data

• Installed and gave a presentation / demo on Storm

• Installed Accumulo 1.4.3 on CDH4 and ran benchmark, functional and ingest tests

• Installed Impala and Search (Solr)

GENERAL DYNAMICS INFORMATION TECHNOLOGY 9/11 – 5/13

Oracle DBA

• Installed, configured and administered Nagios, plugins and NRPE for remote monitoring.

• Compiled and installed MySQL 5.5.27 from source and configured Master-Slave Replication over SSL.

• Installed, configured and administered MySQL Cluster and Galera Multi-Master Replication.

• Configured custom SELinux policy to allow Galera to function properly with SELinux enabled in enforcing mode.

• Improved query performance by implementing materialized views refreshed by triggers and stored procedures.

• Installed Oracle Connection Manager as proxy behind firewall.

• Developed infrastructure migration plan to Oracle RAC 11gR2 on RedHat Linux.

• Engineered and built Oracle RAC 11gR2 on RedHat 6 with GNS and Linux DM multipath.

• Installed, configured and administered DHCP failover and DNS server for the GNS feature of Oracle RAC 11gR2.

• Administered IP address subnetting, DHCP and DNS zone files for new domains.

• Created and administered Oracle 11gR2 databases and physical standby.

• Upgraded Oracle 11.2.0.2 primary and its physical standby to 11.2.0.3.

• Developed and automated RMAN backup script on Windows using Task Scheduler.

• Installed and configured MySQL cascading replication (master-slave/master-slave).

• Architected and built MySQL Cluster with multiple management, SQL and data nodes.

• Trained in Semantic Web technologies (RDF, RDFS, OWL, SPARQL)

TECHNISOURCE / AMERICAN SYSTEMS / NORTHROP GRUMMAN 12/10 – 9/11

Developer/DBA

• Installed and administered Oracle 11gR2 RAC in Red Hat Linux on NFS-mounted shared storage.

• Implemented and administered RMAN backups for multiple database instances.

• Developed and implemented automated database build process using ANT, Subversion and Eclipse.

• Developed SQL Server 2008 R2 applications using C#, T-SQL, XML, XQuery and XPath

• Working knowledge of Semantic Web technologies (RDF, OWL, SPARQL, SWRL, Turtle) and Web Services (SOAP, WSDL, UDDI, REST)

APPLIED TECHNOLOGIES CORP 08/09 – 12/10

Oracle DBA

• Installed and administered multiple database instances in virtual Solaris servers (VMware)

• Cloned databases to other environments to speed up development cycle using VMware virtualization and snapshot technologies.

• COGNOS front-end applications on Oracle 10gR2 on Solaris

• Database builds for various environments and software releases.

• Data extracting/manipulating/loading/replicating into DataMart using PL/SQL procedures/packages and/or triggers

RELIABLE SYSTEMS GROUP 4/08 – 3/09

Oracle Development Engineer

• COGNOS front-end applications on Oracle 10gR2 on Solaris SPARC 64-bit

• Database migration from Oracle 10.2.0.1 on RedHat Linux to Oracle 10.2.0.4 on Solaris SPARC 64-bit

• Created development environment using Oracle 11g on CentOS on VMware Workstation

• Implemented table partitioning to improve data loading/data management and query performance.

• Assisted developers with schema creation/management, data management/presentation (views and materialized views)

• Migrated database from Oracle 10.2.0.1 on RedHat Linux to Oracle 10.2.0.4 on Solaris SPARC 64-bit and applied Critical Patch Updates

• Implemented Database Security Technical Implementation Guide (STIG)

DEFINITIVE LOGIC 11/06 – 4/08

Oracle Engineer

• Implemented High-Availability solution using Oracle 10gR2 RAC (ASM and OCFS2) on Red Hat Linux and HP XP10000 disk array

• Upgraded from 10gR1 to 10gR2

• Database migration from a single-instance development environment to a 3-node Oracle RAC production environment

• Installed, configured and administered CA BrightStor ARCServe backup system

• Installed, configured and administered OEM Grid Control

• Received SAN training and took on a dual DBA-SAN Admin role

FREDDIE MAC 5/02 – 12/02, BCE EMERGIS 12/02 – 5/05, FISERV 5/05 – 10/06

Oracle Developer/Administrator

• Dual role as an Oracle developer and administrator

• Upgraded Oracle databases from 9i to 10g.

• Designed and implemented Disaster Recovery solution using Data Guard

• Developed, implemented and propagated database changes (from development, test, QA, stage to production) according to Configuration Management requirements using CVS and Ant

• Experience with full software development life cycle and Configuration Management tools (Ant, CVS)

• Developed PL/SQL stored procedures/packages and executed them as batch programs

• Partitioned large tables to improve data management/archiving and performance

• Automated database baseline build (combination of Ant, CVS, Perl, PL/SQL and UNIX shell scripts) to build non-production environments (development, test, QA and stage)

• Performed database tuning (both instance tuning and application/query tuning) to improve database performance

• Provided responsive DBA support to maximize test and development process effectiveness

• Installed Siteminder, Webmethods, Veritas Volume Manager and LDAP

• Administered IBM's Tivoli backup system

• Provided complete database server and database instance build-out including Oracle as well as Solaris installation

• Developed and implemented scripts that monitored database locks and informed lock holders

SPRINT CORP 10/01 – 12/01

Sr. Development DBA

• Assisted application developers in resolving DB issues and debugging PL/SQL programs

• Migrated Oracle Parallel Server (now known as RAC) databases from Oracle 7 to 8i

EQUIDITY.COM 12/99 – 7/01

Development / Production DBA

• Designed and implemented Oracle database systems that support a 3-tier web application.

• Designed and implemented High-Availability solution using Oracle OPS (now known as RAC)

• Designed and implemented Disaster Recovery solution using Oracle DataGuard

• Installed Oracle 8.1.7 on Solaris 2.8 and developed databases

• Developed back-end stored procedures (PL/SQL and/or Java) as well as front-end applications (combination of JSPs and J2EE EJBs in a Weblogic container)

• Analyzed and enhanced performance of database instances as well as critical queries

• Developed PL/SQL stored procedures to aid text search via Context/Intermedia engine

• Designed data warehouse using combination of star schema and 3NF logical designs

• Developed a suite of Java programs to streamline data extracting/loading/migration process.

• Implemented hot standby database and wrote shell scripts to automate application of archive logs (Oracle 8.0.5)

• Configured Net8 to provide database access / connectivity

• Developed data models using ERWIN to assist application developers and/or end users

• Wrote shell scripts and cron jobs to automate backups and monitor databases

• Installed and configured Perl

• Installed, configured and administered load-balancing systems (BigIP, Resonate)

• Administered NetScreen firewall

• Performed basic Solaris System Administration (installation, creating/mounting file systems using Disk Suite and Veritas, configuring IP address and connectivity, users’ administration)

CHIRON CORPORATION 12/98 – 12/99

Production DBA

• Installed Oracle 8.0.5 on Sun Solaris

• Created database instances and performed data load/migration using PL/SQL procedures and/or SQL Loader

• Provided database access / connectivity using Net8, Oracle Names Server

• Administered a number of instances from Oracle 7.1.5 to 8.0.5 on VMS and NT platforms

• Administered Oracle Applications Server

• Automated backup and recovery using combination of Korn shell scripts and Recovery Manager

• Monitored performance using Enterprise Manager

LITTON PRC 7/96 – 12/98

Oracle Developer

• Installed Oracle 7.3.2

• Sized and designed / developed database

• Performed data migration and backups

• Analyzed database statistics on memory usage, I/O and database growth rate and resized database

• Tuned database instance and critical queries and monitored their performance

• Developed programs using Pro*C and embedded SQL to schedule court sessions with police officers' attendance

• Developed database applications using SQL Forms and MS Access

• Developed test scripts for API products (using C) of a large optical storage and retrieval database system (Oracle RDBMS)

GLOBAL ONE (SPRINT INTERNATIONAL) 4/95 - 3/96

International Sales & Marketing Manager

• Managed the development and delivery of profitable consumer sales and marketing programs to the rapidly evolving marketplace of five developing Asia-Pacific countries

• Main responsibilities were meeting revenue objectives and controlling budgeted expenses

MCI TELECOMMUNICATIONS CORPORATION 5/93 - 4/95

International Sales Program Manager

• Managed sales programs for a fast-paced international telemarketing center that supports ten different languages and delivers $600 K of revenue per month with over 200 sales associates

• Directed a team of two Program Managers in working closely with Marketing and other departments in a matrix environment

Senior Business Analyst

• Functioned as internal consultant to senior management

• Provided strategic analyses and recommendations from a financial perspective

• Work involved identifying key business issues, developing business cases, performing financial analyses and providing recommendations on proper courses of action

DELOITTE & TOUCHE MANAGEMENT CONSULTING 12/90 - 5/93

Manager

• Contracted by Fortune 500 clients to assist in the formulation of business strategies (sales & marketing, market entry, product development, market share increase, etc.), improvement of operations (productivity, quality management, cost reduction, cycle time compression), and project management and implementation

BELL ATLANTIC CORPORATION 6/89 - 8/89

Systems Requirements Analysis

• Developed user requirements for a marketing information system used by a large Marketing Department with $2 billion in sales.

• Designed/developed standardized survey and conducted face-to-face interviews to capture users’ requirements and needs

AT&T TECHNOLOGIES, INC. 7/82 - 8/86

Planning Engineer

• Improved operations efficiency and cut costs through carefully planned applications of new technologies and quantitative analyses

• Developed floor layout and operating instructions. Worked with production and process engineering to improve yield and meet customer requirements


