
Data Scientist, Machine Learning Engineer, Artificial Intelligence Engineer

Location: Haymarket, VA
Posted: March 08, 2024

John Lanzerotta

Haymarket, VA

Objective

To utilize 27 years of IT experience as a Data Scientist, Machine Learning Engineer, Artificial Intelligence Engineer, or LLM Engineer.

Skills

Software:

Amazon Web Services (AWS), Microsoft Azure, and Oracle Cloud Infrastructure (OCI): assorted services

Data Science, Artificial Intelligence / Machine Learning (AI/ML), Deep Learning and Data Engineering tools:

- Python:

-- Data Science Libraries: numerical operations (e.g., NumPy), data manipulation and analysis (e.g., Pandas), data visualization (e.g., Matplotlib and Seaborn)

-- Machine Learning Libraries: supervised and unsupervised learning algorithms (e.g., scikit-learn for linear regression, logistic regression, decision trees, random forests, k-nearest neighbors [k-NN], and k-means clustering); dimensionality reduction (e.g., Principal Component Analysis [PCA], t-distributed Stochastic Neighbor Embedding [t-SNE])

- Neural Networks:

-- Training and Optimization: activation functions (e.g., Sigmoid, Hyperbolic Tangent [tanh], Rectified Linear Unit [ReLU]), backpropagation, loss functions (e.g., Mean Squared Error [MSE], Cross-Entropy), optimization algorithms (e.g., Gradient Descent, Stochastic Gradient Descent, Root Mean Squared Propagation [RMSprop], Adam)

-- Overfitting: regularization (e.g., dropout, L1/L2 regularization, early stopping, data augmentation)

-- PyTorch: e.g., building Multilayer Perceptrons (MLPs)

- Natural Language Processing (NLP):

-- Text Preprocessing: tokenization, stemming, lemmatization, stop word removal

-- Feature Extraction: Bag-of-words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), n-grams

-- Word Embeddings: Word2Vec, Global Vectors (GloVe), FastText

-- Recurrent Neural Networks (RNNs): Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU)

- Large Language Model (LLM) architecture:

-- High-level architecture: encoder-decoder Transformer, decoder-only Generative Pre-trained Transformer (GPT)

-- Attention mechanisms: self-attention, scaled dot-product attention

-- Text generation: greedy decoding, beam search, top-k sampling, nucleus sampling

- Building instruction datasets:

-- Traditional/real data: filtering via regex, removing near-duplicates

-- Synthetic data: generation via LLMs (e.g., Orca and phi-1 papers), improving via Evol-Instruct

-- Prompt templates/chat templates: Chat Markup Language (ChatML), Alpaca

- Pre-training LLMs:

-- Data pipeline: dataset filtering, tokenization, collation with pre-defined vocabulary

-- Libraries/frameworks: Megatron, GPT-NeoX

- Supervised Fine-Tuning (SFT):

-- Parameter-Efficient Fine-Tuning (PEFT): Low-Rank Adaptation (LoRA), Quantized Low-Rank Adaptation (QLoRA), Unsloth

-- Tools: Axolotl (with multi-GPU and multi-node via DeepSpeed)

- Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF):

-- Algorithms: Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO)

- Evaluation:

-- Approaches: general benchmarks (e.g., via the Language Model Evaluation Harness), task-specific benchmarks, human evaluation

- Quantization:

-- Techniques: Base/naive (e.g., absolute maximum [absmax], zero-point), GPT-Generated Unified Format (GGUF) and llama.cpp, GPT Quantization (GPTQ) and ExLlamaV2 EXL2, Activation-aware Weight Quantization (AWQ)

- Model merging:

-- Libraries: mergekit (merge methods, e.g., Spherical Linear Interpolation [SLERP], Drop And REscale [DARE], Trim Elect Sign & Merge [TIES-Merging], Passthrough)

- Prompting and Outputs:

-- Prompt engineering: zero-shot, few-shot, Chain of Thought (CoT), Reasoning and Acting (ReAct)

-- Structured output libraries: Language Model Query Language (LMQL), Outlines, Guidance

- Retrieval Augmented Generation (RAG):

-- Embedding models: SentenceTransformers

-- Vector databases (store embedding vectors and, optionally, memory): Chroma, Pinecone, Milvus

-- Orchestrators (frameworks to connect to tools, databases): LangChain, LlamaIndex, FastRAG

-- Retrievers (rephrase/expand user instructions): multi-query retriever, Hypothetical Document Embeddings (HyDE)

-- Evaluation: RAG Assessment (Ragas), DeepEval

- Inference optimization:

-- Techniques: Flash Attention, Key-value cache (e.g., Multi-Query Attention [MQA], Grouped-Query Attention [GQA]), Speculative decoding

- Deployment:

-- Server deployment: SkyPilot, Text Generation Inference (TGI), vLLM (with PagedAttention)

-- Edge deployment: Machine Learning Compilation for Large Language Models (MLC LLM), mnn-llm

- Vulnerability scanning and observability:

-- Vulnerability scanner: garak

-- Observability & Analytics: Langfuse

Infrastructure as Code (IAC), Automation, Provisioning and Configuration Management: AWS CloudFormation, Azure Resource Manager (ARM), HashiCorp Terraform, Ansible, Python Fabric, Remote Command Execution over SSH via Linux Shell Scripts/Bash Scripts, Windows PowerShell Remoting scripts with Windows Remote Management (WinRM), Vagrant

Continuous Integration/Continuous Deployment (CI/CD), Pipeline as Code, Policy as Code: Jenkins, Ansible

Build Automation: Maven

Version Control System (VCS)/Source Control Management (SCM): Git, GitHub

Workflow Automation, Extract Transform Load (ETL), Directed Acyclic Graphs (DAGs), Data Engineering Pipelines: Apache Airflow, Apache Hop, Apache Hadoop, Apache Spark, Pentaho Data Integration (Kettle, Spoon)

Software Repository Manager (e.g., build binaries, artifacts, packages, dependencies, images): Sonatype Nexus Repository Manager

Continuous Inspection of Code Quality/Code Security/Code Analysis/Code Testing: SonarCloud/SonarQube with SonarScanner, Checkstyle (including the Maven Checkstyle Plugin)

Container Runtime and Orchestration: Docker, Kubernetes, Kubernetes Operations (kOps), Prometheus, Lens, Helm, KubeApps

Technical Experience

2023-present Arcfield Chantilly, VA

Data Scientist Senior Technical Specialist

Wrote multiple Python data transformation, enrichment, and AI/ML pipelines, e.g.:

- Data transformation pipeline: downloads structured data from a data source (e.g., object storage bucket, SFTP server, API); parses vendor file keys with precompiled regular expressions using named groups; performs assorted parsing and transformations (e.g., Avro to JSON, Elasticsearch index dump JSON to JSONL, multi-file find-and-replace driven by a multi-row mapping in a datastore, multi-file recursive grep/search driven by multiple search strings in a datastore, recursive extraction of n-level nested archives in all common archive formats, re-zipping, splitting, combining); tracks each file's status at each processing stage in a datastore for flow control and debugging; uploads results to an object storage bucket with custom upload limits and cleanup behavior; and runs parallel tasks in separate processes using multiprocessing pools.

- Dataset profiler: downloads a dataset from object storage, recursively extracts n-level nested archives in all common archive formats, and generates file manifests and metadata (including file types, sizes, and MIME types) as JSON and README files.

- Email parsers: parse .pst, .eml, and Dovecot Maildir email files, extract key email fields to JSON, and extract attachments.

- Unstructured/semi-structured data enrichment tool (for data such as PDFs, CSVs, JSON files, and MS Office files in an object storage data lake) leveraging Natural Language Processing (NLP) to create JSONL metadata files indexed for searchability and discoverability. Main functionality: content detection and extraction (e.g., Apache Tika) -> on failure, export the page to an image -> Optical Character Recognition (OCR) with language-specific models (e.g., EasyOCR) -> NLP Named Entity Recognition (NER) with language-specific models (e.g., spaCy) to extract selectors, entities/fields/categories/realms, and metadata -> Regular Expressions (RegEx) -> JSONL output -> consumption/indexing by an indexing/observability/analytics tool (e.g., an OpenSearch ingestion pipeline via bulk POST, or an index template that specifies selectors and their field types [e.g., keyword, text] and interleaves OpenSearch information into the JSONL), linked to the existing data location -> APIs (e.g., OpenSearch APIs) accessible by a web front end, other apps, or a federated search service (as a registered provider). End result: minimally curated/categorized/triaged/evaluated, text-preprocessed unstructured and semi-structured data becomes searchable and discoverable.

- Multimedia (image/video) enrichment tool leveraging Machine Learning (ML) to create JSON metadata files containing a description, recognized objects and text, and an audio transcript (if the source file is a video). Main functionality: listen for new multimedia files added to selected S3 buckets; extract keyframes (e.g., OpenCV cv2 and katna); apply OCR (e.g., EasyOCR), object detection (e.g., YOLOv5), and image captioning (e.g., vit-gpt2-image-captioning) to keyframes; transcribe speech to text (e.g., FFmpeg and Whisper); package outputs into JSON and upload them to an 'enhancements' bucket. Deployed in AWS to scale automatically as demand increases or decreases. End result: metadata that can be filtered and searched, which multimedia files alone cannot.
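A minimal Python sketch of two techniques named in the pipeline bullet above that parses vendor file keys: a precompiled regular expression with named groups, and a multiprocessing pool that runs work in separate processes while recording a per-file status. The key layout (<vendor>/<YYYY-MM-DD>/<feed>_<seq>.<ext>) and the names KEY_PATTERN, parse_key, process_key, and run are hypothetical; the real vendor formats and datastore are not described, so an in-memory dict stands in for the status store.

```python
import re
from multiprocessing import Pool

# Hypothetical vendor key layout "<vendor>/<YYYY-MM-DD>/<feed>_<seq>.<ext>";
# the real vendor key formats are not described.
KEY_PATTERN = re.compile(
    r"^(?P<vendor>[a-z0-9_-]+)/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/"
    r"(?P<feed>[a-z0-9_]+)_(?P<seq>\d+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_key(key):
    """Return the named groups of a matching key as a dict, or None if it does not match."""
    match = KEY_PATTERN.match(key)
    return match.groupdict() if match else None


def process_key(key):
    """Worker executed in a separate process; returns (key, status, fields)."""
    fields = parse_key(key)
    if fields is None:
        return key, "rejected", None
    # Download, transform, and upload steps would go here (not shown).
    return key, "parsed", fields


def run(keys, workers=4):
    """Fan keys out to a process pool and collect a per-file status map.

    A real pipeline would persist each status change to a datastore; an
    in-memory dict stands in for it here.
    """
    status = {}
    with Pool(processes=workers) as pool:
        for key, state, fields in pool.imap_unordered(process_key, keys):
            status[key] = {"status": state, "fields": fields}
    return status


if __name__ == "__main__":
    sample_keys = [
        "acme/2024-03-08/orders_0001.avro",
        "acme/2024-03-08/orders_0002.avro",
        "not-a-valid-key.txt",
    ]
    for key, info in sorted(run(sample_keys).items()):
        print(key, info["status"])
```

imap_unordered yields results as workers finish, so status collection does not wait on the slowest file; the __main__ guard matters because start methods such as spawn re-import the module in each worker process.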
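The dataset profiler bullet above combines recursive archive extraction with manifest generation. A minimal sketch of those two steps, assuming the dataset has already been downloaded to a local directory: nested ZIP archives are extracted repeatedly until none remain, then a JSON manifest records each file's path, extension, size, and MIME type. Only ZIP is handled here (other archive formats would need their own handlers), MIME types are guessed from file extensions with the standard mimetypes module rather than by content sniffing, and the function names and the "_extracted" directory suffix are hypothetical.

```python
import io
import json
import mimetypes
import os
import tempfile
import zipfile


def extract_nested_zips(root):
    """Extract every .zip under root, repeating until no unextracted .zip remains.

    Only ZIP is handled in this sketch; tar, 7z, rar, etc. would need other handlers.
    """
    extracted = set()
    while True:
        pending = [
            os.path.join(dirpath, name)
            for dirpath, _, files in os.walk(root)
            for name in files
            if name.lower().endswith(".zip")
            and os.path.join(dirpath, name) not in extracted
        ]
        if not pending:
            return
        for path in pending:
            # Hypothetical convention: "a.zip" is extracted into "a_extracted/".
            target = os.path.splitext(path)[0] + "_extracted"
            with zipfile.ZipFile(path) as archive:
                archive.extractall(target)
            extracted.add(path)


def build_manifest(root):
    """Return a manifest dict with path, extension, size, and guessed MIME type per file."""
    files = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            mime_type, _ = mimetypes.guess_type(name)  # guessed from the extension only
            files.append({
                "path": os.path.relpath(path, root),
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": os.path.getsize(path),
                "mime_type": mime_type or "application/octet-stream",
            })
    files.sort(key=lambda entry: entry["path"])
    return {
        "file_count": len(files),
        "total_bytes": sum(entry["size_bytes"] for entry in files),
        "files": files,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two-level archive: outer.zip contains inner.zip, which contains notes.txt.
        inner = io.BytesIO()
        with zipfile.ZipFile(inner, "w") as archive:
            archive.writestr("notes.txt", "hello")
        with zipfile.ZipFile(os.path.join(tmp, "outer.zip"), "w") as archive:
            archive.writestr("inner.zip", inner.getvalue())

        extract_nested_zips(tmp)
        print(json.dumps(build_manifest(tmp), indent=2))
```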
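For the .eml case of the email parsers bullet above, Python's standard email package can pull out header fields, the body, and attachments. The sketch below is one way to do it under that assumption, not the original implementation; .pst files and Dovecot Maildir folders would need other readers (for Maildir, the standard mailbox module provides mailbox.Maildir), which are not shown. The function name parse_eml_bytes and the JSON field names are hypothetical.

```python
import json
import os
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def _header(msg, name):
    """Return a header as a plain string, or None if it is absent."""
    value = msg[name]
    return str(value) if value is not None else None


def parse_eml_bytes(raw, attachment_dir=None):
    """Parse raw .eml bytes into a JSON-serializable dict; optionally save attachments."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    record = {
        "from": _header(msg, "From"),
        "to": _header(msg, "To"),
        "cc": _header(msg, "Cc"),
        "subject": _header(msg, "Subject"),
        "date": _header(msg, "Date"),
        "message_id": _header(msg, "Message-ID"),
        "body": body_part.get_content() if body_part is not None else None,
        "attachments": [],
    }
    for part in msg.iter_attachments():
        filename = part.get_filename() or "unnamed_attachment"
        payload = part.get_payload(decode=True) or b""
        record["attachments"].append({
            "filename": filename,
            "content_type": part.get_content_type(),
            "size_bytes": len(payload),
        })
        if attachment_dir is not None:
            os.makedirs(attachment_dir, exist_ok=True)
            # basename() drops any directory components embedded in the filename.
            with open(os.path.join(attachment_dir, os.path.basename(filename)), "wb") as out:
                out.write(payload)
    return record


if __name__ == "__main__":
    sample = EmailMessage()
    sample["From"] = "alice@example.com"
    sample["To"] = "bob@example.com"
    sample["Subject"] = "Quarterly report"
    sample.set_content("See attached.\n")
    sample.add_attachment(b"abc", maintype="application",
                          subtype="octet-stream", filename="data.bin")
    print(json.dumps(parse_eml_bytes(sample.as_bytes()), indent=2))
```

Parsing with policy.default yields EmailMessage objects, which is what makes get_body and iter_attachments available; the older compat32 default policy does not provide them.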
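The tail end of the unstructured-data enrichment pipeline described above (regex selector extraction, JSONL records linked back to the source location, and OpenSearch bulk ingestion) can be sketched as follows. The selector patterns, the source_uri and text_length fields, and the index name are hypothetical, and the Tika, OCR, and NER stages are omitted. The bulk body follows the newline-delimited format of the OpenSearch _bulk API: an action line, then the document line, with a trailing newline at the end.

```python
import json
import re

# Hypothetical selector patterns; the real selectors are not specified.
SELECTOR_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def extract_selectors(text):
    """Return sorted, de-duplicated regex matches for each selector type."""
    return {name: sorted(set(pattern.findall(text)))
            for name, pattern in SELECTOR_PATTERNS.items()}


def to_record(source_uri, text):
    """Build one JSONL-ready metadata record that links back to the source file."""
    return {
        "source_uri": source_uri,
        "selectors": extract_selectors(text),
        "text_length": len(text),
    }


def to_jsonl(records):
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "".join(json.dumps(record) + "\n" for record in records)


def to_bulk_body(index, docs):
    """Build an NDJSON _bulk body: an action line followed by the document, per doc."""
    lines = []
    for doc_id, record in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


if __name__ == "__main__":
    record = to_record("s3://example-bucket/report.pdf",
                       "Contact alice@example.com from 10.0.0.1 or bob@example.org.")
    print(to_jsonl([record]), end="")
    print(to_bulk_body("enrichment-metadata", [("report-pdf", record)]), end="")
```

The bulk string would be sent as the body of POST /_bulk with Content-Type: application/x-ndjson; supplying _id makes re-indexing the same source file overwrite its earlier record instead of creating a duplicate.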

2022-2023 Candlelight Technologies Herndon, VA

Cyber Security Engineer (SME)

Cyber Security engineering and administration using Ansible, AWS, Linux, Nagios (monitoring), Rapid7 Nexpose (vulnerability scanning) and Splunk (log and data ingestion).

2022 TENICA Global Solutions Chantilly, VA

Senior Software Engineer

Subject Matter Expert (SME) in architecting and designing AWS Data Analytics solutions using Relational Database Service (RDS), Database Migration Service (DMS), Kinesis, Simple Queue Service (SQS), Managed Streaming for Apache Kafka (MSK), Glue, Elastic MapReduce (EMR), Apache Airflow, Data Pipeline, Step Functions, OpenSearch/Elastic Stack, Athena, Redshift, Trino, Hadoop and Immuta Data Access Platform solutions.

2020-2022 Microsoft Reston, VA

Service Reliability Engineer/DevOps Engineer

Azure Support Platform team member. DevOps responsibilities: participated in team scrums and code/design reviews; made minor code changes; submitted and reviewed pull requests; created and ran build and release pipelines; initiated and monitored deployments; and, as Directly Responsible Individual (DRI) for multiple Microsoft Azure services, monitored and troubleshot issues in air-gapped clouds.

2019-2020 CACI Arlington, VA

ServiceNow Engineer

Subject Matter Expert (SME) in architecting, designing, implementing, customizing and integrating ServiceNow Discovery, ServiceNow CMDB, ServiceNow IT Service Management (ServiceNow ITSM), ServiceNow Application Portfolio Management, ServiceNow Financial Management, ServiceNow Event Management, ServiceNow Operational Intelligence/Service Analytics, ServiceNow Human Resources (ServiceNow HR), ServiceNow Customer Service Management (ServiceNow CSM), ServiceNow Agile Development, and ServiceNow Test Management solutions.

2019 SAIC Reston, VA

ServiceNow Engineer

Subject Matter Expert (SME) in architecting, designing, implementing, customizing and integrating ServiceNow Discovery, ServiceNow CMDB, ServiceNow IT Service Management (ServiceNow ITSM), ServiceNow Application Portfolio Management, ServiceNow Financial Management, ServiceNow Event Management, ServiceNow Operational Intelligence/Service Analytics, ServiceNow Human Resources (ServiceNow HR), ServiceNow Customer Service Management (ServiceNow CSM), ServiceNow Agile Development, and ServiceNow Test Management solutions.

2007-2019 Lanzerotta Consulting Arlington, VA

Senior Software Architect

Subject Matter Expert (SME) in architecting, designing, implementing, customizing and integrating BMC Remedy (ITSM and custom), BMC MyIT/Smart IT, BMC Discovery/Atrium Discovery and Dependency Mapping (ADDM), BMC Atrium Orchestrator (BAO), BMC TrueSight Operations Manager (TSOM) [including BMC ProactiveNet Performance Management (BPPM), BMC Patrol, and BMC Event and Impact Management], BMC TrueSight Capacity Optimization (TSCO), BMC TrueSight Cloud Cost Control, BMC Network Automation (BNA), Micro Focus/HP OpenView Network Node Manager (NNM), Splunk Enterprise and ServiceNow solutions.

2007 Signature Consultants Herndon, VA

Senior Software Architect

Subject Matter Expert (SME) in architecting, designing, implementing, customizing and integrating BMC Remedy (ITSM and custom), Microsoft SQL Server, and ASP solutions.

2004-2006 IBM Fairfax, VA

Senior Consultant

Subject Matter Expert (SME) in architecting, designing, implementing, customizing and integrating BMC Remedy (ITSM and custom) and Peregrine solutions.

2001-2004 Northrop Grumman IT Greenbelt, MD

Senior Enterprise Software Consultant

Subject Matter Expert (SME) in architecting, designing, implementing, customizing and integrating BMC Remedy (ITSM and custom), Crystal Reports/Enterprise and HP OpenView solutions. The following are examples of custom consulting work performed:

- Implemented multiple core help desk software solutions in various Department of Defense and House of Representatives agencies supporting small and mid-sized organizations (15-110+ agents).

- Designed and implemented a heavily customized problem and asset tracking solution, based on the ARS HelpDesk suite, to support an enterprise-wide government civilian agency.

- Designed and implemented a custom solution for operational and financial management support of multiple departments within a Fortune 500 company.

- Implemented integrations between Remedy and various enterprise management, reporting, and VoIP products, e.g., HP OpenView NNM, NetIQ, SMS, Cisco CAD (VoIP), Crystal Reports, Crystal Enterprise, RightNow, Remedy Flashboards, Peregrine AssetCenter, and custom financial and database applications.

2000-2001 Evergreen Systems, Inc. Reston, VA

Senior Systems Consultant

Designed and implemented Remedy (ITSM and custom) solutions for customers across the country in the following industries: manufacturing, software/web, government, military, financials, scientific/research, food services, real estate.

1998-2000 EDS - Electronic Data Systems Herndon, VA

Communications Engineer/Information Associate

Network Management implementations: conducted network hardware and software installations, configurations, troubleshooting, and support for USAF bases worldwide as part of the CITS NMS/BIP project.

System design - Trouble-Ticketing System (TTS): participated in the design, implementation, and training of the CITS trouble-ticketing system used worldwide by the USAF. Built on Remedy with the Distributed Server Option (DSO) for ticket transfer across the WAN.

1994-1996 Computer Classroom Operations Amherst, MA

Computer Lab Consultant

Troubleshot computer, printer, and network problems in multiple computer classroom LANs.

Education/Classroom Training

Sc.B. (Bachelor of Science) in Environmental Science, cum laude (3.6 GPA), with Commonwealth Honors, University of Massachusetts at Amherst. Minor in Psychology. Coursework included computer-based ecosystem modeling.

Classroom training: hundreds of hours in databases, programming, operating systems, hardware, and related topics. List available on request.

Certifications

ServiceNow Certified Implementation Specialist - Discovery (CIS - Discovery)

ServiceNow Certified Implementation Specialist - Event Management (CIS - Event Management)

ServiceNow Certified Implementation Specialist - Financial Management (CIS - Financial Management)

ServiceNow Certified Implementation Specialist - Application Portfolio Management (CIS - Application Portfolio Management)

ServiceNow Certified Application Developer

ServiceNow Certified System Administrator

ServiceNow Micro-Certification – Predictive Intelligence/Agent Intelligence

ServiceNow Micro-Certification – Virtual Agent

ServiceNow Micro-Certification – Performance Analytics

ServiceNow Micro-Certification – Enterprise Onboarding and Transitions

ServiceNow Micro-Certification – HR Integrations

ServiceNow Micro-Certification – Flow Designer

ServiceNow Micro-Certification – IntegrationHub

ServiceNow Micro-Certification – Asset Models Management

ServiceNow Micro-Certification – ServiceNow Platform Subscription Model

ServiceNow Micro-Certification – Application Portfolio Management

ServiceNow Micro-Certification – CSM with Service Management for Implementers

ServiceNow Micro-Certification – Automated Test Framework

ServiceNow Micro-Certification – Agile and Test Management Implementation

BMC Accredited Administrator: BMC TrueSight Operations Management 10.x

BMC Accredited Administrator: BMC Discovery 11.x

BMC Accredited Administrator: BMC Atrium Orchestrator 7.x

BMC Accredited Administrator: BMC Atrium CMDB 9.1

BMC Accredited Administrator: BMC MyIT 3.x and Smart IT 1.x

BMC Accredited Administrator: BMC Remedy IT Service Management 9.0

BMC Accredited Administrator: BMC Remedy AR System – 9.0, 8.0 (re)certifications

BMC Certified Administrator: BMC Remedy AR System 7.6.04

BMC Remedy Approved Consultant (RAC) - 7.x, 6.x, 5.x, and 4.x (re)certifications

CompTIA Advanced Security Practitioner+, with Continuing Education (CASP+ CE)

CompTIA Security+, with Continuing Education (Security+ CE)

CompTIA Linux+

Linux Professional Institute Certified Linux Administrator (LPIC-1)

Cisco Certified Network Associate Security (CCNA Security)

Cisco Certified Entry Networking Technician (CCENT)

EC-Council Certified Incident Handler (ECIH)

HP Certified Professional – Accredited Integration Specialist (AIS) – OpenView Network Services

Splunk Core Certified Power User

Splunk Core Certified User

Microsoft Certified IT Professional (MCITP): Enterprise Desktop Administrator on Windows 7

Microsoft Certified Technology Specialist (MCTS): Windows 7, Configuration

Microsoft Certified Solutions Associate (MCSA): Windows 7

Microsoft Specialist (MS): Windows 7, Enterprise Desktop Administrator

Microsoft Specialist (MS): Windows 7, Configuring

Microsoft Certified Professional (MCP): Windows NT 4.0

EXIN Foundation Certificate in IT Service Management (ITIL Foundation)

Clearance

Active Top Secret/Sensitive Compartmented Information (TS/SCI) clearance with Polygraph

Languages

English, Italian, Sicilian, Spanish


