Paul Duncanson
Email: ad3xd0@r.postjobfree.com
SUMMARY OF QUALIFICATIONS
Lead Data/Software/ML Engineer/Architect with extensive knowledge of Application/Framework/ETL Development, Machine Learning, GPT, NLP, APIs, Object-Oriented, Service-Oriented, and Functional methodologies, Algorithms, and Design Patterns.
TECHNICAL SKILLS
Languages: Scala, Python, C++/C, Go, TypeScript/JavaScript, Node.js/ReactJS/Angular, Objective-C/Swift, Java
Operating Systems: Linux, Mac OS, iOS, Windows
Streaming Platforms/Databases: Kafka/Kinesis, S3, Redshift, DynamoDB, Cassandra, Flink, Microsoft SQL Server, Oracle, MySQL, MongoDB/Mongoose, PostgreSQL, Pinecone
Packages/Frameworks: Akka, Scikit-Learn, TensorFlow, PyTorch, OpenCV, Pandas, NLTK/spaCy, NumPy, Apache Spark/Airflow, Spring Boot, SNS/RabbitMQ, Docker, ZIO/Cats, Tesseract
Software: Jupyter/Colab, SageMaker, MATLAB, IntelliJ/Eclipse, Xcode, Visual Studio Code
Hosting Providers: AWS, Azure, GCP
• Python - 8 years
• JavaScript/TypeScript/ReactJS/Angular - 8 years
• C/C++ - 10 years
• Spark/Airflow - 8 years
• AWS/GCP/Azure - 8 years
• APIs/Microservices - 9 years
• RESTful Web Services - 10 years
• Java/Scala - 8 years
• CoreNLP/spaCy/Tesseract, MATLAB, Machine Learning/TensorFlow/Scikit-Learn/PyTorch - 8 years
• UML - 10 years
PROFESSIONAL EXPERIENCE
Pyramid Consulting
Role: Discover - Lead Data Engineer/OCR Architect
Date: 07/2022 - current
• Conducted meetings with Discover to understand their specific needs, leading to the development of a custom Document Query Application and an OCR Check Reader
• Document Query Application utilizes Python/Scala/ZIO/Spark/PyTorch/Kafka/Flink/LangChain/SageMaker/Glue, Pinecone, S3, and Snowflake
• Includes a workflow diagramming tool and execution engine, written in Python with an intuitive design, that performs text aggregation from consumed Kafka topics
• OCR Check Reader employed Gaussian blur image processing techniques utilizing OpenCV to capture and clean MICR data, achieving 92% accuracy
Role: Microsoft Research - Lead Data Engineer/Architect
Date: 01/2022 - 07/2022
• Played a key role in understanding and translating client requirements into a Machine Learning Framework whose intuitive design supports the entire ML pipeline development life cycle and dramatically increases productivity
• Supported technologies/languages included Python, PyTorch, Scala Spark Batch/Streaming, F-Score measurement, Tableau, Azure Blob Storage, and Data Lake Store
• Refined the framework iteratively through frequent interactions with end consumers, so that it targets local development first and transitions seamlessly into a hosted deployment for shared, collaborative development and maintenance, using containerization with Anaconda, Docker, and Kubernetes
• Included an NLP-trained model, shaped by ongoing consumer feedback, as a proof of concept that provides text classification and summarization
TekSystems Consulting
Role: Core Logic - Lead Data Engineer/Architect
Date: 08/2021 - 01/2022
• Led the effort to refine customer requirements and implement them as a GCP Vertex AI architecture, including a migration plan and a GCP pipeline POC
• Trained Core Logic employees to use the GCP Vertex AI architecture through recorded video sessions, supporting the development of 6 core forecasting applications
Role: Conifer Health - Sr. Software Engineer/Machine Learning/NLP
Date: 01/2019 - 08/2021
• Spearheaded the analysis and refinement of diagnostic data based on customer feedback
• Refined diagnostic data into bag-of-words/n-gram features used to train classification models
• Made use of Spark's unified NLP pipeline, in which spaCy/NLTK and Stanford's NLP libraries were utilized to capture and classify text
• Refined ingestion platform to distinguish a given patient across multiple providers
• Trained TensorFlow/Keras/PyTorch models to classify treatments for specific patient diagnoses
Role: Synchrony - Sr. Software Engineer/Machine Learning
Date: 01/2018 - 12/2018
• Developed and trained classification/regression models utilizing multiple credit card features to distinguish fraudulent behavior
• Built ETL pipeline that automates ingestion with optimized performance across the Spark distributed platform
• Enabled analytic pattern recognition characteristics to be applied through a parameterized pivoting technique
• Technologies utilized: UML/Scala/Spark/Streaming/Parquet/Hadoop/MLlib/GraphX
Global Logic
Role: Shell Corporation - Sr. Software/Data Engineer/Lead Architect
Date: 01/2017 - 12/2017
• Actively reported project progress to and sought approval from Shell's Board of Directors, emphasizing customer-centric development approaches
• Led the gathering of specifications and business goals from clients, ensuring the architectural solution aligned with their needs
• Designed and developed the AWS IoT analytics engine and data collection framework that captures GPS data from a customized Linux-based hardware device installed on each chassis going in and out of the Port of Long Beach initially, and then all ports throughout the U.S.
• Utilized Athena to query across multiple data sources for real-time billing and trend analysis
• Gathered specifications, business goals, and first-iteration feedback on the developed hardware and incorporated them into the overall architectural solution
• Developed AWS Firehose and AWS Lambda components, a C application on the IoT device, and a JavaScript/Angular tenant login dashboard using Docker
• Developed Spark Streaming platform to generate trailer routing for highest logistical availability with lowest mileage expense
Ephesoft Corporation
Role: Principal Engineer
Date: 01/2015 – 01/2017
• Served as hands-on technical lead, training a newly forming team while developing features utilizing Maven, IntelliJ, ReactJS/Node.js/Scala, Kafka, Flink, Akka, Docker, AngularJS, and the Selenium automation testing framework
• Developed document router tool that uses Tesseract to perform Optical Character Recognition (OCR) on batches of document images routed through a visually designed, data-driven workflow utilizing Airflow DAG components
• Applied TF-IDF vectorization and the K-Means algorithm to determine document grouping
• Applied Naive Bayes algorithm to classify documents in relation to a given batch and to all collected documents
• Fed captured email attachments and files from a designated subdirectory into Flink/Kafka topics to optimize throughput with parallelized consumers
• Built a set of microservices hosted on a cluster of Azure virtual machines to provide OCR document processing through Tesseract and Nuance
• Employed a B-Cubed algorithm that scores CoreNLP's coreference output to optimize noun phrase correlations
• Developed machine learning algorithms utilizing TensorFlow/PyTorch that categorize documents with a multi-classifier that distinguishes document types (e.g., invoices, receipts, offer letters, credit reports)
Invigorate Software
Role: Principal/Lead Engineer
Date: 01/2014 - 01/2015
• Developed a cloud-based image and story sharing platform, PicSavour, utilizing AWS/Apache Spark Streaming/Flink/Kafka/Cassandra
• Developed load balancing driver on EC2 to provide dynamically scaled session support
• Designed and developed Scala machine learning algorithms on the Apache Spark platform to capture relationships between shared images
• Utilized the Amazon Rekognition library and OpenCV for facial recognition; developed custom code utilizing TensorFlow convolutions on image data to automatically tag detected faces with sufficiently similar facial features
• Wrote microservices in JavaScript that distribute work across a set of AWS GPUs to provide real-time tagging feedback on the mobile device as photos are taken within the mobile app
• Utilized Kafka cluster that captures image metadata records as log entries within partitions and topics associated with login credentials and geographical location
• Designed a denormalized set of tables on DynamoDB for fast geographical, logical, and/or chronological queries
Emotiva Corporation/Momentum Data Systems
Role: Lead Engineer
Date: 01/2013 - 01/2014
• Developed two high-end home theater sound systems, Emotiva XMC-1 and Fusion
• Developed Node.js/ReactJS/C++/Objective-C/OOP/Python code on the corresponding platforms (Linux/OS X/Windows)
• Utilized AWS IoT to transfer room correction coefficient data between Dirac and Emotiva devices over MQTT
• Captured video/audio selections with an Apache Kafka cluster for query through a Cassandra sink, providing video selection trends and best alternative choices
• Developed a cloud-based recommender engine that populates ad banner space on mobile devices
• Led architecture redesign and development of device driver services to provide a reusable set of object-oriented components utilizing the latest C++ standards (i.e., C++17), reducing spin-up time for a new board design from one year to 3 months
Intel Corporation
Role: Lead Developer
Date: 01/2012 - 01/2013
• Utilized TermIO/FastBoot/Objective-C to provide the firmware download application for all Intel IoT devices
• Developed Intel's Smart Watch iPhone-based Software Development Kit in Swift
• Developed Bluetooth LE communications stack between iPhone and Smart Watch
• Developed a cloud-based recommender engine that populates ad banner space on mobile devices
• Utilized Jenkins/Jira/Git for continuous integration, task management, and version control
EDUCATION
B.S., Information Systems Management, University of San Francisco, 3.8 Honor Roll
Continuing Education: Stanford University (Machine Learning, NLP, Linear Algebra) - 2022
Currently enrolled in M.S., Data Science, University of California, Berkeley - 2025