John Dangov
*******@*****.***
My 25+ years of expertise lie in identifying and proactively removing redundant and unnecessary complexity and process. I deliver reusable, platform-independent processes that are encapsulated and generalized for use across multiple projects.
This approach yields significant savings in time and resources by eliminating the need to reinvent the wheel for each new project. It also improves consistency and quality across projects and teams, while platform independence adds flexibility and scalability to solutions. These strengths are instrumental in optimizing organizational operations and achieving objectives efficiently.
Strengths: ML/BI/SQL/Big Data Expert, OLTP/OLAP/Data Vault, Analytics/Data Mining, Distributed/Partitioned Storage, Parallel/Real-Time Processing, Project/Dev Manager/Team Lead
Technologies: Data: S3/Redshift/BigQuery/SQL Server/Teradata/Netezza. Cloud: AWS/Google Cloud/Azure.
Methodology: Agile/Iterative/Waterfall
BI: Tableau/Power BI/Pentaho/QuickSight
Scripting: Transact-SQL/Python/C Shell
Current Technical Leadership Roles
Education-Startup: CTO/Lead ML Engineer/Architect
o Architect and implement an education platform leveraging an open-source, license-free technical stack.
o Design a self-contained, offline-capable learning system providing real-time, personalized instruction.
o Integrate AI models for multi-modal, multimedia, adaptive, and individualized learning, enabling automated processes.
o Ensure data security with point-to-point encryption and advanced caching mechanisms.
o Human-designed architectures are the foundation, with AI automating the implementation under human oversight to guarantee the desired aesthetic and character.
AI-Driven Real-time Investment:
o Open-source, AI-governed, self-directed, self-evolving investment platform.
o Built a real-time predictive analytics engine with seamless trading-platform integration.
o Prioritizes data security using point-to-point encryption; incorporates advanced caching infrastructure for high-performance data processing.
o Automates user interface (UI) generation and streamlines release management processes.
Platform-Independent ETL/Pipeline Integration:
o Open-source, platform-independent ETL solution streamlining data workflows.
o Automates complex processes with no-code customization, reducing costs and enhancing operational efficiency (see the configuration-driven sketch after this role).
o Maintains robust data security through point-to-point encryption.
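A minimal sketch of what a configuration-driven, platform-independent ETL step can look like; the spec fields, reader/writer adapters, and sample file names below are hypothetical placeholders, not the actual product:

import csv, json
from pathlib import Path

def read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Sources, transforms, and targets are looked up from tables, so a new workflow
# is just a new spec and porting to another platform means registering new adapters.
READERS = {"csv": read_csv, "json": lambda p: json.loads(Path(p).read_text())}
WRITERS = {"json": lambda rows, p: Path(p).write_text(json.dumps(rows, indent=2))}
TRANSFORMS = {
    "rename": lambda rows, old, new: [{(new if k == old else k): v for k, v in r.items()} for r in rows],
    "filter_equals": lambda rows, column, value: [r for r in rows if r.get(column) == value],
}

def run_job(spec):
    # Execute one ETL job described entirely by configuration -- no code changes per workflow.
    rows = READERS[spec["source"]["format"]](spec["source"]["path"])
    for step in spec.get("transforms", []):
        rows = TRANSFORMS[step["op"]](rows, **step.get("args", {}))
    WRITERS[spec["target"]["format"]](rows, spec["target"]["path"])

if __name__ == "__main__":
    run_job({
        "source": {"format": "csv", "path": "input.csv"},   # hypothetical sample files
        "transforms": [{"op": "rename", "args": {"old": "cust_id", "new": "customer_id"}}],
        "target": {"format": "json", "path": "output.json"},
    })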
Experience: In-House & Contract-to-Hire
Amazon (Contracts: Lead ML/Big Data Engineer/Solutions Architect) 2023 to 2025
o Finishing the contract a month early, 50% below budget; being considered for additional projects. Sr. Data Engineer on the internal tools team, streamlining pipeline integration and semi-automating data source onboarding, raising throughput from 1 source every 10 weeks (prior project) to 250 per month (see the onboarding sketch below).
o Previously: Lead on the MODE (Media Operations, Data, and Execution) MLOps team. Worked closely with engineers and scientists on the ongoing development and evolution of the underlying machine learning systems and data infrastructure.
o Following AWS best practices, resolved architecture, design, and implementation issues introduced in POCs. The new data warehousing platform uses a streaming, business-contract-based architecture, cutting daily processing latency from 24 hours to seconds and CPU usage from 1.5 hours to seconds. Elevated Redshift and S3 security from individual-level to group/role-level access. Costs reduced from $20K per table per year to pennies.
Scripting: AWS Cloud. Python, SQL, Glue, Java, Cradle, Redshift, Linux Scripting. Agile/Kanban.
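A minimal, hedged sketch of the onboarding idea above: a new source is declared as metadata and its artifacts (DDL, pipeline registration) are generated from that descriptor. The type map, table names, and S3 prefix are hypothetical illustrations, not Amazon internals.

TYPE_MAP = {"string": "VARCHAR(256)", "int": "BIGINT", "decimal": "DECIMAL(18,4)", "timestamp": "TIMESTAMP"}

def ddl_for(source):
    # Emit Redshift-style CREATE TABLE DDL from a declarative source descriptor.
    cols = ",\n  ".join(f'{c["name"]} {TYPE_MAP[c["type"]]}' for c in source["columns"])
    return f'CREATE TABLE IF NOT EXISTS {source["schema"]}.{source["table"]} (\n  {cols}\n);'

def pipeline_entry(source):
    # Register the source with a (hypothetical) pipeline framework as pure configuration.
    return {
        "source_id": source["table"],
        "landing_prefix": f's3://example-bucket/landing/{source["table"]}/',
        "load_schedule": source.get("schedule", "rate(10 minutes)"),
    }

if __name__ == "__main__":
    descriptor = {
        "schema": "staging", "table": "orders",
        "columns": [{"name": "order_id", "type": "int"},
                    {"name": "amount", "type": "decimal"},
                    {"name": "created_at", "type": "timestamp"}],
    }
    print(ddl_for(descriptor))
    print(pipeline_entry(descriptor))

Because onboarding reduces to writing a descriptor and reviewing the generated artifacts, throughput scales with configuration effort rather than engineering effort.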
Pan American Life (In House: TPM/Lead Big Data Architect/Engineer) 2021 to 2023
o Took over management of a 2-year, $10 million, 20-person vendor contract that was 2 years behind schedule with 25% of the budget remaining, delivering a "near-real-time" system that was anything but.
o Following AWS best practices, identified and resolved partition misalignment and refactored the application and enterprise architecture. Defined short-, mid-, and long-term strategies. Throughput improved from 100K records per hour to millions of records per second across 1,000 pipelines running on 10-second cycles.
o Introduced metadata-driven pipeline and regression-test architectures. Reusable, near-real-time pipeline logic is independent of the table-driven business contracts, so new tables are implemented in 15 minutes instead of 2 weeks. Introduced table-driven data cleansing and row-level security (see the contract-driven sketch below).
o Kafka messages are transformed into efficient S3 storage formats and persisted to Redshift 3rd-normal-form ODS and star-schema layers. Current and historical data are separated to simplify business logic. Integrated ER Studio, Collibra, Redshift, Tableau, Git, and S3.
o Execution latency and cost reduced by a factor of 100+.
Scripting: Scala, Python, SQL, Java, Redshift, Linux Scripting, AWS Cloud Platform. Agile/Kanban.
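A minimal sketch of the contract-driven pattern referenced above: generic pipeline code is driven by per-table business contracts held as data, so adding a table means adding a contract rather than code. The contract format and the example "policy" table are hypothetical.

from datetime import datetime, timezone

CONTRACTS = {
    # One entry per target table; mappings, cleansing rules, and targets are data, not code.
    "policy": {
        "mappings": {"POLICY_NO": "policy_id", "EFF_DT": "effective_date"},
        "cleansing": {"policy_id": str.strip},
        "target": "ods.policy",
    },
}

def apply_contract(table, message):
    # Transform one Kafka-style message according to the table's contract.
    contract = CONTRACTS[table]
    row = {}
    for src_col, tgt_col in contract["mappings"].items():
        value = message.get(src_col)
        cleanse = contract["cleansing"].get(tgt_col)
        row[tgt_col] = cleanse(value) if (cleanse and value is not None) else value
    row["load_ts"] = datetime.now(timezone.utc).isoformat()  # audit column added uniformly
    return row  # a later stage would land this in S3 and load contract["target"] in Redshift

if __name__ == "__main__":
    print(apply_contract("policy", {"POLICY_NO": " P-1001 ", "EFF_DT": "2022-01-01"}))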
Amazon (Contracts: Sr. Big Data Architect) 2020
o Managed legacy implementations for Amazon’s US Tax System transactions while the team focused on retrofitting with the latest AWS technologies.
o Although metadata-driven, the architecture was extremely brittle: a column rename required code changes across 20 layers, with no ability to unit test because the layer of impact was within the final 10 layers.
Scripting: Redshift, Teradata, Hadoop, MapReduce, Spark, Hive, AWS, Python, Tableau, QuickSight.
Donuts Inc. (In House: Sr. ML Data Warehouse Solutions Architect/Engineer) 2019
o Managed migration of ML systems from recently merged companies to a hybrid cloud platform spanning on-prem, Google Cloud, and AWS. Global services now execute on AWS Lambda with BI data stored in Google BigQuery; inefficient processing remains on-prem.
o Original processing rebuilt the entire data warehouse each day, and the unscalable daily rebuild had grown to take six days to complete. Refactored the system and reduced processing time to 15 minutes.
o Trained staff, partitioned data storage, introduced parallel processing, and secured data at rest and in transit. Proactively identified and complied with platform-specific thresholds, eliminating redundancies and complexities (see the partitioned-processing sketch below).
o Adjusted the small company's technical direction to keep the inefficient processing on-prem, since migrating it would have required a complete system rebuild; the skill sets were not available in house and the costs significantly outweighed the benefits.
o Execution latency and cost reduced several-fold.
Scripting: Python, Lambda, SQL, JSON. Storage: BigQuery, S3, Parquet, CSV, SQL Server, Postgres.
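A minimal sketch, under stated assumptions, of the partitioned, parallel approach noted above: only the date partitions that changed are recomputed, in parallel, instead of rebuilding the entire warehouse. The change-detection and rebuild functions are hypothetical stand-ins.

from concurrent.futures import ProcessPoolExecutor
from datetime import date, timedelta

def changed_partitions(since_days=3):
    # Stand-in for change detection; a real system would inspect source metadata.
    today = date.today()
    return [(today - timedelta(days=d)).isoformat() for d in range(since_days)]

def rebuild_partition(partition_day):
    # Stand-in for recomputing one date partition (e.g., one BigQuery/S3 date folder).
    return f"rebuilt {partition_day}"

if __name__ == "__main__":
    # Independent partitions rebuild in parallel, bounding the nightly window.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(rebuild_partition, changed_partitions()):
            print(result)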
Amazon & Nintendo (Contracts: Principal ML Data Solutions Architect/Lead Engineer) 2018 - 2019
o Part of a short-term-contract business development team responsible for taking over existing projects, introducing scalability, reducing maintenance costs, and handing off to new internal teams within three months.
o Refactored the Employee Health & Safety System, converting it from SQL Server/Tableau to scalable AWS big data technologies. Dozens of cloned reports were neither maintainable nor scalable, and no test strategy existed. The new system consists of reusable modules; partitions, sort keys, and indexes enable distributed, parallel processing. Re-engineered it to scale to a 100K user base, reducing the processing window from 24 hours to 8. Handed the system off to a new 40-person team in India.
o Introduced a table-driven, rule-based machine learning layer able to handle small training sets; where existing machine learning models required hundreds of thousands of samples, the rule-based system worked from a handful (see the rule-based sketch below). Handed the system off to a new team.
o Assisted with the introduction of an AWS data pipeline for a global team switching to AWS, introducing parallelism and compression strategies; the external client was new to AWS and internal cloud skill sets were minimal.
o Execution latency and cost reduced by a factor of 3.
Scripting: Python, DMS, SQL, Spark, Glue, JSON. Storage: Redshift, SQL Server, S3, Parquet, MySQL.
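A minimal sketch of the table-driven, rule-based classification idea noted above; the rules, features, and labels are hypothetical examples chosen to show the pattern, not the production system.

import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

# Rules live in a table (here a list of dicts); adding domain knowledge means adding rows.
RULES = [
    {"feature": "days_lost", "op": ">=", "value": 1, "label": "recordable", "priority": 1},
    {"feature": "treatment", "op": "==", "value": "first_aid", "label": "minor", "priority": 2},
]
DEFAULT_LABEL = "needs_review"

def classify(record):
    # Apply rules in priority order; first match wins, otherwise fall back to a default.
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        value = record.get(rule["feature"])
        if value is not None and OPS[rule["op"]](value, rule["value"]):
            return rule["label"]
    return DEFAULT_LABEL

if __name__ == "__main__":
    print(classify({"days_lost": 2, "treatment": "clinic"}))     # -> recordable
    print(classify({"days_lost": 0, "treatment": "first_aid"}))  # -> minor

Because the rules are data, a domain expert can extend coverage from a handful of labeled examples without retraining a statistical model.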
Various In-House/Contracts: Principal/Lead/Manager ML/Data Architect/Engineer 1990 - 2017
Education
York University - M.Sc., AI/OOA/OOD (50% complete). A+ average.
York University - B.Sc., Specialized Honors Computer Science. Top of class.
Canada Scholarship: NSERC - Top 100 Canadian graduate students for Ph.D. study.
York U. Scholarship: Nancy Waisbord Award - awarded to a graduate student who consistently demonstrated excellence.
Prior Engagements: Significant Scalability Milestones
*******@*****.***: 425-***-****
https://www.linkedin.com/in/johndangov/
Experienced Solutions Architect and Lead Big Data Machine Learning Engineer specializing in the delivery of production-ready frameworks, with a focus on best practices, identifying and correcting core deficiencies, and delivering order-of-magnitude improvements. Expertise in optimizing cloud operations to contain operating and maintenance costs. Key achievements related to scalability include:
Amazon: Finishing the contract a month early, 50% below budget; being considered for additional projects. Transformed file-processing pipelines into streaming systems, significantly reducing the memory footprint and lowering latency from 24 hours to seconds, execution time from 1.5 hours to seconds, and costs from $20K per table per year to pennies.
Pan American: TPM: Implemented a metadata-driven API that reduced delivery schedules from 2 weeks per table to 15 minutes. Restructured partitions to optimize thousands of concurrent pipelines running every 10 seconds.
Donuts: Migrated a client-server architecture to a multi-cloud environment, eliminating the need for replication, reducing multi-day processing to 15 minutes.
Amazon: Reduced the daily processing time from 24 hours to 8 hours by converting repeatable processes into optimized, reusable layers. Introduced a table-driven ML engine capable of working with small sample spaces.
Kaiser: Reduced latency costs by a factor of three by relocating the filtering logic.
Disney: Developed a metadata-driven back-end independent ETL engine.
Microsoft: Realigned partitions, reducing execution time from 19 hours to 15 minutes. Implemented zero-downtime pipelines enabling future-dated release activation.
Net Motion: TPM: Introduced a 2-terabyte data warehouse for real-time and near-real-time reporting.
T-Mobile: Realigned partitions and normalized primary keys, reducing response time from 5 hours to seconds.
Expedia: TPM: Built and led a high-performing internal team, successfully replacing $2M/year in contractor roles.
Microsoft: TPM: Implemented metadata-driven replication for multi-master distributed databases.
Tamarac: CTO: Introduced an AI-based heuristic financial model that optimizes 1,000 portfolios in 2 minutes, compared to the previous 4 hours (see the rebalancing sketch after this list).
Media Passage: Eliminated redundancies and reduced the code size by a factor of 5.
First Data: TPM: Introduced array processing algorithms, improving space usage by a factor of 8 and parallel processing by a factor of 10, with targets populated seconds after the last record was read from the source.
Goldman Sachs: TPM: Automated a back-office system, reducing its staffing from 50 people to 5.
University: Designed a real-time streaming temporal logic theorem prover.
NATO: Modified systems to integrate forces previously absent from telemetry data analytics.
Defense: Developed AI-based expert systems, replacing hierarchical algorithms with graph-based approaches to optimize logic in core components, improving processing efficiency by several orders of magnitude.
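As referenced in the Tamarac milestone, a minimal sketch of a greedy, drift-based rebalancing heuristic; the asset names, weights, and drift threshold are hypothetical illustrations, not the actual model.

def rebalance(holdings, targets, portfolio_value, drift_threshold=0.02):
    # Return dollar trades that bring any asset drifting past the threshold back to target.
    trades = {}
    for asset, target_weight in targets.items():
        current_weight = holdings.get(asset, 0.0) / portfolio_value
        drift = current_weight - target_weight
        if abs(drift) > drift_threshold:
            trades[asset] = -drift * portfolio_value  # positive = buy, negative = sell
    return trades

if __name__ == "__main__":
    portfolio = {"EQUITY": 62_000.0, "BONDS": 30_000.0, "CASH": 8_000.0}
    targets = {"EQUITY": 0.60, "BONDS": 0.35, "CASH": 0.05}
    print(rebalance(portfolio, targets, portfolio_value=100_000.0))

Applying this per-portfolio check independently is what makes the approach fast enough to sweep a large book of portfolios in minutes.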