
Data Engineer Architect

Location: Seattle, WA
Posted: October 07, 2025


Malik Tariq

Data Architect | Principal Data Engineer | Cloud Data Platform Expert

**********.*******.***@*****.*** | 215-***-**** | Aliquippa, PA 15001, US

Profile

Data Architect & Principal Data Engineer with 10+ years of experience designing and modernizing cloud-native data ecosystems across AWS, Azure, and GCP. Specialized in Data Lake, Lakehouse, and Data Mesh architectures, enabling advanced analytics, AI, and enterprise intelligence at scale. Proven expertise in data modernization, governance, and engineering using Databricks, Snowflake, Spark, Kafka, and Airflow. Skilled in DataOps, MLOps, and cloud automation, ensuring scalability, performance, and compliance. Known for architectural leadership, innovation, and mentoring high-performing data teams to deliver high-impact, data-driven solutions. Experienced in defining enterprise data strategies, building end-to-end pipelines, and integrating AI/ML-ready data platforms. Adept at collaborating with business and technology leaders to align architecture with strategic goals, drive digital transformation, and establish data governance, lineage, and quality frameworks for trusted, enterprise-wide analytics.

Professional Experience

Data Architect, MemSQL / SingleStore 10/2022 – Present

•Architect and lead the design of enterprise-scale, cloud-native data ecosystems (Data Lake, Lakehouse, Data Mesh, Warehouse), ensuring performance, scalability, and governance across AWS, Azure, and GCP.

•Define and execute the enterprise data architecture strategy, setting standards for ingestion, storage, transformation, and analytics layers.

•Develop data blueprints, frameworks, and reference architectures that guide all data engineering and analytics initiatives across the organization.

•Spearhead the migration of legacy data platforms to modern architectures such as Databricks, Snowflake, Redshift, Synapse, and BigQuery, enabling advanced analytics and AI/ML use cases.

•Establish data modeling standards (conceptual, logical, and physical), ensuring normalization, scalability, and consistency across systems and domains.

•Lead data governance and quality initiatives, defining policies for metadata management, lineage tracking, cataloging, and data stewardship.

•Implement robust data security and compliance frameworks aligned with enterprise standards and regulations (GDPR, HIPAA, SOC2).

•Collaborate with business leaders, CDOs, CIOs, and enterprise architects to translate organizational objectives into scalable data solutions.

•Partner with data engineers, ML engineers, and analysts to ensure the data platform supports seamless integration for analytics, reporting, and machine learning workloads.

•Architect and optimize data integration pipelines (streaming and batch) leveraging Kafka, Airflow, Glue, and Spark for high throughput and minimal latency.

•Evaluate and adopt emerging technologies such as Delta Lake, Apache Iceberg, Data Mesh, Vector Databases, and AI-powered data management tools to future-proof the data ecosystem.

•Define and enforce data platform performance benchmarks, SLAs, and cost optimization strategies to ensure operational efficiency.

•Oversee data lifecycle management, ensuring data integrity, retention policies, and regulatory compliance across environments.

•Lead data architecture reviews, design governance boards, and cross-functional strategy sessions to maintain architectural coherence.

•Provide technical mentorship and leadership to data engineers, BI teams, and developers, fostering a culture of innovation, automation, and excellence.

•Deliver architecture documentation, lineage maps, data flow diagrams, and platform roadmaps to ensure transparency and alignment with enterprise standards.

•Champion a data-driven culture, enabling analytics, AI, and decision intelligence through strong architectural foundations.

Principal Data Engineer, Monte Carlo 08/2020 – 09/2022

•Architect and lead the design of enterprise-scale data ecosystems, including Data Lakes, Lakehouses, and Data Mesh architectures, across AWS, Azure, and GCP.

•Define and implement data engineering strategies, frameworks, and governance models ensuring scalability, performance, and compliance.

•Build and optimize end-to-end data pipelines (batch & real-time) using Spark, Databricks, Kafka, Airflow, and modern ETL/ELT frameworks.

•Modernize legacy platforms into cloud-native, cost-efficient, and high-performance data solutions.

•Establish and enforce data quality, security, lineage, and metadata management standards across the organization.

•Drive DataOps and MLOps practices, integrating CI/CD pipelines, automation, monitoring, and observability for data workflows.

•Collaborate with Data Architects, Data Scientists, ML Engineers, and Analysts to deliver trusted, analytics-ready datasets for AI and BI use cases.

•Mentor and guide data engineering teams, setting coding standards, conducting reviews, and enabling technical excellence.

•Evaluate and introduce emerging technologies such as Lakehouse, Delta/Iceberg, vector databases, and AI-driven data management tools.

•Partner with business and technical leadership to align data strategy with enterprise goals, ensuring data-driven decision-making.

•Lead initiatives for data performance optimization, resource efficiency, and operational reliability at scale.

•Act as the technical authority and strategic advisor in enterprise data transformation, governance, and platform innovation.

Senior Big Data Engineer, Saarthee 03/2017 – 07/2020

•Architect, design, and build scalable, secure, and high-performance data platforms supporting batch and real-time processing.

•Develop and maintain end-to-end data pipelines for ingestion, transformation, and delivery using Apache Spark, Kafka, Airflow, and dbt.

•Implement and optimize data lakehouse architectures (Delta Lake, Iceberg, or Hudi) for unified analytics and machine learning workloads.

•Engineer ETL/ELT frameworks across large-scale, distributed systems ensuring data accuracy, consistency, and low latency.

•Manage big data workflows processing terabytes to petabytes of data daily using cloud-native services (AWS Glue, EMR, GCP Dataproc, Azure Synapse).

•Design and enforce data modeling standards (dimensional, data vault, or wide tables) to improve query performance and usability.

•Implement data governance, lineage, and quality frameworks with tools like Great Expectations, DataHub, and Apache Atlas.

•Automate data infrastructure deployment using Terraform, Docker, and CI/CD pipelines (GitHub Actions, Jenkins).

•Collaborate with Data Architects, Analysts, and ML Engineers to deliver reliable, reusable datasets for analytics and AI/ML use cases.

•Monitor and tune data jobs and clusters for optimal cost and performance efficiency.

•Ensure data privacy, security, and compliance with enterprise standards (GDPR, HIPAA, SOC2).

•Lead code reviews, best practice sessions, and mentoring of junior engineers within the data engineering team.

•Evaluate and integrate emerging data technologies (Databricks Unity Catalog, Snowflake Streams, Delta Live Tables) to modernize the platform.

•Drive continuous improvement initiatives across the data lifecycle, from architecture to observability and automation.

Data Engineer, Jitsu 01/2015 – 02/2017

•Design, build, and maintain scalable data pipelines for ingestion, transformation, and integration from multiple structured and unstructured data sources.

•Develop and optimize ETL/ELT workflows using Spark, PySpark, Airflow, Kafka, Databricks, Snowflake, and SQL-based frameworks.

•Implement data models, schemas, and architectures that support analytics, BI, and machine learning workloads efficiently.

•Ensure data quality, consistency, lineage, and governance through automated validation, monitoring, and version control.

•Collaborate with data scientists, analysts, and business stakeholders to deliver reliable, analytics-ready datasets.

•Work with cloud platforms (AWS, Azure, GCP) to build and manage data lakes, warehouses, and streaming systems.

•Apply best practices in performance tuning, partitioning, and optimization for high-volume data processing.

•Implement CI/CD pipelines and DataOps processes for continuous integration, testing, and deployment of data solutions.

•Ensure data security and compliance with enterprise and regulatory standards (GDPR, HIPAA, SOC2, etc.).

•Monitor and troubleshoot data workflows, ensuring high availability and reliability of pipelines.

•Collaborate with architects and senior engineers to modernize legacy systems and enable scalable, cloud-native architectures.

•Document data flows, technical specifications, and operational procedures to ensure transparency and maintainability.

Skills

Data Architecture & Design

•Enterprise-scale data ecosystem design (Data Lake, Lakehouse, Data Mesh, Data Warehouse)

•End-to-end data architecture strategy: ingestion, transformation, and analytics layers

•Conceptual, logical, and physical data modeling (OLTP, OLAP, Dimensional, Data Vault); a star-schema sketch follows this list

•Reference architecture & framework development for organization-wide data initiatives

•Legacy-to-modern platform migration (on-prem to cloud-native, Hadoop to Databricks/Snowflake)

•Data lineage, cataloging, and metadata management (DataHub, Atlas, Collibra, Alation)
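
To make the dimensional modeling item above concrete, here is a minimal star-schema sketch (one dimension, one fact) issued through Spark SQL. This is an illustrative sketch only: the table names dim_customer and fact_orders and all columns are hypothetical placeholders, not tables from any actual system.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema-ddl").getOrCreate()

# Dimension table: one row per customer, with a surrogate key (customer_sk)
# so facts can join on a stable integer key.
spark.sql("""
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_sk BIGINT,
    customer_id STRING,
    segment     STRING
) USING parquet
""")

# Fact table: one row per order, referencing the dimension by surrogate key
# and partitioned by date, matching the physical-modeling standards above.
spark.sql("""
CREATE TABLE IF NOT EXISTS fact_orders (
    order_id    STRING,
    customer_sk BIGINT,
    amount      DOUBLE,
    order_date  DATE
) USING parquet
PARTITIONED BY (order_date)
""")

spark.stop()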

ETL/ELT Development

•Apache Spark, PySpark, Databricks, Kafka, Airflow, dbt, Flink

•Real-time & batch processing pipelines

•Streaming architectures and event-driven data flows

•Data lakehouse frameworks: Delta Lake, Apache Iceberg, Apache Hudi

•Data integration & transformation across large-scale distributed systems
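
As an illustration of the batch side of the pipelines listed above, a minimal PySpark ingest-clean-write job follows. The S3 paths and column names (event_id, event_ts) are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: read one day of raw JSON events (hypothetical bucket/path).
raw = spark.read.json("s3://example-bucket/raw/events/2025-10-07/")

# Transform: drop malformed rows, parse timestamps, derive a partition key,
# and de-duplicate on the event identifier.
clean = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Load: write Parquet partitioned by date so downstream reads can prune.
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/events/"))

spark.stop()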

DevOps, DataOps & MLOps

•Implement CI/CD pipelines for data workflows using Jenkins, GitHub Actions, Azure DevOps, or GitLab CI.

•Deploy infrastructure as code (IaC) using Terraform, CloudFormation, or Bicep for environment consistency.

•Containerize data services using Docker and orchestrate them via Kubernetes.

•Apply DataOps principles: version control, testing, deployment automation, and observability (a DAG sketch follows this list).

•Integrate data pipelines with ML workflows, enabling MLOps (model training, deployment, and monitoring).

•Build automated testing and validation suites for data pipelines.

•Configure observability dashboards (Grafana, Prometheus, CloudWatch) for data performance and SLA monitoring.
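
A minimal sketch of the DataOps pattern referenced in this list: an Airflow DAG where a validation task gates the transform's output, so failures surface before bad data ships. The DAG id, schedule, and task bodies are hypothetical, and the schedule argument assumes Airflow 2.4+.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def transform():
    # Placeholder: run the day's transformation job here.
    print("transforming batch")


def validate():
    # Placeholder: run data-quality checks; raising an exception fails the run.
    print("validating batch")


with DAG(
    dag_id="daily_events",            # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                # use schedule_interval on older Airflow
    catchup=False,
) as dag:
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)

    # Validation runs only after the transform succeeds.
    t_transform >> t_validate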

Cloud Platforms

•Multi-cloud expertise: AWS, Azure, GCP

•Cloud-native data services (AWS Glue, EMR, Redshift; Azure Synapse; GCP BigQuery, Dataproc)

•Hybrid and federated data architectures (cross-cloud data integration)

Data Governance, Quality & Security

•Define and lead enterprise data governance frameworks ensuring data reliability, trust, and compliance.

•Implement metadata management, data lineage, and cataloging using Apache Atlas, DataHub, Collibra, or Alation.

•Establish data quality controls and validation frameworks using Great Expectations, Deequ, or Soda (a validation sketch follows this list).

•Enforce data access controls, encryption, and masking in compliance with standards (GDPR, HIPAA, SOC2).

•Build data stewardship and ownership models, enabling accountability across domains.

•Design data retention and archival policies for legal and operational compliance.

•Develop data compliance automation and continuous monitoring frameworks.
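
The kinds of rules that Great Expectations, Deequ, or Soda formalize can be sketched in plain pandas; the snippet below is an illustrative stand-in for such a check suite, not any of those libraries' actual APIs, and the column names are hypothetical.

import pandas as pd

# Hypothetical batch of records to validate before publishing downstream.
df = pd.DataFrame({
    "event_id": [1, 2, 2, None],
    "amount": [10.0, -5.0, 3.5, 7.0],
})

# Declarative-style checks, in the spirit of a data-quality expectation suite.
checks = {
    "event_id not null": df["event_id"].notna().all(),
    "event_id unique": df["event_id"].dropna().is_unique,
    "amount non-negative": bool((df["amount"] >= 0).all()),
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # In a real pipeline this would block the load and alert data stewards;
    # the sample data above deliberately fails all three checks.
    raise ValueError(f"Data quality checks failed: {failed}")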

Data Modeling & Analytics Enablement

•Design logical and physical data models: relational, dimensional, star/snowflake, and data vault.

•Build semantic layers and data marts supporting self-service BI and reporting.

•Enable analytics- and AI-ready datasets by standardizing transformations and metadata layers.

•Integrate with BI and visualization tools (Tableau, Power BI, Looker, QuickSight).

•Support machine learning and AI teams through curated, feature-rich, and versioned datasets.

•Design feature stores for ML model development (Databricks Feature Store, SageMaker Feature Store).

•Optimize query performance and cost efficiency through indexing, partitioning, and caching strategies (partition pruning illustrated below).
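
To illustrate the partitioning strategy in the last bullet, the sketch below reads the date-partitioned table from the earlier ETL example and filters on the partition column; the path and column names remain hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pruning-demo").getOrCreate()

# Hypothetical fact table written partitioned by event_date.
facts = spark.read.parquet("s3://example-bucket/curated/events/")

# Filtering on the partition column lets Spark skip whole date directories
# instead of scanning every file - the core payoff of partitioned layouts.
one_day = facts.filter(F.col("event_date") == "2025-10-07")

# The physical plan lists PartitionFilters, confirming that pruning applies.
one_day.explain()

spark.stop()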

Education

Bachelor of Science in Computer Science


