Ash Malik
Data Architect | Cloud & Modern Data Platform Expert | Cloud, Big Data, and Modern Data Stack Specialist
Address: Lima, OH 45801
Phone: 571-***-****
E-mail: ***.*****.***@*****.***
Accomplished Data Architect with over *0 years of experience designing, building, and scaling modern data platforms across the healthcare, SaaS, and analytics industries. Combines strategic architectural vision with hands-on data engineering expertise to deliver high-performing, secure, and well-governed data ecosystems that support analytics and AI/ML initiatives. Adept at translating complex business goals into scalable data architectures built on Databricks, Spark, Kafka, Airflow, and dbt. Recognized for bridging architecture and engineering, driving data modernization initiatives, and establishing enterprise-wide data standards. Experienced in optimizing data pipelines, automating cloud infrastructure across AWS, Azure, and GCP, and mentoring technical teams toward operational excellence, innovation, and scalability.
Skills
Data Architecture: Lakehouse, Data Mesh, Data Fabric, MDM, Metadata & Lineage Management
Big Data & Processing: Apache Spark, Databricks, Kafka, Flink, Hadoop
Data Orchestration & Modeling: Airflow, dbt, Star/Snowflake Schemas, Data Vault, Domain-Driven Design
Cloud Platforms: AWS (S3, Glue, EMR, Redshift, IAM), GCP (BigQuery, Dataflow, Pub/Sub), Azure (Synapse, Data Factory)
Programming & Querying: Python (PySpark, Pandas), SQL (CTEs, Window Functions), Scala (basic)
Data Quality & Governance: Unity Catalog, DataHub, Collibra, Alation, Great Expectations, Monte Carlo
AI/ML Enablement: MLflow, Feature Store Design, Model Pipelines, Vector DBs (Pinecone, Weaviate)
DevOps & Infra: Terraform, Docker, Kubernetes, Jenkins, GitHub Actions, Prometheus, Grafana
Security & Compliance: IAM, KMS, Secrets Management, HIPAA, GDPR, SOC 2, HITECH
Leadership & Optimization: Architecture Review Board, Mentorship, FinOps, Roadmap Planning
Work History
Jan 2023 - Current
Data Architect
Astronomer, New York, NY
Lead the design and delivery of modern cloud data ecosystems across AWS and Azure, combining architectural strategy with practical implementation.
Leadership & Achievements:
• Architected and launched a Lakehouse platform using Databricks, Delta Lake, and Snowflake to support unified analytics and AI/ML workloads.
• Built and tuned data pipelines in Spark, dbt, and Airflow to process enterprise-scale datasets efficiently.
• Defined data modeling standards for consistent, domain-driven delivery.
• Implemented governance and lineage via Collibra, Unity Catalog, and DataHub for transparency and compliance.
• Authored an enterprise-level data architecture roadmap aligned with business objectives.
• Mentored engineering teams on Spark, dbt, and Airflow best practices.
• Drove cloud cost optimization and performance tuning initiatives that yielded tangible savings.
Technologies: Databricks, Spark, Airflow, dbt, Snowflake, AWS, Azure, Kafka, Terraform, Collibra, DataHub, Unity Catalog
Jul 2020 - Dec 2022
Principal Data Engineer
1up Health, Inc., Boston, MA
Led the design and development of a multi-cloud Lakehouse platform serving large-scale healthcare and analytics workloads.
Leadership & Achievements:
• Built Databricks and Delta Lake pipelines for batch and streaming processing.
• Architected cross-domain data sharing following Data Mesh principles.
• Developed Airflow DAGs for orchestration, scheduling, and error recovery.
• Built real-time ingestion pipelines with Kafka and Flink to support healthcare event streaming.
• Managed and mentored engineers and data scientists, defining code and design standards.
• Delivered an architecture roadmap aligning platform scalability with product growth.
• Implemented a multi-region replication strategy to improve reliability.
Technologies: Databricks, Spark, Delta Lake, Airflow, dbt, AWS, GCP, Kafka, Flink, MLflow, Terraform, Unity Catalog, DataHub
Jul 2017 - Jun 2020
Senior Big Data Engineer
Census, San Francisco, CA
Owned the architecture and build-out of distributed, multi-cloud data pipelines for large-scale customer data activation and analytics.
Leadership & Achievements:
• Built streaming frameworks with Kafka and Flink for near real-time analytics.
• Developed dbt models to improve query efficiency and maintainability.
• Automated CI/CD workflows and infrastructure provisioning with Terraform and Jenkins.
• Designed multi-tenant data systems ensuring compliance and data isolation.
• Managed a global team of engineers, establishing code and delivery standards.
• Defined Spark performance optimization frameworks and documentation.
• Authored internal data reliability guidelines to reduce downtime and SLA breaches.
Technologies: Databricks, Spark, Kafka, Flink, dbt, Airflow, AWS, GCP, MLflow, Terraform, Great Expectations, Unity Catalog
Sep 2014 - Jun 2017
Data Engineer
Knoema, New York, NY
Built and maintained end-to-end ETL and ELT pipelines for SaaS analytics platforms.
Leadership & Achievements:
• Created reusable transformation modules with PySpark and SQL for consistent data processing.
• Modeled data marts and dimensional schemas for BI dashboards.
• Automated data validation using Great Expectations and Python scripts.
• Collaborated with DevOps to containerize workloads in Docker for reliable deployments.
• Supported Knoema's diligence process by documenting architecture and scalability plans.
• Introduced automated recovery and monitoring to improve resilience.
• Mentored junior engineers and established ETL coding standards.
Technologies: Python, SQL, Airflow, PySpark, dbt, Databricks, Snowflake, BigQuery, Docker, Git, Linux
Education
Jun 2014 Bachelor of Science: Computer Science
Punjab University