Ash Malik
Senior Data Engineering & Architecture Expert
Cloud, Big Data, and Modern Data Stack Specialist
Lima, OH 45801 | 571-***-**** | ***.*****.***@*****.***

Professional Summary

Experienced data engineering and architecture professional with a strong track record of designing, building, and optimizing scalable, reliable, and secure data ecosystems across diverse industries, including healthcare. Demonstrated expertise in data architecture, ETL/ELT development, data modeling, and modern cloud data platforms (AWS, GCP, Azure). Highly skilled in big data frameworks (Apache Spark, Hadoop, Kafka), real-time streaming, batch processing, and data lakehouse architectures. Proficient in Python, SQL, and modern data stack tools including Airflow, dbt, Snowflake, Redshift, and BigQuery. Adept at integrating data from diverse sources (APIs, databases, streaming) and ensuring data quality, governance, lineage, and compliance with standards such as HIPAA, GDPR, and HITECH. Experienced in architecting end-to-end data engineering solutions, implementing data quality frameworks, and optimizing systems for performance, fault tolerance, and cost efficiency. Recognized for technical leadership, best-practice definition, and team mentorship. Proven ability to collaborate cross-functionally with data engineers, analysts, scientists, and business stakeholders to deliver analytics-ready datasets powering BI, AI/ML, and decision-making. Committed to building automated, governed, enterprise-grade data infrastructures that enable scalable, compliant, and business-aligned data strategies.

Skills

• Data Engineering & ETL/ELT Development: Building scalable ETL/ELT pipelines with Airflow, dbt, Spark, and Informatica; automating workflows, data cleaning, transformation, and orchestration
• Big Data & Real-Time Processing: Expertise in Apache Spark, Hadoop, Kafka, Flink, and streaming pipelines; handling batch and real-time data (IoT, logs, clickstreams)
• Cloud Data Platforms: Hands-on with AWS (S3, Glue, EMR, Redshift), GCP (BigQuery, Dataflow), and Azure (Synapse, Data Factory); designing and deploying cloud-native data architectures
• Data Modeling & Architecture: Designing star, snowflake, Data Vault, and dimensional models; building data lakes and lakehouse architectures
• Programming & Querying: Proficient in Python (Pandas, scripting) and SQL (joins, CTEs, window functions); knowledge of Scala and Java for distributed data systems
• Data Governance, Quality & Compliance: Implementing HIPAA, GDPR, and SOC 2 standards; using Great Expectations, Collibra, and Alation for validation and lineage
• DevOps & Automation: CI/CD with Jenkins and GitHub Actions; infrastructure as code using Terraform, containerization with Docker, and orchestration with Kubernetes
• Monitoring & Observability: Tracking data pipeline health using Prometheus, Grafana, the Airflow UI, and CloudWatch; setting up alerts, logging, and performance tuning
• Data Integration & APIs: Integrating data from APIs, FHIR, HL7, JSON, CSV, Parquet, and other sources; combining data across structured and unstructured systems
• Leadership & Collaboration: Leading teams, mentoring junior engineers, and driving best practices; collaborating with data scientists, analysts, business, and compliance teams
Work History

Data Architect, 01/2023 - Current
1up Health, Inc – United States
Key Technologies: AWS, GCP, Snowflake, BigQuery, Airflow, dbt, Apache Spark, Kafka, Data Governance Tools, Python, SQL
• Designed and implemented enterprise-scale data architectures, enabling unified, governed, and secure data ecosystems aligned with HIPAA, GDPR, and HITECH standards.
• Defined data strategy, architecture blueprints, and governance frameworks to support BI, AI/ML, and analytics initiatives.
• Led the integration of diverse EHR, EMR, and clinical data sources, ensuring compliance, lineage, and data quality across systems.
• Architected cloud-native data platforms on AWS and GCP with real-time ingestion, data lakehouse, and warehouse layers.
• Established data governance policies, access controls, and metadata management to ensure security and transparency.
• Collaborated with engineering, compliance, and analytics teams to deliver analytics-ready, reliable, and auditable data pipelines.
• Provided technical leadership, mentored teams, and defined best practices for data modeling, orchestration, and automation.
Senior Big Data Engineer, 07/2018 - 12/2022
Census - a Fivetran Company – United States
Key Technologies: Apache Spark, Hadoop, Kafka, Airflow, dbt, AWS (EMR, Glue, Redshift), GCP (Dataproc, BigQuery), Azure (HDInsight, Synapse)
• Led the design and implementation of large-scale distributed data systems using Apache Spark, Hadoop, and Kafka for real-time and batch processing at petabyte scale.
• Architected and optimized data lakehouse solutions across AWS, GCP, and Azure, enabling scalable and cost-efficient data storage and processing.
• Developed and managed streaming data pipelines, integrating data from IoT devices, logs, and transactional systems for real-time analytics.
• Implemented data quality frameworks with automated validation, lineage tracking, and error handling to ensure data integrity.
• Drove performance tuning and cost optimization strategies for compute clusters, reducing cloud spend.
• Mentored junior engineers and established coding standards, CI/CD practices, and orchestration workflows with Airflow and Glue.
• Partnered with cross-functional teams (data scientists, analysts, product) to deliver machine learning-ready datasets and advanced analytics.
Data Engineer, 04/2014 - 06/2018
Databand.ai – United States
Key Technologies: Python, SQL, Airflow, dbt, Snowflake, BigQuery, APIs, Git, Linux
• Designed, developed, and maintained scalable ETL/ELT pipelines using Python, SQL, and Airflow, ensuring high-quality, reliable data ingestion from APIs, databases, and flat files.
• Built and optimized data models (star/snowflake schemas) for analytics and reporting, enabling self-service BI and accurate business insights.
• Integrated data from multiple sources into data warehouses such as Snowflake and BigQuery, ensuring consistency, quality, and governance.
• Automated data workflows and implemented monitoring and alerting systems to ensure pipeline reliability and performance.
• Collaborated with analysts and business teams to deliver analytics-ready datasets supporting dashboards, KPIs, and decision-making.
• Contributed to data quality assurance, documentation, and code reviews, fostering best practices in data engineering.
Education

Bachelor of Science: Computer Science
Punjab University