Kamal Jutt
Senior Data Engineer | ETL & MLOps Specialist | Cloud Data Platform Expert
*****.****.***@*****.*** | 203-***-**** | San Diego, CA 92101, USA
Professional Summary
Results-driven Lead Data Engineer with 10+ years of experience designing, building, and optimizing cloud-native data platforms across AWS, Azure, and GCP. Expert in ETL/ELT pipelines, real-time data streaming, and big data solutions using Spark, Kafka, and Snowflake. Proven record of reducing data processing time by 65%, cutting infrastructure cost by 30%, and enabling real-time analytics for 2M+ messages per second. Skilled in MLOps, CI/CD, and data governance frameworks, ensuring high-quality, secure, and scalable data delivery. Adept at collaborating with cross-functional teams, leading engineers, and translating business needs into actionable data solutions. Recognized for driving innovation, improving system reliability, and accelerating data-driven decision-making across enterprises.
Skills
Programming & Scripting
Python, SQL, Scala, Java, Shell Scripting, REST APIs, Data Automation
Analytics & BI Tools
Tableau, Power BI, Looker, Qlik, Superset, Mode Analytics, Excel (Advanced), Data Visualization Best Practices
Data Governance & Security
Data Quality Frameworks, Data Cataloging, Metadata Management, Master Data Management (MDM), Data Lineage, GDPR, HIPAA, SOC 2, Role-Based Access Control (RBAC), Data Masking & Encryption
Machine Learning & Advanced Analytics (Supportive)
Feature Engineering, ML Data Preparation, Model Deployment Support (MLflow, SageMaker, Vertex AI), A/B Testing Data Pipelines, Real-time Analytics, Data for AI/LLM Workloads
Data Engineering & ETL
ETL/ELT Development, Data Pipeline Orchestration,
Data Modeling (OLTP & OLAP), Data Warehousing, Data Integration, Real-time & Batch Processing, Workflow Scheduling (Airflow, Luigi, Oozie, Azkaban)
Cloud Platforms & Services
AWS (S3, Redshift, Glue, EMR, Lambda, Athena, Kinesis), Azure (Data Lake, Synapse Analytics, Data Factory, Databricks), GCP (BigQuery, Dataflow, Dataproc, Pub/Sub), Snowflake, Databricks, Terraform Cloud
DevOps, CI/CD & Infrastructure
CI/CD Pipelines (Jenkins, GitLab CI, GitHub Actions, Azure DevOps), Docker, Kubernetes, Terraform, Ansible, Infrastructure as Code (IaC), Serverless Architectures, Monitoring & Logging (Datadog, Prometheus, Grafana, Splunk, ELK Stack)
Databases & Storage
Relational Databases (PostgreSQL, MySQL, Oracle, SQL Server), NoSQL (MongoDB, Cassandra, DynamoDB, Redis, Couchbase), Columnar Databases (Redshift, BigQuery, Vertica, Greenplum), Graph Databases (Neo4j, Amazon Neptune)
Big Data & Distributed Systems
Apache Spark, Hadoop Ecosystem (Hive, Pig, HDFS, MapReduce), Kafka, Flink, Presto, Delta Lake, Druid, Storm
Snowflake
Designed and optimized Snowflake data warehouses and ETL workflows.
Professional Experience
Senior Data Engineer, Secoda
12/2022 – Present
•Led design and implementation of cloud-native data platforms (AWS, Azure) supporting real-time analytics and predictive insights, improving data accessibility by 60%.
•Built scalable Kafka-based pipelines processing 2M+ messages per second, reducing latency by 40% and improving data reliability.
•Architected ETL/ELT workflows in Spark and Snowflake, cutting data processing time by 65% and boosting pipeline efficiency.
•Drove data governance & compliance (GDPR, HIPAA) frameworks, ensuring 100% audit-ready datasets across business domains.
•Partnered with Data Science teams to deliver ML-ready datasets, improving model training accuracy by 70% and reducing deployment cycles by 30%.
•Directed CI/CD & DevOps integration (Jenkins, Datadog, Terraform), reducing deployment time by 50% and enhancing observability.
•Spearheaded legacy-to-cloud migration, achieving a 25% cost reduction and improving infrastructure scalability.
•Mentored 5+ engineers, fostering best practices in data reliability, observability, and automation.
•Executed role-based access data masking, ensuring secure access to sensitive data and meeting SOC 2 compliance requirements.
•Created self-service analytics frameworks, enabling business users to generate insights independently and reducing BI team workload by 35%.
•Introduced predictive alerting for pipeline failures using Spark and Datadog, reducing downtime and manual interventions by 50%.
Big Data Engineer, Rill Data
08/2018 – 11/2022
•Engineered high-volume big data pipelines using Apache Spark, Kafka, and Airflow, processing 10B+ records daily across distributed systems.
•Optimized Snowflake and Redshift data warehouses, reducing BI query latency by 55% and improving reporting performance.
•Automated ETL workflows and orchestration frameworks, reducing manual operations by 70% and boosting reliability.
•Modernized data infrastructure by migrating on-prem Hadoop clusters to AWS EMR and Azure Data Lake, cutting infrastructure cost by 30%.
•Built data quality & anomaly detection frameworks, increasing data accuracy and integrity by 80%.
•Collaborated cross-functionally with product & BI teams to design business-aligned data models, improving decision-making speed by 45%.
•Contributed to DevOps and CI/CD adoption, ensuring seamless integration, version control, and monitoring across data platforms.
•Implemented event-driven architecture with Kafka for real-time data processing, enabling near-instant business insights.
•Designed partitioning and indexing strategies for Snowflake, improving query efficiency and reducing compute costs by 20%.
Data Engineer, Narrator
06/2015 – 07/2018
•Developed and maintained ETL pipelines for structured/unstructured data across Hadoop and SQL ecosystems, improving data delivery SLAs by 40%.
•Built SQL-based data models, stored procedures, and views, enhancing analytics efficiency and supporting enterprise-wide BI dashboards.
•Implemented data validation and cleansing frameworks, increasing data accuracy by 90% and reducing manual QA effort by 50%.
•Designed data visualization dashboards using Tableau and Power BI, delivering actionable insights to business teams.
•Contributed to Hadoop adoption (Hive, Pig) for batch processing, enabling scalable data warehousing and analytics.
•Partnered with business teams to translate analytics requirements into technical data architecture solutions.
Projects
Real-Time Data Streaming Platform | Kafka + Spark + Snowflake
•Architected and deployed a real-time data ingestion system handling 2M+ Kafka messages per second across distributed clusters.
•Integrated Spark Structured Streaming with Snowflake for real-time analytics and alerting, cutting latency by 45%.
•Achieved 99.9% message delivery reliability and reduced downstream lag through partition optimization and schema evolution.
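The partition optimization above relies on a standard Kafka property: equal message keys always hash to the same partition, which preserves per-key ordering end to end. A minimal stdlib sketch of that idea (Kafka's default partitioner actually uses murmur2; MD5 stands in here purely for illustration):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    A stable hash guarantees that all events carrying the same key
    land on the same partition, so they are consumed in order.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for one user hit one partition, so they stay ordered.
p = partition_for(b"user-42", 12)
```

Skewed key distributions can still produce hot partitions, which is why partition counts and key choice are tuned together.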
Tech Stack: Kafka, Spark, Snowflake, AWS Kinesis, Airflow, Terraform, Datadog
Cloud Data Lake Modernization | AWS & Databricks
•Managed migration from on-prem Hadoop to AWS-based Databricks platform, improving data scalability and reducing storage cost by 35%.
•Built data ingestion frameworks using AWS Glue and Delta Lake, automating schema tracking and lineage.
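The automated schema tracking described above can be illustrated with an additive-evolution check, similar in spirit to Delta Lake's `mergeSchema` behavior (a standalone sketch with schemas as plain dicts, not the actual Glue/Delta code):

```python
def evolve_schema(current: dict, incoming: dict) -> dict:
    """Merge an incoming record schema into the current table schema.

    New columns are added (additive evolution is allowed); a type
    change on an existing column is rejected, mirroring the safety
    check behind Delta Lake's mergeSchema. Schemas are modeled as
    {column_name: type_name} dicts for illustration.
    """
    merged = dict(current)
    for column, dtype in incoming.items():
        if column in merged and merged[column] != dtype:
            raise TypeError(
                f"type change on {column!r}: {merged[column]} -> {dtype}")
        merged[column] = dtype
    return merged
```

Rejecting type changes while admitting new columns keeps downstream consumers stable as producers evolve.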
•Planned cost-optimized storage tiering strategy using S3 and Redshift Spectrum for hybrid workloads.
Tech Stack: Databricks, AWS S3, Redshift, Glue, Delta Lake, Python, Terraform
Machine Learning Data Pipeline Automation
•Developed ML data preparation pipelines supporting model training and feature store population in Snowflake and Databricks.
•Enabled automated retraining triggers and versioned feature storage, improving model refresh efficiency by 60%.
•Supported MLOps deployment via MLflow and SageMaker, integrating CI/CD pipelines for reproducible experiments.
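The versioned feature storage mentioned above can be sketched as a minimal in-memory registry; names and structure here are illustrative only, the production store being backed by Snowflake/Databricks tables:

```python
class FeatureRegistry:
    """Minimal versioned feature store: every publish creates a new
    immutable snapshot, so a past training run can be reproduced by
    pinning the version it read."""

    def __init__(self):
        self._versions = {}  # feature_set -> list of snapshots

    def publish(self, feature_set, values):
        """Store a snapshot and return its 1-based version number."""
        versions = self._versions.setdefault(feature_set, [])
        versions.append(dict(values))  # copy: snapshots never mutate
        return len(versions)

    def fetch(self, feature_set, version=None):
        """Read a pinned version, or the latest when version is None."""
        versions = self._versions[feature_set]
        index = (version if version is not None else len(versions)) - 1
        return dict(versions[index])
```

Immutable versions are what make retraining reproducible: a model's lineage records the feature version it trained on, not a mutable "latest".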
Tech Stack: Snowflake, Databricks, MLflow, AWS SageMaker, Airflow, Python
Education
Bachelor of Science in Computer Science