Senior Cloud Data Architect & MLOps Lead

Location: New York City, NY

Salary: 130000

Posted: December 01, 2025


Resume:

MOHROZE RANA

Lead Data Engineer | Cloud Data Architect | Big Data & AI Specialist

Rochester, NY 14604 315-***-**** **********@*****.***

Summary

Innovative and results-driven Lead Data Engineer with 9+ years of experience designing and scaling cloud-native data platforms across AWS, Azure, and GCP. Specialized in real-time data streaming, data lakehouse architectures, and machine learning pipelines using Databricks, Snowflake, Spark, Kafka, and Airflow. Adept at building end-to-end ELT pipelines, optimizing performance across petabyte-scale systems, and implementing governance, observability, and security frameworks (HIPAA, GDPR, SOC 2). Skilled in CI/CD automation, infrastructure as code (Terraform, Helm, GitOps), and data quality assurance with Great Expectations and Monte Carlo. Experienced in leading teams, conducting architecture reviews, and mentoring engineers to deliver high-impact, scalable data solutions that accelerate business insights. Passionate about driving modernization through automation, Data Mesh adoption, and MLOps integration to enable predictive and real-time analytics at enterprise scale.

Technical Skills

Languages & Frameworks: Python, SQL (Advanced), Scala, Java, C#, Bash, Go (familiar), TypeScript, JavaScript (Node.js), R, Rust (basic), YAML, JSON, Terraform (HCL), MATLAB, Julia, Shell scripting

Cloud Platforms: AWS (Glue, S3, Lambda, EMR, Redshift, Kinesis, CloudFormation), Azure (ADF, Synapse, ADLS, Azure SQL, Monitor, Functions), GCP (BigQuery, Dataflow, Dataproc, Composer, Pub/Sub, Vertex AI)

ETL & Orchestration: Airflow, dbt, Azure Data Factory, SSIS, Informatica, Apache NiFi, Talend, Matillion, Fivetran, Stitch, Prefect, Dagster, Luigi

Big Data & Streaming: Spark, Kafka (Connect, Streams, Schema Registry), Flink, Beam, Pulsar, Hadoop (HDFS, YARN), Hive, Presto, Druid, Storm, Kinesis Analytics, Delta Lake

Data Modeling & Storage: Star Schema, Data Vault, Data Mesh, Data Fabric, Delta Lake, Parquet, ORC, Snowflake, Vertica, Synapse, BigQuery, Redshift, NoSQL (MongoDB, Cassandra, DynamoDB)

DevOps & CI/CD: Docker, Kubernetes, Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps, Terraform, Helm, GitOps

Governance & Security: HIPAA, GDPR, CCPA, PHI, IAM, RBAC, Data Masking, Encryption, Atlas, Purview, DataHub, Amundsen, Collibra, Alation, Monte Carlo, Great Expectations

MLOps & Analytics: MLflow, Kubeflow, Feast, TensorFlow Extended (TFX), SageMaker Pipelines, Feature Store, Vertex AI

Observability & Monitoring: ELK Stack, New Relic, Prometheus, Grafana

BI & Visualization: Tableau, Power BI, Looker, Superset, Mode Analytics

Methodologies: Agile/Scrum, Scrumban, TDD, Privacy-by-Design, DataOps, Stakeholder Communication, Mentoring

Data Engineering & Orchestration: Dataform, Astronomer, Prefect Cloud, Meltano, Soda Core, OpenLineage, Marquez

AI/ML & MLOps: Vertex AI Pipelines, Azure ML, Databricks Model Serving, Feast, BentoML, Weights & Biases (W&B), Ray, Vector Databases (FAISS, Pinecone, Milvus)

Cloud / Infrastructure: Cloud Run, EKS, AKS, Anthos, Outposts, Crossplane, Pulumi, AWS CDK, CAST AI, CloudHealth, Kubecost

Observability & Reliability: Datadog, OpenTelemetry

Professional Experience

Lead Data Engineer 10/2022 - Current

Employers

Architected multi-cloud data pipelines using Databricks (Delta Lake, PySpark) and Snowflake across AWS and Azure, supporting 5 TB+ daily data processing.

Implemented Kafka and Kinesis streaming pipelines for real-time ingestion of telemetry and event data.

Orchestrated ELT workflows with Apache Airflow and dbt, automating transformation, validation, and load processes.

Deployed infrastructure as code using Terraform and GitHub Actions, enabling consistent and version-controlled CI/CD releases.

Integrated Great Expectations and Monte Carlo for proactive data quality monitoring and anomaly detection.

Collaborated with ML engineers to deploy MLflow and Feature Store pipelines for live scoring and retraining models on Databricks.

Built observability dashboards with Grafana and Prometheus, improving system visibility and alerting accuracy by 60%.

Mentored junior engineers in data modeling, Data Mesh adoption, and containerized development using Docker and Kubernetes.

Data Engineer 08/2019 - 09/2022

Confluent

Developed and maintained ETL pipelines using Apache NiFi, dbt, and Azure Data Factory, automating ingestion from EHR and clinical data sources.

Engineered a HIPAA-compliant data warehouse using Snowflake and Databricks Delta Lake, ensuring scalability and secure access controls.

Built CDC (Change Data Capture) pipelines using Debezium and Kafka Streams for real-time synchronization between operational and analytical stores.

Migrated workloads from on-prem SQL Server to AWS Redshift and GCP BigQuery, improving performance by 40%.

Created metadata and lineage tracking in Azure Purview and Atlas, ensuring compliance with data governance frameworks.

Integrated dashboards in Power BI and Looker, leveraging optimized SQL views for business intelligence and reporting.

Automated CI/CD pipelines using Jenkins and Terraform for version-controlled data deployments.

Implemented encryption-at-rest and role-based access control (RBAC) for sensitive patient datasets.

ETL & Data Warehouse Engineer 06/2015 - 07/2019

Accenture

Designed complex ETL workflows using Informatica, Talend, and SSIS for financial and healthcare clients, ensuring reliability and reusability.

Modeled enterprise data with Star and Snowflake schemas and implemented optimized materialized views in SQL Server and Oracle.

Migrated on-premises workloads to AWS Redshift and Azure Synapse, improving scalability and reducing infrastructure costs by 30%.

Optimized PL/SQL and T-SQL queries for datasets exceeding 500M rows, reducing runtime by 45%.

Built automated data validation scripts using Python and Great Expectations to ensure data integrity across pipelines.

Integrated observability with ELK Stack (Elasticsearch, Logstash, Kibana) and Grafana for pipeline monitoring.

Collaborated with DevOps teams to containerize ETL jobs using Docker and manage deployments via Jenkins CI/CD.

Enforced HIPAA and GDPR compliance through encryption, anonymization, and access auditing frameworks.

Projects

Cross-Cloud Lakehouse Modernization

AWS, Azure, Databricks, Snowflake, Terraform

Designed a unified multi-cloud lakehouse architecture integrating Snowflake, Databricks, and Delta Lake. Automated provisioning using Terraform, improved query latency by 65%, and cut infrastructure costs by 45%.

Real-Time Analytics & ML Platform

Kafka, Spark Structured Streaming, Databricks, MLflow, Airflow

Built an event-driven streaming platform with Kafka, Spark, and Airflow, processing 2M+ events/hour. Integrated ML pipelines via MLflow and SageMaker, enabling real-time fraud detection and personalization at scale.

Education

Bachelor of Science: Computer Science

