MOHROZE RANA
Lead Data Engineer | Cloud Data Architect | Big Data & AI Specialist
Rochester, NY 14604 | 315-***-**** | **********@*****.***
Summary
Innovative and results-driven Lead Data Engineer with 9+ years of experience designing and scaling cloud-native data platforms across AWS, Azure, and GCP. Specialized in real-time data streaming, data lakehouse architectures, and machine learning pipelines using Databricks, Snowflake, Spark, Kafka, and Airflow. Adept at building end-to-end ELT pipelines, optimizing performance across petabyte-scale systems, and implementing governance, observability, and security frameworks (HIPAA, GDPR, SOC 2). Skilled in CI/CD automation, infrastructure as code (Terraform, Helm, GitOps), and data quality assurance with Great Expectations and Monte Carlo. Experienced in leading teams, conducting architecture reviews, and mentoring engineers to deliver high-impact, scalable data solutions that accelerate business insights. Passionate about driving modernization through automation, Data Mesh adoption, and MLOps integration to enable predictive and real-time analytics at enterprise scale.
Technical Skills
Languages & Frameworks: Python, SQL (Advanced), Scala, Java, C#, Bash, Go (familiar), TypeScript, JavaScript (Node.js), R, Rust (basic), YAML, JSON, Terraform (HCL), MATLAB, Julia, Shell scripting
Cloud Platforms: AWS (Glue, S3, Lambda, EMR, Redshift, Kinesis, CloudFormation), Azure (ADF, Synapse, ADLS, Azure SQL, Monitor, Functions), GCP (BigQuery, Dataflow, Dataproc, Composer, Pub/Sub, Vertex AI)
ETL & Orchestration: Airflow, dbt, Azure Data Factory, SSIS, Informatica, Apache NiFi, Talend, Matillion, Fivetran, Stitch, Prefect, Dagster, Luigi
Big Data & Streaming: Spark, Kafka (Connect, Streams, Schema Registry), Flink, Beam, Pulsar, Hadoop (HDFS, YARN), Hive, Presto, Druid, Storm, Kinesis Analytics, Delta Lake
Data Modeling & Storage: Star Schema, Data Vault, Data Mesh, Data Fabric, Delta Lake, Parquet, ORC, Snowflake, Vertica, Synapse, BigQuery, Redshift, NoSQL (MongoDB, Cassandra, DynamoDB)
DevOps & CI/CD: Docker, Kubernetes, Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps, Terraform, Helm, GitOps
Governance & Security: HIPAA, GDPR, CCPA, PHI, IAM, RBAC, Data Masking, Encryption, Atlas, Purview, DataHub, Amundsen, Collibra, Alation, Monte Carlo, Great Expectations
MLOps & Analytics: MLflow, Kubeflow, Feast, TensorFlow Extended (TFX), SageMaker Pipelines, Feature Store, Vertex AI
Observability & Monitoring: ELK Stack, New Relic, Prometheus, Grafana
BI & Visualization: Tableau, Power BI, Looker, Superset, Mode Analytics
Methodologies: Agile/Scrum, Scrumban, TDD, Privacy-by-Design, DataOps, Stakeholder Communication, Mentoring
Data Engineering & Orchestration: Dataform, Astronomer, Prefect Cloud, Meltano, Soda Core, OpenLineage, Marquez
AI/ML & MLOps: Vertex AI Pipelines, Azure ML, Databricks Model Serving, Feast, BentoML, Weights & Biases (W&B), Ray, Vector Databases (FAISS, Pinecone, Milvus)
Cloud / Infrastructure: Cloud Run, EKS, AKS, Anthos, Outposts, Crossplane, Pulumi, AWS CDK, CAST AI, CloudHealth, Kubecost
Observability & Reliability: Datadog, OpenTelemetry
Professional Experience
Lead Data Engineer 10/2022 - Current
Employers
Architected multi-cloud data pipelines using Databricks (Delta Lake, PySpark) and Snowflake across AWS and Azure, supporting 5 TB+ daily data processing.
Implemented Kafka and Kinesis streaming pipelines for real-time ingestion of telemetry and event data.
Orchestrated ELT workflows with Apache Airflow and dbt, automating transformation, validation, and load processes.
Deployed infrastructure as code using Terraform and GitHub Actions, enabling consistent and version-controlled CI/CD releases.
Integrated Great Expectations and Monte Carlo for proactive data quality monitoring and anomaly detection.
Collaborated with ML engineers to deploy MLflow and Feature Store pipelines for live scoring and model retraining on Databricks.
Built observability dashboards with Grafana and Prometheus, improving system visibility and alerting accuracy by 60%.
Mentored junior engineers in data modeling, Data Mesh adoption, and containerized development using Docker and Kubernetes.
Data Engineer 08/2019 - 09/2022
Confluent
Developed and maintained ETL pipelines using Apache NiFi, dbt, and Azure Data Factory, automating ingestion from EHR and clinical data sources.
Engineered a HIPAA-compliant data warehouse using Snowflake and Databricks Delta Lake, ensuring scalability and secure access controls.
Built CDC (Change Data Capture) pipelines using Debezium and Kafka Streams for real-time synchronization between operational and analytical stores.
Migrated workloads from on-prem SQL Server to AWS Redshift and GCP BigQuery, improving performance by 40%.
Created metadata and lineage tracking in Azure Purview and Atlas, ensuring compliance with data governance frameworks.
Integrated dashboards in Power BI and Looker, leveraging optimized SQL views for business intelligence and reporting.
Automated CI/CD pipelines using Jenkins and Terraform for version-controlled data deployments.
Implemented encryption-at-rest and role-based access control (RBAC) for sensitive patient datasets.
ETL & Data Warehouse Engineer 06/2015 - 07/2019
Accenture
Designed complex ETL workflows using Informatica, Talend, and SSIS for financial and healthcare clients, ensuring reliability and reusability.
Modeled enterprise data with Star and Snowflake schemas and implemented optimized materialized views in SQL Server and Oracle.
Migrated on-premise workloads to AWS Redshift and Azure Synapse, improving scalability and reducing infrastructure costs by 30%.
Optimized PL/SQL and T-SQL queries for datasets exceeding 500M rows, reducing runtime by 45%.
Built automated data validation scripts using Python and Great Expectations to ensure data integrity across pipelines.
Integrated observability with ELK Stack (Elasticsearch, Logstash, Kibana) and Grafana for pipeline monitoring.
Collaborated with DevOps teams to containerize ETL jobs using Docker and manage deployments via Jenkins CI/CD.
Enforced HIPAA and GDPR compliance through encryption, anonymization, and access auditing frameworks.
Projects
Cross-Cloud Lakehouse Modernization
AWS, Azure, Databricks, Snowflake, Terraform
Designed a unified multi-cloud lakehouse architecture integrating Snowflake, Databricks, and Delta Lake.
Automated provisioning using Terraform, reduced query latency by 65%, and cut infrastructure costs by 45%.
Real-Time Analytics & ML Platform
Kafka, Spark Structured Streaming, Databricks, MLflow, Airflow
Built an event-driven streaming platform with Kafka, Spark, and Airflow, processing 2M+ events/hour.
Integrated ML pipelines via MLflow and SageMaker, enabling real-time fraud detection and personalization at scale.
Education
Bachelor of Science: Computer Science