Senior Data Engineer Lead with Cloud & Governance Expertise

Location:

United States

Salary:

$120000

Posted:

December 08, 2025


Resume:

Ash Irfan

Principal Data Engineer | Lead Data Engineer

************@*****.*** 207-***-**** Newark, NJ

Profile

Accomplished Lead Data Engineer with over 10 years of expertise in data engineering, cloud computing, big data, and data governance. Demonstrated success in architecting scalable data infrastructures and managing data integration across SQL, NoSQL, graph, and time-series databases. Proficient in ETL/ELT processes using tools such as Apache NiFi, Talend, and Informatica. Expertise spans cloud platforms (AWS, Azure, GCP), big data technologies (Hadoop, Spark, Kafka), and machine learning platforms (SageMaker, Azure ML). Skilled in data security, compliance, and performance optimization, with a strong emphasis on data governance, lineage, and cataloging.

Skills

Database

Expert in SQL (MySQL, PostgreSQL, SQL Server), NoSQL (MongoDB, Cassandra, Redis), graph (Neo4j, Amazon Neptune), time-series (InfluxDB, TimescaleDB), and NewSQL (CockroachDB, Google Spanner) databases, as well as Oracle, DB2, and SQLite.

Big Data

Proficient in Hadoop, Spark, Kafka, Flink, and Storm, with additional experience in Apache HBase, Hive, and Impala for data storage and querying, and Apache Sqoop for bulk data transfer between Hadoop and structured datastores, for handling large datasets and complex processing tasks.

ETL/ELT

Mastery in Apache NiFi, Talend, and Informatica.

Data Security & Compliance

Encryption, anonymization, GDPR, HIPAA.

Programming/Scripting

Advanced in Python, Scala, and Java.

Data Visualization

Proficient in Tableau, Power BI, Looker, and Superset.

Machine Learning/AI

Integrating ML/AI capabilities using SageMaker, Azure ML, and Google AI Platform.

Data Modeling/Warehousing

Skilled in Snowflake, BigQuery, Redshift, and Synapse Analytics.

Cloud Computing

AWS (Redshift, S3, EMR, Glue, Athena), Azure (Data Lake, Databricks, Data Factory, Cosmos DB), GCP (BigQuery, Cloud Storage, Dataflow, Pub/Sub).

API Development

RESTful and GraphQL APIs for data integration.

Containerization/Orchestration

Docker, Kubernetes, Docker Swarm.

Performance Tuning

Optimizing data jobs and applications.

Real-time Data Processing

Skilled in Apache Flink, Apache Samza, and stream processing.

Data Governance

Governance frameworks, metadata management, data lineage, and cataloging.

Professional Experience

Lead Data Engineer, Klarity Labs

09/2020 – Present

•Designed and implemented a robust data infrastructure by integrating SQL, NoSQL, and NewSQL databases to meet diverse storage and processing requirements.

•Developed and executed a multi-cloud strategy leveraging AWS, Azure, and GCP to improve data storage, processing, and analytics while ensuring high availability and disaster recovery.

•Built and deployed a real-time data processing system with Apache Flink and Kafka, enabling instantaneous insights and supporting critical decision-making.

•Led the development and deployment of containerized data applications using Docker and Kubernetes, ensuring scalability and streamlined cross-environment operations.

•Strengthened data security by implementing encryption and anonymization and by ensuring strict adherence to international data protection standards.

•Established a centralized data governance framework, enhancing data quality, lineage tracking, and metadata management processes.

•Promoted a culture of innovation by integrating advanced data analytics and machine learning capabilities into existing data pipelines.

Senior Data Engineer, Mezmo

07/2017 – 08/2020

•Designed and implemented a comprehensive data infrastructure, integrating SQL, NoSQL, and NewSQL databases to meet diverse storage and processing requirements.

•Executed a multi-cloud strategy, leveraging AWS, Azure, and GCP services to optimize data storage, processing, and analytics, ensuring high availability and robust disaster recovery.

•Architected and deployed a real-time data processing system using Apache Flink and Kafka, enabling real-time insights and enhancing decision-making capabilities.

•Led the development of containerized data applications with Docker and Kubernetes, ensuring scalability and smooth deployment across multiple environments.

•Strengthened data security by implementing encryption and anonymization and by ensuring compliance with global data protection regulations.

•Directed the development of a centralized data governance framework, enhancing data quality, lineage tracking, and metadata management.

•Promoted continuous improvement by integrating advanced data analytics and machine learning features into existing data pipelines.

Data Engineer, 7Park Data

09/2013 – 05/2017

•Designed and managed scalable, efficient data pipelines, using ETL/ELT processes with tools such as Informatica and custom scripts to support business intelligence and analytics initiatives.

•Integrated various databases, including graph and time-series databases, into the data ecosystem, enabling complex data analysis and enhancing data-driven decision-making.

•Used cloud services such as AWS S3 and Azure Data Lake for secure, scalable data storage, ensuring data availability and integrity.

•Developed APIs to facilitate seamless data exchange and integration, optimizing data flow between microservices and external systems.

•Applied machine learning models to enhance data quality and predictive analytics using platforms such as Google AI Platform and Amazon SageMaker.

•Contributed to implementing data security protocols and compliance standards, ensuring the protection of sensitive information.

•Actively engaged in developing and enforcing data governance policies, ensuring the consistency and reliability of data assets.

Projects

Multi-Cloud Data Lakehouse Architecture

•Designed a scalable multi-cloud data lakehouse across AWS, Azure, and GCP, integrating SQL, NoSQL, and time-series databases for unified analytics.

•Built automated ETL/ELT pipelines ensuring secure data processing, strong governance, and high availability across distributed cloud environments.

Real-Time Event Streaming & Analytics Platform

•Built a high-throughput real-time streaming system using Kafka and Flink to process live event data with sub-second latency.

•Integrated ML-driven insights into streaming workflows, enabling predictive analytics and faster operational decision-making.

Centralized Data Governance & Metadata Lineage System

•Developed a unified governance platform featuring end-to-end lineage tracking, metadata cataloging, and automated data quality checks.

•Implemented compliance controls including encryption, masking, and audit policies to strengthen data trust and regulatory readiness.

Education

Bachelor of Science, University of Punjab
