Ash Irfan
Principal Data Engineer | Lead Data Engineer
************@*****.*** 207-***-**** Newark, NJ
Profile
Accomplished Lead Data Engineer with over 10 years of experience in data engineering, cloud computing, big data, and data governance. Proven success architecting scalable data infrastructure and managing data integration across SQL, NoSQL, graph, and time-series databases. Proficient in ETL/ELT processes using tools such as Apache NiFi, Talend, and Informatica. Expertise spans cloud platforms (AWS, Azure, GCP), big data technologies (Hadoop, Spark, Kafka), and machine learning platforms (SageMaker, Azure ML). Skilled in data security, compliance, and performance optimization, with a strong emphasis on data governance, lineage, and cataloging.
Skills
Database
Expert in SQL (MySQL, PostgreSQL, SQL Server), NoSQL (MongoDB, Cassandra, Redis), graph (Neo4j, Amazon Neptune), time-series (InfluxDB, TimescaleDB), and NewSQL (CockroachDB, Google Spanner) databases, plus additional databases such as Oracle, DB2, and SQLite.
Big Data
Proficient in Hadoop, Spark, Kafka, Flink, and Storm, with additional experience using Apache HBase, Hive, and Impala for data storage and querying, and Apache Sqoop for bulk data transfer between Hadoop and structured datastores, strengthening the ability to handle vast datasets and complex processing tasks.
ETL/ELT
Mastery of Apache NiFi, Talend, and Informatica.
Data Security & Compliance
Encryption, anonymization, GDPR, HIPAA.
Programming/Scripting
Advanced in Python, Scala, and Java.
Data Visualization
Proficient in Tableau, Power BI, Looker, and Superset.
Machine Learning/AI
Integrating ML/AI into data workflows with SageMaker, Azure ML, and Google AI Platform.
Data Modeling/Warehousing
Skilled in Snowflake, BigQuery, Redshift, and Azure Synapse Analytics.
Cloud Computing
AWS (Redshift, S3, EMR, Glue, Athena), Azure (Data Lake, Databricks, Data Factory, Cosmos DB), GCP (BigQuery, Cloud Storage, Dataflow, Pub/Sub).
API Development
RESTful and GraphQL APIs for data integration.
Containerization/Orchestration
Docker, Kubernetes, Docker Swarm.
Performance Tuning
Optimizing data jobs and applications.
Real-time Data Processing
Skilled in stream processing with Apache Flink and Apache Samza.
Data Governance
Metadata management, data lineage, and cataloging.
Professional Experience
Lead Data Engineer, Klarity Labs
09/2020 – Present
•Designed and implemented a robust data infrastructure by integrating SQL, NoSQL, and NewSQL databases to meet diverse storage and processing requirements.
•Developed and executed a multi-cloud strategy leveraging AWS, Azure, and GCP to improve data storage, processing, and analytics while ensuring high availability and disaster recovery.
•Built and deployed a real-time data processing system with Apache Flink and Kafka, enabling real-time insights that support critical decision-making.
•Led the development and deployment of containerized data applications using Docker and Kubernetes, ensuring scalability and streamlined cross-environment operations.
•Strengthened data security by implementing encryption and anonymization and by ensuring strict adherence to international data protection standards.
•Established a centralized data governance framework, enhancing data quality, lineage tracking, and metadata management processes.
•Promoted a culture of innovation by integrating advanced data analytics and machine learning functionalities into existing data pipelines.
Senior Data Engineer, Mezmo
07/2017 – 08/2020
•Designed and implemented a comprehensive data infrastructure, integrating SQL, NoSQL, and NewSQL databases to meet diverse storage and processing requirements.
•Executed a multi-cloud strategy, leveraging AWS, Azure, and GCP services to optimize data storage, processing, and analytics, ensuring high availability and robust disaster recovery.
•Architected and deployed a real-time data processing system using Apache Flink and Kafka, enabling real-time insights and enhancing decision-making capabilities.
•Led the development of containerized data applications with Docker and Kubernetes, ensuring scalability and smooth deployment across multiple environments.
•Strengthened data security by implementing encryption and anonymization and by ensuring compliance with global data protection regulations.
•Directed the development of a centralized data governance framework, enhancing data quality, lineage tracking, and metadata management.
•Promoted continuous improvement by integrating advanced data analytics and machine learning features into existing data pipelines.
Data Engineer, 7Park Data
09/2013 – 05/2017
•Designed and managed scalable and efficient data pipelines, utilizing ETL/ELT processes with tools such as Informatica and custom scripts to support business intelligence and analytics initiatives.
•Integrated various databases, including graph and time-series databases, into the data ecosystem, enabling complex data analysis and enhancing data-driven decision-making.
•Utilized cloud services like AWS S3 and Azure Data Lake for secure, scalable data storage, ensuring data availability and integrity.
•Developed APIs to facilitate seamless data exchange and integration, optimizing data flow between microservices and external systems.
•Applied machine learning models to enhance data quality and predictive analytics using platforms such as Google AI Platform and Amazon SageMaker.
•Contributed to implementing data security protocols and compliance standards, ensuring the protection of sensitive information.
•Actively engaged in developing and enforcing data governance policies, ensuring the consistency and reliability of data assets.
Projects
Multi-Cloud Data Lakehouse Architecture
•Designed a scalable multi-cloud data lakehouse across AWS, Azure, and GCP, integrating SQL, NoSQL, and time-series databases for unified analytics.
•Built automated ETL/ELT pipelines ensuring secure data processing, strong governance, and high availability across distributed cloud environments.
Real-Time Event Streaming & Analytics Platform
•Built a high-throughput real-time streaming system using Kafka and Flink to process live event data with sub-second latency.
•Integrated ML-driven insights into streaming workflows, enabling predictive analytics and faster operational decision-making.
Centralized Data Governance & Metadata Lineage System
•Developed a unified governance platform featuring end-to-end lineage tracking, metadata cataloging, and automated data quality checks.
•Implemented compliance controls including encryption, masking, and audit policies to strengthen data trust and regulatory readiness.
Education
Bachelor of Science, University of Punjab