Ahsan Chaudhary
Senior Data Engineer | Big Data Engineer | Data Engineer
****************@*****.*** 215-***-**** Pittsburgh, PA 15232

Dynamic Data Engineering professional with 10 years of experience designing, building, and optimizing high-performance data platforms that power analytics and machine learning initiatives. Expertise spans modern cloud ecosystems and distributed data processing, enabling seamless pipeline automation and real-time decision-making. Proven ability to lead cross-functional teams in delivering impactful data solutions and driving business growth through strategic collaboration. Committed to integrating security and governance into data architectures while fostering a culture of innovation and continuous improvement.

Skills
Data Architecture &
Modeling
Data Modeling (Star/Snowflake
Schema), Dimensional Modeling,
Data Vault, OLAP/OLTP Design,
Data Lakes, Data Mesh, Metadata
Management, ER/Entity Modeling,
Kimball & Inmon Methodologies,
Schema Evolution
ETL/ELT Pipelines
Batch & Streaming Pipelines, Data
Ingestion, Incremental Loads, CDC
(Change Data Capture),
Transformation Frameworks, Real-
Time Data Processing,
Orchestration, Data Validation,
Performance Optimization
Lakehouse & Warehousing
Databricks, Snowflake, BigQuery,
Amazon Redshift, Azure Synapse,
Teradata, Vertica, Greenplum,
DuckDB, Delta Tables, Parquet,
ORC, Athena
Monitoring & Observability
Datadog, Prometheus, Grafana,
ELK Stack (Elasticsearch, Logstash,
Kibana), OpenTelemetry,
CloudWatch, PagerDuty, Alerting,
Logging Pipelines, SLA/SLO
Tracking
Cloud Platforms (AWS, GCP,
Azure)
AWS (Glue, EMR, Redshift,
Lambda, S3, Step Functions), GCP
(BigQuery, Dataflow, Pub/Sub,
Cloud Composer), Azure (Data
Factory, Synapse, Databricks, Event
Hub, Blob Storage), Terraform,
Kubernetes, Docker
Big Data Tools
Apache Spark, PySpark, Flink,
Kafka, Hadoop, Hive, HDFS,
HBase, Presto, Trino, Impala, Delta
Lake, Iceberg, Druid, Kinesis, Beam
Machine Learning
Integration
Feature Engineering, Feature
Stores, Model Deployment, MLOps,
MLflow, Feature Registry,
Data Versioning (DVC), Model
Serving (SageMaker, Vertex AI,
Azure ML), Real-Time Scoring
Python for Data Engineering
Pandas, NumPy, PySpark,
SQLAlchemy, FastAPI, Requests,
boto3, Airflow DAGs, ETL Scripts,
Automation, Unit Testing (pytest),
ML Integration (scikit-learn,
TensorFlow)
Leadership & Mentorship
Team Leadership, Technical
Reviews, Agile/Scrum, Stakeholder
Management, Knowledge Sharing,
Code Reviews, Cross-Functional
Collaboration, Hiring &
Onboarding, Project Planning,
Mentorship Programs
SQL & Databases
PostgreSQL, MySQL, SQL Server,
Oracle, Snowflake SQL, Redshift
Spectrum, NoSQL (MongoDB,
Cassandra, DynamoDB, Redis),
Query Optimization, Indexing,
Window Functions, Stored
Procedures
Workflow Orchestration
Apache Airflow, dbt (Core &
Cloud), Luigi, Dagster, Prefect,
Oozie, Argo Workflows, CI/CD
Integration (GitHub Actions,
Jenkins, GitLab CI)
Data Governance
Data Lineage (Collibra, Alation,
Atlan), Data Catalogs, Quality
Checks (Great Expectations, Soda),
Security & Compliance (GDPR,
HIPAA, SOC2), Access Controls
(IAM, RBAC), Data Masking
Professional Experience
Senior Data Engineer, Intellias 01/2022 – Present
•Defined and implemented data architecture across Databricks, Snowflake, and BigQuery, shaping the long-term analytics and ML roadmap.
•Designed and optimized Spark, Airflow, and dbt pipelines for faster insights and smoother workflows.
•Improved system efficiency via query tuning and workload balancing; reduced costs and enhanced report reliability.
•Established data governance with lineage tracking, validations, and access controls, boosting compliance and confidence.
•Mentored engineers, conducted code reviews, and elevated SQL and data modeling standards.
•Aligned technical roadmaps with business goals in collaboration with product, finance, and executives.
•Led root-cause investigations and fixes for complex system issues as an escalation point.
•Streamlined model deployment with data science teams, accelerating predictive solutions delivery.
Big Data Engineer, Algolia 03/2018 – 08/2021
•Built and managed petabyte-scale data platforms using Hadoop, Hive, and Spark to support enterprise reporting and predictive analytics.
•Designed high-throughput streaming pipelines with Kafka, Flume, and Spark Streaming, enabling near real-time operational insights.
•Optimized AWS S3 and HDFS-based data lakes with partitioning, indexing, and compression, improving accessibility and responsiveness for analytics.
•Integrated machine learning workflows with big data systems, supporting large-scale training and production-ready feature engineering.
•Collaborated with analytics and product teams to deliver dashboards and reporting systems that informed major strategic decisions.
•Enforced strict security and compliance standards with encryption and fine-grained access controls, reducing operational risk.
•Mentored junior engineers on Spark optimization, pipeline best practices, and scalable data design.
•Spearheaded infrastructure efficiency initiatives that streamlined storage and compute usage across the platform.
Data Engineer, Nexora Insights 03/2015 – 02/2018
•Designed and maintained data pipelines to support analytics and ML initiatives, ensuring reliable ingestion and transformation of business-critical data.
•Developed and managed data warehousing solutions with PostgreSQL and Redshift, greatly improving scalability and accessibility.
•Partnered with data scientists to embed ML models into production workflows, leading to stronger predictive performance.
•Built reusable data models and transformation logic to support dashboards and operational reporting.
•Implemented monitoring and observability with ELK and Prometheus, reducing downtime and increasing system stability.
•Collaborated with cross-functional stakeholders to deliver tailored data solutions that directly supported business priorities.
•Automated ETL workflows, cutting down manual interventions and increasing operational reliability.
•Worked closely with data scientists on algorithm improvements and data pipelines, resulting in more effective models.
Projects
Real-Time Fraud Detection Pipeline
•Designed and deployed a real-time fraud detection system using Kafka and Spark Structured Streaming, enabling sub-second anomaly detection across millions of transactions.
•Partnered with ML teams to integrate models into the streaming pipeline, reducing false positives and significantly improving fraud response times.
Cloud Data Lakehouse Modernization
•Led migration from on-prem Hadoop to a cloud-native Lakehouse on Databricks, Delta Lake, and Snowflake, improving scalability and governance.
•Optimized query performance and storage, achieving a 35% reduction in infrastructure costs while enabling faster analytics for business stakeholders.
Certificates
Google Cloud Professional
Data Engineer
Databricks Certified
Data Engineer Professional
Snowflake
SnowPro Core Certification
Education
Bachelor of Computer Science, Punjab University