Ahsan Chaudhary
Senior Data Engineer | Big Data Engineer | Data Engineer
****************@*****.*** 215-***-**** Pittsburgh, PA 15232

Dynamic Data Engineering professional with 10 years of experience designing, building, and optimizing high-performance data platforms that power analytics and machine learning initiatives. Expertise spans modern cloud ecosystems and distributed data processing, enabling seamless pipeline automation and real-time decision-making. Proven ability to lead cross-functional teams in delivering impactful data solutions and driving business growth through strategic collaboration. Committed to integrating security and governance into data architectures while fostering a culture of innovation and continuous improvement.

Skills
Data Architecture &
Modeling
Data Modeling (Star/Snowflake
Schema), Dimensional Modeling,
Data Vault, OLAP/OLTP Design,
Data Lakes, Data Mesh, Metadata
Management, ER/Entity Modeling,
Kimball & Inmon Methodologies,
Schema Evolution
ETL/ELT Pipelines
Batch & Streaming Pipelines, Data
Ingestion, Incremental Loads, CDC
(Change Data Capture),
Transformation Frameworks, Real-
Time Data Processing,
Orchestration, Data Validation,
Performance Optimization
Lakehouse & Warehousing
Databricks, Snowflake, BigQuery,
Amazon Redshift, Azure Synapse,
Teradata, Vertica, Greenplum,
DuckDB, Delta Tables, Parquet,
ORC, Athena
Monitoring & Observability
Datadog, Prometheus, Grafana,
ELK Stack (Elasticsearch, Logstash,
Kibana), OpenTelemetry,
CloudWatch, PagerDuty, Alerting,
Logging Pipelines, SLA/SLO
Tracking
Cloud Platforms (AWS, GCP,
Azure)
AWS (Glue, EMR, Redshift,
Lambda, S3, Step Functions), GCP
(BigQuery, Dataflow, Pub/Sub,
Cloud Composer), Azure (Data
Factory, Synapse, Databricks, Event
Hub, Blob Storage), Terraform,
Kubernetes, Docker
Big Data Tools
Apache Spark, PySpark, Flink,
Kafka, Hadoop, Hive, HDFS,
HBase, Presto, Trino, Impala, Delta
Lake, Iceberg, Druid, Kinesis, Beam
Machine Learning
Integration
Feature Engineering, Feature
Stores, Model Deployment, MLOps,
MLflow, Feature Registry,
Data Versioning (DVC), Model
Serving (SageMaker, Vertex AI,
Azure ML), Real-Time Scoring
Python for Data Engineering
Pandas, NumPy, PySpark,
SQLAlchemy, FastAPI, Requests,
boto3, Airflow DAGs, ETL Scripts,
Automation, Unit Testing (pytest),
ML Integration (scikit-learn,
TensorFlow)
Leadership & Mentorship
Team Leadership, Technical
Reviews, Agile/Scrum, Stakeholder
Management, Knowledge Sharing,
Code Reviews, Cross-Functional
Collaboration, Hiring &
Onboarding, Project Planning,
Mentorship Programs
SQL & Databases
PostgreSQL, MySQL, SQL Server,
Oracle, Snowflake SQL, Redshift
Spectrum, NoSQL (MongoDB,
Cassandra, DynamoDB, Redis),
Query Optimization, Indexing,
Window Functions, Stored
Procedures
Workflow Orchestration
Apache Airflow, dbt (Core &
Cloud), Luigi, Dagster, Prefect,
Oozie, Argo Workflows, CI/CD
Integration (GitHub Actions,
Jenkins, GitLab CI)
Data Governance
Data Lineage (Collibra, Alation,
Atlan), Data Catalogs, Quality
Checks (Great Expectations, Soda),
Security & Compliance (GDPR,
HIPAA, SOC2), Access Controls
(IAM, RBAC), Data Masking
Professional Experience
Senior Data Engineer, Intellias 01/2022 – Present
•Defined and implemented data architecture across Databricks, Snowflake, and BigQuery, shaping the long-term analytics and ML roadmap.
•Designed and optimized Spark, Airflow, and dbt pipelines for faster insights and smoother workflows.
•Improved system efficiency via query tuning and workload balancing; reduced costs and enhanced report reliability.
•Established data governance with lineage tracking, validations, and access controls, boosting compliance and confidence.
•Mentored engineers, conducted code reviews, and elevated SQL and data modeling standards.
•Aligned technical roadmaps with business goals in collaboration with product, finance, and executives.
•Led root-cause investigations and fixes for complex system issues as an escalation point.
•Streamlined model deployment with data science teams, accelerating predictive solutions delivery.
Big Data Engineer, Algolia 03/2018 – 08/2021
•Built and managed petabyte-scale data platforms using Hadoop, Hive, and Spark to support enterprise reporting and predictive analytics.
•Designed high-throughput streaming pipelines with Kafka, Flume, and Spark Streaming, enabling near real-time operational insights.
•Optimized AWS S3 and HDFS-based data lakes with partitioning, indexing, and compression, improving accessibility and responsiveness for analytics.
•Integrated machine learning workflows with big data systems, supporting large-scale training and production-ready feature engineering.
•Collaborated with analytics and product teams to deliver dashboards and reporting systems that informed major strategic decisions.
•Enforced strict security and compliance standards with encryption and fine-grained access controls, reducing operational risk.
•Mentored junior engineers on Spark optimization, pipeline best practices, and scalable data design.
•Spearheaded infrastructure efficiency initiatives that streamlined storage and compute usage across the platform.
Data Engineer, Nexora Insights 03/2015 – 02/2018
•Designed and maintained data pipelines to support analytics and ML initiatives, ensuring reliable ingestion and transformation of business-critical data.
•Developed and managed data warehousing solutions with PostgreSQL and Redshift, greatly improving scalability and accessibility.
•Partnered with data scientists to embed ML models into production workflows, leading to stronger predictive performance.
•Built reusable data models and transformation logic to support dashboards and operational reporting.
•Implemented monitoring and observability with ELK and Prometheus, reducing downtime and increasing system stability.
•Collaborated with cross-functional stakeholders to deliver tailored data solutions that directly supported business priorities.
•Automated ETL workflows, cutting down manual interventions and increasing operational reliability.
•Worked closely with data scientists on algorithm improvements and data pipelines, resulting in more effective models.
Projects
Real-Time Fraud Detection Pipeline
•Designed and deployed a real-time fraud detection system using Kafka and Spark Structured Streaming, enabling sub-second anomaly detection across millions of transactions.
•Partnered with ML teams to integrate models into the streaming pipeline, reducing false positives and significantly improving fraud response times.
Cloud Data Lakehouse Modernization
•Led migration from on-prem Hadoop to a cloud-native Lakehouse on Databricks, Delta Lake, and Snowflake, improving scalability and governance.
•Optimized query performance and storage, achieving a 35% reduction in infrastructure costs while enabling faster analytics for business stakeholders.
Certificates
Google Cloud Professional
Data Engineer
Databricks Certified
Data Engineer Professional
Snowflake
SnowPro Core Certification
Education
Bachelor of Computer Science, Punjab University